FRM Level 1 Part 2 Flashcards
Chapter 1: Probability
1. random events and probability
basic concept of probability
- outcome and sample space
2. relationships among events: mutually exclusive events, exhaustive events, independent events (the occurrence of B has no influence on the occurrence of A)
types of probability
Joint probability is the probability of two events occurring simultaneously.
Marginal probability is the probability of an event irrespective of the outcome of another variable.
Conditional probability is the probability of one event occurring in the presence of a second event.
unconditional probability
p(A)
conditional probability
p(A|B)
joint probability
p(AB)
two important rules
multiplication rule
p(AB) = p(A|B)xp(B)
if they are independent
p(AB) = p(A)xp(B)
addition rule
p(A+B) = p(A) + p(B) - p(AB)
if mutually exclusive
p(A+B) = p(A) + p(B)
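A minimal Python sketch of both rules, using made-up values for p(A), p(B), and p(A|B):

```python
# Illustrative check of the multiplication and addition rules.
# P(A), P(B), and P(A|B) below are made-up example values.
p_b = 0.4          # P(B)
p_a_given_b = 0.5  # P(A|B)
p_a = 0.3          # P(A)

# Multiplication rule: P(AB) = P(A|B) * P(B)
p_ab = p_a_given_b * p_b          # 0.2

# Addition rule: P(A + B) = P(A) + P(B) - P(AB)
p_a_or_b = p_a + p_b - p_ab       # 0.5

# Special cases: independence gives P(AB) = P(A) * P(B);
# mutually exclusive events give P(A + B) = P(A) + P(B).
print(p_ab, p_a_or_b)
```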
- discrete and continuous random variable
discrete random variable
number of possible outcomes can be counted
continuous random variable
it can take on any value within a given range, finite or infinite
P(X = x) = 0 even though the event X = x can occur
probability density function:
discrete random variable (the probability that a discrete random variable will take on the value x)
continuous random variable
the PDF is f(x), the function value corresponding to X
p(x1 <= X <= x2) is the area under the PDF over the interval [x1, x2]
cumulative distribution function
concept: the probability that a random variable will be less than or equal to a given value: F(x) = P(X <= x)
characteristics: monotonically increasing; bounded: F(x) approaches 0 as x approaches negative infinity and 1 as x approaches positive infinity; P(a < X <= b) = F(b) - F(a)
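A sketch showing that for a continuous variable the probability over an interval is the area under the PDF, i.e. F(x2) - F(x1); the standard normal is used as the example (scipy assumed):

```python
# Sketch: P(x1 <= X <= x2) equals the area under the PDF over [x1, x2],
# which also equals F(x2) - F(x1). Standard normal used as an example.
from scipy.stats import norm
from scipy.integrate import quad

x1, x2 = -1.0, 1.0
area, _ = quad(norm.pdf, x1, x2)          # integrate the PDF over [x1, x2]
via_cdf = norm.cdf(x2) - norm.cdf(x1)     # same probability via the CDF
print(area, via_cdf)                      # both ~0.6827
# Note: P(X = x) = 0 for any single point x of a continuous variable.
```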
Chapter 2: Bayesian Analysis
Total probability theorem
if A1,….,An are mutually exclusive and exhaustive
p(B) = the sum of p(Aj)p(B|Aj) from j = 1 to n
Bayes' Theorem
p(A|B) = p(B|A) x p(A) / p(B)
p(A|B): updated (posterior) probability
p(A): prior probability
p(B) is computed with the total probability theorem
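A numeric sketch of the total probability theorem and Bayes' theorem; the priors and likelihoods below are made-up illustrative values:

```python
# A1/A2 are mutually exclusive and exhaustive; B is the observed evidence.
priors = {"A1": 0.6, "A2": 0.4}          # P(Aj), hypothetical values
likelihoods = {"A1": 0.2, "A2": 0.7}     # P(B|Aj), hypothetical values

# Total probability theorem: P(B) = sum_j P(Aj) * P(B|Aj)
p_b = sum(priors[a] * likelihoods[a] for a in priors)

# Bayes' theorem: P(Aj|B) = P(B|Aj) * P(Aj) / P(B)
posteriors = {a: likelihoods[a] * priors[a] / p_b for a in priors}
print(p_b, posteriors)   # P(B) = 0.4; posteriors: A1 -> 0.3, A2 -> 0.7
```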
Chapter 3: Basic Statistics
arithmetic mean
population mean: mu = (the sum of Xi from i = 1 to N) / N
sample mean: x̄ = (the sum of Xi from i = 1 to n) / n
median
the middle item of a set of items sorted into ascending or descending order
odd n: item (n+1)/2; even n: the average of items n/2 and n/2 + 1
mode
most frequently occurring value of the distribution
expected value
definition: E(X) = x1*p(X=x1) + ... + xn*p(X=xn)
properties:
if c and a are constants, then E(cX + a) = cE(X) + a
E(X + Y) = E(X) + E(Y)
if X and Y are independent random variables, then E(XY) = E(X)*E(Y)
in general, E(X^2) != [E(X)]^2
- dispersion
variance for data:
population variance
sample variance
standard deviation
population standard deviation
sample standard deviation
variance for a random variable
formula: Var(X) = E[(X - mu)^2] = E(X^2) - [E(X)]^2
properties: if c is any constant
Var(X + c) = Var(X)
Var(cX) = c^2 * Var(X)
for any random variables X and Y:
Var(X+Y) = Var(X) + Var(Y) + 2Cov(X,Y)
Var(X-Y) = Var(X) + Var(Y) - 2Cov(X,Y)
if X and Y are independent, Cov(X,Y) = 0, so Var(X+Y) = Var(X-Y) = Var(X) + Var(Y)
Square-root rule (Baumol model): at the start of the period, bond holdings are (n-1)Y/n and cash holdings are Y/n; at the end of the period, bond holdings are 0 and cash holdings are Y/n. Average bond holdings are therefore (n-1)Y/2n. Consider the following maximization (maximize the interest earned on bonds, with the number of conversions n as the control variable): Max (n-1)Yr/2n - nb. The first-order condition Yr/2n^2 - b = 0 gives n = sqrt(Yr/2b). Average cash holdings are Y/2n; substituting gives Md = sqrt(Yb/2r). This is the mathematical statement of the square-root rule and the solution of the Baumol model.
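A small sketch of the rule with hypothetical inputs (Y = annual cash need, r = bond interest rate, b = cost per conversion):

```python
# Square-root rule with made-up inputs; checks that Md = Y / (2n).
from math import sqrt

Y, r, b = 100_000, 0.05, 10.0        # hypothetical values

n = sqrt(Y * r / (2 * b))            # optimal number of conversions ~15.8
Md = sqrt(Y * b / (2 * r))           # average cash holdings ~3162.3
print(n, Md, Y / (2 * n))            # last two values coincide
```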
covariance
definition:
the relationship between the deviation of two variables
Cov(X, Y) = E{[X - E(X)][Y - E(Y)]} = E(XY) - E(X)E(Y)
properties
1. Cov(X, Y) ranges from negative infinity to positive infinity
2. if X and Y are independent, then E(XY) = E(X)E(Y) and cov(X, Y) = 0
3. if X = Y, then Cov(X, X) = E{[X - E(X)][X - E(X)]} = Var(X)
4. Cov(a+bX, c+dY) = bdCov(X,Y)
5. Var(w1X1 + w2X2) = [w1^2]Var(X1) + [w2^2]Var(X2) + 2w1w2Cov(X1, X2)
where w1 and w2 are the weights of X1 and X2, respectively
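A simulation sketch (numpy assumed) verifying property 5 on made-up correlated data:

```python
# Verify Var(w1*X1 + w2*X2) against the covariance formula on simulated data.
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(0.0, 1.0, 100_000)
x2 = 0.5 * x1 + rng.normal(0.0, 1.0, 100_000)   # correlated with x1
w1, w2 = 0.6, 0.4                                # hypothetical weights

direct = np.var(w1 * x1 + w2 * x2)               # variance of the weighted sum
formula = (w1**2 * np.var(x1) + w2**2 * np.var(x2)
           + 2 * w1 * w2 * np.cov(x1, x2, ddof=0)[0, 1])
print(direct, formula)                           # identical up to rounding
```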
correlation
definition:
linear relationship between two variables: p(X,Y) = Cov(X,Y) / [std(X) * std(Y)]. p ranges from -1 to +1 and has no units
properties
p = 0 indicates the absence of any linear relationship, though a non-linear relationship may still exist
the larger the absolute value, the stronger the linear relationship
correlation coefficient with interpretation
p = +1 perfect positive linear correlation
0 < p < 1: positive linear correlation
p = 0: no linear correlation
-1 < p < 0: negative linear correlation
p = -1: perfect negative linear correlation
- Skewness
definition
how symmetrical the distribution is around the mean
skewness = E[(X - mu)^3]/std^3
properties
symmetrical distribution:
Skewness = 0
positively skewed distribution (right skew): Skewness>0
outliers in the right tail: mean > median > mode
negatively skewed distribution (left skew):
Skewness < 0
outliers in the left tail: many financial assets exhibit negative skew (more risky); mean < median < mode
- Kurtosis
definition: the degree of weight placed on extreme points in the tails
Kurtosis =E[(X - mu)^4]/std^4
leptokurtic: kurtosis > 3, excess kurtosis > 0
mesokurtic: kurtosis = 3, excess kurtosis = 0
platykurtic: kurtosis < 3, excess kurtosis < 0
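A sketch computing sample skewness and kurtosis with scipy; note that scipy's kurtosis() returns excess kurtosis by default:

```python
# Sample skewness and kurtosis of a fat-tailed (leptokurtic) sample.
import numpy as np
from scipy.stats import skew, kurtosis

rng = np.random.default_rng(1)
x = rng.standard_t(df=5, size=100_000)   # t(5): symmetric but fat-tailed

print(skew(x))                    # ~0 (symmetric)
print(kurtosis(x))                # excess kurtosis > 0
print(kurtosis(x, fisher=False))  # raw kurtosis > 3
```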
Chapter 4: Distributions
1. discrete probability distribution
Bernoulli distribution:
definition: a trial produces one of two outcomes (success or failure)
properties
E(X) = p*1 + (1-p)*0 = p
Var(X) = p*(1-p)
Binomial Distribution
definition: the distribution of a binomial random variable, defined as the number of successes in n Bernoulli trials
properties
the probability of success is constant across all trials
the trials are all independent
E(X) = np
Var(X) = np(1-p)
p(x) = P(X = x) = n!/[(n - x)!x!] * p^x(1-p)^(n-x)
as n increases and p approaches 0.5, the binomial distribution approximates the normal distribution
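A sketch of the binomial formulas via scipy, with example values n = 10 and p = 0.3:

```python
# Binomial pmf, mean, and variance; n and p are example values.
from scipy.stats import binom

n, p = 10, 0.3
print(binom.pmf(3, n, p))   # P(X = 3) = n!/[(n-x)!x!] * p^x (1-p)^(n-x) ~0.267
print(binom.mean(n, p))     # E(X) = np = 3.0
print(binom.var(n, p))      # Var(X) = np(1-p) = 2.1
```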
Poisson distribution
definition: used to model the occurrence of events over time
properties:
f(x) = P(X = x) = (v^x*e^(-v))/x!
v: the average or expected number of events in the interval
x: the number of events in the interval
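A sketch of Poisson probabilities with an example rate v = 2 events per interval (scipy assumed):

```python
# Poisson probabilities for a made-up rate v = 2 events per interval.
from scipy.stats import poisson

v = 2.0
print(poisson.pmf(0, v))                 # P(X = 0) = e^(-v) ~0.135
print(poisson.pmf(3, v))                 # P(X = 3) = v^3 e^(-v) / 3!
print(poisson.mean(v), poisson.var(v))   # mean and variance both equal v
```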
2. continuous probability distribution
uniform distribution
definition:
the probabilities for all possible outcomes are equal
graph:
probability density function: f(x) = 1/(b - a) for a <= x <= b, 0 otherwise
cumulative distribution function: F(x) = 0 for x <= a; (x - a)/(b - a) for a < x < b; 1 for x >= b
properties
E(X) = (a + b)/2
Var(X) = (b - a)^2/12
For a <= x1 < x2 <= b: P(x1 <= X <= x2) = (x2 - x1)/(b - a)
standard uniform distribution: a = 0, b = 1
normal distribution
properties:
completely described by mean and variance
X~N(mean, variance)
Skewness = 0, kurtosis = 3
1. linear combination of independent normal distributed random variables is also normally distributed
2. probabilities decrease further from the mean. But the tails go on forever
commonly used confidence intervals
68% confidence interval is [X - 1std, X + 1std]
90% confidence interval is [X - 1.65std, X + 1.65std]
95% confidence interval is [X - 1.96std, X + 1.96std]
98% confidence interval is [X - 2.33std, X + 2.33std]
99% confidence interval is [X - 2.58std, X + 2.58std]
a normal distribution with mean = 0 and std = 1 is the standard normal distribution (standardize with Z = (X - mean)/std)
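A sketch showing where the reliability factors above come from (the inverse standard normal CDF) and the standardizing step; mu, sigma, and x are made-up example values:

```python
# Two-sided reliability factors from the standard normal inverse CDF,
# plus the standardizing step Z = (X - mean)/std.
from scipy.stats import norm

for conf in (0.68, 0.90, 0.95, 0.98, 0.99):
    z = norm.ppf(0.5 + conf / 2)     # two-sided critical value
    print(f"{conf:.0%}: +/- {z:.2f} std")

mu, sigma, x = 10.0, 2.0, 13.0       # made-up example values
z_score = (x - mu) / sigma           # Z = 1.5
print(norm.cdf(z_score))             # P(X <= 13) via the standard normal
```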
lognormal distribution
definition:
if ln X is normal, then X is lognormal
equivalently, if Y is normal, then e^Y is lognormal
properties
chart
Right skewed
Bounded from below by zero
Sampling distribution
Student's t-distribution
definition:
if Z is a standard normal variable and U is a chi-square variable with K degrees of freedom that is independent of Z, then the random variable X below follows a t-distribution with K degrees of freedom
X = Z/sqrt(U/K)
Z: standard normal variable
U: chi-square variable
K: degree of freedom
tips:
chi-square variable could be the sum of squares
Y = S1^2+…+Sn^2
where S1,…, Sn are independent standard normal random variables
properties:
1. symmetrical (bell shaped), skewness = 0
2. defined by a single parameter: degrees of freedom (df), df = n - 1, where n is the sample size
3. comparison with normal distribution
fatter tails
as df increases, the t-distribution approaches the standard normal distribution
given a confidence level, the t-distribution gives a wider confidence interval
as df increases, the t-distribution becomes more peaked with thinner tails, meaning smaller probabilities for extreme values
Chi-Square (x2) distribution
definition:
if we have k independent standard normal variables Z1, ..., Zk, then the sum of their squares, S, has a chi-square distribution
S = Z1^2 + ... + Zk^2
k is the degrees of freedom (df = n - 1 when sampling)
properties
Asymmetrical, bounded below by zero
as df increases, it converges to the normal distribution
the sum of two independent chi-square variables with k1 and k2 degrees of freedom follows a chi-square distribution with k1 + k2 degrees of freedom
F-distribution
definition:
if U1 and U2 are two independent chi-square variables with K1 and K2 degrees of freedom, then X below follows an F-distribution
X = (U1/K1)/(U2/K2)
properties
as K1 and K2 approach infinity, the F-distribution approaches the normal distribution
if X follows t(k), then X^2 has an F-distribution: X^2 ~ F(1, k)
when sampling, the degrees of freedom are n1 - 1 and n2 - 1
degrees of freedom
df = N - 1
df = degrees of freedom
N = sample size
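A simulation sketch (numpy/scipy assumed) checking the three constructions above: Z/sqrt(U/k) is t(k), a sum of k squared standard normals is chi-square(k), and t(k)^2 ~ F(1, k):

```python
# Simulated tail probabilities vs. the exact t, chi-square, and F laws.
import numpy as np
from scipy.stats import t, chi2, f

rng = np.random.default_rng(2)
k, size = 5, 200_000

z = rng.standard_normal(size)
u = (rng.standard_normal((size, k)) ** 2).sum(axis=1)  # chi-square(k)
x = z / np.sqrt(u / k)                                  # t(k) by construction

print((x > 2.0).mean(), t.sf(2.0, k))           # t tail
print((u > 11.07).mean(), chi2.sf(11.07, k))    # chi-square tail
print((x ** 2 > 4.0).mean(), f.sf(4.0, 1, k))   # t(k)^2 ~ F(1, k)
```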
Chapter 5: Confidence Intervals and Hypothesis Testing
1. point estimation
statistical inference:
making forecasts, estimates or judgments about a population from the sample actually drawn from that population
draw a sample from the sampling population
a sample statistic uses the sample to estimate the population parameter
since the whole population is rarely observable directly, the parameter is estimated from the sample
sample mean & Sample variance
sample mean: x̄ = (the sum of Xi from i = 1 to n) / n
E(x̄) = mu
Var(x̄) = sigma^2 / n
sample variance: s^2 = [the sum of (Xi - x̄)^2 from i = 1 to n] / (n - 1)
central limit theorem:
assumptions: simple random sample (i.e. i.i.d.), finite nonzero variance, sample size n > 30
conclusion: the sample mean x̄ ~ N(mu, sigma^2/n) approximately
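A simulation sketch of the CLT: sample means of a skewed exponential population are approximately N(mu, sigma^2/n) once n > 30 (numpy assumed):

```python
# CLT demo: sample means of a skewed population are ~normal with var sigma^2/n.
import numpy as np

rng = np.random.default_rng(3)
n, trials = 50, 20_000               # sample size > 30, many repeated samples

# Exponential(1) population: mean 1, variance 1, heavily right-skewed.
means = rng.exponential(1.0, (trials, n)).mean(axis=1)

print(means.mean())                  # ~ population mean mu = 1.0
print(means.var())                   # ~ sigma^2 / n = 1/50 = 0.02
```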
properties of estimator:
unbiased:
the expected value of the estimate equals the parameter
efficient (best):
the variance of the estimator is the smallest among all unbiased estimators
consistent: the larger n is, the more accurate the parameter estimate
linearity
- confidence interval
point estimate ± (reliability factor × standard error)
known population variance
x̄ ± z(α/2) × σ/√n
unknown population variance
x̄ ± t(α/2) × s/√n
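A sketch of the unknown-variance case on a made-up sample, using the t reliability factor:

```python
# 95% confidence interval for the mean, population variance unknown.
import numpy as np
from scipy.stats import t

x = np.array([10.2, 9.8, 10.5, 10.1, 9.9, 10.4, 10.0, 10.3])  # made-up sample
n, x_bar, s = len(x), x.mean(), x.std(ddof=1)

t_crit = t.ppf(0.975, df=n - 1)          # reliability factor for alpha = 5%
half_width = t_crit * s / np.sqrt(n)     # reliability factor x standard error
print(x_bar - half_width, x_bar + half_width)
```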
CI with known and unknown population variance
sampling from:                         reliability factor
distribution   variance    small sample (n < 30)    large sample (n >= 30)
normal         known       z-statistic              z-statistic
normal         unknown     t-statistic              t-statistic
non-normal     known       not available            z-statistic
non-normal     unknown     not available            t-statistic
Factors affecting the width of the confidence interval
change in factor    for z-distribution    for t-distribution
larger alpha        smaller               smaller
larger n            smaller               smaller
larger df           N/A                   smaller
larger s            larger                larger
z-distribution
Z = (x - mean) / standard deviation
Hypothesis test
Null hypothesis —–> Ho
Alternative hypothesis —–> Ha
we usually place the result we want to establish in the alternative hypothesis
one tail test vs. two tailed test
type I error vs. type II error
type I error
rejecting the null hypothesis when it is true
the probability of making a type I error is equal to alpha, also known as the significance level of the test
type II error
failing to reject the null hypothesis when it is false
the probability of making a type II error is equal to beta
Power of the test: the probability of rejecting the null hypothesis when it is false, equal to 1 - beta
test of population mean and variance
summary of hypothesis testing
1. mean hypothesis testing:
1.1 normally distributed population, known population variance
mean = mean0 (mean of null hypothesis)
Z = (sample of mean - mean0) / [std/sqrt(sample size)]
tip: std = sqrt(population variance)
Z ~ N(0, 1) standard normal distribution
1.2 normally distributed population, unknown population variance
mean = mean0 (mean of null hypothesis)
t = (sample of mean - mean0) / [s/sqrt(sample size)]
tip: s = sample standard deviation
t ~ t(n-1) distribution
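A sketch of case 1.2 on made-up data, using scipy's one-sample t-test:

```python
# One-sample mean test with unknown population variance (case 1.2).
import numpy as np
from scipy.stats import ttest_1samp

x = np.array([2.1, 1.8, 2.4, 2.0, 2.6, 1.9, 2.3, 2.2])  # made-up sample
t_stat, p_value = ttest_1samp(x, popmean=2.0)            # H0: mean = 2.0

alpha = 0.05
print(t_stat, p_value)
print("reject H0" if p_value <= alpha else "fail to reject H0")
```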
2. variance hypothesis testing
2.1 normally distributed population
variance = variance0 (variance of null hypothesis)
X^2 = (n-1) x sample variance /variance0 (variance of null hypothesis)
X^2 ~ chi-square(n-1) distribution
2.2 two independent normal distribution populations
variance of first normal distribution populations = variance of second normal distribution populations
F = sample variance of first one/ sample variance of second one
F ~ F(n1 - 1, n2 - 1); the F-distribution is the ratio of two independent chi-square variables, each divided by its degrees of freedom
decision rule
1. p-value
definition: the smallest significance level at which the null hypothesis can be rejected
decision rule: reject the null hypothesis if p-value <= alpha (the same rule applies to one- and two-tailed tests)
2. if the test statistic exceeds the critical value, reject the null hypothesis
Chapter 6: Linear Regression
1. regression equation
population
Yi = Beta0 + Beta1Xi +ui
Y: dependent (explained) variable, regressand
X: independent (explanatory) variable, regressor
Beta0: regression intercept term
Beta1: regression slope coefficient
ui: error term (residual term)
sample
- Ordinary Least Squares (OLS)
assumption
E(ui|xi) = 0
all (X, Y) observations are independent and identically distributed (i.i.d.)
large outliers are unlikely
principle
minimize the sum of squared residuals (error terms)
formula
beta1 = Cov(X, Y)/Var(X)
beta0 = Ȳ - beta1 * X̄
because the regression line always passes through (X̄, Ȳ)
- measure of fit
coefficient of determination (R^2)
R^2 = ESS/TSS = 1 - SSR/TSS
summing over i = 1 to n:
Total Sum of Squares (TSS):
TSS = sum[(Yi - Ȳ)^2]
Explained Sum of Squares (ESS):
ESS = sum[(Ŷi - Ȳ)^2]
Residual Sum of Squares (SSR):
SSR = sum[(Yi - Ŷi)^2]
characteristic:
R^2 ranges between 0 and 1; values near 1 indicate X is good at predicting Y
for one independent variable: R^2 = [p(X,Y)]^2
standard error of regression
identification:
an estimator of the standard deviation of the regression error ui
formula: SER = sqrt(SSR/(n - 2)) = sqrt[(the sum of squared residuals ûi^2)/(n - 2)]
judgement:
the smaller this measure, the better
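A sketch tying this chapter's formulas together on simulated data (the true beta0 = 1 and beta1 = 2 are assumptions of the simulation): OLS coefficients from Cov/Var, then R^2 and SER:

```python
# OLS slope/intercept from the formulas above, plus R^2 and SER.
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(0, 1, 200)
y = 1.0 + 2.0 * x + rng.normal(0, 0.5, 200)      # assumed true model

beta1 = np.cov(x, y, ddof=0)[0, 1] / np.var(x)   # Cov(X, Y)/Var(X)
beta0 = y.mean() - beta1 * x.mean()              # line passes through (x̄, ȳ)

y_hat = beta0 + beta1 * x
tss = ((y - y.mean()) ** 2).sum()
ssr = ((y - y_hat) ** 2).sum()
r2 = 1 - ssr / tss                               # R^2 = 1 - SSR/TSS
ser = np.sqrt(ssr / (len(x) - 2))                # SER = sqrt(SSR/(n-2))
print(beta1, beta0, r2, ser)
```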
Chapter 7: Hypothesis Tests and Confidence Intervals in Single Regression
- testing hypothesis about coefficient and confidence interval
null hypothesis and alternative hypothesis
H0: beta1 = beta1,0 (the hypothesized value)
if beta1,0 = 0, this is a significance test
t-statistic:
t = (estimated beta1 - beta1,0)/SE(estimated beta1), with n - 2 degrees of freedom
decision rule:
reject H0 if t-statistic > t critical or t-statistic < -t critical
p-value < alpha
the meaning of rejecting H0
the regression coefficient differs from beta1,0 at the significance level alpha
common format for regression result
test score = 698.9 - 2.28 ClassSize
R^2 = 0.051 SER =18.6
A low R^2 does not by itself mean the regression is good or bad, but it tells us that other important factors also influence the dependent variable
- Binary /Dummy/indicator variable
identification: it takes on only two values, 0 or 1
formula: Yi = beta0 + beta1*Di + ui, where Di = 0 or 1
beta0 indicates E(Y|Di = 0)
beta1 indicates E(Y|Di = 1) - E(Y|Di = 0)
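A sketch on made-up data showing that with a single dummy regressor, beta0 equals the mean of the D = 0 group and beta1 equals the difference in group means:

```python
# Dummy-variable regression: OLS reproduces the group means exactly.
import numpy as np

d = np.array([0, 0, 0, 1, 1, 1])                 # made-up dummy values
y = np.array([3.0, 2.5, 3.5, 5.0, 5.5, 4.5])     # made-up outcomes

beta1 = np.cov(d, y, ddof=0)[0, 1] / np.var(d)   # = 2.0
beta0 = y.mean() - beta1 * d.mean()              # = 3.0
print(beta0, beta1)
print(y[d == 0].mean(), y[d == 1].mean() - y[d == 0].mean())  # same values
```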
- Homoskedasticity and heteroskedasticity
homoskedasticity
Var (ui|X) = σ^2
This means that the variance of the error term ui is the same regardless of the predictor variable X.
- Homoskedasticity occurs when the variance of the error term in a regression model is constant.
- If the error variance is homoskedastic, the model is considered well-specified; if the variance varies too much, the model may be poorly specified.
- Adding additional predictor variables can help explain the performance of the dependent variable.
- Conversely, heteroskedasticity occurs when the variance of the error term is not constant.
heteroskedasticity
if homoskedasticity is violated,
e.g. if Var(ui|X) = σ^2(X), a function of X, then we say the error term is heteroskedastic
consequences
1. the OLS estimator is still unbiased, consistent, and asymptotically normal, but no longer efficient
2. it distorts the standard errors of the coefficients:
if the estimated standard error is too small, the t-statistic is too large, leading to Type I errors
if the estimated standard error is too large, the t-statistic is too small, leading to Type II errors
how to deal with it
calculate robust standard errors
use weighted least square (WLS)
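A sketch of the robust-standard-error remedy using statsmodels (assumed installed) on simulated heteroskedastic data; HC1 is one common robust covariance choice:

```python
# Classical vs. heteroskedasticity-robust (HC1) standard errors.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
x = rng.uniform(1, 10, 500)
y = 1.0 + 2.0 * x + rng.normal(0, x)       # error variance grows with X

X = sm.add_constant(x)
ols = sm.OLS(y, X).fit()                   # classical standard errors
robust = sm.OLS(y, X).fit(cov_type="HC1")  # robust standard errors
print(ols.bse, robust.bse)                 # robust SEs differ noticeably
```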