MTH2006 STATISTICAL MODELLING AND INFERENCE Flashcards

(85 cards)

1
Q

cumulative distribution function (cdf) of a random variable Y

A

F_Y(y) = Pr(Y ≤ y) where y belongs to the range space of Y

2
Q

probability mass function (pmf) [if Y is discrete]

A

f_Y(y) = Pr(Y = y) and F_Y(y) = Σ_{x : x ≤ y} f_Y(x)

3
Q

probability density function (pdf) [if Y is continuous]

A

f_Y(y) = d/dy F_Y(y) and F_Y(y) = ∫_{−∞}^{y} f_Y(x) dx

4
Q

p-quantile of a random variable Y

A

the value y_p for which Pr(Y ≤ y_p) = p

5
Q

Pr(Y > y)

A

1 − Pr(Y ≤ y)

6
Q

joint cumulative distribution function (cdf) of a vector Y1,…Yn

A

F_Y(y1, . . . , yn) = Pr(Y1 ≤ y1, . . . , Yn ≤ yn)

7
Q

If Y1, . . . , Yn are discrete then their joint pmf is defined by

A

f_Y(y1, . . . , yn) = Pr(Y1 = y1, . . . , Yn = yn)

8
Q

If Y1, . . . , Yn are continuous then their joint pdf is defined by

A

f_Y(y1, . . . , yn) = ∂^n / (∂y_1 · · · ∂y_n) F_Y(y1, . . . , yn)

9
Q

Y1, . . . , Yn are independent if

A

f_Y(y1, . . . , yn) = f_Y1(y1) · · · f_Yn(yn) for all y1, . . . , yn

10
Q

Y1, . . . , Yn are identically distributed if

A

f_Y1(y) = . . . = f_Yn(y) for all y

11
Q

if Y1, . . . , Yn are independent and identically distributed (iid) then their joint pdf or pmf is

A

f_Y(y1, . . . , yn) = f_Y1(y1). . . f_Y1(yn)

12
Q

explanatory variable

A

plotted on the x-axis and is the variable manipulated by the researcher

13
Q

response variable

A

plotted on the y-axis and depends on the other variables

14
Q

if Y has a poisson distribution with parameter µ, then we write Y~Poi(µ) and Y has pmf

A

f_Y(y) = µ^y e^(−µ) / y! for y = 0, 1, 2, …

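A minimal Python sketch checking this pmf formula against scipy.stats.poisson; the value µ = 2.5 is an arbitrary illustrative choice.

```python
# Evaluate the Poisson pmf mu^y * exp(-mu) / y! directly and compare with scipy.
import math
from scipy import stats

mu = 2.5  # illustrative parameter value
for y in range(6):
    by_formula = mu**y * math.exp(-mu) / math.factorial(y)
    by_scipy = stats.poisson.pmf(y, mu)
    print(y, round(by_formula, 6), round(by_scipy, 6))  # the two columns agree
```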
15
Q

if Y has a exponential distribution with parameter θ, then we write Y~Exp(θ) and Y has cdf

A

F_Y(y; θ) = 1 − e^(−θy) for y > 0

16
Q

if Y has a exponential distribution with parameter θ, then we write Y~Exp(θ) and Y has pdf

A

f_Y(y; θ) = d/dy F_Y(y; θ) = θe^(−θy) for y > 0

17
Q

for p-quantile cdf is

A

F_Y(y_p) = p and y_p = F_Y^-1(p)

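A short sketch of this cdf inversion for the Exp(θ) model of cards 15–16: F_Y(y; θ) = 1 − e^(−θy) gives y_p = −log(1 − p)/θ. The values θ = 2 and p = 0.9 are purely illustrative; note that scipy parameterises the exponential by scale = 1/θ.

```python
# Invert the exponential cdf by hand and compare with scipy's quantile function.
import math
from scipy import stats

theta, p = 2.0, 0.9                         # illustrative values
y_p = -math.log(1 - p) / theta              # y_p = F^{-1}(p) solved by hand
print(y_p)
print(stats.expon.ppf(p, scale=1/theta))    # same quantile from scipy
print(stats.expon.cdf(y_p, scale=1/theta))  # evaluating the cdf recovers p
```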
18
Q

expectation is

A

E(g(Y)) = Σ_x Pr(Y = x) g(x) = Σ_x f(x) g(x), where f(x) is the pmf (discrete case)

19
Q

variance of random variable Y is

A

Var(Y) = E[{Y − E(Y)}^2] = E(Y^2) − {E(Y)}^2

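A quick numerical check that the two forms of the variance agree, using a small made-up discrete pmf (values and probabilities are illustrative only).

```python
# Check Var(Y) = E[{Y - E(Y)}^2] = E(Y^2) - E(Y)^2 for a small discrete pmf.
import numpy as np

vals = np.array([0.0, 1.0, 2.0, 3.0])   # support of Y (made up)
pmf = np.array([0.1, 0.4, 0.3, 0.2])    # probabilities, summing to 1

ey = np.sum(vals * pmf)                       # E(Y)
var_def = np.sum((vals - ey)**2 * pmf)        # E[{Y - E(Y)}^2]
var_alt = np.sum(vals**2 * pmf) - ey**2       # E(Y^2) - E(Y)^2
print(var_def, var_alt)                       # identical
```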
20
Q

empirical probability r/n is

A

r/n is the empirical estimate of Pr(X ≤ x_(r)), where x_(r) is the r-th smallest of the n observations

21
Q

simple linear model means

A

one explanatory variable

22
Q

an example of a joint distribution for two variables is the…

A

bivariate normal distribution

23
Q

if X and Y are independent

A

f(x, y) = f_X(x) f_Y(y)

24
Q

covariance formula:

A

Cov(X, Y) = E[(X − E(X))(Y − E(Y))] = E(XY) − E(X)E(Y)

25
if independent, covariance formula:
Cov(X, Y) = 0, since E(XY) = E(X)E(Y)
26
covariance with correlation/variance formula:
Cov(X, Y) = ρ sqrt[Var(X)Var(Y)], where ρ is the correlation between X and Y
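A simulation check (with made-up parameters and sample size) that the two covariance expressions on cards 24 and 26 agree.

```python
# Check Cov(X, Y) = E(XY) - E(X)E(Y) and Cov(X, Y) = rho*sqrt(Var(X)Var(Y)).
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=100_000)
y = 0.6 * x + rng.normal(size=100_000)      # correlated with x by construction

cov_direct = np.mean(x * y) - np.mean(x) * np.mean(y)
rho = np.corrcoef(x, y)[0, 1]               # sample correlation
cov_via_rho = rho * np.sqrt(np.var(x) * np.var(y))
print(cov_direct, cov_via_rho)              # the two values agree
```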
27
an example of a joint distribution for two variables, each with a normal distribution is called
the bivariate normal distribution
28
the joint pdf of the bivariate normal distribution
f(x, y; θ) = 1 / (2πσ_X σ_Y sqrt(1 − ρ^2)) * exp{ −1/(2(1 − ρ^2)) [ (x − µ_X)^2/σ_X^2 + (y − µ_Y)^2/σ_Y^2 − 2ρ(x − µ_X)(y − µ_Y)/(σ_X σ_Y) ] }
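A sketch evaluating this density directly and comparing it with scipy.stats.multivariate_normal; all parameter values and the evaluation point are illustrative.

```python
# Evaluate the bivariate normal pdf from the formula and compare with scipy.
import numpy as np
from scipy import stats

mx, my, sx, sy, rho = 0.0, 1.0, 1.5, 2.0, 0.3   # illustrative parameters
x, y = 0.5, 2.0                                  # point at which to evaluate

quad = ((x - mx)**2 / sx**2 + (y - my)**2 / sy**2
        - 2 * rho * (x - mx) * (y - my) / (sx * sy))
pdf_formula = (np.exp(-quad / (2 * (1 - rho**2)))
               / (2 * np.pi * sx * sy * np.sqrt(1 - rho**2)))

cov = [[sx**2, rho * sx * sy], [rho * sx * sy, sy**2]]   # covariance matrix
pdf_scipy = stats.multivariate_normal(mean=[mx, my], cov=cov).pdf([x, y])
print(pdf_formula, pdf_scipy)                            # should agree
```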
29
a continuous random variable Y defined on (−∞, ∞) with pdf f(y; θ) has expectation denoted by E(Y) and defined as...
E(Y) = ∫_{−∞}^{∞} y f(y; θ) dy
30
a discrete random variable with range space R and pmf f(y; θ), E(Y) is defined as...
E(Y) = Σ_{y∈R} y f(y; θ)
31
for a real valued function g(Y), when continuous, E[g(Y)] is
E[g(Y)] = ∫_{−∞}^{∞} g(y) f(y; θ) dy
32
for a real valued function g(Y), when discrete, E[g(Y)] is
E[g(Y)] = Σ_{y∈R} g(y) f(y; θ)
33
α-confidence interval is ...
an interval estimator that contains the true parameter value θ with probability α, for every θ
34
null hypothesis
H_0 : θ = x (this is a simple hypothesis: it completely specifies a probability model by giving the parameter a specific value)
35
alternative hypothesis
H_1 : θ ≠ x (this is a composite hypothesis: it does not completely specify a probability model)
36
if H_1 : θ ≠ x, which specifies values on either side of H_0, it is called
a two-sided alternative
37
if H_1 : θ < x it is called a
one-sided alternative
38
for a null hypothesis H_0 : θ = θ_0, the null distribution is the distribution of T(Y) when ...
θ = θ_0
39
let f_Y(y; θ) be continuous and denote the joint pdf of Y then Pr(Y ∈ C)
∫_C f_Y(y; θ) dy
40
let f_Y(y; θ) be discrete and denote the joint pmf of Y then Pr(Y ∈ C)
Σ_{y∈C} f_Y(y; θ)
41
the size of the test α
α = Pr(Y ∈ C; θ_0)
42
probability of a type I error
the probability of rejecting H_0 when it is true (this probability is the size of the test, α)
43
probability of a type II error
the probability of failing to reject H_0 when it is false
44
if the alternative hypothesis is simple, then the power of the test is
Pr(Y ∈ C; θ_1), i.e. the probability of not making a type II error (of detecting that H_0 is false)
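A small worked example of size and power under an assumed setup (not taken from this card): Y_1, …, Y_n iid N(µ, 1), H_0 : µ = 0 vs the simple alternative H_1 : µ = 1, with critical region {ȳ > c}.

```python
# Size = Pr(Ybar > c; mu = 0) and power = Pr(Ybar > c; mu = 1), where
# Ybar ~ N(mu, 1/n).  n = 10 and c = 0.5 are illustrative choices.
import numpy as np
from scipy import stats

n, c = 10, 0.5
se = 1 / np.sqrt(n)                               # standard deviation of Ybar
size = 1 - stats.norm.cdf(c, loc=0, scale=se)     # probability of a type I error
power = 1 - stats.norm.cdf(c, loc=1, scale=se)    # probability of detecting H_0 false
print(size, power)
```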
45
for a set y1, ..., yn, the sample moments are
ˆm_r = (1/n) Σ_{i=1}^{n} y_i^r
46
for a continuous or discrete random variable Y, the moment generating function (mgf) of Y is
M_Y(t) = E(e^(tY))
47
the kth moment of Y is
E(Y^k) = m_k
48
the central moments of Y are
E[{Y − E(Y)}^r]
49
the method of moments estimate, ˆθ, is such that ...
m_r(ˆθ) = ˆm_r for r = 1, . . . , d
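A sketch of the idea for an assumed Exp(θ) model (the model is not specified on this card): the first theoretical moment is m_1(θ) = E(Y) = 1/θ, so matching it to the sample mean gives ˆθ = 1/ȳ.

```python
# Method of moments for an Exp(theta) sample: solve m_1(theta_hat) = ybar.
import numpy as np

rng = np.random.default_rng(0)
true_theta = 2.0
y = rng.exponential(scale=1/true_theta, size=500)   # simulated sample

theta_hat = 1 / np.mean(y)     # method of moments estimate
print(theta_hat)               # close to the true value 2.0
```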
50
for sample variance we will use
s^2 = (n − 1)^(−1) Σ_{i=1}^{n} (y_i − ȳ)^2
51
the distribution of the different estimates is called the
sampling distribution
52
the standard deviation of the estimated sampling distribution gives the
estimated standard error
53
for the data y = (y1, . . . , yn) we have an estimate
ˆθ(y)
54
for the data y = (y1, . . . , yn) we have an estimator
ˆθ(Y)
55
for critical region C = {y : T(y) < c}, the p-value is
p = Pr[T(Y ) ≤ t; θ0]
56
for critical region C = {y : T(y) > c}, the p-value is
p = Pr[T(Y ) ≥ t; θ0]
57
the null distribution of the t-statistic is defined as
T(Y) = (Ȳ − µ_0) / (s/√n), and its null distribution is the t-distribution with n − 1 degrees of freedom
58
the t-statistic has the cdf which is denoted as
Φ_n−1(y)
59
the sample variance used in the t-statistic is
s^2 = (1/(n − 1)) Σ_{i=1}^{n} (Y_i − Ȳ)^2, which is an unbiased estimator of σ^2
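A sketch computing the t-statistic and sample variance by hand on simulated data and checking them against scipy.stats.ttest_1samp; the data and the null value µ_0 = 0 are illustrative.

```python
# One-sample t-statistic T = (Ybar - mu0) / (s/sqrt(n)) and two-sided p-value.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
y = rng.normal(loc=0.3, scale=1.0, size=20)   # simulated sample
mu0 = 0.0                                     # illustrative null value

n = len(y)
s2 = np.sum((y - y.mean())**2) / (n - 1)      # unbiased sample variance
t = (y.mean() - mu0) / np.sqrt(s2 / n)        # t-statistic
p = 2 * (1 - stats.t.cdf(abs(t), df=n - 1))   # two-sided p-value from the t cdf

print(t, p)
print(stats.ttest_1samp(y, mu0))              # same statistic and p-value
```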
60
critical region
C = {y : T(y) > t_c}, where t_c is the critical value
61
power of the test is when the alternative hypothesis is simple and is the probability of ...
not making a type II error or the probability of detecting that H_0 is false
62
the p-value is the probability that ...
when H_0 is true, the test statistic takes a value at least as extreme as (no more favourable to H_0 than) the value we observed
63
Π_{i=m}^{n} (c x_i) =
c^(n−m+1) Π_{i=m}^{n} x_i
64
Π_{i=m}^{n} x_i^c =
[Π_{i=m}^{n} x_i]^c
65
Π_{i=m}^{n} c^(x_i) =
c^(Σ_{i=m}^{n} x_i)
66
a sample, y = (y1, . . . , yn), modelled as a realisation of independent random variables, Y = (Y1, . . . , Yn). For i = 1, . . . , n, let f_Yi (y; θ) denote the pmf or pdf of Yi, where θ is the model parameter. The joint pmf or pdf of Y evaluated at our sample y is then fY (y; θ) =
f_Y(y; θ) = f_Y1(y1; θ) · · · f_Yn(yn; θ) = Π_{i=1}^{n} f_Yi(y_i; θ), by independence
67
the joint pmf or pdf, regarded as a function of θ, is referred to as the likelihood function and is denoted ...
L(θ; y) = f_Y (y; θ)
68
the parameter value that maximizes the likelihood is called the ...
maximum likelihood estimate (mle)
69
it is usually simpler to maximize the logarithm of the likelihood instead - the log-likelihood is denoted ...
l(θ; y) = log L(θ; y) = l(θ)
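A sketch maximising a log-likelihood numerically, for an assumed iid Poisson model where the mle is known to be ȳ; the data are simulated purely for illustration, and the optimiser should reproduce the closed-form answer.

```python
# Numerically maximise l(mu; y) = sum_i [ y_i*log(mu) - mu - log(y_i!) ].
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import gammaln

rng = np.random.default_rng(7)
y = rng.poisson(lam=3.0, size=200)            # simulated Poisson sample

def neg_loglik(mu):
    # negative log-likelihood, so that minimising it maximises l(mu; y)
    return -np.sum(y * np.log(mu) - mu - gammaln(y + 1))

res = minimize_scalar(neg_loglik, bounds=(1e-6, 20), method="bounded")
print(res.x, y.mean())                        # numerical mle vs closed-form ybar
```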
70
a constant will not affect the shape of the likelihood - true or false..
true (and therefore a constant multiplier will not affect the location of the maximum, so we can ignore multipliers when we calculate the mle)
71
mean squared error
Let ˆθ be an estimator for θ. The mean squared error of ˆθ is mse(ˆθ) = E{(ˆθ −θ)^2} and the bias of ˆθ is Bias(ˆθ) = E(ˆθ) − θ. If the bias is zero then the estimator is unbiased.
72
mean squared error can be written in terms of bias and variance:
mse(ˆθ) = Var(ˆθ) + Bias(ˆθ)^2
73
a consistent estimator
The estimator ˆθ is consistent for θ if, for all ε > 0, lim_{n→∞} Pr(|ˆθ − θ| > ε) = 0 (this is an asymptotic property).
74
Only one parameter in the model, the mle is ...
scalar and its approximate sampling distribution will be a univariate normal distribution
75
More than one parameter in the model, the mle is ...
a vector and its sampling distribution will be a multivariate normal distribution
76
expectation of vector of random variables with ith element Yi
Let Y be a vector of random variables with ith element Yi. The expectation of Y is the vector with ith element E(Yi).
77
variance of vector of random variables with ith element Yi
The variance of Y is the matrix with (i, j)th element Cov(Yi, Yj). This can be written as Var(Y) = E[{Y − E(Y)}{Y − E(Y)}^T] = E(YY^T) − E(Y)E(Y)^T.
78
observed information J(θ)
``` Let l(θ) be a log-likelihood function. The observed information is J(θ) = −∂^2 l(θ) / ∂θ ∂θ^T. ```
79
If θ is a scalar then, J(θ) is
-d^2 l(θ) / dθ^2
80
If θ is a vector with ith element θi then, J(θ) is a matrix with (i, j)th element,
-∂^2 l(θ) / ∂θi∂θj
81
expected information is I(θ) = E{J(θ)}, that is the matrix with (i, j)th element,
E{−∂^2 l(θ) / ∂θi∂θj}
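A sketch for an assumed Exp(θ) sample (the model choice is illustrative): l(θ) = n log θ − θ Σ y_i, so J(θ) = n/θ², which here coincides with I(θ). The finite-difference check of the second derivative is for illustration only.

```python
# Observed information J(theta) = -d^2 l(theta)/dtheta^2 for an Exp(theta) sample.
import numpy as np

rng = np.random.default_rng(11)
y = rng.exponential(scale=0.5, size=100)      # simulated sample, true theta = 2
n = len(y)

def loglik(theta):
    return n * np.log(theta) - theta * np.sum(y)

theta = 2.0
J_analytic = n / theta**2                     # exact second derivative
h = 1e-3                                      # finite-difference step
J_numeric = -(loglik(theta + h) - 2 * loglik(theta) + loglik(theta - h)) / h**2
print(J_analytic, J_numeric)                  # approximately equal
```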
82
multivariate normal distribution
The random variable Y = (Y1, . . . , Yd) has a multivariate normal distribution with expectation E(Y) = µ and variance Var(Y) = Σ if the pdf of Y is f_Y (y; µ, Σ) = (2π)^(−d/2)det(Σ)^(−1/2)exp[−(1/2)(y − µ)^(T)Σ^(−1)(y − µ)], in which case we write Y∼N(µ,Σ).
83
useful properties of multivariate normal distribution:
``` If two random variables are independent then they are uncorrelated, because independence implies E(Y1Y2) = E(Y1)E(Y2) and so Cov(Y1, Y2) = E(Y1Y2) − E(Y1)E(Y2) = 0. Also, linear transformations of multivariate normal random variables are multivariate normal. ```
84
THEOREM: When n is large, the sampling distribution of the mle is approximately N(θ, I(θ)^(−1)). This is called the asymptotic distribution of the mle.
If n is large then the square roots of the diagonal elements of I(θ)^−1 approximate the standard errors of the mles in ˆθ. These standard errors can be estimated by replacing θ with ˆθ in I(θ)^−1.
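A sketch of these standard errors for an assumed Exp(θ) model: the mle is ˆθ = 1/ȳ and I(θ) = n/θ², so replacing θ by ˆθ gives an estimated standard error of ˆθ/√n.

```python
# Asymptotic standard error of the mle for an Exp(theta) sample.
import numpy as np

rng = np.random.default_rng(5)
n, true_theta = 200, 2.0
y = rng.exponential(scale=1/true_theta, size=n)   # simulated sample

theta_hat = 1 / y.mean()                  # mle of theta
se_hat = theta_hat / np.sqrt(n)           # sqrt of I(theta_hat)^{-1}
print(theta_hat, se_hat)
print(theta_hat - 1.96 * se_hat, theta_hat + 1.96 * se_hat)  # approx 95% interval
```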
85
likelihood ratio test statistic
T = 2 { l(ˆθ; y) − l(θ_0; y) }
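A sketch computing T for an assumed iid Poisson model with H_0 : µ = 3 on simulated data; comparing T with a chi-squared distribution on 1 degree of freedom (a standard large-sample result not stated on this card) gives an approximate p-value.

```python
# Likelihood ratio test statistic T = 2*{ l(mu_hat; y) - l(mu0; y) } for Poisson data.
import numpy as np
from scipy import stats
from scipy.special import gammaln

rng = np.random.default_rng(2)
y = rng.poisson(lam=3.0, size=150)        # simulated sample
mu0 = 3.0                                 # illustrative null value

def loglik(mu):
    return np.sum(y * np.log(mu) - mu - gammaln(y + 1))

mu_hat = y.mean()                         # mle of mu
T = 2 * (loglik(mu_hat) - loglik(mu0))
p_value = 1 - stats.chi2.cdf(T, df=1)     # approximate p-value under H_0
print(T, p_value)
```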