Statistics Flashcards

(101 cards)

1
Q

F (x) = f/n

A

Relative frequency = frequency / total number of data points

2
Q

CDF plot

A

F(x) plotted against the data values x
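A rough sketch of how the values behind such a plot could be tabulated: relative frequency f/n for each distinct value, and the running total F(x) that a CDF plot shows. The data set and the helper name `empirical_cdf` are made up for illustration.

```python
from collections import Counter

def empirical_cdf(data):
    """Return (value, relative frequency, cumulative F) triples, sorted by value."""
    n = len(data)
    counts = Counter(data)
    rows, running = [], 0
    for value in sorted(counts):
        running += counts[value]
        rows.append((value, counts[value] / n, running / n))
    return rows

# Hypothetical sample, chosen only for illustration
sample = [2, 3, 3, 5, 5, 5, 7, 8]
cdf_rows = empirical_cdf(sample)
```

The cumulative column rises fastest around the value 5, which is where most of this sample sits.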

3
Q

What does the gradient of a CDF plot mean?

A

The steepest part of the CDF plot is where the data points are most densely concentrated. A shallower gradient means fewer data points in that range.

4
Q

Skewness?

A

Which way the distribution leans, i.e. how asymmetrical it is

5
Q

Positive skew

A

Right-skewed: the bulk of the data sits on the left and the long tail extends to the right

6
Q

Negative skew

A

Left-skewed: the bulk of the data sits on the right and the long tail extends to the left

7
Q

Interquartile range?

A

Difference between third quartile and first quartile

8
Q

Sample space?

A

The set Ω of all possible outcomes of an experiment

9
Q

Event?

A

A subset A of the sample space: A ⊆ Ω

10
Q

A ∩ B?

A

A and B occurring together (intersection)

11
Q

A ∪ B?

A

A or B occurring (union)

12
Q

Ā?

A

Not A (complement)

13
Q

Mutually exclusive?

A

The events cannot occur at the same time: A ∩ B = ∅

14
Q

Independent?

A

The occurrence of one event does not change the probability of the other

15
Q

Dependent?

A

The probability of the event is affected by whether the other event occurs

16
Q

Permutations?

A

Arrangements in which the order of the outcomes matters

17
Q

Combinations?

A

Selections in which the order of the outcomes does not matter
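A small sketch contrasting the two counts with the standard library. The values n = 5 and k = 3 are arbitrary choices for illustration.

```python
import math

n, k = 5, 3  # hypothetical: pick 3 items out of 5

ordered = math.perm(n, k)    # n! / (n-k)!       : order matters
unordered = math.comb(n, k)  # n! / (k! (n-k)!)  : order ignored

# Each unordered selection of k items can be arranged in k! different orders
arrangements_per_selection = math.factorial(k)
```

Here the permutation count equals the combination count multiplied by k!, which is the only difference between the two formulas.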

18
Q

P(A|B)?

A

Probability of A given that B has already occurred

19
Q

P(A|B)P(B) = P(B|A)P(A) = ?

A

P(A ∩ B)

20
Q

What does it mean if P(A|B) = P(A)?

A

Events A and B are independent

21
Q

Partitioned sample space?

A

A collection of non-empty, non-overlapping events whose union is the whole sample space

22
Q

Σ P(B|Ai)P(Ai), summed over a partition A1, A2, ... = ? (Total probability law)

A

P(B)

23
Q

P(B|A)P(A) / P(B) = ? (Bayes' law)

A

P(A|B)
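A minimal worked sketch of Bayes' law, P(A|B) = P(B|A)P(A) / P(B), with P(B) obtained from the total probability law over the partition {A, not A}. The function name and the three input probabilities are invented for illustration.

```python
def bayes_posterior(prior_a, p_b_given_a, p_b_given_not_a):
    """P(A|B) via Bayes' law, with P(B) from the total probability law."""
    # Total probability: P(B) = P(B|A)P(A) + P(B|not A)P(not A)
    p_b = p_b_given_a * prior_a + p_b_given_not_a * (1 - prior_a)
    return p_b_given_a * prior_a / p_b

# Hypothetical numbers: A is rare, B is strongly associated with A
posterior = bayes_posterior(prior_a=0.01, p_b_given_a=0.95, p_b_given_not_a=0.05)
```

With these inputs the posterior comes out at roughly 0.16, far below P(B|A) = 0.95, because A is so rare to begin with.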

24
Q

P(A | B ∩ C) P(B | C) P(C) = ?

A

P(A ∩ B ∩ C)

25
Sx^2 = 1/(n-1) * Σ(xi-x~)^2
Sample variance (the sample standard deviation is Sx). n = number of data points, xi = each data value, x~ = mean
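A short sketch of the formula above, checked against the standard library's `statistics.variance`, which also divides by n-1. The data values are made up.

```python
import math
import statistics

def sample_variance(xs):
    """Sx^2 = 1/(n-1) * sum((xi - mean)^2)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical data
s2 = sample_variance(data)   # sample variance
s = math.sqrt(s2)            # sample standard deviation
```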
26
Vx= Sx/ x~
Coefficient of variation = standard deviation/ mean
27
dx = 1/n * Σ|xi-x~|
Mean absolute deviation dx. n = number of data points, xi = data points, x~ = mean
28
Unbiased skewness = sqrt(n-1) / (sqrt(n) * (n-2) * σ^3) * Σ(xi-x~)^3
n = number of data points, σ = standard deviation, xi = data point, x~ = mean
29
Biased skewness = (1/n) * Σ(xi-x~)^3 / σ^3
n= number of data points xi= data point x~ = mean σ= standard deviation
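A rough sketch of both skewness formulas side by side. It assumes σ is the divide-by-n standard deviation, which is the usual convention for this adjustment but is not stated on the cards. The data set is invented.

```python
import math

def skewness(xs):
    """Return (biased, adjusted) skewness; sigma is assumed to divide by n."""
    n = len(xs)
    mean = sum(xs) / n
    sigma = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    cubed = sum((x - mean) ** 3 for x in xs)
    biased = cubed / (n * sigma ** 3)
    adjusted = math.sqrt(n - 1) / (math.sqrt(n) * (n - 2) * sigma ** 3) * cubed
    return biased, adjusted

right_tailed = [1, 2, 2, 3, 3, 3, 10]  # hypothetical data with a long right tail
biased, adjusted = skewness(right_tailed)
```

Both values are positive for this sample, and the adjusted figure is the biased one scaled by sqrt(n(n-1)) / (n-2).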
30
In what order are the mode, median and mean for positive skewness?
Mode < median < mean
31
In what order is mode median and mean for a symmetric curve?
All equal
32
In what order are the mode, median and mean for negative skewness?
Mean < median < mode
33
cov= 1/(n-1) Σ(xi-x~)(yi-y~)
Sample covariance. n = number of data points, xi = each x point, x~ = mean of x, yi = each y point, y~ = mean of y
34
Cxy = [1/(n-1) * Σ(xi-x~)(yi-y~)] / (Sx Sy)
Cxy = sample correlation coefficient (has to be between -1 and 1). n = number of points, xi = x point, x~ = x mean, yi = y point, y~ = y mean, Sx = x standard deviation, Sy = y standard deviation
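The sample covariance and correlation coefficient can be sketched in a few lines. The helper names and data are illustrative. The two y series are exactly linear in x so that the limiting values ±1 show up.

```python
import math

def sample_cov(xs, ys):
    """1/(n-1) * sum((xi - mean_x) * (yi - mean_y))."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

def sample_corr(xs, ys):
    """Sample covariance divided by the product of the sample standard deviations."""
    sx = math.sqrt(sample_cov(xs, xs))
    sy = math.sqrt(sample_cov(ys, ys))
    return sample_cov(xs, ys) / (sx * sy)

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys_up = [2.0, 4.0, 6.0, 8.0, 10.0]    # exactly linear, increasing
ys_down = [10.0, 8.0, 6.0, 4.0, 2.0]  # exactly linear, decreasing
```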
35
What does Cxy= -1 mean?
Perfect negative linear correlation
36
What does Cxy= 0 mean?
No linear correlation
37
What does Cxy = 1 mean?
Perfect positive linear correlation
38
Variable?
A quantity that can vary
39
Random?
The result will be the outcome of a random experiment
40
Discrete?
Takes values from a countable set of distinct outcomes
41
Continuous?
Can take any value in an interval, so there are infinitely many possible values between any two points
42
PDF=f(x) = dF(x)/ dx
Probability density function = the gradient of CDF plot
43
What is the area under a PDF plot?
1
44
The steepest part of a CDF plot is where on a PDF (probability density function) plot?
The peak
45
Cov(X,Y)
= E[(X-E[X])(Y-E[Y])] = E[XY] - E[X]E[Y]
46
Cov (X,X)?
Var (X)
47
Var(X+Y)?
Var (X) + Var(Y) + 2Cov(X,Y)
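A quick numerical check of this identity on made-up paired data. The divide-by-n moments are used here as an assumption, treating the five pairs as equally likely outcomes. The identity holds for the n-1 versions as well.

```python
def mean(v):
    return sum(v) / len(v)

def cov(xs, ys):
    """Divide-by-n covariance; cov(v, v) is the variance of v."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

x = [1.0, 3.0, 4.0, 6.0, 9.0]   # hypothetical data
y = [2.0, 1.0, 5.0, 4.0, 8.0]
z = [a + b for a, b in zip(x, y)]

lhs = cov(z, z)                                # Var(X + Y)
rhs = cov(x, x) + cov(y, y) + 2 * cov(x, y)    # Var(X) + Var(Y) + 2 Cov(X, Y)
```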
48
What is the covariance of random variables X and Y?
A measure of the linear dependence between these variables
49
Cov(cX, Y) = Cov(X, cY) = ?
c*Cov(X,Y)
50
Cov(X+Y,Z) ?
Cov(X,Z) + Cov(Y,Z)
51
Cov(X, Y) = ? (symmetry)
Cov(Y,X)
52
Cov(X,c)?
0
53
Cov(Σ ai Xi, Σbj Yj) =
Σ Σ ai bj Cov(Xi,Yj)
54
How do you standardise a value X to N(0,1) for confidence intervals?
Z = (X - E(X)) / σ
55
When do you apply a continuity correction?
When you are using a normal distribution to approximate a discrete distribution (binomial, poisson)
56
What is the continuity correction?
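Add or subtract 0.5 at the boundary of the discrete value, e.g. P(X ≤ k) is approximated by the normal probability up to k + 0.5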
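To see what the correction buys, the sketch below compares an exact binomial probability with its normal approximation, with and without the 0.5 shift. The parameters n = 40, p = 0.5, k = 18 are arbitrary.

```python
import math
from statistics import NormalDist

def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k + 1))

n, p, k = 40, 0.5, 18  # hypothetical values
approx = NormalDist(mu=n * p, sigma=math.sqrt(n * p * (1 - p)))

exact = binom_cdf(k, n, p)
without_correction = approx.cdf(k)       # normal area up to k
with_correction = approx.cdf(k + 0.5)    # normal area up to k + 0.5
```

For these values the corrected approximation lands much closer to the exact probability than the uncorrected one.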
57
sqrt(n) = (z * σ) / E
n = minimum sample size, z = standard normal critical value for the chosen confidence level, E = margin of error, σ = standard deviation
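A minimal sketch of this sample-size formula, rounding up to the next whole observation. The function name, the 95% default and the inputs σ = 15, E = 2 are assumptions for illustration.

```python
import math
from statistics import NormalDist

def min_sample_size(sigma, margin, confidence=0.95):
    """Smallest n with z * sigma / sqrt(n) <= margin (two-sided interval)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / margin) ** 2)

n_required = min_sample_size(sigma=15.0, margin=2.0)  # hypothetical inputs
```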
58
(X̄ - μ) / (σ / sqrt(n)) ~ N(0,1)
X̄ = sample mean, μ = population mean, σ = population standard deviation, n = sample size
59
CIs for the mean with known standard deviation?
P(μ - zσ/sqrt(n) < X̄ < μ + zσ/sqrt(n)) = 1 - α, which gives the interval X̄ ± zσ/sqrt(n)
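A short sketch of the known-σ interval, sample mean ± z σ / sqrt(n). The measurements and the value σ = 0.2 are invented, and `mean_ci_known_sigma` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def mean_ci_known_sigma(sample, sigma, confidence=0.95):
    """Two-sided confidence interval for the mean when sigma is known."""
    n = len(sample)
    xbar = sum(sample) / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sigma / math.sqrt(n)
    return xbar - half_width, xbar + half_width

measurements = [9.8, 10.2, 10.1, 9.9, 10.0, 10.3, 9.7, 10.0, 10.1]  # hypothetical
low, high = mean_ci_known_sigma(measurements, sigma=0.2)
```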
60
CIs for the mean with unknown standard deviation?
X̄ ± t * s/sqrt(n), where the degrees of freedom v for t are n-1
61
Normal approximation to the binomial sample proportion for large sample sizes?
N(p, p(1-p)/n)
62
Σ μ^j / j!, summed over j from 0 to ∞
e^μ
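A quick numerical illustration of this series: partial sums approach e^μ as more terms are added. This is also why Poisson probabilities e^(-μ) μ^j / j! sum to 1. The value μ = 2.5 is arbitrary.

```python
import math

def exp_partial_sum(mu, terms):
    """Sum of mu**j / j! for j = 0 .. terms-1."""
    return sum(mu ** j / math.factorial(j) for j in range(terms))

mu = 2.5  # hypothetical value
approx_10 = exp_partial_sum(mu, 10)
approx_30 = exp_partial_sum(mu, 30)
```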
63
ρXY = Cov(X,Y) / sqrt(Var(X) Var(Y))
Correlation coefficient = covariance of X and Y / product of the standard deviations of X and Y
64
-1 ≤ ρ(X,Y) ≤ 1
Correlation coefficient is between -1 and 1
65
If ρ(X,Y) = 0
X and Y are uncorrelated
66
ρ(X,Y) > 0
X and Y are positively correlated
67
ρ(X,Y) < 0
X and Y are negatively correlated
68
FXY(j,k) = P(X ≤ j, Y ≤ k)
Joint cumulative distribution function
69
∫∫ fXY(x,y) dx dy = 1, with both integrals taken from -∞ to ∞
The joint density fXY integrates to 1 over the whole plane
70
∫ (v from -∞ to y) ∫ (u from -∞ to x) fXY(u,v) du dv = FXY(x,y)
Integrating fXY up to x and y gives the joint cumulative distribution function
71
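PX(j) = Σ PXY(j,k), summed over k from 0 to ∞
Marginal probability mass function of X
72
fX(x) = ∫ fXY(x,y) dy, integrated over y from -∞ to ∞
Marginal probability density function of X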
73
If X and Y are independent?
PXY(j,k) = PX(j) PY(k) (discrete case); fXY(x,y) = fX(x) fY(y) (continuous case); fX|Y(x|y) = fX(x)
74
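fX|Y(x|y) = fXY(x,y) / fY(y)
Conditional probability density function of X given Y = y
75
PX|Y(j|k) = P(X = j and Y = k) / P(Y = k)
Conditional probability mass function of X given Y = k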
76
What is simple linear regression?
Finding the line of best fit y = a + bx, i.e. the line that minimises Σ(yi-a-bxi)^2
77
b = covxy / varx
Slope b of the line of best fit y = a + bx
78
a = ymean - b * xmean
Intercept a of the line of best fit y = a + bx
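The slope and intercept formulas from the two cards above can be sketched as follows. The data points and the function name `fit_line` are made up for illustration.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov_xy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mx) ** 2 for x in xs) / n
    b = cov_xy / var_x   # slope = cov(x, y) / var(x)
    a = my - b * mx      # intercept = mean(y) - b * mean(x)
    return a, b

xs = [0.0, 1.0, 2.0, 3.0, 4.0]   # hypothetical data
ys = [1.1, 2.9, 5.2, 7.1, 8.7]
a, b = fit_line(xs, ys)
```

The same divisor is used for the covariance and the variance, so it cancels in b. Using n or n-1 gives the same line.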
79
r^2 = var(a + bx) / vary
Coefficient of determination = variance of the fitted values a + bx / variance of y
80
r^2 = covxy^2 / (varx * vary)
Coefficient of determination
81
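Z ~ N(0, σZ^2)
The errors Zi = yi - a - b xi are assumed to be independent and normally distributed with mean 0 and variance σZ^2
82
Yi | X = xi ~ N(a + b xi, σZ^2)
Model assumption: given X = xi, Yi is normally distributed about the regression line with variance σZ^2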
83
a ~ N(a, (1/n + mx^2/(n varx)) VarZ)
Normal distribution of the intercept estimate a
84
Var(a) = (1/n + mx^2/(n varx)) VarZ
a = intercept in y = a + bx, n = number of data points, mx = mean of x, varx = variance of x, VarZ = variance of the errors
85
VarZ = 1/(n-2) * (n vary - n covxy^2 / varx)
VarZ = variance of the errors Z, n = number of data points, vary = variance of y, covxy = covariance of x and y, varx = variance of x
86
b ~ N(b, VarZ/(n varx))
Normal distribution of the slope estimate b in y = a + bx
87
CIs for b?
b ± z * sqrt(Var(b))
88
How many degrees of freedom should t have when the variance of the linear regression errors Z is unknown?
n-2
89
CIs for a?
a ± z * sqrt(Var(a))
90
CIs for a + bx0?
a + bx0 ± t * sqrt(Var(a + bx0)), where Var(a + bx0) = VarZ (1/n + (x0-mx)^2/(n varx))
91
CIs for a + bx0 + Z?
Use the formula for a + bx0 with 1 added to the 1/n term: VarZ (1 + 1/n + (x0-mx)^2/(n varx))
92
Test on the slope parameter b?
1. Null hypothesis H0: b = b*
2. Alternative hypothesis H1: b ≠ b*
3. Test statistic T = (b - b*) / sqrt(VarZ/(n varx))
4. Distribution of test statistic: T ~ t(n-2)
5. Critical region R: |T| > t(n-2) critical value
6. Evaluate T under H0 to get t0. If |t0| > t(n-2) critical value, reject H0
7. p-value: P(|T| > |t0|)
93
Test on the correlation for linear regression?
1. Null hypothesis H0: corr(X,Y) = 0
2. Alternative hypothesis H1: corr(X,Y) ≠ 0
3. Test statistic V = (sqrt(n-3)/2) * ln[ (1+cxy)(1-corrXY) / ((1-cxy)(1+corrXY)) ], with corrXY set to its value under H0
4. Distribution of test statistic: V ~ N(0,1)
5. Critical region: |V| > z
6. Evaluate V under H0 to get v0. If |v0| > z, reject H0
7. p-value: P(|V| > |v0|)
94
ssreg = nvar(a+bx)
Regression sum of squares
95
ssres = (n-2) VarZ
Residual sum of squares
96
sstot = ssreg + ssres = n vary
Total sum of squares = regression sum of squares + residual sum of squares
97
F = (ssreg/1) / (ssres/(n-2)) ~ F(1, n-2)
F distribution
98
Test on the regression (F-test) for linear regression?
1. Null hypothesis H0: b = 0
2. Alternative hypothesis H1: b ≠ 0
3. Test statistic F = ssreg / (ssres/(n-2))
4. Distribution of test statistic: F ~ F(1, n-2)
5. Critical region: F > f(1, n-2) critical value
6. Evaluate F under H0 to get f0. If f0 > f(1, n-2) critical value, reject H0
7. p-value: P(F > f0)
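The sums of squares and the F statistic from the preceding cards can be checked numerically with a sketch like the one below. The data are invented and the variable names (`ss_reg`, `ss_res`, `ss_tot`) are illustrative. Comparing `f_stat` with an F(1, n-2) critical value would need a table or a statistics library, which is left out here.

```python
def sums_of_squares(xs, ys):
    """Return (regression, residual, total) sums of squares for y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    var_x = sum((x - mx) ** 2 for x in xs) / n
    cov_xy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    b = cov_xy / var_x
    a = my - b * mx
    fitted = [a + b * x for x in xs]
    ss_reg = sum((f - my) ** 2 for f in fitted)             # n * var(a + bx)
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))  # residuals
    ss_tot = sum((y - my) ** 2 for y in ys)                 # n * var(y)
    return ss_reg, ss_res, ss_tot

xs = [0.0, 1.0, 2.0, 3.0, 4.0]   # hypothetical data
ys = [1.1, 2.9, 5.2, 7.1, 8.7]
ss_reg, ss_res, ss_tot = sums_of_squares(xs, ys)
r_squared = ss_reg / ss_tot
f_stat = (ss_reg / 1) / (ss_res / (len(xs) - 2))
```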
99
Mean of the exponential distribution?
1/λ
100
Log-likelihood function for the exponential distribution?
l(λ) = -λ Σti + n lnλ
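As a sketch of how this log-likelihood is used: setting its derivative n/λ - Σti to zero gives the maximum-likelihood estimate λ = n / Σti, the reciprocal of the sample mean. The waiting times below are made up.

```python
import math

def log_likelihood(lam, times):
    """l(lambda) = -lambda * sum(t_i) + n * ln(lambda)."""
    return -lam * sum(times) + len(times) * math.log(lam)

times = [0.5, 1.2, 0.3, 2.0, 1.0]   # hypothetical waiting times
lam_hat = len(times) / sum(times)   # root of dl/dlambda = n/lambda - sum(t_i)
```

Evaluating `log_likelihood` slightly either side of `lam_hat` gives smaller values, consistent with it being the maximum.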
101
Variance of the sum of two parts, Z = X + Y?
Var(Z) = Var(X) + Var(Y) + 2 σxy, where σxy = Cov(X, Y)