Stats 2 Flashcards

(50 cards)

1
Q

What are the null and study hypotheses?

A
Null = no difference between exposed and unexposed
Study = there is a difference between exposed and unexposed
2
Q

What is a p-value?

A

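The probability of seeing a result at least as extreme as the one observed if the null hypothesis were true (i.e. by chance alone)

P = 0.02 means that, if the null hypothesis were true, a result this extreme would occur about 2 times in 100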

3
Q

3 steps to hypothesis testing

A

1) specify the study hypothesis and the null hypothesis
2) assume the null hypothesis is true
3) calculate the p-value; if it is low (<0.05), reject the null hypothesis
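
The three steps can be sketched with an exact permutation test. This is an illustrative technique chosen here, not one named in these cards: under the null hypothesis the exposed/unexposed labels are interchangeable, so the p-value is the share of relabellings that give a difference in means at least as large as the one observed. The function name and data are made up for the example.

```python
from itertools import combinations
from statistics import mean

def permutation_p_value(exposed, unexposed):
    """Two-sided exact permutation p-value for a difference in means."""
    # Step 1: study hypothesis = the group means differ; null = no difference.
    observed = abs(mean(exposed) - mean(unexposed))
    pooled = exposed + unexposed
    n_exposed = len(exposed)
    as_extreme = 0
    total = 0
    # Step 2: assume the null is true, so every relabelling is equally likely.
    for chosen in combinations(range(len(pooled)), n_exposed):
        chosen_set = set(chosen)
        group_1 = [pooled[i] for i in chosen_set]
        group_2 = [pooled[i] for i in range(len(pooled)) if i not in chosen_set]
        # Small tolerance guards against floating-point rounding.
        if abs(mean(group_1) - mean(group_2)) >= observed - 1e-12:
            as_extreme += 1
        total += 1
    # Step 3: the p-value is the proportion of relabellings at least as extreme.
    return as_extreme / total

p = permutation_p_value([10, 11, 12], [1, 2, 3])
print(p, "reject null" if p < 0.05 else "do not reject null")
```

With only three observations per group there are just 20 possible relabellings, so even a complete separation of the groups cannot reach p < 0.05.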

4
Q

What is a type 1 error?

A

False positive - reject the null hypothesis even though it is actually true

5
Q

What is a type 2 error?

A

False negative - fail to reject null hypothesis even though it is actually false.

6
Q

How do you calculate the t-statistic?

A

t = observed difference in means / standard error of the difference between means
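
As a worked sketch of this ratio, the snippet below computes the equal-variance (Student's) t-statistic using a pooled variance for the standard error. The function name and the sample values are invented for illustration.

```python
import math
from statistics import mean, variance

def t_statistic(group_a, group_b):
    """Student's t-statistic for two independent samples (equal variances assumed)."""
    n_a, n_b = len(group_a), len(group_b)
    diff = mean(group_a) - mean(group_b)
    # Pooled variance: the two sample variances weighted by their degrees of freedom.
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    # Standard error of the difference between the two means.
    se_diff = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return diff / se_diff

exposed = [5.1, 4.9, 5.6, 5.8, 6.0]
unexposed = [4.2, 4.8, 4.5, 5.0, 4.4]
print(round(t_statistic(exposed, unexposed), 2))
```

The p-value then comes from comparing t with the t distribution on n_a + n_b - 2 degrees of freedom, which is not shown here.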

7
Q

What assumptions does the t-test make?

A

The outcome is continuous and normally distributed

And the variance in the two groups is equal

8
Q

What does Levene's test assess?

A

Whether the variance in the two groups is equal (one of the assumptions needed for the t-test)

9
Q

In SPSS, what is the p-value called?

A

Sig

10
Q

Chi-squared tests are used to calculate p-values from what kind of data?

A

Categorical data

11
Q

T-tests are used to calculate p-values from what kind of data?

A

Continuous data

12
Q

What is Student's t-test?

A

The independent samples t-test

Used when you want to compare a continuous, normally distributed outcome between two groups

(Assumes that the variance/scatter in the two groups is similar)

13
Q

How does the t-test work?

A

Uses the observed difference in sample means and the standard error (sampling error) of the difference in means to calculate the p-value

14
Q

What is the t-statistic?

A

Difference in sample means / standard error of the difference in means

15
Q

What would result in a small t-statistic and what does it mean?

A

Either a small difference in means or a large SE.

It means the observed difference could easily have happened by chance, so the p-value is large.

16
Q

What is a parametric test? Name one.

A

A test that is only valid when the data are normally distributed and the two populations have equal variance
Example = t-test

17
Q

What are the negatives and positives of non-parametric tests?

A

Pos: make no assumptions about the underlying distribution of the data.

Neg: less powerful than parametric tests, and confidence intervals are difficult to obtain

18
Q

How do you transform moderately positively skewed data?

A

Take the logarithm of each value

19
Q

How do you transform strongly positively skewed data?

A

Reciprocal (1/x)

20
Q

How do you transform weakly positively skewed data?

A

Square root

21
Q

How do you transform moderately negatively skewed data?

22
Q

How do you transform strongly negatively skewed data?

23
Q

How do you transform data with unequal variation?

A

Log, reciprocal or square root
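
A minimal sketch of the three transformations mentioned in these cards. The `transform` helper and its `kind` labels are hypothetical names chosen for this example, and all values are assumed to be strictly positive (the log and reciprocal are undefined at zero).

```python
import math

def transform(values, kind):
    """Apply one of the transformations from the cards to every value."""
    if kind == "sqrt":        # weak positive skew
        return [math.sqrt(v) for v in values]
    if kind == "log":         # moderate positive skew
        return [math.log(v) for v in values]
    if kind == "reciprocal":  # strong positive skew
        return [1 / v for v in values]
    raise ValueError(f"unknown transformation: {kind}")

skewed = [1, 2, 2, 3, 3, 4, 5, 8, 13, 40]
for kind in ("sqrt", "log", "reciprocal"):
    print(kind, [round(v, 2) for v in transform(skewed, kind)])
```

Note that the reciprocal reverses the ordering of the values, so the largest observation becomes the smallest.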

24
Q

How do you describe skewed variables when the data are not normally distributed?

A

1) present medians
2) present interquartile ranges
3) present the difference between the two medians (a CI is not easy to calculate)
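
These summaries can be sketched with Python's standard `statistics` module. The helper name and the data are made up, and `quantiles` is used with its default "exclusive" method, which is only one of several conventions for calculating quartiles.

```python
from statistics import median, quantiles

def median_iqr(values):
    """Return (median, lower quartile, upper quartile) for one group."""
    q1, _, q3 = quantiles(values, n=4)  # three cut points: Q1, Q2, Q3
    return median(values), q1, q3

group_a = [1, 2, 3, 4, 5, 6, 7]
group_b = [2, 3, 3, 5, 9, 14, 30]

med_a, q1_a, q3_a = median_iqr(group_a)
med_b, q1_b, q3_b = median_iqr(group_b)
print(f"A: median {med_a}, IQR {q1_a} to {q3_a}")
print(f"B: median {med_b}, IQR {q1_b} to {q3_b}")
print("difference in medians:", med_b - med_a)
```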

25
Name two non-parametric equivalents of the t-test.
Wilcoxon rank sum test | Mann-Whitney U test
26
Describe the Wilcoxon rank sum test.
For two independent groups (one smaller than the other):
1) rank all observations in ascending order
2) sum the ranks for the smaller group (the T statistic)
3) look up T in the Wilcoxon rank sum table of critical values to get the p-value
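Steps 1 and 2 can be sketched as follows. The function names are invented for this example, tied values are given their average rank (one common convention), and the table lookup in step 3 is not included.

```python
def midranks(values):
    """Map each distinct value to its rank, averaging the ranks of ties."""
    ordered = sorted(values)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j to the last position holding the same value.
        while j + 1 < len(ordered) and ordered[j + 1] == ordered[i]:
            j += 1
        # Positions i..j (0-based) share the average of ranks i+1..j+1.
        ranks[ordered[i]] = (i + 1 + j + 1) / 2
        i = j + 1
    return ranks

def rank_sum(smaller_group, larger_group):
    """Sum of ranks for the smaller group after ranking everyone together."""
    ranks = midranks(smaller_group + larger_group)
    return sum(ranks[v] for v in smaller_group)

t = rank_sum([3, 5], [1, 3, 8])
print("T =", t)
```

The Mann-Whitney U statistic is closely related: for a group of size n with rank sum T, U = T - n(n + 1)/2.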
27
What assumption do the Wilcoxon rank sum test and the Mann-Whitney U test make?
That there are no tied ranks. If there are many identical results, complicated corrections are applied.
28
What assumptions does the chi-squared test make?
Each subject contributes data to only one cell | The expected count in each cell should be at least 5
29
What test can you use to get a p-value if you have a 2x2 table with cell counts <5?
Fisher's exact test
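A minimal sketch of a two-sided Fisher's exact test for a 2x2 table with cells a, b (first row) and c, d (second row). It assumes the common convention of summing the probabilities of all tables, with the same margins, that are no more likely than the observed one. The function name is made up for the example.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_observed = prob(a)
    # Sum every table that is as likely as, or less likely than, the observed one.
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_observed * (1 + 1e-9))

print(round(fisher_exact_two_sided(3, 1, 1, 3), 4))
```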
30
What is the chi-squared test for trend?
Used when the exposure variable is ordered (e.g. age) and the outcome is binary.
31
What does the chi-squared test statistic measure?
How close the observed values in a table are to expected values were there no true association
32
How do you calculate the chi-squared statistic?
For each cell, subtract the expected value (E) from the observed value (O), square the result and divide by E: (O - E)^2 / E
Sum over all the cells to give the overall chi-squared statistic: χ² = Σ (O - E)^2 / E
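As a sketch of this calculation, the function below takes a table of observed counts, derives each expected count from the row and column totals (row total x column total / grand total, the count expected if there were no association) and sums the cell contributions. The function name and the counts are made up.

```python
def chi_squared(table):
    """Chi-squared statistic for a table of observed counts (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if exposure and outcome were unrelated.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic

#            outcome yes, outcome no
observed = [[10, 20],   # exposed
            [20, 10]]   # unexposed
print(round(chi_squared(observed), 3))
```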
33
Does a large chi-squared statistic mean the data are more or less consistent with the null hypothesis?
Less
34
Does a large χ² value mean a large or small p-value?
Small
35
What is the difference between hypothesis testing and measures of effect?
Hypothesis testing assesses the significance of an association
A measure of effect (OR/RR) assesses the magnitude of an association
36
On a scatter plot comparing two continuous variables, what do you plot on the x-axis?
The x-axis is the exposure; the y-axis is the response or outcome
37
What does correlation measure?
The closeness or degree of association between two continuous variables
38
What does Pearson's correlation coefficient (R) measure?
The degree of linear association
39
Can Pearson's coefficient be used with data that are not normally distributed?
No. It is only appropriate when the variables are approximately normally distributed
40
Between what values will R always lie?
-1 and + 1
41
What does an R of +1 or -1 look like?
R = +1 is a straight line going up (as x increases, y increases)
R = -1 is a straight line going down (as x increases, y decreases)
R = 0 is no linear correlation
42
Can you calculate 95% CI for R?
Yes - tells us where we are 95% confident the true value lies
43
What does r squared represent?
An estimate of the proportion of the total variation in the outcome variable that is explained by the other variable.
Example: if R between blood pressure and stress = 0.80, then R² = 0.64, so stress accounts for 64% of the total variation in BP.
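A short sketch of how R and R squared are obtained from paired data. The function name and the values are invented for illustration, and the data were picked so that R comes out at 0.8.

```python
import math
from statistics import mean

def pearson_r(x, y):
    """Pearson's correlation coefficient for paired observations."""
    mean_x, mean_y = mean(x), mean(y)
    # Sums of cross-products and squares around the means.
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)

exposure = [1, 2, 3, 4]
outcome = [1, 3, 2, 4]
r = pearson_r(exposure, outcome)
print("R =", round(r, 2), "R squared =", round(r ** 2, 2))
```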
44
Name two non-parametric correlation coefficient tests.
1) Spearman's rank correlation coefficient
2) Kendall's rank correlation coefficient (tau)
Both rank the values of each of the two variables and examine how closely the ranks are correlated
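A sketch of the ranking idea for Spearman's coefficient: replace each variable by its ranks, then calculate an ordinary Pearson correlation on those ranks. The helper names are made up, and ties receive their average rank. Kendall's coefficient is not shown.

```python
import math

def midranks(values):
    """Rank values from 1 upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average of ranks i+1..j+1
        i = j + 1
    return ranks

def pearson_r(x, y):
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)

def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson's R calculated on the ranks."""
    return pearson_r(midranks(x), midranks(y))

# A curved but perfectly increasing relationship still has perfectly matching ranks.
print(round(spearman_rho([1, 2, 3, 4, 5], [1, 8, 27, 64, 125]), 2))
```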
45
Can you use correlation when there are multiple measurements on each individual?
No
46
How should you describe correlation?
Talk about the direction (positive or negative R), the strength (size of R) and the significance (p-value)
47
In relation to linear regression, what does each symbol represent in y = a + bx?
```
y = outcome variable
x = exposure variable
a = the point where the line crosses the y-axis (where x is 0)
b = slope of the line
a and b are the regression coefficients
```
48
How do you determine the best-fit line (linear regression)?
Method of least squares: find the regression coefficients a and b that minimise the sum of the squared vertical distances between the points and the line
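For a single exposure variable the least-squares coefficients have a closed form, sketched below. The function name and the data are made up for illustration.

```python
from statistics import mean

def least_squares(x, y):
    """Return (a, b) for the best-fit line y = a + b*x."""
    mean_x, mean_y = mean(x), mean(y)
    # Slope: how x and y vary together, relative to the variation in x.
    b = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
    # Intercept: the fitted line always passes through (mean_x, mean_y).
    a = mean_y - b * mean_x
    return a, b

exposure = [0, 1, 2, 3]
outcome = [1, 3, 5, 7]
a, b = least_squares(exposure, outcome)
print(f"y = {a} + {b}x")
```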
49
What is the most important value in linear regression (y, x, a or b)?
b is the most important, as it tells you how much the outcome y increases or decreases per unit increase in the exposure x
You can calculate a 95% CI around b
You can calculate a p-value to test the null hypothesis (no association)
50
What does linear regression tell you?
The nature of the relation between x and y by giving us the equation of the best fit line but does not tell us how well the line fits the data.