Key Concepts Flashcards

1
Q

What are the two different types of data?

A

Categorical and quantitative

2
Q

Name three types of categorical data and give an example of each

A

Binary (two levels) - e.g. Are you a smoker? Yes or no

Nominal (no ranking) - e.g. ethnicity

Ordinal (ranked) - e.g. pain severity (mild, moderate, severe)

3
Q

Name two types of quantitative data

A

Discrete (isolated values) e.g. number of therapy sessions completed (1, 2, 3)

Continuous (any values in interval) e.g. age, clinical scales

4
Q

Give 4 factors that define a normal distribution of continuous data and include an example

A

Symmetrical
Most data close to the middle
Extreme values are rare
Mathematically helpful
E.g. height of men

5
Q

How does a positive (right-skewed) distribution of continuous data appear?

A

Most values are clustered on the left side of the distribution, while the right tail of the distribution is longer

6
Q

How does a negative (left-skewed) distribution of continuous data appear?

A

Most values are clustered on the right side of the distribution, while the left tail of the distribution is longer

7
Q

What is a fat-tailed distribution?

A

A distribution in which extreme values are more likely than under a normal distribution
E.g. Distribution of wealth, 80/20 rule

8
Q

What can make classical statistics difficult?

A

Fat-tailed distributions

9
Q

What do descriptive statistics describe?

A

Data collected

10
Q

What cannot be used to make inference about the wider population as values in the true population could differ due to chance?

A

Descriptive statistics

11
Q

What is typically used to describe quantitative (continuous) data?

A

A measure of the average (mean or median)

A measure of variability (standard deviation, quartiles)

12
Q

For symmetric data, the mean equals…

A

median

13
Q

True or false:

For skewed data, the mean does not equal the median

A

True
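
Worked example for the mean and median cards above. This is a minimal sketch: both data sets are made up for illustration and are not from the original cards.

```python
import statistics

# Hypothetical data, invented purely for illustration.
symmetric = [2, 4, 6, 8, 10]
right_skewed = [1, 2, 2, 3, 3, 4, 40]  # one large value creates a long right tail

sym_mean = statistics.mean(symmetric)
sym_median = statistics.median(symmetric)
skew_mean = statistics.mean(right_skewed)
skew_median = statistics.median(right_skewed)

print(sym_mean, sym_median)    # mean and median agree for the symmetric data
print(skew_mean, skew_median)  # the mean is pulled towards the long right tail
```

The single large value moves the mean well above the median, which is also why the mean is described as sensitive to outliers.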

14
Q

What is sensitive to outliers?

A

Mean

15
Q

What is on the same scale as your data?

A

Standard deviation

16
Q

What is not on the same scale as your data?

A

Variance

17
Q

What are the two main approaches to measuring variability?

A
  1. SD and variance
  2. Percentiles
18
Q

What is the difference between standard deviation and variance?

A

Variance is the average of the squared deviations from the mean, while standard deviation is the square root of this number
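
Worked example, as a minimal sketch with made-up data. It follows the card's wording literally and divides by n; note that many packages divide by n - 1 when estimating the variance from a sample, which is an addition to what the card states.

```python
import math

# Hypothetical data, invented purely for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(data) / len(data)
squared_deviations = [(x - mean) ** 2 for x in data]

# Average squared deviation from the mean (dividing by n, as on the card).
variance = sum(squared_deviations) / len(data)

# Standard deviation is the square root, so it is back on the scale of the data.
standard_deviation = math.sqrt(variance)

print(variance, standard_deviation)
```

Here the variance is in squared units of the data, while the standard deviation is in the original units.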

19
Q

What is the empirical rule?

A

The percentage of values that lie within an interval in a normal distribution: 68%, 95%, and 99.7% of the values lie within one, two, and three standard deviations of the mean, respectively.
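
The three percentages can be checked numerically. This is a minimal sketch that uses the standard identity relating the normal distribution to the error function; the function name `proportion_within` is made up for this example.

```python
import math

def proportion_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    # Prints values close to 68%, 95% and 99.7%.
    print(k, round(proportion_within(k) * 100, 1))
```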

20
Q

What descriptive statistics are used to describe categorical data?

A

Binary and nominal data:
Number and proportion in each category

Ordinal data:
Small number of categories: number and proportion in each category

Larger number of categories: median and 25th and 75th percentiles
Mean (SD) is less common.
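
Worked example for "number and proportion in each category", as a minimal sketch. The responses are made up for illustration.

```python
from collections import Counter

# Hypothetical answers to "Are you a smoker?", invented for illustration.
smoker = ["yes", "no", "no", "yes", "no", "no", "no", "yes"]

counts = Counter(smoker)
proportions = {level: n / len(smoker) for level, n in counts.items()}

for level in counts:
    print(level, counts[level], proportions[level])
```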

21
Q

What is statistical inference?

A

Making statements about the population from the sample

22
Q

What does statistical inference not address?

A
  1. If a study is biased
  2. If observed associations are causal
23
Q

What are two different approaches to statistical inference?

A
  1. Frequentist
  2. Bayesian
24
Q

What approach to statistical inference is more common in medical and psychological research at the moment?

A

Frequentist

25
What features define a frequentist approach to statistical inference?
  1. Uses p-values, confidence intervals and maximum likelihood
  2. Inference is based on the observed data
  3. Makes probability statements about the data, given the value of a parameter: “The probability of observing data as extreme as this, given there is no treatment effect, is 3%.”
  4. Different people will get the same results applying the same analysis to the same data
26
What features define a Bayesian approach to statistical inference?
  1. Uses credible intervals, priors and posterior probabilities
  2. Incorporates prior beliefs into statistical inference
  3. Allows probability statements about parameters, given the data (and prior beliefs), e.g. “Given the data we have observed, there is a 97% chance the treatment is effective”
  4. Different people will get different results depending on their prior beliefs
27
Conclusions from Bayesian and frequentist statistics will be similar if…
The sample size is large enough and the prior beliefs are weak.
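
Worked example of how prior beliefs matter less as the sample grows. This is a minimal sketch under an assumption that is not in the original cards: a proportion is estimated with a Beta prior, so the posterior is also a Beta distribution. The priors and data are made up for illustration.

```python
def posterior_mean(prior_a, prior_b, successes, failures):
    """Posterior mean of a proportion under a Beta(prior_a, prior_b) prior."""
    a = prior_a + successes
    b = prior_b + failures
    return a / (a + b)

# Weak prior Beta(1, 1) versus a stronger prior Beta(20, 20) centred on 0.5.
small_weak = posterior_mean(1, 1, 7, 3)        # 10 observations
small_strong = posterior_mean(20, 20, 7, 3)
large_weak = posterior_mean(1, 1, 700, 300)    # 1000 observations
large_strong = posterior_mean(20, 20, 700, 300)

print(small_weak, small_strong)  # the two priors give clearly different answers
print(large_weak, large_strong)  # both are close to the observed proportion of 0.7
```

With only 10 observations the two analysts disagree noticeably; with 1000 observations the data dominate and both answers sit near the observed proportion, which is what a frequentist estimate would also report.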
28
What is used as a measure of uncertainty when using a frequentist approach?
  1. Confidence interval
  2. p-value
29
What is an alpha-level confidence interval?
An interval of uncertainty around an estimate for a parameter
30
Confidence intervals are...
Intervals that, under repeated sampling, would contain the true value alpha percent of the time (e.g. 95% of the time for a 95% confidence interval).
31
What do we typically calculate?
95% confidence intervals
32
What is the standard error?
The standard deviation of an estimate’s sampling distribution
33
True or false: Often the standard error can be calculated from the standard deviation of the population the statistic is being calculated on
True
34
How can you calculate the standard error for a mean?
Divide standard deviation by the square root of the sample size
35
The standard error will, in most cases…
Get smaller as n increases
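
Worked example of the standard error of a mean, as a minimal sketch. The function name and the numbers are made up for illustration.

```python
import math

def standard_error_of_mean(standard_deviation, n):
    """Standard deviation divided by the square root of the sample size."""
    return standard_deviation / math.sqrt(n)

# Same standard deviation, increasing sample size.
se_25 = standard_error_of_mean(10, 25)
se_100 = standard_error_of_mean(10, 100)

print(se_25, se_100)  # the standard error halves when n is quadrupled
```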
36
True or false: The standard deviation changes systematically with sample size
False. The standard deviation does not change systematically with sample size.
37
If we assume our estimate is from a normal distribution, how can we calculate the confidence interval?
95% CI = estimate ± 1.96 × standard error
E.g. for a mean: 95% CI = estimate ± 1.96 × (standard deviation / √n)
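
Worked example of the 95% confidence interval formula, as a minimal sketch. The summary statistics are made up, and `normal_ci_95` is a hypothetical helper name.

```python
import math

def normal_ci_95(estimate, standard_error):
    """95% confidence interval assuming a normally distributed estimate."""
    margin = 1.96 * standard_error
    return estimate - margin, estimate + margin

# Hypothetical summary statistics, invented for illustration.
sample_mean, sample_sd, n = 50.0, 10.0, 100

standard_error = sample_sd / math.sqrt(n)
lower, upper = normal_ci_95(sample_mean, standard_error)

print(lower, upper)  # roughly 48.04 to 51.96
```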
38
As sample size increases, the confidence interval becomes what?
Narrower
39
For means, what is often used instead of a normal distribution, and what does this often lead to?
The t-distribution. This leads to a different multiplier for the standard error than 1.96, usually fairly close to 2.
40
What is a p-value?
The probability of observing the data, or data more extreme, given that the parameter of interest takes a given value.
41
What is a null hypothesis?
The value the parameter is set to take. Typically the null hypothesis is of no effect or association.
42
p-values reported from models are typically…
For the null hypothesis that the parameter equals zero
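
Worked example of a two-sided p-value against a null value of zero. This is a minimal sketch that assumes the estimate is normally distributed with a known standard error; the function name and numbers are made up for illustration.

```python
import math

def two_sided_p_value(estimate, standard_error, null_value=0.0):
    """Probability of an estimate at least this far from the null value, in either direction."""
    z = (estimate - null_value) / standard_error
    return math.erfc(abs(z) / math.sqrt(2))

p_far = two_sided_p_value(1.96, 1.0)   # estimate 1.96 standard errors from zero
p_near = two_sided_p_value(0.5, 1.0)   # estimate 0.5 standard errors from zero

print(p_far, p_near)  # p_far is close to 0.05; p_near is much larger
```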
43
What is used to make decisions (or inference) about the value of a population parameter?
Statistical test of hypothesis
44
What does a statistical test of hypothesis consist of?
A statistical test of hypothesis consists of the following parts:
  1. The null hypothesis, denoted by H0
  2. The alternative hypothesis, denoted by H1
     - One-tailed: H1: parameter > null value
     - Two-tailed: H1: parameter ≠ null value
     Two-tailed p-values are almost always used
  3. The p-value
  4. A significance threshold (0.05)
45
When would we reject the null hypothesis?
If the p-value is below the significance threshold, we reject the null hypothesis and conclude that there is evidence for the alternative hypothesis
46
If the p-value is not below the significance threshold, we do not have evidence to reject the null hypothesis. What should we not conclude from this?
This does not mean the null hypothesis is true. A non-significant p-value tells us we do not know much.
47
What does 'p < 0.05' mean?
There is evidence that there is a difference: if there were no difference, we would have been unlikely to see the data we did.
48
What does p > 0.05 mean?
There is insufficient evidence to conclude there is a difference. If there were no difference, our results would not be unexpected. But we cannot rule out a difference. If a p-value is not statistically significant, we cannot conclude that there is no difference.
49
What are two errors from hypothesis tests?
  1. Type 1 error (α)
  2. Type 2 error (β)
50
What is a Type 1 error (α)?
  - Falsely concluding there is a difference
  - Controlled with the significance threshold
  - If the significance threshold is 0.05, we expect a Type 1 error rate of 5%
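
The 5% Type 1 error rate can be illustrated by simulation. This is a minimal sketch under assumptions not stated on the cards: each simulated study draws 30 values from a normal distribution with true mean 0 and known standard deviation 1, and is analysed with a z-test. The seed and study counts are arbitrary.

```python
import math
import random

def z_test_p_value(sample, null_mean=0.0, known_sd=1.0):
    """Two-sided p-value for a one-sample z-test with known standard deviation."""
    n = len(sample)
    z = (sum(sample) / n - null_mean) / (known_sd / math.sqrt(n))
    return math.erfc(abs(z) / math.sqrt(2))

rng = random.Random(0)  # fixed seed so the run is repeatable
n_studies = 2000
false_positives = 0

for _ in range(n_studies):
    # The null hypothesis is true in every simulated study.
    sample = [rng.gauss(0.0, 1.0) for _ in range(30)]
    if z_test_p_value(sample) < 0.05:
        false_positives += 1

type1_rate = false_positives / n_studies
print(type1_rate)  # expected to be near 0.05
```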
51
What is a Type 2 error (β)?
  - Failing to conclude that there is evidence for a difference when there is a true difference
  - Sample size, magnitude of the true difference, and variability of the data affect Type 2 error rates
  - Power = 1 - β
  - Power is the probability of concluding there is a difference when there truly is one
  - Low-powered test: unlikely to be significant even if there is a difference
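
Worked example of how sample size affects power. This is a minimal sketch that assumes a two-sided one-sample z-test with known standard deviation and a 0.05 threshold, which is a simplification not taken from the cards. The function names and numbers are made up for illustration.

```python
import math

def normal_cdf(x):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2)))

def z_test_power(true_difference, sd, n, critical_z=1.96):
    """Probability that a two-sided z-test is significant, given the true difference."""
    shift = true_difference / (sd / math.sqrt(n))
    return normal_cdf(-critical_z + shift) + normal_cdf(-critical_z - shift)

power_small = z_test_power(true_difference=0.5, sd=1.0, n=10)
power_large = z_test_power(true_difference=0.5, sd=1.0, n=50)

print(power_small, power_large)  # power rises with sample size
```

With no true difference the same function returns about 0.05, which is the Type 1 error rate rather than power.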
52
What can you determine when given a (1 − α) level confidence interval?
Whether the p-value is statistically significant at the α level, i.e. given a 95% confidence interval you can tell if the p-value would be significant at the 5% level.
53
If the confidence interval contains the null value, p > 0.05. If the confidence interval does not contain the null value, p < 0.05. What is an example of this?
For example, if the null hypothesis is 0:
  - A 95% CI of -1.1 to -0.1 would correspond to a statistically significant result
  - A 95% CI of -1.1 to 0.1 would correspond to a result that was not statistically significant
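
The same check written as code, as a minimal sketch. The function name `significant_at_5_percent` is made up for this example.

```python
def significant_at_5_percent(ci_lower, ci_upper, null_value=0.0):
    """True if a 95% confidence interval excludes the null value (p < 0.05)."""
    return not (ci_lower <= null_value <= ci_upper)

print(significant_at_5_percent(-1.1, -0.1))  # interval excludes 0
print(significant_at_5_percent(-1.1, 0.1))   # interval contains 0
```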
54
What increases the risk of a Type 1 error?
Multiple testing and p-hacking
55
What enhances the issue of multiple testing and p-hacking?
  - Selective reporting enhances the problem, e.g. only reporting significant results and ignoring non-significant results
  - Placing more emphasis on significant results
  - Selective reporting can also occur at the study level: studies with non-significant findings are less likely to be published
56
What are solutions to multiple testing and p hacking?
  1. Bonferroni correction: divide the significance threshold by the number of tests
     - This can be conservative
     - Leads to larger sample sizes being required
  2. Pre-specification of outcomes, analysis methods, and studies
     - Can specify a primary outcome, which stops emphasis being shifted to significant results
     - Makes visible the number of tests conducted
     - Compulsory in randomised controlled trials, e.g. AllTrials campaign http://www.alltrials.net/
     - Harder to do in more exploratory studies
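
Worked example of the Bonferroni correction, as a minimal sketch. The p-values are made up, and the family-wise error calculation assumes independent tests with every null hypothesis true, which is an assumption added for this illustration.

```python
def bonferroni_threshold(alpha, n_tests):
    """Significance threshold divided by the number of tests."""
    return alpha / n_tests

def familywise_error_rate(per_test_alpha, n_tests):
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - per_test_alpha) ** n_tests

# Hypothetical p-values from four tests, invented for illustration.
p_values = [0.001, 0.02, 0.04, 0.30]
threshold = bonferroni_threshold(0.05, len(p_values))
significant = [p < threshold for p in p_values]

uncorrected = familywise_error_rate(0.05, 20)
corrected = familywise_error_rate(bonferroni_threshold(0.05, 20), 20)

print(threshold, significant)
print(uncorrected, corrected)  # well above 0.05 without correction, just below with it
```

Three of the four p-values are below 0.05, but only one survives the corrected threshold, which shows why the correction can be conservative.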
57
What are some reasons for banning the p-value?
  1. They can be manipulated with multiple testing
  2. They are often misinterpreted: people often interpret p > 0.05 as meaning “no effect”
  3. Over-reliance on significance thresholds: p = 0.04 is given a wildly different interpretation to p = 0.06
  4. Bayesian argument: the p-value tells us the probability of observing the data given no effect. What we want to know is the probability of an effect. This can only be achieved with Bayesian inference.