All stats Flashcards

(85 cards)

1
Q

Name the 2 broad categories data can be split into

A
  1. Categorical
  2. Quantitative

2
Q

What can categorical data be split into?

A
  1. Binary
  2. Nominal
  3. Ordinal
3
Q

What can quantitative data be split into?

A
  1. Discrete
  2. Continuous

4
Q

What is binary data?

A

Categorical data with only 2 possible categories

5
Q

Give an example of binary data

A

Success/failure

Yes/no

6
Q

What is nominal data?

A

Categorical data with more than 2 categories that have no natural order

7
Q

Give an example of nominal data

A

Eye colour
Hair colour
Hair type

8
Q

What is ordinal data?

A

Categorical data whose categories have a natural order (ranking)

9
Q

Give an example of ordinal data

A

Happiness rating on a scale of 1-10

Customer service rating of 1-5

10
Q

What is discrete data?

A

Quantitative data that can only take specific, separate values (usually whole-number counts)

11
Q

Give examples of discrete data

A
  1. Number of kids
  2. Movie rating in stars

12
Q

What is continuous data?

A

Quantitative data that can take any value within a range (measured rather than counted)

13
Q

Give examples of continuous data

A

Height
Time
Weight

14
Q

Name the best way to represent categorical data

A

In a bar chart

15
Q

Name the best way to represent continuous data

A

Histogram or box plot

16
Q

Define skewness

A

Skewness is a measure of the asymmetry of a probability distribution around its mean

17
Q

Name the 3 ways to describe skewness

A
  1. Left skew
  2. Symmetrical
  3. Right skew
18
Q

Describe the relationship between median and mean in a data set that is left skewed

A

Mean < median

19
Q

Describe the relationship between median and mean in a data set that is right skewed

A

Mean > median
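A minimal Python sketch of the two skew cards, using only the standard library `statistics` module. The data are hypothetical, and `skewness` is an illustrative helper (the population moment coefficient, i.e. the average cubed z-score), not a named library function. A long right tail pulls the mean above the median and gives positive skewness; mirroring the data reverses both.

```python
from statistics import mean, median, pstdev

def skewness(data):
    """Illustrative helper: moment coefficient of skewness (average cubed z-score)."""
    m = mean(data)
    s = pstdev(data)
    return sum(((x - m) / s) ** 3 for x in data) / len(data)

right_skewed = [1, 2, 2, 3, 3, 3, 4, 10, 15]  # hypothetical data with a long right tail
left_skewed = [-x for x in right_skewed]      # mirror image: long left tail

for name, data in [("right", right_skewed), ("left", left_skewed)]:
    # Compare mean and median, and report the sign of the skewness
    print(name, round(mean(data), 2), median(data), round(skewness(data), 2))
```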

20
Q

What is central tendency?

A

A single value that describes the centre, or typical value, of a data set

21
Q

Give examples of central tendency measures

A

Mean
Median
Mode
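The three measures can be computed with Python's standard `statistics` module; the exam scores below are hypothetical and chosen so that the mean, median and mode all differ.

```python
from statistics import mean, median, mode

scores = [4, 7, 7, 8, 9, 11, 17]  # hypothetical exam scores

print(mean(scores))    # arithmetic average: sum / count
print(median(scores))  # middle value once the data are sorted
print(mode(scores))    # most frequent value
```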

22
Q

What are variation measures?

A

Measures of the spread or variability of data

23
Q

Give examples of variation measures

A
  1. Variance
  2. Standard deviation

24
Q

What is the standard deviation?

A

A measure of the average scatter of the data around the mean

The greater the spread of the data, the greater the SD
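A sketch, assuming two hypothetical data sets with the same mean (10), showing that more spread gives a larger SD. `statistics.variance` and `statistics.stdev` use the sample (n − 1) formulas; `statistics.pvariance` and `statistics.pstdev` are the population versions.

```python
from statistics import stdev, variance

narrow = [9, 10, 10, 11]  # hypothetical: values close to the mean
wide = [2, 6, 14, 18]     # hypothetical: same mean, much more spread

for data in (narrow, wide):
    # Sample variance, and SD as its square root
    print(variance(data), stdev(data))
```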

25
What is normal distribution used to describe?
Used to describe continuous data that forms a bell-shaped, symmetrical curve
26
What is a key characteristic of normally distributed data?
Mean, median and mode are all equal
27
What symbol do we use to represent the mean?
μ
28
What symbol do we use to represent the SD?
σ
29
Give examples of data that could be normally distributed
Height, age, weight, bone density, exam scores, blood pressure (BP)
30
How do we check for normality?
1. Look at the histogram: does it appear bell-shaped?
2. Are the mean, median and mode similar?
3. Do about 68% (roughly two thirds) of the data lie within 1 SD of the mean?
4. Run numerical tests of normality
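Checks 2 and 3 can be sketched informally in Python. `rough_normality_checks` is a hypothetical helper and the sample is simulated, so this is an eyeball aid rather than a formal test; the mode is skipped because simulated continuous values rarely repeat. For formal tests, SciPy provides `scipy.stats.shapiro` (Shapiro-Wilk) and `scipy.stats.kstest` (Kolmogorov-Smirnov).

```python
from statistics import NormalDist, mean, median, stdev

def rough_normality_checks(data):
    """Hypothetical helper: informal checks only, not a formal test of normality."""
    m, sd = mean(data), stdev(data)
    # Proportion of values within 1 SD of the mean (about 0.68 if normal)
    within_1sd = sum(m - sd <= x <= m + sd for x in data) / len(data)
    return {"mean": m, "median": median(data), "share_within_1sd": within_1sd}

# Hypothetical sample drawn from a normal distribution (seeded for repeatability)
sample = NormalDist(mu=170, sigma=10).samples(1000, seed=42)
result = rough_normality_checks(sample)
print(result)
```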
31
Describe a Q-Q plot for normally distributed data
The points follow a straight (diagonal) line
32
Give examples of numerical tests we can use to assess normality
1. Kolmogorov-Smirnov | 2. Shapiro-Wilk
33
What requirement must a quantitative data set fulfil before we can apply the central limit theorem to it?
Sample size should be larger than 30 (a common rule of thumb), so that the sampling distribution of the mean is approximately normal
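A small simulation illustrating the central limit theorem, under assumed settings: a seeded, right-skewed exponential population (mean 1, SD 1) and samples of n = 40, above the rule-of-thumb 30. The sample means should cluster around the population mean with a spread close to σ/√n ≈ 0.158, even though the population itself is not normal. `sample_mean` is a hypothetical helper.

```python
import random
from statistics import mean, stdev

random.seed(1)  # fixed seed so the simulation is repeatable

def sample_mean(n):
    """Hypothetical helper: mean of n draws from an exponential population (mean 1, SD 1)."""
    return mean(random.expovariate(1.0) for _ in range(n))

means = [sample_mean(40) for _ in range(2000)]
# Theory: centre near 1, spread near 1 / sqrt(40)
print(mean(means), stdev(means))
```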
34
What do μ and σ mean and what do they determine on a curve for normally distributed data?
μ is the mean and σ is the standard deviation | Together they determine the shape of the curve
35
What does μ mean and what does it determine on a curve for normally distributed data?
μ is the mean and it determines the line of symmetry on a bell curve
36
What does σ mean and what does it determine on a curve for normally distributed data?
σ is the standard deviation and it determines the spread of data around the mean
37
What does the empirical rule state?
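For normally distributed data, about 68% of values lie within 1 SD of the mean, 95% within 2 SD and 99.7% within 3 SD (shown on a standardised curve where μ = 0 and σ = 1)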
38
What proportion of the population lies within 1 standard deviation of the mean (μ ± 1σ)?
68%
39
What proportion of the population lies within 2 standard deviations of the mean (μ ± 2σ)?
95%
40
What proportion of the population lies within 3 standard deviations of the mean (μ ± 3σ)?
99.7%
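The 68/95/99.7 figures can be checked against the standard normal curve with Python's `statistics.NormalDist`. The exact values are about 68.3%, 95.4% and 99.7%; the familiar 95% corresponds to 1.96 SD rather than exactly 2.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal curve: mu = 0, sigma = 1
# Proportion of the curve within mu ± k*sigma
shares = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}
for k, share in shares.items():
    print(f"within {k} SD: {share:.1%}")

print(z.inv_cdf(0.975))  # multiplier leaving exactly 95% in the middle (about 1.96)
```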
41
Define population
The whole group of all items of interest
42
Define sample
A set of data drawn from the population
43
Define parameter
A descriptive measure of a population
44
Define statistic
A descriptive measure of a sample
45
What is inferential statistics?
Drawing conclusions (inferences) about characteristics of a population based on SAMPLE data
46
What is descriptive statistics?
Using numerical calculations, graphs or tables to describe and summarise data
47
What is a statistical inference?
The process of making an estimate, prediction or decision about a population based on data from a sample
48
What is standard error?
The standard deviation of the sampling distribution of the sample mean, estimated as SE = SD/√n
49
How do we calculate a confidence interval?
Sample statistic ± (multiplier for the confidence level) × SE; for 95% confidence the multiplier is 1.96
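A minimal sketch of the standard error and 95% confidence interval for a mean, assuming hypothetical blood pressure readings. `mean_ci_95` is an illustrative helper using the large-sample 1.96 multiplier; with a sample this small, a t-distribution multiplier would normally be used instead and gives a slightly wider interval.

```python
from math import sqrt
from statistics import mean, stdev

def mean_ci_95(data):
    """Illustrative helper: approximate 95% CI for a mean, mean ± 1.96 × SE."""
    m = mean(data)
    se = stdev(data) / sqrt(len(data))  # standard error of the sample mean
    return m - 1.96 * se, m + 1.96 * se

# Hypothetical systolic BP readings (mmHg)
bp = [118, 122, 125, 130, 115, 128, 121, 135, 119, 127]
low, high = mean_ci_95(bp)
print(mean(bp), round(low, 1), round(high, 1))
```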
50
What does the sample statistic equal?
The sample mean (when estimating a population mean)
51
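What do we mean when we say we are 95% confident?
We are 95% confident that the true population mean lies within this interval: if the study were repeated many times, about 95% of the intervals calculated this way would contain the true mean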
52
What is hypothesis testing?
Testing whether a difference in the values obtained is statistically significant or could be due to chance
53
Talk through the steps of hypothesis testing
1. Decide the statistical question
2. Assume the null hypothesis
3. Predict the sampling variability assuming the null hypothesis
4. Do the experiment
5. Calculate the p value
6. Decide whether to reject the null hypothesis
54
When do we accept our null hypothesis?
If the p value is greater than 0.05 (p > 0.05): there is no evidence of an association between the 2 factors (strictly, we fail to reject the null hypothesis)
55
When do we reject our null hypothesis?
If the p value is LESS than 0.05 (p < 0.05): there IS evidence of an association between the 2 factors
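The decision rule can be sketched for a z statistic, assuming a two-sided test at α = 0.05. `two_sided_p_value` and `decide` are hypothetical helpers and the z values are made up; the wording "fail to reject" reflects that a large p value does not prove the null hypothesis.

```python
from statistics import NormalDist

def two_sided_p_value(z):
    """Hypothetical helper: two-sided p value for a z statistic under the null (standard normal)."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

def decide(p, alpha=0.05):
    """Hypothetical helper: compare the p value with the significance level."""
    return "reject H0" if p < alpha else "fail to reject H0"

for z in (1.0, 2.5):  # made-up test statistics
    p = two_sided_p_value(z)
    print(z, round(p, 3), decide(p))
```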
56
What gives us more information: a hypothesis test or a confidence interval?
Confidence interval
57
What does a confidence interval for a difference that overlaps zero indicate?
There is no significant difference, so we do NOT reject the null hypothesis
58
What is a type I error?
When you reject the null hypothesis when it is true (false positive)
59
What is a type II error?
When you accept the null hypothesis when it is false (false negative)
60
What does power mean in terms of statistics?
The probability of finding a difference between 2 groups if one truly exists (the probability of NOT making a type II error)
61
Do we want our study to have high or low power?
High power (at least 0.8, i.e. 80%)
62
List some factors that affect the power
1. Size of effect 2. Standard deviation 3. Sample size 4. Significance level
63
How does effect size affect the power of our study?
A larger effect size (a bigger true difference) increases the power, as the difference lies further from 0 (the null value) and is easier to detect
64
How does standard deviation affect the power of our study
A larger SD decreases the power, as more variability gives a wider, flatter sampling distribution, so the curves under the null and alternative hypotheses overlap more
65
How does sample size affect the power of our study
A larger sample size increases the power, as it narrows the sampling distributions so they overlap less and more of the distribution under the alternative hypothesis falls within the "rejection" region
66
How does significance level affect the power of our study
Making the significance level stricter (a smaller α, e.g. 0.01 instead of 0.05) decreases the power; a larger α increases the power
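The four power factors can be illustrated with an approximate power formula for a two-sided, two-sample z test with equal group sizes (an assumed, simplified setting; real studies often use t tests or dedicated software). `approx_power` is a hypothetical helper; each comparison changes one factor from the baseline.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(effect, sd, n_per_group, alpha=0.05):
    """Hypothetical helper: approximate power of a two-sided, two-sample z test.
    Ignores the negligible chance of rejecting in the wrong tail."""
    z = NormalDist()
    se = sd * sqrt(2 / n_per_group)    # SE of the difference in means
    z_crit = z.inv_cdf(1 - alpha / 2)  # 1.96 for alpha = 0.05
    return z.cdf(abs(effect) / se - z_crit)

base = approx_power(effect=5, sd=10, n_per_group=50)
print(round(base, 2))
print(approx_power(10, 10, 50) > base)             # bigger effect -> more power
print(approx_power(5, 20, 50) < base)              # bigger SD -> less power
print(approx_power(5, 10, 100) > base)             # bigger sample -> more power
print(approx_power(5, 10, 50, alpha=0.01) < base)  # stricter alpha -> less power
```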
67
What is correlation?
Describes the strength and direction of the relationship between two variables
68
What is regression?
Regards one variable as the predictor and the other as the outcome
69
What is the 'predictor variable'?
Independent variable
70
What is the 'outcome variable'?
Dependent variable
71
What assumptions do we make when looking at regression?
1. Y is normally distributed at each value of X | 2. The variance of Y is the same at every value of X (homoscedasticity)
72
How do we calculate the residual of a data point?
Observed value − predicted value
73
How do we obtain the predicted value when calculating regression?
We read it off the fitted regression line at the given value of x (going beyond the range of the data is extrapolation)
74
What formula does a linear graph follow?
y = mx + c (m = gradient, c = y-intercept)
75
List some functions of multivariate analysis
1. Control for confounders 2. Test for interactions between predictors 3. Improve predictions
76
Define risk ratio
Rate of the condition in the exposed group : rate of the condition in the non-exposed group
77
When are risk ratios used?
Used for categorical data
78
What is an odds ratio?
Odds of event occurring in a treatment group: odds of event occurring in a control group
79
What does an odds ratio of 1 mean?
No difference between control and treatment group
80
What does an odds ratio other than 1 mean?
There is an association between group and event: OR > 1 means higher odds in the treatment group, OR < 1 means lower odds
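Risk and odds ratios from a hypothetical 2×2 table. `risk_ratio` and `odds_ratio` are illustrative helpers and the counts are made up. Here the risk ratio is 2 and the odds ratio is 2.25: both are above 1, pointing to an association, but they are not the same number.

```python
def risk_ratio(a, b, c, d):
    """Illustrative helper. 2x2 table: exposed/treatment = (a events, b non-events);
    unexposed/control = (c events, d non-events)."""
    return (a / (a + b)) / (c / (c + d))

def odds_ratio(a, b, c, d):
    """Illustrative helper: odds of the event in one group divided by odds in the other."""
    return (a / b) / (c / d)  # equivalently (a * d) / (b * c)

# Hypothetical counts
a, b = 20, 80  # treatment/exposed group: 20 with the event, 80 without
c, d = 10, 90  # control/unexposed group: 10 with the event, 90 without
print(risk_ratio(a, b, c, d), odds_ratio(a, b, c, d))
```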
81
What is survival analysis?
A statistical method for analysing longitudinal data on the occurrence of events over time
82
Name the curve commonly used to describe survivorship of study populations
The Kaplan-Meier curve
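A minimal sketch of the Kaplan-Meier (product-limit) estimator, assuming hypothetical follow-up times where 1 marks an event and 0 a censored subject. At each event time, survival is multiplied by (1 − events ÷ number at risk); censored subjects leave the risk set after their last follow-up. `kaplan_meier` is an illustrative helper, not a library API (the `lifelines` package offers a full implementation).

```python
def kaplan_meier(times, events):
    """Illustrative helper: product-limit estimate of the survival function S(t).
    times: follow-up time per subject; events: 1 = event occurred, 0 = censored.
    Returns (time, survival probability) at each distinct event time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        while i < len(data) and data[i][0] == t:  # group tied times
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= deaths + censored  # everyone whose follow-up ended at t leaves
    return curve

# Hypothetical follow-up (months) for 6 patients
curve = kaplan_meier([2, 3, 3, 5, 7, 8], [1, 1, 0, 1, 0, 1])
print(curve)
```

For these six subjects the estimate steps down at months 2, 3, 5 and 8, reaching 0 when the last remaining subject has the event.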
83
What does a correlation coefficient of -1 mean?
A perfect negative (linear) relationship: as the x variable increases, y decreases
84
What does a correlation coefficient of +1 mean?
A perfect positive (linear) relationship: as the x variable increases, y increases
85
What does a correlation coefficient of 0 mean?
No linear association: as x increases, y shows no consistent tendency to increase or decrease