(1) Basic Statistical Concepts Flashcards

(33 cards)

1
Q

Normalization

A

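rescales or transforms data onto a common scale (e.g. min-max to 0-1); a transform such as a log can also pull skewed data closer to a normal distribution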

2
Q

Standardization

A

dividing a value by something to remove its effect

Ex: dividing a count by area or population size
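
As a small illustration of the example above, the sketch below turns raw counts into rates per 1,000 people. The county names and numbers are hypothetical, made up only to show the arithmetic.

```python
# Hypothetical data: (cases, population) for two made-up counties.
counties = {
    "A": (50, 10_000),
    "B": (200, 100_000),
}

# Dividing by population removes the effect of population size.
rates_per_1000 = {
    name: cases * 1000 / population
    for name, (cases, population) in counties.items()
}

for name, rate in rates_per_1000.items():
    print(name, rate)
```

County B has more cases in total, but County A has the higher rate once population size is removed.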

3
Q

QQ/quantile plot

A

Visualization to check whether data are normally distributed

(with the data quantiles on the y-axis)
negative skew = points curve beneath the line
positive skew = points curve above the line
normal = points fall on the line

4
Q

r or coefficient of correlation

A

measures how strongly, and in which direction, 2 variables vary together (linear relationship)

5
Q

Range for correlation coefficient and what is positive/negative/0?

A

-1 to 1
positive = both variables go up together
negative = one goes up as the other goes down
0 = no linear association

6
Q

Standard deviation

A

measures how far data values are from the mean
little variation in values means small standard deviation
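
A quick sketch of the idea, using two made-up data sets that share the same mean. It uses the population standard deviation from Python's `statistics` module; the sample version (`statistics.stdev`) would give slightly larger values.

```python
import statistics

tight = [9, 10, 10, 11]    # made-up values clustered near the mean of 10
spread = [1, 5, 15, 19]    # made-up values far from the same mean

for name, values in (("tight", tight), ("spread", spread)):
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population standard deviation
    print(name, mean, round(sd, 2))
```

Both lists average 10, but the second has a much larger standard deviation because its values sit farther from the mean.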

7
Q

Analysis of variance (ANOVA)

A

Parametric test to see if there are significant differences between the means of 3+ groups (categories)

8
Q

Covariance

A

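Measure of whether 2 variables vary together (positive = same direction, negative = opposite directions)
Its size depends on the units; dividing it by both standard deviations gives the correlation coefficient (r)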
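
The sketch below shows how covariance and r relate, assuming the sample (n - 1) form of covariance. The height and weight values are made up; the helper names `covariance` and `correlation` are just illustrative.

```python
import math

def covariance(xs, ys):
    """Sample covariance (divides by n - 1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Covariance rescaled by both standard deviations."""
    sd_x = math.sqrt(covariance(xs, xs))
    sd_y = math.sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (sd_x * sd_y)

height_m = [1.5, 1.6, 1.7, 1.8]          # made-up data
weight_kg = [50, 58, 63, 75]
height_cm = [h * 100 for h in height_m]  # same heights, different units

print(covariance(height_m, weight_kg), covariance(height_cm, weight_kg))
print(correlation(height_m, weight_kg), correlation(height_cm, weight_kg))
```

Changing the units changes the covariance by a factor of 100, while r stays the same, which is why r is easier to compare across data sets.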

9
Q

Kernel density (3 facts about it)

A
  1. removes statistical noise from data by smoothing it
  2. uses kernel weighting, often Gaussian (closer points = more weight)
  3. good for showing generalized densities of points
10
Q

p value (3 facts)

A
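  1. doesn’t tell you the size of a difference, only how strong the evidence against the null hypothesis is
  2. compared with a significance level (e.g. 0.05) to say if a result is statistically significant
  3. used to decide whether or not to reject the null hypothesis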
11
Q

How to use a p value in a sentence to explain random chance and null hypothesis (hint: %)

A
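  1. If the null hypothesis were true, there is a ___% chance of seeing results at least this extreme by random chance
  2. Rejecting the null hypothesis at this p value means accepting a ___% risk of falsely rejecting it when it is true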
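
One way to see where such a percentage comes from is an exact permutation test, sketched below on made-up measurements. This is an illustrative technique chosen for the example, not one named in the deck: if the null hypothesis is true, the group labels are arbitrary, so every relabelling of the data is equally likely.

```python
from itertools import combinations

group_a = [12, 15, 14, 16]   # made-up measurements
group_b = [8, 9, 11, 10]

observed = sum(group_a) / 4 - sum(group_b) / 4
pooled = group_a + group_b
total = sum(pooled)

# Try every way of labelling 4 of the 8 values as "group a".
extreme = 0
splits = 0
for chosen in combinations(range(len(pooled)), 4):
    a_sum = sum(pooled[i] for i in chosen)
    diff = a_sum / 4 - (total - a_sum) / 4
    splits += 1
    # Two-sided: count splits at least as far from zero as the real one.
    if abs(diff) >= abs(observed) - 1e-12:
        extreme += 1

p_value = extreme / splits
print(f"{p_value:.1%} of relabellings are at least this extreme")
```

Here only 2 of the 70 possible relabellings give a difference as large as the observed one, so the p value is about 2.9%.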
12
Q

histogram

A

x-axis = bins (ranges of values)
y-axis = frequency in that bin

way to visualize the frequency/distribution of numeric data (a bar chart is the version for categories)

13
Q

Z score meaning

A

Number of standard deviations away from the mean

14
Q

Z score formula

A

(score - mean) / standard deviation
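
A minimal sketch of the formula on made-up exam scores, assuming the population standard deviation is the one wanted.

```python
import statistics

scores = [60, 70, 80, 90, 100]    # made-up exam scores
mean = statistics.mean(scores)
sd = statistics.pstdev(scores)    # population standard deviation

def z_score(score):
    """Number of standard deviations a score lies from the mean."""
    return (score - mean) / sd

print(z_score(80))    # a score equal to the mean
print(z_score(100))   # a score above the mean
```

A score equal to the mean has a z score of 0, and the top score here sits about 1.41 standard deviations above the mean.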

15
Q

Coefficient of determination (r-squared)

A

Ranges from 0 to 1
High = good fit
Low = poor fit
Proportion of the variance in y that is explained by x

16
Q

Sentence using coefficient of determination

A

Variable x explains 80% of the variation in variable y
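
To show where a figure like that comes from, here is a sketch that fits a least-squares line to made-up data and computes r-squared as 1 minus (unexplained variation / total variation). The function name `r_squared` is illustrative.

```python
def r_squared(xs, ys):
    """Coefficient of determination for a simple least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    predicted = [intercept + slope * x for x in xs]
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predicted))  # unexplained
    ss_tot = sum((y - mean_y) ** 2 for y in ys)                # total
    return 1 - ss_res / ss_tot

x = [1, 2, 3, 4, 5]   # made-up data
y = [2, 4, 5, 4, 5]
print(f"Variable x explains {r_squared(x, y):.0%} of the variation in variable y")
```

For this data the value works out to 0.6, so the sentence would read "x explains 60% of the variation in y".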

17
Q

Kruskal-Wallis test

A

tests whether more than 2 groups come from the same distribution

non-parametric version of ANOVA

18
Q

Central limit theorem

A

The distribution of the sample mean approaches normal as sample size increases, even when the population itself is not normal
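
A small simulation can make this concrete. The sketch below draws samples from a strongly right-skewed (exponential) population and measures the skewness of the resulting sample means; the population, sample sizes, and helper names are all assumptions chosen for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

def sample_mean(n):
    """Mean of n draws from a right-skewed (exponential) population."""
    return statistics.mean([random.expovariate(1.0) for _ in range(n)])

def skewness(values):
    """Rough skewness: near 0 for a symmetric shape, > 0 for a right tail."""
    m = statistics.mean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

for n in (1, 5, 50):
    means = [sample_mean(n) for _ in range(5000)]
    print(n, round(skewness(means), 2))
```

The skewness of the sample means should shrink toward 0 as n grows, which is the distribution of the mean becoming more symmetric and bell-shaped.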

19
Q

Mann-Whitney U

A

compares 2 independent samples
non-parametric
scores are ranked from small to large and then ranks of scores are compared
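
The ranking step can be sketched as follows. This simplified version assumes no tied values and only computes the U statistics; turning U into a p value needs tables or a statistics library, which is not shown. The data are made up.

```python
def mann_whitney_u(a, b):
    """U statistics for two samples, assuming no tied values."""
    pooled = sorted(a + b)
    rank = {value: i + 1 for i, value in enumerate(pooled)}  # 1 = smallest
    rank_sum_a = sum(rank[v] for v in a)
    u_a = rank_sum_a - len(a) * (len(a) + 1) / 2
    u_b = len(a) * len(b) - u_a
    return u_a, u_b

group_a = [3, 5, 8]            # made-up scores
group_b = [10, 12, 15, 20]

u_a, u_b = mann_whitney_u(group_a, group_b)
print(u_a, u_b, min(u_a, u_b))
```

Every score in the first group ranks below every score in the second, so the smaller U is 0, the most extreme separation possible.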

20
Q

Sample mean

A

mean of a sample drawn from the population; used to estimate the population mean

21
Q

non-parametric statistics (list tests)

A

does not assume the data follow a Gaussian distribution

Mann-Whitney U, Kruskal-Wallis, Spearman’s Rho

22
Q

Normal (Gaussian) distribution (3 facts)

A
  1. follows a bell curve
  2. assumed by parametric stats
  3. defined using the mean/standard deviation
23
Q

Normal QQ plot

A

is like a qq/quantile plot but compares the data quantiles against the quantiles of a normal distribution

24
Q

Null hypothesis

A

no difference, effect, or relationship in the population

25
Q

Parametric statistics (also list tests)

A

assumes the data follow a Gaussian distribution

2 sample t test, ANOVA, Pearson's R/correlation

26
Q

Parsimony

A

Keep it simple and make it clear

27
Q

Pearson's R

A

measures the strength/direction of the linear relationship between 2 variables
Parametric

28
Q

Residual plot

A

plots the residuals from a regression model
If there is an obvious pattern to the residuals, then the model might not be appropriate

29
Q

Residuals

A

vertical distance between a point and the best-fit line
kind of like error

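A short sketch of computing residuals, assuming the common convention residual = observed - predicted. The fitted line `y = 2x + 1` and the points are hypothetical.

```python
def predict(x):
    """Hypothetical best-fit line used only for this example."""
    return 2 * x + 1

points = [(1, 4), (2, 5), (3, 6)]   # made-up (x, observed y) pairs

residuals = [y - predict(x) for x, y in points]
for (x, y), residual in zip(points, residuals):
    print(x, y, predict(x), residual)
```

Under this convention a residual is positive when the point lies above the line, zero when it falls on the line, and negative when it lies below.
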
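30
Q

Interpreting residuals (+ and -)

A

(with residual = observed - predicted)
+ = observed is above the line, so the model is underestimating rates of something
- = observed is below the line, so the model is overestimating rates of something
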
31
Q

Shapiro test

A

null hypothesis = sample comes from a normal distribution
small p value = evidence the data are not normal

32
Q

Spearman's Rho

A

compares differences between the ranks in 2 data sets
values range from -1 to +1 (same range as Pearson's r)

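The rank-difference idea can be sketched as below, using the textbook formula rho = 1 - 6 * sum(d^2) / (n(n^2 - 1)). This version assumes no tied values, and the data are made up.

```python
def ranks(values):
    """Rank 1 = smallest; assumes no tied values."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def spearman_rho(xs, ys):
    """Spearman's rho from the squared differences between ranks."""
    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]   # y = x cubed: curved, but always increasing

print(spearman_rho(x, y))
print(spearman_rho(x, list(reversed(y))))
```

Because only the ranks matter, the curved but always-increasing relationship gives rho = 1, and reversing the order gives rho = -1.
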
33
Q

Square of the error

A

quantifies the difference between observed and expected values
squaring makes every difference positive and gives large differences more weight