Quiz 4 Flashcards

(61 cards)

1
Q

Three General Assumptions of Parametric Statistical Tests

A
  1. Normality of sampling distributions / population residuals
  2. When you compare more than one population, the variances of the populations are equal (homogeneity of variance)
  3. Observations (data) from your population are independent
2
Q

Parametric Tests

A
  • statistical tests used to estimate or test a specific population parameter (e.g. t-tests)
3
Q

Normality

A

Parametric inferential statistics assume that our sample statistics come from normal sampling distributions

4
Q

How can we know about normality?

A
  • If we have a large enough sample (typically more than 30), we have met the assumption
  • If we have a small sample, we examine our sample to infer normality of the sampling distribution
  • If you have multivariate data, examine each variable by itself to see if it is normally distributed; for multivariate normality, any linear combination of the variables (e.g. aX + bY) also needs to be normal
  • If the sample data are normally distributed, they are likely drawn from a normally distributed population → the sampling distribution would be normal
5
Q

Skewness

A
  • When a distribution is perfectly normal, the values of skewness and kurtosis are zero
  • Positive skewness means that there is a pile-up of cases on the left and a long right tail (skewed to the right)
  • Negative skewness means that there is a pile-up of cases on the right and a long left tail (skewed to the left)
5
Q

When does the CLT not work/apply?

A
  • When distributions have thick tails
  • If your sample is small
6
Q

The Central Limit Theorem

A
  • The CLT is one of the most remarkable results of the theory of probability
  • In its simplest form, the theorem states that the mean of a large number of independent observations from the same distribution has, under certain general conditions, an approximately normal distribution (see the simulation sketch below)
  • Note: exception for distributions with heavy tails
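The theorem can be illustrated by simulation. Below is a minimal Python sketch (my own example, not course data): repeated samples of n = 50 are drawn from a skewed exponential population, yet the resulting sample means are approximately normally distributed, with mean near the population mean and standard deviation near σ/√n.

```python
# Minimal CLT illustration (assumed example, not course data):
# the exponential population is skewed, but the distribution of
# sample means for n = 50 is approximately normal.
import numpy as np

rng = np.random.default_rng(42)
sample_means = np.array([
    rng.exponential(scale=2.0, size=50).mean()   # one sample of n = 50
    for _ in range(10_000)                        # many repeated samples
])

print(sample_means.mean())  # ~2.0  (population mean)
print(sample_means.std())   # ~0.28 (population sd 2.0 / sqrt(50))
```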
7
Q

Testing Normality in Single Variables

A

TB p. 183-191

  1. Is the sample size big enough to assume that the sampling distribution is normally distributed?
  2. Look at a histogram of each continuous variable → start with a visual inspection of normality
  3. Perform the Kolmogorov-Smirnov (K-S) test or the Shapiro-Wilk test (see the scipy sketch below)
  • Significant results suggest that the data are NOT normally distributed
  • Caveat: the power of these tests depends on the sample size; this is often a moot point because in large samples without thick tails we would assume normality anyway
  • The Shapiro-Wilk test is highly sensitive to even small deviations from normality in large samples
  • Also look at skewness and kurtosis statistics
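A minimal scipy sketch of steps 2–3 (the variable x and its simulated values are placeholders, not course data):

```python
# Hedged sketch: numeric normality checks for one continuous variable
import numpy as np
from scipy import stats

x = np.random.default_rng(1).normal(loc=50, scale=10, size=40)  # placeholder data

print(stats.shapiro(x))                                         # Shapiro-Wilk
print(stats.kstest(x, "norm", args=(x.mean(), x.std(ddof=1))))  # K-S vs fitted normal
print(stats.skew(x), stats.kurtosis(x))                         # skewness, excess kurtosis
```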
8
Q

The formula for skewness

A
  • Convert the raw scores into z scores
  • Skewness is the average of the z scores raised to the third power (see the worked equation below)

→ Raising to the third power increases the influence of outliers

  • If skewed to the right, we get a positive skewness score
  • If skewed to the left, we get a negative value
  • No skewness → the formula results in zero

Cutoffs: ±2
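As a worked equation (my notation; the slides may use a slightly different small-sample correction):

```latex
\text{skewness} \;=\; \frac{1}{N}\sum_{i=1}^{N} z_i^{3}
\;=\; \frac{1}{N}\sum_{i=1}^{N}\left(\frac{x_i - \bar{x}}{s}\right)^{3}
```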

9
Q

If skewed to the right, we will get a _______ skewness score

A

positive

10
Q

If skewed to the left, we will get a ______ value

A

negative

11
Q

Formula for Kurtosis K4

A
  • Kurtosis values above zero indicate a distribution that is too peaked, with heavy (thick) tails
  • Kurtosis values below zero indicate a platykurtic (flatter) distribution

Leptokurtic (thicker tails) → positive kurtosis statistic

Platykurtic (thinner tails) → negative kurtosis statistic
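As a worked equation (the common moment form of excess kurtosis, with 3 subtracted so a normal distribution scores zero; the textbook's K4 may include a small-sample correction):

```latex
\text{kurtosis} \;=\; \frac{1}{N}\sum_{i=1}^{N} z_i^{4} \;-\; 3
\;=\; \frac{1}{N}\sum_{i=1}^{N}\left(\frac{x_i - \bar{x}}{s}\right)^{4} \;-\; 3
```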

12
Q

Kurtosis value below zero is _________

A

platykurtic

13
Q

Leptokurtic (thicker tails)→ ______ kurtosis statistic

A

positive

14
Q

Platykurtic (thinner tails) → ________ kurtosis statistic

A

negative

15
Q

Rule of thumb cutoffs for kurtosis

A

±7 → be concerned

16
Q

Significance Tests for Skewness and Kurtosis

A

Step 1: convert the skewness and kurtosis scores into z scores (by dividing each by its standard error)

Step 2: compare the z scores to critical values of ±1.96 for small samples and ±2.58 for large samples. If |z| exceeds the critical value, there is significant skewness/kurtosis

  • More stringent for larger samples
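In equation form (the standard-error approximations √(6/N) and √(24/N) are the usual large-sample ones; they are my addition, not copied from the slides):

```latex
z_{\text{skewness}} = \frac{S - 0}{SE_{S}} \approx \frac{S}{\sqrt{6/N}},
\qquad
z_{\text{kurtosis}} = \frac{K - 0}{SE_{K}} \approx \frac{K}{\sqrt{24/N}}
```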
17
Q

What is the Big Deal if a Distribution Isn’t “Normal?”

A
  • We could get inaccurate results from our analysis
  • It can distort Type I and Type II error rates
  • Meaning the null could be true when our statistics tell us it isn't, or vice versa
18
Q

What to do if normality assumption is NOT met

A
  1. Data transformation
    - Appropriate when there is skewness in the distribution
    - Replacing the data with a function applied to all values within that variable
  2. Non-parametric tests
  3. Modern methods (e.g. bootstrapping)
19
Q

Data Transformation

A

Most common → square root transformation

Most useful when data are skewed to the right

Pulls more extreme values closer to the middle

Bigger impact on bigger values

20
Q

Square Root Transformation is most useful when _______

A

data are skewed to the right

21
Q

When data are skewed left what transformation can be done?

A

When data are skewed to the left:

  • Reflect the scores, then do a square root transformation
  • To reflect, subtract each value from a large number
22
Q

Log transformations

A

For extreme positive skew → reduces positive skew
Pulls in values to a greater degree than square root transformation

23
Q

Inverse transformation

A
  • Transforms data with extreme positive skew toward normal
  • 1 / (value of data)
  • Need to add a constant if any value is zero (cannot divide by zero); negative values are fine as long as there is no zero
  • Table 6.1 in TB (see the numpy sketch below for these transformations)
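A minimal numpy sketch of the transformations from the preceding cards (the toy values and the "+ 1" constants are assumptions for illustration; see Table 6.1 in the textbook for the exact recommended forms):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 25.0])   # right-skewed toy data

sqrt_x = np.sqrt(x)          # square root: pulls large values toward the middle
log_x  = np.log(x + 1)       # log: stronger pull; + 1 guards against log(0)
inv_x  = 1.0 / (x + 1)       # inverse: strongest pull; + 1 guards against dividing by zero

# For left (negative) skew: reflect first, then transform
reflected      = (x.max() + 1) - x
sqrt_reflected = np.sqrt(reflected)
```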
24
Q

To Transform … or Not

A
  • The CLT: the sampling distribution will be normal in samples > 30 (unless the sample distribution has a substantial skew or heavy tails)
  • Transformations sometimes fail to restore normality and equality of variances
  • They make the interpretation of results difficult, as findings are based on transformed data
24
Q

Rank Order Tests (AKA Non-Parametric Tests)

A
  • Can be used in place of their parametric counterparts when it is questionable that the normality assumption holds
  • Sometimes more powerful in detecting population differences when certain assumptions are not satisfied
  • Nonparametric tests cannot assess interactions
25
Q

Modern Methods

A
  • Refers to approaches for dealing with non-normal data that require a lot of computing power
  • E.g. bootstrapping methods
26
Q

Bootstrapping

A
  • The goal is to observe the shape of the sampling distribution directly, to allow us to do hypothesis testing
  • Uses the sample data to estimate the sampling distribution itself, by drawing random samples with replacement from the sample data
  • Because the sampling distribution is estimated directly, there is no need to assume normality
  • The p-value can be calculated based on how rare it is to get the observed test-statistic value (or more extreme values) in the estimated sampling distribution, regardless of whether it is normal or not
27
Q

Rule of thumb for Bootstrapping

A

5,000–10,000 bootstrap samples
28
Q

Can use bootstrapping to create a CI

A
  • Find the central value (mean) of the bootstrap estimates, then take the values at the lower 2.5th percentile and the upper 97.5th percentile as the lower and upper bounds of the CI (see the sketch below)
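A minimal percentile-bootstrap sketch (the sample data are placeholders; 10,000 resamples follows the rule of thumb above):

```python
import numpy as np

rng = np.random.default_rng(0)
sample = rng.normal(loc=100, scale=15, size=40)   # placeholder sample data

boot_means = np.array([
    rng.choice(sample, size=sample.size, replace=True).mean()
    for _ in range(10_000)
])
ci_lower, ci_upper = np.percentile(boot_means, [2.5, 97.5])
print(sample.mean(), ci_lower, ci_upper)          # mean with 95% bootstrap CI
```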
29
Q

The Three Assumptions of Parametric Statistics

A
  1. The sampling distribution(s) is (are) normally distributed
  2. Homogeneity (equality) of variance
  3. Data from your population are independent
30
Q

Homogeneity (equality) of Variance

A

The assumption that the dependent variable exhibits similar amounts of variance across the range of values for an independent variable
31
Q

Assessing Homogeneity of Variance

A
  • Visual inspection of graphs (scatter plot, residual plot)
  • Levene's test (can become overly sensitive in large samples)
  • Variance ratio (Hartley's Fmax)
32
Q

Variance Ratio (Hartley's Fmax)

A
  • Used with 2 or more groups
  • VR = largest variance / smallest variance (see the sketch below)
  • If VR < 2, homogeneity of variance can be assumed
  • If the group sizes are roughly equal, hypothesis testing results are robust to violations of homogeneity → we would still likely get valid results
  • Rule of thumb: this applies if the largest group size is smaller than 1.5 times the smallest group size
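A minimal sketch of the variance ratio (toy group data of my own):

```python
import numpy as np

groups = [np.array([4, 5, 6, 7]), np.array([3, 5, 7, 9]), np.array([5, 5, 6, 6])]
variances = [g.var(ddof=1) for g in groups]        # sample variance per group

vr = max(variances) / min(variances)
print(vr, "homogeneity plausible" if vr < 2 else "be concerned")
```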
33
Q

Levene's Test

A
  • Tests whether the variances in different groups are the same
  • The null hypothesis is that there is homogeneity of variance
  • Significant → variances are not equal
  • Non-significant → variances are equal
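A minimal scipy sketch (placeholder group scores):

```python
from scipy import stats

group_a = [22, 25, 27, 30, 31]
group_b = [18, 24, 29, 35, 41]

stat, p = stats.levene(group_a, group_b)
print(stat, p)   # significant p -> variances differ (homogeneity violated)
```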
34
Q

Visual inspection of graphs for assessing homogeneity of variance

A
  • Scatter plot, residual plot
  • The space between the line of best fit and an actual point → deviation, error, residual
35
Q

Addressing Homogeneity of Variance

A
  1. Using robust methods
  2. Bootstrapping
  3. Transforming the outcome variable
36
Q

Why is independence of data important?

A
  • The general formula for a test statistic involves two types of variability, one in the numerator and one in the denominator
  • Formula: test statistic = explained variability / unexplained variability
  • We want the test statistic to be bigger so that we have greater explanatory power, BUT with dependent data the unexplained variability becomes artificially smaller
  • In the case of dependent data → increased Type I error rate
37
Q

Measuring Relations Between Variables

A
  • We can see whether, as one variable deviates from its own mean, the other deviates in the same way from its own mean, the opposite way, or stays the same
  • This can be done by calculating the covariance
  • If there is a similar (or opposite) pattern, we say the two variables covary
38
Q

Variance

A
  • A measure of how much a group of scores deviates from the mean of a single variable
  • The average squared deviation from the mean
39
Q

Covariance

A

Tells us by how much a group of scores on two variables differ from their respective means
40
Q

Covariance Steps

A
  1. Calculate the deviation (error) between the mean and each subject's score for the first variable (x)
  2. Calculate the deviation (error) between the mean and each subject's score for the second variable (y)
  3. Multiply these deviation (error) values → these products are called the cross-product deviations
  4. Add up these cross-product deviations
  5. Divide by N − 1 → the result is the covariance (see the formula below)
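The steps above collapse into the usual sample covariance formula:

```latex
\operatorname{cov}(x, y) \;=\; \frac{\sum_{i=1}^{N}(x_i - \bar{x})(y_i - \bar{y})}{N - 1}
```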
41
Q

Covariance is the average _________

A

cross-product deviation
42
Q

Limitations of Covariance

A
  • Depends upon the units of measurement
  • E.g. the covariance of two variables measured in miles and dollars would be much smaller than if the variables were measured in feet and cents, even if the relationship was exactly the same
43
Q

Solution to the limitations of covariance

A
  • Solution → standardize it
  • Divide by the product of the standard deviations of both variables (see the formula below)
  • The standardized version of covariance is known as the correlation coefficient
  • Relatively unaffected by units of measurement
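Written out, the standardization described above is:

```latex
r \;=\; \frac{\operatorname{cov}(x, y)}{s_x \, s_y}
\;=\; \frac{\sum_{i=1}^{N}(x_i - \bar{x})(y_i - \bar{y})}{(N - 1)\, s_x \, s_y}
```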
44
Q

Correlation Coefficient

A
  • When x and y are both continuous, their correlation is called the Pearson product-moment correlation coefficient
  • Correlation statistics are standardized and can range from −1 to 1
  • It can be used as a measure of effect size
45
Q

The equation for correlation is similar to z scores…

A

Both are standardized statistics that can be compared across samples
46
Q

Convention for effect size in correlation

A
  • .1 = small effect
  • .3 = medium effect
  • .5 = large effect
47
Q

Correlation and Causality

A
  • Correlation is a necessary but not sufficient criterion for causality
  • Possible directions of causality:
    - X → Y
    - X ← Y
    - A third factor leads to changes in both X and Y
    - The correlation is by coincidence
48
Q

Things needed to determine causality

A
  • Temporal precedence
  • Demonstrating an empirical association between the variables
  • Control for confounds
49
Q

Types of Correlations

A
  • Pearson's correlation (r)
  • Spearman's rho (ρ) (rs)
  • Kendall's tau (τ)
  • Point-biserial correlation (rpb)
  • Biserial correlation (rb)
  • Phi coefficient (φ)
50
Pearson’s Correlation (r)
For analyzing relationship between two continuous variables Assumes normality, homogeneity of variance, independence of data, ANDlinear relationship
51
Spearman’s p (greek rho) (rs)
When one or both variables are ordinal E.g. SAT score and high school class standing Nonparametric alternative to pearson’s coefficient Does not assume linear relationship
52
Kendall’s Tau (t)
Better for smaller samples Possible to be helpful in cases of ranks?
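A minimal scipy sketch of the three coefficients from the preceding cards (x and y are placeholder paired scores, not course data):

```python
from scipy import stats

x = [2, 4, 5, 7, 9, 12]   # placeholder paired scores
y = [1, 3, 6, 6, 8, 13]

print(stats.pearsonr(x, y))     # Pearson's r
print(stats.spearmanr(x, y))    # Spearman's rho
print(stats.kendalltau(x, y))   # Kendall's tau
```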
53
Q

Point-Biserial Correlation (rpb)

A
  • One continuous and one dichotomous variable
  • Used when the binary variable is truly (quantitatively) dichotomous
54
Q

Biserial Correlation (rb)

A
  • One continuous and one dichotomous variable
  • Used when the variable is not truly quantitatively dichotomous, but is treated as such
  • E.g. pass/fail class grades (categories based on a continuum)
  • E.g. a median split → creates groups
55
Q

Phi Coefficient (φ)

A

Two dichotomous (categorical) variables
56
Q

Procedure in experimental research to rule out confounding variables

A

Random assignment
57
Q

Coefficient of Determination, R²

A
  • By squaring the value of r, you get the proportion of variance in one variable shared by the other(s): R²
  • Can only take values from 0 to 1, because it is a squared value (must be positive)
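A quick worked example (illustrative numbers of my own):

```latex
r = .50 \;\Rightarrow\; R^{2} = (.50)^{2} = .25
\quad\text{(25\% of the variance in one variable is shared with the other)}
```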
57
Q

Spurious Relationship

A
  • The two variables have no causal connection, yet it may be inferred that they do, due to an unknown confounding factor
  • E.g. ice cream sales and deaths by drowning → the third (confounding) variable is the outside temperature
58
Q

Caveat for biserial and point-biserial correlations

A

When group sizes for the binary variable are unequal → correlation values can become smaller