module 5 Flashcards

(36 cards)

1
Q

data fishiness assumptions

A
  • assumption of normality
  • assumption of homogeneity of variance
  • independence of observations
2
Q

assumption of normality general definition

A

scores on the dependent variable within each group are assumed to be sampled from a normal distribution

3
Q

NHST for evaluating normality general definition

A
  • tests whether the sample distribution is significantly different from a normal distribution with the same mean and SD
4
Q

what tests are used in the NHST approach to the assumption of normality

A
  • Shapiro-Wilk test
  • Kolmogorov-Smirnov test
5
Q

skew and kurtosis definition and cut offs

A
  • skew: asymmetry of the distribution (0 = normal); descriptive-approach cutoff: > 2
  • kurtosis: how heavy or light the tails of the distribution are (heavy = high kurtosis/many outliers, light = low kurtosis/few outliers); descriptive-approach cutoff: > 7
  • for both, a z score of 1.96 or above is treated as non-normal
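The descriptive cutoffs on this card can be checked with a few lines of code. This is a minimal sketch, not a definitive procedure: it assumes the kurtosis cutoff refers to excess kurtosis (normal = 0), applies the skew cutoff to the absolute value, and uses simple population moments. The function names and the example data are hypothetical.

```python
from statistics import fmean


def skew_and_excess_kurtosis(scores):
    """Return (skew, excess kurtosis) from population moments.

    Both values are 0 for a perfectly normal distribution.
    """
    n = len(scores)
    mean = fmean(scores)
    m2 = sum((x - mean) ** 2 for x in scores) / n  # variance
    m3 = sum((x - mean) ** 3 for x in scores) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in scores) / n  # fourth central moment
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3


def flag_non_normality(scores, skew_cutoff=2.0, kurtosis_cutoff=7.0):
    """Apply the descriptive cutoffs from the card (assumed: |skew| > 2, kurtosis > 7)."""
    skew, kurtosis = skew_and_excess_kurtosis(scores)
    return {
        "skew": skew,
        "kurtosis": kurtosis,
        "skew_flag": abs(skew) > skew_cutoff,
        "kurtosis_flag": kurtosis > kurtosis_cutoff,
    }


# Hypothetical data: a symmetric sample and a sample with one extreme score.
symmetric = [1, 2, 3, 4, 5, 6, 7, 8, 9]
one_extreme = [1] * 29 + [100]

symmetric_result = flag_non_normality(symmetric)
extreme_result = flag_non_normality(one_extreme)
```

The symmetric sample is not flagged on either index, while the single extreme score pushes both skew and kurtosis past the cutoffs.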
6
Q

limitations of stat tests of normality

A
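  • a big departure from normality is needed for significance in small samples, while a small departure is enough in large samples
  • non-normality is more of a concern in small samples, which is where the tests are least sensitive
  • doesn't take the type of non-normality into account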
7
Q

descriptive approach for evaluating normality definition

A
  • looks at descriptives and/or graphic displays to quantify the magnitude and nature of non-normality
8
Q

____ kurtosis is more problematic than ____ kurtosis in t tests, ANOVAs, correlations, and regressions

A

positive, negative

9
Q

which approach makes more sense for normality testing: NHST or descriptives

A

descriptives, because it combines threshold values with Q-Q plots

10
Q

thin vs fat tails relative to a normal distribution

A
  • thin: fewer extreme observations than normal distributions
  • fat: more extreme observations than normal distributions
11
Q

if data are normal, the scatterplot (normal Q-Q plot) should resemble a _____

A

straight line (as opposed to cloud shape)

12
Q

if the middle of the scatterplot line is straight and the ends flatten, it _____

A

indicates thin tails and is not problematic

13
Q

if the middle of the scatterplot line is straight and the ends have a steep slope, it _____

A

indicates fat tails and is problematic

14
Q

assumption of homogeneity of variance definition

A

the variance of scores on the dependent variable within each group (condition) is the same across all groups (conditions)

15
Q

evaluating homogeneity of variance: NHST approach definition

A
  • tests whether the variances in the groups are significantly different from one another
16
Q

evaluating homogeneity of variance: descriptive approach

A
  • looks only at imperfection
  • looks at descriptive stats and/or graphic displays to quantify the magnitude of differential variances (largest vs smallest SD)
  • looks at a threshold for the ratio of the largest to the smallest variance
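The largest-to-smallest variance ratio mentioned on this card is simple to compute by hand or in code. This is a minimal sketch under stated assumptions: it uses sample variances (n - 1 denominator), the function name and data are hypothetical, and it applies no cutoff because the card does not state one.

```python
from statistics import variance


def variance_ratio(groups):
    """Largest group variance divided by the smallest (sample variances)."""
    variances = [variance(group) for group in groups]
    return max(variances) / min(variances)


# Hypothetical scores for two conditions with the same mean but different spread.
control = [10.0, 12.0, 14.0, 16.0, 18.0]    # sample variance 10
treatment = [4.0, 9.0, 14.0, 19.0, 24.0]    # sample variance 62.5

ratio = variance_ratio([control, treatment])  # 62.5 / 10
```

A ratio of 1 means the variances are identical. The further the ratio is above 1, the larger the difference in variance.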
17
Q

tests for homogeneity of variance

A
  • Levene's test
  • Hartley's variance ratio test (F-max test)
18
Q

limitations of NHST approach for homogeneity of variance

A
  • role of sample size (a difference in variance is flagged as less of a concern in small samples and more of a concern in larger samples)
  • insensitive to differences in variance in small samples and sensitive to them in big samples
  • a difference in variance is a magnitude problem
19
Q

if variances are equal, the scatterplot should resemble a straight line with a slope of ___ and an intercept of ____, whereas when the variances are not equal, the points will not cluster around that line and the slope will be different from __

A

1, the difference between means, 1

20
Q

independence of observations definition

A
  • each observation (between subjects) or set of observations (repeated measures) in the dataset is independent of all other observations/sets
  • example of non-independence: roommates/partners
21
Q

positive associations inflate ___ and negative associations inflate ___

22
Q

evaluating independence of observation

A
  • examine the structural properties of the data to see if a basis exists for questioning the validity of the assumption
  • if there is no evident basis, it's okay to carry on
  • thresholds are up for debate
  • if a basis exists, independence can be assessed by computing the intraclass correlation for the part of the data that is suspected to lack independence
  • if the correlation is very small (<0.10), it's fine to use a t test/ANOVA
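One way to compute the correlation on this card is sketched below. The card does not say which form of the intraclass correlation is intended, so this assumes the one-way random-effects version with equal cluster sizes. The function name and the roommate data are hypothetical.

```python
from statistics import fmean


def intraclass_correlation(clusters):
    """One-way random-effects ICC for clusters of equal size k."""
    k = len(clusters[0])
    if any(len(cluster) != k for cluster in clusters):
        raise ValueError("this sketch assumes equal cluster sizes")
    g = len(clusters)
    grand_mean = fmean([x for cluster in clusters for x in cluster])
    cluster_means = [fmean(cluster) for cluster in clusters]
    # Mean square between clusters and mean square within clusters
    ms_between = k * sum((m - grand_mean) ** 2 for m in cluster_means) / (g - 1)
    ms_within = sum(
        (x - m) ** 2 for cluster, m in zip(clusters, cluster_means) for x in cluster
    ) / (g * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical scores for four roommate pairs; roommates score alike.
roommates = [(4.0, 5.0), (8.0, 9.0), (1.0, 2.0), (6.0, 7.0)]
icc = intraclass_correlation(roommates)
independence_ok = abs(icc) < 0.10  # threshold taken from the card
```

In this made-up example the roommates' scores are strongly related, so the correlation is far above 0.10 and the independence assumption would be in doubt.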
23
Q

addressing violations of normality

A
  • use alternative statistical procedures that don't need normality
  • evaluate level of measurement assumptions
  • identify and remove outliers
  • transform data to normalize the distribution
24
Q

addressing violations of homogeneity of variance

A
  • use alternative procedures that don't need homogeneity of variance
  • evaluate level of measurement assumptions
  • identify and remove outliers
25
Q

addressing violations of independence of observations

A
  • use alternative statistical procedures, e.g. multilevel modeling (MLM) or hierarchical linear modeling (HLM)
26
Q

outliers definition

A
  • extreme values that differ greatly from the other observations in the dataset, suggesting they are drawn from another population
27
Q

examples of common outliers

A
  • data entry/encoding errors (less common now that data entry is no longer manual)
  • response latency data (longer response times that reflect a distortion or error, e.g. due to distraction)
  • open-ended estimate data
28
Q

problems with outliers

A
  • often responsible for violations of homogeneity of variance/normality
  • conceptual validity
  • disproportionate influence on statistical results
29
Q

identifying outliers

A
  • impossible values in frequency tables/histograms
  • steep tails in normal Q-Q plots
  • standardized residuals for observations
  • studentized deleted residuals
30
Q

standardized residuals for observations

A
  • index of deviation from the mean
  • follows the z distribution
  • in a normally distributed sample of N=100, about 1 value should be >2.6
  • in a normally distributed sample of N=1000, about 4 values should be >3.0
  • a general threshold of 4 or 5 is suggested
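The index described on this card can be sketched as a z score for each observation, flagged against the suggested threshold of 4. This is a minimal illustration, assuming the sample SD (n - 1 denominator) is used. The function names and the response-latency data are hypothetical.

```python
from statistics import fmean, stdev


def standardized_scores(values):
    """Deviation of each observation from the mean, in SD units (z scores)."""
    mean = fmean(values)
    sd = stdev(values)
    return [(x - mean) / sd for x in values]


def flag_outliers(values, threshold=4.0):
    """Return the observations whose absolute z score exceeds the threshold."""
    z_scores = standardized_scores(values)
    return [x for x, z in zip(values, z_scores) if abs(z) > threshold]


# Hypothetical response latencies in ms: 29 typical trials plus one distracted trial.
latencies_ms = [500 + (i % 5 - 2) * 10 for i in range(29)] + [5000]
flagged = flag_outliers(latencies_ms)
```

Only the 5000 ms trial is flagged. Because the extreme value itself inflates the mean and SD used in the calculation, its z score is smaller than its raw distance from the other trials would suggest.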
31
Q

studentized deleted residuals

A
  • index of deviation from the mean (the target observation is not included in the mean and SD calculation)
  • follows a t distribution with df = n-2
  • in a sample of 100, a value >3.6 = outlier
  • in a sample of 1000, a value >4.07 = outlier
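The leave-one-out idea on this card can be sketched as follows. It is an illustration under assumptions, not a definitive implementation: it treats the mean as the only predictor, and it scales each deviation by the standard error of predicting one new score from the remaining n - 1 scores, which is what gives the t distribution with df = n-2. The function name and data are hypothetical.

```python
from math import sqrt
from statistics import fmean, stdev


def studentized_deleted_residuals(values):
    """Deviation of each score from the mean of the other scores, in standard-error units."""
    n = len(values)
    residuals = []
    for i, x in enumerate(values):
        rest = values[:i] + values[i + 1:]  # leave the target observation out
        # Standard error for predicting one new score from the remaining n - 1 scores
        standard_error = stdev(rest) * sqrt(1 + 1 / (n - 1))
        residuals.append((x - fmean(rest)) / standard_error)
    return residuals


# Hypothetical response latencies in ms: 29 typical trials plus one distracted trial.
latencies_ms = [500 + (i % 5 - 2) * 10 for i in range(29)] + [5000]
deleted = studentized_deleted_residuals(latencies_ms)
most_extreme = max(range(len(deleted)), key=lambda i: abs(deleted[i]))
```

Because the extreme trial is left out of its own mean and SD, it can no longer mask itself, so its residual is far larger than an ordinary z score computed on the same data.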
32
Q

response to outlier

A
  • correct impossible values or treat them as missing data
  • possible but highly discrepant values can be trimmed or capped to the most extreme value/specified values
  • highly discrepant values are treated as missing
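Two of the responses on this card, capping and treating values as missing, can be sketched in a few lines. The function names, the data, and the bounds of 200 and 2000 ms are hypothetical and only for illustration. In practice the bounds should come from rules set in advance.

```python
def cap_values(values, lower, upper):
    """Pull values outside [lower, upper] back to the nearest bound."""
    return [min(max(x, lower), upper) for x in values]


def treat_as_missing(values, lower, upper):
    """Replace values outside [lower, upper] with None (missing)."""
    return [x if lower <= x <= upper else None for x in values]


# Hypothetical response latencies in ms with one highly discrepant trial.
latencies_ms = [420, 510, 480, 5000, 450]

capped = cap_values(latencies_ms, lower=200, upper=2000)
with_missing = treat_as_missing(latencies_ms, lower=200, upper=2000)
```

Capping keeps the trial in the dataset with a less extreme value, while the missing-data option drops its score and leaves the other trials unchanged.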
33
Q

philosophical issues with outliers

A
  • minimalist perspective: never touch the data; a strong rationale is needed for deleting/altering data (due to potential abuse)
  • maximalist perspective: routinely alter/delete values; outliers violate assumptions and are hard to interpret; clear rules/procedures must be set to avoid abuse
  • intermediate perspective: altering/deleting is justifiable with clear rules/procedures and high thresholds for outliers
34
Q

levels of measurement

A
  • nominal: number assignment is about group membership/category (e.g. nationality)
  • ordinal: number assignment is about rank order on a scale but does not reflect the magnitude of differences (e.g. ranked favourites: the difference between ranks 1 and 2 may not equal the difference between ranks 4 and 5)
  • interval: number assignment is about rank order and magnitude of differences, but not ratios (e.g. the Celsius scale, with 0 for freezing and 100 for boiling)
  • ratio: number assignment is about rank order, magnitude of differences, and ratios (e.g. mass, length)
35
Q

what level of measurement has an absolute, meaningful zero (0) point

A

ratio

36
Q

before conducting an analysis (t test/ANOVA) and computing descriptive stats: they are only meaningful if the dependent variable has at least _______ properties

A

interval