Statistics Flashcards Preview


Flashcards in Statistics Deck (67):
1

nominal data

involves tallying people to see which non-ordered category each person falls into
e.g. sex, voting preference, ethnicity

2

ordinal data

involves tallying people to see which ordered category each person falls into
group means cannot be calculated from ordinal data

3

interval data

involves obtaining numerical scores for each person, where score values have equal intervals
either no zero score (e.g. IQ scores, t-scores) or zero is not absolute (e.g. temperature)
group mean can be calculated from interval data

4

ratio data

involves obtaining numerical scores for each person, where scores have equal intervals and an absolute zero
e.g. savings in bank, scores on EPPP, number of children, weight
comparisons can be made across score values (e.g. $10 is twice as much as $5)

5

measures of central tendency

mean, median, mode
best measure of central tendency is typically the mean
when data are skewed or very extreme scores are present, the median is preferable

6

standard deviation

measure of average deviation (or spread) from the mean in a given set of scores
square root of the variance

7

variance

standard deviation squared

8

range

crudest measure of variability
difference between highest and lowest value obtained
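
A minimal Python sketch (scores invented for illustration) tying these three measures of variability together:

import statistics

scores = [85, 90, 95, 100, 105, 110, 115]  # hypothetical test scores, mean = 100

variance = statistics.pvariance(scores)     # mean squared deviation from the mean: 100.0
sd = statistics.pstdev(scores)              # square root of the variance: 10.0
value_range = max(scores) - min(scores)     # crudest measure: highest minus lowest = 30

print(variance, sd, value_range)            # sd ** 2 == variance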

9

positive skew

higher proportion of scores in the lower range of values
mode has lowest value, mean has highest value
(bump on left)

10

negative skew

higher proportion of scores in the higher range of values
mean has lowest value, mode has highest value
(bump on right)

11

kurtosis

how peaked a distribution is
leptokurtic distribution - very sharp peak
platykurtic - flattened

12

norm-referenced score

provides information on how the person scored relative to the group
e.g. percentile rank

13

criterion-referenced or domain-referenced score

provides information on how the person scored relative to a predetermined standard or content domain
e.g. percentage correct

14

standard scores

based on the standard deviation of the sample
e.g. z-scores, t-scores, IQ scores, SAT scores, EPPP scores

15

z-scores

mean of zero, SD of one
shape of z-score distribution always identical to shape of the raw score distribution
useful because correspond directly to percentile ranks (ONLY IF distribution is normal) and easy to calculate from raw score data
transforming raw scores into z-scores does not normalize distribution

16

z-score formula

z = (score - mean) / SD
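
A quick worked sketch of the formula (IQ-style numbers chosen for illustration):

def z_score(score, mean, sd):
    # standardize a raw score: how many SDs it sits from the mean
    return (score - mean) / sd

print(z_score(115, 100, 15))  # 1.0  -> one SD above the mean
print(z_score(85, 100, 15))   # -1.0 -> one SD below the mean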

17

standard error of the mean

if a researcher were to take many, many samples of equal size and plot the mean IQ scores of these samples, the researcher would get a normal distribution of means
any spread or deviation in these means is error
average amount of deviation = standard error of the mean

18

standard error of the mean formula

SEM = SD(population) / √N
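
A worked sketch of the formula (the population SD and N are invented):

import math

def standard_error_of_mean(population_sd, n):
    # average deviation of sample means around the population mean
    return population_sd / math.sqrt(n)

print(standard_error_of_mean(15, 25))  # 15 / 5 = 3.0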

19

central limit theorem

assuming an infinite number of equal-sized samples are drawn from the population, and the means of these samples are plotted, a normal distribution of the means will result
tells the researcher how likely it is that a particular mean will be obtained just by chance - can calculate whether the obtained mean is most likely due to treatment or experimental effects or to chance (sampling error, random error)
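
A minimal simulation sketch of the theorem (the skewed population and sample sizes are arbitrary choices):

import random
import statistics

# a deliberately skewed population
population = [random.expovariate(1.0) for _ in range(100_000)]

# means of many equal-sized samples (N = 50 each)
sample_means = [
    statistics.mean(random.sample(population, 50))
    for _ in range(1_000)
]

# the sample means pile up into an approximately normal distribution
# centered on the population mean, with spread equal to the SEM
print(statistics.mean(population), statistics.mean(sample_means))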

20

rejection region

aka region of unlikely values
size of rejection region corresponds to alpha level e.g. when alpha is .05, rejection region is 5% of curve
when obtained values fall in rejection region, null hypothesis rejected, researcher concludes treatment did have an effect

21

Type I error

mistakenly rejecting null (differences found when they don't exist)
corresponds to alpha

22

Type II error

mistakenly accepting null (differences not found, but they do exist)
corresponds to beta

23

power

defined as ability to correctly reject the null
increased when sample size is large, magnitude of intervention is large, random error is small, statistical test is parametric, test is one-tailed
power = 1-beta
as alpha increases, so does power
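
A short sketch of the sample-size lever using statsmodels' power tools (assuming statsmodels is installed; the effect size and alpha are arbitrary choices):

from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# power rises as the sample size per group grows
for n in (20, 50, 100):
    power = analysis.solve_power(effect_size=0.5, nobs1=n, alpha=0.05)
    print(n, round(power, 2))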

24

non-parametric tests

e.g. Chi-square, Mann-Whitney, Wilcoxon
if DV is nominal or ordinal

25

parametric tests

e.g. t-test, ANOVA
if DV is interval or ratio

26

assumptions of parametric tests

homoscedasticity - there should be similar variability or SD in the different groups
data are normally distributed

27

Kolmogorov-Smirnov test

same qualifications as independent samples or single sample t-test, except it's a non-parametric test
1 IV, 1 DV
1 or 2 independent groups

28

Wilcoxon (signed-rank) test

same qualifications as matched t-test, except it's a non-parametric test
1 IV, 1 DV
2 correlated groups

29

Kruskal-Wallis

same qualifications as 1-way ANOVA, except it's a non-parametric test
1 IV, 1 DV
>2 independent groups

30

Friedman test

same qualifications as 1-way repeated measures ANOVA, except it's a non-parametric test
1 IV, 1 DV
>2 correlated groups

31

single sample chi-square test
description and degrees of freedom

nominal data collected for one independent variable
e.g. 100 psychologists sampled as to voting preference (3 categories)
df = #columns - 1 (in example, 3-1 = 2 df)

32

multiple sample chi-square

nominal data collected for two IVs
e.g. 100 psychologists sampled for voting preference (3 categories) and ethnicity (5 categories)
df = (#rows - 1)(#columns - 1)
in example, (3-1)(5-1) = 2 × 4 = 8
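
A scipy sketch (the contingency table counts are invented) confirming the df rule:

import numpy as np
from scipy.stats import chi2_contingency

# hypothetical 3x5 table: 3 voting preferences x 5 ethnic groups
observed = np.array([
    [8, 6, 7, 5, 4],
    [9, 5, 6, 7, 8],
    [7, 8, 6, 5, 9],
])

chi2, p, dof, expected = chi2_contingency(observed)
print(dof)  # (3-1)*(5-1) = 8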

33

t-test for single sample

interval or ratio data collected for one group of subjects
df=N-1

34

t-tests for matched or correlated samples

interval or ratio data collected for two correlated groups of subjects
df = #pairs - 1

35

t-tests for independent samples

interval or ratio data collected for two independent groups of subjects
df = N-2
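
A minimal scipy sketch of all three t-tests (the score lists are invented):

from scipy import stats

a = [101, 99, 105, 110, 97, 103]   # group/condition 1
b = [95, 100, 98, 104, 96, 99]     # group/condition 2

print(stats.ttest_1samp(a, popmean=100))  # single sample: df = N-1 = 5
print(stats.ttest_rel(a, b))              # matched/correlated samples: df = #pairs-1 = 5
print(stats.ttest_ind(a, b))              # independent samples: df = N-2 = 10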

36

one-way ANOVAs: dfs

df total = N-1
df between groups = #groups-1
df within groups = df total - df between

37

One-Way ANOVA:
F ratio

MSbetween/MSwithin
when the F ratio equals or is approximately 1, there is no significance
as the F ratio gets above 2.0, it is typically considered significant

38

One-Way ANOVA: mean squares

MS between = SS between / df between
MS within = SS within / df within
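
A small worked sketch (invented data) computing the mean squares and F ratio by hand, then checking against scipy.stats.f_oneway:

import numpy as np
from scipy import stats

groups = [np.array([4, 5, 6]), np.array([6, 7, 8]), np.array([9, 10, 11])]
grand_mean = np.mean(np.concatenate(groups))

ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

df_between = len(groups) - 1                            # #groups - 1 = 2
df_within = sum(len(g) for g in groups) - len(groups)   # N - #groups = 9 - 3 = 6

f_ratio = (ss_between / df_between) / (ss_within / df_within)
print(f_ratio, stats.f_oneway(*groups).statistic)       # same value: 19.0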

39

Post Hoc tests

Scheffé, followed by Tukey, provide the most protection from Type I error (most conservative)
Fisher's LSD provides the least protection from Type I error
Duncan, Dunnett, Newman-Keuls provide mid-range protection
the REVERSE is true for Type II error

40

assumptions of bivariate correlations

linear relationship
homoscedasticity - similar spread of scores across the entire scatter plot
unrestricted range

41

Spearman's Rho or Kendall's Tau Correlation

ordinal (rank ordered) X
ordinal (rank ordered) Y

42

Pearson's r Correlation

interval or ratio X
interval or ratio Y

43

Point-Biserial Correlation

interval or ratio X
true dichotomy Y

44

Biserial Correlation

interval or ratio X
artificial dichotomy Y

45

Phi Correlation

true dichotomy X
true dichotomy Y

46

Tetrachoric Correlation

artificial dichotomy X
artificial dichotomy Y
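
scipy implements several of the correlation types above directly (Pearson, Spearman, Kendall, point-biserial); a minimal sketch with invented data:

import numpy as np
from scipy import stats

x = np.array([1.2, 2.4, 3.1, 4.8, 5.5, 6.0])  # interval/ratio X
y = np.array([2.0, 2.9, 3.8, 4.1, 5.9, 6.2])  # interval/ratio Y
d = np.array([0, 0, 1, 0, 1, 1])              # true dichotomy (e.g. pass/fail)

r, p = stats.pearsonr(x, y)         # interval/ratio X with interval/ratio Y
rho, p = stats.spearmanr(x, y)      # rank-ordered X with rank-ordered Y
tau, p = stats.kendalltau(x, y)     # alternative rank correlation
rpb, p = stats.pointbiserialr(d, x) # true dichotomy with interval/ratio
print(r, rho, tau, rpb)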

47

Eta correlation

curvilinear relationship between X and Y

48

zero-order correlation

most basic correlation
analyzes relationship between X and Y when it is believed that there are no extraneous variables affecting the relationship

49

partial correlation (first order correlation)

examines the relationship between X and Y with the effect of a third variable removed
e.g. if it is believed that parent education (third variable) affects both SAT and GPA, this variable could be measured and its effect removed from the correlation of SAT and GPA

50

part (semipartial) correlation

examines relationship between X and Y with the influence of a third variable removed from only one of the original variables

51

coefficient of multiple determination

R squared
index of the amount of variability in the criterion Y that is accounted for by the combination of all the predictors (Xs)

52

multiple R

correlation between 2 or more IVs (Xs) and one DV (Y), where Y is always interval or ratio data and at least one X is interval or ratio data
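
A minimal numpy sketch (invented data) showing the link between multiple R and R squared:

import numpy as np

# two predictors (Xs) plus an intercept column, and one criterion (Y)
X = np.column_stack([
    [5, 6, 7, 8, 9, 10],  # e.g. hours studied
    [1, 2, 2, 3, 4, 4],   # e.g. practice tests completed
    np.ones(6),           # intercept
])
y = np.array([60, 64, 70, 73, 80, 84])

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta

ss_res = ((y - y_hat) ** 2).sum()
ss_tot = ((y - y.mean()) ** 2).sum()
r_squared = 1 - ss_res / ss_tot   # coefficient of multiple determination
multiple_r = np.sqrt(r_squared)   # correlation between Y and predicted Y
print(r_squared, multiple_r)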

53

multicollinearity

problem that occurs in multiple regression when predictors are highly correlated with one another and essentially redundant

54

canonical R

extension of multiple R
correlation between two or more IVs (Xs) and two or more DVs (Ys)
e.g. examining relationship between time spent studying for the EPPP (X1) and number of practice tests completed (X2) with score obtained on the exam (Y1) and amount of subjective distress experienced while taking the exam (Y2)

55

discriminant function analysis

special case of multiple regression
used when there are two or more Xs and one Y
however, used when Y is nominal (categorical)

56

loglinear analysis

aka logit analysis
used to predict categorical Y based on categorical Xs
e.g. if type of graduate school and sex were used to predict likelihood of passing or failing the EPPP

57

path analysis

applies multiple regression techniques to testing a model that specifies causal links among variables

58

structural equation modeling

enables researchers to make inferences about causation
e.g. LISREL (Linear Structural Relations)

59

factor analysis

operates by extracting as many significant factors from data as possible

60

eigenvalues

factor analysis
indicates strength of factor
<1.0 usually not considered significant
aka characteristic root

61

factor loadings

correlation between a variable (e.g. item or subtest) and underlying factor
interpreted if they equal or exceed +/- .30

62

orthogonal rotation

type of factor rotation
axes remain perpendicular (90 degrees)
always results in factors that have no correlation with one another
generally preferred because easier to interpret
communalities must be calculated

63

communalities

calculated in orthogonal rotation
refers to how much of a test's variability is explained by combination of all the factors
factor loadings all squared and added together
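
A worked toy example (the loadings are invented):

# hypothetical loadings of one subtest on three orthogonal factors
loadings = [0.60, 0.40, 0.10]

# communality: sum of the squared loadings across all factors
communality = sum(l ** 2 for l in loadings)
print(communality)  # 0.36 + 0.16 + 0.01 = 0.53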

64

oblique rotation

type of factor rotation
angle between axes is non-perpendicular and factors are correlated
some argue that oblique rotations are preferable to orthogonal rotations because factors tend to be correlated in the real world

65

principal components analysis

type of factor analysis
when one is trying to extract factors and there is no empirical or theoretical guidance on the values of the communalities
always results in a few unrelated factors, called components
factors empirically derived, researcher has no prior hypotheses
first factor (component) accounts for largest amount of variability, each additional component explaining somewhat less

66

principal factor analysis

type of factor analysis
communality values would need to be ascertained before analysis

67

Normal curve

symmetrical, bell-shaped distribution
approximately 68% of scores fall within 1 SD of the mean, 95% within 2 SDs, and 99.7% within 3 SDs
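
These proportions can be checked with a quick scipy sketch:

from scipy.stats import norm

for k in (1, 2, 3):
    # area under the standard normal curve within k SDs of the mean
    proportion = norm.cdf(k) - norm.cdf(-k)
    print(k, round(proportion, 4))  # ~0.6827, 0.9545, 0.9973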