Flashcards in Statistics Deck (67):

1

## nominal data

###
involves tallying people to see which non-ordered category each person falls into

e.g. sex, voting preference, ethnicity

2

## ordinal data

###
involves tallying people to see which ordered category each person falls into

group means cannot be calculated from ordinal data

3

## interval data

###
involves obtaining numerical scores for each person, where score values have equal intervals

either no zero score (e.g. IQ scores, t-scores) or zero is not absolute (e.g. temperature)

group mean can be calculated from interval data

4

## ratio data

###
involves obtaining numerical scores for each person, where scores have equal intervals and an absolute zero

e.g. savings in bank, scores on EPPP, number of children, weight

comparisons can be made across score values (e.g. $10 is twice as much as $5)

5

## measures of central tendency

###
mean, median, mode

best measure of central tendency typically the mean

when data skewed or there are some very extreme scores present, median preferable

6

## standard deviation

###
measure of average deviation (or spread) from the mean in a given set of scores

square root of the variance

7

## variance

### standard deviation squared
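
The relationship between these two cards can be checked numerically; a minimal Python sketch with a hypothetical score set:

```python
import math

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical set of scores
mean = sum(scores) / len(scores)

# variance: average squared deviation from the mean (population formula)
variance = sum((x - mean) ** 2 for x in scores) / len(scores)

# standard deviation: square root of the variance
sd = math.sqrt(variance)
print(mean, variance, sd)  # 5.0 4.0 2.0
```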

8

## range

###
crudest measure of variability

difference between highest and lowest value obtained

9

## positive skew

###
higher proportion of scores in the lower range of values

mode has lowest value, mean has highest value

(bump on left)

10

## negative skew

###
higher proportion of scores in the higher range of values

mean has lowest value, mode has highest value

(bump on right)

11

## kurtosis

###
how peaked a distribution is

leptokurtic distribution - very sharp peak

platykurtic - flattened

12

## norm-referenced score

###
provides information on how the person scored relative to the group

e.g. percentile rank

13

## criterion-referenced or domain-referenced score

### e.g. percentage correct

14

## standard scores

###
based on the standard deviation of the sample

e.g. z-scores, t-scores, IQ scores, SAT scores, EPPP scores

15

## z-scores

###
mean of zero, SD of one

shape of z-score distribution always identical to shape of the raw score distribution

useful because correspond directly to percentile ranks (ONLY IF distribution is normal) and easy to calculate from raw score data

transforming raw scores into z-scores does not normalize distribution

16

## z-score formula

### z=(score-mean)/(SD)
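
The formula above as a one-line Python function (the values below are hypothetical IQ-style numbers, mean 100 and SD 15):

```python
def z_score(score, mean, sd):
    # z = (score - mean) / SD
    return (score - mean) / sd

print(z_score(115, 100, 15))  # 1.0  (one SD above the mean)
print(z_score(85, 100, 15))   # -1.0 (one SD below the mean)
```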

17

## standard error of the mean

###
if researcher were to take many, many samples of equal size and plot the mean IQ scores of these samples, researcher would get normal distribution of means

any spread or deviation in these means is error

average amount of deviation = standard error of the mean

18

## standard error of the mean formula

### SD(population) / SQRT (N)
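
A minimal sketch of the formula above, using hypothetical IQ values (population SD of 15, samples of N = 25):

```python
import math

def standard_error_of_mean(population_sd, n):
    # SEM = SD(population) / sqrt(N)
    return population_sd / math.sqrt(n)

print(standard_error_of_mean(15, 25))  # 3.0
```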

19

## central limit theorem

###
assuming an infinite number of equal-sized samples are drawn from the population, and the means of these samples are plotted, a normal distribution of the means will result

tells researcher how likely it is that particular mean will be obtained just by chance - can calculate whether the obtained mean is most likely due to treatment or experimental effects or to chance (sampling error, random error)

20

## rejection region

###
aka rejection of unlikely values

size of rejection region corresponds to alpha level e.g. when alpha is .05, rejection region is 5% of curve

when obtained values fall in rejection region, null hypothesis rejected, researcher concludes treatment did have an effect

21

## Type I error

###
mistakenly rejecting null (differences found when they don't exist)

corresponds to alpha

22

## Type II error

###
mistakenly accepting null (differences not found, but they do exist)

corresponds to beta

23

## power

###
defined as ability to correctly reject the null

increased when sample size is large, magnitude of intervention is large, random error is small, statistical test is parametric, test is one-tailed

power = 1-beta

as alpha increases, so does power

24

## non-parametric tests

###
e.g. Chi-square, Mann-Whitney, Wilcoxon

if DV is nominal or ordinal

25

## parametric tests

###
e.g. t-test, ANOVA

if DV is interval or ratio

26

## assumptions of parametric tests

###
homogeneity of variance (homoscedasticity) - there should be similar variability or SD in the different groups

data are normally distributed

27

## Kolmogorov-Smirnov test

###
same qualifications as independent samples or single sample t-test, except it's a non-parametric test

1 IV, 1 DV

1 or 2 independent groups

28

## Wilcoxon (sign rank)

###
same qualifications as matched t-test, except it's a non-parametric test

1 IV, 1 DV

2 correlated groups

29

## Kruskal-Wallis

###
same qualifications as 1-way ANOVA, except it's a non-parametric test

1 IV, 1 DV

>2 independent groups

30

## Friedman test

###
same qualifications as 1-way repeated measures ANOVA, except it's non-parametric test

1 IV, 1 DV

>2 correlated groups

31

## single sample chi-square test: description and degrees of freedom

###
nominal data collected for one independent variable

e.g. 100 psychologists sampled as to voting preference

df = #columns - 1 (in example, 3-1=2 df)

32

## multiple sample chi-square

###
nominal data collected for two IVs

e.g. 100 psychologists sampled for voting preference and ethnicity

df = (#rows - 1)(#columns-1)

in example (3-1)(5-1) = 2X4 = 8
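
The two df rules above can be sketched as Python one-liners, using the card examples (3 voting preferences, 5 ethnic groups):

```python
def single_sample_df(n_categories):
    # single-sample chi-square: df = #columns - 1
    return n_categories - 1

def multiple_sample_df(n_rows, n_columns):
    # multiple-sample chi-square: df = (#rows - 1) * (#columns - 1)
    return (n_rows - 1) * (n_columns - 1)

print(single_sample_df(3))       # 2  (3 voting preferences)
print(multiple_sample_df(3, 5))  # 8  (3 preferences x 5 ethnicities)
```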

33

## t-test for single sample

###
interval or ratio data collected for one group of subjects

df=N-1

34

## t-tests for matched or correlated samples

###
interval or ratio data collected for two correlated groups of subjects

df = #pairs - 1

35

## t-tests for independent samples

###
interval or ratio data collected for two independent groups of subjects

df = N-2

36

## one-way ANOVAs: dfs

###
df total = N-1

df between groups = #groups-1

df within groups = dftotal - dfbetween

37

## One-Way ANOVA: F ratio

###
MSbetween/MSwithin

When the F ratio equals or is approximately 1, no significance

As the F ratio gets above 2.0, typically considered significant (exact cutoff depends on the dfs)

38

## One-Way ANOVA: mean squares

###
MS between = SS between/df between

MS within = SS within/df within
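
The df, mean square, and F ratio cards chain together; a minimal sketch with made-up sums of squares:

```python
def one_way_anova_f(n_total, n_groups, ss_between, ss_within):
    # degrees of freedom
    df_total = n_total - 1
    df_between = n_groups - 1
    df_within = df_total - df_between

    # mean squares: sums of squares divided by their dfs
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within

    # F ratio
    return ms_between / ms_within

# hypothetical: 30 subjects in 3 groups, SS between = 40, SS within = 54
print(one_way_anova_f(30, 3, 40, 54))  # 10.0
```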

39

## Post Hoc tests

###
Scheffe followed by Tukey, provide most protection from Type I error (most conservative)

Fisher's LSD provides least protection from Type I error

Duncan, Dunnett, Newman-Keuls provide mid-range protection

REVERSE true for Type II error

40

## assumptions of bivariate correlations

###
linear relationship

homoscedasticity - similar spread of scores across the entire scatter plot

unrestricted range

41

## Spearman's Rho or Kendall's Tau Correlation

###
ordinal (rank ordered) X

ordinal (rank ordered) Y

42

## Pearson's r Correlation

###
interval or ratio X

interval or ratio Y

43

## Point-Biserial Correlation

###
interval or ratio X

true dichotomy Y

44

## Biserial Correlation

###
interval or ratio X

artificial dichotomy Y

45

## Phi Correlation

###
true dichotomy X

true dichotomy Y

46

## Tetrachoric Correlation

###
artificial dichotomy X

artificial dichotomy Y

47

## Eta correlation

### curvilinear relationship between X and Y

48

## zero-order correlation

###
most basic correlation

analyzes relationship between X and Y when it is believed that there are no extraneous variables affecting the relationship

49

## partial correlation (first order correlation)

###
examines the relationship between X and Y with the effect of a third variable removed

e.g. if it is believed that parent education (third variable) affects both SAT and GPA, this variable could be measured and its effect removed from the correlation of SAT and GPA
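
The card states the concept only; the standard first-order partial correlation formula is r_xy.z = (r_xy - r_xz r_yz) / sqrt((1 - r_xz^2)(1 - r_yz^2)). A sketch with hypothetical correlation values:

```python
import math

def partial_correlation(r_xy, r_xz, r_yz):
    # correlation of X and Y with third variable Z removed from both
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))

# hypothetical: SAT-GPA r = .50; parent education correlates .40 with each
print(round(partial_correlation(0.50, 0.40, 0.40), 3))  # 0.405
```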

50

## part (semipartial) correlation

### examines relationship between X and Y with the influence of a third variable removed from only one of the original variables

51

## coefficient of multiple determination

###
R squared

index of the amount of variability in the criterion Y that is accounted for by the combination of all the predictors (Xs)

52

## multiple R

### correlation between 2 or more IVs (Xs) and one DV (Y) where Y is always interval or ratio data and at least one X is interval or ratio data

53

## multicollinearity

### problem that occurs in multiple regression when predictors are highly correlated with one another and essentially redundant

54

## canonical R

###
extension on multiple R

correlation between two or more IVs (X) and two or more DVs (Y)

e.g. examining relationship between time spent studying for EPPP (X1) and number of practice tests completed (X2) with score obtained on exam (Y1) and amount of subjective distress experienced while taking the exam (Y2)

55

## discriminant function analysis

###
special case of multiple regression

used when there are two or more Xs and one Y

however, used when Y is nominal (categorical)

56

## loglinear analysis

###
aka logit analysis

used to predict categorical Y based on categorical Xs

e.g. if type of graduate school and sex were used to predict likelihood of passing or failing the EPPP

57

## path analysis

### applies multiple regression techniques to testing a model that specifies causal links among variables

58

## structural equation modeling

###
enables researchers to make inferences about causation

e.g. LISREL (Linear Structural Relations)

59

## factor analysis

### operates by extracting as many significant factors from data as possible

60

## eigenvalues

###
factor analysis

indicates strength of factor

<1.0 usually not considered significant

aka characteristic root

61

## factor loadings

###
correlation between a variable (e.g. item or subtest) and underlying factor

interpreted if equal or exceed +/- .30

62

## orthogonal rotation

###
type of factor rotation

axes remain perpendicular (90 degrees)

always results in factors that have no correlation with one another

generally preferred because easier to interpret

communalities must be calculated

63

## communalities

###
calculated in orthogonal rotation

refers to how much of a test's variability is explained by combination of all the factors

factor loadings all squared and added together
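
The rule above ("square and sum the loadings") as a sketch, with hypothetical loadings:

```python
def communality(loadings):
    # square each of a test's factor loadings and sum them
    return sum(l ** 2 for l in loadings)

# hypothetical subtest loading .60 on factor 1 and .30 on factor 2
print(communality([0.60, 0.30]))  # ~0.45 (36% + 9% of variance explained)
```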

64

## oblique rotation

###
type of factor rotation

angle between axes is non-perpendicular and factors are correlated

some argue that oblique rotations are preferable to orthogonal rotations because factors tend to be correlated in the real world

65

## principal components analysis

###
type of factor analysis

when one is trying to extract factors and there is no empirical or theoretical guidance on the values of the communalities

always results in a few uncorrelated factors, called components

factors empirically derived, researcher has no prior hypotheses

first factor (component) accounts for largest amount of variability, each additional component explaining somewhat less

66

## principal factor analysis

###
type of factor analysis

communality values would need to be ascertained before analysis

67