Statistics Flashcards Preview


Flashcards in Statistics Deck (67):

nominal data

involves tallying people to see which non-ordered category each person falls into
e.g. sex, voting preference, ethnicity


ordinal data

involves tallying people to see which ordered category each person falls into
group means cannot be calculated from ordinal data


interval data

involves obtaining numerical scores for each person, where score values have equal intervals
either no zero score (e.g. IQ scores, t-scores) or zero is not absolute (e.g. temperature)
group mean can be calculated from interval data


ratio data

involves obtaining numerical scores for each person, where scores have equal intervals and an absolute zero
e.g. savings in bank, scores on EPPP, number of children, weight
comparisons can be made across score values (e.g. $10 is twice as much as $5)


measures of central tendency

mean, median, mode
best measure of central tendency is typically the mean
when data are skewed or very extreme scores are present, the median is preferable


standard deviation

measure of average deviation (or spread) from the mean in a given set of scores
square root of the variance
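
The relationship between the two can be sketched in Python with made-up scores (the card supplies no data):

```python
import math

# Hypothetical scores; purely illustrative.
scores = [4, 8, 6, 5, 7]

mean = sum(scores) / len(scores)

# Variance: the average squared deviation from the mean (population form).
variance = sum((x - mean) ** 2 for x in scores) / len(scores)

# Standard deviation: the square root of the variance.
sd = math.sqrt(variance)
```

Python's `statistics.pstdev` gives this same population value; `statistics.stdev` uses the N − 1 sample form instead.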



variance

standard deviation squared



range

crudest measure of variability
difference between highest and lowest value obtained


positive skew

higher proportion of scores in the lower range of values
mode has lowest value, mean has highest value
(bump on left)


negative skew

higher proportion of scores in the higher range of values
mean has lowest value, mode has highest value
(bump on right)



kurtosis

how peaked a distribution is
leptokurtotic distribution - very sharp peak
platykurtotic - flattened


norm-referenced score

provides information on how the person scored relative to the group
e.g. percentile rank


criterion-reference or domain-referenced score

e.g. percentage correct


standard scores

based on the standard deviation of the sample
e.g z-scores, t-scores, IQ scores, SAT scores, EPPP scores



z-scores

mean of zero, SD of one
shape of z-score distribution always identical to shape of the raw score distribution
useful because they correspond directly to percentile ranks (ONLY IF distribution is normal) and are easy to calculate from raw score data
transforming raw scores into z-scores does not normalize the distribution


z-score formula

z = (X - M) / SD
where X = raw score, M = mean, SD = standard deviation
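
The formula can be sketched in Python (the IQ numbers here are just illustrative):

```python
def z_score(raw, mean, sd):
    # z = (raw score - mean) / standard deviation
    return (raw - mean) / sd

# Illustrative values: an IQ of 115 in a distribution with mean 100, SD 15.
z = z_score(115, 100, 15)  # -> 1.0
```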

standard error of the mean

if researcher were to take many, many samples of equal size and plot the mean IQ scores of these samples, researcher would get a normal distribution of means
any spread or deviation in these means is error
average amount of deviation = standard error of the mean


standard error of the mean formula

SD(population) / SQRT (N)
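
A minimal Python sketch of the formula, using IQ scores (SD = 15) and a hypothetical sample size of 25:

```python
import math

def standard_error_of_mean(population_sd, n):
    # SEM = population SD / square root of sample size
    return population_sd / math.sqrt(n)

# Illustrative: IQ scores (SD = 15) sampled in groups of 25 people.
sem = standard_error_of_mean(15, 25)  # -> 3.0
```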


central limit theorem

assuming an infinite number of equal-sized samples are drawn from the population, and the means of these samples are plotted, a normal distribution of the means will result
tells researcher how likely it is that a particular mean will be obtained just by chance - can calculate whether the obtained mean is most likely due to treatment or experimental effects or to chance (sampling error, random error)


rejection region

aka region of unlikely values
size of rejection region corresponds to alpha level e.g. when alpha is .05, rejection region is 5% of curve
when obtained values fall in rejection region, null hypothesis rejected, researcher concludes treatment did have an effect


Type I error

mistakenly rejecting null (differences found when they don't exist)
corresponds to alpha


Type II error

mistakenly accepting null (differences not found, but they do exist)
corresponds to beta



power

defined as the ability to correctly reject the null
increased when sample size is large, magnitude of intervention is large, random error is small, statistical test is parametric, test is one-tailed
power = 1-beta
as alpha increases, so does power


non-parametric tests

e.g. Chi-square, Mann-Whitney, Wilcoxon
if DV is nominal or ordinal


parametric tests

e.g. t-test, ANOVA
if DV is interval or ratio


assumptions of parametric tests

homoscedasticity - there should be similar variability or SD in the different groups
data are normally distributed


Kolmogorov-Smirnov test

same qualifications as independent samples or single sample t-test, except it's a non-parametric test
1 IV, 1 DV
1 or 2 independent groups


Wilcoxon (sign rank)

same qualifications as matched t-test, except it's a non-parametric test
1 IV, 1 DV
2 correlated groups


Kruskal-Wallis

same qualifications as 1-way ANOVA, except it's a non-parametric test
1 IV, 1 DV
>2 independent groups


Friedman test

same qualifications as 1-way repeated measures ANOVA, except it's a non-parametric test
1 IV, 1 DV
>2 correlated groups


single sample chi-square test
description and degrees of freedom

nominal data collected for one independent variable
e.g. 100 psychologists sampled as to voting preference
df = #columns - 1 (in example, 3-1=2 df)


multiple sample chi-square

nominal data collected for two IVs
e.g. 100 psychologists sampled for voting preference and ethnicity
df = (#rows - 1)(#columns-1)
in example (3-1)(5-1) = 2X4 = 8
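
The two df rules can be sketched as small Python helpers (the numbers mirror the deck's examples):

```python
def single_sample_chi_square_df(n_categories):
    # df = number of categories (columns) - 1
    return n_categories - 1

def multiple_sample_chi_square_df(n_rows, n_cols):
    # df = (rows - 1) * (columns - 1)
    return (n_rows - 1) * (n_cols - 1)

# Deck examples: 3 voting preferences -> 2 df;
# a 3 x 5 preference-by-ethnicity table -> 8 df.
df_single = single_sample_chi_square_df(3)    # -> 2
df_multiple = multiple_sample_chi_square_df(3, 5)  # -> 8
```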


t-test for single sample

interval or ratio data collected for one group of subjects
df = N - 1


t-tests for matched or correlated samples

interval or ratio data collected for two correlated groups of subjects
df = #pairs - 1


t-tests for independent samples

interval or ratio data collected for two independent groups of subjects
df = N-2


one-way ANOVAs: dfs

df total = N-1
df between groups = #groups-1
df within groups = dftotal - dfbetween
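
The three df rules above can be sketched in Python (subject and group counts are hypothetical):

```python
def one_way_anova_dfs(n_subjects, n_groups):
    # df total = N - 1; df between = #groups - 1;
    # df within = df total - df between
    df_total = n_subjects - 1
    df_between = n_groups - 1
    df_within = df_total - df_between
    return df_total, df_between, df_within

# Illustrative: 30 subjects split across 3 groups.
dfs = one_way_anova_dfs(30, 3)  # -> (29, 2, 27)
```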


One-Way ANOVA:
F ratio

when F ratio equals or is approximately 1, no significance
as F ratio gets above 2.0, typically considered significant


One-Way ANOVA: mean squares

MS between = SS between/df between
MS within = SS within/df within
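
Putting the mean squares together with the F ratio, a sketch with hypothetical sums of squares:

```python
def f_ratio(ss_between, ss_within, df_between, df_within):
    # MS between = SS between / df between
    ms_between = ss_between / df_between
    # MS within = SS within / df within
    ms_within = ss_within / df_within
    # F = MS between / MS within
    return ms_between / ms_within

# Hypothetical values for a 3-group, 30-subject design.
f = f_ratio(ss_between=40.0, ss_within=54.0, df_between=2, df_within=27)
# -> 10.0
```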


Post Hoc tests

Scheffe, followed by Tukey, provides the most protection from Type I error (most conservative)
Fisher's LSD provides the least protection from Type I error
Duncan, Dunnett, Newman-Keuls provide mid-range protection
the REVERSE is true for Type II error


assumptions of bivariate correlations

linear relationship
homoscedasticity - similar spread of scores across the entire scatter plot
unrestricted range


Spearman's Rho or Kendall's Tau Correlation

ordinal (rank ordered) X
ordinal (rank ordered) Y


Pearson's r Correlation

interval or ratio X
interval or ratio Y


Point-Biserial Correlation

interval or ratio X
true dichotomy Y


Biserial Correlation

interval or ratio X
artificial dichotomy Y


Phi Correlation

true dichotomy X
true dichotomy Y


Tetrachoric Correlation

artificial dichotomy X
artificial dichotomy Y


Eta correlation

curvilinear relationship between X and Y


zero-order correlation

most basic correlation
analyzes relationship between X and Y when it is believed that there are no extraneous variables affecting the relationship


partial correlation (first order correlation)

examines the relationship between X and Y with the effect of a third variable removed
e.g. if it is believed that parent education (third variable) affects both SAT and GPA, this variable could be measured and its effect removed from the correlation of SAT and GPA


part (semipartial) correlation

examines relationship between X and Y with the influence of a third variable removed from only one of the original variables


coefficient of multiple determination

R squared
index of the amount of variability in the criterion Y that is accounted for by the combination of all the predictors (Xs)


multiple R

correlation between 2 or more IVs (Xs) and one DV (Y) where Y is always interval or ratio data and at least one X is interval or ratio data



multicollinearity

problem that occurs in multiple regression when predictors are highly correlated with one another and essentially redundant


canonical R

extension on multiple R
correlation between two or more IVs (X) and two or more DVs (Y)
e.g. examining relationship between time spent studying for EPPP (X1) and number of practice tests completed (X2) with score obtained on exam (Y1) and amount of subjective distress experienced while taking the exam (Y2)


discriminant function analysis

special case of multiple regression
used when there are two or more Xs and one Y
however, used when Y is nominal (categorical)


loglinear analysis

aka logit analysis
used to predict categorical Y based on categorical Xs
e.g. if type of graduate school and sex were used to predict likelihood of passing or failing the EPPP


path analysis

applies multiple regression techniques to testing a model that specifies causal links among variables


structural equation modeling

enables researchers to make inferences about causation
e.g. LISREL (Linear Structural Relations)


factor analysis

operates by extracting as many significant factors from data as possible



eigenvalue

indicates strength of a factor in factor analysis
<1.0 usually not considered significant
aka characteristic root


factor loadings

correlation between a variable (e.g. item or subtest) and underlying factor
interpreted if equal or exceed +/- .30


orthogonal rotation

type of factor rotation
axes remain perpendicular (90 degrees)
always results in factors that have no correlation with one another
generally preferred because easier to interpret
communalities must be calculated



communality

calculated in orthogonal rotation
refers to how much of a test's variability is explained by combination of all the factors
factor loadings all squared and added together
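
The calculation can be sketched in Python (the loadings below are hypothetical):

```python
def communality(loadings):
    # Square each of the test's factor loadings and sum them:
    # the share of the test's variance explained by all factors combined.
    return sum(loading ** 2 for loading in loadings)

# Hypothetical loadings of one subtest on three orthogonal factors.
h2 = communality([0.6, 0.5, 0.3])  # 0.36 + 0.25 + 0.09 = 0.70
```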


oblique rotation

type of factor rotation
angle between axes is non-perpendicular and factors are correlated
some argue that oblique rotations are preferable to orthogonal rotations because factors tend to be correlated in the real world


principal components analysis

type of factor analysis
when one is trying to extract factors and there is no empirical or theoretical guidance on the values of the communalities
always results in uncorrelated factors, called components
factors empirically derived, researcher has no prior hypotheses
first factor (component) accounts for largest amount of variability, each additional component explaining somewhat less


principal factor analysis

type of factor analysis
communality values would need to be ascertained before analysis


Normal curve

symmetrical, bell-shaped distribution
approximately 68% of scores fall within 1 SD of the mean, 95% within 2 SDs, 99.7% within 3 SDs