Flashcards in Stats & Test Construction Deck (60):
Type I error
Mistakenly rejecting the null hypothesis when it's true
Type II error
Mistakenly retaining the null hypothesis when it is false
Technique in multivariate statistics that describes differences between 2+ groups on a set of measures or that classifies subjects into groups based on a set of measures
Threats to internal validity
Maturation, history, instrumentation, statistical regression, selection, attrition/mortality, interaction w/ selection
Ways to control threats to internal validity
Random assignment, within-subjects designs, blocking, matching subjects, ANCOVA
Threats to external validity
Interaction b/t testing & treatment, interaction b/t selection & tx, reactivity, multiple tx interference (order/carryover effects)
Ways to control external validity
Random sampling, naturalistic/field research, single or double-blind designs, counterbalancing
What are some ways to increase power?
Increase alpha, increase N, increase effect size, decrease error, use powerful statistics, one-tailed if possible
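A minimal sketch of how some of these factors move power, assuming a one-tailed, one-sample z-test. The test choice, the function name `z_test_power`, and the example numbers are illustrative assumptions, not part of the deck.

```python
from math import sqrt
from statistics import NormalDist

def z_test_power(effect_size, n, alpha=0.05):
    """Power of a one-tailed, one-sample z-test (an illustrative choice).

    effect_size is the standardized mean difference, so reducing error
    variance shows up here as a larger effect_size.
    """
    std_normal = NormalDist()
    critical_z = std_normal.inv_cdf(1 - alpha)  # rejection cutoff under H0
    return 1 - std_normal.cdf(critical_z - effect_size * sqrt(n))

# Hypothetical values chosen only to compare one change at a time.
baseline = z_test_power(effect_size=0.3, n=30, alpha=0.05)
more_subjects = z_test_power(effect_size=0.3, n=60, alpha=0.05)
bigger_effect = z_test_power(effect_size=0.5, n=30, alpha=0.05)
higher_alpha = z_test_power(effect_size=0.3, n=30, alpha=0.10)

for label, power in [("baseline", baseline), ("larger N", more_subjects),
                     ("larger effect", bigger_effect), ("larger alpha", higher_alpha)]:
    print(f"{label}: {power:.2f}")
```

Each of the three changes raises power relative to the baseline.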
What percentage of scores on the normal curve fall between +/- 1 SD, +/- 2 SD, +/- 3 SD?
What percentiles are equivalent to the following z-scores?
0.1 = -3
2 = -2
16 = -1
84 = 1
98 = 2
99.9 = 3
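The percentile values above can be checked against the standard normal CDF. The same calculation gives the share of scores within 1, 2, and 3 SDs of the mean (roughly 68%, 95%, and 99.7%). This is a sketch using Python's standard library; the helper names are made up for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, SD 1

def percentile(z_score):
    """Percentage of the normal curve falling below a given z-score."""
    return 100 * std_normal.cdf(z_score)

def percent_within(k):
    """Percentage of scores between -k and +k SDs of the mean."""
    return 100 * (std_normal.cdf(k) - std_normal.cdf(-k))

for z in (-3, -2, -1, 1, 2, 3):
    print(f"z = {z:+d}: percentile {percentile(z):.1f}")

for k in (1, 2, 3):
    print(f"+/- {k} SD: {percent_within(k):.1f}% of scores")
```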
Factors affecting test reliability
Test characteristics (length, item type, item homogeneity, influence of guessing), sample characteristics (sample size, range, variability), extent of test clarity
Sources of error in internal reliability
Content sampling, heterogeneity of content domain
Sources of error in test-retest reliability
Which type of reliability is best for speed tests?
Sources of error in inter-rater reliability
Factors related to raters (motivation, biases), characteristics of measuring device, consensual observer drift
Dimensions of relevance in item analysis
1) Content appropriateness (item assesses bx domain the test is intended to evaluate)
2) Taxonomic level (does item reflect appropriate cognitive or ability level of population intended for)
3) Extraneous abilities (to what extent does the item require knowledge or skills that are outside the domain being evaluated)
The percentage of people who get an item correct
Extent an item differentiates between those who get a high vs. low score
.35 or more is acceptable
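A rough sketch of the two item statistics described above: proportion correct, and a discrimination index computed as the difference in proportion correct between high and low scorers. Comparing the top and bottom 27% is a common convention assumed here, and the response data are hypothetical.

```python
def item_difficulty(item_responses):
    """Proportion of examinees answering the item correctly (1 = correct)."""
    return sum(item_responses) / len(item_responses)

def item_discrimination(item_responses, total_scores, fraction=0.27):
    """Proportion correct among high scorers minus that among low scorers.

    fraction is the share of examinees in each extreme group (assumed 27%).
    """
    n_group = max(1, round(len(total_scores) * fraction))
    order = sorted(range(len(total_scores)), key=lambda i: total_scores[i])
    low, high = order[:n_group], order[-n_group:]
    p_high = sum(item_responses[i] for i in high) / n_group
    p_low = sum(item_responses[i] for i in low) / n_group
    return p_high - p_low

# Hypothetical data: one item's 0/1 responses and each examinee's total score.
item = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]
totals = [48, 45, 41, 40, 37, 30, 28, 25, 20, 15]

difficulty = item_difficulty(item)
discrimination = item_discrimination(item, totals)
print(f"difficulty = {difficulty:.2f}, discrimination = {discrimination:.2f}")
```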
Item response theory
Tests based on examinee's level on the trait being measured vs total test score
Proportion of variability in obtained test scores that reflects true score variability
Never squared to interpret
Standard error of measurement (SEM)
An index of the amount of error that can be expected in a person's obtained scores due to the unreliability of the test
What qualitative evidence do you look for in a task that has good content validity?
Coefficient of internal consistency will be large
Test will correlate highly with other tests of the same domain
Pre- and post-test evals of the program designed to increase familiarity with domain will indicate appropriate changes
Resulting factors are uncorrelated; attribute measured by one factor is independent from the attributes measured by the other factor
Resulting factors are correlated & attributes measured by the factors are not independent
What is the Rosenthal/Pygmalion effect?
Tendency for a participant's performance to be affected by the expectations of the tester
What is the Hawthorne effect?
Tendency of subjects to behave differently when they are in a research study
What is the most common measure for internal test reliability?
Cronbach's alpha (with dichotomous items it is equivalent to KR-20)
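For reference, a minimal sketch of the usual alpha calculation: k/(k-1) times one minus the ratio of summed item variances to total-score variance. The rating data are hypothetical and the function name is illustrative.

```python
from statistics import pvariance

def cronbach_alpha(rows):
    """Cronbach's alpha for a list of examinee rows, one score per item."""
    k = len(rows[0])  # number of items
    item_variances = [pvariance([row[j] for row in rows]) for j in range(k)]
    total_variance = pvariance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

# Hypothetical ratings: 5 examinees x 3 items on a 1-5 scale.
ratings = [
    [2, 3, 3],
    [4, 4, 5],
    [3, 3, 4],
    [5, 4, 5],
    [1, 2, 2],
]

alpha = cronbach_alpha(ratings)
print(f"alpha = {alpha:.3f}")
```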
What measure is used to evaluate the effect of lengthening or shortening a test?
Spearman-Brown correction formula
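A short sketch of the Spearman-Brown prediction, r_new = k*r / (1 + (k - 1)*r), where k is the factor by which test length changes. The starting reliability of .70 is a hypothetical value.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability after changing test length by length_factor."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)

doubled = spearman_brown(0.70, 2)    # test made twice as long
halved = spearman_brown(0.70, 0.5)   # test cut to half its length
print(f"doubled: {doubled:.3f}, halved: {halved:.3f}")
```

Lengthening the test raises the predicted reliability and shortening it lowers it.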
What formula is used to assess the reliability of a test with dichotomous responses?
What are acceptable scores of reliability?
.80 & above = good
.70-.79 = acceptable
.60-.69 = marginally reliable
.59 and below = not reliable
Name the 4 scales of measurement
1) Nominal = names of categories
2) Ordinal = rank data
3) Interval = no absolute 0, numbers scaled at equal distances
4) Ratio = has absolute 0
What are the assumptions of parametric statistics?
Homogeneity of variance (variance equal among all groups)
Independence of observations
F-ratio in a one-way ANOVA
Ratio of between group to within group variance
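A minimal sketch of that ratio, computed as mean square between groups divided by mean square within groups. The three groups of scores are hypothetical.

```python
def f_ratio(groups):
    """One-way ANOVA F: between-group mean square over within-group mean square."""
    all_scores = [x for group in groups for x in group]
    grand_mean = sum(all_scores) / len(all_scores)
    group_means = [sum(group) / len(group) for group in groups]

    ss_between = sum(len(group) * (m - grand_mean) ** 2
                     for group, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2
                    for group, m in zip(groups, group_means) for x in group)

    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical scores for three treatment groups.
f_value = f_ratio([[4, 5, 6], [6, 7, 8], [8, 9, 10]])
print(f"F = {f_value:.2f}")
```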
Relationship of A and C depends on the value of B (the moderator)
Accounts for (or partially accounts for) a relationship b/t an IV and DV
Relationship between A and C decreases or is eliminated when B is included in the model
What is the null hypothesis in Chi-square?
Observed frequencies are randomly distributed
Alternate hypothesis is that the observed frequencies are related to the treatment effect
Central limit theorem
As sample size increases, shape of sampling distribution of sample means approximates a normal distribution.
Mean of sampling distribution of sample means = mean of population.
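Both statements can be illustrated with a small simulation. This sketch assumes a skewed (exponential) population with mean 1; the sample sizes, replication count, and seeds are arbitrary choices.

```python
import random
from statistics import mean, pstdev, stdev

def sample_means(n, reps=2000, seed=0):
    """Means of `reps` random samples of size n from an exponential population."""
    rng = random.Random(seed)
    return [mean([rng.expovariate(1.0) for _ in range(n)]) for _ in range(reps)]

def skewness(values):
    """Average cubed z-score; near 0 for a symmetric distribution."""
    m, s = mean(values), pstdev(values)
    return mean([((v - m) / s) ** 3 for v in values])

small = sample_means(n=5, seed=0)
large = sample_means(n=50, seed=1)

for label, means in [("n = 5", small), ("n = 50", large)]:
    print(f"{label}: mean = {mean(means):.3f}, "
          f"SD = {stdev(means):.3f}, skew = {skewness(means):.2f}")
```

With the larger sample size the sampling distribution is centered on the population mean, is narrower, and is less skewed.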
What factors affect Pearson's product moment correlation?
Linearity (assumes linear relationship b/t 2 variables)
Homoscedasticity (variability of scores on one variable is about equal across the range of the other)
Range of scores (wider range provides more accurate estimate)
Correlation between one continuous variable & one dichotomous variable
Correlation b/t 2 dichotomous variables
Assumptions of regression
Linear relationship b/t X and Y
Homoscedasticity (error scores of criterion are the same across range of x)
Homogeneity of variance
Degree to which predictors correlate with each other
Decreases the accuracy of the regression equation
TP / (TP + FN)
TN / (TN + FP)
Positive likelihood ratio
Indicates how much more likely a positive test result is in a pt who has the condition than in one who does not (a PLR of 3 means a + result is 3x as likely in pts with the condition as in pts without it)
Positive predictive power
Probability that a pt with a + test has the true condition
TP / (TP + FP)
Negative predictive power
Probability that a pt with a negative test result does not have the condition
TN / (TN + FN)
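The formulas above can be computed together from a 2x2 table. This is a sketch with hypothetical counts; the positive likelihood ratio is taken as sensitivity / (1 - specificity), which is the standard definition rather than something stated in the deck.

```python
def diagnostic_stats(tp, fp, fn, tn):
    """Classification statistics from true/false positive/negative counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppp": tp / (tp + fp),          # positive predictive power
        "npp": tn / (tn + fn),          # negative predictive power
        "plr": sensitivity / (1 - specificity),
    }

# Hypothetical counts for 200 patients.
stats = diagnostic_stats(tp=40, fp=30, fn=10, tn=120)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```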
Relationship between base rate & PPP/NPP
As the base rate increases, PPP will increase, whereas NPP will decrease. Converse is true as the base rate declines.
Often employed in decision analysis, allowing calculation of the posterior probability of an event (conditioned probability it is assigned when the relevant evidence is taken into account)
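As a sketch of this kind of posterior calculation, and assuming the card above describes Bayes' theorem, predictive power can be written in terms of the base rate, sensitivity, and specificity. The sensitivity and specificity of .80 are hypothetical values.

```python
def predictive_power(base_rate, sensitivity, specificity):
    """Return (PPP, NPP) as posterior probabilities via Bayes' theorem."""
    true_pos = sensitivity * base_rate
    false_pos = (1 - specificity) * (1 - base_rate)
    true_neg = specificity * (1 - base_rate)
    false_neg = (1 - sensitivity) * base_rate
    ppp = true_pos / (true_pos + false_pos)
    npp = true_neg / (true_neg + false_neg)
    return ppp, npp

# Same hypothetical test applied at three base rates.
results = {rate: predictive_power(rate, 0.80, 0.80) for rate in (0.05, 0.25, 0.50)}
for rate, (ppp, npp) in results.items():
    print(f"base rate {rate:.2f}: PPP = {ppp:.2f}, NPP = {npp:.2f}")
```

Across these base rates PPP rises and NPP falls as the condition becomes more common.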
Item characteristic curve
Plot the proportion of ppl who answered correctly against the total test score, performance on an external criterion, or mathematically-derived estimate of ability; provides info on relationship between examinee's level on the trait measured by the test & the probability that he will respond correctly on that item
Which ANOVA post-hoc correction is most conservative?
Which ANOVA post-hoc correction is appropriate for pairwise comparisons?
Compare two independent groups on a DV measured with rank-ordered data
Most scores are high with a few extreme low scores; mean < median < mode; easy test, ceiling effects
Most scores are low with a few extreme high scores; mean > median > mode; difficult test, floor effects
Average of the squared differences of each observation from the mean
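A minimal sketch of that definition, dividing by N (the population form). The scores are hypothetical, and the result is compared with Python's built-in `statistics.pvariance`.

```python
from statistics import pvariance

def variance(scores):
    """Mean of squared deviations from the mean (population form, divides by N)."""
    m = sum(scores) / len(scores)
    return sum((x - m) ** 2 for x in scores) / len(scores)

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores with mean 5
result = variance(scores)
print(result, pvariance(scores))
```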
Null hypothesis in ANOVA
Group means were drawn from the same population (i.e., means are equal in the population)
What factors may lead to non-normal test distributions?
1) existence of discrete subpopulations w/i the general population w/ differing abilities
2) ceiling or floor effects
3) tx effects that change the location of means, medians & modes, affect variability & distribution shape
How is SEM related to test reliability?
The greater the reliability, the smaller the SEM
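This relationship follows from the usual formula SEM = SD * sqrt(1 - reliability). A small sketch, assuming a hypothetical test with SD = 15:

```python
from math import sqrt

def standard_error_of_measurement(sd, reliability):
    """SEM from the test's standard deviation and reliability coefficient."""
    return sd * sqrt(1 - reliability)

sem_low_reliability = standard_error_of_measurement(15, 0.75)
sem_high_reliability = standard_error_of_measurement(15, 0.90)
sem_perfect = standard_error_of_measurement(15, 1.0)
print(f"r = .75: {sem_low_reliability:.2f}, r = .90: {sem_high_reliability:.2f}, "
      f"r = 1.00: {sem_perfect:.2f}")
```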