Stats & Test Construction Flashcards Preview

Flashcards in Stats & Test Construction Deck (60):
1

Type I error

Mistakenly rejecting the null hypothesis when it's true

Alpha

2

Type II error

Mistakenly retaining the null hypothesis when it is false

Beta

3

Discriminant analysis

Technique in multivariate statistics that describes differences between 2+ groups on a set of measures or that classifies subjects into groups based on a set of measures

4

Threats to internal validity

Maturation, history, instrumentation, statistical regression, selection, attrition/mortality, interaction w/ selection

5

Ways to control threats to internal validity

Random assignment, within-subjects designs, blocking, matching subjects, ANCOVA

6

Threats to external validity

interaction b/t testing & treatment, interaction b/t selection & tx, reactivity, multiple tx interference (order/carryover effects)

7

Ways to control threats to external validity

Random sampling, naturalistic/field research, single or double-blind designs, counterbalance

8

What are some ways to increase power?

Increase alpha, increase N, increase effect size, decrease error, use powerful statistics, one-tailed if possible

9

What percentage of scores on the normal curve fall between +/- 1 SD, +/- 2 SD, +/- 3 SD?

68%
95%
99.7%
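
The rounded figures on this card can be checked numerically. The sketch below assumes a standard normal distribution and uses the error function; the name `proportion_within` is illustrative only.

```python
import math

def proportion_within(k: float) -> float:
    """Proportion of a normal distribution lying within +/- k SD of the mean."""
    return math.erf(k / math.sqrt(2))

# Print the proportion for 1, 2, and 3 standard deviations.
for k in (1, 2, 3):
    print(f"+/- {k} SD: {proportion_within(k) * 100:.2f}%")
```

To two decimals the proportions are roughly 68.27%, 95.45%, and 99.73%.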

10

What percentiles are equivalent to the following z-scores?
-3
-2
-1
1
2
3

0.1 = -3
2 = -2
16 = -1
84 = 1
98 = 2
99.9 = 3
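
The percentile for any z-score is the cumulative area under the standard normal curve up to that score. A minimal sketch using Python's standard library (the helper name `z_to_percentile` is an assumption for illustration):

```python
from statistics import NormalDist

def z_to_percentile(z: float) -> float:
    """Percentile rank of a z-score under the standard normal curve."""
    return NormalDist().cdf(z) * 100

# Percentile for each z-score on the card.
for z in (-3, -2, -1, 1, 2, 3):
    print(f"z = {z:+d}: percentile {z_to_percentile(z):.1f}")
```

The card's values are these results rounded to whole percentiles (0.1 and 99.9 at the extremes).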

11

Factors affecting test reliability

Test characteristics (length, item type, item homogeneity, influence of guessing), sample characteristics (sample size, range, variability), extent of test clarity

12

Sources of error in internal reliability

Content sampling, heterogeneity of content domain

13

Sources of error in test-retest reliability

Time-sampling factors

14

Which type of reliability is best for speed tests?

Alternate forms

15

Sources of error in inter-rater reliability

Factors related to raters (motivation, biases), characteristics of measuring device, consensual observer drift

16

Dimensions of relevance in item analysis

1) Content appropriateness (item assesses bx domain the test is intended to evaluate)
2) Taxonomic level (does item reflect appropriate cognitive or ability level of population intended for)
3) Extraneous abilities (to what extent are knowledge or skills needed that are outside the domain being evaluated)

17

Item difficulty

The percentage of people who get an item correct

18

Item discrimination

Extent an item differentiates between those who get a high vs. low score

.35 or more is acceptable
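
Item difficulty and item discrimination can both be computed from scored responses. The sketch below assumes items scored 0/1 and uses the upper-lower group index (D = proportion correct in the top group minus proportion correct in the bottom group). The 27% group size and the function names are assumptions for illustration, not something this deck specifies.

```python
def item_difficulty(item_scores):
    """Proportion of examinees answering the item correctly (scores are 0/1)."""
    return sum(item_scores) / len(item_scores)

def item_discrimination(item_scores, total_scores, fraction=0.27):
    """Upper-lower discrimination index: p(upper group) - p(lower group)."""
    n = len(item_scores)
    k = max(1, round(n * fraction))  # size of each extreme group
    order = sorted(range(n), key=lambda i: total_scores[i])  # low to high total
    lower, upper = order[:k], order[-k:]
    p_upper = sum(item_scores[i] for i in upper) / k
    p_lower = sum(item_scores[i] for i in lower) / k
    return p_upper - p_lower

# Hypothetical data: one item's 0/1 scores and each examinee's total test score.
item = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
totals = [9, 8, 7, 9, 3, 2, 4, 1, 6, 5]
print(item_difficulty(item), item_discrimination(item, totals))
```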

19

Item response theory

Item performance is modeled as a function of the examinee's level on the trait being measured rather than the total test score

20

Reliability coefficient

Proportion of variability in obtained test scores that reflects true score variability

Interpreted directly as a proportion; never squared

21

Standard error of measurement (SEM)

An index of the amount of error that can be expected in a person's obtained scores due to the unreliability of the test

22

What qualitative evidence do you look for in a task that has good content validity?

Coefficient of internal consistency will be large

Test will correlate highly with other tests of the same domain

Pre- and post-test evals of the program designed to increase familiarity with domain will indicate appropriate changes

23

Orthogonal rotation

Resulting factors are uncorrelated; attribute measured by one factor is independent from the attributes measured by the other factor

24

Oblique rotation

Resulting factors are correlated & attributes measured by the factors are not independent

25

What is the Rosenthal/Pygmalion effect?

Tendency for a participant's performance to be affected by the expectations of the tester

26

What is the Hawthorne effect?

Tendency of subjects to behave differently when they are in a research study

27

What is the most common measure for internal test reliability?

Cronbach's alpha (for dichotomous items it is equivalent to KR-20)
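
As a rough sketch, alpha can be computed as k/(k-1) * (1 - sum of item variances / variance of total scores), where k is the number of items. The function name, the use of population variances, and the sample ratings below are all assumptions for illustration.

```python
from statistics import pvariance

def cronbach_alpha(scores):
    """scores: one row per examinee, each row a list of item scores."""
    k = len(scores[0])  # number of items
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical ratings from four examinees on three 5-point items.
ratings = [[4, 5, 4], [3, 3, 2], [2, 2, 1], [5, 4, 5]]
print(round(cronbach_alpha(ratings), 3))
```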

28

What measure is used to evaluate the effect of lengthening or shortening a test?

Spearman-Brown correction formula
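
The formula predicts the new reliability as (n * r) / (1 + (n - 1) * r), where r is the current reliability and n is the factor by which test length changes. A minimal sketch with illustrative names:

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability after changing test length by length_factor."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)

# Doubling a test with reliability .70, then halving a test with reliability .80.
print(round(spearman_brown(0.70, 2), 3))
print(round(spearman_brown(0.80, 0.5), 3))
```

The formula assumes the added (or removed) items are comparable to the existing ones.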

29

What formula is used to assess the reliability of a test with dichotomous responses?

Kuder-Richardson formula
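
The most commonly cited version, KR-20, is k/(k-1) * (1 - sum of p*q / variance of total scores), where p is the proportion passing each item and q = 1 - p. The sketch below assumes 0/1 scoring and population variance for the totals; the function name and data are hypothetical.

```python
from statistics import pvariance

def kr20(responses):
    """responses: one row per examinee, each row a list of 0/1 item scores."""
    n = len(responses)
    k = len(responses[0])  # number of items
    p = [sum(row[j] for row in responses) / n for j in range(k)]  # item pass rates
    sum_pq = sum(pj * (1 - pj) for pj in p)
    total_var = pvariance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum_pq / total_var)

# Hypothetical right/wrong responses from four examinees on three items.
answers = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(kr20(answers))
```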

30

What are acceptable scores of reliability?

.80 & above = good
.70-.79 = acceptable
.60-.69 = marginally reliable
.59 and below = not reliable

31

Name the 4 scales of measurement

1) Nominal = names of categories
2) Ordinal = rank data
3) Interval = no absolute 0, numbers scaled at equal distances
4) Ratio = has absolute 0

32

What are the assumptions of parametric statistics?

Normal distribution
Homogeneity of variance (variance equal among all groups)
Independence of observations

33

F-ratio in a one-way ANOVA

Ratio of between group to within group variance
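
In computational terms this is the between-groups mean square divided by the within-groups mean square. A minimal sketch with hypothetical data (the function name `f_ratio` is illustrative):

```python
def f_ratio(groups):
    """One-way ANOVA F: MS between groups / MS within groups."""
    all_scores = [x for g in groups for x in g]
    n, k = len(all_scores), len(groups)
    grand_mean = sum(all_scores) / n
    group_means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    ms_between = ss_between / (k - 1)  # df between = k - 1
    ms_within = ss_within / (n - k)    # df within = N - k
    return ms_between / ms_within

# Three hypothetical groups of three scores each.
print(f_ratio([[1, 2, 3], [2, 3, 4], [5, 6, 7]]))
```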

34

Moderator variable

Relationship of A and C depends on the value of B (the moderator)

35

Mediating variable

Accounts for (or partially accounts for) a relationship b/t an IV and DV

Relationship between A and C decreases or is eliminated when B is included in the model

36

What is the null hypothesis in Chi-square?

Observed frequencies do not differ from the frequencies expected by chance

Alternate hypothesis is that the observed frequencies are related to the treatment effect

37

Central limit theorem

As sample size increases, shape of sampling distribution of sample means approximates a normal distribution.

Mean of sampling distribution of sample means = mean of population.
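
A small simulation can illustrate both statements. The sketch below draws repeated samples from a skewed (exponential) population with mean 1 and SD 1; the population, sample size, and seed are arbitrary choices for illustration.

```python
import random

def sample_means(n, replications, seed=0):
    """Means of repeated samples of size n from an exponential population (mean 1)."""
    rng = random.Random(seed)
    return [sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(replications)]

means = sample_means(n=50, replications=2000)
mean_of_means = sum(means) / len(means)
sd_of_means = (sum((m - mean_of_means) ** 2 for m in means) / len(means)) ** 0.5

# Mean of the sample means should be close to the population mean (1.0),
# and their SD close to the population SD divided by sqrt(n).
print(round(mean_of_means, 3), round(sd_of_means, 3))
```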

38

What factors affect Pearson's product moment correlation?

Linearity (assumes linear relationship b/t 2 variables)

Homoscedasticity (variability of Y scores is similar across the range of X)

Range of scores (wider range provides more accurate estimate)

39

Point-biserial coefficient

Correlation between one continuous variable & one dichotomous variable
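
One common computational form is r_pb = (M1 - M0) / SD * sqrt(p * q), where M1 and M0 are the means of the continuous variable in the two groups, SD is its population standard deviation, and p and q are the group proportions. A sketch assuming the dichotomous variable is coded 0/1 (names and data are hypothetical):

```python
from math import sqrt

def point_biserial(group, scores):
    """Point-biserial r between a 0/1 variable and a continuous variable."""
    n = len(scores)
    ones = [y for g, y in zip(group, scores) if g == 1]
    zeros = [y for g, y in zip(group, scores) if g == 0]
    p = len(ones) / n
    q = 1 - p
    grand_mean = sum(scores) / n
    sd = sqrt(sum((y - grand_mean) ** 2 for y in scores) / n)  # population SD
    return (sum(ones) / len(ones) - sum(zeros) / len(zeros)) / sd * sqrt(p * q)

print(round(point_biserial([0, 0, 1, 1], [1, 2, 3, 4]), 3))
```

The result is numerically the same as a Pearson r computed on the 0/1 coding.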

40

Phi coefficient

Correlation b/t 2 dichotomous variables
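
For a 2x2 table with cells a, b (top row) and c, d (bottom row), phi = (ad - bc) / sqrt((a+b)(c+d)(a+c)(b+d)). A minimal sketch with hypothetical counts:

```python
from math import sqrt

def phi_coefficient(a, b, c, d):
    """Phi for a 2x2 table: rows (a, b) and (c, d)."""
    return (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))

print(phi_coefficient(30, 10, 10, 30))
```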

41

Assumptions of regression

Linear relationship b/t X and Y

Homoscedasticity (error scores of criterion are the same across range of x)

Homogeneity of variance

42

Multicollinearity

Degree to which predictors correlate with each other

Decreases the accuracy of the regression equation

43

Sensitivity

TP / (TP + FN)

44

Specificity

TN / (TN + FP)

45

Positive likelihood ratio

Indicates how much more likely a positive test result is in a pt with the condition than in a pt without it (a PLR of 3 means a + result is 3x as likely in pts who have the condition)

Sensitivity / (1 - specificity)

46

Positive predictive power

Probability that a pt with a + test has the true condition

TP / (TP + FP)

47

Negative predictive power

Probability that a pt with a negative test result does not have the condition

TN / (TN + FN)
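
The five indices on cards 43-47 can all be derived from the four cells of a classification table. A minimal sketch; the function name and the counts are hypothetical:

```python
def diagnostic_indices(tp, fp, fn, tn):
    """Classification accuracy statistics from true/false positive/negative counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "positive_likelihood_ratio": sensitivity / (1 - specificity),
        "positive_predictive_power": tp / (tp + fp),
        "negative_predictive_power": tn / (tn + fn),
    }

# Hypothetical sample: 100 pts with the condition, 200 without.
result = diagnostic_indices(tp=80, fp=30, fn=20, tn=170)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```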

48

Relationship between base rate & PPP/NPP

As the base rate increases, PPP will increase, whereas NPP will decrease. Converse is true as the base rate declines.
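
This relationship follows from applying Bayes' theorem with fixed sensitivity and specificity. The sketch below holds both at .90 (an arbitrary choice) and varies only the base rate; the function name is illustrative.

```python
def predictive_power(sensitivity, specificity, base_rate):
    """Return (PPP, NPP) for a test applied at a given base rate."""
    tp = sensitivity * base_rate              # proportion of true positives
    fn = (1 - sensitivity) * base_rate        # proportion of false negatives
    tn = specificity * (1 - base_rate)        # proportion of true negatives
    fp = (1 - specificity) * (1 - base_rate)  # proportion of false positives
    return tp / (tp + fp), tn / (tn + fn)

for base_rate in (0.01, 0.10, 0.50):
    ppp, npp = predictive_power(0.90, 0.90, base_rate)
    print(f"base rate {base_rate:.2f}: PPP = {ppp:.3f}, NPP = {npp:.3f}")
```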

49

Bayes theorem

Often employed in decision analysis, allowing calculation of the posterior probability of an event (conditioned probability it is assigned when the relevant evidence is taken into account)

50

Item characteristic curve

Plot the proportion of ppl who answered correctly against the total test score, performance on an external criterion, or mathematically-derived estimate of ability; provides info on relationship between examinee's level on the trait measured by the test & the probability that he will respond correctly on that item

51

Which ANOVA post-hoc correction is most conservative?

Scheffe

52

Which ANOVA post-hoc correction is appropriate for pairwise comparisons?

Tukey

53

Mann-Whitney U

Compare two independent groups on a DV measured with rank-ordered data

54

Negative skew

Most scores are high but few extreme low scores; mean < median < mode; easy test, ceiling effects

55

Positive skew

Most scores are low but few extreme high scores; mean > median > mode; difficult test, floor effects

56

Variance

Average of the squared differences of each observation from the mean
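
Taken literally, this definition is the population variance (dividing by N). A minimal sketch with hypothetical scores; note that a sample estimate would divide by N - 1 instead.

```python
def population_variance(scores):
    """Mean of squared deviations from the mean (divides by N)."""
    m = sum(scores) / len(scores)
    return sum((x - m) ** 2 for x in scores) / len(scores)

print(population_variance([2, 4, 4, 4, 5, 5, 7, 9]))
```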

57

Null hypothesis in ANOVA

Group means were drawn from the same population (i.e., means are equal in the population)

58

What factors may lead to non-normal test distributions?

1) existence of discrete subpopulations w/i the general population w/ differing abilities
2) ceiling or floor effects
3) tx effects that change the location of means, medians & modes, affect variability & distribution shape

59

How is SEM related to test reliability?

The greater the reliability, the smaller the SEM
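
The standard formula makes the relationship explicit: SEM = SD * sqrt(1 - reliability). A minimal sketch; the SD of 15 and the reliability values are hypothetical.

```python
from math import sqrt

def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """SEM = SD * sqrt(1 - reliability coefficient)."""
    return sd * sqrt(1 - reliability)

# SEM shrinks as reliability rises, for a test with SD = 15.
for r in (0.70, 0.80, 0.91):
    print(f"reliability {r:.2f}: SEM = {standard_error_of_measurement(15, r):.2f}")
```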

60

Reliable change index (RCI)

Indicator of the probability that an observed difference b/t 2 scores from the same examinee on the same test can be attributed to measurement error
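
One widely used version is the Jacobson-Truax formula: RCI = (score 2 - score 1) / SE of the difference, where SE of the difference = sqrt(2) * SEM and SEM = SD * sqrt(1 - reliability). The sketch below assumes that version; the deck does not say which formula it has in mind, and the scores, SD, and reliability are hypothetical.

```python
from math import sqrt

def reliable_change_index(score1, score2, sd, reliability):
    """Jacobson-Truax RCI: score difference divided by the SE of the difference."""
    sem = sd * sqrt(1 - reliability)  # standard error of measurement
    se_diff = sqrt(2) * sem           # standard error of the difference score
    return (score2 - score1) / se_diff

rci = reliable_change_index(score1=100, score2=110, sd=15, reliability=0.91)
print(round(rci, 2))
```

Under this formula, an absolute RCI above 1.96 is conventionally taken to mean the change is unlikely (p < .05) to reflect measurement error alone.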