Flashcards in Stats & Test Construction Deck (60):

1

## Type I error

###
Mistakenly rejecting the null hypothesis when it's true

Alpha

2

## Type II error

###
Mistakenly retaining the null hypothesis when it is false

Beta

3

## Discriminant analysis

### Technique in multivariate statistics that describes differences between 2+ groups on a set of measures or that classifies subjects into groups based on a set of measures

4

## Threats to internal validity

### Maturation, history, instrumentation, statistical regression, selection, attrition/mortality, interaction w/ selection

5

## Ways to control threats to internal validity

### Random assignment, within-subjects designs, blocking, matching subjects, ANCOVA

6

## Threats to external validity

### interaction b/t testing & treatment, interaction b/t selection & tx, reactivity, multiple tx interference (order/carryover effects)

7

## Ways to control threats to external validity

### Random sampling, naturalistic/field research, single or double-blind designs, counterbalance

8

## What are some ways to increase power?

### Increase alpha, increase N, increase effect size, decrease error variance, use more powerful statistical tests, use a one-tailed test if possible
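
The levers on this card can be checked numerically. The sketch below assumes the simplest case, a one-sample z-test with a known SD and an effect in the predicted (positive) direction; `z_test_power` and its parameters are illustrative names, not part of the deck.

```python
import math
from statistics import NormalDist

def z_test_power(effect_size, n, alpha=0.05, one_tailed=False):
    """Power of a one-sample z-test (assumes known SD and a positive effect).

    effect_size is the standardized mean difference (Cohen's d).
    """
    std = NormalDist()  # standard normal distribution
    # Critical z for the chosen alpha and number of tails.
    z_crit = std.inv_cdf(1 - alpha if one_tailed else 1 - alpha / 2)
    # How far the sampling distribution is shifted under the alternative.
    shift = effect_size * math.sqrt(n)
    power = 1 - std.cdf(z_crit - shift)
    if not one_tailed:
        # Tiny contribution from the opposite rejection region.
        power += std.cdf(-z_crit - shift)
    return power

baseline = z_test_power(0.5, 25)
print(f"baseline (d=0.5, n=25, alpha=.05, two-tailed): {baseline:.3f}")
print(f"larger N (n=50):      {z_test_power(0.5, 50):.3f}")
print(f"larger alpha (.10):   {z_test_power(0.5, 25, alpha=0.10):.3f}")
print(f"larger effect (d=.8): {z_test_power(0.8, 25):.3f}")
print(f"one-tailed:           {z_test_power(0.5, 25, one_tailed=True):.3f}")
```

Each variation should come out higher than the baseline, which is the point of the card.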

9

## What percentage of scores on the normal curve fall between +/- 1 SD, +/- 2 SD, +/- 3 SD?

###
68%

95%

99.7%

10

##
What percentiles are equivalent to the following z-scores?

-3

-2

-1

1

2

3

###
0.1 = -3

2 = -2

16 = -1

84 = 1

98 = 2

99.9 = 3
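
These percentiles, and the area within a given number of SDs of the mean, follow from the standard normal cumulative distribution. A minimal sketch, assuming a perfectly normal distribution; the helper names are illustrative.

```python
import math

def normal_cdf(z: float) -> float:
    """Cumulative probability of the standard normal distribution at z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def percentile(z: float) -> float:
    """Percentile rank (0-100) corresponding to a z-score."""
    return 100.0 * normal_cdf(z)

def within_sd(k: float) -> float:
    """Percentage of scores falling within +/- k SD of the mean."""
    return 100.0 * (normal_cdf(k) - normal_cdf(-k))

for z in (-3, -2, -1, 1, 2, 3):
    print(f"z = {z:+d}: percentile {percentile(z):.1f}")

for k in (1, 2, 3):
    print(f"within +/- {k} SD: {within_sd(k):.1f}%")
```

The flashcard values are these results rounded for memorization.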

11

## Factors affecting test reliability

### Test characteristics (length, item type, item homogeneity, influence of guessing), sample characteristics (sample size, range, variability), extent of test clarity

12

## Sources of error in internal reliability

### Content sampling, heterogeneity of content domain

13

## Sources of error in test-retest reliability

### Time-sampling factors

14

## Which type of reliability is best for speed tests?

### Alternate forms

15

## Sources of error in inter-rater reliability

### Factors related to raters (motivation, biases), characteristics of measuring device, consensual observer drift

16

## Dimensions of relevance in item analysis

###
1) Content appropriateness (item assesses bx domain the test is intended to evaluate)

2) Taxonomic level (does item reflect appropriate cognitive or ability level of population intended for)

3) Extraneous abilities (to what extent are knowledge or skills needed that are outside the domain being evaluated)

17

## Item difficulty

### The %age of people who get an item correct

18

## Item discrimination

###
Extent an item differentiates between those who get a high vs. low score

.35 or more is acceptable
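
Item difficulty and item discrimination can both be computed directly from scored responses. The sketch below assumes 0/1 item scoring and uses the common upper/lower 27% split for the discrimination index; the function names and the 27% default are illustrative choices, not something the deck specifies.

```python
def item_difficulty(item_responses):
    """Proportion of examinees answering the item correctly (0/1 scoring)."""
    return sum(item_responses) / len(item_responses)

def discrimination_index(item_responses, total_scores, fraction=0.27):
    """D = p(correct in upper group) - p(correct in lower group).

    Groups are the top and bottom `fraction` of examinees by total score.
    """
    n = len(total_scores)
    group_size = max(1, round(n * fraction))
    # Examinee indices ordered from lowest to highest total score.
    order = sorted(range(n), key=lambda i: total_scores[i])
    lower = order[:group_size]
    upper = order[-group_size:]
    p_upper = sum(item_responses[i] for i in upper) / group_size
    p_lower = sum(item_responses[i] for i in lower) / group_size
    return p_upper - p_lower

# Hypothetical data: one item scored 0/1 and total test scores for 10 examinees.
item = [1, 1, 1, 0, 0, 1, 0, 0, 1, 1]
totals = [9, 8, 7, 3, 2, 6, 1, 4, 5, 10]

print("difficulty:", item_difficulty(item))
print("discrimination:", discrimination_index(item, totals))
```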

19

## Item response theory

### Item analysis based on the examinee's level on the latent trait being measured rather than on total test score

20

## Reliability coefficient

###
Proportion of variability in obtained test scores that reflects true score variability

Never squared to interpret

21

## Standard error of measurement (SEM)

### An index of the amount of error that can be expected in a person's obtained scores due to the unreliability of the test
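
In classical test theory the SEM is usually computed as the test's standard deviation times the square root of one minus its reliability. A minimal sketch under that assumption; the function name and the example values (SD = 15, reliability = .91) are illustrative.

```python
import math

def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """SEM = SD * sqrt(1 - reliability), per classical test theory."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)

# Hypothetical test with SD = 15 at two reliability levels.
for r in (0.91, 0.75):
    sem = standard_error_of_measurement(15, r)
    print(f"reliability {r:.2f}: SEM = {sem:.2f}")
```

Higher reliability gives a smaller SEM, so the band of likely true scores around an obtained score narrows.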

22

## What qualitative evidence do you look for in a test that has good content validity?

###
Coefficient of internal consistency will be large

Test will correlate highly with other tests of the same domain

Pre- and post-test evals of the program designed to increase familiarity with domain will indicate appropriate changes

23

## Orthogonal rotation

### Resulting factors are uncorrelated; attribute measured by one factor is independent from the attributes measured by the other factor

24

## Oblique rotation

### Resulting factors are correlated & attributes measured by the factors are not independent

25

## What is the Rosenthal/Pygmalion effect?

### Tendency for a participant's performance to be affected by the expectations of the tester

26

## What is the Hawthorne effect?

### Tendency of subjects to behave differently when they are in a research study

27

## What is the most common measure for internal test reliability?

### Cronbach's alpha (with dichotomous items it is equivalent to KR-20)
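
Cronbach's alpha compares the sum of the item variances with the variance of the total scores. A minimal sketch, assuming complete data with one row per examinee; the function name and the ratings are made up for illustration.

```python
from statistics import pvariance

def cronbach_alpha(scores):
    """alpha = k/(k-1) * (1 - sum(item variances) / variance(total scores)).

    scores: list of rows, one per examinee, each a list of k item scores.
    Population variances are used throughout; using sample variances for
    both terms gives the same ratio.
    """
    k = len(scores[0])
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical ratings: 4 examinees x 3 Likert-type items.
ratings = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [1, 2, 1],
]
print(f"alpha = {cronbach_alpha(ratings):.3f}")
```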

28

## What measure is used to evaluate the effect of lengthening or shortening a test?

### Spearman-Brown correction formula
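
The Spearman-Brown formula predicts the reliability of a test whose length is changed by a factor n. A minimal sketch, assuming the added or removed items are comparable to the existing ones; the function name is illustrative.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability after changing test length by `length_factor`.

    r_new = n * r / (1 + (n - 1) * r), where n = new length / old length.
    """
    n = length_factor
    return n * reliability / (1 + (n - 1) * reliability)

# Hypothetical test with reliability .60: doubled, then halved.
print(f"doubled: {spearman_brown(0.60, 2):.2f}")
print(f"halved:  {spearman_brown(0.60, 0.5):.2f}")
```

Lengthening raises the predicted reliability and shortening lowers it.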

29

## What formula is used to assess the reliability of a test with dichotomous responses?

### Kuder-Richardson formula
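
The most commonly cited version is KR-20, which uses each item's proportion passing (p) and failing (q). A minimal sketch, assuming 0/1 scoring and complete data; the function name and responses are hypothetical.

```python
from statistics import pvariance

def kr20(responses):
    """KR-20 = k/(k-1) * (1 - sum(p*q) / variance(total scores)).

    responses: list of rows, one per examinee, each a list of 0/1 item scores.
    """
    k = len(responses[0])
    n = len(responses)
    # Proportion of examinees passing each item.
    p = [sum(row[i] for row in responses) / n for i in range(k)]
    pq_sum = sum(pi * (1 - pi) for pi in p)
    # Population variance of total scores, to match the p*q item variances.
    total_var = pvariance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - pq_sum / total_var)

# Hypothetical data: 4 examinees x 3 dichotomous items.
answers = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
]
print(f"KR-20 = {kr20(answers):.2f}")
```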

30

## What are acceptable scores of reliability?

###
.80 & above = good

.70-.79 = acceptable

.60-.69 = marginally reliable

.59 and below = not reliable

31

## Name the 4 scales of measurement

###
1) Nominal = names of categories

2) Ordinal = rank data

3) Interval = no absolute 0, numbers scaled at equal distances

4) Ratio = has absolute 0

32

## What are the assumptions of parametric statistics?

###
Normal distribution

Homogeneity of variance (variance equal among all groups)

Independence of observations

33

## F-ratio in a one-way ANOVA

### Ratio of between group to within group variance
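
The ratio can be worked through by hand for a small data set. The sketch below computes the between-group and within-group mean squares for a one-way design; the function name and the three groups are made up for illustration.

```python
def f_ratio(groups):
    """One-way ANOVA F = MS_between / MS_within."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Variability of group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variability of scores around their own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    ms_between = ss_between / (k - 1)       # df = k - 1
    ms_within = ss_within / (n_total - k)   # df = N - k
    return ms_between / ms_within

# Hypothetical scores for three groups.
data = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
print(f"F = {f_ratio(data):.2f}")
```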

34

## Moderator variable

### Relationship of A and C depends on the value of B (the moderator)

35

## Mediating variable

###
Accounts for (or partially accounts for) a relationship b/t an IV and DV

Relationship between A and C decreases or is eliminated when B is included in the model

36

## What is the null hypothesis in Chi-square?

###
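Observed frequencies do not differ from the frequencies expected by chance (the variables are unrelated)

Alternative hypothesis is that the observed frequencies differ from the expected frequencies (e.g., they are related to the treatment)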

37

## Central limit theorem

###
As sample size increases, shape of sampling distribution of sample means approximates a normal distribution.

Mean of sampling distribution of sample means = mean of population.
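
A small simulation can make both statements concrete. The sketch below assumes a strongly skewed population (exponential, mean 1) and draws repeated samples; the names, sample counts, and seed are arbitrary illustrative choices.

```python
import random
from statistics import mean, pstdev

def sample_means(n, n_samples=2000, seed=0):
    """Means of n_samples random samples of size n from an exponential
    population whose mean is 1."""
    rng = random.Random(seed)
    return [
        mean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(n_samples)
    ]

def skewness(xs):
    """Simple moment-based skewness (0 for a symmetric distribution)."""
    m, s = mean(xs), pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * s ** 3)

for n in (2, 30):
    means = sample_means(n)
    print(
        f"n = {n:2d}: mean of sample means = {mean(means):.3f}, "
        f"skewness = {skewness(means):.3f}"
    )
```

With larger samples the mean of the sample means should stay near the population mean of 1 while the skewness shrinks toward 0, the value for a normal curve.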

38

## What factors affect Pearson's product moment correlation?

###
Linearity (assumes linear relationship b/t 2 variables)

Homoscedasticity (variability of Y scores is similar across the range of X)

Range of scores (a restricted range attenuates r; a wider range gives a more accurate estimate)

39

## Point-biserial coefficient

### Correlation between one continuous variable & one dichotomous variable

40

## Phi coefficient

### Correlation b/t 2 dichotomous variables
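
Both the point-biserial and phi coefficients are numerically equal to Pearson's r when the dichotomous variable(s) are coded 0/1. A minimal sketch under that coding assumption; the function name and data are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Point-biserial: dichotomous group membership vs. a continuous score.
group = [0, 0, 1, 1]
score = [1, 2, 3, 4]
print(f"point-biserial r = {pearson_r(group, score):.3f}")

# Phi: two dichotomous variables (e.g., pass/fail on two items).
item_a = [1, 1, 0, 0, 1, 0]
item_b = [1, 0, 0, 0, 1, 1]
print(f"phi = {pearson_r(item_a, item_b):.3f}")
```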

41

## Assumptions of regression

###
Linear relationship b/t X and Y

Homoscedasticity (variance of the criterion's error scores is the same across the range of X)

Homogeneity of variance

42

## Multicollinearity

###
Degree to which predictors correlate with each other

Decreases the accuracy of the regression equation

43

## Sensitivity

### TP / (TP + FN)

44

## Specificity

### TN / (TN + FP)

45

## Positive likelihood ratio

###
Indicates how much more likely a positive test result is in a pt who has the condition than in a pt who does not (a PLR of 3 means a positive result is 3x as likely in pts w/ the condition as in pts w/o it)

Sensitivity / (1 - specificity)

46

## Positive predictive power

###
Probability that a pt with a + test has the true condition

TP / (TP + FP)

47

## Negative predictive power

###
Probability that a pt with a negative test result does not have the condition

TN / (TN + FN)
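
The classification statistics on the last few cards all come from the same 2x2 table of test result by true condition. A minimal sketch, with a hypothetical function name and made-up counts:

```python
def diagnostic_stats(tp, fp, tn, fn):
    """Classification accuracy statistics from a 2x2 table of counts.

    tp/fn: people with the condition who test positive/negative.
    tn/fp: people without the condition who test negative/positive.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "positive_likelihood_ratio": sensitivity / (1 - specificity),
        "positive_predictive_power": tp / (tp + fp),
        "negative_predictive_power": tn / (tn + fn),
    }

# Hypothetical sample: 100 people with the condition, 100 without.
stats = diagnostic_stats(tp=80, fp=10, tn=90, fn=20)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```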

48

## Relationship between base rate & PPP/NPP

### As the base rate increases, PPP will increase, whereas NPP will decrease. Converse is true as the base rate declines.

49

## Bayes theorem

### Often employed in decision analysis; allows calculation of the posterior probability of an event (the conditional probability assigned once the relevant evidence is taken into account)
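
One common application is converting a test's sensitivity and specificity into predictive power at a given base rate. The sketch below applies Bayes' theorem for that purpose; the function name and the example values (sensitivity .80, specificity .90) are illustrative assumptions.

```python
def predictive_values(sensitivity, specificity, base_rate):
    """Posterior probabilities from Bayes' theorem.

    Returns (PPP, NPP):
      PPP = P(condition | positive test)
      NPP = P(no condition | negative test)
    """
    # Overall probability of a positive result (true + false positives).
    p_positive = sensitivity * base_rate + (1 - specificity) * (1 - base_rate)
    p_negative = 1 - p_positive
    ppp = sensitivity * base_rate / p_positive
    npp = specificity * (1 - base_rate) / p_negative
    return ppp, npp

# Same hypothetical test applied in settings with different base rates.
for base_rate in (0.10, 0.50, 0.90):
    ppp, npp = predictive_values(0.80, 0.90, base_rate)
    print(f"base rate {base_rate:.2f}: PPP = {ppp:.2f}, NPP = {npp:.2f}")
```

As the base rate rises, PPP goes up and NPP goes down, even though the test itself has not changed.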

50

## Item characteristic curve

### Plot the proportion of ppl who answered correctly against the total test score, performance on an external criterion, or mathematically-derived estimate of ability; provides info on relationship between examinee's level on the trait measured by the test & the probability that he will respond correctly on that item

51

## Which ANOVA post-hoc correction is most conservative?

### Scheffe

52

## Which ANOVA post-hoc correction is appropriate for pairwise comparisons?

### Tukey

53

## Mann-Whitney U

### Compare two independent groups on a DV measured with rank-ordered data

54

## Negative skew

### Most scores are high but few extreme low scores; mean < median < mode; easy test, ceiling effects

55

## Positive skew

### Most scores are low but few extreme high scores; mean > median > mode; difficult test, floor effects

56

## Variance

### Average of the squared differences of each observation from the mean
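
The definition translates directly into a few lines of code. The sketch below divides by N, matching the card; the optional `sample` flag, which divides by N - 1 instead, is an added assumption for estimating a population variance from a sample.

```python
def variance(scores, sample=False):
    """Mean of squared deviations from the mean.

    Divides by N by default; with sample=True divides by N - 1.
    """
    n = len(scores)
    mean = sum(scores) / n
    sum_of_squares = sum((x - mean) ** 2 for x in scores)
    return sum_of_squares / (n - 1 if sample else n)

# Hypothetical scores with mean 5.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
print("variance:", variance(scores))
print("standard deviation:", variance(scores) ** 0.5)
```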

57

## Null hypothesis in ANOVA

### Group means were drawn from the same population (i.e., means are equal in the population)

58

## What factors may lead to non-normal test distributions?

###
1) existence of discrete subpopulations w/i the general population w/ differing abilities

2) ceiling or floor effects

3) tx effects that change the location of means, medians & modes, affect variability & distribution shape

59

## How is SEM related to test reliability?

### The greater the reliability, the smaller the SEM

60