Test Construction Flashcards

1
Q

Reliability

A

Amount of consistency, repeatability, and dependability in scores on a given test

2
Q

Classical Test Theory

A

Obtained score is a combination of true score and error

X = T + E (X = obtained score, T = true score, E = error)
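A minimal simulation of X = T + E, assuming normally distributed true scores and random errors that are independent of the true scores (standard classical test theory assumptions; the means, SDs, and the function name `simulate_ctt` are illustrative choices). Reliability is computed as the share of observed-score variance that is true-score variance, which is why a reliability coefficient is read directly as a proportion.

```python
import random
import statistics

def simulate_ctt(n=10_000, true_sd=10.0, error_sd=5.0, seed=0):
    """Simulate obtained scores under X = T + E (hypothetical helper).

    Assumes normal true scores and normal random errors independent of them.
    Returns (observed_scores, reliability), where reliability is the share of
    observed-score variance that is true-score variance.
    """
    rng = random.Random(seed)
    true_scores = [rng.gauss(100.0, true_sd) for _ in range(n)]
    errors = [rng.gauss(0.0, error_sd) for _ in range(n)]
    observed = [t + e for t, e in zip(true_scores, errors)]
    reliability = statistics.pvariance(true_scores) / statistics.pvariance(observed)
    return observed, reliability

# Theoretical reliability here: 10**2 / (10**2 + 5**2) = 100 / 125 = 0.80,
# so the simulated value should land close to .80.
_, rel = simulate_ctt()
print(round(rel, 2))
```

Raising `error_sd` while holding `true_sd` fixed lowers the simulated reliability, matching the idea that more error variability means less consistent scores.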

3
Q

True score variability vs Error Variability

A

True score variability: due to real differences in ability or knowledge among test takers

Error variability: caused by chance or random factors

4
Q

Range of reliability coefficient for a test

A
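0.00 to 1.00

* Interpreted directly (not squared) as the proportion of observed score variability that is true score variability; e.g., .80 means 80% true score variability and 20% error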

5
Q

Minimum acceptable Reliability

A

0.80

6
Q

Sources of Error in Test Impacting Reliability

A
  1. Content Sampling: items that, by chance, do or do not tap into a test taker’s knowledge
  2. Time Sampling: impact of giving the test at 2 different points in time
  3. Test Heterogeneity: items that tap different content areas
7
Q

Factors Affecting Reliability

A
# of items (more items --> higher reliability)
Homogeneity of items (items tap similar content areas --> higher reliability)
Range of scores (an unrestricted range maximizes reliability)
Ability to guess (the easier it is to guess, the lower the reliability, because the test no longer assesses true knowledge)
8
Q

4 Estimates of Reliability

A

Test-Retest
Parallel Forms
Internal consistency
Interrater Reliability

9
Q

Coefficient of stability

A

for test-retest reliability

Major source of error is time sampling (the time interval between administrations)

10
Q

Coefficient of equivalence

A

for parallel forms reliability

major sources of error include time sampling & content sampling

11
Q

Spearman-Brown Prophecy Formula

A

Used in split-half reliability to correct for the lower reliability that results from each half containing only half of the test's items

12
Q

Split-half reliability is inappropriate for ___ and ___ tests

A

Speeded (all items are easy; the score depends on how many are completed in the allotted time)
Power (test takers attempt all items, which vary in difficulty, e.g., the EPPP)

13
Q

Ways to measure internal consistency reliability

A

Split-half

Kuder-Richardson (used when items are scored dichotomously) & Cronbach’s alpha (used when items are scored on a Likert scale)
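A minimal sketch of both coefficients, assuming the usual formulas: alpha = k/(k − 1) · (1 − Σ item variances / total-score variance), and KR-20 is the same expression with each item variance written as p·q for 0/1 items. Population variances are used throughout so the two agree on dichotomous data; the function names and the 4-person data set are hypothetical.

```python
from statistics import pvariance

def cronbach_alpha(responses):
    """responses: one row per test taker, each a list of item scores."""
    k = len(responses[0])
    items = list(zip(*responses))  # regroup scores by item
    item_var_sum = sum(pvariance(col) for col in items)
    total_var = pvariance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

def kr20(responses):
    """Same idea for items scored 0/1: each item's variance is p * q."""
    k = len(responses[0])
    n = len(responses)
    pq_sum = 0.0
    for col in zip(*responses):
        p = sum(col) / n          # proportion answering the item correctly
        pq_sum += p * (1 - p)
    total_var = pvariance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - pq_sum / total_var)

# Hypothetical right/wrong answers: 4 test takers x 3 items.
data = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(kr20(data), cronbach_alpha(data))  # 0.75 0.75
```

For Likert items only `cronbach_alpha` applies, since p·q is defined only for dichotomous scoring.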

14
Q

Standard Error of Measurement

A

standard deviation of a theoretically normal distribution of test scores obtained by one individual on equivalent tests

15
Q

Range of values for the Standard Error of Measurement

A

0.0 to the standard deviation of the test (when the test is perfectly reliable, SEM = 0.0; when the test is totally unreliable, SEM equals the standard deviation of the test)
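The card gives only the range; the widely used formula SEM = SD · √(1 − r_xx), where r_xx is the reliability coefficient (not stated on the card), produces exactly those endpoints. A minimal sketch; the function name is illustrative.

```python
import math

def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - r_xx), with r_xx the test's reliability coefficient."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0.0 and 1.0")
    return sd * math.sqrt(1.0 - reliability)

print(standard_error_of_measurement(15, 1.0))             # 0.0 (perfectly reliable test)
print(standard_error_of_measurement(15, 0.0))             # 15.0 (SEM equals the test SD)
print(round(standard_error_of_measurement(15, 0.91), 2))  # 4.5
```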

16
Q

Due to error in measurement, subjects’ scores are often reported using ___ ___

A

confidence bands (the standard error of measurement is applied like a z-score unit around the obtained score)
about 68%: between -1 and +1 SEM
about 95%: between -2 and +2 SEM
about 99%: between -3 and +3 SEM
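A minimal sketch of a confidence band, assuming SEM = SD · √(1 − r_xx) and the rounded multipliers on the card (1, 2, 3); exact normal-curve values are 1.96 for 95% and about 2.58 for 99%. The function name and example numbers are hypothetical.

```python
import math

# Rounded z multipliers from the card: +/-1, +/-2, +/-3 SEM.
Z_FOR_LEVEL = {68: 1.0, 95: 2.0, 99: 3.0}

def confidence_band(obtained, sd, reliability, level=95):
    """Return (low, high) band around an obtained score."""
    sem = sd * math.sqrt(1.0 - reliability)  # standard error of measurement
    z = Z_FOR_LEVEL[level]
    return obtained - z * sem, obtained + z * sem

low, high = confidence_band(110, sd=15, reliability=0.91, level=95)
print(round(low, 1), round(high, 1))  # 101.0 119.0
```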

17
Q

Validity of a test

A

meaningfulness, usefulness, or accuracy of a test

18
Q

3 types of validity

A

Content
Criterion related
Construct

19
Q

Content validity

A

how adequately a test samples a particular content area
- judged by a panel of experts rather than computed
- no numerical validity coefficient

20
Q

Criterion related validity

A

how adequately a test score can be used to predict a criterion outcome
ex: how well do SAT scores predict college GPA?

21
Q

How is criterion related validity calculated?

A

Pearson r; coefficient ranges from -1.0 to +1.0
coefficients as low as .20 are frequently considered acceptable
*The squared correlation tells how much variability in the criterion is accounted for by performance on the predictor

22
Q

2 subtypes of criterion related validity

A

Concurrent: predictor & criterion measured at about the same time
Predictive: delay between measurement of predictor & criterion

23
Q

Standard Error of the Mean vs
Standard Error of Measurement vs
Standard Error of Estimate

A

Mean: the means of many samples drawn from the same population are normally distributed; the standard deviation of these sample means is the standard error of the mean (under the umbrella of inferential statistics)

Measurement: under the umbrella of reliability; standard deviation of a theoretically normal distribution of test scores obtained by ONE person on equivalent tests

Estimate: under the umbrella of validity; standard deviation of a theoretically normal distribution of criterion scores obtained by ONE person measured repeatedly

24
Q

Standard Error of Estimate

A

standard deviation of a theoretically normal distribution of criterion scores obtained by ONE person measured repeatedly
min value of 0.0, max value is the SD of the criterion
if the predictor is perfect, SEE = 0.0; if it has no predictive value, SEE = SD of the criterion
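A minimal sketch, assuming the common formula SEE = SD_y · √(1 − r_xy²), where SD_y is the criterion's standard deviation and r_xy the validity coefficient (the formula is not on the card, but it yields the stated min and max). The function name is illustrative.

```python
import math

def standard_error_of_estimate(sd_criterion, validity):
    """SEE = SD_y * sqrt(1 - r_xy**2), where r_xy is the validity coefficient."""
    if not -1.0 <= validity <= 1.0:
        raise ValueError("validity coefficient must be between -1 and 1")
    return sd_criterion * math.sqrt(1.0 - validity ** 2)

print(standard_error_of_estimate(10, 1.0))            # 0.0 -> perfect predictor
print(standard_error_of_estimate(10, 0.0))            # 10.0 -> no predictive value
print(round(standard_error_of_estimate(10, 0.6), 2))  # 8.0
```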

25
Q

3 Applications of Criterion Related Validity Coefficient

A

Expectancy Tables
Taylor Russell Tables
Decision Making Theory

26
Q

Expectancy Tables

A

List the probability that a person’s criterion score will fall in a specified range based on the range in which a person’s predictor score fell
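A simplified sketch of building an expectancy table from past data, assuming predictor bands are inclusive ranges and that the criterion range of interest is "at or above a cutoff"; real tables often report several criterion ranges per band. The function name and the (SAT-style score, GPA) pairs are hypothetical.

```python
def expectancy_table(pairs, predictor_bands, criterion_cutoff):
    """Build a simple expectancy table from past data.

    pairs: (predictor_score, criterion_score) tuples for past test takers
    predictor_bands: list of (low, high) predictor ranges, inclusive
    criterion_cutoff: criterion score counted as "success" (at or above)
    Returns {band: proportion of people in that band who reached the cutoff},
    or None for a band with no data.
    """
    table = {}
    for low, high in predictor_bands:
        outcomes = [crit for pred, crit in pairs if low <= pred <= high]
        table[(low, high)] = (
            sum(crit >= criterion_cutoff for crit in outcomes) / len(outcomes)
            if outcomes else None
        )
    return table

# Hypothetical (SAT-style score, college GPA) pairs, invented for illustration.
history = [(450, 2.4), (480, 3.1), (520, 2.9), (560, 3.2), (590, 3.3),
           (610, 3.5), (650, 3.0), (700, 3.8)]
bands = [(400, 499), (500, 599), (600, 800)]
for band, prob in expectancy_table(history, bands, criterion_cutoff=3.0).items():
    print(band, prob)
```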

27
Q

Taylor Russell Tables

A

Show how much more accurate selection decisions are when using a particular predictor test than when using no predictor test
Speaks to INCREMENTAL VALIDITY (the amount of improvement in success rate resulting from use of a predictor test)

28
Q

3 variables affecting incremental validity (and used in Taylor Russell Tables)

A
Criterion-related validity coefficient of the predictor test
Company's base rate (rate of selecting successful employees without a predictor test)
Selection ratio (proportion of available openings to available applicants)
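A minimal arithmetic sketch of these quantities from observed counts. It takes the success rate among test-selected hires as given; the Taylor-Russell tables instead derive that rate from the validity coefficient, base rate, and selection ratio under a bivariate-normal assumption. All function names and numbers are hypothetical.

```python
def base_rate(successful, hired_without_test):
    """Proportion of employees who succeed when hired without the predictor test."""
    return successful / hired_without_test

def selection_ratio(openings, applicants):
    """Proportion of available openings to available applicants."""
    return openings / applicants

def incremental_validity(success_rate_with_test, base):
    """Improvement in success rate from using the predictor test."""
    return success_rate_with_test - base

# Hypothetical company: 30 of 60 past hires succeeded without the test,
# 10 openings for 100 applicants, and 75% of test-selected hires succeed.
br = base_rate(30, 60)                 # 0.5
sr = selection_ratio(10, 100)          # 0.1
print(incremental_validity(0.75, br))  # 0.25
```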
29
Q

Item Response Curve

A

provides up to 3 pieces of information about a test item:
its difficulty (represented by the position of the curve, left vs. right)
its ability to distinguish between high & low scorers (the slope of the curve)
the probability of getting the item right by guessing (the y-intercept)
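The three features map onto the parameters of the three-parameter logistic (3PL) model, a standard IRT item response function that the card does not name. In this minimal sketch `b` sets the curve's left/right position (difficulty), `a` its slope (discrimination), and `c` its lower asymptote, the chance of a correct answer by guessing at very low ability. Some texts also multiply the exponent by a scaling constant of 1.7; it is omitted here, and the function name is illustrative.

```python
import math

def item_response_probability(theta, a, b, c):
    """3PL item characteristic curve.

    theta: test taker's ability
    a: discrimination (slope), b: difficulty (location),
    c: guessing (lower asymptote)
    """
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))

# At theta == b the probability is halfway between c and 1.
print(round(item_response_probability(0.0, a=1.0, b=0.0, c=0.2), 2))  # 0.6
for theta in (-3.0, 0.0, 3.0):
    print(theta, round(item_response_probability(theta, a=1.5, b=1.0, c=0.25), 3))
```

A harder item (larger `b`) shifts the curve right, so a test taker of fixed ability has a lower chance of answering correctly.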

30
Q

Item Response Theory

A

Based on complex mathematical assumptions, including invariance of item parameters, which holds that the characteristics of items should be the same for all theoretically equivalent groups of subjects chosen from the same population
For example, items should not have different characteristics for minority & nonminority groups
*Used to develop computer programs that tailor tests to the individual’s ability level
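One common way such programs tailor a test is maximum-information item selection: give next the unused item that is most informative at the current ability estimate. This minimal sketch assumes a two-parameter logistic (2PL) model, where item information is a²·P·(1 − P), and takes the ability estimate as given; a real adaptive test would re-estimate ability after each response. The item bank and all function names are hypothetical.

```python
import math

def p_correct(theta, a, b):
    """Two-parameter logistic item response function (no guessing term)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta: a^2 * P * (1 - P)."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

def next_item(theta_estimate, item_bank, administered):
    """Pick the unadministered item that is most informative at the current
    ability estimate. item_bank maps item id -> (a, b)."""
    candidates = {i: ab for i, ab in item_bank.items() if i not in administered}
    return max(candidates, key=lambda i: item_information(theta_estimate, *candidates[i]))

# Hypothetical item bank: equal discrimination, different difficulty.
bank = {"easy": (1.0, -2.0), "medium": (1.0, 0.0), "hard": (1.0, 2.0)}
print(next_item(0.0, bank, administered=set()))       # medium
print(next_item(1.8, bank, administered={"medium"}))  # hard
```

With equal discrimination, the most informative item is the one whose difficulty is closest to the test taker's estimated ability, which is what "tailoring to ability level" amounts to here.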

31
Q

Other assumptions of Item Response Theory

A

assumes test items measure a latent trait

usually of little practical use unless working with very large samples

32
Q

Incremental Validity is greatest when:

A

base rate is moderate, validity coefficient is high, & selection ratio is low
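A Monte Carlo sketch of a Taylor-Russell-style calculation, assuming predictor and criterion scores are bivariate normal with correlation equal to the validity coefficient, the top fraction of applicants on the predictor is hired, and "success" is a criterion cutoff chosen so that the base rate of all applicants would succeed. The function name and parameter values are illustrative; varying one factor at a time lets you compare the pattern on the card.

```python
import random
from statistics import NormalDist

def simulated_incremental_validity(validity, base_rate, selection_ratio,
                                   n=50_000, seed=1):
    """Success rate among applicants selected by the predictor minus the base rate.

    Assumes predictor and criterion are bivariate normal with correlation
    `validity` (a modeling assumption, not from the card).
    """
    rng = random.Random(seed)
    # Criterion cutoff chosen so that `base_rate` of all applicants would succeed.
    success_cut = NormalDist().inv_cdf(1 - base_rate)
    noise_weight = (1 - validity ** 2) ** 0.5
    applicants = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)                                # predictor score
        y = validity * x + noise_weight * rng.gauss(0.0, 1.0)  # criterion score
        applicants.append((x, y))
    # Select the top `selection_ratio` of applicants on the predictor.
    applicants.sort(key=lambda pair: pair[0], reverse=True)
    hired = applicants[: max(1, int(n * selection_ratio))]
    success_rate = sum(y > success_cut for _, y in hired) / len(hired)
    return success_rate - base_rate

# Vary one factor at a time (validity .60 unless noted).
print(simulated_incremental_validity(0.60, base_rate=0.5, selection_ratio=0.1))
print(simulated_incremental_validity(0.60, base_rate=0.5, selection_ratio=0.7))
print(simulated_incremental_validity(0.60, base_rate=0.9, selection_ratio=0.1))
print(simulated_incremental_validity(0.20, base_rate=0.5, selection_ratio=0.1))
```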