Item Analysis & Test Reliability Flashcards

(23 cards)

1
Q

this refers to variability in test scores that's due to random factors affecting examinees' test performance in unpredictable ways; sources include distractions during testing, ambiguously worded test items, and examinee fatigue

A

measurement error

2
Q

this refers to the extent to which a test provides consistent information

A

test reliability

3
Q

reliability coefficients are interpreted directly as…

A

the proportion of variability in obtained test scores that's due to true score variability

e.g., if a test has a reliability coefficient of .80, this means that 80% of variability in obtained test scores is due to true variability and the remaining 20% is due to measurement error

4
Q

which type of tests have higher reliability coefficients: attitude tests, personality tests, or cognitive ability tests? why?

A

standardized cognitive ability tests

because they're objectively scored and measure relatively stable characteristics, their reliability coefficients (often in the .90s) tend to be larger than those of attitude and personality tests

5
Q

match each description with the correct method of assessing reliability: test-retest, alternate forms, internal consistency, or inter-rater reliability

a) provides information about the consistency of scores over different test items; useful for tests that are designed to measure a single content domain or aspect of behavior
b) provides information on the consistency of scores or ratings assigned by different raters; most useful for measures that are subjectively scored
c) provides information about the consistency of scores over time; most useful for tests designed to measure a characteristic that's stable over time
d) provides information about the consistency of scores over different forms of the test and, when the second form is administered at a later time, the consistency of scores over time; most useful/important whenever a test has more than one form

A

a) internal consistency reliability
b) inter-rater reliability
c) test-retest reliability
d) alternate forms reliability

6
Q

what tests is internal consistency reliability not useful for and why?

A

speed tests (tests that measure speed of performance rather than knowledge or skill level)
* because it tends to overestimate their reliability

7
Q

list 4 methods of evaluating internal consistency

A

1) coefficient alpha (aka Cronbach’s alpha)
2) Kuder-Richardson 20 (KR-20)
3) split-half reliability
4) Spearman-Brown prophecy formula

8
Q

match the description of methods used to evaluate internal consistency with the correct name of the method: coefficient alpha, KR-20, split-half reliability, or Spearman-Brown

A) involves administering the test to a sample of examinees, splitting the test in half (often in terms of even- and odd-numbered items), and correlating the scores on the two halves
B) used to determine the effects of lengthening or shortening a test on its reliability coefficient; usually used to correct underestimation of a test's reliability
C) used when test items are dichotomously scored (e.g., as correct or incorrect)
D) involves administering the test to a sample of examinees & calculating the average inter-item consistency

A

A) split-half reliability
B) Spearman-Brown prophecy formula
C) KR-20
D) coefficient alpha
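
A minimal Python sketch of all four estimates, using a made-up matrix of dichotomously scored items (the data and variable names are illustrative, not from a real test); with population variances, coefficient alpha and KR-20 agree exactly:

```python
import numpy as np

# rows = examinees, columns = items (1 = correct, 0 = incorrect); made-up data
scores = np.array([
    [1, 1, 1, 0, 1, 1],
    [1, 0, 1, 1, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 1, 1],
])
k = scores.shape[1]                      # number of items
total = scores.sum(axis=1)               # each examinee's total score
var_total = total.var()                  # population variance keeps alpha == KR-20

# 1) coefficient alpha: based on the average inter-item consistency
alpha = (k / (k - 1)) * (1 - scores.var(axis=0).sum() / var_total)

# 2) KR-20: the same idea with item variance = p(1 - p) for dichotomous items
p = scores.mean(axis=0)
kr20 = (k / (k - 1)) * (1 - (p * (1 - p)).sum() / var_total)

# 3) split-half: correlate scores on the odd- and even-numbered item halves...
odd, even = scores[:, 0::2].sum(axis=1), scores[:, 1::2].sum(axis=1)
r_half = np.corrcoef(odd, even)[0, 1]

# 4) ...then apply Spearman-Brown with n = 2 to correct the underestimation
# caused by halving the test: r' = n*r / (1 + (n - 1)*r)
r_full = 2 * r_half / (1 + r_half)

print(f"alpha={alpha:.3f}  KR-20={kr20:.3f}  "
      f"split-half={r_half:.3f}  Spearman-Brown={r_full:.3f}")
```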

9
Q

this method of evaluating inter-rater reliability can be used with 2 or more raters but does not take chance agreement into account, so it can overestimate reliability

A

percent agreement

10
Q

this method for evaluating inter-rater reliability is used to assess the consistency of ratings assigned by two raters when the ratings represent a nominal scale

A

Cohen’s kappa coefficient
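
A short Python sketch (with invented ratings) makes the contrast concrete: percent agreement counts every match, while Cohen's kappa subtracts the agreement two raters would reach by chance, so kappa is typically lower:

```python
from collections import Counter

# two raters assigning nominal categories to 10 cases; made-up data
rater_a = ["yes", "no", "yes", "yes", "no", "yes", "no", "yes", "yes", "yes"]
rater_b = ["yes", "no", "yes", "no", "no", "yes", "yes", "yes", "yes", "yes"]
n = len(rater_a)

# percent agreement: proportion of cases on which the raters match
p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

# chance agreement: probability both raters pick the same category by luck,
# estimated from each rater's marginal category frequencies
freq_a, freq_b = Counter(rater_a), Counter(rater_b)
p_chance = sum(freq_a[c] * freq_b[c] for c in freq_a) / n**2

# Cohen's kappa: agreement beyond chance / maximum possible beyond chance
kappa = (p_observed - p_chance) / (1 - p_chance)
print(f"percent agreement = {p_observed:.2f}, kappa = {kappa:.2f}")  # 0.80 vs 0.52
```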

11
Q

the reliability of subjective ratings can be affected by this, which occurs when 2 or more raters communicate with each other while assigning ratings and results in increased consistency (but often decreased accuracy) in ratings & an overestimate of inter-rater reliability

A

consensual observer drift

12
Q

list ways to eliminate/reduce consensual observer drift

A
  • not having raters work together
  • providing raters with adequate training
  • regularly monitoring the accuracy of raters' ratings

13
Q

list & describe the 3 factors that affect the size of the reliability coefficient

A

1) content homogeneity: tests that are homogeneous with regard to content tend to have larger reliability coefficients than heterogeneous tests, esp. for internal consistency reliability

2) range of scores: the reliability coefficient is larger when test scores are unrestricted in terms of range, which occurs when the examinees included in the sample are heterogeneous with regard to the characteristic(s) measured by the test (e.g., when the sample includes examinees who have high, moderate, and low levels of the characteristic); see the simulation after this list

3) guessing: the reliability coefficient is affected by the likelihood that test items can be answered correctly by guessing (i.e., the easier it is to choose the correct answer by guessing, the lower the reliability coefficient)

  • a true/false test is likely to be less reliable than a multiple-choice test that has 3+ answer choices
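
A quick simulation of the range-restriction effect (all numbers invented): the same test looks less reliable when the sample is restricted to a narrow slice of the characteristic being measured:

```python
import numpy as np

rng = np.random.default_rng(0)
true = rng.normal(100, 15, 5000)           # examinees' true scores
form1 = true + rng.normal(0, 6, 5000)      # two parallel forms of the test,
form2 = true + rng.normal(0, 6, 5000)      # each with independent random error

r_full = np.corrcoef(form1, form2)[0, 1]   # heterogeneous (unrestricted) sample

mask = true > 110                          # restrict the sample to high scorers
r_restricted = np.corrcoef(form1[mask], form2[mask])[0, 1]

# the full-range coefficient is near .86; the restricted one is noticeably lower
print(f"full range r = {r_full:.2f}, restricted r = {r_restricted:.2f}")
```
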
14
Q

this is used to determine which items to include in the test, involves determining each item’s difficulty level & ability to discriminate between examinees who obtain high & low total test scores, and is based on classical test theory

A

item analysis

15
Q

with regard to item analysis, what is the typical p value range for moderately difficult items

A

p values of about .30 to .70 (a p value of .50 provides maximum differentiation between high- and low-scoring examinees)

16
Q

what size p value is preferred for test items used to identify examinees who have mastered a certain level of knowledge or skill (e.g., on mastery tests)

A

lower p values

17
Q

this index indicates the difference between the percentage of examinees with high total test scores (often the top 27%) who answered the item correctly and the percentage of examinees with low total test scores (often the bottom 27%) who answered the item correctly

A

item discrimination index (D)

As an example, when 90% of examinees in the high-scoring group and 20% of examinees in the low-scoring group answered an item correctly, the item’s D value is .90 minus .20, which is .70.
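
Both indices take only a few lines to compute. This Python sketch (the response data and group sizes are illustrative assumptions) calculates p and D for a single item using the top/bottom 27% convention:

```python
import numpy as np

# 1 = answered the item correctly, 0 = incorrect, for 20 examinees
item = np.array([1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1])
# total test scores, used to form the high- and low-scoring groups
totals = np.array([38, 35, 12, 30, 29, 37, 15, 33, 10, 28,
                   36, 14, 11, 31, 27, 13, 34, 32, 16, 26])

# difficulty index p: proportion of all examinees who answered correctly
p = item.mean()

# discrimination index D: percent correct in the top group minus the bottom
# group (27% of 20 examinees is rounded here to the top and bottom 5)
order = np.argsort(totals)
low, high = order[:5], order[-5:]
D = item[high].mean() - item[low].mean()
print(f"p = {p:.2f}, D = {D:.2f}")   # p = 0.70, D = 0.80
```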

18
Q

an item’s difficulty level affects its ability to…

A

discriminate (higher levels of discrimination for items of moderate difficulty)

19
Q

this indicates the range within which an examinee's true score is likely to fall, given their obtained score

A

confidence interval

20
Q

this is used to construct a confidence interval, and is calculated by multiplying the test's standard deviation by the square root of 1 minus the reliability coefficient: SEM = SD × √(1 − r)

A

standard error of measurement

For instance, if a test has a standard deviation of 5 and a reliability coefficient of .84, its standard error of measurement equals 5 times the square root of 1 minus .84: 1 minus .84 is .16, the square root of .16 is .4, and 5 times .4 is 2.
* In other words, when a test’s standard deviation is 5 and its reliability coefficient is .84, its standard error of measurement is 2.

21
Q

constructing 68%, 95%, & 99% confidence intervals

A

68% CI: add & subtract 1 standard error of measurement to & from the obtained score
95% CI: add & subtract 2 standard errors of measurement to & from the obtained score
99% CI: add & subtract 3 standard errors of measurement to & from the obtained score

22
Q

an examinee obtained a score of 90 on a test that has a standard error of measurement of 5. Identify the 95% confidence interval for this score.

A

80 to 100

To construct the interval, add and subtract 10 (two standard errors of measurement) to and from the obtained score: 90 − 10 = 80 and 90 + 10 = 100.
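
The SEM and confidence-interval cards reduce to a few lines of arithmetic. This Python sketch (the helper functions are my own naming, not from the cards) reproduces both worked examples:

```python
import math

def sem(sd: float, reliability: float) -> float:
    """Standard error of measurement: SEM = SD * sqrt(1 - r)."""
    return sd * math.sqrt(1 - reliability)

def confidence_interval(obtained: float, sem_value: float, n_sems: int) -> tuple:
    """Add and subtract n SEMs (1 -> 68% CI, 2 -> 95% CI, 3 -> 99% CI)."""
    return (obtained - n_sems * sem_value, obtained + n_sems * sem_value)

print(sem(5, 0.84))                   # 2.0, as in the SEM card
print(confidence_interval(90, 5, 2))  # (80.0, 100.0), as in the example above
```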

23
Q

classical test theory (CTT) vs. item response theory (IRT)

A

CTT is **test based** & focuses on examinees' total test scores

IRT is **item based** & focuses on examinees' responses to individual test items

**IRT is better suited for developing computerized adaptive tests**