Item Analysis & Test Reliability Flashcards
(26 cards)
this refers to error due to random factors that affect the test performance of examinees in unpredictable ways; examples include distractions during testing, ambiguously worded test items, and examinee fatigue
measurement error
this refers to the extent to which a test provides consistent information
test reliability
reliability coefficients are interpreted directly as…
the proportion of variability in obtained test scores that’s due to true score variability
e.g., if a test has a reliability coefficient of .80, this means that 80% of the variability in obtained test scores is due to true score variability and the remaining 20% is due to measurement error
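As a minimal Python sketch of this interpretation, using made-up numbers (the variance values are illustrative, not from a real test):
```python
# Minimal sketch: decomposing observed score variance using a
# reliability coefficient. Values are illustrative, not from a real test.

reliability = 0.80      # reliability coefficient
observed_variance = 25  # variance of obtained test scores (SD = 5)

# Reliability is interpreted directly as the proportion of observed
# variance that reflects true score variance.
true_variance = reliability * observed_variance          # 20.0
error_variance = (1 - reliability) * observed_variance   # 5.0

print(f"true: {true_variance}, error: {error_variance}")
```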
which type of tests have higher reliability coefficients: attitude tests, personality tests, or cognitive ability tests? why?
standardized cognitive ability tests, because they measure relatively stable characteristics and are objectively scored; scores on attitude and personality tests are more affected by response sets and by fluctuations in the characteristics being measured
match each description with the correct method of assessing reliability: test-retest, alternate forms, internal consistency, or inter-rater reliability
a) provides information about the consistency of scores over different test items; useful for tests that are designed to measure a single content domain or aspect of behavior
b) provides information on the consistency of scores or ratings assigned by different raters; most useful for measures that are subjectively scored
c) provides information about the consistency of scores over time; most useful for tests designed to measure a characteristic that’s stable over time
d) provides information about the consistency of scores over different forms of the test and, when the second form is administered at a later time, the consistency of scores over time; most useful/important whenever a test has more than one form
a) internal consistency reliability
b) inter-rater reliability
c) test-retest reliability
d) alternate forms reliability
what tests is internal consistency reliability not useful for and why?
speed tests (tests that measure speed of performance rather than knowledge or skill level)
* because it tends to overestimate their reliability
list 4 methods of evaluating internal consistency
1) coefficient alpha (aka Cronbach’s alpha)
2) Kuder-Richardson 20 (KR-20)
3) split-half reliability
4) Spearman-Brown prophecy formula
match the description of methods used to evaluate internal consistency with the correct name of the method: coefficient alpha, KR-20, split-half reliability, or Spearman-Brown
A) involves administering the test to a sample of examinees, splitting the test in half (often in terms of even- and odd-numbered items), and correlating the scores on the two halves
B) used to determine the effects of lengthening or shortening a test on its reliability coefficient; usually used to correct the split-half reliability coefficient, which underestimates reliability because each half contains fewer items than the full test
C) used when test items are dichotomously scored (e.g., as correct or incorrect)
D) involves administering the test to a sample of examinees & calculating the average inter-item consistency
A) split-half reliability
B) KR-20
C) Spearman-Brown prophecy formula
D) coefficient alpha
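As a rough Python sketch of how three of these methods are computed, the following applies the standard textbook formulas to a made-up matrix of dichotomously scored items (rows = examinees, columns = items); because the items are scored 0/1, coefficient alpha here coincides with KR-20:
```python
# Sketch of internal consistency computations on made-up 0/1 item scores.
import numpy as np

scores = np.array([
    [1, 1, 1, 0, 1, 1],
    [1, 0, 1, 1, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 1, 1],
])
n_items = scores.shape[1]
totals = scores.sum(axis=1)

# Coefficient alpha: based on the ratio of summed item variances to the
# variance of total scores (equivalent to KR-20 for dichotomous items).
alpha = (n_items / (n_items - 1)) * (
    1 - scores.var(axis=0, ddof=1).sum() / totals.var(ddof=1)
)

# Split-half: correlate odd- and even-numbered item halves, then apply
# the Spearman-Brown prophecy formula to correct for halved test length.
odd, even = scores[:, 0::2].sum(axis=1), scores[:, 1::2].sum(axis=1)
r_half = np.corrcoef(odd, even)[0, 1]
split_half = (2 * r_half) / (1 + r_half)  # Spearman-Brown correction

print(f"alpha/KR-20: {alpha:.2f}, split-half (corrected): {split_half:.2f}")
```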
this method of evaluating inter-rater reliability can be calculated for 2 or more raters, but will not take chance agreement into account and can result in an overestimate of reliability
percent agreement
this method for evaluating inter-rater reliability is used to assess the consistency of ratings assigned by two raters when the ratings represent a nominal scale
Cohen’s kappa coefficient
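As a minimal Python sketch contrasting the two inter-rater methods, using made-up ratings from two raters; kappa applies the standard correction for chance agreement based on each rater’s marginal frequencies:
```python
# Sketch contrasting percent agreement with Cohen's kappa for two raters
# assigning nominal categories. Ratings are made up for illustration.
from collections import Counter

rater_a = ["yes", "yes", "no", "yes", "no", "yes", "no", "yes", "yes", "no"]
rater_b = ["yes", "no",  "no", "yes", "no", "yes", "yes", "yes", "yes", "no"]
n = len(rater_a)

# Percent agreement ignores agreement expected by chance alone.
p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

# Chance agreement: the probability both raters assign the same category
# by chance, summed over categories (based on each rater's marginals).
count_a, count_b = Counter(rater_a), Counter(rater_b)
p_chance = sum(count_a[c] * count_b[c] for c in count_a) / n**2

# Cohen's kappa: observed agreement corrected for chance agreement.
kappa = (p_observed - p_chance) / (1 - p_chance)

print(f"percent agreement: {p_observed:.2f}, kappa: {kappa:.2f}")
```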
the reliability of subjective ratings can be affected by this, which occurs when 2 or more raters communicate with each other while assigning ratings and results in increased consistency (but often decreased accuracy) in ratings & an overestimate of inter-rater reliability
consensual observer drift
list ways to eliminate/reduce consensual observer drift
- not having raters work together
- providing raters with adequate training
- regularly monitoring the accuracy of raters’ ratings
list & describe the 3 factors that affect the size of the reliability coefficient
1) content homogeneity: tests that are homogenous with regard to content tend to have larger reliability coefficients than heterogeneous tests, esp. for internal consistency reliability
2) range of scores: larger when test scores are unrestricted in terms of range, which occurs when the examinees included in the sample are heterogeneous with regard to the characteristic(s) measured by the test (e.g., when the sample includes examinees who have high, moderate, and low levels of the characteristic)
3) guessing: affected by the likelihood that test items can be answered correctly by guessing (i.e., the easier it is to choose the correct answer by guessing, the lower the reliability coefficient)
- a true/false test is likely to be less reliable than a multiple-choice test that has 3+ answer choices
this is used to determine which items to include in the test, involves determining each item’s difficulty level & ability to discriminate between examinees who obtain high & low total test scores, and is based on classical test theory
item analysis
with regard to item analysis, what is the typical range of p values (item difficulty indexes) for moderately difficult items?
.30 to .70
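An item’s p value is simply the proportion of examinees who answered the item correctly; as a minimal Python sketch with made-up item responses:
```python
# Sketch of the item difficulty index: p is the proportion of examinees
# who answered the item correctly. Responses are made up for illustration.
item_responses = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1]  # 1 = correct, 0 = incorrect

p = sum(item_responses) / len(item_responses)
print(f"p = {p:.2f}")  # 0.70: within the moderately difficult range
```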
what size p value is preferred for test items used to identify examinees who have mastered a certain level of knowledge or skill (e.g., on mastery tests)?
higher p values (most examinees who have mastered the material are expected to answer these items correctly)
this index indicates the difference between the percentage of examinees with high total test scores (often the top 27%) who answered the item correctly and the percentage of examinees with low total test scores (often the bottom 27%) who answered the item correctly
item discrimination index (D)
As an example, when 90% of examinees in the high-scoring group and 20% of examinees in the low-scoring group answered an item correctly, the item’s D value is .90 minus .20, which is .70.
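The same arithmetic as a tiny Python sketch, reusing the percentages from this example:
```python
# Sketch of the item discrimination index: D = p(high group) - p(low group).
p_high = 0.90  # proportion of high scorers (top ~27%) answering correctly
p_low = 0.20   # proportion of low scorers (bottom ~27%) answering correctly

D = p_high - p_low
print(f"D = {D:.2f}")  # 0.70: a highly discriminating item
```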
an item’s difficulty level affects its ability to…
discriminate (higher levels of discrimination for items of moderate difficulty)
this indicates the range within which an examinee’s true score is likely to fall, given their obtained score
confidence interval
this is used to construct a confidence interval and is calculated by multiplying the test’s standard deviation by the square root of 1 minus the reliability coefficient: SEM = SD√(1 − r)
standard error of measurement
For instance, if a test has a standard deviation of 5 and a reliability coefficient of .84, its standard error of measurement equals 5 times the square root of 1 minus .84: 1 minus .84 is .16, the square root of .16 is .4, and 5 times .4 is 2.
* In other words, when a test’s standard deviation is 5 and its reliability coefficient is .84, its standard error of measurement is 2.
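The same computation as a short Python sketch, reusing the numbers from this example:
```python
# Sketch of the standard error of measurement: SEM = SD * sqrt(1 - r).
import math

sd = 5            # test's standard deviation
reliability = 0.84

sem = sd * math.sqrt(1 - reliability)
print(f"SEM = {sem}")  # 2.0
```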
constructing a 68%, 95%, & 99% confidence interval
68% CI: add & subtract 1 standard error of measurement to & from the obtained score
95% CI: add & subtract 2 standard errors of measurement to & from the obtained score
99% CI: add & subtract 3 standard errors of measurement to & from the obtained score
an examinee obtained a score of 90 on a test that has a standard error of measurement of 5. Identify the 95% confidence interval for this score.
80 to 100
(add and subtract 2 standard errors of measurement, 2 × 5 = 10, to and from the obtained score: 90 − 10 = 80 and 90 + 10 = 100)
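As a short Python sketch constructing all three confidence intervals for this example (obtained score 90, SEM 5):
```python
# Sketch: build 68%, 95%, and 99% confidence intervals around an obtained
# score by adding/subtracting 1, 2, or 3 standard errors of measurement.
obtained_score = 90
sem = 5

for n_sems, level in [(1, 68), (2, 95), (3, 99)]:
    lower = obtained_score - n_sems * sem
    upper = obtained_score + n_sems * sem
    print(f"{level}% CI: {lower} to {upper}")
# 95% CI: 80 to 100
```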