CHAPTER 5: RELIABILITY Flashcards
It is an index of reliability, a proportion that indicates the ratio between the true score variance on a test and the total variance.
Reliability Coefficient
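The ratio this card describes can be sketched numerically. A minimal illustration, with made-up variance figures (the function name and numbers are hypothetical, not from the source):

```python
# A minimal sketch of the reliability coefficient as the ratio of
# true score variance to total observed score variance.
def reliability_coefficient(true_variance, error_variance):
    # Total observed variance is the sum of true and error variance.
    total_variance = true_variance + error_variance
    return true_variance / total_variance

# If 80 of 100 units of score variance are true variance,
# the reliability coefficient is .80.
print(reliability_coefficient(80, 20))  # 0.8
```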
A statistic useful in describing sources of test score variability, and is the standard deviation squared.
Variance (σ²)
Variance that comes from true differences in the trait or ability being measured.
True Variance (σ²tr)
Variance from irrelevant, random sources.
Error Variance (σ²e)
It is the difference between a person’s observed score on a test and their true score. It reflects inaccuracies or inconsistencies in the testing process, such as unclear questions, environmental distractions, or test-taker factors like fatigue. It refers to, collectively, all of the factors associated with the process of measuring some variable, other than the variable being measured.
Measurement Error
It is a source of error in measuring a targeted variable caused by unpredictable fluctuations and inconsistencies of other variables in the measurement process. Sometimes referred to as “noise,” this source of error fluctuates from one testing situation to another with no discernible pattern that would systematically raise or lower scores.
Random error
It refers to a source of error in measuring a variable that is typically constant or proportionate to what is presumed to be the true value of the variable being measured.
Systematic Error
Terms that refer to variation among items within a test as well as to variation among items between tests.
Item Sampling or Content Sampling
What are the sources of error variance?
Test construction
Test administration
Test scoring and interpretation
It introduces error variance primarily through item or content sampling, meaning that variation in item wording or topic selection can influence test scores. Even tests aiming to measure the same trait or knowledge area may differ significantly depending on what content is included and how it is presented. A test taker’s performance can be boosted or hindered depending on whether the specific items align with what they know or expect. The key challenge for test developers is to minimize error variance and maximize true variance so that scores more accurately reflect the intended construct.
Test Construction
A source of error variance stemming from environmental, test taker, and examiner-related factors. Environmental conditions like room temperature, noise, or seating can distract examinees, while physical or emotional discomfort, fatigue, or even current events may impact their performance. Test taker variables such as illness, medications, or personal experiences also affect scores. Additionally, examiner behavior—such as inconsistent administration, physical cues, or personal biases—can unintentionally influence outcomes. Altogether, these factors can distort test results, making them less reflective of the true ability or trait being measured.
Test Administration
It can introduce error variance, especially in assessments requiring human judgment. While computer scoring has minimized errors for objective tests, many assessments—like intelligence, personality, creativity, and behavioral tests—still depend on trained scorers. Subjectivity can lead to variability in scoring, especially when responses fall in gray areas or when raters interpret behaviors differently. Even with detailed scoring guidelines, inconsistencies may arise due to individual differences among scorers. To reduce such errors, rigorous training and clear criteria are essential to ensure scoring reliability and fairness.
Test Scoring and Interpretation
It is an estimate of reliability obtained by correlating pairs of scores from the same people on two different administrations of the same test.
Test-retest Reliability
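The estimate on this card is simply the Pearson correlation between the same people's scores on two administrations. A minimal sketch with invented scores (the data and helper function are illustrative, not from the source):

```python
# Hand-rolled Pearson r so the example needs no external libraries.
def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

time1 = [10, 12, 9, 15, 11]  # scores at first administration (made up)
time2 = [11, 13, 9, 14, 12]  # scores of the same people at retest (made up)
r_test_retest = pearson_r(time1, time2)
print(round(r_test_retest, 2))
```

A high correlation suggests scores are stable across the two administrations; here the invented scores correlate around .93.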
It is a type of test-retest reliability estimate that reflects the consistency of test scores over a long time interval, typically more than six months.
Coefficient of Stability
It refers to an estimate of the extent to which item sampling and other errors have affected test scores on versions of the same test when, for each form of the test, the means and variances of observed test scores are equal.
Parallel Forms Reliability
The estimate of the degree of relationship between various forms of a test, evaluated by means of an alternate-forms or parallel-forms coefficient of reliability.
Coefficient of Equivalence
P_____ ____ of a test exist when, for each form of the test, the means and the variances of observed test scores are equal; such forms are strictly equivalent in means, variances, and reliability.
Parallel forms
It refers to an estimate of the extent to which these different forms of the same test have been affected by item sampling error, or other error.
Alternate Forms Reliability
These are simply different versions of a test that have been constructed so as to be parallel. Although they do not meet the requirements for the legitimate designation “parallel,” ____ forms of a test are typically designed to be equivalent with respect to variables such as content and level of difficulty. Different versions intended to be similar; may not be statistically equal.
Alternate forms
It is obtained by correlating pairs of scores obtained from two equivalent halves of a single test administered once. It is a useful measure of reliability when it is impractical or undesirable to assess reliability with two tests or to administer a test twice (because of factors such as time or expense).
Split-half Reliability
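A minimal sketch of the procedure, using an odd-even split of the items (the response matrix below is invented for illustration):

```python
# Hand-rolled Pearson r so the example needs no external libraries.
def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Rows = examinees, columns = item scores (0/1); data invented.
responses = [
    [1, 1, 1, 1, 1, 1],
    [1, 0, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0],
    [1, 1, 0, 1, 1, 0],
    [0, 0, 1, 0, 0, 1],
]
odd_half = [sum(row[0::2]) for row in responses]   # items 1, 3, 5
even_half = [sum(row[1::2]) for row in responses]  # items 2, 4, 6
r_half = pearson_r(odd_half, even_half)
print(round(r_half, 2))
```

Note that this correlation describes only half-length tests; the Spearman-Brown formula (below in this deck) is typically applied to estimate the reliability of the full-length test.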
It is a method of estimating internal consistency by correlating odd vs. even items on a test. It is a type of split-half reliability where you split the test into two halves by assigning all odd-numbered items (1, 3, 5…) to one half, and even-numbered items (2, 4, 6…) to the other half.
Odd-even Reliability
It allows a test developer or user to estimate internal consistency reliability from a correlation of two halves of a test.
Spearman–Brown formula
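The formula in its general form predicts the reliability of a test n times as long as the one actually correlated; for a split-half correlation, n = 2. A minimal sketch (function name is illustrative):

```python
# Spearman-Brown: r_SB = n * r / (1 + (n - 1) * r),
# where n is the factor by which test length changes.
def spearman_brown(r_xy, n=2.0):
    return n * r_xy / (1 + (n - 1) * r_xy)

# A half-test correlation of .70 projects a full-test
# reliability of about .82 (1.4 / 1.7).
print(round(spearman_brown(0.70), 2))  # 0.82
```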
It refers to the degree of correlation among all the items on a scale.
Inter-item Consistency
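One simple way to express "the degree of correlation among all the items" is the average pairwise inter-item correlation. A minimal sketch with an invented 0/1 response matrix (the data and helper are illustrative, not from the source):

```python
from itertools import combinations

# Hand-rolled Pearson r so the example needs no external libraries.
def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Rows = examinees, columns = items (0/1 scored); data invented.
responses = [
    [1, 1, 1],
    [1, 1, 1],
    [0, 0, 1],
    [1, 0, 1],
    [0, 0, 0],
]
items = list(zip(*responses))  # one tuple of scores per item
pairwise = [pearson_r(a, b) for a, b in combinations(items, 2)]
average_r = sum(pairwise) / len(pairwise)
print(round(average_r, 2))  # 0.56
```

The higher this average, the more homogeneously the items behave as measures of the same construct.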