Reliability of a Test Flashcards

1
Q

dependability or consistency of the instrument or scores obtained by the same person when re-examined with the same test on different occasions, or with different sets of equivalent items

A

Reliability

2
Q

index of reliability, a proportion that indicates the ratio between the true score variance on a test and the total variance

A

Reliability Coefficient

3
Q

score on an ability test is presumed to reflect not only the testtaker’s true score on the ability being measured but also the error

A

Classical Test Theory (True Score Theory)

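The true-score model above can be sketched numerically. A minimal simulation (all distribution parameters hypothetical) showing that the reliability coefficient from the previous card equals the ratio of true-score variance to total observed variance:

```python
import numpy as np

# Classical Test Theory sketch: each observed score X is assumed to be
# the sum of a true score T and random error E, i.e. X = T + E.
rng = np.random.default_rng(0)
true_scores = rng.normal(loc=100, scale=15, size=10_000)  # T
errors = rng.normal(loc=0, scale=5, size=10_000)          # E
observed = true_scores + errors                           # X = T + E

# Reliability coefficient: true-score variance / total variance.
# With these parameters it should land near 15**2 / (15**2 + 5**2) = 0.90.
reliability = true_scores.var() / observed.var()
```

Because the sample covariance of T and E is not exactly zero, the computed value hovers around 0.90 rather than hitting it exactly.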
4
Q

refers to the component of the observed test score that does not have to do with the testtaker’s ability

A

Error

5
Q

Factors that contribute to consistency

A

stable attributes

6
Q

Factors that contribute to inconsistency

A

characteristics of the individual, test, or situation, which have nothing to do with the attribute being measured, but still affect the scores

7
Q

Goals of Reliability:

A

EEDT
✓ Estimate errors
✓ Devise techniques to improve testing and reduce errors

8
Q

useful in describing sources of test score variability

A

Variance

9
Q

variance from true differences

A

True Variance

10
Q

variance from irrelevant random sources

A

Error Variance

11
Q

all of the factors associated with the process of measuring some variable, other than the variable being measured

A

Measurement Error

12
Q
  • difference between the observed score and the true score
A

Measurement Error

13
Q

Source of error variance that refers to variation among items within a test, as well as to variation among items between tests

  • The extent to which a testtaker’s score is affected by the content sampled on a test, and by the way the content is sampled, is a source of error variance
A

Item Sampling/Content Sampling

14
Q

Source of error variance involving the testtaker’s motivation or attention, the testing environment, and similar conditions during administration

A

Test Administration

15
Q

Source of error variance that can be minimized by employing objective-type items amenable to computer scoring of well-documented reliability

A

Test Scoring and Interpretation

16
Q

source of error in measuring a targeted variable caused by unpredictable fluctuations and inconsistencies of other variables in measurement process (e.g., noise, temperature, weather)

A

Random Error

17
Q

source of error in measuring a variable that is typically constant or proportionate to what is presumed to be the true value of the variable being measured

  • has a consistent effect on the true score: the SD does not change, but the mean does

A

Systematic Error

18
Q

________ refers to the proportion of total variance attributed to true variance

A

Reliability

19
Q

The _____ the proportion of the total variance attributed to true variance, the ________ the test

A

greater - more reliable

20
Q

___________ may increase or decrease a test score by varying amounts; consequently, the consistency of the test score, and thus the reliability, can be affected

A

Error variance

21
Q

Error: Time Sampling

A

Test-Retest Reliability

22
Q

an estimate of reliability obtained by correlating pairs of scores from the same people on two different administrations of the test

A

Test-Retest Reliability

23
Q

appropriate when evaluating the reliability of a test that purports to measure an enduring and stable attribute, such as a personality trait

  • established by comparing the scores obtained from two successive measurements of the same individuals and calculating a correlation between the two sets of scores
A

Test-Retest Reliability

24
Q

the more time that passes between administrations, the greater the likelihood that the reliability coefficient will be insignificant

A

Test-Retest Reliability

25
Q

occurs when the test-retest interval is short and the second administration is influenced by the first, because testtakers remember or have practiced the previous test, producing an inflated correlation (overestimation of reliability)

A

Carryover Effects

26
Q

scores on the second session are higher due to the testtakers’ experience of the first session of testing

A

Practice Effect

27
Q

test-retest with a ______ interval might be affected by other extraneous factors, thus resulting in a _____ correlation

A

longer - low

28
Q

problem of testtakers absent from the second session (simply remove the absent testtakers’ first tests from the analysis)

A

Mortality

29
Q

statistical tools for Test-Retest Reliability

A

Pearson R, Spearman Rho

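As a sketch with hypothetical scores, the test-retest coefficient is simply the Pearson correlation between the same people's scores on the two administrations:

```python
import numpy as np

# Test-retest reliability sketch: correlate scores from two
# administrations of the same test to the same people.
# The score values below are hypothetical.
time1 = np.array([12, 15, 11, 18, 9, 14, 16, 10])   # first administration
time2 = np.array([13, 14, 12, 17, 10, 15, 15, 11])  # retest

r_tt = np.corrcoef(time1, time2)[0, 1]  # Pearson r as the reliability estimate
```

A high r_tt indicates stable scores across the interval; Spearman rho could be substituted when the data are ordinal.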
30
Q

Error: Item Sampling (immediate), Item Sampling changes over time (delayed)

A

Parallel Forms/Alternate Forms Reliability

31
Q

established when at least two different versions of the test yield almost the same scores

  • has the most universal applicability
A

Parallel Forms/Alternate Forms Reliability

32
Q

for each form of the test, the means and the variances are EQUAL; same items, different positioning/numbering

A

Parallel Forms

33
Q

simply a different version of a test that has been constructed so as to be parallel

A

Alternate Forms

34
Q
  • technique to avoid carryover effects for parallel forms, by administering the forms in a different sequence to different groups
  • can be administered on the same day or at different times
A

Counterbalancing

35
Q

most rigorous and burdensome approach, since test developers must create two forms of the test

  • main problem: differences between the two tests
  • test scores may be affected by motivation, fatigue, or intervening events
  • the means and the variances of the observed scores must be equal for the two forms
  • Statistical Tools: Pearson r or Spearman rho

A

Parallel Forms/Alternate Forms Reliability

36
Q

used when tests are administered once

  • consistency among items within the test
  • measures the internal consistency of the test, which is the degree to which each item measures the same construct

A

Internal Consistency (Inter-Item Reliability)

36
Q

Error: Item Sampling Homogeneity

A

Internal Consistency (Inter-Item Reliability)

37
Q

measurement for unstable traits

  • if all items measure the same construct, then it has a good internal consistency
A

Internal Consistency (Inter-Item Reliability)

38
Q

if a test contains items that measure a single trait (unifactorial)

A

Homogeneity

39
Q

degree to which a test measures different factors (more than one factor/trait)

A

Heterogeneity
40
Q

______ homogenous = _____ inter-item consistency

A

more - higher

41
Q

used for the inter-item consistency of dichotomous items (intelligence tests, personality tests with yes/no options, multiple choice); unequal variances, dichotomously scored

A

KR-20

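The KR-20 computation can be sketched directly from a matrix of dichotomous item scores (the data below are hypothetical): KR-20 = (k / (k - 1)) * (1 - sum(p*q) / total_var), where p and q are the proportions passing and failing each item.

```python
import numpy as np

# KR-20 sketch for dichotomously scored items.
# Rows = testtakers, columns = items; 1 = correct, 0 = incorrect (made-up data).
X = np.array([
    [1, 1, 1, 0, 1],
    [1, 0, 1, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [1, 1, 0, 1, 1],
    [1, 0, 1, 0, 1],
])
k = X.shape[1]                          # number of items
p = X.mean(axis=0)                      # proportion passing each item
q = 1 - p                               # proportion failing each item
total_var = X.sum(axis=1).var(ddof=1)   # variance of total scores
kr20 = (k / (k - 1)) * (1 - (p * q).sum() / total_var)
```

Higher values indicate more homogeneous items; note that KR-20 does not require the items to have equal variances (that stricter assumption belongs to KR-21).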
42
Q

if all the items have the same degree of difficulty (e.g., speed tests); equal variances, dichotomously scored

A

KR-21

43
Q

used when the two halves of the test have unequal variances and on tests containing non-dichotomous items

A

Cronbach’s Coefficient Alpha

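Coefficient alpha generalizes KR-20 to non-dichotomous items; it replaces sum(p*q) with the sum of the item variances. A minimal sketch with made-up Likert-style data:

```python
import numpy as np

# Cronbach's coefficient alpha sketch for non-dichotomous items.
# Rows = testtakers, columns = items (hypothetical 1-5 ratings).
X = np.array([
    [4, 5, 4, 3],
    [2, 3, 2, 2],
    [5, 5, 4, 5],
    [3, 3, 3, 2],
    [4, 4, 5, 4],
])
k = X.shape[1]
item_vars = X.var(axis=0, ddof=1).sum()   # sum of individual item variances
total_var = X.sum(axis=1).var(ddof=1)     # variance of total scores
alpha = (k / (k - 1)) * (1 - item_vars / total_var)
```

When the items are dichotomous, this formula reduces to KR-20, since the variance of a 0/1 item is exactly p*q.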
44
Q

measure used to evaluate the internal consistency of a test that focuses on the degree of difference that exists between item scores

A

Average Proportional Distance

45
Q

obtained by correlating two pairs of scores obtained from equivalent halves of a single test administered ONCE

A

Split Half Reliability

45
Q

Error: Item Sampling: Nature of the Split

A

Split-Half Reliability

46
Q

useful when it is impractical or undesirable to assess reliability with two tests or to administer a test twice

  • cannot simply divide the items in the middle, because doing so might spuriously raise or lower the reliability coefficient; instead, randomly assign items, or assign odd-numbered items to one half and even-numbered items to the other half
A

Split-Half Reliability

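The odd-even split described above can be sketched as follows (item data hypothetical): correlate the two half-test scores, then correct the result to full-test length with the Spearman-Brown formula.

```python
import numpy as np

# Split-half reliability sketch from a single administration.
# Rows = testtakers, columns = items; made-up dichotomous data.
X = np.array([
    [1, 0, 1, 1, 0, 1],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 1, 1],
    [0, 1, 0, 1, 0, 0],
    [1, 1, 1, 0, 1, 1],
])
odd = X[:, 0::2].sum(axis=1)    # scores on items 1, 3, 5
even = X[:, 1::2].sum(axis=1)   # scores on items 2, 4, 6
r_half = np.corrcoef(odd, even)[0, 1]

# Spearman-Brown correction: each half is only half the test's length,
# so the half-test correlation underestimates full-test reliability.
r_full = 2 * r_half / (1 + r_half)
```

The correction is needed because, all else equal, shorter tests are less reliable than longer ones.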
47
Q

allows a test developer or user to estimate internal consistency reliability from a correlation of two halves of a test, as if each half had been the length of the whole test and had equal variances

A

Spearman-Brown Formula

48
Q

estimates how many more items are needed in order to achieve the target reliability

A

Spearman-Brown Prophecy Formula

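Both uses of the Spearman-Brown formula can be sketched in a few lines: correcting a half-test correlation to full-test length, and "prophesying" how much longer a test must be to reach a target reliability (the input values are hypothetical).

```python
# Spearman-Brown sketch: r_new = n * r / (1 + (n - 1) * r),
# where n is the factor by which the test is lengthened.

def spearman_brown(r: float, n: float = 2.0) -> float:
    """Estimated reliability after lengthening a test by factor n.
    n = 2 corrects a split-half correlation to full-test length."""
    return n * r / (1 + (n - 1) * r)

def prophecy_n(r_current: float, r_target: float) -> float:
    """Lengthening factor needed to raise reliability to r_target."""
    return r_target * (1 - r_current) / (r_current * (1 - r_target))

r_full = spearman_brown(0.70)       # half-test r of .70 corrected upward
n_needed = prophecy_n(0.70, 0.90)   # multiply the item count by this factor
```

Multiplying the current number of items by n_needed (and rounding up) gives the estimated test length required for the target reliability, assuming the new items are comparable to the old ones.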
49
Q

counterpart of the Spearman-Brown formula, which uses the ratio of the variance of the differences between the odd and even splits to the variance of the total, combined odd-even score

  • if the reliability of the original test is relatively low, the developer could create new items, clarify test instructions, or simplify the scoring rules
  • equal variances, dichotomously scored
A

Rulon’s Formula

50
Q

Error: Scorer Difference

A

Inter-Scorer Reliability

51
Q

the degree of agreement or consistency between two or more scorers with regard to a particular measure

  • used for coding nonverbal behavior
  • observer differences
A

Inter-Scorer Reliability

52
Q

determines the level of agreement between TWO or MORE raters when the method of assessment is measured on a CATEGORICAL SCALE

A

Fleiss Kappa

53
Q

two raters only

A

Cohen’s Kappa

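For the two-rater case, Cohen's kappa corrects observed agreement for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). A minimal sketch with hypothetical categorical codes:

```python
import numpy as np

# Cohen's kappa sketch: two raters assign categorical codes
# to the same eight cases (made-up data).
rater1 = np.array(["A", "A", "B", "B", "A", "C", "B", "A"])
rater2 = np.array(["A", "B", "B", "B", "A", "C", "B", "A"])

p_o = (rater1 == rater2).mean()   # observed proportion of agreement
categories = np.union1d(rater1, rater2)
# Chance agreement: product of each rater's marginal proportions, summed.
p_e = sum((rater1 == c).mean() * (rater2 == c).mean() for c in categories)
kappa = (p_o - p_e) / (1 - p_e)
```

Kappa is 1 for perfect agreement and 0 when agreement is no better than chance; Fleiss's kappa extends the same idea to more than two raters.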
54
Q

two or more raters; based on observed disagreement corrected for disagreement expected by chance

A

Krippendorff’s Alpha

55
Q

Tests designed to measure one factor _____ are expected to have _____ of internal consistency and vice versa

A

(Homogenous) - high degree

56
Q

trait, state, or ability presumed to be ever-changing as a function of situational and cognitive experience

A

Dynamic

57
Q

barely changing or relatively unchanging

A

Static

58
Q

if the variance of either variable in a correlational analysis is restricted by the sampling procedure used, then the resulting correlation coefficient tends to be lower

A

Restriction of range or Restriction of variance

59
Q

when the time limit is long enough to allow testtakers to attempt all items

A

Power Tests

60
Q

generally contains items of a uniform level of difficulty with a time limit

A

Speed Tests

61
Q

Reliability should be based on performance from two independent testing periods using _______ and _________ or split-half-reliability

A

test-retest - alternate-forms

62
Q

designed to provide an indication of where a testtaker stands with respect to some variable or criterion

A

Criterion-Referenced Tests

63
Q

As individual differences ______, a traditional measure of reliability would also_______, regardless of the stability of individual performance

A

decrease - decrease

64
Q

everyone has a “true score” on a test

A

Classical Test Theory

65
Q

genuinely reflects an individual’s ability level as measured by a particular test

A

True Score

66
Q

estimate the extent to which specific sources of variation under defined conditions are contributing to the test scores

A

Domain Sampling Theory

67
Q

_______ is conceived of as an objective measure of how precisely the test score assesses the domain from which the test draws a sample

A

Test reliability

68
Q

based on the idea that a person’s test scores vary from testing to testing because of the variables in the testing situations

A

Generalizability Theory

69
Q

test situation

A

Universe

70
Q

number of items in the test, amount of review, and the purpose of test administration

A

Facets

71
Q

According to ____________, given the exact same conditions of all the facets in the universe, the exact same test score should be obtained (Universe score)

A

Generalizability Theory

72
Q

developers examine the usefulness of test scores in helping the test user make decisions

A

Decision Study

73
Q

the probability that a person with X ability will be able to perform at a level of Y in a test

Focus: item difficulty

A

Item Response Theory

74
Q

a system of assumptions about measurement and the extent to which each item measures the trait

A

Latent-Trait Theory

75
Q

The ______ is used to focus on the range of item difficulty that helps assess an individual’s ability level

A

computer

76
Q

attribute of not being easily accomplished, solved, or comprehended

A

Difficulty

77
Q

degree to which an item differentiates among people with higher or lower levels of the trait, ability etc.

A

Discrimination

78
Q

can be answered with only one of two alternative responses

A

Dichotomous

79
Q

3 or more alternative responses

A

Polytomous

80
Q

provides a measure of the precision of an observed test score

A

Standard Error of Measurement

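The standard error of measurement follows directly from the test's standard deviation and reliability coefficient: SEM = SD * sqrt(1 - r). A sketch with hypothetical values:

```python
import math

# Standard error of measurement sketch.
# SD and reliability below are hypothetical (IQ-style scale).
sd = 15.0
reliability = 0.91
sem = sd * math.sqrt(1 - reliability)  # = 15 * sqrt(0.09) = 4.5
```

As the reliability approaches 1, the SEM approaches 0, which is the relationship the next cards describe.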
81
Q

Standard deviation of errors as the ________ of error

A

basic measure

82
Q

Index of the amount of inconsistency, or the amount of the ______ error, in an individual’s score

A

expected

83
Q

The higher the reliability, the ______

A

lower Standard Error of Measurement

84
Q

a range or band of test scores that is likely to contain true scores

A

Confidence Interval

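A confidence interval around an observed score is built from the SEM; for a 95% interval the observed score is bracketed by 1.96 SEMs on each side. A sketch with hypothetical values:

```python
import math

# 95% confidence interval sketch around an observed test score.
# Observed score, SD, and reliability below are hypothetical.
observed = 110.0
sd, reliability = 15.0, 0.91
sem = sd * math.sqrt(1 - reliability)          # = 4.5
low = observed - 1.96 * sem                    # lower bound
high = observed + 1.96 * sem                   # upper bound
# 95% CI: 110 +/- 1.96 * 4.5, i.e. roughly (101.18, 118.82)
```

Using a smaller multiplier (e.g., 1.0 for roughly 68% confidence) narrows the band, which is the range-versus-confidence trade-off noted in card 87.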
85
Q

can aid a test user in determining how large a difference should be before it is considered statistically significant

A

Standard Error of the Difference

86
Q

refers to the standard error of the difference between the predicted and observed values

A

Standard Error of Estimate

87
Q

a range or band of test scores that is likely to contain the true score

Tells us the relative position of the true score within the specified range and confidence level

The larger the range, the higher the confidence

A

Confidence Interval

88
Q

If the reliability is low, you can increase the number of _____ or use factor analysis and item analysis to increase internal consistency

A

items

89
Q

the nature of the test will often determine the reliability metric

A

Reliability Estimates

90
Q

detects true positive

A

Test Sensitivity

91
Q

detects true negative

A

Test Specificity

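Sensitivity and specificity can be computed directly from the four hit-and-miss outcomes listed in the following cards (the counts below are hypothetical):

```python
# Sensitivity/specificity sketch from hypothetical screening outcomes.
tp, fn = 40, 10   # true positives, false negatives (Type 2 errors)
tn, fp = 35, 15   # true negatives, false positives (Type 1 errors)

sensitivity = tp / (tp + fn)   # proportion of actual successes detected
specificity = tn / (tn + fp)   # proportion of actual failures detected
```

A test can trade one against the other: lowering the cutoff score raises sensitivity but typically lowers specificity, and vice versa.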
92
Q

proportion of the population that actually possess the characteristic of interest

A

Base Rate

93
Q

no. of available positions compared to the no. of applicants

A

Selection ratio

94
Q

one of the Four Possible Hit and Miss Outcomes – predicted success that does occur

A

True Positives (Sensitivity)

95
Q

one of the Four Possible Hit and Miss Outcomes – predicted failure that does occur

A

True Negatives (Specificity)

96
Q

one of the Four Possible Hit and Miss Outcomes – predicted success that does not occur

A

False Positive (Type 1)

97
Q

one of the Four Possible Hit and Miss Outcomes – predicted failure, but success occurs

A

False Negative (Type 2)