Ch. 5 - Reliability Flashcards

1
Q

alternate forms

A

different versions of the same test or measure;

contrast with parallel forms

2
Q

alternate-forms reliability

A

estimate of the extent to which item sampling and other errors have affected scores on two versions of the same test;

contrast with parallel-forms reliability

3
Q

bias

A

a factor inherent within a test that systematically prevents accurate, impartial measurement

4
Q

classical test theory (CTT)

A

aka ‘true score theory / model’ …

system of assumptions about measurement that includes the notion that a test score (and even a response to an individual item) is composed of a relatively stable component that actually is what the test or individual item is designed to measure, as well as a component that is error.

5
Q

coefficient alpha

A

aka ‘Cronbach’s alpha’ and alpha…

a statistic widely employed in test construction and used to assist in deriving an estimate of reliability

more technically, equal to the mean of all possible split-half reliabilities
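
As a rough illustration, coefficient alpha is usually computed from item and total-score variances rather than by averaging split-half correlations. The sketch below assumes that standard variance-based formula; the function name `cronbach_alpha` and the sample data are hypothetical, not from the chapter.

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """Estimate coefficient alpha.

    item_scores: list of rows, one row per test-taker,
    each row a list of that person's item scores.
    """
    k = len(item_scores[0])                       # number of items
    columns = list(zip(*item_scores))             # one tuple per item
    item_vars = [variance(col) for col in columns]
    total_var = variance([sum(row) for row in item_scores])
    # alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical data: 5 test-takers, 3 items
scores = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 1],
    [5, 4, 5],
    [3, 3, 4],
]
alpha = cronbach_alpha(scores)
```

Values near 1 suggest the items vary together; the example needs at least two items and some spread in total scores, otherwise the formula divides by zero.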

6
Q

coefficient of equivalence

A

an estimate of parallel-forms reliability or alternate-forms reliability

7
Q

coefficient of generalizability

A

index of the influence that particular facets have on a test score

8
Q

coefficient of inter-scorer reliability

A

determines the degree of consistency among scorers in the scoring of a test

9
Q

coefficient of stability

A

estimate of test-retest reliability obtained during time intervals of six months or longer

10
Q

confidence interval

A

range or band of test scores that is likely to contain the “true score”
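
One common way to build such a band is observed score plus or minus a z multiple of the standard error of measurement. The sketch below assumes that approach, with SEM taken as SD × sqrt(1 − reliability); the function names and the sample values (score 100, SD 15, reliability .91) are hypothetical.

```python
import math

def standard_error_of_measurement(sd, reliability):
    # SEM = SD * sqrt(1 - reliability coefficient)
    return sd * math.sqrt(1 - reliability)

def confidence_interval(observed, sd, reliability, z=1.96):
    """Band around an observed score; z=1.96 gives roughly 95% confidence."""
    margin = z * standard_error_of_measurement(sd, reliability)
    return observed - margin, observed + margin

low, high = confidence_interval(observed=100, sd=15, reliability=0.91)
```

With these numbers the SEM is about 4.5, so the band runs from roughly 91 to 109.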

11
Q

content sampling

A

variety of the subject matter contained in the items;

aka item sampling, in context of variation between individual test items in a test or between test items in two or more tests

12
Q

criterion-referenced test

A

aka ‘domain-referenced testing’ and ‘content-referenced testing’

method of evaluation and a way of deriving meaning from test scores by evaluating an individual’s score with reference to a set standard (or criterion)

contrast with norm-referenced testing and assessment

13
Q

decision study

A

conducted at the conclusion of a generalizability study, this research is designed to explore the utility and value of test scores in making decisions

14
Q

dichotomous test item

A

test item or question that can be answered with only one of two response options (true/false, yes/no)

15
Q

discrimination

A

in IRT, degree to which an item differentiates among people with higher or lower levels of the trait, ability, or whatever it is that is being measured by a test

16
Q

domain sampling theory

A

a system of assumptions about measurement that includes the notion that a test score (and even response to an individual item) consists of a relatively stable component that actually is what the test or individual item is designed to measure as well as relatively unstable components that collectively can be accounted for as error

17
Q

dynamic characteristic

A

a trait, state, or ability presumed to be ever-changing as a function of situational and cognitive experiences

contrast with static characteristic

18
Q

error variance

A

in true score model…

component of variance attributable to random sources irrelevant to the trait or ability the test purports to measure in an observed score or distribution of scores

common sources of error variance include those related to test construction (including item or content sampling), test administration, and test scoring and interpretation

19
Q

estimate of inter-item consistency

A

an estimate of the reliability of a test obtained from a measure of inter-item consistency

20
Q

facet

A

in generalizability theory…

variables of interest in the universe including number of items in the test, amount of training the test scorers have had, purpose of the test administration, etc

21
Q

generalizability theory

A

aka domain sampling theory

system of assumptions about measurement that includes the notion that a test score (and response) consists of a relatively stable component that actually is what the test or individual item is designed to measure as well as relatively unstable components that collectively can be accounted for as error

22
Q

generalizability study

A

in context of generalizability theory…

research conducted to explore the impact of different facets of the universe on a test score

23
Q

heterogeneity

A

more generally, having diverse contents

heterogeneous test measures multiple factors

24
Q

homogeneity

A

describes degree to which a test measures a single trait

25
Q

inflation of range/variance

A

a reference to a phenomenon associated with reliability estimates wherein the variance of either variable in a correlational analysis is inflated by the sampling procedure used and so the resulting correlation coefficient tends to be higher

contrast with restriction of range

26
Q

information function

A
27
Q

inter-item consistency

A

the consistency or homogeneity of the items of a test, estimated by techniques such as the split-half method

28
Q

internal consistency estimate of reliability

A

an estimate of the reliability of a test obtained from a measure of inter-item consistency

29
Q

inter-scorer reliability

A

aka inter-rater reliability, observer reliability, judge reliability, and scorer reliability

an estimate of the degree of agreement or consistency between two or more scorers (or judges, raters, observers)

30
Q

item response theory (IRT)

A

aka latent-trait theory / model

system of assumptions about measurement (including assumption that a trait being measured by a test is unidimensional) and the extent to which each test item measures the trait

31
Q

item sampling

A

aka content sampling

variety of the subject matter contained in the items

frequently referred to in the context of the variation between individual test items in a test or between test items in two or more tests

32
Q

latent-trait theory

A

aka latent-trait model

system of assumptions about measurement, including the assumption that a trait being measured by a test is unidimensional, and the extent to which each test item measures the trait

33
Q

measurement error

A

refers to the inherent uncertainty associated with any measurement, even after care has been taken to minimize preventable mistakes

34
Q

odd-even reliability

A

estimate of split-half reliability of a test, obtained by assigning odd-numbered items to one half of the test and even-numbered items to the other half
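
The procedure can be sketched as follows: total the odd-numbered and even-numbered items separately, correlate the two half-test totals, and (an added step, assumed here because it is the usual practice) adjust the half-test correlation with the Spearman-Brown formula for a test twice as long. Function names and data are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def odd_even_reliability(item_scores):
    """Return (half-test correlation, Spearman-Brown adjusted estimate)."""
    odd_totals = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_totals = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    r_half = pearson_r(odd_totals, even_totals)
    r_full = 2 * r_half / (1 + r_half)  # Spearman-Brown for doubled length
    return r_half, r_full

# Hypothetical data: 5 test-takers, 4 dichotomous items (1 = correct)
responses = [
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
r_half, r_full = odd_even_reliability(responses)
```

The adjusted value is higher than the raw half-test correlation because each half is only half as long as the full test.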

35
Q

parallel forms

A

two or more versions or forms of the same test where, for each form, the means and variances of observed test scores are equal

contrast with alternate forms

36
Q

parallel-forms reliability

A

an estimate of the extent to which item sampling and other errors have affected test scores on two versions of the same test when, for each form of the test, the means and variances of observed test scores are equal

contrast with alternate-forms reliability

37
Q

polytomous test item

A

a test item or question with three or more alternative responses, where only one alternative is scored correct or scored as being consistent with a targeted trait or other construct

38
Q

power test

A

a test, usually of achievement or ability, with 1) either no time limit or such a long time limit that all test-takers can attempt all items and 2) some items so difficult that no test-taker can obtain a perfect score

contrast with speed test

39
Q

random error

A

a source of error in measuring a targeted variable, caused by unpredictable fluctuations and inconsistencies of other variables in the measurement process

contrast with systematic error

40
Q

Rasch model

A

reference to an IRT model with very specific assumptions about the underlying distribution
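
For orientation, the Rasch model is commonly written as a logistic function of the gap between a person's trait level and an item's difficulty. The sketch below assumes that standard one-parameter form; the names `theta` (trait level) and `b` (item difficulty) are conventional labels, not taken from the card.

```python
import math

def rasch_probability(theta, b):
    """Probability of a correct (keyed) response under the Rasch model.

    theta: test-taker's trait level; b: item difficulty (same scale).
    """
    return 1.0 / (1.0 + math.exp(-(theta - b)))

p_matched = rasch_probability(theta=0.0, b=0.0)  # ability equals difficulty
p_easier = rasch_probability(theta=1.0, b=0.0)   # ability above difficulty
```

When trait level equals item difficulty the probability is one half; it rises as the trait level exceeds the difficulty.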

41
Q

reliability

A

the extent to which measurements are consistent or repeatable

also extent to which measurements differ from occasion to occasion as a function of measurement error

42
Q

reliability coefficient

A

general term for an index of reliability or the ratio of true score variance on a test to the total variance
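
The ratio in this definition can be written out as below. The symbols are assumed notation for illustration: $\sigma^2_T$ for true score variance, $\sigma^2_E$ for error variance, and $\sigma^2_X$ for total (observed) variance, following the true score model in which total variance is the sum of the other two.

```latex
\[
r_{xx} \;=\; \frac{\sigma^2_{T}}{\sigma^2_{X}}
       \;=\; \frac{\sigma^2_{T}}{\sigma^2_{T} + \sigma^2_{E}}
\]
```

For example, a true score variance of 80 and an error variance of 20 would give a reliability coefficient of 80 / 100 = .80.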

43
Q

replicability crisis

A

low replication rates commonly found in psychological research

44
Q

restriction of range/variance

A

aka restriction of variance

phenomenon associated with reliability estimates wherein the variance of either variable in a correlational analysis is restricted by the sampling procedure used and so the resulting correlation coefficient tends to be lower

contrast with inflation of range

45
Q

Spearman-Brown formula

A

equation used to estimate internal consistency reliability from a correlation of two halves of a test that has been lengthened or shortened

inappropriate for use with heterogeneous tests or speed tests
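
A minimal sketch of the general Spearman-Brown formula, where `n` is the factor by which the test length changes (2 for doubling, 0.5 for halving). The function name and sample values are hypothetical.

```python
def spearman_brown(r, n):
    """Estimated reliability after changing test length by a factor of n.

    r: reliability (or half-test correlation) of the existing test
    n: new length divided by old length
    """
    return n * r / (1 + (n - 1) * r)

doubled = spearman_brown(r=0.60, n=2)    # half-test r adjusted to full length
halved = spearman_brown(r=0.75, n=0.5)   # effect of cutting a test in half
```

With these inputs, a half-test correlation of .60 adjusts to about .75, and halving a test with reliability .75 lowers the estimate to about .60.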

46
Q

speed test

A

test usually of achievement or ability, with a time limit

speed tests usually contain items of uniform difficulty level

47
Q

split-half reliability

A

estimate of the internal consistency of a test obtained by correlating two pairs of scores obtained from equivalent halves of a single test administered once

48
Q

standard error of a score

A

in true score theory, aka SEM

a statistic designed to estimate the extent to which an observed score deviates from a true score

49
Q

standard error of measurement

A

(SEM; aka standard error of a score)

in true score theory

a statistic designed to estimate the extent to which an observed score deviates from a true score
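
The usual formula is sketched below. The symbols are assumed notation: $\sigma_{\text{meas}}$ for the standard error of measurement, $\sigma$ for the standard deviation of test scores, and $r_{xx}$ for the test's reliability coefficient.

```latex
\[
\sigma_{\text{meas}} \;=\; \sigma \sqrt{1 - r_{xx}}
\]
```

For example, with $\sigma = 10$ and $r_{xx} = .84$, the SEM is $10 \times \sqrt{.16} = 10 \times .4 = 4$. The more reliable the test, the smaller the SEM.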

50
Q

standard error of the difference

A

a statistic designed to aid in determining how large a difference between two scores should be before it is considered statistically significant
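
A minimal sketch of one common form of this statistic, assuming both scores are on the same scale with the same standard deviation, so that the standard error of the difference equals SD × sqrt(2 − r1 − r2). Function names and sample values are hypothetical.

```python
import math

def se_difference(sd, r1, r2):
    """Standard error of the difference between two scores.

    sd: standard deviation shared by both score scales (assumption)
    r1, r2: reliability coefficients of the two tests
    """
    return sd * math.sqrt(2 - r1 - r2)

def differs_significantly(score1, score2, sd, r1, r2, z=1.96):
    """True if the gap exceeds z standard errors (z=1.96 is roughly 95%)."""
    return abs(score1 - score2) > z * se_difference(sd, r1, r2)

sed = se_difference(sd=15, r1=0.90, r2=0.80)
small_gap = differs_significantly(110, 100, sd=15, r1=0.90, r2=0.80)
large_gap = differs_significantly(120, 100, sd=15, r1=0.90, r2=0.80)
```

Here the standard error of the difference is a little over 8 points, so a 10-point gap falls short of the 1.96 threshold while a 20-point gap exceeds it.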

51
Q

static characteristic

A

a trait, state, or ability presumed to be relatively unchanging over time

contrast with dynamic characteristic

52
Q

systematic error

A

a source of error in measuring a variable that is typically constant or proportionate to what is presumed to be the true value of the variable being measured

contrast with random error

53
Q

test-retest reliability

A

estimate of reliability obtained by correlating pairs of scores from the same people on two different administrations of the same test

54
Q

transient error

A

source of error attributable to variations in the test-taker’s feelings, moods, or mental state over time

55
Q

true score

A

a value that, according to classical test theory, genuinely reflects an individual’s ability (or trait) level as measured by a particular test

56
Q

true variance

A

in the true score model

component of variance attributable to true differences in the ability or trait being measured that are inherent in an observed score or distribution of scores

57
Q

universe

A

in generalizability theory

the total context of a particular test situation, including all the factors that lead to an individual test-taker’s score

58
Q

universe score

A

in generalizability theory

a test score corresponding to the particular universe being assessed or evaluated

59
Q

variance

A

a measurement of variability equal to the arithmetic mean of the squares of the differences between the scores in a distribution and their mean
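
The definition translates directly into a few lines of code. This sketch divides by the number of scores, as the card states (the population form); the function name and data are hypothetical.

```python
def variance_of_scores(scores):
    """Mean of the squared differences between each score and the mean."""
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** 2 for x in scores) / len(scores)

scores = [2, 4, 4, 4, 5, 5, 7, 9]
var = variance_of_scores(scores)
```

For these eight scores the mean is 5, the squared deviations sum to 32, and the variance is 32 / 8 = 4.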