Test Reliability Flashcards

(43 cards)

1
Q

is an index of reliability, a proportion that indicates the ratio between the
true score variance on a test and the total variance

A

Reliability coefficient
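As a numeric sketch of this definition (all numbers and names below are illustrative, not from the deck): under classical test theory an observed score X is a true score T plus random error E, so the reliability coefficient is var(T) / var(X).

```python
import random

random.seed(0)  # deterministic illustration

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# Simulate classical test theory: observed score X = true score T + error E
true_scores = [random.gauss(100, 15) for _ in range(10_000)]  # T
errors      = [random.gauss(0, 5) for _ in range(10_000)]     # E
observed    = [t + e for t, e in zip(true_scores, errors)]    # X = T + E

# Reliability coefficient: true-score variance over total variance
reliability = variance(true_scores) / variance(observed)
print(round(reliability, 2))  # close to 15**2 / (15**2 + 5**2) = 0.90
```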

2
Q
  • a score on an ability test reflects not only the testtaker’s true score on the ability being
    measured but also error
A

Classical Test Theory (True Score Theory)

3
Q

3 Sources of Error Variance

A

Test Construction
Test Administration
Test Scoring and Interpretation

4
Q

variance is attributed to item/content sampling

A

Test Construction

5
Q

test environment, testtaker variables, examiner-related variables are factors that may
influence testtaker’s attention or motivation

A

Test Administration

6
Q

technical glitches, subjectivity of the scorer, human error, etc.

A

Test Scoring and Interpretation

7
Q

§ obtained by correlating pairs of scores from the same people on two different administrations of the same test
§ appropriate when evaluating a test measuring a construct that is relatively stable over time (e.g. personality)
§ coefficient of stability
§ source of error variance: time (the passage of time between the two administrations)

A

Reliability Estimates (STABILITY)
TEST-RETEST RELIABILITY ESTIMATE

8
Q

§ two test administrations with the same group of test takers
§ coefficient of equivalence

A

Reliability Estimates (EQUIVALENCE)
PARALLEL-FORMS and ALTERNATE-FORMS RELIABILITY ESTIMATES

9
Q

of a test exist when, for each version of the test, the means and
variances of observed test scores are equal.

A

Parallel-forms

10
Q

of a test are typically designed to be equivalent (though not necessarily identical) with
respect to variables such as content and level of difficulty

A

Alternate-forms

11
Q

§ obtained by correlating two pairs of scores obtained from
equivalent halves of a single test administered once

A

SPLIT-HALF RELIABILITY ESTIMATE
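A minimal sketch of the computation (the item matrix below is made-up data): split one administration's items into odd- and even-numbered halves, total each half per examinee, and correlate the half scores (Pearson r).

```python
def pearson_r(x, y):
    """Pearson correlation between two paired score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

# hypothetical right/wrong item scores: 5 examinees x 6 items
items = [
    [1, 1, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0],
    [1, 0, 1, 1, 1, 1],
]
odd  = [sum(row[0::2]) for row in items]   # items 1, 3, 5
even = [sum(row[1::2]) for row in items]   # items 2, 4, 6
r_half = pearson_r(odd, even)
print(round(r_half, 2))  # about 0.87
```

Note this correlation describes only half a test; the Spearman-Brown formula (next card) adjusts it to full length.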

12
Q

◦ used to estimate internal consistency reliability from a correlation of two
halves of a test; also used to estimate reliability when a test is lengthened or shortened

A

Spearman-Brown formula
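The formula itself is r_SB = n·r / (1 + (n − 1)·r), where n is the factor by which test length changes and r is the observed correlation. A quick sketch with made-up values:

```python
def spearman_brown(r, n):
    """Estimate reliability when a test is lengthened (n > 1) or shortened (n < 1)."""
    return n * r / (1 + (n - 1) * r)

# Correct a half-test correlation of .70 up to full length (n = 2):
r_full = spearman_brown(0.70, 2)
print(round(r_full, 2))  # 1.4 / 1.7, about 0.82

# Predict reliability if a test with r = .82 were cut in half (n = 0.5):
r_short = spearman_brown(0.82, 0.5)
print(round(r_short, 2))  # about 0.69
```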

13
Q

Full meaning of KR 20 & 21

A

KUDER-RICHARDSON FORMULA 20 & 21

14
Q

used to determine the inter-item consistency of
dichotomous items - items that can be scored right or wrong (e.g.
Multiple-choice, Yes/No, True/False, Agree/Disagree)

A

KR-20
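A sketch of the KR-20 computation on a made-up 0/1 item matrix: KR20 = (k/(k−1))·(1 − Σpq/σ²), where p is each item's proportion correct, q = 1 − p, and σ² is the variance of total scores.

```python
def kr20(items):
    """items: one row of 0/1 scores per examinee."""
    k = len(items[0])   # number of items
    n = len(items)      # number of examinees
    # proportion of examinees passing each item
    p = [sum(row[j] for row in items) / n for j in range(k)]
    sum_pq = sum(pj * (1 - pj) for pj in p)
    totals = [sum(row) for row in items]
    m = sum(totals) / n
    var_total = sum((t - m) ** 2 for t in totals) / n
    return (k / (k - 1)) * (1 - sum_pq / var_total)

scores = [  # hypothetical right/wrong data: 5 examinees x 6 items
    [1, 1, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0],
    [1, 0, 1, 1, 1, 1],
]
print(round(kr20(scores), 2))  # about 0.75
```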

15
Q

items that can be scored right or wrong (e.g.
Multiple-choice, Yes/No, True/False, Agree/Disagree)

A

dichotomous items -

16
Q

may be used if all the test items have approximately the
same degree of difficulty

A

KR-21
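KR-21 needs only the number of items k, the mean M, and the variance σ² of total scores: KR21 = (k/(k−1))·(1 − M(k − M)/(k·σ²)). A sketch with made-up numbers:

```python
def kr21(k, mean, var_total):
    """Quick reliability estimate assuming items of roughly equal difficulty."""
    return (k / (k - 1)) * (1 - mean * (k - mean) / (k * var_total))

# hypothetical 50-item test with mean score 35 and total-score variance 60
print(round(kr21(50, 35, 60), 2))  # about 0.84
```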

17
Q

§ most accepted and widely used reliability estimate
§ provides a measure of reliability from a single test administration
§ developed by Lee Joseph Cronbach, which is why it is also called Cronbach’s alpha
§ appropriate for use on tests containing nondichotomous items

A

COEFFICIENT ALPHA
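A sketch of the computation on made-up Likert ratings: alpha = (k/(k−1))·(1 − Σσ²_item / σ²_total), the sum of item variances over the variance of total scores.

```python
def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def cronbach_alpha(items):
    """items: one row of item ratings per examinee."""
    k = len(items[0])
    item_vars = [variance([row[j] for row in items]) for j in range(k)]
    totals = [sum(row) for row in items]
    return (k / (k - 1)) * (1 - sum(item_vars) / variance(totals))

ratings = [  # hypothetical 5-point Likert responses: 5 examinees x 4 items
    [4, 5, 4, 4],
    [2, 3, 3, 2],
    [5, 5, 4, 5],
    [3, 3, 2, 3],
    [4, 4, 4, 5],
]
print(round(cronbach_alpha(ratings), 2))  # about 0.94
```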

18
Q

Coefficient alpha developed by __________ that’s why it is also called _________

A

Lee Joseph Cronbach
Cronbach’s alpha

19
Q

appropriate for use on tests containing ________ items
(e.g. Strongly Disagree - Strongly Agree)

A

nondichotomous

20
Q

§degree of agreement or consistency between two or more scorers
with regard to a particular measure
§scorers must have sufficient training in standardized scoring
§source of error: scoring criteria
§coefficient of inter-scorer reliability

A

INTER-SCORER RELIABILITY ESTIMATE

21
Q

Using and Interpreting a Reliability Coefficient
When purchasing tests:

Never buy any form of assessment/measurement where there is

A

no reliability
coefficient or where it is below 0.7

22
Q

Using and Interpreting a Reliability Coefficient
When purchasing tests:

Personality and similar measures: ___________ is often
recommended as the minimum

A

0.6 to 0.8 although above 0.7

23
Q

Using and Interpreting a Reliability Coefficient
When purchasing tests:

Ability, aptitude, IQ, and other forms of reasoning tests should have coefficients
___________. ___________ has been recommended as an excellent value. Where the
intention is to compare people’s scores, such as when selecting people for a job,
values ______ should be the aim.

A

above 0.8; above 0.85; above 0.85

24
Q

Using and Interpreting a Reliability Coefficient
When purchasing tests:

The sample size used for calculation of reliability should never be _____

25
Q

5 Reliability and Nature of the Test

A

Homogeneity vs. Heterogeneity of test items
Dynamic vs. Static characteristics
Restriction or Inflation of range
Speed tests vs. Power tests
Criterion-referenced tests
26
Q

uniformity of test items

A

Homogeneous items
27
Q

various items measuring multiple constructs

A

Heterogeneous items
28
Q

changing trait, state, or ability (e.g. anxiety)

A

Dynamic
29
Q

stable/enduring trait, state, or ability

A

Static
30
Q

Variability of test scores is directly related to the correlation coefficient

A

Restriction or Inflation of range
31
Q

reliability estimate of speed tests should be based on performance from two independent testing periods

A

Speed tests vs. Power tests
32
Q

traditional procedures of estimating reliability are usually not appropriate for use with __________, though there may be instances in which traditional estimates can be adopted

A

Criterion-referenced tests
33
Q

3 Alternatives to the True Score Theory or Classical Test Theory

A

DOMAIN SAMPLING THEORY
GENERALIZABILITY THEORY
ITEM RESPONSE THEORY
34
Q

§ seeks to estimate the extent to which specific sources of variation under defined conditions are contributing to the test score
§ posits that a test score is a sample from a larger, theoretical "domain" of possible items, and the reliability of a test increases with the number of items sampled from that domain

A

DOMAIN SAMPLING THEORY
35
Q

GENERALIZABILITY THEORY
§ originally referred to as the ________, it is a modified form of DST
§ developed by ________

A

Domain Sampling Theory
Cronbach and colleagues
36
Q

§ a person's test scores vary from testing to testing because of variables in the testing situation
§ given the exact same conditions of all the facets in the universe, the exact same test score should be obtained
§ test reliability does not reside within the test itself; rather, it is a function of the circumstances under which the test is developed, administered, and interpreted

A

GENERALIZABILITY THEORY
37
Q

§ a theory of testing based on the relationship between an individual’s performance on a test item and the test taker’s level of performance on an overall measure of the ability the item was designed to measure
§ persons with lower ability have less of a chance of answering correctly, while persons with high ability are very likely to answer correctly; for example, students with higher math ability are more likely to get a math item correct

A

ITEM RESPONSE THEORY
38
Q

ITEM RESPONSE THEORY
IRT models are often referred to as ____________. The term latent is used to emphasize that discrete item responses are taken to be observable manifestations of hypothesized traits, constructs, or attributes that are not directly observed but must be inferred from the manifest responses.

A

latent trait models
39
Q

2 Reliability and Individual Scores

A

STANDARD ERROR OF MEASUREMENT
STANDARD ERROR OF THE DIFFERENCE
40
Q

a range or band of test scores that is likely to contain the true score

A

confidence interval
41
Q

§ often abbreviated as SEM, provides a measure of the precision of an observed score; an estimate of the amount of error inherent in an observed score
§ SEM and the reliability of a test have an inverse relationship: the higher the reliability of a test (or individual subtest within a test), the lower the SEM
§ it can be used to set the confidence interval for a particular score or to determine whether a score is significantly different from a criterion

A

STANDARD ERROR OF MEASUREMENT
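As a sketch (the SD, reliability, and observed score below are made-up numbers): SEM = SD·√(1 − r), and an approximate 95% confidence interval for the true score is the observed score ± 1.96·SEM.

```python
import math

sd, reliability = 15, 0.91   # hypothetical IQ-style scale
sem = sd * math.sqrt(1 - reliability)

observed = 110
lo, hi = observed - 1.96 * sem, observed + 1.96 * sem
print(round(sem, 2))             # 4.5
print(round(lo, 1), round(hi, 1))  # roughly 101.2 to 118.8
```

Note how the interval tightens as reliability rises: at r = 1.0 the SEM is 0 and the observed score equals the true score.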
42
Q

SEM

A

STANDARD ERROR OF MEASUREMENT
43
Q

§ used to determine how large a difference should be before it is considered statistically significant
§ in cases such as recruitment and selection, ________ can be used to compare the test scores of applicants, which can help personnel officers in making hiring decisions

A

STANDARD ERROR OF THE DIFFERENCE
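A sketch of the selection use case (all numbers are made-up): with a common SD, SED = √(SEM1² + SEM2²), which simplifies to SD·√(2 − r1 − r2).

```python
import math

sd = 15
r1, r2 = 0.90, 0.84          # reliabilities of the two measures
sed = sd * math.sqrt(2 - r1 - r2)
print(round(sed, 2))         # about 7.65

# Two applicants differ by 10 points; at ~95% confidence the difference
# must exceed 1.96 * SED to be considered statistically significant.
significant = 10 > 1.96 * sed
print(significant)           # the 10-point gap falls short here
```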