Test Reliability Flashcards

(43 cards)

1
Q

is an index of reliability, a proportion that indicates the ratio between the
true score variance on a test and the total variance

A

Reliability coefficient
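As a numeric sketch of this definition (all numbers and names below are illustrative, not from the deck): under classical test theory an observed score X is a true score T plus random error E, so the reliability coefficient is var(T) / var(X).

```python
import random

random.seed(0)  # deterministic illustration

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# Simulate classical test theory: observed score X = true score T + error E
true_scores = [random.gauss(100, 15) for _ in range(10_000)]  # T
errors      = [random.gauss(0, 5) for _ in range(10_000)]     # E
observed    = [t + e for t, e in zip(true_scores, errors)]    # X = T + E

# Reliability coefficient: true-score variance over total variance
reliability = variance(true_scores) / variance(observed)
print(round(reliability, 2))  # close to 15**2 / (15**2 + 5**2) = 0.90
```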

2
Q
  • a score on an ability test reflects not only the testtaker’s true score on the ability being
    measured but also error
A

Classical Test Theory (True Score Theory)

3
Q

3 Sources of Error Variance

A

Test Construction
Test Administration
Test Scoring and Interpretation

4
Q

variance is attributed to item/content sampling

A

Test Construction

5
Q

test environment, testtaker variables, examiner-related variables are factors that may
influence testtaker’s attention or motivation

A

Test Administration

6
Q

technical glitches, subjectivity of the scorer, human error, etc.

A

Test Scoring and Interpretation

7
Q

§ obtained by correlating pairs of scores from the same people on two different administrations of the same test
§ appropriate when evaluating a test measuring a construct that is relatively stable over time (e.g. personality)
§ coefficient of stability
§ source of error variance: time (the passage of time between the two administrations)

A

Reliability Estimates (STABILITY)
TEST-RETEST RELIABILITY ESTIMATE

8
Q

§ two test administrations with the same group of test takers
§ coefficient of equivalence

A

Reliability Estimates (EQUIVALENCE)
PARALLEL-FORMS and ALTERNATE-FORMS RELIABILITY ESTIMATES

9
Q

of a test exist when, for each version of the test, the means and
variances of observed test scores are equal.

A

Parallel-forms

10
Q

of a test are typically designed to be equivalent (though not necessarily identical) with
respect to variables such as content and level of difficulty

A

Alternate-forms

11
Q

§ obtained by correlating two pairs of scores obtained from
equivalent halves of a single test administered once

A

SPLIT-HALF RELIABILITY ESTIMATE
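A minimal sketch of the computation (the item matrix below is made-up data): split one administration's items into odd- and even-numbered halves, total each half per examinee, and correlate the half scores (Pearson r).

```python
def pearson_r(x, y):
    """Pearson correlation between two paired score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

# hypothetical right/wrong item scores: 5 examinees x 6 items
items = [
    [1, 1, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0],
    [1, 0, 1, 1, 1, 1],
]
odd  = [sum(row[0::2]) for row in items]   # items 1, 3, 5
even = [sum(row[1::2]) for row in items]   # items 2, 4, 6
r_half = pearson_r(odd, even)
print(round(r_half, 2))  # about 0.87
```

Note this correlation describes only half a test; the Spearman-Brown formula (next card) adjusts it to full length.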

12
Q

◦ used to estimate internal consistency reliability from a correlation of two
halves of a test; also used to estimate reliability when a test is lengthened or shortened

A

Spearman-Brown formula
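The formula itself is r_SB = n·r / (1 + (n − 1)·r), where n is the factor by which test length changes and r is the observed correlation. A quick sketch with made-up values:

```python
def spearman_brown(r, n):
    """Estimate reliability when a test is lengthened (n > 1) or shortened (n < 1)."""
    return n * r / (1 + (n - 1) * r)

# Correct a half-test correlation of .70 up to full length (n = 2):
r_full = spearman_brown(0.70, 2)
print(round(r_full, 2))  # 1.4 / 1.7, about 0.82

# Predict reliability if a test with r = .82 were cut in half (n = 0.5):
r_short = spearman_brown(0.82, 0.5)
print(round(r_short, 2))  # about 0.69
```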

13
Q

Full meaning of KR 20 & 21

A

KUDER-RICHARDSON FORMULA 20 & 21

14
Q

used to determine the inter-item consistency of
dichotomous items - items that can be scored right or wrong (e.g.
Multiple-choice, Yes/No, True/False, Agree/Disagree)

A

KR-20
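A sketch of the KR-20 computation on a made-up 0/1 item matrix: KR20 = (k/(k−1))·(1 − Σpq/σ²), where p is each item's proportion correct, q = 1 − p, and σ² is the variance of total scores.

```python
def kr20(items):
    """items: one row of 0/1 scores per examinee."""
    k = len(items[0])   # number of items
    n = len(items)      # number of examinees
    # proportion of examinees passing each item
    p = [sum(row[j] for row in items) / n for j in range(k)]
    sum_pq = sum(pj * (1 - pj) for pj in p)
    totals = [sum(row) for row in items]
    m = sum(totals) / n
    var_total = sum((t - m) ** 2 for t in totals) / n
    return (k / (k - 1)) * (1 - sum_pq / var_total)

scores = [  # hypothetical right/wrong data: 5 examinees x 6 items
    [1, 1, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0],
    [1, 0, 1, 1, 1, 1],
]
print(round(kr20(scores), 2))  # about 0.75
```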

15
Q

items that can be scored right or wrong (e.g.
Multiple-choice, Yes/No, True/False, Agree/Disagree)

A

dichotomous items -

16
Q

may be used if all the test items have approximately the
same degree of difficulty

A

KR-21
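KR-21 needs only the number of items k, the mean M, and the variance σ² of total scores: KR21 = (k/(k−1))·(1 − M(k − M)/(k·σ²)). A sketch with made-up numbers:

```python
def kr21(k, mean, var_total):
    """Quick reliability estimate assuming items of roughly equal difficulty."""
    return (k / (k - 1)) * (1 - mean * (k - mean) / (k * var_total))

# hypothetical 50-item test with mean score 35 and total-score variance 60
print(round(kr21(50, 35, 60), 2))  # about 0.84
```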

17
Q

§ most accepted and widely used reliability estimate
§ provides a measure of reliability from a single test administration
§ developed by Lee Joseph Cronbach, which is why it is also called Cronbach’s alpha
§ appropriate for use on tests containing nondichotomous items

A

COEFFICIENT ALPHA
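A sketch of the computation on made-up Likert ratings: alpha = (k/(k−1))·(1 − Σσ²_item / σ²_total), the sum of item variances over the variance of total scores.

```python
def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def cronbach_alpha(items):
    """items: one row of item ratings per examinee."""
    k = len(items[0])
    item_vars = [variance([row[j] for row in items]) for j in range(k)]
    totals = [sum(row) for row in items]
    return (k / (k - 1)) * (1 - sum(item_vars) / variance(totals))

ratings = [  # hypothetical 5-point Likert responses: 5 examinees x 4 items
    [4, 5, 4, 4],
    [2, 3, 3, 2],
    [5, 5, 4, 5],
    [3, 3, 2, 3],
    [4, 4, 4, 5],
]
print(round(cronbach_alpha(ratings), 2))  # about 0.94
```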

18
Q

Coefficient alpha developed by __________ that’s why it is also called _________

A

Lee Joseph Cronbach
Cronbach’s alpha

19
Q

appropriate for use on tests containing ________ items
(e.g. Strongly Disagree - Strongly Agree)

A

nondichotomous

20
Q

§degree of agreement or consistency between two or more scorers
with regard to a particular measure
§scorers must have sufficient training in standardized scoring
§source of error: scoring criteria
§coefficient of inter-scorer reliability

A

INTER-SCORER RELIABILITY ESTIMATE

21
Q

Using and Interpreting a Reliability Coefficient
When purchasing tests:

Never buy any form of assessment/measurement where there is

A

no reliability
coefficient or where it is below 0.7

22
Q

Using and Interpreting a Reliability Coefficient
When purchasing tests:

Personality and similar measures: ___________ is often
recommended as the minimum

A

0.6 to 0.8 although above 0.7

23
Q

Using and Interpreting a Reliability Coefficient
When purchasing tests:

Ability, aptitude, IQ, and other forms of reasoning tests should have coefficients
___________. ___________ has been recommended as an excellent value. Where the
intention is to compare people’s scores, such as when selecting people for a job,
values ______ should be the aim.

A

above 0.8; above 0.85; above 0.85

24
Q

Using and Interpreting a Reliability Coefficient
When purchasing tests:

The sample size used for calculation of reliability should never be _____

25
Q

5 Reliability and Nature of the Test

A

Homogeneity vs. Heterogeneity of test items
Dynamic vs. Static characteristics
Restriction or Inflation of range
Speed tests vs. Power tests
Criterion-referenced tests
26
Q

uniformity of test items

A

Homogeneous items
27
Q

various items measuring multiple constructs

A

Heterogeneous items
28
Q

changing trait, state, or ability (e.g. anxiety)

A

Dynamic
29
Q

stable/enduring trait, state, or ability

A

Static
30
Q

Variability of test scores is directly related to the correlation coefficient

A

Restriction or Inflation of range
31
Q

reliability estimate of speed tests should be based on performance from two independent testing periods

A

Speed tests vs. Power tests
32
Q

traditional procedures of estimating reliability are usually not appropriate for use with __________, though there may be instances in which traditional estimates can be adopted

A

Criterion-referenced tests
33
Q

3 Alternatives to the True Score Theory or Classical Test Theory

A

DOMAIN SAMPLING THEORY
GENERALIZABILITY THEORY
ITEM RESPONSE THEORY
34
Q

§ seeks to estimate the extent to which specific sources of variation under defined conditions are contributing to the test score
§ posits that a test score is a sample from a larger, theoretical "domain" of possible items, and the reliability of a test increases with the number of items sampled from that domain

A

DOMAIN SAMPLING THEORY
35
Q

GENERALIZABILITY THEORY
§ originally referred to as the ________, it is a modified form of DST
§ developed by ________

A

Domain Sampling Theory
Cronbach and colleagues
36
Q

§ a person's test scores vary from testing to testing because of variables in the testing situation
§ given the exact same conditions of all the facets in the universe, the exact same test score should be obtained
§ test reliability does not reside within the test itself; rather, it is a function of the circumstances under which the test is developed, administered, and interpreted

A

GENERALIZABILITY THEORY
37
Q

§ a theory of testing based on the relationship between an individual’s performance on a test item and the test taker’s level of performance on an overall measure of the ability the item was designed to measure
§ persons with lower ability have less of a chance of answering correctly, while persons with high ability are very likely to answer correctly; for example, students with higher math ability are more likely to get a math item correct

A

ITEM RESPONSE THEORY
38
Q

ITEM RESPONSE THEORY
IRT models are often referred to as ____________. The term latent is used to emphasize that discrete item responses are taken to be observable manifestations of hypothesized traits, constructs, or attributes that are not directly observed but must be inferred from the manifest responses.

A

latent trait models
39
Q

2 Reliability and Individual Scores

A

STANDARD ERROR OF MEASUREMENT
STANDARD ERROR OF THE DIFFERENCE
40
Q

a range or band of test scores that is likely to contain the true score

A

confidence interval
41
Q

§ often abbreviated as SEM, provides a measure of the precision of an observed score; an estimate of the amount of error inherent in an observed score
§ SEM and the reliability of a test have an inverse relationship: the higher the reliability of a test (or individual subtest within a test), the lower the SEM
§ it can be used to set the confidence interval for a particular score or to determine whether a score is significantly different from a criterion

A

STANDARD ERROR OF MEASUREMENT
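As a sketch (the SD, reliability, and observed score below are made-up numbers): SEM = SD·√(1 − r), and an approximate 95% confidence interval for the true score is the observed score ± 1.96·SEM.

```python
import math

sd, reliability = 15, 0.91   # hypothetical IQ-style scale
sem = sd * math.sqrt(1 - reliability)

observed = 110
lo, hi = observed - 1.96 * sem, observed + 1.96 * sem
print(round(sem, 2))             # 4.5
print(round(lo, 1), round(hi, 1))  # roughly 101.2 to 118.8
```

Note how the interval tightens as reliability rises: at r = 1.0 the SEM is 0 and the observed score equals the true score.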
42
Q

SEM

A

STANDARD ERROR OF MEASUREMENT
43
Q

§ used to determine how large a difference should be before it is considered statistically significant
§ in cases such as recruitment and selection, ________ can be used to compare the test scores of applicants, which can help personnel officers in making hiring decisions

A

STANDARD ERROR OF THE DIFFERENCE
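A sketch of the selection use case (all numbers are made-up): with a common SD, SED = √(SEM1² + SEM2²), which simplifies to SD·√(2 − r1 − r2).

```python
import math

sd = 15
r1, r2 = 0.90, 0.84          # reliabilities of the two measures
sed = sd * math.sqrt(2 - r1 - r2)
print(round(sed, 2))         # about 7.65

# Two applicants differ by 10 points; at ~95% confidence the difference
# must exceed 1.96 * SED to be considered statistically significant.
significant = 10 > 1.96 * sed
print(significant)           # the 10-point gap falls short here
```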