- due to random factors that affect test performance of examinees in unpredictable ways Example: distractions, ambiguously worded test items, and examinee fatigue

- The extent to which a test provides consistent information - there are several methods for evaluating this and each is appropriate for different circumstances + Most methods provide a reliability coefficient

- test retest reliability and alternative forms reliability are appropriate

-aka cronbach's alpha - involves administering the test to a sample of examinees and calculating the average inter-item consistency

Test Construction Flashcards by Bernadette Hinojos

Classical test theory

measurement that is used to develop and evaluate tests
framework
assumes that obtain test scores (X) are due to the combination of true variability (T) and measurement error (E)

X=T+E

How well did you know this?

Not at all

Perfectly

True score variability

result of actual differences among examinees in regards to whatever the test is measuring
assumed to be consistent

How well did you know this?

Not at all

Perfectly

Measurement error

due to random factors that affect test performance of examinees in unpredictable ways

Example: distractions, ambiguously worded test items, and examinee fatigue

How well did you know this?

Not at all

Perfectly

Test reliability

The extent to which a test provides consistent information
there are several methods for evaluating this and each is appropriate for different circumstances

+ Most methods provide a reliability coefficient

How well did you know this?

Not at all

Perfectly

Reliability coefficient

type of correlation coefficient
ranges from 0 to 1.0
designated with the letter r with the subscript of two of the same letters or numbers
always interpreted as the the direct amount of variability obtained in test scores that’s due to true score variability
.70 or higher is considered the minimally acceptable level, but 90 is usually required for higher stakes tests used to select employees, assign diagnoses, or other important decisions about individuals

How well did you know this?

Not at all

Perfectly

The acceptable level of reliability

depends on the type of test and its purpose

How well did you know this?

Not at all

Perfectly

Standardized cognitive ability tests versus personality test

cognitive ability test have a higher reliability coefficient

How well did you know this?

Not at all

Perfectly

Standardized cognitive ability tests versus personality test

cognitive ability test have a higher reliability coefficient

How well did you know this?

Not at all

Perfectly

Test retest reliability

provides information on the consistency of scores over time
involves administering the test to a sample of examinees and readministering the test to the same examinees at a later time than correlating the two sets of scores
useful for tests that are designed to measure a characteristic that is stable over time

How well did you know this?

Not at all

Perfectly

Alternative forms reliability

,- provides information about the consistency of scores over different forms of the test and when the second form is administered at a later time, the consistency of scores over time

involves administering one forum to a sample of examinees administering the other forms to the same sample of examinees and correlating the sets of scores
important whenever a test has more than one form

How well did you know this?

Not at all

Perfectly

Internal consistency reliability

provides information on the consistency of scores over different test items
useful for tests that are designed to measure a single content, domain or aspect of behavior

,- not useful for speed tests because it overestimates their reliability

How well did you know this?

Not at all

Perfectly

Speed test

test retest reliability and alternative forms reliability are appropriate

How well did you know this?

Not at all

Perfectly

Coefficient alpha

-aka cronbach’s alpha

involves administering the test to a sample of examinees and calculating the average inter-item consistency

How well did you know this?

Not at all

Perfectly

Coefficient alpha

-aka cronbach’s alpha

involves administering the test to a sample of examinees and calculating the average inter-item consistency

How well did you know this?

Not at all

Perfectly

Kuder Richardson’s 20( kr-20)

alternative to coefficient alpha
can be used when test items are dichotomously scored

How well did you know this?

Not at all

Perfectly

Split half reliability

involves administering the test to a sample of examinees splitting the test in half, usually in terms of even and odd numbers and correlating the scores on the two halves

How well did you know this?

Not at all

Perfectly

Problem with split half reliability

Study These Flashcards

You’re required to calculate reliability coefficient for two forms of the test that are half as long as the original test and shorter tests tend to be less reliable than longer ones
It usually underestimates a test’s reliability and is usually corrected with the spearman Brown prophecy formula

Inter rater reliability

Study These Flashcards

important for measures that are subjectively scored
provides info on the consistency of scores or ratings assigned by different raters

Percent agreement and cohens alpha coefficient

Study These Flashcards

methods for calculating inter rater reliability

Percent agreement

Study These Flashcards

can be calculated on two or more raters
does not take chance agreement into account which can result in an ever overestimate of reliability

Cohen’s kappa coefficient

Study These Flashcards

aka the kappa statistic
One of several inter-rator reliability coefficients that is corrected for chance agreement between raters
used to assess the consistency of ratings assigned by two raters when the ratings represent a nominal scale

Factors that affect Reliability of subjective ratings

Study These Flashcards

can be affected by consensual observer drift

Consensual observer drift

Study These Flashcards

occurs when two or more raters communicate to each other while assigning ratings which results in an increased consistency but decreased accuracy in ratings and an overestimate of iterator reliability

Ways to reduce consensual observer drift

Study These Flashcards

not having raiders work together
providing raiders with adequate training and regular monitoring the accuracy of raters ratings

Factors that affect the reliability coefficient

- content homogeneity - range of scores - guessing - reliability index - item analysis

Content homogeneity

- tests that are homogeneous tend to have larger reliability coefficients than tests that are heterogeneous - especially true for internal consistency reliability

Content homogeneity

- tests that are homogeneous tend to have larger reliability coefficients than tests that are heterogeneous - especially true for internal consistency reliability

Range of scores

,- reliability coefficients are larger when the scores are unrestricted in terms of range - when a sample includes examinees who have high, moderate, and low levels of the characteristics

Guessing

- reliability coefficients are affected by the likelihood that a test items could be answered correctly by guessing - The easier the questions. The lower the reliability coefficient - true false tests are likely to be less reliable than multiple choice tests with three or more answers

Reliability index

- is either the reliability coefficient or the theoretical correlation between the observed test scores and the true test scores calculated by taking the square root of the reliability coefficient

Item analysis

- used to determine which items to include on a test and involves determining each item's difficulty level and ability to discriminate between examinees who obtain high and low test scores

Standard error of measurement

- used to construct a confidence interval and is calculated by multiplying the test standard deviation times 1- us the reliability coefficient

Calculating confidence interval

- for 68%. You add and subtract one standard error of measurement to and from the obtained score - for 95% confidence interval you add and subtract two standard deviations of measurement + And for 99% confidence interval you add and subtract three standard deviations of measurement

Adequate reliability

- test scores can be expected to be consistent ,- does not indicate that the test measures what it's designed to measure

Validity

- The degree to which the test accurately measures which what it was designed to measure Three types ,- content validity - Construct validity Criterion related validity

P content validity

- important for tests that have been designed to measure one or more content or behavioral domain ,- example achievement tests or work samples - established during the development of a test by clearly defining the domain to be assessed and including items that are representative samples of that domain ,- test items are then systematically reviewed by subject matter experts to ensure the item's address all important aspects of the domain

Construct validity

- important for Tess that have been designed to measure a hypothetical trait. Example intelligence, motivation, or introversion that cannot be directly observed but is inferred by an examinees behavior - involves using several procedures including obtaining evidence of a test, convergent and divergent validity

Convergent validity

,- The degree to which scores on a test have high correlations with scores or other measures designed to assess the same or related constructs

Convergent validity

,- The degree to which scores on a test have high correlations with scores or other measures designed to assess the same or related constructs

Divergent validity

- also known as discriminant validity - The degree to which scores on the test have low correlations with scores on measures of unrelated constructs

Test Construction Flashcards

(40 cards)