Ch. 5 - Reliability Flashcards

(57 cards)

1
Q

reliability

A

consistency in measurement (not good or bad, right or wrong, just consistent); the proportion of the total variance attributed to true variance

2
Q

reliability coefficient

A

a proportion that indicates the ratio between the true score variance on a test and the total variance

3
Q

concept of reliability - equation

A

Observed Score = True Score + Error

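The equation above, together with the variance-ratio definition from the earlier cards, can be sketched numerically; the variance components below are invented for illustration.

```python
# Classical test theory: Observed Score = True Score + Error, so observed
# variance splits into true variance + error variance. Numbers are invented.
true_variance = 80.0   # variance due to real differences among testtakers
error_variance = 20.0  # variance from random, irrelevant sources
total_variance = true_variance + error_variance

# reliability = proportion of total variance that is true variance
reliability = true_variance / total_variance  # 0.8
```
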
4
Q

we use X to describe test score variability / reliability

A

variance

5
Q

the proportion of the total variance attributed to true variance is

A

reliability

6
Q

the greater the reliability…

A

indicates that you are capturing more true variance than “noise”

7
Q

measurement error

A

all of the factors associated with the process of measuring some variable, other than the variable being measured

8
Q

error variance

A

variance from irrelevant, random sources

9
Q

sources of error variance

A
test construction (content sampled, way items are worded)
test administration (environment: lighting, temperature; testtaker variables: sick, bad mood; examiner-related variables: "giving away" answers with tone of voice)
10
Q

more sources of error variance

A

computer glitches or errors in hand-scoring; testtakers may over- or under-report
sampling error (e.g., only contacting voters with landlines)

11
Q

test-retest reliability

A

a method of reliability. obtained by correlating pairs of scores from the same people on two different administrations of the same test. use when measuring something that’s stable over time (trait)

12
Q

as the time between test administrations increases, the correlation usually…

A

decreases

13
Q

coefficient of stability

A

the estimate of test-retest reliability, when the interval between testing is greater than six months

14
Q

coefficient of equivalence

A

the degree of the relationship between various forms of a test

15
Q

parallel forms (reliability)

A

for each form of the test, the means and variances of observed test scores are equal

16
Q

alternate forms (reliability)

A

these don’t necessarily meet the requirements of parallel forms (same means and variances) but are equivalent in terms of content, level of difficulty, etc

17
Q

parallel or alternate forms reliability

A

the extent to which item sampling and other errors have affected test scores on versions of the same test

18
Q

how do you obtain parallel or alternate forms reliability estimates?

A

administer two forms of the test to the same group (like test-retest, but no need to wait)
same problems: scores affected by item sampling, testtaker variables, etc
time consuming and expensive

19
Q

estimate of inter-item consistency

A

degree of correlation among all items on a scale

20
Q

how do you do a split-half reliability estimate?

A

(1) divide test into equivalent halves
(2) find Pearson r between the scores on each half
(3) adjust the half-test reliability with Spearman-Brown formula

21
Q

what is a split-half reliability estimate?

A

a reliability estimate obtained by evaluating the internal consistency of the test (no need for two forms or for time elapsing).

22
Q

how should you split the test for a split-half reliability estimate?

A

not down the middle
randomly assign items
split odd-even
divide by content and difficulty

i.e. make mini parallel forms!

23
Q

Spearman-Brown Adjustment

A

estimates the reliability of the whole test from the reliability of a shortened (half-length) version

24
Q

don’t use split-half reliability with what kind of test?

A

heterogeneous (measures more than one trait)

25
Q

reliability usually increases as…

A

test length increases

26
Q

alternatives to the Spearman-Brown reliability estimate (for split-half)

A

Kuder-Richardson (KR-20; for tests with dichotomous items)
Average Proportional Distance
Cronbach's alpha ("mean of all possible split-half correlations")

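As an illustration of one of these alternatives, here is a minimal Cronbach's alpha computation with invented rating-scale responses; for dichotomous (0/1) items the same formula reduces to KR-20.

```python
# Cronbach's alpha: k/(k-1) * (1 - sum(item variances) / variance of totals).
# Responses below are invented; rows are testtakers, columns are items.
from statistics import pvariance

responses = [
    [2, 3, 3, 2],
    [4, 4, 5, 4],
    [3, 3, 4, 3],
    [5, 5, 5, 4],
    [1, 2, 2, 1],
]
k = len(responses[0])  # number of items

item_vars = [pvariance([row[i] for row in responses]) for i in range(k)]
total_var = pvariance([sum(row) for row in responses])

alpha = (k / (k - 1)) * (1 - sum(item_vars) / total_var)  # high = consistent
```
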
27
Q

reliability coefficients range from

A

0 to 1. possible to get negative, but usually a mistake in data entry

28
Q

measures of reliability are subject to

A

error. they are estimates

29
Q

a reliability coefficient may not be acceptable if

A

it is done with the same test on a very different set of testtakers

30
Q

what's a good reliability?

A

like grades! .90 is an A, .80 is a B

31
Q

if reliability is really high on a split-half estimate, what is likely the cause?

A

redundancy in test items

32
Q

the more homogeneous a test is…

A

the more inter-item consistency it can be expected to have (duh)

33
Q

split-half reliability, odd-even, Spearman-Brown formula, Kuder-Richardson (KR-20), alpha, and Average Proportional Distance are all methods of evaluating…

A

the internal consistency of a test

34
Q

inter-scorer reliability

A

the degree of agreement or consistency between two or more scorers/judges/raters

35
Q

if inter-scorer reliability is high,…

A

test scores can be derived in a systematic, consistent way by trained scorers

36
Q

what are the three approaches for estimating reliability?

A

test-retest, alternate or parallel forms, internal or inter-item consistency

37
Q

what about the nature of a test might influence reliability? (5)

A

homogeneous vs heterogeneous test
dynamic vs static characteristics
restriction or inflation of range
speed vs power test
criterion-referenced vs norm-referenced tests

38
Q

heterogeneous vs homogeneous test

A

measures different factors; measures one factor/trait

39
Q

traditional ways of estimating reliability are often not appropriate for what kind of test?

A

criterion-referenced

40
Q

what kind of reliability estimate is best for a heterogeneous test?

A

test-retest (not inter-item consistency, because that will be low)

41
Q

what kind of reliability estimate is best for a measurement of dynamic characteristics?

A

inter-item consistency (not test-retest)

42
Q

power test

A

has a long time limit, but some items are so hard that no testtaker will get a perfect score

43
Q

speed test

A

must be done in a certain amount of time. easy items, but tough to get them all done (e.g., typing)

44
Q

classical test theory believes that…

A

everyone has a "true score" on a test. very test-dependent, though

45
Q

what are alternatives to classical test theory?

A

domain sampling theory
generalizability theory
Item Response Theory (IRT)

46
Q

domain sampling theory

A

a test's reliability is an objective measure of how precisely the test measures the "domain" of the test (ex: a behavior). takes issue with the observed score = true score + error model

47
Q

generalizability theory

A

a person's test scores vary from testing to testing because of the variables in the testing situation. takes issue with the observed score = true score + error model

48
Q

Item Response Theory (IRT)

A

hundreds of varieties; items vary in many different ways, including difficulty and discrimination

49
Q

what tells us how much error could be in a single test score?

A

Standard Error of Measurement (SEM)

50
Q

Standard Error of Measurement

A

estimates the extent to which an observed score deviates from a "true" score

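The usual formula is SEM = SD × √(1 − reliability); a sketch with invented example values:

```python
# Standard error of measurement: SEM = SD * sqrt(1 - reliability).
# The SD and reliability below are invented example values.
import math

sd = 15.0          # standard deviation of test scores
reliability = 0.91

sem = sd * math.sqrt(1 - reliability)  # 4.5: likely error around a score
```
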
51
Q

the higher the reliability of a test, the ____ the SEM

A

lower

52
Q

if a person were to take a bunch of equivalent tests, scores would be…

A

normally distributed with their true score at the mean

53
Q

confidence interval

A

the range or band of scores that is likely to contain the true score

54
Q

95% confidence interval - what does it mean?

A

we are 95% confident that the true score is within ± 2 standard errors of measurement. 95% of this testtaker's scores are expected to fall within this range on the distribution

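A sketch of the band using the ±2 SEM rule of thumb from the card (1.96 is the more exact multiplier; all values invented):

```python
# 95% confidence interval around an observed score: observed +/- 2 * SEM.
# All values below are invented.
import math

observed = 100.0
sd, reliability = 15.0, 0.91
sem = sd * math.sqrt(1 - reliability)  # standard error of measurement

lower, upper = observed - 2 * sem, observed + 2 * sem  # ~91.0 to ~109.0
```
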
55
Q

besides error, what might cause a difference in scores from one testing to another?

A

an actual difference in the characteristic being measured. might be what you're looking for in psychotherapy outcome research

56
Q

standard error of the difference helps you determine

A

whether the difference between two scores is statistically significant (e.g., between a score and a cutoff, or between the same person's scores on two tests)

57
Q

the standard error of the difference will always be ___ compared to the standard error of measurement for a score.

A

larger, because it combines the error of both scores.
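A sketch, with invented values, of why the standard error of the difference exceeds either score's SEM: it combines both error terms.

```python
# Standard error of the difference between two scores combines the error in
# both: SE_diff = sqrt(SEM1**2 + SEM2**2). All values below are invented.
import math

sd = 15.0
r1, r2 = 0.91, 0.84            # reliabilities of the two measurements

sem1 = sd * math.sqrt(1 - r1)  # ~4.5
sem2 = sd * math.sqrt(1 - r2)  # ~6.0
se_diff = math.sqrt(sem1**2 + sem2**2)  # ~7.5, larger than either SEM
# equivalent shortcut when both tests share the same sd:
# se_diff = sd * math.sqrt(2 - r1 - r2)
```
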