Week 3 - Reliability and Validity Flashcards

1
Q

Classical Test Theory

A

Test scores are the result of:

  • Factors that contribute to consistency
  • Factors that contribute to inconsistency (characteristics of test takers; factors unrelated to the attribute being measured, such as the situation or environment)
2
Q

X = T + e

A
X = obtained score
T = true score
e = errors of measurement
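The classical model X = T + e can be illustrated with a short simulation (a hypothetical sketch: the true score and the error spread are made-up values):

```python
import random

# A minimal sketch of X = T + e: simulate obtained scores as a
# fixed true score plus random measurement error (hypothetical values).
random.seed(0)
T = 100                                          # true score
errors = [random.gauss(0, 5) for _ in range(10_000)]
X = [T + e for e in errors]                      # obtained scores

mean_X = sum(X) / len(X)
print(round(mean_X, 2))  # averages out near the true score
```

Because the errors are random with mean zero, obtained scores scatter around the true score, and their average converges on it.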
3
Q

Sources of Error

A
  • Item selection
  • Test administration
  • Test scoring
  • Systematic measurement error
4
Q

Domain-sampling model

A

a way of thinking that sees the test as a representative sample of a large domain of possible items that could be included on the test

  • considers the problem of using only a sample of items to represent a construct
  • as the test gets longer, it should represent the construct better, increasing reliability
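The claim that a longer test should be more reliable is usually quantified with the Spearman-Brown prophecy formula; a minimal sketch, assuming a hypothetical starting reliability of .60:

```python
# Spearman-Brown prophecy formula: predicted reliability when
# test length is multiplied by a factor n (hypothetical r below).

def prophecy(r, n):
    """Predicted reliability of a test lengthened by factor n."""
    return n * r / (1 + (n - 1) * r)

r = 0.60  # hypothetical reliability of the current test
for n in (1, 2, 3):
    print(n, round(prophecy(r, n), 2))
```

Doubling a test with reliability .60 is predicted to raise it to .75, illustrating the domain-sampling point that a larger sample of items represents the construct better.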
5
Q

Inter-rater reliability

A

the extent to which different raters agree in their assessments

6
Q

Method variance

A

the variability among scores that arises because of the form as distinct from the content of the test - the method of administering the test

7
Q

Reliability

A

the consistency of a test: the extent to which it gives the same result each time it is used to measure the same thing

8
Q

Stability over time

A

the extent to which test scores remain stable when a test is administered on more than one occasion

9
Q

Internal consistency

A

the extent to which a psychological test is homogeneous or heterogeneous

10
Q

Social desirability bias

A

a form of method variance that arises when people respond to questions in ways that place them in a favourable (rather than unfavourable) light

11
Q

Test-Retest Stability

A

The same test administered to the same group twice at different points
- may not get identical scores due to practice effects, maturation, treatment effects or setting

12
Q

Parallel or alternate forms reliability

A

Two forms of the same test are developed, with different items selected according to the same rules

Parallel - same distribution of scores (mean and variance equal)
Alternate - different distribution of scores (mean and variance may not be equal)

Both matched for content and difficulty

13
Q

Split half method

A

Test is divided into halves that are compared

- useful in overcoming logistical difficulties of test-retest reliability

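A minimal sketch of the split-half computation, assuming made-up odd/even half scores; the half-test correlation is stepped up with the Spearman-Brown formula (the standard companion to this method, not named on the card):

```python
# Split-half reliability: correlate the two half-test scores,
# then correct for the halved test length (hypothetical data).

def pearson(xs, ys):
    """Pearson correlation between two score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def spearman_brown(r_half):
    """Step the half-test correlation up to full-test length."""
    return 2 * r_half / (1 + r_half)

odd_half  = [10, 12, 9, 14, 11, 13]   # summed odd-item scores
even_half = [11, 12, 10, 13, 10, 14]  # summed even-item scores

r_hh = pearson(odd_half, even_half)
print(round(spearman_brown(r_hh), 2))
```

The correction matters because each half is only half as long as the full test, and shorter tests are less reliable.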
14
Q

Measuring Internal consistency

Cronbach’s Alpha

A

Cronbach’s alpha - a generalised reliability coefficient for scoring systems that are graded (e.g. agree/disagree)

Acceptable levels of reliability

  • .70-.80 acceptable or good
  • greater than .91 may indicate redundancy
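Cronbach's alpha can be computed directly from item scores; a minimal sketch with a hypothetical 5-respondent, 4-item dataset of graded (1-5) ratings:

```python
# Cronbach's alpha = (k/(k-1)) * (1 - sum of item variances
#                                     / variance of total scores)

def cronbach_alpha(items):
    """items: one list of scores per item (all the same length)."""
    k = len(items)           # number of items
    n = len(items[0])        # number of respondents

    def var(xs):             # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    totals = [sum(item[i] for item in items) for i in range(n)]
    return (k / (k - 1)) * (1 - sum(var(item) for item in items) / var(totals))

# Hypothetical 1-5 agree/disagree ratings, four items x five respondents:
items = [
    [4, 5, 3, 4, 4],
    [4, 4, 3, 5, 4],
    [3, 5, 4, 4, 3],
    [4, 4, 3, 5, 5],
]
alpha = cronbach_alpha(items)
print(round(alpha, 2))
```

Against the card's benchmarks, this made-up scale would fall below the .70-.80 "acceptable" band.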
15
Q

Standard error of measurement (SEM)

A

allows estimation of precision of an individual test score
- the larger the SEM, the less certain we are that the test score represents the true score

Reliability coefficient (r) - an index of the ratio of true score variance to total score variance in a test
- SEM = SD√(1 − r), where SD is the standard deviation of the test scores
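A minimal sketch of using the SEM to put a confidence band around an obtained score (the score, SD, and reliability below are hypothetical, IQ-style values):

```python
import math

# SEM = SD * sqrt(1 - r): the larger the SEM, the less certain
# we are that the obtained score represents the true score.

def sem(sd, reliability):
    return sd * math.sqrt(1 - reliability)

score, sd, r = 110, 15, 0.91   # hypothetical obtained score, SD, reliability
e = sem(sd, r)

# A rough 95% confidence band around the obtained score:
low, high = score - 1.96 * e, score + 1.96 * e
print(round(e, 2), round(low, 1), round(high, 1))
```

Note how the band widens as reliability drops: with r = 0.91 the SEM is well under a third of the SD, while a perfectly unreliable test (r = 0) would give SEM equal to the SD itself.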
16
Q

Validity

A

the extent to which a test measures the construct it is intended to measure
- inferences from test must be appropriate, meaningful, and useful

17
Q

Face validity

A

Does the test look like it measures the relevant construct?

18
Q

Content validity

A

the extent to which items on a test represent the universe of behaviour the test was designed to measure

  • logical deduction rather than strict analysis
19
Q

Construct underrepresentation

A

failure to capture important components of a construct

20
Q

Construct-irrelevant variance

A

measuring things other than the construct of interest

21
Q

Criterion related validity

A

the extent to which a measure is related to an outcome
Good criteria are reliable and appropriate

The relationship between test and criterion is usually expressed as a correlation

22
Q

Predictive evidence

criterion related validity

A

how well the test predicts performance on a criterion

23
Q

Concurrent evidence

criterion related validity

A

refers to a comparison between the measure in question and an outcome assessed at the same time

24
Q

Incremental validity

A

the extent to which knowledge of a test score adds predictive value beyond a pre-existing test score or psychological characteristic

25
Q

Construct validity

A

concerned with establishing how well a test measures a psychological construct

26
Q
Convergent evidence 
(construct validity)
A

refers to the degree to which two constructs that should be related are, in fact, related
- identify relationships we would expect if the test is actually measuring the construct

27
Q

Discriminant (divergent) evidence

construct validity

A

aims to demonstrate that the test is unique

- low correlations should be observed with constructs that are unrelated to what the test is trying to measure

28
Q
Factor analysis
(construct validity)
A
  • to observe patterns
  • items may cluster, which is attributed to action of latent/unobserved variables or factors
  • Exploratory factor analysis
  • Confirmatory factor analysis
29
Q

Decision-theoretic approach to predictive validity

A

includes

  • cutting point
  • valid positive and negative decisions
  • false positive and negative decisions
  • base rate
  • selection ratio
30
Q

Cutting point

A

the test score or point on the scale used to split those being assessed into two groups: those predicted to show the behaviour of interest and those predicted not to

31
Q

Valid positive and negative decisions

A

Positive - where the person is predicted to show the behaviour and shows it

Negative - where the person is not predicted to show the behaviour and does not show it

32
Q

False positive and negative decisions

A

Positive - the prediction is that the person has the characteristic but does not

Negative - the prediction is that the person does not have the characteristic but does

33
Q

Base rate

A

the proportion of individuals in the population who show the behaviour of interest

34
Q

Selection ratio

A

the proportion of those assessed who can be allocated to the category of showing the behaviour
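The decision-theoretic quantities from the last few cards can be illustrated together with a made-up example (the scores, outcomes, and cutting point are all hypothetical):

```python
# A minimal sketch of the decision-theoretic approach: apply a
# cutting point to test scores, then tally the four decision types
# plus the base rate and selection ratio (hypothetical data).

scores = [55, 62, 48, 71, 66, 59, 80, 45, 68, 52]
showed = [False, True, False, True, True, False, True, False, False, False]

cutting_point = 60  # predict "will show the behaviour" at or above this score
predicted = [s >= cutting_point for s in scores]

valid_pos = sum(p and a for p, a in zip(predicted, showed))          # predicted and showed
valid_neg = sum(not p and not a for p, a in zip(predicted, showed))  # not predicted, did not show
false_pos = sum(p and not a for p, a in zip(predicted, showed))      # predicted, did not show
false_neg = sum(not p and a for p, a in zip(predicted, showed))      # not predicted, showed

base_rate = sum(showed) / len(showed)              # proportion who show the behaviour
selection_ratio = sum(predicted) / len(predicted)  # proportion classified as showing it

print(valid_pos, valid_neg, false_pos, false_neg, base_rate, selection_ratio)
```

Moving the cutting point trades the two error types off against each other: lowering it catches more true positives at the cost of more false positives, and raising it does the reverse.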