Lecture 4.2 Reliability Flashcards

1
Q

Reliability

A

• The consistency with which a test measures what it purports to measure in any given set of circumstances

2
Q

True

A

True or False

A reliable test will result in the same score every time it is used to measure the same thing under the same conditions

3
Q

Reliability coefficient

A

An index of reliability that indicates the ratio between the true score variance on a test and the total variance (SD²)
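
A minimal sketch of this ratio, assuming that total variance is the sum of true score variance and error variance. The function name and the variance values are hypothetical, chosen only for illustration.

```python
def reliability_coefficient(true_variance: float, error_variance: float) -> float:
    """Ratio of true score variance to total variance (true + error)."""
    total_variance = true_variance + error_variance
    if total_variance <= 0:
        raise ValueError("total variance must be positive")
    return true_variance / total_variance


# Hypothetical test: true score variance 225, error variance 25.
r = reliability_coefficient(225.0, 25.0)
print(round(r, 2))  # 0.9
```

With no error variance the ratio is 1; as error variance grows, the ratio falls toward 0.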

4
Q

> .90

A

Reliability coefficient of _______ is excellent for research purposes, appropriate for individual assessment purposes

5
Q

> .80

A

Reliability coefficient of _______ is good for research purposes, marginal for individual assessment

6
Q

Reliability coefficient

A
  • higher scores = higher reliability
  • > .60 is marginal for research purposes
  • > .70 is adequate for research purposes
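
The benchmarks on cards 4 to 6 can be collected into a small lookup. This is only a sketch: the function name is hypothetical, and the label for coefficients at or below .60 is an assumption, since the cards give no label for that range.

```python
def interpret_reliability(r: float) -> str:
    """Map a reliability coefficient to the benchmark labels from cards 4-6."""
    if not 0.0 <= r <= 1.0:
        raise ValueError("reliability coefficient must lie between 0 and 1")
    if r > 0.90:
        return "excellent for research; appropriate for individual assessment"
    if r > 0.80:
        return "good for research; marginal for individual assessment"
    if r > 0.70:
        return "adequate for research"
    if r > 0.60:
        return "marginal for research"
    # Assumption: the cards give no label for this range.
    return "below the benchmarks listed"


print(interpret_reliability(0.85))
```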
7
Q

Classical Test Theory

A

Assumes that each person has an innate true score. It can be summed up with an equation:
X = T + E
The observed score (X) is the true score (T) plus error (E)
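
A small simulation of X = T + E for one test taker. It assumes the error is random, normally distributed and centred on zero. The true score of 100 and the error SD of 5 are made-up values, and the function name is hypothetical.

```python
import random
import statistics


def observed_scores(true_score: float, error_sd: float, n: int, seed: int = 0) -> list:
    """Simulate n observed scores X = T + E for one person with fixed T."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    return [true_score + rng.gauss(0.0, error_sd) for _ in range(n)]


# Hypothetical person with true score T = 100, error SD = 5.
scores = observed_scores(true_score=100.0, error_sd=5.0, n=2000)
print(round(statistics.mean(scores), 1))   # close to T
print(round(statistics.stdev(scores), 1))  # close to the error SD
```

Each single observed score can land above or below T, but under these assumptions the average over many administrations settles near the true score.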

8
Q

more reliable

A

higher proportion of true variance =

9
Q

less reliable

A

higher proportion of error variance

10
Q

increase or decrease

A

Error variance may _______ or _______ a test score by varying amounts, leading to lower reliability

11
Q

Systematic error and unsystematic

A

Two types of testing error

12
Q

Systematic error

A

Testing error that doesn’t affect reliability. Consistent error, predictable (when aware) – leaking tyre

13
Q

Unsystematic error

A

Testing error that affects reliability. Inconsistent, unpredictable – electrical problem

14
Q

Test construction

A

Sources of Error Variance T_______ C_______
The content covered by test items, the way questions are asked, and the response format all add to the error variance of a test

15
Q

Test administration

A

Sources of Error Variance T_______ A_______
• Test environment (including test materials), test-taker variables (e.g., alertness, wellbeing, mistakes) & administrator-related variables (e.g., presence or absence, demeanour, departure from procedure, unconscious cues, etc.)

16
Q

Test scoring & interpretation

A

Sources of Error Variance T_______ s_______ and i_______
Human error - data entry, transcription, coding, calculation, timing, etc.
Level of objectivity/subjectivity

17
Q

Human fallibility

A

Sources of Error Variance h _______ f _________.
• Forgetting or misremembering
• Failing to notice or not being aware
• Not understanding or following instructions
• Under- and over-reporting
• Differences of opinion
• Lying or misleading

18
Q

Time and practice effects

A

Sources of Error Variance ti_______ and pr_______ eff_______

19
Q

Domain Sampling Model

A

This model assumes that the items selected for any one test are just a sample from an infinite domain of potential items. It concerns error that arises in the development of a test.

20
Q

Domain Sampling Model

A

• Seeks to determine how precisely the test score assesses the domain from which the test draws a sample

21
Q

True score

A

The score you would get if you answered every conceivable item in the domain.

22
Q

Standard Error of Measurement (SEM)

A

• Measures the precision of an observed score & provides an estimate of the amount of error inherent in an observed score or measurement
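
The card does not give a formula. A common textbook form is SEM = SD × √(1 − reliability), and the sketch below assumes that form. The function names, the SD of 15 and the reliability of .91 are illustrative only.

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """SEM = SD * sqrt(1 - r), assuming the common textbook formula."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


def confidence_band(observed: float, sem: float, z: float = 1.96) -> tuple:
    """Band of +/- z SEMs around an observed score (z = 1.96 for about 95%)."""
    return (observed - z * sem, observed + z * sem)


# Hypothetical test: SD = 15, reliability = .91, observed score = 110.
sem = standard_error_of_measurement(15.0, 0.91)
low, high = confidence_band(110.0, sem)
print(round(sem, 2), round(low, 1), round(high, 1))
```

The more reliable the test, the smaller the SEM and the narrower the band around an observed score.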

23
Q

Standard Error of Difference (SED)

A

Can be used to compare:
• an individual’s scores on two different tests
• two different people’s scores on the same test
• two different people’s scores on two different tests
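
A sketch of one such comparison, assuming the usual formula SED = √(SEM₁² + SEM₂²) and that both scores are on the same scale. The function names and the numbers are hypothetical.

```python
import math


def standard_error_of_difference(sem_1: float, sem_2: float) -> float:
    """SED = sqrt(SEM_1^2 + SEM_2^2), assuming both scores share one scale."""
    return math.sqrt(sem_1 ** 2 + sem_2 ** 2)


def difference_exceeds_error(score_1: float, score_2: float,
                             sed: float, z: float = 1.96) -> bool:
    """True if the score gap is larger than z SEDs (z = 1.96 for about 95%)."""
    return abs(score_1 - score_2) > z * sed


# Hypothetical SEMs of 3 and 4 points give an SED of 5 points.
sed = standard_error_of_difference(3.0, 4.0)
print(sed)                                         # 5.0
print(difference_exceeds_error(110.0, 100.0, sed)) # True: 10 > 9.8
```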

24
Q

Test-Retest Reliability

A
  • Calculated by correlating scores from the same people on two different administrations of the same test
  • Used for measuring characteristics that are thought to be stable (e.g. personality traits or intelligence)
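
A sketch of the calculation, assuming the correlation used is Pearson's r. The two score lists are made-up data for five people tested twice.

```python
import math


def pearson_r(x: list, y: list) -> float:
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of the same length (at least 2 scores)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores for the same five people at two administrations.
time_1 = [10, 12, 15, 18, 20]
time_2 = [11, 13, 14, 19, 21]
print(round(pearson_r(time_1, time_2), 2))
```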
25
Amount of time between administrations | Any interventions, treatment or trauma taking place between test administrations
Test-retest reliability will be affected by
26
Parallel & Alternate Forms Reliability
Different versions of a test, matched for content and difficulty
27
Split-Half Reliability
Scores from one half of a test are correlated with scores from the other half, using equivalent halves. Ways of splitting: randomly, odd and even items, or matched for content and difficulty
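
A sketch of an odds-and-evens split. The Spearman-Brown step is not on the card; it is added here as an assumption because it is the usual way to adjust a half-test correlation to full test length. The response matrix is made-up data (6 people, 6 items scored 0 or 1).

```python
import math


def pearson_r(x: list, y: list) -> float:
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def split_half_reliability(responses: list) -> tuple:
    """Odd-even split: correlate the half scores, then apply Spearman-Brown."""
    odd_half = [sum(person[0::2]) for person in responses]   # items 1, 3, 5, ...
    even_half = [sum(person[1::2]) for person in responses]  # items 2, 4, 6, ...
    half_r = pearson_r(odd_half, even_half)
    corrected = 2 * half_r / (1 + half_r)  # Spearman-Brown (assumed step)
    return half_r, corrected


# Hypothetical data: one row per person, one column per item.
responses = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
half_r, corrected = split_half_reliability(responses)
print(round(half_r, 2), round(corrected, 2))
```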
28
Inter-Rater Reliability
The degree of agreement between two or more scorers. Disagreement between scorers is reduced by appropriate training.
29
Test-retest
correlate scores from 2 administrations of the same test
30
Parallel forms
correlate scores from 2 versions of the same test
31
Split-half
correlate scores from 2 equivalent halves of the same test
32
Internal consistency
correlate items within the same test
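
The card does not name a coefficient. The sketch below assumes Cronbach's alpha, a widely used index of internal consistency, computed from item variances and total score variance. The ratings are made-up data (5 people, 3 items).

```python
import statistics


def cronbach_alpha(responses: list) -> float:
    """Cronbach's alpha from a people-by-items table of scores."""
    k = len(responses[0])  # number of items
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_variances = [statistics.variance(item) for item in zip(*responses)]
    total_variance = statistics.variance([sum(person) for person in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical ratings: one row per person, one column per item.
ratings = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [1, 2, 2],
    [3, 3, 4],
]
alpha = cronbach_alpha(ratings)
print(round(alpha, 2))
```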
33
Inter-rater
correlate scores from 2 scorers for one test taker
34
reliability coefficients
Indicate the ratio between the true score variance on a test and the total variance. Range from 0 to 1: the closer to 1, the higher the reliability
35
Homogeneous
A _______ test is unifactorial, so it consists of items measuring a single trait or factor
36
Heterogeneous
A _______ test is multifactorial, so it measures more than one trait or factor
37
static
a characteristic, trait, or ability that is presumed to be relatively unchanging
38
dynamic
a characteristic, state, or ability that is presumed to be ever changing as a function of situational and cognitive experiences
39
Restricted range or variance
sampling procedure used to gather the test scores does not result in a full spread of scores (e.g., having only university students complete an IQ test)
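
A small simulation of why restricted range matters: when only the upper part of the ability range is sampled, the correlation between ability and test score drops. All values are simulated under assumed normal distributions, and the cut-off of 0.5 SD is arbitrary.

```python
import math
import random


def pearson_r(x: list, y: list) -> float:
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(42)  # seeded so the run is repeatable
ability = [rng.gauss(0.0, 1.0) for _ in range(5000)]
test_score = [a + rng.gauss(0.0, 1.0) for a in ability]

# Full spread of scores.
full_r = pearson_r(ability, test_score)

# Restricted sample: keep only people above an arbitrary ability cut-off.
kept = [(a, s) for a, s in zip(ability, test_score) if a > 0.5]
restricted_r = pearson_r([a for a, _ in kept], [s for _, s in kept])

print(round(full_r, 2), round(restricted_r, 2))
```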
40
Inflated range or variance
when the sample includes people who are outside of the range of the test so the scoring range is inflated (e.g., adults completing a test designed for children)
41
speed test
all items of equal difficulty, and time limited so that no-one is likely to be able to answer all items
42
power test
time limit is long enough for all items to be attempted, but some items are so difficult that no-one is likely to get them all right
43
Criterion-Referenced
Designed to provide an indication of where a test taker stands with respect to some criterion (e.g., pass/fail type tests)
44
Validity
The extent to which evidence supports the meaning and use of a psychological test (or other assessment device)
45
The validity coefficient
A correlation coefficient that provides a measure of the relationship between test scores and scores on the criterion measure
46
Validity
How well a test or measurement tool measures what it purports to measure in a particular context
47
Classic (trinitarian) Model
focuses on three categories of validity
48
Content Validity
Type of validity - scrutinizing the test’s content
49
Criterion-related validity
Type of validity - relating scores obtained on the test to other test scores or other measures
50
Construct validity
Type of validity - ‘umbrella validity’; comprehensive analysis of how test scores relate to scores on other tests/measures & how test scores relate to the construct that the test was designed to measure
51
Unitary Model of validity
_____________ view takes everything into account, from implications of test scores in terms of societal values to the consequences of use
52
Test validation
The process of gathering and evaluating validity evidence. The test developer is responsible for supplying validity information in the test manual and/or through a ‘test validation’ journal article
53
Content Validity
• Describes a judgement of how adequately a test samples behaviour representative of the universe of behaviour that the test was designed to sample
54
Face Validity
Type of content validity | A judgement concerning how relevant the test items appear to be to the test-taker
55
Quantifying content validity
Important in employment settings, where tests are used to hire and promote: tests must be shown to include items relevant to the job skills required for the position. Lawshe (1975) asks of each item whether the skill or knowledge it measures is (1) essential, (2) useful but not essential, or (3) not necessary to the performance of the job
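
Lawshe's ratings are usually summarised per item as a content validity ratio, CVR = (nₑ − N/2) / (N/2), where nₑ is the number of panelists rating the item "essential" and N is the panel size. The formula is not on the card, so treat this as a sketch. The function name and panel numbers are hypothetical.

```python
def content_validity_ratio(n_essential: int, n_panelists: int) -> float:
    """Lawshe's CVR: ranges from -1 (nobody says essential) to +1 (everyone does)."""
    if n_panelists <= 0 or not 0 <= n_essential <= n_panelists:
        raise ValueError("need 0 <= n_essential <= n_panelists and a non-empty panel")
    half = n_panelists / 2
    return (n_essential - half) / half


# Hypothetical panel of 10 raters, 8 of whom rate the item as essential.
print(content_validity_ratio(8, 10))
```

A CVR of 0 means exactly half the panel rated the item essential; negative values mean fewer than half did.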
56
Culture
C____________ has an impact on judgements concerning the validity of tests and test items
57
Criterion-Related Validity
C __________ R________ V __________ A judgement of how adequately a test score can be used to infer an individual’s most probable standing on some measure of interest – the measure of interest being the criterion
58
criterion
A _____________ is the standard against which a test or test score is evaluated; it can be almost anything
59
1. RELEVANT 2. VALID 3. UNCONTAMINATED
A criterion should be: 1. R___________ – pertinent or applicable to the matter at hand 2. V___________ for the purpose for which it is being used 3. U____________ – not based on predictor measures
60
Predictive Validity
P ______________ V ______________ is the degree to which a test score predicts a criterion measure at a future time
61
Concurrent Validity
C___________ v_________ is the degree to which a test score is related to a criterion measure that is obtained at (about) the same time
62
Incremental Validity
I___________ V__________ The degree to which an additional predictor explains something about the criterion measure that is not explained by predictors already in use
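
One way to put a number on this is the gain in R² when a second predictor is added to the first. The sketch below assumes exactly two predictors and uses the standard two-predictor R² formula built from the three correlations. The function names and correlation values are hypothetical.

```python
def r_squared_two_predictors(r_y1: float, r_y2: float, r_12: float) -> float:
    """R^2 for a criterion regressed on two predictors, from their correlations."""
    if abs(r_12) >= 1.0:
        raise ValueError("predictors must not be perfectly correlated")
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)


def incremental_validity(r_y1: float, r_y2: float, r_12: float) -> float:
    """Gain in R^2 from adding predictor 2 to predictor 1 alone."""
    return r_squared_two_predictors(r_y1, r_y2, r_12) - r_y1 ** 2


# Hypothetical case A: the new predictor is unrelated to the existing one.
print(round(incremental_validity(0.5, 0.3, 0.0), 2))  # 0.09
# Hypothetical case B: the new predictor overlaps heavily with the existing one.
print(round(incremental_validity(0.5, 0.4, 0.8), 2))
```

In case B the new predictor correlates with the criterion on its own, yet adds almost nothing beyond what the first predictor already explains.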
63
False negatives
test takers predicted not to show characteristic but do
64
False positives
test takers predicted to show characteristic but don’t
65
Miss rate
M_____ r_______: the proportion of people incorrectly classified
66
Hit rate
H________ r_______: the proportion of people correctly identified
67
Base rate
B______ r________ the extent to which a particular trait, behaviour, characteristic or attribute exists in the population
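
The rates on cards 63 to 67 can be computed from the four cells of a classification table. This sketch follows the card definitions (hit rate = proportion correctly identified, miss rate = proportion incorrectly classified). The class name and the counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ClassificationCounts:
    true_positives: int   # predicted to show the characteristic, and do
    false_positives: int  # predicted to show it, but don't
    true_negatives: int   # predicted not to show it, and don't
    false_negatives: int  # predicted not to show it, but do

    @property
    def total(self) -> int:
        return (self.true_positives + self.false_positives
                + self.true_negatives + self.false_negatives)

    def hit_rate(self) -> float:
        """Proportion of people correctly identified."""
        return (self.true_positives + self.true_negatives) / self.total

    def miss_rate(self) -> float:
        """Proportion of people incorrectly classified."""
        return (self.false_positives + self.false_negatives) / self.total

    def base_rate(self) -> float:
        """Proportion of people who actually have the characteristic."""
        return (self.true_positives + self.false_negatives) / self.total


# Hypothetical screening results for 100 test takers.
counts = ClassificationCounts(true_positives=30, false_positives=10,
                              true_negatives=50, false_negatives=10)
print(counts.hit_rate(), counts.miss_rate(), counts.base_rate())  # 0.8 0.2 0.4
```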
68
Construct validity
C_________ v___________ A judgement about the appropriateness of inferences drawn from test scores regarding individual standings on a variable called a construct
69
Homogeneity of items; changes with age; pre-test to post-test changes; group differences; convergent evidence; divergent evidence; factor analysis
Evidence of construct validity: H_______ of items; changes with a_______; pre-test to p_______ changes; G_______ differences; C_______ evidence; D_______ evidence; F_______ analysis
70
Evidence of homogeneity
E__________ of h___________ - How uniform the test is in measuring a single concept
71
Evidence of changes with age
Some constructs are expected to change with age, particularly during childhood/adolescence
72
Evidence of pre-test/post-test changes
Evidence that scores change as the result of some experience between a pre-test and a post-test can be evidence of construct validity
73
Evidence from distinct groups
Demonstrating that scores on the test vary in a predictable way as a function of membership in some group
74
Convergent evidence
When scores on a new test are found to correlate highly in the predicted direction with scores on an older, more established and validated test designed to measure the same construct
75
Discriminant evidence
Shown when test scores are found to have little or no relationship with test scores or variables for which theoretically there should be no relationship
76
Factor Analysis
Can be used to determine both convergent and discriminant evidence of construct validity
77
Confirmatory Factor Analysis
A factor structure is explicitly hypothesised and is tested for its fit with the observed covariance structure of the measured variables
78
Exploratory Factor Analysis
Involves estimating or extracting factors, deciding how many factors to retain, and rotating factors to an interpretable orientation