Reliability & Validity Flashcards

(39 cards)

1
Q

Reliability

A
  • Are the results consistent?

  • Provides an estimate of the proportion of unsystematic error; the degree of unsystematic error must be known to determine reliability

2
Q

Validity

A
  • Does it measure what it says it measures?
  • Overall eval of evidence and degree of trustworthiness
  • Determine if enough support exists to use the test in a certain way
3
Q

Classical Test Theory

A
  • Observed score = T + E
  • T is the true score if the test is completely free from error
  • E is the error
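The Observed = T + E decomposition can be simulated: in CTT, reliability is the ratio var(T)/var(Observed). A minimal Python sketch, not part of the deck, with invented variance figures for illustration:

```python
import random

random.seed(0)
true_scores = [random.gauss(50, 10) for _ in range(5000)]   # T (assumed SD = 10)
observed = [t + random.gauss(0, 5) for t in true_scores]    # Observed = T + E (error SD = 5)

def var(xs):
    """Population variance."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# Reliability = var(T) / var(Observed); here roughly 100 / (100 + 25) = 0.8
print(var(true_scores) / var(observed))
```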
4
Q

Unsystematic Error

A
  • Random errors: mood, health, fatigue
  • Administration differences
  • Scoring differences
  • Random guessing
5
Q

Systematic Error

A

Constant errors that occur every time the test is administered, like a typo

6
Q

Reliability Related to Validity

A
  • High validity can occur only if reliability is high
  • High validity cannot occur if reliability is low
  • High reliability does not guarantee high validity
7
Q

Correlation Related to Reliability

A
  • Correlation: Statistical technique used to examine consistency
  • Reliability is often based on consistency between two sets of scores
8
Q

Positive Correlation

A

As one increases, so does the other

9
Q

Negative Correlation

A

As one increases, the other decreases

10
Q

Correlation Coefficient (Pearson-Product Moment)

A
  • Correlation coefficient: numerical indicator of the relationship between two sets of data
  • PPM correlation coefficient - most common
  • Ranges from -1 to +1; the closer |r| is to 1, the stronger the relationship
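A Pearson r can be computed directly from its definition (covariance over the product of the standard deviations). A minimal Python sketch, not part of the deck:

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Perfectly consistent rankings give r = +1
print(round(pearson_r([1, 2, 3, 4], [10, 20, 30, 40]), 6))  # 1.0
```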
11
Q

Test-Retest

A
  • Give the same test twice to the same group
  • Correlation between the first and second administrations (2-6 weeks apart)
  • Possible influences: a shorter gap (inflates the correlation), changes in administration, interventions, practice effects
  • Ex: skills-based test
12
Q

Alternate Forms

A
  • Very difficult to construct
  • Correlation of scores from two equivalent forms of a test
  • Measures stability (over time) and equivalence (construct similarity)
  • Use a sample of different items from the same domain
13
Q

Internal Consistency

A
  • One administration
  • One form of instrument
  • Divides instrument and correlates the scores from the different portions
14
Q

Split-Half Reliability

A
  • Given once then split in half to determine reliability
  • Need to divide instrument into equivalent halves, like even and odd
  • Problem: dividing instrument in half makes number of items smaller —> smaller correlation

Even/odd splitting doesn’t work if the test increases in difficulty, and there is no quick fix for that problem
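The usual remedy for the shortened-test problem, not named on the card, is the Spearman-Brown prophecy formula, which steps the half-test correlation up to a full-length estimate. A minimal Python sketch:

```python
def spearman_brown(half_r):
    """Estimate full-test reliability from the correlation between two halves."""
    return 2 * half_r / (1 + half_r)

# A half-test correlation of 0.6 implies full-length reliability of 0.75
print(spearman_brown(0.6))  # 0.75
```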

15
Q

Kuder-Richardson

A
  • KR-20: heterogeneous items
  • KR-21: homogeneous items measuring a single construct (cannot be used if items differ in difficulty)
  • Lower reliability coefficient than split-half
  • Purpose: estimate the average of all split-half reliabilities from all possible ways of splitting the instrument
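KR-20 for dichotomously (0/1) scored items can be sketched in Python; the response data below are invented for illustration:

```python
def kr20(items):
    """KR-20 reliability. items: one 0/1 response list per item, same respondents."""
    k = len(items)                                   # number of items
    n = len(items[0])                                # number of respondents
    totals = [sum(resp) for resp in zip(*items)]     # each respondent's total score
    mean_t = sum(totals) / n
    var_t = sum((t - mean_t) ** 2 for t in totals) / n
    # p*q summed over items: proportion correct times proportion incorrect
    pq = sum((sum(it) / n) * (1 - sum(it) / n) for it in items)
    return k / (k - 1) * (1 - pq / var_t)

# 3 items, 4 respondents (invented data)
print(kr20([[1, 1, 1, 0], [1, 1, 0, 0], [1, 0, 0, 0]]))  # 0.75
```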
16
Q

Coefficient Alpha (Cronbach’s Alpha)

A
  • Used for non-dichotomous scoring
  • Ex: Likert scales
  • Cronbach’s alpha
  • Takes into account variance of each item
  • Conservative estimate of reliability
  • Most common
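Cronbach’s alpha generalizes KR-20 to non-dichotomous items by using each item’s variance. A minimal Python sketch with population variances; the Likert data are invented:

```python
def variance(xs):
    """Population variance."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def cronbach_alpha(items):
    """Coefficient alpha. items: one score list per item, same respondents in each."""
    k = len(items)
    totals = [sum(resp) for resp in zip(*items)]     # each respondent's total score
    item_var = sum(variance(it) for it in items)
    return k / (k - 1) * (1 - item_var / variance(totals))

# 3 Likert items, 4 respondents (invented data)
print(round(cronbach_alpha([[4, 3, 5, 2], [4, 4, 5, 1], [3, 3, 4, 2]]), 2))  # 0.93
```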
17
Q

Standard Error of Measurement (SEM)

A
  • Provides an estimate of the range of scores if someone were to take the instrument repeatedly
  • Based on the idea that if someone takes a test multiple times, the scores would fall into a normal distribution
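The usual formula, SEM = SD × √(1 − reliability), gives the width of that band around an observed score. A Python sketch with invented values:

```python
import math

def sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)

# Invented example: SD = 10, reliability = 0.91 -> SEM = 3
score, s = 100, sem(10, 0.91)
print(f"~68% band: {score - s:.0f} to {score + s:.0f}")  # 97 to 103
```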
18
Q

SEM v. SD

A
  • SD is spread of scores between students
  • SEM is spread of scores for one student
  • Uses same estimations
19
Q

Content-Related Validity

A
  • Test items measure the objectives they are supposed to measure
  • Focus on how content was determined
  • May be based on test creator’s own analysis of topic or expert analysis
  • How well do test items reflect the domain of material being tested
20
Q

Criterion-Related Validity

A
  • Test scores related to specific criterion/variable
  • Sources of criterion scores: academic achievement, level of education, performance in specialized training, job performance, psychiatric diagnosis, ratings by supervisors, correlations with previously available tests
21
Q

Concurrent Validity (Criterion-Related)

A
  • Scores on test and criterion measure are collected at same point
  • Ex: achievement, certification
  • Scores are typically higher than with predictive validity
  • Requires reliable and bias-free measures
22
Q

Predictive Validity (Criterion-Related)

A
  • Test is administered first and scores on criterion measure are collected at a later time
  • Ex: SAT, college GPA
  • Requires reliable and bias-free measures
23
Q

Construct Validity

A
  • What do scores on this test mean or signify
  • Construct: Grouping of variables that make up observed behavior patterns
  • Ex: Self-efficacy, personality
  • Measured by correlation of 2 scores or factor analysis
  • Often seen in psych tests
24
Q

Convergent v. Discriminant (Construct Validity)

A

  • Convergent: positive correlation with other tests measuring the same/similar construct
  • Discriminant: low or no correlation with tests measuring different, unrelated constructs

25
Q

Threats to Construct Validity

A
  • Too many variables
  • Under-representation: missing parts of the construct
  • Extra questions
  • Items are too similar
26
Q

Overall Threats to Validity

A
  • History: outside events during the course of the test
  • Maturation: natural development with age
  • Testing: repeat testing; changes due to practice
  • Instrumentation: changes in measurement procedures
  • Statistical regression: regression to the mean after an extreme first score
  • Interaction: any combination of two of the above
  • Mortality: dropout
  • Selection of subjects: bias in selecting subjects and assigning them to groups
27
Q

Face Validity

A
  • Not a legitimate form of validity evidence
  • Based on the appearance of the measure and its test items
28
Q

Types of Evidence

A
  • Test content
  • Response processes
  • Internal structure
  • Relations to other variables
  • Consequences of testing
29
Q

Item Analysis

A
  • Examine and evaluate each item in the test -> drop items that don't work
  • Done during instrument development or revision
30
Q

Item Difficulty

A
  • Index reflecting the proportion of people getting the item correct
  • 0.0 = no one got it correct
  • 1.0 = everyone got it correct
  • 0.5 = ideal for differentiation
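The difficulty index is just the proportion correct; a one-line Python sketch, not part of the deck:

```python
def difficulty(responses):
    """Proportion of test takers answering the item correctly (0/1 scored)."""
    return sum(responses) / len(responses)

print(difficulty([1, 1, 0, 1]))  # 0.75
```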
31
Q

Item Discrimination

A
  • Degree to which an item correctly differentiates among test takers
  • Extreme group method: 2 groups, high scorers and low scorers (works with a normal distribution)
  • Correlational method: performance on the test v. the item
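The extreme-group method can be sketched in Python: the item's difficulty among top scorers minus its difficulty among bottom scorers. The 27% group fraction is a common convention, not stated on the card:

```python
def discrimination(item, totals, frac=0.27):
    """Extreme-group discrimination index for one 0/1-scored item.
    item: per-respondent 0/1 responses; totals: per-respondent total scores."""
    order = sorted(range(len(totals)), key=lambda i: totals[i])  # low to high
    k = max(1, round(frac * len(totals)))                        # group size
    low = [item[i] for i in order[:k]]                           # bottom scorers
    high = [item[i] for i in order[-k:]]                         # top scorers
    return sum(high) / k - sum(low) / k

# Invented data: the item is answered correctly only by the two top scorers
print(discrimination([1, 1, 0, 0], [9, 8, 3, 2]))  # 1.0
```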
32
Q

Item Response Theory (IRT)

A
  • Focus on each item; considers the mathematical relationship between ability and item responses
  • 2 major assumptions: unidimensionality, local independence
  • Most common in testing where there is a right/wrong answer v. a preference
  • Models student ability using each question instead of an aggregate score
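One common instance of that mathematical relationship (the deck doesn't name a specific model) is the one-parameter/Rasch model, which links ability theta and item difficulty b through a logistic function:

```python
import math

def rasch_p(theta, b):
    """1PL (Rasch) model: probability of a correct answer given ability theta
    and item difficulty b."""
    return 1 / (1 + math.exp(-(theta - b)))

# Ability equal to difficulty -> 50% chance of a correct answer
print(rasch_p(0.0, 0.0))  # 0.5
```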
33
Q

Unidimensionality

A

Each item measures one ability or trait
34
Q

Local Independence

A

A response to one item is unrelated to responses on other items
35
Q

Selecting Tests

A
  • Determine what information is needed
  • Search assessment resources
  • Evaluate possible instruments
36
Q

Administering Tests

A
  • Pre-testing procedures
  • Administration
  • Scoring: by hand, computer, or Internet
37
Q

Communicating Results

A
  • Use simple language
  • Individual v. group
  • Written v. oral
  • Communicate the test's strengths and limitations
  • Know the manual
  • Describe results v. just reporting them
  • Use various results
  • Involve the client
  • Encourage asking questions
  • Relate the test to a goal
38
Q

Problems with Reporting Results

A
  • Acceptance
  • Readiness of the client
  • Negative results
  • Flat profiles that don't show anything
  • Motivation and attitude
39
Q

Communicating Test Results to Parents

A
  • Identifying information
  • Reason for referral
  • Background information
  • Test results and interpretation
  • Diagnostic impressions and summary