Reliability and Validity Flashcards

1
Q

Define

Alternate forms reliability

A

Two forms of the same test are developed, with different items selected according to the same rules. The forms may have different distributions of scores (means and variances may not be equal)

2
Q

Define

Base rate

A

the proportion of individuals in the population who show the behaviour of interest in a given psychological testing or assessment situation

3
Q

Define

Classical test theory

A

a body of related psychometric theory that predicts outcomes of psychological testing such as the difficulty of items or the ability of test-takers

4
Q

Define

Concurrent validity

A

a form of predictive validity in which the index of social behaviour is obtained close in time to the score on the psychological test (or other assessment device)

5
Q

Define

Construct underrepresentation

A

failure to capture important components of a construct

6
Q

Define

Construct validity

A

the meaning of a test score made possible by knowledge of the pattern of relationships it enters into with other variables and the theoretical interpretation of those relationships

7
Q

Define

Construct-irrelevant variance

A

measuring things other than the construct of interest

8
Q

Define

Content validity

A

the extent to which items on a test represent the universe of behaviour the test was designed to measure

9
Q

Define

Convergent and discriminant validity

A

the subjection of a multitrait-multimethod matrix to a set of criteria that specify which correlations should be large and which small in terms of a psychological theory of the constructs

10
Q

Define

Criterion-related validity

A

the extent to which a measure is related to an outcome (e.g. marks in Year 12 are used to predict performance at university)

11
Q

Define

Cronbach’s alpha

A

an estimate of reliability that is based on the average intercorrelation of the items in a test

12
Q

Define

Cutting point

A

the test score or point on a scale, in the case of another assessment device, that is used to split those being tested or assessed into two groups predicted to show or not show some behaviour of interest

13
Q

Define

Domain-sampling model

A

a way of thinking about the composition of a psychological test that sees the test as a representative sample of the larger domain of possible items that could be included in the test

14
Q

Define

Equivalent forms reliability

A

the estimate of reliability of a test obtained by comparing two forms of a test constructed to measure the same construct

15
Q

Define

Errors of measurement

A

Factors that contribute to inconsistency - characteristics of the test taker, test or situation that have nothing to do with the attribute being tested but affect scores

16
Q

Define

Face validity

A

Does the test look like it measures the relevant construct?

17
Q

Define

Factor analysis

A

a mathematical method of summarising a matrix of values (such as the intercorrelation of test scores) in terms of a smaller number of values (factors) from which the original matrix can be reproduced

18
Q

Define

False negative decision

A

a decision that incorrectly allocates a test taker or person being assessed to the category of those predicted not to show some behaviour of interest on the basis of their score on a test or other assessment device

19
Q

Define

False positive decision

A

a decision that incorrectly allocates a test taker or person being assessed to the category of those predicted to show some behaviour of interest on the basis of their score on a test or other assessment device

20
Q

Define

Generalisability theory

A

a set of ideas and procedures that follow from the proposal that the consistency or precision of the output of a psychological assessment device depends on specifying the desired range of conditions over which this is to hold

21
Q

Define

Incremental validity

A

the extent to which knowledge of a score on a test (or other assessment device) adds to that obtained by another, pre-existing score or psychological characteristic

22
Q

Define

Inter-rater reliability

A

the extent to which different raters agree in their assessments of the same sample of ratees

23
Q

Define

Internal consistency

A

the extent to which a psychological test is homogeneous or heterogeneous

24
Q

Define

Kuder-Richardson 20 (KR20)

A

a particular case of Cronbach’s alpha for dichotomously scored items (i.e. scored as 0 or 1)

25
Q

Define

Method variance

A

the variability among scores on a psychological test or other assessment device that arises because of the form as distinct from the content of the test

26
Q

Define

Multitrait-multimethod matrix

A

the patterns of correlations resulting from testing all possible relationships among two or more methods of assessing two or more constructs

27
Q

Define

Parallel forms reliability

A

Two forms of the same test are developed, with different items selected according to the same rules. The forms have the same distribution of scores (means and variances are equal)

28
Q

Define

Predictive validity

A

the extent to which a score on a psychological test (or other assessment device) allows a statement about standing on a variable indexing important social behaviour independent of the test

29
Q

Define

Reliability

A

the consistency with which a test measures what it purports to measure in any given set of circumstances

30
Q

Define

Reliability coefficient

A

an index - often a Pearson product moment correlation coefficient - of the ratio of true score to error score variance in a test as used in a given set of circumstances
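The ratio in this definition can be illustrated with a short Python sketch; the variance figures below are invented for illustration:

```python
# Classical test theory: observed-score variance is the sum of true-score
# and error-score variance, and the reliability coefficient is the ratio
# of true-score variance to observed-score variance.
true_var = 80.0    # hypothetical true-score variance
error_var = 20.0   # hypothetical error-score variance

observed_var = true_var + error_var
reliability = true_var / observed_var
print(reliability)  # → 0.8
```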

31
Q

Define

Selection ratio

A

the proportion of those tested or assessed who can be allocated to the category of showing the behaviour of interest in a given psychological testing or assessment situation

32
Q

Define

Social desirability bias

A

a form of method variance common in the construction of psychological tests of personality that arises when people respond to questions that place them in a favourable or unfavourable light

33
Q

Define

Spearman-Brown formula

A

applied to estimate the reliability a test would have if each half were as long as the full test; i.e. it allows you to estimate internal consistency if the test were longer or shorter

34
Q

Define

Split-half reliability

A

the estimate of reliability obtained by correlating scores on the two halves of a test formed in some systematic way (e.g. odd versus even items)

35
Q

Define

Stability over time

A

the extent to which test scores remain stable when a test is administered on more than one occasion

36
Q

Define

Standard error of estimate

A

an index of the amount of error in predicting one variable from another

37
Q

Define

Standard error of measurement

A

an expression of the precision of an individual test score as an estimate of the trait it purports to measure

38
Q

Define

Test-Retest Reliability

A

the estimate of reliability obtained by correlating scores on the same test administered to the same group of people on two separate occasions

39
Q

Define

True scores

A

Factors that contribute to consistency - stable attributes under examination

40
Q

Define

Valid negative decision

A

a decision that correctly allocates a test taker or person being assessed to the category of those predicted not to show some behaviour of interest on the basis of their score on a test or other assessment device

41
Q

Define

Valid positive decision

A

a decision that correctly allocates a test taker or person being assessed to the category of those predicted to show some behaviour of interest on the basis of their score on a test or other assessment device

42
Q

Define

Validity

A

the extent to which evidence supports the meaning and use of a psychological test (or other assessment device)

43
Q

Definition

Two forms of the same test are developed, with different items selected according to the same rules. The forms may have different distributions of scores (means and variances may not be equal)

A

Alternate forms reliability

44
Q

Definition

the proportion of individuals in the population who show the behaviour of interest in a given psychological testing or assessment situation

A

Base rate

45
Q

Definition

a body of related psychometric theory that predicts outcomes of psychological testing such as the difficulty of items or the ability of test-takers

A

Classical test theory

46
Q

Definition

a form of predictive validity in which the index of social behaviour is obtained close in time to the score on the psychological test (or other assessment device)

A

Concurrent validity

47
Q

Definition

failure to capture important components of a construct

A

Construct underrepresentation

48
Q

Definition

the meaning of a test score made possible by knowledge of the pattern of relationships it enters into with other variables and the theoretical interpretation of those relationships

A

Construct validity

49
Q

Definition

measuring things other than the construct of interest

A

Construct-irrelevant variance

50
Q

Definition

the extent to which items on a test represent the universe of behaviour the test was designed to measure

A

Content validity

51
Q

Definition

the subjection of a multitrait-multimethod matrix to a set of criteria that specify which correlations should be large and which small in terms of a psychological theory of the constructs

A

Convergent and discriminant validity

52
Q

Definition

the extent to which a measure is related to an outcome (e.g. marks in Year 12 are used to predict performance at university)

A

Criterion-related validity

53
Q

Definition

an estimate of reliability that is based on the average intercorrelation of the items in a test

A

Cronbach’s alpha

54
Q

Definition

the test score or point on a scale, in the case of another assessment device, that is used to split those being tested or assessed into two groups predicted to show or not show some behaviour of interest

A

Cutting point

55
Q

Definition

a way of thinking about the composition of a psychological test that sees the test as a representative sample of the larger domain of possible items that could be included in the test

A

Domain-sampling model

56
Q

Definition

the estimate of reliability of a test obtained by comparing two forms of a test constructed to measure the same construct

A

Equivalent forms reliability

57
Q

Definition

Factors that contribute to inconsistency - characteristics of the test taker, test or situation that have nothing to do with the attribute being tested but affect scores

A

Errors of measurement

58
Q

Definition

Does the test look like it measures the relevant construct?

A

Face validity

59
Q

Definition

a mathematical method of summarising a matrix of values (such as the intercorrelation of test scores) in terms of a smaller number of values (factors) from which the original matrix can be reproduced

A

Factor analysis

60
Q

Definition

a decision that incorrectly allocates a test taker or person being assessed to the category of those predicted not to show some behaviour of interest on the basis of their score on a test or other assessment device

A

False negative decision

61
Q

Definition

a decision that incorrectly allocates a test taker or person being assessed to the category of those predicted to show some behaviour of interest on the basis of their score on a test or other assessment device

A

False positive decision

62
Q

Definition

a set of ideas and procedures that follow from the proposal that the consistency or precision of the output of a psychological assessment device depends on specifying the desired range of conditions over which this is to hold

A

Generalisability theory

63
Q

Definition

the extent to which knowledge of a score on a test (or other assessment device) adds to that obtained by another, pre-existing score or psychological characteristic

A

Incremental validity

64
Q

Definition

the extent to which different raters agree in their assessments of the same sample of ratees

A

Inter-rater reliability

65
Q

Definition

the extent to which a psychological test is homogeneous or heterogeneous

A

Internal consistency

66
Q

Definition

a particular case of Cronbach’s alpha for dichotomously scored items (i.e. scored as 0 or 1)

A

Kuder-Richardson 20 (KR20)

67
Q

Definition

the variability among scores on a psychological test or other assessment device that arises because of the form as distinct from the content of the test

A

Method variance

68
Q

Definition

the patterns of correlations resulting from testing all possible relationships among two or more methods of assessing two or more constructs

A

Multitrait-multimethod matrix

69
Q

Definition

Two forms of the same test are developed, with different items selected according to the same rules. The forms have the same distribution of scores (means and variances are equal)

A

Parallel forms reliability

70
Q

Definition

the extent to which a score on a psychological test (or other assessment device) allows a statement about standing on a variable indexing important social behaviour independent of the test

A

Predictive validity

71
Q

Definition

the consistency with which a test measures what it purports to measure in any given set of circumstances

A

Reliability

72
Q

Definition

an index - often a Pearson product moment correlation coefficient - of the ratio of true score to error score variance in a test as used in a given set of circumstances

A

Reliability coefficient

73
Q

Definition

the proportion of those tested or assessed who can be allocated to the category of showing the behaviour of interest in a given psychological testing or assessment situation

A

Selection ratio

74
Q

Definition

a form of method variance common in the construction of psychological tests of personality that arises when people respond to questions that place them in a favourable or unfavourable light

A

Social desirability bias

75
Q

Definition

applied to estimate the reliability a test would have if each half were as long as the full test; i.e. it allows you to estimate internal consistency if the test were longer or shorter

A

Spearman-Brown formula

76
Q

Definition

the estimate of reliability obtained by correlating scores on the two halves of a test formed in some systematic way (e.g. odd versus even items)

A

Split-half reliability

77
Q

Definition

the extent to which test scores remain stable when a test is administered on more than one occasion

A

Stability over time

78
Q

Definition

an index of the amount of error in predicting one variable from another

A

Standard error of estimate

79
Q

Definition

an expression of the precision of an individual test score as an estimate of the trait it purports to measure

A

Standard error of measurement

80
Q

Definition

the estimate of reliability obtained by correlating scores on the same test administered to the same group of people on two separate occasions

A

Test-Retest Reliability

81
Q

Definition

Factors that contribute to consistency - stable attributes under examination

A

True scores

82
Q

Definition

a decision that correctly allocates a test taker or person being assessed to the category of those predicted not to show some behaviour of interest on the basis of their score on a test or other assessment device

A

Valid negative decision

83
Q

Definition

a decision that correctly allocates a test taker or person being assessed to the category of those predicted to show some behaviour of interest on the basis of their score on a test or other assessment device

A

Valid positive decision

84
Q

Definition

the extent to which evidence supports the meaning and use of a psychological test (or other assessment device)

A

Validity

85
Q

What is reliability?

A
  • The degree to which a test tool provides consistent results
  • A test is considered reliable when it produces the same results again and again, when measuring the same thing
86
Q

What is validity?

A

Validity can be broadly understood as the extent to which a test measures the construct it is intended to measure

87
Q

John is feeling unwell and visits his GP. The GP orders some blood tests. The results of the blood tests indicate that John is iron deficient. The doctor prescribes iron supplements, which John immediately starts taking as prescribed. After a few weeks he returns to the doctor to repeat the blood tests. The results indicate that his iron levels have decreased!

What might be happening?

A
  • The test may have poor validity (i.e. it is measuring some other variable)
  • The test has poor reliability (i.e. when repeated, the test often shows different results)
88
Q

Why are reliability and validity important?

A
  • Diagnosis
  • Assessment of ability
  • Treatment
    • Decisions around recommending treatment
    • Monitoring treatment outcomes (e.g. reliability would be really important if you are repeating tests to see if the treatment is working)
  • The conclusions you can draw rely on the reliability and validity of the tests/assessments you are using.
    • Important clinically and in research
89
Q

True or False:

A test can be reliable without being valid

A

True

90
Q

True or False:

A test can be valid without being reliable

A

False

Tests cannot be valid without being reliable

91
Q

According to Classical Test Theory, what are test scores the result of?

A
  • Factors that contribute to consistency – stable attributes under examination (“True Scores”)
  • Factors that contribute to inconsistency – characteristics of the test taker, test or situation that have nothing to do with the attribute being tested but affect scores. (“Errors of Measurement”)
92
Q

What are some common sources of error on a test?

A

Item selection

Test administration

Test scoring

Systematic measurement error

93
Q

How is item selection a potential source of error?

A

the sample of items chosen may not be equally reflective of every individual’s true score.

94
Q

How is test administration a potential source of error?

A

General environmental conditions (e.g. temperature, lighting, noise); temporary “states” of the test taker (e.g. fatigue, anxiety, distraction).

E.g. completing an IQ test in a loud, noisy room, or an examiner providing non-standardised instructions

95
Q

Domain Sampling Theory considers the problem of using only a ________ of items to represent a construct

A

Domain Sampling Theory considers the problem of using only a sample of items to represent a construct

96
Q

If the same test is administered to the same group twice at two different times, why might the scores not be the same?

A

Errors of measurement

Practice effects

Maturation

Treatment effects or setting

97
Q

Which of these would test-retest be appropriate for?

  • State anxiety
  • Weight of a baby
  • Extraversion
  • Intelligence
A

  • Extraversion and intelligence: these are relatively stable constructs, so test-retest is appropriate
  • State anxiety and the weight of a baby change over short periods, so test-retest is not appropriate for them
98
Q

How do you maximise test-retest reliability?

A
  • Test a relatively stable construct
  • No intervention in between testing
  • Shorter time between testing
99
Q

What is the difference between parallel and alternate forms reliability?

A

They both involve two forms of the same test, with different items selected according to the same rules.

Parallel Forms: same distribution of scores (means and variance equal)

Alternate Forms: different distribution of scores (mean and variance may not be equal)
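The distinction can be made concrete with a short Python sketch; the two sets of form scores below are invented for illustration:

```python
import statistics

# Hypothetical scores on two forms of the same test (invented data).
form_a = [12, 15, 14, 10, 13, 16]
form_b = [11, 16, 13, 10, 14, 16]

mean_a, mean_b = statistics.mean(form_a), statistics.mean(form_b)
var_a, var_b = statistics.variance(form_a), statistics.variance(form_b)

# Parallel forms require (approximately) equal means AND equal variances;
# alternate forms are only required to be built to the same rules.
# Here the means match but the variances differ, so these would count
# as alternate rather than parallel forms.
is_parallel = abs(mean_a - mean_b) < 0.5 and abs(var_a - var_b) < 0.5
```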

100
Q

What is the split-half method?

A
  • Test is split into halves (randomly, odd-even system, top vs bottom)
  • Correlate the two halves
  • The estimate of reliability based on a split half is smaller due to the smaller number of items
  • The Spearman-Brown formula is applied to estimate the reliability the test would have if each half were as long as the full test.
    • i.e. allows you to estimate internal consistency if the test were longer or shorter
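The steps above can be sketched in Python; the item scores are invented for illustration, and the odd/even split and Spearman-Brown correction follow the description above:

```python
def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def split_half_reliability(rows):
    """rows: one list of item scores per respondent."""
    odd = [sum(r[0::2]) for r in rows]    # total on odd-numbered items
    even = [sum(r[1::2]) for r in rows]   # total on even-numbered items
    r_half = pearson(odd, even)           # reliability of a half-length test
    return 2 * r_half / (1 + r_half)      # Spearman-Brown corrected estimate

scores = [[3, 4, 3, 5], [2, 2, 3, 2], [4, 5, 4, 5], [1, 2, 1, 2], [3, 3, 4, 4]]
print(round(split_half_reliability(scores), 2))  # → 0.89
```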
101
Q

What is the rationale for the split-half method?

A

if scores on 2 half tests from single administration are highly correlated, scores on 2 whole tests from separate administrations should also be highly correlated.

102
Q

What is Cronbach’s alpha?

A

A generalised reliability coefficient for scoring systems in which each item is graded (e.g. on an agree-to-disagree scale)

  • Mean of all possible split-half correlations, corrected by the Spearman-Brown formula
  • Ranges from 0 (no similarity) to 1 (perfectly identical)
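A minimal Python sketch of the calculation from item variances and total-score variance; the respondent data are invented for illustration:

```python
import statistics

def cronbach_alpha(rows):
    """rows: one list of item scores per respondent."""
    k = len(rows[0])                                  # number of items
    items = list(zip(*rows))                          # one column per item
    item_vars = sum(statistics.variance(c) for c in items)
    total_var = statistics.variance([sum(r) for r in rows])
    return (k / (k - 1)) * (1 - item_vars / total_var)

scores = [[3, 4, 3, 5], [2, 2, 3, 2], [4, 5, 4, 5], [1, 2, 1, 2], [3, 3, 4, 4]]
print(round(cronbach_alpha(scores), 2))  # → 0.94
```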
103
Q

What are considered acceptable levels of reliability?

A

Depends on the purpose to some extent

  • .70-.80 acceptable or good
  • Greater than .90 may indicate redundancy in items
  • High reliability is really important in clinical settings when making decisions for a person (e.g. decision making capacity assessment).
104
Q

__________________: a particular case of Cronbach’s alpha for dichotomously scored items (i.e. scored as 0 or 1)

A

Kuder-Richardson 20 (KR20): a particular case of Cronbach’s alpha for dichotomously scored items (i.e. scored as 0 or 1)
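As a sketch, KR20 replaces the item variances in Cronbach’s alpha with p×q for each dichotomous item; the response data are invented, and population variance is used for the totals so that it matches the p×q convention:

```python
def kr20(rows):
    """rows: one list of 0/1 item scores per respondent."""
    k, n = len(rows[0]), len(rows)
    items = list(zip(*rows))
    # p = proportion scoring 1 on the item; q = 1 - p
    sum_pq = sum((sum(c) / n) * (1 - sum(c) / n) for c in items)
    totals = [sum(r) for r in rows]
    mean_t = sum(totals) / n
    var_t = sum((t - mean_t) ** 2 for t in totals) / n  # population variance
    return (k / (k - 1)) * (1 - sum_pq / var_t)

answers = [[1, 1, 1, 1], [1, 1, 1, 0], [1, 1, 0, 0], [0, 0, 0, 0], [1, 0, 0, 0]]
print(round(kr20(answers), 2))  # → 0.8
```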

105
Q

The __________ the SEM, the less certain we are that the test score represents the true score.

A

The larger the SEM, the less certain we are that the test score represents the true score.
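The relationship can be shown with the usual SEM formula, SEM = SD × √(1 − reliability); the SD of 15 and the reliability values below are illustrative:

```python
import math

def sem(sd, reliability):
    """Standard error of measurement from a test's SD and reliability."""
    return sd * math.sqrt(1 - reliability)

# With SD = 15 (illustrative), a more reliable test gives a smaller SEM,
# so we are more certain the observed score is close to the true score.
print(round(sem(15, 0.95), 2))  # → 3.35
print(sem(15, 0.75))            # → 7.5
```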

106
Q

How do you maximise reliability?

A
  • Clear administration and scoring instructions for test user
  • Clear instructions for the test taker
  • Unambiguous test items
  • Standardised testing environment and procedure
  • Reduced time between testing sessions
  • Increase assessment length/items
  • Test try-out and modification
  • Discarding items that decrease reliability (item analysis)
  • Maximise VALIDITY
107
Q

Draw a diagram that demonstrates the different types of validity

A
108
Q

What are the four main types of validity?

A

Face validity

Content validity

Criterion-related validity

Construct validity

109
Q

What are some issues for content validity?

A
  • Construct underrepresentation: failure to capture important components of a construct.
    • e.g. A depression scale that assesses cognitive and emotional components of depression, but not behavioural components.
  • Construct-irrelevant variance: measuring things other than the construct of interest.
    • e.g. The wording of our depression scale may make it likely that people will respond in socially desirable ways.
110
Q

What are some examples of criterion-related validity?

A
  • e.g. marks in Year 12 are used to predict performance at university
  • e.g. a marital satisfaction survey is used to predict divorce
  • e.g. scores on an anxiety scale you developed are correlated with clinical observations.
111
Q

What is an example of concurrent validity?

A

A test designed to measure anxiety may be issued in conjunction with a diagnostic interview by an experienced clinician using the DSM-5. The concurrent validity of the test represents the extent to which the test score corresponds with the clinician’s observations of the client’s anxiety levels.

112
Q

What is an example of predictive evidence?

A

VCE marks or ATAR scores are used to predict performance at university

113
Q

What is an example of convergent evidence?

A

e.g. Relationship between score on measure of psychopathy and low emotional arousal.
e.g. Relationship between low self-esteem and depression

114
Q

What are some examples of discriminant evidence?

A

Scores on an anxiety measure should differ from scores on a depression measure, if each measure is assessing these individual constructs.