Reliability and Validity Flashcards

1
Q

Define

Alternate forms reliability

A

Two forms of the same test are developed, with different items selected according to the same rules. The forms may have different distributions of scores (means and variances may not be equal)

2
Q

Define

Base rate

A

the proportion of individuals in the population who show the behaviour of interest in a given psychological testing or assessment situation

3
Q

Define

Classical test theory

A

a body of related psychometric theory that predicts outcomes of psychological testing such as the difficulty of items or the ability of test-takers

4
Q

Define

Concurrent validity

A

a form of predictive validity in which the index of social behaviour is obtained close in time to the score on the psychological test (or other assessment device)

5
Q

Define

Construct underrepresentation

A

failure to capture important components of a construct

6
Q

Define

Construct validity

A

the meaning of a test score made possible by knowledge of the pattern of relationships it enters into with other variables and the theoretical interpretation of those relationships

7
Q

Define

Construct-irrelevant variance

A

measuring things other than the construct of interest

8
Q

Define

Content validity

A

the extent to which items on a test represent the universe of behaviour the test was designed to measure

9
Q

Define

Convergent and discriminant validity

A

the subjection of a multitrait-multimethod matrix to a set of criteria that specify which correlations should be large and which small in terms of a psychological theory of the constructs

10
Q

Define

Criterion-related validity

A

the extent to which a measure is related to an outcome (e.g. marks in Year 12 are used to predict performance at university)

11
Q

Define

Cronbach’s alpha

A

an estimate of reliability that is based on the average intercorrelation of the items in a test

12
Q

Define

Cutting point

A

the test score or point on a scale, in the case of another assessment device, that is used to split those being tested or assessed into two groups predicted to show or not show some behaviour of interest

13
Q

Define

Domain-sampling model

A

a way of thinking about the composition of a psychological test that sees the test as a representative sample of the larger domain of possible items that could be included in the test

14
Q

Define

Equivalent forms reliability

A

the estimate of reliability of a test obtained by comparing two forms of a test constructed to measure the same construct

15
Q

Define

Errors of measurement

A

Factors that contribute to inconsistency - characteristics of the test taker, test or situation that have nothing to do with the attribute being tested but affect scores

16
Q

Define

Face validity

A

Does the test look like it measures the relevant construct?

17
Q

Define

Factor analysis

A

a mathematical method of summarising a matrix of values (such as the intercorrelation of test scores) in terms of a smaller number of values (factors) from which the original matrix can be reproduced

18
Q

Define

False negative decision

A

a decision that incorrectly allocates a test taker or person being assessed to the category of those predicted not to show some behaviour of interest on the basis of their score on a test or other assessment device

19
Q

Define

False positive decision

A

a decision that incorrectly allocates a test taker or person being assessed to the category of those predicted to show some behaviour of interest on the basis of their score on a test or other assessment device

20
Q

Define

Generalisability theory

A

a set of ideas and procedures that follow from the proposal that the consistency or precision of the output of a psychological assessment device depends on specifying the desired range of conditions over which this is to hold

21
Q

Define

Incremental validity

A

the extent to which knowledge of a score on a test (or other assessment device) adds to that obtained by another, pre-existing score or psychological characteristic

22
Q

Define

Inter-rater reliability

A

the extent to which different raters agree in their assessments of the same sample of ratees

23
Q

Define

Internal consistency

A

the extent to which a psychological test is homogeneous or heterogeneous

24
Q

Define

Kuder-Richardson 20 (KR20)

A

a particular case of Cronbach’s alpha for dichotomously scored items (i.e. scored as 0 or 1)

25
Q

Define

Method variance

A

the variability among scores on a psychological test or other assessment device that arises because of the form as distinct from the content of the test

26
Q

Define

Multitrait-multimethod matrix

A

the patterns of correlations resulting from testing all possible relationships among two or more methods of assessing two or more constructs

27
Q

Define

Parallel forms reliability

A

Two forms of the same test are developed, with different items selected according to the same rules. The forms have the same distribution of scores (means and variances are equal)

28
Q

Define

Predictive validity

A

the extent to which a score on a psychological test (or other assessment device) allows a statement about standing on a variable indexing important social behaviour independent of the test

29
Q

Define

Reliability

A

the consistency with which a test measures what it purports to measure in any given set of circumstances

30
Q

Define

Reliability coefficient

A

an index - often a Pearson product moment correlation coefficient - of the ratio of true score to error score variance in a test as used in a given set of circumstances
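The ratio in this definition can be illustrated with a short Python sketch; the variance figures below are invented for illustration:

```python
# Classical test theory: observed-score variance is the sum of true-score
# and error-score variance, and the reliability coefficient is the ratio
# of true-score variance to observed-score variance.
true_var = 80.0    # hypothetical true-score variance
error_var = 20.0   # hypothetical error-score variance

observed_var = true_var + error_var
reliability = true_var / observed_var
print(reliability)  # → 0.8
```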

31
Q

Define

Selection ratio

A

the proportion of those tested or assessed who can be allocated to the category of showing the behaviour of interest in a given psychological testing or assessment situation

32
Q

Define

Social desirability bias

A

a form of method variance common in the construction of psychological tests of personality that arises when people respond to questions that place them in a favourable or unfavourable light

33
Q

Define

Spearman-Brown formula

A

applied to estimate the reliability a test would have if each half were as long as the full test; i.e. it allows you to estimate internal consistency if the test were longer or shorter

34
Q

Define

Split-half reliability

A

the estimate of reliability obtained by correlating scores on the two halves of a test formed in some systematic way (e.g. odd versus even items)

35
Q

Define

Stability over time

A

the extent to which test scores remain stable when a test is administered on more than one occasion

36
Q

Define

Standard error of estimate

A

an index of the amount of error in predicting one variable from another

37
Q

Define

Standard error of measurement

A

an expression of the precision of an individual test score as an estimate of the trait it purports to measure

38
Q

Define

Test-Retest Reliability

A

the estimate of reliability obtained by correlating scores on the same test administered to the same group of people on two separate occasions

39
Q

Define

True scores

A

Factors that contribute to consistency - stable attributes under examination

40
Q

Define

Valid negative decision

A

a decision that correctly allocates a test taker or person being assessed to the category of those predicted not to show some behaviour of interest on the basis of their score on a test or other assessment device

41
Q

Define

Valid positive decision

A

a decision that correctly allocates a test taker or person being assessed to the category of those predicted to show some behaviour of interest on the basis of their score on a test or other assessment device

42
Q

Define

Validity

A

the extent to which evidence supports the meaning and use of a psychological test (or other assessment device)

43
Q

Definition

Two forms of the same test are developed, with different items selected according to the same rules. The forms may have different distributions of scores (means and variances may not be equal)

A

Alternate forms reliability

44
Q

Definition

the proportion of individuals in the population who show the behaviour of interest in a given psychological testing or assessment situation

A

Base rate

45
Q

Definition

a body of related psychometric theory that predicts outcomes of psychological testing such as the difficulty of items or the ability of test-takers

A

Classical test theory

46
Q

Definition

a form of predictive validity in which the index of social behaviour is obtained close in time to the score on the psychological test (or other assessment device)

A

Concurrent validity

47
Q

Definition

failure to capture important components of a construct

A

Construct underrepresentation

48
Q

Definition

the meaning of a test score made possible by knowledge of the pattern of relationships it enters into with other variables and the theoretical interpretation of those relationships

A

Construct validity

49
Q

Definition

measuring things other than the construct of interest

A

Construct-irrelevant variance

50
Q

Definition

the extent to which items on a test represent the universe of behaviour the test was designed to measure

A

Content validity

51
Q

Definition

the subjection of a multitrait-multimethod matrix to a set of criteria that specify which correlations should be large and which small in terms of a psychological theory of the constructs

A

Convergent and discriminant validity

52
Q

Definition

the extent to which a measure is related to an outcome (e.g. marks in Year 12 are used to predict performance at university)

A

Criterion-related validity

53
Q

Definition

an estimate of reliability that is based on the average intercorrelation of the items in a test

A

Cronbach’s alpha

54
Q

Definition

the test score or point on a scale, in the case of another assessment device, that is used to split those being tested or assessed into two groups predicted to show or not show some behaviour of interest

A

Cutting point

55
Q

Definition

a way of thinking about the composition of a psychological test that sees the test as a representative sample of the larger domain of possible items that could be included in the test

A

Domain-sampling model

56
Q

Definition

the estimate of reliability of a test obtained by comparing two forms of a test constructed to measure the same construct

A

Equivalent forms reliability

57
Q

Definition

Factors that contribute to inconsistency - characteristics of the test taker, test or situation that have nothing to do with the attribute being tested but affect scores

A

Errors of measurement

58
Q

Definition

Does the test look like it measures the relevant construct?

A

Face validity

59
Q

Definition

a mathematical method of summarising a matrix of values (such as the intercorrelation of test scores) in terms of a smaller number of values (factors) from which the original matrix can be reproduced

A

Factor analysis

60
Q

Definition

a decision that incorrectly allocates a test taker or person being assessed to the category of those predicted not to show some behaviour of interest on the basis of their score on a test or other assessment device

A

False negative decision

61
Q

Definition

a decision that incorrectly allocates a test taker or person being assessed to the category of those predicted to show some behaviour of interest on the basis of their score on a test or other assessment device

A

False positive decision

62
Q

Definition

a set of ideas and procedures that follow from the proposal that the consistency or precision of the output of a psychological assessment device depends on specifying the desired range of conditions over which this is to hold

A

Generalisability theory

63
Q

Definition

the extent to which knowledge of a score on a test (or other assessment device) adds to that obtained by another, pre-existing score or psychological characteristic

A

Incremental validity

64
Q

Definition

the extent to which different raters agree in their assessments of the same sample of ratees

A

Inter-rater reliability

65
Q

Definition

the extent to which a psychological test is homogeneous or heterogeneous

A

Internal consistency

66
Q

Definition

a particular case of Cronbach’s alpha for dichotomously scored items (i.e. scored as 0 or 1)

A

Kuder-Richardson 20 (KR20)

67
Q

Definition

the variability among scores on a psychological test or other assessment device that arises because of the form as distinct from the content of the test

A

Method variance

68
Q

Definition

the patterns of correlations resulting from testing all possible relationships among two or more methods of assessing two or more constructs

A

Multitrait-multimethod matrix

69
Q

Definition

Two forms of the same test are developed, with different items selected according to the same rules. The forms have the same distribution of scores (means and variances are equal)

A

Parallel forms reliability

70
Q

Definition

the extent to which a score on a psychological test (or other assessment device) allows a statement about standing on a variable indexing important social behaviour independent of the test

A

Predictive validity

71
Q

Definition

the consistency with which a test measures what it purports to measure in any given set of circumstances

A

Reliability

72
Q

Definition

an index - often a Pearson product moment correlation coefficient - of the ratio of true score to error score variance in a test as used in a given set of circumstances

A

Reliability coefficient

73
Q

Definition

the proportion of those tested or assessed who can be allocated to the category of showing the behaviour of interest in a given psychological testing or assessment situation

A

Selection ratio

74
Q

Definition

a form of method variance common in the construction of psychological tests of personality that arises when people respond to questions that place them in a favourable or unfavourable light

A

Social desirability bias

75
Q

Definition

applied to estimate the reliability a test would have if each half were as long as the full test; i.e. it allows you to estimate internal consistency if the test were longer or shorter

A

Spearman-Brown formula

76
Q

Definition

the estimate of reliability obtained by correlating scores on the two halves of a test formed in some systematic way (e.g. odd versus even items)

A

Split-half reliability

77
Q

Definition

the extent to which test scores remain stable when a test is administered on more than one occasion

A

Stability over time

78
Q

Definition

an index of the amount of error in predicting one variable from another

A

Standard error of estimate

79
Q

Definition

an expression of the precision of an individual test score as an estimate of the trait it purports to measure

A

Standard error of measurement

80
Q

Definition

the estimate of reliability obtained by correlating scores on the same test administered to the same group of people on two separate occasions

A

Test-Retest Reliability

81
Q

Definition

Factors that contribute to consistency - stable attributes under examination

A

True scores

82
Q

Definition

a decision that correctly allocates a test taker or person being assessed to the category of those predicted not to show some behaviour of interest on the basis of their score on a test or other assessment device

A

Valid negative decision

83
Q

Definition

a decision that correctly allocates a test taker or person being assessed to the category of those predicted to show some behaviour of interest on the basis of their score on a test or other assessment device

A

Valid positive decision

84
Q

Definition

the extent to which evidence supports the meaning and use of a psychological test (or other assessment device)

A

Validity

85
Q

What is reliability?

A
  • The degree to which a test tool provides consistent results
  • A test is considered reliable when it produces the same results again and again, when measuring the same thing
86
Q

What is validity?

A

Validity can be broadly understood as the extent to which a test measures the construct it is intended to measure

87
Q

John is feeling unwell and visits his GP. The GP orders some blood tests. The results of the blood tests indicate that John is iron deficient. The doctor prescribes iron supplements, which John immediately starts taking as prescribed. After a few weeks he returns to the doctor to repeat the blood tests. The results indicate that his iron levels have decreased!

What might be happening?

A
  • The test may have poor validity (i.e. it is measuring some other variable)
  • The test has poor reliability (i.e. when repeated, the test often shows different results)
88
Q

Why are reliability and validity important?

A
  • Diagnosis
  • Assessment of ability
  • Treatment
    • Decisions around recommending treatment
    • Monitoring treatment outcomes (e.g. reliability would be really important if you are repeating tests to see if the treatment is working)
  • The conclusions you can draw rely on the reliability and validity of the tests/assessments you are using.
    • Important clinically and in research
89
Q

True or False:

A test can be reliable without being valid

A

True

90
Q

True or False:

A test can be valid without being reliable

A

False

Tests cannot be valid without being reliable

91
Q

According to Classical Test Theory, what are test scores the result of?

A
  • Factors that contribute to consistency – stable attributes under examination (“True Scores”)
  • Factors that contribute to inconsistency – characteristics of the test taker, test or situation that have nothing to do with the attribute being tested but affect scores. (“Errors of Measurement”)
92
Q

What are some common sources of error on a test?

A

Item selection

Test administration

Test scoring

Systematic measurement error

93
Q

How is item selection a potential source of error?

A

the sample of items chosen may not be equally reflective of every individual’s true score.

94
Q

How is test administration a potential source of error?

A

General environmental conditions (e.g. temperature, lighting, noise); temporary “states” of the test taker (e.g. fatigue, anxiety, distraction).

E.g. completing an IQ test in a loud, noisy room, or an examiner providing non-standardised instructions

95
Q

Domain Sampling Theory considers the problem of using only a ________ of items to represent a construct

A

Domain Sampling Theory considers the problem of using only a sample of items to represent a construct

96
Q

If the same test is administered to the same group twice at two different times, why might the scores not be the same?

A

Errors of measurement

Practice effects

Maturation

Treatment effects or setting

97
Q

Which of these would test-retest be appropriate for?

  • State anxiety
  • Weight of a baby
  • Extraversion
  • Intelligence
A

  • Extraversion and intelligence: these are relatively stable constructs, so test-retest is appropriate
  • State anxiety and the weight of a baby change over short periods, so test-retest is not appropriate for them
98
Q

How do you maximise test-retest reliability?

A
  • Test a relatively stable construct
  • No intervention in between testing
  • Shorter time between testing
99
Q

What is the difference between parallel and alternate forms reliability?

A

They both involve two forms of the same test, with different items selected according to the same rules.

Parallel Forms: same distribution of scores (means and variance equal)

Alternate Forms: different distribution of scores (mean and variance may not be equal)
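The distinction can be made concrete with a short Python sketch; the two sets of form scores below are invented for illustration:

```python
import statistics

# Hypothetical scores on two forms of the same test (invented data).
form_a = [12, 15, 14, 10, 13, 16]
form_b = [11, 16, 13, 10, 14, 16]

mean_a, mean_b = statistics.mean(form_a), statistics.mean(form_b)
var_a, var_b = statistics.variance(form_a), statistics.variance(form_b)

# Parallel forms require (approximately) equal means AND equal variances;
# alternate forms are only required to be built to the same rules.
# Here the means match but the variances differ, so these would count
# as alternate rather than parallel forms.
is_parallel = abs(mean_a - mean_b) < 0.5 and abs(var_a - var_b) < 0.5
```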

100
Q

What is the split-half method?

A
  • Test is split into halves (randomly, odd-even system, top vs bottom)
  • Correlate the two halves
  • The estimate of reliability based on a split half is smaller due to the smaller number of items
  • The Spearman-Brown formula is applied to estimate the reliability the test would have if each half were as long as the full test.
    • i.e. allows you to estimate internal consistency if the test were longer or shorter
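The steps above can be sketched in Python; the item scores are invented for illustration, and the odd/even split and Spearman-Brown correction follow the description above:

```python
def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def split_half_reliability(rows):
    """rows: one list of item scores per respondent."""
    odd = [sum(r[0::2]) for r in rows]    # total on odd-numbered items
    even = [sum(r[1::2]) for r in rows]   # total on even-numbered items
    r_half = pearson(odd, even)           # reliability of a half-length test
    return 2 * r_half / (1 + r_half)      # Spearman-Brown corrected estimate

scores = [[3, 4, 3, 5], [2, 2, 3, 2], [4, 5, 4, 5], [1, 2, 1, 2], [3, 3, 4, 4]]
print(round(split_half_reliability(scores), 2))  # → 0.89
```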
101
Q

What is the rationale for the split-half method?

A

if scores on 2 half tests from single administration are highly correlated, scores on 2 whole tests from separate administrations should also be highly correlated.

102
Q

What is Cronbach’s alpha?

A

A generalised reliability coefficient for scoring systems in which each item is graded (e.g. on an agree-to-disagree scale)

  • Mean of all possible split-half correlations, corrected by the Spearman-Brown formula
  • Ranges from 0 (no similarity) to 1 (perfectly identical)
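A minimal Python sketch of the calculation from item variances and total-score variance; the respondent data are invented for illustration:

```python
import statistics

def cronbach_alpha(rows):
    """rows: one list of item scores per respondent."""
    k = len(rows[0])                                  # number of items
    items = list(zip(*rows))                          # one column per item
    item_vars = sum(statistics.variance(c) for c in items)
    total_var = statistics.variance([sum(r) for r in rows])
    return (k / (k - 1)) * (1 - item_vars / total_var)

scores = [[3, 4, 3, 5], [2, 2, 3, 2], [4, 5, 4, 5], [1, 2, 1, 2], [3, 3, 4, 4]]
print(round(cronbach_alpha(scores), 2))  # → 0.94
```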
103
Q

What are considered acceptable levels of reliability?

A

Depends on the purpose to some extent

  • .70-.80 acceptable or good
  • Greater than .90 may indicate redundancy in items
  • High reliability is really important in clinical settings when making decisions for a person (e.g. decision making capacity assessment).
104
Q

__________________: a particular case of Cronbach’s alpha for dichotomously scored items (i.e. scored as 0 or 1)

A

Kuder-Richardson 20 (KR20): a particular case of Cronbach’s alpha for dichotomously scored items (i.e. scored as 0 or 1)
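As a sketch, KR20 replaces the item variances in Cronbach’s alpha with p×q for each dichotomous item; the response data are invented, and population variance is used for the totals so that it matches the p×q convention:

```python
def kr20(rows):
    """rows: one list of 0/1 item scores per respondent."""
    k, n = len(rows[0]), len(rows)
    items = list(zip(*rows))
    # p = proportion scoring 1 on the item; q = 1 - p
    sum_pq = sum((sum(c) / n) * (1 - sum(c) / n) for c in items)
    totals = [sum(r) for r in rows]
    mean_t = sum(totals) / n
    var_t = sum((t - mean_t) ** 2 for t in totals) / n  # population variance
    return (k / (k - 1)) * (1 - sum_pq / var_t)

answers = [[1, 1, 1, 1], [1, 1, 1, 0], [1, 1, 0, 0], [0, 0, 0, 0], [1, 0, 0, 0]]
print(round(kr20(answers), 2))  # → 0.8
```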

105
Q

The __________ the SEM, the less certain we are that the test score represents the true score.

A

The larger the SEM, the less certain we are that the test score represents the true score.
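The relationship can be shown with the usual SEM formula, SEM = SD × √(1 − reliability); the SD of 15 and the reliability values below are illustrative:

```python
import math

def sem(sd, reliability):
    """Standard error of measurement from a test's SD and reliability."""
    return sd * math.sqrt(1 - reliability)

# With SD = 15 (illustrative), a more reliable test gives a smaller SEM,
# so we are more certain the observed score is close to the true score.
print(round(sem(15, 0.95), 2))  # → 3.35
print(sem(15, 0.75))            # → 7.5
```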

106
Q

How do you maximise reliability?

A
  • Clear administration and scoring instructions for test user
  • Clear instructions for the test taker
  • Unambiguous test items
  • Standardised testing environment and procedure
  • Reduced time between testing sessions
  • Increase assessment length/items
  • Test try-out and modification
  • Discarding items that decrease reliability (item analysis)
  • Maximise VALIDITY
107
Q

Draw a diagram that demonstrates the different types of validity

A
108
Q

What are the four main types of validity?

A

Face validity

Content validity

Criterion-related validity

Construct validity

109
Q

What are some issues for content validity?

A
  • Construct underrepresentation: failure to capture important components of a construct.
    • e.g. A depression scale that assesses cognitive and emotional components of depression, but not behavioural components.
  • Construct-irrelevant variance: measuring things other than the construct of interest.
    • e.g. The wording of our depression scale may make it likely that people will respond in socially desirable ways.
110
Q

What are some examples of criterion-related validity?

A
  • e.g. marks in Year 12 are used to predict performance at university
  • e.g. a marital satisfaction survey is used to predict divorce
  • e.g. scores on an anxiety scale you developed are correlated with clinical observations.
111
Q

What is an example of concurrent validity?

A

A test designed to measure anxiety may be issued in conjunction with a diagnostic interview by an experienced clinician using the DSM-5. The concurrent validity of the test represents the extent to which the test score corresponds with the clinician’s observations of the client’s anxiety levels.

112
Q

What is an example of predictive evidence?

A

VCE marks or ATAR scores are used to predict performance at university

113
Q

What is an example of convergent evidence?

A

e.g. Relationship between score on measure of psychopathy and low emotional arousal.
e.g. Relationship between low self-esteem and depression

114
Q

What are some examples of discriminant evidence?

A

Scores on an anxiety measure should differ from scores on a depression measure, if each measure is assessing these individual constructs.