Lecture 2 - Dr Greg Yelland (DN) (incomplete) Flashcards

validity

How well a test measures what it purports to measure
has important implications regarding the
- appropriateness of inferences made and
- actions taken on the basis of measurements

precision

sensitivity & specificity
always a compromise between sensitivity & specificity
usually a screening process first uses a sensitive test
then a highly specific test is used to determine which people actually have dementia
3:00
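A minimal Python sketch of the two quantities being traded off, assuming the standard definitions (sensitivity = true positives / everyone who has the condition; specificity = true negatives / everyone who does not). The function names and all counts are hypothetical: two separate tests are each evaluated on the same 100 cases and 900 non-cases, to show why a sensitive screen is followed by a more specific test.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Proportion of people WITH the condition whom the test flags as positive."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Proportion of people WITHOUT the condition whom the test clears as negative."""
    return tn / (tn + fp)


# Hypothetical screening test: misses few cases but produces many false alarms.
screen_sens = sensitivity(tp=95, fn=5)
screen_spec = specificity(tn=700, fp=200)

# Hypothetical follow-up test: misses more cases but rarely flags a healthy person.
follow_sens = sensitivity(tp=85, fn=15)
follow_spec = specificity(tn=880, fp=20)

print(f"screening: sensitivity={screen_sens:.2f}, specificity={screen_spec:.2f}")
print(f"follow-up: sensitivity={follow_sens:.2f}, specificity={follow_spec:.2f}")
```

The screen catches almost everyone who might have the condition; the follow-up then weeds out the false alarms.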

accuracy

test needs to be accurate
6:30

reliability

stability of measurement
measurement is stable over time & within itself
7:20

What are the three components of reliability?

1) inter-rater reliability - more to do with scoring than the nature of tests
2) test-retest reliability - should get the same score when doing the same test twice
3) internal consistency - within the test, people should be scoring consistently
- items should be equally good at measuring what they are trying to measure
7:50

What is test reliability?

this is not scorer reliability
test-retest - stability over time
internal consistency
homogeneous - all items testing just one factor (e.g., anxiety)
should be equally good at assessing that factor
need to be aware of how many factors/behaviours a test is measuring
if you intend to measure one, then the test should only measure one
10:00

What is reliability?

the proportion of total variance ($\sigma^2$) made up of the true variance ($\sigma^2_{tr}$), i.e., reliability $= \sigma^2_{tr} / \sigma^2$
variability in test scores: $\sigma^2 = \sigma^2_{tr} + \sigma^2_e$
an observed test score is always made up of

true score + error
$X = T + E$

error is made up of random error & systematic error
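A small simulation can make the variance decomposition concrete. This is a sketch under assumed values (true-score SD of 4, error SD of 2, purely random error), not data from the lecture: it generates observed scores as X = T + E and checks that the share of observed variance due to true scores comes out near 16 / (16 + 4) = 0.8.

```python
import random
import statistics

random.seed(1)          # fixed seed so the run is repeatable
N = 20_000              # number of simulated test-takers
TRUE_SD = 4.0           # assumed SD of true scores  -> true variance 16
ERROR_SD = 2.0          # assumed SD of random error -> error variance 4

true_scores = [random.gauss(100, TRUE_SD) for _ in range(N)]
observed = [t + random.gauss(0, ERROR_SD) for t in true_scores]  # X = T + E

var_true = statistics.pvariance(true_scores)   # sigma^2_tr
var_total = statistics.pvariance(observed)     # sigma^2
reliability = var_true / var_total             # proportion of variance that is true variance

print(f"true variance = {var_true:.2f}, total variance = {var_total:.2f}")
print(f"reliability = {reliability:.3f} (theoretical 16 / 20 = 0.80)")
```

Shrinking `ERROR_SD` pushes the ratio towards 1; growing it pushes the ratio towards 0.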

Whenever we are talking about reliability & validity, we are talking about...

correlation or correlation coefficients

i.e., how well things are correlated on different aspects
e.g., with:
test-retest (looking at the correlation between the first & second time the test is taken)
internal consistency (looking at the correlation between different items on the test)

15:30
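Because these reliability estimates are correlations, it may help to see a Pearson correlation computed by hand. A minimal sketch; the `pearson_r` helper and the test-retest scores for six people are hypothetical.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # co-variation
    sxx = sum((a - mx) ** 2 for a in x)                   # variation in x
    syy = sum((b - my) ** 2 for b in y)                   # variation in y
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores for six people who took the same test twice.
time1 = [12, 15, 9, 20, 17, 11]
time2 = [13, 14, 10, 19, 18, 12]
print(f"test-retest r = {pearson_r(time1, time2):.3f}")
```

The same function applied to two halves of a test, or to two items, gives the internal-consistency style correlations.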

What are some sources of error variance?

Test Construction
Test Administration
- Environment
- Test-Taker Variables
- Examiner-Related Variables
Test Scoring/Interpretation

each can contain both random & systematic error

16:20

What is the difference between systematic & random error variance?

Systematic - a constant or proportionate source of error in variables other than the target variable
- should not affect the consistency of scores (everyone is shifted in the same way)
Random - caused by unpredictable fluctuations & inconsistencies in variables other than the target variable

Systematic changes should not affect test-takers' relative standing, so they leave the correlation unchanged; unpredictable changes will lower the correlation; the more robust the test is to fluctuation, the greater the reliability.
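The claim that systematic error leaves the correlation alone while random error lowers it can be checked with a quick simulation. In this sketch the scores are hypothetical, and the 5-point "hot room" penalty and the noise SD of 8 are arbitrary assumptions: a constant shift leaves test-retest r at 1 (up to rounding), whereas person-to-person random error pulls it down.

```python
import random
import statistics


def pearson_r(x, y):
    """Pearson r from the population covariance and population standard deviations."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (statistics.pstdev(x) * statistics.pstdev(y))


random.seed(42)
first = [random.gauss(50, 10) for _ in range(500)]   # hypothetical first-session scores

# Systematic error: e.g., a hot room costs EVERY test-taker 5 points at retest.
retest_systematic = [x - 5 for x in first]

# Random error: unpredictable person-to-person fluctuations (sleep, mood, ...).
retest_random = [x + random.gauss(0, 8) for x in first]

print(f"r with systematic shift: {pearson_r(first, retest_systematic):.3f}")
print(f"r with random error:     {pearson_r(first, retest_random):.3f}")
```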

How does error occur in test construction?

the way you select or sample test items

whether all items consistently perform in the same way (the way you intended them)

systematic error - could come from an ambiguous question - some people may respond one way and others another

random error - may have one or two questions where someone does not have enough experience to give the standard response to the item

17:00

How can error occur during test administration?

- Environmental Variables
- Test-Taker Variables
- Examiner-Related Variables

How do test-takers contribute to error?

**Test-Taker Variables**

during test administration
differences between the people taking the tests

systematic - different ages & not taking ages into account

random - age, personality etc.

issue:

don't necessarily want to minimise this by only testing 10-year-olds, because then the test is only relevant to 10-year-olds

solution:

so test 10-year-olds, 11-year-olds, 12-year-olds etc., then create norms for different ages (age norms) - this takes care of the variable by having different normative data for different ages

20:00
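A minimal sketch of how age norms absorb the age variable: each raw score is converted to a z-score against the mean and SD of the test-taker's own age group. The `AGE_NORMS` table and `age_z_score` helper are hypothetical; real norms come from a test's standardisation sample.

```python
# Hypothetical age norms (mean, SD of raw scores) for each age band.
# These numbers are invented for illustration only.
AGE_NORMS = {
    10: (24.0, 5.0),
    11: (28.0, 5.5),
    12: (31.0, 6.0),
}


def age_z_score(raw: float, age: int) -> float:
    """Express a raw score relative to the norms for the test-taker's own age group."""
    mean, sd = AGE_NORMS[age]
    return (raw - mean) / sd


# The same raw score means something different at each age.
for age in sorted(AGE_NORMS):
    print(f"age {age}: raw 30 -> z = {age_z_score(30, age):+.2f}")
```

A raw score of 30 is well above average for a 10-year-old but slightly below average for a 12-year-old.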

How does the test environment contribute to error?

during test administration
one person may be tested in a noisy environment, another in a quiet one
testing in a group or individually

affects test scores

How can examiners contribute to error?

during test administration
examiner humanness - may be exhausted by the last test and skip bits to hurry it up

How can test scoring/interpretation contribute to error?

subjectively scored tests have greater error (because they rely on subjective judgements)
moving toward computer-based scoring to remove this source of error
cannot use computer-based scoring if it's the quality of the response being judged (qualitative)
much more error on qualitative than quantitative scoring
22:35

What should we aim for with regard to error & reliability?

aim to remove systematic error and minimise random error so we get better reliability

24:35

What are some reliability estimates?

test-retest
parallel forms/alternate forms

24:50

What is a test-retest reliability estimate?

24:45

same test taken twice - then see how well the scores are correlated
issue of how long an interval to leave between testings?
the shorter the interval, the higher the test-retest reliability, because lots of things can change in an individual over time
systematic changes should not affect test-retest reliability, e.g., hot room, cold room (everyone affected equally) (26:50)
random changes will affect the correlation (test-retest reliability) (27:15)
the more robust the test is to fluctuation, the greater the reliability
e.g., a test that is not affected by time of day or amount of sleep etc. is robust enough to wash those effects out (28:30)
participant factors will affect test-retest reliability - experience, practice, fatigue, memory, motivation, morningness/eveningness
- as everyone differs in these areas = greater error variance
- practice effects - the first attempt gives test-takers a clue about what is going to happen the next time they do the same test - this may mean that we cannot use test-retest

When would we use Parallel or Alternate forms of a test?

when we cannot use test-retest reliability
due to, e.g., practice effects giving the test-taker a clue about what will be on the test next time

What is a parallel forms or alternate forms reliability estimate?

parallel vs. alternate

- parallel forms - are better developed
  - items have been selected so that the means & variances have been shown to be equal
- alternate forms - similar, but no guarantee that the variance is the same (hence a source of error has been introduced)

the testing process is similar to test-retest - do one test, then do the parallel or alternate form
problem: test sampling issue (choice of items)
- the best items are usually used up in the original version of the test (unless both tests are created at the same time)
30:50
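A minimal sketch of checking two forms: compare their means and variances, then correlate them (that correlation is the reliability estimate). The scores are hypothetical and the `pearson_r` helper is illustrative. In this invented data the means match but Form B's variance is noticeably smaller, so by the definition above these would count as alternate rather than parallel forms.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores of the same eight people on two forms of a test.
form_a = [22, 30, 18, 27, 25, 33, 20, 29]
form_b = [23, 29, 19, 26, 26, 32, 21, 28]

for name, scores in (("Form A", form_a), ("Form B", form_b)):
    print(f"{name}: mean = {statistics.fmean(scores):.2f}, "
          f"variance = {statistics.pvariance(scores):.2f}")

# The reliability estimate itself is the correlation between the two forms.
print(f"alternate-forms r = {pearson_r(form_a, form_b):.3f}")
```

In practice, "equal" means and variances would be judged statistically rather than by eye.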

What is one of the biggest problems faced when using a parallel form or alternate form of a test?

problem: test sampling issue (choice of items)
- the best items are usually used when creating the initial version of the test

(unless creating both tests at the same time)

identifying the source of error
is it because the test is not stable over time, or because the different items (content) of the two tests are introducing error?
is it stable over time? (external)
internal consistency across the two tests? (internal)
33:50

Internal Consistency (Reliability)

Split-Half testing

- Split into two halves
- Obtain correlation coefficient

What is the point of Split-Half testing?

- To obtain the internal consistency of the full version: Spearman-Brown formula

Estimates the internal consistency of a test that is twice the length

When is the Spearman-Brown formula used?

- To obtain the **internal consistency** of the full version from split-half tests
- Estimates the internal consistency of a test that is twice the length
- Not used when more than one factor is measured (heterogeneity)
- Not appropriate for speed tests
- Must have **homogeneity** when using the split-half method, because otherwise you could end up with an imbalanced distribution of the factors across the two halves

**Spearman-Brown Split-Half Coefficient**

$r_{SB} = \dfrac{2r_{hh}}{1 + r_{hh}}$

e.g., with $r_{hh} = 0.9$: $r_{SB} = \dfrac{2 \times 0.9}{1 + 0.9} = \dfrac{1.8}{1.9} = 0.947$
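A minimal sketch of the split-half procedure followed by the Spearman-Brown step-up. It reproduces the worked example (0.9 becomes 0.947) and then applies the same steps to a hypothetical item-score matrix. Splitting into odd and even items is an assumption for illustration (one common choice); the notes do not specify how the halves are formed, and the helper names are hypothetical.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_brown(r_hh: float) -> float:
    """Estimate full-length reliability from the correlation between two half-tests."""
    return 2 * r_hh / (1 + r_hh)


# Worked example from the notes: r_hh = 0.9 gives 1.8 / 1.9
print(round(spearman_brown(0.9), 3))  # 0.947

# Hypothetical item scores: rows = test-takers, columns = six items on one factor.
items = [
    [4, 4, 3, 4, 4, 3],
    [2, 3, 2, 2, 3, 2],
    [5, 4, 5, 5, 4, 5],
    [1, 2, 1, 2, 1, 1],
    [3, 3, 4, 3, 3, 4],
    [2, 1, 2, 1, 2, 2],
]

# Odd-even split (assumed): each person's total on items 1, 3, 5 vs items 2, 4, 6.
odd_half = [sum(row[0::2]) for row in items]
even_half = [sum(row[1::2]) for row in items]

r_hh = pearson_r(odd_half, even_half)
print(f"half-test r = {r_hh:.3f}, Spearman-Brown estimate = {spearman_brown(r_hh):.3f}")
```

The step-up is needed because each half is only half as long as the real test, and shorter tests are less reliable.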

When would we use Cronbach's Alpha?

- when we need a single estimate of internal consistency based on the sum of the individual item variances compared with the variance of the total score
- essentially an estimate of the mean of ALL possible split-half coefficients, so no single split has to be chosen
- a generalised reliability coefficient for scoring systems in which each item is graded (e.g., rating scales), summed across all items
- mainly used when items are graded; with dichotomous (right/wrong) items, alpha reduces to the Kuder-Richardson formula (KR-20)
- α typically ranges between 0 and 1 (ideally closer to 1)
- cannot measure multiple traits - items must be homogeneous
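A minimal sketch of the usual formula, $\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_i \sigma^2_i}{\sigma^2_X}\right)$, where $k$ is the number of items, $\sigma^2_i$ the variance of item $i$ and $\sigma^2_X$ the variance of total scores. The ratings and the `cronbach_alpha` helper are hypothetical; population variances are used throughout (the ratio is the same if sample variances are used consistently).

```python
import statistics


def cronbach_alpha(items: list[list[float]]) -> float:
    """Cronbach's alpha for a score matrix (rows = test-takers, columns = items)."""
    k = len(items[0])                                          # number of items
    item_vars = [statistics.pvariance(col) for col in zip(*items)]
    total_var = statistics.pvariance([sum(row) for row in items])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical ratings (1-5) from five people on four items measuring one trait.
ratings = [
    [4, 5, 4, 4],
    [2, 3, 2, 3],
    [5, 5, 4, 5],
    [1, 2, 2, 1],
    [3, 3, 4, 3],
]
print(f"alpha = {cronbach_alpha(ratings):.2f}")  # 0.96
```

When items rise and fall together, the total-score variance is large relative to the summed item variances, and alpha approaches 1.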

When would we use Kuder-Richardson? 51:25

- when the test is **dichotomous** (items scored right/wrong)
- estimates the average of all possible split-half correlations
- mainly used as an internal-consistency (split-half type) estimate
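A minimal sketch of KR-20, assuming the standard form $KR_{20} = \frac{k}{k-1}\left(1 - \frac{\sum_i p_i q_i}{\sigma^2_X}\right)$, where $p_i$ is the proportion answering item $i$ correctly and $q_i = 1 - p_i$. The answer matrix and the `kr20` helper are hypothetical. Conventions differ on whether the total-score variance uses $n$ or $n - 1$; this sketch uses the population variance to match $p_i q_i$, which is the population variance of a 0/1 item.

```python
import statistics


def kr20(items: list[list[int]]) -> float:
    """Kuder-Richardson formula 20; rows = test-takers, columns = items scored 0 or 1."""
    if any(score not in (0, 1) for row in items for score in row):
        raise ValueError("KR-20 expects dichotomous (0/1) item scores")
    n = len(items)                                # number of test-takers
    k = len(items[0])                             # number of items
    p = [sum(col) / n for col in zip(*items)]     # proportion correct per item
    sum_pq = sum(pi * (1 - pi) for pi in p)       # sum of p * q over items
    total_var = statistics.pvariance([sum(row) for row in items])
    return (k / (k - 1)) * (1 - sum_pq / total_var)


# Hypothetical right (1) / wrong (0) answers from five people on four items.
answers = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(f"KR-20 = {kr20(answers):.2f}")  # 0.80
```

By hand: the $p$ values are 0.8, 0.6, 0.4, 0.2, so $\sum pq = 0.8$; totals 4, 3, 2, 1, 0 have variance 2; $\frac{4}{3}(1 - 0.8/2) = 0.8$.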

What is an acceptable range of reliability? 53:35

- Clinical: r > 0.85 acceptable
- Research: r > ~0.7 acceptable

**Reliabilities of Major Psychological Tests**

- **Internal consistency**
  - WAIS: r = 0.87
  - MMPI: r = 0.84
- **Test-retest**
  - WAIS: r = 0.82
  - MMPI: r = 0.74

summary of reliability
