Lecture 2 - Dr Greg Yelland (DN) (incomplete)

Psychological Testing and Assessment


What is validity?

- How well a test measures what it purports to measure

  • important implications regarding:
    • the appropriateness of inferences made, and
    • actions taken on the basis of measurements





- sensitivity & specificity

- there is always a compromise between sensitivity & specificity

- usually a screening process uses a highly sensitive test

- then a highly specific test is used to determine who actually has dementia
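The trade-off above can be made concrete with the standard definitions of sensitivity and specificity; the counts below are invented for illustration, not from the lecture:

```python
# Hypothetical screening example (counts are made up for illustration).
def sensitivity(true_pos, false_neg):
    """Proportion of actual cases the test correctly flags."""
    return true_pos / (true_pos + false_neg)

def specificity(true_neg, false_pos):
    """Proportion of non-cases the test correctly clears."""
    return true_neg / (true_neg + false_pos)

# A sensitive screening test: misses few real cases, but over-flags healthy people.
print(sensitivity(true_pos=95, false_neg=5))    # 0.95
print(specificity(true_neg=60, false_pos=40))   # 0.6
```

A follow-up diagnostic test would show the reverse pattern: high specificity at the cost of some sensitivity.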



- the test needs to be accurate




- stability of measurement

- measurement is stable over time & within itself



what are the three components of reliability?

1) inter-rater reliability - more to do with scoring than the nature of tests

2) test-retest reliability - should get the same score when doing the same test twice

3) internal consistency - within the test, people should score consistently

- items should be equally good at measuring what they are trying to measure




What is test reliability?

- this is not scorer reliability

- test-retest - stability over time


- internal consistency

- homogeneous - all items test just one factor (e.g., anxiety)

- items should be equally good at assessing that factor

- need to be aware of how many factors/behaviours a test is measuring

- if the test is intended to measure one factor, it should measure only that one



What is reliability?

- the proportion of total variance (σ²) made up of true variance (σ²tr):

  reliability = σ²tr / σ²

- variability in test scores: σ² = σ²tr + σ²e

- an observed test score is always made up of

  true score + error

- error is made up of random error & systematic error

Whenever we are talking about reliability & validity, we are talking about........

correlation, or correlation coefficients

- i.e., how well measurements correlate across different aspects of testing

e.g., with:

  • test-retest (looking at the correlation between first & second time test taken)
  • internal consistency (looking at the correlation between different items on the test)
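Both examples above boil down to a Pearson correlation between two sets of scores; a minimal hand-rolled sketch with invented test-retest scores:

```python
# Test-retest reliability as a Pearson correlation between the same people's
# scores on two occasions (scores below are invented).
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

time1 = [10, 12, 9, 15, 11]   # scores at first sitting
time2 = [11, 13, 9, 14, 12]   # scores at second sitting
print(round(pearson_r(time1, time2), 3))  # 0.926
```

For internal consistency the same coefficient is computed between items (or half-tests) rather than between sittings.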



What are some sources of error variance?

  • Test Construction
  • Test Administration
    - Environment
    - Test-Taker Variables
    - Examiner-Related Variables
  • Test Scoring/Interpretation

each can contain both random & systematic error



What is the difference between systematic & random error variance?

  • Systematic - a constant or proportionate source of error arising from variables other than the target variable
    • should not affect variance in scores
  • Random - caused by unpredictable fluctuations & inconsistencies in variables other than the target variable


Systematic changes should not affect the scores; unpredictable changes will affect the correlation; the more robust the test to fluctuation, the greater the reliability.
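The claim above can be illustrated numerically: a constant (systematic) shift leaves the test-retest correlation untouched, while random fluctuation lowers it. The scores and noise range below are invented, and `pearson_r` is a hand-rolled Pearson correlation:

```python
# Systematic vs. random error and their effect on a test-retest correlation.
import random
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

scores = [10, 12, 9, 15, 11, 14, 8, 13]
shifted = [s + 3 for s in scores]                     # systematic: everyone +3
random.seed(0)
noisy = [s + random.uniform(-4, 4) for s in scores]   # random fluctuation

print(pearson_r(scores, shifted))  # 1.0 - correlation unaffected
print(pearson_r(scores, noisy))    # < 1.0 - correlation reduced
```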


How does error occur in test construction?

- arises from the way test items are selected or sampled

- ideally, all items consistently perform in the same way (the way you intended them to)

systematic error - could come from an ambiguous question: some people may consistently read it one way and others another

random error - one or two questions may tap experience a particular test-taker lacks, so they cannot give the standard response to the item




How can error occur during test administration?

 Environmental Variables
 Test-Taker Variables
 Examiner-Related Variables


How do test-takers contribute to error?

Test-Taker Variables 

- during test administration

- differences between ppl taking the tests

systematic - e.g., test-takers of different ages & not taking age into account

random - e.g., age, personality etc.

- we don't necessarily want to minimise this by testing only 10-year-olds, because then the test is only relevant to 10-year-olds

- instead test 10-, 11-, 12-year-olds etc., then create norms for each age (age norms) - this takes care of the variable by providing different normative data for different ages



How does the test environment contribute to error?

- during test administration

- one person may be tested in a noisy environment, another in a quiet one

- testing in a group vs. individually

- these differences affect test scores



How can examiners contribute to error?

- during test administration

- examiner humanness - e.g., the examiner may be exhausted by the last test of the day and skip parts to hurry it along


How can test scoring/interpretation contribute to error?

- subjectively scored tests have greater error (because they rely on subjective judgements)

- the field is moving toward computer-based scoring to remove this source of error

- computer-based scoring is not possible when it is the quality of the response that is judged (qualitative)

- there is much more error in qualitative than quantitative scoring




What should we aim for with regard to error & reliability?

aim to remove systematic error and minimise random error, so that we get better reliability




What are some reliability estimates?

  • test-retest
  • parallel forms/alternate forms
  • internal consistency (split-half, Cronbach's α, Kuder-Richardson)




What is a test-retest reliability estimate?


  • same test taken twice - then see how well the two sets of scores are correlated
  • issue: how long should the interval between testings be?

- the shorter the interval, the higher the test-retest reliability, because many things can change in an individual over time

  • systematic changes should not affect test-retest reliability e.g., hot room, cold room (everyone affected equally)

- random changes will affect the correlation (test-retest reliability)

- the more robust the test is to fluctuation, the higher the reliability

e.g., a test that is not affected by time of day, amount of sleep etc. washes those effects out and is therefore more reliable

  • participant factors will affect test-retest reliability - experience, practice, fatigue, memory, motivation, morningness/eveningness

    - as everyone differs in these areas, there is greater error variance

    - practice effects - the first sitting gives a clue about what is going to happen next time we do the same test - this may mean that we cannot use test-retest



When would we use Parallel or Alternate forms of a test?

- when we cannot use test-retest reliability

- e.g., because practice effects give the test-taker a clue about what will be on the test next time


What is a parallel forms or alternate forms reliability estimate?

  • parallel vs. alternate

    - parallel forms - are better developed

    - items have been selected so that the means & variances of the two forms have been shown to be equal

    - alternate forms - similar, but with no guarantee that the variance is the same (hence a source of error has been introduced)

  • testing is a similar process to test-retest - do one test, then do the parallel or alternate form
  • test sampling issues - the problem is the choice of items

    - the best items are usually used in the original version (unless both tests are created at the same time)



What is one of the biggest problems faced when using a parallel form or alternate form of a test? 

  • test sampling issues - the problem is the choice of items

    - the best items are usually used when creating the initial version of the test (unless both tests are created at the same time)

  • identifying the source of error
  • is it because the trait is not stable over time, or because the different items (content) of the two tests introduce error?

- is it stable over time? (external)

- is there consistency across the content of the two tests? (internal)



Internal Consistency (Reliability)

- Split-Half testing

  - split the test into two halves
  - obtain the correlation coefficient between the two half-scores


What is the point of Split-Half testing?



- to obtain the internal consistency of the full version - via the Spearman-Brown formula

- the half-test correlation underestimates the full test's reliability, so Spearman-Brown estimates the internal consistency of a test twice the length


When is the Spearman-Brown formula used?

- to obtain the internal consistency of the full version of a split-half test

- estimates the internal consistency of a test that is twice the length (i.e., the full test)

- not used when a test measures more than one factor (heterogeneity)

- not appropriate for speed tests

- must have homogeneity when using the split-half method, because otherwise you could end up with an imbalanced distribution of the factors across the two halves

Spearman-Brown Split-Half Coefficient

rSB = 2rhh / (1 + rhh)

e.g., with a half-test correlation of rhh = 0.9:

rSB = 2 × 0.9 / (1 + 0.9)

rSB = 1.8 / 1.9

rSB = 0.947
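The worked example above can be wrapped in a small function; a minimal sketch of the Spearman-Brown split-half correction:

```python
# Spearman-Brown split-half correction: r_SB = 2*r_hh / (1 + r_hh),
# where r_hh is the correlation between the two half-test scores.
def spearman_brown(r_hh):
    """Estimate full-test reliability from a half-test correlation."""
    return 2 * r_hh / (1 + r_hh)

print(round(spearman_brown(0.9), 3))  # 0.947, matching the worked example above
```

Note that the corrected coefficient is always at least as large as the half-test correlation (a longer homogeneous test is more reliable).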


When would we use Cronbach's Alpha?

- when we need a single estimate that takes account of all of the individual item variances

- it estimates the internal consistency for every possible split-half

- a generalised reliability coefficient for scoring systems where each item is graded (it sums across all items)

- used when items are graded (cannot be used with dichotomous items - use Kuder-Richardson for those)

- essentially an estimate of ALL possible test-retest or split-half coefficients

- α can range between 0 and 1 (ideally closer to 1)

- cannot measure multiple traits - the test must be homogeneous
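A minimal sketch of the α computation, using the common item-variance form α = (k/(k−1))·(1 − Σσ²item/σ²total); the item scores below (rows = people, columns = items) are invented:

```python
# Cronbach's alpha from graded item scores (data are made up for illustration).
def variance(xs):
    """Population variance of a list of scores."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def cronbach_alpha(scores):
    """scores: list of per-person lists of item scores, all the same length."""
    k = len(scores[0])                    # number of items
    items = list(zip(*scores))            # transpose: one list of scores per item
    item_var_sum = sum(variance(item) for item in items)
    total_var = variance([sum(person) for person in scores])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

data = [[3, 4, 3], [2, 2, 1], [5, 4, 5], [4, 5, 4]]  # 4 people x 3 graded items
print(round(cronbach_alpha(data), 3))  # 0.934
```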


When would we use Kuder-Richardson?


- when the test items are dichotomous (e.g., right/wrong)

- like α, it estimates every possible split-half or test-retest correlation

- mainly used as a split-half estimate
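For dichotomous items the Kuder-Richardson formula (KR-20) replaces the item variances with p·q, where p is each item's proportion correct; a minimal sketch with invented 0/1 data:

```python
# KR-20 for dichotomous (0/1) items:
# KR20 = (k/(k-1)) * (1 - sum(p*q) / variance(total scores)), q = 1 - p.
def kr20(scores):
    """scores: list of per-person lists of 0/1 item scores."""
    n = len(scores)
    k = len(scores[0])
    items = list(zip(*scores))            # transpose: one list of answers per item
    pq_sum = 0.0
    for item in items:
        p = sum(item) / n                 # proportion who got this item right
        pq_sum += p * (1 - p)
    totals = [sum(person) for person in scores]
    m = sum(totals) / n
    total_var = sum((t - m) ** 2 for t in totals) / n
    return (k / (k - 1)) * (1 - pq_sum / total_var)

data = [[1, 1, 1, 0], [1, 0, 0, 0], [1, 1, 1, 1], [0, 0, 1, 0]]  # 4 people x 4 items
print(round(kr20(data), 3))  # 0.691
```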


What is an acceptable range of reliability?

  • Clinical – r > 0.85 acceptable
  • Research – r > ~0.7 acceptable


Reliabilities of Major Psychological Tests


  • WAIS – r = 0.87
  • MMPI – r = 0.84


  • WAIS – r = 0.82
  • MMPI – r = 0.74


Summary of reliability