Week 3 Flashcards
(33 cards)
What is reliability?
The degree to which a test or measurement tool provides consistent results.
What is validity?
The extent to which a test measures the construct it is intended to measure.
Can tests be valid without being reliable?
No, but they can be reliable without being valid.
What is classical test theory?
An obtained (observed) test score is a combination of the person's true score plus some amount of error.
Which is more accurate: a large score with large error, or a small score with small error?
A small score with small error.
Describe item selection as a source of error.
The sample of items chosen may not be equally reflective of every individual’s true score.
Describe test administration as a source of error.
General environmental conditions at the time of administration (e.g. temperature, lighting, noise) and temporary “states” of the test taker (e.g. fatigue, anxiety, distraction) influence validity.
Describe test scoring as a source of error.
Error arises when scoring depends on the test administrator's subjective judgment. It is especially problematic for subjectively scored tests, e.g. projective tests and essay tests. Error is lower when tests have set scales and scoring systems.
Describe systematic measurement error as a source of error.
Occurs if, unknown to the test developer, the test consistently taps into something other than the attribute being tested.
Discuss Spearman's approach to measuring reliability.
Reliability coefficients range from 0 to 1; scores closer to 1 are more reliable.
What is domain sampling theory?
The true score could only be found if people responded to ALL items that represent the construct. This is lengthy and not always possible. So domain sampling theory considers the problem of using only a sample of items to represent a construct.
What is the item response theory approach to test development?
It focuses on individual items rather than the test as a whole.
What is internal consistency? What does high internal consistency look like?
The extent to which a psychological test is homogeneous (measuring one construct) rather than heterogeneous.
High internal consistency (HIC): all items should correlate with one another.
What is the stability-over-time issue for reliability?
The interpretation of individual score changes when a test is administered on more than one occasion.
Describe test-retest reliability (stability).
Determines reliability: the same test is administered to the same group at two different time points. If the test is reliable, the scores from the two time points should be highly correlated.
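As a minimal sketch of the test-retest idea, the correlation between the two administrations can be computed with a Pearson coefficient. The card does not name a specific coefficient, so Pearson's r is an assumption here, and the scores and the helper name `pearson_r` are hypothetical illustrations.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from each mean
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

# Hypothetical scores for the same five people at two time points
time_1 = [10, 12, 15, 18, 20]
time_2 = [11, 12, 16, 17, 21]

retest_r = pearson_r(time_1, time_2)
print(f"test-retest correlation: {retest_r:.2f}")
```

A value close to 1 would suggest the scores are stable across the two administrations.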
When is test-retest not appropriate for determining reliability?
When the construct is not stable or is changing rapidly. Emotion is a poor candidate; IQ is a very good one.
How do you maximise test-retest reliability?
Use a stable construct, no intervention, and a short time between testing sessions.
Describe parallel/alternate forms reliability.
What does high parallel/alternate forms reliability look like?
Two forms of the same test, developed with the same content and difficulty, are administered to the same group.
High reliability would be a strong correlation between scores on the two forms.
Describe the split-half method of reliability. What is an advantage of this method?
The test is divided into two halves and scores on the halves are compared; there should be a strong correlation between them. If scores on two forms of a test should agree, then scores on the two halves of one test should agree too. This eliminates the need to create a second test to assess reliability.
Does the split-half method over- or underestimate reliability? Why?
Why is it better than parallel/alternate forms reliability testing?
It underestimates reliability because a smaller number of items is used in the correlation.
It is better because it eliminates the need to create a second test to assess reliability.
What is the Spearman-Brown formula used for?
Used to correct the split-half correlation: it estimates the reliability of the full-length test from the correlation between its two halves.
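The Spearman-Brown formula is commonly written as r_full = k * r / (1 + (k - 1) * r), where r is the correlation between the parts and k is the factor by which the test is lengthened (k = 2 when stepping up from a half to the full test). A small sketch, with the function name `spearman_brown` and the example correlation chosen purely for illustration:

```python
def spearman_brown(r_part, k=2):
    """Estimate reliability of a test k times as long as the part.

    r_part: correlation between the parts (e.g. the two halves)
    k:      lengthening factor; 2 steps a half-test up to full length
    """
    return k * r_part / (1 + (k - 1) * r_part)

# Hypothetical correlation between two halves of a test
half_correlation = 0.6
full_estimate = spearman_brown(half_correlation)
print(f"estimated full-test reliability: {full_estimate:.2f}")
```

The stepped-up estimate is higher than the half-test correlation, which is how the formula compensates for the smaller number of items in each half.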
What is Cronbach's alpha? On which data is it used?
What is its range, and when does it indicate redundancy?
It estimates the reliability (internal consistency) of a test.
Used on tests with a graded scoring system (e.g. agree to disagree).
Range: 0 (not similar) to 1 (identical); .7 is adequate, .8 is good, .9 suggests redundancy.
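Cronbach's alpha is usually computed as alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores), where k is the number of items. The sketch below assumes population variances throughout (either variance type gives the same ratio as long as it is used consistently); the function name `cronbach_alpha` and the response data are hypothetical.

```python
from statistics import pvariance

def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents' item scores.

    rows: one list per respondent, each holding that person's
          score on every item (all lists the same length).
    """
    k = len(rows[0])                       # number of items
    item_columns = list(zip(*rows))        # transpose to one tuple per item
    sum_item_var = sum(pvariance(col) for col in item_columns)
    total_var = pvariance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum_item_var / total_var)

# Hypothetical responses: five people, three items on a 1-5 agree/disagree scale
responses = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [1, 2, 1],
    [3, 3, 3],
]

alpha = cronbach_alpha(responses)
print(f"Cronbach's alpha: {alpha:.2f}")
```

By the rule of thumb on the card, an alpha this close to 1 would point to items that are near duplicates of one another.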
When is Kuder-Richardson 20 (KR-20) used?
To determine reliability for tests with dichotomously scored items (0 or 1).
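KR-20 is commonly written as KR20 = k / (k - 1) * (1 - sum of p*q / variance of total scores), where p is the proportion scoring 1 on an item and q = 1 - p. A minimal sketch, assuming population variance for the total scores; the function name `kr20` and the answer data are hypothetical.

```python
from statistics import pvariance

def kr20(rows):
    """Kuder-Richardson 20 for dichotomous (0/1) item scores.

    rows: one list per respondent, each entry 0 (wrong) or 1 (right).
    """
    k = len(rows[0])                 # number of items
    n = len(rows)                    # number of respondents
    pq_sum = 0.0
    for col in zip(*rows):           # one column per item
        p = sum(col) / n             # proportion scoring 1 on this item
        pq_sum += p * (1 - p)
    total_var = pvariance([sum(r) for r in rows])
    return k / (k - 1) * (1 - pq_sum / total_var)

# Hypothetical right/wrong answers: five people, four items
answers = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

kr20_value = kr20(answers)
print(f"KR-20: {kr20_value:.2f}")
```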
What is content validity?
Does the test adequately represent all the possible items that measure the construct? If a unit spends half the time on math and half on physics, the final exam should reflect this.