Test Construction Flashcards
psychological test
an objective and standardized measure of a sample of behavior
standardization
uniformity of procedure in administering and scoring the test;
test conditions and scoring procedures should be the same for all examinees
norms
the scores of a representative sample of the population on a particular test;
interpretation of most psychological tests involves comparing an individual’s test score to norms
conceptual points about norms
1) norms are obtained from a sample that is truly representative of the population for which the test is designed;
2) to be truly representative, a sample must be reasonably large;
3) examinee’s score should be compared to the scores obtained by a representative sample of the population to which he or she belongs;
4) norm-referenced scores indicate an examinee’s standing on a test as compared to other persons, which permits comparison of an individual’s performance on different tests;
5) norms do not provide a universal standard of “good” or “bad” performance; they represent only the performance of persons in the standardization sample
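A minimal sketch of comparing an examinee's raw score to norms, assuming the norm group's scores are available as a plain list. The function names, the choice of z-score and percentile rank as the norm-referenced scores, and the sample values are all hypothetical illustrations, not part of the cards.

```python
from statistics import mean, stdev


def z_score(raw, norm_scores):
    """Distance of a raw score from the norm-group mean, in sample SD units."""
    return (raw - mean(norm_scores)) / stdev(norm_scores)


def percentile_rank(raw, norm_scores):
    """Percentage of the norm group scoring strictly below the raw score."""
    below = sum(1 for s in norm_scores if s < raw)
    return 100.0 * below / len(norm_scores)


# Hypothetical standardization sample (a real one would be far larger).
norm_sample = [40, 45, 50, 55, 60]

print(z_score(55, norm_sample))
print(percentile_rank(55, norm_sample))
```

Both values describe standing relative to this particular sample only; a different norm group would give a different result for the same raw score.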
objective
administration, scoring, and interpretation of scores are “independent of the subjective judgment of the particular examiner”;
the examinee will obtain the same score regardless of who administers or scores the test
sample of behavior
the test measures a sample of the behavior in question rather than the entire domain of that behavior
reliability
yields repeatable, dependable, and consistent results;
yields examinees’ true scores on whatever attribute it measures
validity
measures what it purports to measure
maximum performance
tells us about an examinee’s best possible performance, or what a person can do;
achievement and aptitude tests
typical performance
tells us what an examinee usually does or feels;
interest and personality tests
pure speed (speeded) test
the examinee’s response rate is assessed;
have time limits and consist of items that all (or almost all) examinees would answer correctly if given enough time
power test
assesses the level of difficulty a person can attain;
no time limit or a time limit that permits most or all examinees to attempt all items;
items are arranged in order from least difficult to most difficult
mastery tests
designed to determine whether a person can attain a pre-established level of acceptable performance;
“all or none” score (e.g., pass/fail);
commonly employed to test basic skills (e.g., basic reading, basic math) at the elementary school level
ipsative measure
the individual (as opposed to a norm group or external criterion) is the frame of reference in score reporting;
scores are reported in terms of the relative strength of attributes within the individual examinee;
scores reflect which needs are strongest or weakest within the examinee, rather than as compared to a norm group;
examinees express a preference for one item over others, rather than responding to each item individually (e.g., the examinee must choose which of two statements is more appealing)
normative measures
provide a measure of the absolute strength of each attribute measured by the test;
examinees answer every item;
score can be compared to those of other examinees
classical test theory
a given examinee’s obtained test score consists of two components: true score and error
true score
reflects the examinee’s actual status on whatever attribute is being measured by the test
error (measurement error)
factors that are irrelevant to whatever is being measured; random;
does not affect all examinees in the same way
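In symbols, the decomposition described on the classical test theory card is often written as below. The notation (X for the obtained score, T for the true score, E for error) is a common convention assumed here, not something stated on the cards; the variance identity additionally assumes that error is uncorrelated with true score.

```latex
X = T + E, \qquad \sigma_X^2 = \sigma_T^2 + \sigma_E^2
```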
reliability coefficient
a correlation coefficient that ranges in value from 0.0 to +1.0;
indicates the proportion of variability that is true score variability;
0.0 - test is completely unreliable; observed variability (differences) in test scores due entirely to random factors;
1.0 - perfect reliability; no error - all observed variability reflects true variability;
.90 - 90% of observed variability in obtained test scores due to true score differences among examinees and the remaining 10% of observed variability represents measurement error;
interpreted directly and never squared, unlike other correlation coefficients
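A minimal sketch of the .90 example, assuming the reliability coefficient is taken as true score variance divided by total observed variance (true plus error). The function name and the variance values are hypothetical, chosen only to reproduce the 90% / 10% split on the card.

```python
def reliability_from_variances(true_var, error_var):
    """Proportion of observed score variance that is true score variance."""
    if true_var < 0 or error_var < 0:
        raise ValueError("variances must be non-negative")
    observed_var = true_var + error_var  # assumes error is uncorrelated with true score
    if observed_var == 0:
        raise ValueError("observed variance is zero; reliability is undefined")
    return true_var / observed_var


# Hypothetical variances: 9 units of true score variance, 1 unit of error variance.
r = reliability_from_variances(9.0, 1.0)
print(r)        # proportion of variability due to true score differences
print(1.0 - r)  # proportion due to measurement error
```

The coefficient is read as a proportion as-is, which is why it is not squared before interpretation.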
test-retest reliability coefficient (“coefficient of stability”)
administering the same test to the same group of people, and then correlating scores on the first and second administrations
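A minimal sketch of the procedure on this card, assuming the coefficient is the Pearson correlation between the two sets of scores. The helper `pearson_r` and the score lists are hypothetical; each list position is taken to be the same examinee on both administrations.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares of deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("scores have no variability; correlation is undefined")
    return sxy / sqrt(sxx * syy)


# Hypothetical scores for five examinees on two administrations of the same test.
first_administration = [10, 12, 15, 18, 20]
second_administration = [11, 13, 14, 19, 21]

print(pearson_r(first_administration, second_administration))
```

The same computation would apply to scores from two equivalent forms of a test; only the source of the second list changes.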
“time sampling”
factors related to time that are sources of measurement error for the test-retest coefficient;
from one administration to the next, there may be changes in exam conditions (noises, weather) or factors such as illness, fatigue, worry, etc.
practice effects
doing better the second time around due to practice
drawbacks of test-retest reliability coefficient
examinees systematically tend to remember their previous responses;
not appropriate for assessing the reliability of tests that measure unstable attributes (mood);
recommended only for tests that are not appreciably affected by repetition; very few psychological tests fall into this category
alternate forms (equivalent forms or parallel forms) reliability coefficient
administering two equivalent forms of a test to the same group of examinees, and then obtaining the correlation between the two sets of scores