L6 Flashcards
(62 cards)
is the extent to which a score from a selection measure is stable and free from error. If a score from a measure is not stable or error-free, it is not useful
Reliability
Reliability is an essential characteristic of an effective measure. Test reliability is determined in four ways: test-retest reliability, alternate-forms reliability, internal reliability, and scorer reliability
Each of several people takes the same test twice. The scores from the first administration of the test are correlated with scores from the second to determine whether they are similar. If they are, the test is said to have temporal stability
Test-Retest Reliability
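As a minimal sketch (the scores and variable names below are invented for illustration), temporal stability is just the Pearson correlation between scores from the two administrations:

```python
# Minimal sketch: test-retest reliability as the Pearson correlation
# between two administrations of the same test (data are hypothetical).
from scipy.stats import pearsonr

time1 = [78, 85, 62, 90, 71]  # first administration
time2 = [80, 83, 65, 88, 74]  # second administration, same five people

r, _ = pearsonr(time1, time2)
print(f"Test-retest (temporal stability) coefficient: r = {r:.2f}")
```

The same correlational logic underlies alternate-forms reliability, except that the two score sets come from two different forms rather than two administrations.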
With the alternate-forms reliability method, two forms of the same test are constructed, and scores on the two forms are correlated to determine whether they are similar
Alternate-forms reliability
A third way to determine the reliability of a test or inventory is to look at the consistency with which an applicant responds to items measuring a similar dimension or construct (e.g., personality trait, ability, area of knowledge). The extent to which similar items are answered in similar ways is referred to as internal consistency and measures item stability. In general, the longer the test, the higher its internal consistency, that is, the greater the agreement among responses to the various test items
internal reliability
When measuring internal consistency (how consistent the test items are with each other), researchers use methods such as the split-half method, Cronbach’s alpha, and the Kuder-Richardson (K-R) formula
The test items are divided into two halves (typically odd-numbered and even-numbered questions), and scores on the two halves are correlated
Split-half method
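A small sketch of the split-half computation with made-up item responses. Because splitting halves the test length (and shorter tests are less reliable), the half-test correlation is usually stepped up with the Spearman-Brown formula; that correction is standard practice, though not named in the card above:

```python
# Sketch: split-half reliability with the Spearman-Brown length correction.
# Data are hypothetical: rows = people, columns = items (1 = correct).
import numpy as np
from scipy.stats import pearsonr

items = np.array([
    [1, 1, 0, 1, 1, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0, 1, 0],
    [1, 0, 1, 1, 1, 0, 1, 1],
])

odd_half = items[:, 0::2].sum(axis=1)   # score on items 1, 3, 5, 7
even_half = items[:, 1::2].sum(axis=1)  # score on items 2, 4, 6, 8

r_half, _ = pearsonr(odd_half, even_half)
r_full = (2 * r_half) / (1 + r_half)    # Spearman-Brown: full-length estimate
print(f"Half-test r = {r_half:.2f}, corrected full-test r = {r_full:.2f}")
```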
These are more accurate methods, usually calculated by computer. Cronbach’s alpha is used for tests whose items have more than two possible answers (such as rating scales), while the K-R formula is for tests with only two answer choices (yes/no or true/false).
Cronbach’s alpha and K-R formula
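As a minimal sketch, here are both coefficients implemented from their standard textbook formulas (the data and function names are invented):

```python
# Sketch: Cronbach's alpha (rating-scale items) and KR-20 (yes/no items),
# computed from a matrix with rows = people and columns = items.
import numpy as np

def cronbach_alpha(items):
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)        # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)    # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def kr20(items):
    items = np.asarray(items, dtype=float)       # expects 0/1 answers only
    k = items.shape[1]
    p = items.mean(axis=0)                       # proportion answering correctly
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - (p * (1 - p)).sum() / total_var)

ratings = [[4, 5, 3, 4], [2, 2, 3, 1], [5, 5, 4, 5], [3, 4, 3, 3]]
answers = [[1, 1, 0, 1], [0, 1, 0, 0], [1, 1, 1, 1], [0, 0, 0, 1]]
print(f"alpha = {cronbach_alpha(ratings):.2f}, KR-20 = {kr20(answers):.2f}")
```

KR-20 is essentially Cronbach’s alpha specialized to dichotomous items, which is why the two functions differ only in how the item variances are obtained.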
A fourth way of assessing reliability is?
Scorer reliability
The reliability coefficient for a test can be obtained from your own data, the test manual, journal articles using the test, or test compendia that will be discussed later in the chapter. To evaluate the coefficient, you can compare it with reliability coefficients typically obtained for similar types of tests
Evaluating the reliability of a test
A test or inventory can have homogeneous items and yield heterogeneous scores and still not be reliable if the person scoring the test makes mistakes. This is an issue in projective or subjective tests in which there is no one correct answer, but even tests scored with the use of keys suffer from scorer mistakes
Scorer Reliability
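A sketch of two common checks on scorer reliability, with hypothetical data: a Pearson correlation for numeric scores, and Cohen’s kappa (a chance-corrected agreement index not mentioned in the card) for categorical judgments such as those required by projective tests:

```python
# Sketch: scorer (inter-rater) reliability checked two ways; data invented.
from scipy.stats import pearsonr
from sklearn.metrics import cohen_kappa_score

# The same six tests scored independently by two scorers.
scorer_a = [12, 18, 9, 15, 20, 11]
scorer_b = [11, 19, 10, 14, 20, 13]
r, _ = pearsonr(scorer_a, scorer_b)

# Pass/fail judgments on the same six answers (no single correct key).
judge_a = ["pass", "fail", "pass", "pass", "fail", "pass"]
judge_b = ["pass", "fail", "fail", "pass", "fail", "pass"]
kappa = cohen_kappa_score(judge_a, judge_b)

print(f"Numeric score agreement: r = {r:.2f}")
print(f"Categorical agreement: kappa = {kappa:.2f}")
```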
is the degree to which inferences from scores on tests or assessments are justified by the evidence. As with reliability, a test must be valid to be useful. But just because a test is reliable does not mean it is valid
Validity
the extent to which test items sample the content that they are supposed to measure
Content Validity
When choosing an assessment tool, ensure it is reliable, valid, and legally sound
gives consistent results, while a valid test accurately measures job-related skills. Check for adverse impact on certain groups and review any legal challenges the test has faced.
Reliable test
should be based on job analysis, covering only relevant skills and knowledge. Unnecessary complexity, like difficult vocabulary, can make a test unfair. By following these guidelines, organizations can select fair, effective, and legally defensible assessments
Well-designed test
measures how well a test predicts job performance. It is assessed through two research designs: concurrent validity and predictive validity
Criterion Validity
The test is given to current employees, and their scores are correlated with their existing job performance
Concurrent Validity
The test is administered to applicants before hiring, and their scores are compared with their future job performance. Predictive validity is stronger but harder to implement because hiring all applicants is impractical.
Predictive Validity
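With invented numbers, both designs reduce to the same computation, a correlation between test scores and a job-performance criterion; only the timing of data collection differs:

```python
# Sketch: concurrent vs. predictive criterion validity (data hypothetical).
import numpy as np

# Concurrent design: current employees' test scores vs. current performance.
emp_scores = [72, 85, 60, 90, 78, 66]
emp_perf = [3.1, 4.2, 2.8, 4.5, 3.9, 3.0]

# Predictive design: applicants' scores vs. performance measured after hiring.
app_scores = [70, 88, 64, 92, 75, 69]
later_perf = [3.0, 4.4, 2.9, 4.6, 3.5, 3.2]

print(f"Concurrent validity: r = {np.corrcoef(emp_scores, emp_perf)[0, 1]:.2f}")
print(f"Predictive validity: r = {np.corrcoef(app_scores, later_perf)[0, 1]:.2f}")
```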
refers to whether a test that is valid for a job in one organization remains valid for the same job in another.
Validity Generalization
This is especially useful for smaller organizations, which often have too few employees to conduct their own validation studies
Validity Generalization
refers to the extent to which a test accurately measures the theoretical concept (construct) it claims to measure. Unlike content validity, which focuses on whether a test covers the appropriate material, this is concerned with how well test scores align with the intended construct.
construct validity
One common way to establish this is through correlational studies, where test scores are compared with other tests measuring the same or different constructs.
construct validity
This is valid only when it correlates highly with other tests of psychology knowledge but not with tests of unrelated constructs, such as reading ability.
A valid psychology knowledge test
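A sketch of that correlational approach with simulated data: convergent evidence is a high correlation with a measure of the same construct, while discriminant evidence is a low correlation with an unrelated construct such as reading ability:

```python
# Sketch: convergent vs. discriminant evidence for construct validity,
# using simulated scores for 100 hypothetical test takers.
import numpy as np

rng = np.random.default_rng(0)
psych_knowledge = rng.normal(size=100)
# A second psychology knowledge test: built to correlate highly (convergent).
other_psych = 0.8 * psych_knowledge + rng.normal(scale=0.5, size=100)
# Reading ability: independent, so it should correlate weakly (discriminant).
reading = rng.normal(size=100)

corr = np.corrcoef([psych_knowledge, other_psych, reading])
print(f"With other psychology test: r = {corr[0, 1]:.2f} (should be high)")
print(f"With reading ability: r = {corr[0, 2]:.2f} (should be low)")
```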