L6 Flashcards
(62 cards)
is the extent to which a score from a selection measure is stable and free from error. If a score from a measure is not stable or error-free, it is not useful
Reliability
Reliability is an essential characteristic of an effective measure. Test reliability is determined in four ways: test-retest reliability, alternate-forms reliability, internal reliability, and scorer reliability
Each of several people takes the same test twice. The scores from the first administration of the test are correlated with scores from the second to determine whether they are similar. If they are, the test is said to have temporal stability
Test-Retest Reliability
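As a minimal sketch (the scores and variable names below are invented for illustration), temporal stability is just the Pearson correlation between scores from the two administrations:

```python
# Minimal sketch: test-retest reliability as the Pearson correlation
# between two administrations of the same test (data are hypothetical).
from scipy.stats import pearsonr

time1 = [78, 85, 62, 90, 71]  # first administration
time2 = [80, 83, 65, 88, 74]  # second administration, same five people

r, _ = pearsonr(time1, time2)
print(f"Test-retest (temporal stability) coefficient: r = {r:.2f}")
```

The same correlational logic underlies alternate-forms reliability, except that the two score sets come from two different forms rather than two administrations.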
With the alternate-forms reliability method, two forms of the same test are constructed, and scores on the two forms are correlated to determine whether they are similar
Alternate-forms reliability
A third way to determine the reliability of a test or inventory is to look at the consistency with which an applicant responds to items measuring a similar dimension or construct (e.g., personality trait, ability, area of knowledge). The extent to which similar items are answered in similar ways is referred to as internal consistency and measures item stability. In general, the longer the test, the higher its internal consistency, that is, the greater the agreement among responses to the various test items
internal reliability
When measuring internal consistency (how consistent the test items are with each other), researchers use methods such as the split-half method, Cronbach’s alpha, and the Kuder-Richardson (K-R) formula
The test items are divided into two halves (typically odd-numbered and even-numbered questions), and scores on the two halves are correlated
Split-half method
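A small sketch of the split-half computation with made-up item responses. Because splitting halves the test length (and shorter tests are less reliable), the half-test correlation is usually stepped up with the Spearman-Brown formula; that correction is standard practice, though not named in the card above:

```python
# Sketch: split-half reliability with the Spearman-Brown length correction.
# Data are hypothetical: rows = people, columns = items (1 = correct).
import numpy as np
from scipy.stats import pearsonr

items = np.array([
    [1, 1, 0, 1, 1, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0, 1, 0],
    [1, 0, 1, 1, 1, 0, 1, 1],
])

odd_half = items[:, 0::2].sum(axis=1)   # score on items 1, 3, 5, 7
even_half = items[:, 1::2].sum(axis=1)  # score on items 2, 4, 6, 8

r_half, _ = pearsonr(odd_half, even_half)
r_full = (2 * r_half) / (1 + r_half)    # Spearman-Brown: full-length estimate
print(f"Half-test r = {r_half:.2f}, corrected full-test r = {r_full:.2f}")
```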
These are more accurate methods, usually calculated by computer. Cronbach’s alpha is used for tests whose items have more than two possible answers (such as rating scales), while the K-R formula is for tests with only two answer choices (yes/no or true/false).
Cronbach’s alpha and K-R formula
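As a minimal sketch, here are both coefficients implemented from their standard textbook formulas (the data and function names are invented):

```python
# Sketch: Cronbach's alpha (rating-scale items) and KR-20 (yes/no items),
# computed from a matrix with rows = people and columns = items.
import numpy as np

def cronbach_alpha(items):
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)        # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)    # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def kr20(items):
    items = np.asarray(items, dtype=float)       # expects 0/1 answers only
    k = items.shape[1]
    p = items.mean(axis=0)                       # proportion answering correctly
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - (p * (1 - p)).sum() / total_var)

ratings = [[4, 5, 3, 4], [2, 2, 3, 1], [5, 5, 4, 5], [3, 4, 3, 3]]
answers = [[1, 1, 0, 1], [0, 1, 0, 0], [1, 1, 1, 1], [0, 0, 0, 1]]
print(f"alpha = {cronbach_alpha(ratings):.2f}, KR-20 = {kr20(answers):.2f}")
```

KR-20 is essentially Cronbach’s alpha specialized to dichotomous items, which is why the two functions differ only in how the item variances are obtained.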
A fourth way of assessing reliability is?
Scorer reliability
The reliability coefficient for a test can be obtained from your own data, the test manual, journal articles using the test, or test compendia that will be discussed later in the chapter. To evaluate the coefficient, you can compare it with reliability coefficients typically obtained for similar types of tests
Evaluating the reliability of a test
A test or inventory can have homogeneous items and yield heterogeneous scores and still not be reliable if the person scoring the test makes mistakes. This is an issue in projective or subjective tests in which there is no one correct answer, but even tests scored with the use of keys suffer from scorer mistakes
Scorer Reliability
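A sketch of two common checks on scorer reliability, with hypothetical data: a Pearson correlation for numeric scores, and Cohen’s kappa (a chance-corrected agreement index not mentioned in the card) for categorical judgments such as those required by projective tests:

```python
# Sketch: scorer (inter-rater) reliability checked two ways; data invented.
from scipy.stats import pearsonr
from sklearn.metrics import cohen_kappa_score

# The same six tests scored independently by two scorers.
scorer_a = [12, 18, 9, 15, 20, 11]
scorer_b = [11, 19, 10, 14, 20, 13]
r, _ = pearsonr(scorer_a, scorer_b)

# Pass/fail judgments on the same six answers (no single correct key).
judge_a = ["pass", "fail", "pass", "pass", "fail", "pass"]
judge_b = ["pass", "fail", "fail", "pass", "fail", "pass"]
kappa = cohen_kappa_score(judge_a, judge_b)

print(f"Numeric score agreement: r = {r:.2f}")
print(f"Categorical agreement: kappa = {kappa:.2f}")
```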
is the degree to which inferences from scores on tests or assessments are justified by the evidence. As with reliability, a test must be valid to be useful. But just because a test is reliable does not mean it is valid
Validity
the extent to which test items sample the content that they are supposed to measure
Content Validity
When choosing an assessment tool, ensure it is reliable, valid, and legally sound
gives consistent results, while a valid test accurately measures job-related skills. Check for adverse impact on certain groups and review any legal challenges the test has faced.
Reliable test
should be based on job analysis, covering only relevant skills and knowledge. Unnecessary complexity, like difficult vocabulary, can make a test unfair. By following these guidelines, organizations can select fair, effective, and legally defensible assessments
Well-designed test
measures how well a test predicts job performance. It is assessed through two research designs: concurrent validity and predictive validity
Criterion Validity
The test is given to current employees, and their scores are correlated with their existing job performance
Concurrent Validity
The test is administered to applicants before hiring, and their scores are compared with their future job performance. Predictive validity is stronger but harder to implement because hiring all applicants is impractical.
Predictive Validity
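With invented numbers, both designs reduce to the same computation, a correlation between test scores and a job-performance criterion; only the timing of data collection differs:

```python
# Sketch: concurrent vs. predictive criterion validity (data hypothetical).
import numpy as np

# Concurrent design: current employees' test scores vs. current performance.
emp_scores = [72, 85, 60, 90, 78, 66]
emp_perf = [3.1, 4.2, 2.8, 4.5, 3.9, 3.0]

# Predictive design: applicants' scores vs. performance measured after hiring.
app_scores = [70, 88, 64, 92, 75, 69]
later_perf = [3.0, 4.4, 2.9, 4.6, 3.5, 3.2]

print(f"Concurrent validity: r = {np.corrcoef(emp_scores, emp_perf)[0, 1]:.2f}")
print(f"Predictive validity: r = {np.corrcoef(app_scores, later_perf)[0, 1]:.2f}")
```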
refers to whether a test that is valid for a job in one organization remains valid for the same job in another.
Validity Generalization
This is especially useful for smaller organizations, which often have too few employees to conduct their own validation studies
Validity Generalization
refers to the extent to which a test accurately measures the theoretical concept (construct) it claims to measure. Unlike content validity, which focuses on whether a test covers the appropriate material, this is concerned with how well test scores align with the intended construct.
construct validity
One common way to establish this is through correlational studies, where test scores are compared with other tests measuring the same or different constructs.
construct validity
This is valid only when it correlates highly with other tests of psychology knowledge but not with tests of unrelated constructs, such as reading ability.
A valid psychology knowledge test
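A sketch of that correlational approach with simulated data: convergent evidence is a high correlation with a measure of the same construct, while discriminant evidence is a low correlation with an unrelated construct such as reading ability:

```python
# Sketch: convergent vs. discriminant evidence for construct validity,
# using simulated scores for 100 hypothetical test takers.
import numpy as np

rng = np.random.default_rng(0)
psych_knowledge = rng.normal(size=100)
# A second psychology knowledge test: built to correlate highly (convergent).
other_psych = 0.8 * psych_knowledge + rng.normal(scale=0.5, size=100)
# Reading ability: independent, so it should correlate weakly (discriminant).
reading = rng.normal(size=100)

corr = np.corrcoef([psych_knowledge, other_psych, reading])
print(f"With other psychology test: r = {corr[0, 1]:.2f} (should be high)")
print(f"With reading ability: r = {corr[0, 2]:.2f} (should be low)")
```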