L6 Flashcards

(62 cards)

1
Q

is the extent to which a score from a selection measure is stable and free from error. If a score from a measure is not stable or error-free, it is not useful

A

Reliability

2
Q

Reliability is an essential characteristic of an effective measure. Test reliability is determined in four ways:

A

test-retest reliability, alternate-forms reliability, internal reliability, and scorer reliability

3
Q

Each of several people takes the same test twice. The scores from the first administration of the test are correlated with scores from the second to determine whether they are similar. If they are, the test is said to have temporal stability

A

Test-Retest Reliability

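A minimal sketch of the correlation this card describes, assuming Python and purely illustrative scores:

```python
# Test-retest reliability: correlate scores from the first and second
# administrations of the same test. Scores below are illustrative.
from statistics import correlation  # Python 3.10+

first_admin = [82, 75, 90, 68, 77]   # scores at time 1
second_admin = [80, 78, 88, 70, 74]  # same people, scores at time 2

r = correlation(first_admin, second_admin)  # Pearson r
print(f"test-retest reliability: r = {r:.2f}")
```
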
4
Q

With this method, two forms of the same test are constructed, and the scores on the two forms are correlated to determine whether they are similar

A

Alternate-forms reliability

5
Q

A third way to determine the reliability of a test or inventory is to look at the consistency with which an applicant responds to items measuring a similar dimension or construct (e.g., a personality trait, ability, or area of knowledge). The extent to which similar items are answered in similar ways is referred to as internal consistency and reflects item stability. In general, the longer the test, the higher this agreement among responses to the various test items will be

A

internal reliability

6
Q

When measuring internal consistency (how consistent the test items are with each other), researchers use methods such as:

A

Split-half method,
Cronbach’s alpha,
and the Kuder-Richardson (K-R) formula

7
Q

The test items are divided into two halves (e.g., odd-numbered and even-numbered questions), and the scores on the two halves are correlated

A

Split-half method

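A sketch of the split-half computation; the Spearman-Brown correction applied at the end is a standard companion step (an addition here, not stated on the card) that adjusts for each half being only half the test's length. Item data are made up:

```python
# Split-half method: score the odd-numbered and even-numbered items
# separately, then correlate the two half-scores.
from statistics import correlation

# 1 = correct, 0 = incorrect; rows are test-takers, columns are items
answers = [
    [1, 1, 0, 1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 0, 0],
    [1, 1, 1, 0, 1, 1, 1, 1],
]

odd_scores = [sum(row[0::2]) for row in answers]   # items 1, 3, 5, 7
even_scores = [sum(row[1::2]) for row in answers]  # items 2, 4, 6, 8

r_half = correlation(odd_scores, even_scores)
r_full = (2 * r_half) / (1 + r_half)  # Spearman-Brown correction
print(f"half-test r = {r_half:.2f}, corrected full-test r = {r_full:.2f}")
```
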
8
Q

These are more accurate methods, usually calculated with a computer. The first is used for tests with multiple response formats (such as rating scales), while the second is for tests with only two answer choices (yes/no or true/false).

A

Cronbach’s alpha and K-R formula

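A sketch of Cronbach's alpha using its usual definition, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores); with the dichotomous (0/1) items used here the result coincides with K-R 20. Data are illustrative:

```python
# Cronbach's alpha for a small made-up item matrix.
from statistics import pvariance

def cronbach_alpha(rows):
    k = len(rows[0])                  # number of items
    items = list(zip(*rows))          # one tuple of responses per item
    item_var = sum(pvariance(col) for col in items)
    total_var = pvariance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - item_var / total_var)

answers = [                           # rows = test-takers, cols = items
    [1, 1, 0, 1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 0, 0],
    [1, 1, 1, 0, 1, 1, 1, 1],
]
print(f"alpha (equals K-R 20 for 0/1 items): {cronbach_alpha(answers):.2f}")
```
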
9
Q

A fourth way of assessing reliability is?

A

Scorer reliability

10
Q

The reliability coefficient for a test can be obtained from your own data, the test manual, journal articles using the test, or test compendia that will be discussed later in the chapter. To evaluate the coefficient, you can compare it with reliability coefficients typically obtained for similar types of tests

A

Evaluating the reliability of a test

11
Q

A test or inventory can have homogeneous items and yield heterogeneous scores and still not be reliable if the person scoring the test makes mistakes. This is an issue in projective or subjective tests in which there is no one correct answer, but even tests scored with the use of keys suffer from scorer mistakes

A

Scorer Reliability

12
Q

is the degree to which inferences from scores on tests or assessments are justified by the evidence. As with reliability, a test must be valid to be useful. But just because a test is reliable does not mean it is valid

A

Validity

13
Q

the extent to which test items sample the content that they are supposed to measure

A

Content Validity

14
Q

When choosing an assessment tool, ensure it is:

A

reliable, valid, and legally sound

15
Q

gives consistent results, while a valid test accurately measures job-related skills. Check for adverse impact on certain groups and review any legal challenges the test has faced.

A

Reliable test

16
Q

should be based on job analysis, covering only relevant skills and knowledge. Unnecessary complexity, like difficult vocabulary, can make a test unfair. By following these guidelines, organizations can select fair, effective, and legally defensible assessments

A

Well-designed test

17
Q

measures how well a test predicts job performance. It is assessed through two research designs

A

Criterion Validity

18
Q

The test is given to current employees, and their scores are correlated with their existing job performance

A

Concurrent Validity

19
Q

The test is administered to applicants before hiring, and their scores are compared with their future job performance. This design is stronger but harder to implement because hiring all applicants is impractical.

A

Predictive Validity

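Computationally, both criterion validity designs come down to correlating test scores with a performance criterion; in a concurrent design both columns come from current employees, while in a predictive design the ratings are collected after hiring. A sketch with made-up numbers:

```python
# Criterion validity: correlate selection-test scores with a
# job-performance criterion. All numbers are illustrative.
from statistics import correlation

test_scores = [55, 72, 64, 81, 60, 90]        # selection test
performance = [3.1, 3.8, 3.4, 4.2, 3.0, 4.5]  # supervisor ratings

validity = correlation(test_scores, performance)
print(f"criterion validity coefficient: r = {validity:.2f}")
```
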
20
Q

refers to whether a test that is valid for a job in one organization remains valid for the same job in another.

A

Validity Generalization

21
Q

This is especially useful for smaller organizations

A

Validity Generalization

22
Q

refers to the extent to which a test accurately measures the theoretical concept (construct) it claims to measure. Unlike content validity, which focuses on whether a test covers the appropriate material, this is concerned with how well test scores align with the intended construct.

A

construct validity

23
Q

One common way to establish this is through correlational studies, where test scores are compared with other tests measuring the same or different constructs.

A

construct validity

24
Q

This is valid only when it correlates highly with another psychology knowledge test but not with unrelated tests, such as a test of reading ability.

A

A valid psychology knowledge test

25
Q

involves comparing test scores between groups expected to differ on the construct

A

Known-group validity

26
Q

is sufficient when a test directly relates to job duties. However, if the connection is unclear, a criterion validity study may be needed.

A

Content validity

27
Q

studies carry risks: if results are insignificant, they can weaken credibility in legal disputes. Since validity coefficients are often low (0.20–0.35), they may be difficult to defend.

A

Criterion validity

28
Q

refers to how much a test appears to be related to the job it assesses. While not an official validity measure under federal guidelines, it is crucial because test-takers and administrators need to perceive a test as relevant for it to be accepted and taken seriously. It reduces lawsuits and applicant dropout rates.

A

Face Validity

29
Q

can be improved by explaining how the test relates to job performance, using multimedia formats, and providing honest feedback with respectful treatment

A

Acceptance of test results

30
Q

accuracy in measuring what it intends to measure

A

Validity

31
Q

consistency of results

A

Reliability

32
Q

A comprehensive guide containing expert reviews and details on thousands of psychological tests.

A

Seventeenth Mental Measurements Yearbook

33
Q

A compendium listing available psychological tests along with essential reliability and validity data.

A

Tests in Print VII

34
Q

Group tests are more efficient, but individual tests provide deeper insights into problem-solving approaches.

A

Group vs. Individual

35
Q

Adjusts question difficulty based on responses, making testing faster and more precise.

A

Computer-Adaptive Testing (CAT)

36
Q

Reduces costs, speeds up feedback, and maintains accuracy.

A

Computer-Based Testing

37
Q

are designed to estimate the percentage of future employees who will be successful on the job if an organization uses a particular test.

A

Taylor-Russell Tables

38
Q

Estimate the percentage of successful hires.

A

Taylor-Russell tables

39
Q

predict individual success probability

A

Expectancy charts and Lawshe tables

40
Q

calculates potential cost savings from adopting a new test.

A

Utility Formula

41
Q

is simply the percentage of people an organization must hire

A

Selection Ratio

42
Q

the percentage of employees currently on the job who are considered successful

A

Base rate

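A tiny worked example of the two rates defined on cards 41 and 42, with made-up counts:

```python
# Selection ratio and base rate, two of the inputs (along with the
# validity coefficient) used to enter the Taylor-Russell tables.
hires, applicants = 10, 50
selection_ratio = hires / applicants        # 10/50 = 0.20

successful, current_employees = 30, 60
base_rate = successful / current_employees  # 30/60 = 0.50

print(f"selection ratio = {selection_ratio:.2f}, base rate = {base_rate:.2f}")
```
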
43
Q

is easier to compute but less accurate than the Taylor-Russell tables. The only information needed is employee test scores and scores on the criterion.

A

Proportion of correct decisions

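A sketch of the quadrant-style count this card describes: a decision is "correct" when the test and the criterion agree on which side of their cutoffs a person falls. Cutoffs and data are illustrative:

```python
# Proportion of correct decisions: true positives (passed test, good
# performer) plus true negatives (failed test, poor performer),
# divided by the total number of employees.
test_scores = [55, 72, 64, 81, 60, 90, 48, 77]
performance = [2.4, 3.8, 2.9, 4.2, 3.5, 4.5, 2.1, 3.9]
test_cutoff, perf_cutoff = 65, 3.0

correct = sum(
    (t >= test_cutoff) == (p >= perf_cutoff)
    for t, p in zip(test_scores, performance)
)
print(f"proportion of correct decisions: {correct / len(test_scores):.2f}")
```
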
44
Q

estimates the monetary savings to an organization.

A

Utility Formula

45
Q

This is the average amount of time that employees in the position tend to stay with the company. It is computed from company records: the number of years of tenure for each employee in the position is summed and divided by the total number of employees.

A

Average Tenure

46
Q

This figure is the criterion validity coefficient that was obtained through either a validity study or validity generalization.

A

Test Validity

47
Q

obtained by computing the average score on the selection test for both the applicants who are hired and those who are not. The average test score of the nonhired applicants is subtracted from the average test score of the hired applicants, and this difference is divided by the standard deviation of all the test scores.

A

Mean standardized predictor score of selected applicants

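Putting cards 44-47 together, a sketch of the savings estimate in its usual multiplicative form (employees hired x average tenure x test validity x dollar SD of job performance x mean standardized predictor score, minus the cost of testing); every figure below is an illustrative assumption:

```python
# Utility formula sketch. All inputs are made-up example values.
n_hired = 10           # employees hired per year
tenure = 2.5           # average tenure, in years (card 45)
validity = 0.35        # criterion validity coefficient (card 46)
sd_performance = 8000  # SD of job performance in dollars
m = 1.1                # mean standardized predictor score (card 47)
n_applicants, cost_per_applicant = 50, 25

savings = (n_hired * tenure * validity * sd_performance * m
           - n_applicants * cost_per_applicant)
print(f"estimated savings: ${savings:,.0f}")
```
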
48
impact is generally considered to occur when the selection rate for any group is less than 80% of the selection rate of the highest-scoring group, and this difference is statistically significant.
Adverse impact
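A minimal four-fifths-rule check with made-up counts; note that it omits the statistical-significance test the card also requires:

```python
# 80% rule: flag any group whose selection rate is less than 80% of
# the highest group's selection rate.
hired = {"group_a": 30, "group_b": 10}
applied = {"group_a": 60, "group_b": 40}

rates = {g: hired[g] / applied[g] for g in hired}
highest = max(rates.values())

for group, rate in rates.items():
    ratio = rate / highest
    flag = "potential adverse impact" if ratio < 0.80 else "ok"
    print(f"{group}: rate={rate:.2f}, ratio={ratio:.2f} -> {flag}")
```
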
49
Q

where a test predicts job performance for one specific group but not others; a rare phenomenon usually attributed to small sample sizes and methodological flaws. When encountered, organizations have three options: disregard it (as it is likely due to chance), stop using the test altogether, or use the test only for the validated group and find a different test for other groups.

A

Single-group validity

50
Q

usually the most appropriate choice, because there is often no logical reason why a valid test of a specific construct (like intelligence or personality) would predict job performance differently for equally qualified groups (like different races or genders).

A

Disregarding single-group validity

51
Q

occurs when a test is a valid predictor of job performance for two different groups, but it predicts performance significantly better for one group than for the other.

A

Differential validity

52
Q

occurs when a test is valid for only one group. Research suggests differential validity is also rare, and when observed, it is often in occupations dominated by one sex, with the test being more valid for the dominant group and tending to overpredict the performance of the minority group.

A

Single-group validity

53
Q

A test is generally considered fair if it does not result in?

A

Adverse impact, single-group validity, or differential validity

54
Q

Once this has been administered to a group of applicants, a final decision must be made as to which applicants to hire.

A

Fair selection test

55
Q

each test score is weighted according to how well it predicts the criterion

A

Multiple Regression

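A sketch of deriving those weights by ordinary least squares with numpy; the two-predictor setup and all numbers are illustrative assumptions:

```python
# Multiple regression: fit weights that combine two test scores to
# predict the criterion (job performance).
import numpy as np

# columns: cognitive test score, structured-interview rating
X = np.array([[78, 4], [62, 3], [90, 5], [55, 2], [70, 4], [84, 3]], float)
y = np.array([3.9, 3.0, 4.6, 2.5, 3.7, 3.8])  # performance ratings

A = np.column_stack([X, np.ones(len(X))])     # add an intercept column
b1, b2, b0 = np.linalg.lstsq(A, y, rcond=None)[0]
print(f"predicted performance = {b0:.2f} + {b1:.3f}*test + {b2:.3f}*interview")
```
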
56
Q

applicants are rank-ordered on the basis of their test scores. Selection is then made by starting with the highest score and moving down until all openings have been filled.

A

Unadjusted top-down selection

57
Q

the assumption is that if multiple test scores are used, a low score on one test can be compensated for by a high score on another

A

Compensatory approach to top-down selection

58
Q

the names of the top three scorers are given to the person making the hiring decision (e.g., police chief, HR director), who can then choose any of the three based on the immediate needs of the employer. This method ensures that the person hired will be well qualified but provides more choice than does top-down selection.

A

Rule of three or five

59
Q

a means of reducing adverse impact and increasing flexibility. With this system, an organization determines the lowest score on a test that is associated with acceptable performance on the job.

A

Passing score

60
Q

applicants are administered all of the tests at one time; those who fail any of the tests (fall below the passing score) are not considered further for employment.

A

Multiple-Cutoff

61
Q

approaches in which the applicant is administered one test at a time, usually beginning with the least expensive. Applicants who fail a test are eliminated from further consideration and take no more tests. Applicants who pass all of the tests are then administered the linearly related tests, and the applicants with the top scores on these tests are hired.

A

Multiple-Hurdle

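A sketch of the sequential elimination this card describes; a multiple-cutoff approach (card 60) would apply the same cutoffs but administer every test at once. Tests, cutoffs, and applicants are made up:

```python
# Multiple-hurdle: administer one test at a time, cheapest first,
# dropping applicants who fall below that test's passing score.
hurdles = [("basic skills", 60), ("work sample", 70)]  # cheapest first
applicants = {
    "ann": {"basic skills": 75, "work sample": 82},
    "ben": {"basic skills": 55, "work sample": 90},  # out at hurdle 1
    "cal": {"basic skills": 68, "work sample": 65},  # out at hurdle 2
}

remaining = list(applicants)
for test, cutoff in hurdles:
    remaining = [a for a in remaining if applicants[a][test] >= cutoff]
print("passed all hurdles:", remaining)  # ['ann']
```
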
62
Q

takes into consideration the degree of error associated with any test score. Thus, even though one applicant might score two points higher than another, the two-point difference might be the result of chance (error) rather than actual differences in ability.

A

Banding

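One common way a band is computed (an assumption here, since the card gives no formula): scores within 1.96 standard errors of the difference of the top score are treated as statistically equivalent, with SEM = SD * sqrt(1 - reliability) and SED = sqrt(2) * SEM. Numbers are illustrative:

```python
# Banding: treat scores within the band as equivalent, so factors
# other than the raw score can guide the final choice.
import math

scores = [94, 92, 91, 88, 85, 80]
sd, reliability = 10.0, 0.90

sem = sd * math.sqrt(1 - reliability)  # standard error of measurement
sed = math.sqrt(2) * sem               # standard error of the difference
band_floor = max(scores) - 1.96 * sed

in_band = [s for s in scores if s >= band_floor]
print(f"band floor = {band_floor:.1f}; equivalent scores: {in_band}")
```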