Evaluating Selection Techniques and Decisions Flashcards
_____ is the extent to which a score from a selection measure is stable and free from error. If a score from a measure is not stable or error-free, it is not useful.
Reliability
_____ is determined in four ways: test-retest reliability, alternate-forms reliability, internal reliability, and scorer reliability.
Test reliability
_____: The test scores are stable across time and not highly susceptible to such random daily conditions as illness, fatigue, stress, or uncomfortable testing conditions.
temporal stability
Test-Retest Reliability
Typical time intervals between test administrations range from _____. Usually, the longer the time interval, the lower the reliability coefficient.
The time interval should be long enough so that the specific test answers have not been memorized, but short enough so that the person has not changed significantly.
3 days to 3 months
The typical test-retest reliability coefficient for tests used in industry is _____
.86
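In practice, a test-retest coefficient is simply the Pearson correlation between the two administrations. A minimal Python sketch, using made-up scores for five applicants tested twice:

```python
from statistics import mean, stdev

def pearson(x, y):
    """Pearson product-moment correlation between two score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))

# Hypothetical scores for five applicants tested twice, three weeks apart
time1 = [22, 30, 25, 28, 35]
time2 = [24, 29, 26, 27, 36]
print(round(pearson(time1, time2), 2))
```

A coefficient near 1.0 indicates high temporal stability; daily conditions such as fatigue or stress would push the two administrations apart and lower it.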
Alternate-Forms Reliability
This _____ of test-taking order is designed to eliminate any effects that taking one form of the test first may have on scores on the second form.
counterbalancing
The average correlation between alternate forms of tests used in industry is _____.
In addition to being correlated, two forms of a test should also have the same mean and standard deviation.
.89
A third way to determine the reliability of a test or inventory is to look at the consistency with which an applicant responds to items measuring a similar dimension or construct (e.g., personality trait, ability, area of knowledge). The extent to which similar items are answered in similar ways is referred to as internal consistency and measures item stability.
Internal Reliability
Another factor that can affect the internal reliability of a test is item homogeneity. That is, do all of the items measure the same thing, or do they measure different constructs? The more homogeneous the items, the higher the _____.
internal consistency
Internal Reliability
The _____ method is the easiest to use, as items on a test are split into two groups. Usually, all of the odd-numbered items are in one group and all the even-numbered items are in the other group.
split-half
The scores on the two groups of items are then correlated. Because the number of items in the test has been reduced, researchers have to use a formula called _____ prophecy to adjust the correlation.
Spearman-Brown
_____ is used for tests containing dichotomous items (e.g., yes/no, true/false), whereas coefficient alpha can be used not only for dichotomous items but also for tests containing interval and ratio items such as five-point rating scales.
K-R 20
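K-R 20 is computed as (k / (k − 1)) × (1 − Σpq / σ²), where k is the number of items, p and q are the proportions passing and failing each item, and σ² is the variance of total scores. A sketch with the same kind of invented dichotomous data:

```python
from statistics import pvariance

def kr20(responses):
    """Kuder-Richardson 20 for dichotomous (1 = right, 0 = wrong) items."""
    k = len(responses[0])                 # number of items
    n = len(responses)                    # number of test takers
    totals = [sum(row) for row in responses]
    # For each item: p = proportion passing, q = 1 - p
    p = [sum(row[i] for row in responses) / n for i in range(k)]
    sum_pq = sum(pi * (1 - pi) for pi in p)
    return (k / (k - 1)) * (1 - sum_pq / pvariance(totals))

# Hypothetical right/wrong (1/0) responses of six applicants to six items
responses = [
    [0, 1, 0, 0, 0, 0],
    [1, 0, 0, 1, 0, 0],
    [1, 1, 1, 0, 0, 1],
    [1, 1, 0, 1, 1, 1],
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 1],
]
print(round(kr20(responses), 2))
```

Unlike a single split-half coefficient, K-R 20 is equivalent to the average of all possible split-half correlations, so it does not depend on one arbitrary odd/even split.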
_____ is an issue in projective or subjective tests in which there is no one correct answer, but even tests scored with the use of keys suffer from scorer mistakes.
Scorer Reliability
_____ is the degree to which inferences from scores on tests or assessments are justified by the evidence.
Validity
One way to determine a test’s validity is to look at its degree of _____—the extent to which test items sample the _____ that they are supposed to measure.
In industry, the appropriate content for a test or test battery is determined by the job analysis.
Content Validity
Another measure of validity is _____, which refers to the extent to which a test score is related to some measure of job performance called a _____.
Commonly used criteria include supervisor ratings of performance, actual measures of performance (e.g., sales, number of complaints, number of arrests made), attendance (tardiness, absenteeism), tenure, training performance (e.g., police academy grades), and discipline problems.
criterion validity
With a _____ design, a test is given to a group of employees who are already on the job. The scores on the test are then correlated with a measure of the employees’ current performance.
concurrent validity
With a _____ design, the test is administered to a group of job applicants who are going to be hired. The test scores are then compared with a future measure of job performance.
predictive validity
Why is a concurrent design _____ than a predictive design? The answer lies in the homogeneity of performance scores. In a given employment situation, very few employees are at the extremes of a performance scale. Employees who would be at the bottom of the performance scale either were never hired or have since been terminated. Employees who would be at the upper end of the performance scale often get promoted. Thus, the restricted range of performance scores makes obtaining a significant validity coefficient more difficult.
weaker