Selection (Measurement/Testing/Reliability) Flashcards
(97 cards)
define measurement in the context of selection
“systematic application of pre-established rules or standards for assigning scores to the attributes or traits of an individual.” - Gatewood & Field, 7e
what is the overarching purpose of selection measures?
to be used as a predictor or criterion; to detect any true differences that may exist among individuals with regard to the attribute being measured
A predictor or criterion measure is standardized if it possesses each of the following characteristics
1. Content—All persons being assessed are measured by the same information or content. This includes the same format (for example, multiple-choice, essay, and so on) and medium (for example, paper-and-pencil, computer, video).
2. Administration—Information is collected the same way in all locations and across all administrators, each time the selection measure is applied.
3. Scoring—Rules for scoring are specified before administering the measure and are applied the same way with each application. For example, if scoring requires subjective judgment, steps should be taken (such as rater training) to ensure inter-rater agreement or reliability.
what scale must a selection criterion be measured at?
interval
List and describe types of criterion methods
1. Objective production data—These data tend to be physical measures of work. Number of goods produced, amount of scrap left, and dollar sales are examples of objective production data.
2. Personnel data—Personnel records and files frequently contain information on workers that can serve as important criterion measures. Absenteeism, tardiness, voluntary turnover, accident rates, salary history, promotions, and special awards are examples of such measures.
3. Judgmental data—Performance appraisals or ratings frequently serve as criteria in selection research. They most often involve a supervisor's rating of a subordinate on a series of behaviors or outcomes found to be important to job success, including task performance, citizenship behavior, and counterproductive behavior. Supervisor or rater judgments play a predominant role in defining this type of criterion data.
4. Job or work sample data—These data are obtained from a measure developed to resemble the job in miniature or to sample specific aspects of the work process or outcomes (for example, a typing test for a secretary). Measurements (for example, quantity and error rate) are taken on individual performance of these job tasks, and these measures serve as criteria.
5. Training proficiency data—This type of criterion focuses on how quickly and how well employees learn during job training activities. Often, such criteria are labeled trainability measures. Error rates during a training period and scores on training performance tests administered during training sessions are examples of training proficiency data.
what are the two basic options for choosing selection measures?
locate existing measures or create your own measures
locating existing selection measures: discuss the advantages
1. Use of existing measures is usually less expensive and less time-consuming than developing new ones.
2. If previous research was conducted, we will have some idea about the reliability, validity, and other characteristics of the measures.
3. Existing measures often will be superior to what could be developed in-house.
List the basic steps involved in developing your own selection measure
1. Analyzing the job for which a measure is being developed
2. Selecting the method of measurement to be used
3. Planning and developing the measure
4. Administering, analyzing, and revising the preliminary measure
5. Determining the reliability and validity of the revised measure for the jobs studied
6. Implementing and monitoring the measure in the human resource selection system
creating your own selection measure: 1. work analysis
- a broader analysis of work can be used in situations where technology and jobs are changing too rapidly for a traditional job analysis to be carried out
- purpose is to determine the KSAs necessary for the work activities or to identify employee competencies from a broader perspective
- provides the foundation for criterion measures to be chosen or developed
creating selection measures: selecting the measurement method
depends on:
- nature of job tasks and level of responsibility
- skill of the people who are administering and scoring
- costs
- resources available for development
- applicant characteristics
Choose the method that is most appropriate; for example, to test an industrial electrician applicant's ability to solder connections, you would not give a paper-and-pencil test, but probably a work sample test.
creating selection measures: planning and developing the selection measure; specifications required for each measure
- prepare an initial version of the measure, with specifications covering:
1. The purposes and uses the measure is intended to serve.
2. The nature of the population for which the measure is to be designed.
3. The way the behaviors or knowledge, skills, abilities, and other attributes (KSAOs) will be gathered and scored. This includes decisions about the method of administration, the format of test items and responses, and the scoring procedures.
describe the general method for generating items for selection measures
Substantial work is involved in selecting and refining the items or questions to be used to measure the attribute of interest. This often involves having subject-matter experts (SMEs) create the items or rewrite them. In developing these items, the reviewers (for example, SMEs) should consider the appropriateness of item content and format for fulfilling its purpose, including characteristics of the applicant pool; clarity and grammatical correctness; and consideration of bias or offensive portrayals of a subgroup of the population.
discuss the two types of response formats for selection measure responses
Broadly, there are two types of formats—the first uses objective or fixed-response items (multiple-choice, true-false); the second elicits open-ended, free-response formats (essay or fill-in-the-blank). The fixed-response format is the most popular; it makes efficient use of testing time, results in few (or no) scoring errors, and can easily and reliably be transformed into a numerical scale for scoring purposes. The primary advantage of the free-response format is that it can provide greater detail or richer samples of the candidates' behavior and may allow unique characteristics, such as creativity, to emerge. Primarily due to both the ease of administration and objectivity of scoring, fixed-response formats are most frequently utilized today, particularly if the measure is likely to be administered in a group setting. Finally, explicit scoring of the measure is particularly critical. Well-developed hiring tools will provide an "optimal" score for each item that is uniformly applied.
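The fixed-response scoring rule can be made concrete in code. A minimal Python sketch, using an entirely hypothetical answer key and candidate responses, of applying one pre-specified key uniformly to every candidate:

```python
# Hypothetical example: scoring a fixed-response (multiple-choice) measure
# with a pre-established key, applied identically to every candidate.
ANSWER_KEY = ["b", "d", "a", "c", "b"]  # invented key, for illustration only

def score_fixed_response(responses):
    """Return the number of items answered correctly against the key."""
    return sum(1 for resp, key in zip(responses, ANSWER_KEY) if resp == key)

candidate_a = ["b", "d", "a", "a", "b"]  # misses only item 4
candidate_b = ["c", "d", "a", "c", "b"]  # misses only item 1
print(score_fixed_response(candidate_a))  # 4
print(score_fixed_response(candidate_b))  # 4
```

Because the key is fixed before administration and applied mechanically, two scorers (or two runs) cannot disagree, which is the scoring-standardization point made above.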
creating selection measures: administering, analyzing, and revising
- pilot testing
- The measure should be administered to a sample of people from the same population for which it is being developed.
- Choice of participants should take into account the demographics, motivation, ability, and experience of the applicant pool of interest.
- If a test is being developed for which item analyses (for example, factor analyses or the calculation of means, standard deviations, and reliabilities) are to be performed, a sample of at least a hundred, preferably several hundred, will be needed.
- Based on the data collected, item analyses are performed on the preliminary data. The objective is to revise the proposed measure by correcting any weaknesses and deficiencies noted. Item analyses are used to choose content that discriminates between those who know and those who do not know the information covered.
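The basic item statistics named above (per-item means and standard deviations) can be sketched as follows; the pilot responses here are invented for illustration (1 = correct, 0 = incorrect), and a real analysis would use the much larger sample sizes the card recommends:

```python
import statistics

# Hypothetical pilot data: rows are examinees, columns are items.
pilot = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [0, 1, 0, 1],
    [1, 0, 1, 1],
]

def item_stats(data):
    """Per-item proportion correct (the mean) and standard deviation."""
    return [(statistics.mean(col), statistics.pstdev(col))
            for col in zip(*data)]

for i, (p, sd) in enumerate(item_stats(pilot), start=1):
    print(f"item {i}: p = {p:.2f}, sd = {sd:.2f}")
```

Note that item 4 here is answered correctly by everyone (p = 1.00, sd = 0.00): an item with no score variance cannot discriminate between examinees and would be a candidate for revision or removal.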
creating selection measures: psychometric characteristics to consider when analyzing pilot test data
1. The reliability or consistency of scores on the items. In part, reliability is based on the consistency and precision of the results of the measurement process and indicates whether items are free from measurement error.
2. The validity of the intended inferences. Do responses to an item differentiate among applicants with regard to the characteristics or traits that the measure is designed to assess? For example, if the test measures verbal ability, high-ability individuals will answer an item differently than those with low verbal ability. Often items that differentiate are those with moderate difficulty, where 50 percent of applicants answer the item correctly. This is true for measures of ability, which have either a correct or incorrect answer.
3. Item fairness or differences among subgroups. A fair test has scores that have the same meaning for members of different subgroups of the population. Such tests would have comparable levels of item difficulty for individuals from diverse demographic groups. Panels of demographically heterogeneous raters, who are qualified by their expertise or sensitivity to linguistic or cultural bias in the areas covered by the test, may be used to revise or discard offending items as warranted. An item sensitivity review is used to eliminate or revise any item that could be demeaning or offensive to members of a specific subgroup.
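One common way to check whether an item differentiates, mentioned in point 2, is an upper-lower discrimination index: the proportion of high total scorers passing the item minus the proportion of low total scorers passing it. A minimal sketch with invented data (this is one classical index, not the only item-analysis method the text implies):

```python
# Each row: (examinee's total score on the rest of the test, 1 if this
# item was answered correctly, else 0). Data are invented for illustration.
responses = [
    (9, 1), (8, 1), (8, 1), (7, 0),   # stronger examinees
    (3, 0), (2, 1), (2, 0), (1, 0),   # weaker examinees
]

def discrimination_index(rows):
    """p(correct) in the upper half minus p(correct) in the lower half."""
    rows = sorted(rows, key=lambda r: r[0], reverse=True)
    half = len(rows) // 2
    upper, lower = rows[:half], rows[half:]
    p_upper = sum(r[1] for r in upper) / len(upper)
    p_lower = sum(r[1] for r in lower) / len(lower)
    return p_upper - p_lower

print(discrimination_index(responses))  # 0.75 - 0.25 = 0.5
```

A positive index means high scorers pass the item more often than low scorers, which is the differentiation described above; an index near zero (or negative) flags an item for revision.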
creating selection measures: implementing the measure
After we obtain the necessary reliability and validity evidence, we can then implement our measure. Cut-off or passing scores may be developed. Norms or standards for interpreting how various groups score on the measure (categorized by gender, ethnicity, level of education, and so on) will be developed to help interpret the results. Once the selection measure is implemented, we will continue to monitor its performance to ensure that it is performing the function for which it is intended. Ultimately, this evaluation should be guided by whether the current decision-making process has been improved by the addition of the test.
Using norms to interpret scores on selection measures
- a score may take on different meanings depending on how it stands relative to the scores of others in particular groups. Our interpretation will depend on the score's relative standing in these other groups.
- the norm group for comparison should be relevant/comparable to the applicant group
- use local norms
- norms are transitory; they are specific to the point in time when they were collected, and probably change over time
- Norms are not always necessary in HR selection. For example, if five of the best performers on a test must be hired, or if persons with scores of 70 or better on the test are known to make suitable employees, then a norm is not necessary in employment decision making. One can simply use the individuals' test scores. On the other hand, if one finds that applicants' median selection test scores are significantly below that of a norm group, then the firm's recruitment practices should be examined. The practices may not be attracting the best job applicants; normative data would help in analyzing this situation.
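Interpreting a score against a norm group usually means converting it to a percentile rank. A minimal sketch, with an invented local norm group (real norm tables are built from much larger samples):

```python
# Hypothetical local norm group: test scores from prior applicants.
norm_scores = [55, 60, 62, 65, 70, 72, 75, 80, 85, 90]

def percentile_rank(score, norms):
    """Percentage of the norm group scoring strictly below the given score."""
    below = sum(1 for s in norms if s < score)
    return 100 * below / len(norms)

# A raw score of 72 looks very different against this group than in isolation.
print(percentile_rank(72, norm_scores))  # 50.0
```

The same raw score would earn a different percentile against a different norm group, which is why the card stresses using a relevant, comparable (ideally local) group.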
Reliability: definition (selection measures)
degree of dependability, consistency, or stability of scores on a measure used in selection - Gatewood, 7e
In general, how is reliability of a measure determined?
by the degree of consistency between two sets of scores on the measure
In general, what determines whether a measure has low or high reliability?
more measurement error = lower reliability; less measurement error = higher reliability
Discuss the concept of “true scores” in the context of reliability of selection measures
The true score is really an ideal conception. It is the score individuals would obtain if conditions external and internal to a measure were perfect. For example, in our mathematics ability test, an ideal or true score would be one for which both of the following conditions existed:
1. Individuals answered correctly the same percentage of problems on the test that they would have if all possible problems had been given and the test were a construct-valid measure of the underlying phenomenon of interest (see next chapter).
2. Individuals answered correctly the problems they actually knew without being affected by external factors such as lighting or temperature of the room in which the testing took place, their emotional state, or their physical health.
Because a true score can never be measured exactly, the obtained score is used to estimate the true score. Reliability answers this question: How confident can we be that an individual's obtained score represents his or her true score?
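The classical model behind this card treats each obtained score as true score plus error (X = T + E), with reliability equal to the share of observed-score variance that is true-score variance. A small simulation can make that concrete; the population values here (true-score SD 10, error SD 5) are invented for illustration:

```python
import random

random.seed(42)

# Classical test theory sketch: observed = true + error.
true_scores = [random.gauss(50, 10) for _ in range(5000)]
observed = [t + random.gauss(0, 5) for t in true_scores]

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# Reliability = true-score variance / observed-score variance.
reliability = variance(true_scores) / variance(observed)
print(round(reliability, 2))  # near 10**2 / (10**2 + 5**2) = 0.80
```

Shrinking the error SD toward 0 pushes the ratio toward 1.00, which is exactly the "less measurement error = higher reliability" relation stated earlier.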
Discuss the idea of error score in the context of reliability of selection measures
A second part of the obtained score is the error score. This score represents errors of measurement. Errors of measurement are those factors that affect obtained scores but are not related to the characteristic, trait, or attribute being measured. These factors, present at the time of measurement, distort respondents' scores either over or under what they would have been on another measurement occasion. There are many reasons why individuals' scores differ from one measurement occasion to the next. Fatigue, anxiety, or noise during testing that distracts some test takers but not others are only a few of the factors that explain differences in individuals' scores over different measurement occasions.
discuss the function of the reliability coefficient in the context of selection measures
A reliability coefficient is simply an index of relationship. It summarizes the relation between two sets of measures for which a reliability estimate is being made. The calculated index varies from 0.00 to 1.00. In calculating reliability estimates, the correlation coefficient obtained is regarded as a direct measure of the reliability estimate. The higher the coefficient, the less the measurement error and the higher the reliability estimate. Conversely, as the coefficient approaches 0.00, errors of measurement increase and reliability correspondingly decreases. Of course, we want to employ selection measures having high reliability coefficients. With high reliability, we can be more confident that a particular measure is giving a dependable picture of true scores for whatever attribute is being measured.
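Since the reliability coefficient is read directly from a correlation between two sets of scores, it can be sketched as a Pearson correlation on two administrations of the same test; the scores below are invented for illustration:

```python
# Hypothetical test-retest scores for the same five applicants.
time1 = [80, 72, 65, 90, 58]
time2 = [82, 70, 66, 88, 60]

def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

r = pearson_r(time1, time2)  # high r: applicants keep their relative standing
print(round(r, 3))
```

Here the five applicants hold nearly the same rank order on both occasions, so r is close to 1.00 and we would treat the measure as giving a dependable picture of the underlying attribute.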
list the primary types of methods of estimating reliability for selection measures
1. test-retest
2. parallel (equivalent forms)
3. internal consistency
4. interrater
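As one illustration of the internal-consistency family, Cronbach's alpha can be computed from item-level data; the ratings below are invented, and alpha is only one of several internal-consistency estimates:

```python
# Hypothetical item-level data: rows are respondents, columns are items
# (e.g., 1-5 ratings on a short scale).
data = [
    [3, 4, 3, 4],
    [2, 2, 3, 2],
    [4, 5, 5, 4],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
]

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def cronbach_alpha(rows):
    """alpha = (k / (k-1)) * (1 - sum(item variances) / total-score variance)."""
    k = len(rows[0])
    item_vars = [variance(col) for col in zip(*rows)]
    total_var = variance([sum(r) for r in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

alpha = cronbach_alpha(data)
print(round(alpha, 2))
```

In this invented data set the items rise and fall together across respondents, so alpha comes out high; items that failed to covary with the rest of the scale would pull it down.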