Selection Flashcards

(52 cards)

1
Q

Morgeson & Campion (1997)

A

Social and cognitive sources of potential inaccuracy in job analysis

SOCIAL SOURCES:
social influence processes (e.g., conformity pressure, extremity shift, motivation loss) and self-presentation processes (e.g., impression management, social desirability, demand effects)

COGNITIVE SOURCES
limitations in information processing (e.g., information overload, heuristics, categorization) and biases in information processing (e.g., carelessness, order and contrast, leniency and severity, method effects)

different sources of inaccuracy affect different parts of job analysis data: interrater reliability, interrater agreement, discriminability between jobs, dimensionality of factor structures, mean ratings, completeness of job information

also specify the job analysis facets impacted in a table, with each type of applicable bias from above checked off:
- job descriptors (job oriented, worker oriented)
- analysis activity (generate, judge)
- data collection (group meeting, individual interview, observation, questionnaire)
- source of data (incumbent, supervisor, analyst)
- purpose (compensation, selection, training)

2
Q

Strah & Rupp (2022)

A

Are there cracks in our foundation? an integrative review of diversity issues in job analysis

extending Morgeson & Campion (1997) - describes the sources of true and error (in)variance in JA data across demographic subgroups

job analysis needs to more inclusively and accurately capture the job experiences of individuals from different demographic subgroups

antecedents of TRUE differences in work across subgroups = job-relevant individual differences; performing differently as a response to stereotypes, different assigned work, different environmental/societal restrictions –> true

diversity related barriers = conformity to norms, impression management, lack of opportunity for voice, demand effects, over-reliance on specific perspectives, language bias, majority effect –> non-random error

true (in)variance + error (in)variance = total (in)variance –> HR practices

3
Q

Campion et al. (2011)

A

Competency modeling

CM vs JA
1. executives typically pay more attention to CM
2. CM often attempt to distinguish top performers from average performers
3. CM often include how competencies change across employee level
4. CM usually linked directly with business objectives and strategies
5. CM typically developed top-down (start at C suite) rather than bottom up (start with employees)
6. CM may consider future job requirements (directly or indirectly)
7. CM can be easier to interact with (org-specific language, visuals, etc)
8. Finite number of competencies are identified across multiple functions/jobs
9. CM frequently used to align HR systems
10. CM are often used in org development and change, rather than simple data collection

Organize it well and make it accessible

4
Q

Sanchez & Levine (2009)

A

Competency modeling vs job analysis

CM should be used in tandem with TJA, using TJA data as a base for the models

in general TJA is used to better understand work assignments, capturing essential elements, work-focused, typical performance

while CM is more about influencing how assignments should be performed, worker-oriented, organization-wide, maximal performance

5
Q

Putka et al. (2023)

A

Evaluating NLP approach to estimating KSA and interest JA ratings

input = job descriptions and task statements from ONET (training) and independent set of occupations from large org (testing)

ML approach: produced KSAO predictions that had cross-validated correlation with SME ratings of KSAs:
knowledge (.74)
skills (.8)
abilities (.75)
interests, RIASEC (.84)

found clear evidence for validity of machine-based prediction based on:
(a) convergence of machine-based and SME-furnished ratings
(b) conceptually meaningful patterns of prediction and model regression coefficients among KSAOs
(c) conceptual relevance of top predictor models underlying related clusters of KSAOs in the PCAs analyzed (beyond the stats, the clusters made sense)

prediction models developed on ONET data produced meaningful results on the independent set of job descriptions and tasks (testing data, no KSAOs in that set)
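A minimal sketch (not Putka et al.'s actual pipeline) of the general workflow described above: train a text-based regression model on job-description text to predict SME ratings, then evaluate convergence with cross-validated correlations. The TF-IDF features and ridge regression are assumed stand-ins for the NLP/ML models used in the paper, and the data are toy examples.

```python
# Hypothetical sketch: predicting SME-provided KSAO ratings from job-description text.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_predict
from sklearn.pipeline import make_pipeline

# Toy training data: text per occupation and an SME rating for one knowledge area.
texts = [
    "analyze financial statements and prepare budget reports",
    "repair and maintain industrial machinery and equipment",
    "teach mathematics courses and develop lesson plans",
    "write software, debug code, and review pull requests",
] * 10  # repeated only so cross-validation has enough rows in this toy example
sme_ratings = np.array([4.5, 2.0, 3.5, 3.0] * 10)

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), Ridge(alpha=1.0))

# Cross-validated predictions, then the convergence metric emphasized in the paper:
# the correlation between machine-based predictions and SME ratings.
preds = cross_val_predict(model, texts, sme_ratings, cv=5)
print("cross-validated r =", np.corrcoef(preds, sme_ratings)[0, 1])
```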

6
Q

Sackett et al. (2022)

A

Revisiting meta-analytic estimates of validity in personnel selection

discusses range restriction issues: approaches traditionally used to correct for range restriction (building range restriction artifact distributions) have significant flaws that have generally led meta-analysts to substantially overcorrect for range restriction

after critiquing previous RR practices, they offer a best estimate of mean operational validity, which often reflects either a range restriction correction or no correction at all

new top 8: structured interview (.42), job knowledge test (.40), empirically keyed biodata (.38), work sample tests (.33), cognitive ability tests (.31), integrity tests (.31), personality-based emotional intelligence (.30), assessment centers (.29)
highest BW subgroup differences: cognitive ability tests (.79), work sample tests (.67), job knowledge tests (.54)

contextualized personality tests went up in validity from general personality and have low BW differences

7
Q

Sackett et al. (2023)

A

Revisiting the design of selection systems in light of new findings regarding validity of widely used predictors

Several predictors at the top of the list, such as job knowledge tests, work sample tests, and empirically keyed biodata, are not generally applicable in situations where KSAs are developed after hire via training or on the job.

Since cognitive ability no longer emerges as the top predictor in the validity findings, cognitive ability does not need to be the centerpiece of selection procedures; this also changes the nature of the validity-diversity tradeoff.

Ultimately, how should practitioners and researchers estimate operational validity? (a worked sketch of the core corrections follows this list)

1) Correct for reliability first, then for range restriction
2) Measurement error exists in all our criteria, correcting for unreliability is important for all validity studies
3) Use estimate of interrater reliability, not internal consistency
4) Consider local interrater reliability, if available
5) If not available, consider reliability estimates from similar settings with similar measures
6) If neither above are available, utilize relevant meta-analytic reliability estimate
7) triangulate between local and meta-analytic reliability estimates if multiple estimates available
8) Lower reliability estimates produce larger corrections (based on the formula)
9) If objective performance is used, consistency over time is the basis for reliability
10) Correcting for range restriction requires credible estimate of predictor standard deviation in applicant pool and the standard deviation among selected employees
11) If predictor in question was used in selecting validation sample, range restriction is particularly important issue
12) Range restriction generally does not have sizeable effect if predictor was not used in selecting validation sample
13) Obtain local applicant and incumbent sample standard deviation if possible
14) Be cautious when using formulas that convert selection ratio into U-ratio for range restriction correction
15) Be cautious about using publisher norms as estimate of applicant pool standard deviation
16) Do not use mean range restriction correction from meta-analysis as basis for correction in concurrent studies (key message from Sackett et al., 2022)
17) Use mean range restriction correction factor from meta-analysis with extreme caution
18) Make no correction unless confident in the standard deviation information at hand
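A worked sketch of the two core corrections behind items 1-3 and 10 above, using standard psychometric formulas: disattenuation for criterion unreliability, then the Thorndike Case II correction for direct range restriction. The input numbers are illustrative assumptions, not values from the paper.

```python
# Illustrative numbers only: observed validity, interrater reliability of the
# criterion, and predictor SDs for applicants vs. selected employees.
import math

r_obs = 0.25           # observed predictor-criterion correlation in the validation sample
r_yy = 0.60            # interrater reliability of job performance ratings (criterion)
sd_applicants = 10.0   # predictor SD in the applicant pool
sd_incumbents = 7.0    # predictor SD among selected employees

# Step 1: correct for measurement error in the criterion (disattenuation).
r_corrected_rel = r_obs / math.sqrt(r_yy)

# Step 2: correct for direct range restriction (Thorndike Case II),
# where U = unrestricted SD / restricted SD.
U = sd_applicants / sd_incumbents
r_operational = (r_corrected_rel * U) / math.sqrt(1 + r_corrected_rel**2 * (U**2 - 1))

print(f"reliability-corrected r = {r_corrected_rel:.3f}")
print(f"operational validity estimate = {r_operational:.3f}")
# Note: lower reliability estimates and larger U produce larger corrections,
# which is why Sackett et al. urge caution about the quality of these inputs.
```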

8
Q

Scherbaum et al. (2017)

A

Chapter on subgroup differences in selection assessments

big point: Combining multiple methods in a balanced selection battery can help mitigate adverse impact while maintaining predictive validity.

GMA tests, despite their predictive power, present the largest subgroup differences and the greatest risk of adverse impact.

Personality tests, integrity tests, structured interviews, and work samples are more equitable and offer viable alternatives or supplements to cognitive assessments.

9
Q

Stanek & Ones (2018)

A

cog ability and personality. massive paper

10
Q

Schneider & Newman (2015)

A

Intelligence is multidimensional: Theoretical review and implications of specific cognitive abilities.

HRM usually treats cognitive ability as a unidimensional construct.

possible rationales for this choice = practical convenience, the parsimony of Spearman’s theory of general mental ability (g), positive manifold among cognitive tests (all positively related to each other), and empirical evidence of only modest incremental validity of specific cognitive abilities for predicting job and training performance over and above g.

Recommend use of narrower, second-stratum cognitive abilities (e.g., fluid reasoning, crystallized intelligence).

The renewed focus on multiple dimensions of intelligence is supported by several arguments:

  • empirical evidence of modest incremental validity (typically at or above 2%) of specific cognitive abilities predicting job performance beyond g
  • compatibility principle - specific abilities predict specific job tasks better than general performance (e.g., spatial reasoning for engineering tasks)
  • application of bifactor and relative importance methodologies to predict job performance via g and specific abilities simultaneously
  • Selection tools emphasizing specific abilities may reduce racial subgroup differences compared to g-heavy tests
11
Q

Melson-Silimon et al. (2023) and commentaries

A

Personality testing and the Americans with Disabilities Act: Cause for concern as normal and abnormal personality models are integrated

Concerns for personality testing out of risk of ADA breach

Neuroticism (+) Borderline PD
Agreeableness (–) Narcissistic PD
Extraversion (–) Avoidant, schizoid PD
Conscientiousness (–) Antisocial PD

RECOMMENDATIONS

  1. Establish job relatedness through a proper job analysis. Whenever possible, utilize alternative selection methods that are less invasive but with equivalent validity.
  2. Avoid personality tests that assess constructs closely related to PDs, “dark side” traits, and normal personality traits that are highly correlated with PDs.
  3. Conduct more research involving development and validation of personality tests to be used in preselection.
  4. Ensure items ask about behavior in the workplace.
  5. Do not involve persons with clinical or medical licensure in administration or interpretation unless clinical personality diagnosis is job related and, if so, administer the test AFTER a conditional job offer.
  6. Advocate for direct conversation with various disciplines in psychology and the EEOC through research and discussion on implications of an anticipated change in PD diagnosis.
12
Q

Dahlke & Sackett (2017)

A

guidance on handling effect sizes in differential prediction

PREDICTIVE BIAS

subgroup differences and predictive bias can exist independently of one another

testing for predictive bias involves using moderated multiple regression, where the criterion measure is regressed on the predictor score, subgroup membership, and an interaction term between the two

Slope and/or intercept differences between subgroups indicate predictive bias

EFFECT SIZES

in predictive bias analyses, it is useful to consider effect sizes as well as statistical significance. See Nye & Sackett (2017) and Dahlke & Sackett (2017) for treatment of effect sizes in predictive bias analysis
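A minimal sketch of the moderated multiple regression test for predictive bias described above. The data and variable names are hypothetical: slope differences are tested via the predictor x subgroup interaction, intercept differences via the subgroup term.

```python
# Hypothetical data: test scores, job performance, and a 0/1 subgroup indicator.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 400
group = rng.integers(0, 2, n)                           # 0 = reference group, 1 = focal group
test = rng.normal(0, 1, n)
perf = 0.5 * test + 0.1 * group + rng.normal(0, 1, n)   # no slope difference simulated here

df = pd.DataFrame({"perf": perf, "test": test, "group": group})

# Moderated multiple regression: the interaction term tests slope differences,
# the group main effect tests intercept differences.
full = smf.ols("perf ~ test * group", data=df).fit()
print(full.summary())

# Per Nye & Sackett (2017) and Dahlke & Sackett (2017), effect sizes (e.g., subgroup
# intercept/slope differences expressed in criterion SD units) should accompany
# the significance tests rather than relying on p-values alone.
```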

13
Q

Schmidt & Hunter (1998)

A

Precursor to Sackett et al. (2022) with meta-analytic estimates of predictor validity

14
Q

Sackett et al. (2024)

A

A contemporary look at the relationship between general cognitive ability and job performance. [meta-analysis]

Main point: GCA is related to job performance, but our estimate of the magnitude (validity = .22) of the relationship is lower than prior estimates.

The relationship between general cognitive ability (GCA) and overall job performance has been a long-accepted fact in industrial and organizational psychology. However, the most prominent data on this relationship date back more than 50 years.

mean observed validity of .16, with a residual SD of .09. Correcting for unreliability in the criterion and correcting predictive studies for range restriction produces a mean corrected validity of .22 and a residual SD of .11.

While this is a much smaller estimate than the .51 value offered by Schmidt and Hunter (1998), that value has been critiqued by Sackett et al. (2022), who offered a mean corrected validity of .31 based on integrating findings from prior meta-analyses of 20th century data. (new estimate is based on 21st century data)

15
Q

Hoffman et al. (2015)

A

A review of the content, criterion-related, and construct validity of ACs

big recommendation: don’t use exercise based scoring over dimension-based scoring, BUT both can be meaningful and should be further investigated

meta analysis of exercise dimensions for content, criterion related, construct and incremental validity of 5 common AC exercises

in-basket (given a set of info and need to respond accordingly), LGD (leaderless group discussion), case analysis, oral presentation, role play

all 5 types significantly related to job performance (rho = .16 - .19)

nomological network analysis –> exercises tend to be modestly associated with GMA, extraversion, and to a lesser extent openness, and unrelated to agreeableness, conscientiousness, and emotional stability

exercises tend to explain variance in performance well beyond GMA and the Big 5, and exercises are not interchangeable in what they measure

16
Q

Kleinmann & Ingold (2019)

A

Toward a Better Understanding of Assessment Centers: A Conceptual Review [annual review]

ACs are a commonly-utilized method for assessing employees, especially leaders
ACs comprise multiple assessment components, at least one of which is a behavioral simulation exercise.

An AC may consist solely of simulation exercises, or combine them with other methods, such as interviews, personality inventories, and/or ability tests.

The result is a comprehensive, partially-or fully-behavioral evaluation of an assessee’s proficiency on a set of job-relevant, behaviorally-defined performance dimensions

Can be used for assessment, diagnostic, and developmental purposes

Dual-process (System 1 and System 2) models can be applied to explain how assessors form ratings of assessees, and CAPS theory to explain assessee behavior

17
Q

Kuncel & Sackett (2014)

A

Resolving the AC construct validity problem as we know it

importance of dimension variance in ACs

ongoing concern about the construct validity of AC dimensions, long standing concern that post-exercise dimension ratings (PEDRs) reflect more exercise variance than dimension variance, such that the variance in these measures is more about their performance on specific exercises than actually measuring a dimension (e.g., leadership)

however, PEDRs are not the final score; they are an intermediate step toward an overall dimension rating, and the overall dimension rating should be the focus of inquiry. Dimension variance quickly overtakes exercise-specific variance as the dominant source of variance when ratings from multiple exercises are combined (a good thing)
– with as few as 2 exercises, dimension variance can become the dominant source of construct variance (see the illustrative arithmetic at the end of this card)

however, the largest source of dimension variance is a general factor (meaning general performance or getting things done, which makes it difficult to really pinpoint multiple distinct constructs)

Suggests that ACs may not be measuring multiple, distinct constructs, but rather a general capability to perform well in workplace simulations.
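Illustrative arithmetic (toy values, not from the paper) for why aggregation across exercises favors dimension variance: shared dimension variance is unaffected by averaging, while exercise-specific and error variance shrink by a factor of k.

```python
# Toy variance components for a single post-exercise dimension rating (PEDR).
# Values are made up for illustration; the point is the aggregation logic.
var_dimension = 0.30   # variance shared across exercises (the construct signal)
var_exercise = 0.50    # exercise-specific variance
var_error = 0.20       # residual/rater error

for k in [1, 2, 3, 4, 6]:
    # Averaging k exercises leaves shared variance intact but divides
    # exercise-specific and error variance by k.
    total = var_dimension + (var_exercise + var_error) / k
    print(f"k={k}: dimension share of composite variance = {var_dimension / total:.2f}")
# With these toy values, dimension variance already exceeds the (averaged)
# exercise-specific variance at k = 2.
```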

18
Q

Speer et al. (2023)

A

meta-analysis on biodata in employment settings: providing clarity on criterion-related and construct-related validity estimates

main point: biodata inventories are highly predictive assessment methods and are likely to provide unique variance over other common predictors

2 defining features of biodata validity
(a) construct domain
(b) scoring method (rational, hybrid, empirical) (see the sketch at the end of this card)

biodata had criterion related validity with job performance and additional outcomes, convergent validity with common external hiring measures

biodata inventories are one of the most predictive assessment methods available, but the relationship with work outcomes differs by construct domain and scoring method

  • empirically scored was strongest CR validity (rho = .44) compared to rational (rho = .29)
  • among the narrow construct domains, scales developed to measure conscientiousness and leadership were generally the most predictive of job performance, particularly when empirically keyed
  • when biodata scales were correlated with theoretically aligned performance ratings, rational scoring resulted in validity coefficients similar to empirical scoring
  • biodata scales exhibited expected patterns of correlations with external measures and were only moderately correlated with cognitive ability and big 5
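A simplified sketch of the empirical vs. rational scoring distinction named above, with hypothetical items and data: an empirical key weights item responses by their observed relationship with the criterion in a development sample, a rational key applies theory/SME-based weights, and both should be cross-validated on a holdout sample.

```python
# Hypothetical biodata responses (rows = applicants, cols = items scored 1-5)
# and a job performance criterion from a development sample.
import numpy as np

rng = np.random.default_rng(7)
n_dev, n_items = 300, 6
X_dev = rng.integers(1, 6, size=(n_dev, n_items)).astype(float)
perf_dev = 0.4 * X_dev[:, 0] + 0.3 * X_dev[:, 3] + rng.normal(0, 1, n_dev)

# Empirical key: weight each item by its correlation with the criterion
# in the development sample (a simple correlational keying approach).
empirical_key = np.array([np.corrcoef(X_dev[:, j], perf_dev)[0, 1]
                          for j in range(n_items)])

# Rational key: SME/theory-based weights specified in advance (made up here).
rational_key = np.array([1.0, 0.0, 0.0, 1.0, 0.5, 0.0])

# Cross-validate both keys on a holdout sample to avoid capitalizing on chance.
n_holdout = 150
X_new = rng.integers(1, 6, size=(n_holdout, n_items)).astype(float)
perf_new = 0.4 * X_new[:, 0] + 0.3 * X_new[:, 3] + rng.normal(0, 1, n_holdout)

for name, key in [("empirical", empirical_key), ("rational", rational_key)]:
    scores = X_new @ key
    print(name, "cross-validated r =", round(np.corrcoef(scores, perf_new)[0, 1], 2))
```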
19
Q

Whetzel et al. (2020)

A

Situational Judgment Tests: An Overview of Development Practices and Psychometric Characteristics

Lots of guidance on SJTs:

SCENARIOS
- critical incidents enhance realism of scenarios
- SPECIFIC scenarios –> higher validity, fewer assumptions by examinee
- brief scenarios can reduce reading load, can reduce group differences
- avoid: sensitive topics, overly simplistic scenarios (one plausible response), overly complex scenarios

RESPONSE OPTIONS
- use SMEs to develop responses
- range of effectiveness levels
- be careful about transparency/obviousness when assessing a construct
- only one action, no double-barreled
- have options of active bad (do something wrong) and passive bad (do nothing)
- check for tone cues

RESPONSE FORMAT
- use knowledge-based (should-do) in high stakes to help with faking
- use behavioral tendency (would-do) in non-cognitive constructs like personality
- use the method where examinees rate each option (higher reliability and favorable applicant reactions)
- single-response SJTs are easy for analysis but can have higher reading load on candidates

SCORING
- empirical and rational keys have similar levels of reliability and validity, use SME input
- develop more scenarios and options than you will end up needing
- use 10-12 raters with different perspectives
- use means (effectiveness levels) and SDs (rater agreement) to select options

reliability and validity
- do NOT use alpha for multidimensional SJTs
- instead use split-half reliability with the Spearman-Brown correction (assuming content is balanced across halves; see the sketch at the end of this card)
- validity is similar for knowledge and behavioral tendency
- SJTs have slight incremental validity over cog ability and personality, they likely also measure a general personality factor, and it can correlate with other constructs (cog ability/personality)
- have been used in military settings

group differences
- smaller on SJTs than on GMA tests
- women perform slightly better
- behavioral tendency has smaller group differences than knowledge
- rate format has lower group differences than ranking or selecting best and worst

presentation methods
- avatar-based and video-based SJTs have several advantages
- higher face and criterion-related validity, but may be less reliable
- using avatars may be less costly, but developers should consider uncanny valley effects when using 3D human imaging

faking
- faking DOES affect the rank ordering of candidates and who is hired
- faking is more of a problem with BEHAVIORAL tendency (would do) than knowledge-based (should do), especially in high stakes situations
- SJTs generally appear less vulnerable to faking than personality measures

coaching
- examinees can be coached on how to maximize SJT responses, orgs can endorse this to help level the playing field (as opposed to individuals seeking it out on their own)
- scoring adjustments (key stretching, within-person standardization across scores)
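A small sketch of the split-half approach recommended in the reliability section above: split the items into content-balanced halves, correlate the half scores, then step the correlation up with the Spearman-Brown prophecy formula. The data and the odd/even split are hypothetical stand-ins.

```python
# Hypothetical SJT item scores (rows = examinees, cols = 12 scored responses).
import numpy as np

rng = np.random.default_rng(3)
true_score = rng.normal(0, 1, 250)
items = true_score[:, None] + rng.normal(0, 1.5, (250, 12))

# Content-balanced split: odd/even assignment used here as a stand-in for
# matching halves on scenario content.
half_a = items[:, ::2].sum(axis=1)
half_b = items[:, 1::2].sum(axis=1)

r_halves = np.corrcoef(half_a, half_b)[0, 1]

# Spearman-Brown prophecy formula for a test twice the length of each half.
split_half_reliability = 2 * r_halves / (1 + r_halves)
print(round(split_half_reliability, 2))
```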

20
Q

Hartwell et al. (2022)

A

social media assessment (SMA)

lays out a map for how to structure an SMA

identifies potential issues with using SMA including missing information, privacy concerns, discrimination

should base components of an SMA on components of structured interviews

structural components of SMA: job relatedness, procedural consistency, rating scales used, documentation, assessor training, having multiple raters, separating raters from the decision makers, informed consent, notifying about results

21
Q

Huber et al. (2021)

A

Faking and the validity of personality tests: An experimental investigation using modern forced choice measures.

MFC scales substantially reduced motivated score elevation but also appeared to elicit selective faking on work-relevant dimensions.

Despite reducing the effectiveness of impression management attempts, MFC scales did not retain more validity than Likert scales when participants faked.

However, results suggested that faking artificially bolstered the criterion-related validity of Likert scales while diminishing their construct validity

22
Q

Blackhurst et al. (2011)

A

Should You Hire BlazinWeedClown@Mail.com?

conducted a study to test whether applicant email addresses are related to their owners’ job-related qualifications

Found that those with appropriate (versus inappropriate or questionable) email addresses had higher conscientiousness, professionalism, and work-related experience.

NO difference for cognitive ability

however, the distinction between questionable and appropriate addresses is not as strong; the authors caution the hiring manager who wants to use only email addresses to screen applicants.

although there are significant differences between applicants with appropriate vs questionable or inappropriate email addresses, the effect sizes are not large.
there is a difference of roughly 10% between the high and low group means on each of the measures.

rather than using email addresses to screen applicants, authors suggest view the less-than-professional email address as a yellow flag

23
Q

Campion & Campion (2023)

A

overview of a special issue featuring shorter descriptions of practice-side work involving ML & selection

illustrative ML applications:
- scoring resumes and employment applications
- scoring constructed responses to assessments (interviews, write-in test answers)
- combining scores to increase prediction
- combining scores to reduce subgroup differences
- creating test questions
- analyzing jobs to determine requirements
- inferring skills and personality from narrative applicant information

lessons learned: alpha is not always the best reliability estimate (also look at test-retest); a model can be more reliable than the criterion and still not be fully accurate; etc

emerging best practices
future research suggestions

24
Q

McDaniel et al. (2011) and commentaries

A

The Uniform Guidelines are a detriment to the field of personnel selection.

UG hasn’t been updated in 30+ years – science and practice is outdated

SIOP should have a larger role in setting standards

UG’s perspective on separate ‘types’ of validity – rather than types of validity evidence

UG’s false assumptions regarding adverse impact – the 4/5ths rule has no scientific basis, and the burden on the employer to provide validity evidence can be very expensive for small or medium orgs

25
Q

Yu & Kuncel (2020)

A

Comparing random weighting schemes with expert judgments

2 general ways to go about making judgments in selection:
(a) mechanical methods - statistically combine predictors
(b) holistic/clinical methods - human expert, subjective judgment

Mechanical methods outperform human judgment. However, individuals may have concerns about using purely mechanical methods with no human input to determine weights (gets at concerns of face validity and applicant reactions to the process)

Compared the following predictor weighting schemes:
- random weighting: consistently applied vs. inconsistently applied
- optimal regression weighting

Found: experts outperformed inconsistent weights (i.e., weights that were completely random within and across candidates); consistent random weights (i.e., assigning a random weight to each predictor in the same way across all candidates) reliably outperformed expert judgment for hiring decisions across 3 datasets (see the sketch below)

Implication: experts do not make judgments completely inconsistently and are aware, to some extent, of what information is most valuable. However, their inconsistency in combining information does drastically damage their accuracy.

Authors suggest developing decision-making systems that help control consistency, or managing consistency by aggregating multiple expert judgments
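A toy simulation (assumptions mine, not the study's data) contrasting the weighting schemes compared above: consistent random weights applied identically to every candidate versus inconsistent weights re-drawn per candidate, scored against an outcome generated from known true weights.

```python
# Toy illustration of consistent vs. inconsistent random predictor weighting.
import numpy as np

rng = np.random.default_rng(0)
n, k = 1000, 4
X = rng.normal(0, 1, (n, k))                       # standardized predictor scores
true_w = np.array([0.4, 0.3, 0.2, 0.1])
performance = X @ true_w + rng.normal(0, 1, n)     # criterion with noise

# Consistent random weights: one random (positive) weight vector for all candidates.
w_consistent = rng.uniform(0, 1, k)
score_consistent = X @ w_consistent

# Inconsistent weights: a new random weight vector drawn for every candidate,
# mimicking an expert who combines information differently case by case.
W_inconsistent = rng.uniform(0, 1, (n, k))
score_inconsistent = (X * W_inconsistent).sum(axis=1)

for name, s in [("consistent random", score_consistent),
                ("inconsistent random", score_inconsistent)]:
    print(name, "validity =", round(np.corrcoef(s, performance)[0, 1], 2))
```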
26
Q

van Iddekinge et al. (2023)

A

Personnel selection: a review of ways to maximize validity, diversity, and the applicant experience

review of personnel selection topics: criterion-related and content validity, subgroup differences, adverse impact, bias, applicant reactions, cost and time, effects on unit- and firm-level outcomes

development decisions: use multiple selection procedures, add structure, contextualize selection procedures, minimize cognitive load, review selection procedures, limit faking opportunities, consider gamifying selection procedures, consider pre-test explanations/practice tests/coaching, share target KSAOs with applicants, allow for retesting, weight prediction and criteria, consider AI, score banding

future research: bias and opportunity in AI, selection across cultures, selection for gig work and telework
27
Q

Kuncel et al. (2013)

A

Mechanical versus clinical data combination in selection and admissions decisions: A meta-analysis.

Mechanical / algorithmic = applying an algorithm or formula to each applicant's scores

Holistic / clinical = data are combined using judgment, insight, or intuition, rather than an algorithm or formula that is applied the same way for each decision

Mechanical methods outperform holistic methods for combining applicant information to predict criteria

see also Yu & Kuncel (2020)
28
Q

Woo et al. (2023) and commentaries

A

Woo et al. compare the validity, reliability, bias, and fairness of grad school assessment methods (GRE, undergrad GPA, personal statements, resumes, letters of rec, interviews); find that assessment methods other than the GRE have limited psychometric evidence and may be plagued by sociocognitive rater biases; suggest future steps

Gomez et al. (2023) - commentary on Woo et al.
Argue that Woo et al. engage in epistemic oppression by failing to cite germane research, erroneously do not evaluate the GRE's construct validity, and do not consider Black Americans' humanity by casually discussing George Floyd's murder

Kuncel & Worrell (2023) - commentary on Woo et al.
Extend the next steps based on Woo et al.'s findings: invest in education for gifted and talented students at all levels of education; consider broader skills than overall cognitive ability, like psychosocial skills and more *narrow* [specific] cognitive skills
29
Q

Woo et al. (2023) - rejoinder

A

summarizes points of divergence and convergence between the authors; suggests next steps

argues that Gomez and Boykin missed the main point of the argument (that other methods of graduate school admission assessment are likely even MORE unfair than the GRE and potentially less valid and reliable)

agrees that additional predictors should also be used, but future research needs to evaluate the psychometric properties of these other predictors
30
Q

Newman et al. (2014)

A

Why minority recruiting doesn't often work, and what can be done about it

conduct a number of simulations to suggest that targeted recruiting must explicitly model applicant qualifications as part of the recruiting process

important to focus on increasing minority applicant qualifications rather than just the number of minority applicants to increase workplace diversity

recruiting intervention x race x applicant qualifications model:
- designed to enhance the probability of applying among more qualified minority applicants, more so than it enhances the probability of applying among majority applicants and among less qualified minority applicants
31
Q

King & Gilrane (2015)

A

Social science strategies for managing diversity: I-O opportunities to enhance inclusion

white paper that lists individual (employee, manager) and organizational (hiring, diversity training, and performance management) suggestions for enhancing inclusion

individuals
(a) employees: provide support (listen), confront bias (intervene)
(b) managers: beware of own bias blind spots, question assumptions, be a role model for inclusion

organizations
(a) hiring: job-related measurement, consider the full range of competencies, address stereotype threat
(b) diversity training: include a multiple-group rather than single-group focus; use multiple learning techniques; awareness AND behavioral goals; integrate into larger strategic diversity initiatives
(c) performance management: provide more information, increase time, increase accountability
32
Q

Vodanovich & Rupp (2022), adverse impact burden shift, disparate treatment burden shift

A

employment discrimination book; provides a review of the legal landscape

key topic: burden shifting = legal framework used in discrimination cases to guide how evidence is presented and evaluated, and who is responsible for providing evidence at each step

(a) adverse impact burden shifting framework - legal steps
adverse impact = neutral practices disproportionately harm protected group members; focuses on consequences of employment decisions, no intent required; typically assessed with applicant flow data
- phase 1: challenger must demonstrate that a particular employment practice causes the discrimination in question
- phase 2: company must demonstrate that the challenged practice is job related and consistent with business necessity (data from job analysis typically used as evidence)
- phase 3: plaintiff must prove that an equally valid, job-related practice exists with less, or no, adverse impact

(b) disparate treatment burden shifting framework - legal steps
disparate treatment = intentionally treating individuals differently based on their membership in a protected group
- phase 1: the prima facie case (establishing the foundation for the case with evidence) for disparate treatment requires sufficient direct or indirect evidence that plaintiffs belong to a protected class, are qualified for the position, were subject to a negative employment decision, and that others not in the protected group were treated more favorably
- phase 2: burden shifts to the employer; the defense must articulate (not prove) that a legitimate reason exists for the alleged discriminatory practice
- phase 3: plaintiff must prove by direct or indirect evidence that the organization's reason for its decision is a pretext for discrimination
33
Q

Morris (2017)

A

chapter in adverse impact analysis textbook; provides an overview of statistical significance testing for adverse impact

reviews data requirements, statistical inference, Type I and II errors, 1- or 2-tailed tests

tests: z-test, chi-square test, Fisher's exact test (FET), LMP

assumptions: independence, large sample size, correct sampling model

choosing among tests, multiple comparisons

limitations of statistical significance tests, using confidence intervals
34
Q

Oswald et al. (2017)

A

chapter in adverse impact analysis book; provides an overview of effect sizes and practical significance for adverse impact

presents a model for jointly considering practical AND statistical significance: statistical significance --> statistical precision (e.g., confidence intervals) --> practical significance

effect sizes covered: impact ratio, phi coefficient, odds ratio, absolute difference in selection rates, h statistic, shortfall statistic

when to use which effect size and how to interpret each (see the sketch after this card)
35
Q

Outtz & Newman (2010)

A

A theory of adverse impact

proposes a theoretical model of adverse impact that includes factors like SES, exposure to test content, and exchange motivation

performance-irrelevant race-related variance in tests contributes to adverse impact

shows a 3-part Venn diagram of test variance (red), race variance (yellow), and job performance variance (blue); what they are discussing is the overlap of test and race variance (orange) that does not include the overlap with job performance variance (brown) --> performance-irrelevant race-related variance in test scores = r^2 PIRV

racial subgroup differences are NOT uniform across cognitive subtests, with crystallized intelligence showing much larger racial differences than fluid intelligence subtests (via cognitive speed)

proposes solutions for I-Os to tackle:
- long term (change systems and structures of quality control and opportunity, but this is a huge undertaking and should be seen as a distal goal)
- medium term (test development)
- short term (predictor weighting, criterion weighting, recruiting)
36
Q

Berry (2015)

A

Differential validity and differential prediction of cognitive ability tests: understanding test bias in the employment context

differential validity = observed and operational validities of cognitive ability tests are about 10-20% lower for African Americans and Hispanic Americans than for Whites

differential prediction shows slope and intercept differences; Whites have a slightly higher intercept than African Americans and Hispanic Americans, meaning that non-White job performance is typically OVERpredicted

should be noted that most differential validity and prediction data are from the 1980s or earlier, many studies used the General Aptitude Test Battery (GATB), and most do not provide adequate information on statistical artifacts

future research needs to look at issues from indirect range restriction

need research on what causes predictive bias... range restriction? psychometric characteristics of the test or criterion? contextual influences (e.g., stereotype threat)? true differences in cognitive ability?
37
Q

Gatewood et al. (2019)

A

Human Resource Selection

Good textbook cite regarding the following chapters: job performance concepts and measures, job analysis, legal issues in selection, recruitment of applicants, human resource measurement in selection, reliability of selection measures, validity of selection procedures, application forms (biodata and training/experience evaluations), reference and social media checks, selection interviews, ability tests, personality assessment, simulation tests, tests for CWBs, strategies for selection decision making
38
Q

Realistic job previews in recruitment

A

Breaugh (2013)

Theory suggests that providing realistic information about a job during recruitment should result in new employees having their job expectations met, based on the assumption that an RJP allows people who do not perceive a strong person-job fit to withdraw from the process

In turn, met expectations --> lower turnover and higher job satisfaction

An employer's recruitment actions can influence the interest of prospective job applicants in a job opening, as well as the ability of the individuals the org hires, their diversity, their job performance, and their retention
39
Q

Pareto-optimization

A

weights for each predictor are statistically determined to simultaneously optimize two (or more) criteria (e.g., job performance and diversity), rather than estimating regression weights that optimize only one criterion (job performance)

De Corte et al. (2007) introduced this method in selection and highlighted that, after using incumbent data, it is necessary to cross-validate with applicant data

Song et al. (2017) found that when Pareto-optimal weights were applied to an applicant sample, both expected diversity and job performance outcomes decreased (diversity and validity shrinkage)
- sample size: validity shrinkage and diversity shrinkage both decrease when sample size increases (>100)
- in any case, Pareto-optimization still outperformed the unit-weighted solution

Rupp et al. (2020) provided a user-friendly demonstration of how to use Pareto-optimization
- suggest that the metrics need to be considered carefully in light of legal issues; legal counsel should be sought to examine the legal intricacies
- current applicants should not be used to determine Pareto-optimal weights (instead use a calibration sample a priori)
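A rough conceptual sketch of the Pareto idea under simple assumptions (not the De Corte et al. algorithm): sweep composite weights for two predictors over a grid, compute the composite's expected validity and a subgroup-difference proxy for adverse impact at each point, and keep the non-dominated weightings. The validities, d values, and predictor intercorrelation are assumed for illustration.

```python
# Conceptual Pareto sweep over weights for two predictors:
# predictor 1 = more valid but larger subgroup difference (e.g., a GMA test),
# predictor 2 = less valid but smaller subgroup difference.
import numpy as np

validity = np.array([0.50, 0.30])   # assumed correlations with job performance
d_values = np.array([0.80, 0.20])   # assumed standardized subgroup differences
r12 = 0.30                          # assumed predictor intercorrelation

solutions = []
for w1 in np.linspace(0, 1, 101):
    w = np.array([w1, 1 - w1])
    R = np.array([[1, r12], [r12, 1]])
    comp_sd = np.sqrt(w @ R @ w)              # SD of the weighted composite
    comp_validity = (w @ validity) / comp_sd  # composite-criterion correlation
    comp_d = (w @ d_values) / comp_sd         # composite subgroup difference
    solutions.append((w1, comp_validity, comp_d))

# Keep Pareto-optimal points: no other weighting has higher validity AND lower d.
pareto = [s for s in solutions
          if not any(o[1] >= s[1] and o[2] <= s[2] and o != s for o in solutions)]
print(len(pareto), "non-dominated weightings along the validity-diversity frontier")
```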
40
Q

Correction for range restriction/artifact distribution

A

Related to Sackett et al. (2022)

well understood that the goal of validation of selection procedures used for predicting job performance = estimation of OPERATIONAL validity in an applicant sample, using a criterion measure that is free of measurement error (unrealistic)

OBSERVED validity estimates are underestimates in the presence of range restriction and measurement error in the criterion; because of this, corrections for range restriction and measurement error were developed in early psychometrics and are used to obtain 'better' estimates of operational validity

meta-analysis = ideal approach
- obtain an estimate of operational validity for each study and cumulate the findings
- in order for this to work correctly, we'd need to know the amount of criterion measurement error and range restriction for each study
- here, we use an artifact distribution: estimates of reliability for the criterion measure may be available for a subset of the collected studies; the mean and variance of the artifact distribution of reliability estimates are obtained --> correct the mean and variance of the full set of observed validity estimates using this distribution

issues: assumes the subset of studies used to build the artifact distribution is randomly drawn from the studies included in the meta-analysis; independence of artifacts if looking at multiple
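A bare-bones sketch of the artifact-distribution logic described here, with toy numbers: compute mean attenuation factors from the subset of studies that report criterion reliability and range restriction information, then apply them to the mean observed validity across all studies. Full Hunter-Schmidt procedures also correct the variance; this only shows the mean.

```python
# Toy meta-analytic inputs.
import numpy as np

observed_rs = np.array([0.18, 0.22, 0.15, 0.25, 0.20])   # all studies
ryy_subset = np.array([0.55, 0.62, 0.50])                 # criterion reliabilities (subset)
u_subset = np.array([0.70, 0.80, 0.65])                   # restricted/unrestricted SD ratios (subset)

mean_r = observed_rs.mean()

# Mean attenuation due to criterion unreliability.
a_reliability = np.sqrt(ryy_subset).mean()
r_rel_corrected = mean_r / a_reliability

# Mean range restriction correction applied at the mean corrected r.
# (Simplified: real artifact-distribution methods handle this more carefully.)
u = u_subset.mean()
r_operational = (r_rel_corrected / u) / np.sqrt(1 + r_rel_corrected**2 * (1 / u**2 - 1))

print(f"mean observed r = {mean_r:.3f}")
print(f"estimated mean operational validity = {r_operational:.3f}")
```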
41
Q

Approaches to diversity staffing

A

Goldberg (2005) - similarities between recruiters and applicants positively influence applicant attraction and selection decisions

Avery & McKay (2006) - minority job seekers respond more positively to messages about diversity, including descriptions of diversity philosophies or diversity management policies, and often seek out this information when making job choice decisions

Williamson et al. (2008) - non-minority job seekers are also more attracted to organizations that express a value for diversity, suggesting the practice is effective across demographic groups
42
Q

Social media in selection

A

cites to use:

Van Iddekinge et al. (2016) - rated Facebook profiles of college students and followed up with them in new jobs; SMA ratings were unrelated to supervisors' performance ratings

Roulin & Levashina (2019) - LinkedIn, two studies. Study 1: profiles that are longer, include a picture, and have more connections are rated more positively; raters were fairly consistent. Study 2: an itemized LinkedIn assessment is more effective than a global assessment

social media profiles are commonly reviewed by potential employers to inform selection decisions

some studies demonstrate that SMA can show sufficient interrater reliability, convergent validity with traditional methods, and small yet significant criterion-related validity (Roulin & Levashina, 2019)

SMA can also lack both convergent and criterion-related validity (Van Iddekinge et al., 2016)

there may be potential for subgroup differences in SMA ratings; empirical research has supported the possibility that this leads to unfair discriminatory selection decisions (Van Iddekinge et al., 2016)

applicants may respond negatively to SMA and believe it is an invasion of privacy
43
Q

AC dimensions versus exercises debate

A

cites to use:

Arthur et al. (2003) - provides AC dimensions: consideration/awareness of others, communication, drive, influencing others, organizing and planning, problem solving, stress tolerance

Sackett & Dreher (1982) - exercise variance is far more prominent in ratings than dimension variance (bad)

Kuncel & Sackett (2014) - critiqued these findings and argued that overall dimension ratings, which are a composite of individual dimension ratings rather than exercise ratings, explain the most variance in assessor ratings

Sackett (2021) - remarked that he originally got it wrong in 1982, as dimensions DO reliably and validly explain variance in ratings
44
Q

Biodata

A

Gatewood et al. (2019)

Biodata is historical information that represents the applicant's past behaviors and experiences (at work, in education, family, community involvement; separate from measures like personality or values). It is empirically developed and scaled in a way that maximizes prediction

Use of biodata has declined since the 70s because many of the components reflect the same aspects as personality tests and require considerable resources to develop (technical expertise and a large sample)

based on the idea that past behaviors are good predictors of future behaviors; an applicant's previous experiences will predict how they perform on the job

ideally, items would be developed based on data from a job analysis, have validity evidence, and have been screened for possible discriminatory impact against protected groups

concerns about accuracy / implications for validity and reliability
45
Q

Hickman et al. (2022)

A

Automated video interview personality assessments: reliability, validity, and generalizability investigations

organizations are increasingly using AVIs to screen applicants, but AVIs lack supporting evidence

this paper developed AVIs that use verbal, paraverbal, and nonverbal behaviors extracted from video interviews --> assessing the Big 5

results:
- AVI personality assessments exhibited stronger evidence of validity when trained on interviewer reports (compared to trained on self-reports from applicants) and cross-validated in other samples
- AVI personality assessments trained on interviewer reports [as the outcome] had mixed evidence of reliability, exhibited consistent convergent and discriminant relations, used predictors that appear conceptually relevant to the focal traits, and predicted academic outcomes
- little evidence of reliability or validity for AVIs trained on self-report applicant data
46
Q

Zhang et al. (2023)

A

Reducing subgroup differences in personnel selection through the application of ML

Main point: collectively, the studies in this article illustrate that ML is unlikely to be able to resolve the issue of adverse impact, but it may assist in finding incremental improvements

[personal comment: this gets at the idea that ML is likely to be most beneficial when it comes to automating selection procedures and text analysis]

STUDY 1: fairness-aware ML algorithms (designed to optimize predictive accuracy while limiting the adverse impact of predictions) that statistically eliminate subgroup differences must *mathematically* create predictive bias, which may reduce validity and penalize high-scoring racial minorities

STUDY 2: statistically removing subgroup differences (by oversampling higher-performing minorities during the ML training stage) only slightly reduced adverse impact ratios of the resulting ML model, but also slightly reduced model accuracy (convergent validity in this study)
47
Q

Koenig et al. (2023)

A

Improving measurement and prediction in personnel selection through the application of ML

ML can score audio constructed responses with as much reliability and criterion-related validity as humans

ML can be used to predict multiple outcomes simultaneously (e.g., productivity and turnover), but gains over traditional methods are small with highly structured data [may not be worth all the extra effort]
48
Q

Landers et al. (2023)

A

A simulation of the impacts of ML to combine psychometric employee selection system predictors on performance, adverse impact, and number of dropped predictors

In a large-scale set of simulations, ML does not greatly improve prediction beyond traditional methods like regression, UNLESS samples are small relative to the number of parameters (n-to-k ratio less than 3)
- BUT there are many nuanced findings where ML may be better, such as when item-level models are used

the most consistently valuable improvement from adopting ML over traditional regression came from dropping predictors rather than from improving prediction

the future of ML:
- the potential of ML for selection is unlikely to be realized in selection systems focusing on combining scale composites from previously validated psychometric tests
- INSTEAD, it will likely be realized in *unconventional design scenarios*, such as the use of individual items to make multiple trait inferences, or with novel data formats like text, image, audio, video, or behavioral traces
49
Q

Rotundo & Sackett (2002)

A

The Relative Importance of Task, Citizenship, and Counterproductive Performance to Global Ratings of Job Performance: A Policy-Capturing Approach

3-component model of job performance:
- Task performance → activities formally recognized as part of a job
- Contextual performance → discretionary behaviors that contribute positively to the organization's environment (e.g., organizational citizenship behaviors, OCBs)
- Counterproductive performance → voluntary behavior that harms the well-being of the organization or its members (e.g., CWB, deviance)

policy-capturing approach to understand the relative importance to managers of task performance, OCB, and CWB in performance ratings
- policy capturing presents individuals with different scenarios to capture the weight that decision makers place on different cues within the scenarios
- can help to understand the relative importance of a set of factors to a person/decision-maker/employee

found that performance raters fell into 3 clusters: (1) weighting task performance highest, (2) weighting CWB highest, or (3) giving equal and large weights to task performance and CWB (citizenship was generally given less weight across the board)
50
Q

Fisher et al. (2024) *consider

A

AI-based tools in selection: considering the impact on applicants with disabilities

Research suggests that people with disabilities face many barriers to finding employment. Could AI-based technologies used in selection be one of those barriers? Or is it possible that AI technologies help people with disabilities find employment more easily?

**Games should be designed with multiple modes of signaling to ensure they can be used by applicants with either visual or auditory disabilities, giving all applicants a full opportunity to perform well and thereby perceive the selection tool as providing fair decisions.**

Tips:
- choose vendors and software carefully
- provide options for requesting accommodations
- make choices that work for all applicants
- inform applicants of technology requirements
- learn about algorithmic auditing
- help applicants get the information they need
- learn about the legal context for use of selection technologies

Like any new approach to employee selection, the adoption of AI-based tools needs to be carefully considered. On the one hand, some tools may offer features that applicants with disabilities appreciate and that allow them to more fairly present their talents in the selection process. On the other hand, these tools may create additional barriers for other applicants with disabilities, thereby leading to less inclusive employment opportunities
51
Q

Silver et al. (2025)

A

Conscientiousness assessments for people with attention-deficit/hyperactivity disorder: measurement properties and potential issues

using a work-oriented frame of reference (FOR) helped level out some of the differences between ADHD and non-ADHD individuals

important to look at facets of conscientiousness, as not all facets are as important to all jobs (e.g., cautiousness, compared to achievement striving) [side note: leaders have all sorts of conscientiousness levels]

hiring professionals should strive to use more job-relevant conscientiousness self-report measures in selection procedures, both in terms of item contextualization and the use of achievement-striving and dutifulness facets
52
Q

Wegmeyer & Speer (2023)

A

Examining personality testing in selection for neurodiverse individuals

Area for Future Research 1: Are there differences in how neurotypical and neurodivergent individuals react to personality testing in preemployment settings?

Area for Future Research 2: What factors affect neurodivergent individuals' reactions to personality testing in preemployment settings?

Area for Future Research 3: Is there evidence of test bias or measurement bias on personality tests used for selection across neurotypical and neurodivergent individuals?

Area for Future Research 4: Is there evidence of adverse impact from personality tests for neurodivergent individuals?