Lecture 3 Flashcards

(25 cards)

1
Q

what types of measurement scales are there

A

Nominal – Categories without numeric meaning (e.g. gender, nationality).

Ordinal – Rank order without equal intervals (e.g. job performance ranking).

Interval – Equal intervals but no true zero (e.g. IQ scores).

Ratio – Equal intervals with a true zero (e.g. sales numbers, time spent).
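A minimal sketch, assuming pandas and made-up example data (not from the lecture), of how the four levels might be represented and which summary statistics are meaningful at each level:

```python
import pandas as pd

df = pd.DataFrame({
    "nationality": pd.Categorical(["NL", "DE", "NL", "BE"]),       # nominal: labels only
    "perf_rank": pd.Categorical([3, 1, 2, 4], ordered=True),       # ordinal: rank order
    "iq": [100, 115, 98, 122],                                      # interval: no true zero
    "sales": [12_000, 0, 5_500, 20_300],                            # ratio: true zero exists
})

print(df["nationality"].mode()[0])             # nominal: mode is meaningful, mean is not
print(df["perf_rank"].max())                   # ordinal: order comparisons are meaningful
print(df["iq"].mean())                         # interval: differences and means are meaningful
print(df["sales"].sum() / df["sales"].mean())  # ratio: ratios are meaningful
```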

2
Q

critically assess existing measures (e.g. selection procedures, criteria) and evaluate their adequacy based on validity and reliability evidence

A

1. Cognitive Ability Tests
Strengths: High predictive validity for job performance, especially in complex jobs. High reliability (consistent results across time and people).
Weaknesses: Can lead to adverse impact (e.g. ethnic group differences). May not predict interpersonal or contextual performance.

2. Big Five Personality Tests
Strengths: Conscientiousness consistently predicts performance across jobs. Low adverse impact compared to cognitive tests. Generally reliable, especially with good test construction.
Weaknesses: Lower validity than cognitive ability. Vulnerable to faking in high-stakes settings.

3. Situational Judgment Tests (SJTs)
Strengths: Good for measuring interpersonal and soft skills. Moderate predictive validity. Low adverse impact. Incremental validity over cognitive ability and personality.
Weaknesses: Lower reliability if not well-developed (especially open-ended SJTs). May not generalize well across different jobs unless tailored.

3
Q

why is measurement important

A

It helps describe, understand, and predict individual differences.

Essential for HR decision-making (e.g. hiring, training, performance evaluation).

Without accurate measurement, organizational decisions are arbitrary and possibly unfair.

4
Q

how do we assess the adequacy of measures

A

To assess the adequacy of a measure, we look at its reliability and validity. Reliability refers to how consistently the measure produces results; this includes test–retest, internal consistency, and interrater reliability. Validity refers to whether the measure actually assesses what it claims to measure.

5
Q

what is a test

A

Any psychological measurement instrument, technique or procedure that systematically measures a sample of behavior

6
Q

what are the 3 steps of test development

A
  1. item generation
  2. pilot test
  3. post pilot activities
7
Q

what is item generation

A

It's a step in test development:
1. Determine purpose - What is the goal of the test? (e.g. selection, evaluation, research)
2. Define the attribute - Identify the relevant psychological constructs.
3. Develop a measurement plan - Format, response scales, and anchors.
4. Write items - Ensure clarity, include reverse-scored items, avoid double-barreled questions.

8
Q

what is a pilot test

A
  • Pilot test with a representative sample (if you're measuring students, test on students).
  • The results of the pilot test tell us how the scale does on content validity.
  • The pilot test is also useful for item analysis (see the sketch below), which tells us about:
  • Distractor analysis: ensuring all incorrect options are selected about equally often.
  • Item difficulty: ideal difficulty level around 0.5 (the proportion answering correctly).
  • Item discrimination: the ability to differentiate between high and low performers.
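A minimal sketch of the item-difficulty and item-discrimination computations named above, using NumPy and made-up pilot responses (the data are hypothetical; 1 = correct, 0 = incorrect):

```python
import numpy as np

# Rows = pilot participants, columns = items (hypothetical responses).
responses = np.array([
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 0, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
])

difficulty = responses.mean(axis=0)   # proportion correct per item (~0.5 is ideal)
total = responses.sum(axis=1)         # each person's total score

# Discrimination: correlation between an item and the total score of the remaining items.
discrimination = np.array([
    np.corrcoef(responses[:, j], total - responses[:, j])[0, 1]
    for j in range(responses.shape[1])
])

print("difficulty:    ", np.round(difficulty, 2))
print("discrimination:", np.round(discrimination, 2))
```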
9
Q

what are post-pilot activities

A
  • Select good items.
  • Determine reliability and validity.
  • Revise and update test items as needed.
10
Q

in what 3 ways can testing be systematic

A
  • Content Validity: Items are systematically chosen.
  • Administration: Standardized testing procedures.
  • Scoring: Objective and consistent rating system
11
Q

how do you determine what to measure

A

To determine what to measure, you start with a job analysis. This is a systematic process that identifies the key tasks, responsibilities, and the required knowledge, skills, abilities, and other characteristics (KSAOs) for a specific job. Based on this information, you can choose or develop measures that are directly related to the job’s performance requirements. Job analysis ensures that your selection tools are relevant, valid, and legally defensible.

12
Q

where and how do you find the measure you are looking for

A

To find the right measure, you first check if a validated existing test already exists—for example, cognitive ability tests, personality inventories, or structured interviews available from test publishers or academic literature. If no suitable measure exists, you can develop your own by:

Conducting a job analysis to define what needs to be measured.

Generating items that reflect key KSAOs.

Pilot testing the measure on a representative sample.

Analyzing reliability and validity.

Revising and finalizing the measure.

This ensures your tool is job-relevant, reliable, and valid.

13
Q

reliability

A

The consistency of a measure across time, forms, raters, or items.

Types:
Test–retest (same test at different times)
Parallel forms (alternate versions)
Internal consistency (e.g. Cronbach’s alpha)
Interrater reliability (consistency between raters)
Rule of thumb:
>.70 for research
>.90 for selection decisions
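A minimal sketch, with made-up scores, of how two of these reliability estimates reduce to a Pearson correlation:

```python
import numpy as np

time1 = np.array([23, 31, 28, 35, 40, 27, 33])    # hypothetical scores at time 1
time2 = np.array([25, 30, 27, 36, 41, 26, 34])    # same people retested later
rater_a = np.array([4, 3, 5, 2, 4, 3, 5])          # hypothetical ratings by rater A
rater_b = np.array([4, 4, 5, 2, 3, 3, 5])          # same targets rated by rater B

test_retest = np.corrcoef(time1, time2)[0, 1]      # stability over time
interrater = np.corrcoef(rater_a, rater_b)[0, 1]   # agreement between raters

print(f"test-retest r = {test_retest:.2f}")   # rule of thumb: > .70 for research
print(f"interrater  r = {interrater:.2f}")    # > .90 preferred for selection decisions
```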

14
Q

validity

A

The extent to which a measure assesses what it’s intended to.

You can have a reliable but not valid test, but not vice versa.

15
Q

how to test reliability

A

Reliability is expressed as a correlation coefficient (r).
Different methods of measuring reliability:
- Test-retest: giving the same test twice to assess stability over time.
- Parallel/alternate forms: using different but equivalent versions of a test (equivalence).
- Stability and equivalence: combining both approaches, which captures several sources of error at once.
- Internal consistency: the correlation between items within a single administration (Cronbach's alpha, split-half reliability).
- Interrater reliability: checking whether different raters score the same test similarly.
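A minimal sketch of the internal-consistency estimates above (Cronbach's alpha and split-half with the Spearman-Brown correction), using NumPy and made-up item scores:

```python
import numpy as np

# Rows = respondents, columns = items of one scale (hypothetical 1-5 ratings).
items = np.array([
    [4, 5, 4, 4],
    [2, 3, 2, 3],
    [5, 5, 4, 5],
    [3, 3, 3, 2],
    [1, 2, 2, 1],
    [4, 4, 5, 4],
])

k = items.shape[1]
item_vars = items.var(axis=0, ddof=1)
total_var = items.sum(axis=1).var(ddof=1)
alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)   # Cronbach's alpha

# Split-half: correlate odd vs even item halves, then apply the Spearman-Brown correction.
half1 = items[:, ::2].sum(axis=1)
half2 = items[:, 1::2].sum(axis=1)
r_halves = np.corrcoef(half1, half2)[0, 1]
split_half = 2 * r_halves / (1 + r_halves)

print(f"Cronbach's alpha = {alpha:.2f}")
print(f"split-half (Spearman-Brown) = {split_half:.2f}")
```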

16
Q

what is a good reliability

A

over .70 in psychology (around .90 for selection decisions)

17
Q

what are the different types of validity

A

Content validity - assessed by experts; does the test cover the relevant domain?
Criterion validity - the relationship between the predictor and the criterion.
Construct validity - am I measuring what I want to measure?
Incremental validity - does a new selection method add predictive value?
Face validity - does the test look like it measures what it is supposed to measure?

18
Q

what is content validity

A

Content validity refers to how well a test or measure covers the full range of the relevant job domain or construct.

Example: A typing test for a secretary job has high content validity if it includes real job-related tasks like formatting and speed typing.

Often assessed using Subject Matter Experts (SMEs).

19
Q

what is criterion validity and the 2 types

A

Criterion validity looks at whether a test predicts actual performance or outcomes. It answers: Does this test relate to success on the job?

Two Types:
Predictive Validity:

The test is given before job performance is known.

Example: Using a test during hiring and comparing scores to performance 6 months later.

Concurrent Validity:

The test and performance are measured at the same time (e.g. current employees).

Example: Giving a test to employees and comparing it to their current performance ratings.
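A minimal sketch, with made-up data, of how a predictive validity coefficient is computed as the correlation between test scores at hiring and later job performance:

```python
import numpy as np

test_scores = np.array([55, 62, 48, 71, 66, 59, 44, 68])             # predictor at hiring
performance = np.array([3.1, 3.6, 2.8, 4.2, 3.9, 3.3, 2.5, 4.0])     # criterion after 6 months

validity = np.corrcoef(test_scores, performance)[0, 1]
print(f"predictive validity r = {validity:.2f}")
# A concurrent design is computed the same way, but the predictor and criterion
# are collected at the same time from current employees.
```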

20
Q

what factors affect criterion validity

A
  • Range Enhancement: validating a predictor on a highly diverse (heterogeneous) group, which makes it appear to discriminate better than it actually does in real-world applications.
  • Range Restriction: limiting the variability in scores, which can distort validity coefficients.
21
Q

what are the 3 types of range restriction

A

1) Direct range restriction - the test being validated is used for selection purposes before its validity has been established.
2) Indirect range restriction - selection is made on the basis of some other variable that is correlated with the predictor test being validated.
3) Natural attrition - good or bad performers leave before the criteria are measured, which further restricts the available range.
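A minimal sketch, using simulated data, of how direct range restriction shrinks the observed validity coefficient: only the top predictor scorers are "hired", so the correlation in the restricted group is smaller than in the full applicant pool.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
predictor = rng.normal(size=n)
# Criterion built so the true predictor-criterion correlation is about .50.
criterion = 0.5 * predictor + rng.normal(scale=np.sqrt(1 - 0.25), size=n)

r_full = np.corrcoef(predictor, criterion)[0, 1]

selected = predictor > np.quantile(predictor, 0.7)   # keep only the top 30% (direct restriction)
r_restricted = np.corrcoef(predictor[selected], criterion[selected])[0, 1]

print(f"r in the full applicant pool:      {r_full:.2f}")
print(f"r in the range-restricted group:   {r_restricted:.2f}")  # noticeably smaller
```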

22
Q

what is construct validity

A

Construct validity examines whether a test accurately measures the theoretical trait or concept it claims to measure (e.g. leadership, intelligence).

It involves showing that the test correlates with similar constructs (convergent validity) and does not correlate with unrelated ones (discriminant validity).

Does the test measure a specific psychological construct?

23
Q

how do we know we are measuring what we want to measure

A

The scale should be related to other measures of the same construct - convergent validity.
The scale should be unrelated to scores on instruments that are not supposed to measure that construct - discriminant validity.
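A minimal sketch, with made-up scale scores, of checking convergent and discriminant validity via correlations (the "new scale" and comparison measures are hypothetical):

```python
import numpy as np

new_scale   = np.array([3.2, 4.1, 2.8, 3.9, 4.5, 2.5, 3.6, 4.0])   # new conscientiousness scale
established = np.array([3.0, 4.3, 2.6, 3.7, 4.6, 2.8, 3.4, 4.2])   # established measure, same construct
unrelated   = np.array([183, 180, 172, 168, 177, 176, 165, 179])   # unrelated variable (e.g. height)

convergent = np.corrcoef(new_scale, established)[0, 1]    # should be high
discriminant = np.corrcoef(new_scale, unrelated)[0, 1]    # should be near zero

print(f"convergent r   = {convergent:.2f}")
print(f"discriminant r = {discriminant:.2f}")
```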

24
Q

what is synthetic validity

A

A method to generalize validity evidence when local validation isn’t feasible.

Combines job analysis + existing validation data from similar jobs.

25
what are Situational Judgment Tests (SJTs)
Present hypothetical work scenarios; applicants choose the best response. Measure procedural knowledge and interpersonal skills. Shown to have incremental validity over cognitive ability and reduced adverse impact.