L3: Individual Differences & Validation Flashcards

1
Q

define interpersonal skills

A

skills related to social sensitivity, relationship building, working with others, listening, and communication

2
Q

define incremental validity

A

how much a new test or predictor adds value beyond what is already measured by existing predictors.
- empirical
i.e. it shows whether adding a new test (like an SJT) improves the ability to predict an outcome (e.g. job performance) compared to traditional tests alone (e.g. cognitive ability tests)
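
A minimal sketch of how incremental validity is typically checked in practice (hierarchical regression, comparing R² with and without the new predictor). The data, effect sizes, and variable names below are invented purely for illustration:

```python
# Does adding an SJT score improve prediction of job performance beyond a
# cognitive ability test? Compare R-squared with and without the new predictor.
# All data and effect sizes here are simulated purely for illustration.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
n = 200
cognitive = rng.normal(size=n)                                   # existing predictor
sjt = 0.4 * cognitive + rng.normal(size=n)                       # new predictor (partly overlapping)
performance = 0.5 * cognitive + 0.3 * sjt + rng.normal(size=n)   # criterion (job performance)

X_old = cognitive.reshape(-1, 1)
X_new = np.column_stack([cognitive, sjt])
r2_old = LinearRegression().fit(X_old, performance).score(X_old, performance)
r2_new = LinearRegression().fit(X_new, performance).score(X_new, performance)

print(f"R² cognitive only:  {r2_old:.3f}")
print(f"R² cognitive + SJT: {r2_new:.3f} (ΔR² = {r2_new - r2_old:.3f})")
```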

3
Q

why do SJTs predict long-term success?

this is important for their validity as predictors of future success

A
  • they assess soft skills like communication, empathy, and decision making
  • these skills remain stable over time & are crucial for job performance
4
Q

define general procedural knowledge

A

the knowledge somebody has acquired about effective & ineffective courses of trait-related behaviour in situations like those described in the SJT
in Lievens' study, general procedural knowledge relates to students' procedural knowledge about (in)effective behaviour in interpersonal situations (with patients) as depicted in the SJT items

5
Q

what are the implications of Lievens' study?

A
  • SJTs are valuable for student selection alongside cognitive tests
  • predicting post-academic success is important, not just academic success
  • long-term validity of SJTs supports their use in professional admissions
6
Q

define reliability

A

consistency of a test or measurement. so whether the measure is:
- dependable
- stable
- consistent over time
- gives the truest picture of someone’s abilities/characteristics

7
Q

how do you test reliability?

A
  • correlation coefficient methods
  • test-retest: coefficient of stability
  • parallel/alternate forms: coefficient of equivalence
  • coefficient of stability & equivalence: combines both sources of error
  • internal consistency (Cronbach's alpha)
  • interrater reliability
8
Q

define correlation coefficient

A

degree of consistency/agreement between 2 sets of independently derived scores (r)

9
Q

define internal consistency

A

degree to which the items in one test are intercorrelated
split half: split test into 2 equivalent halves
Cronbach's alpha: mean of all possible split-half coefficients (most commonly used)
ex: all math problems in a test should measure math ability
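
A minimal sketch of computing Cronbach's alpha directly from its definition; the respondent-by-item matrix below is invented for illustration:

```python
# Cronbach's alpha from a respondents-by-items score matrix.
# Formula: alpha = k/(k-1) * (1 - sum(item variances) / variance(total score)).
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: 2-D array, rows = respondents, columns = items."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)          # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)      # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Illustrative data: 5 respondents answering 4 Likert-type items.
scores = np.array([
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [3, 3, 3, 4],
    [1, 2, 2, 1],
])
print(f"Cronbach's alpha: {cronbach_alpha(scores):.2f}")
```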

10
Q

define interrater reliability

A

agreement between raters on their rating of some dimension
ex: olympic judges giving similar scores for a gymnast
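
Following the deck's correlation-coefficient framing, a minimal sketch of interrater reliability as the correlation between two judges' independent scores; the scores are invented for illustration:

```python
# Interrater reliability as the correlation between two judges' independent
# scores for the same gymnasts. Scores are invented for illustration.
import numpy as np

judge_1 = np.array([9.1, 8.4, 7.9, 9.5, 8.8, 7.2])
judge_2 = np.array([9.0, 8.6, 8.1, 9.4, 8.5, 7.4])
print(f"Interrater reliability r = {np.corrcoef(judge_1, judge_2)[0, 1]:.2f}")
```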

11
Q

what is parallel forms reliability? and example?

A

consistency across different versions of the test
ex: 2 versions of the SAT should give similar results

12
Q

what is test-retest reliability & an example?

A

consistency over time
ex: taking an IQ test twice & getting similar scores

13
Q

what is a “good” reliability?

A
  • depends on what you want to do with the scores
  • the more important the decision (e.g. life-or-death reliability of an fMRI scan), the more precise the measure needs to be. generally >.70 for research in the social sciences and >.90 for selection decisions
14
Q

what does validity look at?

A

whether a measure is:
- actually measuring the construct it's supposed to measure &
- whether the decisions based on that measure are correct

15
Q

define content validity

A

The extent to which test items cover the intended performance domain

16
Q

how do you measure content validity?

A
  • rational examination of the manner in which the performance domain is sampled by the predictor
  • SMEs: degree of agreement among them regarding how essential a particular item/test is (if more than half of the SMEs say an item is essential, that item has at least some content validity; content validity index (CVI) formula)
  • difficult for more abstract constructs
17
Q

define criterion validity

A
  • How well a test predicts real-world outcomes.
  • empirical relationship between predictor & criterion (performance measure)
  • subtypes: predictive & concurrent
18
Q

what is predictive criterion validity vs concurrent criterion validity?

A

predictive: measures how well a test predicts future success
(test scores collected before the criterion, e.g. a college entrance exam)
concurrent: measures how well a test correlates with current performance (test scores & criterion data collected at the same time, e.g. a job skills test)

19
Q

how do you measure predictive validity?

A
  • use stats to demonstrate the actual relationship between predictors & criteria
  • for ex: a linear relationship Y = a + bX + e
    1. measure candidates on the predictor during selection (like conscientiousness)
    2. select candidates without using the results (need to validate first)
    3. obtain a measurement of criterion performance later -> time period depends on type of job & how much training is needed (approx. 6 months)
    4. assess the strength of the relationship with stats (see the sketch below)
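
A minimal sketch of step 4: the validity coefficient is simply the correlation between predictor scores gathered at selection and criterion performance measured later; the scores below are invented for illustration:

```python
# Step 4: the validity coefficient is the correlation between predictor scores
# at selection and criterion performance measured later. Scores are invented.
import numpy as np

conscientiousness = np.array([3.2, 4.1, 2.8, 4.5, 3.9, 2.5, 4.8, 3.0])  # at selection
performance_6m = np.array([2.9, 4.0, 3.1, 4.4, 3.5, 2.7, 4.6, 3.3])     # ~6 months later

validity = np.corrcoef(conscientiousness, performance_6m)[0, 1]
print(f"Predictive validity coefficient r = {validity:.2f}")
```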
20
Q

how do you measure concurrent validity?

A

both predictor & criterion data gathered from the same employees (incumbents)
- cross-sectional study

21
Q

what are some issues affecting validity?

A
  • range restriction
22
Q

define meta analysis

A

combines multiple studies to determine overall validity

23
Q

what are the pros & cons of meta analysis

A

Pros: Resolves inconsistencies, finds general patterns.
Cons: Affected by publication bias (only strong results get published), study differences, and data quality.

24
Q

what is range restriction?

A

if only a specific group is tested, the validity appears lower than it really is
ex: if a study only includes top students, the entrance test might seem weak at predicting performance
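
As a hedged illustration, the standard correction for direct range restriction (Thorndike Case II) estimates what the validity coefficient would have been in the full applicant pool; the observed r and the standard deviations below are assumptions chosen only for illustration:

```python
# Correction for direct range restriction (Thorndike Case II).
# r_restricted: validity observed in the selected group.
# sd_pool / sd_selected: predictor SD in the applicant pool vs. the selected group.
# Numbers below are illustrative.
import math

def correct_range_restriction(r_restricted: float, sd_pool: float, sd_selected: float) -> float:
    u = sd_pool / sd_selected
    return (r_restricted * u) / math.sqrt(1 - r_restricted**2 + (r_restricted**2) * u**2)

# Observed r = .20 among hired top scorers; predictor SD was 10 in the pool, 5 after selection.
print(f"Corrected r = {correct_range_restriction(0.20, 10, 5):.2f}")
```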

25
what are the 3 types of range restriction?
Direct range restriction: if only high-scoring applicants are selected, we underestimate the test's true predictive power.
Indirect range restriction: if selection is based on a different variable (e.g. motivation letters instead of grades).
Natural attrition: if weak or strong performers quit early, reducing data availability.
26
define construct validity
Whether a test actually measures the concept it's supposed to measure.
- subtypes: convergent & discriminant validity
27
what is convergent validity & discriminant validity
- convergent: scores should be related to scores on other measures of the same construct
- discriminant: scores should be unrelated to scores on instruments that are not supposed to be measures of that construct (theoretically distinct)
28
how can we generalize the findings between predictor & criterion from all the different studies?
meta analysis
29
what are 2 examples of individual difference tests used in personnel psych to predict performance
- general mental ability (or cognitive ability): best predictor of job success, especially for complex jobs
- the big 5: conscientiousness is a strong predictor of job success (but too much of it -> perfectionism, rigidity); job complexity affects how much personality matters
30
why is intelligence, as measured in general mental ability (GMA), a strong predictor of performance?
higher GMA = faster learning, better problem solving
31
define face validity
Whether a test looks like it measures what it should (even if it might not).
- not technical but may affect applicants' reactions to the test (motivation, sense of fairness)
32
why is measurement of individual differences important?
- there are many individual level differences (personality, motivation, skills etc)
- we aim to describe this variability & understand it, explain it, and predict it
- measurement is a tool that helps us achieve that goal
- also important in HR
33
what is the importance of measurement of individual differences in HR?
- decision making: personnel decisions, evaluation of employees
- HR specialists need to be able to: select & use psych measurements, interpret the results, and communicate these results to others
- these decisions can affect the careers of individuals, so it is essential for HR specialists to understand the application of psych measures & judge how effective they are
34
how do we assess adequacy of measurement of individual differences?
1. Are we actually measuring what we want to measure?
2. Are we measuring this consistently?
3. Does our measure of the predictor (e.g. personality) actually predict future performance?
35
define test
any psych measurement instrument, technique or procedure that systematically measures a sample of behaviour (e.g. interview, presentation, performance test etc)
36
how do you develop a test?
1. item generation (steps 1-4)
2. pilot test (steps 5-7)
3. post-pilot activities (steps 8-10)
37
how does item generation go in test dev?
1. determine purpose (are you trying to evaluate performance? select applicants? do research?)
2. define the attribute (what's the content, so which constructs should/shouldn't be included)
3. develop a measure plan (formatting, what will the anchors be)
4. write items (include some reverse-scored items): write double what you need (if you want a 10-item scale, develop 20 items), be specific, concrete & simple, avoid negation
38
how do you pilot test in test dev?
5. pilot test with a representative sample
6. feedback from the pilot sample on test perceptions & item clarity -> content validity
7. item analysis (distractor analysis, item difficulty, item discrimination)
39
what is distractor analysis?
frequency with which each incorrect option (distractor) is chosen (should be approximately equal across distractors)
40
what is item difficulty?
number who answer correctly / total number of test takers (ideally around .5)
41
what is item discrimination?
how well does the item serve to discriminate between better vs worse performers
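A minimal sketch of classical item analysis on a 0/1-scored response matrix, covering the two indices defined in the cards above: item difficulty as the proportion answering correctly, and item discrimination as the corrected item-total correlation; the responses are invented for illustration:

```python
# Classical item analysis on a 0/1-scored response matrix:
# difficulty = proportion of test takers answering the item correctly,
# discrimination = correlation between the item and the total of the remaining
# items (corrected item-total correlation). Data is invented for illustration.
import numpy as np

responses = np.array([   # rows = test takers, columns = items (1 = correct)
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
])

difficulty = responses.mean(axis=0)   # ideally around .5
print("Item difficulty:", np.round(difficulty, 2))

for i in range(responses.shape[1]):
    rest_total = responses.sum(axis=1) - responses[:, i]
    disc = np.corrcoef(responses[:, i], rest_total)[0, 1]
    print(f"Item {i + 1} discrimination: {disc:.2f}")
```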
42
what are post pilot activities in test dev?
8. select items (good to have a normal distribution: not too hard & not too easy)
9. determine reliability & validity
10. revise & update items
43
in what 3 ways can testing be systematic?
1. content: items are systematically chosen from the behavioural domain to be measured
2. administration: standardized procedures so that each time the test is given there are the same directions, the same recording of answers, the same time limits & as few distractions as possible
3. scoring: rules are specified in advance
44
what's the point of systematic testing?
to minimize contamination on test scores
45
what are potential issues when designing a test?
- cost
- face validity
- contamination on test scores
- interpretation of the test results by examiner
46
how do you choose predictors?
- **what** should you measure?: job analysis gives you clues about which variables are related to job success (ex: negotiation skills)
- **where** & **how** do you find the measure you're looking for?: does it already exist or do I have to create it? only create when unavailable, then run a pilot and assess the items (difficulty, discrimination etc)
47
what is measurement in psychology?
the process of assigning numbers to objects or events according to rules
- goal is to quantify individual differences in traits like intelligence, personality, job performance etc
48
what are some examples of psych measures?
- cognitive ability tests
- personality inventories (like big 5)
- situational judgment tests (SJTs)
- interviews
49
what are the 4 different scales of measurement?
- nominal
- ordinal
- interval
- ratio
50
what is the nominal scale of measurement?
- categorizes data without any order
ex: male vs female, eye colour etc
51
what is the ordinal scale of measurement?
ranks data but does not indicate precise differences
- ex: ranking in a race: you know 1st place is better than 2nd but you don't know if the time difference between them is equal
52
what is the interval scale of measurement?
- equal intervals between values but no true zero (so also no "2x as much" statements)
- like temperature: the difference between 20 and 30 °C is the same as between 30 and 40 °C, but zero doesn't mean "no temperature" (there's no true zero); IQ scores
53
what is the ratio scale of measurement?
like interval (equal intervals between values) but with an absolute zero
ex: weight, height, age
54
which type of data analysis can be conducted with nominal scales?
count, mode
55
which type of data analysis can be conducted with ordinal scales?
median, percentiles, rank order correlation
56
which type of data analysis can be conducted with interval scales?
means, SDs, correlation, regression
57
which type of data analysis can be conducted with ratio scales?
all math operations, including ratios & percentages
58
what scale of measurement is often applied in psychology?
many psych traits are measured on interval scales, even though they don't have a true zero
59
what are the 7 concrete steps of selecting/creating the right measure?
1. Determine the measure's purpose: what are you trying to assess?
2. Define the attribute: is it cognitive ability, job performance, or personality?
3. Develop a measure plan: what type of questions will be used?
4. Write items: ensure the questions or tasks are clear and relevant.
5. Conduct a pilot study & traditional item analysis: test the measure before full implementation.
6. Use item response theory (IRT) to analyze the questions: identify good vs. bad items.
7. Revise & update items: remove ineffective or biased questions.
60
how do you use item response theory to conduct comprehensive item analyses?
IRT is used to evaluate individual test items based on difficulty, discrimination, and guessing
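As an illustration (the specific model is not named in the card), a widely used IRT formulation, the three-parameter logistic (3PL) model, ties these three item properties to parameters:

$$P_i(\theta) = c_i + \frac{1 - c_i}{1 + e^{-a_i(\theta - b_i)}}$$

where $b_i$ is item difficulty, $a_i$ is item discrimination, $c_i$ is the guessing parameter, and $\theta$ is the test taker's ability.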
61
define item difficulty
Determines how hard a question is.
62
define item discrimination
Measures how well a question differentiates between high- and low-ability test takers.
63
define guessing factor
Some multiple-choice items can be answered correctly just by luck, so this needs to be accounted for
64
what does generalizability theory (G theory) say?
it assesses reliability in measurement by analyzing MULTIPLE sources of error to give a more complete picture of a test's accuracy
ex: a test score might be influenced by time errors (morning vs evening test-taking conditions), rating bias (some interviewers might be stricter), item variance (some questions might be harder than others)
G theory helps separate these factors to improve measurement accuracy
65
what is the scale coarseness problem? solution?
some measurement tools lack precision because they don't offer enough response options
ex: a survey with only "agree" or "disagree"
solution: use continuous or fine-grained scales where possible
66
what are some upcoming trends in psych measurement?
changes in how we measure individual differences:
- AI in testing
- gamification
- mobile & online assessments
- new biometric & neuroscientific approaches (eye tracking, brain scans etc)
67
what is the relationship between reliability & validity?
a test must be reliable to be valid, but a reliable test is not necessarily valid
ex: a ruler that gives different lengths for the same object is unreliable; one that consistently gives wrong lengths is reliable but not valid
validity ≤ √(reliability)
low reliability places a ceiling on validity: even if a test is somewhat valid, if it has low reliability, its usefulness is limited
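A worked instance of the ceiling relationship above (the reliability value is illustrative):

$$r_{xy} \le \sqrt{r_{xx}}: \quad r_{xx} = .64 \;\Rightarrow\; r_{xy} \le \sqrt{.64} = .80$$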
68
what is the point of cross validation?
helps ensure a test remains valid across different samples
69
how do you conduct cross validation
1. conduct an initial validity study
2. apply the test in a new group & see if the results remain the same
3. if the validity coefficient drops, this is called **shrinkage** (see the sketch below)
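A minimal sketch of this procedure: estimate the validity coefficient in a derivation sample, then recompute it in a fresh holdout sample; a noticeably lower holdout coefficient illustrates shrinkage. The data and the split are simulated for illustration:

```python
# Cross-validation of a selection test: estimate the validity coefficient in a
# derivation sample, then re-check it in a new (holdout) sample.
# Data is simulated purely for illustration.
import numpy as np

rng = np.random.default_rng(0)
n = 300
test_scores = rng.normal(size=n)
performance = 0.4 * test_scores + rng.normal(size=n)

derivation, holdout = slice(0, 150), slice(150, 300)
r_derivation = np.corrcoef(test_scores[derivation], performance[derivation])[0, 1]
r_holdout = np.corrcoef(test_scores[holdout], performance[holdout])[0, 1]

# If r_holdout is noticeably lower than r_derivation, that drop is shrinkage.
print(f"Validity in derivation sample: {r_derivation:.2f}")
print(f"Validity in holdout sample:    {r_holdout:.2f}")
```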
70
why is cross validation important?
- prevents overfitting: a test working well in one study but not in real-world settings
- ensures predictive power remains stable across different samples
71
what is synthetic validation? when is it used?
using previous research & job analysis to infer test validity
used when local validation is not possible (a small organisation often lacks enough data to conduct full validation studies)
ex: instead of testing 1k salespeople, use existing data from other companies to support a test's validity