Week 3, Measurement, Key Terms Flashcards
(34 cards)
Measurement
The assignment of numbers to objects or events according to a set of rules
Indicators
In psychological measurement we do not measure constructs directly (you cannot put a finger on IQ).
Instead we measure the characteristics or properties associated with individuals.
We measure indicators (signs that point to something else).
Why not measure organizational constructs directly?
We lose specificity as we move from the micro to the macro level – direct measurement is easier at the individual level than at the organizational level
Scales of Measurement
Psychological measurement varies in precision.
Differences in precision are reflected in the types of scales on which particular characteristics are being measured.
Four levels of measurement
Nominal
Ordinal
Interval
Ratio
Nominal measurement
Lowest level of measurement
Represent differences in kind
Individuals are assigned or classified into qualitatively different categories
Merely labels
Frequently used to identify or catalog individuals and events
Ex.
SS#
Assign 1 to males and 2 to females
The classes must be mutually exclusive
Ordinal Measurement
Not only allows classification by category, but also provides an indication of magnitude
Rank ordered according to greater or lesser amounts of some dimension
If (a>b) and (b>c) then (a>c)
In top-down selection this may be all the information we need
Interval Measurement
In addition to ordinal properties, interval scales have equal intervals between scale points
Scores can be transformed in any linear fashion (aX + b, a > 0) without altering the relationships between the scores
Allows two scores from different tests to be compared directly on a common metric
Standardization
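As a sketch of why linear transformation matters, the following hypothetical example (assuming NumPy is available) standardizes raw scores from two different tests to z-scores, putting them on a common metric:

```python
import numpy as np

def standardize(scores: np.ndarray) -> np.ndarray:
    """Linear transformation to z-scores: z = (x - mean) / sd."""
    return (scores - scores.mean()) / scores.std(ddof=1)

# Hypothetical raw scores from two tests with different metrics
test_a = np.array([80.0, 90.0, 100.0, 110.0, 120.0])  # mean 100
test_b = np.array([40.0, 45.0, 50.0, 55.0, 60.0])     # mean 50

print(standardize(test_a))
print(standardize(test_b))
```

Because these two hypothetical score sets have the same relative spacing, they map to identical z-scores, so a score from test A can be compared directly with a score from test B.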
Ratio Measurement
Highest level of measurement
In addition to equality, transitivity, and additivity, the ratio scale has a natural or absolute zero point.
Height, distance, & weight are all ratio scales
These scales are rarely seen in psychological measurement
Psychological Measurement
Principally concerned with individual differences in traits, attitudes, or behaviors.
Trait – a descriptive label applied to a group of interrelated behaviors
Based on standardized samples of individual behavior we infer the position or standing of the individual on the trait in question
Systematic Nature of Measurement
TEST - a systematic procedure for measuring a sample of behavior.
Procedures are systematic in order to minimize the effects of unwanted contaminants (error or bias)
What is the difference between a personality “test” and a test of cognitive ability?
Found in:
-Mental Measurements Yearbook
-Publishers
-Third parties (e.g., Rocket-Hire)
-Authors* (Taking the Measure of Work)
Classifying tests
Content
Tests may be classified in terms of the task inherent in the scale
Ex. Cognitive ability tests
Achievement
Aptitude
vs.
Non-cognitive instruments (or inventories)
Tests may also be classified in terms of the efficiency with which they can be administered.
E.g.
Individual vs. Group
Speed vs. Power – both designed to prevent perfect scores (we always want variability on measurement tools)
Speed test – more items than you can answer in the time allotted
Power test – take as long as you want to answer the items; scored by number correct. With no time limit, someone could take 24 hours because they want to do their best, so unlimited time can introduce too much unwanted variance in the scores.
Likert Scales
When I am stressed, sometimes I get high.
A. strongly disagree
B. disagree
C. agree
D. strongly agree
Self-report measure
Behavioral Observation
The other end of the continuum
Best predictor of future behavior…
Issue of Obtrusiveness:
-Observer effect (often loosely analogized to the Heisenberg uncertainty principle)
–When people see that you’re paying attention to them, their behavior will change
-Hawthorne effect
–Turned the heat up – performance went up; turned the lights up – performance went up; turned the heat down – performance still went up. Why? Because people knew their performance was being observed.
Can be cumbersome with a large N
To capture behavior you must be there when it occurs
Naturalistic observation
Situational Judgment Test
The purpose is to identify a respondent’s intentions
Presents the person with a series of relevant incidents, and asks what he/she would do in that situation
The typical question is “ what would you do if …”
Often used to assess intelligence in a more “real world” fashion
Can assess a variety of constructs
Theory Based
Goal setting theory
Intentions or goals are the immediate precursor of a person’s behavior
Added benefit of content validity
Attitudes → Intentions → Behavior
Assessment Centers
Simulate the situation in which the individual will be performing
Predicts how successful that person will be in the actual situation
Exercises vary in fidelity and immersion
Assessment Center Examples
AT & T developed and operated the Advanced Management Potential Assessment Program (AMPA) for itself and the Bell System Operating Companies. The program was used by all the Bell System companies from 1979 through 1983.
Dr. Rich’s example of a study he conducted in the early 2000s: he and his team immersed executives in situations all over Baltimore to test their adaptability. For example, they were told to talk to a man about a problem; when they reached him, they realized he was deaf. Some people simply gave up because they could not use sign language; others grabbed a napkin and a pen so they could communicate with him.
The CEO could then see who was needed at the company and who was not – for example, the person who gave up when they couldn’t figure out a situation.
Psychometrics
RELIABILITY
If measurement procedures are to be useful, they must produce dependable scores
Consistency
Freedom from unsystematic (random) errors of measurement
Methods to assess reliability
Test Re-test
Parallel (alternate) forms
Internal consistency
-Split half
–Splitting a test in half – the test can be split any way
-Kuder-Richardson 20 (KR-20)
–For tests with right and wrong answers
-Alpha
–The average of all possible split-half reliabilities
-Omega
Test re-test is a good way to assess reliability.
The downside to giving someone the same test twice is the practice effect – they will do better the second time because they have already taken it once.
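The internal-consistency idea can be illustrated with a short sketch (hypothetical Likert responses, assuming NumPy is available) that computes coefficient alpha from an item-score matrix:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Coefficient alpha for an (n_respondents x k_items) score matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)      # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)  # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical data: 5 respondents answering 4 Likert items
scores = np.array([
    [4, 4, 3, 4],
    [2, 2, 2, 1],
    [3, 3, 3, 3],
    [4, 3, 4, 4],
    [1, 2, 1, 2],
])
print(round(cronbach_alpha(scores), 2))  # → 0.94
```

Items that rise and fall together across respondents inflate the total-score variance relative to the summed item variances, which pushes alpha toward 1.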
Issues Related To Reliability
No fixed value indicates what is acceptable
Reliabilities often range from .70 – .90
Range of scores (need variability)
-Greater variability in scores yields higher reliability estimates
Sample size & number of items
-The more observations you have, the higher the reliability
Reliability & Validity
Theoretically it would be possible to develop a perfectly reliable measure whose scores were completely uncorrelated with any other variable.
This measure would have no practical value.
It would be highly reliable but would have no validity.
Limit on validity
Validity is reduced by the unreliability in a set of measures
Ex. performance appraisal
-Typical reliabilities are low (around .60)
-This sets a ceiling on possible criterion-related validity
-We can statistically correct for this type of unreliability
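That statistical fix is the classical correction for attenuation. A minimal sketch with hypothetical numbers (an observed validity of .40 and a criterion reliability of .60):

```python
import math

def correct_for_attenuation(r_xy: float, r_yy: float, r_xx: float = 1.0) -> float:
    """Estimate the correlation if the measures were perfectly reliable.

    r_xy: observed test-criterion correlation
    r_yy: reliability of the criterion (e.g., performance ratings)
    r_xx: reliability of the predictor (leave at 1.0 to correct
          for criterion unreliability only)
    """
    return r_xy / math.sqrt(r_xx * r_yy)

# Hypothetical: observed validity .40, criterion reliability .60
print(round(correct_for_attenuation(0.40, 0.60), 2))  # → 0.52
```

The corrected value estimates what the validity would be if the criterion (e.g., a performance appraisal) were measured without error; it is used for interpretation, not as a substitute for the observed coefficient.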
What is Validity ?
The extent to which a measurement procedure actually measures what it is designed to measure
Degree to which evidence and theory support the interpretation of test scores for their intended purpose
The process of gathering and evaluating data to assess this is called validation.
Really concerned with two issues: 1. What a test measures 2. How well it measures it.
Validity
Test scores are typically used to draw inferences about applicant behavior in situations beyond the testing environment
Test user must be able to justify the inferences drawn by having a cogent rationale or empirical support linking the test score to the inferred outcome
Nobody cares about the test score itself – what they care about are the inferences (and their consequences) drawn from it