Flashcards in PSYC4022 Testing and Assessment Week Two Psychometrics Deck (60):
"An objective & standardized procedure for measuring a psychological construct using a sample of behavior” (Guion, 1998)
Unmodified account of test performance.
involves administering a test to a representative sample for the purpose of establishing norms.
Show the distribution of results for the sample from a certain population. Raw scores from the sample can be transformed to standard scores to enable the development of norms.
Is the group of people whose performance on a particular test is analysed for reference in evaluating the performance of individual test-takers.
is a standard on which a judgement or decision is based.
Criterion Referenced Evaluation
A method of evaluation and a way of deriving meaning from test scores by evaluating an individual's score with reference to a set standard. For example, requiring a certain GPA to gain entry into Honours.
a set of test scores
Where you tally the number of score occurrences
Grouped Frequency Distribution
Where scores are tallied within class intervals
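The two tally methods above can be sketched with Python's standard library. The score list is invented for illustration, and the 5-point interval width is an arbitrary choice:

```python
from collections import Counter

# Hypothetical raw test scores (illustrative only, not course data)
scores = [46, 52, 47, 61, 55, 46, 58, 63, 49, 55, 52, 60]

# Simple frequency distribution: tally each score's occurrences
simple = Counter(scores)

# Grouped frequency distribution: tally scores within 5-point class intervals
def interval(score, width=5):
    lower = (score // width) * width
    return (lower, lower + width - 1)

grouped = Counter(interval(s) for s in scores)

print(sorted(grouped.items()))
```

Sorting the grouped tallies gives the class intervals in order, ready for plotting or norm tables.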
Measures of Central Tendency
E.g. Mean, Median, Mode
Measures of Variability/ Dispersion
Range, Variance, SD (the square root of the variance)
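Both sets of statistics can be sketched with Python's statistics module. The score list is invented, and the use of population (rather than sample) formulas is an arbitrary choice:

```python
import statistics

scores = [44, 46, 46, 50, 53, 55, 61]  # illustrative data

# Central tendency
mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)

# Variability/dispersion: range, variance, and SD (square root of variance)
score_range = max(scores) - min(scores)
variance = statistics.pvariance(scores)   # population variance
sd = statistics.pstdev(scores)            # population SD = sqrt(variance)

print(mean, median, mode, score_range, round(sd, 2))
```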
How good is a score of 46.5?
You'd need to know:
1. What was the test? How did the developers define the concepts?
2. What type of distribution/ test norm referenced or criterion referenced?
3. Norms for test (if norm referenced) Cut off score if criterion referenced.
4. Was it a raw score or a scaled score
5. Mean for population; Standard deviation
6. Reliability - not a chance happening
7. How is it scored? What is it out of; what scale?
8. Content and Construct Validity
9. Qualifications/ Experience of the tester
10. What population was sampled
The Normal Curve is also called the...
6 Types of Distribution Curves are...
3. Positively Skewed
4. Negatively Skewed
What percentage of scores lie within 1, 2, 3, 4, 5 and 6 SDs of the mean?
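The card above leaves the percentages to recall; they can be reproduced for any number of SDs from the standard normal curve using Python's standard library (this is the general normal-curve result, not course data):

```python
from statistics import NormalDist

nd = NormalDist()  # standard normal: mean 0, SD 1

def pct_within(k):
    """Percentage of scores within k SDs of the mean of a normal curve."""
    return (nd.cdf(k) - nd.cdf(-k)) * 100

for k in range(1, 7):
    print(f"within {k} SD: {pct_within(k):.2f}%")
```

The familiar 68-95-99.7 rule falls out of the first three iterations.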
A test is standardised when it has (3 Things);
1. Supervised Administration
2. Consistent conditions, instructions, wording and timing
3. been administered to representative group from a target population (random, stratified or convenience)
Raw scores from the sample can be transformed into standard scores to enable the development of...
The Ravens has a standard norm and a....
Which accounting firm developed the managerial norm for the Ravens?
PricewaterhouseCoopers
Raw scores that have been converted from one scale to another, the latter scale having some arbitrary set mean and SD.
Z-score ** EXAM
(mean set at zero, and SD set at one), which results from the conversion of a raw score into a number indicating how many SD units the raw score is below or above the mean.
T-score ** EXAM
(mean set at 50 and SD set at 10 (50 plus or minus ten scale)). A scale that ranges from 5 standard deviations below the mean to 5 standard deviations above the mean. A raw score that falls at 5 SDs below the mean would be 0, and one that falls at the mean would be 50, and 5 above would be 100.
From Z Scores to T Scores Formula
T = z(10) + 50
The Mean and SD of IQ
M = 100, SD = 15
What is the formula to work out a Z score from an IQ?
Z = (X-M)/SD
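The two formulas above (Z = (X-M)/SD, then T = z(10) + 50) can be chained in a short sketch; the IQ of 115 is just an illustrative input on the usual M = 100, SD = 15 scale:

```python
def z_score(x, mean, sd):
    """Z = (X - M) / SD: how many SD units X lies above/below the mean."""
    return (x - mean) / sd

def t_score(z):
    """T = z(10) + 50: rescale z onto the mean-50, SD-10 scale."""
    return z * 10 + 50

# An IQ of 115 sits one SD above the mean
z = z_score(115, 100, 15)
t = t_score(z)
print(z, t)  # 1.0 60.0
```

A raw score at the mean always maps to z = 0 and T = 50, which is a handy sanity check.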
What is the formula for measurement error?
X = T + e
There are a number of sources of error. Name 3
1. Test construction
2. Test administration
3. Test scoring and Interpretation
Area of Psychology concerned with the quality of tests/scales and items designed to measure psychological constructs
The degree to which evidence supports the interpretation of test scores for their intended purposes.
Adequacy of operational definition of variables. Does the test measure what it purports to measure?
There are two criteria used to judge the quality of tests/scales
Network of research evidence surrounding and supporting the validity of a construct. It supports, doesn't PROVE a construct.
If a measure has convergent validity, it should correlate with questionnaires that measure the same and/ or related constructs.
Discriminant/ Divergent Validity
If a measure has discriminant validity, it should not correlate with questionnaires that measure different constructs or unrelated constructs.
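In practice both validities are checked by correlating scores across questionnaires: high correlations with same-construct measures (convergent), low correlations with unrelated ones (discriminant). A minimal sketch, with invented scores and hypothetical questionnaire names:

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Illustrative scores on three hypothetical questionnaires
anxiety_a = [10, 12, 15, 18, 20]   # anxiety measure A
anxiety_b = [11, 13, 14, 19, 21]   # anxiety measure B (same construct)
shoe_size = [7, 10, 8, 9, 8]       # unrelated construct

print(round(pearson(anxiety_a, anxiety_b), 2))  # high -> convergent evidence
print(round(pearson(anxiety_a, shoe_size), 2))  # near zero -> discriminant evidence
```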
Reliability and Validity are usually measured with...
Exploratory Factor Analysis
Exploratory Factor Analysis is used to identify the factor structure of a construct, generally during test development.
There are 4 types of Validity. They are:
1. Face Validity
2. Content Validity
3. Construct Validity
4. Criterion Validity
The face-appearance that a test measures what it purports to measure.
The extent to which a test measure what it was designed to measure.
...is a type of external validity which describes how well the sample used can be extrapolated to a population as a whole.
Criterion Validity (also criterion-related validity).
a. Concurrent and
A judgement regarding how adequately a score or index on a test or other tool of measurement can be used to infer an individual's most probable standing on some measure of interest (or criterion). a. Concurrent - at the time of testing. b. Predictive - at some future point in time.
The consistency or stability of a measure of behaviour
What is the relationship between standard error of measurement and reliability?
The larger the SEM, the smaller the reliability.
What is the formula for SEM
SEM = SD * sqrt(1 - r)
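The SEM formula can be sketched directly. The SD of 15 and the reliabilities of .91 and .75 are illustrative values, chosen to show the inverse relationship noted above:

```python
from math import sqrt

def sem(sd, reliability):
    """Standard error of measurement: SEM = SD * sqrt(1 - r)."""
    return sd * sqrt(1 - reliability)

# IQ-style scale (SD = 15) with a reliability of .91
print(sem(15, 0.91))

# A lower reliability produces a larger SEM
print(sem(15, 0.75))
```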
There are 4 types of Reliability. What are they?
1. Test-Retest Reliability (.7 or .8 is acceptable)
2. Inter-rater Reliability
3. Internal Consistency (coefficient alpha)
4. Parallel Forms
Standard Error of Measurement (SEM)
Provides an indication of the dispersion of the measurement errors when you are trying to estimate true scores from observed scores.
Varimax Rotation and Structural Equation Modelling are two methods of....
Can a test have good reliability but poor validity?
Yes, e.g. using foot length as an IQ test: highly consistent, but it measures the wrong construct.
Can a test have good validity but poor reliability?
Generally not, but you could have a one-off
Can a test have good validity but poor utility?
Yes, e.g. drug patches on juvenile offenders.
Can a test have good utility but poor reliability?
Probably not, that would be difficult
Can a test have good utility but poor validity?
Yes, e.g. the Myers-Briggs has questionable validity but is useful for onboarding.
There are 7 stages of a meta-analysis. What are they?
1. Articulate a research question
2. Identify a relevant population
3. Capture all the available evidence (studies) (see Cochrane Library and PRISMA framework)
4. Form meaningful measure of effect size
5. Pool studies in an appropriate manner and formulate a balance-of-evidence conclusion
6. Look for sources of variation in effect size
7. Look for possible bias in available evidence
Exercise 1 on Measuring Ourselves. What did we learn?
1. Definitions of concepts are important.
2. Require agreed method of measuring
3. Require time to administer instruments
4. Validity and Reliability are important
5. Correct choice of instruments
6. Rely on openness - willingness for self disclosure
7. Need to consider ethics at all times
8. Variety of sources of information
9. Cultural Differences
10. Can we really measure all concepts? - implications
Sources of Error - Test Construction (2 things)
1. To what extent do items adequately sample the construct being assessed?
2. To what extent are items clearly worded? Any ambiguities?
Sources of Error - Test Administration (5 things)
1. Distractions in the Environment
2. Emotional State of test taker
3. Attitude/Disposition of test administrator
4. Interpersonal Issues between test taker and administrator
5. Malfunctions of Instruments
Name 1 possible error with online testing
1. Internet speed is an unknown variable. In a timed test, you lose precision.
The WAIS-IV was stratified on a US sample according to the following;
1. Geographic region: 4 Regions by census reports including MW, NE, SW….
2. Race - White, African Americans, Hispanics, Asians and other racial groups.
3. Age - 13 age groups represented (not all of equal range or size)
4. Sex - Under 65: equal numbers; over 65: proportional to population
5. Educational Level - 5 Educational Levels based on number of years completed