PSYC4022 Testing and Assessment: Week Two Psychometrics Flashcards



"An objective & standardized procedure for measuring a psychological construct using a sample of behavior” (Guion, 1998)


Raw Score

Unmodified account of test performance.



Standardization

Involves administering a test to a representative sample for the purpose of establishing norms.



Norms

Show the distribution of results for the sample from a certain population. Raw scores from the sample can be transformed to standard scores to enable the development of norms.


Normative Sample

The group of people whose performance on a particular test is analysed for reference in evaluating the performance of individual test takers.



Criterion

A standard on which a judgement or decision is based.


Criterion Referenced Evaluation

A method of evaluation and a way of deriving meaning from test scores by evaluating an individual's score with reference to a set standard. For example, a certain level GPA to gain entry into Honours



Distribution

A set of test scores.


Frequency Distribution

Where you tally the number of times each score occurs.


Grouped Frequency Distribution

Where scores are grouped into class intervals and the frequency of each interval is counted.


Measures of Central Tendency

E.g. Mean, Median, Mode


Measures of Variability/ Dispersion

Range, Variance, Standard Deviation (the square root of the variance)


How good is a score of 46.5?

You'd need to know:
1. What was the test? How did the developers define the concepts?
2. What type of distribution? Is the test norm referenced or criterion referenced?
3. Norms for the test (if norm referenced); cut-off score (if criterion referenced)
4. Was it a raw score or a scaled score?
5. Mean and standard deviation for the population
6. Reliability - not a chance happening
7. How is it scored? What is it out of? What scale?
8. Content and Construct Validity
9. Qualifications/Experience of the tester
10. What population was sampled?


The Normal Curve is also called the...

Gaussian Curve


6 Types of Distribution Curves are...

1. Normal
2. Bi-Modal
3. Positively Skewed
4. Negatively Skewed
5. J-shaped
6. Rectangular


What percentage of scores lie within 1, 2, 3, 4, 5 and 6 SDs of the mean?

1. ~68%
2. ~95%
3. ~99.7%
4. ~99.99%
5. ~99.9999%
6. ~100% (effectively all scores)
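As a quick check, the proportion of a normal distribution lying within k SDs of the mean can be computed from the error function in Python's standard library (the function name `pct_within` is just for illustration):

```python
from math import erf, sqrt

def pct_within(k):
    """Percentage of scores within k SDs of the mean under a normal curve."""
    return erf(k / sqrt(2)) * 100

for k in range(1, 7):
    print(f"within {k} SD: {pct_within(k):.4f}%")
```

This reproduces the familiar 68-95-99.7 rule for 1, 2 and 3 SDs.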


A test is standardised when it has (3 things):

1. Supervised administration
2. Consistent conditions, instructions, wording and timing
3. Been administered to a representative group from the target population (random, stratified or convenience sampling)


Raw scores from the sample can be transformed into standard scores to enable the development of...

norms


The Raven's (Progressive Matrices) has a standard norm and a...

managerial norm


Which accounting firm developed the managerial norm for the Ravens?

PricewaterhouseCoopers (PwC)


Standard Scores

Raw scores that have been converted from one scale to another, the latter scale having some arbitrary set mean and SD.


Z-score ** EXAM

A standard score (mean set at zero, SD set at one) that results from converting a raw score into a number indicating how many SD units the raw score lies above or below the mean.


T-score ** EXAM

A standard score with the mean set at 50 and the SD set at 10 (a "50 plus or minus 10" scale). The scale ranges from 5 standard deviations below the mean to 5 above: a raw score 5 SDs below the mean converts to 0, one at the mean to 50, and one 5 SDs above to 100.


From Z Scores to T Scores Formula

T = z(10) + 50
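A minimal sketch of the z-to-T conversion (the function name `t_score` is illustrative):

```python
def t_score(z):
    """Convert a z-score to a T-score: T = z(10) + 50."""
    return z * 10 + 50

print(t_score(0))     # a score at the mean -> 50
print(t_score(-1.5))  # 1.5 SDs below the mean -> 35
print(t_score(2))     # 2 SDs above the mean -> 70
```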


The Mean and SD of IQ

M = 100, SD = 15


What is the formula to work out a Z score from an IQ?

Z = (X-M)/SD
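The same formula as a short sketch, using the IQ scale's M = 100 and SD = 15 from the previous card (the function name `z_from_raw` is illustrative):

```python
def z_from_raw(x, mean, sd):
    """Z = (X - M) / SD"""
    return (x - mean) / sd

# IQ scale: M = 100, SD = 15
print(z_from_raw(130, 100, 15))  # -> 2.0 (two SDs above the mean)
print(z_from_raw(85, 100, 15))   # -> -1.0 (one SD below the mean)
```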


What is the formula for measurement error?

X = T + e (observed score = true score + error)
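The true-score model can be illustrated with a small simulation: if error e is random with mean zero, observed scores X = T + e average out to the true score T over many measurements (the variable names and the error SD of 5 are assumptions for illustration):

```python
import random

random.seed(1)
true_score = 50

# Simulate observed scores X = T + e, with e drawn from N(0, 5)
observed = [true_score + random.gauss(0, 5) for _ in range(10_000)]
mean_observed = sum(observed) / len(observed)
print(round(mean_observed, 2))  # close to the true score of 50
```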


There are a number of sources of error. Name 3

1. Test construction
2. Test administration
3. Test scoring and Interpretation



Area of Psychology concerned with the quality of tests/scales and items designed to measure psychological constructs



The degree to which evidence supports the interpretation of test scores for their intended purposes.


Construct Validity

Adequacy of the operational definition of variables. Does the test measure what it purports to measure?


There are two criteria used to judge the quality of tests/scales

1. Reliability
2. Validity


Nomological Network

Network of research evidence surrounding and supporting the validity of a construct. It supports, doesn't PROVE a construct.


Convergent Validity

If a measure has convergent validity, it should correlate with questionnaires that measure the same and/ or related constructs.


Discriminant/ Divergent Validity

If a measure has discriminant validity, it should not correlate with questionnaires that measure different or unrelated constructs.


Reliability and Validity are usually measured with...

a correlation coefficient


Exploratory Factor Analysis

Exploratory Factor Analysis is used to identify the factor structure of a construct, generally during test development.


There are 4 types of Validity. They are;

1. Face Validity
2. Content Validity
3. Population Validity
4. Criterion Validity


Face Validity

The face-appearance that a test measures what it purports to measure.


Content Validity

The extent to which a test adequately samples the content domain it was designed to measure.


Population Validity

A type of external validity which describes how well results from the sample used can be extrapolated to the population as a whole.


Criterion Validity (also criterion-related validity).
a. Concurrent and
b. Predictive

A judgement regarding how adequately a score or index on a test or other tool of measurement can be used to infer an individual's most probable standing on some measure of interest (or criterion). a. Concurrent - at the time of testing. b. Predictive - at some future point in time.



The consistency or stability of a measure of behaviour


What is the relationship between standard error of measurement and reliability?

The larger the SEM, the smaller the reliability.


What is the formula for SEM

SEM = SD × √(1 − r), where r is the reliability coefficient
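A minimal sketch of the SEM formula, using the IQ scale (SD = 15) and an assumed reliability of r = .91 for illustration:

```python
from math import sqrt

def sem(sd, reliability):
    """Standard error of measurement: SEM = SD * sqrt(1 - r)."""
    return sd * sqrt(1 - reliability)

# e.g. IQ scale (SD = 15) with reliability r = .91
print(sem(15, 0.91))  # ~4.5

# Higher reliability -> smaller SEM, as the previous card states
print(sem(15, 0.99))  # ~1.5
```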


There are 4 types of Reliability. What are they?

1. Test-Retest Reliability (.7 or .8 is acceptable)
2. Inter-rater Reliability
3. Internal Consistency (coefficient alpha)
4. Parallel Forms


Standard Error of Measurement (SEM)

Provides an indication of the dispersion of the measurement errors when you are trying to estimate true scores from observed scores.


Varimax Rotation and Structural Equation Modelling are two methods of....

Factor Analysis


Can a test have good reliability but poor validity?

Yes, e.g. using foot size as an IQ test: highly consistent (reliable), but not measuring intelligence (not valid).


Can a test have good validity but poor reliability?

Generally not, but you could have a one-off


Can a test have good validity but poor utility?

Yes, e.g. drug patches on juvenile offenders.


Can a test have good utility but poor reliability?

Probably not; that would be difficult.


Can a test have good utility but poor validity?

Yes, e.g. the Myers-Briggs (MBTI) has questionable validity but is considered useful for onboarding.


There are 7 stages of a meta-analysis. What are they?

1. Articulate a research question
2. Identify a relevant population
3. Capture all the available evidence (studies) (see Cochrane Library and PRISMA framework)
4. Form meaningful measure of effect size
5. Pool studies in an appropriate manner and formulate a balance-of-evidence conclusion
6. Look for sources of variation in effect size
7. Look for possible bias in available evidence


Exercise 1 on Measuring Ourselves. What did we learn?

1. Definitions of concepts are important
2. Require an agreed method of measuring
3. Require time to administer instruments
4. Validity and reliability are important
5. Correct choice of instruments
6. Rely on openness - willingness for self-disclosure
7. Need to consider ethics at all times
8. Variety of sources of information
9. Cultural differences
10. Can we really measure all concepts? Implications?


Sources of Error - Test Construction (2 things)

1. To what extent do items adequately sample the construct being assessed?
2. To what extent are items clearly worded? Any ambiguities?


Sources of Error - Test Administration (5 things)

1. Distractions in the Environment
2. Emotional State of test taker
3. Attitude/Disposition of test administrator
4. Interpersonal Issues between test taker and administrator
5. Malfunctions of Instruments


Name 1 possible error with online testing

1. Internet speed is an unknown variable. In a timed test, you lose precision.


The WAIS-IV was stratified on a US sample according to the following;

1. Geographic region: 4 Regions by census reports including MW, NE, SW….
2. Race - White, African Americans, Hispanics, Asians and other racial groups.
3. Age - 13 age groups represented (not all of equal range or size)
4. Sex - Under 65: equal numbers; over 65: proportional to population
5. Educational Level - 5 Educational Levels based on number of years completed


What are the benefits of assessment? (4 Things)

1. Diagnosis
2. Differentiate
3. Severity
4. Monitor/ Malingering