Flashcards in PSYC4022 Testing and Assessment Week Two Psychometrics Deck (60):

1

## Test

### "An objective & standardized procedure for measuring a psychological construct using a sample of behavior" (Guion, 1998)

2

## Raw Score

### Unmodified account of test performance.

3

## Standardisation

### involves administering a test to a representative sample for the purpose of establishing norms.

4

## Norms

### Show the distribution of results for the sample from a certain population. Raw scores from the sample can be transformed to standard scores to enable the development of norms.

5

## Normative Sample

### Is the group of people whose performance on a particular test is analysed for reference in evaluating the performance of individual test takers.

6

## Criterion

### is a standard on which a judgement or decision is based.

7

## Criterion Referenced Evaluation

### A method of evaluation and a way of deriving meaning from test scores by evaluating an individual's score with reference to a set standard. For example, a certain level GPA to gain entry into Honours

8

## Distribution

### a set of test scores

9

## Frequency Distribution

### A tally of the number of times each score occurs.

10

## Grouped Frequency Distribution

### A frequency distribution in which scores are tallied within class intervals rather than individually.

11

## Measures of Central Tendency

### E.g. Mean, Median, Mode

12

## Measures of Variability/ Dispersion

### Range, Variance, and Standard Deviation (the SD is the square root of the variance)

13

## How good is a score of 46.5?

###
You'd need to know;

1. What was the test? How did developers define the concepts

2. What type of distribution? Is the test norm-referenced or criterion-referenced?

3. Norms for the test if norm-referenced; cut-off score if criterion-referenced.

4. Was it a raw score or a scaled score?

5. Mean for population; Standard deviation

6. Reliability - not a chance happening

7. How is it scored? What is it out of; what scale?

8. Content and Construct Validity

9. Qualifications/ Experience of the tester

10. What population was sampled

14

## The Normal Curve is also called the...

### Gaussian Curve

15

## 6 Types of Distribution Curves are...

###
1. Normal

2. Bi-Modal

3. Positively Skewed

4. Negatively Skewed

5. J-shaped

6. Rectangular

16

## What percentage of scores lie within 1, 2, 3, 4, 5 and 6 SDs of the mean?

###
1. ≈ 68%

2. ≈ 95%

3. ≈ 99.7%

4. ≈ 99.99%

5. ≈ 99.9999%

6. ≈ 100%
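
These coverage figures can be checked directly from the standard normal CDF; a minimal sketch using only the Python standard library:

```python
import math

def coverage_within(k: float) -> float:
    """Proportion of a normal distribution lying within +/- k SDs of the mean."""
    # P(|Z| <= k) for standard normal Z, via the error function
    return math.erf(k / math.sqrt(2))

for k in range(1, 7):
    print(f"within +/-{k} SD: {coverage_within(k) * 100:.4f}%")
```

Running this reproduces the familiar 68/95/99.7 rule for the first three SDs.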

17

## A test is standardised when it has (3 Things);

###
1. Supervised Administration

2. Consistent conditions, instructions, wording and timing

3. been administered to representative group from a target population (random, stratified or convenience)

18

## Raw scores from the sample can be transformed into standard scores to enable the development of...

### norms

19

## The Raven's Progressive Matrices has a standard norm and a...

### managerial norm

20

## Which accounting firm developed the managerial norm for the Ravens?

### PricewaterhouseCoopers (PwC)

21

## Standard Scores

### Raw scores that have been converted from one scale to another, the latter scale having some arbitrary set mean and SD.

22

## Z-score ** EXAM

### A standard score (mean set at zero, SD set at one) that results from the conversion of a raw score into a number indicating how many SD units the raw score is below or above the mean.

23

## T-score ** EXAM

### A standard score scale with the mean set at 50 and the SD set at 10 (a "50 plus or minus 10" scale), ranging from 5 SDs below the mean to 5 SDs above it. A raw score that falls 5 SDs below the mean would be a T-score of 0, one at the mean would be 50, and one 5 SDs above would be 100.

24

## From Z Scores to T Scores Formula

### T = z(10) + 50
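
A minimal sketch of this conversion in Python (the example z-scores are hypothetical):

```python
def z_to_t(z: float) -> float:
    """Convert a z-score (M = 0, SD = 1) to a T-score (M = 50, SD = 10)."""
    return z * 10 + 50

print(z_to_t(-5))  # 5 SDs below the mean -> T = 0
print(z_to_t(0))   # at the mean -> T = 50
print(z_to_t(5))   # 5 SDs above the mean -> T = 100
```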

25

## The Mean and SD of IQ

### M = 100, SD = 15

26

## What is the formula to work out a Z score from an IQ?

### Z = (X-M)/SD
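
The same formula as a sketch on the IQ scale (M = 100, SD = 15); the example scores are hypothetical:

```python
def z_score(x: float, mean: float = 100, sd: float = 15) -> float:
    """How many SDs a raw score x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

print(z_score(130))  # 2.0: an IQ of 130 is 2 SDs above the mean
print(z_score(85))   # -1.0: 1 SD below the mean
```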

27

## What is the formula for measurement error?

### X = T + e
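
A toy simulation (all numbers hypothetical) illustrating X = T + e: with random errors centred on zero, the mean of many observed scores approaches the true score.

```python
import random

random.seed(1)
true_score = 46.5                                   # hypothetical true score T
errors = [random.gauss(0, 2) for _ in range(1000)]  # e ~ Normal(0, SD = 2)
observed = [true_score + e for e in errors]         # X = T + e

# Errors average out across repeated measurements
mean_observed = sum(observed) / len(observed)
print(round(mean_observed, 2))
```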

28

## There are a number of sources of error. Name 3

###
1. Test construction

2. Test administration

3. Test scoring and Interpretation

29

## Psychometrics

### Area of Psychology concerned with the quality of tests/scales and items designed to measure psychological constructs

30

## Validity

### The degree to which evidence supports the interpretation of test scores for their intended purposes.

31

## Construct Validity

### Adequacy of the operational definition of variables. Does the test measure what it purports to measure?

32

## There are two criteria used to judge the quality of tests/scales

###
1. Reliability

2. Validity

33

## Nomological Network

### Network of research evidence surrounding and supporting the validity of a construct. It supports, doesn't PROVE a construct.

34

## Convergent Validity

### If a measure has convergent validity, it should correlate with questionnaires that measure the same and/ or related constructs.

35

## Discriminant/ Divergent Validity

### If a measure has discriminant validity, it should not correlate with questionnaires that measure different constructs or unrelated constructs.

36

## Reliability and Validity are usually measured with...

### a correlation coefficient
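
For example, test-retest reliability is the Pearson correlation between two administrations of the same test; a self-contained sketch with hypothetical scores:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

time1 = [10, 12, 14, 16, 18]  # hypothetical first administration
time2 = [11, 12, 15, 15, 19]  # hypothetical retest
print(round(pearson_r(time1, time2), 2))  # about .96: acceptable test-retest reliability
```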

37

## Exploratory Factor Analysis

### Exploratory Factor Analysis is used to identify the factor structure of a construct, generally during test development.

38

## There are 4 types of Validity. They are;

###
1. Face Validity

2. Content Validity

3. Population Validity

4. Criterion Validity

4a. Concurrent

4b. Predictive

39

## Face Validity

### The appearance, on the face of it, that a test measures what it purports to measure.

40

## Content Validity

### The extent to which a test's items adequately sample the domain the test was designed to measure.

41

## Population Validity

### ...is a type of external validity which describes how well the sample used can be extrapolated to a population as a whole.

42

##
Criterion Validity (also criterion-related validity).

a. Concurrent and

b. Predictive

### A judgement regarding how adequately a score or index on a test or other tool of measurement can be used to infer an individual's most probable standing on some measure of interest (or criterion). a. Concurrent - at the time of testing. b. Predictive - at some future point in time.

43

## Reliability

### The consistency or stability of a measure of behaviour

44

## What is the relationship between standard error of measurement and reliability?

### The larger the SEM, the smaller the reliability.

45

## What is the formula for SEM

### SEM = SD × √(1 − r), where r is the reliability coefficient
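
A sketch of this formula with hypothetical values (SD = 15, reliabilities of .90 and .50), also illustrating the previous card's point that a larger SEM goes with smaller reliability:

```python
import math

def sem(sd: float, reliability: float) -> float:
    """Standard error of measurement from the test SD and reliability r."""
    return sd * math.sqrt(1 - reliability)

print(sem(15, 0.90))  # smaller SEM when reliability is high
print(sem(15, 0.50))  # larger SEM when reliability is low
print(sem(15, 1.00))  # a perfectly reliable test has SEM = 0
```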

46

## There are 4 types of Reliability. What are they?

###
1. Test-Retest Reliability (.7 or .8 is acceptable)

2. Inter-rater Reliability

3. Internal Consistency (coefficient alpha)

4. Parallel Forms

47

## Standard Error of Measurement (SEM)

### Provides an indication of the dispersion of the measurement errors when you are trying to estimate true scores from observed scores.

48

## Varimax Rotation and Structural Equation Modelling are two methods of....

### Factor Analysis

49

## Can a test have good reliability but poor validity?

### Yes. E.g., using foot length as an IQ test: the measurements would be highly consistent (reliable) but would not measure intelligence (not valid).

50

## Can a test have good validity but poor reliability?

### Generally not; reliability sets an upper bound on validity, though a one-off result could happen to be accurate.

51

## Can a test have good validity but poor utility?

### Yes, e.g. drug patches on juvenile offenders.

52

## Can a test have good utility but poor reliability?

### Probably not; that would be difficult.

53

## Can a test have good utility but poor validity?

### Yes. E.g., the Myers-Briggs has questionable validity but is considered useful for onboarding.

54

## There are 7 stages of a meta-analysis. What are they?

###
1. Articulate a research question

2. Identify a relevant population

3. Capture all the available evidence (studies) (see Cochrane Library and PRISMA framework)

4. Form meaningful measure of effect size

5. Pool studies in an appropriate manner and formulate a balance-of-evidence conclusion

6. Look for sources of variation in effect size

7. Look for possible bias in available evidence

55

## Exercise 1 on Measuring Ourselves. What did we learn?

###
1. Definitions of concepts are important.

2. Require agreed method of measuring

3. Require time to administer instruments

4. Validity and Reliability are important

5. Correct choice of instruments

6. Rely on openness - willingness for self disclosure

7. Need to consider ethics at all times

8. Variety of sources of information

9. Cultural Differences

10. Can we really measure all concepts? Consider the implications.

56

## Sources of Error - Test Construction (2 things)

###
1. To what extent do items adequately sample the construct being assessed?

2. To what extent are items clearly worded? Any ambiguities?

57

## Sources of Error - Test Administration (5 things)

###
1. Distractions in the Environment

2. Emotional State of test taker

3. Attitude/Disposition of test administrator

4. Interpersonal Issues between test taker and administrator

5. Malfunctions of Instruments

58

## Name 1 possible error with online testing

### 1. Internet speed is an unknown variable. In a timed test, you lose precision.

59

## The WAIS-IV was stratified on a US sample according to the following;

###
1. Geographic region: 4 Regions by census reports including MW, NE, SW….

2. Race - White, African Americans, Hispanics, Asians and other racial groups.

3. Age - 13 age groups represented (not all of equal range or size)

4. Sex - Under 65: equal numbers; over 65: proportional to the population

5. Educational Level - 5 Educational Levels based on number of years completed

60