PSYC4022 Testing and Assessment Week Two Psychometrics Flashcards Preview

Flashcards in PSYC4022 Testing and Assessment Week Two Psychometrics Deck (60):
1

Test

“An objective & standardized procedure for measuring a psychological construct using a sample of behavior” (Guion, 1998)

2

Raw Score

Unmodified account of test performance.

3

Standardisation

involves administering a test to a representative sample for the purpose of establishing norms.

4

Norms

Show the distribution of results for the sample from a certain population. Raw scores from the sample can be transformed to standard scores to enable the development of norms.

5

Normative Sample

Is the group of people whose performance on a particular test is analysed for reference in evaluating the performance of individual test-takers.

6

Criterion

is a standard on which a judgement or decision is based.

7

Criterion Referenced Evaluation

A method of evaluation and a way of deriving meaning from test scores by evaluating an individual's score with reference to a set standard. For example, a certain level GPA to gain entry into Honours

8

Distribution

a set of test scores

9

Frequency Distribution

Where you tally the number of score occurrences

10

Grouped Frequency Distribution

Where intervals are counted
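The two tallying approaches above (cards 9 and 10) can be sketched in Python; the scores and the interval width of 5 are illustrative, not from the cards:

```python
from collections import Counter

scores = [42, 46, 46, 47, 50, 50, 50, 53, 55, 58]

# Simple frequency distribution: tally each distinct score.
freq = Counter(scores)

# Grouped frequency distribution: tally scores falling in class
# intervals of width 5 (each score is mapped to its interval's floor).
grouped = Counter((s // 5) * 5 for s in scores)

print(freq[50])     # 3 (the score 50 occurs three times)
print(grouped[45])  # 3 (scores 46, 46, 47 fall in the 45-49 interval)
```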

11

Measures of Central Tendency

E.g. Mean, Median, Mode

12

Measures of Variability/ Dispersion

Range, variance, and standard deviation (SD, the square root of the variance)
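Both sets of statistics (cards 11 and 12) are available in Python's standard library; the scores below are illustrative:

```python
import statistics

scores = [2, 3, 3, 5, 7]

# Measures of central tendency
print(statistics.mean(scores))       # mean = 4
print(statistics.median(scores))     # median = 3
print(statistics.mode(scores))       # mode = 3

# Measures of variability / dispersion
print(max(scores) - min(scores))     # range = 5
print(statistics.pvariance(scores))  # population variance = 3.2
print(statistics.pstdev(scores))     # SD = sqrt(3.2) ≈ 1.79
```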

13

How good is a score of 46.5?

You'd need to know;
1. What was the test? How did developers define the concepts
2. What type of distribution/ test norm referenced or criterion referenced?
3. Norms for test (if norm referenced) Cut off score if criterion referenced.
4. Was it a raw score or a scaled score
5. Mean for population; Standard deviation
6. Reliability - not a chance happening
7. How it is scored? What it is out of, what scale?
8. Content and Construct Validity
9. Qualifications/ Experience of the tester
10. What population was sampled

14

The Normal Curve is also called the...

Gaussian Curve

15

6 Types of Distribution Curves are...

1. Normal
2. Bi-Modal
3. Positively Skewed
4. Negatively Skewed
5. J-shaped
6. Rectangular

16

What percentage of scores lie within 1, 2, 3, 4, 5 and 6 SDs of the mean?

1. 68%
2. 95%
3. 99.7%
4. 99.99%
5. ~100%
6. ~100%
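These within-±k-SD proportions can be checked against the standard normal distribution with Python's standard library:

```python
from statistics import NormalDist

# Proportion of a normal distribution within ±k SDs of the mean:
# P(-k < Z < k) = CDF(k) - CDF(-k) for the standard normal.
nd = NormalDist()  # mean 0, SD 1
for k in range(1, 7):
    pct = (nd.cdf(k) - nd.cdf(-k)) * 100
    print(f"within ±{k} SD: {pct:.2f}%")
    # ±1 ≈ 68.27%, ±2 ≈ 95.45%, ±3 ≈ 99.73%
```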

17

A test is standardised when it has (3 Things);

1. Supervised Administration
2. Consistent conditions, instructions, wording and timing
3. been administered to representative group from a target population (random, stratified or convenience)

18

Raw scores from the sample can be transformed into standard scores to enable the development of...

norms

19

The Ravens has a standard norm and a....

managerial norm

20

Which accounting firm developed the managerial norm for the Ravens?

PricewaterhouseCoopers (PwC)

21

Standard Scores

Raw scores that have been converted from one scale to another, the latter scale having some arbitrary set mean and SD.

22

Z-score ** EXAM

A standard score with the mean set at zero and the SD set at one, resulting from the conversion of a raw score into a number indicating how many SD units the raw score is above or below the mean.

23

T-score ** EXAM

A standard score with the mean set at 50 and the SD set at 10 (a "50 plus or minus 10" scale), ranging from 5 SDs below the mean to 5 SDs above. A raw score 5 SDs below the mean corresponds to a T-score of 0, a score at the mean to 50, and one 5 SDs above to 100.

24

From Z Scores to T Scores Formula

T = z(10) + 50
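A minimal sketch of the conversion in Python:

```python
def z_to_t(z):
    """Convert a z-score (mean 0, SD 1) to a T-score (mean 50, SD 10)."""
    return z * 10 + 50

# A raw score one SD above the mean (z = 1) maps to a T-score of 60.
print(z_to_t(1.0))   # 60.0
print(z_to_t(-2.0))  # 30.0
```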

25

The Mean and SD of IQ

M = 100, SD = 15

26

What is the formula to work out a Z score from an IQ?

Z = (X-M)/SD
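A minimal sketch in Python, using the IQ mean and SD from card 25:

```python
def z_score(x, mean, sd):
    """z = (X - M) / SD: how many SD units a raw score lies from the mean."""
    return (x - mean) / sd

# IQ uses M = 100, SD = 15, so an IQ of 130 is two SDs above the mean.
print(z_score(130, 100, 15))  # 2.0
print(z_score(85, 100, 15))   # -1.0
```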

27

What is the formula for measurement error?

X = T + e (observed score = true score + error)
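The formula can be illustrated with a small simulation: if errors are random with mean zero, the average of many observed scores converges on the true score. All values below are illustrative:

```python
import random

random.seed(1)

# Classical test theory: observed score X = true score T + error e.
# With a true score of 50 and normally distributed error (mean 0, SD 4),
# the mean of many observed scores approaches the true score.
T = 50
observed = [T + random.gauss(0, 4) for _ in range(10_000)]
print(round(sum(observed) / len(observed)))  # 50
```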

28

There are a number of sources of error. Name 3

1. Test construction
2. Test administration
3. Test scoring and Interpretation

29

Psychometrics

Area of Psychology concerned with the quality of tests/scales and items designed to measure psychological constructs

30

Validity

The degree to which evidence supports the interpretation of test scores for their intended purposes.

31

Construct Validity

Adequacy of the operational definition of variables. Does the test measure what it purports to measure?

32

There are two criteria used to judge the quality of tests/scales

1. Reliability
2. Validity

33

Nomological Network

Network of research evidence surrounding and supporting the validity of a construct. It supports, but does not prove, the construct.

34

Convergent Validity

If a measure has convergent validity, it should correlate with questionnaires that measure the same and/ or related constructs.

35

Discriminant/ Divergent Validity

If a measure has discriminant validity, it should not correlate with questionnaires that measure different constructs or unrelated constructs.

36

Reliability and Validity are usually measured with...

correlation coefficients

37

Exploratory Factor Analysis

Exploratory Factor Analysis is used to identify the factor structure of a construct, generally during test development.

38

There are 4 types of Validity. They are;

1. Face Validity
2. Content Validity
3. Population Validity
4. Criterion Validity
4a. Concurrent
4b. Predictive

39

Face Validity

The extent to which a test appears, on its face, to measure what it purports to measure.

40

Content Validity

The extent to which a test adequately samples the domain of content it was designed to measure.

41

Population Validity

...is a type of external validity which describes how well the sample used can be extrapolated to a population as a whole.

42

Criterion Validity (also criterion-related validity).
a. Concurrent and
b. Predictive

A judgement regarding how adequately a score or index on a test or other tool of measurement can be used to infer an individual's most probable standing on some measure of interest (or criterion). a. Concurrent - at the time of testing. b. Predictive - at some future point in time.

43

Reliability

The consistency or stability of a measure of behaviour

44

What is the relationship between standard error of measurement and reliability?

The larger the SEM, the smaller the reliability.

45

What is the formula for SEM

SEM = SD × √(1 − r)
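A minimal sketch in Python (the SD and reliability values are illustrative):

```python
import math

def sem(sd, reliability):
    """Standard error of measurement: SEM = SD * sqrt(1 - r)."""
    return sd * math.sqrt(1 - reliability)

# With IQ's SD of 15 and a reliability of .91:
# SEM = 15 * sqrt(0.09) = 4.5
print(round(sem(15, 0.91), 2))  # 4.5
```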

46

There are 4 types of Reliability. What are they?

1. Test-Retest Reliability (.7 or .8 is acceptable)
2. Inter-rater Reliability
3. Internal Consistency (coefficient alpha)
4. Parallel Forms

47

Standard Error of Measurement (SEM)

Provides an indication of the dispersion of the measurement errors when you are trying to estimate true scores from observed scores.

48

Varimax Rotation and Structural Equation Modelling are two methods of....

Factor Analysis

49

Can a test have good reliability but poor validity?

Yes; e.g., using foot length as a measure of IQ would be highly reliable (consistent) but not valid.

50

Can a test have good validity but poor reliability?

Generally not, though a single administration could happen to be accurate by chance.

51

Can a test have good validity but poor utility?

Yes, e.g. drug patches on juvenile offenders.

52

Can a test have good utility but poor reliability?

Probably not, that would be difficult

53

Can a test have good utility but poor validity?

Yes; e.g., the Myers-Briggs has questionable validity but is considered useful for onboarding.

54

There are 7 stages of a meta-analysis. What are they?

1. Articulate a research question
2. Identify a relevant population
3. Capture all the available evidence (studies) (see Cochrane Library and PRISMA framework)
4. Form meaningful measure of effect size
5. Pool studies in an appropriate manner and formulate a balance-of-evidence conclusion
6. Look for sources of variation in effect size
7. Look for possible bias in available evidence

55

Exercise 1 on Measuring Ourselves. What did we learn?

1. Definitions of concepts are important
2. Require an agreed method of measuring
3. Require time to administer instruments
4. Validity and reliability are important
5. Correct choice of instruments
6. Rely on openness - willingness for self-disclosure
7. Need to consider ethics at all times
8. Variety of sources of information
9. Cultural differences
10. Can we really measure all concepts? Consider the implications

56

Sources of Error - Test Construction (2 things)

1. To what extent do items adequately sample the construct being assessed?
2. To what extent are items clearly worded? Any ambiguities?

57

Sources of Error - Test Administration (5 things)

1. Distractions in the Environment
2. Emotional State of test taker
3. Attitude/Disposition of test administrator
4. Interpersonal Issues between test taker and administrator
5. Malfunctions of Instruments

58

Name 1 possible error with online testing

1. Internet speed is an unknown variable; in a timed test, you lose precision.

59

The WAIS-IV was stratified on a US sample according to the following;

1. Geographic region: 4 regions by census reports, including MW, NE, SW…
2. Race - White, African Americans, Hispanics, Asians and other racial groups.
3. Age - 13 age groups represented (not all of equal range or size)
4. Sex - under 65, equal numbers; over 65, proportional to the population
5. Educational Level - 5 Educational Levels based on number of years completed

60

What are the benefits of assessment? (4 Things)

1. Diagnosis
2. Differentiate
3. Severity
4. Monitor/ Malingering