Assessment and Testing Flashcards

1
Q

Measurement

A

process of determining dimensions of an attribute or trait

2
Q

assessment

A

processes and procedures for collecting info about human behavior

eg: tests, inventories, interview data, observation, rating scales

3
Q

appraisal

A

implies going beyond measurement to making judgments about human attributes and behaviors

  • used interchangeably with evaluation
4
Q

Interpretation

A

making a statement about the meaning or usefulness of measurement data based on the counselor’s knowledge or judgment

5
Q

measures of central tendency

A

the center of a distribution of scores can be described using:

mean: the arithmetic average, symbolized by M or X̄ (X with a horizontal line on top)
median: middle score
mode: most frequent score

these 3 fall in the same place when the distribution is symmetrical with a single peak (e.g., a normal curve)
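
A minimal Python sketch of the three measures, using made-up scores that are deliberately symmetrical around 70 (the data are an assumption for illustration, not from the card):

```python
from statistics import mean, median, mode

# Hypothetical scores, symmetrical around 70 (illustration only).
scores = [50, 60, 70, 70, 70, 80, 90]

m = mean(scores)     # arithmetic average
md = median(scores)  # middle score once the list is sorted
mo = mode(scores)    # most frequent score

print(m, md, mo)  # all three land on 70 for this symmetrical set
```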

6
Q

skew

A

refers to the degree to which a distribution of scores is asymmetrical (lopsided) rather than normally distributed

mode=top of the curve (the peak)
median=middle score
mean=pulled in the direction of the extreme scores (which is represented by the tail)

a negative skew has the tail pointing to the left; a positive skew has the tail pointing to the right (think of how values increase/decrease on the horizontal axis)

Skewed left = negative
Skewed right = positive
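
To see the mean get pulled toward the tail, here is a small Python sketch with a made-up positively skewed set of scores (the numbers are assumptions chosen for illustration):

```python
from statistics import mean, median, mode

# Hypothetical positively skewed scores: most are low, a few are very high.
scores = [1, 2, 2, 2, 3, 4, 5, 9, 17]

mo = mode(scores)    # the peak of the curve
md = median(scores)  # the middle score
m = mean(scores)     # pulled toward the high tail

print(mo, md, m)  # mode < median < mean in a positive skew
```

Reversing the picture (a few very low scores) would flip the order to mean < median < mode, which is the negative skew.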

7
Q

Standard Deviation

A
  • describes the variability w/in a distribution of scores
  • is essentially the typical distance of scores from the mean (computed as the square root of the average squared deviation)
  • an excellent measure of dispersion of scores
  • use ‘SD’ to signify the standard deviation of a sample
  • use σ (lowercase sigma; think cursive ‘o’ without the first part) for population variability
8
Q

variance is measured how?

A

the SD squared (SD^2)

[NOT the square root of SD]
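
A short Python sketch of the SD/variance relationship, using hypothetical scores. It relies on the standard library's `statistics` module, where `pstdev`/`pvariance` treat the data as a whole population (σ) and `stdev`/`variance` treat it as a sample (SD):

```python
from statistics import pstdev, pvariance, stdev, variance

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores, mean = 5

pop_var = pvariance(scores)  # population variance (sigma squared)
pop_sd = pstdev(scores)      # population SD (sigma)
samp_var = variance(scores)  # sample variance (divides by n - 1)
samp_sd = stdev(scores)      # sample SD

print(pop_var, pop_sd)       # variance is the SD squared
print(samp_var, samp_sd)
```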

9
Q

normal bell curve

A

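divides the range of scores into 6 parts of equal width (each 1 standard deviation wide), 3 above the mean and 3 below, such that:

34% & 34%=68% of scores fall within 1 standard deviation of the mean

adding 13.5% & 13.5% brings the total to 95% within 2 standard deviations

adding about 2% & 2% brings the total to about 99% within 3 standard deviations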
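
The percentages above are rounded. A minimal Python check against the standard normal curve, using `statistics.NormalDist` from the standard library (the helper name `share_within` is made up for this sketch):

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of scores within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(share_within(k) * 100, 1))
```

This gives roughly 68.3%, 95.4%, and 99.7%, which the card rounds to 68%, 95%, and 99%.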

10
Q

standardized scores

A

are scores converted from the individual’s raw score that allow for comparison bw individuals and bw the same individual’s various scores (e.g., vocab and math)

they basically represent the person’s distance from the mean in terms of standard deviations

two most commonly used standardized scores:

z-score: mean=0, standard deviation=1, range is typically -3 to +3 [the ‘z’=zero]

T score: mean=50, standard deviation=10. Transforming to this score eliminates negative numbers (unlike the z-score)
[the “T”=Ten]
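
A small Python sketch of both conversions. The test mean of 70 and SD of 10 are assumptions for illustration, and the function names are made up:

```python
def z_score(raw, mean, sd):
    """Distance from the mean in standard-deviation units."""
    return (raw - mean) / sd

def t_score(z):
    """Rescale a z-score to mean 50, SD 10."""
    return 50 + 10 * z

# Hypothetical test: mean 70, SD 10.
for raw in (55, 70, 85):
    z = z_score(raw, 70, 10)
    print(raw, z, t_score(z))
```

A raw score of 55 comes out as z = -1.5 but T = 35, which shows how the T score avoids negative numbers.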

11
Q

two most commonly used standardized scores

A

z-score: mean=0, standard deviation=1, range is typically -3 to +3 [the ‘z’=zero]

T score: mean=50, standard deviation=10. Transforming to this score eliminates negative numbers (unlike the z-score)
[the “T”=Ten]

12
Q

stanine

A

from STAndard NINE
converts distribution into 9 parts, with a mean of 5 (the middle) and SD of ~2

13
Q

correlation coefficient

A
  • used to estimate reliability
  • ranges from -1.00 (perfect negative correlation) to +1.00 (perfect positive correlation)
    -shows the reln’p bw two sets of #s, but nothing about cause and effect
  • if the reliability coefficient is high (>=.70), then it’s reliable
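
As an illustration, a Pearson correlation can be computed by hand in a few lines of Python. The two score lists are hypothetical (the same five people tested twice), and `pearson_r` is a name made up for this sketch:

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical scores for the same five people on two administrations.
first = [80, 65, 90, 72, 58]
second = [82, 60, 88, 75, 61]

r = pearson_r(first, second)
print(round(r, 2), "reliable" if r >= 0.70 else "not reliable")
```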
14
Q

bivariate vs multivariate

A
  • correlation bw 2 variables=bivariate
  • correlation bw 3 or more variables=multivariate
15
Q

reliability

A
  • a necessary psychometric property of tests and measures
  • consistency of a test or measure
  • the extent to which a measure is free from error (if the instrument has little error, it’s reliable)
16
Q

stability

A
  • test-retest reliability using same instrument
  • 2 weeks is sufficient bw test administrations
17
Q

Equivalence

A
  • alternate forms of the same test administered to same group
  • how comparable the forms are, plus intervening events and experiences, will influence reliability
18
Q

spearman-brown formula

A

may use this to estimate how reliable a split-half test would be had you not split it in two (i.e., the reliability of the full-length test)
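
A minimal Python sketch of the formula in its standard textbook form (the card does not write it out, so treat the exact expression as supplied here): predicted reliability = n·r / (1 + (n − 1)·r), where r is the observed reliability and n is how many times longer the test becomes. For a split-half correlation, n = 2. The half-test correlation of .60 is a made-up example.

```python
def spearman_brown(r, length_factor=2):
    """Predicted reliability when a test is made length_factor times longer."""
    return (length_factor * r) / (1 + (length_factor - 1) * r)

r_half = 0.60                    # hypothetical correlation bw the two halves
r_full = spearman_brown(r_half)  # estimate for the whole, unsplit test

print(round(r_full, 2))
```

The full-test estimate (.75) is higher than the half-test correlation, which fits the general rule that longer tests tend to be more reliable.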

19
Q

other name for the spearman-brown formula?

A

prophecy formula

20
Q

internal consistency

A
  • this is a split-half method where the test is divided into halves and the correlation bw these halves is calculated
  • determined by measuring inter-item consistency. The more homogeneous the items, the more reliable the test
21
Q

what are the different formulas used to determine internal consistency and when are they used?

A

Kuder-Richardson formula is used if the test has dichotomous items (e.g., true/false, yes/no)

Cronbach alpha coefficient is applied for nondichotomous items (e.g., multiple choice, essay)
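
A rough Python sketch of Cronbach's alpha in its usual textbook form, alpha = k/(k − 1) × (1 − sum of item variances / variance of total scores). The formula is not spelled out on the card and the answer data are invented, so take this as an illustration under those assumptions. With 0/1 items and population variances, the same calculation matches KR-20.

```python
from statistics import pvariance

def cronbach_alpha(rows):
    """rows: one list of item scores per respondent."""
    k = len(rows[0])                                   # number of items
    items = list(zip(*rows))                           # regroup scores by item
    item_var = sum(pvariance(col) for col in items)    # sum of item variances
    total_var = pvariance([sum(row) for row in rows])  # variance of total scores
    return (k / (k - 1)) * (1 - item_var / total_var)

# Hypothetical 0/1 answers: 5 respondents x 4 items.
answers = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]

alpha = cronbach_alpha(answers)
print(round(alpha, 2))
```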

22
Q

what is used to determine reliability?

A

correlation coefficient
- if the reliability coefficient is high (>=.70), then it’s reliable

23
Q

Kuder-Richardson formula

A

Denoted as KR-20 or KR-21

Kuder-Richardson formula is used to measure internal consistency if the test has dichotomous items (e.g., true/false, yes/no)

24
Q

Cronbach alpha coefficient

A

Used to measure internal consistency
- is applied for nondichotomous items (e.g., multiple choice, essay)

25
True vs error variance, coefficient of determination, coefficient of nondetermination
If 2 tests are given and the correlation bw them is .9 (for example), then the true variance measured in common is .9^2=.81 (81%). - coefficient of determination=degree of common variance (81%) - coefficient of nondetermination=the unique variance, not in common (19%=error variance)
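
The arithmetic from this card as a tiny Python sketch (variable names are made up for illustration):

```python
r = 0.9                               # correlation bw the two tests

determination = r ** 2                # shared ("true") variance
nondetermination = 1 - determination  # unique ("error") variance

print(round(determination, 2), round(nondetermination, 2))  # 0.81 0.19
```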
26
Standard error of measurement (SEM)
Another measure of reliability, helpful in interpreting test scores - helps determine the range in which a person’s score probably falls - aka “confidence band” or “confidence limits” - Ex: A person scores a 92 on a test, and the SEM is 5. On a normal curve, 1 SEM above is 97 and 1 SEM below is 87, which is the range where his score will fall 68% of the time. 95% of the time his score will fall bw 82 and 102 (2 SEMs away from his score of 92).
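
A minimal Python sketch of the confidence band from the example. The SEM formula, SEM = SD × sqrt(1 − reliability), is the standard textbook one and is not stated on the card; the SD of 10 and reliability of .75 are assumptions picked so that the SEM comes out to 5.

```python
from math import sqrt

def sem_from_reliability(sd, reliability):
    # Standard textbook formula (not from the card): SEM = SD * sqrt(1 - r)
    return sd * sqrt(1 - reliability)

def confidence_band(score, sem, n_sems=1):
    """Range of n_sems standard errors on either side of the obtained score."""
    return (score - n_sems * sem, score + n_sems * sem)

print(sem_from_reliability(10, 0.75))  # hypothetical SD and reliability
print(confidence_band(92, 5, 1))       # (87, 97), about 68% of the time
print(confidence_band(92, 5, 2))       # (82, 102), about 95% of the time
```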
27
Validity
Degree to which a test measures what it’s supposed to measure
28
Face validity
The instrument looks valid (i.e., a math test has math items)
29
Content validity
The test contains items drawn from the domain of items which could be included. Ex: two professors of Psychology 101 devise an exam that covers the content that they both teach
30
Predictive validity
Predictions made by the test are confirmed by later behavior. Ex: scores on the GRE predict later grade point average
31
Concurrent validity
The results of the test are compared with other tests’ results or behaviors at/about the same time. Ex: scores on an art aptitude test may be compared to grades already assigned to students in an art class
32
Construct validity
Refers to the extent that a test measures a hypothetical construct such as anxiety, creativity, etc.
33
Convergent validation
Occurs when there’s high correlation between the construct under investigation and other, related constructs
34
Discriminant validation
Occurs when there is no significant correlation between the construct under investigation and other, unrelated constructs
35
A test may be reliable but not valid, but a valid test is reliable. True or false?
T
36
Another name for True variance
Coefficient of determination
37
Another name for error variance
Coefficient of non determination
38
Power based tests
No time limits or very generous ones (e.g., NCE)
39
Speed based test
Timed, emphasis on speed and accuracy (e.g., intelligence, ability, aptitude)
40
Norm referenced assessment
Comparing individuals to others
41
Criterion referenced assessment
Comparing an individual’s performance to some predetermined criteria, such as the NCE cutoff score
42
Ipsatively interpreted assessment
Comparing results on a test within the individual, not to others. May also compare an individual’s score on one test with another.
43
A maximal performance test may generate a person’s ______ on __________
Best performance; aptitude or achievement test
44
A typical performance may occur on what types of test?
An interest or personality test
45
what is meant by regression toward the mean and what is another name for it?
aka "statistical regression". If a person scores very high (>=85%) or very low (<=15%) on a pretest, then they will probably score closer to the mean on the posttest. Why? Because of the error resulting from chance, personal, and environmental factors.
46
defn of intelligence
ability to think in abstract terms; to learn. also called general or cognitive ability
47
Intelligence tests
- Stanford-Binet Intelligence Scales - Wechsler Adult Intelligence Scale (WAIS-IV) - Wechsler Intelligence Scale for Children (WISC-V) - Cognitive Abilities Test
48
specialized (intelligence) ability tests
- Kaufman Assessment Battery for Children-II - System of Multicultural Pluralistic Assessment (SOMPA): measures medical, social systems, and pluralistic factors - SMAG: SAT (Scholastic Aptitude Test), Miller Analogies Test (MAT), ACT (American College Test), Graduate Record Exam (GRE)
49
what do achievement tests measure and example of them
measures what a person has already learned/experienced - used diagnostically (K-12 achievement tests) - National Assessment of Educational Progress (NAEP) is a national measure of academic performance [there's the national level and the state "levels" below] - California Achievement Tests - Iowa Test of Basic Skills - Stanford Achievement Test
50
specialized achievement tests
- general education development (GED) - college board's advanced placement program - college-level examination program (CLEP)
51
what do aptitude tests measure?
also called ability tests, aptitude measures one’s potential to learn; used to predict future performance
52
examples of aptitude tests
- Differential Aptitude Test (DAT) - O*NET Ability Profiler (formerly General Aptitude Test Battery, GATB) - ASVAB - Career Ability Placement Survey (CAPS)
53
Projective tests. what do they do and examples
present an unstructured task and the person projects processes, needs, anxieties... ex: - Rorschach - TAT (Thematic Apperception Test) - Rotter Incomplete Sentences Blank - Draw-a-Person test
54
types of personality inventories
- Minnesota Multiphasic Personality Inventory - California Psychological Inventory (CPI) - NEO Personality Inventory - Beck Depression Inventory - MBTI
55
examples of Interest tests
- Strong Interest Inventory - Self-Directed Search - Career Assessment Inventory - Campbell Interest and Skill Survey - O*NET Interest Profiler
56
Intrusive vs unobtrusive measurement
Intrusive: reactive measurement where the person being measured knows they're being watched and this knowledge affects their performance. - Ex: questionnaires, surveys, observation. Unobtrusive: nonreactive measurement where data is collected without the person's awareness or without changing the natural course of events. - Ex: reviewing existing records or unobtrusive observation
57
Semantic differential
refers to a scale that asks respondents to report where they are on a dichotomous range bw two affective polar opposites. - ex: Very bad ____ _____ _____ Very good - adjective pairs usually have an evaluative, potency, and activity underlying structure that serves as a secondary analysis
58
Observation as appraisal technique
- Observing samples from a stream of behavior - may use schedules, coding systems, or record forms
59
Case/historical study
Analytical and/or diagnostic investigation of a person or group
60
Rating scales
Used to report the degree to which an attribute or characteristic is present
61
Sociometry
- Used to identify isolates, rejectees, or stars (popular ppl) - Requires revealing personal feelings about each other
62
Social desirability
Tendency for test takers to respond in ways that are perceived to be socially desirable
63
Grade and age equivalent scores
Scores on an achievement test are often reported as grade equivalent scores. I.e., if a student correctly answers the same number of items on a test as the average sixth grader, then he has a grade equivalent score of six. Age equivalent scores work similarly. For age, an individual's score is compared to the average score of others at the same age. So if a 7.5-year-old student earned a score equivalent to that of the average eight-year-old, then 8 would be his age equivalent score.
64
Percentile ranks
Indicate the percent of people who scored above or below. So if I score in the 35th percentile, then I scored higher than 34% of the people and 65% scored higher than me.
65
Assessment resources
Mental Measurements Yearbook - from Buros Institute - has critical reviews of tests and lists published references - 20th edition published in 2017. Tests in Print IX (2016) - has information on approximately 3000 testing instruments. A Comprehensive Guide to Career Assessment - published by National Career Development Association - edited by Kevin Stoltz and Susan Barclay in 2019
66
Association for Assessment and Research in Counseling
One of 18 divisions of the ACA
67
The ________ indicates the percentage of individuals who answered each item correctly
difficulty index. A 0.5 difficulty index (also called a difficulty value) would mean that 50% of those tested answered the question correctly, while 50% did not. For example, you set the difficulty index to .25 in order to ferret out the lower 75% you do not wish to admit into a program. Item difficulty ranges from 0.0 to 1.0. The higher the index number, the easier the question is to answer.
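
A quick Python sketch of the calculation, with made-up response data (1 = correct, 0 = incorrect) and a hypothetical function name:

```python
def difficulty_index(responses):
    """Proportion of test takers who answered the item correctly."""
    return sum(responses) / len(responses)

# Hypothetical answers from four test takers on two items.
item_a = [1, 1, 0, 0]
item_b = [1, 0, 0, 0]

print(difficulty_index(item_a))  # half answered correctly
print(difficulty_index(item_b))  # a quarter answered correctly, so a harder item
```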
68
vertical vs horizontal testing
A vertical test is the same subject test given to different levels or ages; the test would have versions for various age brackets or levels of education (e.g., a math achievement test for preschoolers and a version for middle school children). A horizontal test covers material across various subjects; it measures various factors (e.g., math and science) during the same testing procedure.
69
what is a test battery?
In a test battery, several measures are used to produce results that could be more accurate than those derived from merely using a single source (e.g., a horizontal test).
70
What does Inter-rater testing assess? What are other names for it? When is it used?
Assesses reliability in qualitative research. Other names: inter-observer reliability, scorer reliability. Used with subjective tests to determine whether the scoring criteria are such that two people who grade or assess the responses will produce roughly the same score - reliability is calculated by correlating the scores given by several readers
71
What is the acceptable reliability coefficient for job selection?
>=.8
72
Francis Galton
Felt intelligence was genetically determined - Coined “eugenics” - Said intelligence was normally distributed like height or weight, and it was primarily genetic
73
Differences between fluid intelligence and crystallized intelligence and who made it famous?
Raymond Cattell. Fluid intelligence is flexible; crystallized is rigid and does not change or adapt
74
Charles Spearman
Felt intelligence was best explained via a two-factor theory: a general ability (g) and a specific ability (s)
75
JP Guilford
Isolated 120 factors that added up to intelligence; known for his thoughts on convergent and divergent thinking
76
The Stanford-Binet IQ test is standardized, T/F?
T
77
Simon and Binet pioneered the first IQ test to_____
identify children with an intellectual disability so they could be taught separately
78
What happens to a test’s reliability if you increase or decrease its length?
Increasing a test’s length raises reliability, shortening a test’s length decreases reliability
79
Describe variance
How much the data points differ from the mean. However, like SD, it only describes one variable. Think of it this way: SD=square root of variance
80
Who are the audiences for: WAIS, WPPSI, WISC
WAIS: adults - WPPSI: preschoolers (PP = pee pee) - WISC: children (C = children)
81
Normative format
Means of testing to compare individuals to others
82
What three things make up a real experiment?
Control group, experimental group, and random assignment. Note: a quasi-experiment is missing one of these
83
Example of convenience sampling
People exiting a grocery store and you ask them if they want to taste test a Coke or Pepsi
84
What is an acceptable probability level for tests in social science and what does probability level mean?
.05. It means there is a 5% chance that the differences between the control and experimental groups are due to chance.