testing and measurement 2 Flashcards

1
Q

6 Steps to Test Development

A

1) defining purpose
2) preliminary design issues
3) item prep
4) item analysis
5) standardizing & research
6) prep of final product

2
Q

Step one of test development

A

A statement of purpose, usually one simple sentence

- includes the characteristic being measured and the target group

3
Q

Preliminary Design Issues

A

Step two:
Mode of administration, length, item format, number of scores, score reports, administrator training and background research

4
Q

Mode of Administration

A

Group or Individual

5
Q

Item Format

A

multiple choice, true/false, agree or disagree, or constructed by the responder (written answers)

6
Q

Number of Scores

A

Related to length, how many scores

7
Q

Score Reports

A

computer generated, hand written? total score, norms, subgroups

8
Q

Administrator Training

A

Extensive professional training to administer, score and interpret? How will that be provided? Or no training?

9
Q

Background Research

A

standard literature review on the construct being studied, plus a study of the clinicians who would use the test

10
Q

Anatomy of a Test Item

A

Stimulus, Response Format (Conditions Governing Response), Scoring Procedures

11
Q

Stimulus

A

the question being asked

12
Q

Response Format

A

how can the person respond? Multiple choice, T/F, or constructed (meaning any way you want)

13
Q

Constructed Response

A

The person taking the test responds in any way they choose: written responses, free response

14
Q

Conditions Governing the Response

A

what influences response, time limit, can the administrator ask for clarification, answer sheet or writing etc

15
Q

Scoring Procedures

A

Partial credit, correct/incorrect, constructed response

16
Q

Two Types of Test Items

A

Selected-Response Test Items, Constructed Test Items

17
Q

Selected-Response Test Items

A

multiple choice, forced choice, likert format, true/false items

18
Q

Scoring Selected-Response Items

A

correct/incorrect, sometimes using weighted questions

19
Q

Constructed Response Example Items

A

Essay Test, Performance Assessment, Portfolio

20
Q

Scoring Constructed-Response Items

A

need to have inter-rater reliability, and conceptualizing a scheme for scoring
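
One way to check inter-rater reliability is to compare two raters' scores on the same responses. The sketch below is only an illustration: the choice of percent agreement and Cohen's kappa as indices, the function names, and the essay scores are all assumptions, not part of the original card.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Share of responses on which two raters gave the same score."""
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Agreement between two raters, corrected for chance agreement."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of each rater's marginal proportions per score.
    expected = sum(
        counts_a[score] * counts_b[score] for score in counts_a
    ) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical essay scores (1-3 scale) from two raters.
rater_a = [3, 2, 3, 1, 2, 3]
rater_b = [3, 2, 2, 1, 2, 3]
print(round(percent_agreement(rater_a, rater_b), 2))  # 0.83
print(round(cohens_kappa(rater_a, rater_b), 2))       # 0.74
```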

21
Q

Holistic Score

A

scoring constructed response items by the rater giving them one whole score

22
Q

Analytic Scoring

A

constructed response item scoring where the rater assesses different dimensions of the test (and they might even be rated by different people)

23
Q

Point System

A

Point system of scoring Constructed Response Items, awarding points for certain predetermined aspects of things

24
Q

Automated Scoring of Constructed Response Items

A

Using sophisticated computers to judge free responses by simulating human judgement

25
Suggestions for Writing Selected-Response
Extensive List: but keep it simple, get to the point, don't give away the answer
26
Suggestions for Writing Constructed-Response
Task Should be clear, specific about scoring system when item made, use a sufficient number of items
27
Pros of Selected Vs. Constructed
Scoring reliability, takes less time to get more information, the scoring is more efficient
28
Pros of Constructed Vs. Selected
easier to understand how the test taker thinks; can reveal individual differences (oddities that wouldn't come up in selected response)
29
Item Analysis
involves item tryout, statistical analysis and item selection (figuring out which items are 'good' or 'bad')
30
Item Tryout
two stages, formal and informal
31
Formal Item Tryout
administering test items to samples of target population
32
Informal Item Tryout
very small groups of the population asked what they think about the items, or think aloud as they complete them
33
Item Difficulty
percent of examinees who answer an item correctly
34
P-Values
the item difficulty levels are often called this, meaning the p (percentage) who got it right
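
As a rough illustration of p-values, the sketch below computes the proportion of examinees answering each item correctly. The 0/1 response layout, the function name `item_difficulty`, and the tryout data are assumptions made for the example.

```python
def item_difficulty(responses):
    """Return the proportion of examinees answering each item correctly.

    `responses` is a list of examinee rows; each row holds 1 (correct)
    or 0 (incorrect) for every item. This data layout is an assumption.
    """
    n_examinees = len(responses)
    n_items = len(responses[0])
    return [
        sum(row[i] for row in responses) / n_examinees
        for i in range(n_items)
    ]


# Hypothetical tryout data: 4 examinees, 3 items.
tryout = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 1, 0],
]
p_values = item_difficulty(tryout)
print(p_values)  # [0.75, 0.75, 0.25]
```

Here the third item (p = .25) is the hardest: only one of four examinees got it right.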
35
Item Discrimination
Item's ability to differentiate statistically between groups of examinees
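
One common way to express item discrimination is the index D: the p-value in a high-scoring group minus the p-value in a low-scoring group. The sketch below assumes 0/1 scored responses and splits examinees by total score; the function name, the 50% split, and the data are illustrative assumptions (other group sizes are also used in practice).

```python
def discrimination_index(responses, item, fraction=0.5):
    """D = p(upper group) - p(lower group) for one item.

    Examinees are ranked by total score; `fraction` sets the size of
    the upper and lower groups (an illustrative choice).
    """
    ranked = sorted(responses, key=sum, reverse=True)
    k = max(1, int(len(ranked) * fraction))
    upper, lower = ranked[:k], ranked[-k:]
    p_upper = sum(row[item] for row in upper) / k
    p_lower = sum(row[item] for row in lower) / k
    return p_upper - p_lower


# Hypothetical tryout data: 4 examinees, 3 items (1 = correct).
scored = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(discrimination_index(scored, item=1))  # 1.0
print(discrimination_index(scored, item=0))  # 0.5
```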
36
Distractor Analysis
a distractor is an incorrect or non-preferred option; analyzing how often each one is chosen can reveal misinterpretation of the question, etc.
37
Factor Analysis
Used to determine which items are going to provide better scores
38
Item Selection
Choose items that 1) increase the reliability of the test, 2) give the right average difficulty, 3) discriminate between groups; 4) remember that D (discrimination) can be highest when P (difficulty) is at its midpoint; 5) make sure the content is actually covered, don't eliminate important items
39
Standardizing Program
shows the norms for the test
40
Equating Programs
making sure tests equate to one another
41
Publishing Tests Materials
Technical Manuals, Score Reports, Supplementary Materials, Test Completed, Administrator Training
42
Continuing Research on New Tests
Updating new norms and discovering applicability
43
Two Classical Theories of Intelligence
Spearman's g, and Thurstone's primary mental abilities
44
Spearman's Theory of Intelligence
Intelligence is g, a general factor; s are the specific factors tied to individual tests/subtests. A two-factor (g and s) theory
45
Thurstone
Primary Mental Abilities theory of intelligence
46
Primary Mental Abilities Theory of Intelligence
Thurstone's, originally 9 mental abilities, a multiple-factor theory
47
The Original Nine Primary Mental Abilities
Spatial, Perceptual (speed of perception), Numerical, Verbal, Memory, Word fluency, Induction (finding a rule or principle to solve a problem), Reasoning (arithmetic), Deduction (a weakly defined factor calling for application of a rule)
48
Hierarchical Model
Compromise, different intelligences are arranged with some more important than others
49
Cattell
Fluid and Crystallized Intelligence
50
Hierarchical Characteristics
Complex factor analysis, separate intelligences, some better than others
51
One Vs. Many
argument of intelligences, Spearman says 1, Thurstone says many
52
Gc
crystallized intelligence (Cattell): the sum of everything one has learned, such as mental skills, education, relationships, etc.
53
Gf
General fluid intelligence is the raw mental power, potential for intelligence
54
Additional Factors for Cattell & Horn's Model
short and long term memory, visual and auditory skills, processing speed on easy tasks, decision speed (problem solving tasks) and quantitative reasoning
55
Vernon's Model
Hierarchy: g at the top, split into two major groups, v:ed (verbal:educational) and k:m (spatial:mechanical), with more specific skills clustering under these (numbers, psychomotor, reading)
56
Carroll's Summary
Three-stratum theory
57
Three Stratum Theory
g at the top, then Gc and Gf (as well as others, some like Thurstone's), third level there are more specific abilities
58
Developmental Theories
1) stages, 2) stages happen in the same order for all people (if not the same time), 3) stages are cumulative and not reversible
59
Piaget Theory of Cognitive development
4 stages
60
Sensorimotor
no object permanence, lack of input | birth-2 yrs
61
Preoperational
use words to symbolize, lacks principles of conservation | 2-6 yrs
62
Concrete Operational
Uses principles of conservation and reversibility | 7-12 yrs
63
Formal Operational
Mature adult thinking in terms of hypotheses, cause and effect | 12+ yrs
64
Information Processing Model
theory of intelligence that focuses on how people process information, using computer processing as the analogy
65
Biological Models
brain functioning, as the basis for understanding human intelligence
66
Assimilation
putting things into your schemas, all four legged animals are dogs to kids
67
Accommodation
changing your perception to fit reality, horses aren't dogs
68
Howard Gardner
Theory of Multiple Intelligences, at least 8 intelligences
69
Gardner's 8 Multiple Intelligences
Spatial, Linguistic, Logical-mathematical, Bodily-kinesthetic, Musical, Interpersonal, Intrapersonal, Naturalistic
70
3 Things to Remember about Group Differences
1) Distributions mostly overlap, even if averages are slightly different, 2) a difference doesn't tell us why, 3) differences are always changing, and may not last forever
71
Differences in Intelligence by Sex
minimal in terms of total scores, some difference in verbal and spatial skills. More males tend to perform very high or very low
72
Group Age differences
Steep increase: 0-12; Maximum: 16-20; Level: 25-60; Period of decline: 60+
73
Flynn Effect
Increase in IQ scores of 3 pts per decade (meaning 12 pts across generations)
74
Group intelligence by Race
Hispanic and Native American: lower than white by 1/2 to 1 SD; Black: about 1 SD below white; Asian: about 1 SD above on nonverbal
75
Number of Chromosomes
23 pairs, 46 total | DNA affects intelligence
76
Behavioral Genetics
the genetic and environmental basis for differences in psychological traits
77
Heritability of Intelligence
.6 of general intelligence is related to genes
78
Common Features of Individual Intelligence Tests
1) most are individually administered 2) administration requires advanced training 3) wide range of age and ability (w/ start and stop rules) 4) establish rapport with examinee 5) free response 6) immediate scoring 7) about 1 hr for test 8) opportunity for observation
79
Two Main Uses of Intelligence Tests
Clinical/School/Counseling or for research
80
Sir Francis Galton
used simple measures, studied heritability, and invented the bivariate distribution
81
Alfred Binet
Father of intelligence testing, Binet scale 1905, mental age
82
Lewis Terman
revised the Binet scale into the Stanford-Binet
83
Intelligence Quotient
IQ = (Mental Age / Chronological Age) x 100
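
A minimal sketch of the ratio IQ arithmetic, assuming both ages are given in the same unit; the function name `ratio_iq` and the example ages are made up for illustration.

```python
def ratio_iq(mental_age, chronological_age):
    """Ratio IQ: mental age divided by chronological age, times 100."""
    return mental_age / chronological_age * 100


# Hypothetical child: mental age 10, chronological age 8.
print(ratio_iq(10, 8))  # 125.0
```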
84
Arthur Otis
Revised the Stanford-Binet and invented the Army Alpha and Beta for group testing
85
David Wechsler
Invented the Wechsler tests, with a standardized IQ score of M = 100 and SD = 15
86
Frequency Distribution for Wechsler Scale (3 SD)
55 is 3 SD below, 145 is 3 SD above, 99.7% fall within these numbers, .3% fall outside (below or above)
87
99.7% on 100 mean IQ
fall between 55 and 145
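
The 99.7% figure can be checked against a normal curve with M = 100 and SD = 15. This sketch uses Python's standard-library `statistics.NormalDist` and assumes scores are exactly normally distributed, which real score distributions only approximate.

```python
from statistics import NormalDist

# Wechsler-style scale: mean 100, standard deviation 15.
iq = NormalDist(mu=100, sigma=15)

# Proportion of scores between 3 SD below (55) and 3 SD above (145).
within = iq.cdf(145) - iq.cdf(55)
print(round(within * 100, 1))  # 99.7
```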
88
Validity of IQ Tests
Predicts school success, reliability of .60/.70
89
Army Alpha
Created by Otis, Army intelligence tests for literate recruits
90
Army Beta
Created by Otis, Army intelligence tests for illiterate or immigrant recruits
91
Number of Group Mental Ability Tests given per year
50 million
92
Achievement tests and group Mental Ability Tests
given together they show differences between ability and achievement
93
Four Major Uses of Group Mental Ability Tests
1) in schools 2) Predicting success 3) job selection 4) research in social and behavioral sciences
94
8 Common Characteristics of GROUP mental ability tests
1) given to large groups 2) content similar to individual 3) multiple choice, machine scored 4) fixed time limit/number of questions 5) 45-60 min OR 2.5-3 hours 6) one total score, several subscores 7) normative samples very large 8) main purpose is prediction
95
Start-Stop Rules
For individual tests, tells the person where to start or stop the given questions
96
Multilevel Tests
group tests are multilevel, there are different tests for different ages or grades
97
Otis-Lennon School Ability Test
most widely used, scholastic achievement for scholastic grade levels
98
OLSAT 8 Structure
there are 7 levels for Kindergarten through grade 12, about 1 hour, test levels overlap to assess students above or below
99
OLSAT 8 Framework & Items & Philosophy
Uses Vernon's hierarchical model, looks for V:ed, and there are 8 items with cluster
100
OLSAT 8 Item Clusters
Verbal comp, verbal reason, pictorial reasoning, figural reasoning, quantitative reasoning
101
OLSAT 8 Scores
Total score = verbal + nonverbal; these three scores are used to find the SAI (School Ability Index): M = 100, SD = 16. Normed by age to 3 months. Used to predict performance
102
OLSAT 8 Norms
Normed for both fall and spring, and for Socio economic status, geographic region and ethnicity
103
Two Cautions about OLSAT 8 Norms
1) We don't know who wasn't there, thus excluded from norms 2) We don't know motivation of students (how hard were they trying)
104
OLSAT 8 Reliability (total, verbal/nonverbal, SEM)
Internal consistency: Total = .89 to .94; Verbal and Nonverbal = .81 to .90. No test-retest data. Standard Error of Measurement = 5.5 to 5.8
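
A standard error of measurement is typically used to put a band around an observed score. The sketch below builds an approximate 95% band (score plus or minus 1.96 SEM) using the 5.5 value from this card; the function name `score_band`, the example score of 100, and the choice of a 95% band are assumptions.

```python
def score_band(score, sem, z=1.96):
    """Return (low, high) bounds: score +/- z * SEM, rounded to 1 decimal."""
    margin = z * sem
    return (round(score - margin, 1), round(score + margin, 1))


# Hypothetical SAI of 100 with SEM = 5.5.
band = score_band(100, 5.5)
print(band)  # (89.2, 110.8)
```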
105
OLSAT 8 Validity
Criterion related with the Stanford Achievement Test, no factor analysis though
106
College Admissions Tests 3 Purposes
1st: Selection of Students 2nd: placement into courses in college 3rd: describe the college (our students average score = ____)
107
SAT 2005
Critical Reading, Math, Writing (as of 2005); attempts to measure general abilities developed in school. Takes 3 hours, 20 min
108
SAT Test Items
Critical Reading, Mathematics, Writing
109
Critical Reading SAT
Sentence Completion, Reading passages | 25, 25, 20= 70 total in 3 sections
110
Mathematics SAT
Multiple choice, grid-in | 25, 25, 20= 70 in 3 sections
111
Writing SAT
essay, multiple choice | 23, 25= 60 total in 2 sections
112
HSGPA, SAT = college success
correlate about equally with college success, about .5 each
113
SAT Scores and Norms
M= 500 SD= 100, range= 200 to 800 | Percentile norms adjusted annually, norms determined nationally
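
To see how the M = 500, SD = 100 scale and the 200 to 800 range fit together, the sketch below maps a z-score onto that scale and clips it at the limits. This is a simplification for illustration only: the function name is hypothetical, and operational SAT scores are not produced by a plain linear z-score transformation.

```python
def sat_section_score(z):
    """Map a z-score onto the M = 500, SD = 100 scale, clipped to 200-800."""
    raw = 500 + 100 * z
    return max(200, min(800, round(raw)))


print(sat_section_score(1.5))   # 650
print(sat_section_score(-4.0))  # 200
```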
114
Reliability of total scores, main tests, subscores and writing in SAT
Total = .95; Main (math and reading) = .85-.90; Subscores = .65-.85; Writing = .6
115
SAT Validity
uses predictive validity, comparing scores to Freshman GPA (FGPA) | Multiple regression predicts FGPA from HSGPA, from SAT, and then from both together
116
Weakness of SAT Validity
can only be compared to those who go to college
117
SAT Correlations to FGPA
FGPA & SAT = .5; FGPA & HSGPA = .5; Both and FGPA = .6 (incremental validity)
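
Squaring these correlations gives the proportion of FGPA variance accounted for, which makes the incremental gain easier to see. The sketch uses only the rounded values from this card; the variable names are illustrative.

```python
r_single = 0.5    # SAT alone (or HSGPA alone) with FGPA
r_combined = 0.6  # SAT and HSGPA together with FGPA

var_single = r_single ** 2      # 0.25 of FGPA variance
var_combined = r_combined ** 2  # about 0.36 of FGPA variance
gain = round(var_combined - var_single, 2)
print(gain)  # 0.11
```

Under these rounded figures, adding the second predictor accounts for roughly 11 more percentage points of FGPA variance.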
118
GRE (Graduate Record Examinations) | content as of August 2011
Verbal Reasoning, Quantitative Reasoning, Analytical Writing
119
GRE Reliability for Verbal, Quantitative, Analytical
Verbal = .93; Quantitative = .94; Analytical Writing = .79
120
Culture Fair Tests
Trying to create tests that are fair to people across cultures
121
Raven's Progressive Matrices
Example of Culture Fair attempt, lots of non-verbal, measures nonverbal g well, 3 versions (colored, standard, advanced). Uses lots of pattern completion
122
Three Generalizations of Culture Fair tests
1) Mainly measure figural/spatial intelligence (not general) 2) less predictive for jobs and school than verbal 3) still present group differences
123
Six Generalizations about Group Mental Ability Tests
1) same content as Individual, 2) Reliability good for total, less for sub-scores 3) predictive validity between .30-.60, 4) sub-group validity generally poor 5) Range restrictions and imperfect reliability in criterion 6) Culture-Fair tests still don't exist
124
6 reasons for Clinical Neuropsychological (CN) Assessment
1) Diagnosis 2) Finding strengths and weaknesses 3) vocational and educational planning 4) treatment planning and evaluation 5) forensics 6) research
125
Fixed Battery Neuropsychological tests
standardized tests given to everyone with fixed cutoffs | Example: Halstead-Reitan Neuropsychological Battery
126
Impairment Index
uses a cut off point to determine if there is or is not neuropsychological deficit
127
General Neuropsychological Deficit
reflects severity of neuropsychological deficit | good test-retest reliability; discriminates those with brain damage from those without with 80% accuracy
128
Flexible Batteries of Neuropsychological Tests
Varies by reason for referral, clinical data, ability to cooperate, information obtained, tailored on individual basis
129
Mini-Mental State Examination (MMSE) | Structure and scores
Most routinely administered; 11 questions, 30 points. Scores of 24-30 are in the normal range (but impairment may still be present)
130
MMSE Assesses What:
Orientation, Attention/Concentration, Language, Cognitive Flexibility, Constructional Ability and Immediate or brief delay recall
131
Premorbid Intelligence/Achievement
Intelligence before the onset of impairment; school records or Wechsler scores often used
132
Continuous Performance Tasks (CPT) (How it Works)
one of many ways to evaluate attention; measures the ability to respond to sequentially presented target stimuli and not respond to non-target stimuli over a long period and in the face of boredom
133
The Continuous Performance Tasks Measures
Ability to maintain alertness/vigilance or sustained attention
134
Continuous Performance Tasks Brain Areas
reticular formation and the frontal lobes
135
Wechsler Adult Intelligence Scale (WAIS-IV) Working Memory Subset
Another way to evaluate attention
136
WAIS - IV Working Memory Subset item types
Arithmetic, Digit Span (forward and backward) and letter-number sequencing
137
Brain Areas the WAIS-IV Working Memory Subset tests
frontal lobes, particularly the dorsolateral prefrontal cortex
138
Trail Making Test (Part A Halstead-Reitan)
measures attention | draw a line connecting 25 numbered circles as quickly as possible without lifting the pencil
139
Most Frequent Linguistic Impairment
Naming ability, called dysnomia
140
Dysnomia Assessment
assessed by procedures that require the naming of line drawings on visual confrontation
141
Boston Naming Test (structure and brain area)
60 line drawings to be named, increasing difficulty | Brain Area: left temporal lobe
142
Controlled Oral Word Association Test (COWAT)
Looks at verbal fluency in two categories: 1) letter (phonemic) and 2) semantic (category)
143
Letter (Phonemic) part of COWAT Assesses what skills and what brain area
Controlled Oral Word Association Test | 60 seconds to say words that begin with a given letter (no proper names or repeats) | Brain area: frontal lobe
144
Semantic (Category)
60 seconds to name words in a category | Brain area: temporal lobe
145
Block Design for finding Constructional Apraxia (test)
WAIS-IV subtest which requires the person to reproduce a 2x2 or 3x3 design with red and white blocks
146
Constructional Apraxia
Inability to assemble or copy 2 or 3 dimensional objects (Visual Spatial)
147
Hooper Test of Constructional Apraxia
30 common objects that have been cut into pieces (visually) and examinees need to reassemble in their heads and name object
148
Spatial Neglect
Inattention to one side of space (usually the left side, following right parietal lobe damage)
149
Line Bisection Spatial Neglect Test
Examinee is asked to bisect lines on a page placed at midline
150
Clock Drawing Spatial Neglect Test
examinee first is asked to draw a clock to command (as in 10 min after 11) and then to copy a clock drawing with the hands already set
151
Memory & Neuropsychological Evaluation
Memory is the most frequent complaint by persons referred for neurological evaluation
152
Example of Verbal Memory Test (& brain area)
California Verbal Learning Test (CVLT-II) | verbal memory = left hemisphere
153
Rey Complex Figure Test (what it's for and brain area)
Measures Nonverbal memory- right hemisphere | RCFT
154
CVLT - II (Name, Structure, Function)
California Verbal Learning Test (verbal memory) list, distractor list, immediate and long delay recall, yes/no long delay recognition trial and finally forced choice long-delay recognition trial
155
Dementia
progressive/incurable disorder marked by memory loss and disturbances of higher mental functions (5% of older adults)
156
Dementia Diagnosis
1) a noticeable decline from previous social or occupational functioning (not related to medical/psychiatric conditions), 2) significant impairment of memory function, 3) at least one of the following: aphasia, apraxia, agnosia, executive dysfunction
157
Aphasia
impairment of language
158
Apraxia
impaired ability to carry out purposeful motor acts despite intact motor function
159
Agnosia
inability to identify familiar object/faces
160
Executive Dysfunction
difficulty planning etc
161
Pseudodementia
related to psychiatric condition, and cognitive impairments similar to dementia
162
Characteristics of Pseudodementia
depressed mood, no language impairment, better recognition than recall memory, gives up easily but will persist with encouragement, says "I don't know" rather than giving wrong answers, problems often improve with encouragement, retesting or antidepressants
163
3 Areas of Motor Functioning Tested
Grip Strength, Fine Motor Coordination, Motor Speed
164
Hand Dynamometer (main test and what tested)
Measures Grip Strength | Part of Halstead-Reitan
165
Grooved Pegboard Test
Fine motor coordination
166
Finger Tapping
Measures Motor Speed | part of Halstead-Reitan
167
WAIS-IV Psychomotor Tests
Motor Speed | Uses cancellation, coding and symbol search
168
Supervisory Cognitive Processes
involved in the organization and execution of complex thoughts and behaviors | part of executive functions
169
Three Processes Underlying Executive Functions
1) working memory 2) inhibition and switching 3) sustained and selective attention
170
Executive Function Tasks Used to Measure
Tower Test, Trail Making Test (Part B), Stroop Interference Test, Wisconsin Card Sorting Test
171
Stroop Interference test
three conditions: word reading, color naming, incongruent color naming
172
Stroop Effect
also called the interference effect: when the word "red" is printed in green ink, people tend to read the word instead of naming the ink color
173
Stroop Interference Test Activates Parts of Brain
dorsolateral prefrontal cortex (preparing to exert conscious control) and anterior cingulate
174
Anterior Cingulate
involved in consciously regulating conflicting cues and inhibiting responses that are incorrect
175
Wisconsin Card Sorting Test
Most frequently used executive function test | 4 stimulus cards, 128 response cards | examinee sorts cards into categories without being told the criterion | continues until 6 categories are completed (6 categories, 10 each)
176
MMPI-2
Minnesota Multiphasic Personality Inventory; the most frequently used objective inventory for assessing psychological state
177
BDI-II
Beck Depression Inventory; the most widely used self-report measure of depression; assesses psychological state
178
Malingering
faking deficits for secondary gain
179
MMPI-2 Fake Bad Scale
used for malingering
180
TOMM
Test of Memory Malingering | 50-item recognition test (two learning trials and an optional retention trial)
181
Supplementary Information to Evaluate Neuropsychological Assessment
Medical History, Psychiatric History, Psychosocial History, School Records, Collateral Information (family, friends, caretakers) and behavior observations