Psychological Testing #2 Flashcards

1
Q

Restriction of range

A

Makes test-retest reliability look low, because a restricted (homogeneous) sample has little score variability, and low variability attenuates the correlation.

2
Q

What is standard error of measurement?

A

Theoretically, if the subject took the test many times, their various scores would form a normal curve around their true score. That curve has a standard deviation, and that SD unit is the SEM.
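
As a rough numeric sketch (the SD and reliability values below are hypothetical), the SEM is commonly computed as the test's standard deviation times the square root of one minus the reliability coefficient:

import math

def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """SEM = SD * sqrt(1 - reliability coefficient)."""
    return sd * math.sqrt(1.0 - reliability)

# Hypothetical example: an IQ-style test with SD = 15 and reliability = .91
print(standard_error_of_measurement(15, 0.91))  # about 4.5 points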

3
Q

What is a confidence interval?

A

A confidence interval states, as a percentage, how confident we are that the examinee's true score falls within a certain range around the observed score, with the range expressed in SEM units.
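
A minimal sketch, assuming a hypothetical observed score and SEM: the interval is the observed score plus or minus z SEM units (about 68% for ±1 SEM, about 95% for ±1.96 SEM).

def confidence_interval(observed: float, sem: float, z: float = 1.96):
    """Return (lower, upper) bounds: observed score +/- z * SEM."""
    return observed - z * sem, observed + z * sem

# Hypothetical example: observed IQ = 110, SEM = 4.5
print(confidence_interval(110, 4.5))  # about (101.2, 118.8), a 95% interval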

4
Q

What is the standard error of the difference?

A

A statistical measure that helps a test user determine whether the difference between two scores is significant. It is usually used for comparing sub-scores on a test.
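
One common way to compute it (a sketch with hypothetical SEM values) combines the SEMs of the two scores; a difference much larger than about 1.96 x SE_diff is usually treated as significant.

import math

def standard_error_of_difference(sem_a: float, sem_b: float) -> float:
    """SE_diff = sqrt(SEM_a^2 + SEM_b^2)."""
    return math.sqrt(sem_a ** 2 + sem_b ** 2)

# Hypothetical sub-scores with SEMs of 4.5 and 3.0
se_diff = standard_error_of_difference(4.5, 3.0)
print(se_diff, 1.96 * se_diff)  # about 5.4; differences beyond about 10.6 points look significant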

5
Q

Ch4: What is validity?

A

Does the test measure what it claims to measure?

A test is valid to the extent that inferences made from it are appropriate, meaningful, and useful.

6
Q

What is the relationship between validity and reliability?

A

If a test is not reliable, it's not going to be valid. However, a reliable test can still be invalid; something can be consistently bad. (You have to understand the relationship between reliability and validity.)

7
Q

What do we mean by a continuum of validity?

A

Validity cannot be captured in a single statistical summary; instead, it lies on a continuum ranging from weak to acceptable to strong, based on the three types of validity evidence.

8
Q

What are the three categories of accumulating validity evidence?

A

Content validity
Criterion-related validity
Construct validity

An ideal validation includes several types of evidence in all three categories.

9
Q

What is face-validity?

A

Well for one, it’s not actually validity. It’s how the test looks to examinees. It’s important because it can impact a person’s approach to the test. It’s loosely related to content validity.

10
Q

What is content validity?

A

Content validity is determined by the degree to which the questions, tasks, or items on a test are representative of the universe of behavior the test was designed to sample. Especially useful when a great deal is known about the construct.

Item sampling - (Behavior) Do the items on the test fit the content you want to test? If I'm testing 4th-grade math level and I examine skills that aren't taught until 5th grade, that's poor content validity.
Types of skills - (Responses) Multiple choice or open-ended?

“Expert review” is often the choice of evidence.

11
Q

What is criterion-related validity?

A

The test score is compared to an outcome measure (criterion). The criterion can be concurrent, e.g., people take a new IQ test and an established IQ test at the same time. The criterion can also be predictive, as in college readiness tests and employment tests.

12
Q

What makes a good criterion for criterion-related validity?

A

RELIABLE - consistency of scores.

APPROPRIATE - Well duh, but actually sometimes this can be tricky. Should the criterion measure of an aptitude test indicate satisfaction, success, or continuance in the activity?

FREE FROM CONTAMINATION BY THE TEST - This becomes a problem when the criterion is contaminated by the test score itself: I want to see whether the test is useful, but you already used the test to decide whom to hire. The criterion can also be contaminated by overlapping content, e.g., if both measures ask about eating and sleeping habits, the overlap will artificially inflate the correlation.

13
Q

What is decision theory?

A

The purpose of psychological testing is not measurement for its own sake, but measurement in the service of decision making.

Making decisions based on test scores results in a matrix of outcomes, with hits and misses (false positives and false negatives). You have to determine where you want your mistakes to be.

14
Q

What is construct validity?

A

A construct is a theoretical, intangible quality or trait in which individuals differ. Construct validity is theory based: Based on my understanding of this particular construct, what would I expect to see in a test?

No criterion or universe of content is accepted as entirely adequate to define the quality to be measured, so a variety of evidence is required to establish construct validity.

15
Q

What is test homogeneity?

A

A measure of construct validity.

Does it measure a single construct?

If my theory says this is a unitary construct, then an internal consistency analysis should suggest the items measure just one construct. The test could be measuring one thing, but it might not be the right thing.

16
Q

What are appropriate developmental changes?

A

A measure of construct validity. Is my construct something that changes as people age? Egocentrism, for example, would show different results at different ages: scores should go down as kids get older.

17
Q

What are theory-consistent group differences?

A

A measure of construct validity. Can we predict who will have high and low scores for this construct? For example, different professions show different rates of extroversion; nuns are high in social interest, while models and criminals are low in social interest.

18
Q

What are theory-consistent intervention effects?

A

A measure of construct validity. Does the construct change in the appropriate direction after intervention/treatment? For example, people's spatial-orientation scores should increase after training, more than the scores of those who did not receive training.

19
Q

What is convergent and discriminant validation?

A

A measure of construct validity. What should the test correlate with (convergent), and what should it differ from (discriminant)? Intelligence and social interest are theoretically unrelated; anxiety and eating disorders overlap.

20
Q

What is factor analysis?

A

A measure of construct validity. How many factors are you actually measuring? If you think you're measuring three factors, and a factor analysis shows three factors, that's a good sign.

21
Q

What is classification accuracy?

A

A measure of construct validity.
How well does it give accurate identification of test takers? Test makers strive for high levels of:
SENSITIVITY: Accurate identification of patients who have a syndrome.
SPECIFICITY: Accurate identification of normal patients.

These are reported as percentages. Sensitivity: 79% (correctly identifies 79% of affected individuals). Specificity: 83% (correctly identifies 83% of unaffected individuals).
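
A small sketch (with made-up counts) of how sensitivity and specificity come out of a 2x2 classification table:

def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int):
    """Sensitivity = TP / (TP + FN); Specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical counts: 79 of 100 affected people flagged, 83 of 100 unaffected cleared
print(sensitivity_specificity(tp=79, fn=21, tn=83, fp=17))  # (0.79, 0.83)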

22
Q

What are extravalidity concerns?

A

Side effects and unintended consequences of testing.

23
Q

What are some of the unintended side effects of testing?

How do we prevent extravalidity problems?

A

AKA Extravalidity concerns.

Children who are identified may feel unusual or dumb. There can also be legal consequences. Along with traditional validity, tests should be evaluated for (1) the values involved in interpretation, (2) usefulness in the particular application, and (3) potential and actual social consequences.

24
Q

What does NOIR stand for?

A

Nominal
Ordinal
Interval
Ratio

25
Q

What is a nominal scale?

A

Where the scale values are simply categories, without any inherent order.

Male = 1, Female = 2.

26
Q

What is an ordinal scale?

A

A scale with categories following a specific order, but the distance between the categories is variable.

Freshman, Sophomore, Junior, Senior.
Ranking something from most liked to least liked

27
Q

What is an interval scale?

A

A scale in which the units have an order and an equal distance between each unit. It does not possess an absolute zero. A Likert scale is considered an interval scale for statistical purposes.

28
Q

What is a ratio scale?

A

A ratio scale is rare in psychological measurement. A scale with an absolute 0, which also allows for categorization, ranking, and intervals.

29
Q

What are some scaling methods? Which ones are best?

A
"No single scaling method is uniformly better than the others." 
Expert Ranking
Likert scales
Guttman scales
Empirical keying
Rational scale construction
30
Q

What’s an example of expert ranking?

A

The Glasgow Coma Scale

How would experts rank each of these responses?

31
Q

What are methods of absolute scaling?

A

A procedure for obtaining a measure of absolute item difficulty based on different age groups of test takers. You don’t want questions to be bunched around certain ages and leave gaps at others.

32
Q

What is empirical keying?

A

You develop a long list of questions, try them out on contrasting groups (depressed/not depressed, delinquents/non-delinquents), and see whether the groups answer the questions differently.
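
A minimal sketch of the idea, using hypothetical yes/no responses and an arbitrary cutoff: keep items whose endorsement rates differ substantially between the contrasting groups.

def endorsement_rate(responses):
    """Proportion of a group answering 'yes' (1) to an item."""
    return sum(responses) / len(responses)

# Hypothetical answers to one candidate item
depressed =     [1, 1, 1, 0, 1, 1, 0, 1]
not_depressed = [0, 1, 0, 0, 0, 1, 0, 0]

difference = endorsement_rate(depressed) - endorsement_rate(not_depressed)
keep_item = abs(difference) >= 0.30  # hypothetical cutoff for "answers differently"
print(difference, keep_item)  # 0.5, True -> this item goes on the key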

33
Q

What is the heart of the method of rational scaling?

A

That all the scale items correlate positively with each other and also with the total score for the scale. The questions need to correlate with each other, or we won’t keep them.
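
A sketch with hypothetical data of the key check: each item should correlate positively with the total score, and items that do not are dropped or rewritten.

from statistics import correlation  # Python 3.10+

# Hypothetical responses: rows are examinees, columns are three candidate items
items = [
    [4, 3, 1],
    [5, 4, 2],
    [2, 2, 4],
    [1, 1, 5],
    [3, 3, 3],
]
totals = [sum(row) for row in items]

for i in range(3):
    scores = [row[i] for row in items]
    print(f"item {i + 1} vs total: r = {correlation(scores, totals):.2f}")
# Item 3 correlates negatively with the total, so it would be dropped or rewritten.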

34
Q

What are the initial questions of test construction?

A

Range of difficulty
Item format
Item difficulty
Item-discrimination

35
Q

How would range of difficulty be different for different types of tests?

A

Norm-referenced tests would have a greater range of difficulty, because we want to know who the outliers are.
Criterion-referenced tests would be more restricted, because no one cares if you’re in the 99th percentile of drivers on your driving test.

36
Q

What are some examples of item format and what are their strengths and weaknesses?

A

A multiple-choice question can capture conceptual as well as factual knowledge and can easily be judged for fairness based on statistics. However, multiple-choice items can be difficult to write with good distractors, and they can cue a half-knowledgeable respondent.

Matching questions are problematic because the responses may not be independent.

True/false questions can be easy to understand but people may choose the most desirable answer.

Forced choice questions can prevent people from picking the most desirable option, but they haven’t been embraced yet by test developers.

37
Q

What are the best types of items to use?

A

It depends on the test.

38
Q

How do we measure item difficulty?

A

We measure how many people get the item correct.
An item with a difficulty of .3 is an item that 30% of people got correct, so it's hard. An easier question would be around .8.

Generally, item difficulty hovers around .5 with a range of .3-.7, but this will change depending on the type of test.
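
A minimal sketch (hypothetical response data): item difficulty is simply the proportion of examinees who answered the item correctly.

def item_difficulty(correct_flags):
    """p = proportion of examinees who got the item right (1 = correct, 0 = wrong)."""
    return sum(correct_flags) / len(correct_flags)

# Hypothetical results for one item across ten examinees
print(item_difficulty([1, 0, 0, 1, 0, 0, 1, 0, 0, 0]))  # 0.3 -> a hard item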

39
Q

What are the two types of item-discrimination?

A
  1. High vs. low scorers - If most of the high-scoring people get the item right and the low-scoring people get it wrong, it's a good question. What if most of the people earning As and Bs get it wrong while the people earning Cs and Ds get it right? Then there might be a problem with the key, or the question is poorly worded.
  2. Analysis of item choices - What was the variability of the choices? Did everyone choose A or B and no one choose C or D? Then C and D are wasted space. You want good distractors. Occasionally, B could be too close to A, so you want to make the distractor less like the actual answer.
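
A sketch of the first approach, with hypothetical proportions: the discrimination index is the proportion correct among high scorers minus the proportion correct among low scorers, and a negative value flags a possible keying or wording problem.

def discrimination_index(p_high: float, p_low: float) -> float:
    """D = proportion correct among high scorers minus proportion correct among low scorers."""
    return p_high - p_low

# Hypothetical items: the first discriminates well, the second needs a second look
print(discrimination_index(0.85, 0.35))  # 0.50 -> good item
print(discrimination_index(0.30, 0.70))  # -0.40 -> check the key or the wording
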
40
Q

What is cross-validation and how is it related to validity shrinkage?

A

Cross-validation means using the original regression equation in a new sample to determine whether the test predicts the criterion as well as it did in the original sample. Because the test was developed based on the original sample, it follows that it would correlate less with the second sample. This phenomenon is called validity shrinkage.
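
A rough simulation sketch (hypothetical data, several predictors of which only one matters): fit the regression equation in the original sample, then apply the same equation in a new sample; the cross-validated correlation usually drops, which is the validity shrinkage described above.

import numpy as np

rng = np.random.default_rng(1)

def simulate_sample(n=60, k=8):
    """Hypothetical data: k predictor scores, only the first weakly relates to the criterion."""
    X = rng.normal(size=(n, k))
    y = 0.4 * X[:, 0] + rng.normal(size=n)
    return np.column_stack([np.ones(n), X]), y  # prepend an intercept column

# Develop the regression equation in the original sample
X1, y1 = simulate_sample()
beta, *_ = np.linalg.lstsq(X1, y1, rcond=None)
r_original = np.corrcoef(X1 @ beta, y1)[0, 1]

# Cross-validate: apply the SAME equation to a new sample
X2, y2 = simulate_sample()
r_new = np.corrcoef(X2 @ beta, y2)[0, 1]

print(round(r_original, 2), round(r_new, 2))  # the new-sample r is usually lower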

41
Q

How might you get feedback from examinees, and how will that contribute to test development?

A

You can give questionnaires to the examinees after the test or you can have them think aloud about it in an open-ended manner.

The Inter-University entrance exam was modified in numerous ways in response to feedback. Time limits on some sections were increased. Perceived culturally unfair items were deleted.

42
Q

How are testing materials important?

A

Tri-fold board instead of just one piece of cardboard. Books that stand up on their own. Intelligence tests have a lot of components that need to be manipulated, on top of manuals, stopwatches, and small children.

43
Q

What are the two manuals you need for a test and why?

A

Technical manual and user's manual - a test user needs both of these. The technical manual gives the background and helps you determine whether you want to use the test.

44
Q

What is a real definition and how is it different from an operational definition?

A

A real definition is one that seeks to tell us the true nature of the thing being defined. An operational definition is a definition of a concept in terms of the way it is measured.

45
Q

What are the shortcomings of operational definitions of intelligence?

A

They are circular: “What the tests test.”

They block further progress in understanding the nature of intelligence.

46
Q

How does the textbook define intelligence?

A

Intelligence is:

  1. The capacity to learn from experience.
  2. The capacity to adapt to one’s environment.

These two themes occur again and again in definitions of intelligence. Many textbooks also include the ability to engage in abstract reasoning.

47
Q

What is Spearman’s theory of intelligence?

A

Two factors - G and S

G = General factor - This is what Spearman emphasized. Your score on a test would be strongly affected by G. So he wanted tests that would measure G.

S = Specific ability - like verbal skills or spatial skills.

48
Q

What is Thurstone’s theory of intelligence?

A

Unlike Spearman, Thurstone didn’t believe in a general ability or G factor. Instead he said there were several broad factors like verbal comprehension and perceptual speed. He called these Primary Mental Abilities.

49
Q

What is Cattell-Horn-Carroll’s theory of intelligence?

A

They had a fairly complex theory with lots of pieces.

They said there were three types of intelligence, which kind of combine Spearman’s and Thurstone’s theories.

1) Pervasive (similar to G)
2) Broad (similar to Primary Mental Abilities)
3) Specific (similar to S)

Their broad factors included the differentiation of fluid and crystallized intelligence.

50
Q

What is fluid intelligence?

A

Higher-level reasoning, like testing hypotheses, inductive reasoning, etc. Mostly non-verbal and not culturally bound. Could also be considered the process for solving problems.

51
Q

What is crystallized intelligence?

A

Our acquired knowledge, what we accumulate across time. Especially cultural knowledge and language.

52
Q

What is Guilford’s theory of intelligence?

A

Structure of Intellect Model
Guilford went a little overboard and came up with 150 factors of intelligence. He had to simplify that somehow, so now we have these three things:

1) Operations - What kind of intellectual operation is required by the test? Is it memorization or evaluation?
2) Contents - How are the materials or information presented to the examinee? Are they visual or auditory?
3) Products - What kind of mental structure must the brain produce? A unit or a system?

53
Q

What is Naglieri and Das’s theory of intelligence?

A

PASS: Planning, Attention, Simultaneous, and Successive Theory
Can be considered an information processing theory. Does something need simultaneous or successive processing? Planning is the last step.

54
Q

What is Gardner’s theory of intelligence?

A

Theory of Multiple Intelligences

7 types of intelligences according to this book, with three under investigation

Gardner doesn’t have the most pieces, but his theory is the broadest, covering things like bodily-kinesthetic ability although many wouldn’t consider that an intelligence.

He uses research with savants to defend his intelligences. Savants challenge Spearman’s G.

55
Q

What is Sternberg’s theory of intelligence?

A

Triarchic Theory of Intelligence

Componential (Analytical) - part of traditional IQ testing
Experiential (Creative) - how we deal with novelty and automatize information processing, not really assessed by IQ tests
Contextual (Practical) - how we select, adapt, and shape our environment, not really assessed by IQ tests

56
Q

Why do we need to know about different tests?

A

Because we need to know a test's strengths and weaknesses as they pertain to the referral question.

57
Q

How are the Wechsler-Bellevue Intelligence Scales 1939 important to intelligence test history?

A

Described by some as the first successful test of intelligence for adults, because for a long time intelligence testing was based on what works with kids, which adults found boring.

Wechsler threw out mental age, which didn't mean anything for adults, and used IQ constancy instead.

58
Q

What is IQ constancy?

A

The IQ retains its properties and remains constant across different ages, even though raw intellectual ability might shift.

59
Q

What were the three scores of the Wechsler-Bellevue Intelligence Scales 1939?

A

Verbal Scale IQ
Performance Scale IQ
Composite score

60
Q

What are the three Weschsler scales now?

A

WPPSI-IV (ages 2-7)
WISC-IV (ages 6-16)
WAIS-IV (16-90)

61
Q

What does WPPSI mean?

A

Wechsler Preschool and Primary Scale of Intelligence

62
Q

What does WISC mean?

A

Wechsler Intelligence Scale for Children

63
Q

What does WAIS mean?

A

Wechsler Adult Intelligence Scale

64
Q

Why is it important that there are commonalities among the tests?

A

This has helped them stay relevant, because once you’re trained in one, you are trained in others.

65
Q

What is the single best measure of overall intelligence on the Wechsler scales?

A

Vocabulary.

…but you can’t use this test alone.

66
Q

What’s a controversial section of the test?

A

Picture completion, because it may be inappropriate for culturally disadvantaged examinees. One of the early tests had a tennis court with the net missing. Some children would say the body was missing in the picture of the face; some children never see pictures of just a face.

67
Q

What section are test developers working to make more global?

A

Information

68
Q

Why do the Weschler scales include supplemental subtests?

A

1) Used when a problem occurs with another subtest. It doesn’t happen very often.
2) When additional information is needed.

People are tired of you and tired of taking the test after an hour to an hour and a half, so you don't need to do any more than necessary.

69
Q

Besides the composite score, what are the four scores given by the WISC and the WAIS?

A

Verbal comprehension
Perceptual reasoning
Working memory
Processing speed

70
Q

What are the five factors and two domains of intelligence on the SB5?

A

The domains: nonverbal and verbal
The factors: Fluid reasoning, knowledge, quantitative reasoning, visual-spatial processing, and working memory.

2 Domains X 5 Factors = 10 subtests

71
Q

What is a routing procedure?

A

A routing procedure is on the SB5. It estimates the general cognitive ability of the examinee in order to determine the starting points on subtests.

72
Q

Besides the composite score, what are the two scores given by the SB5?

A

Verbal IQ

Nonverbal IQ

73
Q

What makes the SB5 special?

A

It has extensive high-end items and improved low-end items.
It can be used to assess individuals with limited English.
The test was evaluated on fairness, including religious tradition.
The working memory factor can help assess ADHD.

74
Q

What’s important about the Detroit Tests of Learning Aptitude?

A

It has 10 subtests, and 16 composite scores, including general intelligence, optimal level, and 14 ability areas (in 7 dichotomies). General intelligence correlates, but the theory behind the other composites hasn’t been supported. Also, there are more composite scores than subtests, which is weird.

75
Q

What’s important to know about the Cognitive Assessment System - II?

A

Based on PASS
Children with ADHD score lower on Planning and Attention.
Smaller differences between Black and White examinees' scores.

76
Q

What does SS stand for in PASS?

A

Simultaneous and Successive

77
Q

What’s important about the Kaufman Brief Intelligence test-2 (KBIT-2)

A

It only takes about 20 minutes.
Mainly a screening test.
Correlates strongly with the WISC, but tends to overestimate scores by about 3-5 points.

78
Q

What’s the difference between intelligence tests and achievement tests?

A

Intelligence tests are designed to measure the broad mental abilities of the individual, but achievement tests are intended to appraise what a person has learned in school or some other course of study. They are often used in diagnosing learning disabilities.

79
Q

What’s is the KTEA-II?

A

Kaufman Test of Educational Achievement.
An individual test of achievement.
It covers ages 4 1/2 - 25.

Reading
Mathematics
Written Language
Oral Language.

80
Q

What is a learning disability?

A

It’s a long story…
The government says one thing, but it didn’t get help to kids who needed it.
The NJCLD says a learning disability is intrinsic to the individual, identifies central nervous system dysfunction as the origin, and states that LD may extend into adulthood.
A person who has weakness in all areas does not have an LD.

81
Q

What is the preferred method for identifying children with learning disabilities?

A

Response to intervention (RTI)
The focus is on early results and outcomes, rather than spending excessive time and resources later on children who are already failing because of their LD.

82
Q

What are five features of learning disabilities?

A
  1. a relative weakness in one area (intraindividual)
  2. a coexisting condition cannot be the primary cause
  3. They are heterogeneous
  4. developmental
  5. social and emotional difficulties
83
Q

What are the two types of learning disabilities?

A

Dyslexia or verbal learning disability (left brain)

Right hemisphere or nonverbal learning disability

84
Q

What are the three types of group tests?

A

Ability
Aptitude
Achievement
The distinction between these is often fuzzy; they differ mainly in their functions and applications, not so much in content.

85
Q

What are ability test used for?

A

Estimate current intellectual level. May be used for screening or placement purposes, such as gifted and talented programs.

86
Q

What are aptitude tests used for?

A

They measure a few homogeneous segments of ability and are designed to predict future performance. Predictive validity is most important.

87
Q

What are achievement test used for?

A

Assess current skill attainment in relation to the goals of school and training programs.

88
Q

What are the two problems with group tests?

A
  1. Some examinees score far below their true ability.

  2. Invalid scores may not be recognized as such.

89
Q

What problems are associated with culture-free and culture-fair tests?

A

Culture-free tests are impossible; all of our knowledge is acquired within a culture.
Culture-fair tests are questionable. Some people say you can reduce the impact of culture enough to be fair, but others say that's just a nice idea. Even with a novel stimulus that no 5-year-old has encountered before, a child from another culture may pick something different as the goal. Many culture-fair tests have validity problems.

90
Q

What is standard error of the estimate?

A

The margin of error to be expected in the predicted criterion score.
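
A common formula for it, sketched with hypothetical values: SEE = SD of the criterion times sqrt(1 - r^2), where r is the test-criterion (validity) correlation.

import math

def standard_error_of_estimate(sd_criterion: float, r: float) -> float:
    """SEE = SD_y * sqrt(1 - r^2)."""
    return sd_criterion * math.sqrt(1.0 - r ** 2)

# Hypothetical: criterion SD = 10, test-criterion correlation r = .60
print(standard_error_of_estimate(10, 0.60))  # 8.0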

91
Q

What are Guttman Scales?

A

A scale where if you endorse one statement, you also endorse all the milder statements.

I occasionally feel sad or blue.
I often feel sad or blue.
I feel sad or blue most of the time.
I always feel sad or blue.
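
A minimal sketch of the cumulative-pattern check, assuming the items are ordered from mildest to most severe (1 = endorsed, 0 = not endorsed):

def is_guttman_consistent(responses):
    """True if endorsing a statement implies endorsing every milder statement before it."""
    seen_zero = False
    for r in responses:
        if r == 1 and seen_zero:
            return False  # endorsed a harsher statement after skipping a milder one
        if r == 0:
            seen_zero = True
    return True

print(is_guttman_consistent([1, 1, 0, 0]))  # True: endorses only the two mildest statements
print(is_guttman_consistent([1, 0, 1, 0]))  # False: skips a milder statement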

92
Q

What is the WIAT-II?

A

The Wechsler Individual Achievement Test
Ages 4-50

Linked with all Wechsler scales for comparison of intelligence and achievement. Good for identifying learning disabilities.

93
Q

What is the WJ III?

A

The Woodcock-Johnson III Tests of Achievement

Co-normed with its own intelligence test. The most extensive and comprehensive achievement battery of any test.

Area scores are linked directly to the federal standards of Public Law 94-142.

94
Q

What is the WRAT-4?

A

Wide Range Achievement Test-4

A screening instrument (15-25 minutes), not a measure of specific achievement deficits.

95
Q

What are basal and ceiling rules?

A

Rules that tell us where to start a subtest and when to stop it. Used in the Wechsler scales.

96
Q

What should be considered when choosing an intelligence test?

A

Although the overall scores of each test correlate strongly, the different approaches yield distinct sets of subscores. Also consider the referral question, and know the strengths and weaknesses of each test.

97
Q

What are the general features of the Wechsler tests?

A

13-15 subtests

Breakdown of scores 4 ways

A common metric for IQ and Index scores across all three tests.

Common subtests among the tests

98
Q

What are Raven's Progressive Matrices?

A

Originally designed to measure Spearman’s G, the eduction of correlates (eduction = figuring out relationships)

Two factors, with conjectured associated skills:
Adding and subtracting items = rapid decision making and perception of part-whole relationships
Pattern of progression items = mechanical ability, estimating projected movement, and mental rotations.

Test-retest reliability isn’t great, especially with younger subjects.

About as culture fair as it gets.

99
Q

What’s the purpose of ASVAB?

A

Military - It screens people and helps the military determine what kind of training or role they should have.

It has lots of composite scores made up of the subtests. But these composite scores correlate strongly.

Most widely used aptitude test.

100
Q

What do we know about the predictive validity of the SAT and ACT?

A

Good measure of general cognitive ability.

Only about an r = .42-.62 correlation, with the higher value coming from also using HS GPA.

Colleges probably feel like they make better decisions with this than without it.

101
Q

What is test bias?

A

Various ways that tests are culturally and sexually biased.

A test is deemed biased if it is differentially valid for different subgroups.

102
Q

What is content bias and how is it determined?

A
  1. Items ask for information that ethnic minority or disadvantaged persons have not had equal opportunity to learn.
  2. The scoring of the items is improper, since the test author has arbitrarily decided on the only correct answer, which may not be correct in all cultures.
  3. The wording may be unfamiliar.
103
Q

What problems arise from judges' reviews of content bias?

A

Expert judges cannot identify culturally biased test items based on an analysis of item characteristics.

104
Q

What are the four procedures (components) used to empirically address the issue of test bias?

A

Factor analysis
Regression equations
Intergroup comparisons of difficulty levels
Rank ordering of item difficulties

105
Q

How is test fairness different from test bias?

A

A test is deemed biased if it is differentially valid for different subgroups.

Test fairness is a broad concept that recognizes the importance of social values in test usage.

106
Q

What are the three ethical positions related to test fairness?

A

Unqualified individualism
Quotas
Qualified individualism

107
Q

What is heritability and how is it researched?

A

How much of our psychological make-up comes from our genetics vs. the environment.
It is researched using twin studies.

108
Q

What does research on impoverished, at-risk groups indicate about the environmental influences on intelligence?

A

Environmental circumstances impact intelligence.

Lack of enrichment means kids’ scores actually go down with time.

109
Q

What are the possible explanations for differences in intelligence test scores between racial groups?

A

Socioeconomics
Test bias
Genetics (not strong because the gap has decreased over time)

110
Q

What are the research findings regarding changes in intelligence with age and how is that impacted by the type of research design selected?

A

Cross-sectional is not the best way to research these questions.

Cross-sequential research has shown important points:
Overall, declines begin to occur around age 70.
Different findings for different abilities
More decline for processing speed
May even have improvements in some skills (vocabulary)

111
Q

What is the Flynn effect and the implications of its occurrence?

A

IQ scores improve across generations.
Whether an examinee takes a new test or one that is about to be replaced could make a difference in whether they receive educational benefits.

Test revisions are essential.
Mazes are no longer used in IQ tests.

112
Q

What is content bias?

A

A test that is relatively more difficult for members of one group than another when there is no reasonable explanation for the difference.

113
Q

What is construct bias?

A

A test measures different hypothetical traits for one group than for the other.

114
Q

What is the ethical stance of unqualified individualism?

A

The best candidates without exception should be selected.

115
Q

What is the ethical stance of quotas?

A

Selecting employees to match the general racial make-up of the area, even if people aren’t the most qualified.

116
Q

What is the ethical stance of qualified individualism?

A

Refusing to use race or sex to make decisions, even when it is empirically justified to do so.