19 - Test Bias Flashcards

1
Q

African Americans score, on average, about how many points lower than white Americans on standardized IQ tests?

A

About 15 points, or one standard deviation (SD).

2
Q

If you were to administer the Stanford-Binet or the Wechsler scale to large random samples of African Americans and white Americans, you would likely get the same result.

What, then, is the dispute?

Why are there differences?

A

The dispute has not been over whether these differences exist but over why they do.

Do the differences result from environmental or biological factors, and are they related to the general (g) factor measured by IQ tests?

3
Q

An increasing number of people no longer report their race when asked. Why?

What does this affect?

A

4% of the test takers did not disclose their ethnicity.
Another 4% did not find a standard category that described them.

As a result, it is difficult to determine why the performance gap is not narrowing

African American students, because of stereotype threat, perform more poorly on tests when they reveal their race
White students decline to report their race because they feel there is discrimination in favor of ethnic minorities

4
Q

differential validity

A

Some psychologists argue that the tests are differentially valid for African Americans and whites.

5
Q

Do differences between ethnic groups in test performance indicate test bias?

A

Differences among ethnic groups in test performance do not necessarily indicate test bias.

The question is whether the test has different meanings for different groups; validity defines the meaning of a test.

6
Q

How do African American and white employees compare on subjective measures, such as supervisor ratings, versus objective measures based on more formal evaluations?

A

Objective measures showed even larger differences between African American and white employees than subjective evaluations did for measures of work quality, quantity, and absenteeism. Differences between Hispanic and white employees were not as large as those between African American and white employees.

7
Q

Content-Related Evidence for Validity

A

Test constructors and users were accused of being biased because some children never have the opportunity to learn about some of the items; furthermore, members of ethnic groups might answer some items differently but still correctly.

Critics also argued that scores on intelligence tests are affected by language skills inculcated as part of a white, middle-class upbringing but foreign to inner-city children.

Flaugher (1978) concluded that many perceived test bias problems are based on misunderstandings about the way tests are usually interpreted

8
Q

Flaugher on Content-Related Evidence for Validity

Flaugher argued that the purpose of aptitude and achievement tests is to

A

measure performance on items sampled from a wide range of information

Not particularly concerned about individual items, test developers focus on test performance, making judgments about it based on correlations between the tests and external criteria.

9
Q

Content-Related Evidence for Validity

Many test critics, though, focus attention on specific items

A

Owen (1985) reported that several intelligent and well-educated people had difficulty with specific items on the SAT and Law School Admission Test (LSAT) examinations.

Some items on standardized tests are familiar only to those with a middle-class background.

Test developers are indifferent to people’s opportunities to learn the test information. Again, the meaning they eventually assign to the tests comes from correlations of test scores with other variables.

10
Q

Some evidence suggests that linguistic bias in standardized tests does not cause the observed differences.

A

Quay (1971) administered the Stanford-Binet test to 100 children in an inner-city Head Start program.

Half of the children took a version of the test that used African American dialect, while the others took the standard version.

The dialect version produced less than a 1-point increase in test scores.

This suggests that African American children can comprehend standard English about as well as they can comprehend African American dialect.

11
Q

Some studies have failed to demonstrate that biased items in well-known standardized tests account for the differences in scores among ethnic groups

A

Many attempts to “purify” tests using this approach have not eliminated differences between groups.

In one study, 16% of the items in an elementary reading test were eliminated after experts reviewed them and labeled them as potentially biased toward the majority group.

However, when the “purged” version of the test was used, the differences between the majority and the minority school populations were no smaller than they had been originally

12
Q

Another approach to the same problem is to find those classes of items that are most likely to be missed by members of a particular minority group

A

This is important: if researchers identify certain types of items that discriminate among groups, these types of items can be avoided on future tests.

The results have not been encouraging; studies have not clearly identified such categories of items.

13
Q

Differential Item Functioning (DIF) Analysis

A

Developed by the Educational Testing Service (ETS), which creates and administers a variety of aptitude tests, including the Graduate Record Examination (GRE), the SAT, and the LSAT.

On these tests, the performance of white test takers differs significantly from the performance of other racial and ethnic groups on verbal and analytical measures.

DIF analysis attempts to identify items that are specifically biased against any ethnic, racial, or gender group.

14
Q

DIF analysis steps

A
  1. Equate groups on the basis of overall score: find subgroups of test takers who obtain equivalent scores.
  2. Evaluate differences in performance between the groups (for example, men and women) on particular items.
  3. Throw out items that differ significantly between the groups and rescore the entire test.
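The three DIF steps can be sketched in Python. This is a simplified, hypothetical illustration, not ETS's operational procedure (operational DIF work relies on formal statistics such as the Mantel-Haenszel procedure). It assumes dichotomously scored items (1 = correct), two groups identified by string labels, and a made-up cutoff `threshold` standing in for "differs significantly." The names `flag_dif_items` and `rescore` are illustrative only.

```python
from collections import defaultdict


def flag_dif_items(responses, groups, focal, reference, threshold=0.10):
    """Flag items whose proportion correct differs between two groups
    after matching test takers on total score (simplified sketch).

    responses: list of 0/1 item-score lists, one per test taker
    groups:    list of group labels, aligned with responses
    threshold: hypothetical cutoff for a "significant" difference
    """
    n_items = len(responses[0])

    # Step 1: equate groups by placing test takers with the same total
    # score in the same stratum.
    strata = defaultdict(list)
    for resp, group in zip(responses, groups):
        strata[sum(resp)].append((resp, group))

    flagged = []
    for item in range(n_items):
        weighted_diff = 0.0
        weight_total = 0
        # Step 2: within each stratum, compare proportion correct on this item.
        for members in strata.values():
            foc = [resp[item] for resp, g in members if g == focal]
            ref = [resp[item] for resp, g in members if g == reference]
            if not foc or not ref:
                continue  # one group is absent from this stratum
            diff = sum(foc) / len(foc) - sum(ref) / len(ref)
            weighted_diff += len(foc) * diff  # weight by focal-group size
            weight_total += len(foc)
        if weight_total and abs(weighted_diff / weight_total) > threshold:
            flagged.append(item)
    return flagged


def rescore(responses, flagged):
    """Step 3: drop flagged items and recompute each total score."""
    drop = set(flagged)
    return [sum(x for i, x in enumerate(resp) if i not in drop) for resp in responses]
```

Because every stratum holds test takers with the same total score, a large positive difference on one item must be offset by smaller negative differences spread across the other items, so the cutoff has to be large enough not to flag those offsets.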
15
Q

Some evidence suggests that test items depicting people do not accurately portray the distribution of genders and races in the population.

A

White males were depicted with disproportionate frequency.

16
Q

Studies have not supported the popular belief that items have different meanings for different groups.

A

Even so, efforts should be made to purge items that have the potential to be biased.

17
Q

Criterion-Related Sources of Bias

A

Standardized tests such as the SAT have been found to satisfactorily predict performance during the first year of college.

On the average, minority applicants have lower test scores than do nonminority applicants.

At the same time, most universities and colleges are attempting to increase their minority enrollments.

Because minority applicants are considered as a separate category, we should ask whether the tests have differential predictive power for the two groups of applicants.

We assess the criterion-related evidence for validity of a test by the coefficient of correlation between the test and some criterion.
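A minimal sketch of computing the criterion-related validity coefficient separately for each group, which is the comparison needed to ask whether a test predicts equally well for two groups of applicants. It assumes aligned lists of test scores, criterion scores (for example, first-year grades), and group labels; the function names are hypothetical, and the Pearson correlation is written out by hand so the example has no dependencies.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between test scores x and criterion scores y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def validity_by_group(scores, criterion, groups):
    """Return a separate validity coefficient for each group label."""
    result = {}
    for g in sorted(set(groups)):
        xs = [s for s, grp in zip(scores, groups) if grp == g]
        ys = [c for c, grp in zip(criterion, groups) if grp == g]
        result[g] = pearson_r(xs, ys)
    return result
```

A large gap between the per-group coefficients would be a sign of differential validity; similar coefficients suggest the test predicts about equally well for both groups.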

18
Q

isodensity curve

A

An ellipse used to encircle a specified portion of the cases that constitute a particular group.

19
Q

The equal slopes of the lines suggest

A

equal predictive evidence for validity.

20
Q

When the two regression lines are not parallel

A

The regression coefficient (slope) for one group differs from that of the other.

For example, a test designed to predict performance in a mechanical training program would show differential validity if it predicted performance much better for men than for women.

21
Q

There are two potential explanations for why some children do more poorly on standardized tests than do other children.

A
  1. Some children are less intelligent: the “stupidity” explanation.
  2. Some children do more poorly because they are ignorant of the right responses for a particular test. Under this explanation, differences in IQ scores are of less concern because they can be changed.

Just as minority children have not learned how to answer items that might predict success in white, middle-class culture, many white, middle-class children have not learned how to succeed in the inner city.

22
Q

Dove Counterbalance General Intelligence Test (the Chitling Test)

A

Developed to demonstrate that there is a body of information about which the white middle class is ignorant.

It aimed to show that African Americans and whites are just not talking the same language.

African American children obtained significantly higher scores than white children.

However, there was no clear evidence that scores on the Chitling Test predicted academic success or even success in the children’s home environment.

23
Q

Scientific racism (Williams)

A

Williams argued that the use of intelligence tests is a subtle and dangerous form of racism because the tests are purportedly supported by scientific validity studies.

He described IQ and standardized achievement tests as “nothing but updated versions of the old signs down South that read ‘For Whites Only.’”

Of particular interest to Williams and his colleagues is the assessment of the ability to survive in the African American community.

In this view, assessment of the survival quotient (SQ) is more important than assessment of IQ, which indicates only the likelihood of succeeding in the white community.

24
Q

The Black Intelligence Test of Cultural Homogeneity (BITCH)

A

The BITCH asks respondents to define 100 vocabulary words relevant to African American culture.

African American people obtain higher scores than do their white counterparts on the BITCH.

Williams argues that traditional IQ and achievement tests are nothing more than culture-specific tests that assess how much white children know about white culture.

Little convincing validity data on the BITCH are available. Although the test manual does report some studies, the samples are small and do not represent any clearly defined population.

25
Q

The System of Multicultural Pluralistic Assessment (SOMPA) (Mercer, 1979)

A

SOMPA assumes that beliefs about what is fair and what knowledge exists are related to the social structure.

It also assumes that all cultural groups have the same average potential.

Any differences among cultural groups are assumed to be caused by differences in access to cultural experiences.

Those who do not perform well on the tests are not well informed about the criteria for success usually set forth by the dominant group.

26
Q

SOMPA attempts to integrate three different approaches to assessment: medical, social, and pluralistic.

A
  1. Medical: physical measures of vision, hearing, and motor functioning.
  2. Social-system: the entire WISC-R is given and evaluated according to the regular criteria.
  3. Pluralistic: WISC-R scores are adjusted for socioeconomic background; the adjusted scores are known as estimated learning potentials (ELPs).

The correlation between ELPs and school achievement is approximately .40, whereas the correlation between the WISC-R and school achievement is nearly .60.

ELPs are a poorer predictor of school success than are WISC-R scores.
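Squaring each correlation gives the proportion of variance in school achievement that the predictor shares with it, which makes the size of the gap easier to see:

```latex
r^2_{\text{ELP}} \approx 0.40^2 = 0.16 \qquad r^2_{\text{WISC-R}} \approx 0.60^2 = 0.36
```

So ELPs account for roughly 16% of the variance in school achievement, compared with roughly 36% for WISC-R scores.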

27
Q

Hunter and Schmidt (1976) identified three ethical positions that set the tone for much of the debate:

A
  1. Unqualified individualism uses tests to select the most qualified individuals available, indifferent to the race or gender of applicants.
    If race or gender were a valid predictor of performance over and above the information in the test, the unqualified individualist would see nothing wrong with considering this information in the selection process.
    Ignoring such information would produce underprediction for some groups and overprediction for others.
  2. Quotas explicitly recognize race and gender differences: selection represents the population by taking the top performers from each group.
  3. Qualified individualism embraces the notion that one should select the best-qualified people. Unlike unqualified individualists, however, qualified individualists do not take information about race, gender, and religion into consideration, even if it helps to predict performance on the criterion, that is, even if not doing so results in underprediction of performance for one group and overprediction for another.
    When people from two different groups have the same test score, they are treated the same, which gives an advantage to the lower-scoring group and a disadvantage to the higher-scoring group.
28
Q

Regression

A

Procedure: Separate regression lines are used for different groups. Those with the highest predicted criterion scores are selected.

Rationale: fair because those with the highest estimated level of success are selected.

Effect on minority selection: few minority group members are selected.

Effect on criterion performance: good performance on the criterion.
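A hypothetical sketch of the regression model of selection: each applicant's predicted criterion score comes from their own group's regression line, and the top predicted scorers overall are chosen. Applicants are assumed to be `(id, group, test_score)` tuples, and the slopes and intercepts are made-up values rather than estimates from any real test.

```python
def predict(equations, group, score):
    """Predicted criterion score from the applicant's own group's line."""
    slope, intercept = equations[group]
    return intercept + slope * score


def regression_select(applicants, equations, n_select):
    """Select the n_select applicants with the highest predicted criterion
    scores, using a separate regression equation for each group.

    applicants: list of (applicant_id, group, test_score)
    equations:  {group: (slope, intercept)}, hypothetical fitted values
    """
    ranked = sorted(
        applicants,
        key=lambda a: predict(equations, a[1], a[2]),
        reverse=True,
    )
    return [applicant_id for applicant_id, _, _ in ranked[:n_select]]
```

With slopes of 0.05 and intercepts of 0.5 and 0.3 for groups A and B, for instance, an A applicant scoring 60 (predicted 3.5) outranks a B applicant scoring 50 (predicted 2.8).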

29
Q

Constant ratio

A

Procedure: Points equal to approximately half of the average difference between the groups are added to the test scores of the group with the lower score. Then a single regression line is used, and those with the highest predicted scores are selected.

Rationale: fair because it reflects the potential of the lower-scoring group.

Effect on minority selection: some increase in the number of minority group members selected.

Effect on criterion performance: somewhat lower performance on the criterion.
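A hypothetical sketch of the constant-ratio procedure exactly as this card describes it: compute each group's mean test score, add half of the difference to the lower-scoring group's scores, then rank everyone on one shared regression line. It assumes exactly two groups and `(id, group, test_score)` tuples, and the slope and intercept are placeholder values; it illustrates the card's summary rather than a full statement of the published model.

```python
def constant_ratio_select(applicants, lower_group, slope, intercept, n_select):
    """Sketch of the constant-ratio procedure as summarized on this card.

    applicants: list of (applicant_id, group, test_score); exactly two groups
    slope, intercept: a single regression line (hypothetical values)
    """
    by_group = {}
    for _, group, score in applicants:
        by_group.setdefault(group, []).append(score)
    means = {g: sum(s) / len(s) for g, s in by_group.items()}
    higher_group = next(g for g in means if g != lower_group)

    # Add about half the average group difference to the lower group's scores.
    bonus = (means[higher_group] - means[lower_group]) / 2

    def predicted(applicant):
        _, group, score = applicant
        adjusted = score + bonus if group == lower_group else score
        return intercept + slope * adjusted  # one line for everyone

    ranked = sorted(applicants, key=predicted, reverse=True)
    return [applicant_id for applicant_id, _, _ in ranked[:n_select]]
```

For example, if group A averages 55 and group B averages 44, every B score is raised by 5.5 points before ranking, so a B applicant with 48 (adjusted to 53.5) moves ahead of an A applicant with 50.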

30
Q

Cole/Darlington

A

Procedure: Separate regression equations are used for each group, and points are added to the scores of those from the lower group to ensure that those with the same criterion score have the same predictor score.

Rationale: fair because it selects more potentially successful people from the lower group.

Effect on minority selection: larger increase in the number of minority group members selected.

Effect on criterion performance: lower performance on the criterion.

31
Q

Quota

A

Procedure: The proportion of people to be selected from each group is predetermined. Separate regression equations are used to select those from each group who are expected to perform highest on the criterion.

Rationale: fair because members of different subgroups are selected based on their proportions in the community.

Effect on minority selection: best representation of minority groups.

Effect on criterion performance: lower, about the same as Cole/Darlington.

May lead to greater rates of failure among some groups.
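A hypothetical sketch of quota selection: the number to take from each group is fixed in advance, and within each group applicants are ranked by that group's own regression equation. The quotas are assumed to have already been derived from each group's proportion in the community, and the slopes and intercepts are illustrative values.

```python
def quota_select(applicants, equations, quotas):
    """Select a predetermined number from each group, ranking each group
    by its own regression equation.

    applicants: list of (applicant_id, group, test_score)
    equations:  {group: (slope, intercept)}, hypothetical fitted values
    quotas:     {group: number_to_select}, assumed already derived from
                each group's proportion in the community
    """
    selected = []
    for group, count in quotas.items():
        slope, intercept = equations[group]
        members = [a for a in applicants if a[1] == group]
        # Rank only within the group, using that group's regression line.
        members.sort(key=lambda a: intercept + slope * a[2], reverse=True)
        selected.extend(applicant_id for applicant_id, _, _ in members[:count])
    return selected
```

Because each group is ranked only against itself, the selected set can differ from a single overall ranking: with one slot per group, an applicant who would have placed second overall can lose out to the top member of the other group.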

32
Q

Pareto optimal

A

balances competing goals, in this case between criterion performance and ethnic or racial balance.

33
Q

Observed differences between minority and nonminority groups on standardized tests pose a problem, but they also give test developers and users an opportunity to see test results in new ways.

A

Differences in test scores may reflect patterns of problem solving that characterize different subcultures.

These differences can teach us important things about the relationship between socialization and problem-solving approaches.

Knowing more about the ways different groups approach problems can lead to the development of improved predictors of success for minority groups.

34
Q

How many mandated tests does the average child take from kindergarten through grade 12?

A

112

35
Q

predictive bias

A

Predictive bias describes different patterns of association between test scores and criterion variables for different ethnic, racial, or gender groups. Studies of it have examined differences between male and female performance, between the performance of African American and white students, and between white and Hispanic students, using samples of hundreds of thousands of test takers.

It occurs when the regression lines for different groups have different slopes or intercepts.
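A minimal sketch of checking two groups' regression lines for predictive bias by comparing slopes and intercepts. It fits ordinary least squares by hand for each group; the fixed tolerance `tol` and the function names are hypothetical stand-ins, since a real analysis would test whether the differences are statistically significant.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept of criterion y on test x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predictive_bias(group1, group2, tol=1e-6):
    """Compare the regression lines of two groups.

    group1, group2: (test_scores, criterion_scores) pairs
    tol: hypothetical tolerance; a real analysis would use a
         significance test instead of a fixed cutoff
    """
    slope1, intercept1 = fit_line(*group1)
    slope2, intercept2 = fit_line(*group2)
    return {
        "slope_bias": abs(slope1 - slope2) > tol,
        "intercept_bias": abs(intercept1 - intercept2) > tol,
    }
```

For instance, if one group's data fall exactly on criterion = 2 × score and the other's on criterion = 2 × score + 1, the lines are parallel but offset, so only intercept bias is reported; a single common line would then underpredict the criterion for the second group and overpredict it for the first.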

36
Q

Supporters of the use of the tests give three reasons not to use grades as the criterion.

A

First, teacher-assigned grades are unstandardized and open to subjective bias.

Second, few available studies have used grades as the criterion.

Third, the most frequently cited study (Goldman & Hartig, 1976) is open to other explanations. In this study, the teachers rated the classroom performance of nearly all of the minority children as poor. These low ratings resulted in little variance on the criterion measure, and a criterion with little variance cannot correlate strongly with test scores.