Test Construction - Domain Quiz Flashcards

1
Q

Content appropriateness, taxonomic level, and extraneous abilities are factors that are considered when evaluating:
Select one:

a. a test’s factorial validity.
b. a test’s incremental validity.
c. the relevance of test items.
d. the adequacy of the “actual criterion.”

A

In the context of test construction, relevance refers to the extent to which test items contribute to achieving the goals of testing.

Answer C is correct: Content appropriateness, taxonomic level, and extraneous abilities are three factors that may be considered when determining the relevance of test items.

Answer A is incorrect: Factorial validity refers to the extent to which a test has high correlations with factors it is expected to correlate with and low correlations with factors it is not expected to correlate with.

Answer B is incorrect: Incremental validity refers to the degree to which a test improves decision-making accuracy.

Answer D is incorrect: The actual criterion refers to the actual (versus ultimate) measure of performance.

The correct answer is: the relevance of test items.

2
Q

For an achievement test item that has an item discrimination index (D) of +1.0, you would expect:
Select one:

a. high achievers to be more likely than low achievers to answer the item correctly.
b. low achievers to be more likely than high achievers to answer the item correctly.
c. moderate achievers to be more likely than high and low achievers to answer the item correctly.
d. low and high achievers to be equally likely to answer the item correctly.

A

The item discrimination index (D) is calculated by subtracting the percent of examinees in the lower-scoring group who answered the item correctly from the percent of examinees in the upper-scoring group who did so; it ranges in value from -1.0 to +1.0.

Answer A is correct: When all examinees in the upper-scoring group and none in the lower-scoring group answered the item correctly, D is equal to +1.0.

The correct answer is: high achievers to be more likely than low achievers to answer the item correctly.
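
To make the arithmetic concrete, here is a minimal sketch of the calculation; the group percentages are hypothetical, chosen to produce D = +1.0.

```python
# Item discrimination index: D = p_upper - p_lower
# Hypothetical tryout data: all examinees in the upper-scoring group and none
# in the lower-scoring group answered the item correctly.
p_upper = 1.00  # proportion of upper-scoring group answering correctly
p_lower = 0.00  # proportion of lower-scoring group answering correctly
D = p_upper - p_lower
print(D)  # 1.0 -> maximum possible discrimination
```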

3
Q

The item difficulty index (p) ranges in value from:
Select one:

a. -1.0 to +1.0.
b. -.50 to +.50.
c. 0 to +1.0.
d. 0 to 50.

A

The item difficulty index (p) indicates the proportion of examinees in the tryout sample who answered the item correctly.

Answer C is correct: The item difficulty index ranges in value from 0 to +1.0, with 0 indicating that none of the examinees answered the item correctly and +1.0 indicating that all examinees answered the item correctly.

The correct answer is: 0 to +1.0.

4
Q

The optimal item difficulty index (p) for items included in a true or false test is:
Select one:

a. .25.
b. .50.
c. .75.
d. 1

A

One factor that affects the optimal difficulty level of an item is the likelihood that an examinee can choose the correct answer by guessing, with the preferred level being halfway between 100% and the level of success expected by chance alone.

Answer C is correct: For true or false items, the probability of obtaining a correct answer by chance alone is .50. Therefore, the optimal difficulty level for true or false items is .75, which is halfway between 1.0 and .50.

The correct answer is: .75.
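
The "halfway between 1.0 and the chance level" rule can be written as a one-line computation; the multiple-choice line is an extra illustration, not part of this item.

```python
# Optimal item difficulty = halfway between 1.0 and the chance-level probability
def optimal_difficulty(chance_level):
    return (1.0 + chance_level) / 2

print(optimal_difficulty(0.50))  # true/false items -> 0.75
print(optimal_difficulty(0.25))  # four-option multiple-choice items -> 0.625
```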

5
Q
The slope (steepness) of an item characteristic curve indicates the item's:
Select one:

a. difficulty level.
b. discrimination.
c. reliability.
d. validity.

A

The various item response theory models yield item characteristic curves that provide information on one, two, or three parameters – i.e., difficulty level, discrimination, and probability of guessing correctly. Additional information on the item characteristic curve is provided in the Test Construction chapter of the written study materials.

Answer B is correct: An item’s ability to discriminate between high and low achievers is indicated by the slope of the item characteristic curve – the steeper the slope, the greater the discrimination.

The correct answer is: discrimination.

6
Q

According to classical test theory, total variability in obtained test scores is composed of:
Select one:

a. true score variability plus random error
b. true score variability plus systematic error
c. a combination of communality and specificity
d. a combination of specificity and error

A

Answer A is correct: As defined by classical test theory, total variability in test scores is due to a combination of true score variability plus measurement (random) error - i.e., X = T + E.

The correct answer is: true score variability plus random error

7
Q

A problem with using percent agreement as a measure of inter-rater reliability is that it doesn’t take into account the effects of:
Select one:

a. sample heterogeneity.
b. test length.
c. chance agreement among raters.
d. inter-item inconsistency.

A

Inter-rater reliability can be assessed using percent agreement or by calculating the kappa statistic.

Answer C is correct: A disadvantage of percent agreement is that it doesn’t take into account the amount of agreement that could have occurred among raters by chance alone, which can provide an inflated estimate of the measure’s reliability. The kappa statistic is more accurate because it adjusts the reliability coefficient for the effects of chance agreement.

The correct answer is: chance agreement among raters.
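
For illustration only, here is a sketch of how chance agreement is removed when computing Cohen's kappa; the 2 x 2 counts below are hypothetical.

```python
# Two raters, yes/no judgments. Cell counts (hypothetical):
#                 Rater B: yes   Rater B: no
# Rater A: yes        40             10
# Rater A: no          5             45
a, b, c, d = 40, 10, 5, 45
n = a + b + c + d
p_observed = (a + d) / n  # percent agreement = .85
p_chance = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2  # expected by chance = .50
kappa = (p_observed - p_chance) / (1 - p_chance)
print(round(p_observed, 2), round(kappa, 2))  # 0.85 0.7
```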

8
Q

A researcher correlates scores on two alternate forms of an achievement test and obtains a reliability coefficient of .80. This means that ___% of observed test score variability reflects true score variability.
Select one:

a. 80
b. 64
c. 36
d. 20

A

Answer A is correct: A reliability coefficient is interpreted directly as a measure of true score variability.

The correct answer is: 80

9
Q

A test has a standard deviation of 12, a mean of 60, a reliability coefficient of .91, and a validity coefficient of .60. The test’s standard error of measurement is equal to:
Select one:

a. 12
b. 9.6.
c. 3.6.
d. 2.8.

A

To calculate the standard error of measurement, you need to know the standard deviation of the test scores and the test’s reliability coefficient.

Answer C is correct: The standard deviation of the test scores is 12 and the reliability coefficient is .91. To calculate the standard error, you multiply the standard deviation times the square root of one minus the reliability coefficient: 1 minus .91 is .09; the square root of .09 is .3; .3 times 12 is 3.6. Additional information about the calculation and use of the standard error of measurement is provided in the Test Construction chapter of the written study materials.

The correct answer is: 3.6.
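
The same calculation, written as a short sketch using the values from this item (the mean and validity coefficient are deliberately ignored):

```python
import math

# Standard error of measurement: SEM = SD * sqrt(1 - reliability)
sd = 12
reliability = 0.91
sem = sd * math.sqrt(1 - reliability)
print(round(sem, 1))  # 3.6
```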

10
Q

Consensual observer drift tends to:
Select one:

a. increase the probability of answering a test item correctly by chance alone.
b. decrease the probability of answering a test item correctly by chance alone.
c. produce an overestimate of a test’s inter-rater reliability.
d. produce an underestimate of a test’s inter-rater reliability.

A

Consensual observer drift occurs when two or more observers working together influence each other’s ratings on a behavioral rating scale so that they assign ratings in a similar idiosyncratic way.

Answer C is correct: Consensual observer drift makes the ratings of different raters more similar, which artificially increases inter-rater reliability.

The correct answer is: produce an overestimate of a test’s inter-rater reliability.

11
Q

For a newly developed test of cognitive flexibility, coefficient alpha is .55. Which of the following would be useful for increasing the size of this coefficient?
Select one:

a. adding more items that are similar in terms of content and quality
b. adding more items that are similar in terms of quality but different in terms of content
c. reducing the heterogeneity of the tryout sample
d. using a true or false format for the items rather than a multiple-choice format

A

For the exam, you want to be familiar with the methods for increasing reliability that are described in the Test Construction chapter of the written study materials.

Answer A is correct: A test’s reliability is increased when the test is lengthened by adding items of similar content and quality, the range of scores is unrestricted (i.e., the tryout sample heterogeneity is maximized), and the ability to choose the correct answer by guessing is reduced.

The correct answer is: adding more items that are similar in terms of content and quality

12
Q

Sally Student receives a score of 450 on a college aptitude test that has a mean of 500 and standard error of measurement of 50. The 68% confidence interval for Sally’s score is:
Select one:

a. 400 to 450.
b. 400 to 500.
c. 450 to 550.
d. 350 to 550.

A

The standard error of measurement is used to construct a confidence interval around an obtained test score.

Answer B is correct: To construct the 68% confidence interval, one standard error of measurement is added to and subtracted from the obtained score. Since Sally obtained a score of 450 on the test, the 68% confidence interval for her score is 400 to 500. Additional information on constructing confidence intervals is provided in the Test Construction chapter of the written study materials.

The correct answer is: 400 to 500.
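
The interval arithmetic as a sketch; the 1.96 multiplier for the 95% interval is shown only for comparison.

```python
# Confidence interval around an obtained score: score +/- z * SEM
score, sem = 450, 50
ci_68 = (score - 1 * sem, score + 1 * sem)        # ~68% interval (z = 1)
ci_95 = (score - 1.96 * sem, score + 1.96 * sem)  # ~95% interval (z = 1.96)
print(ci_68)  # (400, 500)
print(ci_95)  # (352.0, 548.0)
```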

13
Q

The kappa statistic for a test is .95. This means that the test has:
Select one:

a. adequate inter-rater reliability.
b. adequate internal consistency reliability.
c. inadequate intra-rater reliability.
d. inadequate alternate forms reliability.

A

The kappa statistic (coefficient) is a measure of inter-rater reliability.

Answer A is correct: The reliability coefficient ranges in value from 0 to +1.0. Therefore a kappa statistic of .95 indicates a high degree of inter-rater reliability.

The correct answer is: adequate inter-rater reliability.

14
Q

To assess the internal consistency reliability of a test that contains 50 items that are each scored as either “correct” or “incorrect,” you would use which of the following?
Select one:

a. KR-20
b. Spearman-Brown
c. kappa statistic
d. coefficient of concordance

A

For the exam, you want to be familiar with all of the measures listed in the answers to this question.

Answer A is correct: The Kuder-Richardson Formula 20 (KR-20) is a measure of internal consistency reliability that can be used when test items are scored dichotomously (correct or incorrect).

Answer B is incorrect: The Spearman-Brown formula is used to estimate the effects of lengthening or shortening a test on its reliability.

Answer C is incorrect: The kappa statistic (also known as the kappa coefficient) is a measure of inter-rater reliability.

Answer D is incorrect: The coefficient of concordance is another measure of inter-rater reliability.

The correct answer is: KR-20

15
Q

To determine a test’s internal consistency reliability by calculating coefficient alpha, you would:
Select one:

a. administer the test to a single sample of examinees two times.
b. administer two alternate forms of the test to a single sample of examinees.
c. administer the test to a single sample of examinees and have the tests scored by two raters.
d. administer the test to a single sample of examinees one time.

A

Knowing that coefficient alpha is a measure of internal consistency reliability would have helped you identify the correct answer to this question.

Answer D is correct: Determining internal consistency reliability with coefficient alpha involves administering the test once to a single sample of examinees and using the formula to determine the degree of inter-item consistency.

Answer A is incorrect: Administering the same test to a single sample of examinees on two occasions would be the procedure for assessing test-retest reliability.

Answer B is incorrect: Administering two alternate forms of the test to a single sample of examinees is the procedure for assessing alternate (equivalent) forms reliability.

Answer C is incorrect: Having a test that was administered to a single sample of examinees scored by two raters is the procedure for assessing inter-rater reliability.

The correct answer is: administer the test to a single sample of examinees one time.

16
Q

To estimate the effects of lengthening a 50-item test to 100 items on the test’s reliability, you would use which of the following?
Select one:

a. eta
b. KR-20
c. kappa coefficient
d. Spearman-Brown formula

A

For the exam, you want to be familiar with the measures listed in the answers to this question. These are described in the Test Construction chapter of the written study materials.

Answer D is correct: The Spearman-Brown prophecy formula is used to estimate the effects of lengthening or shortening a test on its reliability coefficient.

The correct answer is: Spearman-Brown formula
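
A minimal sketch of the Spearman-Brown prophecy formula; the original reliability of .60 is a hypothetical value, and n is the factor by which the test is lengthened (here 2, for 50 items doubled to 100).

```python
# Spearman-Brown: r_new = (n * r_old) / (1 + (n - 1) * r_old)
def spearman_brown(r_old, n):
    return (n * r_old) / (1 + (n - 1) * r_old)

print(round(spearman_brown(0.60, 2), 2))  # 0.75 -> doubling the test raises .60 to .75
```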

17
Q

Which of the following methods for evaluating reliability is most appropriate for speed tests?
Select one:

a. split-half
b. coefficient alpha
c. kappa statistic
d. coefficient of equivalence

A

Answer D is correct: Of the methods for evaluating reliability, the coefficient of equivalence (also known as alternative or equivalent forms reliability) is most appropriate for speed tests. Split-half reliability and coefficient alpha are types of internal consistency reliability, and measures of internal consistency reliability overestimate the reliability of speed tests. The kappa statistic is a measure of inter-rater reliability.

The correct answer is: coefficient of equivalence

18
Q

You administer a test to a group of examinees on April 1st and then re-administer the same test to the same group of examinees on May 1st. When you correlate the two sets of scores, you will have obtained a coefficient of:
Select one:

a. internal consistency.
b. determination.
c. equivalence.
d. stability.

A

Correlating two sets of scores obtained by the same group of examinees produces a test-retest reliability coefficient.

Answer D is correct: Test-retest reliability indicates the stability of scores over time, and the test-retest reliability coefficient is also known as the coefficient of stability.

The correct answer is: stability.

19
Q

A test developer uses a sample of 50 current employees to identify items for and then validate a new selection test (predictor). When she correlates scores on the test with scores on a measure of job performance (criterion) for this sample, she obtains a criterion-related validity coefficient of .63. When the test developer administers the test and the measure of job performance to a new sample of 50 employees, she will most likely obtain a validity coefficient that is:
Select one:

a. greater than .63.
b. less than .63.
c. about .63.
d. negative in value.

A

This question is asking about “shrinkage,” which occurs when a test is cross-validated on another sample.

Answer B is correct: The validity coefficient tends to “shrink” (be smaller) on the second sample because the test was tailor-made for the initial sample and the chance factors that contributed to the validity coefficient in the initial sample will not all be present in the second sample.

The correct answer is: less than .63.

20
Q

A test’s content validity is established primarily by which of the following?
Select one:

a. conducting a factor analysis
b. assessing the test’s convergent and divergent validity
c. having subject matter experts systematically review the test’s items
d. testing hypotheses about the attribute(s) measured by the test

A

For the exam, you want to be familiar with the differences between content, construct, and criterion-related validity.

Answer C is correct: Content validity refers to the degree to which test items are an adequate sample of the content domain and is determined primarily by the judgment of subject matter experts. The methods listed in the other answers are used to establish a test’s construct validity.

The correct answer is: having subject matter experts systematically review the test’s items

21
Q

A test’s specificity refers to the number of __________ that were identified by the test.
Select one:

a. true positives
b. false positives
c. true negatives
d. false negatives

A

For the exam, you want to know the difference between specificity and sensitivity, which are terms that are used to describe a test’s accuracy.

Answer C is correct: Specificity refers to the identification of true negatives (percent of cases in the validation sample who do not have the disorder and were accurately classified by the test as not having the disorder). Additional information on sensitivity and specificity is provided in the Test Construction chapter of the written study materials.

Answer A is incorrect: Sensitivity refers to the number of true positives.

The correct answer is: true negatives

22
Q

In a multitrait-multimethod matrix, a test’s construct validity would be confirmed when:
Select one:

a. monotrait-monomethod coefficients are low and heterotrait-heteromethod coefficients are high.
b. monotrait-heteromethod coefficients are high and heterotrait-monomethod coefficients are low.
c. monotrait-monomethod coefficients are high and monotrait-heteromethod coefficients are low.
d. heterotrait-monomethod coefficients and heterotrait-heteromethod coefficients are low.

A

This question is asking about the pattern of correlation coefficients in a multitrait-multimethod matrix that provide evidence of a test’s construct validity.

Answer B is correct: When monotrait-heteromethod (same trait-different methods) coefficients are large, this provides evidence of the test’s convergent validity – i.e., it shows that the test is measuring the trait it was designed to measure. Conversely, when heterotrait-monomethod (different traits-same method) coefficients are small, this provides evidence of the test’s discriminant validity – i.e., it shows that the test is not measuring a different trait. Additional information on the correlation coefficients contained in a multitrait-multimethod matrix is provided in the Test Construction chapter of the written study materials.

The correct answer is: monotrait-heteromethod coefficients are high and heterotrait-monomethod coefficients are low.

23
Q

In a scatterplot constructed from data collected in a concurrent validity study, the number of “false negatives” is likely to increase if:
Select one:

a. the predictor and criterion cutoff scores are both raised.
b. the predictor and criterion cutoff scores are both lowered.
c. the predictor cutoff score is raised and/or the criterion cutoff score is lowered.
d. the predictor cutoff score is lowered and/or the criterion cutoff score is raised.

A

An illustration is provided in the Test Construction materials that can help you visualize what happens when the predictor and/or criterion cutoff scores are changed.

Answer C is correct: The number of false negatives increases as the predictor cutoff score is raised (moved to the right in a scatterplot) and when the criterion cutoff score is lowered (moved toward the bottom of the scatterplot).

The correct answer is: the predictor cutoff score is raised and/or the criterion cutoff score is lowered.

24
Q

____________ refers to the percent of examinees who have the condition being assessed by a predictor who are identified by the predictor as having the condition.
Select one:

a. Specificity
b. Sensitivity
c. Positive predictive value
d. Negative predictive value

A

Answer B is correct: Sensitivity refers to the probability that a predictor will correctly identify people with the disorder from the pool of people with the disorder. It is calculated using the following formula: true positives / (true positives + false negatives).

The correct answer is: Sensitivity
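
Written out as a short computation with hypothetical counts:

```python
# Sensitivity = true positives / (true positives + false negatives)
true_positives = 80   # have the condition and identified by the predictor
false_negatives = 20  # have the condition but missed by the predictor
sensitivity = true_positives / (true_positives + false_negatives)
print(sensitivity)  # 0.8 -> the predictor identifies 80% of those with the condition
```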

25
Q

The results of a factor analysis indicate that Test A has a factor loading of .70 for Factor I and a factor loading of .20 for Factor II. Assuming that only two factors were extracted and that the factors are orthogonal, you can conclude that the communality for Test A scores is:
Select one:

a. 90%.
b. 53%.
c. 49%.
d. 4%.

A

Factor loadings are interpreted like correlation coefficients between two or more variables and are squared to obtain a measure of shared variability. When the factors are orthogonal (uncorrelated), the squared factor loadings can be added to obtain the communality.

Answer B is correct: The factor loading for Factor I is .70 and the factor loading for Factor II is .20: .70 squared is 49% and .20 squared is 4%, so the communality is 49% plus 4%, which equals 53%. This means that the total amount of variability in Test A scores explained by the factor analysis is 53%.

The correct answer is: 53%.
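
The same arithmetic as a sketch (valid only because the factors are orthogonal):

```python
# Communality = sum of squared factor loadings (orthogonal factors only)
loadings = [0.70, 0.20]  # Test A's loadings on Factor I and Factor II
communality = sum(loading ** 2 for loading in loadings)
print(round(communality, 2))  # 0.53 -> 53% of the variability in Test A scores
```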

26
Q

The standard error of estimate is used to:
Select one:

a. estimate the difference between an examinee’s obtained test score and his or her true test score.
b. estimate the difference between an examinee’s predicted criterion score and his or her true criterion score.
c. determine the maximum a predictor’s validity coefficient can be given the reliabilities of the predictor and the criterion.
d. predict the probability that an examinee will obtain a particular score on one or more predictors.

A

For the exam, you want to be sure you know the difference between the standard error of measurement and the standard error of estimate.

Answer B is correct: The standard error of estimate is used to estimate the range within which an examinee’s true criterion score is likely to fall given his or her predicted score on the criterion.

The correct answer is: estimate the difference between an examinee’s predicted criterion score and his or her true criterion score.

27
Q

To ascertain if the test you have developed is valid as a screening test for determining whether a person has an anxiety or affective disorder, you would be most interested in evaluating the test’s:
Select one:

a. content validity.
b. external validity.
c. concurrent validity.
d. differential validity.

A

This situation is analogous to using a predictor to estimate current performance on a criterion. The predictor, in this case, is the screening test, while the criterion is the accuracy of the diagnosis.

Answer C is correct: Concurrent validity is a type of criterion-related validity. It is used to establish validity when the purpose of the test is to estimate current status on a criterion. In this case, the criterion would be some method of diagnosis that is known to be accurate.

Answer A is incorrect: Content validity would be of interest when a test is designed to be a sample of a particular content domain.

Answer B is incorrect: External validity refers to the generalizability of research results and does not apply to this situation.

Answer D is incorrect: A test has differential validity when it has different validity coefficients for different groups. Differential validity is not relevant to this situation.

The correct answer is: concurrent validity.

28
Q

To evaluate the concurrent validity of a new selection test for computer programmers, you would:
Select one:

a. use factor analysis to determine if the test measures the abilities it was designed to measure.
b. have subject matter experts review test items to ensure they are relevant to success as a computer programmer.
c. administer the test to current computer programmers and correlate their test scores with recently assigned job performance ratings
d. administer the test to applicants for computer programmer jobs, hire all applicants regardless of their scores on the test, and correlate their test scores with job performance ratings they receive six months later

A

Answer C is correct: Concurrent validity is a type of criterion-related validity and involves correlating scores on the predictor and criterion when both measures have been administered to examinees at about the same time.

The correct answer is: administer the test to current computer programmers and correlate their test scores with recently assigned job performance ratings

29
Q

Validity is best described as:
Select one:

a. consistency.
b. accuracy.
c. distinctiveness.
d. stability.

A

A test is valid to the degree that it measures what it was designed to measure.

Answer B is correct: When a test is valid, it accurately measures the attribute(s) it was designed to measure.

Answer A is incorrect: Reliability is a measure of consistency.

The correct answer is: accuracy.

30
Q

When conducting a factor analysis, you would choose an oblique rotation of the factors if:
Select one:

a. you are assessing the construct validity of a test designed to measure a single trait.
b. you believe that each test included in the analysis measures a different construct.
c. you believe the constructs measured by the tests included in the analysis are correlated.
d. you want to determine if a test has an adequate level of incremental validity.

A

For the exam, you want to know what “orthogonal” and “oblique” mean in the context of factor analysis.

Answer C is correct: In factor analysis, orthogonal means uncorrelated, while oblique means correlated. Therefore, you would conduct an oblique rotation if you believe the test you are validating measures constructs that correlate with the constructs measured by the other tests you’ve included in the analysis.

The correct answer is: you believe the constructs measured by the tests included in the analysis are correlated.

31
Q

When determining a predictor’s incremental validity, the positive hit rate is calculated by:
Select one:

a. dividing the number of true positives by the total number of positives.
b. dividing the total number of positives by the number of people in the sample.
c. dividing the base rate by the number of true positives.
d. dividing the total number of positives by the base rate.

A

The positive hit rate is the proportion of people who would have been selected on the basis of their predictor scores and who are successful on the criterion.

Answer A is correct: The positive hit rate is calculated by dividing the number of true positives by the total number of positives. The result indicates the percent of positives who were actually successful on the criterion.

The correct answer is: dividing the number of true positives by the total number of positives.
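
As a brief computation with hypothetical counts:

```python
# Positive hit rate = true positives / total positives
true_positives = 30   # selected by the predictor and successful on the criterion
false_positives = 10  # selected by the predictor but unsuccessful on the criterion
positive_hit_rate = true_positives / (true_positives + false_positives)
print(positive_hit_rate)  # 0.75 -> 75% of those selected succeed on the criterion
```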

32
Q

Which of the following best defines the relationship between a predictor’s reliability coefficient and its criterion-related validity coefficient?
Select one:

a. A test’s validity coefficient cannot exceed its reliability coefficient.
b. A test’s validity coefficient cannot exceed the square root of its reliability coefficient.
c. A test’s validity coefficient cannot exceed the square of its reliability coefficient.
d. A test’s reliability coefficient cannot exceed its validity coefficient.

A

For the exam, you want to know that reliability places a ceiling on validity and to be familiar with the formula for the relationship between reliability and validity so that you can answer questions like this one.

Answer B is correct: This answer describes the formula that defines the relationship between reliability and validity – i.e., a test’s validity coefficient cannot be greater than the square root of its reliability coefficient.

The correct answer is: A test’s validity coefficient cannot exceed the square root of its reliability coefficient.
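
A quick numeric check of the ceiling; the reliability coefficient of .81 is a hypothetical value.

```python
import math

# Maximum possible criterion-related validity coefficient = sqrt(reliability)
reliability = 0.81
max_validity = math.sqrt(reliability)
print(round(max_validity, 2))  # 0.9 -> the validity coefficient cannot exceed .90
```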

33
Q

Your newly developed measure of integrity correlates highly with a well-known and widely used measure of integrity. This correlation provides evidence of your measure’s ________ validity.
Select one:

a. incremental
b. internal
c. discriminant
d. convergent

A

In this situation, one measure of a specific construct correlates highly with another measure of the same construct.

Answer D is correct: A high correlation between a new and an established measure of the same construct provides evidence of the new measure’s convergent validity (which, in turn, provides evidence of its construct validity).

Answer A is incorrect: Incremental validity is a measure of decision-making accuracy and is associated with criterion-related validity.

Answer B is incorrect: Internal validity is one of the standards used to evaluate research designs and is not relevant to the situation described in this question.

Answer C is incorrect: Discriminant validity (also known as divergent validity) refers to the extent to which a test does not correlate with measures of different constructs.

For additional information on convergent and discriminant validity, see the section on construct validity in the Test Construction chapter of the written study materials.

The correct answer is: convergent

34
Q

Assuming a normal distribution, which of the following represents the highest score?
Select one:

a. a Z score of 1.5
b. a T score of 70
c. a WAIS Full Scale IQ score of 120
d. a percentile rank of 88

A

For the exam, you want to be familiar with the relationship of z scores, T scores, WAIS IQ scores, and percentile ranks in a normal distribution so that you can answer questions like this one.

Answer B is correct: A T score of 70 is two standard deviations above the mean.

Answer A is incorrect: A Z score of 1.5 is one and one-half standard deviations above the mean.

Answer C is incorrect: A WAIS Full Scale IQ score of 120 (mean of 100, standard deviation of 15) is about 1-1/3 standard deviations above the mean.

Answer D is incorrect: A percentile rank of 88 is slightly over one standard deviation above the mean.

The correct answer is: a T score of 70
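
One way to check comparisons like this is to convert every option to a z-score; the sketch below uses the standard normal distribution for the percentile conversion.

```python
from statistics import NormalDist

z_a = 1.5                         # option a: already a z-score
z_b = (70 - 50) / 10              # option b: T score (mean 50, SD 10) -> 2.0
z_c = (120 - 100) / 15            # option c: WAIS IQ (mean 100, SD 15) -> ~1.33
z_d = NormalDist().inv_cdf(0.88)  # option d: percentile rank of 88 -> ~1.17
print(z_a, z_b, round(z_c, 2), round(z_d, 2))  # the T score of 70 is highest
```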

35
Q

Dina received a percentile rank of 48 on a test, while her twin brother, Dino, received a percentile rank of 98. Their teacher realizes she made an error in scoring their tests and adds four points to Dina’s and Dino’s raw scores. (The other students’ tests were scored correctly.) When she recalculates Dina’s and Dino’s percentile ranks, the teacher will find that:
Select one:

a. Dina’s percentile rank will change by more points than Dino’s.
b. Dino’s percentile rank will change by more points than Dina’s.
c. Dina’s and Dino’s percentile ranks will change by the same number of points.
d. Dina and Dino’s percentile ranks will not change.

A

As described in the Test Construction chapter of the written study materials, percentile ranks maximize differences in the middle of the raw score distribution and minimize differences at the extremes.

Answer A is correct: This general rule means that Dina's percentile rank (which is near the middle of the distribution) will be affected more by the four-point addition to her raw score than Dino's percentile rank (which is extremely high).

The correct answer is: Dina’s percentile rank will change by more points than Dino’s.

36
Q

Eigenvalues are associated with:
Select one:

a. internal consistency reliability.
b. criterion-referenced interpretation.
c. the multitrait-multimethod matrix.
d. principal components analysis.

A

An eigenvalue indicates the total amount of variability in a set of tests or other variables that is explained by an identified component or factor.

Answer D is correct: Eigenvalues can be calculated for each component “extracted” in a principal component analysis. Additional information on eigenvalues and principal component analysis is provided in the Test Construction chapter of the written study materials.

The correct answer is: principal components analysis.

37
Q

Stanford-Binet and Wechsler IQ scores are:
Select one:

a. percentile ranks.
b. ipsative scores.
c. standard scores.
d. stanine scores.

A

Standard scores report an examinee’s performance in terms of standard deviations from the mean.

Answer C is correct: Stanford-Binet and Wechsler IQ scores are standard scores that indicate an examinee’s performance in terms of standard deviations from the mean obtained by examinees in the norm group.

The correct answer is: standard scores.

38
Q

Which of the following scores is NOT a norm-referenced score?
Select one:

a. percentile rank
b. T-score
c. pass or fail
d. grade-equivalent scores

A

When using norm-referenced interpretation, an examinee’s score indicates how well he or she did on the test relative to examinees in the norm group.

Answer C is correct: Pass or fail is a criterion-referenced score. It indicates whether a person has or has not mastered the test content and does not describe performance relative to other examinees. A "pass" score obtained by one examinee does not indicate how many other examinees passed or failed.

Answer A is incorrect: Percentile ranks are norm-referenced scores. A percentile rank indicates the percent of examinees in the norm group who obtained a lower score.

Answer B is incorrect: A T-score is a type of standard score, and standard scores are norm-referenced scores that indicate how well an examinee did in terms of standard deviation units from the mean score of the norm group.

Answer D is incorrect: A grade-equivalent score is a norm-referenced score. It allows a test user to compare an examinee’s test performance to that of students in different grade levels.

The correct answer is: pass or fail

39
Q

Zelda Z. obtains a score of 41 on a test that has a mean of 50 and a standard deviation of 6. If all of the scores in the distribution are transformed so that the test now has a mean of 100 and a standard deviation of 12, Zelda’s score in the new distribution would be:
Select one:

a. 91
b. 82
c. 41
d. 20.5.

A

To identify the correct answer to this question, you have to recognize that Zelda’s original score was 1-1/2 standard deviations below the mean.

Answer B is correct: A score of 82 is 1-1/2 standard deviations below the mean of the new distribution and, therefore, equivalent to a score of 41 in the original distribution.

The correct answer is: 82
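
The transformation as a two-step computation using the values from this item:

```python
# Step 1: express the original score as a z-score.
# Step 2: rescale that z-score to the new mean and standard deviation.
old_mean, old_sd = 50, 6
new_mean, new_sd = 100, 12
score = 41
z = (score - old_mean) / old_sd    # (41 - 50) / 6 = -1.5
new_score = new_mean + z * new_sd  # 100 + (-1.5)(12) = 82
print(new_score)  # 82.0
```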

40
Q

A test developer is concerned that her newly developed test of academic achievement has a limited floor. Therefore, she would be best advised to increase the proportion of items in the test that have an item difficulty index (p) of:

a. .80 to .95
b. .15 to .30
c. 0.
d. -.95 to -1.0

A

Correct answer: A (.80 to .95)

A limited floor means the test cannot discriminate among low-scoring examinees, so the test developer should add easy items, i.e., items with high p values.

a. Easy items.
b. Hard items.
c. Items that no one answered correctly.

41
Q

An item discrimination index (D) of ____ indicates that the item was answered correctly by more low-achieving examinees than high-achieving examinees.

a. +1.0
b. +.50
c. 0
d. -.50

A

b. answered correctly by more high-achieving than low-achieving examinees
c. answered correctly equally often by high- and low-achieving examinees (does not discriminate between the two groups)
d. answered correctly by more low-achieving than high-achieving examinees

Correct answer: D

42
Q

A kappa coefficient of .94 indicates:

a. a low level of alternate forms reliability
b. a low level of item discrimination
c. an acceptable level of internal consistency reliability
d. an acceptable level of inter-rater reliability

A

Correct Answer: D

It indicates a HIGH LEVEL of inter-rater reliability.

43
Q

A test designed to measure knowledge of test construction is likely to have the (LOWEST) reliability coefficient when the test consists of ____ items and the tryout sample consists of examinees who are ______ with regard to their knowledge of test construction.

a. 40; heterogeneous
b. 40; homogeneous
c. 80; heterogeneous
d. 80; homogeneous

A

Correct Answer: B

b. A shorter test and a homogeneous tryout sample will produce the lowest reliability coefficient.

44
Q

When a test’s reliability coefficient is equal to 0, the standard error of measurement for the test:

a. is equal to the test’s mean.
b. is equal to 1.
c. is equal to the test’s standard deviation.
d. cannot be determined.

A

Correct Answer: C

Memorize the formula: SEM = SD x the square root of (1 - reliability coefficient). When the reliability coefficient is 0, the term under the square root equals 1, so the SEM equals the test's standard deviation; when the reliability coefficient is +1.0, the SEM equals 0.

45
Q

In a multitrait-multimethod matrix, a large monotrait-heteromethod coefficient provides evidence that the test being validated has:

a. adequate divergent validity
b. adequate convergent validity
c. inadequate divergent validity
d. inadequate convergent validity

A

Correct answer: B

b. adequate convergent validity

Same Trait / Different Methods

You want this coefficient to be large.

46
Q

In factor analysis, a communality indicates the proportion of variance accounted for in:

a. a single variable by a single factor
b. multiple variables by a single factor
c. a single variable by all of the identified factors.
d. multiple variables by common error.

A

Correct answer: C

47
Q

An educational psychologist designs a screening test to identify underachieving first- and second-grade students who may have a learning disability. The psychologist will be most concerned that her test has adequate _____ validity.

a. content
b. construct
c. concurrent
d. predictive

A

Correct answer: C (concurrent)

If they were asking about learning disability in the future, the correct answer would have been “predictive.”

48
Q

A personnel director uses an assertiveness test to hire salespeople. However, several of the people who are hired based on their test results turn out to be less than adequate performers. These individuals are:

a. false positives
b. false negatives
c. true positives
d. true negatives

A

Correct answer: A

a. False positives are individuals who are predicted by the test to perform well but who perform poorly on the criterion.

49
Q

Which of the following best describes the relationship between a test’s reliability and validity?

a. An invalid test is never reliable.
b. A valid test may or may not be reliable.
c. A reliable test is also valid.
d. A reliable test may or may not be valid.

A

Correct answer: D

Reliability is a necessary but not sufficient condition for validity.

A test can be reliable, but it may not measure what it was intended to measure.

50
Q

In a normal distribution, which of the following represents the lowest score?

a. percentile rank of 16
b. z-score of -1.5
c. T-score of 30
d. deviation IQ of 80

A

Correct answer: C

a. equivalent to one SD below the mean
b. equivalent to 1.5 SDs below the mean
c. mean of 50 and SD of 10; a T-score of 30 is 2 SDs below the mean
d. mean of 100 and SD of 15; a deviation IQ of 80 is about 1-1/3 SDs below the mean

51
Q

The distribution of percentile ranks is always:

a. normal regardless of the shape of the distribution of raw scores
b. flat (rectangular) regardless of the shape of the distribution of raw scores
c. leptokurtic regardless of the shape of the distribution of raw scores
d. the same as the shape of the distribution of raw scores

A

Correct answer: B

52
Q

An advantage of using the kappa statistic rather than percent agreement when assessing a test’s inter-rater reliability is that the former:
Select one:

A. is easier to calculate.
B. corrects for chance agreement.
C. corrects for small sample size.
D. takes into account the effects of multicollinearity.

A

If you know that the problem with percent agreement as a measure of inter-rater reliability is that it is inflated by chance agreement, you would have been able to answer this question correctly even if you aren’t familiar with the kappa statistic.

a. Incorrect The kappa statistic is actually more difficult to calculate than is percent agreement.
b. CORRECT The kappa statistic (which is also known as Cohen’s kappa and the kappa coefficient) provides a more accurate estimate of reliability than percent agreement because its calculation includes removing the effects of chance agreement.
c. Incorrect This does not describe the kappa statistic.
d. Incorrect This does not describe the kappa statistic.

The correct answer is: corrects for chance agreement.

53
Q

The correction for attenuation formula is used to measure the impact of increasing:
Select one:

A. a test's reliability on its validity.
B. a test's validity on its reliability.
C. the number of test items on the test's validity.
D. the number of test items on the test's reliability.

A

For the exam, you’ll want to be familiar with the purpose of several formulas including the correction for attenuation formula.

a. CORRECT The correction for attenuation formula is used to determine the impact of increasing the reliability of the predictor (test) and/or the criterion on the predictor’s validity.
b. Incorrect See explanation for response a.
c. Incorrect See explanation for response a.
d. Incorrect See explanation for response a.

The correct answer is: a test's reliability on its validity.

54
Q

In a multitrait-multimethod matrix, the coefficient that indicates a test’s reliability is the _____________ coefficient.
Select one:

A. heterotrait-heteromethod
B. heterotrait-monomethod
C. monotrait-heteromethod
D. monotrait-monomethod

A

The multitrait-multimethod matrix contains four types of correlation coefficients.

a. Incorrect The heterotrait-heteromethod coefficient is a measure of divergent validity.
b. Incorrect The heterotrait-monomethod coefficient is also a measure of divergent validity.
c. Incorrect The monotrait-heteromethod coefficient is a measure of convergent validity.
d. CORRECT The monotrait-monomethod coefficient indicates the correlation of the test with itself and is a measure of the test’s reliability.

The correct answer is: monotrait-monomethod

55
Q

After reviewing the data collected on a new selection test during the course of a criterion-related validity study, a psychologist decides to lower the selection test cutoff score. Apparently the psychologist is hoping to do which of the following?
Select one:

A. reduce the number of false negatives
B. increase the number of true positives
C. reduce the number of false positives
D. increase the number of false negatives

A

To identify the correct answer to this question, you have to be familiar with the effects of changing a predictor cutoff score on the number of true and false positives and negatives. A diagram that you may find helpful for understanding the effects of changing the predictor and criterion cutoff scores is included in the Test Construction chapter of the written study materials.

a. Incorrect Lowering the predictor cutoff will decrease the number of false negatives. However, it’s not likely that the psychologist will want to decrease the number of individuals who would have been successful on the criterion if they had been hired. Therefore, this isn’t the best answer.
b. CORRECT By lowering the selection test (predictor) cutoff score, the psychologist will increase the number of people who are accepted on the basis of their selection test score – i.e., doing so will increase the number of positives, including the number of true positives, who are individuals who will be selected on the basis of their test scores and will be successful on the criterion.
c. Incorrect Lowering the selection test cutoff will increase the number of false positives.
d. Incorrect Lowering the selection test cutoff will decrease the number of false negatives.

The correct answer is: increase the number of true positives

56
Q

The optimal item difficulty level (p) for a true/false test is:
Select one:

A. +1.0.
B. .75.
C. .25.
D. -1.0.

A

The optimal item difficulty level depends on several factors, including the probability that an examinee can select the correct answer by chance alone.

a. Incorrect See explanation for response b.
b. CORRECT When considering the probability that an examinee can select the correct answer by chance alone, the optimal difficulty level is halfway between 100% of examinees answering the item correctly and the probability of answering the item correctly by chance alone. For a true/false item, the latter is 50%, so the optimal item difficulty is 75% (.75), which is halfway between 100% and 50%.
c. Incorrect See explanation for response b.
d. Incorrect See explanation for response b.

The correct answer is: .75.

57
Q

The primary advantage in using a percentile rank, z-score, or T-score is that these scores:
Select one:

A. are easy to interpret because they reference an individual's test performance to an absolute standard of performance.
B. are easy to interpret because they reference an individual’s test performance to the performance of other examinees.
C. are easy to interpret because they make it possible to predict which criterion group an examinee is likely to belong to.
D. normalize the raw score distribution so that parametric tests can be used to analyze test scores.

A

The scores listed in this question are all norm-referenced scores that indicate how well an examinee did in comparison to others in the norm group.

a. Incorrect This answer does not accurately describe norm-referenced scores.
b. CORRECT Because it is usually difficult to “make sense” of raw scores, they are often transformed into scores that are easier to interpret. The advantage of norm-referenced scores (which are a type of transformed score) is that they make it possible to determine how well an examinee did in comparison to other examinees.
c. Incorrect This describes a criterion-referenced score, not a norm-referenced score.
d. Incorrect Although some norm-referenced scores are “normalized” (e.g., normalized z-scores), this is not true for all norm-referenced scores.

The correct answer is: are easy to interpret because they reference an individual’s test performance to the performance of other examinees.

58
Q

Which of the following best describes the relationship between validity and reliability?
Select one:

A. A valid test is also a reliable test.
B. A valid test may or may not be a reliable test.
C. A reliable test is also a valid test.
D. An invalid test is not a reliable test.

A

Knowing that reliability is a necessary but not sufficient condition for validity would have helped you identify the correct answer to this question.

a. CORRECT Reliability sets an upper limit on validity, which means that a valid test must also be a reliable test. However, high reliability does not guarantee validity - i.e., a test can be free from the effects of measurement error but not measure the attribute it was designed to measure.
b. Incorrect A valid test must also be reliable.
c. Incorrect A reliable test may or may not be valid.
d. Incorrect An invalid test may or may not be reliable.

The correct answer is: A valid test is also a reliable test.

59
Q

Assuming no constraints in terms of time, money, or other resources, the best (most thorough) way to demonstrate that a test has adequate reliability is by using which of the following techniques?
Select one:

A. equivalent (alternate) forms
B. test-retest
C. Cronbach’s alpha
D. Cohen’s kappa

A

The most thorough method for assessing reliability is the one that takes into account the greatest number of potential sources of measurement error.

a. CORRECT Because equivalent forms reliability takes into account error due to both time and content sampling, it is the most thorough method for establishing reliability and, consequently, is considered by some experts to be the best method.
b. Incorrect See explanation for response a.
c. Incorrect See explanation for response a.
d. Incorrect See explanation for response a.

The correct answer is: equivalent (alternate) forms

60
Q

When using criterion-referenced interpretation of scores obtained on a job knowledge test, you would most likely be interested in which of the following?
Select one:

A. the total number of test items answered correctly by an examinee
B. an examinee’s performance relative to that of other examinees
C. an examinee’s standing on two or more measures designed to assess the same characteristic
D. ensuring that test items are based on a systematic job evaluation

A

As its name implies, criterion-referenced interpretation entails interpreting an examinee’s score in terms of a criterion, or standard of performance.

a. CORRECT One criterion that is used to interpret a person's test score is the total number of correct items. This criterion is probably most associated with "mastery testing": a person is believed to have mastered a content area when he/she obtains a predetermined minimum score on a test designed to assess knowledge of that area. Other types of criteria that are external to the test itself are also used in criterion-referenced interpretation, but none of the other responses addresses those types of interpretation; consequently, this answer is the best one.
c. Incorrect This answer doesn’t describe criterion-referenced interpretation.
d. Incorrect For some tests used in organizational settings, it might be important to base the content of the test on a job analysis, but not on a job evaluation (which is conducted to set wages and salaries). Also, this is not relevant to criterion-referenced interpretation of test scores.

The correct answer is: the total number of test items answered correctly by an examinee

61
Q

In factor analysis, a factor loading indicates the correlation between:
Select one:

A. a test and an identified factor.
B. two different tests.
C. two factors measured by the same test.
D. two factors measured by different tests.

A

A factor loading provides information about a test’s factorial validity.

a. CORRECT In factor analysis, a factor loading is a correlation coefficient that indicates the correlation between a test and an identified factor.
b. Incorrect See explanation for response a.
c. Incorrect See explanation for response a.
d. Incorrect See explanation for response a.

The correct answer is: a test and an identified factor.

62
Q

A reliability coefficient of .60 indicates that ___ of variability in test scores is true score variability.
Select one:

A. 60%
B. 40%
C. 36%
D. 16%

A

Answer A is correct. A reliability coefficient is interpreted directly as a measure of true score variability. A reliability coefficient of .60 indicates that 60% of the variability in scores is true score variability, while the remaining 40% of the variability is due to measurement (random) error.

The correct answer is: 60%

63
Q

In terms of magnitude, the standard error of measurement can be:
Select one:

A. no greater than 1.0.
B. no less than 1.0.
C. no greater than the standard deviation of the test scores.
D. no less than the standard deviation of the test scores.

A

Knowing the formula for the standard error of measurement would have helped you identify the correct answer to this question. The standard error of measurement equals the standard deviation of the test (predictor) scores times the square root of one minus the reliability coefficient.

a. Incorrect The standard error can be larger than 1.0.
b. Incorrect The standard error can also be less than 1.0.
c. CORRECT The maximum value for the standard error of measurement is the value of the standard deviation of the test scores. The standard error is equal to the standard deviation when the reliability coefficient is zero.
d. Incorrect The standard error of measurement is equal to 0 when the test’s reliability coefficient is +1.0. In this situation, if the test’s standard deviation is larger than 0 (which it usually is), the standard error will be less than the standard deviation.

The correct answer is: no greater than the standard deviation of the test scores.
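
A sketch showing the two boundary cases; the standard deviation of 10 is a hypothetical value.

```python
import math

def standard_error_of_measurement(sd, reliability):
    # SEM = SD * sqrt(1 - reliability coefficient)
    return sd * math.sqrt(1 - reliability)

sd = 10
print(standard_error_of_measurement(sd, 0.0))  # 10.0 -> equals the SD when reliability is 0
print(standard_error_of_measurement(sd, 1.0))  # 0.0  -> no error when reliability is perfect
```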

64
Q

To maximize the ability of a test to discriminate among test takers, a test developer will want to include test items that vary in terms of difficulty. If the test developer wants to add more difficult items to her test, she will include items that have an item difficulty index of:
Select one:

A. .90.
B. .50.
C. .10.
D. 0

A

The item difficulty index ranges from 0 (which occurs when no one answers the item correctly) to 1.0 (which occurs when everyone in a sample answers the item correctly).

a. Incorrect When an item’s difficulty index is .90, this means that it is a very easy item - i.e., it was answered correctly by 90% of examinees in the sample.
b. Incorrect An item difficulty level of .50 indicates an item of moderate difficulty (50% of examinees answered the item correctly).
c. CORRECT An item difficulty level of .10 indicates a difficult item (only 10% of examinees in the sample answered it correctly) and is the best answer of those given.
d. Incorrect An item difficulty level of 0 means that no examinees answered this item correctly. Although this is a difficult item, it would not be useful for discriminating among examinees and, therefore, items with a difficulty index of 0 would not be included in a test.

The correct answer is: .10.

65
Q

Stella S. obtains a score of 50 on a test that has a standard deviation of 10 and a standard error of measurement of 5. The 95% confidence interval for Stella’s score is approximately:
Select one:

A. 45 to 55.
B. 40 to 60.
C. 35 to 65.
D. 30 to 70.

A

Answer B is correct. The 95% confidence interval for an obtained test score is constructed by multiplying the standard error of measurement by 1.96 and adding and subtracting the result to and from the examinee’s obtained score. An interval of 40 to 60 is closest to the 95% confidence interval and was obtained by multiplying the standard error by 2.0 (instead of 1.96) and then adding and subtracting the result (10) to and from Stella’s score of 50. Additional information on calculating confidence intervals is provided in the Test Construction chapter of the written study materials.

The correct answer is: 40 to 60.
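
The calculation written out, using 2.0 as the rounded multiplier (as in the explanation above); the exact 1.96 interval is shown in the comment.

```python
# 95% confidence interval: obtained score +/- (about) 2 standard errors of measurement
score, sem = 50, 5
lower, upper = score - 2 * sem, score + 2 * sem
print(lower, upper)  # 40 60 (with 1.96 exactly: 40.2 to 59.8)
```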

66
Q

You ask a group of experienced salespeople to review the test items included in a test you have developed to help select new sales applicants. You are apparently interested in determining the test’s ______ validity.
Select one:

A. incremental
B. content
C. concurrent
D. differential

A

Of the types of validity listed in the answers, only one is established primarily by having items reviewed by subject matter experts.

a. Incorrect Incremental validity refers to the degree to which use of a test increases decision-making accuracy.
b. CORRECT A test’s content validity refers to the extent to which test items represent the domain of knowledge, skills, and/or abilities the test was designed to measure. Content validity is established primarily by having subject matter experts evaluate items in terms of their representativeness.
c. Incorrect Concurrent validity is a type of criterion-related validity and is evaluated by correlating scores on the test with scores on an external criterion.
d. Incorrect A test has differential validity when it has different validity coefficients for different groups.

The correct answer is: content

67
Q

A 200-item test that has been administered to 100 college students has a normal distribution, a mean of 145, and a standard deviation of 12. When the students’ raw scores have been converted to percentile ranks, Alex obtains a percentile rank of 49, while his twin sister Alicia obtains a percentile rank of 90. The teacher realizes that she made a mistake in scoring Alex’s and Alicia’s tests: Both should have received a raw score that was five points higher. In terms of their percentile ranks, when the teacher adds the five points to Alex’s and Alicia’s scores, she can expect that:
Select one:

A. Alicia’s percentile rank will increase more than Alex’s.
B. Alex’s percentile rank will increase more than Alicia’s.
C. Alicia’s and Alex’s percentile ranks will increase by the same amount.
D. Alicia’s and Alex’s percentile ranks will not change.

A

A problem with percentile ranks is that, when the raw scores are normally distributed, raw score differences near the center of the distribution are exaggerated when they are converted to percentile ranks, while raw score differences at the extremes are reduced. (A useful mnemonic for remembering this is “more change in the middle.”)

a. Incorrect See explanation for response b.
b. CORRECT Because of the above-described phenomenon, Alex’s percentile rank will increase more than Alicia’s. This makes sense if you think about the normal distribution: Since most of the scores are “piled up” near the center of the distribution, the 5-point increase in Alex’s score will position him above a larger number of examinees than the 5-point increase will for Alicia. This difference will be reflected in their percentile ranks.
c. Incorrect See explanation for response b.
d. Incorrect See explanation for response b.

The correct answer is: Alex’s percentile rank will increase more than Alicia’s.
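
A rough simulation sketch illustrates the “more change in the middle” rule, assuming the normal distribution described in the item (mean 145, SD 12); scipy is used here only for the normal curve:

from scipy.stats import norm

mean, sd = 145, 12
for rank in (0.49, 0.90):                     # Alex's and Alicia's original percentile ranks
    raw = norm.ppf(rank, loc=mean, scale=sd)  # raw score implied by each rank
    new_rank = norm.cdf(raw + 5, loc=mean, scale=sd)
    print(round(rank * 100), "->", round(new_rank * 100))
# Roughly 49 -> 65 and 90 -> 96: the mid-range score gains far more percentile points.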

68
Q

In the context of test construction, cross-validation is associated with which of the following?
Select one:

A. shrinkage
B. criterion deficiency
C. criterion contamination
D. banding

A

For the exam, you want to have cross-validation associated with “shrinkage.”

a. CORRECT Cross-validation refers to re-assessing a test’s criterion-related validity with a new sample. Because the chance factors operating in the original sample are not identical to those operating in the cross-validation sample, the validity coefficient usually “shrinks” (is smaller) for the new sample.
b. Incorrect See explanation for response a.
c. Incorrect See explanation for response a.
d. Incorrect See explanation for response a.

The correct answer is: shrinkage

69
Q

A test designed to measure knowledge of clinical psychology is likely to have the highest reliability coefficient when:
Select one:

A. the test consists of 30 items and the tryout sample consisted of individuals who are heterogeneous in terms of knowledge of clinical psychology.
B. the test consists of 30 items and the tryout sample consisted of individuals who are homogeneous in terms of knowledge of clinical psychology.
C. the test consists of 80 items and the tryout sample consisted of individuals who are heterogeneous in terms of knowledge of clinical psychology.
D. the test consists of 80 items and the tryout sample consisted of individuals who are homogeneous in terms of knowledge of clinical psychology.

A

The reliability of a test is affected by several factors including the length of the test and the heterogeneity of the sample in terms of the abilities or other attributes measured by the test items.

a. Incorrect See explanation for response c.
b. Incorrect See explanation for response c.
c. CORRECT All other things being equal, longer tests are more reliable than shorter tests. In addition, the reliability coefficient (like any other correlation coefficient) is larger when there is an unrestricted range of scores - i.e., when the tryout sample contains examinees who are heterogeneous with regard to the attribute(s) measured by the test.
d. Incorrect See explanation for response c.

The correct answer is: the test consists of 80 items and the tryout sample consisted of individuals who are heterogeneous in terms of knowledge of clinical psychology.

70
Q

The best way to control consensual observer drift is to:
Select one:

A. use the correction for attenuation formula.
B. use a true experimental research design.
C. videotape the observers.
D. alternate raters.

A

Consensual observer drift occurs when observers who work together influence one another’s ratings so that, over time, their ratings become systematically less accurate in similar ways.

a. Incorrect See explanation for response d.
b. Incorrect See explanation for response d.
c. Incorrect See explanation for response d.
d. CORRECT Of the actions described in the answers to this question, this one is the best way to alleviate consensual observer drift, which occurs when raters who are working together influence each other’s ratings so that they assign ratings in increasingly similar (and idiosyncratic) ways.

The correct answer is: alternate raters.

71
Q

Which of the following scores does not “belong with” the other three?
Select one:

A. stanine scores
B. z-scores
C. percentile ranks
D. percentage scores

A

This question requires you to be able to distinguish between norm- and criterion-referenced scores, which are described in the Test Construction chapter of the written study materials.

a. Incorrect See explanation for response d.
b. Incorrect See explanation for response d.
c. Incorrect See explanation for response d.
d. CORRECT The scores listed in answers a, b, and c are norm-referenced scores that permit an examinee’s score to be compared to the scores of others who are taking or have taken the same test. In contrast, percentage scores are a type of criterion-referenced score that reference an examinee’s score to the content of the exam and indicate how much of the content an examinee has mastered.

The correct answer is: percentage scores

72
Q

The minimum and maximum values of the standard error of estimate are:
Select one:

A. -1 and +1.
B. 0 and 1.
C. 0 and the standard deviation of the predictor.
D. 0 and the standard deviation of the criterion.

A

Knowing the formula for the standard error of estimate would have helped you identify the correct answer to this question.

a. Incorrect See explanation for response d.
b. Incorrect See explanation for response d.
c. Incorrect See explanation for response d.
d. CORRECT The standard error of estimate equals the standard deviation of the criterion scores times the square root of one minus the validity coefficient squared. This formula indicates that the standard error of estimate ranges from 0 (which occurs when the validity coefficient is 1.0) to the standard deviation of the criterion scores (which occurs when the validity coefficient is 0).

The correct answer is: 0 and the standard deviation of the criterion.
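
For a concrete check of the formula, a brief Python sketch with hypothetical values:

import math

def see(sd_criterion, validity):
    # Standard error of estimate: SD of criterion scores * sqrt(1 - validity coefficient squared)
    return sd_criterion * math.sqrt(1 - validity ** 2)

print(see(10, 0.0))  # 10.0 -- equals the criterion SD when the validity coefficient is 0
print(see(10, 0.6))  # 8.0  -- sqrt(1 - .36) = .8
print(see(10, 1.0))  # 0.0  -- no error of estimate with perfect validity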

73
Q

When a test has been constructed on the basis of item response theory, an examinee’s total test score provides information about his/her:
Select one:

A. status on a latent trait or ability.
B. predicted performance on an external criterion.
C. performance relative to other examinees included in the standardization sample.
D. current developmental level.

A

Item response theory is an alternative to classical test theory for the development of tests and interpretation of test scores.

a. CORRECT Scores on tests developed on the basis of item response theory are reported in terms of the examinee’s level on the trait or ability measured by the test rather than in terms of a total score. An advantage of this method of score reporting is that it makes it possible to compare scores from different sets of items and from different tests.
b. Incorrect See explanation for response a.
c. Incorrect See explanation for response a.
d. Incorrect Information on current developmental level might be provided by some, but not all, tests that are developed on the basis of item response theory. Therefore, this is not the best answer of those given.

The correct answer is: status on a latent trait or ability.

74
Q

The distribution of percentile ranks is always:
Select one:

A. the same as the shape of the distribution of raw scores.
B. normal regardless of the shape of the distribution of raw scores.
C. rectangular (flat) regardless of the shape of the distribution of raw scores.
D. bimodal regardless of the shape of the distribution of raw scores.

A

For the exam, you want to be familiar with the shape of the normal distribution (bell-shaped) and the shape of a distribution of percentile ranks (rectangular).

a. Incorrect See explanation for response c.
b. Incorrect See explanation for response c.
c. CORRECT A distinguishing characteristic of percentile ranks is that their distribution is always rectangular (flat) regardless of the shape of the distribution of raw scores.
d. Incorrect See explanation for response c.

The correct answer is: rectangular (flat) regardless of the shape of the distribution of raw scores.

75
Q

It would be most important to assess the test-retest reliability of a measure that:
Select one:

A. is subjectively scored.
B. assesses examinees’ speed of responding.
C. measures a stable trait.
D. measures a characteristic that fluctuates over time.

A

To evaluate test-retest reliability, the same test is administered to the same group of examinees on two different occasions. The two sets of scores are then correlated.

a. Incorrect Test-retest reliability could be used to assess the reliability of a subjectively scored test, but it would not be “important” to do so. Therefore, this isn’t the best answer.
b. Incorrect Alternate forms reliability is better for speeded tests because it eliminates the problem of practice effects.
c. CORRECT If a test is designed to measure a stable trait, you would want to make sure that scores are stable over time. Therefore, test-retest reliability would be important for this kind of test.
d. Incorrect Test-retest would be the wrong type of reliability for a characteristic that fluctuates over time.

The correct answer is: measures a stable trait.

76
Q

The standard error of measurement is used to:
Select one:

A. estimate a test’s “true” reliability coefficient.
B. estimate a test’s “true” criterion-related validity coefficient.
C. calculate the range within which an examinee’s true test score is likely to fall given her obtained score.
D. calculate the range within which an examinee’s true criterion score is likely to fall given her predicted criterion score.

A

Since no test is completely reliable, any test score may be in error (i.e., may not reflect the examinee’s “true” test score). Consequently, the best strategy is to interpret the examinee’s score in terms of a confidence interval.

a. Incorrect See explanation for response c.
b. Incorrect See explanation for response c.
c. CORRECT The standard error of measurement is an index of error and is used to construct an interval in which an examinee’s true test score is likely to fall given his or her obtained test score.
d. Incorrect This answer describes the use of the standard error of estimate.

The correct answer is: calculate the range within which an examinee’s true test score is likely to fall given her obtained score.

77
Q

A personnel director uses a mechanical aptitude test to hire machine shop workers. Several of the people hired using the test turn out to be less than adequate performers. These individuals are:
Select one:

A. true positives.
B. true negatives.
C. false positives.
D. false negatives.

A

For the exam, you’ll want to be familiar with the definitions of the terms listed in the answers to this question.

a. Incorrect True positives are individuals who are predicted to perform satisfactorily by the predictor and, in fact, do well on the criterion.
b. Incorrect True negatives are individuals who are predicted to perform poorly by the predictor and, in fact, perform poorly on the criterion.
c. CORRECT False positives are individuals who are predicted to perform satisfactorily by the predictor but, in fact, perform poorly on the criterion. In other words, these individuals have been “falsely identified as positives.”
d. Incorrect False negatives are individuals who are predicted to perform poorly by the predictor but, in fact, do well on the criterion.

The correct answer is: false positives.

78
Q

Which of the following item difficulty (p) levels maximizes the differentiation of examinees into high- and low-performing groups?
Select one:

A. 0.5
B. 0.9
C. 1.5
D. 0

A

Answer A is correct. An item difficulty level (p) ranges in value from 0 to +1.0 with a value of 0 indicating a very difficult item and a value of +1.0 indicating a very easy item. A difficulty index of .50 indicates that 50% of examinees in the try-out sample answered the item correctly. When p equals .50, this means that the item provides maximum differentiation between the upper- and lower-scoring examinees - i.e., a large proportion of examinees in the upper group answered the item correctly, while a small proportion of examinees in the lower group answered it correctly.

The correct answer is: 0.5
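
The following sketch (hypothetical, and assuming equal-sized upper and lower scoring groups that together make up the whole sample) shows why p = .50 permits the largest possible discrimination index:

def max_discrimination(p):
    # Best case: as many upper-group examinees as possible answer correctly and
    # as few lower-group examinees as possible, while the overall proportion stays p.
    upper = min(1.0, 2 * p)
    lower = max(0.0, 2 * p - 1)
    return round(upper - lower, 2)

for p in (0.1, 0.5, 0.9, 1.0):
    print(p, max_discrimination(p))
# 0.1 -> 0.2, 0.5 -> 1.0, 0.9 -> 0.2, 1.0 -> 0.0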

79
Q

Criterion contamination has which of the following effects?
Select one:

A. It artificially increases scores on the criterion.
B. It artificially reduces the criterion’s reliability coefficient.
C. It artificially increases the predictor’s criterion-related validity coefficient.
D. It artificially attenuates scores on the predictor and the criterion.

A

Criterion contamination occurs when a rater’s knowledge of a person’s predictor performance biases how he/she rates the person on the criterion.

a. Incorrect The effect of criterion contamination on an individual’s criterion score will depend on how well he/she did on the predictor (i.e., if the individual received a low score on the predictor, she will receive a low rating on the criterion and vice versa).
b. Incorrect The effect on reliability will depend on the situation and the method used to establish the reliability of the criterion.
c. CORRECT Criterion contamination has the effect of artificially inflating the correlation between the predictor and the criterion.
d. Incorrect This response doesn’t really make sense since the meaning of “attenuation” in this context is unclear.

The correct answer is: It artificially increases the predictor’s criterion-related validity coefficient.

80
Q

You would use a “multitrait-multimethod matrix” in order to:
Select one:

A. compare a test’s predictive and concurrent validity.
B. determine if a test has adequate convergent and discriminant validity.
C. identify the common factors underlying a set of related constructs.
D. test hypotheses about the causal relationships among variables.

A

A multitrait-multimethod matrix contains the correlation coefficients between measures that do and do not purport to measure the same trait.

a. Incorrect Predictive and concurrent validity are types of criterion-related validity. The multitrait-multimethod matrix is used to assess a measure’s construct validity.
b. CORRECT When a measure correlates highly with other measures of the same trait, the measure has convergent validity; when it has low correlations with measures of different traits, it has discriminant (divergent) validity. Convergent and discriminant validity are used as evidence of construct validity, and the multitrait-multimethod matrix contains correlation coefficients that provide information about a measure’s convergent and discriminant validity.
c. Incorrect This answer describes factor analysis, which is used to identify the common factors that underlie a set of tests, test items, or other variables.
d. Incorrect This answer doesn’t describe the purpose of the multitrait-multimethod matrix.

The correct answer is: determine if a test has adequate convergent and discriminant validity.

81
Q

The applicants for sales positions at the Acme Company complain that the selection test they are required to take is unfair because it doesn’t “look like” it measures the knowledge and skills that are important for successful job performance. Their complaint suggests that the selection test is lacking which of the following?
Select one:

A. incremental validity
B. differential validity
C. construct validity
D. face validity

A

In this situation, the selection test doesn’t appear to be measuring the skills and knowledge that are important for success as a salesperson.

a. Incorrect See explanation for response d.
b. Incorrect See explanation for response d.
c. Incorrect See explanation for response d.
d. CORRECT Face validity refers to the extent that a test appears to be valid to test-takers - i.e., to the extent that the test “looks like” it is measuring what it is supposed to be measuring.

The correct answer is: face validity

82
Q

In factor analysis, the original factor matrix is usually rotated in order to:
Select one:

A. facilitate interpretation of the identified factors.
B. determine how many factors to extract.
C. cross-validate the factor analysis.
D. verify the causal relationships among the identified factors.

A

One characteristic of the original factor matrix is that it is usually difficult to interpret because it does not provide a clear pattern of factor loadings. Additional information on rotation and other aspects of factor analysis that you want to be familiar with for the exam is provided in the Test Construction chapter of the written study materials.

a. CORRECT The rotation of factors provides a clearer pattern of factor loadings - i.e., in the rotated matrix, some tests correlate most highly with one factor, while other tests correlate more highly with a different factor. This makes it easier to identify the factors (dimensions) that account for the intercorrelations between the tests.
b. Incorrect The correlation matrix provides the information needed to determine how many factors to extract.
c. Incorrect See explanation for response a.
d. Incorrect See explanation for response a.

The correct answer is: facilitate interpretation of the identified factors.

83
Q

A final exam is developed to evaluate students’ comprehension of information presented in a high school history class. When the exam is administered to three classes of students at the end of the semester, all students obtain failing scores. This suggests that the exam may have poor ________ validity.

Select one:
A. concurrent
B. incremental
C. content
D. divergent
A

The first sentence of this question gives you the information you need to identify the correct answer to this question: It states that the purpose of the test is to assess the students’ knowledge of the content presented in the history course.

a. Incorrect Concurrent validity (a type of criterion-related validity) refers to the extent to which test scores correlate with scores on an external criterion. In this situation, performance on the test is the measure of performance, and test scores are not being correlated with scores on an external criterion. Consequently, concurrent validity is not relevant to this situation.
b. Incorrect Incremental validity is associated with criterion-related validity and refers to the increase in decision-making accuracy that results from use of a predictor. Incremental validity is not an issue in this situation.
c. CORRECT If all students do poorly on a test designed to assess their mastery of the course content, one possible reason is that the test questions do not represent that content; i.e., the test does not have adequate content validity. (There are, of course, other possible reasons for the students’ low scores but, of the answers given, this is the best one.)
d. Incorrect A measure has divergent validity when scores on the measure do not correlate with scores on measures of unrelated traits. Divergent validity is not relevant to this situation.

The correct answer is: content

84
Q

In factor analysis, when two factors are “orthogonal,” this means that:
Select one:

A. the factors are correlated.
B. the factors are uncorrelated.
C. the factors explain a statistically significant amount of variability in test scores.
D. the factors do not explain a statistically significant amount of variability in test scores.

A

For the exam, you’ll want to know the difference between orthogonal and oblique factors.

a. Incorrect See explanation for response b.
b. CORRECT In factor analysis, orthogonal factors are uncorrelated (independent) and oblique factors are correlated (dependent).
c. Incorrect See explanation for response b.
d. Incorrect See explanation for response b.

The correct answer is: the factors are uncorrelated.

85
Q

Incremental validity is a measure of:
Select one:

A. decision-making accuracy.
B. shrinkage
C. the generalizability of research results.
D. the costs involved in using a predictor.

A

The name “incremental validity” can help you remember what this term refers to.

a. CORRECT Incremental validity refers to the increase (“increment”) in decision-making accuracy that results from the use of a new predictor (e.g., the increase in accurate hiring decisions).
b. Incorrect See explanation for response a.
c. Incorrect See explanation for response a.
d. Incorrect See explanation for response a.

The correct answer is: decision-making accuracy.

86
Q

When the kappa statistic for a measure is .90, this indicates that the measure:
Select one:

A. has adequate inter-rater reliability.
B. has adequate internal consistency reliability.
C. has low criterion-related validity.
D. has low incremental validity.

A

Knowing that the kappa statistic (also known as the kappa coefficient) is a measure of inter-rater reliability would have enabled you to identify the correct response to this question.

a. CORRECT Reliability coefficients range from 0 to +1.0, so a coefficient of .90 indicates good reliability.
b. Incorrect The kappa statistic is a measure of inter-rater reliability, not internal consistency reliability.
c. Incorrect See explanation above.
d. Incorrect See explanation above.

The correct answer is: has adequate inter-rater reliability.
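
For reference, a minimal sketch of how kappa corrects observed agreement for chance agreement, using hypothetical counts for two raters classifying 100 cases:

both_yes, both_no = 40, 45          # cases where the two raters agree
r1_yes_r2_no, r1_no_r2_yes = 5, 10  # cases where they disagree
n = both_yes + both_no + r1_yes_r2_no + r1_no_r2_yes

observed = (both_yes + both_no) / n
chance_yes = ((both_yes + r1_yes_r2_no) / n) * ((both_yes + r1_no_r2_yes) / n)
chance_no = ((both_no + r1_no_r2_yes) / n) * ((both_no + r1_yes_r2_no) / n)
expected = chance_yes + chance_no

kappa = (observed - expected) / (1 - expected)
print(round(kappa, 2))  # 0.7 for these hypothetical counts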

87
Q

Which type of reliability would be most appropriate for estimating the reliability of a multiple-choice speeded test?
Select one:

A. split-half
B. coefficient of concordance
C. alternate forms
D. coefficient alpha

A

Speeded tests are designed so that all items answered by an examinee are answered correctly, and the examinee’s total score depends primarily on his/her speed of responding. Because of the nature of these tests, a measure of internal consistency will provide a spuriously high estimate of the test’s reliability.

a. Incorrect Split-half reliability is a type of internal consistency reliability and is not appropriate for speeded tests.
b. Incorrect The coefficient of concordance is a measure of inter-rater reliability for three or more raters when ratings are ranks. The test described in this question is a multiple-choice test, not a rating scale or other subjectively scored measure that would report scores in terms of ranks. Therefore, inter-rater reliability would not be a concern.
c. CORRECT Alternate-forms reliability is an appropriate method for establishing the reliability of speeded tests.
d. Incorrect Coefficient alpha is a type of internal consistency reliability and is less appropriate for speeded tests than is alternate forms reliability.

The correct answer is: alternate forms

88
Q

In a distribution of percentile ranks, the number of examinees receiving percentile ranks between 20 and 30 is:
Select one:

A. equal to the number of examinees receiving percentile ranks between 50 and 60.
B. greater than the number of examinees receiving percentile ranks between 50 and 60.
C. about equal to one-half the number of examinees receiving percentile ranks between 50 and 60.
D. about equal to one-fourth the number of examinees receiving percentile ranks between 50 and 60.

A

Knowing that a distribution of percentile ranks is flat (rectangular) would have helped you identify the correct answer to this question.

a. CORRECT The flatness of a percentile rank distribution indicates that scores are evenly distributed throughout the full range of the distribution. In other words, at least theoretically, the same number of examinees fall at each percentile rank. Consequently, the same number of examinees obtain percentile ranks between the ranks of 20 and 30, 30 and 40, etc.
b. Incorrect See explanation for response a.
c. Incorrect See explanation for response a.
d. Incorrect See explanation for response a.

The correct answer is: equal to the number of examinees receiving percentile ranks between 50 and 60.

89
Q

To obtain a “coefficient of stability,” you would:
Select one:

A. administer the same test twice to the same group of examinees on two separate occasions and correlate the two sets of scores.
B. administer a test to a group of examinees and determine the average inter-item correlation.
C. administer a test to two different random samples of examinees on two occasions and correlate the two sets of scores.
D. administer parallel forms of a test to the same group of examinees and correlate the two sets of scores.

A

The coefficient of stability is another name for the test-retest reliability coefficient.

a. CORRECT To obtain a coefficient of stability, the same measure is administered to the same group of examinees on two separate occasions and the scores obtained by the examinees are correlated. The result indicates the consistency (stability) of scores over time.
b. Incorrect This answer describes the procedure for obtaining a coefficient of internal consistency.
c. Incorrect When assessing test-retest reliability, the test is administered twice to the same group of examinees, not to different groups of examinees.
d. Incorrect This would provide a coefficient of equivalence (alternate forms reliability coefficient).

The correct answer is: administer the same test twice to the same group of examinees on two separate occasions and correlate the two sets of scores.

90
Q

The point at which an item characteristic curve intercepts the vertical (Y) axis provides information on which of the following?
Select one:

A. the item’s difficulty level
B. the item’s ability to discriminate between low and high scorers
C. the probability of answering the item correctly by guessing
D. the item’s ability to predict performance on an external criterion

A

As noted in the Test Construction chapter of the written study materials, most item characteristic curves provide information on three parameters – difficulty level, discrimination, and probability of guessing correctly.

a. Incorrect See explanation for response c.
b. Incorrect See explanation for response c.
c. CORRECT The vertical axis indicates the probability of choosing a correct response as a function of an examinee’s ability level. The point at which the item characteristic curve intercepts the vertical axis indicates the probability of choosing the correct response by chance alone.
d. Incorrect See explanation for response c.

The correct answer is: the probability of answering the item correctly by guessing

91
Q

A test developer would construct an expectancy table to:
Select one:

A. facilitate norm-referenced interpretation of test scores.
B. facilitate criterion-referenced interpretation of test scores.
C. correct obtained scores for the effects of guessing.
D. correct obtained test scores for the effects of measurement error.

A

For the exam, you want to be familiar with the methods used to facilitate the interpretation of test scores that are described in the Test Construction chapter of the written study materials.

a. Incorrect See explanation for response b.
b. CORRECT An expectancy table provides the information needed to interpret an examinee’s score in terms of expected performance on an external criterion and, consequently, is a method of criterion-referenced interpretation.
c. Incorrect See explanation for response b.
d. Incorrect See explanation for response b.

The correct answer is: facilitate criterion-referenced interpretation of test scores.

92
Q

To evaluate the validity of a newly developed selection test for clerical workers, a test developer will correlate scores obtained on the test by newly hired clerical workers with the job performance ratings they receive after being on-the-job for six months. The resulting correlation coefficient will provide information on the test’s:
Select one:

A. discriminant validity.
B. predictive validity.
C. construct validity.
D. concurrent validity.

A

The test developer is correlating predictor (selection test) scores with future criterion (job performance) scores and, therefore, is conducting a criterion-related validity study.

a. Incorrect Discriminant validity is a type of construct validity and involves correlating scores on measures of unrelated characteristics.
b. CORRECT There are two types of criterion-related validity – predictive and concurrent. As its name implies, predictive validity involves correlating predictor scores with criterion scores that are obtained at a later time to determine how well the predictor predicts future performance on the criterion.
c. Incorrect The test developer is evaluating the test’s criterion-related validity, not its construct validity.
d. Incorrect The test developer would be evaluating concurrent validity (a type of criterion-related validity) if she obtained predictor and criterion scores at about the same time.

The correct answer is: predictive validity.

93
Q

When using principal component analysis:
Select one:

A. the first principal component represents the largest share of the total variance.
B. the first principal component represents the smallest share of the total variance.
C. each component represents an equal share of the total variance.
D. the order of the components is not related to the share of total variance they represent.

A

This is an example of the “distant galaxy” questions that sometimes appear on the licensing exam and, fortunately, is more difficult than the majority of questions you’ll encounter.

a. CORRECT A characteristic of principal components analysis is that the components (factors) are extracted so that the first component reflects the greatest amount of variability, the second component the second greatest amount of variability, etc.
b. Incorrect See explanation for response a.
c. Incorrect See explanation for response a.
d. Incorrect See explanation for response a.

The correct answer is: the first principal component represents the largest share of the total variance.

94
Q

A test developer would use the Kuder-Richardson Formula (KR-20) in order to:
Select one:

A. evaluate a test’s internal consistency reliability.
B. evaluate a test’s test-retest reliability.
C. determine the impact of increasing a test’s reliability on its validity.
D. determine the impact of lengthening a test on its reliability.

A

KR-20 provides information on internal consistency reliability.

a. CORRECT KR-20 is used to determine a test’s internal consistency reliability when test items are scored dichotomously.
b. Incorrect See explanation for response a.
c. Incorrect See explanation for response a.
d. Incorrect See explanation for response a.

The correct answer is: evaluate a test’s internal consistency reliability.
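
A brief sketch of the KR-20 computation for dichotomously scored items, using a toy data set (one common convention for the total-score variance is shown):

import numpy as np

# Hypothetical data: rows = examinees, columns = items scored 0 (wrong) or 1 (right)
scores = np.array([
    [1, 1, 1, 0, 1],
    [1, 0, 1, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [1, 1, 0, 1, 1],
])

k = scores.shape[1]
p = scores.mean(axis=0)                     # proportion answering each item correctly
q = 1 - p
total_var = scores.sum(axis=1).var(ddof=1)  # variance of examinees' total scores
kr20 = (k / (k - 1)) * (1 - (p * q).sum() / total_var)
print(round(kr20, 2))  # about 0.77 for this toy data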

95
Q

Which of the following is used to estimate the effects of shortening or lengthening a test on the test’s reliability coefficient?
Select one:

A. Cohen’s kappa statistic
B. Kuder-Richardson Formula 20
C. Cronbach’s coefficient alpha
D. Spearman-Brown formula

A

For the exam, you want to be familiar with all of the formulas listed in the answers to this question, and these are described in the Test Construction chapter of the written study materials.

a. Incorrect See explanation for response d.
b. Incorrect See explanation for response d.
c. Incorrect See explanation for response d.
d. CORRECT Although the Spearman-Brown formula is probably most often used in conjunction with split-half reliability, it can actually be used whenever a test developer wants to estimate the effects of increasing or decreasing the number of test items on the test’s reliability coefficient.

The correct answer is: Spearman-Brown formula
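
A short sketch of the Spearman-Brown prophecy formula, where n is the factor by which the test is lengthened (hypothetical values):

def spearman_brown(r, n):
    # Predicted reliability when a test with reliability r is lengthened by a factor of n
    return (n * r) / (1 + (n - 1) * r)

print(round(spearman_brown(0.60, 2), 2))    # 0.75 -- doubling a test with r = .60
print(round(spearman_brown(0.60, 0.5), 2))  # 0.43 -- halving the test lowers reliability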

96
Q

In a normal distribution, a T score of ___ is equivalent to a percentile rank of 16.
Select one:

A. 10
B. 20
C. 30
D. 40

A

To identify the correct answer to this question, you need to be familiar with the areas under the normal curve and know that T scores have a mean of 50 and a standard deviation of 10. A picture of the normal curve with corresponding T-scores, z-scores, etc. is provided in the Test Construction chapter of the written study materials.

a. Incorrect See explanation for response d.
b. Incorrect See explanation for response d.
c. Incorrect See explanation for response d.
d. CORRECT In a normal distribution, a percentile rank of 16 and a T score of 40 are both one standard deviation below the mean.

The correct answer is: 40
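
A quick check (scipy is used only for the normal curve) that a T score of 40 corresponds to roughly the 16th percentile:

from scipy.stats import norm

t_score = 40
z = (t_score - 50) / 10          # T scores have a mean of 50 and an SD of 10
print(round(norm.cdf(z) * 100))  # about 16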

97
Q

Content sampling is not a potential source of measurement error for which of the following methods for evaluating a test’s reliability?
Select one:

A. coefficient alpha and alternate forms
B. alternate forms and test-retest
C. split-half only
D. test-retest only

A

Content sampling refers to the extent to which test scores depend on factors specific to the particular items included in the test (i.e., to its content). Note that this question is asking about the type of reliability that is not affected by content sampling.

a. Incorrect Coefficient alpha and alternate forms reliability are both subject to content sampling error. Coefficient alpha, a measure of internal consistency, will yield a low reliability coefficient if items are not internally consistent (i.e., if items do not measure the same content domain); alternate forms reliability will be low if the two forms do not assess the same content.
b. Incorrect Although test-retest reliability is not affected by content sampling error (see response d), as noted above, alternate forms reliability is.
c. Incorrect Split-half reliability would yield a low reliability coefficient if the two halves of the test do not assess the same content.
d. CORRECT Because test-retest reliability involves administering the same test (i.e., the same content) twice, content sampling is not a source of error.

The correct answer is: test-retest only

98
Q

When a test user uses a correction for guessing formula that involves subtracting points from each examinee’s scores, the resulting distribution of scores will have a ____________________ than the original (non-corrected) distribution.
Select one:

A. higher mean and larger standard deviation
B. higher mean and smaller standard deviation
C. lower mean and larger standard deviation
D. lower mean and smaller standard deviation

A

This is a “distant galaxy” question that, fortunately, is more difficult than most of the questions on test construction that you’ll encounter on the licensing exam.

a. Incorrect See explanation for response c.
b. Incorrect See explanation for response c.
c. CORRECT The effect of this type of correction for guessing formula on a distribution’s mean is fairly easy to understand - i.e., use of the formula will result in reducing the scores of some examinees and, thereby, reduce the size of the mean. To understand its effect on the standard deviation, assume that the lowest possible score on a test is 0 and that the highest score is 100, which is obtained by at least one examinee. In this situation, as a result of the correction for guessing, some examinees will obtain scores lower than 0, while the highest scorer will still receive a score of 100. When this occurs, the range of scores will increase, and this will be reflected in the distribution’s standard deviation.
d. Incorrect See explanation for response c.

The correct answer is: lower mean and larger standard deviation
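
A rough simulation sketch, assuming the common correction formula “right minus wrong/(k - 1)” and no omitted items (hypothetical data), shows the lower mean and larger spread:

import numpy as np

rng = np.random.default_rng(0)
n_items, k = 100, 4                      # a 100-item test with four options per item
right = rng.integers(40, 101, size=500)  # simulated number-correct scores for 500 examinees
wrong = n_items - right                  # assumes every item not answered correctly is wrong
corrected = right - wrong / (k - 1)

print(right.mean(), corrected.mean())    # the corrected mean is lower
print(right.std(), corrected.std())      # the corrected standard deviation is larger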

99
Q

Cronbach’s alpha is an appropriate method for evaluating reliability when:
Select one:

A. all test items are designed to measure the same underlying characteristic.
B. test items are subjectively scored.
C. the test will be administered to examinees at regular intervals over time.
D. there is a restriction in the range of scores.

A

To answer this question, you need to know that Cronbach’s alpha is another name for coefficient alpha and is used to assess internal consistency reliability.

a. CORRECT Cronbach’s alpha is an appropriate method for evaluating reliability when the test is expected to be internally consistent - i.e., when all test items measure the same or related characteristics.
b. Incorrect See explanation above.
c. Incorrect See explanation above.
d. Incorrect See explanation above.

The correct answer is: all test items are designed to measure the same underlying characteristic.
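
A minimal sketch of coefficient alpha for a small hypothetical item-score matrix (items in columns, all intended to measure the same characteristic):

import numpy as np

items = np.array([
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [5, 5, 4, 5],
    [3, 3, 3, 2],
    [1, 2, 1, 2],
])

k = items.shape[1]
item_vars = items.var(axis=0, ddof=1).sum()  # sum of the individual item variances
total_var = items.sum(axis=1).var(ddof=1)    # variance of examinees' total scores
alpha = (k / (k - 1)) * (1 - item_vars / total_var)
print(round(alpha, 2))  # about 0.97 here, reflecting highly consistent items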

100
Q

A psychologist develops a diagnostic test to identify people who have injection phobia. In this situation, the test’s ________ refers to how good the test is at identifying people who actually have injection phobia from the pool of people who have injection phobia.
Select one:

A. specificity
B. sensitivity
C. positive predictive value
D. negative predictive value

A

The terms listed in the answers to this question are used to describe the accuracy of diagnostic tests. Additional information about these terms is provided in the Test Construction chapter of the written study materials.

a. Incorrect Specificity refers to the probability that a test will correctly identify people without the disease from the pool of people without the disease. It is calculated with the following formula: true negatives/(true negatives + false positives).
b. CORRECT Sensitivity refers to the probability that a test will correctly identify people with the disease from the pool of people with the disease. It is calculated using the following formula: true positives/(true positives + false negatives).
c. Incorrect The positive predictive value is the probability that a person identified by the test as having the disease actually has the disease. It is calculated with the following formula: true positives/(true positives + false positives).
d. Incorrect The negative predictive value is the probability that a person identified by the test as not having the disease doesn’t actually have the disease. The following formula is used to calculate the negative predictive value: true negatives/(true negatives + false negatives).

The correct answer is: sensitivity
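
The four indices can be summarized in a brief sketch using hypothetical counts from a 2 x 2 classification table:

tp, fp = 80, 30    # test positive: truly phobic / not phobic
fn, tn = 20, 170   # test negative: truly phobic / not phobic

sensitivity = tp / (tp + fn)  # 0.80 -- correct identifications among those who have the disorder
specificity = tn / (tn + fp)  # 0.85 -- correct identifications among those who do not
ppv = tp / (tp + fp)          # probability that a positive test result is correct
npv = tn / (tn + fn)          # probability that a negative test result is correct
print(sensitivity, specificity, ppv, npv)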

101
Q

A reliability coefficient is best defined as a measure of:
Select one:

A. relevance.
B. consistency.
C. interpretability.
D. generalizability.

A

A test is reliable when it provides consistent results, with inconsistency in test scores being the result of random factors that are present at the time of testing.

a. Incorrect See explanation for response b.
b. CORRECT A reliability coefficient indicates the proportion of variance in test scores that is consistent (i.e., is due to true score variability rather than to measurement error).
c. Incorrect See explanation for response b.
d. Incorrect See explanation for response b.

The correct answer is: consistency.

102
Q

All other things being equal, which of the following tests is likely to have the largest reliability coefficient?
Select one:

A. a multiple-choice test that consists of items that each have five answer options
B. a multiple-choice test that consists of items that each have four answer options
C. a multiple-choice test that consists of items that each have three answer options
D. a true-false test

A

For the licensing exam, you want to be familiar with the factors that affect the size of the reliability coefficient that are described in the Test Construction chapter of the written study materials.

a. CORRECT All other things being equal, tests containing items that have a low probability of being answered correctly by guessing alone are more reliable than tests containing items that have a high probability of being answered correctly by guessing alone. Of the types of items listed, multiple-choice items with five answer options have the lowest probability of being answered correctly by guessing alone.
b. Incorrect See explanation for response a.
c. Incorrect See explanation for response a.
d. Incorrect See explanation for response a.

The correct answer is: a multiple-choice test that consists of items that each have five answer options

103
Q

Which of the following is NOT an example of a standard score?
Select one:

A. WAIS IQ score
B. percentage score
C. z score
D. T score

A

A standard score is a norm-referenced score that indicates an examinee’s performance in terms of standard deviation units (e.g., a z score of 1.0 indicates a raw score that is one standard deviation above the mean).

a. Incorrect WAIS IQ scores are standard scores - e.g., a WAIS IQ of 115 indicates a score that is one standard deviation above the mean of the norm group.
b. CORRECT Percentages are not standard scores.
c. Incorrect A z score is a standard score.
d. Incorrect A T score is also a standard score - e.g., a T score of 40 indicates a score that is one standard deviation below the mean.

The correct answer is: percentage score

104
Q

To maximize the inter-rater reliability of a behavioral observation scale, you should make sure that coding categories:
Select one:

A. are mutually exclusive.
B. are measured on an interval or ratio scale.
C. produce criterion-referenced scores.
D. produce scores that are normally distributed.

A

When a person’s behavior is to be observed and recorded, that behavior must be operationalized in order for the observations to be meaningful. For example, a psychologist interested in obtaining data about aggressiveness in children might record data using categories such as “hits others” or “destroys property.”

a. CORRECT To maximize the reliability of a behavior observation scale, coding categories must be discrete and mutually exclusive. For example, if the behavioral categories for aggressiveness were “aggressive acts” and “emotional displays,” the same behavior might be recorded twice, and an unreliable picture of a child’s behavior would be obtained.
b. Incorrect It is ordinarily better for categories to be discrete rather than continuous (i.e., to be measured on a nominal rather than an interval or ratio scale).
c. Incorrect This is not a requirement for (or usual characteristic of) behavioral observation scales.
d. Incorrect Since coding categories are often discrete (nominal), they do not produce scores that are normally distributed.

The correct answer is: are mutually exclusive.

105
Q

In factor analysis, communality refers to:
Select one:

A. the proportion of variance accounted for in a single variable by a single factor.
B. the proportion of variance accounted for in multiple variables by a single factor.
C. the proportion of variance accounted for in a single variable by all of the identified factors.
D. the total proportion of variance in all of the variables included in the analysis that is not due to error.

A

A communality is a measure of “common variance” and is a reflection of the amount of variance that a test has in common with the other tests included in the factor analysis.

a. Incorrect See explanation for response c.
b. Incorrect See explanation for response c.
c. CORRECT In factor analysis, a communality is calculated for each test (variable) included in the analysis. The communality indicates the total amount of variability accounted for in the test by all of the identified factors. Additional information on the communality and other aspects of factor analysis that you want to be familiar with for the exam is provided in the Test Construction chapter of the written study materials.
d. Incorrect See explanation for response c.

The correct answer is: the proportion of variance accounted for in a single variable by all of the identified factors.
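
A small sketch with a hypothetical factor loading matrix: each test’s communality is the sum of its squared loadings across all of the identified factors.

import numpy as np

# Hypothetical rotated loadings: rows = tests, columns = factors
loadings = np.array([
    [0.80, 0.10],
    [0.70, 0.20],
    [0.25, 0.70],
])

communalities = (loadings ** 2).sum(axis=1)
print(communalities.round(2))  # [0.65 0.53 0.55] -- variance in each test explained by both factors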

106
Q

When the heterotrait-monomethod coefficient is large, this indicates:
Select one:

A. a lack of differential validity.
B. a lack of discriminant validity.
C. adequate convergent validity.
D. adequate concurrent validity.

A

The heterotrait-monomethod coefficient represents the correlation between different traits being measured with the same kind of method.

a. Incorrect See explanation for response b.
b. CORRECT If you are validating a test, you want the heterotrait-monomethod coefficient to be low so that you have evidence of discriminant (divergent) validity. When this coefficient is large, this indicates a lack of discriminant validity. Additional information on the heterotrait-monomethod coefficient and other coefficients included in a multitrait-multimethod matrix is provided in the Test Construction chapter of the written study materials.
c. Incorrect See explanation for response b.
d. Incorrect See explanation for response b.

The correct answer is: a lack of discriminant validity.

107
Q

In the multitrait-multimethod matrix, which of the following coefficients provides information about a test’s convergent validity?
Select one:

A. heterotrait-heteromethod
B. heterotrait-monomethod
C. monotrait-heteromethod
D. monotrait-monomethod

A

The multitrait-multimethod matrix contains four types of correlation coefficients. Additional information on these coefficients is provided in the Test Construction chapter of the written study materials.

a. Incorrect The heterotrait-heteromethod coefficient is a measure of divergent validity.
b. Incorrect The heterotrait-monomethod coefficient is also a measure of divergent validity.
c. CORRECT The monotrait-heteromethod coefficient is a measure of convergent validity. It indicates the correlation between the test that is being validated and another measure of the same trait (monotrait) that uses a different method of measurement (heteromethod).
d. Incorrect The monotrait-monomethod coefficient indicates the correlation of the test with itself and is a measure of the test’s reliability.

The correct answer is: monotrait-heteromethod

108
Q

In terms of item response theory, the slope (steepness) of the item characteristic curve indicates the item’s:
Select one:

A. level of difficulty.
B. ability to discriminate between examinees.
C. internal consistency reliability.
D. criterion-related validity.

A

The item characteristic curve provides three pieces of information about an item - i.e., its difficulty, ability to discriminate between those who are high and low on the characteristic being measured, and the probability of answering the item correctly by guessing. Additional information about item response theory is provided in the Test Construction chapter of the written study materials.

a. Incorrect The position of the curve indicates its difficulty level.
b. CORRECT The steeper the slope of the item characteristic curve, the better its ability to discriminate between examinees who are high and low on the characteristic being measured.
c. Incorrect Reliability is not indicated by the item characteristic curve.
d. Incorrect Validity is not indicated by the item characteristic curve.

The correct answer is: ability to discriminate between examinees.

109
Q

In a normal distribution, which of the following represents the lowest score?
Select one:

A. percentile rank of 20
B. z-score of -1.0
C. T score of 25
D. Wechsler IQ score of 70

A

To answer this question, you must be familiar with the relationship between percentile ranks, z-scores, T-scores, and IQ scores in a normal distribution.

a. Incorrect A percentile rank of 20 is slightly less than one standard deviation below the mean.
b. Incorrect A z-score of -1.0 is one standard deviation below the mean.
c. CORRECT A T score is a standardized score with a mean of 50 and a standard deviation of 10. Therefore, a T-score of 25 is two and one-half standard deviations below the mean and represents the lowest score of those given in the answers.
d. Incorrect Wechsler IQ scores have a mean of 100 and standard deviation of 15. Therefore, an IQ score of 70 is two standard deviations below the mean.

The correct answer is: T score of 25

110
Q

__________ refers to the extent to which individual test items contribute to the overall purpose of the test.
Select one:

A. Validity
B. Reliability
C. Discrimination
D. Relevance

A

Several standards are used to evaluate the usefulness of tests and test items, including the four standards listed in the answers to this question.

a. Incorrect Validity refers to the overall accuracy of the test rather than to the accuracy of individual test items. In other words, validity addresses the question, “Does the test measure what it was designed to measure?” Consequently, this is not the best answer of those given.
b. Incorrect Reliability refers to the degree to which test scores are free from the effects of measurement error.
c. Incorrect Discrimination refers to the extent to which test items accurately distinguish between examinees having high and low levels of the attribute(s) measured by the test.
d. CORRECT This question describes relevance, which is determined by judging the extent to which each item assesses the target content or behavior domain and does so at the appropriate ability level.

The correct answer is: Relevance

111
Q

A personnel director hires all job applicants who obtain a high score on a job selection test but, after using the test for six months, realizes that many of the new employees are obtaining low performance ratings. Assuming that the selection test has adequate criterion-related validity, the personnel director can reduce the number of unsatisfactory workers that she hires using the test by:
Select one:

A. lowering the selection test cutoff score and the job performance rating cutoff score.
B. raising the selection test cutoff score and the job performance rating cutoff score.
C. lowering selection test cutoff score.
D. raising the selection test cutoff score.

A

This question is referring to the test’s incremental validity. If you are having trouble understanding the effects of raising and lowering the predictor and criterion cutoff scores on a test’s incremental validity, the diagram included in the Test Construction chapter of the written study materials may be helpful.

a. Incorrect See explanation for response d.
b. Incorrect See explanation for response d.
c. Incorrect See explanation for response d.
d. CORRECT Applicants who are hired on the basis of their selection test scores but who perform poorly on the job are false positives. Raising the cutoff score on the selection test (predictor) should reduce the number of individuals who do poorly on the job - i.e., it will reduce the number of positives, including the number of false positives. Note that lowering the job performance rating (criterion) cutoff score would also reduce the number of false positives but that, in many work situations, an employer would not want to do this.

The correct answer is: raising the selection test cutoff score.

112
Q

Which of the following types of validity would you be most interested in when designing a selection test that will be used to predict the future job performance ratings of job applicants?
Select one:

A. discriminant
B. content
C. construct
D. criterion-related

A

In this situation, you are developing a test that will be used to predict future job performance.

a. Incorrect Discriminant validity refers to the extent that scores on a test do not correlate with scores on tests measuring different traits. Discriminant validity is of concern for tests designed to measure hypothetical traits (constructs).
b. Incorrect Content validity is of concern for tests designed to measure a specific content or behavior domain.
c. Incorrect Construct validity is of concern for tests designed to measure hypothetical traits.
d. CORRECT When a test is being used to predict performance on a criterion, you would be most interested in the test’s criterion-related validity (e.g., in its correlation with the criterion measure).

The correct answer is: criterion-related

113
Q

To evaluate the concurrent validity of a new selection test for clerical workers, you would:
Select one:

A. conduct a factor analysis to confirm that the test measures the attributes it was designed to measure.
B. have supervisors and others familiar with the job rate test items for relevance to success as a clerical worker.
C. administer the test to a sample of current clerical workers and correlate their scores on the test with their recently assigned performance ratings.
D. administer the test to clerical workers when they are initially hired and six months after they are hired and then correlate the two sets of scores.

A

Concurrent validity is a type of criterion-related validity.

a. Incorrect This technique would be used to evaluate the test’s construct validity.

b. Incorrect This would help establish the test’s content validity.
c. CORRECT To evaluate a test’s criterion-related validity, scores on the predictor (in this case, the selection test) are correlated with scores on a criterion (measure of job performance). When scores on both measures are obtained at about the same time, they provide information on the test’s concurrent validity.
d. Incorrect This procedure would not provide information on the test’s concurrent validity.

The correct answer is: administer the test to a sample of current clerical workers and correlate their scores on the test with their recently assigned performance ratings.

114
Q

The item discrimination index (D) ranges in value from:
Select one:

A. 0 to 10.
B. 0 to 50.
C. -1.0 to +1.0.
D. -50 to +50.

A

The item discrimination index indicates the extent to which a test item discriminates between examinees who obtain high versus low scores on the entire test or on an external criterion.

a. Incorrect See explanation for response c.
b. Incorrect See explanation for response c.
c. CORRECT The item discrimination index is calculated by subtracting the percent of examinees in the lower scoring group from the percent of examinees in the upper scoring group who answered the item correctly and ranges in value from -1.0 to +1.0.
d. Incorrect See explanation for response c.

The correct answer is: -1.0 to +1.0.
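
A minimal sketch of the D calculation with hypothetical proportions:

def discrimination_index(p_upper, p_lower):
    # D = proportion correct in the upper-scoring group minus proportion correct in the lower-scoring group
    return p_upper - p_lower

print(discrimination_index(1.0, 0.0))    #  1.0 -- perfect discrimination
print(discrimination_index(0.6, 0.6))    #  0.0 -- the item does not discriminate at all
print(discrimination_index(0.25, 0.75))  # -0.5 -- low scorers outperform high scorers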

115
Q

A college freshman obtains a score of 150 on his English final exam, a score of 100 on his math exam, a score of 55 on his chemistry exam, and a score of 30 on his history exam. The means and standard deviations for these tests are, respectively, 125 and 20 for the English exam, 90 and 10 for the math exam, 45 and 5 for the chemistry exam, and 30 and 5 for the history exam. Based on this information, you can conclude that the young man’s test performance was best on which exam?
Select one:

A. English
B. math
C. chemistry
D. history

A

The raw scores on different tests may be compared by converting the scores to z-scores, which is done by subtracting the mean from the examinee’s raw score to calculate a deviation score and then dividing the deviation score by the standard deviation. Additional information on calculating z-scores is provided in the Test Construction chapter of the written study materials.

a. Incorrect See explanation for response c.
b. Incorrect See explanation for response c.
c. CORRECT In this case, the student’s English score is equivalent to a z-score of +1.25, his math score is equivalent to a z-score of +1.0, his chemistry score is equivalent to a z-score of +2.0, and his history score is equivalent to a z-score of 0. Therefore, the student obtained the highest score on the chemistry test.
d. Incorrect See explanation for response c.

The correct answer is: chemistry
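
The comparison can be reproduced with a short sketch using the means and standard deviations given in the item:

exams = {
    "English":   (150, 125, 20),
    "math":      (100, 90, 10),
    "chemistry": (55, 45, 5),
    "history":   (30, 30, 5),
}

for name, (raw, mean, sd) in exams.items():
    print(name, (raw - mean) / sd)
# English 1.25, math 1.0, chemistry 2.0, history 0.0 -- chemistry is the best relative performance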

116
Q

The assumption underlying convergent validity is that:
Select one:

A. a measure of a characteristic should correlate highly with a different type of measure that is already known to assess the same characteristic.
B. a valid measure of a variable should have as much (or more) factorial validity as does an existing measure of the same variable.
C. a measure of a construct should correlate more highly with itself than with another measure of the same construct.
D. to be valid, a measure of a characteristic should correlate highly with the measure of the behavior it is designed to predict.

A

A test has convergent validity when it correlates highly with another measure of the same trait.

a. CORRECT One way to establish a test’s construct validity is to determine that it correlates highly with other measures that are already known to assess the same trait. When it does, the measure is said to have convergent validity.
b. Incorrect See explanation for response a.
c. Incorrect See explanation for response a.
d. Incorrect See explanation for response a.

The correct answer is: a measure of a characteristic should correlate highly with a different type of measure that is already known to assess the same characteristic.

117
Q

Assuming a normal distribution, which of the following represents the highest score?
Select one:

a.
a Z score of 1.5

b.
a T score of 70

c.
a WAIS Full Scale IQ score of 120

d.
a percentile rank of 88

A

For the exam, you want to be familiar with the relationship of z scores, T scores, WAIS IQ scores, and percentile ranks in a normal distribution so that you can answer questions like this one.

Answer B is correct: A T score of 70 is two standard deviations above the mean.

Answer A is incorrect: A Z score of 1.5 is one and one-half standard deviations above the mean.

Answer C is incorrect: A WAIS Full Scale IQ score of 120 is 1.33 standard deviations above the mean (mean = 100, standard deviation = 15).

Answer D is incorrect: A percentile rank of 88 corresponds to a score about 1.2 standard deviations above the mean.

The correct answer is: a T score of 70
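
A short sketch (assuming the conventional scales: T scores with mean 50 and SD 10, WAIS IQ scores with mean 100 and SD 15) that puts all four scores on the common z-score metric:

from statistics import NormalDist

z_from_z  = 1.5                          # already a z score
z_from_t  = (70 - 50) / 10               # T score of 70 -> z = 2.0
z_from_iq = (120 - 100) / 15             # WAIS IQ of 120 -> z is about 1.33
z_from_pr = NormalDist().inv_cdf(0.88)   # 88th percentile -> z is about 1.17

# The largest z score corresponds to the highest relative standing.
print(max(z_from_z, z_from_t, z_from_iq, z_from_pr))  # 2.0 (the T score of 70)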

118
Q

Dina received a percentile rank of 48 on a test, while her twin brother, Dino, received a percentile rank of 98. Their teacher realizes she made an error in scoring their tests and adds four points to Dina’s and Dino’s raw scores. (The other students’ tests were scored correctly.) When she recalculates Dina’s and Dino’s percentile ranks, the teacher will find that:
Select one:

a.
Dina’s percentile rank will change by more points than Dino’s.

b.
Dino’s percentile rank will change by more points than Dina’s.

c.
Dina’s and Dino’s percentile ranks will change by the same number of points.

d.
Dina’s and Dino’s percentile ranks will not change.

A

As described in the Test Construction chapter of the written study materials, percentile ranks maximize differences in the middle of the raw score distribution and minimize differences at the extremes.

Answer A is correct: This general rule means that Dina’s percentile rank (which is near the middle of the distribution) will be affected more by the four-point addition to her raw score than Dino’s percentile rank (which is near the extreme high end of the distribution).

The correct answer is: Dina’s percentile rank will change by more points than Dino’s.
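
The sketch below (entirely hypothetical class data, not the twins' actual distribution) illustrates the general rule: the same four-point raw-score gain moves a middle-of-the-distribution percentile rank much more than an extreme one. Percentile rank is computed here as the percent of examinees scoring lower, matching the definition used elsewhere in these cards.

from statistics import NormalDist

# Hypothetical class of 200 roughly normal raw scores (mean 70, SD 10).
class_scores = [round(NormalDist(70, 10).inv_cdf((i + 0.5) / 200)) for i in range(200)]

def percentile_rank(score, scores):
    return 100 * sum(s < score for s in scores) / len(scores)

for raw in (70, 95):  # a middle score versus an extremely high score
    before = percentile_rank(raw, class_scores)
    after = percentile_rank(raw + 4, class_scores)
    print(raw, round(before), "->", round(after))
# The middle score's rank jumps by many points; the extreme score's rank barely moves.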

119
Q

Eigenvalues are associated with:
Select one:

a.
internal consistency reliability.

b.
criterion-referenced interpretation.

c.
the multitrait-multimethod matrix.

d.
principal components analysis.

A

An eigenvalue indicates the total amount of variability in a set of tests or other variables that is explained by an identified component or factor.

Answer D is correct: Eigenvalues can be calculated for each component “extracted” in a principal components analysis. Additional information on eigenvalues and principal components analysis is provided in the Test Construction chapter of the written study materials.

The correct answer is: principal components analysis.
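
A brief sketch (hypothetical data; numpy is assumed to be available) showing where eigenvalues come from in a principal components analysis: they are the eigenvalues of the correlation matrix of the measured variables, and each one equals the amount of variance explained by one component.

import numpy as np

rng = np.random.default_rng(0)
scores = rng.normal(size=(100, 4))   # 100 examinees, 4 hypothetical tests
scores[:, 1] += scores[:, 0]         # make two of the tests correlate

corr = np.corrcoef(scores, rowvar=False)        # 4 x 4 correlation matrix
eigenvalues = np.linalg.eigvalsh(corr)[::-1]    # sorted largest first
print(eigenvalues)          # the first component explains the most variance
print(eigenvalues.sum())    # eigenvalues sum to the number of variables (4.0)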

120
Q

Which of the following scores is NOT a norm-referenced score?
Select one:

a.
percentile rank

b.
T-score

c.
pass or fail

d.
grade-equivalent scores

A

When using norm-referenced interpretation, an examinee’s score indicates how well he or she did on the test relative to examinees in the norm group.

Answer C is correct: Pass or fail is a criterion-referenced score. It indicates whether a person has or has not mastered the test content and does not measure performance in terms of other examinees. A “pass” score obtained by one examinee does not indicate how many other examinees passed or failed.

Answer A is incorrect: Percentile ranks are norm-referenced scores. A percentile rank indicates the percent of examinees in the norm group who obtained a lower score.

Answer B is incorrect: A T-score is a type of standard score, and standard scores are norm-referenced scores that indicate how well an examinee did in terms of standard deviation units from the mean score of the norm group.

Answer D is incorrect: A grade-equivalent score is a norm-referenced score. It allows a test user to compare an examinee’s test performance to that of students in different grade levels.

The correct answer is: pass or fail

121
Q

Zelda Z. obtains a score of 41 on a test that has a mean of 50 and a standard deviation of 6. If all of the scores in the distribution are transformed so that the test now has a mean of 100 and a standard deviation of 12, Zelda’s score in the new distribution would be:
Select one:

a.
91

b.
82

c.
41

d.
20.5

A

To identify the correct answer to this question, you have to recognize that Zelda’s original score was 1-1/2 standard deviations below the mean.

Answer B is correct: A score of 82 is 1-1/2 standard deviations below the mean of the new distribution and, therefore, equivalent to a score of 41 in the original distribution.

The correct answer is: 82
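
A minimal sketch (the helper function is hypothetical) of the transformation described above: convert the raw score to a z-score in the original distribution, then rescale it to the new mean and standard deviation.

def rescale(raw, old_mean, old_sd, new_mean, new_sd):
    z = (raw - old_mean) / old_sd   # Zelda: (41 - 50) / 6 = -1.5
    return new_mean + z * new_sd    # 100 + (-1.5 * 12) = 82

print(rescale(41, 50, 6, 100, 12))  # 82.0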