Test Construction Flashcards

Question

Confidence Interval

Answer 1

Helps a test user estimate the range w/in which an examinee's true score is likely to fall given their obtained score. Bc tests are not totally reliable, an examinee's obtained score may or may not be his/her true score. The best way to interpret an examinee's obtained score is to construct a confidence interval around it. * It is derived using the Standard Error of Measurement (SEM): * 68% = about +/- 1 SEM from obtained score * 95% = about +/- 2 SEM from obtained score * 99% = about +/- 3 SEM from obtained score *_Standard Error of Measurement (SEM)_* = Used to construct a confidence interval around a measured or obtained score.

Answer 2

It is used to construct a confidence interval around an examinee's obtained (measured) score. It is calculated by multiplying the standard deviation of the test scores by the square root of 1 minus the reliability coefficient. This is an index of the amount of error that can be expected in obtained scores due to the unreliability of the test. $SEM = SD_x\sqrt{1 - r_{xx}}$ Ex: A psychologist administers an interpersonal assertiveness test to a sales applicant who receives a score of 80. Since the test's reliability is less than 1.0, the psych. knows that this score might be an imprecise est. of the applicant's true score & decides to use the standard error of measurement to construct a 95% confidence interval. Assuming that the test's reliability coefficient is .84 and its standard deviation is 10, the standard error of measurement is equal to 4.0: $SEM = 10\sqrt{1 - .84} = 10\sqrt{.16} = 10(.4) = 4.0$ The psych. constructs a 95% confidence interval by adding and subtracting 2 standard errors from the applicant's obtained score: 80 ± 2(4.0) = 72 to 88. There is a 95% chance that the applicant's true score falls between 72 and 88.
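
The worked example on this card can be checked with a short Python sketch. The function names are illustrative (not from the deck), and the 68%/95%/99% levels are assumed to map to about 1, 2, and 3 SEMs as the cards state.

```python
import math

def standard_error_of_measurement(sd, reliability):
    """SEM = SD_x * sqrt(1 - r_xx)."""
    return sd * math.sqrt(1.0 - reliability)

def confidence_interval(obtained, sem, n_sem=2):
    """Return (lower, upper) bounds at +/- n_sem standard errors."""
    return obtained - n_sem * sem, obtained + n_sem * sem

# Values from the card: SD = 10, reliability = .84, obtained score = 80
sem = standard_error_of_measurement(sd=10, reliability=0.84)
low, high = confidence_interval(obtained=80, sem=sem, n_sem=2)
print(round(sem, 2), round(low, 2), round(high, 2))
```

Passing `n_sem=1` or `n_sem=3` gives the approximate 68% and 99% intervals.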

Answer 3

Refers to a test's accuracy in terms of the extent to which the test measures what it is intended to measure. 3 Different Types of Validity include: 1. *_Content Validity_* (content or behavior) 2. *_Construct Validity_* (hypothetical trait or construct) 3. *_Criterion-Related Validity_* (status/perf. on external criterion)

Answer 4

Important for tests designed to measure a specific content or behavior domain; the items must be an accurate & representative sample of the content domains they represent. Content validity is _not_ the same as face validity. Most important for achievement & job sample tests. Determined primarily by "expert judgment." This is of concern when a test has been designed to measure 1 or more content/behavior domains. A test has content validity when its items are a representative sample of the domain(s) that the test is intended to measure. Usually built into a test as it is being constructed & involves: * Clearly defining the content/behavior domain * Dividing the domain into categories & sub-categories * Writing or selecting a representative sample of items for each sub-category. After test devel., content validity is checked by having subject matter experts systematically evaluate the test & determine if the test items are an adequate & representative sample of the content or behavior domain. When scores on the test (X) are important bc they provide info. on how much each examinee knows about a content domain or on each examinee's status w/regard to the traits being measured, then content or construct validity are of interest.

Answer 5

(Broadest category of validity) This is important when a test will be used to measure a hypothetical trait or construct; a test has construct validity when it has been shown that the test actually measures the hypothetical trait it's intended to measure. A method of assessing construct validity is to correlate test scores w/scores on other measures that do & do not measure the same trait, to determine if the test has both: * *_Convergent Validity:_* The correlation btwn the test we're validating & a measure of the same trait using a different method provides info. about the test's convergent validity. (Monotrait-Heteromethod; large) * *_Discriminant (divergent) Validity:_* The correlations btwn the test we're validating & measures of unrelated traits provide info. about the test's divergent validity. (Heterotrait-Monomethod; small) * *_Multitrait-Multimethod Matrix_* is used to eval. construct validity. A table of correlation coefficients that provide info. about the test's convergent & divergent (discriminant) validity. Convergent & divergent combo provide evidence that the test is actually measuring the construct it was designed to measure. Other methods include: * Conducting a *_Factor Analysis_* to assess the test's factorial validity. Provides info. about convergent & divergent validity but is a more complex technique. * Determining if changes in test scores reflect expected developmental changes; & * Seeing if experimental manipulations have the expected impact on test scores. These tests include measures of achievement, motivation, intelligence or mechanical aptitude. When scores on the test (X) are important bc they provide info. on how much each examinee knows about a content domain or on each examinee's status w/regard to the traits being measured, then content or construct validity are of interest.

Answer 6

* *_Convergent Validity:_* The correlation btwn the test we're validating & a measure of the same trait using a different method provides info. about the test's convergent validity. * When a test has high correlations w/measures that assess the *same construct.* * (Monotrait-Heteromethod) * High correlations=convergent * *_Discriminant (divergent) Validity:_* The correlations btwn the test we're validating & measures of unrelated traits provide info. about the test's divergent validity. * When a test has low correlations w/measures of *unrelated characteristics*. * (Heterotrait-Monomethod; small) * Low correlations=discriminant (divergent) Convergent & divergent combo provide evidence that the test is actually measuring the construct it was designed to measure. *_Multitrait-Multimethod Matrix_* is used to eval. construct validity w/a table of correlation coefficients that provide info. about the test's convergent & divergent (discriminant) validity.

Answer 7

A systematic way to org. the data collected when assessing a test's convergent & discriminant validity. The matrix is a table of correlation coefficients that provide info. about the test's convergent & divergent (discriminant) validity. Requires measuring at least 2 different traits using at least 2 different methods for each trait. The matrix contains 4 types of correlation coefficients: 1. *_Monotrait-Monomethod Coefficients:_* Same trait-same method. (Reliability) 2. *_Monotrait-Heteromethod Coefficients:_* Same trait-different methods. (Convergent) 3. *_Heterotrait-Monomethod Coefficients:_* Different traits-same method. (Divergent) 4. *_Heterotrait-Heteromethod Coefficients:_* Different traits-different methods. (Divergent)
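
The four coefficient types follow mechanically from whether two measures share a trait and/or a method. A minimal Python sketch of that classification; the function name and the example traits/methods are hypothetical, chosen only for illustration.

```python
def mtmm_coefficient_type(measure_a, measure_b):
    """Classify the correlation between two (trait, method) measures."""
    trait_a, method_a = measure_a
    trait_b, method_b = measure_b
    trait = "monotrait" if trait_a == trait_b else "heterotrait"
    method = "monomethod" if method_a == method_b else "heteromethod"
    return f"{trait}-{method}"

# Hypothetical measures: (trait, method)
self_report_assert = ("assertiveness", "self-report")
peer_rating_assert = ("assertiveness", "peer rating")
self_report_anxiety = ("anxiety", "self-report")

# Same trait, different methods: evidence of convergent validity when large
print(mtmm_coefficient_type(self_report_assert, peer_rating_assert))
# Different traits, same method: evidence of discriminant validity when small
print(mtmm_coefficient_type(self_report_assert, self_report_anxiety))
```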

Answer 8

Reliability coefficients: Indicate the correlation between a measure & itself. Although these coefficients are not directly relevant to a test's convergent & discriminant validity, they should be large for the matrix to provide useful info.

Answer 9

These coefficients (coefficients in rectangles) indicate the correlation between different measures of the same trait. Large monotrait-heteromethod coefficients indicate that a test has convergent validity.

Answer 10

These coefficients (coefficients in ellipses) show the correlation between different traits that have been measured by the same method. Small heterotrait-monomethod coefficients indicate discriminant validity.

Answer 11

These coefficients (underlined coefficients) indicate the correlation between different traits that have been measured by different methods. Small heterotrait-heteromethod coefficients indicate discriminant validity.

Answer 12

A multivariate statistical technique used to ID how many factors (constructs/dimensions) underlie the intercorrelations among a set of tests, subtests, or test items. One use of the obtained data is to determine if a test has construct validity by indicating the extent to which the test correlates with factors that it would & would not be expected to correlate with. A test shows construct validity when it has high correlations w/factors it's expected to correlate with & low correlations w/factors it's not expected to correlate with. True score variability consists of: * Communality & Specificity. Factors ID'd in a factor analysis can be either: * Orthogonal * Oblique.

Answer 13

Uses include grouping a large number of test items into subtests & subscales, testing hypotheses about how test scales & subscales relate to one another, & assessing construct validity. Steps: 1. *_Administer Tests to a Sample of Examinees:_* Admin. the test(s) to be validated, along w/other tests that measure the same & diff. traits, to a group of examinees. 2. *_Derive & Interpret the Correlation Matrix:_* Correlate scores on each test w/scores on every other test to obtain a correlation (R) matrix, which indicates the correlations for all the pairs included in the analysis. ID clusters of tests that are highly correlated w/1 another; the number of clusters determines how many factors should be extracted in the factor analysis. 3. *_Extract the Initial Factor Matrix:_* Using 1 of several available factor analytic techniques, convert the correlation matrix to a factor matrix; difficult to interpret. (Data in the correlation matrix is used to derive a factor matrix that contains correlation coefficients that indicate the degree of association btwn each test and factor.) 4. *_Rotate the Factor Matrix:_* To obtain the final product & simplify the interpretation of the factors by "rotating" them. 5. *_Name the Factors:_* Interpret and name the factors in the rotated factor matrix. When factor analysis is used to assess construct validity, a test shows construct validity when it has high correlations w/factors it's expected to correlate with & low correlations w/factors it's not expected to correlate with.

Answer 14

In factor analysis: *_Factor Loading_* is the correlation btwn a test (or other variable included in the analysis) & a factor. A factor loading is interpreted by squaring it to determine the amount of variability in the test scores that is explained by the factor. A squared factor loading provides a measure of "shared variability." Ex: If a test has a correlation of .50 with Factor I, this means that 25% (.50² = .25) of variability in test scores is explained by Factor I.

Answer 15

The total amount of variability in scores on the test (or other variable) that is due to the factors that the test shares in common w/other tests in the analysis (identified factors). Communality is a lower-limit estimate of a test's reliability coefficient.

Answer 16

2nd component of true score variability in a Factor Analysis: variability that is due to factors that are specific or unique to the test & that are not measured by any other test included in the analysis. Specificity IDs the portion of true score variability that has not been explained by the factor analysis.

Answer 17

Redividing the communality of each test included in the analysis. A researcher decides which type of rotation is approp. based on his/her theory about the characteristics measured by the tests included in the analysis. Due to the re-division, each factor accounts for a different proportion of a test's variability than prior to the rotation. This process makes it easier to interpret factor loadings. There are 2 types of rotations: 1. *_Orthogonal_* (Uncorrelated) 2. *_Oblique_* (Correlated)

Answer 18

When the rotation is orthogonal, the ID'd factors are **uncorrelated** & the attributes measured by the factors are unrelated (independent). A test's communality can be calculated by summing the squared factor loadings. *Communality can be calc. this way only when factors are orthogonal.* Ex: If a test has a correlation of .50 w/Factor 1 & a correlation of .20 w/Factor 2 & the factors are uncorrelated, the test's communality is .29 (F1 = .50^2 = .25; F2 = .20^2 = .04; .25 + .04 = .29). This means that 29% of the variability in test scores is explained by the ID'd factors, while the remaining variability is due to some combo of specificity & measurement error.
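
The communality arithmetic for orthogonal factors can be sketched in Python as below. The function name is illustrative; the loadings are the ones from this card, and the sum-of-squares shortcut is assumed to apply only bc the factors are uncorrelated.

```python
def communality(loadings):
    """Sum of squared factor loadings (valid for orthogonal factors only)."""
    return sum(loading ** 2 for loading in loadings)

# Loadings from the card: .50 on Factor 1, .20 on Factor 2
h2 = communality([0.50, 0.20])
unexplained = 1.0 - h2  # specificity + measurement error combined
print(round(h2, 2), round(unexplained, 2))
```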

Answer 19

When the rotation is oblique the ID factors are **correlated**, & the attributes measured by the factors are not independent.

Answer 20

The type of validity that involves determining the relationship (correlation) btwn the predictor & the criterion. Important when test scores will be used to predict/est. scores on a criterion. The correlation coefficient is referred to as the criterion-related validity coefficient. This type of validity can be either: * *_Concurrent:_* Involves obtaining scores on the predictor & criterion at about the same time; current. * *_Predictive:_* Involves obtaining predictor scores before criterion scores; future. This is of interest when a test has been designed to estimate or predict an examinee's standing or performance on another measure (external criterion). When the test (X) score will be used to predict scores on some other measure (Y) & it is the scores on Y that are of the most interest, then this type of validity is of interest.

Answer 21

The validity coefficient represents the correlation btwn 2 different measures, so it can be interpreted in terms of shared variability: when the correlation btwn the 2 diff. measures is squared, it provides a measure of shared variability. (Tip: On exam, if a Q gives the correlation coefficient for X (predictor) & Y (criterion) & asks how much variability in Y is explained by X, square the correlation coefficient to get the correct answer.)

Answer 22

When predictor & criterion scores are obtained at the same time. Used to estimate current status/performance on the criterion. (Criterion-Related Validity)

Answer 23

When predictor scores are obtained before criterion scores. Preferred type to predict future performance on the criterion. (Criterion-Related Validity)

Answer 24

Used to construct a confidence interval around an estimated or predicted score. An index of error when predicting criterion scores from predictor scores. Its magnitude depends on 2 factors: 1. The criterion's SD 2. The predictor's validity coefficient The confidence interval around an examinee's predicted criterion score: * 68% confidence interval is constructed by +/- 1 standard error of estimate from the predicted criterion score. * 95% interval by +/- 2 standard errors of estimate from the predicted criterion score. * 99% interval by +/- 3 standard errors of estimate from the predicted criterion score. $SE_{est} = SD_y\sqrt{1 - r_{xy}^2}$ Ex: Given a criterion SD of 10 & a validity coefficient of .60, calc. the standard error of est.: $SE_{est} = 10\sqrt{1 - .60^2} = 10\sqrt{1 - .36} = 10\sqrt{.64} = 10(.8) = 8$
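
A short Python sketch of the standard error of estimate, using the values from this card (criterion SD = 10, validity coefficient = .60). The function name and the predicted score of 50 are hypothetical, included only to show how an interval would be built.

```python
import math

def standard_error_of_estimate(sd_y, validity):
    """SE_est = SD_y * sqrt(1 - r_xy^2)."""
    return sd_y * math.sqrt(1.0 - validity ** 2)

se_est = standard_error_of_estimate(sd_y=10, validity=0.60)

# Hypothetical predicted criterion score, used only to illustrate the interval
predicted = 50
interval_95 = (predicted - 2 * se_est, predicted + 2 * se_est)
print(round(se_est, 2), tuple(round(x, 2) for x in interval_95))
```

Note that a validity coefficient of 0 gives SE_est equal to the criterion's SD, and a coefficient of 1.0 gives SE_est of 0.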

Answer 25

The extent to which a predictor increases decision-making accuracy when the predictor is used to make selection decisions. Calculated by subtracting the base rate from the positive hit rate. Evaluated by comparing the number of correct decisions made with & w/out the new predictor. This has been linked to predictor & criterion cutoff scores; * true & false positives; * true & false negatives. Scatterplots are used to assess a predictor's incremental validity by dividing the plot into 4 quadrants; the predictor cutoff determines if someone is + or -, & the criterion cutoff determines true or false.

Answer 26

Scored high on predictor & criterion; ppl predicted to be successful & are. On scatterplot usually right upper quadrant. (Incremental Validity)

Answer 27

Scored high on predictor & low on criterion; ppl predicted to be successful but are not. To reduce the number of false positives, the predictor cutoff can be raised and/or the criterion cutoff can be lowered. On scatterplot usually right bottom quadrant. (Incremental Validity)

Answer 28

Scored low on predictor & criterion; ppl predicted to be unsuccessful & are. On scatterplot usually left bottom quadrant. (Incremental Validity)

Answer 29

Scored low on predictor & high on criterion; ppl predicted to be unsuccessful but are successful. On scatterplot usually left upper quadrant. (Incremental Validity)
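
The four quadrants above feed the incremental validity calculation (positive hit rate minus base rate). A minimal Python sketch follows; the quadrant counts are made up for illustration, and the definitions assumed here are the conventional ones: base rate = proportion successful w/out the predictor, positive hit rate = proportion of those selected by the predictor who are successful.

```python
def incremental_validity(tp, fp, tn, fn):
    """Positive hit rate minus base rate, from the four quadrant counts."""
    total = tp + fp + tn + fn
    base_rate = (tp + fn) / total        # successful regardless of predictor
    positive_hit_rate = tp / (tp + fp)   # successful among those selected
    return positive_hit_rate - base_rate

# Hypothetical counts: 30 true +, 10 false +, 40 true -, 20 false -
gain = incremental_validity(tp=30, fp=10, tn=40, fn=20)
print(gain)  # base rate .50, positive hit rate .75
```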

Answer 30

Reliability is a necessary but not sufficient condition for validity. In terms of criterion-related validity, the validity coefficient can be no greater than the square root of the product of the reliabilities of the predictor & criterion: $r_{xy} \le \sqrt{r_{xx} r_{yy}}$. Considering the predictor alone, the formula indicates that reliability places an upper limit on validity: $r_{xy} \le \sqrt{r_{xx}}$. A valid test _must_ be reliable, but a reliable test may or may not be valid.
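
The ceiling that reliability places on validity can be illustrated in Python. The reliabilities below (.81 & .64) are hypothetical values picked for easy arithmetic, and the function name is illustrative.

```python
import math

def max_validity(r_xx, r_yy=1.0):
    """Upper limit on r_xy: sqrt(r_xx * r_yy).

    With the default r_yy = 1.0 (a perfectly reliable criterion),
    this reduces to sqrt(r_xx).
    """
    return math.sqrt(r_xx * r_yy)

ceiling_both = max_validity(r_xx=0.81, r_yy=0.64)   # predictor & criterion
ceiling_predictor_only = max_validity(r_xx=0.81)    # predictor alone
print(round(ceiling_both, 2), round(ceiling_predictor_only, 2))
```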

Answer 31

Process of re-assessing a test's criterion-related validity on a new sample to check the generalizability of the original validity coefficient.

Answer 32

Bc the predictor is often "tailor-made" for the orig. validation sample, the cross-validation coefficient tends to "shrink" (become smaller) bc the chance factors operating in the original sample are not all present in the cross-validation sample. Shrinkage refers to the reduction in the magnitude of a measure's validity coefficient when the predictor is validated w/a new sample.

Answer 33

An examinee's raw score is often difficult to interpret unless it's anchored to the performance of other examinees or a predefined standard of performance. Types include: * Norm-Referenced * Percentile Rank * Standard Scores

Answer 34

Involves comparing an examinee's test score to scores obtained in a standardization sample or other comparison group. This type of interpretation may entail converting an examinee's raw score to a percentile rank and/or standard score (e.g., z-scores & T scores). The examinee's raw score is converted to a score that indicates his/her relative standing in the comparison group. Percentile Ranks & Standard Scores

Answer 35

Ranges from 1 to 99 & expresses an examinee's score in terms of the percentage of examinees who achieved lower scores in the sample. Advantage = easy to interpret. Limitation = Indicates an examinee's relative position in a distribution but does not provide info. about absolute differences btwn examinees in terms of their raw scores. The distribution is always flat (rectangular) regardless of the shape of the raw score distribution; nonlinear transformation (looks different). (Norm-Referenced Interpretation)
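
A minimal Python sketch of the definition on this card (percentage of examinees in the sample who scored lower). The sample scores are invented for illustration; some textbooks also count half of the tied scores, which this sketch does not do.

```python
def percentile_rank(score, sample):
    """Percentage of scores in the sample that fall below the given score."""
    below = sum(1 for s in sample if s < score)
    return 100.0 * below / len(sample)

# Hypothetical comparison group of 10 raw scores
norm_group = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]
pr = percentile_rank(85, norm_group)
print(pr)  # 6 of the 10 scores are lower
```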

Answer 36

Anchor an examinee's test score to those of the norm group by reporting the examinee's score in terms of standard deviation units from the mean. Most common is a *_z-score_*; the distribution has a mean of 0 and a SD of 1.0. * Calculated by subtracting the mean of the distribution from the raw score to obtain a deviation score & then dividing the deviation score by the distribution's SD. The resulting score indicates how far the raw score is from the mean of the distribution in SD units. $z = (X - M)/SD$ Ex: A z-score of -1.0 indicates that an examinee's raw score is 1 SD below the mean. Ex: If an examinee obtains a score of 110 on a test that has a mean of 100 & SD of 10, the z-score is +1.0 (the raw score is 1 SD above the mean): $z = (110 - 100)/10 = 10/10 = 1$ *_T-Score Distribution_* has a mean of 50 & SD of 10 * Examinee whose raw score is 1 SD above the mean will have a T-score of 60 *_Deviation IQ scores_* have a mean of 100 & SD of 15 * Examinee whose raw score is 1 SD above the mean will have a score of 115 (Norm-Referenced Interpretation)
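
The conversions on this card can be sketched in Python as follows. Function names are illustrative; the means and SDs (T: 50/10, deviation IQ: 100/15) are the ones the card gives.

```python
def z_score(raw, mean, sd):
    """z = (X - M) / SD."""
    return (raw - mean) / sd

def rescale(z, new_mean, new_sd):
    """Express a z-score on another standard-score scale."""
    return new_mean + new_sd * z

# Card example: raw score 110 on a test with mean 100 & SD 10
z = z_score(110, mean=100, sd=10)
t_score = rescale(z, new_mean=50, new_sd=10)
deviation_iq = rescale(z, new_mean=100, new_sd=15)
print(z, t_score, deviation_iq)
```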

Answer 37

Interpretation of a test score in terms of a pre-specified standard: * *_Percentage score (% correct)_* - Indicates the proportion of the test content (e.g. % of test items) that the examinee answered correctly. * *_Regression Equation_* - Predicts perf. on an external criterion. * *_Expectancy Table_* - Makes it possible to use an examinee's predictor (test) score to estimate the probability that they will attain different scores on a criterion.

Answer 38

Refers to bias introduced into a person's criterion score as a result of the knowledge of the scorer about his/her performance on the predictor. Tends to artificially inflate the relationship between the predictor and criterion.

Answer 39

Cleary's regression model aka model of test bias; if a test has the same regression line for members of both grps, the test is not biased even if it has different means for the grps.

Answer 40

Scores on objective tests are sometimes corrected for guessing in order to ensure that examinees don't benefit from guessing wildly.

Answer 41

Alternate raters. This occurs when observers' ratings become increasingly less accurate over time in a systematic way. It happens when raters who are working together influence each other's ratings so that they assign ratings in increasingly similar (& idiosyncratic) ways.

Answer 42

C. 95 To identify the correct answer to this question, you need to be familiar with the areas under the normal curve & know that T scores have a mean of 50 & SD of 10. Ex: T scores of 30 & 70 are 2 SDs below & above the mean &, in a normal distribution, about 95% of scores fall btwn the scores that are +/- 2 SDs from the mean.
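
The "about 95%" figure can be verified w/the normal cumulative distribution function. This Python sketch builds the CDF from the standard library's `math.erf`; the helper name is illustrative.

```python
import math

def normal_cdf(z):
    """Cumulative probability of the standard normal distribution at z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# T scores have mean 50 & SD 10, so T = 30 and T = 70 are z = -2 and z = +2
z_low = (30 - 50) / 10
z_high = (70 - 50) / 10
area = normal_cdf(z_high) - normal_cdf(z_low)
print(round(area * 100, 1))  # percentage of scores between T = 30 & T = 70
```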

Answer 43

Method of score adjustment used to take grp differences into account when assigning or interpreting scores. Ex: When a band is defined as 91-100 pts & examinees who receive scores of 91, 95, & 99 are treated the same.

Answer 44

Tends to artificially inflate the size of the inter-rater reliability coefficient. Occurs when observer ratings become increasingly less accurate over time in a systematic way. This happens when raters who are working together influence each other's ratings so that they assign ratings in increasingly similar & idiosyncratic ways.

Answer 45

Refers to whether or not test items "look like" they're measuring what the test is designed to measure. Not an actual type of validity, but it is desirable in some situations. If a test lacks this, ppl may not be motivated to respond to items in an honest or accurate way.

Answer 46

The degree of peakedness or flatness of a probability distribution, relative to the normal distribution with the same variance. 2 Types: 1. *_Leptokurtic:_* Distribution of scores more peaked than a normal distribution. 2. *_Platykurtic:_* Distribution of scores flatter than a normal distribution.

Answer 47

Characteristics of this type of analysis: the components (factors) are extracted so that the 1st component reflects the greatest amount of variability, the 2nd component the 2nd greatest, etc. * Involves ID'ing the components (factors) that underlie/explain the variability observed in a set of tests or other variables. (Similar to factor analysis) *_Eigenvalue:_* (For each principal component) is calculated by squaring the correlation btwn each test & the component & then summing the results. The resulting number indicates the total amount of variability in the set of tests that is explained by the principal component.
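
The eigenvalue calculation described on this card can be sketched in Python. The four loadings are hypothetical, and dividing by the number of tests to get a proportion assumes each test is standardized (contributes 1 unit of variance).

```python
def eigenvalue(component_loadings):
    """Sum of the squared correlations between each test & the component."""
    return sum(r ** 2 for r in component_loadings)

# Hypothetical correlations of 4 tests w/the 1st principal component
loadings = [0.8, 0.7, 0.6, 0.5]
ev = eigenvalue(loadings)

# Assumes standardized tests, so total variance = number of tests
proportion_explained = ev / len(loadings)
print(round(ev, 2), round(proportion_explained, 3))
```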

(71 cards)