Test Construction Flashcards

Question

Match the type of validity with its definition:
A. Predicts someone's status on an external criterion measure.
B. Measures a theoretical, nonobservable construct or trait.
C. Measures knowledge of the domain it was designed to measure.
D. High correlation with another test that measures the same thing.
E. Low correlation with a test that measures something different.

Answer 1

A. Criterion-related validity
B. Construct validity
C. Content validity
D. Convergent validity
E. Divergent validity

Answer 2

A, B, D correct. No test has validity per se.

Answer 3

A. Correct, and used in industrial settings, e.g., work samples, licensing exams.
B. Correct!
C. False. Rests on the judgment and agreement of subject matter experts. Clearly identify the domain, divide it into subcategories, and select items from each so the test is representative. May also want a high correlation with tests of the same content domain or with success in the class.
D. False! Face validity is not really a type of validity, but it is desirable; without it people may not cooperate, may lack motivation, etc.

Answer 4

A. True! The criterion is job performance, school achievement, test scores, or whatever is being predicted.
B. True. Like a Pearson r, and called the criterion-related validity coefficient. Ranges from -1 to 1; 1 is perfect validity; 0 is none. Few exceed .6; even .3 may be acceptable. Square it to interpret: the proportion of variability in the criterion explained by (shared with) variability in the predictor.
C. Partly true. Predictor and criterion can be gathered at the same time, called concurrent validity (e.g., a typing test); best for estimating current status. Less costly, more convenient. Often used in place of predictive validation (e.g., pilots: can't hire everyone, so test and pick the best). Or can use predictive validation, where the predictor is measured first and the criterion later (GRE scores first, GPAs later, then correlate). Best for predicting future status.
D. Especially to select employees, decide admissions, place in special classes.
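The "square to interpret" rule in B can be sketched in a few lines. This is only an illustration; the function name `shared_variance` is made up for this card and is not from any statistics library.

```python
def shared_variance(validity_coefficient: float) -> float:
    """Proportion of criterion variability shared with the predictor (r squared)."""
    if not -1.0 <= validity_coefficient <= 1.0:
        raise ValueError("a correlation must lie between -1 and 1")
    return validity_coefficient ** 2


# The two benchmark coefficients mentioned on the card.
for r in (0.3, 0.6):
    print(f"r = {r:.1f} -> {shared_variance(r):.0%} of criterion variance shared")
```

Squaring shows why even a "good" coefficient of .6 leaves most of the criterion variability unexplained.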

Answer 5

A. How much a sample mean can be expected to deviate from the population mean.
B. A reliability measure.
C. Correct! Use this to get a confidence interval: 95% chance the true criterion score falls within predicted score +/- (1.96)(standard error of estimate). Example: an IQ of 115 put into the regression equation predicts a math score of 80. With a standard error of estimate of 5, there is a 95% chance the actual score is between about 70 and 90: 80 +/- (2 x 5).
D. Allows prediction of the unknown value of one variable from the known value of another. Used to get the PREDICTED score.
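The confidence-interval arithmetic above can be checked with a short sketch. The function name `criterion_interval` is hypothetical; the numbers (predicted score 80, standard error of estimate 5) are the ones on the card.

```python
def criterion_interval(predicted: float, se_estimate: float, z: float = 1.96):
    """Range in which the actual criterion score is likely to fall."""
    margin = z * se_estimate
    return predicted - margin, predicted + margin


# Card example: predicted math score 80, standard error of estimate 5.
exact = criterion_interval(80, 5)            # uses z = 1.96
rounded = criterion_interval(80, 5, z=2.0)   # the card's "2 x 5" shortcut
print(exact, rounded)
```

Rounding 1.96 up to 2 gives the 70 to 90 band quoted on the card; the unrounded band is slightly narrower.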

Answer 6

SE of measurement is about reliability; SE of estimate is about validity.
SE of measurement estimates where the true test score is likely to fall given the obtained score on the same test. No predictor involved.
SE of estimate estimates where the actual criterion score is likely to fall given the criterion score that was predicted by another measure. Must have a predictor!
Know the SE of measurement formula! Just the concept for SE of estimate.
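The card says to know the formula but does not write it out. The standard textbook form is SD of the test times the square root of (1 minus the reliability coefficient); the sketch below assumes that form, and the function names and the example values (SD 15, reliability .91) are illustrative only.

```python
import math


def se_measurement(sd: float, reliability: float) -> float:
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


def true_score_band(obtained: float, sd: float, reliability: float, z: float = 1.96):
    """Range in which the true score is likely to fall, given the obtained score."""
    margin = z * se_measurement(sd, reliability)
    return obtained - margin, obtained + margin


# Hypothetical test: SD of 15 and reliability of .91.
print(se_measurement(15, 0.91))
print(true_score_band(110, 15, 0.91))
```

Note that only the test's own SD and reliability appear; no predictor is involved, which is the contrast the card draws with the SE of estimate.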

Answer 7

A. Lower validity if the range of scores on the predictor and/or criterion is restricted. More homogeneous sample, lower validity.
B. Lower validity if the predictor and/or criterion is unreliable, but high reliability doesn't guarantee adequate validity. An unreliable test is always invalid, but a reliable test may or may not be valid.
C. The validity of a predictor on a second sample will likely be lower than on the first. This is cross-validation. Typically a test is developed and validated on an item-by-item basis, and the items with the highest correlation with the criterion get in. When this is cross-validated, the second coefficient is lower. Called shrinkage. The predictor is tailor-made for the original validation sample and doesn't generalize fully. So must cross-validate, or the true validity is overestimated.
D. Validity may vary among subgroups (men, women; high SES, low SES...). Many moderator variables may be present for one group and not another, and then there is differential validity. Rare in industrial settings.
E. Criterion contamination: knowledge of scores on the test affects the criterion. Artificially INFLATES the validity coefficient. No one should have knowledge of the scores.

Answer 8

Tendency of validity coefficients to decrease in magnitude upon cross-validation. Shrinkage increases when: the sample size is small; the original item pool is large; the number of items kept is small relative to the item pool; and/or items are not chosen on the basis of a previously formulated hypothesis or experience with the criterion.

Answer 9

Psychological variable that is abstract and not directly observable. The concern is with abstract attributes. Worked out over time on the basis of accumulated evidence. Established with a choice of methods guided by the test developer's theory.

Answer 10

Convergent: degree to which a test has a high correlation with another test designed to measure the same trait or construct.
Divergent: degree to which a test has a low correlation with another test designed to measure a different trait or construct. Also called discriminant. The only case where a low coefficient provides evidence of high validity.
Method used is the multitrait-multimethod matrix: measure two or more traits by two or more methods. Convergent if tests of the same trait correlate highly even with different methods. Divergent if two tests measuring different traits have a low correlation even with the same method.
The other method is factor analysis.

Answer 11

All. Underlying constructs may be only a few; find out how many there are and to what degree they account for scores on the tests. Called latent variables because the tests in the analysis are not directly intended to measure them. Sometimes the purpose is said to be detecting the structure in a number of variables: start with a large number and classify them into sets. Rotation of factors is either orthogonal (uncorrelated factors) or oblique (correlated factors).

Answer 12

Communality and specificity: the true score variability shared with other tests, and the part of true variability unique to the test itself. So a test's reliability must be at least as high as its communality (communality is a lower-limit estimate of reliability).
Explained variance is the eigenvalue, shown at the bottom of each factor column: the amount of variance in all tests accounted for by the factor. Used to determine whether a factor accounts for a significant amount of variability in the tests. Convert to a percentage: (eigenvalue x 100) / number of tests gives the percentage of total variability the factor explains. Factors are usually ordered by eigenvalue, so factor 1 explains more of what is going on than factor 2, etc. The sum of the eigenvalues can be no larger than the number of tests in the analysis.
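The eigenvalue-to-percentage conversion can be sketched as follows. The loadings are invented for illustration, and the sketch assumes unrotated, uncorrelated factors, where a factor's eigenvalue is the sum of the squared loadings in its column.

```python
def eigenvalue(loadings_for_factor):
    """Sum of squared loadings down one factor's column."""
    return sum(loading ** 2 for loading in loadings_for_factor)


def percent_variance(eig: float, n_tests: int) -> float:
    """(eigenvalue x 100) / number of tests."""
    return eig * 100 / n_tests


# Hypothetical loadings of four tests on factor 1.
factor_1 = [0.8, 0.7, 0.6, 0.5]
eig = eigenvalue(factor_1)
print(eig, percent_variance(eig, n_tests=len(factor_1)))
```

Because each squared loading is at most 1, a single factor's eigenvalue can never exceed the number of tests, which is consistent with the ceiling stated on the card.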

Answer 13

False. Interrelationships stay the same, as do communalities. Rotation just puts the factors in a new position. However, eigenvalues may change after a rotation, so use the term only with unrotated factors.
Orthogonal factors are independent of each other (uncorrelated). Some say these should always be used because they are easy to interpret. Communality is the sum of squared factor loadings.
Oblique factors are correlated with each other. Some say these should be used because most traits and categories are correlated. Doesn't use communality as the sum of squared loadings.
A test has construct validity if it correlates highly with a factor it would be expected to correlate with.
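A small sketch of the orthogonal case: communality as the sum of a test's squared loadings, and a check that an orthogonal rotation of two factors leaves it unchanged. The loadings (.8 and .3) and the 30-degree angle are made up, and the `rotate` helper is only a plain 2-D rotation, not any particular rotation method such as varimax.

```python
import math


def communality(loadings_for_test):
    """Sum of squared loadings across factors (orthogonal factors only)."""
    return sum(loading ** 2 for loading in loadings_for_test)


def rotate(loadings_for_test, degrees: float):
    """Rotate a test's loadings on two orthogonal factors by a fixed angle."""
    a, b = loadings_for_test
    t = math.radians(degrees)
    return [a * math.cos(t) + b * math.sin(t), -a * math.sin(t) + b * math.cos(t)]


before = [0.8, 0.3]            # hypothetical loadings on factors 1 and 2
after = rotate(before, 30.0)
print(communality(before), communality(after))
```

The individual loadings change with the rotation, but their squared sum does not, which is why communalities survive rotation while eigenvalues may not.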

Answer 14

All true for principal component analysis. A, B for cluster analysis.

Answer 15

A. True. Factor analysis requires interval or ratio data. Cluster analysis can use any kind.
B. True. Just clusters, which are categories and not necessarily traits or latent variables.
C. False. Factor analysis tests hypotheses, but cluster analysis does not (no a priori hypotheses).
D. True for cluster analysis. Develops a classification system, e.g., for criminals, rapists, alcoholics.

Answer 16

All true. Since reliability sets an upper limit on validity, a test with moderate or low reliability will be limited in how valid it can be.

Answer 17

B, D. The percentage is the item difficulty index, p. If p = .80, then 80% of examinees passed that item. The higher the p, the less difficult the item.
Test developers choose items of moderate difficulty because they increase test score variability, which is associated with high levels of reliability and validity. Also provides maximum differentiation between high and low scorers.
The difficulty level differs based on purpose and should approximate the selection ratio. To pick kids for an accelerated program, use a difficulty level of .25, meaning only 25% pass, if you only want to select 25%. Mastery tests: item difficulty is higher, like .8 or .9.
If an item can be answered correctly through blind guessing, use a higher p. This is because if p is too low, correct responses are likely to reflect a chance guess, not what the item is trying to measure. Rule of thumb: the average difficulty level of test items should be about halfway between 1.0 and the level of success expected by chance. So for true/false, p is .75. For multiple choice with 5 options, halfway between .20 and 1.0, so .6.
Average item difficulty is affected by the nature of the sample of people who tried the items out. The tryout sample should be representative of the population for which the test is intended.
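The difficulty index and the halfway rule of thumb, as a sketch. Function names are hypothetical, and responses are assumed to be coded 1 for correct and 0 for incorrect.

```python
def item_difficulty(responses) -> float:
    """p: proportion of examinees who answered the item correctly."""
    return sum(responses) / len(responses)


def target_difficulty(n_options: int) -> float:
    """Halfway between chance-level success and 1.0."""
    chance = 1 / n_options
    return (1.0 + chance) / 2


print(item_difficulty([1, 1, 1, 1, 0]))   # 4 of 5 examinees passed the item
print(target_difficulty(2))               # true/false
print(target_difficulty(5))               # five-option multiple choice
```

The more options an item has, the lower the chance level and so the lower the recommended average p.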

Answer 18

Ordinal is correct. According to Anastasi, p is not interval or ratio because equivalent differences in p do not necessarily indicate equivalent differences in difficulty. It indicates the rank order of item difficulty, but you can't infer that differences between items are equal.

Answer 19

A. Item discrimination: how well an item discriminates between, say, high and low scorers. E.g., whether actually depressed people consistently answer differently from nondepressed people.
Measured many ways. One is to correlate item responses with total test score; items with the highest correlations are kept for the test. Good when the test measures one attribute and internal consistency is very important. If the goal is to predict performance on a criterion, then each item is correlated with the criterion; pick items with high correlations with the criterion but low correlations with each other.
Can also calculate the item discrimination index, D = U - L (percent correct in the upper group minus percent correct in the lower group). D ranges from 100 (all in the upper group and none in the lower group answer correctly) to -100 (all in the lower group and none in the upper group answer correctly). 0 means equal proportions and no discriminability.
Moderate difficulty is associated with maximum discriminability. Item difficulty places a ceiling on the discrimination index: if difficulty is 1 (all correct) or 0 (no one correct), then D is 0. An item everyone answers the same way has no discriminating value. The higher the D, the higher the reliability. All methods seem about equal.
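A sketch of D = U - L with made-up response data. It assumes U and L are the percentages of the upper and lower scoring groups who answered the item correctly, with responses coded 1/0; the function names are illustrative.

```python
def percent_correct(responses) -> float:
    """Percentage of a group answering the item correctly (1 = correct)."""
    return 100 * sum(responses) / len(responses)


def discrimination_index(upper_group, lower_group) -> float:
    """D = U - L, ranging from -100 to 100."""
    return percent_correct(upper_group) - percent_correct(lower_group)


upper = [1, 1, 1, 0]   # hypothetical high scorers: 75% correct
lower = [0, 0, 1, 0]   # hypothetical low scorers: 25% correct
print(discrimination_index(upper, lower))

# An item everyone passes cannot discriminate at all.
print(discrimination_index([1, 1, 1, 1], [1, 1, 1, 1]))
```

The second call illustrates the ceiling effect: when difficulty is 1 (or 0), both groups have the same percentage correct and D is 0.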

Answer 20

C. Uses graphical depictions of the percentage of people at different ability levels who answer each analyzed item correctly. Based on the assumption that performance on a test is related to how much of a latent or underlying trait the respondent possesses. The curves depict item difficulty and discrimination.
Difficulty is the point on the ability axis where the probability of a correct response is .5. This is another way of measuring item difficulty: wherever the item's curve hits .50, that ability level is its difficulty, say level 4.
Slope tells discrimination. The steeper the curve, the more useful the item is at discriminating between high and low scorers; a flatter curve is less useful.
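One way to picture such a curve (often called an item characteristic curve) is with a logistic function. The card does not name a specific model, so the logistic form and the parameter names `difficulty`, `discrimination`, and `guessing` below are assumptions made for illustration.

```python
import math


def prob_correct(ability: float, difficulty: float,
                 discrimination: float = 1.0, guessing: float = 0.0) -> float:
    """Probability of a correct response under an assumed logistic curve."""
    logistic = 1.0 / (1.0 + math.exp(-discrimination * (ability - difficulty)))
    return guessing + (1.0 - guessing) * logistic


# With no guessing, the curve passes through .5 exactly at the item's difficulty.
print(prob_correct(ability=4.0, difficulty=4.0))

# A steeper slope separates examinees just below and just above the difficulty more sharply.
for slope in (0.5, 2.0):
    gap = prob_correct(5.0, 4.0, slope) - prob_correct(3.0, 4.0, slope)
    print(slope, round(gap, 3))
```

The larger gap for the steeper slope is what "slope tells discrimination" means in practice.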

Answer 21

All but C.
A. .5 on the ability axis.
B. Where the curve crosses the y axis; if it doesn't cross, it is zero.
D. The slope tells you. Pg. 71

Answer 22

A. No.
B. Called invariance of item parameters. An item should have the same parameters (difficulty and discrimination) across all samples of the population. Implies that once analyzed, items of wide-ranging difficulty levels can be used with any individual to provide an estimate of ability. Only true with large samples.
C. True. So scores of individuals who took different items can be directly compared. Can also compare total test scores of a sample to the proportion of people who answered each item correctly.
D. Not an assumption. Rather, the theory has been applied to this, which is giving a set of items matched to the estimated level of ability.

Answer 23

C. The probability that the item can be answered by chance alone. Difficulty should be halfway between 1.0 and the level of success expected by chance alone.

Answer 24

D. Difficulty measured in terms of percentage of examinees who answer the item correctly

Answer 25

D. Item parameters (difficulty and discrimination) will be the same regardless of sample.

Answer 26

All! Developmental norms: how far along the developmental path an individual has progressed.
Mental age scores: examinee's score compared to the average performance of others at different age levels. Ratio IQ score.
Grade equivalent scores: for educational achievement tests.
Disadvantage of developmental norms: no comparison across different age levels. No standard deviation, so scores are NOT comparable.
Within-group norms: provide a comparison of the examinee's score to the most comparable standardization sample.

Answer 27

A. Percentile rank. Advantage: easy to understand and interpret. Disadvantage: represents ranks, not absolute differences between scores.
B. Percentage.

Answer 28

Standard scores: express a raw score's distance from the mean in standard deviation units.
Z scores: how many standard deviation units above or below the mean.
T scores: used in psychological tests, e.g., MMPI.
Stanine scores.
Deviation IQ scores: how IQ is interpreted now.
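The standard scores listed above are all rescalings of z. The sketch assumes the usual conventions, which the card does not state: T scores with mean 50 and SD 10, and deviation IQ with mean 100 and SD 15. Function names are illustrative.

```python
def z_score(raw: float, mean: float, sd: float) -> float:
    """Distance of a raw score from the mean in standard deviation units."""
    return (raw - mean) / sd


def t_score(z: float) -> float:
    """T score: mean 50, SD 10."""
    return 50 + 10 * z


def deviation_iq(z: float) -> float:
    """Deviation IQ, assuming mean 100 and SD 15."""
    return 100 + 15 * z


z = z_score(raw=65, mean=50, sd=10)   # hypothetical raw score
print(z, t_score(z), deviation_iq(z))
```

Because every scale is anchored to the same z, scores from different tests or age groups can be compared directly once converted.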

Answer 29

B. Only T is a standard score.

Answer 30

D. Can compare the IQ of a 9-year-old and a 30-year-old.

Answer 31

B. One measure of a test's reliability is how homogeneous or internally consistent its items are (coefficient alpha or Kuder-Richardson 20). Therefore, decreasing inter-item consistency makes a test less reliable.

Answer 32

A. Zero. Orthogonal means uncorrelated, so the correlation is zero.
B. b. Communality

Answer 33

C. Any correlation coefficient will be lower with a restriction in the range of scores on one or both variables.

Answer 34

A. Correct
B. Not true
C. Reliability sets the ceiling
D. Upper limit of the validity coefficient is equal to the square root of the reliability coefficient.

Answer 35

B. An obtained test score is likely to differ from the true test score; how much depends on how much error the test contains. The standard error of measurement is used to build a range in which the true score is likely to fall given the obtained score.

Answer 36

D. Answer by using the formula for the standard error of estimate. Here the standard error of estimate comes out equal to the standard deviation of the criterion scores.
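The card refers to the formula without writing it. The standard textbook form is the SD of the criterion times the square root of (1 minus the squared validity coefficient); the sketch assumes that form, with illustrative names and values.

```python
import math


def se_estimate(sd_criterion: float, validity: float) -> float:
    """Standard error of estimate: SD_criterion * sqrt(1 - validity**2)."""
    if not -1.0 <= validity <= 1.0:
        raise ValueError("a validity coefficient must lie between -1 and 1")
    return sd_criterion * math.sqrt(1.0 - validity ** 2)


sd_y = 10.0   # hypothetical SD of criterion scores
for r in (0.0, 0.6, 1.0):
    print(r, se_estimate(sd_y, r))
```

Under this formula, a validity coefficient of 0 leaves the standard error of estimate equal to the criterion SD, and a coefficient of 1 drives it to zero.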

Answer 37

A. Job analysis: find out the tasks and what people do in the job.

Answer 38

C. Both give an index of a test's average degree of inter-item consistency. Alpha is used when items are not dichotomously scored. Pg86, 86 review 13 - 18

Answer 39

D. Upper limit of the validity coefficient is the square root of .9 (not .81, which is the square of .9). Means the test's validity is lower than or equal to the square root of .9. Pg. 64
Don't forget: with orthogonal rotation there is no correlation between factors. If a test has perfect validity, there is no error of estimate.
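The ceiling is easy to get backwards (squaring instead of taking the root), so a quick check helps. The function name `max_validity` is made up for this sketch.

```python
import math


def max_validity(reliability: float) -> float:
    """Upper limit of the validity coefficient: square root of reliability."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return math.sqrt(reliability)


print(round(max_validity(0.9), 3))    # ceiling for the card's reliability of .9
print(round(max_validity(0.81), 3))   # a reliability of .81 gives a ceiling of .9
```

Since the square root of a number between 0 and 1 is larger than the number itself, the validity ceiling always sits above the reliability coefficient.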

Answer 40

A. Knowing that someone scored low on the predictor may lead to a lower score on the criterion. Results in artificially high consistency between predictor and criterion and inflates the validity coefficient.

Answer 41

A. Correct B. Wrong, orthogonal C. Factor analysis D. Factor loading

Answer 42

A. Spearman-Brown B. Correct C.

(70 cards)