Psychological Testing #2 Flashcards
*Restriction of range
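Makes test-retest reliability low, because the scores in the sample don't vary much. When everyone falls in a narrow band of scores, the correlation between test and retest shrinks.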
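To see how a narrow range of scores can pull a test-retest correlation down, here is a small Python sketch. The scores are made up for illustration, and `pearson_r` is a helper written for this example, not something from the course material.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up scores for ten examinees at first and second testing.
first_testing = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
second_testing = [2, 1, 4, 3, 6, 5, 8, 7, 10, 9]

# Test-retest correlation using the full range of scores.
r_full = pearson_r(first_testing, second_testing)

# Restrict the range: keep only examinees who scored 8 or higher the first time.
kept = [(a, b) for a, b in zip(first_testing, second_testing) if a >= 8]
r_restricted = pearson_r([a for a, _ in kept], [b for _, b in kept])

print(round(r_full, 2), round(r_restricted, 2))
```

With these numbers the correlation in the restricted group comes out lower than in the full group, even though the same examinees and the same test are involved.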
What is standard error of measurement?
Theoretically, if the subject took the same test many times, their various scores would form a normal curve around their true score. The standard deviation of that curve is the standard error of measurement (SEM).
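In practice nobody takes the test hundreds of times, so the SEM is usually estimated from the test's standard deviation and its reliability coefficient (the classical test theory formula SEM = SD x sqrt(1 - r)). A minimal Python sketch; the SD of 15 and reliability of .91 are assumed example values, not from these notes.

```python
import math

def standard_error_of_measurement(sd, reliability):
    """Classical test theory estimate: SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)

# Hypothetical test with SD = 15 and reliability = 0.91.
sem = standard_error_of_measurement(15, 0.91)
print(round(sem, 2))
```

Note how the SEM shrinks as reliability goes up: a perfectly reliable test (r = 1.0) would have an SEM of zero.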
What is a confidence interval?
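A confidence interval is a range of scores around the obtained score that is likely to contain the true score. The range is built from multiples of the SEM, and the confidence is given as a percentage (e.g. about 68% for plus or minus 1 SEM, 95% for plus or minus 1.96 SEM).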
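As a rough sketch of how the band is built, the Python below adds and subtracts a multiple of the SEM from the obtained score. The obtained score of 100 and SEM of 4.5 are assumed example values; z = 1.96 corresponds to a 95% band under a normal curve.

```python
def confidence_interval(obtained_score, sem, z=1.96):
    """Return the (low, high) band: obtained score +/- z * SEM."""
    margin = z * sem
    return obtained_score - margin, obtained_score + margin

# Hypothetical examinee: obtained score 100 on a test with SEM = 4.5.
low, high = confidence_interval(100, 4.5)            # 95% band
low_68, high_68 = confidence_interval(100, 4.5, z=1.0)  # about 68% band
print(round(low, 2), round(high, 2))
```

The trade-off is that more confidence means a wider band: the 95% band here is nearly twice as wide as the 68% band.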
What is the standard error of the difference?
A statistical measure that helps a test user determine whether the difference between two scores is significant. It is usually used for subscores on a test.
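One common form of the calculation combines the SEMs of the two scores: SE of the difference = sqrt(SEM1^2 + SEM2^2), assuming both scores are on the same scale. The Python sketch below uses made-up SEMs and scores; the function names are hypothetical.

```python
import math

def standard_error_of_difference(sem_a, sem_b):
    """SE_diff = sqrt(SEM_a^2 + SEM_b^2) for two scores on the same scale."""
    return math.sqrt(sem_a ** 2 + sem_b ** 2)

def is_significant_difference(score_a, score_b, sem_a, sem_b, z=1.96):
    """True if the score gap exceeds z * SE_diff (z = 1.96 for the .05 level)."""
    se_diff = standard_error_of_difference(sem_a, sem_b)
    return abs(score_a - score_b) > z * se_diff

# Hypothetical subscores with SEMs of 3 and 4 points.
se_diff = standard_error_of_difference(3.0, 4.0)
print(se_diff)
print(is_significant_difference(110, 98, 3.0, 4.0))  # 12-point gap
print(is_significant_difference(105, 98, 3.0, 4.0))  # 7-point gap
```

With these numbers the gap has to be bigger than 1.96 x 5 = 9.8 points before it counts as a real difference, so the 12-point gap qualifies and the 7-point gap does not.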
Ch4: What is validity?
Does the test measure what it claims to measure?
A test is valid to the extent that inferences made from it are appropriate, meaningful, and useful.
What is the relationship between validity and reliability?
If a test is not reliable, it’s not going to be valid. However, a reliable test can be invalid: something can be consistently bad. (You have to understand the relationship between reliability and validity.)
What do we mean by a continuum of validity?
Validity cannot be captured in statistical summaries. Instead, it falls on a continuum ranging from weak to acceptable to strong, based on the three types of validity evidence.
What are the three categories of accumulating validity evidence?
Content validity
Criterion-related validity
Construct validity
An ideal validation includes several types of evidence in all three categories.
What is face-validity?
Well for one, it’s not actually validity. It’s how the test looks to examinees. It’s important because it can impact a person’s approach to the test. It’s loosely related to content validity.
What is content validity?
Content validity is determined by the degree to which the questions, tasks, or items on a test are representative of the universe of behavior the test was designed to sample. Especially useful when a great deal is known about the construct.
Item sampling - (Behavior) Do the items on the test fit the content you’re wanting to test? If I’m testing 4th grade math level, and I examine skills that aren’t taught until 5th grade, then that’s poor content validity.
Types of skills - (Responses) Multiple choice or open ended?
“Expert review” is often the evidence of choice.
What is criterion-related validity?
The test score is compared to an outcome measure (criterion). The criterion can be concurrent, e.g. people take a new IQ test and an established IQ test at the same time. The criterion can also be predictive, like in college readiness tests and employment tests.
What makes a good criterion for criterion-related validity?
RELIABLE - consistency of scores.
APPROPRIATE - Well duh, but actually sometimes this can be tricky. Should the criterion measure of an aptitude test indicate satisfaction, success, or continuance in the activity?
FREE FROM THE CONTAMINATION OF THE TEST - The criterion becomes contaminated when the test score itself influences it. Example: I want to see if this test is useful, but you already used the test to determine who you hired. It can also be contaminated by overlap between questions, e.g. if both the test and the criterion ask about eating habits and sleeping habits, that overlap will artificially inflate the correlation.
What is decision theory?
The purpose of psychological testing is not measurement for its own sake, but measurement in the service of decision making.
Making decisions based on test scores results in a matrix of outcomes: hits (correct decisions) and misses (false positives and false negatives). You have to determine where you want your mistakes to be.
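The outcome matrix can be sketched in Python by comparing each test-based decision with what actually happened. Everything here is made up for illustration: the applicant scores, the outcomes, the cutoffs, and the function names.

```python
from collections import Counter

def classify_outcome(test_score, cutoff, succeeded):
    """Label one decision as a hit or a miss given the real outcome."""
    predicted_success = test_score >= cutoff
    if predicted_success and succeeded:
        return "hit: true positive"
    if predicted_success and not succeeded:
        return "miss: false positive"
    if not predicted_success and succeeded:
        return "miss: false negative"
    return "hit: true negative"

def outcome_matrix(applicants, cutoff):
    """Count each of the four cells for a given cutoff score."""
    return Counter(classify_outcome(score, cutoff, ok) for score, ok in applicants)

# Hypothetical applicants: (test score, did they actually succeed?)
applicants = [(55, False), (60, True), (65, False), (70, True),
              (75, True), (80, False), (85, True), (90, True)]

lenient = outcome_matrix(applicants, cutoff=70)
strict = outcome_matrix(applicants, cutoff=85)
print(dict(lenient))
print(dict(strict))
```

Raising the cutoff from 70 to 85 removes the false positive in this data but triples the false negatives. That is the "where do you want your mistakes" decision.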
What is construct validity?
A construct is a theoretical, intangible quality or trait in which individuals differ. Construct validity is theory based: Based on my understanding of this particular construct, what would I expect to see in a test?
No criterion or universe of content is accepted as entirely adequate to define the quality to be measured, so a variety of evidence is required to establish construct validity.
What is test homogeneity?
A measure of construct validity.
Does it measure a single construct?
If my theory says this is a unitary construct, then when I check internal consistency it should look like just one construct. Even then, it could be measuring one thing, but it might not be the right thing.
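These notes don't name a specific internal consistency index. One widely used option is Cronbach's alpha, sketched below in Python as an assumption about what such a check might look like. The response data are made up (5 examinees, 3 items).

```python
from statistics import variance

def cronbach_alpha(responses):
    """Cronbach's alpha for a list of examinee rows (one score per item).

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(responses[0])                      # number of items
    items = list(zip(*responses))              # transpose: one tuple per item
    item_variances = [variance(item) for item in items]
    total_scores = [sum(row) for row in responses]
    return (k / (k - 1)) * (1 - sum(item_variances) / variance(total_scores))

# Made-up ratings: each row is one examinee, each column is one item.
responses = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [1, 2, 1],
    [3, 3, 3],
]

alpha = cronbach_alpha(responses)
print(round(alpha, 2))
```

A high alpha only says the items hang together. As the card says, it does not show that the one thing being measured is the right thing.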
What are appropriate developmental changes?
A measure of construct validity. Is my construct something that changes as people age? Egocentrism, for example, would have different results at different ages. The scores should go down as kids get older.
What are theory-consistent group differences?
A measure of construct validity. Can we predict who will have high and low scores for this construct? Different rates of extroversion in different professions. Nuns are high in social interest. Models and criminals are low in social interest.
What are theory-consistent intervention effects?
A measure of construct validity. Does the construct change in the appropriate direction after intervention/treatment? People’s spatial orientation scores should increase after training, more than the scores of those who did not receive training.
What is convergent and discriminant validation?
A measure of construct validity. What should it correlate with and what should it be unrelated to? Intelligence and social interest are theoretically unrelated. Anxiety and eating disorders overlap.
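A simple way to picture this check is to correlate the new test with one measure it should track and one it should not. The Python sketch below uses made-up scores; `pearson_r` and the variable names are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up scores for six examinees.
new_test = [1, 2, 3, 4, 5, 6]
related_measure = [2, 1, 4, 3, 6, 5]      # theory says: should correlate
unrelated_measure = [3, 5, 1, 1, 5, 3]    # theory says: should not correlate

convergent_r = pearson_r(new_test, related_measure)
discriminant_r = pearson_r(new_test, unrelated_measure)
print(round(convergent_r, 2), round(discriminant_r, 2))
```

Good evidence looks like a high first number and a second number near zero. If the "unrelated" correlation came out high, the test may be picking up something other than the intended construct.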
What is factor analysis?
A measure of construct validity. How many factors are you actually measuring? If you think you’re measuring three factors, and a factor analysis shows three factors, that’s a good sign.
What is classification accuracy?
A measure of construct validity.
How well does it give accurate identification of test takers? Test makers strive for high levels of:
SENSITIVITY: Accurate identification of patients who have a syndrome.
SPECIFICITY: Accurate identification of normal patients.
These are measured by percentages. Sensitivity: 79% (correctly identifies 79% of affected individuals). Specificity: 83% (correctly identifies 83% of unaffected individuals).
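Both percentages come straight from counts of correct and incorrect classifications. A minimal Python sketch, using made-up counts chosen so the results match the 79% and 83% in the example (100 affected and 100 unaffected people are assumptions):

```python
def sensitivity(true_positives, false_negatives):
    """Share of people WITH the syndrome that the test correctly flags."""
    return true_positives / (true_positives + false_negatives)

def specificity(true_negatives, false_positives):
    """Share of people WITHOUT the syndrome that the test correctly clears."""
    return true_negatives / (true_negatives + false_positives)

# Hypothetical sample: 100 affected and 100 unaffected people.
sens = sensitivity(true_positives=79, false_negatives=21)
spec = specificity(true_negatives=83, false_positives=17)
print(f"Sensitivity: {sens:.0%}  Specificity: {spec:.0%}")
```

The 21 false negatives are affected people the test missed, and the 17 false positives are unaffected people it flagged by mistake.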
What are extravalidity concerns?
Side effects and unintended consequences of testing.
What are some of the unintended side effects of testing (AKA extravalidity concerns)? How do we prevent extravalidity problems?
Children who are identified may feel unusual or dumb. There can be legal consequences. Along with traditional validity, tests should also be evaluated for (1) values in interpretation, (2) usefulness in the particular application, and (3) potential and actual social consequences.
What does NOIR stand for?
Nominal
Ordinal
Interval
Ratio