CHAPTER 4: OF TESTS AND TESTING Flashcards by Amara Vale

It has been defined as “any distinguishable, relatively enduring way in which one individual varies from another.”

Trait

It distinguishes one person from another, but is relatively less enduring.

State

It is an informed, scientific concept developed or constructed to describe or explain behavior.

Construct

It refers to an observable action or the product of an observable action, including test- or assessment-related responses.

Overt Behavior

This assumption holds that psychological traits (like shyness or intelligence) and states (like anxiety or happiness) are real and meaningful ways to describe how people differ from one another. Traits are relatively stable over time, while states are more temporary. Although traits and states are not directly observable, their existence is inferred from behavior, whether through observation, test answers, or self-reports. These constructs help psychologists explain and predict behavior. However, traits do not appear 100% of the time and can be influenced by the situation and the person’s environment. Importantly, how we label traits (like “shy” or “outgoing”) often depends on the context and the comparison group being used.

Assumption 1: Psychological Traits and States Exist

It refers to a method of interpreting test results where each response contributes to a total score that reflects the strength of a specific trait, ability, or state.

Cumulative Scoring
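
A minimal sketch of cumulative scoring, assuming a hypothetical five-item scale on which each response is rated 0 to 4. The item names, the ratings, and the default equal weighting are illustrative assumptions, not taken from any published test.

```python
def cumulative_score(responses, weights=None):
    """Sum item responses into one total score (equal weights by default)."""
    if weights is None:
        weights = {item: 1 for item in responses}
    return sum(responses[item] * weights[item] for item in responses)

# Hypothetical responses to a five-item scale, each rated 0-4.
responses = {"item1": 3, "item2": 2, "item3": 4, "item4": 1, "item5": 3}
total = cumulative_score(responses)
print(total)  # 13
```

A higher total is read as a stronger presence of the trait or state the items target; passing a `weights` mapping lets some items count more than others.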

This assumption holds that once psychological traits or states are defined—such as aggression, intelligence, or anxiety—they can be measured in numerical terms using well-designed tests. Since traits like “aggression” can have different meanings depending on the context, test developers must create clear operational definitions and select behaviors that best represent those definitions. Then, they design test items that reflect these behaviors and determine how much each item should contribute to the final score. Through cumulative scoring, a person’s responses are totaled, and their score reflects the strength or level of the trait being measured. In essence, this assumption supports the idea that even abstract psychological qualities can be reliably quantified and evaluated.

Assumption 2: Psychological Traits and States Can Be Quantified and Measured

This assumption states that behavior shown during a test is meaningful because it can predict behavior outside of the test. While test tasks may seem simple or unrelated, like answering multiple-choice questions or pressing keyboard keys, they are designed to reflect or correlate with broader psychological traits or behaviors. For instance, personality test responses can indicate the likelihood of certain mental health issues, and job-related tests can predict future work performance. In some cases, tests are used not to predict but to postdict behavior—helping understand past behavior, such as in forensic settings. Essentially, test results serve as samples that help forecast or explain real-world behaviors.

Assumption 3: Test-Related Behavior Predicts Non-Test-Related Behavior

This assumption emphasizes that no test is perfect—every psychological test or measurement tool has its own strengths, limitations, and appropriate uses. Competent and ethical test users must fully understand the test’s development, purpose, proper administration, and interpretation. They must also be aware of what a test cannot do and be able to compensate for its weaknesses by using other sources of information when needed. This assumption highlights the importance of responsible and knowledgeable use of testing tools in psychological assessment.

Assumption 4: Tests and Other Measurement Techniques Have Strengths and Weaknesses

This assumption highlights that error is an unavoidable part of psychological testing: not necessarily a mistake, but a natural part of the measurement process. Test scores can be influenced by many factors other than the trait being measured, such as the examinee’s health, mood, environment, the assessor’s behavior, or even test flaws. These influences contribute to error variance, which reflects how much of a test score is due to factors unrelated to the actual trait. Measurement theories such as Classical Test Theory (CTT) and Item Response Theory (IRT) both account for this inherent variability. Essentially, error is not a flaw in testing but a factor that must always be considered in interpreting results.

Assumption 5: Various Sources of Error Are Part of the Assessment Process

It refers to the part of a test score that is caused by factors unrelated to the trait or ability being measured. It is the “noise” in a score that makes it less accurate or reliable because it reflects influences other than the actual construct being assessed.

Error Variance
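
One way to picture error variance is the Classical Test Theory model, in which an observed score is a true score plus error. The simulation below is only a sketch under assumed values (true scores with SD 15, error with SD 5, error independent of the true score); none of these numbers come from the deck.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

n = 10_000
# Assumed values for illustration only.
true_scores = [random.gauss(100, 15) for _ in range(n)]
errors = [random.gauss(0, 5) for _ in range(n)]
observed = [t + e for t, e in zip(true_scores, errors)]

var_true = statistics.pvariance(true_scores)
var_error = statistics.pvariance(errors)
var_observed = statistics.pvariance(observed)

# With error independent of the true score, observed variance is
# approximately true variance plus error variance.
share_error = var_error / var_observed
print(round(share_error, 2))
```

Here `share_error` is the portion of observed-score variance that is "noise"; the remainder is the portion attributable to the trait itself.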

This assumption asserts that psychological testing and assessment can be fair and unbiased, but fairness depends heavily on how tests are developed and used. Although most modern test publishers strive for fairness by adhering to standardized guidelines, fairness issues still arise—especially when tests are used with populations they weren’t designed for. Sometimes these concerns are less about the test itself and more about societal goals, like those seen in debates over affirmative action. Ultimately, tests are tools, and their fairness depends on whether they are used appropriately and ethically.

Assumption 6: Testing and Assessment Can Be Conducted in a Fair and Unbiased Manner

This assumption emphasizes that psychological testing and assessment ultimately benefit society. While some may see tests as burdensome, especially in academic settings, their absence would lead to chaos and inefficiency in critical fields like medicine, education, aviation, and the military. Without tests, there would be no standardized way to evaluate competence, diagnose difficulties, or make fair and informed decisions. Thus, testing serves as an essential tool for ensuring safety, fairness, and effective functioning in many areas of life.

Assumption 7: Testing and Assessment Benefit Society

It refers to the consistency or stability of test scores over time or across raters. A good test or, more generally, a good measuring tool or procedure is reliable: it yields consistent results.

Reliability

It refers to the accuracy of a test in measuring what it is intended to measure. A test is considered valid for a particular purpose if it does, in fact, measure what it purports to measure.

Validity

It is a method of evaluation and a way of deriving meaning from test scores by evaluating an individual test taker’s score and comparing it to the scores of a group of test takers.

Norm-referenced Testing and Assessment

It pertains to the test performance data of a particular group of test takers that are designed for use as a reference when evaluating or interpreting individual test scores.

Norms

It is the group of people whose performance on a particular test is analyzed for reference in evaluating the performance of individual test-takers.

Normative sample

It refers to the process of deriving norms.

Norming

It is the controversial practice of norming on the basis of race or ethnic background.

Race Norming

It consists of descriptive statistics based on a group of test-takers in a given period of time, rather than norms obtained by formal sampling methods.

User Norms or Program Norms

It is the process of administering a test to a representative sample of test-takers for the purpose of establishing norms.

Standardization or Test Standardization

It is a subset of a larger population, selected to represent the characteristics of the whole population.

Sample

It is the process of selecting a portion of the universe deemed to be representative of the whole population.

Sampling

A sampling method where the population is divided into subgroups (strata) based on a shared characteristic (e.g., age, gender, socioeconomic status), and samples are taken from each subgroup.

Stratified Sampling

A type of stratified sampling where, after dividing the population into subgroups (strata), you randomly select participants within each stratum.

Stratified-Random Sampling
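
The two steps (divide into strata, then draw randomly within each) can be sketched as follows. The function name, the toy population, and the choice of proportional allocation (the same fraction from every stratum) are assumptions made for illustration.

```python
import random

def stratified_random_sample(population, stratum_of, fraction, seed=None):
    """Group people by stratum, then draw a random sample within each group."""
    rng = random.Random(seed)
    strata = {}
    for person in population:
        strata.setdefault(stratum_of(person), []).append(person)
    sample = []
    for members in strata.values():
        # Proportional allocation (assumed): same fraction from each stratum,
        # with at least one person per stratum.
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population: 300 pupils spread evenly over grades 1-3.
population = [{"id": i, "grade": 1 + i % 3} for i in range(300)]
sample = stratified_random_sample(population, lambda p: p["grade"], 0.1, seed=42)
print(len(sample))  # 30
```

Because every stratum contributes, no subgroup can be missed by chance, which is the main advantage over a simple random draw from the whole population.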

It is a non-random sampling method where participants are intentionally selected because they are believed to be representative of the population or relevant to the specific purpose of the study.

Purposive Sampling

It is a non-random sampling method where participants are selected simply because they are easily accessible or readily available to the researcher.

Convenience Sampling

It is an expression of the percentage of people whose score on a test or measure falls below a particular raw score.

Percentile
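
As a small worked example of the definition above, the sketch below counts how many scores in a made-up norm group fall below a given raw score. The function name and the scores are hypothetical.

```python
def percentile_rank(raw_score, norm_scores):
    """Percentage of scores in the norm group that fall below raw_score."""
    below = sum(1 for s in norm_scores if s < raw_score)
    return 100 * below / len(norm_scores)

# Hypothetical norm group of ten raw scores.
norm_scores = [12, 15, 15, 18, 20, 22, 25, 27, 30, 34]
print(percentile_rank(25, norm_scores))  # 60.0
```

Six of the ten scores are below 25, so a raw score of 25 sits at the 60th percentile of this group.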

It is the proportion of items answered correctly, calculated by multiplying the number of correct responses by 100 and dividing by the total number of items.

Percentage Correct
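
The arithmetic is simple enough to show directly. This sketch uses a hypothetical 50-item test; unlike a percentile, the result depends only on the test taker's own answers, not on anyone else's scores.

```python
def percentage_correct(num_correct, num_items):
    """Correct responses times 100, divided by the total number of items."""
    return num_correct * 100 / num_items

# Hypothetical result: 42 items correct out of 50.
print(percentage_correct(42, 50))  # 84.0
```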

These are scores that indicate the average performance of individuals at a specific age, based on age norms from test-takers of different age groups.

Age-equivalent scores

These are developed by administering the test to representative samples of children over a range of consecutive grade levels (such as first through sixth grades).

Grade Norms

It is a term applied broadly to norms developed on the basis of any trait, ability, skill, or other characteristic that is presumed to develop, deteriorate, or otherwise be affected by chronological age, school grade, or stage of life.

Developmental Norms

These are derived from a normative sample that was nationally representative of the population at the time the norming study was conducted.

National Norms

These are reference scores used to link or compare results from different standardized tests that measure the same ability or trait. They provide a common scale or “anchor” to ensure equivalency between scores on different tests.

National Anchor Norms

It is a statistical technique used to equate scores from different tests by matching them based on their corresponding percentile ranks.

Equipercentile Method
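
A deliberately simplified sketch of the idea: find a score's percentile rank on Test A, then pick the Test B score whose percentile rank is closest. Operational equating typically adds interpolation and smoothing, which are omitted here; the function names and both score lists are hypothetical.

```python
def percentile_rank(score, scores):
    """Percentage of scores in the group that fall below the given score."""
    below = sum(1 for s in scores if s < score)
    return 100 * below / len(scores)

def equipercentile_equivalent(score_a, scores_a, scores_b):
    """Test B score whose percentile rank is closest to score_a's rank on Test A."""
    target = percentile_rank(score_a, scores_a)
    candidates = sorted(set(scores_b))
    return min(candidates, key=lambda s: abs(percentile_rank(s, scores_b) - target))

# Hypothetical score distributions on two tests of the same ability.
scores_a = [10, 12, 14, 16, 18, 20, 22, 24, 26, 28]
scores_b = [40, 45, 50, 55, 60, 65, 70, 75, 80, 85]
print(equipercentile_equivalent(20, scores_a, scores_b))  # 65
```

A score of 20 on Test A and a score of 65 on Test B both have half of their respective groups below them, so the two are treated as equivalent.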

It refers to the normative data that are broken down by specific criteria or characteristics (e.g., age, education level, socioeconomic status, geographic region) to provide more detailed and focused comparisons for different subgroups within a larger sample.

Subgroup Norms

These are normative data developed specifically for a local population (e.g., a school, company, or community), based on their unique characteristics or performance, as opposed to relying on national norms. These are particularly useful when the national norms do not accurately reflect the local context.

Local Norms

It is a method of interpreting test scores by comparing them to the performance of a specific, unchanging group (the fixed reference group), which serves as a standardized benchmark for scoring.

Fixed Reference Group Scoring System

It is a standard on which a judgment or decision may be based.

Criterion

It may be defined as a method of evaluation and a way of deriving meaning from test scores by evaluating an individual’s score with reference to a set standard. It has also been referred to as domain- or content-referenced testing and assessment.

Criterion-referenced Testing and Assessment
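
To contrast the two approaches, the sketch below interprets the same raw score twice: once against a set standard, once against a group. The cut score of 80, the norm group, and the function names are all assumptions made for illustration.

```python
def criterion_referenced(score, cut_score):
    """Interpret a score against a fixed standard, ignoring other test takers."""
    return "meets standard" if score >= cut_score else "does not meet standard"

def norm_referenced(score, norm_scores):
    """Interpret a score as the percentage of the norm group scoring below it."""
    below = sum(1 for s in norm_scores if s < score)
    return 100 * below / len(norm_scores)

score = 72
norm_group = [50, 55, 60, 65, 70, 75, 80, 85, 90, 95]  # hypothetical
print(criterion_referenced(score, cut_score=80))  # does not meet standard
print(norm_referenced(score, norm_group))         # 50.0
```

The same score of 72 is average relative to this group yet falls short of the standard, which is why the two kinds of interpretation answer different questions.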

The "Do's" of Culturally Informed Assessment

1. Be aware of the cultural assumptions on which a test is based
2. Consider consulting with members of particular cultural communities regarding the appropriateness of particular assessment techniques, tests, or test items
3. Strive to incorporate assessment methods that complement the worldview and lifestyle of assessees who come from a specific cultural and linguistic population
4. Be knowledgeable about the many alternative tests or measurement procedures that may be used to fulfill the assessment objectives
5. Be aware of equivalence issues across cultures, including equivalence of language used and the constructs measured
6. Score, interpret, and analyze assessment data in its cultural context with due consideration of cultural hypotheses as possible explanations for findings

The "Don'ts" of Culturally Informed Assessment

1. Take for granted that a test is based on assumptions that impact all groups in much the same way
2. Take for granted that members of all cultural communities will automatically deem particular techniques, tests, or test items appropriate for use
3. Take a “one-size-fits-all” view of assessment when it comes to evaluation of persons from various cultural and linguistic populations
4. Select tests or other tools of assessment with little or no regard for the extent to which such tools are appropriate for use with a particular assessee
5. Simply assume that a test that has been translated into another language is automatically equivalent in every way to the original
6. Score, interpret, and analyze assessment in a cultural vacuum

(43 cards)