Midterm Flashcards
reliability
Refers to the consistency, accuracy, or stability of test scores
May be affected by the time that the test is administered, the items included on the test, external distractions, internal distractions, person grading the test, etc.
testing vs. assessment
testing: one single test is administered, less complex, requires less training
assessment: multiple tests are administered and other sources of information are collected, can be used to make diagnoses or recommendations
psychological test
A measurement device or technique used to quantify behaviour or aid in the understanding and prediction of behaviour
Designed to measure human characteristics that pertain to behaviours
personality tests
Look at overt and covert dispositions of an individual
Structured personality tests:
- Require a person to endorse or reject statements about themselves
- Typically self-report
projective tests
The examinee is typically shown ambiguous images and asked what they see; these tests are generally unreliable
Reactions/responses to ambiguous stimuli are noted and interpreted
Assumption that responses reflect individual characteristics
achievement tests
Assess prior learning
Measures intelligence but also education
Testing for learning disabilities
aptitude tests
Evaluates one’s potential for learning
intelligence tests
Tests:
- ability to solve problems
- potential to adapt to changing situations
- think abstractly
- profit from experiences
purposes of assessment
1) Screening: Brief evaluation given to identify clients who
◦ Are eligible for certain programs
◦ May have a disorder or disability in need of remediation or rehabilitation
◦ May need a more comprehensive assessment
2) Focused/problem solving: Detailed evaluation of specific area of functioning
◦ May address diagnostic question or clarify a referral question, e.g., Does the client have a memory deficit?
◦ May address a skill question, e.g., Does the child exhibit poor social skills?
3) Diagnosis: Detailed evaluation of client’s strengths and weaknesses in several areas, such as cognitive, academic, language, and social functioning
Involves:
◦ Diagnosis
◦ Making suggestions for placement and intervention
4) Counselling and rehabilitation: Evaluation of client’s abilities to adjust to and successfully fulfill daily responsibilities
Possible responses to treatment and potential for recovery are also considered
5) Progress evaluation (outcome monitoring): Evaluation of the day-to-day, week-to-week, month-to-month, or year-to-year progress of the client
Used to evaluate changes in the client’s functioning and skills and to evaluate the effectiveness of intervention procedures
multi-source, multi-dimension, multi-method assessment
Sources: child, parent, teachers, records, other family members, etc.
Methods: informal assessment procedures, observations, interviews, norm-referenced tests, etc.
Dimensions: intelligence, memory, achievement, oral language, adaptive behaviour, etc.
These are all triangulated to form results, which lead to clinical impressions, which lead to recommendations.
principles of psychological testing
1) Reliability: Accuracy, dependability, consistency, or repeatability of test results
2) Validity: Meaning and usefulness of test results (How appropriate are specific interpretations or inferences of test results)
3) Test administration: How a test is given to test takers
historical perspective (early antecedents)
Civil service testing may have been formalized as early as 4000 years ago in China
Test batteries were in use during the Han Dynasty (206 B.C.E.–220 C.E.) (multiple tests used to assess the same issue)
Introduced to the Western world via the English East India Company in the early 1800s, as civil service testing procedures mirrored the early Chinese systems
Charles Darwin and Individual Differences
The Origin of Species (1859)
- Evolutionary model arguing that different species develop traits that are adaptive for their survival
Applied to human beings by Sir Francis Galton (a relative of Darwin)
- Hereditary Genius (1869)
- Argued that some people have traits that make them more fit than others
Galton’s work was extended by James McKeen Cattell, leading to the development of modern tests
factor analysis
A technique for reducing many variables to a smaller set of factors (seeks the minimum number of factors, or dimensions, that can be used to describe a data set)
Charles Spearman provided conceptual foundation
Enabled advancement of testing, trait theory
Mathematically allows us to say which items are correlated and which are not
E.g., the Big 5 personality traits (many smaller factors like fearlessness and depressivity can be associated with the larger trait of Neuroticism)
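The core intuition behind factor analysis, that items driven by the same underlying factor correlate with each other while unrelated variables do not, can be illustrated with a small simulation. This is a sketch only: the factor and item names are hypothetical, and real factor analysis extracts factors from a full correlation matrix rather than inspecting pairs.

```python
import random
from statistics import mean, stdev

def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))

random.seed(0)
n = 500
# A latent Neuroticism factor score for each simulated respondent.
neuroticism = [random.gauss(0, 1) for _ in range(n)]
# Two items that load on that factor (factor score plus item-specific noise)...
depressivity = [f + random.gauss(0, 0.5) for f in neuroticism]
anxiety_item = [f + random.gauss(0, 0.5) for f in neuroticism]
# ...and one variable unrelated to the factor.
unrelated = [random.gauss(0, 1) for _ in range(n)]

print(pearson(depressivity, anxiety_item))  # high: shared underlying factor
print(pearson(depressivity, unrelated))     # near zero: no shared source
```

Items that share a factor correlate strongly; factor analysis works backwards from such correlations to recover the minimum number of underlying dimensions.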
empirical criterion coding
Selecting or keying test items by their demonstrated ability to predict a criterion, even when the item content does not appear related to that criterion at face value
Created using the Carnegie Interest Inventory (1921), whose empirical key differentiated the responses of successful and unsuccessful salesmen
nominal scales
Serves only to name or categorize objects
Often assigns an arbitrary number to a given object
(1=male, 2=female, etc.)
ordinal scales
Ranks objects but the difference between ranks has no meaning
Most psychological tests fall here
E.g., level of education, Likert scales, etc.
interval scales
Has magnitude and equal intervals, but no absolute zero
E.g., temperature
ratio scales
Properties of the interval scale, but does have an absolute zero
E.g., weight in lbs
choosing between mean, median, and mode
Mean: essential when calculating many useful statistics
Median: often preferred with skewed distributions
Mode: Useful with nominal level data
parameters vs. statistics
Parameters: Used when studying populations
Statistics: Used when studying samples (more common than parameters)
mean, median, and mode in skewed distributions
Positively skewed (tail pulled right): mode → median → mean, so the mode sits at the highest part of the distribution and the mean is pulled toward the tail
Negatively skewed (tail pulled left): mean → median → mode, so the mode is still at the highest part of the distribution and the mean is lowest
Normal distribution: mean, median, and mode are all the same, perfectly in centre of distribution
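The ordering above can be checked with a small positively skewed sample (the numbers are made up for illustration), using Python's standard `statistics` module:

```python
from statistics import mean, median, mode

# A small positively skewed sample: most values cluster low,
# one long tail value pulls to the right.
incomes = [1, 2, 2, 2, 3, 3, 4, 5, 9, 20]

print(mode(incomes))    # 2   (the peak of the distribution)
print(median(incomes))  # 3.0
print(mean(incomes))    # 5.1 (pulled toward the tail)
```

The mode < median < mean ordering is exactly why the median is often preferred for skewed data: one extreme value drags the mean but barely moves the median.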
ceiling effect
Majority of values obtained for a variable approach the upper limit of the scale used in its measurement
E.g. a test whose items are too easy
E.g. administering a binge eating questionnaire and a measure of clinical impairment to a sample of people seeking treatment for binge eating: severity of binge eating isn’t correlated with level of impairment, but it would be in a community (i.e. non-clinical) sample
floor effect
Most values approach lower limit of scale
E.g. assessing pubertal development at age 5
E.g. a test that is too difficult
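A ceiling effect can be made concrete with a hypothetical simulation of a 10-item test that is far too easy for the group tested (a floor effect is the mirror image, with a test that is too hard):

```python
import random

random.seed(2)
# Simulate a 10-item test that is much too easy for this group:
# each examinee's ability (a z-score) maps onto a raw score near the
# top of the scale, so scores pile up at the maximum.
abilities = [random.gauss(0, 1) for _ in range(1000)]
scores = [max(0, min(10, round(10 + a))) for a in abilities]

at_ceiling = sum(s == 10 for s in scores) / len(scores)
print(at_ceiling)  # the majority of examinees hit the maximum score
```

Because most scores are clipped at 10, the test cannot distinguish among the higher-ability examinees, which is precisely what makes correlations with other measures break down in such samples.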
norm-referenced score interpretations
The examinee’s performance is compared to that of other people (e.g., comparing a nurse’s test score to those of other nurses)
Norm-referenced interpretations are relative
Relative to the performance of others
Most of the tests psychologists use are norm-referenced
In some cases, there are norms for certain populations, e.g., impulsivity in people with ADHD
criterion-referenced score interpretations
The examinee’s performance is compared to a specified level of performance
Criterion-referenced interpretations are absolute
Compared to an absolute standard
Criterion-referenced interpretations are often used in educational settings (e.g., on the EPPP to become a licensed psychologist or a driver’s written test)
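The two interpretations can be contrasted on the same score. The scores and passing cutoff below are hypothetical, chosen only to show that the same performance can look different in absolute and relative terms:

```python
# Hypothetical written-test scores for a group of ten examinees.
group_scores = [52, 61, 64, 70, 73, 75, 78, 81, 85, 93]
examinee_score = 73

# Criterion-referenced: compare to an absolute standard (e.g., 70 to pass).
PASSING_SCORE = 70
passed = examinee_score >= PASSING_SCORE

# Norm-referenced: compare to the performance of others
# (percentile rank = percentage of group scores falling below this one).
percentile = 100 * sum(s < examinee_score for s in group_scores) / len(group_scores)

print(passed)      # True: meets the absolute standard
print(percentile)  # 40.0: yet below average relative to the group
```

The same score of 73 "passes" under the criterion-referenced reading but sits in the lower half of the group under the norm-referenced one.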
classical test theory (CTT)
Examines what proportion of score variance is error variance and what proportion reflects what we are aiming to measure
CTT is the most influential theory to help us understand measurement issues.
Initiated by Charles Spearman in the early 1900s and was expanded by a number of measurement experts
Holds that every score has two components:
- True score that reflects the examinee’s true skills, abilities, knowledge, etc.
- Error score
Xi = T + E
- Xi = Obtained or observed score
- T = True score
- E = Random measurement error
Random measurement error varies from:
- person to person
- test to test
- administration to administration
CTT allows us to estimate the reliability of test scores
content sampling error
Results from differences between the sample of items on the test, and the total domain of items (i.e., all possible items)
E.g., only asking about binge eating when doing a test on eating disorders
If the items on the test are a good sample of the domain, content sampling error will be small.
Content sampling is typically considered the largest source of measurement error.
time sampling error
Reflects random fluctuations in performance over time
Includes changes in:
- the examinee (e.g., fatigue, illness, anxiety)
- the environment (e.g., distractions, temperature)
The consistency of performance across time is also referred to as temporal stability.
E.g., low temporal stability: anxiety over the school term
inter-rater differences
When scoring is subjective, inter-rater differences can introduce error (bias that the person grading the test brings to the table)
Errors in administration
Clerical errors
reliability coefficients
σ²X = σ²T + σ²E
σ²X = Observed score variance
σ²T = True score variance
σ²E = Error score variance