Midterm Flashcards

1
Q

reliability

A

Refers to the consistency, accuracy, or stability of test scores

May be affected by the time that the test is administered, the items included on the test, external distractions, internal distractions, the person grading the test, etc.

2
Q

testing vs. assessment

A

testing: one single test is administered, less complex, requires less training

assessment: multiple tests are administered and other sources of information are collected, can be used to make diagnoses or recommendations

3
Q

psychological test

A

A measurement device or technique used to quantify behaviour or aid in the understanding and prediction of behaviour

Designed to measure human characteristics that pertain to behaviours

4
Q

personality tests

A

Look at overt and covert dispositions of an individual

Structured personality tests:
- Require a person to endorse or reject statements about themselves
- Typically self-report

5
Q

projective tests

A

Test takers are typically shown ambiguous images and asked what they see; these tests are generally unreliable

Reactions/responses to ambiguous stimuli are noted and interpreted

Assumption that responses reflect individual characteristics

6
Q

achievement tests

A

Assess prior learning

Partly reflects intelligence, but also education

Testing for learning disabilities

7
Q

aptitude tests

A

Evaluates one’s potential for learning

8
Q

intelligence tests

A

Tests:
- ability to solve problems
- potential to adapt to changing situations
- think abstractly
- profit from experiences

9
Q

purposes of assessment

A

1) Screening: Brief evaluation given to identify clients who
◦ Are eligible for certain programs
◦ May have a disorder or disability in need of remediation or rehabilitation
◦ May need a more comprehensive assessment

2) Focused/problem solving: Detailed evaluation of specific area of functioning
◦ May address diagnostic question or clarify a referral question, e.g., Does the client have a memory deficit?
◦ May address a skill question, e.g., Does the child exhibit poor social skills?

3) Diagnosis: Detailed evaluation of client’s strengths and weaknesses in several areas, such as cognitive, academic, language, and social functioning
Involves:
◦ Diagnosis
◦ Making suggestions for placement and intervention

4) Counselling and rehabilitation: Evaluation of client’s abilities to adjust to and successfully fulfill daily responsibilities

Possible responses to treatment and potential for recovery also considered

5) Progress evaluation (outcome monitoring): Evaluation of the day-to-day, week-to-week, month-to-month, or year-to-year progress of the client
Used to evaluate changes in the client’s functioning and skills and to evaluate the effectiveness of intervention procedures

10
Q

multi-source, multi-dimension, multi-method assessment

A

Sources: child, parent, teachers, records, other family members, etc.

Methods: informal assessment procedures, observations, interviews, norm-referenced tests, etc.

Dimensions: intelligence, memory, achievement, oral language, adaptive behaviour, etc.

These are all triangulated to form results, which lead to clinical impressions, which lead to recommendations.

11
Q

principles of psychological testing

A

1) Reliability: Accuracy, dependability, consistency, or repeatability of test results

2) Validity: Meaning and usefulness of test results (How appropriate are specific interpretations or inferences of test results)

3) Test administration: How a test is given to test takers

12
Q

historical perspective (early antecedents)

A

Civil service testing may have been formalized as early as 4000 years ago in China

Test batteries were in use during the Han Dynasty (206 B.C.E.–220 C.E.) (multiple tests used to assess the same issue)

Introduced to the Western world via the English East India Company in the early 1800s, as civil service testing procedures mirrored the early Chinese systems

13
Q

Charles Darwin and Individual Differences

A

The Origin of Species (1859)
- Evolutionary model arguing that different species develop traits that are adaptive for their survival

Applied to human beings by Sir Francis Galton (a relative of Darwin)
- Hereditary Genius (1869)
- Argued that some people have traits that make them more fit than others

Galton’s work was extended by James McKeen Cattell, leading to the development of modern tests

14
Q

factor analysis

A

A technique for reducing many variables to a smaller set of factors (seeks the minimum number of factors, or dimensions, that can be used to describe a data set)

Charles Spearman provided conceptual foundation

Enabled advancement of testing, trait theory

Mathematically allows us to say which items are correlated and which are not

E.g., the Big 5 personality traits (many smaller factors like fearlessness and depressivity can be associated with the larger trait of Neuroticism)

15
Q

empirical criterion coding

A

Selecting test items based on how well they differentiate criterion groups, and using them to predict group membership even if that is not what the items measure at face value

First used in the Carnegie Interest Inventory (1921): an empirical key that differentiated the responses of successful and unsuccessful salesmen

16
Q

nominal scales

A

The only purpose is naming objects

Often assigns an arbitrary number to a given object
(1=male, 2=female, etc.)

17
Q

ordinal scales

A

Ranks objects but the difference between ranks has no meaning

Most psychological tests fall here

E.g., level of education, Likert scales, etc.

18
Q

interval scales

A

Has magnitude and equal intervals, but no absolute zero

E.g., temperature

19
Q

ratio scales

A

Has the properties of an interval scale, plus an absolute zero

E.g., weight in lbs

20
Q

choosing between mean, median, and mode

A

Mean: essential when calculating many useful statistics

Median: often preferred with skewed distributions

Mode: Useful with nominal level data

21
Q

parameters vs. statistics

A

Parameters: Used when studying populations

Statistics: Used when studying samples (more common than parameters)

22
Q

mean, median, and mode in skewed distributions

A

Positively skewed (tail pulled right): mode → median → mean, so the mode is at the highest part of the distribution

Negatively skewed (tail pulled left): mean → median → mode, so the mode is still at the highest part of the distribution and the mean at the lowest

Normal distribution: mean, median, and mode are all the same, perfectly in centre of distribution

23
Q

ceiling effect

A

Majority of values obtained for a variable approach the upper limit of the scale used in its measurement

E.g. a test whose items are too easy

E.g. administering a binge eating questionnaire and a measure of clinical impairment to a sample of people seeking treatment for binge eating → severity of binge eating isn’t correlated with level of impairment…but it would be in a community (i.e. non-clinical) sample

24
Q

floor effect

A

Most values approach lower limit of scale

E.g. assessing pubertal development at age 5

E.g. a test that is too difficult

25
Q

norm-referenced score interpretations

A

The examinee’s performance is compared to that of other people (e.g., comparing a nurse’s test score to those of other nurses)

Norm-referenced interpretations are relative

Relative to the performance of others

Most of the tests psychologists use are norm-referenced

In some cases, there are norms for certain populations, e.g., impulsivity in people with ADHD

26
Q

criterion-referenced score interpretations

A

The examinee’s performance is compared to a specified level of performance

Criterion-referenced interpretations are absolute

Compared to an absolute standard

Criterion-referenced interpretations are often used in educational settings (e.g., on the EPPP to become a licensed psychologist or a driver’s written test)

27
Q

classical test theory (CTT)

A

Partitions test score variance into the portion that is error variance and the portion that reflects what we are aiming to measure

CTT is the most influential theory to help us understand measurement issues.

Initiated by Charles Spearman in the early 1900s and was expanded by a number of measurement experts

Holds that every score has two components:
- True score that reflects the examinee’s true skills, abilities, knowledge, etc.
- Error score

X_i = T + E
- X_i = Obtained or observed score
- T = True score
- E = Random measurement error

Random measurement error varies from:
- person to person
- test to test
- administration to administration

CTT allows us to estimate the reliability of test scores
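
A minimal Python sketch of this model (illustrative, not from the course; all numbers are made up): simulate true scores plus random error, then recover reliability as the ratio of true score variance to observed score variance.

import numpy as np

rng = np.random.default_rng(0)
n = 10_000                     # simulated examinees
true = rng.normal(100, 15, n)  # T: true scores (hypothetical mean/SD)
error = rng.normal(0, 5, n)    # E: random measurement error
observed = true + error        # X_i = T + E

# reliability = true score variance / observed score variance
print(round(true.var() / observed.var(), 3))  # ~0.90 = 15**2 / (15**2 + 5**2)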

28
Q

content sampling error

A

Results from differences between the sample of items on the test and the total domain of items (i.e., all possible items)

E.g., only asking about binge eating when doing a test on eating disorders

If the items on the test are a good sample of the domain, content sampling error will be small.

Content sampling is typically considered the largest source of measurement error.

29
Q

time sampling error

A

Reflects random fluctuations in performance over time

Includes changes in:
- the examinee (e.g., fatigue, illness, anxiety)
- the environment (e.g., distractions, temperature)

Also referred to as temporal stability.

E.g., low temporal stability: anxiety over the school term

30
Q

inter-rater differences

A

When scoring is subjective, inter-rater differences can introduce error (bias that the person grading the test brings to the table)

Errors in administration

Clerical errors

31
Q

reliability coefficients

A

σ²_X = σ²_T + σ²_E

σ²_X = Observed score variance
σ²_T = True score variance
σ²_E = Error score variance

The reliability coefficient is the proportion of observed score variance that is true score variance: reliability = σ²_T / σ²_X

32
Q

test-retest reliability

A

A type of reliability coefficient

Administer the same test on two occasions

Correlate the scores from both administrations

Primarily reflects time sampling error

Important to consider length of interval between testing (e.g., testing at age two and again at age five will make a big difference)

Carry-over effects are a major limitation (they will probably do better the second time they do the test because they learned from the first one)

33
Q

alternate-form reliability

A

A type of reliability coefficient

Requires two equivalent or parallel forms

Correlate the scores of the different forms

Can be administered:
- Simultaneously: primarily sensitive to content sampling error
- Delayed: sensitive to both content and time sampling error

Alternate-form reliability may reduce, but typically not eliminate, carry-over effects.

Few tests have alternate forms.

34
Q

split-half reliability

A

A type of reliability coefficient

Administer the test, then divide it into two equivalent halves

Correlate the scores for the half tests

Typically an odd/even split

Spearman-Brown correction formula is used to estimate full-length-test reliability from the half-test correlation (a half-length test is less reliable, so the raw correlation underestimates it)

Primarily reflects content-sampling error

Can be calculated from one test administration
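
A hedged Python sketch of the procedure (hypothetical data; scores is an examinee-by-item array): correlate odd and even halves, then apply the Spearman-Brown correction, r_full = 2 * r_half / (1 + r_half).

import numpy as np

def split_half_reliability(scores):
    """scores: examinees x items array (hypothetical)."""
    odd = scores[:, 0::2].sum(axis=1)   # items 1, 3, 5, ...
    even = scores[:, 1::2].sum(axis=1)  # items 2, 4, 6, ...
    r_half = np.corrcoef(odd, even)[0, 1]
    return 2 * r_half / (1 + r_half)    # Spearman-Brown correction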

35
Q

coefficient alpha (Cronbach’s Alpha)

A

A type of reliability coefficient

Examines the consistency of responding to all items

Sensitive to content-sampling error

Also sensitive to item heterogeneity (best used when there is only one construct)

Coefficient alpha is broadly applicable

Can be calculated from one test administration

Just because a high Cronbach’s Alpha value is obtained for a test, does not mean that test is “reliable”!

The wider a construct is, the lower Alpha will be

The more items you have, the higher Alpha will be (redundant questions)

Indicates nothing about validity
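
A minimal sketch of the standard formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), assuming a hypothetical examinee-by-item array:

import numpy as np

def cronbach_alpha(scores):
    """scores: examinees x items array (hypothetical)."""
    k = scores.shape[1]                         # number of items
    item_vars = scores.var(axis=0, ddof=1)      # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)  # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)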

36
Q

composite scores

A

When scores are combined to form a composite

For example, IQs are typically composite scores

The reliability of composite scores is typically better than that of the individual scores making up the composite, because multiple scores are combined

37
Q

validity

A

1) You should never accept a test’s name as an index of what it measures.

2) Validity is not a yes/no decision.

38
Q

content validity

A

Determine whether a test forms a representative sample of the defined content domain.

A clear definition of the content domain is necessary

How? Use experts; systematic judgment.

39
Q

face validity

A

Whether the test appears (at face value) to measure what it claims to.

The least sophisticated measure of validity.

Tests wherein the purpose is clear, even to naïve respondents, are said to have high face validity

Example: the Cat Adoration Test (CAT)
- High face validity: “I like cats”
- Low face validity: “I would rather curl up at home with a book than go for a walk in the park” (doesn’t ask directly about whether you like cats)

40
Q

criterion validity

A

Criterion validity involves the empirical relationship between the measure and criterion variables that are relevant to the attribute being measured.

Usually
- correlation coefficients or
- differences between groups (Cohen’s d)

Two aspects:
- Convergent validity: measures of constructs that theoretically should be related to each other are, in fact, observed to be related to each other. Includes:
  - Predictive validity: the criterion is measured later in time
  - Concurrent validity: the criterion is measured at the same time as the test
- Discriminant validity: measures of constructs that theoretically should not be related to each other are, in fact, observed to not be related to each other

E.g., Imagine you are developing a body dissatisfaction questionnaire.
Convergent validity would be indicated by correlations with measures of related constructs (e.g., anxiety)
Discriminant validity would be indicated by weak or no correlations with measures of unrelated constructs (e.g., introversion/extraversion)

41
Q

multi-trait-multi-method matrix

A

Campbell & Fiske, 1959

Collect data on 2+ distinct traits/constructs, by 2+ methods (e.g., self-report, behavioural observation, informant report)

Compute intercorrelations among resultant scores

Compile intercorrelations into a matrix (i.e., a special kind of table)

Reliability diagonal: same trait, same method; highly correlated; demonstrates test-retest reliability

Validity diagonals: same trait, different method; expected to be correlated; indicates convergent validity

Heterotrait-heteromethod triangles: different trait, different method; expected to show low correlations

Heterotrait-monomethod triangles: different trait, same method; not highly correlated, but more correlated than heterotrait-heteromethod (shared method variance)
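
One way to see the matrix concretely: a Python sketch with entirely hypothetical traits, methods, and scores, computing all pairwise correlations; the same-trait/different-method cells are the validity diagonals.

import pandas as pd

# Hypothetical: 2 traits (anxiety, impulsivity) x 2 methods
# (self-report, informant report) on the same five examinees
df = pd.DataFrame({
    "anxiety_self":      [10, 14, 9, 16, 12],
    "anxiety_informant": [11, 13, 8, 15, 12],
    "impuls_self":       [5, 7, 6, 4, 8],
    "impuls_informant":  [6, 7, 5, 4, 9],
})
print(df.corr().round(2))  # compile intercorrelations into a matrix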

42
Q

advantages and disadvantages of the MTMM approach

A

Advantages
* Allows examining convergent and discriminant validity simultaneously
* Stresses importance of unbiased measurement
* Stresses the characteristics that a good test should have

Disadvantages
* Often not feasible (too complex)
* Judgmental nature: Does not quantify the degree of construct validity in one single coefficient

43
Q

construct validity

A

Refers to overall evidence that the test measures the variable that it is intended to measure, rather than anything else.

Constructs: hypothetical abstractions (e.g., “food addiction,” “anxiety,” “perfectionism”)

Construct validity is an umbrella term encompassing all the evidence that determines whether a test measures a theoretical construct that the test purports to measure.

The following can all be used to support construct validity of a measure:
- Content validity evidence
- Criterion validity evidence (convergent/discriminant)
- Experimental manipulation
- Contrasted groups
- Factor Analysis (we will learn about this later)
- Multi-trait and multi-method (MTMM) assessment approaches
- Etc.

44
Q

test item types

A

Selected-response items: closed-ended (test-taker selects an option among possible responses)

Dichotomous: 2 choices for each question, e.g., True/False, Yes/No

Polytomous: More than 2 choices, e.g., Multiple choice, Likert scales, Visual Analog Scales

Constructed-response items: free response, e.g., essay questions, interviews, projective tests

45
Q

common problems in multiple-choice item writing

A

Unfocused stem: The stem should include the information necessary to answer the question. Test takers should not need to read the options to figure out what question is being asked.

Negative stem: Whenever possible, the stem should exclude negative terms such as not and except

Window dressing: Information in the stem that is irrelevant to the question or concept being assessed is considered “window dressing” and should be avoided.

Unequal option length: The correct answer and the distractors should be about the same length.

Negative options: Whenever possible, response options should exclude negatives such as “not”

Clues to the correct answer: Test writers sometimes inadvertently provide clues by using vague terms such as might, may, and can. Particularly in the social sciences where certainty is rare, vague terms may signal that the option is correct.

Heterogeneous options: The correct option and all of the distractors should be in the same general category.

46
Q

the Likert format

A

Offers a continuum of responses that allow for measurements of attitudes on various topics

Is open to factor analysis, and groups of items that go together can be identified

E.g., Strongly disagree, somewhat disagree, neither agree nor disagree, somewhat agree, strongly agree (five-choice format with a neutral point)

47
Q

qualitative item analysis

A

Set the test aside and review a few days later

Have a colleague review the test

Have examinees provide feedback after taking the test

Recommended to use both quantitative and qualitative approaches

48
Q

quantitative item analysis: percent endorsement statistic

A

Indicates the percentage of examinees that responded to an item in a given manner

Is no-one selecting one of your response options? Why not?

In an ability test, this indicates the response option is not an effective distractor

In a “typical response test” (e.g., personality, mental health), it could indicate a problem with the test, but not necessarily…

Some items assess clinical phenomena for which you would expect endorsement to be low in community samples (e.g., few people report self-induced vomiting)

You could also be asking about something that is incredibly rare
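
A tiny sketch of the statistic itself (hypothetical responses to one multiple-choice item):

from collections import Counter

responses = ["A", "C", "C", "B", "C", "A", "C", "C"]  # hypothetical data
counts = Counter(responses)
for option in "ABCD":
    pct = 100 * counts.get(option, 0) / len(responses)
    print(f"{option}: {pct:.1f}% endorsement")  # D at 0% -> ineffective distractor?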

49
Q

quantitative item analysis: item discrimination

A

Discrimination refers to how well an item differentiates among test takers who differ on the construct being measured.

Applies primarily to items in tests of ability (i.e., when there is a right and wrong answer)

The difference in the proportion of test takers in the upper vs. lower criterion groups who answer a given item correctly

Commonly: select the top and bottom 27% of test takers in terms of their overall performance on the test, and exclude the middle 46%

In other words: did the top 27% of my class tend to get this question right? If they didn’t, it’s probably a poor question.

50
Q

index of discrimination (D) calculation and interpretation

A

Compute the proportion (p) of individuals within the top and bottom groups who passed the item

The difference between these proportions is the discrimination index, designated as D

Formula:
D = (proportion of upper group who got it right) − (proportion of bottom group who got it right)

The discrimination index can range from -1.0 to 1.0, or -100 to 100 (can be expressed as a decimal or percentage)
- Negative values indicate that the “worst” of the group are more likely to get the item right, so the item should be discarded
- The higher the value, the better the item discriminates between the upper and lower groups
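
A Python sketch of the computation under the common 27% rule (hypothetical arrays: total holds overall test scores, item_correct holds 1/0 for one item):

import numpy as np

def discrimination_index(total, item_correct):
    cut = round(0.27 * len(total))           # size of each criterion group
    order = np.argsort(total)
    bottom, top = order[:cut], order[-cut:]  # bottom and top 27%
    return item_correct[top].mean() - item_correct[bottom].mean()  # D in [-1, 1]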

51
Q

item-total correlation coefficients

A

Item discrimination can also be examined by correlating item scores with the total test score.

Item-total correlations are usually calculated using the point-biserial correlation.

Large item-total correlation suggests that an item is measuring the same construct as the overall test measures.
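
A sketch of a corrected item-total correlation (the item is excluded from the total so it is not correlated with itself); for a dichotomous item, this Pearson correlation is the point-biserial. scores is a hypothetical examinee-by-item array.

import numpy as np

def item_total_correlation(scores, item):
    rest = np.delete(scores, item, axis=1).sum(axis=1)  # total without the item
    return np.corrcoef(scores[:, item], rest)[0, 1]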

52
Q

item response theory

A

Theory of mental measurement that holds that the responses to items are accounted for by latent traits

Latent trait: an ability or characteristic that is inferred based on theories of behavior, as well as empirical evidence, but cannot be assessed directly.

Central to IRT is a complex mathematical model that describes how examinees at different levels of ability will respond to individual test items.

53
Q

Rasch IRT model

A

“Simplest” model

Named after the Danish mathematician and statistician Georg Rasch

Referred to as a one-parameter IRT model

Assumes that items differ in only one parameter: difficulty (the b parameter)

54
Q

two-parameter IRT model

A

Assumes that items differ in both difficulty and discrimination

The ICCs differ not only in their inflection points, but also in their slopes.

Better reflects real-life test development applications than the one-parameter IRT model

55
Q

item characteristic curves (ICC)

A

A graph with ability reflected on the horizontal axis and the probability of a correct response reflected on the vertical axis

Each item has its own specific ICC

ICCs incorporate information about the item’s difficulty and discrimination ability.

56
Q

ICC interpretation

A

The point halfway between the lower and upper asymptotes is referred to as the inflection point.

Represents the difficulty of the item

Discrimination is reflected by the slope of the ICC at the inflection point.

ICCs with steeper slopes demonstrate better discrimination than those with gentler slopes.

Shifting an ICC to the right makes the item more difficult.

57
Q

three-parameter IRT model

A

One- and two-parameter IRT models assume a zero percent possibility of answering the items correctly by chance; they assume that items vary only in their difficulty and discrimination

In their ICCs, this is indicated by a lower asymptote approaching zero (no guessability)

The three-parameter model assumes that even if the respondent essentially has no “ability,” there is still a chance he or she may answer the item correctly simply by chance (the guessing, or c, parameter)
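
A sketch of the logistic item response function covering all three models (parameter names follow the usual a/b/c convention; the example values are made up): with a=1 and c=0 it reduces to the Rasch (one-parameter) case, varying a gives the two-parameter model, and c > 0 adds the three-parameter guessing floor.

import numpy as np

def p_correct(theta, a=1.0, b=0.0, c=0.0):
    """P(correct | ability theta): a = discrimination (slope at the
    inflection point), b = difficulty (location), c = lower asymptote."""
    return c + (1 - c) / (1 + np.exp(-a * (theta - b)))

# e.g., a hard item (b = 1.5) with a 25% guessing floor (4-option MC)
print(round(p_correct(0.0, a=1.2, b=1.5, c=0.25), 2))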

58
Q

ethics

A

What one should or should not do, according to principles or norms of conduct

Encompasses commonly endorsed values of professional psychology, in the service of protecting the public and the profession

Ethics codes are not produced by democratically-elected legislatures

Enforcement mechanisms: usually informal, may be complex (e.g., Jordan Peterson being ordered by his regulatory college to complete social media training over his online statements)

59
Q

qualifications for ethical use of tests

A

Level A tests: Paper-and-pencil measures and interest tests with simple interpretation. Requires advanced courses in testing from an accredited college/university, or equivalent training by a qualified supervisor.

Level B tests: Most individual or group achievement or interest tests, screening inventories, and personnel tests. Must have adequate training in psychometric principles and supervised experience in administration, scoring, and interpretation of the test. Requires advanced courses in testing from an accredited college/university, or equivalent training by a qualified supervisor; certain tests may specify a Master’s degree in Psychology or Education.

Level C tests: Tests and aids that require advanced training and experience; generally, any aptitude, language, personality, or clinical diagnostic test, or tests used for teaching or decision-making purposes. Requires graduate-level training in the specific field to which the tests apply (e.g., school, clinical, or counselling psychology; minimum 2 university courses in tests and measurement). Minimum Master’s and/or PhD in education, psychology, or a related discipline; verification of licensure or registration; training and supervision in the instrument.

60
Q

informed consent

A

Before administering a test, psychologists must obtain informed consent from examinees/guardians

Includes an explanation of the nature and purpose of the assessment, fees, involvement of third parties, limits of confidentiality, and rights as test takers

Exceptions when:
1. Testing is mandated by law or governmental regulations (e.g., in court)
2. Informed consent is implied because testing is conducted as a routine educational, institutional or organizational activity
3. One purpose of the testing is to evaluate decisional capacity

Minors do not have the legal right to assent, consent, or object to a proposed psychoeducational assessment. They should still be fully informed about the nature and purpose of the testing.

61
Q

knowledge of results

A

Must disclose test results in understandable language

Access rights do not apply to a record that contains raw data from standardized psychological tests or assessments (tests are copyrighted, and release of raw data would compromise test security)

62
Q

ethical use of tests issues

A
  1. Competence
  2. Informed Consent
  3. Knowledge of Results
  4. Confidentiality
  5. Test security
  6. Conflicts of interest
63
Q

theoretical issues

A

Are you measuring a stable characteristic of the person being tested?

If so, differences in scores over time reflect measurement error or subject variables such as fatigue

Especially problematic for personality tests

64
Q

adequacy of tests

A

What should go into an assessment of test adequacy?

So far, in considering the relative merits of various tests, we have asked whether they are psychometrically adequate

We haven’t asked: is the best test available good enough?

Society uses that standard when testing becomes a legal issue

65
Q

actuarial vs. clinical judgement

A

Is it possible for us to make good judgments on a question when we cannot articulate the basis for the judgment?

Actuarial judgment occurs when we feed test scores into statistical formulas to diagnose a psychological condition or predict future performance.

In actuarial judgment, we cannot make predictions tailored to individuals: the conclusion will be the same for every person with a given set of test scores.

Clinical judgment occurs when we have a trained psychologist interpret test scores to diagnose a psychological condition or predict future performance.

In clinical judgment, the claim is that you can determine “what caused what” in an individual person’s life (Dawes, 1994), but clinical judgment does not improve with experience.

66
Q

stereotype threat

A

Two levels of threat on a test

  1. Anxiety over how one will be evaluated and how well he or she will perform
  2. For members of a stereotyped group, pressure to disconfirm negative stereotypes

Research finds many examples of how such expectations/stereotypes may impact test performance

Hypotheses:
- Stereotype threat depletes working memory
- Self-handicapping leads to reduced effort and, in turn, reduced performance
- A problem with this hypothesis is its “blame the victim” tone
- Stereotype threat causes physiological arousal that can disrupt performance

67
Q

dehumanization

A

Does computerized testing and analysis of test results create a danger of minimizing human uniqueness?

Humans are very complex – which allows us to be individuals, different from each other, but testing and interpretation generalize

68
Q

when tests harm

A

Though perhaps not what many intend, tests may assign limiting labels to individuals, discriminate against them, and interfere with personal growth (e.g., labelling someone as borderline)

Carol Dweck and mindsets:
- Fixed mindset
- Growth mindset

Do tests feed into a fixed mindset mentality?

Mindsets can be changed, but test results may hinder that effort

69
Q

access to psychological testing

A

What about possible benefits of tests? Who gets those benefits?

Benefits are not always equitably distributed (people who need them most can’t access them)

A WAIS-IV kit costs $1,423 for the tester to buy. That cost must be passed on to someone. Who should that be?

Comprehensive assessment: time and resource-intensive

Psychological services are not covered by Medicare in NB

70
Q

Principle of Respect for the Dignity of Persons and Peoples

A

I.45: Share confidential information with others only with the informed consent of those involved, or in a manner that the persons involved cannot be identified, except as required or justified by law, or in circumstances of actual or possible serious physical harm or death.

71
Q

Principle of Responsible Caring

A

II.13: Assess the individuals and groups (e.g., couples, families, organizations, communities, peoples) involved in their activities adequately enough to ensure that they will be able to discern what will benefit and not harm them, using assessment methods that are appropriate to the particular cultural and social contexts of the individuals and groups involved.

II.18: Strive to provide and/or obtain the best reasonably accessible service for those seeking psychological services. This may include, but is not limited to… selecting assessment tools… that are: (a) relevant and tailored to the needs, characteristics, and contexts of the primary client or contract examinee; and (b) based on the best available evidence in light of those needs, characteristics, and contexts.

II.20: Provide suitable information, unless declined or contraindicated, about the results of assessments, evaluations, or research findings to the individuals and groups (e.g., couples, families, organizations, communities, peoples) involved. This information would be communicated in ways that are developmentally, linguistically, and culturally appropriate, and that are meaningful and helpful.

72
Q

Principle of Responsibility to Society

A

IV.11: Protect the skills, knowledge, and interpretations of psychology from being misinterpreted, misused, used incompetently, or made useless (e.g., loss of security of assessment techniques) by others.

Less important than responsibility to individual client

73
Q

What are standardized achievement tests really measuring?

A

SAT scores reflect how many opportunities a student has been afforded

May mirror and maintain racial inequity

Determining the exact nature of why test differences exist, how to interpret those differences, and what to do about them is a complex endeavor

Test scores are a less accurate predictor of subsequent performance among Black, Hispanic, and Latino test-takers

74
Q

cross-cultural adaptation of questionnaires

A

Must consider several aspects of equivalence:
1. Conceptual equivalence: domains have the same relevance, meaning, and importance regarding the explored concept in both cultures.
2. Item equivalence: items are as relevant and acceptable in both cultures.
3. Semantic equivalence: the meaning of the items is the same in both cultures.
4. Operational equivalence: the questionnaire can be used in the same way by its target population in both cultures.
5. Measurement equivalence: no significant difference in psychometric properties (construct validity, reliability, responsiveness, and so forth) of the two versions.
6. Functional equivalence: a summary of the preceding five equivalences; both versions of the instrument “do what they are supposed to do equally well.”
75
Q

24-hour dietary recall

A
  • Usually done in context of bigger populations
  • Automated multiple-pass method (AMPM), go through the day multiple times
  • Quick list of everything that the person ate
  • Prompts for forgotten foods in predefined categories and asks the respondent to estimate quantities/portions
  • Ask for time and occasion, location
  • Does not assess long-term or unusual intake
  • Could include systematic errors (e.g., forgotten foods, inaccurate portion estimate, etc.)
76
Q

diet record

A
  • Self-reported, like a journal at time of eating
  • Sometimes food weighed or ingredients weighed
  • Recall bias not relevant, but there is self-monitoring bias
  • Places a large burden on the participant
  • Not practical for large population studies but this is still the most accurate approach
  • Does not assess long-term or unusual intake
77
Q

food frequency questionnaires

A
  • Food list combined with frequency response
  • Never, once a month or less, 2-3 times per month, once per week, 2-4 times per week, …
  • Easier way to get usual intake
  • Also cheaper than dietary recalls because there is no labour (no one has to ask the questions; just give participants the questionnaire)
  • Need to consider in advance what your research question is (are you looking at a particular diet/food/food group?)
  • Need to understand how participants understand questions (e.g., juice could be understood differently by different people)
  • Minimize day-to-day intake errors but increase errors in averaging long time intervals
78
Q

healthy eating index

A
  • Assess adequacy and moderation
  • 11 dietary components scored out of a possible total of 100
  • Overall measure of diet quality

Uses:
- Evaluating the nutritional health of the population
- Assessing trends over time
- Comparing population groups

Challenges: components are associated with other components

79
Q

Intra-Individual Variation in Energy Intake

A

Refers to the variability of energy intake (caloric intake) within an individual’s diet

For example: I consumed 2300 calories on Monday and then 1600 calories on Tuesday