Bias, Validity Scales, Methods of Constructing Measurement Instruments, Item Types and Response Scales, Translation / Cross-Cultural Adaptation Flashcards
Test bias
-The public sometimes has the impression that all assessment instruments are biased (e.g., by age, by sex/gender, by ethnic group, by clinical group, etc.).
-This is sometimes the case, and it is the duty of the test user to be aware of it.
Reminder:
Bias = systematic error; it is not random
One very important thing is not to confuse a difference in means between groups with bias
Test bias
Differences in means between certain groups are not a priori a bias since some are theoretically/conceptually expected
e.g., In adolescence, few or no differences in means between ethnic groups for behavior problems, but differences by sex/gender
e.g., In adulthood, presence of sex differences in some personality traits, but few or none in adolescence
Normative: compared with the general population
Test bias - an assessment instrument is biased
An assessment instrument is biased “if differences between members of different groups are identified on the basis of characteristics other than those the instrument purports to assess” (Merrell, 2008; Whitcomb, 2017)
In other words, bias is present for an instrument if the content, procedure, or use systematically favors or disfavors members of one group over another and if this differentiation is irrelevant to the purpose of the instrument
how is reliability affected directly
As we have seen, the reliability of scores on an assessment instrument can be compromised by various sources of measurement error
-We have also seen that the inferences and interpretations permitted with scores provided by an assessment instrument are dependent on the degree of validity of those scores.
-Validity can be affected directly by (a) response bias on individual items or (b) scale score bias
-The presence of bias is a critical issue for both test developers and test users
Response bias: heuristics or cognitive biases
People who are being assessed and asked questions, whether about themselves or as an informant for a third party, are always at risk of being partially biased
For example, in a job interview where a person has to answer a personality questionnaire, would they want to look their best? Or even better than their best?
Even at a basic level, it is now recognized that the human cognitive system is “victimized” by several heuristics or cognitive biases (Kahneman, 2011; Kahneman, Slovic & Tversky, 1982)
Heuristics associated with response styles
Heuristics: Cognitive strategies used to simplify and speed up a decision under uncertainty (Kahneman, 2011)
Sometimes referred to as “mental shortcuts.”
Apply to behavioral evaluation/estimation
Very useful when one does not know the person being evaluated well enough.
Can also lead to misjudgment and “stereotyping” of people.
Four well-known examples of heuristics
- Representativeness heuristic
Evaluating a specific characteristic in terms of how well it matches a prototype (e.g., evaluating a child’s attention based on our ADHD prototype)
- Availability heuristic
Rating that is influenced by the things that come most easily (or frequently) to mind for the rater (e.g., children’s aggressive behaviors)
Those things that come to mind more easily are considered more frequent and more representative of reality
- Primacy / recency heuristics
Evaluation that is influenced by the first vs. last impression of the individual
- Affect heuristic
Assessment colored by the rater’s current emotional and affective state (e.g., a bad mood leads to estimating more behavior problems)
Response biases directly influence validity
Response biases may seem trivial, but they can be very serious as they directly influence the validity of test scores
“Diminished” validity can in turn compromise the quality of inferences and clinical decisions that are made about an individual (or group) being assessed
Eight major types of response bias (see pictures)
1. Extremity: responses are extreme
2. Indecision: neutral responses
3. Acquiescence: saying yes to everything
4. Objection: always saying no
5. Social desirability: giving socially acceptable answers that exaggerate the positive
6. Unfavorable impression management (malingering): answering in an exaggeratedly negative way
7. Random or careless responding: answering at random
8. Guessing
9. Halo
What can be done to prevent or minimize response bias?
Three things to do:
1. Manage the assessment situation
Anonymity, minimize frustration, give warnings (i.e., warn that there are validity scales)
2. Manage the content of the tests
Simple items (language level), content-neutral items (i.e., non-suggestive), conceptually clear response options
3. Specialized validity tests or scales
Some examples of validity scales
All these scales are based on the same principle: very high or extreme scores suggest a potential problem
Indeterminacy scale (e.g., the MMPI-2; Ben-Porath & Tellegen, 2008)
The full MMPI-2 questionnaire has 567 items
Unanswered items, or items with multiple responses on the same item, are summed
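The summing rule above can be sketched in a few lines. This is only an illustration: the function name `indeterminacy_score` and the encoding (each item stored as the list of options the respondent marked) are assumptions made for the example, not part of the MMPI-2 materials.

```python
def indeterminacy_score(marks_per_item):
    """Count items left blank or marked with more than one response.

    marks_per_item: one list per item holding the options the respondent
    marked (hypothetical encoding): [] = unanswered, ["T"] = one valid
    answer, ["T", "F"] = multiple responses on the same item.
    """
    return sum(1 for marks in marks_per_item if len(marks) != 1)


# Made-up answer sheet with two blank items and one double-marked item.
answers = [["T"], [], ["T", "F"], ["F"], []]
print(indeterminacy_score(answers))
```

A high count means that too many items carry no usable information for the scale scores to be interpreted with confidence.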
Validity scales - social desirability scales
- Social desirability scales
Marlowe-Crowne Social Desirability Scale (Crowne & Marlowe, 1960)
e.g., “I never lie”; “I like everyone I know”; “I have never been angry”.
- Balanced Inventory of Desirable Responding
Self-deception: generally honest, but overly positive responses
Impression management: dishonest responses; positive bias is used to (a) please others or (b) gain advantage
Validity scales - unfavorable impression management scale
Unfavorable impression management scale (e.g., the MMPI-2; Ben-Porath & Tellegen, 2008)
Tendency to respond positively to unlikely negative items (e.g., “I’m no good at anything”; “I have no talent”)
Difficult to distinguish this effect from genuine responding in severe clinical cases (e.g., major depression or depressive personality disorder, etc.)
Validity scales - extreme response style scale and indecision scale
- Extreme response style scale
Criteria proposed by the EDC (Parent et al., 2006)
i.e., choosing the 1st or 7th choice of items an abnormally high number of times
- Indecision scale
Criteria proposed by the EDC (Parent et al., 2006)
i.e., choosing the central category, i.e., the 4th choice (the one in the middle) of the items, an abnormally high number of times
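Both counts can be sketched for a 7-point response scale as follows. The notes do not give the actual EDC cutoffs, so the `cutoff_proportion` value used here is purely hypothetical, as are the function names.

```python
def response_style_counts(responses, low=1, high=7):
    """Return (extreme_count, indecision_count) for one respondent.

    Extreme = the lowest or highest choice; indecision = the central
    choice (the 4th on a 1-7 scale).
    """
    middle = (low + high) // 2
    extreme = sum(1 for r in responses if r in (low, high))
    indecision = sum(1 for r in responses if r == middle)
    return extreme, indecision


def is_abnormally_high(count, n_items, cutoff_proportion):
    """Flag a count against a cutoff (hypothetical, not the EDC criterion)."""
    return count / n_items >= cutoff_proportion


# Made-up responses to ten 7-point items.
responses = [1, 7, 7, 4, 1, 7, 2, 1, 7, 7]
extreme, indecision = response_style_counts(responses)
print(extreme, indecision)
print(is_abnormally_high(extreme, len(responses), cutoff_proportion=0.6))
```

In practice the cutoff would come from the instrument's manual or from the distribution of counts in a normative sample.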
Validity scales - variable response inconsistency (VRIN)
- Variable response inconsistency (VRIN)
Sum of the number of item pairs that were answered inconsistently
Similar: “I don’t think before I act” - “I act without thinking about the consequences”
Different: “I don’t think before I act” - “I think carefully before I make decisions”
We give 1 pt for each inconsistent pair and calculate a sum
Used to detect random responses (intentional or not) or confusion in a questionnaire
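The VRIN scoring rule can be sketched as below, assuming true/false items. The item keys and pair lists are invented for the example; a real instrument defines its own pairs. A similar pair is inconsistent when the two answers differ, and a different (opposite) pair is inconsistent when the two answers are the same.

```python
def vrin_score(answers, similar_pairs, opposite_pairs):
    """Give 1 point per inconsistently answered pair and return the sum.

    answers: dict mapping item id -> True/False.
    similar_pairs: pairs of items with the same meaning.
    opposite_pairs: pairs of items with opposite meanings.
    """
    score = 0
    for a, b in similar_pairs:
        if answers[a] != answers[b]:  # same meaning, different answers
            score += 1
    for a, b in opposite_pairs:
        if answers[a] == answers[b]:  # opposite meaning, same answer
            score += 1
    return score


# Hypothetical items and pairs, for illustration only.
answers = {"q1": True, "q2": False, "q3": True,
           "q4": True, "q5": False, "q6": False}
similar_pairs = [("q1", "q2"), ("q5", "q6")]
opposite_pairs = [("q1", "q3"), ("q4", "q5")]
print(vrin_score(answers, similar_pairs, opposite_pairs))
```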
Validity scales - true response inconsistency (TRIN)
True response inconsistency (TRIN)
In this one, only pairs of items that are conceptually different are used
Calculates the sum of item pairs inconsistently answered “true” minus the sum of item pairs inconsistently answered “false”
Used to detect inconsistent responses that indicate acquiescence (very high score) or objection (very low score, possibly negative)
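A minimal sketch of the TRIN difference score, assuming true/false items and invented item pairs. It implements only the rule stated above (true-true pairs minus false-false pairs); any rescaling a published instrument may apply is not shown.

```python
def trin_score(answers, opposite_pairs):
    """Return (# pairs answered true-true) - (# pairs answered false-false).

    Only conceptually opposite pairs are used, so answering both items
    the same way is inconsistent in either direction.
    """
    both_true = sum(1 for a, b in opposite_pairs
                    if answers[a] and answers[b])
    both_false = sum(1 for a, b in opposite_pairs
                     if not answers[a] and not answers[b])
    return both_true - both_false


# Hypothetical opposite-meaning pairs, for illustration only.
pairs = [("q1", "q2"), ("q3", "q4"), ("q5", "q6")]
yes_sayer = {q: True for pair in pairs for q in pair}
no_sayer = {q: False for pair in pairs for q in pair}
print(trin_score(yes_sayer, pairs), trin_score(no_sayer, pairs))
```

A respondent who says yes to everything gets the maximum score, and one who says no to everything gets the minimum (negative) score, which matches the acquiescence vs. objection interpretation.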
Item and test bias
Item (or indicator) bias
Not differences in scores on the trait, but systematic differences in the probability of responding in a given way for each item individually, once the trait level is controlled for
Also called “differential item functioning.”
Compares the probability of endorsing items on a scale among individuals in different groups who have the same score/level on the trait
Same principle as control variables in predictive studies (e.g., when “controlling for SES”)
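One common way to make the "same trait level" comparison concrete is the Mantel-Haenszel common odds ratio. The notes do not name this method, so it is swapped in here as an illustration. Respondents are stratified by total score, and the odds of endorsing the item are compared between a reference and a focal group within each stratum. The group labels, function names, and data are all made up.

```python
from collections import defaultdict


def mantel_haenszel_odds_ratio(records):
    """Common odds ratio of endorsing one item, matched on total score.

    records: iterable of (group, total_score, endorsed), where group is
    "ref" or "focal". A value near 1 suggests no differential item
    functioning; values far from 1 suggest the item favors one group.
    """
    # Per score level: [endorsed, not endorsed] counts for each group.
    tables = defaultdict(lambda: {"ref": [0, 0], "focal": [0, 0]})
    for group, score, endorsed in records:
        tables[score][group][0 if endorsed else 1] += 1

    num = den = 0.0
    for table in tables.values():
        a, b = table["ref"]
        c, d = table["focal"]
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    return num / den if den else float("nan")


def make_records(group, score, n_yes, n_no):
    """Helper that builds made-up records for the example."""
    return [(group, score, True)] * n_yes + [(group, score, False)] * n_no


# Same endorsement rates at each score level: no sign of item bias.
unbiased = (make_records("ref", 1, 2, 2) + make_records("focal", 1, 1, 1)
            + make_records("ref", 2, 3, 1) + make_records("focal", 2, 6, 2))
# Same score level, but the reference group endorses far more often.
biased = make_records("ref", 1, 3, 1) + make_records("focal", 1, 1, 3)
print(mantel_haenszel_odds_ratio(unbiased))
print(mantel_haenszel_odds_ratio(biased))
```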
Item and test bias - structural bias
Structural bias
For a unidimensional instrument, this may involve significant differences in factor loadings between two groups
Not trivial, since this means that the trait is not measured in the same way in different groups
For a multidimensional instrument, (a) differences in loadings and (b) the factor structure is not the same in different groups
e.g., a factor analysis reveals three factors for men, but only two for women
Item and test bias - criterion (or criterion-referenced) bias
Criterion (or criterion-referenced) bias
Applies to both concurrent criterion validity (independent criteria and contrasting groups) and predictive validity
e.g., A temperamental trait that predicts later adjustment for one group of children, but not for another
e.g., an IQ test predicts success for one cultural group, but not for another
Caution: the observation of differences between groups for predictive relationships can be expected because this is theoretically justified… it is not a bias then
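A first, descriptive look at criterion bias is to compute the test-criterion correlation separately in each group. The sketch below uses a hand-written Pearson correlation and made-up scores; a real analysis would also test whether the difference between groups is statistically significant, which is not shown.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Made-up data: (test scores, later criterion scores) for two groups.
groups = {
    "group_a": ([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]),
    "group_b": ([1, 2, 3, 4, 5], [3, 1, 3, 1, 3]),
}
for name, (test_scores, criterion) in groups.items():
    print(name, round(pearson_r(test_scores, criterion), 2))
```

Here the test predicts the criterion in one group and not in the other, which is the pattern described above. Whether that counts as bias still depends on whether the difference is theoretically expected.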
Item and test bias - reliability bias
Reliability bias
Reliability estimates are significantly different in different groups
Can be potentially important for interpretation
If bias is present, the level of confidence one can have in the scale scores varies across groups
Observed group differences in means can then be partly explained by error
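To illustrate comparing reliability across groups, the sketch below computes Cronbach's alpha in each group. Alpha is used here as one common reliability estimate; the notes do not say which estimate is intended, and the data are made up.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for one group.

    item_scores: list of respondents, each a list of k item scores.
    alpha = k / (k - 1) * (1 - sum of item variances / variance of totals)
    """
    k = len(item_scores[0])
    item_vars = [variance([resp[i] for resp in item_scores])
                 for i in range(k)]
    total_var = variance([sum(resp) for resp in item_scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Made-up 3-item scale answered by three respondents per group.
group_a = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]  # items agree perfectly
group_b = [[1, 2, 1], [2, 1, 3], [3, 3, 2]]  # items agree much less
print(round(cronbach_alpha(group_a), 3), round(cronbach_alpha(group_b), 3))
```

With a gap this large, the same observed score would deserve much less confidence in the second group than in the first.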
Item and test bias
Although testing by comparing groups by sex/gender, ethnicity, cultural background, clinical group, etc., can be informative for many researchers, it often results in “over-generalization”
Variation between individuals in the same group (intragroup variance) can be enormous (see figure distributions)
As a psychoeducator, one must never lose sight of the fact that the purpose of a psychoeducational assessment is to interpret the scores and make recommendations for ONE particular individual
Test construction methods
There are a wide variety of tests useful in psychoeducation (Hogan et al., 2017)
Tests of intellectual ability/cognitive skills
Achievement tests
Neuropsychological tests
Measures of personality/temperament
Measures of interests, attitudes, and values
Measures of psychopathology
One major category is often overlooked in psychology books
Measures of environmental constructs
CONSTRUCTION OF TESTS
In general, professional organizations expect authors to have constructed their instrument in accordance with the criteria listed in the Standards for Educational and Psychological Testing (AERA, APA, & NCME, 2014)
Test construction and validation is a long-term process
Requires revisions before it is fully satisfactory
Can take place over several years, even a few decades
Two main methods of test construction
Deductive (or rational)
“conclude from propositions taken as premises”.
Inductive (or empirical)
“conclude by going from the facts to the law”