Reasoning and decision making Flashcards
Lecture 1: Judging Probabilities
We will focus on some of the ways in which intuitive probability judgments violate the prescriptions of probability theory, and what these patterns of response reveal about how people estimate probabilities. Two broad approaches will be considered:
* The idea that probability (and other) judgments are sometimes “biased” because people use simplifying strategies that reduce effort but are prone to error
* The idea that we should consider the ecological context in which judgments are made, and that apparent biases may be rational responses given the informational and cognitive constraints of the human decision-maker
2 approaches to how people make probability judgments
The fallibility of probability judgements was central to the Heuristics and Biases research program developed by Amos Tversky and Daniel Kahneman in the 1970s.
The “two systems”/ “heuristics and biases” approaches suggest that probability judgments (and other kinds of judgment) are biased/violate rational prescriptions because the human judge has adopted a quick-and-dirty strategy rather than a more effortful consideration of relevant material.
An alternative framework emphasizes the role of ecological conditions and informational constraints on people’s judgments and decisions. We will consider two examples.
The “Availability” Heuristic
If an event happens a lot then it will be easy to think of many past instances, so basing judgments on availability is sensible and people’s frequency judgments are often very accurate. However, availability can entail bias if:
1. our experience of past events does not reflect their true frequencies, or
2. events are easy to recall for some reason other than their frequency of occurrence.
One observation that is often taken as evidence for the availability heuristic is that people commonly overestimate the frequency or probability of rare events and underestimate common ones. For example, Lichtenstein et al. (1978) had participants estimate the number of US deaths per year due to 40 causes ranging from very rare (e.g., botulism, with one death per 100 million people) to very common (e.g., stroke: 102 000 deaths per 100 million people).
As shown in the graph below, participants systematically over-estimated the rare causes of death and under-estimated the common ones (the curve shows the best-fitting relationship between real and estimated probabilities; the straight line shows what would happen if people’s judgments were accurate).
This pattern is often attributed to availability: rare events are often given disproportionate publicity and are correspondingly more mentally-available than their environmental frequency would merit.
However:
* The bias here is in the environment (the over-reporting of rare events) rather than a consequence of a flawed estimation strategy.
* This kind of effect does not directly demonstrate availability-based judgment, because no assessment of the ease-of-retrieval has been made.
* The tendency to over-/under-estimate rare and common events can be explained in other ways. In particular, it can be seen as an instance of the general central tendency of judgment, whereby estimates for extreme items are biased towards the mean of the set. This central tendency is widespread and can be seen as an optimizing strategy – when one is uncertain, guessing the mean of the distribution is sensible – without invoking availability.
A stronger demonstration of the use of the availability heuristic comes from Tversky and Kahneman (1973). Participants listened to a list of 39 names. In one condition the names comprised 19 famous men and 20 less famous women; in another condition it comprised 19 famous women and 20 less famous men.
After listening to the list, some participants had to write down as many names as they could recall; others were asked whether the list contained more names of men or of women.
* In the recall task, participants retrieved more of the famous names (12.3 out of 19) than the non-famous names (8.4 out of 20). That is, famous names were more available.
* Crucially, 80 out of 99 participants judged the gender category that contained more famous names to be more frequent. (E.g., the people given a list of 19 famous men and 20 less famous women reported that there were more men than women in the list).
It seems that people made their proportion estimates by assessing the ease with which examples of each come to mind. When one category was easier to retrieve (via the fame manipulation) it was judged more frequent, even when it was actually experienced less often.
The Conjunction Fallacy
The availability heuristic is posited to produce judgments that deviate from the rules of probability theory. A basic axiom of probability theory is that the probability of event “A” cannot be less than the probability of the conjunction “A and B”. However, subjective probability estimates sometimes violate this principle, demonstrating the conjunction fallacy.
For example, Tversky and Kahneman (1983) gave some participants the following problem:
* “In four pages of a novel (about 2,000 words), how many words would you expect to find that have the form _ _ _ _ i n g (seven-letter words that end with “ing”)?”
Other participants were asked:
* to estimate the number of words of the form _ _ _ _ _ n _ (seven-letter words with n as the penultimate letter).
All ing words have n as the penultimate letter, so the number of n words must be at least as large as the number of ing words. However, participants violated this principle: they estimated, on average, 13.4 ing words but only 4.7 n words.
Tversky and Kahneman (1983) took this as evidence that people are basing their judgments on the mental availability of relevant instances: it is easy to think of “ing” words (for example, by thinking of words that rhyme) but we are less accustomed to organizing/retrieving words based on their penultimate letter, so n words are harder to retrieve and thus seem rarer. If/when participants apply a more systematic mental search strategy, we would expect the conjunction fallacy to disappear.
Base Rate Neglect
Similarity-based judgments are insensitive to prior probabilities: the extent to which I look like the sort of person who might be proficient at ballet is independent of the proportion of ballet dancers in the population, for example.
So, judgments based on representativeness will be largely independent of base rates.
In one demonstration, Kahneman and Tversky (1973) told participants that a panel of psychologists had interviewed a number of engineers and lawyers and produced character sketches of each person. They were told that 5 such descriptions had been randomly selected and that they should rate, from 0-100, the likelihood that each sketch described one of the engineers (rather than one of the lawyers).
Some participants were told that the population from which the descriptions were drawn consisted of 30 engineers and 70 lawyers. Others were told that the population comprised 70 engineers and 30 lawyers. That is, Kahneman and Tversky manipulated the base rates for the two possible outcomes.
Below is an example personality sketch:
“Jack is a 45 year old man. He is married and has four children. He is generally conservative, careful, and ambitious. He shows no interest in political and social issues and spends most of his free time on his many hobbies which include home carpentry, sailing, and mathematical puzzles. The probability that Jack is one of the 30 [or 70, depending on the condition] engineers in the sample of 100 is ______ %”
The descriptions varied in how similar they were to the stereotypical lawyer/engineer.
* Crucially, people judged the probability that Jack is an engineer to be much the same when the description was purportedly drawn from a population of mostly engineers as when it was drawn from a population of mostly lawyers.
This is an example of base rate neglect: the personality description might provide some information about Jack’s likely occupation, but this should be combined with information about the number of engineers and lawyers in the population from which his description was randomly drawn. However, people ignored these base probabilities. Kahneman and Tversky argue that:
1. People assess the extent to which the description of Jack is similar to (or representative of) each of the two categories – lawyers and engineers.
2. To the extent that Jack is more similar to the stereotypical engineer, he is more likely to be judged an engineer.
3. Because this assessment of similarity is independent of the prevalence of lawyers and engineers in the population, the resulting probability judgment is independent of the base rates for these two professions.
More direct evidence for the role of representativeness comes from Kahneman and Tversky (1973), who gave participants the following personality sketch:
“Tom W. is of high intelligence, although lacking in true creativity. He has a need for order and clarity, and for neat and tidy systems in which every detail finds its appropriate place. His writing is rather dull and mechanical, occasionally enlivened by somewhat corny puns and by flashes of imagination of the sci-fi type. He has a strong drive for competence. He seems to have little feeling and little sympathy for other people and does not enjoy interacting with others. Self-centred, he nonetheless has a deep moral sense.”
They were also given a list of 9 academic subject areas (e.g., computer science).
* The prediction group was told that the sketch of Tom was prepared by a psychologist during Tom’s final year in high school, and that Tom is now a graduate student. They were asked to rank the 9 academic subjects by the probability that Tom W. is specializing in that topic.
* The base-rate group was not shown the Tom W. sketch but “consider[ed] all first year graduate students in the US today” and indicated the percentage of students in each of the 9 subject areas – that is, they estimated the base rates for each subject area.
* The representativeness group ranked the 9 subject areas by the degree to which Tom W. “resembles a typical graduate student” in that subject area.
Across the 9 subjects, probability judgments were very highly correlated with representativeness judgments (r = .97) but negatively correlated with base-rate judgments (r = -.65). That is, predictions were based on how representative Tom W. was perceived to be of the various fields, and ignored the prior probability that a randomly-selected student would belong to those fields (base rate neglect).
The “Representativeness” Heuristic
Kahneman and Tversky also suggested that people use a representativeness heuristic. The idea is that:
* when estimating a probability – for example, how likely it is that a person belongs to a particular category or the probability that an observed sample was drawn from a particular population – people assess the similarity between the outcome and the category (or between the sample and the population).
Suppose that you meet a new person at a party and try to estimate the probability that he or she has tried internet dating. The idea is that you base your judgment on the similarity between the person and your stereotype of internet-daters – that is, on the extent to which the person is representative of the category “people who have tried internet dating”.
More generally, the representativeness heuristic involves “an assessment of the degree of correspondence between a sample and a population, an instance and a category, an act and an actor or, more generally, between an outcome and a model.” (Tversky & Kahneman, 1983, p. 295).
As with the availability heuristic, we can see evidence for this strategy by looking at the biases and axiom-violations that creep into people’s intuitive judgments.
The “Anchor-and-Adjust” Heuristic
So far we have considered how people use information about the target quantity to reach a probability or frequency estimate. Our judgments are also shaped by candidate response values. In particular, anchoring refers to the assimilation of a numeric estimate towards another, anchor value.
Anchors can come from many sources. Often, our own recent judgments serve as anchors for the current estimate. For example, Matthews and Stewart (2009) had people estimate the prices of shoes from Top Shop; the judgments on trial n positively correlated with the judgments on trial n-1 for 26 out of 28 participants.
Anchors can also be externally provided, and ostensibly irrelevant. In a famous demonstration, Tversky and Kahneman (1974) spun a wheel of fortune that landed on 10 (a low anchor) for one group of participants and on 65 (a high anchor) for another group. Participants were asked whether the percentage of African countries in the United Nations was more or less than the anchor, and then asked for their best estimate of the true value. The median estimate was 25 in the low anchor condition and 45 in the high anchor condition – that is, the participants’ judgments were pulled towards the anchor values.
Similarly, Chapman and Johnson (1999) had people write down the last 2 digits of their social security number and treat it as a probability (e.g., “14%”). Participants were asked whether the probability that a Republican would win the 1996 US Presidential Election was more or less than this probability, prior to giving their best estimate of the true probability. The larger the anchor, the larger the best estimate, with a correlation of r = 0.45.
The most famous account of anchoring effects is the “anchor-and-adjust” heuristic; the idea is that we use the anchor as an initial estimate of the target value and adjust from that starting point in the right direction; because the adjustment is effortful, we often adjust insufficiently and so our judgment is biased towards the anchor value.
This probably happens sometimes, but there are contraindications. For example, in the “wheel of fortune” task described above, warning people about anchoring effects and/or giving them an incentive to be accurate often has little effect on the extent to which people anchor on the provided value (e.g., Epley & Gilovich, 2005), which doesn’t fit with the idea that the anchoring effect reflects a “lazy” or “intuitive” judgment system that can be over-ridden by effortful deliberation.
Other mechanisms that might contribute towards anchoring/assimilation effects include:
* The idea that consideration of the anchor as a possible value for the estimated quantity activates relevant semantic knowledge (e.g., when considering 12% as a possible value for the probability of a Republican win, we call to mind relevant information about the state of the economy, public perceptions of the candidates, etc.; this activated knowledge then shapes or biases our final estimate; Chapman & Johnson, 1999)
* The idea that an anchor value changes our perception of the magnitude of other candidate values (e.g., if we’ve just been thinking about a 12% probability, 50% seems quite large; if we’ve been considering 88%, 50% seems quite small; Frederick & Mochon, 2011).
* The idea that externally-presented anchors may be seen as a “hint” or suggestion, even if they are ostensibly uninformative (after all, doesn’t the fact that the experimenter is getting me to consider a number generated by a wheel of fortune suggest that they want me to be influenced by it in some way?)
These possibilities are not mutually exclusive – and note that they do not all fit with the idea that anchoring stems from the application of quick-and-easy-but-biasing heuristics.
Ecology and Adaptation – Example 1: Natural Frequency Formats
One example of an “ecological” argument comes from the effect of natural frequencies on base rate neglect. In the “two systems”/“heuristics and biases” view, the problems caused by using availability or representativeness as the basis for probability judgments can be overcome by evoking “System 2” – i.e., by employing a more effortful processing strategy. Consistent with this, there is evidence that people can discount a potentially-misleading but readily-accessible cue such as stimulus familiarity (e.g., Oppenheimer, 2003). But can we do anything other than alert people to possible bias and/or tell them to put more effort into their judgment?
Some researchers argue that people do much better at probability tasks when the information is presented in a way that matches our supposed “evolved” cognitive capacities for handling this kind of information. In particular, it has been argued that humans evolved to process frequencies (counts) obtained by sampling the environment, rather than normalized probabilities.
For example, consider the following problem:
“For a woman at age 40 who participates in routine screening, the probability of breast cancer is 1%. If a woman has breast cancer, the probability is 80% that she will have a positive mammogram. If a woman does not have breast cancer, the probability is 10% that she will still have a positive mammogram.
Imagine a woman from this age group with a positive mammogram. What is the probability that she actually has breast cancer?”
When Eddy (1982) gave this problem to physicians, 95 out of 100 gave estimates between 0.70 and 0.80. An estimate of 80% demonstrates the inverse fallacy: it confuses the probability of a positive test result given the presence of cancer, p(positive|cancer), with the probability of cancer given the positive test result, p(cancer|positive). These probabilities are not the same: the chances of cancer given a positive test depend on the base rate (prior probability) of cancer in the population. A positive test is more likely to indicate cancer when cancer is widespread than when it is very rare. But the physicians (and most people) tend to ignore this base rate information.
Probability theory tells us how we should update our beliefs (e.g., that a person has cancer) in the light of new information. Suppose we have a hypothesis H and know the prior probability that it is true, p(H), and the probability that it is false, p(not-H) = 1 - p(H). We then encounter some new data, D. The conditional probability of obtaining those data under the hypothesis is p(D|H). (That is, p(D|H) is the probability of obtaining these data if the hypothesis is true).
Bayes’ theorem tells us how we should update our beliefs to give the posterior probability that H is true, given our prior belief and the new data:
p(H|D) = p(D|H) × p(H) / [p(D|H) × p(H) + p(D|not-H) × p(not-H)]
In the cancer example:
* H is the hypothesis that the person has cancer
* p(H) is the base rate of cancer in the population (the prior probability that a randomly selected person has cancer) and equals 0.01
* p(not-H) is the prior probability that a person does not have cancer and equals 0.99
* p(D|H) is the probability of getting a positive test result (the data D) given that the person has cancer, and equals 0.8
* p(D|not-H) is the probability of getting a positive test result given that the person does not have cancer, and equals 0.1
Thus:
p(cancer|positive) = (0.8 × 0.01) / (0.8 × 0.01 + 0.1 × 0.99) = 0.008 / 0.107 ≈ 0.075
In other words, given the positive test result the probability that the person has cancer is still only 7.5%.
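For concreteness, the calculation can be checked with a few lines of code. This is just a minimal Python sketch of the formula above; the variable names are illustrative.

```python
# Posterior probability of cancer given a positive mammogram, via Bayes' theorem
p_cancer = 0.01               # prior: base rate of cancer
p_no_cancer = 0.99            # 1 - prior
p_pos_given_cancer = 0.80     # probability of a positive test if cancer is present
p_pos_given_no_cancer = 0.10  # probability of a positive test if cancer is absent

# Total probability of a positive test, then Bayes' theorem
p_pos = p_pos_given_cancer * p_cancer + p_pos_given_no_cancer * p_no_cancer
p_cancer_given_pos = (p_pos_given_cancer * p_cancer) / p_pos
print(round(p_cancer_given_pos, 3))  # 0.075, i.e. about 7.5%
```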
Gigerenzer and colleagues have argued that probabilities only make sense when conceived as long-run frequencies, and that it does not make sense to talk about the probability of a one-off event (e.g., that a given person has a disease). Rather, Gigerenzer argues that humans evolved to keep track of event frequencies, estimated over time by “natural sampling” (i.e., encountering different types of events and remembering the number of times they occur).
Correspondingly, if we re-express the diagnosis problem in terms of natural frequencies (number of events of each type) rather than normalized probabilities, then people should find it much easier.
Consider this re-expression of the previous problem:
“Ten out of every 1000 women at age 40 who participate in routine screening have breast cancer. Of these ten women with breast cancer, eight will have a positive mammogram. Of the remaining 990 women without breast cancer, 99 will still have a positive mammogram.
Imagine a group of 40 year old women with positive mammograms. How many of them actually have breast cancer? ____ out of _____”
As Hoffrage and Gigerenzer (1998) note, now the answer can easily be “seen” to be 8 out of 107 = 7.5%. They found that only 8% of physicians answered correctly (gave a judgment within 5% of the true value) in the original wording of the task, but that this increased to 46% with the natural frequency format.
More generally, this representation of the problem means that the answer is simply the number of true positives divided by the total number of positives; there is no need to keep track of the base rate, explaining base-rate neglect when problems are presented in standard probability format. In other words, the task is difficult in the original version because the use of normalized probabilities (which necessitate the explicit incorporation of base rates/priors) deviates from how we “naturally” evaluate chance.
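A minimal sketch of the same calculation in natural-frequency terms (using the counts from the re-worded problem above) shows why the format helps: the base rate is already folded into the counts, so the answer is just the true positives divided by all positives.

```python
# Natural-frequency version: counts per 1000 women screened
true_positives = 8    # of the 10 women with cancer, 8 test positive
false_positives = 99  # of the 990 women without cancer, 99 test positive
print(true_positives / (true_positives + false_positives))  # ~0.075, the same 7.5%
```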
Similar evolutionary ideas have been advocated by others (e.g., Cosmides and Tooby, 1996), but there are alternative explanations for why the natural frequency format makes the task easier. In particular, it has been suggested that it simply clarifies the set relations between the various event categories, and that any manipulation which achieves this will have the same beneficial effect.
Irrespective of the basis for the natural frequency format effects, some authors have argued that base rate neglect is not as common as the heuristics-and-biases programme would have us believe (e.g., Koehler, 1994). Quite often in natural contexts, people are sensitive to prior probabilities and update their beliefs appropriately.
Example 2: The Gambler’s Fallacy and Hot Hand Fallacy
A different illustration of potential “ecological rationality” comes from a consideration of how people judge the probability of streaks of a random outcome. For a sequence of independent events, the probability of a particular outcome is the same irrespective of what has happened in the past: the probability of getting “heads” from a fair coin is the same after a run of 3 heads as after a run of 3 tails. However, subjective probabilities often violate this independence axiom.
For example, Croson and Sundali (2005) examined roulette betting patterns from a Nevada casino. They focused on “even money” outcomes (where the two outcomes are equally likely, such as “red or black” or “odd or even”; if you bet on the right outcome, you get back twice what you staked) and looked at bets as a function of the number of times that an outcome had occurred in a row (e.g., a streak of two would mean that the last two spins both came up red, or both came up black).
The graph below shows the proportion of bets that were “with” (white bars) and “against” (black bars) the streaks. As the run-length increased, people were increasingly likely to bet that the next outcome would be the opposite of the streak; after a run of 6 or more, 85% of bets were that the streak would end, even though this probability remains fixed at .50.
The belief that a run of one outcome increases the probability of another (when the events are actually independent) is called the gambler’s fallacy.
The gambler’s fallacy is often attributed to the representativeness heuristic: people expect a “local” sequence to be representative of the underlying process (Tversky & Kahneman, 1974). I know that a coin should, in the long run, produce equal numbers of heads and tails, so I expect any sequence of coin tosses to have this property. A run of heads means that a tails outcome will make the local sequence more representative of the data-generating process.
The gambler’s fallacy is widespread, but sometimes people show the opposite tendency by believing that a streak elevates the probability that the same outcome will occur again. In a famous demonstration, Gilovich et al. (1985) found that basketball players’ shooting accuracy was independent of their recent performance: the probability of scoring was the same after a run of “baskets” as after a run of “misses”. However, basketball fans believed that a player’s next shot was more likely to score after a run of successful shots than after a run of misses – a so-called “belief in the hot hand” or “hot hand fallacy”.
Gilovich et al.’s statistical analysis has been questioned (it is hard to establish that the outcomes of each basketball shot really are independent events), but the hot hand fallacy has been found in situations where the success of consecutive attempts really cannot be any guide to future performance. For example, Ayton and Fischer (2004) had people play a roulette-style game and found that their confidence in their predictions for the next outcome was greater after a run of successful predictions – even though the probability of them being right next time must be independent of past success because the roulette spins are random.
Belief in the hot hand has again been attributed to the representativeness heuristic: a run of one outcome doesn’t seem representative of randomness, leading people to conclude that the underlying process is not random (Gilovich et al., 1985).
Some researchers have objected that it is problematic to use the same mechanism to “explain” two completely contradictory findings (belief that a streak will end in the GF and that it will continue in the HH).
Ayton and Fischer (2004) therefore offered an alternative account, based on ecological considerations. Their argument runs:
* Many physical processes involve sampling without replacement, which results in diminishing probability for a given outcome the more times that it has occurred. For example, if you rummage blindly in your cutlery drawer for spoons, removing the spoons as you find them, then the probability that the next item will be a spoon decreases as your hunt progresses.
* Correspondingly, the GF reflects an over-generalization of this ecologically-sensible principle to other random, mechanical processes – e.g., roulette wheels and coin tosses – about which we have very limited experience.
* By contrast, many aspects of intentional human performance really do show positive recency. If you practice a new game, your shooting success will increase. So the hot hand fallacy can be seen as an appropriate generalization of this principle to situations that also involve human performance, but where the outcome probabilities are in fact independent.
In support of these ideas, Ayton and Fischer (2004) presented sequences of outcomes with varying alternation rates (AR; a low AR means the next outcome is unlikely to be different from the last, giving many long runs of one outcome; a high AR means lots of short runs). Participants had to judge which of two processes generated each sequence (e.g., a series of basketball shots or a sequence of coin tosses). As the streak length increased, participants were more likely to attribute the sequence to intentional human performance like basketball than to a random mechanical process like coin-flipping.
A related but distinct account comes from elegant work by Hahn and Warren (2009). With an infinitely long sequence of coin flips, all sequences of a given length occur with equal probability – for example, the sequence HHHH will occur with the same frequency as HHHT, so believing that a run of heads means it’ll be tails next time is indeed a fallacy. However, Hahn and Warren noted that humans do not experience or remember infinitely-long sequences – and for shorter sequences, the probability of encountering HHHT and HHHH are not equal. In one illustration, Hahn and Warren simulated 10,000 sequences of 10 coin flips. The pattern HHHH only appeared in about 25% of the sequences, whereas HHHT occurred in about 35% of the simulated samples. In other words, if we had 10,000 people each of whom had experienced 10 flips of a fair coin, it would be perfectly reasonable for more of them to expect a sequence HHH to end with a T than with another H.
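Hahn and Warren’s point is easy to verify by simulation. The following is a minimal sketch (not their original code) that counts how many 10-flip sequences contain each pattern at least once; the exact proportions will vary slightly between runs.

```python
import random

def contains(seq, pattern):
    """Return True if the pattern occurs anywhere in the sequence of flips."""
    return any(seq[i:i + len(pattern)] == pattern
               for i in range(len(seq) - len(pattern) + 1))

n_sequences, n_flips = 10_000, 10
count_hhhh = count_hhht = 0
for _ in range(n_sequences):
    flips = "".join(random.choice("HT") for _ in range(n_flips))
    count_hhhh += contains(flips, "HHHH")
    count_hhht += contains(flips, "HHHT")

print(count_hhhh / n_sequences)  # roughly 0.25: about a quarter of sequences contain HHHH
print(count_hhht / n_sequences)  # roughly 0.35: HHHT is encountered more often
```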
This work provides one example of a broader principle – namely, that the supposed “fallacies” of human judgment and decision-making are often perfectly rational given the finite and imperfect information afforded by the environment and our limited mental capacities.
Conclusion for judging probability
Conclusions
We have identified several key phenomena and ideas:
1. Human probability judgments do not always follow the laws of probability, and these deviations illuminate the judgment process.
2. One broad framework posits the use of simplifying strategies that reduce effort at the expense of sometimes introducing bias. In particular, people sometimes simplify judgments by substituting an easier-to-evaluate entity for the target dimension: the availability and representativeness heuristics are two examples.
3. Judgments often assimilate towards anchor values. There are many types of anchor and many mechanisms that underlie this assimilation.
4. We can also consider probability judgments in their ecological context. One idea is that humans evolved to process frequencies, not normalized probabilities, although this interpretation of frequency-format effects is debatable.
5. Likewise, we can see phenomena such as the gambler’s fallacy as reflecting ecological experience with different types of generating process.
6. We have focused on probability judgments, but these kinds of ideas and effects apply to many other kinds of judgment.
Lecture 2: Reasoning
We will discuss how human reasoning deviates from the prescriptions of formal logic in a range of tasks, and how the systematic patterns of success and failure on these tasks inform our theorizing about the mental operations that underlie human reasoning. Four broad approaches will be considered:
1. People may solve reasoning problems by using simple heuristics (rules of thumb) rather than engaging in actual reasoning processes
2. People often make “errors” because the use of language in formal logic differs from that of everyday life
3. The “Mental Models” framework provides an example of an algorithmic description of the steps by which people reason
4. Responses in reasoning tasks are highly sensitive to the framing of the task and the participant’s background beliefs
Two types of reasoning
Inductive reasoning involves drawing general conclusions from particular instances. For example, given the premise “I have fallen asleep in every Psychology lecture so far”, one might draw the conclusion “I will always fall asleep in Psychology lectures”. Inductive reasoning takes many forms and is central to scientific research, but the conclusions are not necessarily true; there is always the possibility that the next Psychology lecture will manage to hold your attention throughout.
Deductive reasoning involves drawing conclusions which follow necessarily from the premises; if we accept that the premises are true, and if the argument follows the rules of logic, then the conclusion has to be true, too.
We will focus on two kinds of deductive reasoning – propositional reasoning and syllogistic reasoning.
Two types of deductive reasoning
propositional reasoning and syllogistic reasoning.
Syllogistic reasoning
The study of Aristotelian syllogisms (aka quantitative syllogisms) provides an alternative approach to the psychological processes that underlie reasoning. Syllogisms typically comprise two premises and a conclusion, and involve the quantifiers all, no, some, and some…not.
The following is an example:
All people who teach psychology are psychologists
Jon teaches psychology
Therefore, Jon is a psychologist
Such arguments may be valid or invalid. Validity is determined by the structure of the argument – the relations between the premises and the conclusion. A valid argument is one where, if one accepts the truth of the premises, then the conclusion is also true. The above example is a valid argument. Of course, one might not accept the premises (in fact, Jon doesn’t have a degree in psychology, he just works as one), but that doesn’t change the validity.
The combination of quantifiers (all, no, some, some…not) and order of terms (e.g., all a are b vs all b are a) gives a total of 512 two-premise syllogisms, most of which are regarded by logicians as invalid.
Studies of syllogistic reasoning typically present the two premises and either ask participants “what follows?” or present a conclusion and have them indicate whether it is valid or invalid.
Imperfect performance
Despite their simple structure, syllogistic reasoning problems can be very hard. For example, in a review of the literature, Roberts and Sykes (2005) found that problems of the form: “all a are b; all b are c; what follows?” were correctly solved by 88% of participants (valid conclusion: “all a are c”). However, given a problem of the form: “all b are a; all b are c; what follows?” only 8% of participants correctly concluded that “some a are c” (or, equivalently, that “some c are a”).
By studying how structural features of the problem change performance, we can try to develop models of how people go about solving this kind of problem. We consider four approaches to understanding performance on these kinds of reasoning task.
four approaches to understanding performance on syllogistic reasoning tasks.
Approach 1: Identify simplifying strategies
Approach 2: Focus on interpretation of the terms
Approach 3: Posit a sequence of processing steps – the “Mental Models” framework
Approach 4: Consider the role of framing and experience
Approach 1: Identify simplifying strategies
One suggestion is that many people do not actually engage in any reasoning at all when confronted with syllogistic reasoning problems. Rather, they may base their responses on simple heuristics.
An early example is atmosphere theory, according to which the mood of the premises influences judgments about what the mood of the conclusion should be. “Mood” means whether the statement is affirmative or negative, and whether it is universal or particular. (E.g., “all…” is universal and affirmative, whereas “some are not…” is particular and negative).
Begg and Denny (1969) gave participants 64 reasoning problems comprising two premises and a choice of four conclusions. For example:
All a are b
All b are c
Conclusion options: “All c are a”, “Some c are a”, “No c are a”, “Some c are not a”
Participants indicated which if any of the 4 conclusions followed from the premises. Nineteen of the 64 problems had a valid solution among the 4 options presented; the authors focussed on responses for the other 45 problems, where choosing any of the options constituted an error.
* When both premises were positive, 79% of conclusions endorsed were positive
* When at least one premise was negative, 73% of chosen conclusions were negative
* When both premises were universal, 77% of chosen conclusions were universal
* When at least one premise was particular, 90% of chosen conclusions were particular
So this is evidence that the “atmosphere” (quality and quantity) of the premises shapes beliefs about the validity of different possible conclusions – e.g., universal premises lead people to assert universal conclusions.
Crucially, however, atmosphere theory fails to explain how participants decide whether or not a syllogism has a valid conclusion at all: when given two premises and asked “what follows?”, participants correctly identify that there is no valid inference 29-40% of the time (Roberts & Sykes, 2005). The idea that their conclusions are guided by the “atmosphere” of the premises doesn’t capture this.
Approach 2: Focus on interpretation of the terms
“Errors” in syllogistic reasoning partly reflect differences between the use of language in formal logic and in everyday life.
For example, consider two arguments:
| VALID | INVALID |
| --- | --- |
| All A are B | All A are B |
| All B are C | All C are B |
| Therefore, All A are C | Therefore, All A are C |
If I take “All C are B” to mean that “All C are B and vice-versa” then the invalid argument on the right would be equivalent to the valid one on the left, and it would be fine to accept the conclusion. Likewise, in logic “Some” means “Some and perhaps but not necessarily all”, but in everyday speech we typically use “Some” to mean “Some but not all”.
In one demonstration, Ceraso and Provitera (1971) presented wooden blocks and had people reason about their properties. In the “traditional” version of the task, people were given syllogisms such as:
All blocks with holes are red
All blocks with holes are triangular
Only 1 out of 40 people correctly identified “Some red blocks are triangular” as the valid inference; more than half endorsed “All red blocks are triangular”, which is what we’d expect if they take “All A are B” to imply “All B are A”.
In a modified version of the task, people were given more explicit instructions about the interpretation of the premises, such as:
Whenever I have a block with a hole it is red, but not all red blocks have holes
Whenever I have a block with a hole it is triangular, but not all triangular blocks have holes
The proportion of people who correctly responded “Some red blocks are triangular” rose to 27 out of 40. Across a number of such problems, people scored an average of 58% correct with the traditional format but 94% correct with the modified versions.
So, these authors argue that syllogistic reasoning errors arise because people don’t properly apprehend the premises in the way that the experimenter intends. However, it is unlikely that premise misapprehension accounts for the full spectrum of performance on this kind of task.
Approach 3: Posit a sequence of processing steps – the “Mental Models” framework
A more sophisticated and very wide-ranging account of the mental operations that underlie reasoning comes with the mental models framework developed by Philip Johnson-Laird (e.g., Bucciarelli & Johnson-Laird, 1999; Johnson-Laird, Byrne, & Schaeken, 1992). The mental models approach has been applied to many types of reasoning problem; it posits that reasoning involves three stages:
1. Comprehension: use language and background knowledge to construct a mental model of the state of the world that is implied by the premises
2. Description: combine the models implied by the premises into a composite, and use this to try to draw a conclusion that goes beyond re-iterating the premises.
3. Validation: search for alternative models. If all of these are consistent with the initial conclusion, it is judged valid; if one or more of the new models contradict the conclusion, reject it and try to construct an alternative which can then be validated.
For example, consider the following:
All Psychologists are Comedians
All Comedians are Psychopaths
What follows?
We start by constructing a model of the first premise:
Psychologist Comedian
Psychologist Comedian
…
Each row represents a conjunction of items (e.g., a psychologist-comedian); you generate an arbitrary number of instances of each case – I have listed two above, but for simplicity we could list just one. (Sometimes authors would write the “Psychologist” exemplars in square brackets to signify that the Psychologist item is exhaustively represented – it cannot appear in any other possibility and must always be paired with “Comedian”.) The three dots signify that there are other instances and possibilities that could be represented but which aren’t yet included in the model. In particular, we could include something that is a non-psychologist comedian, or a non-psychologist non-comedian.
Likewise, the second premise leads to a model like this:
Comedian Psychopath
Comedian Psychopath
…
During the “description” stage, the reasoner attempts to construct an integrated representation of the information in the premises. For this example, only one such model can be constructed:
Psychologist Comedian Psychopath
In this case, the “validation” step would fail to find any other models, from which it follows that “All Psychologists are Psychopaths” – the correct inference (well, the valid conclusion!).
This kind of one-model syllogism should be relatively easy to solve because there is only one model that is consistent with the premises. Other multiple-model syllogisms are more challenging, because there are several possible ways of combining the information in the premises. For example:
No Artists are Bakers
All Bakers are Candlemakers
What follows?
Here we might construct an initial model (using A, B, and C for the Artists, Bakers, and Candlemakers):
A
B C
Which would lead to the preliminary conclusion that “No Artists are Candlemakers” (or that “No Candlemakers are Artists”).
However, searching for alternative models during the validation step reveals that a second model is possible:
A
A C
B C
This model acknowledges the possibility of an artist-candlemaker, which refutes the initial conclusion. A new conclusion, consistent with both of the models, is that “Some Artists are not Candlemakers”. However, a third model can also be constructed:
A C
B C
Again, this refutes the previous conclusion. The only conclusion that is consistent with all three mental models is that “Some Candlemakers are not Artists”.
The mental models approach also describes what happens when there is no valid conclusion. For example, consider:
No Aardvarks are Bigots
No Bigots are Chocolate-lovers
What follows?
An initial model might be:
A
B
C
Leading to the conclusion “No A are C”. However, one can also construct a model:
A C
B
in which all A are C. There is no conclusion that is consistent with both models, so the correct response is “no valid conclusion”.
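The “search for alternative models” in the validation stage can be made concrete with a brute-force check. The sketch below is not a model of the psychology; it simply enumerates every possible “world” (every combination of inhabited types of individual) and asks whether a candidate conclusion holds in all worlds where the premises hold, adopting the Aristotelian assumption that none of the three categories is empty.

```python
from itertools import product

# Each individual is one of 8 "types": a triple saying whether it is an A, a B, and a C
TYPES = list(product([False, True], repeat=3))

def holds(statement, world):
    """Does a quantified statement hold in a world (a set of inhabited types)?
    A statement is (quantifier, subject, predicate), with 0 = A, 1 = B, 2 = C."""
    q, s, p = statement
    if q == "all":
        return all(t[p] for t in world if t[s])
    if q == "no":
        return not any(t[s] and t[p] for t in world)
    if q == "some":
        return any(t[s] and t[p] for t in world)
    if q == "some_not":
        return any(t[s] and not t[p] for t in world)
    raise ValueError(q)

def valid(premises, conclusion):
    """Valid = no counterexample world in which the premises hold but the conclusion fails."""
    for bits in product([False, True], repeat=len(TYPES)):
        world = [t for t, inhabited in zip(TYPES, bits) if inhabited]
        if not all(any(t[i] for t in world) for i in range(3)):
            continue  # skip worlds where A, B, or C is empty (existential import)
        if all(holds(p, world) for p in premises) and not holds(conclusion, world):
            return False
    return True

# "No Artists are Bakers; All Bakers are Candlemakers" (A = 0, B = 1, C = 2)
premises = [("no", 0, 1), ("all", 1, 2)]
print(valid(premises, ("no", 0, 2)))        # False: the preliminary conclusion is refuted
print(valid(premises, ("some_not", 2, 0)))  # True: "Some Candlemakers are not Artists"
```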
So:
* If a reasoner fails to consider all of the alternative models, he or she is less likely to draw the correct inference – so multiple model syllogisms will be harder than single-model ones
* Considering more models will require more time, effort, and processing-capacity – so multiple-model syllogisms will take longer to solve, and people with greater working memory, or those with more time/inclination to work on the task, will do better
Copeland and Radvansky (2004) tested these predictions:
* First, participants completed a working memory span assessment
* Next, participants were presented with syllogisms such as “All cyclists are coffee drinkers; All coffee drinkers are surgeons” along with all 9 possible conclusions (the 8 combinations of the two end terms “Cyclists” and “Surgeons” with the four quantifiers “All”, “None”, “Some” and “Some…not”, plus the option “no valid conclusion”).
The following table shows accuracy and response-time data, organized according to the number of models supported by each syllogism:
| | % correct | RT (s) |
| --- | --- | --- |
| One-model | 87 | 25 |
| Two-model | 40 | 29 |
| Three-model | 34 | 33 |
As you can see:
* Problems with more possible mental models were solved less accurately, consistent with people failing to consider all of the possible states implied by the premises
* Problems with more possible models were solved progressively more slowly, consistent with it taking time to construct each model and check the validity of preliminary conclusions
* In addition, participants with higher WM span were more accurate and faster at the reasoning tasks, particularly for more complex syllogisms, consistent with model-construction being a resource-intensive activity
* Analysis of the response choices showed that they were better predicted by the mental models theory than by simple heuristics such as the “atmosphere” approach described above
These data fit with the idea that model construction is a time- and resource-demanding activity. However, they are not direct evidence for the mental model strategy – indeed, responses were similarly well-described by an alternative “probability heuristics” model.
Newstead, Handley, and Buck (1999) sought more direct evidence. Participants were given premises such as:
All of the buskers are computer operators
None of the computer operators are boxers
Participants had to write down the conclusion (or indicate that there was no valid conclusion). Straight after answering each problem, participants were given a list of the 9 possible conclusions and indicated which they had considered when coming up with their response. The results were as follows:
| | Single-model syllogisms | Multiple-model syllogisms | Indeterminate syllogisms (no valid conclusion can be drawn) |
| --- | --- | --- | --- |
| % correct | 70 | 12 | 19 |
| Number of conclusions considered | 1.05 | 1.12 | 1.12 |
So, although multiple-model syllogisms are harder, people did not try to construct more models for them. In addition, there was no correlation between the number of models constructed and the proportion of syllogisms solved correctly.
Newstead et al. (1999, p.354) argue that “reasoners are able to construct alternative models – for example, when their first model leads to an unbelievable conclusion – but that they normally construct only one”.
If this is correct, a key issue would be: what determines how the initial model is constructed? The next section offers some ideas.
Approach 4: Consider the role of framing and experience
A final approach to the study of reasoning emphasizes the contribution of background knowledge and the role that reasoning plays in natural conversation. Syllogistic reasoning is affected by the framing of the problem and the participant’s prior experiences. One crucial demonstration came from Evans et al. (1983), who gave people valid and invalid syllogisms with believable and unbelievable conclusions. Examples are shown in the table, along with the mean proportion of people who accepted each argument as valid for each type of problem:
| | Believable conclusion | Unbelievable conclusion |
| --- | --- | --- |
| Valid | No cigarettes are inexpensive; Some addictive things are inexpensive; Therefore, some addictive things are not cigarettes (89% accepted) | No addictive things are inexpensive; Some cigarettes are inexpensive; Therefore, some cigarettes are not addictive (56% accepted) |
| Invalid | No addictive things are inexpensive; Some cigarettes are inexpensive; Therefore, some addictive things are not cigarettes (71% accepted) | No cigarettes are inexpensive; Some addictive things are inexpensive; Therefore, some cigarettes are not addictive (10% accepted) |
In this study, plausibility increased the judged validity of both valid and invalid arguments: judgments about argument validity are influenced by beliefs both about the conclusions themselves and about the probability that those conclusions will be true.
There have been many attempts to explain this so-called belief bias.
* The selective scrutiny hypothesis – another example of a heuristic approach to reasoning – posits that people initially evaluate the plausibility of the conclusion. If it is reasonable, they accept it – without engaging in any actual “reasoning” at all; scrutiny of the logical connection between premises and conclusion only arises when the conclusion is unbelievable.
However, in the example from Evans et al. (1983) above – and in a meta-analysis of similar studies by Klauer et al. (2000) – we saw that validity does affect the acceptance of believable arguments (people accept invalid arguments with plausible conclusions less often than valid ones), which selective scrutiny cannot explain.
* The misinterpreted necessity hypothesis conjectures that people don’t know how to respond when a conclusion is possible but not logically necessary. (E.g. All A are B; All B are C; Therefore, all C are A. It might be true that all C are A, but it is not a logical necessity.) In such cases, they might use believability to make their decision.
However, as you can see in the above example (and in other studies), belief influences acceptance even when conclusions are deductively valid – so the effect is not confined to conclusions that are possible but not logically necessary.
Rather than belief influencing judgments before (or instead of) reasoning (selective scrutiny) or after reasoning (misinterpreted necessity), some have argued that belief exerts two separate effects: (1) inducing an overall bias to accept/reject the conclusion, and (2) shaping the reasoning process itself.
Klauer et al. (2000) developed a framework which incorporates this idea (see also Evans et al., 2001). The basic ideas are that:
* People typically generate just one mental model, because of capacity limits (see also the evidence from Newstead et al., 1999, above)
* If the conclusion is believable, people attempt to construct a model that is consistent with this claim
* If the conclusion is unbelievable, they attempt to construct a model which refutes this claim (e.g., if the unbelievable conclusion was “All A are C”, they would try to construct a model where “Some A are not C” that was consistent with the information in the premises).
* When the attempt to construct the “desired” model fails, the participant is in a state of uncertainty and will be somewhat swayed by their belief about the base-rate probability that the conclusion is valid.
The nice feature of this theory is that it combines a description of the cognitive operations by which people reason with the idea that these operations will be shaped by prior beliefs and biases – and in so doing it captures the interacting effects of validity, believability, and base-rates reported by Klauer et al. (2000).
However, we need to be cautious. Evidence that belief bias affects reasoning itself, rather than just producing an overall boost to the probability of responding “valid”, comes from the interaction between believability and validity (the effect of believability is larger for invalid arguments). But this analysis uses “proportion correct” as the index of performance. As you might know from studies of perception and memory, using proportion correct assumes a linear relationship between hit rate and false alarm rate, and that assumption rarely holds; data such as these are often more appropriately analysed in a signal detection framework. When researchers have applied a signal detection analysis to the effect of believability on people’s ability to discriminate between valid and invalid arguments, they often find a criterion shift (i.e., a bias towards judging arguments “valid” when they have plausible conclusions) but no effect on discriminability (the underlying ability to distinguish valid from invalid arguments). This argues against the idea that prior knowledge/belief qualitatively changes the reasoning process in the manner envisaged by the selective scrutiny or mental models accounts (see, for example, Trippas et al., 2018).
Propositional Reasoning
The same four approaches can be found in studies of propositional reasoning. Propositional reasoning involves reasoning about propositions containing the connectives If, And, Not, and Or.
For example, consider the following proposition:
If it is raining, then I take the bus.
Now suppose you learn that it is raining, and infer that I therefore took the bus. This is a valid type of inference, called the modus ponens (MP). Equally, suppose you learn that I did not take the bus, and conclude that it is not raining. This, again, is a valid inference, called the modus tollens (MT). Now suppose you learn that I took the bus and conclude that it must be raining. This is called affirming the consequent (AC) and is usually regarded as a fallacy (I might have taken the bus even though it was sunny). Likewise, you might learn that it is not raining, and conclude that I did not take the bus. This is called denial of the antecedent (DA), and is again regarded as an error (because again, I might take the bus in the sunshine).
Studies of this kind of reasoning task often use abstract constructions such as “if p then q” to minimize any contribution of background knowledge/beliefs about the likely truth of different possible conclusions. Schroyens et al. (2001) collated data from many experiments and examined the frequency with which people made each of the four types of inference, as follows (with reminders of how these would apply to the “if it is raining…” example in parentheses).
| Modus ponens (MP): It is raining. Conclude I took the bus. | Denial of the antecedent (DA): It is not raining. Conclude I did not take the bus. | Affirmation of the consequent (AC): I took the bus. Conclude it is raining. | Modus tollens (MT): I did not take the bus. Conclude it is not raining. |
| --- | --- | --- | --- |
| 96.8% | 56.0% | 64.0% | 74.2% |
Clearly, people frequently commit the “fallacies” of denying the antecedent and affirming the consequent.
Related to propositional reasoning is a famous task – the four-card selection task – developed by Wason (1968). Wason laid out four cards in front of participants and told them that each card has a number on one side and a letter on the other. The cards were like this:
D K 3 7
They were also given the conditional sentence:
“If there is a D on one side, then there is a 3 on the other side.”
The Experimenter pointed to each card in turn and asked whether knowing what was on the other side would allow him to find out whether the rule was true. (In more recent versions, the participant is simply asked to indicate only those cards that they’d need to turn over to see whether the rule is true or false).
Turning over the D card is a fairly obvious step; if there was anything other than a 3 on the reverse, it would mean the rule was false, so this is a “correct” card choice. Typically, a large proportion of participants also choose the 3 card, presumably thinking that there should be a D on the other side. However, this is not the “right” answer; the rule doesn’t say that there has to be a D on the other side of every 3, so this card is irrelevant. The correct choice is the 7, because if there is a D on the other side of that card then the rule is false.
However, in Wason’s original study, only 1 out of 34 people chose D and 7. So people’s reasoning deviates from formal logic.
More generally, the rule that participants have to test can be phrased as:
“If P then Q”
and the cards as:
| P | not-P | Q | not-Q |
| --- | --- | --- | --- |
| 89% | 16% | 62% | 25% |
with the correct choice being P and not-Q. The numbers below each option are the proportions of participants who selected each card, collapsing across a large number of experiments using the task (Oaksford and Chater, 1994). Clearly, the “failure” to choose the P,not-Q combination is widespread.
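One way to see why only the P and not-Q cards matter is to ask, for each card, whether any possible hidden face could make the rule false. Below is a minimal sketch, restricting the possible hidden faces to the letters and numbers in the example (the logic is unchanged with a fuller alphabet).

```python
def rule_holds(letter, number):
    """The rule under test: if the letter is D, then the number must be 3."""
    return letter != "D" or number == 3

LETTERS = ["D", "K"]  # possible letter faces (restricted for illustration)
NUMBERS = [3, 7]      # possible number faces

def worth_turning(visible_face):
    """A card is informative only if some possible hidden face would falsify the rule."""
    if visible_face in LETTERS:
        return any(not rule_holds(visible_face, n) for n in NUMBERS)
    return any(not rule_holds(l, visible_face) for l in LETTERS)

for card in ["D", "K", 3, 7]:
    print(card, worth_turning(card))
# D True, K False, 3 False, 7 True: only the P and not-Q cards can reveal a violation
```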
Early theorizing attributed this to a confirmation bias – a tendency to seek evidence that the rule is true rather than trying to falsify it. Indeed, the task was seen as relevant to broader debates about how scientists can and should approach theory testing. However, this idea was quickly shown to be inadequate (see below).
Again, we can consider four approaches to understanding performance on these kinds of reasoning problems.
Approach 1: Identify simplifying strategies
Just as for syllogistic reasoning, one approach to propositional reasoning involves identifying the simplifying strategies (heuristics) that people sometimes use to reach a solution. For the 4-card selection task, one potential strategy was uncovered by Evans and Lynch (1973), who varied the conditional rule that participants had to test. Examples, along with the proportion of people selecting each card, are shown below:
| Rule | S | 9 | G | 4 |
| --- | --- | --- | --- | --- |
| If there is an S on one side, then there will be a 9 on the other side (If P then Q) | 88% | 50% | 8% | 33% |
| If there is an S on one side, then there will not be a 9 on the other side (If P then not-Q) | 92% | 58% | 4% | 8% |
Confirmatory testing should lead people in the second row to select S and 4 (i.e., to seek instances where the rule is true), but in fact participants selected the cards which were mentioned in the rule (S and 9). In the case of “if P then not-Q”, this actually leads to the logically correct response!
Thus, simply choosing items that are explicitly mentioned in the problem statement – a “matching heuristic” – might be one simplifying strategy when faced with this kind of task.
Approach 2: Focus on interpretation of the terms
As for syllogistic reasoning, propositional reasoning “errors” may often reflect the participant’s interpretation of the terms. For example, in the selection task: “If there is a D on one side…” might be taken to mean “If there is a D on the top of the card [i.e., on the part I can see]…”. In this case, the only card you need to turn over to check the rule is the “D” card.
Likewise, some people might take “If” to be biconditional – i.e., to mean “If and only if…”, so that “If P, then Q” also means “If Q, then P”. Under this interpretation of “If”, the participant would need to turn over all 4 cards – or just the “P” and “Q” cards if they think the rule applies to the visible faces of the cards.
Gebauer and Laming (1997) argued that the common selection of “P” and “Q” results from just this pattern of understanding of the rule (see also Osman & Laming, 2001). You can see how the biconditional interpretation of “If” would also lead to the Affirming the Consequent and Denial of the Antecedent fallacies.
Approach 3: Posit a sequence of processing steps – the “Mental Models” framework
Heuristic responding (e.g., the matching rule) and interpretational confusion don’t capture everything about people’s conditional inferences. For example, why is Modus Tollens harder than Modus Ponens? The error rates for these should be the same no matter how people interpret “If”, yet MT is reliably more difficult. Likewise, those accounts don’t describe what actually happens when people engage in reasoning.
Johnson-Laird and colleagues have applied their mental models approach to propositional reasoning, too.
For example, suppose you are given the conditional proposition:
If there is a Circle, then there is a Triangle
The idea is that this leads to an initial model which just relates the items explicitly mentioned in the conditional rule:
Circle Triangle
…
As before, each row represents a conjunction of items (a circle paired with a triangle). The categorical premise “There is a Circle” then leads easily to the conclusion “There is a Triangle” (the valid, Modus Ponens argument). Likewise, the initial model leads to the Affirmation of the Consequent (AC) fallacy: from the model, the premise “There is a Triangle” leads to the conclusion “There is a Circle”.
In contrast, being told “There is not a Triangle” leads to no inference because this initial model does not include any representation of “No Triangle” cases. As we saw above, people do often fail to draw the valid, Modus Tollens inference that “There is not a Circle”.
The AC fallacy and the failure to draw the valid MT inference will both be avoided if we expend the mental effort to “flesh out” the other mental models that are consistent with the information in the conditional by constructing models that explicitly incorporate “No circles” and “No triangles”:
Circle Triangle
No Circle No Triangle
No Circle Triangle
With this fully explicit set of possibilities, the “No Triangle” premise leaves us with only one model – that in which there is “No Circle” (i.e., we draw the Modus Tollens inference). Likewise, we avoid affirming the consequent – given the premise “There is a Triangle”, inspecting the models reveals that the presence of a Triangle does not permit a conclusion about the presence or absence of a Circle.
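For concreteness, here is a toy formalization of the fully explicit models (my own sketch rather than Johnson-Laird’s implementation), showing how the same inspection procedure yields Modus Ponens and Modus Tollens while blocking the AC fallacy:

```python
# Toy sketch of "fully explicit models" for "If Circle then Triangle".
# Each model is a possibility: (circle_present, triangle_present).

explicit_models = [
    (True,  True),    # Circle     Triangle
    (False, False),   # No Circle  No Triangle
    (False, True),    # No Circle  Triangle
]

def conclude(models, premise):
    """Keep the models consistent with the categorical premise, then see
    what (if anything) follows about the other item."""
    key, value = premise
    idx = 0 if key == "circle" else 1
    other = 1 - idx
    remaining = [m for m in models if m[idx] == value]
    other_values = {m[other] for m in remaining}
    return other_values.pop() if len(other_values) == 1 else "no conclusion"

print(conclude(explicit_models, ("circle", True)))     # True  -> "There is a Triangle" (Modus Ponens)
print(conclude(explicit_models, ("triangle", False)))  # False -> "There is no Circle" (Modus Tollens)
print(conclude(explicit_models, ("triangle", True)))   # "no conclusion" -> AC fallacy avoided

# For contrast, the initial model containing only the mentioned items:
initial_model = [(True, True)]
print(conclude(initial_model, ("triangle", True)))     # True -> wrongly "There is a Circle" (AC fallacy)
print(conclude(initial_model, ("triangle", False)))    # "no conclusion" -> valid MT inference missed
```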
The same kind of approach can be used for Wason’s selection task (see e.g., Ragni et al., 2018). Just as for syllogistic reasoning, there is disagreement about the adequacy of the mental models approach (e.g., Baratgin et al., 2015).
Approach 4: Consider the role of function and experience
As before, framing and experience shape people’s responses in propositional reasoning tasks, and these effects illuminate the underlying mental processes.
Focusing on the selection task: one early observation was that performance improves when familiar materials are used; in particular, the task seems to be easier when it is cast in terms of familiar social rules rather than abstract symbols.
For example, Griggs and Cox (1982) asked people to imagine that they are police officers responsible for ensuring that people conform to the rule “If a person is drinking beer, then the person must be over 19 years of age”. Each card represented a person, with their age on one side and what they are drinking on the other; the task was to “select the card or cards that you definitely need to turn over to determine whether or not people are violating the rule”.
The cards were:
Drinking a Beer Drinking a Coke Age 22 Age 16
The same participants also completed a standard, abstract version of the task with the rule “If a card has an ‘A’ on one side, then it has a ‘3’ on the other side” and the cards A, B, 2, and 3. (Task order was counterbalanced.)
* With the Abstract version of the task, no participant selected the P,not-Q combination (i.e., no-one chose A and 2).
* With the thematic version, it is much easier to see that we need to turn over the P and not-Q cards (i.e., “Drinking a Beer” and “Age 16”) to check whether the rule has been violated; 29 out of 40 participants responded correctly.
Memory cueing?
Why did the change of format make the task so much easier? The task has subtly changed from reasoning about the tests needed to establish the truth/falsity of a proposition to identifying cases where a rule has been violated. Reasoning about obligations and permissible behaviours is called deontic reasoning, and is arguably a different type of thinking from that required by the abstract selection task.
One explanation for superior performance with the deontic version of the task is that people have prior experience with the rule in question; Griggs and Cox (1982) interpreted their results as reflecting retrieval of previous experience with the rule (and with instances that would violate it).
Support for this idea comes from cross-cultural studies in which thematic framing only improved performance for participants whose country has a rule of that kind (e.g., Cheng & Holyoak, 1985).
Cheater detection?
Familiarity with the materials cannot be the whole explanation for superior performance with modified versions of the selection task, however, because we also see improvements with rules that are completely novel.
For example, Cosmides (1989) administered versions of the selection task that involved a fictional Polynesian tribe (the Kaluame) and rules such as:
“If a man eats cassava root, then he must have a tattoo on his face”
with cards representing 4 men:
Tattoo No tattoo Eats cassava Eats molo nuts
The task is to see if the rule is being broken. In one condition, the rule was framed as a simple description of co-occurrence observed by an anthropologist. The proportion of participants who chose the “P, not-Q” combination (i.e., “Eats cassava” and “No tattoo”) was only 21%, similar to the proportion seen with abstract materials. However, when the same rule was framed as a social contract (cassava root being an aphrodisiac that the tribe’s Elders decree should be limited to married men, who are distinguished by having facial tattoos), the proportion of people who chose P not-Q rocketed to 75%.
Cosmides argued that humans have an evolved sensitivity to violations of social contracts, which can be thought of as conditionals of the form “If you take a benefit, then you pay a cost”.
In a subsequent test, Cosmides employed a “switched” version of the social contract rule:
“If a man has a tattoo on his face, then he eats cassava root”
Tattoo No tattoo Eats cassava Eats molo nuts
The logically-correct answer is now to turn over the “Tattoo” and “Eats molo nuts” cards, but only 4% of people did this. However, if people are still on the hunt for cheats, they will focus on the “no tattoo” and “eats cassava” cards. 67% of people did this.
Cosmides took these findings as evidence for evolved “social contract algorithms” which underlie human reasoning.
Although very famous, the evolved cheater-detection idea is deeply flawed. One basic problem is that we see facilitation of performance on the selection task (i.e., increased P, not-Q selections) with rules which cannot realistically be described as “If you take a benefit, then you pay a cost”. For example, Manktelow (p.84) reports a study in which the rule was “If you clear up spilt blood, then you must wear rubber gloves”. Approximately 75% of participants correctly chose the P, not-Q cards (i.e., Clearing up Blood and Not Wearing Gloves) with this framing, but “clearing up spilt blood” doesn’t constitute a “benefit” from a social contract!
Relevance and Utility
A more general approach posits that choices in the selection task depend on the relevance or utility of the various cards to the question that participants think they are being asked. There are various versions of this basic idea, but the general framework can be related to some of the foregoing findings:
* The items mentioned in the rule are likely to seem particularly relevant and, as we saw above, the matching bias suggests that people select these irrespective of the rule they are being asked to test
* In the “social contract” versions of the task studied by Cosmides, there is high utility attached to finding a cheat. Likewise, there is value in identifying a nurse who doesn’t wear gloves when clearing up blood
Girotto et al. (2001) provide evidence that it is the perceived relevance/value of the options, rather than the detection of rule-violations, which determines choices in the selection task. Participants took the role of employees in a travel agency, and went through four successive versions of the Selection Task, as follows:
* “True descriptive”. It is 1979. A customer would like to travel to East Africa but is allergic to the cholera immunization. You seek to show the customer that there is a rule that “If a person travels to any East African country, then that person must be immunized against cholera”. There are four cards representing countries and the vaccines they require: Somalia; Sweden; Requires Cholera; Requires None. Which cards do you need to turn over to find out whether the rule is true? 65% of people chose the P,Q combination (Somalia and Cholera); only 9% chose P,not-Q (Somalia, None). This replicated the usual finding of poor performance on the selection task
* “True deontic”. Now your boss asks you to check whether customers have followed the rule. The four cards represent travellers and their immunization status: Mr Neri, Ethiopia; Mr Verdi, Canada; Immunized, Cholera; Immunized, None. Which cards do you need to turn over to see if people have followed the rule? Now only 26% of participants made the P,Q selection (Neri, Cholera), whereas 62% chose P,not-Q (Neri, None)
At this point it looks like framing the task as one of rule-violation boosts the selection of the logically-correct P,not-Q combination, as predicted by theories that argue for specialized systems for deontic reasoning/cheater detection.
But the experiment continued…
* “False descriptive”. Now it is the present day; you are thinking about going to East Africa yourself and are allergic to the cholera immunization. However, you think that the immunization is no longer required. Your boss disagrees. You are confronted with cards representing countries and their required immunizations: Kenya; Ireland; Requires Cholera; Requires None. Which do you have to turn over to see whether it is true that “If a person travels to any East African country, then that person must be immunized against cholera”? Now 15% chose P,Q (Kenya, Cholera) and 47% chose P,not-Q (Kenya, None)
Although this is not a situation where we are invited to detect rule-breakers, people have been led to the correct P,not-Q selection by the perceived relevance of those options – the framing leads one to regard it as important to establish that the boss is wrong.
The final condition was:
* “False deontic”. It turns out that you were right and that the rule is no longer applicable. Your boss is worried that she may have mis-informed customers, and asks you to check client records to see whether customers have followed the rule. The four cards are: Mr Rossi, Eritrea; Mr Bianco, France; Immunized, Cholera; Immunized, None. 71% chose P,Q (Rossi, Cholera) and only 15% chose P,not-Q (Rossi, None).
Even though the framing is now deontic, people are making the classic selection-task error of choosing P and Q. Why? Because the wording of the task gives great relevance to the possibility that people might have followed a rule unnecessarily (and might therefore sue the company, for example).
These effects reiterate that we have come a long way from studying “pure” propositional reasoning. Indeed, Sperber and Girotto (2002, p.277) argue:
“relevance-guided comprehension processes tend to determine participants’ performance and pre-empt the use of other inferential capacities. Because of this, the value of the selection task as a tool for studying human inference has been grossly overestimated.”
Integrating approaches
The approaches discussed above are not mutually exclusive. Different people will employ different strategies (e.g., heuristic responding vs a rigorous attempt to construct mental models) at different times. Furthermore, the effects of experience and relevance discussed in the preceding section are not incompatible with accounts that emphasize interpretation of terms or the construction of mental models. As just two examples:
* The interpretation of “If” as conditional or biconditional can depend on the content of the rule. For example, Wagner-Egger (2007) used two deontic versions of the 4-card task and probed not just participants’ card selections, but also their understanding of the rule (by asking what would have to be on the reverse of each card, assuming the rule is true). In one version, the rule was “If a customer is drinking beer, then he/she must be over 18 years of age”. Most participants interpreted “If” as a conditional (i.e., you might be over 18 and not drink beer!) and made the “correct” P,not-Q (“beer”, “under 18”) card selection – as in the studies described above. In another version, the rule was “If a customer spends more than 100 Swiss Francs, then he/she receives a free gift”. Now most participants interpreted the rule biconditionally (i.e., as meaning “if and only if you spend the money do you get the gift”), and the P,not-Q selection was less frequent than turning over all 4 cards (as one should, if one has adopted the biconditional interpretation of “If”).
* In the Mental Models framework, past knowledge affects the ease with which a full set of models will be fleshed-out. For example, “If it was foggy, then the match was cancelled. The match was not cancelled.” This readily leads to the usually-difficult Modus Tollens conclusion (“It was not foggy”) because existing knowledge of the fog-sports relationship means we readily flesh out the full set of mental models supported by the conditional “If, then” statement.
* Likewise, Mental Models theory can accommodate the effects of relevance and of differing interpretations of the premises (e.g., conditional vs biconditional), by assuming that these factors influence the formulation of the “initial” model and the fully explicit set of models (see Ragni et al., 2018).
The approaches and ideas discussed in this lecture are therefore not necessarily in opposition to one another.
The “New Paradigm”
In the past few years, there has emerged a new way of thinking about the kinds of tasks and effects that we have discussed. The core idea of this “new paradigm” is that the way humans approach such tasks rests upon the calculus of probability rather than the calculus of logic. Conventional logic requires us to accept statements such as “All A are B” or “If A, then B” as absolutely true, or to reject them as absolutely false. In real discourse, such statements involve degrees of probability or belief. For example, we might attach a high probability to the conditional: “If Abdul is a Baker, then he likes Cakes” – e.g., this statement might be judged to be likely to be true, but not certain (after all, nothing ever is). We can also assign some degree of belief to Abdul being a Baker and to him liking Cakes. If someone now asserts that “Abdul is a Baker”, we can update all of these beliefs – in Abdul’s status as a Baker, in his probable liking for Cakes, and potentially in the truth of the conditional “If, then” statement, too. There is an extensive and complex body of work in this area, which is beyond the scope of the current lectures – but the suggested reading gives an introduction to this approach.
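A minimal numerical illustration of this probabilistic reading, using invented probabilities for the Abdul example, is given below:

```python
# Minimal numerical illustration (the probabilities are invented for the
# example): treat "If Abdul is a Baker, then he likes Cakes" as a conditional
# probability rather than a logical rule.

p_baker = 0.30                      # prior belief that Abdul is a Baker
p_cakes_given_baker = 0.90          # degree of belief in the conditional
p_cakes_given_not_baker = 0.40      # liking cakes even if he is not a Baker

# Prior belief that Abdul likes Cakes (law of total probability)
p_cakes = p_baker * p_cakes_given_baker + (1 - p_baker) * p_cakes_given_not_baker
print(round(p_cakes, 2))            # 0.55

# Someone now asserts "Abdul is a Baker" and we accept it as certain:
p_cakes_updated = p_cakes_given_baker
print(p_cakes_updated)              # 0.90 - belief in "He likes Cakes" rises
```

The point is simply that learning “Abdul is a Baker” raises the probability attached to “He likes Cakes” without anything being treated as absolutely true or false.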
Conclusions for deductive reasoning
Reasoning is a huge topic. Nonetheless, general principles emerge:
1. Human reasoning often deviates from formal logic
2. We can identify simplifying strategies that capture some aspects of performance
3. People often interpret the terms of reasoning problems differently from the intended meaning of the experimenter, but otherwise reason appropriately
4. We can try to develop detailed accounts of the steps by which people reason; Mental Models theory is one prominent example
5. Responses are greatly influenced by the framing of the problem, the participant’s background knowledge, and the way that they interpret the task
6. In fact, these contextual and interpretational effects mean that, in many cases, our “reasoning” experiments may not be studying the types of reasoning that the experiments originally intended at all!
7. Nonetheless, examining the patterns of performance across multiple versions of the tasks can illuminate the mental processes that underlie performance on these tasks and, more generally, tell us something about how people “think” when they tackle complex problems
Lecture 3: Risky Choice
This lecture will cover some of the key findings and theories in the cognitive psychology of risky choice, focusing on “decisions from description”. We will examine:
* How typical behaviour deviates from the prescriptions of conventional theories of rational choice
* How these results led to Prospect Theory as a model of human decisions under risk
* Some of the empirical and conceptual problems/limitations of Prospect theory
Introduction
A decision is a choice between alternatives that is intended to produce a desired or favourable outcome.
Types of choices
multi-attribute choice
Most decisions involve multi-attribute choice – one must select between 2 or more options that differ in 2 or more attributes (e.g., choosing between 3 phones that differ in price, screen size, and battery life).
inter-temporal choice
In inter-temporal choice, one of the attributes that varies is time (e.g., would you rather receive £10 right now, or £25 one year from today?). People often discount future outcomes heavily, and discount rates vary considerably across individuals, so choices in scenarios like this can reveal how far ahead a person is willing to plan.
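As a rough illustration (assuming exponential discounting, which is only one of several candidate forms), the following sketch computes the annual discount rate at which someone would be indifferent between the two amounts in the example above:

```python
# Simple illustration (assuming exponential discounting; behaviour is often
# better described as hyperbolic): at what annual discount rate is someone
# indifferent between £10 now and £25 in one year?

immediate, delayed = 10.0, 25.0

# Present value of the delayed option after one year: delayed / (1 + r)
# Indifference when immediate == delayed / (1 + r), i.e. r = delayed/immediate - 1
indifference_rate = delayed / immediate - 1
print(indifference_rate)   # 1.5 -> only someone discounting at >150% per year prefers the £10 now
```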
risky choice
In risky choice, one or more of the possible outcomes are probabilistic (i.e., they are not certain to occur). Sometimes the probabilities are not known precisely, in which case the decision may be referred to as “under uncertainty” or “under ambiguity” (there is some disagreement about terminology; some authors use “risk” and “uncertainty” interchangeably to mean situations whose outcomes are probabilistic.)
Sometimes risky choices are made “from description” (information about the options is explicitly presented – e.g., in writing); other choices are made “from experience” (the decision-maker has to learn the outcomes and their probabilities by repeatedly sampling the environment). There are important differences between these two kinds of task (see e.g., Wulff et al., 2018).
The archetypal paradigm for studying risky choice involves presenting a choice between gambles, such as:
A. An 80% chance of £4000 (and a 20% chance of nothing)
B. £3000 for sure
Would you rather play A or B?
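For reference, the expected monetary values of these two options can be computed directly; this is simple arithmetic rather than a claim about what people actually choose:

```python
# Expected monetary values of the two options (simple arithmetic, not a
# model of choice):
ev_a = 0.80 * 4000 + 0.20 * 0   # risky option A
ev_b = 1.00 * 3000              # sure option B
print(ev_a, ev_b)               # 3200.0 3000.0 - the gamble has the higher expected value
```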
The role of this type of choice task in studies of decision-making has been compared to the role of Drosophila in genetics, and it has led to sophisticated accounts of how people evaluate and choose between risky prospects.