Flashcards in Learning and rational choice Deck (47):
What is Monty Hall's three door problem?
A game-show problem in which contestants face three doors, one of which hides a car while the other two hide booby prizes. The contestant chooses one door, then Monty Hall opens one of the unchosen doors to reveal a booby prize; the contestant then has the choice to stick with their original door or switch.
What strategy is it better to follow in the Monty Hall problem?
It is always better to switch.
What do most people choose to do in the Monty Hall problem?
Stick - the solution is counter-intuitive and even academic statisticians have difficulty with it.
Explain the probabilities involved in the Monty Hall problem.
At the start contestants have a 1/3 chance of choosing the right door and a 2/3 chance of getting it wrong. Because the door Monty Hall opens never hides the car (he knowingly opens a losing door), the remaining unchosen door carries the full 2/3 probability of revealing the car.
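The 1/3 vs 2/3 split can be checked empirically. A minimal simulation sketch (door indices and the deterministic tie-break for Monty's choice are illustrative assumptions, not from the original problem statement):

```python
import random

def play(switch, trials=100_000):
    """Simulate the Monty Hall game; return the win rate for one strategy."""
    wins = 0
    for _ in range(trials):
        car = random.randrange(3)      # door hiding the car
        choice = random.randrange(3)   # contestant's first pick
        # Monty opens a door that is neither the contestant's pick nor the car
        opened = next(d for d in range(3) if d != choice and d != car)
        if switch:
            # move to the one remaining unopened, unchosen door
            choice = next(d for d in range(3) if d != choice and d != opened)
        wins += (choice == car)
    return wins / trials

print(play(switch=True))   # close to 2/3
print(play(switch=False))  # close to 1/3
```

Running this shows switching winning roughly twice as often as sticking, matching the 2/3 vs 1/3 analysis above.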
What did Granberg (1999) and Granberg (1995) find?
That the Monty Hall dilemma is difficult - around 80-90% of participants stick.
What did Aaron (1998) state?
That people stick because the Monty Hall dilemma presents a cognitive illusion in which participants believe that the odds of winning the car by either sticking or switching are 50:50.
What did Granberg (1995) do and find?
Asked participants how they'd feel in hypothetical examples of the Monty Hall game, found that they reported they'd feel worse if they had switched and lost than if they stayed with their choice and lost.
What do Granberg (1995)'s findings suggest?
That regret theory/status quo bias may be partly to blame for people being more likely to stick on the Monty Hall game.
What did Gilovich (1995) do and find?
Asked participants to rate the value of the booby prize in the Monty Hall game, and found that participants who switched assigned it a higher monetary value than those who stuck. This suggests that the expected utility of making the wrong choice differs according to whether the error is one of commission or omission.
What implications are there of Gilovich (1995)'s findings?
As the EU of making the wrong choice differs according to whether the error is one of commission or omission, there might be a framing effect.
What is the Russian Roulette Dilemma?
The counterpart to Monty Hall, in which one door conceals a (terminal) loss and the other doors don't.
What is the optimal strategy in the Russian Roulette dilemma?
To stick to avoid the loss.
What do participants tend to do in the Russian Roulette dilemma?
They make the sub-optimal choice and switch, although they are less ready to switch in this game than they are to stick in the Monty Hall game.
If people possess the normative processes (cognitive architecture) that would allow them to make the optimal choice, what does this imply for the Monty Hall game?
If this is true, then when they're permitted to play the game on successive occasions for real rewards they should learn to switch.
What did Friedman (1998) do?
An experiment on learning to switch in the Monty Hall game with two parts.
What was part 1 of Friedman (1998)'s experiment, and what did it find?
Participants played 10 rounds of the game and received $0.40 for the grand prize and $0.10 for the booby prize. Found that the proportion of participants who switched increased from less than 10% at the start to around 30% at the end (only 6 switched more than half the time).
What was part 2 of Friedman (1998)'s experiment?
Participants received one of four treatments:
- Incentives group: received larger financial rewards and penalties (gain of $1.00 for grand and loss of $0.50 for booby)
- Track record group: participants were required to record the outcome of each round they played, along with the outcomes the always-stick and always-switch strategies would have produced
- Advice group: received conflicting explanations about why switching/sticking was the best
- Compare group: shown the results of the first 40 participants, which indicated that approx. 60% of switch choices and approx. 30% of stick choices won grand prizes.
What did part 2 of Friedman (1998)'s experiment find?
- Each of the treatments led to a steady increase in the number of switch choices
- They rose from 40% at the start to 53% at the end
What do Friedman (1998)'s findings suggest?
- Although in none of the treatments did the switch rate approach 100%, the trends suggest that, given a sufficient number of rounds (approx. 500), the normative benchmark would have been reached.
- Practice enables learning of outcomes for strategies, as shown by the increased switch rate.
What is a problem with Friedman (1998)'s conclusions?
The idea that the normative benchmark would have been reached is statistically dubious, as shown by probability matching experiments with many trials - learning seems to approach but not reach an asymptote.
What are choice anomalies?
Situations in which humans consistently make the sub-optimal decision.
What did Friedman (1998) state about choice anomalies?
Humans possess the cognitive architecture to be rational and may be bad at probabilities due to having evolved an architecture unsuitable to the modern environment, but we LEARN (optimise under constraints). Consequently, "every choice 'anomaly' can be greatly diminished or entirely eliminated in appropriately structured learning environments."
What are bandit problems?
Problems used as evidence for human irrationality and prospect theory in which people consistently fail to maximise expected utility.
Give some examples of bandit problems.
- Social dilemma (tragedy of the commons)
- Addictive behaviours
- Probability matching
What did Arrow (1958) state about bandit problems?
That in bandit problems individuals don't reach the optimal (asymptotic) behaviour even after an indefinitely large amount of learning.
What did Tversky & Edwards (1966) do?
An experiment in which judges had to predict which of two lights would turn on next.
What did Tversky & Edwards (1966) find?
Judges probability matched - if the left light turned on 70% of the time, judges predicted the left light on 70% of trials and the right light on 30%.
Why is probability matching a suboptimal choice in Tversky & Edwards (1966)'s experiment?
Because judges' predictions are independent of which light actually turns on, matching the probabilities of the events leads to correctly predicting the left light (.7 × .7) = .49 of the time and the right light (.3 × .3) = .09 of the time, for an overall accuracy of .58, compared to the optimal strategy of consistently picking the left light (.7 × 1 = .7).
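The accuracy comparison above is a two-line calculation. A worked version (variable names are illustrative):

```python
p = 0.7  # probability that the left light turns on

# Probability matching: predict "left" with probability p and "right" with 1 - p.
# Predictions are independent of the outcome, so overall accuracy is:
matching_accuracy = p * p + (1 - p) * (1 - p)   # 0.49 + 0.09 = 0.58

# Maximising: always predict the more frequent light.
maximising_accuracy = p                          # 0.70

print(matching_accuracy, maximising_accuracy)
```

The same formula shows the gap grows with any bias short of certainty: matching only ties maximising when p is 0.5 or 1.0.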
What did the judges in Tversky & Edwards (1966) report as the best strategy?
To observe the frequencies of each light (i.e. calculating reinforcements) and then make predictions that match those frequencies (which is what most did) - probability matching.
What did Neimark and Shuford (1959) find?
In trials using ten-card blocks of red and black cards, participants probability matched, then improved over time but didn't reach the asymptote.
What is a problem with Neimark and Shuford (1959)'s experiment?
The trial outcomes weren't independent, because the cards were randomised without replacement.
What does the 'right experiment' need in an appropriate learning environment?
- Schedules are randomised with replacement
- Extended and distributed training
Describe a structured environment in a learning experiment.
- Financial incentives
Describe an unstructured environment in a learning experiment.
- No feedback or financial incentives
What did Shanks and Tunney (2002) do?
Measured the mean proportion of maximising responses for flashing lights over 1500 trials, in order to compare the effects of payoff and feedback.
What did Shanks and Tunney (2002) find?
Maximising response proportions increased to about 0.9, highest for the payoff+feedback group, with the lowest for the no payoff+no feedback group.
What conclusions can be made from research into probability matching?
- Probability matching can be made to disappear in appropriately structured learning environments (i.e. randomised schedules with replacement, many trials, feedback and financial incentives).
- In this respect human decision making is rational (optimised under constraints).
What is melioration?
The fact that in a choice situation when two variable reinforcement schedules are imposed, organisms match their relative response rate to the relative reinforcement rate. This happens because choice responding occurs to the alternative with the higher local rate of reinforcement, and that local reinforcement rate itself changes continuously as a function of the behavioural allocation.
What was melioration theory derived from?
The Law of Effect - melioration provides a process-level account of the Matching Law.
What did Herrnstein (1990) state about melioration?
“A rise or fall in the reinforcement of a response causes the rate of occurrence of the response to change in the same direction. Should there be an inequality in unit returns from two alternatives, behaviour ‘meliorates', redistributing itself toward the more lucrative, hence stronger, alternative”
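The matching relation that melioration converges to can be stated numerically. A minimal sketch, assuming the standard Matching Law form B1/(B1+B2) = R1/(R1+R2) (the function name and example rates are illustrative):

```python
def matching_allocation(r1, r2):
    """Matching Law: the proportion of responses to alternative 1 equals
    its proportion of obtained reinforcement, B1/(B1+B2) = R1/(R1+R2)."""
    return r1 / (r1 + r2)

# If alternative 1 delivers reinforcement at twice the rate of alternative 2,
# matching predicts 2/3 of responses go to alternative 1.
print(matching_allocation(40, 20))
```

Melioration explains how behaviour reaches this allocation: responding shifts toward whichever alternative currently has the higher local reinforcement rate, and that shifting stops only when the local rates are equal, i.e. at matching.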
What did Herrnstein et al. (1993) find?
Participants favoured a smaller, sooner reward over a larger, later one. This has important applications for addiction.
What have experiments on variable magnitude schedules found?
Participants learn over time to choose the rational response, which supports RCT.
What varies in a) variable magnitude and b) variable ratio schedules?
a) The size of the reward.
b) The probability of a fixed-size outcome.
What has research found that participants tend to do in variable ratio schedule experiments?
Make the irrational choice.
Are maximising responses higher in variable magnitude or variable ratio schedules?
Variable magnitude schedules - participants learn the rational response there, whereas in variable ratio schedules they tend to make the irrational choice.
What can be concluded from research into melioration and variable schedules?
Melioration can be made to disappear in appropriately structured environments, which provides further support for the rationality of human decision making.