Learning and rational choice Flashcards Preview

Flashcards in Learning and rational choice Deck (47):

What is Monty Hall's three door problem?

A problem from a game show in which contestants choose between three doors, one of which hides a car while the other two hide booby prizes. The contestant chooses one door, and Monty Hall then opens one of the unchosen doors to reveal a booby prize. The contestant then decides whether to stick with the original choice or switch to the other unopened door.


What strategy is it better to follow in the Monty Hall problem?

It is always better to switch: switching wins the car with probability 2/3, compared with 1/3 for sticking.


What do most people choose to do in the Monty Hall problem?

Stick - the solution is counter-intuitive and even academic statisticians have difficulty with it.


Explain the probabilities involved in the Monty Hall problem.

At the start, contestants have a 1/3 chance of choosing the right door and a 2/3 chance of getting it wrong. Monty Hall has no free choice: he must open an unchosen door that hides a booby prize, so the door he opens has a 0/3 chance of being right. The full 2/3 chance that the first choice was wrong therefore falls on the remaining unopened door.
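The 1/3 versus 2/3 split can be checked by simulation. The sketch below is illustrative only: the function names, the number of rounds and the fixed seed are assumptions, and it assumes Monty picks at random when both unchosen doors hide booby prizes.

```python
import random


def play_round(switch, rng):
    """Play one round; return True if the contestant wins the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    first_pick = rng.choice(doors)
    # Monty opens an unchosen door that hides a booby prize.
    opened = rng.choice([d for d in doors if d != first_pick and d != car])
    if switch:
        # Move to the one door that is neither the first pick nor the opened door.
        final_pick = next(d for d in doors if d not in (first_pick, opened))
    else:
        final_pick = first_pick
    return final_pick == car


def win_rate(switch, rounds=100_000, seed=0):
    """Estimate the probability of winning the car under one strategy."""
    rng = random.Random(seed)
    wins = sum(play_round(switch, rng) for _ in range(rounds))
    return wins / rounds


stick_rate = win_rate(switch=False)
switch_rate = win_rate(switch=True)
print(f"stick:  {stick_rate:.3f}")   # should be close to 1/3
print(f"switch: {switch_rate:.3f}")  # should be close to 2/3
```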


What did Granberg (1999) and Granberg (1995) find?

That the Monty Hall dilemma is difficult - around 80-90% of participants stick.


What did Aaron (1998) state?

That people stick because the Monty Hall dilemma presents a cognitive illusion in which participants believe that the odds of winning the car by either sticking or switching are 50:50.


What did Granberg (1995) do and find?

Asked participants how they'd feel in hypothetical examples of the Monty Hall game. Participants reported that they'd feel worse if they had switched and lost than if they had stayed with their choice and lost.


What do Granberg (1995)'s findings suggest?

That regret theory/status quo bias may be partly to blame for people being more likely to stick on the Monty Hall game.


What did Gilovich (1995) do and find?

Asked participants to rate the value of the booby prize in the Monty Hall game, and found that participants who switched assigned a higher monetary value than those who stuck. This suggests that the expected utility of making the wrong choice differs according to whether the error is one of commission or omission.


What implications are there of Gilovich (1995)'s findings?

As the EU of making the wrong choice differs according to whether the error is one of commission or omission, there might be a framing effect.


What is the Russian Roulette Dilemma?

The counterpart to Monty Hall, in which one door conceals a (terminal) loss and the other doors don't.


What is the optimal strategy in the Russian Roulette dilemma?

To stick to avoid the loss.
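Why sticking is optimal can be shown by enumerating every case exactly. This is a sketch under an assumption the card does not spell out: as in the Monty Hall game, the host opens an unchosen door that does not hide the loss. The function name is hypothetical.

```python
from fractions import Fraction
from itertools import product


def loss_probability(switch):
    """Exact probability of ending on the loss door under one strategy."""
    doors = (0, 1, 2)
    total = Fraction(0)
    # The loss door and the first pick are each uniform and independent.
    for loss_door, first_pick in product(doors, repeat=2):
        weight = Fraction(1, 9)
        # Assumption: the host only opens an unchosen door that is safe.
        safe_unchosen = [d for d in doors if d != first_pick and d != loss_door]
        for opened in safe_unchosen:
            p_open = Fraction(1, len(safe_unchosen))
            if switch:
                final = next(d for d in doors if d not in (first_pick, opened))
            else:
                final = first_pick
            if final == loss_door:
                total += weight * p_open
    return total


p_stick = loss_probability(switch=False)
p_switch = loss_probability(switch=True)
print(p_stick, p_switch)  # 1/3 2/3
```

Under that assumption the probabilities mirror the Monty Hall game: the door that switching leads to carries the 2/3 chance, but here that chance is of the loss rather than the car.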


What do participants tend to do in the Russian Roulette dilemma?

They make the sub-optimal choice and switch, although they are less ready to switch in this game than they are to stick in the Monty Hall game.


If people possess the normative processes (cognitive architecture) that would allow them to make the optimal choice, what does this imply for the Monty Hall game?

When they're permitted to play the game on successive occasions for real rewards, they should learn to switch.


What did Friedman (1998) do?

Ran a two-part experiment on learning to switch in the Monty Hall game.


What was part 1 of Friedman (1998)'s experiment, and what did it find?

Participants played 10 rounds of the game and received $0.40 for the grand prize and $0.10 for the booby prize. The proportion of participants who switched increased from less than 10% at the start to around 30% at the end (only 6 participants switched more than half the time).


What was part 2 of Friedman (1998)'s experiment?

Participants received one of four treatments:
- Incentives group: received larger financial rewards and penalties (gain of $1.00 for the grand prize and loss of $0.50 for the booby prize)
- Track record group: participants were required to record the outcome of each round of the game that they played, along with the outcomes that always sticking and always switching would have produced
- Advice group: received conflicting explanations about why switching/sticking was the best
- Compare group: shown the results of the first 40 participants and told that approx. 60% of switch choices and approx. 30% of stick choices won grand prizes.
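The payoffs stated for part 1 and for the incentives group imply the expected values per round sketched below. This assumes the normative win probabilities (2/3 for switching, 1/3 for sticking) and that every round pays either the grand-prize or the booby-prize amount; the names are hypothetical and amounts are in cents.

```python
from fractions import Fraction

# Normative win probabilities in the Monty Hall game.
P_WIN_SWITCH = Fraction(2, 3)
P_WIN_STICK = Fraction(1, 3)


def expected_payoff(p_win, grand, booby):
    """Expected payoff per round, in cents, for a given win probability."""
    return p_win * grand + (1 - p_win) * booby


# Payoffs in cents, taken from the cards; a loss is a negative amount.
schemes = {
    "part 1": {"grand": 40, "booby": 10},
    "incentives group": {"grand": 100, "booby": -50},
}

results = {
    name: {
        "switch": expected_payoff(P_WIN_SWITCH, **pay),
        "stick": expected_payoff(P_WIN_STICK, **pay),
    }
    for name, pay in schemes.items()
}

for name, ev in results.items():
    print(f"{name}: switch {ev['switch']} cents, stick {ev['stick']} cents")
# part 1: switch 30 cents, stick 20 cents
# incentives group: switch 50 cents, stick 0 cents
```

On these assumptions the incentives treatment widens the gap between the two strategies from 10 cents to 50 cents per round.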


What did part 2 of Friedman (1998)'s experiment find?

- Each of the treatments led to a steady increase in the number of switch choices
- Switch choices rose from 40% at the start to 53% at the end


What do Friedman (1998)'s findings suggest?

- Although the switch rate did not approach 100% in any of the treatments, trends suggest that given a sufficient number of rounds (approx. 500) the normative benchmark would've been reached.
- Practice enables learning of outcomes for strategies, as shown by the increased switch rate.


What is a problem with Friedman (1998)'s conclusions?

The idea that the normative benchmark would have been reached is statistically dubious, as shown by probability matching experiments with many trials - learning seems to approach but not reach an asymptote.


What are choice anomalies?

Situations in which humans consistently make the sub-optimal decision.


What did Friedman (1998) state about choice anomalies?

Humans possess the cognitive architecture to be rational and may be bad at probabilities due to having evolved an architecture unsuitable to the modern environment, but we LEARN (optimise under constraints). Consequently, "every choice 'anomaly' can be greatly diminished or entirely eliminated in appropriately structured learning environments."


What are bandit problems?

Problems used as evidence for human irrationality and prospect theory in which people consistently fail to maximise expected utility.


Give some examples of bandit problems.

- Social dilemma (tragedy of the commons)
- Addictive behaviours
- Probability matching
- Melioration


What did Arrow (1958) state about bandit problems?

Individuals don't reach the optimal behaviour (asymptotic) even after an indefinitely large amount of learning in bandit problems.


What did Tversky & Edwards (1966) do?

Ran an experiment in which judges had to predict which of two lights was going to turn on next.


What did Tversky & Edwards (1966) find?

Judges probability matched - if the left light turned on 70% of the time, judges predicted the left light on 70% of trials and the right light on 30% of trials.


Why is probability matching a suboptimal choice in Tversky & Edwards (1966)'s experiment?

The lights turn on independently of the judges' predictions, so a judge who probability matches correctly predicts the left light on 0.7 × 0.7 = 0.49 of trials and the right light on 0.3 × 0.3 = 0.09 of trials, an overall accuracy of 0.58. The optimal strategy of consistently picking the left light gives an accuracy of 0.7 × 1 = 0.7.
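The arithmetic above generalises to any event probability. A minimal sketch, with hypothetical function names, assuming predictions and outcomes are independent:

```python
def matching_accuracy(p):
    """Expected accuracy when each option is predicted as often as it occurs."""
    return p * p + (1 - p) * (1 - p)


def maximising_accuracy(p):
    """Expected accuracy when the more frequent option is always predicted."""
    return max(p, 1 - p)


p_left = 0.7
match_acc = matching_accuracy(p_left)
max_acc = maximising_accuracy(p_left)
print(f"matching:   {match_acc:.2f}")  # 0.58
print(f"maximising: {max_acc:.2f}")    # 0.70
```

Matching only equals maximising when p is 0.5 or when one light always turns on; everywhere in between, maximising is strictly more accurate.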


What did the judges in Tversky & Edwards (1966) report as the best strategy?

To observe the frequencies of each light (i.e. calculating reinforcements) and then make predictions that match those frequencies (which is what most did) - probability matching.


What did Neimark and Shuford (1959) find?

In trials using ten-card blocks of red and black cards, participants probability matched, then improved over time but didn't reach the asymptote.


What is a problem with Neimark and Shuford (1959)'s experiment?

The cards were randomised without replacement, so the trial probabilities weren't independent.
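Why sampling without replacement breaks independence can be seen in a small example. The block composition below (7 red, 3 black) is purely an assumption for illustration; the card above does not give the actual composition, and the function name is hypothetical.

```python
from fractions import Fraction


def p_next_red(reds_left, blacks_left):
    """Probability that the next card drawn from the remaining block is red."""
    return Fraction(reds_left, reds_left + blacks_left)


# Hypothetical ten-card block: 7 red, 3 black, drawn without replacement.
at_start = p_next_red(7, 3)
after_one_red = p_next_red(6, 3)
after_three_blacks = p_next_red(7, 0)

print(at_start)            # 7/10
print(after_one_red)       # 2/3
print(after_three_blacks)  # 1
```

With replacement the probability of red would stay at 7/10 on every trial. Without replacement it depends on the cards already seen, so a participant who tracks the block can do better than the base rate suggests.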


What does the 'right experiment' need in an appropriate learning environment?

- Schedules are randomised with replacement
- Extended and distributed training


Describe a structured environment in a learning experiment.

- Feedback
- Financial incentives


Describe an unstructured environment in a learning experiment.

- No feedback or financial incentives


What did Shanks and Tunney (2002) do?

Compared the mean proportion of maximising responses to flashing lights over 1500 trials, to assess the effects of payoff and feedback.


What did Shanks and Tunney (2002) find?

Maximising response proportions increased to about 0.9. They were highest for the payoff + feedback group and lowest for the no payoff + no feedback group.


What conclusions can be made from research into probability matching?

- Probability matching can be made to disappear in appropriately structured learning environments (i.e. randomised schedules with replacement, many trials, feedback and financial incentives).
- In this respect human decision making is rational (optimised under constraints).


What is melioration?

The finding that, when two variable reinforcement schedules are imposed in a choice situation, organisms match their relative response rate to the relative reinforcement rate. This happens because responding shifts to the alternative with the higher local rate of reinforcement, and that local reinforcement rate itself changes continuously as a function of how behaviour is allocated.
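The link between the local rates and matching can be written out as a short derivation. The symbols are assumptions, not taken from the deck: B_1 and B_2 are the responses allocated to the two alternatives, and R_1 and R_2 are the reinforcements obtained from them.

```latex
% Local reinforcement rates (reinforcement per response) on each alternative:
\[
  r_1 = \frac{R_1}{B_1}, \qquad r_2 = \frac{R_2}{B_2}
\]
% Behaviour shifts toward the alternative with the higher local rate,
% so allocation only stops changing when the local rates are equal:
\[
  \frac{R_1}{B_1} = \frac{R_2}{B_2}
  \;\Longrightarrow\;
  \frac{B_1}{B_2} = \frac{R_1}{R_2}
  \;\Longrightarrow\;
  \frac{B_1}{B_1 + B_2} = \frac{R_1}{R_1 + R_2}
\]
```

The last expression is the matching relation: the relative response rate equals the relative reinforcement rate.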


What was melioration theory derived from?

Law of Effect - it's a theory of the Matching Law.


What did Herrnstein (1990) state about melioration?

“A rise or fall in the reinforcement of a response causes the rate of occurrence of the response to change in the same direction. Should there be an inequality in unit returns from two alternatives, behaviour ‘meliorates’, redistributing itself toward the more lucrative, hence stronger, alternative”.


What did Herrnstein et al. (1993) find?

Participants favoured a smaller, sooner reward over a larger, later one. This has important applications for addiction.


What have experiments on variable magnitude schedules found?

Participants learn over time to choose the rational response, which supports rational choice theory (RCT).


What varies in a) variable magnitude and b) variable ratio schedules?

a) The size of the reward.
b) The probability of a fixed outcome.


What has research found that participants tend to do in variable ratio schedule experiments?

Make the irrational choice.


Are maximising responses higher in variable magnitude or variable ratio schedules?

Variable magnitude schedules.

What can be concluded from research into melioration and variable schedules?

Melioration can be made to disappear in appropriately structured environments, which provides further support for the rationality of human decision making.


What is a problem with the idea that melioration can be made to disappear in appropriately structured environments?

Impulsive behaviour is relatively difficult to abolish for ratio schedules. This can be argued to be self-evident from real life examples such as smoking - a smoker will definitely enjoy their next cigarette but only probably die from it.