Instrumental learning

PSY2304 Biological Basis of Behaviour > Instrumental learning > Flashcards

Flashcards in Instrumental learning Deck (18)

early work

Animal psychologists were studying instrumental learning before Pavlov’s work became known:
- Small (rats in a scaled-down replica of the Hampton Court maze)
- Thorndike (puzzle boxes)


instrumental conditioning

Stimulus (being in the box) --> response R, treated as a reflex-like link.

When the animal encountered a certain discriminative stimulus S then it emitted the response R.

Reinforcement established the link between S and R.


instrumental conditioning procedures

Positive reinforcement = R --> appetitive stimulus delivered (more R)

Punishment = R --> aversive stimulus delivered (less R)

Negative reinforcement = R --> aversive stimulus removed or avoided (more R)

Omission training = R --> appetitive stimulus withheld (less R)


reward and reinforcement

Thorndike’s Law of Effect: animals repeat actions that lead to a satisfying state of affairs, and this is called reinforcement

Hull: reinforcement is due to drive reduction, hence the animal will work for food if it is hungry, or for water if it is thirsty etc.


schedules of reinforcement

Extinction applies to instrumental conditioning, too – stop giving reinforcers and the response ceases
But we can get away with only reinforcing some of the responses the subject emits, and still get stable conditioned responding
A schedule of reinforcement is a rule for deciding which responses we reinforce
Different schedules lead to different, highly predictable, patterns of response, instantly recognisable on a cumulative record


sample schedules and their effects

Continuous reinforcement, CRF – reinforce every response

Fixed ratio, FR – reinforce every nth response. Pause after each reinforcement followed by fast responding
- The animal would become sated if given too many reinforcers – one reason partial schedules are useful

Variable ratio, VR – reinforce every nth response on average. Continuous fast responding
- The animal cannot predict when the next reinforcer will come

Fixed interval, FI – reinforce the first response after time t has elapsed since the last reinforcer. Pause after each reinforcement followed by a gradually increasing response rate
- Responding slows after each reinforcer, then speeds up as the interval nears its end (the FI “scallop”)

Variable interval, VI – same as FI but with a variable time period. Continuous moderate response rate
- Gives the most stable pattern of responding
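The four schedules above are just rules for deciding which responses earn a reinforcer. A minimal sketch of those rules (function and parameter names are illustrative, not from any standard operant-conditioning library):

```python
import random

def make_schedule(kind, n=10, t=30.0):
    """Return a function that decides whether a response emitted at time
    `now` (seconds) is reinforced. kind: "FR", "VR", "FI", or "VI".
    n = ratio requirement, t = interval length (mean, for variable kinds)."""
    state = {
        "responses": 0,          # responses since last reinforcer (ratio kinds)
        "last_reinf": 0.0,       # time of last reinforcer (interval kinds)
        "target": n if kind == "FR" else random.randint(1, 2 * n - 1),
        "wait": t if kind == "FI" else random.uniform(0, 2 * t),
    }

    def respond(now):
        reinforced = False
        if kind in ("FR", "VR"):
            state["responses"] += 1
            if state["responses"] >= state["target"]:
                reinforced = True
                state["responses"] = 0
                # VR: draw a new requirement (averaging n) after each reinforcer
                state["target"] = n if kind == "FR" else random.randint(1, 2 * n - 1)
        else:
            # FI/VI: only the first response after the interval elapses pays off
            if now - state["last_reinf"] >= state["wait"]:
                reinforced = True
                state["last_reinf"] = now
                state["wait"] = t if kind == "FI" else random.uniform(0, 2 * t)
        return reinforced

    return respond

fr10 = make_schedule("FR", n=10)
print(sum(fr10(i) for i in range(100)))  # 100 responses on FR10 -> 10 reinforcers
```

Note how the ratio branches never look at the clock and the interval branches never count responses – which is exactly why ratio schedules reward fast responding and interval schedules do not.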


ratio schedules

reinforcement depends on number of responses
- if 1 = continuous reinforcement
- if not 1 = partial or intermittent reinforcement
- fixed ratio schedule, e.g. FR10
- variable ratio schedule, VR10 (the average number of responses required equals 10)


interval schedules

reinforcement depends on time interval

Fixed interval, e.g. FI4

variable interval, e.g. VI2 (again, the value is an average) – the most commonly used schedule, because it gives a stable response rate


instrumental learning: can it be explained as form of Pavlovian conditioning?

US = reward, e.g. food, freedom

UR = natural response, e.g. eating, approach

CS = starting condition, e.g. start box of maze, inside of puzzle box, sight of lever

CR = approach

So when the rat “learns to press” the lever it may simply find the lever attractive (stimulus substitution) and bump into it because of this. Is the apparent learning of the response simply an artifact brought about by Pavlovian conditioning?


omission schedule

An omission schedule distinguishes between Pavlovian and instrumental conditioning: the reinforcer is delivered only if the animal withholds the response. A genuinely instrumental response should cease, whereas a Pavlovian CR elicited by the stimulus will persist.

see notes


Grindley's bidirectional control

A switch on either side of the animal's head

If it turns its head to touch the right switch – it gets carrot

Turning the other way – nothing happens

Then change the contingency – can it learn to turn the other way? If it can, then it is learning the action

The fact that the animals will learn to turn their heads left or right when the buzzer has the same relationship with reward establishes that this is not Pavlovian conditioning.


contemporary issue

Actions and Habits – is all instrumental learning the same?

We shall see that the answer is no.

In some circumstances the S->R account seems to be the correct one.

In others, there is clear evidence that the animal has some expectancy of an outcome and modifies its behaviour accordingly.



Adams and Dickinson – outcome devaluation

Evidence that animals have some representation of the outcome in instrumental learning: if the outcome is made aversive, they respond less in extinction.

Train the rats to press a lever for food pellets

Devaluation (Dev) group – the pellets are then paired with lithium chloride, which makes the animals ill, so they no longer want them

Non-devalued (Non) group – given the pellets and the lithium chloride on different days, so they are not paired

Neither group is given pellets next time in the box (an extinction test)

Dev group = fewer lever presses, and they do not re-acquire the response as quickly afterwards

see notes

Adams and Dickinson could show that animals responded less in extinction to an outcome that had been made aversive, but only if they had not been overtrained (100 trials); if they had been (500 trials), they continued to respond.

Overtrained – 500 trials – performance is no longer improving, but the extra experience makes the response habitual

The overtrained Dev group still press the lever as much as before devaluation

They keep responding even though the outcome is no longer wanted – a habitual response

The overtrained animals are exhibiting what Adams and Dickinson called a habit: exactly what an S->R account would expect, where the current value of the outcome has no impact on the probability of making the response in the presence of the discriminative stimulus.


Colwill and Rescorla (1990)

Evidence that some representation of the outcome is involved in determining performance

Rat in a box – it can press a lever or pull a chain

Under the light – lever = food, chain = sucrose water (the two outcomes are equally valuable)

Under the tone – lever = sucrose water, chain = food

So the outcome of each response is conditional on the stimulus presented

Sucrose water is then paired with lithium chloride – the animals no longer like it – it is devalued


see notes

Under the light – they pull the chain less – there, the chain earned the outcome paired with lithium chloride

Under the tone – they press the lever less – there, the lever earned the devalued outcome

So the animals base their choices on the specific outcome each response provides under the current stimulus

see notes


Dickinson and the castaways dilemma

Dickinson distinguished Actions, which require knowledge of the expected outcome, from Habits of the S->R kind. He then set about testing this idea by proposing what he now terms the “Castaway’s Dilemma”.

In this, someone who is cast away on a desert island is hungry but manages to find and eat coconuts. Then they become thirsty and there is no water available – what do they do?

The answer is pretty obvious - they drink coconut milk - but would an animal have the ability to learn this?


castaway's dilemma - transferred to lab

see notes

They (Dawson and Dickinson) found no difference in performance of the two actions.

Both actions were performed more than in a control group who’d not been made thirsty, but they interpreted this as general activation of the available responses by thirst (which seems perfectly reasonable).

There was no sign of any outcome specific activation of an action.

Thirst energised behaviour in general, but the animals did not specifically choose the action that produced the thirst-relevant outcome

see notes


Dickinson and Wyatt (1997)

They can solve the Castaway’s dilemma!

The animals now respond more for the sugar water under thirst.

But only if you let the animal learn that one of the reinforcers (in this case sugar water) is valuable under the new drive state (thirst) before test.

This was the new idea they incorporated in their revised design.

see notes

The animals need to learn that, when thirsty, sugar water is good

Given sugar water and food while thirsty, they find the sugar water better – after that, they can solve the problem

see notes



incentive learning

Incentive learning is needed to support drive-related action on the basis of the available outcomes.

Tony Dickinson has argued for a model of instrumental performance that requires inference on the basis of these results.

Thus the animal is postulated to reason that:
1. I’m thirsty
2. If I pull the chain I get sugar water
3. Sugar water is good when I’m thirsty
4. I’ll pull the chain then.

For this inference to be possible, each step in the chain has to be available to the animal.

This is not a reflex – the animal has to pull together separate pieces of knowledge.
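The four-step inference can be sketched as a lookup over separately learned pieces of knowledge. This is a minimal illustration in ordinary Python, not Dickinson's own formalism; all names and the contents of the tables are illustrative:

```python
# Each table is a separately acquired piece of knowledge; the inference
# only works when every step is available.
action_outcomes = {                       # 2. response -> outcome knowledge
    "pull_chain": "sugar water",
    "press_lever": "food pellet",
}
incentive_value = {                       # 3. outcome value under a drive,
    ("sugar water", "thirst"): True,      #    learned via incentive learning
    ("food pellet", "thirst"): False,
}

def choose_action(current_drive):         # 1. the current drive state
    """Return an action whose outcome is known to be valued under the drive."""
    for action, outcome in action_outcomes.items():
        if incentive_value.get((outcome, current_drive)):
            return action                 # 4. "I'll pull the chain then"
    return None  # no outcome known to be valuable: no outcome-specific action

print(choose_action("thirst"))  # -> pull_chain
```

Note that for a drive the animal has never experienced the outcomes under (no entry in `incentive_value`), the lookup fails and no outcome-specific action is chosen – mirroring the original castaway's-dilemma result, where thirst energised behaviour without directing it until incentive learning had occurred.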