Flashcards in Instrumental learning Deck (18)
Animal psychologists were studying instrumental learning before Pavlov’s work became known:
- Small (rats in scaled-down Hampton Court maze),
- Thorndike (puzzle boxes)
Stimulus S (in box) --> response R, like a reflex.
When the animal encountered a certain discriminative stimulus S then it emitted the response R.
Reinforcement established the link between S and R.
instrumental conditioning procedures
Positive reinforcement = R-->appetitive - (more R)
Punishment = R-->aversive - (less R)
Negative reinforcement = R-->no aversive - (more R)
Omission Training = R-->no appetitive - (less R)
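The four procedures above differ on just two things: whether the outcome is appetitive or aversive, and whether the response produces it or prevents it. A minimal sketch of that two-way lookup follows. The function and argument names are hypothetical, chosen only for illustration.

```python
# Illustrative lookup only; names are assumptions, not from the deck.
# Key: (type of outcome, does the response produce that outcome?)
PROCEDURES = {
    ("appetitive", True): ("positive reinforcement", "more R"),
    ("aversive", True): ("punishment", "less R"),
    ("aversive", False): ("negative reinforcement", "more R"),
    ("appetitive", False): ("omission training", "less R"),
}


def classify_procedure(outcome, response_produces_outcome):
    """Return (procedure name, expected change in responding)."""
    return PROCEDURES[(outcome, response_produces_outcome)]


print(classify_procedure("aversive", False))
# ('negative reinforcement', 'more R')
```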
reward and reinforcement
Thorndike’s Law of Effect: animals repeat actions that lead to a satisfying state of affairs, and this is called reinforcement
Hull: reinforcement is due to drive reduction, hence the animal will work for food if it is hungry, or for water if it is thirsty etc.
schedules of reinforcement
Extinction applies to instrumental conditioning, too – stop giving reinforcers and the response ceases
But we can get away with only reinforcing some of the responses the subject emits, and still get stable conditioned responding
A schedule of reinforcement is a rule for deciding which responses we reinforce
Different schedules lead to different, highly predictable, patterns of response, instantly recognisable on a cumulative record
sample schedules and their effects
Continuous reinforcement, CRF – reinforce every response
Fixed ratio, FR – reinforce every nth response. Pause after each reinforcement followed by fast responding
- The animal gets sated if given too many reinforcers – which is why this method is good
Variable ratio, VR – reinforce every nth response on average. Continuous fast responding
- Doesn’t know when the reinforcer will be given
Fixed interval, FI – reinforce the first response after time t has elapsed since the last reinforcer. Pause after each reinforcement followed by gradually increasing response rate
- Slows down responding then speeds up when time nearly up
Variable interval, VI – same as FI but with a variable time period. Continuous moderate response rate
- More stable pattern of responding
reinforcement depends on number of responses
- if 1 = continuous reinforcement
- if not 1 = partial or intermittent reinforcement
- fixed ratio schedule, e.g. FR10
- variable ratio schedule, e.g. VR10 (the average number of responses required equals 10)
reinforcement depends on time interval
- fixed interval schedule, e.g. FI4
- variable interval schedule, e.g. VI2 (again the average…) – most used
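Since a schedule is a rule for deciding which responses get reinforced, each schedule can be sketched as a small function that sees one response at a time and answers yes or no. The code below is a rough illustration under stated assumptions, not a model of real behaviour: all names are hypothetical, the session start is treated as the time of the "last reinforcer", the VR rule reinforces each response with probability 1/n (so the requirement averages n), and the VI rule draws its waiting times from an exponential distribution with the given mean.

```python
import random


def fixed_ratio(n):
    """FR n: reinforce every nth response (FR 1 is continuous reinforcement)."""
    count = 0

    def rule(t):
        nonlocal count
        count += 1
        if count == n:
            count = 0
            return True
        return False

    return rule


def variable_ratio(n, rng):
    """VR n (assumption: each response pays off with probability 1/n)."""
    def rule(t):
        return rng.random() < 1.0 / n

    return rule


def fixed_interval(interval):
    """FI: reinforce the first response once `interval` has elapsed
    since the last reinforcer (session start counts as time 0)."""
    last = 0.0

    def rule(t):
        nonlocal last
        if t - last >= interval:
            last = t
            return True
        return False

    return rule


def variable_interval(mean_interval, rng):
    """VI: like FI, but the required wait varies around a mean
    (assumption: exponentially distributed waits)."""
    last = 0.0
    wait = rng.expovariate(1.0 / mean_interval)

    def rule(t):
        nonlocal last, wait
        if t - last >= wait:
            last = t
            wait = rng.expovariate(1.0 / mean_interval)
            return True
        return False

    return rule


def run(rule, response_times):
    """Return the times of the responses that the schedule reinforces."""
    return [t for t in response_times if rule(t)]


# FR10: 30 responses, one per time step; every 10th is reinforced.
print(run(fixed_ratio(10), range(1, 31)))             # [10, 20, 30]
# FI4: responses before 4 time units have elapsed earn nothing.
print(run(fixed_interval(4), [1, 2, 5, 6, 7, 10]))    # [5, 10]
```

On the ratio rules the number of reinforcers depends only on how many responses are made, while on the interval rules extra responses inside the waiting period earn nothing. That is the distinction the ratio and interval cards draw.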
instrumental learning: can it be explained as form of Pavlovian conditioning?
US = reward, e.g. food, freedom
UR = natural response, e.g. eating, approach
CS = starting condition, e.g. start box of maze, inside of puzzle box, sight of lever
CR = approach
So when the rat “learns to press” the lever it may simply find the lever attractive (stimulus substitution) and bump into it because of this. Is the apparent learning of the response simply an artifact brought about by Pavlovian conditioning?
Distinguishing between Pavlovian and instrumental conditioning
Grindley's bidirectional control
Switch either side of head
If touches right switch – gets carrot
Other way = nothing happens
Change the contingency – can it learn to turn the other way? If it can, then it is learning the action
The fact that the animals will learn to turn their heads left or right when the buzzer has the same relationship with reward establishes that this is not Pavlovian conditioning.
Actions and Habits - is all instrumental learning the same?
We shall see that the answer is no.
In some circumstances the S->R account seems to be the correct one.
In others, there is clear evidence that the animal has some expectancy of an outcome and modifies its behaviour accordingly.
evidence that animals had some representation of the outcome in instrumental learning. If the outcome is made aversive, they respond less in extinction.
Press lever for food
Then devaluation – food is paired with lithium chloride, which makes them ill, so they don’t want the pellets any more – devalued (dev) group
Non-devalued group – given food and lithium chloride on different days, so not paired
Extinction test – no pellets given next time in the box
Dev group = fewer lever presses, and they don’t reacquire the response as quickly
Animals responded less in extinction for an outcome that had been made aversive, but only if they had not been overtrained (100 trials); if they had been overtrained (500 trials), they continued to respond.
Overtrained – 500 trials – not getting any better but more experience – becomes habitual
Dev group still press lever as much as before extinction
Just doing it for no reward – habitual response
overtrained animals are exhibiting what Adams and Dickinson called a habit, something that an S->R account would expect, where the current outcome value has no impact on the probability of making a response in the presence of the discriminative stimulus.
Colwill and Rescorla (1990)
some representation of the outcome is involved in determining their performance
Rat in box – press lever/pull chain
Under light – lever = food, chain = sucrose water – equally valuable
Under tone – lever = sucrose water, chain = food
Conditional on stimulus presented
Sucrose water + lithium chloride – don’t like it any more – devalued
Light – don’t pull chain as much – its outcome was paired with lithium chloride
Tone – press lever less – its outcome was paired with lithium chloride
Basing decisions on outcome provided
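The logic of this design can be written as a lookup from (stimulus, response) to outcome. Once one outcome is devalued, an animal that represents outcomes should cut back on whichever response earned that outcome under the current stimulus. The sketch below is illustrative only: the names are hypothetical, and it assumes the "water" on the tone line is the same sucrose water as on the light line.

```python
# Contingencies as described on the card: (stimulus, response) -> outcome.
# Assumption: "water" under the tone means the same sucrose water.
CONTINGENCIES = {
    ("light", "lever"): "food",
    ("light", "chain"): "sucrose water",
    ("tone", "lever"): "sucrose water",
    ("tone", "chain"): "food",
}


def suppressed_response(stimulus, devalued_outcome, contingencies=CONTINGENCIES):
    """Return the response that earned the devalued outcome under this
    stimulus, i.e. the one expected to drop, or None if there is none."""
    for (stim, response), outcome in contingencies.items():
        if stim == stimulus and outcome == devalued_outcome:
            return response
    return None


print(suppressed_response("light", "sucrose water"))  # chain
print(suppressed_response("tone", "sucrose water"))   # lever
```

A pure S->R habit would give no such pattern, because the stimulus and response are the same before and after devaluation and only the outcome's value has changed.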
Dickinson and the Castaway’s Dilemma
Dickinson distinguished between Actions, which require knowledge of the expected outcome, and Habits of the S->R kind. He then set about testing this idea by proposing what he now terms the “Castaway’s Dilemma”.
In this, someone who is castaway on a desert island is hungry but manages to find and eat coconuts. Then they become thirsty and there's no water available - what do they do?
The answer is pretty obvious - they drink coconut milk - but would an animal have the ability to learn this?
castaway's dilemma - transferred to lab
They (Dawson and Dickinson) found no difference in performance of the two actions.
Both actions were performed more than in a control group who’d not been made thirsty, but they interpreted this as general activation of the available responses by thirst (which seems perfectly reasonable).
There was no sign of any outcome specific activation of an action.
Thirst energised behaviour in general, but the animals did not selectively perform the right action
Dickinson and Wyatt (1997)
They can solve the Castaway’s dilemma!
The animals now respond more for the sugar water under thirst.
But only if you let the animal learn that one of the reinforcers (in this case sugar water) is valuable under the new drive state (thirst) before test.
This was the new idea they incorporated in their revised design.
They need to know that sugar water is good when thirsty
Given sugar water + food while thirsty – they find the sugar water better – then they can solve the problem