Instrumental Learning Flashcards

(35 cards)

1
Q

EARLY WORK

A
  • animal psychologists studying instrumental learning before Pavlov:
    1. Small (rats in Hampton Court mazes)
  • mazes not well geared to studying the learning process
    2. Thorndike (cats in puzzle boxes)
  • puzzle boxes better suited to studying the learning process
  • cats learned to escape; got faster over trials
2
Q

INSTRUMENTAL CONDITIONING

A
  • Law of Effect aka. if reward follows animal’s response -> association between stimulus & response = strengthened (S-R learning)
  • concept following naturally from Thorndike’s analysis = S-R reflex
3
Q

PROCEDURES

A

POSITIVE REINFORCEMENT
PUNISHMENT
NEGATIVE REINFORCEMENT
OMISSION TRAINING

4
Q

POSITIVE REINFORCEMENT

A
  • R -> appetitive aka. ^ R
  • reward follows the response
    THORNDIKE
  • animals repeat actions -> satisfying state of affairs
    HULL
  • drive reduction aka. animal works for food if hungry aka. redefined “satisfying state of affairs”
5
Q

PUNISHMENT

A
  • R -> aversive aka. less R
  • reduces responding
6
Q

NEGATIVE REINFORCEMENT

A
  • R -> no aversive aka. ^ R
  • response stops aversive stimulus that otherwise would have occurred
7
Q

OMISSION TRAINING

A
  • R -> no appetitive aka. less R
  • response cancels reward that would normally occur = omission schedule
  • eventually leads to response reduction
8
Q

SCHEDULES OF REINFORCEMENT

A
  • extinction applies to instrumental conditioning too aka. stop giving reinforcers -> response stops
  • BUT we can get away w/reinforcing only some of the responses emitted & still get stable conditioned responding
  • reinforcement schedule = rule for deciding which responses to reinforce
  • dif schedules -> dif & highly predictable response patterns; instantly recognisable on cumulative records
9
Q

SIMPLE SCHEDULES & EFFECTS

A

CONTINUOUS REINFORCEMENT
FIXED RATIO
VARIABLE RATIO
FIXED INTERVAL
VARIABLE INTERVAL

10
Q

CONTINUOUS REINFORCEMENT

A
  • CRF
  • reinforces every response
11
Q

FIXED RATIO

A
  • FR
  • reinforce every nth response
  • pause after each reinforcer followed by fast responding
12
Q

VARIABLE RATIO

A
  • VR
  • reinforce every nth response on average
  • continuous fast responding
13
Q

FIXED INTERVAL

A
  • FI
  • reinforce first response after time (t) elapsed since last reinforcer
  • pause after each reinforcement followed by gradually ^ response rate
14
Q

VARIABLE INTERVAL

A
  • VI
  • same as FI
  • BUT w/variable time period
  • continuous moderate response rate
15
Q

RATIO SCHEDULES

A
  • reinforcement depends on the number of responses made
  • 1 = continuous reinforcement
  • not 1 = partial/intermittent reinforcement
  • fixed ratio schedule = FR10
  • variable ratio schedule = VR10
16
Q

INTERVAL SCHEDULES

A
  • reinforcement depends on time interval
  • ratio schedules typically support more rapid responding
  • variable ratio (ie. reinforcement on average every 10 responses) smooths out the post-reinforcement pause
  • interval schedules give rise to quite specific pattern; as interval (ie. 30s) comes to end -> responding ^ til pellet obtained then falls back
  • smoothed out in variable interval (VI); most commonly used schedule for lever pressing ie. conditioned suppression exps; gives steady responding rate
  • first response after a certain time gives reward; time varies so average = 30s (see sketch below)
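
A minimal Python sketch (illustration only, not course material) of the simple schedules as decision rules; the function and parameter names are assumptions, and CRF is simply fixed_ratio(1):

```python
# Illustrative sketch: each simple schedule as a rule deciding whether the
# current response earns a reinforcer. Names/parameters are assumptions.
import random

def fixed_ratio(n):
    count = 0
    def rule(now):
        nonlocal count
        count += 1
        return count % n == 0            # FR-n: every nth response reinforced
    return rule

def variable_ratio(n):
    def rule(now):
        return random.random() < 1 / n   # VR-n: nth response on average
    return rule

def fixed_interval(t):
    last = 0.0
    def rule(now):
        nonlocal last
        if now - last >= t:              # FI-t: first response after t s since last reinforcer
            last = now
            return True
        return False
    return rule

def variable_interval(mean_t):
    last, wait = 0.0, random.expovariate(1 / mean_t)
    def rule(now):
        nonlocal last, wait
        if now - last >= wait:           # VI: as FI but the required wait varies (mean = mean_t)
            last, wait = now, random.expovariate(1 / mean_t)
            return True
        return False
    return rule

# e.g. FR10: 100 responses -> 10 reinforcers, delivered on the 10th, 20th, ...
fr10 = fixed_ratio(10)
print(sum(fr10(i) for i in range(100)))  # -> 10
```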
17
Q

INSTRUMENTAL LEARNING = PAVLOVIAN?

A
  • US = reward ie. food/freedom
  • UR = natural response ie. eating/approach
  • CS = starting condition ie. start maze/puzzle box
  • CR = approach
  • when rat “learns” to press the lever it may just find the lever attractive (stimulus substitution) aka. bumping into it
  • is apparent instrumental learning simply an artifact brought about via Pavlovian conditioning?
18
Q

OMISSION SCHEDULE

A
  • distinguishes between Pavlovian/instrumental conditioning
  • if all apparently instrumental learning = Pavlovian conditioning then rat shouldn’t learn this
  • if tone sounds -> food delivered BUT only if rat doesn’t go into the food magazine in anticipation
  • BUT if it does enter then food = cancelled
  • rat must learn the response of not entering the magazine (see sketch below)
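
A tiny sketch of the omission contingency (my own illustration with assumed names, not the original apparatus code): the tone schedules food, but the anticipatory magazine entry cancels it, so withholding the approach response is what earns food.

```python
# Illustrative omission rule: food follows the tone only if the rat stays out
# of the magazine while the tone is on.

def food_delivered(entered_magazine_during_tone: bool) -> bool:
    return not entered_magazine_during_tone

# across trials the rat earns food only when it suppresses the approach tendency
trials = [True, True, False, True, False, False]   # did the rat enter during the tone?
print(sum(food_delivered(e) for e in trials))      # -> 3 reinforcers from 6 tones
```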
19
Q

OMISSION SCHEDULE: RESULTS

A
  • rat just about learns not to enter magazine
  • when we pair tone/food -> natural tendency = approach magazine when tone sounds
  • gradually learns to suppress this tendency & hence gets more food
20
Q

GRINDLEY’S BIDIRECTIONAL CONTROL

A
  • another way to check instrumental learning = not due to amplification of some pre-existing response to CS via US pairing (NOT true instrumental learning)
  • earliest automated psychology exps on record
  • guinea pig which likes carrot
  • will get access to carrot if it turns head left when buzzer sounds, shifting lever
  • learns to do this
21
Q

GRINDLEY’S: RESULTS

A
  • guinea pigs will learn to turn head in either direction while the buzzer has the same relationship w/reward = evidence that it’s not simple Pavlovian conditioning
  • then trained on a new response: turn head to the right
  • slow at first as it gives the old response BUT becomes just as fast as the original w/more trials
  • any pre-existing tendency to turn left (still supported by the buzzer/carrot pairing) cannot explain the reversal
  • has learned at least 1 new response consistent w/instrumental learning
22
Q

CONTEMPORARY ISSUE

A
  • actions/habits; is all instrumental learning the same?
  • ANS = no
  • in some circumstances S-R account = correct
  • clear evidence in others that animal has some expectancy of outcome & modifies beh accordingly
23
Q

ADAMS & DICKINSON

A
  • earliest evidence of animals having some representation of outcome in instrumental learning
  • if outcome = aversive -> less response
  • animals trained to lever press for sucrose; went through devaluation phase
  • controls = sucrose one day & illness the next day (unpaired)
  • shouldn’t have any particular effect
  • exp animals = sucrose/illness paired; should not like sucrose anymore
  • would still press lever when given opportunity BUT not as much
24
Q

ADAMS & DICKINSON: RESULTS

A
  • reaction depended on how much lever-press training had been given beforehand
  • if normally trained (100 trials) = tended not to press lever for the sucrose they no longer liked (test run in extinction; no sucrose delivered)
  • if over-trained (500 trials) then they kept pressing lever
25
Q

ADAMS & DICKINSON: HABITS

A
  • over-trained animals exhibited habits
  • S-R account would expect this
  • habit = current outcome value has no impact on probability of making the response in the presence of the discriminative stimulus
  • just seeing the lever activates the response of pressing it automatically (ie. pulling a light switch just because you see it)
26
Q

COLWILL & RESCORLA

A
  • some representation of the outcome is involved in determining performance
  • light on = pressing lever -> food & pulling chain -> sucrose
  • if tone sounds -> reinforcers swapped around, so pressing lever -> sucrose solution etc.
  • post training = 1 reinforcer devalued by pairing w/illness (in the example shown = sucrose solution)
  • then test in extinction (no reinforcers)
27
Q

COLWILL & RESCORLA: RESULTS

A
  • response leading to devalued outcome = performed less than the other one
  • BUT response changes depending on whether light/tone = present
  • animal has good grasp of what outcome to expect in given situation; avoids the one it doesn't want
28
Q

CASTAWAYS DILEMMA

A
  • instrumental learning results -> Dickinson suggested 2 learning types:
    1. actions (require knowledge of expected outcome)
    2. habits (S-R)
  • tested via the castaways dilemma: someone castaway on a desert island is hungry; eats coconuts; becomes thirsty but there is no water; what to do?
  • ANS = obvious (drink coconut milk) BUT can animals do this?
29
Q

CASTAWAYS DILEMMA: IN THE LAB

A
    WHEN HUNGRY
  • both outcomes = rewarding & both responses performed
    WHEN THIRSTY
  • drive state changed; test in extinction so no further training
30
Q

CASTAWAYS DILEMMA: DICKINSON (1997) BEFORE

A
  • found no dif in performance of the 2 actions
  • both performed more in control group who'd not been made thirsty
  • BUT interpreted as general activation of available responses by thirst; seemingly reasonable
  • no sign of any outcome-specific activation of an action
  • realised they'd missed something...
31
Q

CASTAWAYS DILEMMA: DICKINSON (1997) AFTER

A
  • animals CAN solve the castaway dilemma!
  • respond more for sugar water under thirst BUT only if you let it learn that 1 reinforcer (sugar water) = valuable under new drive state (thirst) before test
  • new idea incorporated into original design
32
Q

CASTAWAYS DILEMMA: ANALYSIS

A
  • incentive learning needed to support drive-related action on basis of available outcomes
  • on basis of these results Dickinson argued for a model of instrumental performance requiring inference
  • animal postulated to reason that:
    1. it's thirsty
    2. pulling chain -> sugar water
    3. sugar water = good when thirsty
    4. so it should pull the chain
  • each step in the chain must be available for the inference to be possible; animal must know sugar water = valued under thirst (see sketch below)
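
A minimal sketch (assumed names, purely illustrative, not Dickinson's own model) of that inference: instrumental action-outcome knowledge combined with incentive knowledge about outcome value under the current drive state.

```python
# Illustrative sketch: choose the action whose learned outcome is valued under
# the current drive state. The (sugar_water, thirsty) entry exists only if
# incentive learning has occurred before the test.

action_outcome = {"press_lever": "food_pellet",      # instrumental knowledge: action -> outcome
                  "pull_chain": "sugar_water"}

outcome_value = {("food_pellet", "hungry"): 1.0,      # incentive knowledge: (outcome, drive) -> value
                 ("sugar_water", "hungry"): 1.0,
                 ("sugar_water", "thirsty"): 1.0}     # learned only via pre-test exposure when thirsty

def choose_action(drive_state: str) -> str:
    """Pick the action whose expected outcome has the highest learned value under this drive state."""
    return max(action_outcome,
               key=lambda a: outcome_value.get((action_outcome[a], drive_state), 0.0))

print(choose_action("thirsty"))   # -> "pull_chain" (the castaway solution), given incentive learning
```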
33
Q

SUMMARY I

A
  • instrumental learning cannot be explained purely as Pavlovian conditioning BUT evidence that both are oft involved in beh control
  • 2 forms of instrumental learning:
    1. knowledge of action consequences
    2. S-R reflex supporting habitual responding (via overtraining)
34
Q

IMPLICATIONS

A
  • consider addiction; role of reinforcement in maintaining drug-seeking beh
  • over time could -> habit formation, causing drug-seeking beh to become independent of the value of the drug; automatic response literally out of control
35
Q

SUMMARY II

A
  • instrumental performance that isn't habit (ie. S-R) based may well differ from habits/Pavlovian conditioning in important respects
  • if animal knows the consequences of its actions (ie. expected outcome) -> must also represent the outcome & its relationship to the action performed
  • can use this knowledge to make inferences in combination w/other knowledge
  • if animal knows outcome = valuable under a certain state + outcome produced by a given action (never performed under said state) => can combine knowledge productively to give the appropriate response
  • this is beyond simple association