Instrumental conditioning Flashcards
(20 cards)
Miller and Konorski (1928) during CS-US training added a second unrelated action
Animals learnt to make the action even though it was unrelated to the US.
Reinforcement in instrumental (operant) conditioning
Increases behaviour. Positive (add appetitive stimulus) or negative (escape by removing noxious stimulus after correct behaviour/active avoidance)
Punishment in operant conditioning
Decreases behaviour. Positive (add noxious stimulus) or negative (remove appetitive stimulus following behaviour).
Shaping by successive approximations
Complex behaviours can be built up through small steps
Ratio vs. interval reinforcement
- Fixed ratio: high rate of responding, with a brief pause after each reinforcer. Reinforce after a fixed number of responses.
- Variable ratio: highest, steadiest rate. Reinforce after an unpredictable number of responses.
- Fixed interval: scalloped shape, with rapid responding near the time for reinforcement. Reinforce the first response after a fixed amount of time.
- Variable interval: steady, moderate responding. Reinforce the first response after an unpredictable amount of time.
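The four schedules can be illustrated with a minimal Python sketch. This is a toy model, not taken from any of the studies on these cards: the function names, the uniform distributions used for the "variable" schedules, and the convention that each function returns a callable taking the response time `t` are all assumptions made for illustration.

```python
import random

def fixed_ratio(n):
    """Reinforce every n-th response."""
    count = 0
    def on_response(t):
        nonlocal count
        count += 1
        if count >= n:
            count = 0
            return True
        return False
    return on_response

def variable_ratio(mean_n, rng):
    """Reinforce after an unpredictable number of responses (mean mean_n).
    Assumption: the requirement is drawn uniformly from 1..2*mean_n-1."""
    count = 0
    target = rng.randint(1, 2 * mean_n - 1)
    def on_response(t):
        nonlocal count, target
        count += 1
        if count >= target:
            count = 0
            target = rng.randint(1, 2 * mean_n - 1)
            return True
        return False
    return on_response

def fixed_interval(seconds):
    """Reinforce the first response made once `seconds` have elapsed
    since the last reinforcer (timing starts at t = 0)."""
    last = 0.0
    def on_response(t):
        nonlocal last
        if t - last >= seconds:
            last = t
            return True
        return False
    return on_response

def variable_interval(mean_seconds, rng):
    """Reinforce the first response after an unpredictable wait.
    Assumption: the wait is drawn uniformly from 0..2*mean_seconds."""
    last = 0.0
    wait = rng.uniform(0, 2 * mean_seconds)
    def on_response(t):
        nonlocal last, wait
        if t - last >= wait:
            last = t
            wait = rng.uniform(0, 2 * mean_seconds)
            return True
        return False
    return on_response

if __name__ == "__main__":
    rng = random.Random(0)
    schedules = {
        "FR5": fixed_ratio(5),
        "VR5": variable_ratio(5, rng),
        "FI10": fixed_interval(10),
        "VI10": variable_interval(10, rng),
    }
    # One response per second for 100 seconds under each schedule.
    for name, schedule in schedules.items():
        earned = sum(schedule(t) for t in range(1, 101))
        print(name, "reinforcers earned:", earned)
```

Note that the sketch only models when a reinforcer is delivered. It does not model the animal, so the response patterns on the card (pauses, scallops, steady responding) are not reproduced by it.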
Dixon et al. (2014) slot machines
Micro-reinforcement can speed up actions. 20-line slot machines (micro-rewards with a false sense of progress; variable ratio reward) reduced the post-reinforcement pause compared with a 1-line slot machine. High-risk gamblers reported a greater sense of control.
Goal-directed vs. habitual behaviour
Goal-directed: actions performed because of their expected outcomes. (Not just drives, but outcome).
Habits: actions triggered by stimuli regardless of the outcome’s current value. (Stimulus-response associations).
Hershberger’s (1986) Pavlovian chicken
Chick learnt the visual approach signal for food. It failed to learn to move backwards to obtain the food (failure of goal-directed behaviour).
Dickinson and Balleine’s (1994) 2 criteria for goal-directedness
- Instrumental criterion. Animal must learn contingency between response and outcome.
- Goal criterion. Animal must learn incentive value of a stimulus.
Instrumental criterion
Animal learns that pressing a lever gives it food.
Goal criterion
A representation of the outcome is a goal for the agent. Animal must learn the incentive value of a stimulus (e.g. food reduces hunger). If the animal stops pressing the lever when it’s full, this suggests it values the outcome appropriately.
Colwill and Rescorla (1985) reinforcer devaluation and instrumental responding
Rats reduced responding when the outcome was devalued (made aversive/fed to satiation). Responding shows sensitivity to changes in outcome value. (Absence of a devaluation effect suggests that behaviour is controlled by habit and not R-O associations).
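The logic of the devaluation test can be sketched as a toy contrast between two controllers. The function names and the multiplicative form are assumptions chosen only to illustrate the inference, not a model from Colwill and Rescorla.

```python
def goal_directed_rate(ro_strength, outcome_value):
    """Hypothetical R-O controller: responding scales with the
    current value of the outcome that the response produces."""
    return ro_strength * max(outcome_value, 0.0)

def habitual_rate(sr_strength, outcome_value):
    """Hypothetical S-R controller: responding depends only on the
    cached habit strength, so outcome_value is deliberately ignored."""
    return sr_strength

# Same training strength, tested before and after devaluation.
for name, controller in [("goal-directed", goal_directed_rate),
                         ("habitual", habitual_rate)]:
    before = controller(1.0, outcome_value=1.0)  # outcome still valued
    after = controller(1.0, outcome_value=0.0)   # outcome devalued
    print(name, "before:", before, "after:", after)
```

Only the goal-directed controller drops its responding after devaluation, which is the pattern the card describes as evidence for R-O control.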
Valentin et al. (2007) human goal-directed learning
OFC shows modulatory activity during selection of devalued (less activity)/non-devalued (more activity) outcomes. Participants avoided devalued outcome cue.
2 candidate theories for the activity-based anorexia (ABA) effect
- Direct effect: exercise reduces reinforcing value of food/food not as salient or pleasurable etc.
- Indirect effect: behavioural failure to adapt to the restricted feeding schedule (the more likely explanation).
Which animals are more vulnerable to the ABA effect?
Animals that develop food anticipatory behaviours (more movement before food).
Robinette et al. (2021) serotonin, SIRT1 and ABA
Serotonin may delay the onset and progression of ABA in animals. SIRT1 KO mice lost less weight. SIRT1 overexpressors lost more weight.
What does SIRT1 do?
SIRT1 increases foraging and activity (when hungry) and decreases serotonin.
Dwyer and Boakes (1997) 3 experiments on recovery of weight in ABA
- Exposure to the feeding schedule before wheel access was introduced.
- Move feeding to start of rats’ active period.
- Activity restricted before feeding.
Supports indirect theory as food hasn’t become less rewarding, rats just failed to adapt to feeding schedule.
Balleine et al. (2009) motivation and incentive learning
Animals need to learn not just R->O but also that the outcome changes their motivational state (food reduces hunger). Rats don’t just respond based on current motivation (hunger), they also need prior experience of the outcome’s value in that motivational state. Rats trained while full don’t know that food reduces hunger, so they don’t press the lever when they later get hungry.
Brain regions for goal-directed action-outcome learning vs. habitual performance
Goal-directed: dorsomedial striatum.
Habitual: dorsolateral striatum.