Learning Part 2: Operant Conditioning Flashcards Preview

Operant Conditioning

Goal-directed behaviour
Operant conditioning is concerned with how environmental stimuli shape complex goal-directed behaviours


Edward Thorndike

His experiments, conducted at the turn of the 20th century, paved the way for a behaviourist account of voluntary behaviour

He worked with different animals: e.g. chicks, cats and dogs

He wanted to find out whether animals use reasoning to solve problems

Famous for Thorndike's puzzle box


Thorndike's puzzle box

Thorndike's puzzle box: a cat was placed inside a puzzle box and food was placed outside the box
Could the cat work out the mechanism that opens the door of the box and obtain the food?

The cat learned by trial and error (and success): its first attempts were random, then it stumbled across the solution

Cats became faster on subsequent trials in the same puzzle box

Cats learned to associate a response with its rewarding consequence

Consequences shape behaviour: unsuccessful responses are gradually eliminated

The conclusion is that cats learn simple stimulus-response (S-R) associations rather than solving the problem through complex reasoning


Law of Effect

Responses followed by a satisfying state of affairs are strengthened and are more likely to occur again (rewards)

Responses followed by an annoying or unsatisfactory state of affairs are weakened and are unlikely to occur again (punishment)
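
The Law of Effect can be illustrated with a toy model, sketched here in Python. The response names, the linear update rule and the step size are assumptions made purely for illustration; they are not part of Thorndike's own account.

```python
def apply_law_of_effect(strengths, response, outcome, step=0.2):
    """Return updated response strengths after one trial.

    outcome > 0 stands for a satisfying state of affairs,
    outcome < 0 for an annoying one. The linear update and the
    step size are illustrative assumptions.
    """
    updated = dict(strengths)
    # Strengths are floored at zero so they can be read as relative tendencies.
    updated[response] = max(0.0, updated[response] + step * outcome)
    return updated


def response_probabilities(strengths):
    """Convert strengths into the relative likelihood of each response."""
    total = sum(strengths.values())
    return {name: value / total for name, value in strengths.items()}


# Hypothetical puzzle-box responses that start out equally likely.
strengths = {"pull_string": 1.0, "scratch_wall": 1.0}
for _ in range(5):
    strengths = apply_law_of_effect(strengths, "pull_string", +1)   # door opens, food
    strengths = apply_law_of_effect(strengths, "scratch_wall", -1)  # nothing happens
probs = response_probabilities(strengths)
print(probs)
```

After five trials the rewarded response dominates and the unsuccessful one has almost disappeared, which mirrors the gradual elimination of unsuccessful responses in the puzzle box.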


B.F. Skinner (1904-1990)

He was influenced by Thorndike’s work and described voluntary human behaviour using basic S-R associations, without resorting to mentalistic concepts

“Behaviour operates on the environment to generate consequences.”

Organisms learn which behaviours to emit in order to earn rewards or avoid punishments

Operant describes any active (voluntary) behaviour that is produced in order to generate consequences, or is instrumental in generating consequences

Essentially everyone is trying to gain something desired or avoid something unpleasant


B.F. Skinner (consequences shape behaviour)

Consequences shape behaviour: unsuccessful responses are gradually eliminated



Reinforcement

Reinforcement occurs when the consequences of an action increase the likelihood of the action being repeated

Reinforcement increases or strengthens the occurrence of a behaviour in the future


Positive reinforcement +

Stimulus or event which, when presented as a consequence of a behaviour, increases the likelihood of that behaviour recurring in the future


Negative reinforcement -

Aversive stimulus or event which, when reduced or terminated, increases the likelihood that the associated behaviour will recur


Continuous reinforcement

Each response is reinforced


Partial reinforcement

Reinforcement is given only for some correct responses

Generates behaviour that persists longer: learners keep "testing" for a reward


Fixed ratio schedule

Rewarded after a fixed number of correct responses

High rate of responding

Faster responses yield quicker payoffs (“bursts”)
e.g. being paid for producing a specific number of items


Variable ratio schedule

Rewarded after an unpredictable number of correct responses that varies around an average

High rate of responding: persistent responding

People and animals hope that the next response will bring the reward
e.g. gambling


Fixed interval schedule

Reinforcement for first correct response after a fixed time period

Flurry of responding right before a reward is due
e.g. test scheduled every four weeks


Variable interval schedule

Rewarded for the first correct response after an unpredictable time period that varies around an average

Less predictable

Slow but steady pattern of responding (“testing”)
e.g. surprise quizzes
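
The four partial reinforcement schedules differ only in the rule that decides which response earns a reward. A minimal Python sketch of those rules follows. The function names, the uniform distributions used for the "variable" schedules and the choice to start the clock at time 0 are assumptions for illustration, not a standard implementation.

```python
import random


def fixed_ratio(n_responses, ratio):
    """Reward every `ratio`-th response; returns the rewarded response numbers."""
    return [i for i in range(1, n_responses + 1) if i % ratio == 0]


def variable_ratio(n_responses, mean_ratio, rng):
    """Reward after a random number of responses, uniform on 1..2*mean_ratio-1."""
    rewarded, count = [], 0
    required = rng.randint(1, 2 * mean_ratio - 1)
    for i in range(1, n_responses + 1):
        count += 1
        if count >= required:
            rewarded.append(i)
            count = 0
            required = rng.randint(1, 2 * mean_ratio - 1)
    return rewarded


def fixed_interval(response_times, interval):
    """Reward the first response made at least `interval` after the last reward."""
    rewarded, last_reward = [], 0.0
    for t in response_times:
        if t - last_reward >= interval:
            rewarded.append(t)
            last_reward = t
    return rewarded


def variable_interval(response_times, mean_interval, rng):
    """As fixed_interval, but the waiting time is redrawn after every reward."""
    rewarded, last_reward = [], 0.0
    wait = rng.uniform(0, 2 * mean_interval)
    for t in response_times:
        if t - last_reward >= wait:
            rewarded.append(t)
            last_reward = t
            wait = rng.uniform(0, 2 * mean_interval)
    return rewarded


print(fixed_ratio(10, 5))                           # [5, 10]
print(fixed_interval([1, 2, 5, 6, 9, 10, 11], 4))   # [5, 9]
print(variable_ratio(20, 5, random.Random(1)))      # unpredictable reward positions
```

On the ratio schedules every response counts towards the next reward, whereas on the interval schedules responses made before the waiting time has elapsed earn nothing. This is one way to see why ratio schedules are associated with high response rates.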



Shaping

Learning more complex behaviours by reinforcing successive approximations to the desired behaviour:

Reinforce a high-frequency component of the desired response

Drop reinforcement – behaviour becomes more variable again

Await a response that is closer to the desired response – then reintroduce reinforcement

Keep cycling: closer and closer approximations are achieved

Shaping can establish behaviour which is not in the animal’s natural repertoire
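
The cycle of reinforcing successive approximations can be sketched as a toy simulation in Python. Everything here is an illustrative assumption: behaviour is reduced to a single number, responses vary normally around the learner's current typical response, and only responses closer to the target than the current typical response are reinforced.

```python
import random


def shape(target, start=0.0, spread=1.0, trials=500, seed=0):
    """Toy shaping loop: reinforce only closer approximations to `target`."""
    rng = random.Random(seed)
    typical = start  # the response the learner currently tends to emit
    for _ in range(trials):
        # Behaviour varies around the typical response.
        response = rng.gauss(typical, spread)
        # Reinforce only if this response is a closer approximation.
        if abs(target - response) < abs(target - typical):
            typical = response  # the reinforced response becomes typical
    return typical


final = shape(target=10.0)
print(final)
```

Because only closer approximations are reinforced, the typical response never drifts away from the target. Over many trials it ends up far from the starting behaviour, even though a response near the target was essentially never emitted at the start.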



Extinction

Extinction occurs when reinforcement is withheld

It is not an immediate process: there is often a brief increase in responding first

Partially reinforced responses are harder to extinguish



Punishment

The use of aversive consequences to reduce undesirable behaviour

Any event which decreases the likelihood that ongoing behaviour will recur


Positive punishment +

Behaviour is followed by the presentation of an aversive stimulus

Stimulus is added to situation
e.g. electric shock


Negative punishment -

Behaviour is followed by withdrawal of rewarding stimulus

Stimulus is taken away
e.g. removal of toys
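
The four types of consequence form a 2 x 2 grid: a stimulus is either added or removed, and the behaviour either becomes more or less likely. The Python sketch below encodes that grid as a lookup table; the function name and the argument labels are hypothetical and chosen only for this illustration.

```python
def classify_consequence(stimulus, effect_on_behaviour):
    """Name the consequence type.

    stimulus: "added" or "removed"
    effect_on_behaviour: "increases" or "decreases"
    """
    table = {
        ("added", "increases"): "positive reinforcement",
        ("removed", "increases"): "negative reinforcement",
        ("added", "decreases"): "positive punishment",
        ("removed", "decreases"): "negative punishment",
    }
    return table[(stimulus, effect_on_behaviour)]


# Electric shock is added and the behaviour decreases.
print(classify_consequence("added", "decreases"))    # positive punishment
# Toys are removed and the behaviour decreases.
print(classify_consequence("removed", "decreases"))  # negative punishment
```

"Positive" and "negative" refer only to whether a stimulus is added or taken away, not to whether the consequence is pleasant.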


Problems associated with punishment

Punishment is more effective when it is swift (no delay) and consistent (not just administered sometimes)

It is less effective than reinforcement because no desired behaviour is established

It does not cause long-term behaviour change: it only suppresses the behaviour

When threat of punishment is removed, the behaviour returns (e.g. speed cameras)

It produces negative feelings and does not promote new learning

It may indeed teach the recipient to use punishment towards others

It is useful if behaviour is dangerous and must be changed/suppressed quickly


Operant Conditioning: Children

Reinforce alternative behaviour that is incompatible with the undesirable behaviour (e.g. respond to normal voice only, not to screaming)

Identify the crucial reinforcer (maintaining the behaviour) and stop reinforcing the problem behaviour (extinction)

Reinforce the non-occurrence of the undesirable behaviour

Remove the opportunity for positive reinforcement

Use strongly reinforcing stimuli, but use variety (e.g. praise, privileges)

Reinforce immediately after the preferred behaviour

Start with continuous reinforcement, then switch to intermittent reinforcement

Encourage self-reinforcement through pride and a sense of self-control


Martin Seligman (Learned Helplessness)

He investigated the effects of exposure to uncontrollable shock on escape/avoidance learning in dogs

1/3 of dogs exposed to unavoidable shock failed to learn to avoid or escape from an unpleasant or aversive stimulus

First phase: Classical Conditioning
- shock paired with light
Second phase: Operant Conditioning
- learn to jump to the other side of the box when the light is switched on


Basic Principles of Learned Helplessness

Learned helplessness might explain behaviour after abuse and in depression

When the traumatic event first occurs it causes a heightened state of emotionality, which has been called “fear”

Fear continues until the subject learns that he can or cannot control the trauma

“If subject learns that he cannot control the traumatic event, fear decreases and is replaced with depression.” (Seligman, 1979)