COGS 101B Exam 1: Lecture 6 - Operant Conditioning

Flashcards in this deck (52):

Operant conditioning

(also known as instrumental conditioning and trial-and-error learning)

is associating a voluntary
behavior (‘operation on the environment’) with an outcome.

some action the animal chooses to do is associated with an outcome


Law of Effect

Animals learn that a behavior (or class of similar behaviors)
predicts a particular outcome, and they seek that outcome by performing the behavior

Behaviors with good outcomes increase; behaviors with bad outcomes decrease.

(Thorndike, 1911)


Discrete trial paradigm


Cat opens the puzzle box and is reinforced with a food reward.

The cat learned that flipping the switch was responsible for getting it out and getting it food, so that escape behavior becomes more likely (and faster) in the future.

discrete because every time the cat got out, that was one trial, and for each new trial the cat had to be put back in


B. F. Skinner

free-operant paradigm

refined Thorndike's method to allow the animal to respond repeatedly

==> allowed the animal to control the rate of responding ==> animal controls when they get the reward (food)

• Skinner box


Skinner Box

little contraption, everything was automated: counted the number of times the lever was pressed, counted the number of times the reward was provided

made it easy to measure this activity over time

instead of recording trials you're recording behaviors over time

• Behaviors could be automatically recorded
in a Skinner box – count number of behaviors and outcomes.



reinforcing behavior: giving reward for every time the rat presses the lever

the number of responses goes up



it keeps pressing the lever but no food comes

if you stop reinforcing the behavior then the behavior starts to go away

number of responses decreases
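A toy sketch of the two cards above (reinforced responding goes up, unreinforced responding fades). The update rule, the `rate` value, and the function names are illustrative assumptions, not something from the lecture; the point is only the shape of the two curves.

```python
def update_strength(strength, reinforced, rate=0.2):
    """Toy rule (an assumption, not from the lecture): nudge response
    strength toward 1 when the response is reinforced, toward 0 when not."""
    target = 1.0 if reinforced else 0.0
    return strength + rate * (target - strength)


def run_phase(strength, reinforced, n_responses):
    """Apply the same outcome to n_responses in a row and record the
    response strength after each one."""
    history = []
    for _ in range(n_responses):
        strength = update_strength(strength, reinforced)
        history.append(strength)
    return history


# Phase 1: every lever press is rewarded, so responding climbs.
acquisition = run_phase(0.1, reinforced=True, n_responses=20)
# Phase 2: the food stops coming, so responding falls back off.
extinction = run_phase(acquisition[-1], reinforced=False, n_responses=20)

print("end of reinforced phase:  ", round(acquisition[-1], 3))
print("end of unreinforced phase:", round(extinction[-1], 3))
```

Strength here is just a number between 0 and 1 standing in for how often the rat presses; a real Skinner box would count actual presses over time.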


Basic elements of the free-operant paradigm:

• discriminative stimulus (S)

• behavioral response (R)

• outcome (O)

S --> R --> O


Through repeated trials, the animal learns that the outcome is contingent upon

the appropriate response.


discriminative stimulus (S)

that helps you select
the appropriate behavior (e.g. rat can see the lever).

the animal has to be able to ID something in the environment that it's operating on


behavioral response (R)

or class of similar responses,

is performed in response to the stimulus (e.g. rat pushes lever with either paw).


outcome (O)

follows that either reinforces or punishes the behavior (e.g. rat gets food, good outcome).



Outcomes that increase the likelihood of the behavior

primary reinforcers

secondary reinforcers


primary reinforcers

meet some innate need (e.g. food, water, sleep, and sex).

Note that these are not always reinforcing (i.e.
you won’t work for water if already satiated).


Secondary reinforcers

have no intrinsic value, but predict or are associated with primary reinforcers (e.g. money, good grades, gold stars, etc.).

something by itself has no value but through some kind of association it's learned that this other thing is valuable



Outcomes that decrease the behavior

primary punisher

secondary punisher


Primary punisher

Pain (shock), nausea, loud noises, social disapproval (?), loss of freedom (jail).

basically just aversive things


Secondary punisher

Monetary fines, demerits, bad grades, etc.


You are about to press a button on your iClicker. When
you see that you got the correct answer to the question,
that acts as a ______________.

Secondary reinforcer


positive (+) conditioning

If an outcome/consequence is added, if you're given an outcome as a result of your behavior

this has nothing to do with “good” or “bad.”


negative (-) conditioning.

If an outcome/consequence is removed, something is taken away

this has nothing to do with “good” or “bad.”


Positive reinforcement

when you want to increase the behavior (reinforce) and you do it positively

animal rewarded for doing a behavior --> given something to make the behavior more likely

response increases (reinforcement) + consequence is added (positive)


Negative Reinforcement (escape/avoidance)

response increases (reinforcement) + consequence is removed (negative)

want the behavior to increase but take something away ==> if you do something I want, I'll take away a "bad thing"


Positive punishment

Response decreases (punishment) + Consequence is added (positive)

when you don't want a behavior and you add something (electric shock)


Negative punishment (omission)

Response decreases (punishment) + Consequence is removed (negative)

I don't want you to do a behavior, so I take something away (money, privileges, etc.)

"No more T.V. for you!"


Positive reinforcement example

Eat all your vegetables --> get some dessert.

"do something I want you to do and I give you something"


Positive punishment example

Scratch the couch ==> get sprayed with water;

tease your sibling ==> parental scolding.


Negative Reinforcement example

Shut off the alarm clock (aversive stimulus) ==> removal of an aversive stimulus;

- arm does flailing motion
- next morning you're more likely to make that same movement
- reinforcement of behavior that takes away an aversive stimulus

take ibuprofen ==> reduce a headache.
- next time you have a headache you're more likely to grab that medicine again
- reinforces that behavior of taking the medicine
- you're not getting anything, something is being taken away (an aversive stimulus - a headache)


Negative punishment example

Commit armed robbery ==> loss of freedom (jail).
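The four paradigms come from crossing two questions: does the response increase or decrease, and is a consequence added or removed? A minimal Python sketch of that 2x2 lookup, using examples from the cards above; the function name and the argument strings are made up for illustration.

```python
def classify_paradigm(response, consequence):
    """Name the operant paradigm.

    response:    "increases" or "decreases"  (reinforcement vs. punishment)
    consequence: "added" or "removed"        (positive vs. negative)
    """
    sign = {"added": "positive", "removed": "negative"}[consequence]
    kind = {"increases": "reinforcement", "decreases": "punishment"}[response]
    return f"{sign} {kind}"


# Examples taken from the cards: (what happens to the response, what happens
# to the consequence).
examples = {
    "eat your vegetables, get dessert": ("increases", "added"),
    "scratch the couch, get sprayed with water": ("decreases", "added"),
    "take ibuprofen, headache goes away": ("increases", "removed"),
    "commit armed robbery, lose freedom": ("decreases", "removed"),
}

for situation, (response, consequence) in examples.items():
    print(f"{situation}: {classify_paradigm(response, consequence)}")
```

Note that "positive" and "negative" only track added vs. removed; neither says anything about "good" or "bad."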


timing and context in operant conditioning

are critical for forming the association.

critical for how effective it's going to be


If the outcome is delayed....

... the association is not learned as well.

So, punishing your dog for something it did an hour ago is probably not very effective…


any kind of reinforcement to be effective needs to come

almost immediately


Reinforcement schedules

(i.e. how often you get the outcome)

how the timing, frequency, and reliability of an outcome affect the rate at which associations are learned.

how often and how reliably you get the outcome: going to affect the rate of learning and the effectiveness over time


continuous reinforcement

When you get a reward after every behavior:

every time the rat presses the lever it gets a reward; there is no break in the reward: every time you perform the action you get the outcome


partial reinforcement

anything that isn't a continuous reinforcement schedule

the outcome follows
less than 100% of the time


variable-ratio schedule

A powerful form of partial reinforcement schedule

steep learning curve: if you don't know when it's coming you just keep banging away at the lever

you don't get the outcome every time; you get it about every 5 or 10 responses, but the exact timing can’t be predicted.

gambling!!! - the payout is variable

most effective schedule, with the steepest learning curve


fixed ratio

every fifth time you perform the action you get the reward

rats: 5 responses and a pause (a plateau)
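A rough sketch of the three schedules above as functions that say which lever presses get rewarded. The function names, the seed, and the way the variable-ratio requirement is drawn (uniformly around the mean) are assumptions for illustration, not the lecture's definitions.

```python
import random


def continuous(n_responses):
    """Continuous reinforcement: every response is rewarded."""
    return [True] * n_responses


def fixed_ratio(n_responses, ratio=5):
    """Fixed ratio: exactly every `ratio`-th response is rewarded."""
    return [i % ratio == 0 for i in range(1, n_responses + 1)]


def variable_ratio(n_responses, mean_ratio=5, seed=0):
    """Variable ratio: a reward after an unpredictable number of responses
    that averages `mean_ratio`. Assumption: the required count is drawn
    uniformly from 1 .. 2*mean_ratio - 1, whose mean is mean_ratio."""
    rng = random.Random(seed)
    required = rng.randint(1, 2 * mean_ratio - 1)
    since_last_reward = 0
    outcomes = []
    for _ in range(n_responses):
        since_last_reward += 1
        if since_last_reward >= required:
            outcomes.append(True)
            since_last_reward = 0
            required = rng.randint(1, 2 * mean_ratio - 1)
        else:
            outcomes.append(False)
    return outcomes


presses = 1000
print("continuous rewards:    ", sum(continuous(presses)))
print("fixed-ratio rewards:   ", sum(fixed_ratio(presses, ratio=5)))
print("variable-ratio rewards:", sum(variable_ratio(presses, mean_ratio=5)))
```

The fixed-ratio and variable-ratio schedules pay out about equally often in the long run; the difference is that under the variable schedule the animal (or gambler) can't tell which response will be the rewarded one.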


Sheldon gave Penny chocolate each time she did
something to please him. What kind of paradigm is this?

Positive reinforcement


Sheldon sprayed water on Leonard when he disagreed.
What kind of paradigm is this?

Positive punishment

Sheldon wants Leonard to do it less (punisher) and he is adding something (positive)

something is being added to the situation and he wants him to not perform that behavior again


Reinforcers and punishers can be

equally effective at
changing behavior under controlled laboratory conditions; however, punishers can run into problems in the real world.


Problems with punishers?
(in the real world when you can't control those discriminative stimuli or timing as easily)

1. If you punish a behavior, you may encourage
cheating/circumvention. (“Don’t speed” becomes “Don’t get caught speeding.”)

2. Concurrent reinforcement may undermine the punishment. (Student punished for talking in class may be reinforced with
approval by other students.)

3. Punishment can lead to more variable behavior. (If a specific behavior is decreased, what replaces it?)
- if you punish a child for jumping on the couch, then they may start jumping on the bed (doesn't get rid of the class of behaviors)

4. The initial intensity of the punisher needs to be fairly high (otherwise you may get habituation).

5. Punishment can lead to stress and anxiety, which are associated with other undesirable behaviors. (creates states that aren't conducive to encouraging the behaviors you want)


How do animals get trained to do complex (and sometimes stupid) things?

• You can’t simply reinforce a complex behavior as it may not be done accidentally.

• Use chaining (chained learning) to create a series of reinforced behaviors that build on each other (start with something simple and keep adding one step at a time till you get something that looks much more complex)

squirrel on waterskis

S (See platform) --> R (Stand on platform) --> O (Food reward)

S (See handle) --> R (Stand on platform + place paws on handles) --> O (Food reward)


Operant vs. classical conditioning

Classical conditioning
• Passive: environment works on animal.
• Stimulus evokes a response.
• Animal learns that the CS predicts the US.
• Typically simple associations.

Operant conditioning:
• Active: animal operates on environment.
• A behavioral response
produces an outcome.
• Animal learns that behavior predicts an outcome.
• More flexible and powerful, producing more complexity.

However, the two often work together (e.g. primary and secondary reinforcers can become associated classically).


Evaluating situations to ID what kind of conditioning or paradigm

is it passive or active?

what's being associated?

the more complex a behavior the more likely it's operant conditioning


Brain-based models for operant conditioning

any instance of operant conditioning involves the interaction of several neural systems.


Law of effect

origins of operant conditioning

states that animals make associations between voluntary behaviors and
contingent outcomes.


_____ make a behavior more likely

Reinforcers



_____ make a behavior less likely

Punishers



Both reinforcers and punishers can be due to

intrinsic preferences
(primary) or learned associations with intrinsic preferences (secondary).


When you add something to the outcome (give a treat or shock), that is

positive conditioning



When you take away something (pain or freedom),
that is

negative conditioning



_____ may not always be as effective as reinforcers in the real world, but are equally effective in the lab.

Punishers