learning part 4 Flashcards

(136 cards)

1
Q

⚙️ What Is Instrumental Conditioning?

A

Instrumental Conditioning (also called Operant Conditioning) is learning through consequences — that is, learning that a certain behavior leads to a specific outcome. It’s called instrumental because your behavior is the instrument (the tool) that produces a result.

2
Q

Elicited Behavior vs Instrumental Behavior

A

🌩 Elicited behavior

These are automatic responses that happen when a stimulus appears — you don’t choose them. Examples include:

Habituation: You stop reacting to something (e.g. stop noticing a ticking clock).

Sensitisation: You become more reactive to a repeated stimulus (e.g. a loud noise becomes more annoying).

Classical Conditioning: Like Pavlov’s dog — a bell (stimulus) makes the dog salivate because it predicts food.

🧠 These behaviors are:

“Triggered or prompted by a specific stimulus in the environment… automatic or involuntary.”

3
Q

Do the procedures… require the participant to make a particular response to obtain food or other USs (Unconditioned Stimuli) or CSs (Conditioned Stimuli)?

A

“They do not require the participant to make a particular response.”

In other words, the stimulus triggers the behavior — the participant is not choosing a behavior to get a reward.

4
Q

🛠 Now Enter: Instrumental Conditioning

A

In instrumental conditioning, your behavior controls what happens.

Instead of something in the environment making you react (stimulus → response), you do something to make something happen (behavior → stimulus).

💡 Analogy:
Think of a vending machine:

You press a button → You get a drink.

Your action (behavior) causes the outcome (stimulus: drink).

If you do nothing, nothing happens.

🔄 Classical Conditioning (for comparison):
A bell rings (stimulus) → You salivate (response).

The environment acts on you, not the other way around.

5
Q

☕ Everyday Analogy of instrumental conditioning

A

“Putting a coin in a coffee machine = coffee.”

Your action (putting the coin) is instrumental because it produces a result (coffee).

You do this behavior because it worked in the past — this is learning through consequence.

6
Q

🧪 Thorndike’s Puzzle Boxes

A

Edward Thorndike studied how animals learn to get what they want through trial and error. He:

“Placed hungry animals (mainly cats) in puzzle boxes… with some food left outside in view of the animal.”

💡 Analogy: Imagine being locked in a room with snacks on the other side of a glass wall — you can see them but need to figure out how to unlock the door.

🎯 Goal for the Animal:
“Learn how to escape from the box to obtain the food.”

7
Q

📦 Different Boxes, Different Tricks (Thorndike’s Puzzle Boxes)

A

“Different puzzle boxes required different responses to get out.”

Example:

Box A: Cat must pull a ring.

First time: 160 seconds to solve it.

Later: 6 seconds — it learns the trick!

“Box I: cats push a lever down.”

Each box was a new problem, and the cats learned through trial and error.

8
Q

🧠 How Did They Learn? (Thorndike’s Puzzle Boxes)

A

“Initially, the cats showed random behaviours… but with continued practice… latencies became shorter.”

This means they got faster at escaping.

“Through trial and error… they retained successful behaviours and eliminated useless behaviours.”

💡 Analogy: Like trying different keys on a locked door — eventually, you find the one that works and remember it.

“Although Thorndike titled his treatise ‘animal intelligence’, many aspects of behaviour seemed unintelligent.”

That is — the animals weren’t reasoning; they were just trying things until something worked.

9
Q

🔄 Reinforcement & Learning (Thorndike’s Puzzle Boxes)

A

“Behaviours that result in a positive outcome (escape) lead to an association between stimuli in a puzzle box and the effective response (pushing the lever).”

So:

See lever (stimulus)

Push lever (response)

Door opens (consequence)

Then:

“The consequence (escaping) reinforces this association.”

But it’s important to clarify:

“Not [that] the cat sees the lever and understands how it works.”

The cat doesn’t reason it out like a human. It just learns: “When I do X → I get Y.”

10
Q

Law of Effect (💡 Thorndike’s Discovery)

A

💡 Thorndike’s Discovery:
“Thorndike established the LAW OF EFFECT.”

The Law of Effect is like a rule of thumb that says:

If something good happens after a behavior, you’re more likely to do that behavior again.

If something bad happens (or nothing happens), you’re less likely to repeat it.

11
Q

Example of law of effect

A

“If an R (pressing a lever) in the presence of an S (lever) is followed by a positive event (escape), the association between the S-R becomes strengthened.”

🟡 Translated:

R = Response (e.g. pressing a lever)

S = Stimulus (e.g. seeing the lever)

Positive Event = Escape

So: If the cat presses the lever (R) when it sees the lever (S) and that lets it escape (reward), then it will remember this combo.

✅ “The R will be more likely to happen again the next time the S is encountered.”

12
Q

🚫 Negative Event Weakens the S-R Association (Response Weakening)

A

“If an R (reaching for the door-handle) in the presence of an S (the door-handle) is followed by a negative event (no escape), the association between the S-R becomes weakened.”

🟡 Translation:

If you do a behavior and nothing good happens, your brain learns: “this isn’t worth it.”

❌ “The R will be less likely to happen the next time the S is encountered.”

🧠 This is called response weakening.

13
Q

Key Concept of law of effect

A

The animal learns a connection between seeing something (S) and doing something (R) — like seeing a lever (S) and pushing it (R).
What happens after (the consequence) doesn’t get remembered as a “goal,” but simply makes the S-R link stronger or weaker.

💡 It’s like the brain is saying:
“Whenever I see this thing and do that move, something good/bad follows. I’ll do it more/less next time.”

🧠 The animal isn’t thinking about the reason — it just gets better at repeating the action that worked.
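💻 A minimal sketch of this idea in Python (my own illustration; the association name, numbers, and update rule are invented, not Thorndike’s formalism): the consequence simply nudges a stored S-R strength up or down, with no representation of the goal itself.

```python
# Law of Effect as a simple S-R strength update: a good outcome strengthens
# the association, a bad outcome (or no escape) weakens it.

sr_strength = {"see lever -> press lever": 0.5}

def update(association: str, outcome_good: bool, step: float = 0.1) -> None:
    """Strengthen or weaken an S-R link based on the outcome that followed."""
    delta = step if outcome_good else -step
    new_value = sr_strength[association] + delta
    sr_strength[association] = min(1.0, max(0.0, new_value))

update("see lever -> press lever", outcome_good=True)   # escape follows: strengthen
print(sr_strength)  # {'see lever -> press lever': 0.6}

update("see lever -> press lever", outcome_good=False)  # no escape: weaken
print(sr_strength)  # {'see lever -> press lever': 0.5}
```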

14
Q

🧭 What Are Discrete-Trial Procedures?

A

“DISCRETE-TRIAL PROCEDURES can also be conducted in mazes similar to Thorndike’s puzzle box.”

These are experiments where the animal only has one chance per trial to perform a response.

Example:
“Rat begins in a start box and travels down a runway to the other end (the goal box) that has a reinforcer (e.g. food/water).”

💡 Analogy: Like a timed race where the rat runs from Start to Goal, and the prize is a snack at the end.

✅ “The trial is over once the rat has performed the instrumental response (i.e. reached the goal box).”

15
Q

🏃‍♂️ How to Measure Learning in Discrete-Trial Procedures

A

We measure how well the animal is learning by looking at:

  1. Running Speed
    “How fast the animal gets from the start box to the goal box.” It increases with repeated training trials.

🟢 If learning happens → rat runs faster 🔴 If confused → rat runs slower

  2. Response Latency
    “Time taken to leave the start box and begin moving down the alley.” It becomes shorter as training progresses.

🟢 If the rat has learned what to do, it starts quickly 🔴 If it’s unsure or unmotivated, it hesitates

16
Q

🧪 T-Maze & Complex Learning

A

A T-Maze is shaped like a “T”. The rat starts at the bottom and must choose to turn left or right.

“The goal is located in each arm of T, allowing the study of more complex processes.”

17
Q

🐭 Experiment with Baby Rats in T-Maze

A

“They placed its [the baby rat’s] mother in the right goal box, and another female rat in the left.”

Then:

“One trial consisted of putting the baby rat in the start box, and when the rat reached the goal box where the mother was, the trial was over.”

The reinforcer = being reunited with the mother.

“The rats learned to turn right with successive trials, and this behaviour continued even after the mother was not there.”

💡 Important: The rats learned the action, not just the mother’s presence.

18
Q

🤯 What Did the Rats Learn in the T-Maze experiment?

A

“Stimulus: JUNCTION” (where the maze splits)

“Instrumental Response: TURN RIGHT”

“Reinforcing Outcome: TO MEET WITH ITS MOTHER.”

So when the baby rat sees the junction (stimulus), it learns to turn right (response) to meet mom (reward).

“When the baby rats saw the junction in the maze (the stimulus) they turned right (instrumental response), which led to the reinforcing outcome (the mother).”

And:

“The reinforcing stimulus made it more likely that the rat would turn right in the future.”

19
Q

Thorndike’s procedures (puzzle boxes and mazes)

A

“In Thorndike’s procedures (puzzle boxes and mazes), the animal only has the opportunity to show instrumental responses during specific periods of time: trials.”

Each learning opportunity is limited and controlled.

“The animal has limited opportunities to respond, and those opportunities are scheduled by the experimenter.”

This ensures precise measurement of how behavior changes over time.

20
Q

Free-operant Procedures analogy

A

(Think: letting someone play a video game without stopping them after each level)

B.F. Skinner revolutionized psychology by creating free-operant procedures, which differ from older, more rigid discrete-trial procedures.

21
Q

🔑 Key Concept: Free-operant behavior

A

In discrete-trial setups (like a maze), the subject is removed after each trial—like doing one puzzle, stopping, and starting over.

In free-operant procedures, the subject is not removed after each trial, so it can behave continuously. It’s like leaving a rat in a video game world where it can keep playing. This makes it possible to study behaviour more continuously.

🐀 The animal is free to produce instrumental behaviour many times.

It can press a lever as much as it wants, whenever it wants.

This procedure is more natural since behaviour is continuous: one activity leads to another, and behaviour is not divided into artificial units.

Imagine watching someone cook: first they wash vegetables, then chop them, then cook them—all in a flow. This is more natural than stopping after each step.

It allows the study of more continuous behaviour that can be broken into measurable units called operants.

Operant = a measurable action, like pressing a lever to receive food.

22
Q

The Skinner Box (Operant Chamber)

A

(Lab tool for studying free-operant behavior)

Skinner box: allows the study of free-operant behaviour.

Inside the box:

There’s a lever (for rats) or a key (for pigeons).

When pressed or pecked, it gives food (reinforcer).

This lets scientists observe:

How often the animal performs the behavior

How long it takes to learn

What affects the rate of responding

Diagram:
Hungry rats → Push the lever → A pellet of food falls into the food cup

23
Q

What is operant?

A

Operant = a measurable action, like pressing a lever to receive food.

24
Q

Operant Response and Operant in The Skinner Box?

A

1. Operant response = lever pressing or pecking a key.
This is the action the animal performs to get the outcome.

2. Operant = the modification of behaviour by the reinforcing or inhibiting effect of its own consequences.
This is the whole learning process: the behaviour changes because it produces a result (like food or no food).

3. Same operant response = different behaviours with the same result.
Different behaviours that result in the same effect on the environment are considered the same operant response.

💡 If a rat presses a lever with its left paw, right paw, or nose, all count as the same operant response, because the effect (food) is the same.

“What matters is not the muscles involved in performing the behaviour but how the behaviour operates on the environment (the results).”

25
Q

What is Magazine Training?

A

(Think of it as training the animal to understand the "food machine")

Magazine training is the first step in teaching animals that food comes from a specific place, signalled by a sound.

The rat learns that food is delivered in a food cup via classical conditioning: the sound of the dispenser is repeatedly paired with food. Eventually, the sound itself elicits salivation (like Pavlov’s dog).

Later, just hearing the sound makes the rat go to the cup to check for food: the sound elicits a sign-tracking response, going to the food cup and picking up the food pellet.
26
Q

What is Response Shaping?

A

(Imagine teaching someone to shoot a basketball who has never seen one before)

Response shaping means gradually guiding an animal (or person) toward the final behavior by rewarding tiny steps in the right direction. Here’s how it works:

At first, you give food just for standing up on hind legs, no matter where in the box.

Once the rat does that easily, you only give food if it rears near the lever.

Finally, you only give food if it rears and presses the lever down.

This way, you're shaping its behavior bit by bit, like sculpting a statue from a rough block.

When the rat finally learns to press the lever correctly, you stop rewarding the earlier steps. Otherwise, the rat might keep doing the easier things (like just standing up) and never learn the full task.
27
Q

Analogies for Humans of Response Shaping

A

1. Learning sports: Teaching a child basketball: reward any shot first → then shots that get closer → then only shots that go in.

2. Learning guitar: Reward any chord, even if bad → then chord transitions → then playing full songs smoothly.

3. Social anxiety: Reward attending a social event, even if anxious → then starting a conversation → then feeling comfortable in social settings.

All these are examples of shaping through successive reinforcement.
28
Q

Does shaping teach the rat a brand-new behavior when it learns to press a lever?

A

No. We are teaching the rat how to combine familiar responses to create a new activity.

Think of it like teaching a dance: the movements are familiar (step, turn), but the sequence is new.
29
Q

Response Rate and Operant Behavior

A

As free-operant procedures allow the subject to produce continuous behaviour over long periods of time, operant responses are measured as a rate.

In a free-operant setup (like the Skinner box), animals can act whenever they want, so we measure how often they do the behavior.

Response rate = frequency of instrumental responses per minute. Think of this like counting how many times a rat presses a lever in 60 seconds.

High response rate → high probability the behavior will happen again.
Low response rate → low probability the behavior will happen again.

🎯 Key Term: Response Rate = how often a behavior happens in a set amount of time.
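💻 A minimal sketch in Python (illustration only; the press times are made-up data) of how a response rate is computed from recorded lever presses:

```python
# Response rate = number of responses divided by observation time (in minutes).

press_times_s = [2.1, 5.4, 9.0, 15.2, 21.8, 33.5, 47.9, 58.3]  # seconds
session_length_min = 1.0

rate = len(press_times_s) / session_length_min
print(f"Response rate: {rate:.1f} responses per minute")  # 8.0
```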
30
Q

Instrumental Conditioning in Real Life

A

In all instrumental conditioning procedures, the subject makes a response that produces an outcome. This means behavior is goal-directed: it does something because it leads to a result.

Examples:

Yelling at a cat for getting on the bed = trying to stop the behavior.

Closing a window when it rains = avoiding discomfort.

Not letting the car be used after curfew = punishing behavior.

All of these are instrumental conditioning: behavior changes because of what it causes.
31
Q

The 4 Core Procedures in Instrumental Conditioning

A

Name: Positive Reinforcement
What Happens: Add something good
Example: Give a treat
Effect: Increases behavior
_______________________________
Name: Positive Punishment
What Happens: Add something bad
Example: Scold a child
Effect: Decreases behavior
_______________________________
Name: Negative Reinforcement (escape or avoidance)
What Happens: Take away something bad
Example: Stop a shock
Effect: Increases behavior
_______________________________
Name: Negative Punishment (omission)
What Happens: Take away something good
Example: No phone
Effect: Decreases behavior
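💻 A minimal sketch in Python (illustration only) of the 2×2 logic behind these four procedures: the valence of the stimulus and whether it is added or removed jointly determine the procedure and its effect:

```python
# Keys: (kind of stimulus, what is done with it).
# Values: procedure name and its effect on the rate of responding.

PROCEDURES = {
    ("appetitive", "added"):   ("positive reinforcement", "behavior increases"),
    ("aversive",   "added"):   ("positive punishment",    "behavior decreases"),
    ("aversive",   "removed"): ("negative reinforcement", "behavior increases"),
    ("appetitive", "removed"): ("negative punishment",    "behavior decreases"),
}

name, effect = PROCEDURES[("aversive", "removed")]
print(f"{name}: {effect}")  # negative reinforcement: behavior increases
```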
31
Q

Types of Outcomes in Instrumental Conditioning

A

Instrumental outcomes differ in two ways:

1. Appetitive vs. Aversive = pleasant vs. unpleasant (e.g. reward vs. punishment)

2. Positive vs. Negative = presented vs. removed (e.g. food vs. no food)

So an outcome can be:

Positive Appetitive (add something good = give a treat)

Positive Aversive (add something bad = yell at someone)

Negative Appetitive (take away something good = no dessert)

Negative Aversive (take away something bad = remove a shock)

Whether the outcome is appetitive/aversive and presented/removed determines the probability (i.e., rate) of instrumental responding in the future.

In short: does the behavior go up or down depending on what happens afterward?
32
Q

The Core Components of Instrumental Conditioning

A

All instrumental procedures include:

Learning history – What the subject learned before.

Instrumental response – The specific behavior (like pressing a lever).

Outcome or contingency – The result (reward or punishment).

Stimulus – The cue (e.g. a lever).

Response-outcome relationship – Whether the action causes the outcome or not.
33
Q

Positive Reinforcement?

A

Instrumental behaviour produces an APPETITIVE STIMULUS. = Doing the behavior gets you something pleasant.

If the response appears, the appetitive stimulus is presented. If it does not appear, the appetitive stimulus is not presented.

Example: a dog will be more likely to sit if you give him praise or a treat.

✅ This results in an INCREASED RATE OF RESPONDING.
34
Q

Positive Reinforcement in a Skinner Box

A

When presented with a lever, the subject will be more likely to press it if it has learned that it is associated with an appetitive stimulus.

In other words: Rat presses lever → gets food → presses lever more in future.
35
Q

Negative Reinforcement?

A

Instrumental behaviour produces the absence of an aversive stimulus. = Doing the behavior removes something unpleasant.

✅ Results in an INCREASED RATE OF INSTRUMENTAL RESPONDING.

Example: Banging on a wall makes a noisy neighbour quieter. You remove the noise (bad thing), so you're more likely to bang again in the future.
36
Q

Negative Reinforcement with a Lever

A

When presented with a lever, the subject will be more likely to press it if it has learned that it is associated with the elimination of the negative stimulus (shock).

Pressing the lever stops a shock → the rat learns to press it more often.
37
Q

Key Terms:

A
38
Q

Positive Punishment?

A

Instrumental behaviour produces an aversive stimulus.

That means: the subject does something, and as a result, something unpleasant is added. This decreases how often the behavior happens in the future.

Results in a DECREASED RATE OF RESPONDING.

📌 Example: Your boss criticises you for being late for a meeting. → The criticism is the aversive stimulus that was added, making you less likely to be late again.

✅ KEY POINT: "Positive" means something is added. "Punishment" means the behavior goes down.
39
Q

Positive Punishment in a Skinner Box

A

When presented with a lever, the subject will be less likely to press it if it has learned that it is associated with an unpleasant stimulus (shock).

So: Rat presses lever → gets shocked → presses lever less.

🧠 That’s positive punishment in action:
Positive = shock is added
Punishment = lever pressing decreases
40
Q

Negative Punishment

A

Instrumental behaviour produces the absence of an appetitive stimulus.

In other words: the subject does something, and something pleasant is taken away.

Results in a DECREASED RATE OF RESPONDING.

📌 Example: Putting a child on time-out (eliminates having fun). They're not given a new punishment, but they lose access to things they enjoy (games, TV, siblings, etc.).

The child is placed in a boring environment (no toys, no fun). They're not being hurt or yelled at; instead, they're being separated from positive reinforcement. We are eliminating the sources of positive reinforcement, such as playing with brothers or watching TV.

✅ KEY POINT: "Negative" means something is removed. "Punishment" means the behavior goes down.
41
Q

Negative Punishment in the Skinner Box

A

When presented with a lever, the subject will be less likely to press if it has learned it leads to the absence of a positive outcome.

So: Rat presses lever → food is not given → rat presses less.

This is different from being shocked (positive punishment) because here, the rat is just missing out on something good (food).
42
Q

Natural Link Between Behavior and Reinforcer

A

🔸 "A behaviour cannot be reinforced if it is not NATURALLY LINKED TO THAT REINFORCER."

This means not all behaviors can be easily reinforced. For reinforcement to work, the behavior must already make some intuitive sense in relation to the outcome.

Example: Thorndike tried to train cats to yawn to open a puzzle box and escape. It didn't work. Why? Because yawning and escaping from a box have no natural connection.

Analogy: Imagine trying to get someone to wiggle their ears in order to unlock their phone. It's not intuitive, so it's hard to reinforce.
43
Q

Stimulus Activates Behavior Systems

A

🔸 "The presence of a stimulus (e.g. food) will activate the behaviour system related to that stimulus (e.g. foraging and feeding)."

A stimulus (like the smell of food) activates specific innate systems in the brain: sets of behaviors that evolved together.

🐹 Example (Hamsters): If hamsters are hungry, they don't clean themselves much. Instead, they dig and search for food.

🔸 "Food deprivation in hamsters decreases the probability of self-care responses, like face washing, but increases the probability of environment-directed activities, such as digging, and scrabbling…"

Conclusion: 🔸 "Self-care responses … are not part of the feeding system activated by hunger, whereas digging, scrabbling are."

Analogy: When you're starving, you're not focused on brushing your hair. You're opening cupboards and looking for snacks. The system your brain activates depends on the stimulus.
44
Q

Nature of the Reinforcer?

A

🔸 "The success of instrumental conditioning also depends on the nature of the REINFORCER."

🔹 Whether a behavior is successfully learned depends on what kind of reward (reinforcer) you use. If the reward is something the subject really wants (good quality, right amount, and at the right time), then the behavior is more likely to be repeated.

🔸 "The quality and quantity of a reinforcer determine the success of positive reinforcement procedures."

Example: 🔸 "Picture a rat that gets a week's piece of food after making one lever press. Such a large reinforcer is not likely to motivate frequent lever pressing."

Analogy: If your boss gave you a month's salary for just showing up one day, you'd stop working extra days! The reward was too big too soon, so you're not motivated to keep trying.
45
Q

Case Study with Boy with Autism (Trosclair-Lasserre et al., 2008)

A

A study (Trosclair-Lasserre et al., 2008) found:

🔸 Getting attention (like praise, hugs, tickles) worked well as a reward for a 5-year-old boy.

🔸 When the boy pressed a button (his action), he got attention for 10, 105, or 120 seconds (different lengths of reward).

🔸 As the experiment went on, he had to press the button more times to get the reward: 1, 2, 5, 10, 20, 30, and up to 40 presses.

Key Insight:

🔸 The boy kept pressing more when the reward was longer (more attention).

🔸 Only the biggest rewards motivated him when it got harder.

Analogy: If your parents give you lots of hugs and praise the first time you do the dishes, you'll be more willing to keep doing them, even if they ask you to wash more dishes later.
46
Q

What are Behavioural Contrast Effects?

A

🔸 A reward's effect depends not just on how good or big it is, but also on what the subject got before.

🔸 A big reward feels amazing if you're used to small ones, but a small reward feels disappointing if you're used to big ones.

🔸 This effect is called behavioural contrast.

Analogy: If you normally get €10 for cleaning your room, and now get €2, it feels unfair, even though €2 might have seemed fine before. What you expect changes how the reward feels.
47
Q

Causal Relationship Matters

A

🔸 It doesn't make sense to keep doing an action if it doesn't actually cause the result you want.

🔸 You might use a lucky pen and get a good grade, but the pen didn't cause the grade.

🔑 Key idea: Only actions that directly lead to a reward will be reinforced. Superstitions happen when your brain wrongly connects an action (like using a lucky pen) with an outcome (like success), even when they're unrelated.
48
Q

Temporal vs. Causal Relations Between a Response and a Reinforcer

A

🔸 There are two ways a behavior and a reward can be linked:

Temporal: the reward comes soon after the action.
Causal: the action is necessary for the reward.

🔸 Temporal contiguity = the reward happens right after the behavior.
🔸 Contingency = the behavior must cause the reward.
🔸 These two are separate things: you can have one without the other.

Example:
If you press a button and get candy right away = good contiguity.
If candy comes only when you press the button = good contingency.
You need both for strong learning, but they are not the same.

🎓 Example: Studying for a Test

Contiguity (timing):
You study and get praised by your teacher the next day → good contiguity.
You study but hear praise a month later → poor contiguity.

Contingency (cause-effect):
If studying is the reason you got a good grade → good contingency.
If you didn't study but still got a good grade → no contingency.

Conclusion: The best learning happens when studying both causes the reward (contingency) and the reward comes right after (contiguity).
49
Q

Timing of Reinforcement

A

🔸 Getting a reward right after a behavior works better than getting it much later.

🔸 Dickinson et al. (1992): Rats kept pressing a lever if food came 2–10 seconds after pressing. But if the food came after 60 seconds, they stopped pressing.

Analogy: If someone laughs at your joke right away, you feel encouraged. But if they laugh an hour later, it feels random, so you're less likely to tell the joke again.
50
Q

Credit Assignment Problem

A

🔸 "This is likely due to CREDIT ASSIGNMENT:"

Credit assignment is the brain's way of figuring out which action caused the reward.
51
Q

Credit Assignment Problem (lever experiment)

A

What happens?

🔸 "During the experiment rats will be doing a lot of things in addition to pressing the lever."

In 60 seconds, the rat might sniff, scratch, walk, groom: lots of random stuff.

🔸 "In a 60-second period, the rat will not know which of the things it does produces the food, so will be less likely to attribute the reinforcer to the lever press."

If food comes too late, the rat gets confused; it can't figure out what it did to earn it.

🔸 "Result → less instrumental responding."

If the rat doesn't know what caused the food, it won't bother pressing the lever again.

Analogy: If you win money 1 minute after you clap your hands, but you also sneezed, blinked, and tied your shoe during that minute, you won't know what earned you the money.
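💻 A minimal sketch in Python (my own illustration; the action names, delays, and decay constant are invented) of the credit-assignment intuition: credit fades with the delay between an action and the reward, so with long delays no single action stands out:

```python
import math

def credit(delay_s: float, tau_s: float = 5.0) -> float:
    """Credit assigned to an action performed delay_s seconds before reward."""
    return math.exp(-delay_s / tau_s)

for action, delay in [("lever press", 2), ("grooming", 30), ("sniffing", 55)]:
    print(f"{action:12s} delay={delay:2d}s  credit={credit(delay):.4f}")

# With a 2 s delay the lever press clearly "wins" the credit; at 30-55 s
# every action's credit is near zero and similar, so responding weakens.
```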
52
Q

Contingency + Contiguity Are BOTH Important

A

🔸 If the reward comes 60 seconds after the action, the subject doesn't learn the connection, so the behavior doesn't happen again. Learning fails if the delay is too long.

Why? Because it's hard for the brain to tell which action caused the reward after a long wait.

🔸 This shows that for learning to happen, you need both: the action must cause the reward (contingency) and the reward must come quickly after the action (contiguity).
53
Q

Contingency + Contiguity Are BOTH Important (box explanation)

A

Contingent = "The instrumental behaviour causes the presentation of the reinforcer."
The behavior makes the reward happen; there's a clear cause-effect link.

Contiguous = "The instrumental behaviour is shortly followed by the reinforcer."
The reward comes soon after the behavior, not delayed.

Conclusion = "The instrumental response must reliably produce the reinforcer, but the difference in time between the response and the reinforcer must not be too long."
The action must consistently lead to the reward, and the reward must come quickly enough for the brain to connect them.

Analogy: If studying causes you to pass (contingency) but you don't get your results for 6 months (bad contiguity), it's harder to learn from it.
54
Q

Skinner's Superstition Experiment (Role of Contiguity)

A

🔸 "Skinner's superstition experiment illustrates the role of contiguity."

🔸 "Pigeons in an experimental chamber received food periodically every 15 s irrespective of what they were doing."

That means the food came no matter what, not based on their behaviour.
55
Q

Skinner's Superstition Experiment (Role of Contiguity): What Happened?

A

🔸 "One bird was conditioned to turn counterclockwise, another pushed its head into a corner, another did a 'tossing' motion."

🔸 "The pigeons thought they were controlling the delivery of the food…"

But they weren't. The food came automatically.

🔸 "What does this have to do with contiguity?"

Because the food came right after a random behaviour, the pigeons linked it to that behaviour, even though it was not causal.
_______________________________________

🔸 "Contiguity is more important than contingency." → Meaning: timing matters more than actual cause; whatever happens right before the reward gets linked to it.

🔸 "The fact the behaviour occurred just before the reinforcer was more important than whether it caused the reinforcer." → Meaning: it's more important that the behaviour happens just before the reward than whether it actually caused it.

Skinner called this "accidental" or "adventitious" reinforcement.
56
Q

Effects of Controllability of Reinforcers

A

🔸 "When a response reliably causes/produces a reinforcer (high contingency), this means the response controls the reinforcer."

In other words: if a behavior consistently causes a reward (high contingency), the person or animal has control over the outcome. When you know your actions make a difference, you feel in control.

🔸 "Most studies on the controllability of reinforcers involve exposing animals to uncontrollable stressful events (shock), which produces learned helplessness."

In other words: many studies test this by giving animals stressful events they can't control (like shocks), which leads to learned helplessness.
57
Q

🔵 What is Learned Helplessness?

A

🔸 "LEARNED HELPLESSNESS: It is a feeling that occurs after the person has experienced a tense state repeatedly. He/she believes that he/she is incapable of controlling or changing the situation, so he/she does not even try."

Example: Imagine you try to escape a situation (like a loud noise) but nothing you do works. After a while, you stop trying altogether, even if someone later gives you a way out.
58
Q

Learned Helplessness Experiment

A

Learned helplessness is when someone (or an animal) gives up trying to escape something bad because they learned that their actions don't help.

🔸 "Learned helplessness experiments involve 3 groups and 2 phases."
59
Q

Exposure Phase in the Learned Helplessness Experiment

A

Group E (Escape):
🔸 "Exposed to periodic shocks that can be escaped by performing an instrumental response (e.g. pressing a lever)."
➡ These animals learn they have control. They press the lever → shock stops.

Group Y (Yoked):
🔸 "Paired with a member of Group E and receives the same shocks but cannot control escaping the shocks."
➡ Their shocks stop only when the Group E animal presses, not them. So their behavior has no effect. They learn they're helpless.

Group R (Restricted):
🔸 "Receives no shocks but is kept in the chamber just like the others (control group)."
➡ They don't experience helplessness or learning about escape.
60
Q

Conditioning Phase (Testing Escape Learning)

A

Now we test whether the groups learned to escape:

🔸 "All groups receive escape-avoidance training (the animals go back and forth between compartments to avoid a shock)."

🔸 "Escape-avoidance is an instrumental response: going to the correct chamber produces absence of shock (negative reinforcement)."

📌 So now, behavior (moving chambers) = avoiding pain.

Results: Group Y couldn't learn how to avoid the shock like the other groups did (or they learned it much more slowly).

🔸 Being exposed to shocks they couldn't control made it hard for them to learn how to escape later.
61
Q

What Happened to Group Y?

A

🔸 "During the exposure phase, Group Y learn that they are not in control of shock: their behaviour does not cause the absence of a negative outcome (no response-reinforcer contingency)."

In simple terms, they learn: "Nothing I do works."

🔸 "Learning that they were not in control of the shocks in the exposure phase stops them from learning they are in control in the conditioning phase..."

Even when they can escape later, they don't try, because they've learned to give up.
62
Q

Why Escape Learning Matters

A

🔸 "This has important implications: Learning to predict when a bad thing will happen, or when it will end, may be just as helpful in reducing stress as learning how to escape the bad thing (by doing or removing behaviours)."

Even if we can't stop something bad, knowing when it will happen or end can help us cope better.
63
Q

Ana's Case (Real-Life Example)

A

🔸 "Ana is a 32-year-old woman… struggling with social isolation and feelings of hopelessness…"

🔸 "Since a young age, Ana has experienced rejection… affecting her self-esteem and confidence."

🔸 "Now Ana believes she'll never be able to make friends. She rejects all suggestions, saying 'but…' to everything, because of her learned helplessness."

In other words, she learned early that nothing she did helped her make friends. So now she doesn't try anymore, even though she could succeed if she gave it another shot.
64
Q

🔹 What is a Schedule of Reinforcement?

A

Imagine you're teaching a dog to sit. Every time it sits, you give it a treat. But over time, you stop giving a treat every single time. Maybe you give it one every 2nd or 5th time instead.

The pattern or rule you follow to decide when the dog gets a treat is called a schedule of reinforcement.

📘 Definition: A schedule of reinforcement is a program or rule that determines how and when a reinforcer (reward) follows a behavior (response).
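💻 A minimal sketch in Python (my own illustration) of a schedule as exactly that: a rule consulted after every response that decides whether the reinforcer follows. Here the rule is the dog-treat example, reinforce every 3rd sit:

```python
def make_every_nth(n: int):
    """Return a rule that reinforces every n-th response."""
    count = 0
    def rule() -> bool:
        nonlocal count
        count += 1
        if count == n:
            count = 0
            return True
        return False
    return rule

reinforce = make_every_nth(3)
for sit in range(1, 7):
    print(f"sit {sit}: {'treat' if reinforce() else 'no treat'}")
# sits 3 and 6 earn treats
```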
65
Q

🔹 Instrumental Conditioning Basics

A

Instrumental responses are actions that are performed to gain a reward. Think of:

Studying hard to get a good grade.

Texting a friend to get a reply.

But here's the kicker: you don't always get reinforced every time you do the action.

🔸 "You do not get a great grade on an exam each time you study hard."

🔸 "Inviting a friend for dinner does not always result in a pleasant evening."

So in real life, rewards are unpredictable, just like life itself.
66
Q

Reinforcement Doesn't Follow Every Response

A

In instrumental conditioning procedures, reinforcement does not follow every response.

Examples:
"Friends don't reply every time we text them."
"We don't succeed at a video game every time we play it."

This means learning in the real world is about adapting to uncertain outcomes.
67
Q

🔹 How Schedules Work Technically

A

Let's think of reinforcement like a vending machine that sometimes gives you snacks based on different hidden rules. The reward (reinforcer) can be given in different ways:

After a certain number of actions you perform, like pressing the button 5 times before it gives you something. That's called a ratio schedule.

After a certain amount of time has passed, like waiting 10 minutes after your first button press before it might give you a snack again. That's called an interval schedule.

Only when a specific signal is present, like the vending machine only working when a green light is on. This is a stimulus-controlled schedule (the presence of a specific cue).

These different patterns shape and maintain behavior differently: the way we learn and keep doing a task depends heavily on which schedule is being used.

🔹 Highlighted Sentence: "This means that schedules of reinforcement influence motivation of behaviour."

🧠 Intuition: Think of a mobile game that gives you rewards at unpredictable moments, like random treasure chests. Even if you don't win every time, the thrill of maybe getting a big prize keeps you playing. It's the uncertainty that fuels your motivation to keep engaging.
68
Q

Why Are Schedules So Useful?

A

Schedules of reinforcement are also useful because they help us measure how often a behavior naturally occurs before any reward is introduced: this is called the baseline. It's like getting a sense of what "normal" looks like before trying to change it.

Analogy: imagine you're testing a new fertilizer on plants. Before adding anything, you'd first observe how fast the plants normally grow. That way, if the fertilizer works, you'll notice a clear change from that starting point.

Reinforcement schedules work the same way: they let psychologists see how behavior changes over time by comparing it to the original pattern of behavior.

🔹 Highlighted Sentence: "We first need to know what behaviour was happening before the event that caused the change."

🧠 Intuition: Without a starting reference, you can't know if something really improved. If a kid suddenly starts doing more chores, was it because of the new reward system? You can only tell if you knew how often they did chores before the rewards started.
69
Q

Types of Reinforcement Schedules

A

1. Ratio Schedules (based on number of actions)

2. Continuous Reinforcement (CRF)
70
Q

Reinforcement Schedule: Ratio Schedules (based on number of actions)

A

You get rewarded after doing something X number of times.

Example: "You can be given chocolate for every 3–10 pages of a book you read."

Here's an analogy: think of getting one sticker after every few chores. Sometimes it's after 3 chores, sometimes after 5. This is a variable-ratio schedule.
71
Q

Reinforcement Schedule: Continuous Reinforcement (CRF)

A

You get a reward every single time you do the action.

"Reinforcers are delivered after every response (1, 3, 5, 6...)."

Example: Drug rehab programs: "Every negative drug test = voucher for money."

But here's the important note: "Continuous reinforcement is not common in real life because the world is not perfect."

You might:
Push an elevator button and it doesn't work.
Turn the tap but no hot water comes out (if the heater is broken).
72
Q

Continuous Reinforcement in Practice: Token Economies

A

These are structured reward systems (often for kids or behavior therapy).

You earn "tokens" (stickers, stars, etc.) for doing a behavior (like finishing homework). Later, you exchange tokens for something fun (screen time, candy, etc.).

Example: For a Child with Homework Trouble

Finish homework ➜ get 1 token.
Good behavior in other areas ➜ bonus tokens.
End of the week ➜ exchange tokens for a reward.

This helps teach:
Motivation
Goal-setting
Delayed gratification

"The child can save the tokens and later exchange them for a larger prize."
73
Q

🔹 Intermittent (Partial) Reinforcement

A

🔑 Definition: These are schedules where the response is only reinforced sometimes, not every time; this is called PARTIAL or INTERMITTENT REINFORCEMENT.

🧠 Imagine you're playing a claw machine: it doesn't give you a plush toy every time you play. Sometimes you win, most times you don't. But you still keep playing, right? That's intermittent reinforcement in action.

📘 Highlighted: "In intermittent (partial) reinforcement schedules the reinforcer doesn't occur after every response."

Example: "Like playing fruit machines (casino games): each play won't result in a reward."

"The reinforcement is intermittent and causes a euphoric response in the brain that in some circumstances can lead to gambling addiction."
74
Q

🔹 Intermittent Reinforcement in Relationships

A

In toxic relationships, affection, praise, or kindness is given inconsistently, mixed with coldness or abuse. You never know when you'll get love or be ignored/yelled at.

📘 Highlighted: "Toxic relationships involve inconsistent provision of attention, affection, or praise, alternating between gratifying moments and periods of negativity or abuse."

This rollercoaster can make people more emotionally attached, not less, because they're chasing the next positive moment, like a slot machine player hoping for a win.

"The individual constantly seeks gratification, experiencing euphoria when obtained and distress when withdrawn or turned negative."

This can lead to emotional addiction.
75
Q

🔹 The Pigeon Experiment (Skinner Box): Intermittent Reinforcement

A

B.F. Skinner placed pigeons in a box with a useless lever (the lever didn't affect when food came). Food was given at random intervals, regardless of what the pigeons did.

📘 Highlighted: "The reinforcer was delivered intermittently, independent of the pigeon's behaviour."

Results: the pigeons wore themselves out pressing the lever all the time.

🧠 Analogy: It's like hitting the refresh button on a webpage over and over even though it has no effect, just because you once got lucky.

That's how powerful random rewards can be: even irrational behavior is reinforced.
76
Q

Fixed-Ratio Schedules

A

In contrast to random rewards, fixed-ratio schedules give reinforcement after a set number of responses.

📘 Highlighted: "In fixed-ratio schedules the number of reinforcers received per number of responses is fixed."

Example: "E.g. 1 reinforcer for every 10 responses (learning jargon = FR 10)."

🧠 Think of:
Getting paid after handing out 100 flyers → you work harder and faster to reach the fixed goal.
Dialing a phone number → you need to enter all digits to complete the call.

"Fixed ratios are often seen in jobs."
"Making a phone call also involves a fixed-ratio schedule."
77
Q

Continuous Reinforcement = A Special Case

A

"CONTINUOUS REINFORCEMENT schedules are a type of FIXED-RATIO schedule."

It's a 1:1 ratio: a reward every time a response happens.

📘 Highlighted: "The ratio is 1:1 (reinforcement is obtained after every response)."

"Continuous reinforcement schedules result in STEADY AND MODERATE RATES OF RESPONDING."

🧠 Analogy: Imagine you get a cookie every time you wash a dish. You'll keep doing it, but not frantically: you'll pause, do it, get a cookie, repeat. It's predictable and leads to steady effort.
78
Q

Response Patterns: Cumulative Records

A

"Steady responding is preceded by a brief pause."

Think of this: before you start a new phone call, you pause to dial. But once you're in the call, you talk steadily.

📘 Highlighted: "You are not likely to pause in the middle of a call. There is a steady and high rate of responding once the behaviour starts."
79
Q

🔹 What Are Cumulative Records?

A

Cumulative records are like a behavior time-lapse: they track how often a behavior occurs over time.

📘 Highlighted: "Cumulative records show the total (cumulative) number of responses during a period of time."

Here's how it works:
A pen draws a line upward every time a behavior happens.
The paper moves at a constant speed.
The vertical height = how many times the behavior has happened.

📘 Highlighted: "The pen moves up a 'step' vertically after each response."
"The total vertical distance → cumulative (total) number of responses."
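💻 A minimal sketch in Python (illustration only; the response times are made up) of a cumulative record as a step function: its height at time t is the total number of responses so far, so steep stretches mean fast responding and flat stretches mean pauses:

```python
response_times_s = [1.0, 1.5, 2.0, 2.2, 2.4, 10.0, 10.5]

def cumulative_count(times, t):
    """Height of the cumulative record at time t."""
    return sum(1 for rt in times if rt <= t)

for t in range(0, 13, 2):
    print(f"t={t:2d}s  total responses={cumulative_count(response_times_s, t)}")
# The count climbs quickly around t=1-3 s (a burst), then stays flat
# (a pause) until the responses near t=10 s.
```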
80
Q

Interpreting Behavior Patterns on a Cumulative Record

A

The figure shows how frequently a subject (like a rat) responds over time:

No responding between times A and B: like someone staring at a vending machine but not pressing any button yet.

Steady rate of responding between B and C: the subject starts pressing steadily, like tapping a button at a normal pace.

Increased rate of responding between C and D: they get more eager, tapping faster, probably anticipating a reward soon.

Brief pause after D before resuming responding: a short break after a reward, like taking a breath after finally getting that candy.

This figure is basically a diary of effort over time: a "step" graph showing how active or inactive the subject was.
81
Q

Fixed-Ratio Schedules in Action

A

In fixed-ratio schedules, there is normally a brief pause before responding starts again after each reinforcement.

🔁 Think of a coffee shop stamp card: after you get your free coffee (reinforcer), you might chill for a bit before starting to collect new stamps.

The figure shows steady responding (pecking) on an FR 120 schedule (one reinforcer per 120 pecks). Imagine you had to press a button 120 times to get a snack: you'd likely press consistently until the snack is delivered.

After each reinforcer, the pigeon briefly pauses before a high rate of steady pecking resumes until the next reinforcer is delivered. This pause is called a post-reinforcement pause. Then comes the "ratio run": a burst of work until the next treat.
82
Q

What is a post-reinforcement pause?

A

Imagine you had to press a button 120 times to get a snack. You'd likely press consistently until the snack is delivered.

After each reinforcer, the pigeon briefly pauses before a high rate of steady pecking resumes until the next reinforcer is delivered. This pause is called a post-reinforcement pause.

Then comes the "ratio run": a burst of work until the next treat.
83
Q

Ratio Strain and Burnout

A

Ratio means how many times an action must be repeated to earn a reward.

If you raise the fixed-ratio requirement (say from 120 actions to 150), the break the subject takes after getting a reward gets longer, even though their pace while working stays the same.

🧠 It's like needing to do 150 pushups instead of 120 before getting to rest: you might delay starting because it feels harder.

Big jumps (like going from 120 to 500) can cause frequent pauses mid-task, as the subject gets mentally or physically worn out. This is called ratio strain.

If the strain gets too intense, the subject might stop altogether, just like you'd quit if your workout suddenly became impossible. So if you increase the ratio too much, ratio strain can lead to complete burnout, where the subject basically thinks, "Forget it. Not worth the effort."
84
Q

Variable-Ratio Schedules (Unpredictable Rewards)

A

In variable-ratio schedules, the number of actions needed for a reward changes each time: it's unpredictable.

📌 For example, you might get a reward after 10 tries, then 8, then 7. You never know exactly how many times you'll need to act.

Variable-ratio schedules show up in real life anytime you must put in effort, but the payoff comes after an unpredictable number of attempts.

🎰 Like slot machines: every pull might be the jackpot, so you keep playing, hoping the next try will be it.
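💻 A minimal sketch in Python (my own illustration; drawing the requirement uniformly around a mean of 5 is just one of several ways to program a VR schedule):

```python
import random

def make_variable_ratio(mean_ratio: int):
    """VR schedule: each reinforcer requires a random number of responses."""
    remaining = random.randint(1, 2 * mean_ratio - 1)
    def respond() -> bool:
        nonlocal remaining
        remaining -= 1
        if remaining <= 0:
            remaining = random.randint(1, 2 * mean_ratio - 1)
            return True
        return False
    return respond

respond = make_variable_ratio(5)  # VR 5: ~5 responses per reinforcer on average
rewards = sum(respond() for _ in range(1000))
print(f"1000 responses earned about {rewards} reinforcers (expected ~200)")
```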
85
Q

Post-Reinforcement Pauses in Variable Schedules

A

As the number of required responses is unpredictable, variable-ratio schedules are less likely to produce post-reinforcement pauses than fixed-ratio schedules.

Why? Because you don't know when the next win will come, so you keep going without taking a break.
86
Q

Behavior Comparison: Variable vs. Fixed Ratio

A

As the number of required responses is unpredictable, variable-ratio schedules are less likely to produce post-reinforcement pauses than fixed-ratio schedules.

In the figure, even though the VR schedule requires more responses than the FR schedule, it maintains a steadier rate of responding.

The graph shows that fixed-ratio behavior looks choppy (pause-work-pause), while variable-ratio behavior is smoother and more continuous.
87
Q

Training a Dog — Using Ratios in Real Life

A

At first, you give the dog a treat every time it sits (this is continuous reinforcement). Then you gradually increase the number of times the dog must sit before getting a treat.

📌 You might give the dog a treat after 2 correct sits, then after 3, then after 1, etc.

Since the dog never knows exactly when it will receive a treat, it keeps performing the behavior to increase its chances.

A variable-ratio schedule of reinforcement tends to be very effective at reinforcing behaviors and maintaining them over the long term. This is why gambling is addictive, and why smart dog trainers don't always give treats: they make dogs "work" for uncertain rewards.
88
Q

Are there any differences between variable reinforcement and intermittent reinforcement?

A

Variable reinforcement means you get a reward after a random number of actions or a random amount of time. This uncertainty makes the behavior very strong and long-lasting, because you never know when the reward is coming, so you keep trying.

Intermittent reinforcement is the bigger category: it just means rewards come only sometimes, not every time.

Variable reinforcement is one type of intermittent reinforcement, but not all intermittent reinforcement is variable.

📌 Think of it like this: all squares are rectangles, but not all rectangles are squares.
89
Q

Intermittent Reinforcement: The Broader Category

A

"Intermittent reinforcement is the bigger category."

Imagine "intermittent reinforcement" as a genre of movies, like "action movies". Now, "variable reinforcement" is a subgenre within that, like "superhero action movies". All superhero action movies are action movies, but there are also other kinds of action movies (spy, war, car chase, etc.).

👉 So:

Intermittent reinforcement = any reward that happens only sometimes, not after every behavior.

Variable reinforcement = a specific kind of intermittent reinforcement where the reward comes at random intervals or after a random number of responses.
90
Q

Intermittent Reinforcement vs. Continuous Reinforcement

A

🔁 Continuous Reinforcement

"Reinforcement is delivered after every response emitted."

🧼 Analogy: Imagine a vending machine. You put in a coin → you get a snack. Every single time. The behavior (inserting coins) is always reinforced.

Good for learning a behavior fast. But it's easy to "extinguish" (stop the behavior) once the reinforcement stops. (If it eats your coin once, you stop trusting it.)

🔄 Intermittent Reinforcement

"Reinforcement is delivered after some, but not all, of the responses emitted by the individual."

👨‍🏫 Analogy: Imagine your teacher gives you praise only sometimes when you answer questions. Not always. Just occasionally.

Takes longer to learn. But once learned, the behavior is more resistant to extinction (because you keep hoping reinforcement might happen next time).

📌 KEY POINT: "All variable reinforcement is intermittent. But not all intermittent reinforcement is variable." Think of "variable" as random and "intermittent" as not every time.
91
"All variable reinforcement is intermittent. But not all intermittent reinforcement is variable." Think of "variable" as random and "intermittent" as not every time.
Intermittent = You don’t get rewarded every time. Sometimes yes, sometimes no. Variable = You don’t know when the reward will come. It’s random. So: 🎰 Slot machine → Variable (random) AND Intermittent (not every time)  ✅ This is a variable reinforcement → and yes, it’s also intermittent. 🗓️ Weekly paycheck (if you only get it after doing something) → Intermittent (not after every task), BUT predictable (always after 1 week)  ✅ This is intermittent, but ❌ not variable → it’s fixed.
92
Q

🎲 Variable Ratio (VR) (Ratio Schedules)

A

"The reinforcer is delivered after a varying number of responses."

Produces a higher rate of responding and is more resistant to extinction.

🧠 Definition: Reinforcement is given after a random number of actions, but with a fixed average.

🎰 Example: Slot machines. You don't win every time you play, but you do sometimes, unpredictably, which motivates you to keep trying.

📱 Example: Checking your phone. There aren't always notifications, but sometimes there are, reinforcing the habit of checking constantly.

🛠️ Analogy: You're drilling for oil. Sometimes you strike after 3 holes, sometimes after 10, but you know it averages out. This uncertainty keeps you digging like mad.

Leads to high response rates. Harder to extinguish than fixed schedules: people keep going, hoping the next time is the one.
93
Q

🧾 Fixed Ratio (FR) (Ratio Schedules)

A

"Reinforcement is delivered after a specific number of responses have been emitted."

🧁 Example: A bakery gives you a free pastry after you buy 10. Always 10.

📏 Analogy: It's like a punch card: once you know it takes 10 stamps to get a free coffee, you keep buying until you hit 10.

Quick bursts of behavior before reinforcement. May pause briefly after reinforcement (you wait before starting over).
94
Q

Interval Schedules: ⏲️ Fixed Interval (FI)

A

"Response is reinforced only if the response occurs after a certain amount of time has passed."

"The amount of time that has to pass before a response is reinforced is constant from one trial to the next."

🧼 Example: Washing machine. You can open the door as many times as you want, but it only gives clean clothes after the cycle ends.

📅 Example: Weekly paycheck. You're only rewarded (paid) after a fixed interval of one week has passed, regardless of how much you worked.

🕊️ Pigeons and Pecks. In a fixed-interval 4-minute schedule:

"PECKS BETWEEN MINUTE 0 AND 3:59 WOULD NOT PRODUCE A FOOD REWARD."

"Pigeons learn to wait until close to the end of a fixed interval (4 minutes) before they start pecking."

📈 Key behavior pattern: "As the time for the availability of the next reinforcement draws closer, the response rate increases."

🛠️ Analogy: Think of students who only study right before the test. The test (reinforcement) happens at regular intervals, so behavior (studying) increases near those deadlines.
95
"The interval determines only when the reinforcer becomes available, not when it is delivered." "In order to receive the reinforcer, the subject must do the instrumental response." 🧠 Think of this like a paycheck being ready at 5 PM Friday (reinforcement becomes available) but you still have to go pick it up (instrumental response).
96
Q

VARIABLE-INTERVAL SCHEDULE

A

📖 Definition: "The amount of time that passes between a reference point (e.g. start of trial/previous reinforcer) and the response that can produce a reinforcer varies between trials."

🧠 Simplified: This means you only get a reward after some time has passed, but that amount of time changes every time.

📱 Example: "Checking your phone will only result in reinforcement (a new message) after a certain amount of time has passed since you last looked at your phone, and this time period is variable and unpredictable."

🧠 Analogy: Imagine your phone messages are on a mystery timer. You could check every second, but that wastes energy. So you learn to check now and then, hoping the next check gives a reward (a message).
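💻 A minimal sketch in Python (my own illustration; drawing the wait uniformly around the mean is one common way to program a VI schedule):

```python
import random

def make_variable_interval(mean_s: float):
    """VI schedule: a random wait must pass before a response is reinforced."""
    available_at = random.uniform(0, 2 * mean_s)
    def respond(t: float) -> bool:
        nonlocal available_at
        if t >= available_at:
            available_at = t + random.uniform(0, 2 * mean_s)
            return True
        return False
    return respond

respond = make_variable_interval(120.0)  # VI 2-minute, like the pigeon example
# Checking every 30 s: some checks pay off, most don't, which sustains a
# steady, moderate rate of responding.
for t in range(0, 600, 30):
    if respond(float(t)):
        print(f"reinforcer collected at t={t}s")
```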
97
Q

🐦 Pigeon Experiment (VARIABLE-INTERVAL SCHEDULE)

A

"The 1st food pellet would be available via pecking after at least 1 minute, the 2nd after 3 minutes, the 3rd after 2 minutes..."

The average time between reinforcers is 2 minutes, so it's a VI 2-minute schedule.

🧠 Imagine: the pigeon pecks, but food doesn't come immediately; it has to wait a bit. But how long? It changes every time, so the pigeon just keeps pecking steadily over time.

"Like variable-ratio schedules, variable-interval schedules maintain steady (although moderate) response rates with few regular pauses."

That means pigeons learn not to pause too much, but also not to go crazy fast: just a steady rhythm.
98
🔁 Fixed-Ratio & Fixed-Interval (comparing)
"Both produce a post-reinforcement pause, as well as high response rates just before the delivery of the next reinforcer." 🧠 Think of a student: Fixed-ratio: You get a reward after 10 questions. So after finishing 10, you rest briefly before starting again. Fixed-interval: You know the quiz is every Friday. You study like crazy on Thursday night = high response rate before reinforcement.
99
🔄 Variable Schedules comparing: Both variable-ratio and variable-interval schedules
"Both variable-ratio and variable-interval schedules maintain steady response rates and few predictable pauses." No predictable moment = no reason to stop. So: You don’t cram last minute. You just keep going at a steady pace.
100
PIGEON A vs. PIGEON B – Reynolds (1975) experiment
Setup: Pigeon A: Variable-Ratio (reward after random number of pecks) Pigeon B: Variable-Interval (reward after random time passed) But! The environment was controlled so both got the same opportunities for reward. "When pigeon A was one peck away from getting food, it was also made available for pigeon B." So in theory, they had the same chance. But here’s the twist... 💡 RESULT: "Pigeon A responded at a much higher rate than Pigeon B." "The variable-ratio schedule motivated more vigorous responses." Why? Because... 🧠 Motivation Logic: "Pigeon A will peck as fast as he can to get the reward as soon as possible." The faster he pecks, the faster the reward. But... "Pigeon B hasn’t learned that the faster he completes the requirement, the sooner he’ll get the reinforcement." Because... he can’t. In interval schedules, it doesn't matter how fast you go—the reward only appears after enough time.
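💻 To see the asymmetry in numbers, here is a toy Python sketch (a steady response rate and a 60-minute session are assumed for illustration; this is not data from the Reynolds study):

    # Why speed pays on ratio schedules but barely matters on interval ones.
    def rewards_ratio(rate, ratio=10, minutes=60):
        # FR: one reward per `ratio` responses, so total responses / ratio.
        return (rate * minutes) // ratio

    def rewards_interval(rate, interval=2, minutes=60):
        # FI/VI: a reward becomes available every `interval` minutes;
        # any nonzero response rate collects essentially all of them.
        return minutes // interval if rate > 0 else 0

    for rate in (5, 50):  # responses per minute
        print(rate, rewards_ratio(rate), rewards_interval(rate))
    # 5  -> 30 rewards on ratio,  30 on interval
    # 50 -> 300 rewards on ratio, 30 on interval (speed pays only on ratio)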
101
Human Behavior Implication
"The difference between ratio and interval schedules has implications for human behavior (MOTIVATION)." "Bosses may get more productive employees for the same salary if they are paid according to how much work they do (ratio) rather than on the last day of the month (interval)." 🧠 Imagine two jobs: You're paid €10 per report you write. (Ratio) → You’ll write faster. You're paid €1000 at the end of the month, no matter what. (Interval) → You’ll chill until the deadline.
102
Schedule: Fixed Interval (FI)
What It’s Based On: Time
Is Timing Predictable?: ✅ Yes
Behavior Pattern: Burst of work before time is up
_____________________________________
Schedule: Variable Interval (VI)
What It’s Based On: Time
Is Timing Predictable?: ❌ No
Behavior Pattern: Steady, moderate response rate
_____________________________________
Schedule: Fixed Ratio (FR)
What It’s Based On: Number of actions
Is Timing Predictable?: ✅ Yes
Behavior Pattern: Fast work with short breaks
_____________________________________
Schedule: Variable Ratio (VR)
What It’s Based On: Number of actions
Is Timing Predictable?: ❌ No
Behavior Pattern: Very fast, steady response (like gambling)
103
From Simple Instrumental Behaviour to Real-Life Decisions
Key idea: Before this, you’ve been learning about simple instrumental behaviours, where one type of response is reinforced — like pressing a lever and getting a treat. 🧠 Instrumental behaviour = “If I do X, I get Y.” But real life is more complicated. You don’t just do one thing — you choose between alternatives. 🟣 Real life = constantly choosing between alternatives Should we stay at home or go to the cinema? If we stay home, what should we watch? Will we watch it to the end or switch channels? 💡 Analogy: Imagine you’re on Netflix. Every second, you’re making choices: Watch the current show? Pause it and scroll? Open TikTok instead? Your brain is juggling rewards (fun, satisfaction) that are linked to different actions, and your choice depends on which one seems better — that's choice behaviour.
104
What Are Concurrent Schedules? How the Brain Learns to Choose
In lab settings, we use concurrent schedules to study this. Key Idea: Different responses are linked to different reinforcers on different schedules. Imagine you're a rat in a Skinner box with two levers: Pressing the left lever can give food every 10 seconds. Pressing the right lever can give food every 30 seconds. 💬 “The rat must choose which response to make based on its history of reinforcement and current conditions.” 🧠 Analogy: You’re on two apps: Instagram gives you a dopamine hit every 10 seconds of scrolling. Reddit gives you a dopamine hit every 30 seconds, but with better posts. Which one do you spend more time on? That depends on your learning history and current mood — exactly what the rat is doing.
105
Concurrent Schedule Procedures: The Pigeon and the Keys
In experiments: A pigeon can peck two keys. Each key follows a different reinforcement schedule. These schedules run at the same time = concurrent. Example: Left key: VI 60 → Variable Interval 60 seconds = get reward on average every 60 seconds. Right key: FR 10 → Fixed Ratio 10 = get reward every 10 pecks. 🧠 Analogy: Left key is like waiting in line for coffee: on average, you get it every minute. Right key is like taking 10 steps to the fridge and getting a soda every time. The pigeon is free to switch. That freedom = choice.
106
Measuring Choice Behaviour: Response Rates
To understand the pigeon’s choice, we count how often it chooses each option. "RATE OF RESPONDING for left alternative = how many responses were for the left." 🧠 Think of this like counting: How many times did you open Instagram vs. Reddit? Relative rate of responding is calculated as: Relative Rate (Left) = Total Left Responses / (Total Left Responses + Total Right Responses) If equal: 10/20 = 0.5 If more left: 15/20 = 0.75 If more right: 5/20 = 0.25
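💻 A minimal Python sketch of the same arithmetic (the response counts are invented for illustration):

    # Relative rate of responding: what fraction of all responses
    # went to the left alternative? (Toy counts, not real data.)
    def relative_rate(left_responses, right_responses):
        return left_responses / (left_responses + right_responses)

    print(relative_rate(10, 10))  # 0.5  -> equal preference
    print(relative_rate(15, 5))   # 0.75 -> strong preference for left
    print(relative_rate(5, 15))   # 0.25 -> strong preference for right

The same function works for any pair of alternatives: Instagram vs. Reddit opens, left key vs. right key pecks.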
107
Let’s say: You eat 15 chocolate bars and 5 fruit bars. Now we want to know: "What fraction of your snacking was chocolate?" You ate a total of 15 + 5 = 20 snacks. So the chocolate proportion is 15/20 = 0.75. In general: If you eat them equally (10 + 10), chocolate = 10/20 = 0.5 If you eat more chocolate (15 + 5), chocolate = 15/20 = 0.75 If you eat less chocolate (5 + 15), chocolate = 5/20 = 0.25
108
Reinforcement Rates: Which One Actually Pays More?
Rate of reinforcement: how often each action gives a reward. Same formula: Relative Rate of Reinforcement (Left) = Reinforcers from Left / (Reinforcers from Left + Reinforcers from Right) If both keys give 50% of total rewards → rate = 0.5. 🧠 Analogy: Imagine you get snacks from either mom (left) or dad (right). You calculate: Who gave you more treats? Then decide who to hang around more based on past snack success.
109
The Matching Law: How Animals (and Humans) Decide
"The Matching Law describes what happens when two alternatives are not reinforced according to the same schedule." It says: "The rate of responding on an alternative matches the rate of reinforcement." In other words: If 40% of the rewards come from the left, the subject will make 40% of their responses to the left. 🧠 Analogy: You're a freelancer with two clients: Client A pays you 40% of your total income. Client B pays you 60%. You naturally divide your time to match: Spend 40% of your time on Client A. 60% on Client B. Even pigeons do this! It's deeply built into how behaviour is shaped by consequences.
110
❓ What motivates instrumental behaviour?
We’re asking: “Why does someone (or a rat) actually do something like pressing a lever?” Obvious answer: To get something good (reinforcer) or avoid something bad (punishment). But psychology doesn’t stop at the obvious. So researchers study it using two perspectives: Associative structure of instrumental conditioning and Response-allocation approach
111
Associative structure of instrumental conditioning
(also called the molecular perspective) This asks: “How are stimuli, responses, and outcomes linked in the moment?” 🧠 Analogy: Imagine you’re trained like a spy. You notice: When you see the red light (stimulus)... You press the button (response)... And a secret door opens (outcome). You form an S–R–O association: Stimulus (context) Response (your action) Outcome (what happens)
112
Response-allocation approach
(also called the molar perspective) Instead of focusing on a single moment, this looks at the big picture: “How does this behaviour fit into your long-term goals and how you manage your time or energy?” 🧠 Analogy: Imagine your day as a budget of 24 hours. You "spend" time between studying, relaxing, eating. The molar perspective asks: “Why did you allocate 2 hours to gaming and only 1 to studying?”
113
🧠 Environmental Context Matters in motivation in instrumental behaviour
“Instrumental responding involves more than just a response and a reinforcer: THE ENVIRONMENTAL CONTEXT.” When you do something, it’s always embedded in a situation. This is crucial. 📱 Example: Sending a text You don’t just move your thumbs. You’re holding the phone, seeing the screen, sitting in your room, maybe petting your dog. 🚗 Example: Turning a key in the ignition You’re in the car, gripping the key, hearing traffic, feeling the seat. So the brain learns associations between responses and all the environmental cues.
114
🧩 The S–R–O Model
“Instrumental responding involves three events: contextual stimuli (S), instrumental response (R), and response outcome (O).” The outcome (O) isn’t just a reward — it plays a supporting role: “O only serves to strengthen or weaken the S–R association.” In other words: Getting food (O) after pressing the lever (R) while hearing a tone (S) makes it more likely you'll press again in that same context. 💡 It’s like O is the coach whispering: “Nice job — keep doing this when you see that!”
115
🔁 Habits
“Habits are things we do automatically without thinking.” Psych research says: About 45% of human behaviour is habitual. 🧠 Analogy: Ever grab your phone without thinking? Drink coffee at the same time every day? That’s S–R learning — your brain has learned “when X, do Y” without thinking about the outcome anymore.
116
Two-Process Theory (Rescorla & Solomon, 1967)
This theory explains how emotional associations can motivate behaviour. There are two systems of learning: Pavlovian (classical) conditioning: Learn that a stimulus predicts an outcome → e.g., a tone means food. Instrumental conditioning: Learn that a response leads to an outcome → e.g., pressing lever = food. But here’s the trick: they interact. “The S–O association activates an emotional state/reward expectancy.” 🎧 If a tone signals food, the animal gets excited when hearing it. This emotion boosts motivation to press the lever. If the tone signals a shock ⚡️, the animal gets anxious and presses less.
117
Pavlovian-Instrumental Transfer (PIT)
This experiment tests the interaction of Pavlovian and instrumental learning. 3 Phases: Instrumental conditioning Rat learns: press lever = get food Pavlovian conditioning Rat learns: tone = food Transfer test You present the tone during lever pressing. What happens? If tone predicts a good outcome, pressing increases. If tone predicts something bad, pressing decreases. 🧠 It’s like hearing the oven timer ding (CS) while you’re cooking. It motivates you to grab the food faster.
118
⛓ Restriction = Motivation
“Instrumental behaviour is motivated by restricting an organism’s natural behaviour.” Organisms (like rats or humans) have natural tendencies — things they would do all the time if nothing stopped them (like eating or chilling). But when these are restricted, motivation increases. 🧠 Analogy: You’re most motivated to eat chocolate after a diet.
119
Response Allocation Theory
A reinforcer is not a thing (like a ball); it’s a behaviour (like playing with the ball). 🧠 Imagine this: A child doesn't just love a ball. They love what they do with it: play football. So what reinforces them isn’t the object — it’s the activity it allows.
120
Consummatory-Response Theory
“It’s not the nature of the reinforcer that is reinforcing, but the consummatory behaviour it elicits.” In other words: Reinforcement comes from performing the behaviour itself, especially natural, species-typical behaviours like eating or drinking. This flips the old idea: Old view: special rewards are inherently reinforcing New view: normal behaviours (like eating, grooming) are powerful reinforcers because they’re biologically meaningful
121
What is the Premack Principle?
“High-probability behaviour can reinforce low-probability behaviour.” Let’s break it down: Example: Hungry rat In a natural setting: Very likely to eat food (H = high-probability behaviour) Not likely to press a lever (L = low-probability behaviour) But if you set up the rule: “You can only eat if you press the lever first…” …then the rat will start pressing the lever more! Key Applications: Eating (H) can reinforce lever pressing (L) Drinking sweet water or running in a wheel (both high-probability behaviours for rats) can also be used to reinforce pressing a lever So it’s not just what the reinforcer is — it’s how likely the behaviour is naturally.
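💻 Read as a bare rule, the principle is just a comparison of baseline probabilities. A minimal Python sketch (the probabilities below are invented, not measured):

    # Premack principle as a toy rule: behaviour H can reinforce
    # behaviour L only if H's baseline probability is higher than L's.
    baseline = {"eating": 0.8, "wheel_running": 0.6, "lever_pressing": 0.1}

    def can_reinforce(high, low):
        return baseline[high] > baseline[low]

    print(can_reinforce("eating", "lever_pressing"))   # True
    print(can_reinforce("lever_pressing", "eating"))   # False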
122
🧪 Premack’s Experiment with Children
Premack gave children two options: → Eat candy or play pinball. He observed which activity each child naturally chose more often. This let him measure which behaviour was more likely for each individual. What he found: Some kids preferred eating candy, while others liked playing pinball more. Then came Phase 2: the reinforcement test. In one setup, eating candy (high-probability behaviour) was only allowed after the child played pinball (low-probability behaviour). “The children must play pinball to obtain candy.” 🧠 Analogy: Imagine a kid who loves video games (H) but dislikes chores (L). If they must do chores first to play games, the games act as a motivator. Key finding: Only the kids who already liked candy more than pinball showed increased pinball playing — meaning candy reinforced pinball. “Performing L (pinball) before H (candies) = reinforcement of L (pinball).” 🔑 So, the Premack Principle means using a behaviour someone already enjoys to reinforce a behaviour they usually avoid.
123
What is Response Deprivation?
📌 Key idea: “In instrumental conditioning procedures, the high probability of the reinforcing behaviour is maintained by restricting access to it.” If you can do something whenever you want, it becomes less motivating. 🧠 Analogy: If your favorite dessert is always in the fridge, it’s no big deal. But if it’s only available once a week, it becomes special — and you’re willing to work for it. “Rats will only eat when they are hungry: they will not eat on cue if they have free access.”
124
📘 What is the Response-Deprivation Hypothesis?
“Restricted access to reinforcers is critical for motivating instrumental responding.” Here’s the big twist: “Low-probability behaviours can motivate instrumental responding, provided there is restricted access.” Wait… low-probability behaviours can be reinforcing?? Yes! If they’re restricted enough. 💡 Analogy: Imagine you hate going to the library, but if you haven’t been allowed to for a week, it might start to seem appealing. This goes beyond Premack, because Premack says only high-probability behaviours can reinforce.
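💻 One common formalization is Timberlake and Allison's response-deprivation condition (the numbers below are invented): a schedule that grants C units of the contingent behaviour per I units of the instrumental behaviour is depriving, and so can reinforce, whenever C/I falls below the free-baseline ratio O_c/O_i.

    # Response-deprivation condition (a standard formalization; toy values).
    # I = instrumental behaviour required per "payment"
    # C = contingent behaviour allowed per payment
    # O_i, O_c = free-baseline amounts of each behaviour
    def is_depriving(I, C, O_i, O_c):
        return (C / I) < (O_c / O_i)

    # Baseline: 5 min tracing and 20 min filing per session (both low).
    # Schedule: 10 min of filing buys only 1 min of tracing.
    print(is_depriving(I=10, C=1, O_i=20, O_c=5))   # True -> can reinforce

Note that nothing in the condition requires the contingent behaviour to be high probability, which is exactly why it goes beyond Premack.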
125
🧪 Johnson et al. (2003): Testing Response Deprivation
Case: A child who didn’t like either filing or tracing letters. Normally, neither would reinforce the other. But when access to tracing letters was restricted... “Restricting access to tracing letters activity could reinforce filing letters activity.” Even though both were low probability, restriction alone created motivation! “This result is contrary to the Premack principle.” That means: Premack isn't always right. Response Deprivation is more basic: it can override how probable a behaviour is.
126
🌟 What is the "bliss point"?
“Every situation offers various response opportunities.” Imagine you can choose how to spend time: Run in a wheel Drink Eat Press a lever You naturally allocate your responses in a way that feels just right — that’s your behavioural bliss point. “The bliss point is the ideal combination of activities that maximizes well-being.” 🧠 Analogy: You like 60 minutes of social media for every 15 minutes of studying. That’s your bliss point.
127
⚖️ Imposed Instrumental Contingency
“Imposed instrumental contingency = time on Facebook must equal time studying.” Now you don’t get to freely choose. To access your favorite thing (e.g. Facebook), you must first do something less appealing (e.g. studying). “Bliss point is impossible to achieve without compromise.” 🧠 Analogy: If your ideal balance is 60 min Facebook + 15 min study, but the rule is 1:1, now you need to study 60 minutes to get 60 minutes of Facebook. That’s not your bliss point anymore.
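💻 A minimal sketch of the compromise, assuming behaviour simply moves to the point on the constraint line closest to the bliss point (a common "minimum-deviation" reading; the numbers are the card's own):

    # Bliss point: 60 min Facebook, 15 min study.
    # Imposed rule: Facebook time must equal study time (1:1).
    bliss_fb, bliss_study = 60, 15

    def deviation(minutes):
        # Squared distance from the bliss point when doing
        # `minutes` of Facebook AND `minutes` of study.
        return (minutes - bliss_fb) ** 2 + (minutes - bliss_study) ** 2

    best = min(range(0, 121), key=deviation)
    print(best)   # 37 (exact optimum is 37.5; 37 and 38 tie here)

So the student ends up studying far more than 15 minutes and browsing far less than 60: a compromise, not the bliss point.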
128
📊 Real-life application of Imposed Instrumental Contingency
“Once the instrumental contingency is in effect, the student can no longer watch TV for 60 minutes and study for only 15 minutes.” They must shift behaviour: Watch 60 mins TV → Study 60 mins Want to study only 15 mins → TV for just 15 mins 🧠 This approach modifies behaviour by changing the ratio between preferred and less-preferred behaviours.
129
What is Behavioural Economics?
✅ What is Instrumental Responding? Think of instrumental responding as doing something to get something. If you study to get good grades, that’s instrumental. If a rat presses a lever to get food, same thing. 🧠 Analogy: Imagine you're working part-time to buy concert tickets. Your instrumental behavior is working; your reinforcer is the ticket. Now, why you choose that action depends on: Alternative reinforcers – Are there other fun things to do instead of work? Netflix? Hanging out? Relation to the reinforcer – Is hanging out with friends as satisfying as the concert? Cost of alternatives – Is watching Netflix free while tickets are expensive? “This has been studied in depth in behavioural economics.” This field mixes psychology and microeconomics to understand how people choose between rewards and how behavior changes based on cost, availability, and preference.
131
Consumer Demand & the Demand Curve
Key terms: Demand curve: Shows how much people buy at different prices. Elasticity of demand: How sensitive buying is to price changes. ✅ Understanding the Curves: Curve A (steep drop): Highly elastic – like candy. If candy gets expensive, people stop buying. Curve C (flat): Inelastic – like petrol. People keep buying even if the price rises. So: “The more candy costs, the less you will buy...” “Gasoline is much less elastic because people continue to buy it even if the price increases.”
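💻 A rough sketch of the elasticity idea in Python (arc/midpoint formula; all quantities invented):

    # Price elasticity of demand: % change in quantity / % change in price,
    # using midpoint (arc) percentages. |e| > 1 means elastic demand.
    def arc_elasticity(q1, q2, p1, p2):
        pct_q = (q2 - q1) / ((q1 + q2) / 2)
        pct_p = (p2 - p1) / ((p1 + p2) / 2)
        return pct_q / pct_p

    # Candy: price 1 -> 2, purchases 10 -> 3 per week (elastic).
    print(arc_elasticity(10, 3, 1, 2))   # about -1.6
    # Petrol: price 1 -> 2, purchases 10 -> 9 per week (inelastic).
    print(arc_elasticity(10, 9, 1, 2))   # about -0.16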
132
Linking Behavioural Economics to Learning
Instrumental responding can be explained the same way. You can model responses like economic decisions! ✅ Definitions: Responses/time spent = effort or cost (like money). Reinforcer = reward/product. Response requirement/time interval = price. Schedule of reinforcement = pricing plan (e.g., buy 10 coffees get 1 free = Fixed Ratio 10). “Price of the reinforcer is determined by the schedule of reinforcement.” 🧠 Analogy: If you need to press a lever 50 times for 1 snack (FR50), that’s like paying 50 euros. As the "price" goes up (more responses required), you might stop "buying" (responding).
133
Johnson & Bickel (2006) Study (smoking)
✅ Setup: In a study, smokers had to choose between 3 levers: One gave 5 cents One gave 25 cents One gave 3 puffs of a cigarette The effort to get each reward kept increasing (from FR3 to FR6000 — meaning you had to press more and more times). Researchers watched to see when people gave up on each reward. 5 cents had high elasticity – people stopped pressing early. Cigarettes had low elasticity – people kept pressing even when it got really hard. 🧠 Analogy: Picture having to climb stairs for a reward: A few steps for a coin More steps for a bill A whole flight for a smoke if you're addicted You'd stop climbing quickly for the coin, hesitate for the bill, but go all the way for the smoke if it's something your brain really craves.
134
Availability of Substitutes
“More available options = higher demand elasticity.” This means if there are alternatives, people switch easily when price increases. ✅ Examples: Newspapers: Now you have news apps, YouTube, etc. → people drop newspapers quickly if price goes up. Cinema: Competes with Netflix/HBO, so people are sensitive to ticket price hikes. 🧠 Analogy: You’re thirsty. If there's only one vending machine, you’ll pay €3. But if there are 10 shops around, you won’t pay more than €1. That's elasticity due to available substitutes.
135
Income and Time as Moderators
“Income level: the higher your salary, the less affected you are by price increases.” Rich people don’t care much if Netflix raises its price. They have lower demand elasticity because the cost means less to them. “In instrumental contingencies...” If you have more time or energy, you’re less affected by the effort required to get a reward. 🧠 Analogy: If you have all day off, walking 10 minutes for coffee isn’t a big deal. If you're on a 10-minute break, you’ll skip it.