Lesson 8 Flashcards

(60 cards)

1
Q

In Lesson 2 we noted the four ways that we can arrange an operant and an outcome: the operant can either increase or decrease the probability of the outcome, and that outcome can be either good or bad. We called these positive reinforcement, punishment, omission training, and avoidance learning. Which of those is the superstition about breaking mirrors based on? Refresh your memory of these terms from Lesson 2, and then press the button to see my answer.

A

The mirror-breaking superstition can be seen in two ways. We can consider it a belief that breaking a mirror (the operant) causes (increases the probability of) bad luck (an aversive reinforcer); this would be an example of punishment (the operant causes something bad to happen). Alternatively, we could see it as a belief that not breaking mirrors prevents bad things from happening, which would be an example of avoidance learning. Note that, in this case, the operant is the absence of an action (not breaking mirrors), which makes the second explanation a bit of a stretch.

2
Q

A few weeks ago, Jimmy walked under a ladder and, shortly after that, someone stole his car. Jimmy now believes that walking under ladders is dangerous (so is not locking your car, but he isn’t focused on that). Which of the following events is most likely to strengthen Jimmy’s superstitious belief?

Jimmy walks under another ladder and the following day he wins the lottery

Jimmy avoids walking under a ladder and the following day his pet hamster dies

Jimmy walks under another ladder and the following day his pet hamster dies

Jimmy avoids walking under a ladder and the following day nothing happens

A

Jimmy walks under another ladder and the following day his pet hamster dies

3
Q

Jimmy is a polite Canadian and whenever he sneezes he says “excuse me”. One day, he sneezes while alone in his apartment. He still says “excuse me”, though. This is evidence that:

Jimmy has learned the rule that you must always apologize after sneezing, whether anyone is present to hear it or not

Jimmy has been shaped to apologize when sneezing, without understanding the communicative value of the response

Jimmy is hallucinating that there are other people in the room with him

Saying “excuse me” is a fixed action pattern that always happens after sneezing, innately

A

Jimmy has been shaped to apologize when sneezing, without understanding the communicative value of the response

4
Q

Operant Conditioning (Instrumental Conditioning)

A

A type of learning where animals (and humans) learn associations between their own actions (responses) and the consequences (outcomes) of those actions.

5
Q

Law of Effect (Thorndike)

A

Any behavior that is followed by reinforcement (satisfaction) will be performed more often in the future. (Some also include that behaviors followed by lack of reinforcement will be performed less often).

6
Q

Shaping

A

A process used in operant conditioning to train complex behaviors by rewarding successive approximations of the desired response.

7
Q

Skinner’s Superstitious Pigeons (1948) - Experiment

A

Setup: Pigeons in operant boxes received food at random intervals, regardless of their behavior.
Observation: Pigeons developed specific, idiosyncratic behaviors that they repeated, as if these behaviors caused the food delivery.

8
Q

Skinner’s Superstitious Pigeons - Interpretation

A

Skinner’s Claim: The random reinforcement accidentally strengthened whatever behavior the pigeon was engaged in at the moment of food delivery, leading to a “superstitious” belief in a non-existent contingency.

9
Q

Replication Issues with Superstitious Pigeons

A

Subsequent attempts to replicate Skinner’s findings have often failed, suggesting the effect might be less robust than initially claimed or dependent on specific conditions.

10
Q

Skinner’s Attempt to Explain Language via Operant Conditioning

A

Complex behaviors like language could arise from the reinforcement of successive approximations of verbal responses.

11
Q

“Jack and Jill” Pigeon Conversation Experiment (Epstein, Lanza & Skinner, 1980)

A

Setup: Two pigeons in adjacent boxes were trained to perform a sequence of pecks on text-labeled buttons to “communicate” the color of a light and receive rewards.
Skinner’s Point: To demonstrate that complex-looking behavior could be built through operant conditioning without the need for understanding.

12
Q

Key Difference Between Operant and Pavlovian Conditioning

A

Locus of Control: In operant conditioning, the animal’s own behavior controls the outcome. In Pavlovian conditioning, the experimenter controls the presentation of both the CS and the US, independent of the animal’s actions.

13
Q

Free Operant Paradigm

A

Definition: An operant conditioning procedure where the animal can perform the response repeatedly at its own rate without discrete trials imposed by the experimenter.

14
Q

Importance of Trying Behaviors for Learning (Wittgenstein Quote)

A

Concept: Exploring different actions is necessary to discover their consequences and learn new contingencies.

15
Q

Choice in Operant Conditioning

A

Reason: Because the animal controls its behavior, even with a single operant, it always has the choice to perform it or do something else, making operant procedures useful for studying decision-making.

16
Q

Role of Stimuli in Operant Conditioning

A

Function: External stimuli can act as occasion setters, indicating what types of response-outcome contingencies are currently in effect.

17
Q

Term: Contingency (in Operant Conditioning)

A

Definition: The probability of a particular outcome occurring given a specific response by the subject.

18
Q

Stimulus Control

A

When an operant behavior is more likely to occur in the presence of a specific stimulus because that stimulus signals the availability of reinforcement for that behavior.

19
Q

Terminology Difference (Operant vs. Pavlovian)

A

US (Pavlovian) ↔ Reinforcer/Outcome (Operant)
Appetitive/Aversive (Pavlovian) ↔ Positive/Negative (Operant)

20
Q

Positive Reinforcer

A

An appetitive event that, when presented after a behavior, increases the likelihood of that behavior occurring again.

21
Q

Negative Reinforcer

A

An aversive event that, when removed or avoided after a behavior, increases the likelihood of that behavior occurring again.

22
Q

Reflection Point: Similarities and Differences Between Pavlovian and Operant Conditioning

A

Similarities:

Both involve learning associations.
Both are influenced by factors like contingency, contiguity, and salience.
Both can involve appetitive and aversive outcomes.
Extinction occurs in both.
Stimuli can play a role (CSs in Pavlovian, occasion setters in Operant).
Differences:

Locus of Control: Outcome is independent of behavior in Pavlovian; dependent on behavior in Operant.
Type of Association Learned: CS-US in Pavlovian; Response-Outcome in Operant.
Behavior Elicited: Reflexive/involuntary responses (CR) in Pavlovian; voluntary/emitted behaviors (operant response) in Operant.
Procedure: Experimenter controls trial progression in Pavlovian; subject often has more control in Operant (especially in free operant).
Terminology: Different terms for the events and their valence.

23
Q

Trial-and-Error Learning (Thorndike)

A

Learning occurs through random attempts at behaviors; successful behaviors (those followed by reinforcement) are strengthened, while unsuccessful ones are gradually eliminated. No conscious understanding of the solution is required.

24
Q

Insight Learning (Köhler)

A

A sudden understanding or “aha!” moment where the solution to a problem appears all at once, not gradually through trial and error. Often characterized by a sudden drop in errors.

25
Q

Latent Learning (Tolman & Honzik, 1930)

A

Learning that occurs without explicit reinforcement and is not immediately expressed in behavior. It becomes evident when reinforcement is introduced later. Demonstrated in Tolman's maze experiment.

26
Q

Tolman's View on Reinforcement

A

Role: Reinforcement provides motivation for performance but is not strictly necessary for learning to occur. Animals can acquire knowledge about their environment even without reward.

27
Q

Reinforcer (Operant Conditioning Definition)

A

Anything that causes an animal to perform the corresponding operant behavior more or less often; anything an animal will work for (or work to avoid).

28
Q

Conditioned Reinforcer (Secondary Reinforcer)

A

A stimulus that has acquired reinforcing properties through its association with a primary reinforcer (e.g., points in a game associated with eventual reward).

29
Q

Behavior Chain

A

A sequence of learned behaviors where each behavior acts as a conditioned reinforcer for the previous behavior and a discriminative stimulus for the next, leading to a primary reinforcer at the end.

30
Q

Primary Reinforcer

A

Traditional View: A stimulus that has innate reinforcing properties (appetitive or aversive) without prior learning (similar to a US).
Revised View (Premack): Can be a high-probability behavior that an animal is restricted from performing.

31
Q

Premack Principle (1962)

A

A higher-probability behavior can reinforce a lower-probability behavior. Reinforcers are not just stimuli but can be activities that an animal is motivated to engage in. Reinforcement depends on relative preferences.

32
Q

Behavioral Regulation Theory

A

Core Idea: Animals have a preferred level (bliss-point) for all behaviors. When a behavior is restricted below its preferred level, access to it becomes reinforcing.

33
Q

Bliss-Point

A

Definition: An individual's ideal distribution of behaviors in the absence of constraints.

34
Q

How Constraints Affect Behavior (Behavioral Regulation)

A

Outcome: When prevented from reaching their bliss-point, individuals will adjust their behavior to get as close as possible to it within the imposed restrictions, even if it means increasing a less preferred behavior to gain access to a more preferred one.

35
Q

Test Your Understanding: Bliss-Point Theory and Premack Principle

A

Bliss-Point in terms of Premack: The bliss-point represents the baseline probabilities of various behaviors. The Premack principle suggests that a behavior with a higher baseline probability (closer to its bliss-point if unrestricted) can reinforce a behavior with a lower baseline probability (further from its bliss-point due to restriction).
Common Idea: Both are based on the idea that a reinforcer is fundamentally something the animal "wants" to do more of than it is currently able to, whether due to external restrictions or the need to perform another behavior to gain access. Reinforcement involves moving behavior closer to the individual's preferred distribution.

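To make the bliss-point idea concrete, here is a minimal numerical sketch in Python. All numbers, and the use of Euclidean distance as the measure of "closeness" to the bliss point, are illustrative assumptions rather than anything specified in the lesson.

```python
# Behavioral regulation sketch: the animal's bliss point is 60 min of
# activity A (say, wheel running) and 10 min of activity B (lever
# pressing) per session, but a schedule forces equal time on A and B.
# We look for the allocation on the constraint line that lies closest
# to the bliss point (assuming simple Euclidean distance).
import numpy as np

bliss = np.array([60.0, 10.0])            # preferred minutes of (A, B)

t = np.linspace(0, 120, 1201)             # candidate minutes of A (= minutes of B)
points = np.stack([t, t], axis=1)         # allocations allowed by the 1:1 schedule
dists = np.linalg.norm(points - bliss, axis=1)
best = points[np.argmin(dists)]

print(f"Closest allocation to bliss point: A = {best[0]:.1f} min, B = {best[1]:.1f} min")
# -> A = 35.0, B = 35.0: B rises above its preferred 10 min because it now
#    buys access to A, exactly the adjustment described in card 34.
```

The same picture captures the Premack principle: restricting the high-probability behavior (A) below its bliss-point level is what makes access to it reinforcing.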
36
Q

Jimmy has learned that working leads to getting paid, and that he can then use the money he makes to buy rewarding things, like beer and pizza. In this scenario, what would we call the beer and pizza, the money, and the work?

A

The beer and pizza are primary reinforcers; the money is a conditioned reinforcer; and the work is the operant. Note that money is always a conditioned reinforcer – it has no value in itself, it’s just a way of getting other things. This shows us that conditioned reinforcers can be every bit as powerful in driving behavior as primary reinforcers are.

37
Q

Jimmy likes ice cream and, if given the choice, would just eat ice cream all the time. How could his parents, who are concerned that he should have a healthy diet, use the Premack principle to help them control Jimmy’s sweet tooth?

A

Jimmy’s parents could deprive Jimmy of ice cream and make him work for it by, for example, eating vegetables. Eating vegetables is not as desirable for Jimmy, but because it causes ice cream to happen, he will do it [there are, obviously, other correct answers to this question].

38
Q

Reinforcement Schedules

A

Rules that determine how and when a behavior will be reinforced. Can be based on the number of responses (ratio) or the passage of time (interval), and can be fixed or variable.

39
Q

Ratio Schedule

A

Reinforcement is delivered after a specific number of responses.

40
Q

Interval Schedule

A

Reinforcement is delivered for the first response after a specific amount of time has elapsed since the last reinforcement (or the start of the interval).

41
Q

Fixed Schedule (FR & FI)

A

The number of responses (FR) or the time interval (FI) required for reinforcement remains constant.

42
Q

Variable Schedule (VR & VI)

A

The number of responses (VR) or the time interval (VI) required for reinforcement varies around a fixed average.

43
Q

Fixed Ratio (FR) Schedule

A

Reinforcement occurs after a fixed number of responses (e.g., FR10 = every 10 responses). Often produces a high rate of responding with a post-reinforcement pause.

44
Q

Fixed Interval (FI) Schedule

A

Reinforcement occurs for the first response after a fixed time interval has elapsed (e.g., FI 30s = first response after 30 seconds). Produces a scalloped pattern of responding (increasing rate as the interval ends).

45
Q

Variable Ratio (VR) Schedule

A

Reinforcement occurs after a variable number of responses, averaged around a specific number (e.g., VR10 = on average every 10 responses). Produces a high and steady rate of responding.

46
Q

Variable Interval (VI) Schedule

A

Reinforcement occurs for the first response after a variable time interval, averaged around a specific duration (e.g., VI 30s = on average after 30 seconds). Produces a moderate and steady rate of responding.

47
Q

Continuous Reinforcement (CRF or FR1)

A

Reinforcement is delivered after every response.

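As a worked illustration of how ratio schedules decide when reinforcement is delivered, here is a small Python sketch. The function names and parameters are hypothetical, for illustration only; interval schedules would additionally need a clock, so only the two ratio types are shown.

```python
import random

def fixed_ratio(n):
    """FRn: reinforce every n-th response (n = 1 gives continuous reinforcement)."""
    count = 0
    def respond():
        nonlocal count
        count += 1
        if count >= n:
            count = 0
            return True            # reinforcer delivered
        return False
    return respond

def variable_ratio(mean_n):
    """VRn: reinforce after a number of responses that varies around mean_n."""
    count, target = 0, random.randint(1, 2 * mean_n - 1)
    def respond():
        nonlocal count, target
        count += 1
        if count >= target:
            count, target = 0, random.randint(1, 2 * mean_n - 1)
            return True
        return False
    return respond

fr5 = fixed_ratio(5)
print([fr5() for _ in range(15)])   # True at responses 5, 10, and 15
# vr10 = variable_ratio(10) would reinforce, on average, every 10th response.
```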
48
Q

Concurrent Schedules

A

Two or more reinforcement schedules operate simultaneously for different responses, allowing the subject to choose how to allocate its behavior.

49
Q

Matching Law

A

In concurrent interval schedules, the proportion of responses directed toward one alternative will match the proportion of reinforcers obtained from that alternative.
Equation: $B_1 / (B_1 + B_2) = R_1 / (R_1 + R_2)$, where $B$ = rate of behavior and $R$ = rate of reinforcement for alternatives 1 and 2.

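The matching law turns into a one-line calculation. A minimal sketch (the function name is mine; the numbers come from the Jimmy examples in cards 56 and 60 below):

```python
def matching_share(r1, r2):
    """Predicted proportion of behavior allocated to alternative 1."""
    return r1 / (r1 + r2)

# Card 56: Johnny posts ~1 per 10 min (rate 1/10), Jennifer ~1 per 20 min (1/20).
print(matching_share(1/10, 1/20))   # 0.666... -> about two-thirds on Johnny's feed

# Card 60: Avenue A ~1 crossing per 2 min, Boulevard B ~1 per 6 min.
print(matching_share(1/2, 1/6))     # 0.75 -> 75% of the time near Avenue A
```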
50
Q

Maximizing (in Ratio Schedules)

A

In concurrent ratio schedules, the optimal strategy to maximize reinforcement is to respond exclusively on the alternative with the lower ratio requirement. However, animals don't always strictly adhere to this.

51
Q

Delayed Reinforcement

A

A situation where the reinforcer is not delivered immediately after the response. Often leads to a preference for smaller, immediate rewards over larger, delayed ones.

52
Q

Delay Discounting

A

The phenomenon where the perceived value or effectiveness of a reinforcer decreases as the delay to its delivery increases.

53
Q

Ainslie-Rachlin Rule

A

The subjective value of a reinforcer increases as the time to its delivery approaches. Choices between smaller/sooner and larger/later rewards can shift as the smaller reward becomes more imminent.

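To see the Ainslie-Rachlin reversal numerically, here is a short sketch that assumes the common hyperbolic discounting form V = A / (1 + kD) (e.g., Mazur's model). The functional form, the reward amounts, and the value of k are assumptions for illustration; the lesson itself only states the qualitative rule.

```python
def value(amount, delay, k=0.5):
    """Hyperbolically discounted value of a reward `delay` time units away."""
    return amount / (1 + k * delay)

small, large = 10, 20               # smaller-sooner vs larger-later reward
for t in [10, 5, 1, 0]:             # time remaining until the small reward arrives
    v_small = value(small, t)
    v_large = value(large, t + 5)   # the large reward arrives 5 units later
    choice = "larger-later" if v_large > v_small else "smaller-sooner"
    print(f"t = {t:2d}: V(small) = {v_small:5.2f}, V(large) = {v_large:5.2f} -> {choice}")
```

Far from both rewards (t = 10), the larger-later reward is preferred; as the smaller reward becomes imminent (t = 1 or 0), preference flips, which is exactly the shift the rule describes.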
54
Q

Reflection Point: Overcoming Impulsivity Related to Delayed Reinforcement

A

Potential Methods (see textbook pp. 269-272 for more):

Precommitment: Making a decision for the larger, later reward in advance, when both options are still distant.
Making the delayed reward more immediate/salient: Visualizing the future reward, breaking it into smaller, closer milestones.
Reducing the value of the immediate reward: Avoiding cues that trigger impulsive choices.
Increasing the value of the delayed reward: Associating it with other positive outcomes.
Self-monitoring: Tracking progress towards the long-term goal.
Rule-governed behavior: Establishing and following rules that prioritize long-term rewards.

Personal Application: (This will vary based on individual circumstances; think about specific examples from your own life.) For example, to prioritize studying (larger, later reward of good grades) over immediate entertainment (smaller, sooner reward), one might use precommitment by scheduling study time, make the future reward more salient by visualizing career goals, and reduce the value of immediate distractions by turning off notifications.

55
Q

So, we can construct four kinds of reinforcement schedules:

A

Fixed ratio (FR): the subject is reinforced after a fixed number of responses.
Fixed interval (FI): the subject is reinforced for the first response after a fixed interval.
Variable ratio (VR): the subject is reinforced after a number of responses that varies around a fixed mean.
Variable interval (VI): the subject is reinforced for the first response after an interval that varies around a fixed mean.

56
Q

Jimmy is browsing the Instagram feeds of his enemies, as you do. He has two enemies, Johnny and Jennifer. Johnny posts on Instagram about once every 10 minutes; Jennifer about once every 20 minutes. Question 1: what would we call these two schedules? Question 2: In order to make sure that he sees each post as soon as possible, what proportion of his time should Jimmy spend on each enemy’s feed?

A

Jimmy’s enemies post at random times, with a fixed average, so we would call these variable interval schedules. Johnny is on a VI10 and Jennifer on a VI20. Since Jimmy doesn’t know when the next image of an annoyingly perfect plate of ceviche (with a beach backdrop, of course) will drop, he has to divide his time between the two feeds. Since Johnny posts twice as often, Jimmy should spend twice as much time on Johnny’s feed (about 67% of his time) as on Jennifer’s (about 33%).

57
Q

One day, Jimmy helps an old man carry his groceries and, later that day, he learns that he got a job he was applying for. According to the Law of Effect, Jimmy will:

Help more old men with their bags

Never help anyone with their bags again

Try doing other random behaviors in the hope of getting more jobs

Find the same old man and steal his groceries

A

Help more old men with their bags

58
Q

Now, whenever Jimmy sees an old man, he has an impulse to help. We could say that Jimmy’s behavior is:

Pavlovian

Under stimulus control

Driven by an occasion setter

Random

A

Under stimulus control

59
Q

God has been watching Jimmy help people and has decided to reward him. For every 5 people that Jimmy helps with their groceries, God makes something good happen in Jimmy’s life. What kind of schedule is Jimmy on?

VI5

FR6

FR5

VI6

A

FR5

60
Q

Jimmy is hanging out by the supermarket, looking for unsuspecting old people to help. At the corner are two crosswalks across two different streets: Avenue A and Boulevard B. An old person crosses Avenue A, on average, once every 2 minutes; Boulevard B is less busy and only gets crossed by someone that might need help once every 6 minutes. According to the matching law, what percentage of his time should Jimmy spend near the Avenue A crosswalk, to maximize his karma?

50%

33%

75%

100%

A

75%