Task 6 - Instrumental Conditioning Flashcards

1
Q

Operant conditioning

A

process whereby organisms learn to make or refrain from making certain responses in order to obtain or avoid certain outcomes
example: Thorndike's puzzle box

2
Q

Reinforcement

A

the process of providing an outcome for a behaviour that increases the probability of that behaviour

3
Q

How to decide whether a paradigm is operant or classical

A
  • -> focus on the outcome
  • when the outcome happens regardless of what the organism does –> classical
  • when the outcome happens only if the organism makes a particular response –> operant
4
Q

Free-operant paradigm

A

paradigm in which the animal can operate the apparatus freely, whenever it chooses (e.g., after Skinner added a return ramp to a maze, so the rat could run back to the start and begin the next trial by itself)

5
Q

Discrete trials paradigm

A

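paradigm in which the experimenter controls when each trial begins and ends (e.g., Thorndike placing the cat back in the puzzle box for every trial)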

6
Q

Skinner box

A

a cage devised by B. F. Skinner, with a trough in one wall through which food can be delivered automatically

7
Q

Cumulative recorder

A

A device that draws a learning curve: a pen moves across a roll of paper at a steady rate and rises by a fixed amount for every response of an organism, such as a lever press by a rat in a Skinner box or a peck by a pigeon at an illuminated plastic key. The slope of the line therefore shows the rate of responding. Analogy: the odometer in a car, which only ever counts up.
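The cumulative record can be sketched in a few lines of Python. This is only an illustration: the function name `cumulative_record`, the one-second sampling, and the lever-press times are assumptions made for the example, not part of the original notes.

```python
def cumulative_record(response_times, duration, step=1.0):
    """Return the pen height at each whole second from 0 to `duration`.

    The pen rises by `step` for every response and never comes back down,
    so a steeper stretch of the record means faster responding.
    """
    responses = sorted(response_times)
    heights = []
    count = 0  # number of responses recorded so far
    for t in range(duration + 1):
        # Count every response that has happened up to time t.
        while count < len(responses) and responses[count] <= t:
            count += 1
        heights.append(count * step)
    return heights


# Hypothetical lever-press times in seconds: slow at first, then faster.
presses = [2.0, 6.0, 7.0, 7.5, 8.0, 8.2]
record = cumulative_record(presses, duration=9)
print(record)
```

The flat stretch early in the record corresponds to slow responding, and the steep stretch at the end to rapid lever pressing.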

8
Q

Discriminative Stimuli

A

stimuli that signal whether a particular response will lead to a particular outcome
–> they help the learner discriminate or distinguish the conditions where a response will be followed by a particular outcome

10
Q

Shaping (??)

A

training technique in which successive approximations to the desired response are reinforced

11
Q

Chaining (–> backward chaining)

A

technique in which organisms are gradually trained to execute sequences of discrete responses

  • a technique related to shaping
  • -> it is sometimes more effective to train the steps in reverse order (backward chaining)
13
Q

Reinforcer

A

a consequence of behavior that leads to an increased likelihood of that behavior in the future

14
Q

Primary reinforcers

A

reinforcers that are of biological value to the organism, so organisms tend to repeat behaviors that provide access to them
- examples: food, water, sleep, a comfortable temperature, and sex

15
Q

Drive reduction theory (Clark Hull)

A

proposed that all learning reflects the innate, biological need to obtain primary reinforcers
–> complication: primary reinforcers are not always reinforcing (e.g., food loses its appeal for an animal that has just eaten its fill)

16
Q

secondary reinforcers

A

reinforcers that initially have no biological value, but that have been paired with (or predict the arrival of) primary reinforcers; they can be as strongly reinforcing as primary reinforcers
– Example: money

17
Q

Token economies

A

systems in which tokens serve as secondary reinforcers; often used in prisons, psychiatric hospitals, and other institutions where the staff has to motivate inmates or patients to behave well and to perform chores such as making beds or taking medications

  • tokens function in the same way as money does in the outside world
  • animals, too, will work for secondary reinforcers
18
Q

negative contrast:

A

organisms given a less-preferred reinforcer in place of an expected and preferred reinforcer will respond less strongly for the less-preferred reinforcer than if they had been given that less-preferred reinforcer all along
– e.g., the monkey that throws away the cucumber, its less-preferred food, once it has seen the grapes

19
Q

Punishment

A

the process of providing outcomes for a behaviour that decrease the probability of that behaviour, so the response becomes less frequent

20
Q

Punishers or negative outcomes

A

common punishers for animals include pain, confinement, and exposure to predators (or even the scent of predators)

21
Q

Four most important factors that determine how effective the punishment will be

A
  1. Punishment leads to more variable behaviour.
  2. Discriminative stimuli for punishment can encourage cheating
  3. Concurrent reinforcement can undermine the punishment
  4. Initial intensity matters
22
Q

Differential reinforcement of alternative behaviors (DRA)

A

a method for reducing unwanted behaviour: rather than delivering punishment each time the unwanted behaviour is exhibited, reward preferred, alternative behaviours

23
Q

Reinforcement schedules

A

the rules determining when outcomes are delivered in an experiment

24
Q

Timing affects learning

A

Normally, immediate outcomes produce the fastest learning

Delays undermine a punishment's effectiveness and may weaken learning

25
Response consequence delay
the longer one waits to punish a behaviour, the weaker the association between the punishment and the ...
26
Self-control
an organism’s willingness to forgo a small immediate reward in favor of a larger future reward
27
Positive (reinforcement)
positive does not mean good → instead it means added
28
Positive reinforcement
the desired response causes the reinforcer to be added to the environment
29
Positive punishment
an undesired response causes a punisher to be added to the environment
30
Negative reinforcement
behaviour is encouraged (reinforced) because it causes something to be subtracted from the environment -- over time the response becomes more frequent -- sometimes called avoidance training
31
Negative punishment
something is subtracted (negative) from the environment, and this subtraction punishes the behavior -- sometimes called omission training
32
Negative (reinforcement)
negative does not mean bad, it means subtraction in a mathematical sense
33
Reinforcement / Punishment Positive / Negative (Definition)
the terms reinforcement and punishment describe whether the response increases (reinforcement) or decreases (punishment) as a result of training. the terms positive and negative describe whether the outcome is added (positive) or taken away (negative)
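The two-by-two naming scheme above can be written as a small lookup. This is a sketch for self-testing only; the function name `classify` and its argument labels are made up for the example.

```python
def classify(outcome_change, response_change):
    """Name the operant paradigm from two observations.

    outcome_change:  "added" or "removed"
                     (what the response does to the outcome)
    response_change: "increases" or "decreases"
                     (what happens to the response over training)
    """
    sign = {"added": "positive", "removed": "negative"}[outcome_change]
    kind = {"increases": "reinforcement", "decreases": "punishment"}[response_change]
    return f"{sign} {kind}"


# A response removes something and becomes more frequent over time.
print(classify("removed", "increases"))
```

Asking the two questions separately (is something added or taken away, and does the response go up or down) avoids reading "positive" as good and "negative" as bad.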
34
Partial reinforcement schedules
patterns in which an outcome follows a response less than 100 percent of the time -- Example: Becky has to clean her room seven days in a row to obtain her weekly allowance (seven responses for one reinforcement)
35
Four types of partial reinforcement:
  1. Fixed-ratio (FR) schedule
  2. Fixed-interval (FI) schedule
  3. Variable-ratio (VR) schedule
  4. Variable-interval (VI) schedule
36
1. Fixed-ratio (FR) schedule
In operant conditioning, a reinforcement schedule in which a specific number of responses are required before a reinforcer is delivered; for example, FR 5 means that reinforcement arrives after every fifth response
37
Postreinforcement pause
In operant conditioning with a fixed-ratio (FR) schedule of reinforcement, a brief pause following a period of fast responding leading to reinforcement. The animal takes a break after each reinforcer -- the more responses required per reinforcer, the longer the pause tends to be
38
2. Fixed-interval (FI) schedule
an FI schedule reinforces the first response after a fixed amount of time
39
3. Variable-ratio (VR) schedule
a VR schedule provides reinforcement after a certain average number of responses --> as a result, there is a steady, high rate of responding even immediately after a reinforcement is delivered, because the very next response just might result in another reinforcement
40
4. Variable-interval (VI) schedule
a VI schedule reinforces the first response after an interval that averages a particular length of time -- VI schedules tend to produce higher, steadier rates of responding than FI schedules, because the organism cannot tell when the next reinforcer will become available. Ratio schedules, in turn, generally produce higher response rates than interval schedules
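The four partial reinforcement schedules can be compared side by side in a short Python sketch. Everything here is an assumption for illustration: the function names, the use of a uniform random draw to get the "average" requirement in the variable schedules, and the choice to time intervals from the last reinforcer.

```python
import random


def fixed_ratio(n):
    """FR n: reinforce every n-th response."""
    count = 0

    def respond(time):
        nonlocal count
        count += 1
        if count == n:
            count = 0
            return True
        return False

    return respond


def variable_ratio(mean_n, rng):
    """VR: reinforce after a random number of responses that averages mean_n."""
    required = rng.randint(1, 2 * mean_n - 1)
    count = 0

    def respond(time):
        nonlocal count, required
        count += 1
        if count >= required:
            count = 0
            required = rng.randint(1, 2 * mean_n - 1)
            return True
        return False

    return respond


def fixed_interval(seconds):
    """FI: reinforce the first response made at least `seconds` after the last reinforcer."""
    last = 0.0

    def respond(time):
        nonlocal last
        if time - last >= seconds:
            last = time
            return True
        return False

    return respond


def variable_interval(mean_seconds, rng):
    """VI: like FI, but the required wait is random and averages mean_seconds."""
    wait = rng.uniform(0, 2 * mean_seconds)
    last = 0.0

    def respond(time):
        nonlocal last, wait
        if time - last >= wait:
            last = time
            wait = rng.uniform(0, 2 * mean_seconds)
            return True
        return False

    return respond


# FR 5: ten responses, one per second; True on the 5th and 10th response.
fr5 = fixed_ratio(5)
fr_outcomes = [fr5(t) for t in range(1, 11)]
print(fr_outcomes)
```

Note the difference the sketch makes visible: under the ratio schedules only the response count matters, so responding faster brings reinforcement sooner, whereas under the interval schedules extra responses before the interval has elapsed earn nothing.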
41
Concurrent reinforcement schedules
schedules in which the organism can make any of several possible responses, each leading to a different outcome -- linked to behavioural economics --> how organisms use their time and resources
42
Matching law of choice behaviour
the principle that an organism, given a choice between multiple responses, will make a particular response at a rate proportional to how often that response is reinforced relative to the other choices
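As a rough numerical illustration of the matching law, the predicted share of responding on each option is that option's reinforcement rate divided by the total. The function name `matching_law` and the reinforcement rates below are invented for the example.

```python
def matching_law(reinforcement_rates):
    """Predict the proportion of responses made to each option.

    reinforcement_rates maps each response to how often it is reinforced
    (e.g., reinforcers per hour). The predicted share of responding on an
    option equals its share of the total reinforcement.
    """
    total = sum(reinforcement_rates.values())
    if total == 0:
        raise ValueError("at least one response must be reinforced")
    return {name: rate / total for name, rate in reinforcement_rates.items()}


# Hypothetical example: key A pays off 40 times per hour, key B 20 times per hour.
shares = matching_law({"A": 40, "B": 20})
print(shares)
```

With these made-up rates the prediction is that about two thirds of the responses go to key A and one third to key B.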
43
Behavioural economics
the study of how organisms allocate their time and resources among possible options -- economic theory predicts that each consumer will allocate resources in a way that maximizes her “subjective value,” or relative satisfaction (in microeconomics, the word utility is used instead of subjective value). The value is subjective because it differs from person to person.
Example: a pigeon could get either one reinforcer after 1 minute or two pellets after 2 minutes
44
Bliss point
the particular allocation of resources that provides maximal subjective value to an individual - Changes depending on context
45
Premack principle
The theory that the opportunity to perform a highly frequent behavior can reinforce a less frequent behavior; later refined as the response deprivation hypothesis.
- Example: if you have been studying for several hours straight, the idea of “taking a break” to clean your room or do the laundry can begin to look downright attractive
- Example: for rats, running in their wheel is a preferred activity that can serve as a reinforcer
46
Response deprivation hypothesis
a refinement of the Premack principle stating that the opportunity to perform any behaviour can be reinforcing if access to that behaviour is restricted → want something because you can’t have it
47
Basal ganglia
a collection of ganglia (clusters of neurons) at the base of the forebrain. Information from the sensory cortex can reach the motor cortex via this indirect route as well as directly. One part of the basal ganglia is the dorsal striatum, which can be further subdivided into the caudate nucleus and the putamen
48
dorsal striatum
receives highly processed stimulus information from sensory cortical areas and projects to the motor cortex, which produces a behavioral response -- plays a critical role in operant conditioning, particularly if discriminative stimuli are involved
Rats with lesions of the dorsal striatum can still learn operant responses (e.g., when placed in a Skinner box, lever-press R to obtain food O). But if discriminative stimuli are added (e.g., lever-press R is reinforced only in the presence of a light SD), the lesioned rats are markedly impaired, as are people whose striatum is disrupted by Parkinson’s disease or Huntington’s disease
→ the dorsal striatum appears necessary for learning SD → R associations based on feedback about reinforcement and punishment
49
Orbitofrontal cortex
appears to contribute to goal-directed behavior by representing predicted outcomes
- receives inputs conveying the full range of sensory modalities (sight, touch, sound, etc.) and also visceral sensations (including hunger and thirst), allowing this brain area to integrate many types of information
- outputs from the orbitofrontal cortex travel to the striatum, where they can help determine which motor responses are executed
50
?? is this also right ??
Proposed pathway: sensory cortex (stimulus) → orbitofrontal cortex (predicted outcome) → dorsal striatum, which is part of the basal ganglia (SD → R association) → motor cortex (response)
51
wanting and liking in the brain
later studies found that rats would work for electrical stimulation in several brain areas, including the ventral tegmental area (VTA)
52
Ventral tegmental area (VTA)
a small region in the midbrain of rats, humans, and other mammals -- contains neurons that produce dopamine (linked to “wanting” something) -- electrical stimulation of the VTA can have the same effect as a reinforcer
53
"pleasure centers"
some researchers inferred that the rats “liked” the stimulation, and the VTA and other areas of the brain where electrical stimulation was effective became informally known as “pleasure centers.”
54
Anhedonia hypothesis
the hypothesis that dopamine gives reinforcers their hedonic value, i.e. that wanting and liking are the same thing and that dopamine underlies both -- challenged by the incentive salience hypothesis
55
Hedonic value
the subjective “goodness” of a reinforcer, or how much we like it
56
Motivational value
how much we “want” a reinforcer and how hard we are willing to work to obtain it
57
Incentive salience hypothesis
The hypothesis that dopamine helps provide organisms with the motivation to work for reinforcement -- the role of dopamine in operant conditioning is to signal how much the animal “wants” a particular outcome, i.e. how motivated it is to work for it
58
Endogenous opioids
brain chemicals that are naturally occurring neurotransmitter-like substances (peptides) with many of the same effects as opiate drugs
59
how do "wanting" and "liking" interact
Possible way that the two brain systems (of liking and wanting) interact: differences in the amount of endogenous opioid released, and in the specific opiate receptors they activate, may help determine an organism’s preference for one reinforcer over another
60
Pathological addiction
a strong habit that is maintained despite harmful consequences. Addiction may involve not only seeking the “high” but also avoiding the adverse effects of withdrawal from the drug. In a sense, the high provides positive reinforcement and the avoidance of withdrawal symptoms provides negative reinforcement, and both processes reinforce the drug-taking responses
61
Behavioural addictions
addictions to behaviours, rather than drugs, that produce reinforcements or highs, as well as cravings and withdrawal symptoms when the behaviour is prevented -- perhaps the most widely agreed-upon example of a behavioral addiction is compulsive gambling
62
Detoxification
taking a different substance instead of the drug -- e.g. drinking alcohol-free beer
63
Extinction
if response R stops producing outcome O, the frequency of R should decline
64
Distancing
avoiding the stimuli that trigger the unwanted response
65
Differential reinforcement of alternate behaviours (DRA)
reinforce yourself (for example, with a spa day) if you didn’t use the drug, or punish yourself if you did
66
Delayed reinforcement
whenever the smoker gets the urge to light up, she can impose a fixed delay (e.g., an hour) before giving in to it
67
most effective treatments
combine cognitive therapy (including counseling and support groups) with behavioral therapy based on conditioning principles—and medication for the most extreme cases
68
Protestant ethic effect (NOT sure if this is right)!!!
delusional -- animals would rather work for their food than get it freely -- you think that you do something for an effect -- vs. habit slip
70
Reward prediction hypothesis (ask about this again)
the phasic firing of dopaminergic neurons in the midbrain signals a discrepancy between the predicted reward and the reward actually experienced for a particular event
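The "discrepancy" in this card can be made concrete as received reward minus predicted reward. The sketch below is an assumption-laden illustration, not the textbook's model: the function names, the learning rate of 0.5, and the simple rule that nudges the prediction toward the received reward after each trial are all chosen just for the example.

```python
def prediction_error(predicted, received):
    """Discrepancy between the reward received and the reward predicted."""
    return received - predicted


def run_trials(rewards, learning_rate=0.5):
    """Return the prediction error on each trial.

    After every trial the prediction moves part of the way toward the
    reward that was actually received (an assumed, simplified update rule).
    """
    predicted = 0.0
    errors = []
    for received in rewards:
        error = prediction_error(predicted, received)
        errors.append(error)
        predicted += learning_rate * error
    return errors


# Five rewarded trials followed by one trial where the expected reward is omitted.
errors = run_trials([1, 1, 1, 1, 1, 0])
print(errors)
```

The error is large and positive when the reward is unexpected, shrinks as the reward becomes predicted, and turns negative when a predicted reward fails to arrive.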