learning and behaviour Flashcards
weeks 1-3 (64 cards)
What is classical/Pavlovian conditioning?
Learn that a specific event (US) follows a signal (CS).
Responses are made in anticipation of, or preparation for, the event.
What is instrumental and free operant learning?
Learn that a response (R) in a specific context (Sd) produces a specific outcome (Sr)
Responses are made to generate or avoid the outcome in that context.
What do CS, US, CR, UR, Sd, Operant, Rf mean?
CS - Conditioned Stimulus: This is a stimulus that initially does not trigger a specific response but comes to do so after being associated with an unconditioned stimulus.
US - Unconditioned Stimulus: A stimulus that naturally and automatically triggers a specific response without prior conditioning.
CR - Conditioned Response: This is the learned response to the conditioned stimulus that occurs after the conditioned stimulus and unconditioned stimulus have been paired.
UR - Unconditioned Response: This is the unlearned response that occurs naturally in reaction to the unconditioned stimulus.
Sd - Discriminative Stimulus: A stimulus in the presence of which a particular operant response is reinforced.
Operant: This refers to an active behavior that operates upon the environment to generate consequences.
Rf - Reinforcer: Any event that strengthens or increases the frequency of a behavior that it follows.
what is effective reinforcement?
three important factors for reinforcement/punishment which need to be considered for effective behavioural change:
Immediacy/Contiguity
* The consequence should occur soon after reaching a target or goal.
Contingency
* The consequence should occur reliably after reaching a target or goal, and access to it at any other times should be limited.
Value
* The consequence should be valuable and/or meaningful to you.
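The contingency point can be put into numbers. One common measure (an assumption here, since the card does not name one) is ΔP: the probability of the consequence after reaching the target minus its probability at other times. A minimal Python sketch with hypothetical counts:

```python
def delta_p(consequence_after_target: int, target_reached: int,
            consequence_otherwise: int, target_not_reached: int) -> float:
    """Contingency as ΔP = P(consequence | target reached) - P(consequence | not reached).

    Values near 1 mean the consequence reliably follows the target and is rarely
    available at other times; 0 means the consequence is unrelated to the target.
    """
    p_after = consequence_after_target / target_reached
    p_otherwise = consequence_otherwise / target_not_reached
    return p_after - p_otherwise


# Hypothetical: a treat after 9 of 10 workout days, but also on 1 of 10 rest days.
strong = delta_p(9, 10, 1, 10)
# Hypothetical: the treat is just as likely whether or not the target is reached.
no_contingency = delta_p(5, 10, 5, 10)
print(round(strong, 2), no_contingency)  # → 0.8 0.0
```

Free access to the treat on non-target days raises the second term and pushes ΔP towards 0, which is why access "at any other times should be limited".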
What is the difference between positive reinforcement, negative reinforcement, positive punishment, negative punishment?
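positive reinforcement: behaviour produces a pleasant outcome (something is added), so the behaviour increases
negative reinforcement: behaviour removes or avoids an unpleasant outcome (something is taken away), so the behaviour increases
positive punishment: behaviour produces an unpleasant outcome (something is added), so the behaviour decreases
negative punishment: behaviour removes or prevents a pleasant outcome (something is taken away), so the behaviour decreases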
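The four terms come from two independent questions: is a stimulus added (positive) or removed (negative), and does the behaviour increase (reinforcement) or decrease (punishment)? A small Python sketch of that 2×2 lookup (the function name and argument strings are hypothetical):

```python
def classify_consequence(stimulus: str, behaviour: str) -> str:
    """Name an operant consequence.

    stimulus:  "added" (positive) or "removed" (negative)
    behaviour: "increases" (reinforcement) or "decreases" (punishment)
    """
    sign = {"added": "positive", "removed": "negative"}[stimulus]
    effect = {"increases": "reinforcement", "decreases": "punishment"}[behaviour]
    return f"{sign} {effect}"


# Taking a painkiller removes a headache, and painkiller use increases.
print(classify_consequence("removed", "increases"))  # → negative reinforcement
```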
What are Discrete trial vs Free Operant procedures?
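Discrete trial: each trial gives the subject a single opportunity to respond (e.g. one run through a maze), and performance is measured using objective DVs (dependent variables) such as time taken or errors.
Free operant procedure: the subject (e.g. a rat) is placed in the apparatus and is free to respond at any time; each ‘right’ response is rewarded, and the rate of responding can be measured.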
Outline some limitations associated with discrete trial procedures
The experimenter constrains when the subject can respond
Only one response and one reinforcer per trial
Handling stress (the subject must be handled between trials)
outline what is meant by B = kR
The rate of responding is proportional to the rate of reinforcement.
B (rate of behaviour) = k (slope constant) R (rate of reinforcement)
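B = kR is a straight line through the origin, so doubling the rate of reinforcement doubles the predicted rate of behaviour. A Python sketch with a hypothetical slope of k = 2 responses per reinforcer:

```python
def response_rate(k: float, reinforcement_rate: float) -> float:
    """B = kR: predicted rate of behaviour for a given rate of reinforcement."""
    return k * reinforcement_rate


k = 2.0  # hypothetical slope constant: 2 responses per reinforcer
for r in (10, 20, 40):  # reinforcers per hour
    print(r, response_rate(k, r))
# → 10 20.0
# → 20 40.0
# → 40 80.0
```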
outline the different schedules of reinforcement
- how often is a behaviour rewarded in operant conditioning?
Fixed ratio - behaviour is reinforced after a fixed number of responses (e.g. every 10th response).
Variable ratio - behaviour is reinforced after an unpredictable number of responses. Because reinforcement is unpredictable, there are no pauses in responding; this schedule is used in gambling and is highly resistant to extinction.
Fixed interval - the first response made after a fixed amount of time has elapsed is reinforced; the number of responses made during the interval does not matter. Responding is slow just after reinforcement but increases as the end of the interval approaches.
Variable interval - the first response made after an unpredictable amount of time is reinforced.
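The four schedules differ only in the rule that decides which responses earn a reinforcer. The Python sketch below applies each rule to a list of response times (in seconds, sorted). It assumes the interval timer restarts at each reinforcer, and the variable schedules draw the next requirement uniformly around a hypothetical mean; real experiments may use other distributions.

```python
import random
from typing import List


def fixed_ratio(times: List[float], n: int) -> List[float]:
    """FR n: every n-th response is reinforced."""
    return [t for i, t in enumerate(times, start=1) if i % n == 0]


def variable_ratio(times: List[float], mean_n: int, rng: random.Random) -> List[float]:
    """VR: the number of responses required changes unpredictably after each
    reinforcer (drawn uniformly from 1..2*mean_n-1, so it averages mean_n)."""
    reinforced, count = [], 0
    requirement = rng.randint(1, 2 * mean_n - 1)
    for t in times:
        count += 1
        if count >= requirement:
            reinforced.append(t)
            count = 0
            requirement = rng.randint(1, 2 * mean_n - 1)
    return reinforced


def fixed_interval(times: List[float], interval: float) -> List[float]:
    """FI: the first response after `interval` seconds is reinforced."""
    reinforced, available_at = [], interval
    for t in times:
        if t >= available_at:
            reinforced.append(t)
            available_at = t + interval  # timer restarts at reinforcement
    return reinforced


def variable_interval(times: List[float], mean_interval: float,
                      rng: random.Random) -> List[float]:
    """VI: the first response after an unpredictable wait (uniform 0..2*mean) is reinforced."""
    reinforced = []
    available_at = rng.uniform(0, 2 * mean_interval)
    for t in times:
        if t >= available_at:
            reinforced.append(t)
            available_at = t + rng.uniform(0, 2 * mean_interval)
    return reinforced


responses = [float(s) for s in range(1, 61)]  # hypothetical: one response per second for a minute
print(fixed_ratio(responses, 10))     # → [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
print(fixed_interval(responses, 15))  # → [15.0, 30.0, 45.0, 60.0]
print(variable_ratio(responses, 5, random.Random(0)))
print(variable_interval(responses, 10, random.Random(0)))
```

Under FI, responding faster than once per interval earns nothing extra, whereas under FR every extra response brings the next reinforcer closer.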
Name the 4 types of reinforcers
Primary Reinforcers: These are naturally reinforcing because they satisfy basic biological needs or drives. Examples include food, water, and sleep. They are inherently valuable and do not require learning to be effective.
Secondary Reinforcers (or Conditioned Reinforcers): These do not satisfy biological needs but are effective through their association with primary reinforcers. Money is a common example; it is not inherently valuable but can be used to obtain primary reinforcers.
Activity Reinforcers/Premack principle: These involve access to activities that are inherently enjoyable. For instance, playing a video game or watching a favourite TV show can serve as an activity reinforcer.
Token Economies: This system involves symbolic reinforcers (tokens) that can be exchanged for other reinforcers, primary or secondary. It’s commonly used in settings like classrooms or therapeutic programs where tokens can be earned for specific behaviours and later exchanged for privileges, items, or activities.
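A token economy is essentially a ledger: target behaviours earn tokens immediately, and tokens are later exchanged for backup reinforcers. A minimal Python sketch; the class name, the behaviours and the prices are all hypothetical:

```python
from typing import Dict


class TokenEconomy:
    """Hypothetical token economy: tokens for target behaviours, exchanged later."""

    def __init__(self, earn: Dict[str, int], prices: Dict[str, int]):
        self.earn = earn      # target behaviour -> tokens given immediately
        self.prices = prices  # backup reinforcer -> token cost
        self.balance = 0

    def record(self, behaviour: str) -> int:
        """Give tokens for a target behaviour; other behaviours earn nothing."""
        self.balance += self.earn.get(behaviour, 0)
        return self.balance

    def exchange(self, item: str) -> bool:
        """Swap tokens for a backup reinforcer if the balance covers its price."""
        cost = self.prices[item]
        if self.balance < cost:
            return False
        self.balance -= cost
        return True


economy = TokenEconomy(earn={"homework done": 2, "tidied desk": 1},
                       prices={"extra break": 3, "sticker": 1})
economy.record("homework done")
economy.record("tidied desk")
print(economy.balance)                  # → 3
print(economy.exchange("extra break"))  # → True
print(economy.exchange("sticker"))      # → False (balance is now 0)
```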
What is the premack principle?
The Premack Principle falls under the category of activity reinforcers in behavioural psychology. It is often used to increase the likelihood of a target behavior by using a more desirable activity as a reinforcer.
The Premack Principle states that a more probable behavior can reinforce a less probable behavior. In practical terms, this means that an activity a person is more likely to do can be used as a reward to reinforce an activity they are less likely to do.
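The principle reduces to a comparison of how likely each activity is when it is freely available. A Python sketch, assuming baseline probability is measured as the proportion of free time spent on each activity (hypothetical numbers):

```python
from typing import Dict


def can_reinforce(baseline: Dict[str, float], reward: str, target: str) -> bool:
    """Premack principle: a more probable activity can reinforce a less probable one.

    baseline maps each activity to how likely it is when freely available
    (here, the proportion of free time spent on it).
    """
    return baseline[reward] > baseline[target]


# Hypothetical free-choice observations for one child.
baseline = {"video games": 0.5, "watching TV": 0.3, "homework": 0.05}
print(can_reinforce(baseline, "video games", "homework"))  # → True
print(can_reinforce(baseline, "homework", "video games"))  # → False
```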
Outline some issues associated with primary reinforcers and secondary reinforcers
primary reinforcers:
– Heavily dependent on motivational state
  * Satiety
– Or very contextual
  * Status is culturally determined, utility is situation specific
– Suffer from poor contiguity
  * High transactional costs, slow to deliver, can interfere with ongoing behaviour
secondary reinforcers:
– Must be established via classical conditioning
  * Expensive and time consuming
– Can extinguish or be counter-conditioned
outline advantages of activity rewards and token economies
activity rewards
– Cheap and intrinsic
– Usually good for those who find the usual rewards uninteresting
token economies
– Success with Sz (schizophrenia) patients
– Contiguity: tokens can usually be given immediately, are low cost and don’t interfere with ongoing behaviour
– Value: tokens act as universal reinforcers, so they cater to individual taste and are not subject to satiety
what are universal reinforcers with an example
Universal reinforcers are a subset of secondary reinforcers that are widely effective across many different individuals and situations. For example, money is typically considered a universal reinforcer because it does not satisfy a biological need directly but is highly effective in reinforcing behavior due to its ability to be exchanged for a variety of primary reinforcers (like food, shelter) and other secondary reinforcers (like entertainment, luxury items).
what two things are needed to train complex behaviour?
shaping
chaining
what is shaping?
Shaping involves reinforcing successive approximations of a desired behavior.
This technique is used when the behavior does not yet exist, so you reinforce any behavior that is closer to the desired behavior. This matters because classical conditioning only affects pre-existing behaviours/responses, whereas shaping with instrumental conditioning can generate entirely novel behaviours.
Over time, only behaviors that increasingly resemble the desired behavior are reinforced, so response strength and accuracy increase and the behavior is “shaped” incrementally.
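Shaping can be written as a moving criterion: reinforce any response at least as close to the target as the current criterion, then tighten the criterion. The Python sketch below takes the responses as given (it does not model learning) and assumes each response is scored as a distance from the desired behaviour; the starting criterion and step size are hypothetical.

```python
from typing import List, Tuple


def shape(responses: List[float], criterion: float, step: float) -> List[Tuple[float, bool]]:
    """Reinforce successive approximations.

    Each response is scored as its distance from the desired behaviour
    (0 = the desired behaviour itself). A response is reinforced only if it is
    at least as close as the current criterion; after each reinforcer the
    criterion is tightened by `step`, never below 0.
    """
    log = []
    for r in responses:
        reinforced = r <= criterion
        if reinforced:
            criterion = max(0.0, criterion - step)
        log.append((r, reinforced))
    return log


# Hypothetical distances for successive responses (e.g. cm away from a lever).
trials = shape([8, 6, 7, 4, 5, 2, 1, 0], criterion=8, step=2)
print([r for r, ok in trials if ok])  # → [8, 6, 4, 2, 0]
```

The responses 7, 5 and 1 go unreinforced because they fall short of the criterion in force at the time; a smaller step would make the procedure more lenient.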
what is response chaining?
Chaining involves teaching a complex behavior by breaking it down into simpler, discrete parts or steps, and then linking these steps together in a sequence. Each step of the sequence is taught individually, and the reinforcement is given for completing each step. The sequences can be built up gradually until the entire sequence is performed fluidly.
what three principles must be adhered to for effective shaping?
Close temporal contiguity between R and Rft
Avoid giving spurious Rfts! This degrades contingency
Avoid reinforcing the wrong behaviour, which leads to the development of “superstitious” behaviour
give an example of response chaining
if teaching someone to prepare a cup of tea, you might break the task into steps like filling the kettle, boiling the water, placing a teabag in a cup, pouring the water, and finally adding milk or sugar. Each step would be taught and reinforced in sequence.
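The tea example can be sketched as an ordered list of steps. This Python sketch assumes forward chaining (one common variant; the card does not specify a direction): each training stage adds the next step, and within an attempt each correct step in sequence is reinforced until the first error. All names are hypothetical.

```python
from typing import List

TEA_CHAIN = ["fill kettle", "boil water", "put teabag in cup",
             "pour water", "add milk or sugar"]


def training_stages(chain: List[str]) -> List[List[str]]:
    """Forward chaining: each stage adds the next step to the sequence."""
    return [chain[:k] for k in range(1, len(chain) + 1)]


def reinforced_steps(chain: List[str], attempt: List[str]) -> List[str]:
    """Reinforce each step done in the right order; stop at the first error."""
    earned = []
    for expected, performed in zip(chain, attempt):
        if performed != expected:
            break
        earned.append(performed)
    return earned


for stage in training_stages(TEA_CHAIN):
    print(" -> ".join(stage))

print(reinforced_steps(TEA_CHAIN, ["fill kettle", "put teabag in cup", "boil water"]))
# → ['fill kettle']
```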
Running in a maze used to be considered a response chain. why isn’t it anymore?
when an animal runs a maze, the behavior is more often shaped as a whole rather than taught in discrete steps. The animal may learn to run the maze through trial and error, guided by rewards that occur at the end of the maze rather than for individual steps within the maze. The process involves forming a cognitive map or using cues within the environment to learn the path, rather than chaining discrete responses together.
Furthermore, maze running can involve a variety of problem-solving strategies and may not always follow the same sequence of actions, unlike a response chain where the sequence is fixed and each response is a conditioned cue for the next.
Outline the features of extinction of operant behaviour
Extinction occurs when the contingency between response and outcome is removed, which causes the established response to decline.
Extinction is not ‘unlearning’ or ‘forgetting’, as the extinguished response can return (relapse).
Outline the types of relapse (spontaneous recovery, renewal, reinstatement, stress-induced reinstatement)
Spontaneous Recovery: This occurs when an extinguished response reappears after a period of no exposure to the conditioning or extinction context, without any additional reinforcement. This suggests that the original learning is not erased but only temporarily suppressed during extinction. The context in which the learning took place, or a similar context, can serve as a powerful cue that reignites the memory of the behavior and its associated reinforcement, even after a period of non-reinforcement.
Renewal: This effect happens when a behavior is extinguished in a different context from where it was acquired and then the behavior reappears when the individual returns to the original learning context.
Reinstatement: Encountering the outcome (the reinforcer) can trigger a relapse of the extinguished response, e.g. a gambler sees someone else win, causing gambling behaviour to come back.
Stress-induced Reinstatement: stress triggers the return of an extinguished behaviour, e.g. nail biting is harder to resist when you are stressed.
Outline partial reinforcement effect in extinction (PREE) with an example
What is PREE?
The Partial Reinforcement Extinction Effect occurs when a behavior that has been reinforced intermittently (partially) takes longer to extinguish than a behavior that has been reinforced continuously. In other words, if a behavior is sometimes but not always rewarded, it becomes more resistant to extinction compared to a behavior that is rewarded every time it is performed.
Why Does PREE Occur?
The theory behind PREE suggests that when reinforcement is partial or intermittent, the individual learns that the absence of reinforcement at a particular instance does not necessarily mean that future responses will not be reinforced. This uncertainty leads to a more persistent behavior during the extinction phase where no reinforcement is provided at all.
For example, if a vending machine sometimes gives you a snack when you insert money and sometimes doesn’t, you may continue to try inserting money even when it stops dispensing snacks entirely because you’ve learned that sometimes you need to try more than once to get a reward.
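The explanation above can be turned into a toy rule: during extinction, the learner keeps responding until its current run of unreinforced responses reaches the longest run it met in training plus a fixed margin. This is an illustrative assumption, not a standard model, and the margin of 3 is hypothetical. It reproduces the PREE: a history of gaps buys more persistence.

```python
from typing import List


def longest_dry_run(training: List[bool]) -> int:
    """Longest run of non-reinforced responses experienced during training."""
    longest = current = 0
    for reinforced in training:
        current = 0 if reinforced else current + 1
        longest = max(longest, current)
    return longest


def persistence_in_extinction(training: List[bool], margin: int = 3) -> int:
    """Toy rule (assumption): in extinction, keep responding until the run of
    non-reinforced responses reaches the longest run seen in training plus
    `margin`. Returns the number of unreinforced responses made before giving up.
    """
    return longest_dry_run(training) + margin


continuous = [True] * 10  # vending machine that always paid out
partial = [True, False, True, False, False, True, False, False, False, True]  # sometimes failed
print(persistence_in_extinction(continuous))  # → 3
print(persistence_in_extinction(partial))     # → 6
```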
what is a discriminative stimulus (Sd)?
Discriminative stimuli control our behaviour: our behaviour is observably different in the presence or absence of specific discriminative stimuli.
e.g. we wouldn’t act the same in a bar and in an office
Through generalisation, discriminative stimuli can be a range of things, such as individual stimuli, concepts or categories.