operant conditioning
-process in which behavioural change (i.e., learning) occurs because of the consequences of behaviour
thorndike: law of effect - aim
-to investigate the development of learned behaviour in animals that he could generalise to humans
thorndike: law of effect - method
-thorndike put hungry cats in cages with automatic doors that could be opened by pressing a lever inside the cage and he timed how long it took the cat to escape
-when first placed in the cages, the cats displayed unsystematic trial-and-error behaviours while trying to escape, e.g. they scratched, bit, and wandered around the cages without identifiable patterns
-thorndike would then put food outside the cages; the cats then experimented with different ways to escape and reach the food
-eventually, they would stumble upon the lever which opened the cage
-once a cat had escaped, it was put back in the cage, and the time it took to escape was recorded again
thorndike: law of effect - key findings
-in successive trials, the cats learnt that pressing the lever would have the favourable consequence of getting food, which led to the cats becoming quicker at pushing the lever
-thorndike proposed the Law of Effect - any behaviour that is followed by pleasant consequences is likely to be repeated in that situation, and any behaviour followed by unpleasant consequences is likely to be stopped
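The strengthening half of the Law of Effect can be illustrated with a toy simulation. This is only a sketch, not Thorndike's procedure or data: the behaviour names, the starting weights, the `boost` value, and the `run_trials` helper are all assumptions made for illustration. Each time the lever press is followed by food, its weight goes up, so it becomes more likely to be chosen on later trials.

```python
import random

def run_trials(n_trials=20, boost=1.0, seed=0):
    """Toy model: a behaviour followed by a pleasant consequence is strengthened."""
    rng = random.Random(seed)
    # Hypothetical starting point: every behaviour is equally likely.
    weights = {"scratch": 1.0, "bite": 1.0, "wander": 1.0, "press_lever": 1.0}
    attempts_per_trial = []
    for _ in range(n_trials):
        attempts = 0
        while True:
            attempts += 1
            # Pick one behaviour at random, in proportion to its current weight.
            behaviour = rng.choices(list(weights), weights=list(weights.values()))[0]
            if behaviour == "press_lever":
                # Escape and food follow the lever press, so that behaviour is strengthened.
                weights["press_lever"] += boost
                break
        attempts_per_trial.append(attempts)
    return attempts_per_trial, weights

attempts, weights = run_trials()
print(attempts)                # number of behaviours tried before escaping, per trial
print(weights["press_lever"])  # 21.0 after 20 reinforced lever presses
```

In this sketch the number of attempts per trial tends to fall across trials, loosely mirroring the shorter escape times. It does not model the weakening of behaviours followed by unpleasant consequences.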
thorndike: law of effect - contributions
-introduced the concept of reinforcement – the idea that a behaviour can be encouraged by pleasant consequences
-provided a foundation for Skinner’s work on operant conditioning
thorndike: law of effect - limitations
-hard to generalise results to a human population as animal and human cognitive processes are different
-oversimplifies human behaviour as it doesn’t acknowledge more complex cognitive processes and motivations that may be involved in learning
three phase model - antecedent
-the stimulus that occurs before the behaviour, e.g. zookeeper gives a signal to the seal
three phase model - behaviour
-action that occurs due to the antecedent, it’s an active behaviour that operates upon the environment to generate consequences e.g. seal does a trick
three phase model - consequence
-result of the behaviour, e.g. seal gets a fish
reinforcement
-a consequence which increases the likelihood of a desirable behaviour occurring again
-reinforcers can be primary (things that are innately reinforcing, such as food or warmth) or secondary (things that become reinforcing through learning, such as money)
positive reinforcement
-addition of a pleasant consequence to increase the likelihood of a desirable behaviour occurring again, e.g. seal gets a fish when it performs a trick
negative reinforcement
-removal of an aversive consequence to increase the likelihood of a desirable behaviour occurring again, e.g. car stops beeping when you put on your seat belt
punishment
-a consequence which decreases the likelihood of an undesirable behaviour occurring again
positive punishment
-addition of an aversive consequence to decrease the likelihood of an undesirable behaviour occurring again, e.g. electric fence zapping an animal trying to escape
negative punishment
-removal of a pleasant consequence to decrease the likelihood of an undesirable behaviour occurring again, e.g. taking away a child’s toy because they have been misbehaving
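The four types of consequence follow a two-by-two pattern: "positive" or "negative" says whether something is added or removed, and "reinforcement" or "punishment" says whether the behaviour becomes more or less likely. A minimal sketch of that pattern is below; the function name `classify_consequence` and its argument values are made up for illustration.

```python
def classify_consequence(stimulus, effect):
    """Name the type of consequence.

    stimulus: "added" or "removed" (what happens after the behaviour)
    effect:   "increase" or "decrease" (what happens to the behaviour's likelihood)
    """
    if stimulus not in ("added", "removed"):
        raise ValueError("stimulus must be 'added' or 'removed'")
    if effect not in ("increase", "decrease"):
        raise ValueError("effect must be 'increase' or 'decrease'")
    sign = "positive" if stimulus == "added" else "negative"
    kind = "reinforcement" if effect == "increase" else "punishment"
    return f"{sign} {kind}"

# Seat belt example: the beeping is removed, and buckling up becomes more likely.
print(classify_consequence("removed", "increase"))  # negative reinforcement
# Electric fence example: a zap is added, and escaping becomes less likely.
print(classify_consequence("added", "decrease"))    # positive punishment
```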
factors of the effectiveness of operant conditioning
appropriateness
-reinforcement or punishment needs to match the behaviour
-a parent helping their child pay for a car would be appropriate reinforcement for a good ATAR but not for keeping their room tidy
-detention would not be an appropriate punishment for a student throwing a chair at a teacher, but would be for not doing homework
timing
-punishment or reinforcement must occur after the behaviour, so it is seen as a consequence
-should follow the behaviour as immediately as possible, so the consequence is clearly linked to that behaviour
schedules of reinforcement
-pattern that defines how often a desired response will be reinforced
-reinforcement schedules take place in both naturally occurring learning situations as well as more structured training situations
different patterns influence:
-response rate: how often the behaviour is displayed
-extinction rate: how long it takes for the behaviour to disappear once reinforcement stops
continuous reinforcement
-desired behaviour is reinforced each and every time it occurs, used to teach a new behaviour
-desired behaviour is typically learned quickly
-difficult to maintain over a long period of time due to the effort of having to reinforce a behaviour each time it is performed
-e.g. seal gets a fish every time it performs a trick when it is first learning it
fixed ratio
-behaviour is reinforced only after a specific number of responses
-builds a high response rate
-has a medium extinction rate, irregular responding may occur if reinforcement is stopped
-e.g. you get a free coffee after every 10th coffee purchase at a café
fixed interval
-behaviour is rewarded only after a specified amount of time has elapsed
-medium response rate
-produces a choppy stop-start pattern rather than a steady rate of response
-tend to respond more frequently as the anticipated time for reward draws near
-medium extinction rate
-e.g. employees receive a paycheck every fortnight
variable ratio
-behaviour is reinforced after an unpredictable number of responses
-response rate is fast
-responses are persistent, in the hope that the next response might be the one needed to receive reinforcement
-extinction rate is slow
-e.g. slot machines at casinos
variable interval
-behaviour is rewarded after an unpredictable amount of time has passed
-moderate but steady response rate
-extinction rate is slow
-e.g. checking for social media notifications, or fishing
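The delivery rules of the five schedules can be compared in a short sketch. It only models when a response earns reinforcement, not response rate or extinction rate. The `Schedule` class, its `kind` labels, and the way the unpredictable requirement is drawn are assumptions chosen for illustration.

```python
import random

class Schedule:
    """Decides whether each response is reinforced (hypothetical helper)."""

    def __init__(self, kind, value=1, seed=0):
        # kind: "continuous", "fixed_ratio", "variable_ratio",
        #       "fixed_interval" or "variable_interval"
        self.kind = kind
        self.value = value              # responses (ratio) or time units (interval)
        self.rng = random.Random(seed)
        self.count = 0                  # responses since the last reinforcement
        self.last_time = 0              # time of the last reinforcement
        self.target = self._next_target()

    def _next_target(self):
        if self.kind.startswith("variable"):
            # Unpredictable requirement that averages `value` (assumed distribution).
            return self.rng.randint(1, 2 * self.value - 1)
        return self.value

    def respond(self, time):
        """Record one response made at `time`; return True if it is reinforced."""
        self.count += 1
        if self.kind == "continuous":
            met = True
        elif self.kind.endswith("ratio"):
            met = self.count >= self.target
        else:
            # Interval schedules: the first response after enough time has elapsed.
            met = time - self.last_time >= self.target
        if met:
            self.count = 0
            self.last_time = time
            self.target = self._next_target()
        return met

# Fixed ratio: a free coffee after every 10th purchase.
fr = Schedule("fixed_ratio", value=10)
print([t for t in range(1, 31) if fr.respond(t)])  # [10, 20, 30]

# Variable ratio: the reinforced responses cannot be predicted in advance.
vr = Schedule("variable_ratio", value=5)
print([t for t in range(1, 31) if vr.respond(t)])
```

Swapping `kind` for `"fixed_interval"` or `"variable_interval"` makes `value` count time units rather than responses, so extra responses before the interval has elapsed earn nothing.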