6.2—operant conditioning: learning through consequences Flashcards Preview

🚫 PSY100H1: Introduction to Psychology (Winter 2016) with J. Vervaeke > 6.2—operant conditioning: learning through consequences > Flashcards


Operant Conditioning

  • operant conditioning: a type of learning in which behaviour is influenced by consequences
  • very few of our behaviours are random; people tend to repeat actions that previously led to positive or rewarding outcomes
  • if a behaviour previously led to a negative outcome, people are less likely to perform it again
  • operant conditioning involves voluntary actions (e.g. speaking, listening, starting and stopping an activity, moving toward or away from something)



  • contingency: a consequence depends upon an action
  • this is important to operant conditioning
  • e.g. earning good grades is generally contingent upon studying effectively
  • the consequences of a behaviour can be either reinforcing or punishing (figure 6.10)



  • reinforcement: a process in which an event or reward that follows a response increases the likelihood of that response occurring again
  • Thorndike (1905); cats in puzzle boxes were able to escape more rapidly over repeated trials because they learned which responses worked (figure 6.11)
    • law of effect: responses followed by satisfaction will occur again; those not followed by satisfaction become less likely



  • reinforcer: a stimulus that is contingent upon a response, and that increases the probability of that response occurring again
  • B.F. Skinner; behaviourist influenced by Thorndike
    • operant chambers (or Skinner boxes): include a lever or key that the subject can manipulate; pushing the lever may result in the delivery of a reinforcer (e.g. food)


Punishment and Punisher

  • punishment: a process that decreases the future probability of a response
  • punisher: a stimulus that is contingent upon a response, and that results in a decrease in behaviour
    • punishment and punishers are defined not by the stimuli themselves, but by their effects on behaviour
    • e.g. being yelled at, losing money, or going to jail are punishers if they make it less likely that a particular response will occur again


Positive Reinforcement

  • positive reinforcement: the strengthening of behaviour after potential reinforcers such as praise, money, or nourishment follow that behaviour (table 6.2)


Negative Reinforcement

  • negative reinforcement: the strengthening of a behaviour because it removes or diminishes a stimulus (table 6.2)


Avoidance Learning | Negative Reinforcement

  • avoidance learning: a specific type of negative reinforcement that removes the possibility that a stimulus will occur
  • e.g. taking a detour to avoid traffic on a particular road
  • brain-imaging scans show that a region of the frontal lobes (the orbitofrontal cortex) shows increased activity when a person successfully avoids a negative outcome
  • avoidance learning (negative reinforcement) uses some of the same brain networks as positive reinforcement


Escape Learning | Negative Reinforcement

  • escape learning: occurs if a response removes a stimulus that is already present
  • e.g. covering your ears when you hear really loud music
    • you can't avoid the music, because it's already present, but you can escape it instead


Positive Punishment

  • positive punishment: a process in which a behaviour decreases in frequency because it was followed by a particular, usually unpleasant, stimulus
  • e.g. cat owners using a spray bottle


Negative Punishment

  • negative punishment: when a behaviour decreases because it removes or diminishes a particular stimulus
  • e.g. when a parent grounds a child
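
The four processes above differ on only two points: whether a stimulus is added or removed after the response, and whether the behaviour becomes more or less frequent. A minimal sketch of that two-by-two classification follows; the function name `classify_consequence` and its string arguments are hypothetical labels chosen for illustration, not terms from the textbook.

```python
def classify_consequence(stimulus_change: str, behaviour_change: str) -> str:
    """Name the operant process from two observations.

    stimulus_change:  "added" or "removed" (what followed the response)
    behaviour_change: "increases" or "decreases" (effect on future responding)
    """
    # "positive" means a stimulus is added; "negative" means one is removed.
    sign = {"added": "positive", "removed": "negative"}[stimulus_change]
    # Reinforcement strengthens behaviour; punishment weakens it.
    process = {"increases": "reinforcement", "decreases": "punishment"}[behaviour_change]
    return f"{sign} {process}"


# Examples taken from the cards above.
examples = {
    "praise follows a response": ("added", "increases"),
    "covering your ears stops loud music": ("removed", "increases"),
    "a cat is sprayed with a spray bottle": ("added", "decreases"),
    "a parent grounds a child": ("removed", "decreases"),
}

for description, (stimulus, behaviour) in examples.items():
    print(f"{description}: {classify_consequence(stimulus, behaviour)}")
```

Note that the label depends on the observed effect on behaviour, not on whether the stimulus seems pleasant or unpleasant.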


Primary and Secondary Reinforcers

  • primary reinforcers: reinforcing stimuli that satisfy basic motivational needs—needs that affect an individual's ability to survive (and, if possible, reproduce)
    • e.g. food, water, shelter, and sexual contact
  • secondary reinforcers: reinforcing stimuli that acquire their reinforcing effects only after we learn that they have value
    • e.g. money and praise
    • they are abstract and don't directly influence survival-related behaviours
  • the nucleus accumbens becomes activated when processing rewards (i.e. both primary and secondary reinforcers)
    • variations in this area may explain why people differ in how strongly they are motivated by reinforcers
  • when behaviour is rewarded, dopamine is released
    • dopamine-releasing neurons in the nucleus accumbens and surrounding areas keep track of which behaviours are, or are not, associated with a reward
    • they are involved with learning new behaviour-reward associations as well as reinforcement itself


Discriminative Stimulus

  • discriminative stimulus: a cue or event that indicates that a response, if made, will be reinforced
  • e.g. before pouring a cup of coffee, we check if the light on the coffee pot is on; a discriminative stimulus tells us that the beverage will be hot and, presumably, reinforcing
  • e.g. you will only ask to borrow your parents' car when they're in a good mood
  • these stimuli demonstrate that we can use cues from our environment to help us decide whether or not to perform a conditioned behaviour



  • generalization: when an operant response takes place to a new stimulus that is similar to the stimulus present during original learning
  • e.g. a child petting, laughing, and playing with a border collie may lead to him becoming more likely to pet other dogs, or even other furry animals
  • in operant conditioning, discrimination and generalization are controlled by dopamine-secreting neurons
    • compare this to classical conditioning, where these two effects are due to the strengthening of synapses as a result of simultaneous firing


Delayed Reinforcement and Extinction

  • Thorndike (1911) noticed that reinforcement was more effective if there was very little time between the action and the consequence
  • with a longer delay, it is harder to associate the reinforcer with the behaviour that produced it
  • e.g. drugs that have their effect soon after they're taken are generally more addictive than drugs whose effects occur several minutes or hours afterwards
  • extinction: the weakening of an operant response when reinforcement is no longer available
    • e.g. if you lose your Internet connection, you'll stop trying to refresh your browser because there's no reinforcement for doing so


Reward Devaluation

  • behaviours do change when the reinforcer loses some of its appeal
  • experiment: a rat is trained to press two different levers, each with a different reward (i.e. two different rewarding tastes); if experimenters pre-feed the rat with one of these two tastes, it will crave that taste less than the other



  • shaping: a procedure in which a specific operant response is created by reinforcing successive approximations of that response
    • e.g. toilet training; shaping is done in a step-by-step fashion until the desired response is learned
  • chaining: linking together two or more shaped behaviours into a complex action or sequence of actions
  • e.g. animal actors in movies were almost certainly trained through lengthy shaping and chaining procedures
  • applied behaviour analysis (ABA): using close observation, prompting, and reinforcement to teach behaviours, often to people who experience difficulties and challenges owing to a developmental condition (e.g. autism)


Schedules of Reinforcement

  • schedules of reinforcement: rules that determine when reinforcement is available
  • continuous reinforcement: every response made results in reinforcement
    • e.g. vending machines deliver a snack every time the correct amount of money is deposited
  • partial (intermittent) reinforcement: only a certain number of responses are rewarded, or a certain amount of time must pass before reinforcement is available (figure 6.14)
    • e.g. phoning a friend only gets an actual person on the other end of the call some of the time


Ratio Schedules and Interval Schedules

  • ratio schedules: the reinforcements are based on the amount of responding
    • tend to generate relatively high rates of responding
  • interval schedules: based on the amount of time between reinforcements
  • fixed schedule: the schedule of reinforcement remains the same over time
  • variable schedule: the schedule of reinforcement, although linked to an average, varies from reinforcement to reinforcement
  • fixed-ratio schedule: reinforcement is delivered after a specific number of responses have been completed
    • e.g. a rat is required to press a lever 10 times to receive food
  • variable-ratio schedule: the number of responses required to receive reinforcement varies according to an average
    • e.g. in a VR5 experiment, trials could involve seven lever presses, followed by four, six, three, and so on; but the average is five
    • in animal studies, variable-ratio schedules lead to the highest rate of responding of the four types of reinforcement schedules
  • fixed-interval schedule: reinforces the first response occurring after a set amount of time passes
    • e.g. if your professor gives you an exam every three weeks, your reinforcement for studying is on a fixed-interval schedule
  • variable-interval schedule: the first response is reinforced following a variable amount of time
    • e.g. if you're watching a meteor shower, you'd be rewarded for looking up at irregular times; a meteor may fall on average every 5 minutes, but there will be times of inactivity for 1 minute, 8 minutes, 10 minutes, and so on
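
The four schedules above can each be read as a rule that decides whether a given response earns a reinforcer. The following is a rough sketch under stated assumptions, not a model from the textbook: the function names are hypothetical, time is in arbitrary units, and the variable schedules draw their requirements from a uniform distribution around the stated average (the notes only say the requirement varies around an average).

```python
import random


def fixed_ratio(n):
    """Reinforce every n-th response (e.g. FR10: every 10th lever press)."""
    count = 0

    def respond(_time):
        nonlocal count
        count += 1
        if count >= n:
            count = 0
            return True
        return False

    return respond


def variable_ratio(mean, rng):
    """Reinforce after a varying number of responses that averages `mean`.

    Assumption: the requirement is drawn uniformly from 1..(2*mean - 1).
    """
    count = 0
    required = rng.randint(1, 2 * mean - 1)

    def respond(_time):
        nonlocal count, required
        count += 1
        if count >= required:
            count = 0
            required = rng.randint(1, 2 * mean - 1)
            return True
        return False

    return respond


def fixed_interval(interval):
    """Reinforce the first response made once `interval` time units have passed."""
    available_at = interval

    def respond(time):
        nonlocal available_at
        if time >= available_at:
            available_at = time + interval
            return True
        return False

    return respond


def variable_interval(mean, rng):
    """Reinforce the first response after a varying wait that averages `mean`.

    Assumption: each wait is drawn uniformly from 0..(2*mean).
    """
    available_at = rng.uniform(0, 2 * mean)

    def respond(time):
        nonlocal available_at
        if time >= available_at:
            available_at = time + rng.uniform(0, 2 * mean)
            return True
        return False

    return respond


rng = random.Random(0)  # fixed seed so the variable schedules are repeatable

fr10 = fixed_ratio(10)
print(sum(fr10(t) for t in range(30)))  # 3 reinforcers for 30 lever presses

fi3 = fixed_interval(3)
print([t for t in range(10) if fi3(t)])  # [3, 6, 9]: responses at other times earn nothing

vr5 = variable_ratio(5, rng)
presses = 10_000
print(presses / sum(vr5(t) for t in range(presses)))  # roughly 5 presses per reinforcer
```

In the ratio schedules only the number of responses matters, so the time argument is ignored; in the interval schedules extra responses before the interval elapses earn nothing. That difference is consistent with the note that ratio schedules tend to generate higher rates of responding.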


Partial Reinforcement

  • partial reinforcement effect: a phenomenon in which organisms that have been conditioned under partial reinforcement resist extinction longer than those conditioned under continuous reinforcement
  • e.g. people are only intermittently reinforced for putting money into a slot machine, but a high rate of responding is still maintained and may not decrease until after a great many losses in a row
  • this effect is likely because the individual is used to not receiving reinforcement for every response, so a lack of reinforcement isn't surprising and doesn't reduce the motivation to respond, even when reinforcement is unavailable
  • reinforcement and superstition; whether a superstition affects your performance is based on whether or not you allow it to


Applying Punishment

  • people tend to be more sensitive to the unpleasantness of punishment than they are to the pleasures of reward
    • e.g. in an experiment, university students playing a computer game found losing $100 to be three times more punishing than gaining $100 was reinforcing
  • the use of punishment raises some ethical concerns, especially when it comes to physical means
  • while punishment may suppress an unwanted behaviour temporarily, by itself it does not teach which behaviours are appropriate
  • punishment of any kind is most effective when combined with reinforcement of an alternative, suitable response


Classical and Operant Conditioning

  • some may want to think of behaviour as being due to either classical conditioning or operant conditioning
  • but complex behaviour is influenced by both types of learning, each influencing behaviour in slightly different ways
  • e.g. playing slots
    • uses a variable-ratio schedule of reinforcement (a type of operant conditioning) that leads to a high response rate
    • but the flashing lights and sounds, maybe even the chair, all serve as conditioned stimuli that elicit a conditioned response of excitement associated with gambling
