Week 6 (Chapter 47, 48) Flashcards

(39 cards)

1
Q

What is thought to be at the centre of reinforcement learning in the mammalian brain?

A

The midbrain dopaminergic system

2
Q

What are the uses for a theoretical framework for reinforcement learning?

A
  • aid in the interpretation of neurophysiology data
  • guide the design of future studies

3
Q

The Rescorla-Wagner model formalized the idea that _______ is needed to drive learning

A

An error between the actual and predicted outcome
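A minimal sketch of this error-driven update, assuming a single cue paired with the same outcome on every trial. The names `alpha` (learning rate) and `v0` (initial associative strength) are illustrative choices, not taken from the cards.

```python
def rescorla_wagner(outcomes, alpha=0.2, v0=0.0):
    """Update one cue's associative strength trial by trial.

    outcomes: actual outcome on each trial (e.g. 1.0 = reward, 0.0 = none).
    Returns the final strength and the prediction error on every trial.
    """
    v = v0
    errors = []
    for outcome in outcomes:
        delta = outcome - v   # error between actual and predicted outcome
        v += alpha * delta    # learning is proportional to that error
        errors.append(delta)
    return v, errors


# Fifty identical rewarded trials (assumed schedule).
v, errors = rescorla_wagner([1.0] * 50)
```

In this sketch the error is largest on the first trial and shrinks toward zero as the prediction approaches the actual outcome, at which point learning stops.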

4
Q

What is the learning theory originally formulated for computer learning?

A

Temporal difference (TD) learning

5
Q

Reward expectation reduces dopamine reward responses in a purely _____ fashion

A

Subtractive

6
Q

A classic adaptation of TD learning to a biological circuit model utilizes a _____

A

Complete serial compound (CSC) feature representation
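A CSC representation gives every time step after cue onset its own feature, so the value function has one weight per step. The sketch below runs TD(0) over such a representation under several assumptions not stated in the cards: ten post-cue steps, a reward of 1 on leaving step 7, no discounting, and a pre-cue value fixed at zero because the cue itself is unpredicted. All constants and function names are hypothetical.

```python
N_STEPS = 10      # post-cue time steps, one CSC feature each (assumed)
REWARD_STEP = 7   # reward is delivered on leaving this step (assumed)
ALPHA = 0.1       # learning rate (assumed)
GAMMA = 1.0       # discount factor (assumed: no discounting)


def run_trial(w, rewarded=True, learn=True):
    """Run one cue-reward trial over the weights w (one per time step).

    Returns the TD error at cue onset and the TD error at each step.
    """
    # The pre-cue value is held at 0, so the cue-onset error is the
    # discounted value of the first post-cue step.
    cue_delta = GAMMA * w[0]
    deltas = [0.0] * N_STEPS
    for t in range(N_STEPS):
        r = 1.0 if (rewarded and t == REWARD_STEP) else 0.0
        v_next = w[t + 1] if t + 1 < N_STEPS else 0.0
        deltas[t] = r + GAMMA * v_next - w[t]   # TD prediction error
        if learn:
            w[t] += ALPHA * deltas[t]
    return cue_delta, deltas


w = [0.0] * N_STEPS
first_cue, first = run_trial(w)                 # naive animal
for _ in range(500):
    run_trial(w)                                # training
cue_trained, trained = run_trial(w, learn=False)
_, omitted = run_trial(w, rewarded=False, learn=False)
```

Under these assumptions the positive error starts at the reward step, moves to cue onset with training, and turns negative at the expected reward time when the reward is omitted.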

7
Q

Dopamine neurons show a negative prediction error ______

A

At the time of an expected reward

8
Q

Dopamine responses to reward are suppressed only at _____

A

The time of the expected reward

9
Q

What happens when the timing of a reward is shifted?

A

A larger dopamine response

10
Q

Cues followed by late rewards result in _____ dopamine responses than cues followed by early rewards

A

smaller

11
Q

What TD model was proposed to explain the shortcomings of the CSC TD model?

A

The microstimulus model

12
Q

How does the microstimulus TD model differ from the CSC TD model?

A

The microstimulus model is able to account for the longer dip of dopamine responses upon reward omission

13
Q

In 1998, Hollerman & Schultz discovered that when monkeys were given a reward earlier than expected, ______

A

There was a large dopamine response upon the reward, but no negative response at the time of the usual reward

14
Q

What is one possible modification to the CSC TD model after Hollerman & Schultz’s discovery?

A

The animal occupies one of two states: the interstimulus interval (ISI), when it expects a reward, and the intertrial interval (ITI), when it does not. Activating one state deactivates the other

15
Q

Semi-Markov dynamics imply that time spent in a state is _____ and is defined by ______

A
  • probabilistic
  • a probability distribution called a ‘dwell time distribution’

16
Q

The ______ model accounts for Hollerman & Schultz’s findings

A

Belief-state TD model

17
Q

According to the belief-state TD model, uncertainty should ______

A

Dramatically affect how reward expectation evolves over time

18
Q

The CSC TD is model-____, while the belief-state TD is model-____

19
Q

What is a shortcoming of the belief-state model?

A

If an animal is hungry, food rewards should carry higher state values than drink rewards. This shift is not explicitly learned, so the belief-state model does not account for it

20
Q

Dopamine neurons code for ______

A

Reward prediction error

21
Q

The magnitude of dopamine prediction error responses scales with _____, integrating them into a biological teaching signal for ______

A
  • reward size, probability, and delay
  • utility

22
Q

Rewards drive learning as _____

A

positive reinforcement

23
Q

What brain areas are specialized for processing rewards and reward-related behaviours?

A
  • dopaminergic midbrain
  • orbitofrontal cortex
  • amygdala
  • ventral striatum
24
Q

Dopamine neurons reside in the _____ and send signals to the ____ and _____

A
  • midbrain
  • basal ganglia
  • frontal cortex
25
What do increases in dopamine neuron activity indicate?
The outcome was better than predicted, and the preceding behaviour should be repeated or invigorated
26
What does the magnitude of dopamine activity indicate?
The degree to which behaviours should be updated
27
Dopamine teaching signals reflect the same values used for _______
Economic decisions
28
The R-W model refers to association between _____
The conditioned stimulus and unconditioned stimulus
29
The TD model consists of an explicit ______ that reflects ______
  • value function
  • reward expectation through time
30
Dopamine responses show a _____ relationship to reward amount
positive monotonic
31
Reward responses are _____ as reward probability gets larger
Diminished
32
What is expected utility?
The utility of each possible outcome multiplied by its probability, summed over outcomes
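As a worked sketch of this definition, the code below weights each outcome's utility by its probability and sums the results. The square-root utility function and the 50/50 gamble are assumptions chosen for easy arithmetic, not values from the cards.

```python
import math


def expected_utility(outcomes, utility):
    """outcomes: list of (probability, reward amount) pairs."""
    return sum(p * utility(x) for p, x in outcomes)


# Hypothetical gamble: 50% chance of 0 units, 50% chance of 4 units.
gamble = [(0.5, 0.0), (0.5, 4.0)]

eu = expected_utility(gamble, math.sqrt)        # 0.5*0 + 0.5*2 = 1.0
ev = expected_utility(gamble, lambda x: x)      # expected value = 2.0
```

Passing the identity function as the utility recovers the plain expected value, which makes the difference between objective amount and subjective value easy to see.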
33
What two reward parameters are critical to separate objective factors from subjective values?
Timing and risk
34
What is temporal discounting?
People tend to prefer rewards sooner rather than later
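One way to make this concrete is a discount function that shrinks a reward's subjective value as its delay grows. The sketch assumes an exponential form with a hypothetical per-step discount factor `gamma`; hyperbolic forms are also used, and the cards do not say which applies here.

```python
def discounted_value(amount, delay, gamma=0.9):
    """Subjective value of a reward of size `amount` after `delay` time steps.

    Assumes exponential discounting; gamma is an illustrative parameter.
    """
    return amount * gamma ** delay


sooner = discounted_value(10.0, delay=1)
later = discounted_value(10.0, delay=5)
```

With any `gamma` between 0 and 1, the same reward is worth less the longer it is delayed, so `sooner` exceeds `later`.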
35
What is the economic definition of risk?
Statistical variance in outcome distributions
36
Utility is defined in economics as _____
Subjective value derived from choice behaviour
37
What is a certainty equivalent?
The single, certain reward amount that has the same utility as a gamble
38
Standard utility functions are _____, because most humans are _____
  • concave
  • risk-averse
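The link between concavity, risk aversion, and the certainty equivalent can be sketched numerically. The code assumes a square-root utility function as a simple concave example and a hypothetical 50/50 gamble; neither comes from the cards.

```python
import math


def certainty_equivalent(outcomes, utility, inverse_utility):
    """Certain amount whose utility equals the gamble's expected utility.

    outcomes: list of (probability, reward amount) pairs.
    """
    eu = sum(p * utility(x) for p, x in outcomes)
    return inverse_utility(eu)


# Hypothetical gamble: 50% chance of 0 units, 50% chance of 4 units.
gamble = [(0.5, 0.0), (0.5, 4.0)]
expected_value = sum(p * x for p, x in gamble)

# Concave utility u(x) = sqrt(x), with inverse u^-1(y) = y**2.
ce = certainty_equivalent(gamble, math.sqrt, lambda y: y ** 2)
risk_averse = ce < expected_value
```

Here the certainty equivalent falls below the expected value, which is the behavioural signature of risk aversion: a smaller sure amount is accepted in place of the gamble.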
39
Optogenetic stimulation of dopamine neurons shows that ________
Dopamine activations teach animals what to choose