Reinforcement Learning Flashcards

(37 cards)

1
Q

reinforcement learning theory by…

A

sutton and barto

2
Q

reinforcement learning theory

explain

A

outcomes are probabilistic - hold a level of uncertainty
use experiences to determine diff value functions (VF) at a given time
feedback - confirms or disconfirms expectations and updates the VF
MODEL FREE
expectancy drives action selection
experience drives VF updating
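
A minimal sketch of the two model-free steps above (expectancy drives action selection, experience drives VF updating), assuming a two-option task with win = 1 and loss = 0. The softmax rule, learning rate, temperature and win probabilities are illustrative assumptions, not values from the card.

```python
import math
import random

def softmax_choice(values, temperature, rng):
    """Pick an action index; options with higher expected value are chosen more often."""
    weights = [math.exp(v / temperature) for v in values]
    threshold = rng.random() * sum(weights)
    cumulative = 0.0
    for action, weight in enumerate(weights):
        cumulative += weight
        if threshold <= cumulative:
            return action
    return len(values) - 1

def update_value(value, reward, learning_rate):
    """Delta rule: shift the value towards the outcome by a fraction of the prediction error."""
    prediction_error = reward - value
    return value + learning_rate * prediction_error

def run_bandit(win_probs, n_trials=2000, learning_rate=0.1, temperature=0.2, seed=0):
    """Learn one value per option purely from experienced outcomes (no model of the task)."""
    rng = random.Random(seed)
    values = [0.0 for _ in win_probs]
    for _ in range(n_trials):
        action = softmax_choice(values, temperature, rng)          # expectancy -> choice
        reward = 1.0 if rng.random() < win_probs[action] else 0.0  # probabilistic outcome
        values[action] = update_value(values[action], reward, learning_rate)  # experience -> VF
    return values

# Hypothetical win probabilities for the two options.
values = run_bandit([0.2, 0.8])
```

Only the chosen option's value is updated, which is what makes the sketch model free: nothing is learned about the option not taken.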

3
Q

coricelli et al
model-based neural signatures of regret

TASK

A

fMRI - regret in reinforcement learning framework
x2 choice gamble task:

left = 50% win 200 points, 50% lose 200 points
– odds the same and outcome is equivalent loss or win

right = 75% win 50 points, 25% lose 50 points

  • discount - 200> 50 but loss > win
  • how much can the person tolerate the risk and how much are they willing to discount/delay the ultimate reward?

pps shown outcome of their choice and/or the outcome of the choice not chosen
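
A quick worked check of the two gambles, reading the right-hand loss probability as 25% (an assumption, so that the probabilities sum to 1). The function name is illustrative.

```python
def expected_value(outcomes):
    """outcomes: list of (probability, points) pairs describing one gamble."""
    total_prob = sum(p for p, _ in outcomes)
    if abs(total_prob - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * points for p, points in outcomes)

left = [(0.5, 200), (0.5, -200)]
right = [(0.75, 50), (0.25, -50)]  # assumed 25% loss probability

ev_left = expected_value(left)    # 0.0
ev_right = expected_value(right)  # 25.0
```

Under that reading the right gamble has the higher expected value, while the left offers the larger possible win and the larger possible loss.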

4
Q

coricelli et al
model-based neural signatures of regret

RESULTS

A

impact of outcome modulated by amount of feedback -
magnitude of disappointment (discrepancy between chosen and alt)
- correlated with heightened MTG and brain stem activation - implicated in processing of aversive signals

mOFC discriminated win or loss only in complete feedback trials
- counterfactual in losing trials = greater activation in mOFC (regret), whilst winning = greater deactivation (relief) + dACC + a.hipp

VS = reward prediction error - heightened for wins, depressed for losses

5
Q

define regret

A

responsibility operates via counterfactual reasoning

  • relate the outcome of a previous decision with what we would have obtained had we opted for the rejected alternative
  • experienced when this comparison is to our disadvantage
  • embodies a feeling of responsibility, which disappointment does not

thought that the aversive nature of regret means we learn from its experience to minimise its recurrence when considering new choice decisions
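
The counterfactual comparison can be sketched as a simple difference between the obtained outcome and the outcome of the rejected alternative. This is only an illustration of the comparison itself: the function names and labels are assumptions, and the sketch does not capture the feeling of responsibility.

```python
def counterfactual_signal(obtained, foregone):
    """Obtained outcome minus what the rejected alternative would have paid."""
    return obtained - foregone

def label_emotion(obtained, foregone):
    """Negative comparison -> regret, positive -> relief (illustrative labels)."""
    diff = counterfactual_signal(obtained, foregone)
    if diff < 0:
        return "regret"
    if diff > 0:
        return "relief"
    return "neutral"

# Lost 200 points when the rejected gamble would have won 50.
example = label_emotion(-200, 50)
```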

6
Q

why do we feel regret following our decisions

A

part of higher order models of the world - tool we can use to learn faster

have a disproportionate aversion to losses > gains - pay attention to when things go wrong so we can adapt and improve

7
Q

common assumption of regret

A

assumed to be highly aversive - motivates behaviour to avoid outcome reoccurence

BUT may also involve the agent making ‘normal’ decisions which deviate less so as to make the decision more JUSTIFIED

  • invoke cog regulation strategies whereby we mentally reconstruct an event to make ourselves feel better
  • repeating a choice previously assoc with regret allows the decision maker to make up for the prior mistake

‘chasing’

8
Q

define gamblers fallacy

A

probability of a win is misperceived to increase following a loss and decrease following a win
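
A small simulation showing why this is a fallacy: with independent 50/50 outcomes, the win rate after a loss is about the same as after a win. The seed, trial count and function name are illustrative assumptions.

```python
import random

def win_rate_after(previous_outcome, outcomes):
    """Proportion of wins on trials that directly follow the given outcome (True = win)."""
    following = [cur for prev, cur in zip(outcomes, outcomes[1:]) if prev == previous_outcome]
    return sum(following) / len(following)

rng = random.Random(1)
outcomes = [rng.random() < 0.5 for _ in range(100_000)]  # independent fair gambles

after_loss = win_rate_after(False, outcomes)
after_win = win_rate_after(True, outcomes)
```

Both rates should sit close to 0.5, since each outcome is drawn independently of the one before it.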

9
Q

Nicolle et al
pushing through regret

TASK

A
binary choice - always aware of the choice not taken
2x2x2
win/loss
high(50p)/low(10p)
agency/no agency

operationalised objective regret where anticipated regret held constant - regret under ambiguity>anticipated risk

£10 - place bet on uncertain gambles (50% - not told)
red-loss, blue-win - av. value of each stake = 0p - no financial incentives in favour of either stake

2/3 agency, 1/3 no agency (computer chooses)

10
Q

Nicolle et al
pushing through regret

BEHAVIOURAL RESULTS

A

higher rate of repeating regret-related > relief-related choices in early > late runs (early bias)
- only in agency trials

tendency to repeat 10p > 50p bets overall in agency
UNLESS 50p loss early - preferred to repeat

11
Q

Nicolle et al
pushing through regret

BEHAVIOURAL RESULTS + GAMBLERS FALLACY

A

results not sufficiently explained by the gambler's fallacy

if a loss led to the misperception that a win is more likely later - pps would repeat following both agency and non agency conditions (effect only seen with agency)

12
Q

Nicolle et al
pushing through regret

NEURO RESULTS - VS

A

more bilat. VS activity when outcome could have been worse relative to when it could have been better

  • counterfactual signal in VS
  • consistent with role in rewards processing relative to counterfactual reference point
13
Q

Nicolle et al
pushing through regret

NEURO RESULTS - DS

A

repeat - ‘stick’, alt choice - ‘switch’

l. DS - heightened when stick to regret choice
- no sig diff in stick/switch to relief

r. DS - heightened to loss with agency
- decrease early to late runs
- reflect beh tendency to repeat regret related 50p loss early on

14
Q

Nicolle et al
pushing through regret

NEURO RESULTS - ACC

A

more activity assoc with choice to stick to regret

diminished activity assoc with choice to stick to relief

no differentiation in activity when chose to switch

15
Q

Nicolle et al
pushing through regret

NEURO RESULTS - OFC

A

no OFC related activity in reference to regret

may be due to way in which conducted analysis
BUT likely due to regret under ambiguity > anticipated risk as in prev studies

16
Q

Nicolle et al
pushing through regret

NEURO RESULTS - striatum dissociation explanation

A

VS: reflects value of experienced outcome relative to what might have been with a diff choice - a-o prediction
DS: regret related choice repetition - reflects s-r learning

ventral/dorsal dissociation where rewards are related to their counterfactual alternatives

may reflect role of DS in updating subjective value of repeating a previously regret-related choice - may be motivated by desire to defend/justify past (reduce cog dissonance)

17
Q

fleming and dolan 2012

metacognition

A

HOW WE BECOME AWARE OF OUR SUCCESSES/FAILURES
we have an internal commentary
observe our behaviour as agents within our environment

feedback mechanism - before we update our VF for future decisions, we self reflect on the occurrences

18
Q

fleming and dolan 2012

model of metacognition

A

object level ie task performance
- represented in posterior cortex

monitored by the meta level cognitive PFC which controls the object level

commentary in between stimulus - response assoc

19
Q

role of OFC in decision making

coricelli

A

interplay between decision making and emotion processing assumed to involve brain structures involved in exec and emotion processing

ie OFC - assigns value to stimuli and updates salience of primary and secondary reinforcers

ie hipp - regret is a declarative process and hipp is critical to declarative memory (what we need to remember)

ie ACC - appraisal of conflicts and decision making for beh adjustment

20
Q

coricelli hyp

A

is the OFC related to the experience and anticipation of regret, and in the learning from said experiences

‘partial’ vs ‘complete’ feedback conditions

21
Q

coricelli

results- amyg

A

pps become increasingly regret averse - enhanced activity within the mOFC and amyg

  • mOFC and amyg activity reoccur prior to new choice - same circuitry mediates direct experience of regret + its anticipation
  • incorporates weight of relative emotional value of the options for choice - may reflect processing assoc with avoiding future regret (VF updating)

following cumulative regret experience, believed these regions guide decision making processes via an updated representation of value with the weighting of the relative emotional value for diff options of choice

22
Q

why do animals make predictions

A

predictions permit an animal to anticipate an outcome and thus prepare their behaviour in line with said outcome (avoidance/approach)

anticipatory capacity is crucial for deciding between alt courses of action and attaining the optimal choice

23
Q

define a reward

A

the pos value assigned to an object/act/state which induces approach behaviour
may also act as pos reinforcer of beh by increasing the freq of a beh (instrumental conditioning) on receipt of the reward

24
Q

what can animals predict about a reward

A

their time of expected receipt and their magnitude

25
Q

are rewards static

A

no - animals can assign and update values based on other variables which determine the reward's appetitive value ie satiety

26
Q

classical conditioning in reward

A

repeated assoc of reward with an arbitrary cue = CS-US pairing
develops an assoc in which animals predict the receipt of the reward and its mag following the cue - prep approach transfers to the cue

27
Q

koob | DA and reward learning

A

cocaine and amphetamines increase the prevalence of DA - linked to their addictive properties
- midbrain DA in reward learning

28
Q

schultz primate experiment

A

recorded single DA neuron activity in primates via electrodes in midbrain
- short phasic firing to reward ie receipt of juice
- w/ frequent pairing of CS-US assoc, phasic firing transfers back to the cue
- omission of predicted reward = depression of DA below baseline at the exact time point at which the reward is commonly expected

29
Q

Schultz conclusions about DA

A

DA codes for prediction errors (PE), ie deviations between the experience and the prediction of the reward's time and magnitude
PE+/PE- used to learn from prev experience and strive towards PE+ to achieve best outcome > avoidance of PE-

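
A trial-level sketch of how the prediction error behaves across conditioning: large at the reward early on, shifting to the cue once the cue predicts the reward, and negative when a predicted reward is omitted. This is a simplified delta-rule model, not the analysis from the recordings. The learning rate, reward size and zero baseline expectation are assumptions.

```python
def simulate_conditioning(n_trials=100, reward=1.0, learning_rate=0.2):
    """Track the PE at cue onset and at reward delivery on each cue-reward pairing."""
    cue_value = 0.0
    history = []
    for _ in range(n_trials):
        pe_at_cue = cue_value - 0.0        # cue arrives unpredicted (baseline assumed 0)
        pe_at_reward = reward - cue_value  # reward compared with what the cue predicted
        history.append((pe_at_cue, pe_at_reward))
        cue_value += learning_rate * pe_at_reward
    return cue_value, history

cue_value, history = simulate_conditioning()
first_cue_pe, first_reward_pe = history[0]   # before learning: PE sits at the reward
last_cue_pe, last_reward_pe = history[-1]    # after learning: PE sits at the cue
omission_pe = 0.0 - cue_value                # predicted reward omitted: negative PE
```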
30
Q

problem with schultz

A

correlational - doesn't establish that the phasic firing of dopamine at a prediction error drives learning - may be a consequence of it

31
Q

steinberg et al aim

A

provide evidence for the causal link between coding of PE by midbrain DA and learning as outcome
use optogenetics

32
Q

define optogenetics | steinberg et al

A

investigate direct causal relation between neuronal firing and beh
VTA/midbrain DA cells made genetically photosensitive, light delivered via optical fibre - switch the neurons on and off and observe subsequent beh

33
Q

steinberg et al method (1)

A

blocking procedure
absence of PE at reward delivery prevents learning about the redundant second cue
- artificially activating DA during reward delivery in blocking should mimic natural PE and allow pairing with redundant cue
1. pair CS-US assoc (tone to sucrose)
2. compound cue - add in second cue (light)
3. test: does the light elicit approach

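
The logic of the blocking procedure can be sketched with a Rescorla-Wagner style update, where all cues present on a trial share one prediction error. Treating stimulation as an additive `extra_pe` term is an assumption made for illustration, as are the learning rate and trial counts. It is not the model used in the study.

```python
def train(values, cues, reward, n_trials, learning_rate=0.2, extra_pe=0.0):
    """Update every presented cue by a shared prediction error.

    extra_pe stands in for artificially evoked DA at reward delivery (an assumption).
    """
    for _ in range(n_trials):
        prediction = sum(values[cue] for cue in cues)
        prediction_error = (reward - prediction) + extra_pe
        for cue in cues:
            values[cue] += learning_rate * prediction_error
    return values

def blocking_experiment(extra_pe):
    values = {"tone": 0.0, "light": 0.0}
    # 1. tone alone is paired with sucrose until it fully predicts the reward
    train(values, ["tone"], reward=1.0, n_trials=100)
    # 2. compound cue: the light is added, the reward is already predicted by the tone
    train(values, ["tone", "light"], reward=1.0, n_trials=20, extra_pe=extra_pe)
    # 3. test: how much value did the light acquire
    return values["light"]

control_light = blocking_experiment(extra_pe=0.0)     # blocked: light stays near 0
stimulated_light = blocking_experiment(extra_pe=0.5)  # added PE: light gains value
```

In the control case the tone leaves almost no prediction error for the light to learn from, which is the blocking effect. Adding a PE at reward delivery lets the light acquire value.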
34
Q

steinberg et al results (1)

A

all rats learn initial pairing
compound cue and DA stimulation = respond to light
learning about light blocked in controls (no genetic manipulation)
learning does not occur when DA firing not contingent with presentation of compound cue

35
Q

steinberg et al method (2)

A

extinction learning - DA omission induces extinction of CS-US assoc
- DA stimulation should interfere with omission so pairing continues in spite of the absence of the reward
replaced expected sucrose with receipt of water and fired DA neurons

36
Q

steinberg et al results (2)

A

downshift test: sucrose replaced with water + opt. stimulation = heightened responding compared to no opt. controls
BUT did not completely counter extinction

37
Q

steinberg et al explanation of (2) results

A

inability to fully counter extinction may reflect inability of DA to fully counteract reward omission
OR may reflect competition between artificially imposed DA and other neural circuitry which may also be involved in processing reward omissions