Reinforcement Learning Flashcards
(37 cards)
reinforcement learning theory by…
Sutton and Barto
reinforcement learning theory
explain
outcomes are probabilistic - they hold a level of uncertainty
use experiences to determine diff value functions (VF) at a given time
feedback - confirm or disconfirm expectations and update VF
MODEL FREE
expectancy drives action selection
experience drives VF updating
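A minimal sketch of the model-free loop above, assuming a Rescorla-Wagner style delta rule (a standard textbook update, not something stated on the card): the current value estimate is the expectancy, each probabilistic outcome is the feedback, and the prediction error updates the VF. The names update_value and learn_values and the learning rate alpha are hypothetical.

```python
import random


def update_value(value, reward, alpha=0.1):
    """Model-free delta-rule update: nudge the value estimate toward the outcome."""
    prediction_error = reward - value  # ~0 confirms the expectation, otherwise disconfirms it
    return value + alpha * prediction_error, prediction_error


def learn_values(p_win, n_trials=2000, alpha=0.05, seed=0):
    """Learn the value of an option that pays 1 with probability p_win, else 0."""
    rng = random.Random(seed)
    value = 0.0
    for _ in range(n_trials):
        reward = 1.0 if rng.random() < p_win else 0.0  # probabilistic outcome
        value, _ = update_value(value, reward, alpha)
    return value


if __name__ == "__main__":
    # the learned value hovers around the true win probability (0.75)
    print(round(learn_values(0.75), 2))
```

No model of the task is built: the value is driven purely by accumulated experience, which is what makes the update model-free.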
Coricelli et al
model-based neural signatures of regret
TASK
fMRI - regret in reinforcement learning framework
x2 choice gamble task:
left = 50% win 200 points, 50% lose 200 points
- equal odds, and the outcome is an equivalent loss or win (±200)
right = 75% win 50 points, 25% lose 50 points
- discount: 200 > 50, but loss > win
- how much risk can the person tolerate, and how much are they willing to discount/delay the ultimate reward?
pps shown the outcome of their chosen gamble only (partial feedback) or also the outcome of the gamble not chosen (complete feedback)
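The trade-off between the two gambles can be checked with a little arithmetic. This sketch assumes the right gamble's loss probability is 25% (so its probabilities sum to 1) and uses standard deviation as a hypothetical stand-in for "risk"; the names LEFT, RIGHT, expected_value and std_dev are illustrative.

```python
from math import sqrt

# Each gamble is a list of (probability, points) pairs.
LEFT = [(0.5, 200), (0.5, -200)]
RIGHT = [(0.75, 50), (0.25, -50)]  # assumes a 25% loss so probabilities sum to 1


def expected_value(gamble):
    """Probability-weighted average outcome."""
    return sum(p * x for p, x in gamble)


def std_dev(gamble):
    """Spread of outcomes around the expected value (used here as 'risk')."""
    ev = expected_value(gamble)
    return sqrt(sum(p * (x - ev) ** 2 for p, x in gamble))


for name, gamble in [("left", LEFT), ("right", RIGHT)]:
    print(name, expected_value(gamble), round(std_dev(gamble), 1))
```

Under these assumptions the left gamble has EV 0 with a spread of 200 points, while the right gamble has EV 25 with a spread of about 43 points, so choosing left means accepting more risk for the chance of the larger 200-point win.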
Coricelli et al
model-based neural signatures of regret
RESULTS
impact of outcome modulated by amount of feedback -
magnitude of disappointment (discrepancy between chosen and alt)
- correlated with heightened MTG and brain stem activation - implicated in processing of aversive signals
mOFC discriminated win or loss only in complete feedback trials
- counterfactual in losing trials = greater activation in mOFC (regret), whilst winning = greater deactivation (relief) + dACC + a.hipp
VS = reward prediction error - heightened for wins, depressed for losses
define regret
responsibility operates via counterfactual reasoning
- relate the outcome of a previous decision to what we would have obtained had we opted for the rejected alternative
- experienced when this comparison is to our disadvantage
- embodied as a feeling of responsibility, which disappointment does not involve
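The counterfactual comparison in this definition can be sketched as a simple function, assuming outcomes are numeric payoffs. The function name is hypothetical, and the sketch deliberately leaves out the responsibility component, which a number cannot capture. It returns None under partial feedback, when the forgone outcome is unknown.

```python
from typing import Optional, Tuple


def counterfactual_emotion(obtained: float, forgone: Optional[float]) -> Optional[Tuple[str, float]]:
    """Compare the obtained outcome with what the rejected alternative would have paid."""
    if forgone is None:
        # partial feedback: unchosen outcome unknown, so no counterfactual comparison
        return None
    difference = obtained - forgone
    if difference < 0:
        return ("regret", difference)  # comparison is to our disadvantage
    if difference > 0:
        return ("relief", difference)  # we did better than the alternative
    return ("neutral", 0.0)


# chose left and lost 200, while right would have won 50
print(counterfactual_emotion(-200, 50))
```

The printed result is ('regret', -250): the loss is compared against the 50 points the rejected gamble would have paid, not against zero.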
thought that the aversive nature of regret means we learn from experiencing it, to minimise its recurrence when considering new choice decisions
why do we feel regret following our decisions
part of higher order models of the world - tool we can use to learn faster
we have a disproportionate aversion to losses > gains - we pay attention to when things go wrong so we can adapt and improve
common assumption of regret
assumed to be highly aversive - motivates behaviour to avoid outcome reoccurence
BUT may also involve the agent making ‘normal’ decisions which deviate less so as to make the decision more JUSTIFIED
- invoke cog regulation strategies whereby we mentally reconstruct an event to make ourselves feel better
- repeating a choice previously associated with regret allows the decision maker to make up for the prior mistake
‘chasing’
define gambler's fallacy
probability of a win is misperceived to increase following a loss and decrease following a win
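Why this is a misperception can be shown with a quick simulation, assuming independent gambles with a fixed 50% win chance (the function name win_rate_after is hypothetical): the observed win rate after a loss is the same as after a win.

```python
import random


def win_rate_after(previous_win, n=100_000, p_win=0.5, seed=1):
    """Observed win rate on trials that follow a win (True) or a loss (False)."""
    rng = random.Random(seed)
    outcomes = [rng.random() < p_win for _ in range(n)]  # True = win, independent trials
    following = [nxt for prev, nxt in zip(outcomes, outcomes[1:]) if prev == previous_win]
    return sum(following) / len(following)


print(round(win_rate_after(previous_win=False), 3), round(win_rate_after(previous_win=True), 3))
```

Both rates come out at roughly 0.5: with independent outcomes, a loss carries no information about the next trial.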
Nicolle et al
pushing through regret
TASK
binary choice - always shown the outcome of the choice not taken; 2x2x2 design: win/loss × high (50p)/low (10p) stake × agency/no agency
operationalised objective regret, with anticipated regret held constant - regret under ambiguity > anticipated risk
£10 - place bets on uncertain gambles (odds 50%, but pps not told)
red-loss, blue-win - av. value of each stake = 0p - no financial incentives in favour of either stake
2/3 agency, 1/3 no agency (computer chooses)
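The design can be laid out in a few lines. This is a sketch under stated assumptions: the constant and function names are hypothetical, each stake is modelled as winning or losing the same amount at 50% odds, and agency trials are simply shuffled into a 2/3 vs 1/3 schedule (the actual trial ordering is not given on the card).

```python
import random
from itertools import product

OUTCOMES = ("win", "loss")
STAKES_P = (50, 10)  # high / low stake, in pence
AGENCY = ("agency", "no agency")

# 2 x 2 x 2 = 8 cells of the factorial design
CONDITIONS = list(product(OUTCOMES, STAKES_P, AGENCY))


def stake_expected_value(stake_p, p_win=0.5):
    """EV of a bet that wins or loses the same stake with probability p_win."""
    return p_win * stake_p - (1 - p_win) * stake_p


def agency_schedule(n_trials, seed=0):
    """Hypothetical shuffled schedule: True = pp chooses (2/3), False = computer chooses (1/3)."""
    n_agency = round(2 * n_trials / 3)
    schedule = [True] * n_agency + [False] * (n_trials - n_agency)
    random.Random(seed).shuffle(schedule)
    return schedule


print(len(CONDITIONS), stake_expected_value(50), stake_expected_value(10))
```

Both stakes have an EV of 0p at 50% odds, which is why neither stake carries a financial incentive.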
Nicolle et al
pushing through regret
BEHAVIOURAL RESULTS
higher repetition of regret-related > relief-related choices in early > late runs (early bias)
- only in agency trials
tendency to repeat 10p > 50p bets overall in agency trials
UNLESS 50p loss early - preferred to repeat
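Repetition effects like these come down to counting "stick" (repeat) vs "switch" choices by the emotional label of the previous outcome. The sketch below assumes each trial is recorded as a dict with a 'choice' and an 'emotion' label. The function name and the toy sequence are hypothetical illustrations, not data from the study.

```python
from collections import defaultdict


def repeat_rates(trials):
    """Proportion of 'stick' choices following each emotion label.

    trials: list of dicts with keys 'choice' (e.g. '10p' / '50p') and
    'emotion' ('regret' or 'relief') describing that trial's outcome.
    """
    counts = defaultdict(lambda: [0, 0])  # emotion -> [sticks, opportunities]
    for prev, curr in zip(trials, trials[1:]):
        stick = curr["choice"] == prev["choice"]  # repeat = 'stick', alternative = 'switch'
        counts[prev["emotion"]][0] += int(stick)
        counts[prev["emotion"]][1] += 1
    return {emotion: sticks / total for emotion, (sticks, total) in counts.items()}


# hypothetical toy sequence, not data from the study
toy = [
    {"choice": "50p", "emotion": "regret"},
    {"choice": "50p", "emotion": "relief"},
    {"choice": "10p", "emotion": "regret"},
    {"choice": "10p", "emotion": "relief"},
    {"choice": "50p", "emotion": "regret"},
]
print(repeat_rates(toy))
```

The same counts could be split further by stake, agency, or early vs late runs to mirror the comparisons above.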
Nicolle et al
pushing through regret
BEHAVIOURAL RESULTS + GAMBLER'S FALLACY
the gambler's fallacy doesn't sufficiently explain the results
if a loss led to the misperception that a win is now more likely, pps would repeat after losses in both agency and non-agency conditions
Nicolle et al
pushing through regret
NEURO RESULTS - VS
more bilat. VS activity when outcome could have been worse relative to when it could have been better
- counterfactual signal in VS
- consistent with role in reward processing relative to a counterfactual reference point
Nicolle et al
pushing through regret
NEURO RESULTS - DS
repeat - ‘stick’, alt choice - ‘switch’
l. DS - heightened when stick to regret choice
- no sig diff in stick/switch to relief
r. DS - heightened to loss with agency
- decrease early to late runs
- reflect beh tendency to repeat regret related 50p loss early on
Nicolle et al
pushing through regret
NEURO RESULTS - ACC
more activity assoc with choice to stick to regret
diminished activity assoc with choice to stick to relief
no differentiation in activity when chose to switch
Nicolle et al
pushing through regret
NEURO RESULTS - OFC
no OFC related activity in reference to regret
may be due to way in which conducted analysis
BUT likely due to regret under ambiguity > anticipated risk as in prev studies
Nicolle et al
pushing through regret
NEURO RESULTS - striatum dissociation explanation
VS: reflects value of the experienced outcome relative to what might have been with a diff choice - A-O (action-outcome) prediction
DS: regret-related choice repetition - reflects S-R (stimulus-response) learning
ventral/dorsal dissociation where rewards are related to their counterfactual alternatives
may reflect role of DS in updating subjective value of repeating a previously regret-related choice - may be motivated by desire to defend/justify past (reduce cog dissonance)
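The A-O vs S-R distinction can be illustrated with two toy update rules. This is a hypothetical sketch of a common computational framing, not the analysis from the study: an A-O value tracks the outcome relative to its counterfactual, while an S-R habit strength grows for whatever action was taken, regardless of outcome. All names and the learning rate alpha are assumptions.

```python
def update_action_outcome(q, action, obtained, forgone, alpha=0.2):
    """A-O (VS-like): value tracks the outcome relative to the counterfactual reference point."""
    relative = obtained - forgone
    q[action] += alpha * (relative - q[action])
    return q


def update_stimulus_response(h, action, alpha=0.2):
    """S-R (DS-like): habit strength grows for the chosen action, independent of outcome."""
    for a in h:
        target = 1.0 if a == action else 0.0
        h[a] += alpha * (target - h[a])
    return h


# a regret trial: chose A and got -10 while B would have paid +10
q = update_action_outcome({"A": 0.0, "B": 0.0}, "A", obtained=-10, forgone=10)
h = update_stimulus_response({"A": 0.0, "B": 0.0}, "A")
print(q, h)
```

After the regret trial the A-O value of A falls while its S-R habit strength still rises, one simple way a regret-related choice could keep being repeated.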
Fleming and Dolan 2012
metacognition
HOW WE BECOME AWARE OF OUR SUCCESSES/FAILURES
we have an internal commentary
observe our behaviour as agents within our environment
feedback mechanism - before we update our VF for future decisions, we self-reflect on the occurrences
Fleming and Dolan 2012
model of metacognition
object level ie task performance
- represented in posterior cortex
monitored by the meta level (PFC), which in turn controls the object level
commentary in between stimulus-response assoc
role of OFC in decision making
Coricelli
interplay between decision making and emotion processing assumed to involve brain structures involved in exec and emotion processing
ie OFC - assigns value to stimuli and updates salience of primary and secondary reinforcers
ie hipp - regret is a declarative process and hipp is critical to declarative memory (what we need to remember)
ie ACC - appraisal of conflicts and decision making for beh adjustment
Coricelli hyp
is the OFC related to the experience and anticipation of regret, and to learning from those experiences?
‘partial’ vs ‘complete’ feedback conditions
Coricelli
results - amyg
pps become increasingly regret-averse - enhanced activity within the mOFC and amyg
- mOFC and amyg activity reoccur prior to new choice - same circuitry mediates direct experience of regret + its anticipation
- incorporating the relative emotional value of the options for choice may reflect processing assoc with avoiding future regret (VF updating)
following cumulative regret experience, these regions are believed to guide decision making via an updated representation of value, weighted by the relative emotional value of the diff options
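An option's value weighted by anticipated regret can be sketched as follows. This is a simplified regret-aversion weighting, loosely in the spirit of economic regret theory, not the model used in the study. It assumes the two gambles resolve independently, reuses the task's gambles (with a 25% loss on the right), and introduces a hypothetical regret weight k that grows as pps become more regret-averse.

```python
from itertools import product

LEFT = [(0.5, 200), (0.5, -200)]
RIGHT = [(0.75, 50), (0.25, -50)]  # assumes a 25% loss so probabilities sum to 1


def expected_regret(chosen, other):
    """Average shortfall versus the unchosen gamble (assumes independent resolution)."""
    return sum(p * q * max(0, y - x) for (p, x), (q, y) in product(chosen, other))


def regret_weighted_value(chosen, other, k):
    """EV minus k times expected regret; k is a hypothetical regret-aversion weight."""
    ev = sum(p * x for p, x in chosen)
    return ev - k * expected_regret(chosen, other)


for k in (0.0, 0.5, 1.0):
    print(k, regret_weighted_value(LEFT, RIGHT, k), regret_weighted_value(RIGHT, LEFT, k))
```

Under these assumptions the left gamble carries more expected regret (112.5 vs 87.5 points), so raising k widens the preference for the right gamble.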
why do animals make predictions
predictions permit an animal to anticipate an outcome and thus prepare their behaviour in line with said outcome (avoidance/approach)
anticipatory capacity is crucial for deciding between alt courses of action and attaining the optimal choice
define a reward
the pos value assigned to an object/act/state which induces approach behaviour
may also act as a pos reinforcer of beh by increasing the freq of a beh (instrumental conditioning) following receipt of the reward
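The reinforcing effect (reward increasing the frequency of a behaviour) can be sketched with a simple action-value learner and softmax choice. These are standard RL components assumed here for illustration, not taken from the card. The actions "press" and "wait", the function names, and the parameters alpha (learning rate) and beta (choice sensitivity) are all hypothetical.

```python
import math
import random


def softmax_prob(q, beta):
    """Choice probabilities from action values (higher value -> chosen more often)."""
    exps = {a: math.exp(beta * v) for a, v in q.items()}
    total = sum(exps.values())
    return {a: e / total for a, e in exps.items()}


def train(n_trials=200, alpha=0.1, beta=3.0, seed=2):
    """Return P(press) after training where only pressing is rewarded."""
    rng = random.Random(seed)
    q = {"press": 0.0, "wait": 0.0}
    for _ in range(n_trials):
        probs = softmax_prob(q, beta)
        action = "press" if rng.random() < probs["press"] else "wait"
        reward = 1.0 if action == "press" else 0.0  # reward follows pressing only
        q[action] += alpha * (reward - q[action])
    return softmax_prob(q, beta)["press"]


print(round(train(), 2))
```

P(press) starts at 0.5 and rises towards about 0.95 as the reward raises the value of pressing, which is the instrumental increase in response frequency.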
what can animals predict about a reward
their expected time of receipt and their magnitude
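How a learner can come to predict both when and how much reward arrives can be sketched with temporal-difference (TD(0)) learning over time steps after a cue. This standard RL technique is assumed here and is not described on the card. Parameter names (reward_time, magnitude, gamma, alpha) are hypothetical, and the reward is delivered on the step from reward_time to reward_time + 1.

```python
def td_learn(reward_time=3, magnitude=2.0, n_steps=6, gamma=0.9, alpha=0.1, n_trials=500):
    """TD(0) values for time steps after a cue, with one reward at reward_time."""
    values = [0.0] * (n_steps + 1)  # extra terminal state whose value stays 0
    for _ in range(n_trials):
        for t in range(n_steps):
            reward = magnitude if t == reward_time else 0.0
            td_error = reward + gamma * values[t + 1] - values[t]  # prediction error
            values[t] += alpha * td_error
    return values[:n_steps]


print([round(v, 2) for v in td_learn()])
```

The learned values rise towards the reward time (about 1.46, 1.62, 1.8, 2.0), then drop to 0 once the reward has passed, and they scale with magnitude, so the value profile encodes both the timing and the size of the expected reward.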