DEFINITIONS Flashcards
(50 cards)
what is learning
ability to adapt to new situations
implicit and explicit (does not require motivation - only experience of an error in judgement)
coding prediction error (PE) - discrepancy between what you think you know and what is true in the moment
components of learning
1 learning about reward and punishment
2 selecting action goals
3 actions to obtain reward
4 monitoring the potential value of switching to an alternative course of action
components of learning
- learning about reward and punishment
associative learning
perform for reward
do not perform to avoid punishment
components of learning
- selecting action goals
assign value to diff goals and make decisions in accordance with goal of highest valued outcome
beh as related to goals
reliant on temporal discounting
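A minimal sketch of goal selection under temporal discounting. The hyperbolic form V = A / (1 + kD), the parameter `k`, and the goal names are assumptions for illustration; the card only says that goal selection relies on temporal discounting.

```python
def discounted_value(amount, delay, k=0.1):
    """Hyperbolic discounting (assumed form): value falls as delay grows."""
    return amount / (1.0 + k * delay)


def choose_goal(goals, k=0.1):
    """Pick the goal whose discounted value is highest.

    goals maps a goal name to an (amount, delay) pair.
    """
    return max(goals, key=lambda name: discounted_value(*goals[name], k=k))


# Hypothetical goals: a small immediate reward vs a larger delayed one.
goals = {"small_soon": (5.0, 0.0), "large_late": (10.0, 20.0)}

steep = choose_goal(goals, k=0.1)     # steep discounting favours the immediate goal
shallow = choose_goal(goals, k=0.01)  # shallow discounting favours the delayed goal
```

A larger `k` stands in for a more impulsive agent: the same pair of goals flips from the delayed to the immediate option.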
components of learning
- actions to obtain reward
map actions to goals - actions that lead to more valuable outcome of goal most efficiently
components of learning
- monitoring value of switching action to alt
exploitation vs exploration trade offs
are there prev unconsidered shortcuts which might obtain reward more easily?
- uncertain routes may have better outcome
counterfactual thinking and social learning - determining if switching is a good idea
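One common way to formalise the exploitation vs exploration trade off is an epsilon-greedy rule. It is a swapped-in illustration, not something the card names; `epsilon` and the action values are hypothetical.

```python
import random


def epsilon_greedy(values, epsilon, rng):
    """Return an action index.

    With probability epsilon explore (pick any action at random);
    otherwise exploit (pick the action with the highest current value).
    """
    if rng.random() < epsilon:
        return rng.randrange(len(values))
    return max(range(len(values)), key=lambda a: values[a])


rng = random.Random(0)            # seeded only to make the sketch repeatable
action_values = [0.2, 0.9, 0.5]   # hypothetical current value estimates

exploit_choice = epsilon_greedy(action_values, epsilon=0.0, rng=rng)  # always the best-known action
explore_choice = epsilon_greedy(action_values, epsilon=1.0, rng=rng)  # always a random action
```

Setting `epsilon` between 0 and 1 mixes the two: mostly the known route, occasionally an uncertain one that may turn out better.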
define decision making
awareness of available alternatives and assigning value to each - which routes lead to which outcomes
define choice behaviours
actions assoc with the choice of a specific alternative
not decision making itself - a consequence of it
define normative theories
agent makes decision about the utility of an outcome based on likelihood and value
opt for best choice in idealised context that maximises the utility
**flawed - agents make mistakes and don't know everything about the outcome
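A minimal sketch of the normative idea, assuming utility is computed as probability times value summed over outcomes. The option names and numbers are hypothetical.

```python
def expected_utility(outcomes):
    """Sum of likelihood * value over (probability, value) pairs."""
    return sum(p * v for p, v in outcomes)


def best_option(options):
    """Idealised agent: choose the option that maximises expected utility."""
    return max(options, key=lambda name: expected_utility(options[name]))


# Hypothetical options: a certain small payoff vs a 50/50 gamble.
options = {
    "safe": [(1.0, 4.0)],
    "risky": [(0.5, 10.0), (0.5, 0.0)],
}

choice = best_option(options)  # "risky": expected utility 5.0 beats 4.0
```

The sketch assumes the agent knows every probability and value exactly, which is the idealisation the card flags as flawed.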
define descriptive theories
actions chosen probabilistically based on a value function, which is updated on the basis of the outcome (RPE)
more trial and error
observe beh and infer decision making process
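Probabilistic choice from a value function is often sketched with a softmax rule. The softmax form and the inverse-temperature parameter `beta` are assumptions here; the card only says actions are chosen probabilistically.

```python
import math


def softmax_probs(values, beta=1.0):
    """Turn action values into choice probabilities.

    Higher beta -> choices track value more closely; beta = 0 -> uniform choice.
    """
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp(beta * (v - m)) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax_probs([1.0, 2.0, 0.5], beta=2.0)  # hypothetical action values
```

The higher-valued action is chosen most often but not always, which leaves room for trial and error.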
factors that influence decision making
lifetime - alter perception of what valuable
pathology - decisional disorder - not mapped to reality
momentary fluctuations
define reinforcement learning models
expand theory into tractable parameters
allow us to measure and quantify latent (theoretical) parameters of behaviour
reinforcement learning theory
expands utility - understanding of actions and outcomes is probabilistic
use experience and feedback over time - confirm or disconfirm expectancies and update
expectancy drives action, experience drives value updating
based on model free learning
define value function
estimate of the sum of the future rewards
all reward accumulated over the long term - maximal over time
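A minimal sketch of "sum of future rewards", assuming later rewards are weighted down by a discount factor `gamma` (an assumption; the card does not specify any weighting).

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of future rewards, each weighted by gamma ** (steps into the future)."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


future_rewards = [1.0, 1.0, 1.0]  # hypothetical reward sequence

undiscounted = discounted_return(future_rewards, gamma=1.0)  # 3.0, plain sum
discounted = discounted_return(future_rewards, gamma=0.5)    # 1.75 = 1 + 0.5 + 0.25
```

With `gamma=1.0` this is the plain long-term sum; the first element alone corresponds to the immediate reward.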
define reward function
estimate of the immediate intrinsic value of rewards
define state value function
sum of future expected rewards
based on:
animal's state ie satiety
define action value function
sum of future expected rewards within an environmental state following an action
define reward prediction error RPE
difference between the actual reward and the reward expected based on current value functions
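The definition can be sketched as a subtraction followed by a value update. The `learning_rate` parameter is an assumption added to show how an RPE might feed back into the value estimate.

```python
def reward_prediction_error(actual_reward, expected_reward):
    """RPE: positive when the outcome is better than expected, negative when worse."""
    return actual_reward - expected_reward


def update_value(value, actual_reward, learning_rate=0.1):
    """Move the value estimate a fraction of the way toward the actual reward."""
    delta = reward_prediction_error(actual_reward, value)
    return value + learning_rate * delta


better_than_expected = reward_prediction_error(1.0, 0.4)  # positive RPE
worse_than_expected = reward_prediction_error(0.0, 0.4)   # negative RPE
new_value = update_value(0.0, 1.0, learning_rate=0.5)     # 0.5
```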
define rescorla wagner
the amount of learning (the change ∆ in the predictive value of a stimulus, V) depends on the amount of surprise (the difference between what actually happens, λ, and what you expect, ΣV)
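A minimal sketch of this rule, with ∆V for each stimulus present proportional to (λ - ΣV). The single learning-rate parameter `alpha` is an assumption standing in for the rate terms of the full model, and the stimulus names are hypothetical.

```python
def rescorla_wagner_trial(V, present, lam, alpha=0.3):
    """Run one trial and update V in place.

    V       - dict of predictive values, one per stimulus
    present - stimuli shown on this trial
    lam     - what actually happens (lambda)
    Returns the surprise, lam minus the summed expectation.
    """
    total_expectation = sum(V[s] for s in present)  # sigma V
    surprise = lam - total_expectation
    for s in present:
        V[s] += alpha * surprise                    # delta V
    return surprise


V = {"light": 0.0, "tone": 0.0}

# Train the light alone: surprise shrinks as V["light"] approaches lambda.
for _ in range(50):
    rescorla_wagner_trial(V, ["light"], lam=1.0)

# Light and tone together: the light already predicts the outcome,
# so there is almost no surprise left to drive learning about the tone.
surprise_compound = rescorla_wagner_trial(V, ["light", "tone"], lam=1.0)
```

Because the error term uses the summed expectation ΣV, stimuli presented together share one surprise signal instead of learning independently.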
define model free reinforcement learning
direct experience with reward/penalty
decisions based on VF updating following PE
determined by certainty of the variable and action familiarity
define model based reinforcement learning
model of the world - preference about the inexperienced but hypothesised world
make informed decisions without trial and error
- use motivational states and higher order input (ie social) without direct reward/penalty
adjust VF when new info about internal/external environment - avoid relearning based on experienced irregularities in RPE
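A rough sketch of the contrast with model free learning, assuming a one-step world model (action leads to outcome, outcome has a value). All names and numbers are hypothetical.

```python
def model_based_value(action, transition, outcome_value):
    """Value of an action looked up through a model: action -> outcome -> value."""
    return outcome_value[transition[action]]


# Hypothetical one-step world model for a lever task.
transition = {"press_lever": "food", "do_nothing": "nothing"}
outcome_value = {"food": 1.0, "nothing": 0.0}

# Model free agent: cached action values learned from past reward.
cached_value = {"press_lever": 1.0, "do_nothing": 0.0}

# New info about internal state (e.g. the animal is sated) devalues the food.
# No further lever presses are experienced.
outcome_value["food"] = 0.0

mb_after = model_based_value("press_lever", transition, outcome_value)  # 0.0, adjusts at once
mf_after = cached_value["press_lever"]                                  # 1.0, needs new PEs to change
```

The model based value changes as soon as the outcome is revalued, while the cached value only changes through further trial and error.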
model free accommodation of model based learning
DOLL ET AL
accommodate simplified model based:
generalise learning from one state to another without additional experience if states overlap
update without direct experience of the action assoc with the devalued reward - alter rep of reward and thus reduce beh
ie reversal learning:
train lever = food
lever press > no reward? - update VF and decrement value of action as outcome
- lever overlaps with food presentation therefore devalued food = devalued lever press
doll et al - model based learning in lever task - motivations
rat integrates model of the world into its motivations
ie temp discounting or maze task
choice will be based on how hungry and/or how impulsive the rat is
define counterfactual thinking
ability to imagine hypothetical outcomes
what could have been -
when you don't know the VF or which decision to make
generate fictitious PE for unencountered outcomes - experience of regret - use the value of actions not chosen to alter future choice
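A hedged sketch of a fictitious PE update, assuming the outcome of the unchosen action can be observed or imagined. The separate rate `alpha_fictive` and the action names are assumptions for illustration.

```python
def update_with_counterfactual(values, chosen, outcomes, alpha=0.2, alpha_fictive=0.2):
    """Update every action's value, not just the chosen one.

    values   - dict of current action values (updated in place)
    chosen   - the action actually taken
    outcomes - dict of outcome per action, including what the
               unchosen actions would have paid (assumed known)
    """
    for action, outcome in outcomes.items():
        rate = alpha if action == chosen else alpha_fictive
        prediction_error = outcome - values[action]  # fictitious PE if action was not chosen
        values[action] += rate * prediction_error
    return values


values = {"stay": 0.0, "switch": 0.0}

# The agent stayed and got nothing, but sees that switching would have paid 1.0.
update_with_counterfactual(values, chosen="stay",
                           outcomes={"stay": 0.0, "switch": 1.0},
                           alpha=0.5, alpha_fictive=0.5)
# values["switch"] is now 0.5, raising the chance of switching next time
```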