Week 4 Flashcards
a*
a* = argmax_a EU(a): a rational agent chooses the action that maximises its expected utility
Expected utility
For the possible states s' ∈ Sa that can arise from action a:
EU(a) = Σ_{s' ∈ Sa} P(s'|a) U(s')
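A minimal Python sketch of these two cards; the actions, outcome probabilities, and utilities below are invented toy values, not from the course.

```python
# Expected utility of an action, and the rational choice a* = argmax_a EU(a).
# All numbers here are made-up illustration values.

def expected_utility(action, P, U):
    # EU(a) = sum over outcomes s' of P(s'|a) * U(s')
    return sum(p * U[sp] for sp, p in P[action].items())

P = {"left":  {"A": 0.8, "B": 0.2},   # P[a][s'] = P(s'|a)
     "right": {"A": 0.1, "B": 0.9}}
U = {"A": 10.0, "B": -2.0}            # U[s'] = utility of outcome s'

a_star = max(P, key=lambda a: expected_utility(a, P, U))
print(a_star, expected_utility(a_star, P, U))   # -> left 7.6
```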
Maxmin criterion
Choose the action whose worst-case outcome is least bad
We maximise over actions the minimum utility of the resulting states: a* = argmax_a min_{s'} U(s')
Maximax criterion
Choose the action that has the best best-case scenario: a* = argmax_a max_{s'} U(s')
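The same style of toy sketch for the two criteria; the outcome table is invented.

```python
# Maxmin vs maximax over the utilities of each action's possible outcomes.
outcomes = {"a1": [3, 4, 5],      # safe action (invented values)
            "a2": [-10, 20, 1]}   # risky action

maxmin  = max(outcomes, key=lambda a: min(outcomes[a]))  # least-bad worst case -> a1
maximax = max(outcomes, key=lambda a: max(outcomes[a]))  # best best case       -> a2
print(maxmin, maximax)
```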
First order Markovian? Second?
First order: the next state depends only on the current state (not on the earlier history); second order: it also depends on the state before the current one
Components of a Markov Decision Process (MDP)
- set of states s ∈ S with initial state s0
- set of actions in each state A(s)
- transition model P(s’|s,a)
- reward function R(s)
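One way the four components might be written down in Python; the tiny three-state MDP below is invented for illustration.

```python
# A toy MDP: states, per-state actions, transition model, reward function.
S = ["s0", "s1", "s2"]                                      # states, initial state s0
A = {"s0": ["left", "right"], "s1": ["forward"], "s2": []}  # A(s); s2 is terminal
P = {("s0", "left"):    {"s1": 1.0},                        # P[(s, a)][s'] = P(s'|s,a)
     ("s0", "right"):   {"s1": 0.5, "s2": 0.5},
     ("s1", "forward"): {"s2": 1.0}}
R = {"s0": -0.04, "s1": -0.04, "s2": 1.0}                   # reward function R(s)
```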
What is a policy
For an MDP, an action for each state, e.g.:
- π: π(s0) = left, π(s1) = right, π(s2) = forward, …
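The example policy from this card, written as a plain mapping (a sketch, using the card's own made-up actions):

```python
# A policy is just: for each state, which action to take.
pi = {"s0": "left", "s1": "right", "s2": "forward"}
action = pi["s0"]   # in state s0 the policy says "left"
```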
Considerations when calculating Utility of a run
For a run (a sequence of states):
- is the horizon finite or infinite?
- are the utilities stationary or non-stationary (do they change over time)?
Difference between additive and discounted rewards
Additive: U(s0, s1, s2, …) = R(s0) + R(s1) + R(s2) + …
Discounted: U(s0, s1, s2, …) = R(s0) + γR(s1) + γ²R(s2) + …, with discount factor 0 < γ ≤ 1 (additive is the special case γ = 1)
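A quick numeric check of the two definitions on one short run (invented rewards):

```python
# Additive vs discounted utility of the same run.
rewards = [1.0, 1.0, 1.0, 1.0]      # R(s0), R(s1), R(s2), R(s3)
gamma = 0.9

additive   = sum(rewards)                                      # 4.0
discounted = sum(gamma**t * r for t, r in enumerate(rewards))  # 3.439
```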
Proper policy
A policy that is guaranteed to eventually reach a terminal state
Problem with infinite horizons?
Could make the expected utility of a policy infinite (rewards keep accumulating forever)
How to deal with infinite horizons?
- proper policies
- average rewards (divide expected utility by the number of steps)
- discounted rewards
Calculate expected utility of run for infinite run
With discounted rewards: U(s0, s1, …) = Σ_{t≥0} γ^t R(st) ≤ Rmax / (1 − γ), finite for γ < 1 (geometric series)
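A small numeric illustration of the geometric-series bound (γ and Rmax are arbitrary choices):

```python
# With discounting, even an infinite run of maximal rewards stays finite.
gamma, R_max = 0.9, 1.0
partial = sum(gamma**t * R_max for t in range(1000))  # long truncated sum
print(partial, R_max / (1 - gamma))                   # both ~= 10.0
```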
Expected utility of a policy π (using discounted rewards)
U^π(s) = E[Σ_{t≥0} γ^t R(St)], where St is the random variable for the state the agent reaches at time t when following π
Optimal policy (notation)
π*(s) = argmax_{a ∈ A(s)} Σ_{s'} P(s'|s,a) U(s'): choose the action that maximises the expected utility of the next state
Bellman equation
U(s) = R(s) + γ max_{a ∈ A(s)} Σ_{s'} P(s'|s,a) U(s'): the utility of a state is its reward plus the discounted maximum expected utility of the next state
Value iteration process
Start from arbitrary utilities and repeatedly apply the Bellman update to every state; continue until the values of the states no longer change
Guaranteed to converge to the optimal utilities
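A minimal value-iteration sketch on the toy MDP from the components card (restated here so the block runs on its own; all numbers invented):

```python
# Value iteration: sweep the Bellman update over all states until the
# largest change drops below a small threshold.
gamma, eps = 0.9, 1e-6
S = ["s0", "s1", "s2"]
A = {"s0": ["left", "right"], "s1": ["forward"], "s2": []}   # s2 terminal
P = {("s0", "left"):    {"s1": 1.0},
     ("s0", "right"):   {"s1": 0.5, "s2": 0.5},
     ("s1", "forward"): {"s2": 1.0}}
R = {"s0": -0.04, "s1": -0.04, "s2": 1.0}

U = {s: 0.0 for s in S}
while True:
    U_new = {}
    for s in S:
        if not A[s]:                  # terminal state: utility is its reward
            U_new[s] = R[s]
        else:                         # Bellman update
            U_new[s] = R[s] + gamma * max(
                sum(p * U[sp] for sp, p in P[(s, a)].items()) for a in A[s])
    if max(abs(U_new[s] - U[s]) for s in S) < eps:
        break                         # values no longer change -> converged
    U = U_new
print(U_new)                          # optimal utilities for this toy MDP
```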
Policy improvement
Given utilities U, compute a new policy by one-step lookahead: π(s) = argmax_{a ∈ A(s)} Σ_{s'} P(s'|s,a) U(s')
Policy evaluation
Given a policy π, compute its utilities from U(s) = R(s) + γ Σ_{s'} P(s'|s,π(s)) U(s') (no max, so this is a solvable system of linear equations)
Policy iteration
1) policy evaluation (for every state)
2) policy improvement (for every state)
Repeat until convergence (the policy no longer changes)
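A policy-iteration sketch on the same toy MDP; evaluation here uses repeated fixed-policy Bellman sweeps rather than an exact linear solve, to keep the sketch dependency-free:

```python
# Policy iteration: evaluate the current policy, then improve it greedily;
# stop when the policy is stable. Toy MDP numbers are invented.
gamma = 0.9
S = ["s0", "s1", "s2"]
A = {"s0": ["left", "right"], "s1": ["forward"], "s2": []}
P = {("s0", "left"):    {"s1": 1.0},
     ("s0", "right"):   {"s1": 0.5, "s2": 0.5},
     ("s1", "forward"): {"s2": 1.0}}
R = {"s0": -0.04, "s1": -0.04, "s2": 1.0}

def evaluate(pi, U, sweeps=50):
    # Policy evaluation: U(s) = R(s) + gamma * sum_s' P(s'|s,pi(s)) * U(s')
    for _ in range(sweeps):
        U = {s: R[s] if not A[s]
                else R[s] + gamma * sum(p * U[sp] for sp, p in P[(s, pi[s])].items())
             for s in S}
    return U

pi = {s: A[s][0] for s in S if A[s]}            # arbitrary initial policy
while True:
    U = evaluate(pi, {s: 0.0 for s in S})       # 1) policy evaluation
    new_pi = {s: max(A[s],                      # 2) policy improvement
                     key=lambda a: sum(p * U[sp] for sp, p in P[(s, a)].items()))
              for s in S if A[s]}
    if new_pi == pi:                            # policy stable -> done
        break
    pi = new_pi
print(pi)
```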
Approximate policy evaluation
Instead of solving the linear system exactly, run k steps of the simplified Bellman update U_{i+1}(s) = R(s) + γ Σ_{s'} P(s'|s,π(s)) U_i(s')
Modified policy iteration
Policy iteration that uses approximate policy evaluation (a few Bellman updates per round) instead of exact evaluation
Modified policy iteration pro and con
Pro: much more efficient than exact evaluation
Con: not guaranteed to converge
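Relative to the policy-iteration sketch above (whose names S, A, P and evaluate it reuses), modified policy iteration only changes the evaluation step: a few warm-started Bellman sweeps instead of full evaluation.

```python
# Modified policy iteration: k approximate-evaluation sweeps per round,
# warm-starting from the previous utilities (reuses evaluate/S/A/P from above).
k = 5
U = {s: 0.0 for s in S}
pi = {s: A[s][0] for s in S if A[s]}
while True:
    U = evaluate(pi, U, sweeps=k)               # approximate policy evaluation
    new_pi = {s: max(A[s],
                     key=lambda a: sum(p * U[sp] for sp, p in P[(s, a)].items()))
              for s in S if A[s]}
    if new_pi == pi:
        break
    pi = new_pi
```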
Transition model is?
P(s'|s,a): the probability of reaching state s' when taking action a in state s