Week 4 Flashcards
a*
A rational agent chooses the action which maximises its expected utility: a* = argmax_a EU(a)
Expected utility
Sum over the possible outcome states s' ∈ S_a that arise from action a, weighted by their probabilities: EU(a) = Σ_{s' ∈ S_a} P(s'|a) U(s')
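A minimal Python sketch (with made-up outcome probabilities and utilities) of computing EU(a) and picking a* = argmax_a EU(a):

# Hypothetical outcome model: action -> list of (P(s'|a), s')
outcomes = {
    "left":  [(0.8, "s1"), (0.2, "s2")],
    "right": [(0.5, "s1"), (0.5, "s3")],
}
utility = {"s1": 10, "s2": -5, "s3": 4}   # assumed utilities U(s')

def expected_utility(action):
    # EU(a) = sum over s' of P(s'|a) * U(s')
    return sum(p * utility[s] for p, s in outcomes[action])

best_action = max(outcomes, key=expected_utility)   # a* = argmax_a EU(a)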
Maxmin criterion
Focus is on choosing the action with the least-bad worst outcome: maximise over actions the minimum utility over outcome states, a* = argmax_a min_{s'} U(s')
Maximax criterion
Choose the action with the best best-case outcome: a* = argmax_a max_{s'} U(s')
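A small sketch (reusing the hypothetical outcomes/utility tables above) contrasting the maximin and maximax choices:

def maximin_action(outcomes, utility):
    # choose the action whose worst outcome has the highest utility
    return max(outcomes, key=lambda a: min(utility[s] for _, s in outcomes[a]))

def maximax_action(outcomes, utility):
    # choose the action whose best outcome has the highest utility
    return max(outcomes, key=lambda a: max(utility[s] for _, s in outcomes[a]))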
First order Markovian? Second?
First-order: the next state depends only on the current state, not on the earlier history; second-order: it depends on the current state and the one before it
Components of a Markov Decision Process (MDP)
- set of states s ∈ S with initial state s0
- set of actions in each state A(s)
- transition model P(s’|s,a)
- reward function R(s)
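A minimal sketch of how these components might be represented in Python (hypothetical three-state example with made-up probabilities and rewards):

states = ["s0", "s1", "s2"]                                         # s ∈ S, initial state s0
actions = {"s0": ["left", "right"], "s1": ["forward"], "s2": []}    # A(s); s2 is terminal
# Transition model P(s'|s,a): (state, action) -> list of (probability, next state)
P = {
    ("s0", "left"):    [(0.9, "s1"), (0.1, "s0")],
    ("s0", "right"):   [(1.0, "s2")],
    ("s1", "forward"): [(1.0, "s2")],
}
R = {"s0": -0.04, "s1": -0.04, "s2": 1.0}                           # reward function R(s)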
What is a policy
For an MDP, a mapping giving one action for each state, e.g.
- π: π(s0) = left, π(s1) = right, π(s2) = forward, …
Considerations when calculating Utility of a run
For a run (a sequence of states):
- is the horizon finite or infinite?
- are utilities stationary or non-stationary (do they change over time)?
Difference between additive and discounted rewards
- additive: U([s0, s1, s2, …]) = R(s0) + R(s1) + R(s2) + …
- discounted: U([s0, s1, s2, …]) = R(s0) + γR(s1) + γ²R(s2) + …, with discount factor 0 < γ ≤ 1 (additive rewards are the special case γ = 1)
Proper policy
A policy that is guaranteed to reach a terminal state eventually
Problem with infinite horizons?
With additive rewards, the expected utility of a policy can become infinite, so policies cannot be compared
How to deal with infinite horizons ?
- proper policies
- average rewards (divide the expected utility by the number of steps)
- discounted rewards
Calculate expected utility of run for infinite run
Using discounted rewards with γ < 1: U([s0, s1, s2, …]) = Σ_{t=0}^∞ γ^t R(s_t) ≤ R_max / (1 − γ), so the utility is finite even for an infinite run
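A quick sketch (made-up rewards, γ = 0.9) checking that a discounted return stays below R_max / (1 − γ):

gamma = 0.9
rewards = [-0.04] * 100 + [1.0]                       # hypothetical run of rewards R(s_t)
discounted_return = sum(gamma**t * r for t, r in enumerate(rewards))
bound = max(abs(r) for r in rewards) / (1 - gamma)    # R_max / (1 - γ)
assert abs(discounted_return) <= bound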
Expected utility of a policy π (using discounted rewards)
U^π(s) = E[ Σ_{t=0}^∞ γ^t R(S_t) ], where S_t is the random variable for the state the agent gets to at time t when executing π starting from s
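A minimal sketch (reusing the hypothetical P, R and γ above) that estimates U^π by repeatedly applying U(s) ← R(s) + γ Σ_{s'} P(s'|s, π(s)) U(s'):

def evaluate_policy(policy, P, R, gamma=0.9, iters=100):
    # Iterative policy evaluation for a fixed policy π
    U = {s: 0.0 for s in R}
    for _ in range(iters):
        U = {s: R[s] + gamma * sum(p * U[nxt]
                                   for p, nxt in P.get((s, policy.get(s)), []))
             for s in R}
    return U

policy = {"s0": "left", "s1": "forward"}   # assumed policy π; s2 is terminal
U_pi = evaluate_policy(policy, P, R)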
Optimal policy (notation)
π* = argmax_π U^π(s); with discounted rewards the optimal policy is independent of the starting state
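A small follow-on sketch (hypothetical, building on the structures above) of extracting the greedy policy with respect to a utility estimate, π(s) = argmax_a Σ_{s'} P(s'|s,a) U(s'):

def greedy_policy(U, P, actions):
    # in each non-terminal state, pick the action maximising expected successor utility
    pi = {}
    for s, acts in actions.items():
        if acts:
            pi[s] = max(acts, key=lambda a: sum(p * U[nxt] for p, nxt in P[(s, a)]))
    return pi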