Week 4 Flashcards
a*
A rational agent chooses the action which maximises its expected utility: a* = argmax_a EU(a)
Expected utility
Sum over the possible outcome states s' ∈ S_a that arise from action a, weighted by their probabilities: EU(a) = Σ_{s' ∈ S_a} P(s'|a) U(s')
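A minimal Python sketch (with made-up outcome probabilities and utilities) of computing EU(a) and picking a* = argmax_a EU(a):

# Hypothetical outcome model: action -> list of (P(s'|a), s')
outcomes = {
    "left":  [(0.8, "s1"), (0.2, "s2")],
    "right": [(0.5, "s1"), (0.5, "s3")],
}
utility = {"s1": 10, "s2": -5, "s3": 4}   # assumed utilities U(s')

def expected_utility(action):
    # EU(a) = sum over s' of P(s'|a) * U(s')
    return sum(p * utility[s] for p, s in outcomes[action])

best_action = max(outcomes, key=expected_utility)   # a* = argmax_a EU(a)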
Maxmin criterion
Focus is on choosing the action with the least-bad worst outcome: maximise over actions the minimum utility over outcome states, a* = argmax_a min_{s'} U(s')
Maximax criterion
Choose the action with the best best-case outcome: a* = argmax_a max_{s'} U(s')
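A small sketch (reusing the hypothetical outcomes/utility tables above) contrasting the maximin and maximax choices:

def maximin_action(outcomes, utility):
    # choose the action whose worst outcome has the highest utility
    return max(outcomes, key=lambda a: min(utility[s] for _, s in outcomes[a]))

def maximax_action(outcomes, utility):
    # choose the action whose best outcome has the highest utility
    return max(outcomes, key=lambda a: max(utility[s] for _, s in outcomes[a]))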
First order Markovian? Second?
First-order: the next state depends only on the current state, not on the earlier history; second-order: it depends on the current state and the one before it
Components of a Markov Decision Process (MDP)
- set of states s ∈ S with initial state s0
- set of actions in each state A(s)
- transition model P(s’|s,a)
- reward function R(s)
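A minimal sketch of how these components might be represented in Python (hypothetical three-state example with made-up probabilities and rewards):

states = ["s0", "s1", "s2"]                                         # s ∈ S, initial state s0
actions = {"s0": ["left", "right"], "s1": ["forward"], "s2": []}    # A(s); s2 is terminal
# Transition model P(s'|s,a): (state, action) -> list of (probability, next state)
P = {
    ("s0", "left"):    [(0.9, "s1"), (0.1, "s0")],
    ("s0", "right"):   [(1.0, "s2")],
    ("s1", "forward"): [(1.0, "s2")],
}
R = {"s0": -0.04, "s1": -0.04, "s2": 1.0}                           # reward function R(s)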
What is a policy
For an MDP, a mapping giving one action for each state, e.g.
- π: π(s0) = left, π(s1) = right, π(s2) = forward, …
Considerations when calculating Utility of a run
For a run (a sequence of states):
- is the horizon finite or infinite?
- are utilities stationary or non-stationary (do they change over time)?
Difference between additive and discounted rewards
- additive: U([s0, s1, s2, …]) = R(s0) + R(s1) + R(s2) + …
- discounted: U([s0, s1, s2, …]) = R(s0) + γR(s1) + γ²R(s2) + …, with discount factor 0 < γ ≤ 1 (additive rewards are the special case γ = 1)
Proper policy
A policy that is guaranteed to reach a terminal state eventually
Problem with infinite horizons?
With additive rewards, the expected utility of a policy can become infinite, so policies cannot be compared
How to deal with infinite horizons ?
- proper policies
- average rewards (divide the expected utility by the number of steps)
- discounted rewards
Calculate expected utility of run for infinite run
Using discounted rewards with γ < 1: U([s0, s1, s2, …]) = Σ_{t=0}^∞ γ^t R(s_t) ≤ R_max / (1 − γ), so the utility is finite even for an infinite run
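A quick sketch (made-up rewards, γ = 0.9) checking that a discounted return stays below R_max / (1 − γ):

gamma = 0.9
rewards = [-0.04] * 100 + [1.0]                       # hypothetical run of rewards R(s_t)
discounted_return = sum(gamma**t * r for t, r in enumerate(rewards))
bound = max(abs(r) for r in rewards) / (1 - gamma)    # R_max / (1 - γ)
assert abs(discounted_return) <= bound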
Expected utility of a policy π (using discounted rewards)
U^π(s) = E[ Σ_{t=0}^∞ γ^t R(S_t) ], where S_t is the random variable for the state the agent gets to at time t when executing π starting from s
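A minimal sketch (reusing the hypothetical P, R and γ above) that estimates U^π by repeatedly applying U(s) ← R(s) + γ Σ_{s'} P(s'|s, π(s)) U(s'):

def evaluate_policy(policy, P, R, gamma=0.9, iters=100):
    # Iterative policy evaluation for a fixed policy π
    U = {s: 0.0 for s in R}
    for _ in range(iters):
        U = {s: R[s] + gamma * sum(p * U[nxt]
                                   for p, nxt in P.get((s, policy.get(s)), []))
             for s in R}
    return U

policy = {"s0": "left", "s1": "forward"}   # assumed policy π; s2 is terminal
U_pi = evaluate_policy(policy, P, R)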
Optimal policy (notation)
π* = argmax_π U^π(s); with discounted rewards the optimal policy is independent of the starting state
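A small follow-on sketch (hypothetical, building on the structures above) of extracting the greedy policy with respect to a utility estimate, π(s) = argmax_a Σ_{s'} P(s'|s,a) U(s'):

def greedy_policy(U, P, actions):
    # in each non-terminal state, pick the action maximising expected successor utility
    pi = {}
    for s, acts in actions.items():
        if acts:
            pi[s] = max(acts, key=lambda a: sum(p * U[nxt] for p, nxt in P[(s, a)]))
    return pi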