Week 4 Flashcards

1
Q

a*

A

A rational agent chooses the action that maximises its expected utility
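In symbols (a standard formulation, assuming EU(a) denotes the expected utility of action a and A the set of available actions):

a* = argmax_{a ∈ A} EU(a)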

2
Q

Expected utility

A

For the possible states s' ∈ S_a that can arise from action a, weight each state's utility by its probability:

EU(a) = Σ_{s' ∈ S_a} P(s'|a) U(s')

3
Q

Maxmin criterion

A

Focus on choosing the action whose worst-case outcome is the least bad.

We maximise over actions the minimum utility of the states each action can lead to.
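As a formula (assuming S_a is the set of states action a can lead to, with utilities U(s')):

a* = argmax_a min_{s' ∈ S_a} U(s')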

4
Q

Maximax criterion

A

Choose the action with the best best-case outcome
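As a formula (same assumptions as for maxmin, with S_a the states reachable by action a):

a* = argmax_a max_{s' ∈ S_a} U(s')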

5
Q

First order Markovian? Second?

A

First order: the next state (or decision) depends only on the current one. Second order: it depends on the previous two.
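Stated for the transition distribution over states (a standard way to write the orders; the same pattern applies to decisions):

First order:  P(S_t | S_{t-1}, S_{t-2}, …) = P(S_t | S_{t-1})
Second order: P(S_t | S_{t-1}, S_{t-2}, …) = P(S_t | S_{t-1}, S_{t-2})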

6
Q

Components of a Markov Decision Process (MDP)

A
  • set of states s ∈ S with initial state s0
  • set of actions in each state A(s)
  • transition model P(s’|s,a)
  • reward function R(s)
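
A tiny illustration in Python, using a hypothetical two-state problem (all names and numbers are made up) to show the four components side by side:

S = ["s0", "s1"]                            # set of states, with initial state s0
A = {"s0": ["stay", "go"], "s1": ["stay"]}  # actions available in each state, A(s)
P = {("s0", "stay"): {"s0": 1.0},           # transition model P(s'|s,a)
     ("s0", "go"):   {"s0": 0.1, "s1": 0.9},
     ("s1", "stay"): {"s1": 1.0}}
R = {"s0": -0.04, "s1": 1.0}                # reward function R(s)
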
7
Q

What is a policy

A

For an MDP, a mapping that gives an action for each state, e.g.:

  • π: π(s0) = left, π(s1) = right, π(s2) = forward, …
8
Q

Considerations when calculating Utility of a run

A

For a run (a sequence of states), consider:
  • is the horizon finite or infinite?
  • are the utilities stationary or non-stationary (do they change over time)?

9
Q

Difference between additive and discounted rewards

A
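
A standard statement of the difference (assuming a run s0, s1, s2, … and a discount factor 0 < γ < 1):

  • additive: U([s0, s1, s2, …]) = R(s0) + R(s1) + R(s2) + …
  • discounted: U([s0, s1, s2, …]) = R(s0) + γR(s1) + γ²R(s2) + …

Additive rewards are the special case γ = 1.
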
10
Q

Proper policy

A

A policy that is guaranteed to eventually reach a terminal state

11
Q

Problem with infinite horizons?

A

The expected utility of a policy could become infinite (an endless sum of rewards), making policies impossible to compare

12
Q

How to deal with infinite horizons ?

A
  • proper policies
  • average rewards (divide the expected utility by the number of steps in the run)
  • discounted rewards
13
Q

Calculate the expected utility of an infinite run

A
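
A standard result (assuming discounted rewards with 0 ≤ γ < 1 and every reward bounded by R_max):

U([s0, s1, s2, …]) = Σ_{t=0}^{∞} γ^t R(s_t) ≤ R_max / (1 − γ)

so even an infinite run has a finite utility.
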
14
Q

Expected utility of a policy π (using discounted rewards)

A

U^π(s) = E[ Σ_{t=0}^{∞} γ^t R(S_t) ]

where S_t is the random variable for the state the agent reaches at time t, starting from s and executing π

15
Q

Optimal policy (notation)

A
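
The standard notation (assuming U^π(s) denotes the expected utility of executing π starting from s):

π* = argmax_π U^π(s)

i.e. the policy with the highest expected utility.
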
16
Q

Policy for choosing the action that maximises the expected utility of the next state

A
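
A standard form (assuming U(s') denotes the utility of the successor state):

π*(s) = argmax_{a ∈ A(s)} Σ_{s'} P(s'|s,a) U(s')
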
17
Q

Bellman equation

A

The utility of a state is its reward plus the discounted maximum expected utility of the next state
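
In symbols (the standard form, assuming reward R(s), discount factor γ and transition model P(s'|s,a)):

U(s) = R(s) + γ max_{a ∈ A(s)} Σ_{s'} P(s'|s,a) U(s')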

18
Q

Value iteration process

A

Repeatedly apply the Bellman update to every state; continue until the values of the states no longer change.
Guaranteed to converge to the optimal utilities.
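
A minimal sketch of the update loop in Python, on a hypothetical two-state toy MDP (the states, actions, probabilities and rewards are illustrative assumptions, not from the lecture):

# Toy MDP (hypothetical): states s0, s1; rewards R(s); transition model P(s'|s,a).
gamma = 0.9
R = {"s0": -0.04, "s1": 1.0}
A = {"s0": ["stay", "go"], "s1": ["stay"]}
P = {("s0", "stay"): {"s0": 1.0},
     ("s0", "go"):   {"s0": 0.1, "s1": 0.9},
     ("s1", "stay"): {"s1": 1.0}}

U = {s: 0.0 for s in R}                      # start from arbitrary utilities
while True:
    # Bellman update applied to every state
    U_new = {s: R[s] + gamma * max(sum(p * U[t] for t, p in P[(s, a)].items())
                                   for a in A[s])
             for s in R}
    delta = max(abs(U_new[s] - U[s]) for s in R)
    U = U_new
    if delta < 1e-6:                         # stop when values no longer change
        break
print(U)                                     # approaches the optimal utilities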

19
Q

Policy improvement

A
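
The standard step (assuming utilities U have been computed for the current policy): for every state s, make the policy greedy with respect to U:

π(s) = argmax_{a ∈ A(s)} Σ_{s'} P(s'|s,a) U(s')
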
20
Q

Policy evaluation

A
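
The standard step (assuming the current policy is π_i): compute the utilities U_i of executing π_i by solving the simplified Bellman equation, which has no max because the action is fixed by the policy:

U_i(s) = R(s) + γ Σ_{s'} P(s'|s, π_i(s)) U_i(s')

This is a set of linear equations, one per state.
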
21
Q

Policy iteration

A

1) policy evaluation (for every state)
2) policy improvement (for every state)

Repeat until the policy no longer changes (convergence)

22
Q

Approximate policy evaluation

A
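
A standard formulation (assuming current policy π_i and current utility estimates U_i): instead of solving the linear equations exactly, run a fixed number k of simplified Bellman updates

U_{i+1}(s) = R(s) + γ Σ_{s'} P(s'|s, π_i(s)) U_i(s')

and use the result as an estimate of the policy's utilities.
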
23
Q

Modified policy iteration

A

Policy iteration, but the policy evaluation step is done approximately (a fixed number of simplified Bellman updates) rather than by solving the linear equations exactly.
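
A minimal sketch in Python, using the same kind of hypothetical toy MDP (all names and numbers are illustrative assumptions); the evaluation step runs a fixed number of simplified Bellman updates instead of an exact solve:

# Toy MDP (hypothetical), reused for illustration.
gamma = 0.9
R = {"s0": -0.04, "s1": 1.0}
A = {"s0": ["stay", "go"], "s1": ["stay"]}
P = {("s0", "stay"): {"s0": 1.0},
     ("s0", "go"):   {"s0": 0.1, "s1": 0.9},
     ("s1", "stay"): {"s1": 1.0}}

def expected_U(s, a, U):
    # expected utility of the next state when doing a in s
    return sum(p * U[t] for t, p in P[(s, a)].items())

pi = {s: A[s][0] for s in R}                 # arbitrary initial policy
while True:
    # 1) approximate policy evaluation: k simplified Bellman updates (no max)
    U = {s: 0.0 for s in R}
    for _ in range(20):
        U = {s: R[s] + gamma * expected_U(s, pi[s], U) for s in R}
    # 2) policy improvement: make the policy greedy w.r.t. the current utilities
    new_pi = {s: max(A[s], key=lambda a: expected_U(s, a, U)) for s in R}
    if new_pi == pi:                         # stop when the policy no longer changes
        break
    pi = new_pi
print(pi)
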
24
Q

Modified policy iteration pro and con

A

Pro: way more efficient
Con: not guaranteed to converge

25
Q

Transition model is?

A

P(s'|s,a): the probability of reaching state s' when action a is performed in state s