Convergence Flashcards

1
Q

What is the difference between RL with control and without?

A

RL with control involves the agent selecting actions

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

What is the Bellman equation with actions? How does this relate to a Bellman operator?

A
Q(s,a) = R(s,a) + gamma*Sum_s'[T(s,a,s') max_a'(Q(s',a'))]
[BQ](s,a) = the above form
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

What does the Bellman operator Q* = BQ* describe? Q_t = BQ_t-1?

A
Q* = BQ* = Bellman equation
Q_t = BQ_t-1 = Value iteration
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

What is a contraction mapping?

A

B is a operator (maps value functions to value functions)
If, for all Q functions F,G and some 0 <= gamma < 1
|| BF-BG || _inf <= gamma*|| F-G ||_inf
then B is a contraction mapping

Note ||Q||inf = max(s,a) |Q(s,a)|

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

What are the properties of a contraction mapping?

A

If B is a contraction mapping:

  1. F* = BF* has a unique solution
  2. F_t = BF_t-1 => F_t -> F* (i.e. value iteration converges)
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

T/F: Is the max operator a non-expansion?

A
True
Assert WLOG - max_a f(a) >= g(a)
|max_a f(a) - max_a g(a)| = max_a f(a) -max_a g(a) = f(a1) - g(a2) (for some a1, a2)
<= f(a1) - g(a1) (by our assertion)
= |f(a1) - g(a1)| <= max_a |f(a) - g(a)|
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

How can we generalize MDPs? Do they converge? If so, to what?

A

Using the generalized Bellman Equation we can change the “sum” and “multiplication” operator.
They will converge as long as they are non-expanding.
They converge to some fixed point based on the definition of the model, not necessarily Q*
e.g. we can minimize the outcome of the maximizing action to develop a risk averse MDP model

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

What are some examples of non-expansion operators?

A

Order statistics: max, min, etc

Fixed convex combination

How well did you know this?
1
Not at all
2
3
4
5
Perfectly