Rewards Flashcards

1
Q

Why would we want to change the reward function for an MDP?

A

To make the MDP easier to solve (in time, space, or solvability) while still learning essentially the same behavior it would have learned anyway

2
Q

How can we change the reward function without changing the optimal policy?

A
  1. Multiplying by a (positive) scalar
  2. Shifting by a scalar (adding)
  3. Adding a potential-based shaping term (a non-linear transformation)
3
Q

What is the new Q function equal to if we multiply the reward function by a positive constant c?

A

Q'(s,a) = c * Q(s,a)
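A quick numerical check of this identity on a hypothetical 2-state, 2-action MDP (the transition matrix P, rewards R, and constant c below are made-up illustration values):

```python
# Hypothetical tiny MDP: 2 states, 2 actions (all numbers are illustrative).
import numpy as np

gamma = 0.9
P = np.array([[[0.8, 0.2], [0.1, 0.9]],   # P[s, a, s'] transition probabilities
              [[0.5, 0.5], [0.9, 0.1]]])
R = np.array([[1.0, 0.0],                  # R[s, a] expected immediate reward
              [0.5, 2.0]])

def q_iteration(R, iters=500):
    """Solve for the optimal Q by iterating the Bellman optimality update."""
    Q = np.zeros_like(R)
    for _ in range(iters):
        Q = R + gamma * P @ Q.max(axis=1)  # P @ V gives E[V(s') | s, a]
    return Q

c = 3.0
Q = q_iteration(R)
Q_scaled = q_iteration(c * R)
print(np.allclose(Q_scaled, c * Q))  # True: scaling R by c scales Q by c
```

Since every term in the discounted return is multiplied by the same positive c, the argmax over actions (the greedy policy) is unchanged.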

4
Q

What is the new Q function equal to if we add a constant c to the reward function?

A

Q'(s,a) = Q(s,a) + c/(1 - gamma)
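A numerical check on the same kind of toy MDP (hypothetical 2-state, 2-action example; P, R, and c are illustrative values):

```python
# Hypothetical tiny MDP (illustrative numbers) checking Q' = Q + c/(1-gamma).
import numpy as np

gamma, c = 0.9, 5.0
P = np.array([[[0.8, 0.2], [0.1, 0.9]],   # P[s, a, s']
              [[0.5, 0.5], [0.9, 0.1]]])
R = np.array([[1.0, 0.0],
              [0.5, 2.0]])                 # R[s, a]

def q_iteration(R, iters=500):
    Q = np.zeros_like(R)
    for _ in range(iters):
        Q = R + gamma * P @ Q.max(axis=1)
    return Q

Q = q_iteration(R)
Q_shifted = q_iteration(R + c)             # add constant c to every reward
print(np.allclose(Q_shifted, Q + c / (1 - gamma)))  # True
```

The constant compounds into the geometric series c + gamma*c + gamma^2*c + ... = c/(1 - gamma), which is the same for every (s, a) pair, so action preferences are unchanged in the infinite-horizon discounted setting.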

5
Q

What is potential based reward shaping? What is the purpose?

A

Adding a bonus for entering a state and taking it back (discounted) when leaving it: for a potential function Phi over states, the shaped reward is R'(s,a,s') = R(s,a,s') + gamma*Phi(s') - Phi(s). It is intended to encourage specific behavior (e.g. moving toward a goal) and speed up learning without creating an infinite reward pump, and it leaves the optimal policy unchanged.
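A sketch on a hypothetical 2-state MDP showing that the shaping term shifts each Q(s, ·) by -Phi(s) and leaves the greedy policy unchanged (P, R, and Phi are illustrative values):

```python
# Potential-based shaping on a hypothetical tiny MDP (illustrative numbers).
import numpy as np

gamma = 0.9
P = np.array([[[0.8, 0.2], [0.1, 0.9]],   # P[s, a, s']
              [[0.5, 0.5], [0.9, 0.1]]])
R = np.array([[1.0, 0.0],
              [0.5, 2.0]])                 # R[s, a]
Phi = np.array([1.0, -2.0])                # arbitrary potential over states

def q_iteration(R, iters=500):
    Q = np.zeros_like(R)
    for _ in range(iters):
        Q = R + gamma * P @ Q.max(axis=1)
    return Q

# Expected shaped reward: R'(s,a) = R(s,a) + gamma * E[Phi(s')] - Phi(s)
R_shaped = R + gamma * (P @ Phi) - Phi[:, None]

Q = q_iteration(R)
Q_shaped = q_iteration(R_shaped)
print(np.allclose(Q_shaped, Q - Phi[:, None]))              # True: per-state shift
print((Q.argmax(axis=1) == Q_shaped.argmax(axis=1)).all())  # True: same policy
```

Because the shift -Phi(s) is constant across actions within each state, the argmax over actions is identical, which is why shaping cannot create a reward pump.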

6
Q

What is equivalent to doing Q learning with potentials?

A

Q-learning with its Q-values initialized to the potential function (Q_0(s,a) = Phi(s)); given the same experience, the two produce equivalent updates
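A sketch of this equivalence on random synthetic transitions (states, rewards, and the potential below are all made up for illustration): tabular Q-learning with the shaping reward, and plain Q-learning with its table initialized to the potential, stay exactly Phi(s) apart after every update.

```python
# Sketch: Q-learning with potential-based shaping vs. Q-values initialized
# to the potential, driven by the same (synthetic, random) experience.
import numpy as np

rng = np.random.default_rng(0)
n_s, n_a, gamma, alpha = 3, 2, 0.9, 0.5
Phi = rng.normal(size=n_s)                     # arbitrary potential

Q_shaped = np.zeros((n_s, n_a))                # shaped rewards, zero init
Q_init = np.tile(Phi[:, None], (1, n_a))       # plain rewards, Phi init

for _ in range(200):
    # Random synthetic transition (s, a, r, s'); both learners see the same one.
    s, a = rng.integers(n_s), rng.integers(n_a)
    s2, r = rng.integers(n_s), rng.normal()
    f = gamma * Phi[s2] - Phi[s]               # potential-based shaping term
    Q_shaped[s, a] += alpha * (r + f + gamma * Q_shaped[s2].max() - Q_shaped[s, a])
    Q_init[s, a] += alpha * (r + gamma * Q_init[s2].max() - Q_init[s, a])

print(np.allclose(Q_init, Q_shaped + Phi[:, None]))  # True
```

The invariant Q_init(s,a) = Q_shaped(s,a) + Phi(s) holds inductively: it is true at initialization, and both learners compute the same TD error on every transition, so both tables always induce the same greedy policy.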
