Dynamic Programming Flashcards

(5 cards)

1
Q

What is Dynamic Programming? How does it fit into the solution of an MDP?

A

A method for solving complex problems by breaking them down into subproblems, solving those smaller problems, and combining their solutions

In reinforcement learning, DP leverages value functions to systematically compute good policies when the MDP’s model (transition probabilities and rewards) is known

2
Q

DP breaks into two main routines. What are they? What is the combination of those routines called, and what is its aim?

A

Policy Evaluation: given a policy, compute its state-value function

Policy Improvement: given an estimated state-value function, derive a better policy

By alternating these (Policy Iteration) or blending them in each sweep (Value Iteration), we converge to the optimal value function and an optimal policy

3
Q

What, essentially, is the problem of solving the state-value function for |S| states? How can we solve it?

A

A system of |S| linear equations whose unknowns are the values of each state

Instead of solving the linear system directly, we use successive approximation:
1. Initialize the value of each state arbitrarily
2. Iteratively update the value function
3. Stop when the largest change is smaller than a given threshold
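The three steps above can be sketched in Python for a tiny two-state MDP with a known model (all arrays, numbers, and names here are illustrative assumptions, not part of the card):

```python
import numpy as np

# Toy known-model MDP (illustrative): 2 states, 2 actions.
# P[s, a, s'] = transition probability, R[s, a] = expected reward.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.0, 1.0], [0.5, 0.5]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
pi = np.array([[0.5, 0.5],        # uniform random policy, pi[s, a]
               [0.5, 0.5]])

def policy_evaluation(pi, P, R, gamma=0.9, theta=1e-8):
    """Successive approximation of v_pi when the MDP model is known."""
    V = np.zeros(P.shape[0])              # 1. arbitrary initialization
    while True:
        q = R + gamma * P @ V             # one-step lookahead q[s, a]
        V_new = (pi * q).sum(axis=1)      # 2. Bellman expectation sweep
        if np.max(np.abs(V_new - V)) < theta:
            return V_new                  # 3. stop below the threshold
        V = V_new
```

Because the update is a gamma-contraction, the loop is guaranteed to converge to v_pi regardless of the initialization.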

4
Q

What is Policy Iteration? What are the processes involved in it?

A

Policy Iteration alternates full evaluation and full improvement

  1. Initialize an arbitrary policy
  2. Do Policy Evaluation (by iterative evaluation until convergence) followed by Policy Improvement
  3. Stop when the policy does not change from one iteration to the next
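A minimal sketch of this loop, assuming the same style of toy known-model MDP (the arrays and numbers are illustrative):

```python
import numpy as np

# Toy known-model MDP (illustrative numbers).
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.0, 1.0], [0.5, 0.5]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])

def policy_iteration(P, R, gamma=0.9, theta=1e-8):
    """Alternate full evaluation and greedy improvement until stable."""
    n_states = R.shape[0]
    policy = np.zeros(n_states, dtype=int)      # 1. arbitrary deterministic policy
    while True:
        # 2a. Policy Evaluation: iterate to convergence under the current policy
        V = np.zeros(n_states)
        while True:
            V_new = R[np.arange(n_states), policy] \
                    + gamma * P[np.arange(n_states), policy] @ V
            delta = np.max(np.abs(V_new - V))
            V = V_new
            if delta < theta:
                break
        # 2b. Policy Improvement: act greedily with respect to V
        q = R + gamma * P @ V
        new_policy = q.argmax(axis=1)
        if np.array_equal(new_policy, policy):  # 3. policy stable -> done
            return policy, V
        policy = new_policy
```

With a finite number of deterministic policies and each improvement step being strictly better (or stable), the outer loop terminates with an optimal policy.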
5
Q

What is Value Iteration?

A

It fuses evaluation and improvement into one update: each sweep replaces V(s) with the maximum over actions of the one-step lookahead (the Bellman optimality update), and once V converges the optimal policy is read off greedily
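The fused update can be sketched on the same style of toy known-model MDP (illustrative arrays and names):

```python
import numpy as np

# Toy known-model MDP (illustrative numbers).
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.0, 1.0], [0.5, 0.5]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])

def value_iteration(P, R, gamma=0.9, theta=1e-8):
    """Sweep the Bellman optimality update until convergence."""
    V = np.zeros(R.shape[0])
    while True:
        q = R + gamma * P @ V      # one-step lookahead for every (s, a)
        V_new = q.max(axis=1)      # evaluation + improvement in one update
        delta = np.max(np.abs(V_new - V))
        V = V_new
        if delta < theta:
            break
    return q.argmax(axis=1), V    # greedy policy w.r.t. the converged V
```

Compared with Policy Iteration, there is no inner evaluation loop: the max over actions takes the place of a full evaluation of an explicit policy.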
