Dynamic Programming Flashcards
(5 cards)
What is Dynamic Programming? How does it fit into the solution of an MDP?
A method to solve complex problems by breaking them down into subproblems, solving those smaller problems, and combining their solutions
In reinforcement learning, it leverages value functions to systematically compute good policies when the MDP's model (transition probabilities and rewards) is known
DP breaks into two main routines. What are they? What is the combination of those routines called, and what is its aim?
Policy Evaluation: given a policy, compute its state-value function
Policy Improvement: given an estimated state-value function, derive a better policy
By alternating these (Policy Iteration) or blending them in each sweep (Value Iteration), we converge to the optimal value function and optimal policy
What does solving the state-value function for |S| states essentially amount to? How can we solve it?
A system of |S| linear equations whose unknowns are the values of the states
Instead of solving the linear system directly, we use successive approximation:
1. Initialize the values for each state arbitrarily
2. Iteratively update each state's value with the Bellman expectation backup
3. Stop when the largest update is smaller than a given threshold
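The three steps above can be sketched in plain Python. The transition table `P`, the states, and the rewards below are a hypothetical toy MDP invented for illustration, not a standard benchmark:

```python
# Hypothetical 3-state, 2-action MDP: P[s][a] is a list of
# (probability, next_state, reward) tuples. State 2 is absorbing.
P = {
    0: {0: [(1.0, 1, 0.0)], 1: [(1.0, 0, 0.0)]},
    1: {0: [(1.0, 2, 1.0)], 1: [(1.0, 0, 0.0)]},
    2: {0: [(1.0, 2, 0.0)], 1: [(1.0, 2, 0.0)]},
}
n_states, n_actions, gamma = 3, 2, 0.9

def policy_evaluation(policy, theta=1e-8):
    """Successive approximation of V^pi; policy[s][a] is the
    probability of taking action a in state s."""
    V = [0.0] * n_states                      # 1. arbitrary initialization
    while True:
        delta = 0.0
        for s in range(n_states):             # 2. one sweep of Bellman backups
            v = sum(policy[s][a] * sum(p * (r + gamma * V[s2])
                                       for p, s2, r in P[s][a])
                    for a in range(n_actions))
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < theta:                     # 3. stop below the threshold
            return V

# Evaluate the deterministic policy "always take action 0".
V = policy_evaluation([[1.0, 0.0]] * n_states)
```

With this toy model the values converge to V = [0.9, 1.0, 0.0]: state 1 earns reward 1 and reaches the terminal state, and state 0 earns that discounted by gamma.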
What is Policy Iteration? What are the processes involved in it?
Policy Iteration alternates full evaluation and full improvement
- Initialize an arbitrary policy
- Do Policy Evaluation (by iterative evaluation until convergence) followed by Policy Improvement
- Stop when the policy does not change from one iteration to the next
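The loop above can be sketched as follows, again on a hypothetical toy MDP (the transition table `P` is invented for illustration):

```python
# Hypothetical 3-state, 2-action MDP: P[s][a] = list of (prob, next_state, reward).
# State 2 is absorbing with zero reward.
P = {
    0: {0: [(1.0, 1, 0.0)], 1: [(1.0, 0, 0.0)]},
    1: {0: [(1.0, 2, 1.0)], 1: [(1.0, 0, 0.0)]},
    2: {0: [(1.0, 2, 0.0)], 1: [(1.0, 2, 0.0)]},
}
n_states, n_actions, gamma = 3, 2, 0.9

def q_value(s, a, V):
    """One-step lookahead: expected reward plus discounted successor value."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])

def evaluate(policy, theta=1e-8):
    """Full Policy Evaluation: iterate Bellman backups until convergence."""
    V = [0.0] * n_states
    while True:
        delta = 0.0
        for s in range(n_states):
            v = q_value(s, policy[s], V)
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < theta:
            return V

def improve(V):
    """Policy Improvement: act greedily w.r.t. the current value estimate."""
    return [max(range(n_actions), key=lambda a: q_value(s, a, V))
            for s in range(n_states)]

def policy_iteration():
    policy = [1] * n_states              # arbitrary initial (deterministic) policy
    while True:
        V = evaluate(policy)             # full evaluation
        new_policy = improve(V)          # full improvement
        if new_policy == policy:         # stop: policy is stable
            return policy, V
        policy = new_policy

policy, V = policy_iteration()
```

Starting from the deliberately bad policy "always take action 1", a single evaluation/improvement round already flips every state to action 0, and the next round confirms the policy is stable.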
What is Value Iteration?
It fuses evaluation and improvement into a single update: each sweep backs up the maximum over actions (the Bellman optimality backup), so no separate evaluation-to-convergence phase is needed
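A minimal sketch of this fused update, on the same kind of hypothetical toy MDP used above (the transition table `P` is invented for illustration):

```python
# Hypothetical 3-state, 2-action MDP: P[s][a] = list of (prob, next_state, reward).
P = {
    0: {0: [(1.0, 1, 0.0)], 1: [(1.0, 0, 0.0)]},
    1: {0: [(1.0, 2, 1.0)], 1: [(1.0, 0, 0.0)]},
    2: {0: [(1.0, 2, 0.0)], 1: [(1.0, 2, 0.0)]},
}
n_states, n_actions, gamma = 3, 2, 0.9

def value_iteration(theta=1e-8):
    V = [0.0] * n_states
    while True:
        delta = 0.0
        for s in range(n_states):
            # Evaluation and improvement fused: back up the best action directly.
            v = max(sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
                    for a in range(n_actions))
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < theta:
            break
    # Extract the greedy policy from the converged values.
    policy = [max(range(n_actions),
                  key=lambda a: sum(p * (r + gamma * V[s2])
                                    for p, s2, r in P[s][a]))
              for s in range(n_states)]
    return V, policy

V, policy = value_iteration()
```

Note the contrast with Policy Iteration: there is no explicit policy during the sweeps; the policy is only read off at the end from the converged value function.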