Final Review pt. 6 Flashcards

1
Q

True or False: Options over an MDP form another MDP. Why?

A

True. Options over an MDP form an SMDP, and an SMDP can in turn be treated as an MDP whose actions are the options, so the result is again an MDP.

2
Q

Can we do Q learning on SMDP? If so, how do we do that?

A

Yes. We use options in place of actions and the discounted cumulative reward collected while the option executes in place of the one-step reward, discounting the value of the termination state by gamma^k for an option that lasted k steps.

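As an illustrative sketch of that update (hypothetical function and variable names, not from the card): the option's reward is the discounted sum of the one-step rewards it collected, and the bootstrap value at the termination state is discounted by gamma**k.

```python
def smdp_q_update(Q, s, o, rewards, s_next, options, alpha=0.1, gamma=0.99):
    """One SMDP Q-learning backup after option o, started in s, has finished.

    Q       : dict mapping (state, option) -> value estimate
    rewards : the k one-step rewards observed while o executed
    s_next  : the state in which o terminated
    options : the options available for comparison at s_next
    """
    k = len(rewards)
    # Discounted return accumulated over the option's k steps.
    r_o = sum(gamma ** i * r for i, r in enumerate(rewards))
    # Bootstrap from the best option value at the termination state,
    # discounted by gamma**k because k steps of time have elapsed.
    best_next = max(Q.get((s_next, o2), 0.0) for o2 in options)
    target = r_o + gamma ** k * best_next
    old = Q.get((s, o), 0.0)
    Q[(s, o)] = old + alpha * (target - old)
    return Q
```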
3
Q

Is value iteration guaranteed to converge to the optimal policy for an SMDP?

A

No. With a particular choice of options there is no guarantee that value iteration ends up with the optimal policy of the underlying MDP; it converges only to the best policy expressible with those options.

4
Q

Is value iteration guaranteed to converge for an SMDP?

A

Yes, value iteration is guaranteed to converge for an SMDP, but not necessarily to the optimal policy of the underlying MDP (see the previous card).

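For reference, a sketch of the value-iteration backup over a set of options O(s), written in standard options-framework notation (the symbols R and F are assumed here, not quoted from the card):

```latex
% SMDP value iteration: the backup maximizes over the options available in s.
% R(s,o) is the expected discounted reward accumulated while o runs, and
% F(s' | s, o) = \sum_k \gamma^k P(s', k | s, o) folds the option's random
% duration k into a discount-weighted transition model.
V_{t+1}(s) \;=\; \max_{o \in O(s)} \Big[ R(s, o) \;+\; \sum_{s'} F(s' \mid s, o)\, V_t(s') \Big]
```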
5
Q

Does SMDP generalize over MDP?

A

Yes, an SMDP is a generalization of an MDP: in an SMDP, the amount of time between successive actions is variable rather than a single fixed step.

6
Q

How is SMDP defined? Why is it called semi-Markov?

A

SMDPs are Markov Decision Processes whose actions are options rather than discrete “atomic” actions; an option is allowed to take a variable amount of time rather than a single discrete step. It is called semi-Markov because the Markov property holds only at the decision points where an option is chosen or terminates, not at every intermediate time step during an option’s execution.

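A sketch of the quantities that make the time step variable, in standard options-framework notation (assumed here, not quoted from the card):

```latex
% SMDP dynamics: starting option o in state s, the process terminates in
% state s' after a random number of steps k.
P(s', k \mid s, o) \;=\; \Pr\{\, s_{t+k} = s',\ o \text{ terminates at } t+k \mid s_t = s,\ o \,\}
% Expected discounted reward accumulated while the option runs:
R(s, o) \;=\; \mathbb{E}\big[\, r_{t+1} + \gamma r_{t+2} + \cdots + \gamma^{k-1} r_{t+k} \mid s_t = s,\ o \,\big]
```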
7
Q

How is an option defined?

A

An option is a triple ⟨I, π, β⟩: I = the initiation set (the states in which the option can be started), π = the policy followed while the option executes, and β = the termination condition (for each state, the probability that the option terminates there).

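A minimal sketch of that triple as a data structure (names are illustrative, not from the card):

```python
from dataclasses import dataclass
from typing import Callable, Hashable, Set

State = Hashable
Action = Hashable

@dataclass
class Option:
    # I: states in which the option may be initiated.
    initiation_set: Set[State]
    # pi: the intra-option policy followed while the option executes.
    policy: Callable[[State], Action]
    # beta: termination condition, probability of terminating in a given state.
    termination: Callable[[State], float]

    def can_start(self, s: State) -> bool:
        return s in self.initiation_set
```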