Final Review pt. 6 Flashcards

1
Q

True or False: Options over an MDP form another MDP. Why?

A

True. Options over an MDP form an SMDP, and an SMDP can in turn be treated as an MDP whose actions are the options, so the result is again an MDP.

2
Q

Can we do Q learning on SMDP? If so, how do we do that?

A

Yes. We use options in place of actions and the discounted cumulative reward collected while the option executes in place of the one-step reward, discounting the value of the termination state by gamma^k for an option that lasted k steps.

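As an illustrative sketch of that update (hypothetical function and variable names, not from the card): the option's reward is the discounted sum of the one-step rewards it collected, and the bootstrap value at the termination state is discounted by gamma**k.

```python
def smdp_q_update(Q, s, o, rewards, s_next, options, alpha=0.1, gamma=0.99):
    """One SMDP Q-learning backup after option o, started in s, has finished.

    Q       : dict mapping (state, option) -> value estimate
    rewards : the k one-step rewards observed while o executed
    s_next  : the state in which o terminated
    options : the options available for comparison at s_next
    """
    k = len(rewards)
    # Discounted return accumulated over the option's k steps.
    r_o = sum(gamma ** i * r for i, r in enumerate(rewards))
    # Bootstrap from the best option value at the termination state,
    # discounted by gamma**k because k steps of time have elapsed.
    best_next = max(Q.get((s_next, o2), 0.0) for o2 in options)
    target = r_o + gamma ** k * best_next
    old = Q.get((s, o), 0.0)
    Q[(s, o)] = old + alpha * (target - old)
    return Q
```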
3
Q

Is value iteration guaranteed to converge to the optimal policy for an SMDP?

A

No. With a particular choice of options there is no guarantee that value iteration ends up with the optimal policy of the underlying MDP; it converges only to the best policy expressible with those options.

4
Q

Is value iteration guaranteed to converge for an SMDP?

A

Yes, value iteration is guaranteed to converge for an SMDP, but not necessarily to the optimal policy of the underlying MDP (see the previous card).

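For reference, a sketch of the value-iteration backup over a set of options O(s), written in standard options-framework notation (the symbols R and F are assumed here, not quoted from the card):

```latex
% SMDP value iteration: the backup maximizes over the options available in s.
% R(s,o) is the expected discounted reward accumulated while o runs, and
% F(s' | s, o) = \sum_k \gamma^k P(s', k | s, o) folds the option's random
% duration k into a discount-weighted transition model.
V_{t+1}(s) \;=\; \max_{o \in O(s)} \Big[ R(s, o) \;+\; \sum_{s'} F(s' \mid s, o)\, V_t(s') \Big]
```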
5
Q

Does SMDP generalize over MDP?

A

Yes, an SMDP is a generalization of an MDP: in an SMDP, the amount of time between successive actions is variable rather than a single fixed step.

6
Q

How is SMDP defined? Why is it called semi-Markov?

A

SMDPs are Markov Decision Processes whose actions are options rather than discrete “atomic” actions; an option is allowed to take a variable amount of time rather than a single discrete step. It is called semi-Markov because the Markov property holds only at the decision points where an option is chosen or terminates, not at every intermediate time step during an option’s execution.

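A sketch of the quantities that make the time step variable, in standard options-framework notation (assumed here, not quoted from the card):

```latex
% SMDP dynamics: starting option o in state s, the process terminates in
% state s' after a random number of steps k.
P(s', k \mid s, o) \;=\; \Pr\{\, s_{t+k} = s',\ o \text{ terminates at } t+k \mid s_t = s,\ o \,\}
% Expected discounted reward accumulated while the option runs:
R(s, o) \;=\; \mathbb{E}\big[\, r_{t+1} + \gamma r_{t+2} + \cdots + \gamma^{k-1} r_{t+k} \mid s_t = s,\ o \,\big]
```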
7
Q

How is an option defined?

A

An option is a triple ⟨I, π, β⟩: I = the initiation set (the states in which the option can be started), π = the policy followed while the option executes, and β = the termination condition (for each state, the probability that the option terminates there).

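A minimal sketch of that triple as a data structure (names are illustrative, not from the card):

```python
from dataclasses import dataclass
from typing import Callable, Hashable, Set

State = Hashable
Action = Hashable

@dataclass
class Option:
    # I: states in which the option may be initiated.
    initiation_set: Set[State]
    # pi: the intra-option policy followed while the option executes.
    policy: Callable[[State], Action]
    # beta: termination condition, probability of terminating in a given state.
    termination: Callable[[State], float]

    def can_start(self, s: State) -> bool:
        return s in self.initiation_set
```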