Review Session #4 Flashcards Preview

CS 7642 - OMSCS (Final Exam) > Review Session #4 > Flashcards

Flashcards in Review Session #4 Deck (9)
Loading flashcards...

True or False: Since SARSA is an “online” algorithm, it will never learn the optimal policy for an environment.

False. As noted: "online" does not matter, SARSA converges with infinite exploration at the limit and a greedy policy.


True or False: RL with linear function approximation only works on toy problems.

False, by doing feature engineering can be used broadly. Add non-linear features and you can cover a lot of problems.


True or False: KWIK algorithms are an improvement of MB (mistake bound) algorithms and as such will have same or better time complexity for all classes of problems.

False: "However, some hypothesis classes are exponentially
harder to learn in the KWIK setting than
in the MB setting. An example is conjunctions of n
Boolean variables, in which MB algorithms can guess
“false” when uncertain and learn with n+1 mistakes,
but a KWIK algorithm may need
(2n/2) [I don't know]s to acquire
the negative examples required to capture the
target hypothesis."


True or False: With a classic update using linear function approximation, we will always converge to some values, but they may not be optimal.

False we will always converge to a global optima but the linear function approximation may not be suitable for non-linear domains. Baird’s star problem is not guaranteed to converge.


True or False: Residual algorithms are always guaranteed to converge, but often do so too slowly to be practical.

True: Depending on the choice of psi, the residual algorithm shows this behavior. Residual algorithms are a generalization of residual gradient algorithms.


True or False: Applying generalization with an "averager" on an MDP results in another MDP.

True, any generalization of an MDP would result in another MDP.


True or False: Problems that can be represented as POMDPs cannot be represented as MDPs.

False. You can go from an POMDP over some state-space to a larger MDP over a belief-space.


True or False: The only algorithms that work in POMDPs are planning algorithms.

False: Lots of algorithms will work here that involve using the POMDP: qlearning, sarsa, etc... Certain POMDPs could even be solved by using the PSR. Which I thought was a good example, but, not mentioned in this thread.

Noting they are undecidable for POMDPs in general is a fine answer as well.


True or False: We can represent any POMDP as a PSR, however some classes of POMDPs will be too large to deal with as PSRs.

False. Lot of algorithms will work here. PSR as noted is a great counter example.

PSR theorems shows that any POMDP can be represented as a PSR with no more than n tests. So they can be reasonably converted to PSRs.