POMDPs Flashcards

1
Q

What is a POMDP?

A

A partially observable MDP. A way of modeling environments that look “non-Markov” to the agent because the underlying state cannot be observed directly; the agent only receives observations correlated with the state.

2
Q

Given the following definition of an MDP and a POMDP how can one convert an MDP to a POMDP?
MDP = (S, A, T, R)
POMDP = (S, A, Z, T, R, O)

A

S = S, A = A, Z = S, T = T, R = R
O(s, z) = 1 if z == s
O(s, z) = 0 if z != s
In other words, the observations are the states themselves, and every state is observed perfectly.
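A minimal sketch of this mapping, assuming a toy MDP held in plain Python containers (all names and numbers are made up for illustration; this is not any particular library's API):

```python
# A toy MDP with placeholder dynamics and rewards.
mdp = {
    "S": ["s0", "s1", "s2"],        # states
    "A": ["left", "right"],         # actions
    "T": lambda s, a, s2: 1.0 / 3,  # placeholder transition probabilities
    "R": lambda s, a: 0.0,          # placeholder rewards
}

def mdp_to_pomdp(mdp):
    """Wrap an MDP as a POMDP whose observations reveal the state exactly."""
    return {
        "S": mdp["S"],
        "A": mdp["A"],
        "Z": mdp["S"],  # observations = states
        "T": mdp["T"],
        "R": mdp["R"],
        # O(s, z) = 1 if the observation equals the state, else 0
        "O": lambda s, z: 1.0 if s == z else 0.0,
    }

pomdp = mdp_to_pomdp(mdp)
assert pomdp["O"]("s1", "s1") == 1.0 and pomdp["O"]("s1", "s2") == 0.0
```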

3
Q

True/False: POMDPs generalize MDPs.

A

True. Any MDP can be converted to a POMDP using the mapping above (observations that reveal the state exactly).

4
Q

True/False: We cannot use VI/PI/LP in POMDPs because the belief space contains infinitely many states.

A

False. Although the belief space is continuous (infinitely many belief states), the optimal value function over beliefs is piecewise linear and convex: it is the max over a finite set of linear functions of the belief (alpha-vectors). Value iteration and related methods can therefore operate on this finite representation instead of enumerating belief states.
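A minimal sketch of the piecewise linear convex representation, assuming a 2-state POMDP; the alpha-vectors and belief below are made-up numbers purely for illustration:

```python
import numpy as np

# Each alpha-vector defines a linear value function over beliefs: V_alpha(b) = alpha . b
alphas = np.array([
    [1.0, 0.0],
    [0.0, 1.0],
    [0.6, 0.6],
])

def value(belief, alphas):
    """Piecewise linear convex value function: the max over the alpha-vectors."""
    return np.max(alphas @ belief)

b = np.array([0.3, 0.7])  # a belief: probability distribution over the 2 states
print(value(b, alphas))   # 0.7 -- the second alpha-vector dominates at this belief
```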

5
Q

What are the requirements for purging a vector when representing the POMDP value function as a finite piecewise linear convex combination of vectors?

A

The vector must never take part in the max, i.e., it must be dominated at every belief by the other vectors. Pointwise-dominated vectors are easy to purge: since each vector is linear in the belief, it suffices to compare them at the endpoints (corners) of the belief simplex.
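A minimal sketch of the easy purge (pointwise domination), assuming alpha-vectors stored as NumPy rows; the numbers are made up for illustration and this check does not catch vectors dominated only by combinations of others:

```python
import numpy as np

def purge_pointwise_dominated(alphas):
    """Remove alpha-vectors that are pointwise dominated by some other single vector.

    A vector that is <= another vector in every component can never achieve
    the max at any belief, so it can be safely purged.
    """
    keep = []
    for i, a in enumerate(alphas):
        dominated = any(
            j != i and np.all(alphas[j] >= a) and np.any(alphas[j] > a)
            for j in range(len(alphas))
        )
        if not dominated:
            keep.append(a)
    return np.array(keep)

# Made-up example: the last vector lies below [2, 2] everywhere, so it is purged.
alphas = np.array([[2.0, 2.0], [3.0, 0.0], [1.0, 1.5]])
print(purge_pointwise_dominated(alphas))  # keeps [2, 2] and [3, 0]
```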

6
Q

Categorize learning models based on the following criteria: observed/partially observed, controlled/uncontrolled.

A

Observed, controlled - MDP
Observed, uncontrolled - Markov Chain
Partially observed, controlled - POMDP
Partially observed, uncontrolled - Hidden Markov Model

7
Q

What are the benefits of framing RL as a POMDP?

A

It naturally resolves the explore/exploit dilemma. The unknown MDP becomes part of the hidden state, so the single goal of maximizing reward already covers both exploration (taking actions that reveal which “state”/MDP we are in) and exploitation (taking actions that earn reward in it).

8
Q

Under what framework can we view RL as a planning problem?

A

We can view RL as a planning problem in a POMDP whose hidden state includes the unknown parameters of the underlying MDP.

9
Q

Is it possible to reduce the continuous POMDP representation of RL to a piecewise convex polynomial representation?

A

Yes; algorithms such as BEETLE do this, but they tend to be too computationally expensive to be useful in practice.

10
Q

What is predictive state representation (PSR)?

A

A way of representing state as a set of predictions about the future: the probabilities that various “tests” (sequences of actions and observations) would succeed given the history so far.
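A minimal sketch of the data structure, assuming tests are stored as tuples of (action, expected observation) pairs; the action/observation names and probabilities are invented purely for illustration:

```python
# A "test" is a sequence of (action, expected observation) pairs.
# The predictive state holds, for each test, the probability that the test's
# observations would occur if its actions were executed from the current history.
tests = [
    (("open-left", "growl"),),
    (("listen", "growl-left"), ("listen", "growl-left")),
]

predictive_state = {
    tests[0]: 0.35,  # p(growl | history, do open-left)
    tests[1]: 0.60,  # p(growl-left twice | history, do listen twice)
}
```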

11
Q

True/False: If we know the predictive state in a PSR, we can determine the belief state.

A

True if we have enough tests. Without enough tests there may not be a unique mapping.

12
Q

True/False: If we know the belief state, we can determine the predictive state.

A

True
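This direction is straightforward because, given the hidden state, the history is irrelevant, so the probability of a test under a belief is the belief-weighted average of its probability from each state. A small sketch with made-up numbers (states, tests, and probabilities are purely illustrative):

```python
import numpy as np

# U[s, t] = probability that test t succeeds when the hidden state is s
# (3 states, 3 tests).
U = np.array([
    [0.9, 0.1, 0.5],
    [0.2, 0.8, 0.5],
    [0.4, 0.4, 0.7],
])

def predictive_state(belief, U):
    """Map a belief over hidden states to the vector of test probabilities.

    p(test | belief) = sum_s belief(s) * p(test | s), i.e. U^T b (a linear map).
    """
    return U.T @ belief

b = np.array([0.5, 0.25, 0.25])
print(predictive_state(b, U))  # [0.6, 0.35, 0.55]
```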

13
Q

What is the PSR Theorem?

A

Any n-state POMDP can be represented by a PSR with no more than n tests, each of which is no longer than n steps long.

14
Q

Why use PSR?

A

A PSR can be easier to learn than a POMDP under certain conditions, although in general POMDPs that are easy to learn are also easy as PSRs, and hard POMDPs are also hard as PSRs. From a philosophical perspective, it is also appealing to represent the world purely in terms of observable quantities.
