CS7642_Week7 Flashcards

1
Q

“Exploration is expectation” in Bayesian RL? (True/False)

A

True. In Bayesian RL we update our posterior beliefs as we act, so exploration falls out of simply maximizing expected value under those beliefs; as the beliefs absorb more and more “doses of reality” (i.e. the underlying dynamics of the problem), we end up with an optimal solution.

2
Q

What is PSR?

A

PSR :: Predictive State Representation. Essentially, I only care about states that allow me to make some concrete prediction about the world.

Key idea is that we may not ever be able to know the ground truth of what state we’re in, but we can run tests and track outcomes to ground our belief of what state we’re in with empirical evidence.

3
Q

A PSR can represent any POMDP? (True/False)

A

True. PSRs are really just a more philosophically palatable representation of POMDPs that dispenses with the notion of hidden/unobservable states.

4
Q

States and predictors are the same thing? (True/False)

A

True. Or at least more or less true in the context of RL/machine learning. We only really care about a state or feature insofar as it allows us to make some sort of prediction that has a basis in reality.

5
Q

POMDPs generalize regular MDPs? (True/False)

A

True. A POMDP in which the observation fully reveals the underlying state is just a regular MDP; more generally, POMDPs give us a way of talking about environments that look “non-Markov” from the agent’s point of view.

6
Q

The definition of a POMDP is a tuple (S, A, Z, T, R, O)? (True/False)

A

True. Z is the set of observables (i.e. the things the agent actually sees), and O is the observation function: O(s, z) gives the probability of observing z when the actual underlying state of the process is s.
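
A minimal sketch of that tuple as plain Python types can make each element’s role concrete. This is not from the lectures; the field names and type choices are my own, and the callables could just as well be tables or matrices.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

@dataclass
class POMDP:
    states: Sequence[str]                # S: underlying (hidden) states
    actions: Sequence[str]               # A: actions available to the agent
    observables: Sequence[str]           # Z: what the agent can actually see
    T: Callable[[str, str, str], float]  # T(s, a, s'): transition probability
    R: Callable[[str, str], float]       # R(s, a): reward
    O: Callable[[str, str], float]       # O(s, z): probability of seeing z in state s
```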

7
Q

We have to expand out our notion of “states” in order for a POMDP to generalize a regular MDP? (True/False)

A

True. We expand it by defining states to be “belief” states b(s) that represent a probability distribution over the underlying states we might be in.
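
As a hedged illustration (not lecture code), here is a minimal Bayes-filter belief update for a tiny made-up POMDP; the transition and observation matrices below are invented purely for the example.

```python
import numpy as np

def update_belief(b, a, z, T, O):
    """Bayes-filter update: b'(s') is proportional to O[s', z] * sum_s T[a][s, s'] * b[s]."""
    predicted = T[a].T @ b               # predict where we land after taking action a
    unnormalized = O[:, z] * predicted   # weight by how likely observation z is there
    return unnormalized / unnormalized.sum()

# Two underlying states, two actions, two observations (numbers made up).
T = np.array([[[0.9, 0.1],               # T[a][s, s']
               [0.2, 0.8]],
              [[0.5, 0.5],
               [0.5, 0.5]]])
O = np.array([[0.8, 0.2],                # O[s, z]
              [0.3, 0.7]])
b = np.array([0.5, 0.5])                 # start maximally uncertain
print(update_belief(b, a=0, z=1, T=T, O=O))
```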

8
Q

In a POMDP, the reward is encoded into the observable Z, i.e. r = f(z)? (True/False)

A

True. In a POMDP the reward isn’t observed directly.

9
Q

POMDPs are difficult to solve because they contain an infinite number of states? (True/False)

A

True. Because we’re working in “belief state space”, the states are probability distributions over the underlying states, and there are infinitely many possible beliefs, so we can’t simply enumerate states the way we do in a tabular MDP.

10
Q

We can make the infinite belief state space of a POMDP computationally tractable by performing value iteration with value functions that are Piecewise Linear and Convex (a maximum over linear functions)? (True/False)

A

True. Think of a two-state POMDP: the belief space is just a line segment (the probability of being in state A vs. state B). Each candidate plan’s value is a linear function over that segment, so a single linear function covers infinitely many belief states at once. The value function is the maximum over those linear functions, which forms a convex, upward-opening piecewise-linear surface, and value iteration only has to track the finite set of linear pieces.
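
A minimal sketch of the PWLC idea (my own toy numbers, not from the lectures): each plan is summarized by an “alpha vector” of per-state values, and the value of a belief is the maximum over plans of the dot product of the belief with that plan’s alpha vector.

```python
import numpy as np

# Each row is a hypothetical plan's value in the two underlying states.
alpha_vectors = np.array([
    [1.0, 0.0],   # plan that pays off only if we are really in state A
    [0.0, 1.0],   # plan that pays off only if we are really in state B
    [0.6, 0.6],   # "hedging" plan that does okay either way
])

def pomdp_value(belief):
    """Value of a belief: max over linear functions of the belief, hence PWLC."""
    return np.max(alpha_vectors @ belief)

for p_state_a in (0.0, 0.25, 0.5, 0.75, 1.0):
    belief = np.array([p_state_a, 1.0 - p_state_a])
    print(p_state_a, pomdp_value(belief))
```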

11
Q

What do we call a model that is (1) Partially Observed and (2) Controlled?

A

A POMDP

12
Q

What do we call a model that is (1) Observed and (2) Controlled?

A

A regular MDP

13
Q

What do we call a model that is (1) Observed and (2) Uncontrolled?

A

Markov chain

14
Q

What do we call a model that is (1) Partially Observed and (2) Uncontrolled?

A

Hidden Markov Model (didn’t really talk about this in the course)

15
Q

In Bayesian RL, we can think about RL as a POMDP itself? (True/False)

A

True. It turns RL into pure planning: the hidden state becomes the unknown parameters of the MDP (its transition and reward structure), and our belief is a posterior over those parameters.
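
A minimal sketch of that idea on the simplest possible case, a Bayes-adaptive two-armed Bernoulli bandit (my own example, not from the lectures): the hidden state is each arm’s unknown success probability, the belief state is a Beta posterior per arm, and planning over belief states gives the Bayes-optimal policy without any separate exploration mechanism.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def value(beliefs, horizon):
    """Expected total reward from this belief state with `horizon` pulls remaining.

    `beliefs` is a tuple of (alpha, beta) Beta-posterior parameters, one per arm.
    """
    if horizon == 0:
        return 0.0
    best = 0.0
    for arm, (alpha, beta) in enumerate(beliefs):
        p_success = alpha / (alpha + beta)   # posterior-predictive chance this arm pays off
        # Successor belief states: the posterior updated by the observed outcome.
        win = tuple((a + 1, b) if i == arm else (a, b) for i, (a, b) in enumerate(beliefs))
        lose = tuple((a, b + 1) if i == arm else (a, b) for i, (a, b) in enumerate(beliefs))
        q = p_success * (1.0 + value(win, horizon - 1)) \
            + (1.0 - p_success) * value(lose, horizon - 1)
        best = max(best, q)
    return best

# Both arms start with a uniform Beta(1, 1) prior; plan 10 pulls ahead.
print(value(((1, 1), (1, 1)), 10))
```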

16
Q

Exploitation in belief space is the same as exploring, so the exploration vs. exploitation conundrum goes away in Bayesian RL? (True/False)

A

True. Acting greedily with respect to the value function over belief states already accounts for the value of the information an action will reveal, so a separate exploration mechanism isn’t needed.

17
Q

Solutions to Bayesian RL problems can be approximated as Piecewise Linear and Convex? (True/False)

A

False. They are Piecewise Polynomial and Convex.