CS7642_Week5 Flashcards

1
Q

What does Dr. Littman suggest is the fundamental difference between RL and other kinds of machine learning?

A

Exploration

2
Q

True/False: The Gittins index is a good way of deciding whether or not to explore?

A

False, unless you’re working with K-armed bandit problems. Outside of bandits, research has shown it generally doesn’t work well.

3
Q

What happens to Hoeffding bounds as we have more data to sample from?

A

The bounds get SMALLER (i.e. we’re more confident in our estimate). This is because the width of the bound shrinks in proportion to 1/sqrt(n), where n is the number of samples.
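
To make the 1/sqrt(n) shrinkage concrete, here is a minimal sketch of the two-sided Hoeffding bound for the mean of n i.i.d. samples. The function name, the default delta of 0.05, and the assumption that samples lie in an interval of length value_range are illustrative choices, not part of the lecture.

```python
import math

def hoeffding_half_width(n, delta=0.05, value_range=1.0):
    """Half-width t such that P(|sample mean - true mean| >= t) <= delta,
    for n i.i.d. samples bounded in an interval of length value_range."""
    if n <= 0:
        raise ValueError("n must be positive")
    return value_range * math.sqrt(math.log(2.0 / delta) / (2.0 * n))

# The interval narrows as the sample count grows.
for n in (10, 100, 1000, 10000):
    print(n, hoeffding_half_width(n))
```

Quadrupling the number of samples halves the width, which is why tightening an estimate gets progressively more expensive.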

4
Q

What is the ‘Simulation Lemma’?

A

The idea that if we have a “pretty good” estimate of the real MDP (its transitions and rewards are close to the real ones), then a policy that optimizes rewards in that estimate is going to be pretty good in the real MDP too.
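
A small numeric illustration of the idea, under stated assumptions: the two-state model and its slightly wrong estimate below are made-up numbers, and for brevity the sketch evaluates one fixed policy in both models rather than optimizing.

```python
def evaluate_policy(P, R, gamma=0.9, sweeps=1000):
    """Iterative policy evaluation for the Markov chain induced by a fixed policy.
    P[s][s2] is the transition probability, R[s] the reward received in state s."""
    n = len(R)
    V = [0.0] * n
    for _ in range(sweeps):
        V = [R[s] + gamma * sum(P[s][s2] * V[s2] for s2 in range(n))
             for s in range(n)]
    return V

# Hypothetical "real" model and a slightly inaccurate estimate of it.
P_true = [[0.90, 0.10], [0.20, 0.80]]
R_true = [0.0, 1.0]
P_est = [[0.88, 0.12], [0.21, 0.79]]
R_est = [0.0, 0.98]

V_true = evaluate_policy(P_true, R_true)
V_est = evaluate_policy(P_est, R_est)
max_gap = max(abs(a - b) for a, b in zip(V_true, V_est))
print("real:", V_true)
print("estimate:", V_est)
print("largest value gap:", max_gap)
```

Small errors in the model produce a small gap in the values, which is the property the lemma relies on.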

5
Q

What is the ‘Explore or Exploit’ Lemma?

A

If all transitions are either accurately estimated or unknown, then the optimal policy for the estimated MDP is either near optimal in the real MDP OR it reaches an unknown state “quickly”.

6
Q

What is the RMax algorithm?

A

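“Optimism in the face of uncertainty”. In essence, it says that anytime we don’t know the outcome of a state-action pair, assume it’s awesome, and behave accordingly. Either the optimism pays off or we learn something new, so eventually we’ll have traversed the unknown edges of the MDP and be left with a (near-)optimal policy.

Algorithm:

  1. Keep track of the MDP (the transitions and rewards observed so far)
  2. Assume any unknown state-action pair yields the maximum reward, “Rmax”
  3. Solve the resulting MDP
  4. Take the action its optimal policy recommends, then repeat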
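
The four steps above can be sketched for the deterministic case, where a single visit makes a state-action pair known. This is an illustrative sketch, not the lecture's code: the function names, the three-state chain environment, and the step budget are all assumptions, and a stochastic MDP would need a visit-count threshold before a pair counts as known.

```python
def value_iteration(n_states, n_actions, model, r_max, gamma=0.9, sweeps=200):
    """Solve the optimistic MDP: known pairs use the learned model,
    unknown pairs keep the value of receiving r_max forever."""
    optimistic = r_max / (1.0 - gamma)
    Q = [[optimistic] * n_actions for _ in range(n_states)]
    for _ in range(sweeps):
        V = [max(q) for q in Q]
        for s in range(n_states):
            for a in range(n_actions):
                if (s, a) in model:
                    r, s2 = model[(s, a)]
                    Q[s][a] = r + gamma * V[s2]
    return Q

def rmax_deterministic(step, n_states, n_actions, r_max, start=0,
                       n_steps=50, gamma=0.9):
    model = {}  # step 1: (state, action) -> (reward, next_state)
    s = start
    for _ in range(n_steps):
        # steps 2 and 3: unknown pairs are worth r_max, then solve
        Q = value_iteration(n_states, n_actions, model, r_max, gamma)
        # step 4: act greedily with respect to the optimistic solution
        a = max(range(n_actions), key=lambda act: Q[s][act])
        r, s2 = step(s, a)
        model[(s, a)] = (r, s2)  # deterministic: one visit is enough
        s = s2
    return model, value_iteration(n_states, n_actions, model, r_max, gamma)

def chain_step(s, a):
    """Hypothetical 3-state chain: action 1 moves right, action 0 resets to
    state 0. Only action 1 in the last state pays a reward of 1."""
    if a == 1:
        return (1.0, 2) if s == 2 else (0.0, s + 1)
    return (0.0, 0)

model, Q = rmax_deterministic(chain_step, n_states=3, n_actions=2, r_max=1.0)
policy = [max(range(2), key=lambda a: Q[s][a]) for s in range(3)]
print("known pairs:", len(model), "greedy policy:", policy)
```

Because untried pairs look maximally rewarding, the greedy policy is drawn toward them, so exploration happens without a separate exploration rule.
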
7
Q

True/False: RMax is an efficient algorithm for deterministic MDPs?

A

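True. For a deterministic MDP, the number of steps RMax needs before it behaves near-optimally is polynomial in the number of states and actions.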

8
Q

Why are Hoeffding bounds important?

A

They help us quantify epistemic uncertainty. In decision making, we’re often uncertain about how accurate our estimates are. Hoeffding bounds tell us how confident we can be in an estimate given the number of samples behind it.
