CS7642_Week2 Flashcards

1
Q

How do we evaluate a learner?

A
  1. Value of the returned policy
  2. Computational complexity (time)
  3. Experience complexity (i.e., how much data it needs)
2
Q

What are the 3 “classes” of solution methods for solving RL problems? (bonus: what category do TD methods fall into?)

A
  1. Model-based
  2. Value-based (TD methods fall into this)
  3. Policy-based
3
Q

What properties must the learning rate have for RL?

A
  1. The SUM of all learning rate values must be infinite (so updates can move the estimate arbitrarily far)
  2. The SUM OF SQUARES of the learning rate values must be finite (so the noise in the updates dies out)
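These are the Robbins-Monro conditions. A quick numeric sketch (helper names here are illustrative, not course code) shows why the common choice alpha_t = 1/t satisfies both:

```python
# Illustrative check of the two learning-rate conditions for alpha_t = 1/t.
# This is a numeric sketch, not a proof.

def partial_sums(alpha, n):
    """Return (sum of alpha_t, sum of alpha_t^2) for t = 1..n."""
    s, s2 = 0.0, 0.0
    for t in range(1, n + 1):
        a = alpha(t)
        s += a
        s2 += a * a
    return s, s2

# For alpha_t = 1/t: the plain sum keeps growing (harmonic series diverges),
# while the sum of squares levels off near pi^2 / 6 ~= 1.6449 (converges).
s_small, sq_small = partial_sums(lambda t: 1.0 / t, 1_000)
s_big, sq_big = partial_sums(lambda t: 1.0 / t, 100_000)
print(s_small, s_big)    # plain sum keeps increasing with n
print(sq_small, sq_big)  # squared sum has essentially stopped growing
```

A constant learning rate, by contrast, satisfies the first condition but not the second, which is why it tracks noise forever instead of converging.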

4
Q

Name some of the differences between TD(0) and TD(1)

A

TD(0):

  • Slow to propagate information
  • High bias, low variance
  • Maximum likelihood estimate (MLE)

TD(1):

  • Equivalent to MC, samples full trajectories
  • Requires a full trajectory before it can update
  • Low bias, high variance
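A minimal sketch of the contrast, with illustrative names (V, gamma, alpha are assumptions, not course code): TD(0) bootstraps from the next state's value after every single step, while the TD(1)/Monte Carlo target needs the whole trajectory's rewards before it can compute anything.

```python
# TD(0): one-step bootstrapped update -- can be applied online, every step.
def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.99):
    """Update V[s] toward the one-step target r + gamma * V[s_next]."""
    td_error = r + gamma * V[s_next] - V[s]
    V[s] = V[s] + alpha * td_error
    return V

# TD(1) / Monte Carlo target: needs the full reward sequence of an episode.
def mc_return(rewards, gamma=0.99):
    """Discounted return G computed backward over a complete trajectory."""
    G = 0.0
    for r in reversed(rewards):
        G = r + gamma * G
    return G

V = {"A": 0.0, "B": 1.0}
V = td0_update(V, "A", r=0.5, s_next="B")  # moves V["A"] toward 0.5 + 0.99 * 1.0
```

The single-step bootstrap is what makes TD(0) low variance but biased (it trusts the current estimate of V[s_next]); the full-return target is unbiased but high variance (it sums many noisy rewards).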
5
Q

What values of lambda tend to work well (empirically speaking) when used in TD(lambda)?

A

0.3-0.7

6
Q

Does Q-learning always converge? If so, what does it converge to?

A

Yes. Provided the learning rate satisfies the usual conditions (sum infinite, sum of squares finite) and every state-action pair is updated infinitely often, Q-learning converges to the optimal action-value function Q*
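For reference, this convergence result applies to the tabular Q-learning update below. A hedged sketch, with made-up state/action names and hyperparameters for illustration:

```python
# Tabular Q-learning update sketch: Q(s,a) moves toward
# r + gamma * max_a' Q(s', a'). With Robbins-Monro learning rates and
# every (s, a) pair updated infinitely often, Q converges to Q*.
def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    best_next = max(Q[(s_next, a2)] for a2 in actions)  # greedy bootstrap
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return Q

actions = ["left", "right"]
Q = {(s, a): 0.0 for s in ["s0", "s1"] for a in actions}
Q = q_update(Q, "s0", "right", r=1.0, s_next="s1", actions=actions)
```

Note the max over next actions: the update learns about the greedy policy regardless of which exploratory action is actually taken, which is why Q-learning is off-policy.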

7
Q

What are contractions and non-expansions?

A

A mapping B is a contraction if it shrinks distances by some factor 0 <= gamma < 1, i.e. ||BF - BG|| <= gamma * ||F - G|| for all F, G; this guarantees a unique fixed point and that repeated application (e.g., value iteration with the Bellman operator) converges to it. A non-expansion only guarantees distances never grow: ||BF - BG|| <= ||F - G||. (TODO: rewatch lesson 5 on convergence for the conceptual details)

8
Q

What things are contraction mappings / non-expansions?

A

Order statistics (e.g., the max operator), FIXED convex combinations
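As a quick numeric illustration (not a proof) of the order-statistics case: the max operator is a non-expansion under the sup norm, i.e. |max(a) - max(b)| <= max_i |a_i - b_i|. The helper names below are made up for the example.

```python
# Illustrating that max is a non-expansion: the gap between the maxima
# of two vectors never exceeds the largest componentwise gap.
def max_gap(a, b):
    """Distance between the maxima of two vectors."""
    return abs(max(a) - max(b))

def sup_dist(a, b):
    """Sup-norm (largest componentwise) distance between two vectors."""
    return max(abs(x - y) for x, y in zip(a, b))

a = [1.0, 5.0, 3.0]
b = [2.0, 4.5, 6.0]
print(max_gap(a, b), sup_dist(a, b))  # 1.0 <= 3.0
```

This is the property that lets the max in the Bellman optimality operator be composed with the discounted expectation (a contraction for gamma < 1) while keeping the whole operator a contraction.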
