W10 Future RL & Wrap-up Flashcards

1
Q

Tabular RL key words

A

MDP, grid world, CartPole, tabular Q-learning, exploration-exploitation trade-off, on-policy vs. off-policy learning
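
The tabular Q-learning and epsilon-greedy exploration keywords above can be sketched together; this is a minimal toy example (a hypothetical 1-D grid world, not from the course material):

```python
import random

# Tabular Q-learning on a toy 1-D grid world: reward 1 for reaching the
# rightmost state. Q[s][a] is a plain table; action 0 = left, 1 = right.
N_STATES = 5
GAMMA, ALPHA, EPS = 0.9, 0.5, 0.1
Q = [[0.0, 0.0] for _ in range(N_STATES)]

def step(s, a):
    """One environment transition; episode ends at the rightmost state."""
    s2 = max(0, min(N_STATES - 1, s + (1 if a == 1 else -1)))
    return s2, (1.0 if s2 == N_STATES - 1 else 0.0), s2 == N_STATES - 1

for _ in range(500):                     # episodes
    s, done = 0, False
    while not done:
        # epsilon-greedy action choice: the exploration-exploitation trade-off
        a = random.randrange(2) if random.random() < EPS else int(Q[s][1] >= Q[s][0])
        s2, r, done = step(s, a)
        # off-policy TD update: bootstrap with the greedy value of s2,
        # regardless of which action the behavior policy actually takes next
        Q[s][a] += ALPHA * (r + GAMMA * max(Q[s2]) - Q[s][a])
        s = s2
```

The `max(Q[s2])` in the target (rather than the value of the action actually taken) is what makes Q-learning off-policy; SARSA, the on-policy counterpart, would use the next behavior action instead.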

2
Q

1) What does deep learning bring to RL?
2) Pros and cons of model-free methods?

A

1) Techniques to break correlations and improve convergence: an experience replay buffer and a separate target network.
2) Model-free algorithms often reach high-quality optima, but they have high sample complexity.
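
The two stabilizers named in the answer can be sketched mechanically; this is a toy illustration (a single scalar weight stands in for the network, and the transitions are hypothetical), not a full DQN:

```python
import random
from collections import deque

# Replay buffer + target network, the two DQN stabilizers:
# - the buffer breaks temporal correlations by sampling past transitions uniformly
# - the target network keeps the TD target fixed between periodic syncs
buffer = deque(maxlen=10_000)            # stores (s, r, s2) transitions
online_w, target_w = 0.0, 0.0            # stand-in "networks": one weight each
GAMMA, LR, SYNC_EVERY, BATCH = 0.99, 0.01, 100, 32

def q(w, s):                             # toy linear Q-value
    return w * s

for t in range(1, 1001):
    s = random.random()
    s2, r = random.random(), s           # hypothetical transition, reward = s
    buffer.append((s, r, s2))

    if len(buffer) >= BATCH:
        batch = random.sample(buffer, BATCH)        # decorrelated minibatch
        for s, r, s2 in batch:
            target = r + GAMMA * q(target_w, s2)    # target net, not online net
            online_w += LR * (target - q(online_w, s)) * s  # SGD on TD error
    if t % SYNC_EVERY == 0:
        target_w = online_w              # periodic hard sync of target network
```

Without the sync delay, the TD target would move with every gradient step, which is one source of the instability the answer refers to.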

3
Q

Model-based method summary

A

Combine planning and learning to improve sample efficiency.
For high-dimensional environments: use uncertainty modeling and latent models/world models to reduce the dimensionality of the planning problem.
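
The "combine planning and learning" idea is classically illustrated by Dyna-Q; a minimal sketch on a hypothetical 1-D chain (toy example, not from the course material):

```python
import random

# Dyna-Q sketch: each real transition trains both the Q-table and a learned
# model; extra planning updates replay model samples, squeezing more learning
# out of every real environment step (better sample efficiency).
N, GAMMA, ALPHA, EPS, PLAN_STEPS = 5, 0.9, 0.5, 0.1, 10
Q = [[0.0, 0.0] for _ in range(N)]
model = {}                               # (s, a) -> (r, s2): deterministic model

def env_step(s, a):
    s2 = max(0, min(N - 1, s + (1 if a else -1)))
    return (1.0 if s2 == N - 1 else 0.0), s2

for _ in range(200):
    s = 0
    while s != N - 1:
        a = random.randrange(2) if random.random() < EPS else int(Q[s][1] >= Q[s][0])
        r, s2 = env_step(s, a)
        Q[s][a] += ALPHA * (r + GAMMA * max(Q[s2]) - Q[s][a])
        model[(s, a)] = (r, s2)          # learning: fit the model to real data
        # planning: replay imagined transitions drawn from the learned model
        for _ in range(PLAN_STEPS):
            (ps, pa), (pr, ps2) = random.choice(list(model.items()))
            Q[ps][pa] += ALPHA * (pr + GAMMA * max(Q[ps2]) - Q[ps][pa])
        s = s2
```

World-model approaches apply the same planning-on-a-learned-model idea, but plan in a learned low-dimensional latent space instead of a table.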

4
Q

Main challenge of DRL?

A

Manage the combinatorial explosion that occurs when a sequence of decisions is chained together. Finding the right inductive bias makes it possible to exploit structure in this state space.

3 major challenges
1. Solving larger problems faster (reduce sample complexity with latent models, curriculum learning in self-play, transfer learning, meta-learning, better exploration through intrinsic motivation)
2. More agents (hierarchical methods, population-based self-play leagues)
3. Human interaction (explainable AI, generalization)
