7. Current Challenges and Future Research Directions Flashcards

(12 cards)

1
Q

How do model-free methods fare with sample efficiency?

A

Many advanced model-free methods still demand tens to hundreds of millions of frames to surpass human expert performance on tasks like Atari games, which makes many real-world problems impractical to solve within reasonable timeframes.

2
Q

How do on-policy methods fare with sample efficiency?

A

On-policy methods, by their nature, suffer from poor sample efficiency because any changes in policy necessitate the collection of entirely new samples.
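
A minimal sketch of this below (a hypothetical one-step environment and sigmoid policy, not from the source): the batch gathered under the current policy becomes stale the moment the policy is updated, so every iteration pays for a fresh round of environment interaction.

    # Toy on-policy loop: data is discarded after every policy update.
    import math, random

    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    def collect_rollout(theta, n_steps=200):
        # One-state bandit stand-in: action 1 pays +1, action 0 pays 0.
        batch = []
        for _ in range(n_steps):
            a = 1 if random.random() < sigmoid(theta) else 0
            r = 1.0 if a == 1 else 0.0
            batch.append((a, r))
        return batch

    theta = 0.0
    for update in range(5):
        batch = collect_rollout(theta)      # must be gathered under the CURRENT policy
        grad = sum(r * (a - sigmoid(theta)) for a, r in batch) / len(batch)
        theta += 0.5 * grad                 # the policy changes here ...
        # ... so `batch` is now stale (off-policy) and is thrown away;
        # the next iteration collects an entirely new batch.
        print(update, round(theta, 3))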

3
Q

Which RL methods demand extensive hyperparameter tuning?

A

Off-policy actor-critic, Q-learning, and other value-fitting methods demand extensive hyperparameter searches, which negatively impact both stability and generalization.

4
Q

What is a common issue with a significant number of RL methods regarding the Stability & Convergence principle?

A

A significant number of RL methods are not founded on gradient descent, leading to open questions regarding their convergence guarantees, especially for value-fitting algorithms that employ nonlinear function approximators.

5
Q

When discussing Stability & Convergence principles, how do value-fitting, model-based, and Policy Gradient algorithms compare?

A

Value-fitting algorithms that employ nonlinear function approximators can have difficulties with gradient descent.

While model-based methods generally converge, they may produce less generalized solutions, and the policy can exploit flaws in the learned environment model, leading to unrealistic behaviors.

Policy Gradient methods, despite their theoretical convergence for convex objective functions, are plagued by high variance, which can easily destroy training progress.
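
A minimal sketch of the variance problem (a toy one-step task with illustrative reward magnitudes, not from the source): the same policy gradient is re-estimated from several small Monte Carlo batches, and the estimates scatter widely, which is what can destabilize training in practice.

    # REINFORCE-style score-function estimates of the SAME gradient, re-sampled.
    import math, random, statistics

    def grad_estimate(theta=0.0, batch_size=10):
        p = 1.0 / (1.0 + math.exp(-theta))   # sigmoid policy: prob of action 1
        total = 0.0
        for _ in range(batch_size):
            a = 1 if random.random() < p else 0
            r = 10.0 if a == 1 else -10.0    # large-magnitude returns inflate variance
            total += r * (a - p)             # score-function (log-prob gradient) term
        return total / batch_size

    estimates = [grad_estimate() for _ in range(8)]
    print([round(g, 2) for g in estimates])  # estimates of one quantity, widely spread
    print("std dev:", round(statistics.stdev(estimates), 2))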

6
Q

Although improvements like experience replay and target networks in DQN enhance performance, they introduce numerous ___________, making tuning a complex and often ____________ task across different RL applications.

A

First blank - hyperparameters
Second blank - non-generalizable
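
As a minimal sketch (illustrative names and values, not tied to any specific library), these are the kinds of extra knobs that experience replay and a target network add to DQN, each of which must be tuned and rarely transfers unchanged across applications:

    # Hypothetical DQN configuration: every field is a tunable hyperparameter
    # introduced by replay or the target network.
    from dataclasses import dataclass

    @dataclass
    class DQNConfig:
        replay_capacity: int = 1_000_000     # how much past experience to retain
        learning_starts: int = 50_000        # warm-up steps before any update
        batch_size: int = 32                 # transitions sampled per gradient step
        target_update_interval: int = 10_000 # steps between hard target-network syncs
        # a soft (Polyak) target update would instead add a `tau` coefficient

    print(DQNConfig())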

7
Q

What is the relationship between sample efficiency, bias, and variance?

A

The objectives of increasing sample efficiency, lowering bias, and lowering variance are often conflicting.

8
Q

Bias vs. Variance Trade-offs

For instance, efforts to reduce variance, such as through increased sampling, typically come at the cost of reduced ___.

Can you name another instance?

A

First blank - sample efficiency.
Another instance - reducing sample numbers can increase bias and reduce variance.
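
One concrete form of this trade-off, shown as a minimal sketch under the assumption of an n-step return (the card does not name this formulation): bootstrapping earlier uses fewer sampled rewards, lowering variance but inheriting the bias of the value estimate, while the full Monte Carlo return is unbiased but noisier.

    # Toy n-step return on a single sampled trajectory with a wrong value estimate.
    def n_step_return(rewards, value_estimate, n, gamma=0.99):
        # Sum n sampled rewards, then bootstrap the tail from the value estimate.
        g = sum((gamma ** i) * r for i, r in enumerate(rewards[:n]))
        if n < len(rewards):
            g += (gamma ** n) * value_estimate   # bootstrap term: source of bias
        return g

    rewards = [1.0, 0.0, 2.0, 1.0, 3.0]          # one sampled (noisy) trajectory
    biased_v = 0.0                               # deliberately wrong estimate
    print(n_step_return(rewards, biased_v, n=1)) # low variance, inherits the bias
    print(n_step_return(rewards, biased_v, n=5)) # pure Monte Carlo: unbiased, noisier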

9
Q

Computational Expense

___ methods inherent in model-based solutions are computationally ___.

A

First blank - iterative
Second blank - intensive
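
A minimal sketch of why this is heavy (a toy stand-in model and illustrative numbers, not any specific algorithm): the learned model must be rolled out for many candidate action sequences before a single action is executed, at every control step.

    # Random-shooting-style planning loop over a hypothetical learned model.
    import random

    def learned_model(state, action):
        # Stand-in for a learned dynamics model: next state and predicted reward.
        return state + action, -abs(state + action)

    def plan(state, n_candidates=1000, horizon=20):
        best_action, best_return = None, float("-inf")
        for _ in range(n_candidates):                 # 1000 x 20 model calls ...
            seq = [random.uniform(-1, 1) for _ in range(horizon)]
            s, total = state, 0.0
            for a in seq:
                s, r = learned_model(s, a)
                total += r
            if total > best_return:
                best_action, best_return = seq[0], total
        return best_action                            # ... for ONE executed action

    print(plan(state=3.0))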

10
Q

Computational Expense

Furthermore, achieving high parallelism, which is crucial for large-scale problems, is not straightforward for all RL methods, particularly (1) ___ learning and (2) ___ RL.

A
  1. value-based
  2. model-based
11
Q

Assumptions & Approximations:

Many RL methods rely on implicit assumptions, such as ____.

A

Continuity of state/action spaces, full observability, or episodic learning; these assumptions require verification and inherently limit applicability to diverse real-world scenarios.

12
Q

Assumptions & Approximations:

Value-based methods also face significant challenges when applied to (1) ___.

A
  1. continuous control problems
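
A minimal sketch of the core difficulty (hypothetical Q-function): greedy value-based control needs argmax over actions of Q(s, a), which is a cheap enumeration for a small discrete action set but becomes an inner optimization (or a discretization) at every action selection in continuous action spaces.

    # Greedy action selection from a toy Q-function: easy when discrete,
    # an optimization problem when the action space is continuous.
    def q(state, action):
        return -(action - 0.37 * state) ** 2   # stand-in learned Q-function

    state = 1.0
    discrete_actions = [-1.0, 0.0, 1.0]
    greedy_discrete = max(discrete_actions, key=lambda a: q(state, a))  # cheap enumeration

    # Continuous case: brute-force a fine grid just to approximate the argmax,
    # which is exactly the step that discrete Q-learning gets for free.
    grid = [i / 1000.0 for i in range(-1000, 1001)]
    greedy_continuous = max(grid, key=lambda a: q(state, a))
    print(greedy_discrete, greedy_continuous)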