7. Current Challenges and Future Research Directions Flashcards
(12 cards)
How do model-free methods fare with Sample Efficiency?
Many advanced model-free methods still demand tens to hundreds of millions of frames to surpass human expert performance in tasks like Atari games, making it impractical to solve real-world problems within reasonable timeframes.
How do on-policy methods fare with Sample Efficiency?
On-policy methods, by their nature, suffer from poor sample efficiency because any changes in policy necessitate the collection of entirely new samples.
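A minimal, self-contained sketch of this point, assuming a faked environment (the random numbers below stand in for real interaction): an on-policy learner must re-collect a full batch after every update, while an off-policy learner keeps reusing a growing replay buffer.

```python
import random

# Toy illustration: on-policy updates discard the batch they were computed
# from, while off-policy updates can resample old transitions from a buffer.

def collect(policy_version, n):
    # Pretend to interact with the environment under the current policy.
    return [(policy_version, random.random()) for _ in range(n)]

on_policy_env_steps = 0
for update in range(10):
    batch = collect(policy_version=update, n=64)   # fresh batch every update
    on_policy_env_steps += len(batch)
    # ... gradient step on `batch`; afterwards it is stale and discarded ...

off_policy_env_steps = 0
replay_buffer = []
for update in range(10):
    replay_buffer.extend(collect(policy_version=update, n=4))  # small top-up
    off_policy_env_steps += 4
    batch = random.sample(replay_buffer, min(64, len(replay_buffer)))
    # ... TD / Q-learning step can reuse transitions from older policies ...

print(on_policy_env_steps, off_policy_env_steps)   # 640 vs. 40 env steps
```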
Which methods in RL demand extensive Hyperparameter Tuning?
Off-policy actor-critic, Q-learning, and other value-fitting methods demand extensive hyperparameter searches, which negatively impacts both stability and generalization.
What is a common issue with a significant number of RL methods regarding the Stability & Convergence principle?
A significant number of RL methods are not founded on gradient descent, leaving open questions about their convergence guarantees, especially for value-fitting algorithms that employ nonlinear function approximators.
When discussing Stability & Convergence principles with respect to RL algorithms, how do value-fitting, model-based, and Policy Gradient algorithms compare?
Value-fitting algorithms that employ nonlinear function approximators can have difficulties with gradient descent.
While model-based methods generally converge, they may produce less generalized solutions and are susceptible to exploitation by flawed environmental models, leading to unrealistic behaviors.
Policy Gradient methods, despite their theoretical convergence for convex objective functions, are plagued by high variance, which can easily destroy training progress.
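To make the variance point concrete, here is a small self-contained numpy sketch using a hypothetical two-armed bandit (an assumed example, not from the source): a REINFORCE-style, single-sample gradient estimate is unbiased, but its noise is as large as the signal.

```python
import numpy as np

# Hypothetical 2-armed bandit. Policy: softmax over a single preference
# parameter theta for arm 0. REINFORCE estimator: reward * d/dtheta log pi(a).
rng = np.random.default_rng(0)
theta = 0.0
rewards = np.array([1.0, 0.0])           # arm 0 pays 1, arm 1 pays 0

def grad_estimate():
    p0 = 1.0 / (1.0 + np.exp(-theta))     # probability of picking arm 0
    a = 0 if rng.random() < p0 else 1
    # d/dtheta log pi(a): (1 - p0) for arm 0, -p0 for arm 1
    score = (1.0 - p0) if a == 0 else -p0
    return rewards[a] * score

samples = np.array([grad_estimate() for _ in range(10_000)])
print("mean gradient:", samples.mean())                 # ~0.25 (true gradient)
print("std of single-sample estimates:", samples.std()) # ~0.25
# The per-sample standard deviation is comparable to the mean itself, so
# single-trajectory gradient estimates are extremely noisy.
```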
Although improvements like experience replay and target networks in DQN enhance performance, they introduce numerous ___________, making tuning a complex and often ____________ task across different RL applications.
First blank - hyperparameters
Second blank - non-generalizable
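A hedged sketch of what those introduced hyperparameters look like in practice; the names, values, and buffer/target-update structure below are illustrative assumptions rather than any particular implementation.

```python
import random
from collections import deque

# Extra knobs that experience replay and a target network add to DQN-style
# training. Each must be tuned per task; poor choices can destabilize training.
BUFFER_CAPACITY = 100_000      # replay memory size
BATCH_SIZE = 32                # minibatch size sampled from replay
LEARNING_STARTS = 1_000        # warm-up steps before any update
TARGET_UPDATE_EVERY = 10_000   # how often target weights are copied over

replay_buffer = deque(maxlen=BUFFER_CAPACITY)

def maybe_update(step, online_params, target_params):
    if step < LEARNING_STARTS or len(replay_buffer) < BATCH_SIZE:
        return target_params
    batch = random.sample(replay_buffer, BATCH_SIZE)
    # ... compute TD targets with `target_params`, take a gradient step on
    # `online_params` (omitted: depends on the function approximator) ...
    if step % TARGET_UPDATE_EVERY == 0:
        target_params = dict(online_params)   # hard target-network sync
    return target_params
```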
What is the relationship between sample efficiency, bias, and variance?
The objectives of increasing sample efficiency, lowering bias, and lowering variance are often conflicting.
Bias vs. Variance Trade-offs
For instance, efforts to reduce variance, such as through increased sampling, typically come at the cost of reduced ___.
Can you name another instance?
First blank - sample efficiency.
Another instance - reducing the number of sampled steps (e.g., relying more on bootstrapped estimates) can increase bias while reducing variance.
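A small numpy illustration of this trade-off under assumed numbers (episode length, reward noise, and a deliberately biased value estimate): the Monte Carlo target uses many sampled rewards and is unbiased but noisy, while a one-step bootstrapped target uses fewer samples, has lower variance, and inherits the value function's bias.

```python
import numpy as np

# Compare a full Monte Carlo return with a 1-step bootstrapped target for a
# state whose true value is 0. Rewards are zero-mean noise over a 10-step
# episode; the bootstrap uses a value estimate that is off by +0.5.
rng = np.random.default_rng(0)
episodes = rng.normal(0.0, 1.0, size=(10_000, 10))    # per-step rewards
value_estimate_bias = 0.5                              # systematic V error

mc_targets = episodes.sum(axis=1)                      # unbiased, noisy
td_targets = episodes[:, 0] + value_estimate_bias      # biased, low-noise

print("MC target: mean %.3f  std %.3f" % (mc_targets.mean(), mc_targets.std()))
print("TD target: mean %.3f  std %.3f" % (td_targets.mean(), td_targets.std()))
# MC: mean ~0 (unbiased) but std ~sqrt(10) ~ 3.2 (high variance).
# TD: std ~1 (low variance) but mean ~0.5 (inherits the value-function bias).
```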
Computational Expense
___ methods inherent in model-based solutions are computationally ___.
First blank - Iterative
Second blank - intensive
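As a hedged illustration of where that iterative cost comes from, here is a toy random-shooting planning loop (the dynamics model, horizon, and candidate count are assumptions): it re-plans at every step by rolling many candidate action sequences through the learned model.

```python
import numpy as np

rng = np.random.default_rng(0)

def learned_model(state, action):
    # Stand-in for a learned dynamics model: returns (next_state, reward).
    next_state = state + 0.1 * action
    return next_state, -np.sum(next_state ** 2)

def plan(state, horizon=20, num_candidates=500):
    best_reward, best_first_action = -np.inf, None
    for _ in range(num_candidates):             # outer loop over candidates
        s, total = state, 0.0
        actions = rng.uniform(-1, 1, size=(horizon, state.shape[0]))
        for a in actions:                        # inner loop over the horizon
            s, r = learned_model(s, a)
            total += r
        if total > best_reward:
            best_reward, best_first_action = total, actions[0]
    # ~horizon * num_candidates model evaluations per environment step.
    return best_first_action

print(plan(np.zeros(3)))
```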
Computational Expense
Furthermore, achieving high parallelism, which is crucial for large-scale problems, is not straightforward for all RL methods, particularly ___ learning and ___ RL.
- value-based
- model-based
Assumptions & Approximations:
Many RL methods rely on implicit assumptions, such as ____.
Continuity of state/action spaces, full observability, or episodic learning; these assumptions require verification and inherently limit the methods' applicability to diverse real-world scenarios.
Assumptions & Approximations:
Value-based methods also face significant challenges when applied to ___.
- continuous control problems
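A short sketch with a hypothetical toy Q-function showing one concrete reason: the greedy action argmax_a Q(s, a) is a simple lookup over discrete actions but becomes an inner optimization problem over continuous ones.

```python
import numpy as np

def q_continuous(state, action):
    # Toy Q-function over a scalar action in [-2, 2] (illustrative only).
    return -(action - np.sin(state)) ** 2

state = 1.3

# Discrete case: the greedy action is a trivial argmax over an enumerated set.
discrete_actions = np.array([-1.0, 0.0, 1.0])
best_discrete = discrete_actions[np.argmax(q_continuous(state, discrete_actions))]

# Continuous case: even this crude dense search (or a separate optimizer /
# actor network) is needed just to approximate the greedy action at every
# decision and every TD update.
candidates = np.linspace(-2.0, 2.0, 10_001)
best_continuous = candidates[np.argmax(q_continuous(state, candidates))]

print(best_discrete, best_continuous)   # 1.0 vs. ~sin(1.3) ≈ 0.964
```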