7. Current Challenges and Future Research Directions Flashcards
(12 cards)
How do model-free methods fare with Sample Efficiency?
Many advanced model-free methods still demand tens to hundreds of millions of frames to surpass human expert performance in tasks like Atari games, making it impractical to solve real-world problems within reasonable timeframes.
How do on-policy methods fare with Sample Efficiency?
On-policy methods, by their nature, suffer from poor sample efficiency because any changes in policy necessitate the collection of entirely new samples.
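A minimal, self-contained sketch of this point, assuming a faked environment (the random numbers below stand in for real interaction): an on-policy learner must re-collect a full batch after every update, while an off-policy learner keeps reusing a growing replay buffer.

```python
import random

# Toy illustration: on-policy updates discard the batch they were computed
# from, while off-policy updates can resample old transitions from a buffer.

def collect(policy_version, n):
    # Pretend to interact with the environment under the current policy.
    return [(policy_version, random.random()) for _ in range(n)]

on_policy_env_steps = 0
for update in range(10):
    batch = collect(policy_version=update, n=64)   # fresh batch every update
    on_policy_env_steps += len(batch)
    # ... gradient step on `batch`; afterwards it is stale and discarded ...

off_policy_env_steps = 0
replay_buffer = []
for update in range(10):
    replay_buffer.extend(collect(policy_version=update, n=4))  # small top-up
    off_policy_env_steps += 4
    batch = random.sample(replay_buffer, min(64, len(replay_buffer)))
    # ... TD / Q-learning step can reuse transitions from older policies ...

print(on_policy_env_steps, off_policy_env_steps)   # 640 vs. 40 env steps
```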
Which methods in RL demand extensive Hyperparameter Tuning?
Off-policy actor-critic, Q-learning, and other value-fitting methods demand extensive hyperparameter searches, which negatively impacts both stability and generalization.
What is a common issue with a significant number of RL methods regarding the Stability & Convergence principle?
A significant number of RL methods are not founded on gradient descent, leaving open questions about their convergence guarantees, especially for value-fitting algorithms that employ nonlinear function approximators.
When discussing Stability & Convergence principles with respect to RL algorithms, how do value-fitting, model-based, and Policy Gradient algorithms compare?
Value-fitting algorithms that employ nonlinear function approximators can have difficulties with gradient descent.
While model-based methods generally converge, they may produce less generalized solutions and are susceptible to exploitation by flawed environmental models, leading to unrealistic behaviors.
Policy Gradient methods, despite their theoretical convergence for convex objective functions, are plagued by high variance, which can easily destroy training progress.
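To make the variance point concrete, here is a small self-contained numpy sketch using a hypothetical two-armed bandit (an assumed example, not from the source): a REINFORCE-style, single-sample gradient estimate is unbiased, but its noise is as large as the signal.

```python
import numpy as np

# Hypothetical 2-armed bandit. Policy: softmax over a single preference
# parameter theta for arm 0. REINFORCE estimator: reward * d/dtheta log pi(a).
rng = np.random.default_rng(0)
theta = 0.0
rewards = np.array([1.0, 0.0])           # arm 0 pays 1, arm 1 pays 0

def grad_estimate():
    p0 = 1.0 / (1.0 + np.exp(-theta))     # probability of picking arm 0
    a = 0 if rng.random() < p0 else 1
    # d/dtheta log pi(a): (1 - p0) for arm 0, -p0 for arm 1
    score = (1.0 - p0) if a == 0 else -p0
    return rewards[a] * score

samples = np.array([grad_estimate() for _ in range(10_000)])
print("mean gradient:", samples.mean())                 # ~0.25 (true gradient)
print("std of single-sample estimates:", samples.std()) # ~0.25
# The per-sample standard deviation is comparable to the mean itself, so
# single-trajectory gradient estimates are extremely noisy.
```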
Although improvements like experience replay and target networks in DQN enhance performance, they introduce numerous ___________, making tuning a complex and often ____________ task across different RL applications.
First blank - hyperparameters
Second blank - non-generalizable
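A hedged sketch of what those introduced hyperparameters look like in practice; the names, values, and buffer/target-update structure below are illustrative assumptions rather than any particular implementation.

```python
import random
from collections import deque

# Extra knobs that experience replay and a target network add to DQN-style
# training. Each must be tuned per task; poor choices can destabilize training.
BUFFER_CAPACITY = 100_000      # replay memory size
BATCH_SIZE = 32                # minibatch size sampled from replay
LEARNING_STARTS = 1_000        # warm-up steps before any update
TARGET_UPDATE_EVERY = 10_000   # how often target weights are copied over

replay_buffer = deque(maxlen=BUFFER_CAPACITY)

def maybe_update(step, online_params, target_params):
    if step < LEARNING_STARTS or len(replay_buffer) < BATCH_SIZE:
        return target_params
    batch = random.sample(replay_buffer, BATCH_SIZE)
    # ... compute TD targets with `target_params`, take a gradient step on
    # `online_params` (omitted: depends on the function approximator) ...
    if step % TARGET_UPDATE_EVERY == 0:
        target_params = dict(online_params)   # hard target-network sync
    return target_params
```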
What is the relationship between sample efficiency, bias, and variance?
The objectives of increasing sample efficiency, lowering bias, and lowering variance are often conflicting.
Bias vs. Variance Trade-offs
For instance, efforts to reduce variance, such as through increased sampling, typically come at the cost of reduced ___.
Can you name another instance?
First blank - sample efficiency.
Another instance - reducing the number of sampled steps (e.g., relying more on bootstrapped estimates) can increase bias while reducing variance.
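A small numpy illustration of this trade-off under assumed numbers (episode length, reward noise, and a deliberately biased value estimate): the Monte Carlo target uses many sampled rewards and is unbiased but noisy, while a one-step bootstrapped target uses fewer samples, has lower variance, and inherits the value function's bias.

```python
import numpy as np

# Compare a full Monte Carlo return with a 1-step bootstrapped target for a
# state whose true value is 0. Rewards are zero-mean noise over a 10-step
# episode; the bootstrap uses a value estimate that is off by +0.5.
rng = np.random.default_rng(0)
episodes = rng.normal(0.0, 1.0, size=(10_000, 10))    # per-step rewards
value_estimate_bias = 0.5                              # systematic V error

mc_targets = episodes.sum(axis=1)                      # unbiased, noisy
td_targets = episodes[:, 0] + value_estimate_bias      # biased, low-noise

print("MC target: mean %.3f  std %.3f" % (mc_targets.mean(), mc_targets.std()))
print("TD target: mean %.3f  std %.3f" % (td_targets.mean(), td_targets.std()))
# MC: mean ~0 (unbiased) but std ~sqrt(10) ~ 3.2 (high variance).
# TD: std ~1 (low variance) but mean ~0.5 (inherits the value-function bias).
```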
Computational Expense
___ methods inherent in model-based solutions are computationally ___.
First blank - Iterative
Second blank - intensive
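As a hedged illustration of where that iterative cost comes from, here is a toy random-shooting planning loop (the dynamics model, horizon, and candidate count are assumptions): it re-plans at every step by rolling many candidate action sequences through the learned model.

```python
import numpy as np

rng = np.random.default_rng(0)

def learned_model(state, action):
    # Stand-in for a learned dynamics model: returns (next_state, reward).
    next_state = state + 0.1 * action
    return next_state, -np.sum(next_state ** 2)

def plan(state, horizon=20, num_candidates=500):
    best_reward, best_first_action = -np.inf, None
    for _ in range(num_candidates):             # outer loop over candidates
        s, total = state, 0.0
        actions = rng.uniform(-1, 1, size=(horizon, state.shape[0]))
        for a in actions:                        # inner loop over the horizon
            s, r = learned_model(s, a)
            total += r
        if total > best_reward:
            best_reward, best_first_action = total, actions[0]
    # ~horizon * num_candidates model evaluations per environment step.
    return best_first_action

print(plan(np.zeros(3)))
```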
Computational Expense
Furthermore, achieving high parallelism, which is crucial for large-scale problems, is not straightforward for all RL methods, particularly ___ learning and ___ RL.
- value-based
- model-based
Assumptions & Approximations:
Many RL methods rely on implicit assumptions, such as ____.
Continuity of state/action spaces, full observability, or episodic learning; these assumptions require verification and inherently limit the methods' applicability to diverse real-world scenarios.
Assumptions & Approximations:
Value-based methods also face significant challenges when applied to ___.
- continuous control problems
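A short sketch with a hypothetical toy Q-function showing one concrete reason: the greedy action argmax_a Q(s, a) is a simple lookup over discrete actions but becomes an inner optimization problem over continuous ones.

```python
import numpy as np

def q_continuous(state, action):
    # Toy Q-function over a scalar action in [-2, 2] (illustrative only).
    return -(action - np.sin(state)) ** 2

state = 1.3

# Discrete case: the greedy action is a trivial argmax over an enumerated set.
discrete_actions = np.array([-1.0, 0.0, 1.0])
best_discrete = discrete_actions[np.argmax(q_continuous(state, discrete_actions))]

# Continuous case: even this crude dense search (or a separate optimizer /
# actor network) is needed just to approximate the greedy action at every
# decision and every TD update.
candidates = np.linspace(-2.0, 2.0, 10_001)
best_continuous = candidates[np.argmax(q_continuous(state, candidates))]

print(best_discrete, best_continuous)   # 1.0 vs. ~sin(1.3) ≈ 0.964
```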