Chapter 18 Flashcards
(21 cards)
What is the goal of reinforcement learning (RL)?
To learn a policy that maximizes cumulative rewards over time through interaction with an environment.
What is a policy in reinforcement learning?
A strategy used by an agent to determine actions based on observations.
What are some common applications of reinforcement learning?
Game playing, robotics, home automation, stock trading.
What is the role of a reward in reinforcement learning?
A scalar signal that guides the learning of the policy by evaluating the consequences of actions.
What is OpenAI Gym?
A toolkit for developing and comparing RL algorithms using standardized environments.
What is the ‘inverted pole’ (CartPole) problem in OpenAI Gym?
A control task where a cart must balance a pole upright by moving left or right.
What are the observations in the CartPole problem?
Cart position, cart velocity, pole angle, and pole angular velocity.
What is the action space in the CartPole example?
Two discrete actions: accelerate left or right.
Why is randomness introduced in neural network policies?
To make the agent explore new actions instead of always repeating the ones it currently rates highest, which helps it avoid getting stuck in local optima.
What is the credit assignment problem in RL?
Determining which actions contributed most to a long-term outcome.
What is the role of the discount factor γ in RL?
To weight future rewards less than immediate ones: a reward received k steps in the future is multiplied by γ^k (with 0 ≤ γ ≤ 1), so a smaller γ makes the agent more short-sighted.
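As an illustration of how γ discounts future rewards, here is a minimal sketch that computes the discounted return at every step of an episode. The function name `discounted_returns` and the sample rewards are assumptions chosen for illustration, not part of the cards.

```python
def discounted_returns(rewards, gamma):
    """Return, for each step, the sum of that step's reward and all
    later rewards, each discounted by gamma per step of delay."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the return of the step after it.
    for i in reversed(range(len(rewards))):
        running = rewards[i] + gamma * running
        returns[i] = running
    return returns


# Hypothetical episode: a small reward now, a large penalty two steps later.
print(discounted_returns([10, 0, -50], gamma=0.8))
```

With γ = 0.8 the first step's return works out to 10 + 0.8 × 0 + 0.8² × (−50) = −22, so the later penalty still outweighs the immediate reward.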
What is the REINFORCE algorithm?
A policy gradient method that adjusts the policy parameters so that actions with higher advantages (better-than-average discounted returns) become more likely.
What is a Markov Decision Process (MDP)?
A model defining states, actions, transition probabilities, and rewards.
What is the Bellman Optimality Equation?
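A recursive equation that defines the value of a state under an optimal policy in terms of the immediate reward plus the discounted optimal values of the possible next states.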
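For reference, one common way to write the equation is shown below. The notation is an assumption, since the cards do not define it: T(s, a, s') is the probability of moving from state s to s' after action a, R(s, a, s') is the reward for that transition, and γ is the discount factor.

```latex
V^*(s) = \max_{a} \sum_{s'} T(s, a, s') \left[ R(s, a, s') + \gamma \, V^*(s') \right]
```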
What is a Q-value?
An estimate of the expected sum of discounted future rewards for taking a given action in a given state and acting optimally afterwards.
What is Q-learning?
An off-policy RL algorithm that learns Q-values to guide optimal actions.
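A minimal sketch of a single tabular Q-learning update, assuming a dictionary-based Q-table. The function name, the state and action labels, and the default values of `alpha` (learning rate) and `gamma` are hypothetical choices for illustration.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.9):
    """Move Q(state, action) a fraction alpha toward the target
    reward + gamma * max over a' of Q(next_state, a')."""
    # Off-policy: the target uses the best next action, whatever the
    # agent actually does next.
    best_next = max(q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


# Hypothetical transition: from "s0", action "right" gives reward 1.0
# and leads to "s1". All Q-values start at 0.
q = defaultdict(float)
q_learning_update(q, "s0", "right", 1.0, "s1", actions=["left", "right"])
print(q[("s0", "right")])
```

Because the target takes the maximum over next actions, the learned Q-values describe the greedy policy even when the agent collects experience with a more exploratory one.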
What is the ε-greedy policy?
A policy that explores randomly with probability ε and acts greedily otherwise.
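A minimal sketch of ε-greedy action selection over a list of Q-values for one state. The function name `epsilon_greedy` and the sample values are assumptions for illustration.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action index with probability epsilon,
    otherwise the index of the highest Q-value."""
    if rng.random() < epsilon:
        # Explore: every action is equally likely.
        return rng.randrange(len(q_values))
    # Exploit: take the action currently rated best.
    return max(range(len(q_values)), key=lambda a: q_values[a])


# Hypothetical Q-values for three actions in one state.
q_values = [0.2, 1.5, -0.3]
print(epsilon_greedy(q_values, epsilon=0.1))
```

A common design choice is to start with a large ε and decrease it during training, so the agent explores widely at first and relies more on its learned Q-values later.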
What is Deep Q-Learning?
An extension of Q-learning using deep neural networks to approximate Q-values.
What is catastrophic forgetting in RL?
A phenomenon where learning new behaviors erases previously learned ones.
What is the TF-Agents library?
A TensorFlow-based library that provides tools and environments for RL research.
Why is a simulator important in RL?
It allows the agent to safely and quickly explore many actions to learn optimal behavior.