Chapter 18 Flashcards

(21 cards)

1
Q

What is the goal of reinforcement learning (RL)?

A

To learn a policy that maximizes cumulative rewards over time through interaction with an environment.

2
Q

What is a policy in reinforcement learning?

A

A strategy used by an agent to determine actions based on observations.

3
Q

What are some common applications of reinforcement learning?

A

Game playing, robotics, home automation, stock trading.

4
Q

What is the role of a reward in reinforcement learning?

A

A scalar signal that guides the learning of the policy by evaluating the consequences of actions.

5
Q

What is OpenAI Gym?

A

A toolkit for developing and comparing RL algorithms using standardized environments.

6
Q

What is the CartPole (‘inverted pole’) problem in OpenAI Gym?

A

A control task where a cart must balance a pole upright by moving left or right.

7
Q

What are the observations in the CartPole problem?

A

Cart position, cart velocity, pole angle, and pole angular velocity.

8
Q

What is the action space in the CartPole example?

A

Two discrete actions: accelerate left or right.
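
To make the observation and action layout concrete, here is a minimal sketch of a hard-coded policy. It does not call Gym at all: the function name `basic_policy`, the 0/1 action encoding, and the sign convention for the pole angle are assumptions made for illustration, so check them against the environment you actually use.

```python
# Hypothetical hard-coded policy for illustration; not part of Gym itself.
LEFT, RIGHT = 0, 1  # assumed encoding of the two discrete actions

def basic_policy(observation):
    """Push the cart toward the side the pole is leaning to."""
    cart_position, cart_velocity, pole_angle, pole_angular_velocity = observation
    # Assumed sign convention: a negative angle means the pole leans left.
    return LEFT if pole_angle < 0 else RIGHT

print(basic_policy((0.0, 0.0, -0.05, 0.0)))  # pole leaning left
print(basic_policy((0.0, 0.0, 0.05, 0.0)))   # pole leaning right
```

A rule this simple ignores three of the four observations, which is one reason a learned policy tends to do better.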

9
Q

Why is randomness introduced in neural network policies?

A

To explore new actions and avoid getting stuck in local optima.
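
One way to picture this: the network outputs a probability for each action, and the agent samples from it instead of always taking the most likely one. The sketch below assumes a two-action problem and a hypothetical `sample_action` helper; the probability `p_left` stands in for a network output.

```python
import random

def sample_action(p_left, rng=random):
    """Return 0 (left) with probability p_left, otherwise 1 (right)."""
    return 0 if rng.random() < p_left else 1

rng = random.Random(42)  # seeded only to make the demo repeatable
actions = [sample_action(0.7, rng) for _ in range(1000)]
print("fraction of left actions:", actions.count(0) / len(actions))
```

Even when the policy prefers left, it still tries right some of the time, which is what lets it discover better actions.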

10
Q

What is the credit assignment problem in RL?

A

Determining which actions contributed most to a long-term outcome.

11
Q

What is the role of the discount factor γ in RL?

A

To weight future rewards less than immediate ones: a reward received n steps in the future is multiplied by γⁿ (with 0 ≤ γ ≤ 1), so a smaller γ makes the agent more short-sighted.
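
The discounting described above can be sketched as a backward pass over an episode's rewards. The function name `discount_rewards` and the sample rewards are illustrative assumptions.

```python
def discount_rewards(rewards, gamma):
    """Return the discounted sum of future rewards at every step."""
    discounted = list(rewards)
    # Work backward: return[t] = reward[t] + gamma * return[t + 1].
    for step in range(len(rewards) - 2, -1, -1):
        discounted[step] += gamma * discounted[step + 1]
    return discounted

print(discount_rewards([10, 0, -50], gamma=0.8))
```

With γ = 0.8, the first action's return is 10 + 0.8 × 0 + 0.8² × (−50), about −22, so the late penalty still outweighs the early reward.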

12
Q

What is the REINFORCE algorithm?

A

A policy gradient method that adjusts the policy parameters so that actions with a positive advantage (better-than-average return) become more likely and the others less likely.
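
A toy sketch of the idea, under heavy simplifying assumptions: a hypothetical two-armed bandit where arm 1 always pays 1 and arm 0 pays 0, a single-parameter sigmoid policy, and the raw reward used in place of an advantage (subtracting a baseline would turn it into one). None of these choices come from the cards.

```python
import math
import random

def reinforce_bandit(steps=500, learning_rate=0.5, seed=0):
    """Toy REINFORCE on a hypothetical two-armed bandit (arm 1 pays 1, arm 0 pays 0)."""
    rng = random.Random(seed)
    theta = 0.0  # single policy parameter: logit of choosing arm 1
    for _ in range(steps):
        p_arm1 = 1.0 / (1.0 + math.exp(-theta))
        action = 1 if rng.random() < p_arm1 else 0
        reward = 1.0 if action == 1 else 0.0
        # Gradient of log pi(action) with respect to theta for a sigmoid policy.
        grad_log_prob = (1.0 - p_arm1) if action == 1 else -p_arm1
        # Policy gradient step: scale the gradient by the reward signal.
        theta += learning_rate * reward * grad_log_prob
    return 1.0 / (1.0 + math.exp(-theta))

print("P(arm 1) after training:", reinforce_bandit())
```

The probability of picking the rewarding arm starts at 0.5 and drifts upward as rewarded actions are reinforced.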

13
Q

What is a Markov Decision Process (MDP)?

A

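A model of sequential decision-making defined by states, actions, transition probabilities, and rewards, in which the next state depends only on the current state and the chosen action.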
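
As a rough illustration, the four ingredients can be stored in a nested dictionary. The two-state MDP below, its state and action names, and the helper `is_valid_mdp` are all made up for this sketch.

```python
# Hypothetical 2-state MDP; all numbers are made up for illustration.
# transitions[state][action] -> list of (probability, next_state, reward)
transitions = {
    "s0": {
        "stay": [(1.0, "s0", 0.0)],
        "go":   [(0.8, "s1", 5.0), (0.2, "s0", -1.0)],
    },
    "s1": {
        "stay": [(1.0, "s1", 1.0)],
        "go":   [(1.0, "s0", 0.0)],
    },
}

def is_valid_mdp(transitions, tol=1e-9):
    """Check that outcome probabilities sum to 1 for every (state, action)."""
    return all(
        abs(sum(p for p, _, _ in outcomes) - 1.0) <= tol
        for actions in transitions.values()
        for outcomes in actions.values()
    )

print(is_valid_mdp(transitions))
```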

14
Q

What is the Bellman Optimality Equation?

A

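A recursive equation that expresses the value of a state under an optimal policy in terms of the immediate reward and the discounted optimal values of its successor states.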
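
For reference, one common way to write it is shown below. The notation is an assumption rather than something the cards define: T(s, a, s') is the probability of moving from state s to s' under action a, R(s, a, s') is the reward for that transition, and γ is the discount factor.

```latex
V^*(s) = \max_{a} \sum_{s'} T(s, a, s') \left[ R(s, a, s') + \gamma \, V^*(s') \right] \quad \text{for all } s
```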

15
Q

What is a Q-value?

A

An estimate of the expected cumulative (discounted) future reward for taking a given action in a given state.

16
Q

What is Q-learning?

A

An off-policy RL algorithm that learns Q-values to guide optimal actions.
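
A minimal sketch of one tabular Q-learning step, assuming a dictionary keyed by (state, action) and illustrative values for the learning rate `alpha` and discount `gamma`. The helper name `q_learning_update` and the state and action labels are hypothetical.

```python
from collections import defaultdict

def q_learning_update(q_values, state, action, reward, next_state,
                      actions, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the new Q-value."""
    # Off-policy target: assume the greedy action is taken from next_state.
    best_next = max(q_values[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    q_values[(state, action)] += alpha * (target - q_values[(state, action)])
    return q_values[(state, action)]

q_values = defaultdict(float)  # unseen (state, action) pairs start at 0
q_values[("s1", "right")] = 2.0
new_q = q_learning_update(q_values, "s0", "right", reward=1.0,
                          next_state="s1", actions=["left", "right"])
print(new_q)
```

The update is off-policy because the target uses the best next action regardless of which action the agent actually takes next.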

17
Q

What is the ε-greedy policy?

A

A policy that takes a random action with probability ε and otherwise acts greedily, choosing the action with the highest estimated Q-value.
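
This rule fits in a few lines. The sketch assumes the Q-values for the current state are given as a plain list indexed by action; the function name `epsilon_greedy` is illustrative.

```python
import random

def epsilon_greedy(q_row, epsilon, rng=random):
    """Pick an action index from a list of Q-values for one state."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))  # explore: uniform random action
    # Exploit: index of the highest Q-value (first one wins ties).
    return max(range(len(q_row)), key=lambda a: q_row[a])

rng = random.Random(0)  # seeded only to make the demo repeatable
print(epsilon_greedy([0.1, 0.5, 0.2], epsilon=0.0, rng=rng))  # always greedy
```

In practice ε is often started high and decayed over training, so the agent explores early and exploits later.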

18
Q

What is Deep Q-Learning?

A

An extension of Q-learning using deep neural networks to approximate Q-values.

19
Q

What is catastrophic forgetting in RL?

A

A phenomenon where learning new behaviors erases previously learned ones.

20
Q

What is the TF-Agents library?

A

A TensorFlow-based library that provides tools and environments for RL research.

21
Q

Why is a simulator important in RL?

A

It allows the agent to safely and quickly explore many actions to learn optimal behavior.