Solution Methods Flashcards

(50 cards)

1
Q

What is the purpose of reward signals?

A

To provide feedback to the agent about the quality of its actions in reinforcement learning.

2
Q

True or false: Exploration is important in reinforcement learning.

A

TRUE

Exploration helps agents discover new strategies and improve performance.

3
Q

Fill in the blank: Q-learning is a type of _______ learning.

A

value-based

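Several cards in this deck (Q-learning, epsilon-greedy, the discount factor) come together in the tabular Q-learning update. A minimal sketch follows; the toy two-state chain environment, episode counts, and parameter values are illustrative assumptions, not part of any card.

```python
import random

# Tabular Q-learning update:
#   Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
# Toy dynamics (assumption): action a moves the agent to state a,
# and state 1 pays reward 1.
def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1):
    Q = {(s, a): 0.0 for s in (0, 1) for a in (0, 1)}
    for _ in range(episodes):
        s = 0
        for _ in range(10):  # short fixed-length episodes
            # epsilon-greedy action selection
            if random.random() < epsilon:
                a = random.choice((0, 1))
            else:
                a = max((0, 1), key=lambda x: Q[(s, x)])
            s_next = a
            r = 1.0 if s_next == 1 else 0.0
            best_next = max(Q[(s_next, 0)], Q[(s_next, 1)])
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s_next
    return Q

random.seed(0)
Q = q_learning()
# In this toy chain, action 1 should come to dominate action 0 in both states.
```

Note that the max over next-state actions in the target is what makes Q-learning off-policy: it evaluates the greedy policy regardless of which action is actually taken next.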
4
Q

What does policy refer to in reinforcement learning?

A

A strategy that defines the agent’s actions based on its state.

5
Q

Define Markov Decision Process (MDP).

A

A mathematical framework for modeling decision-making where outcomes are partly random and partly under the control of a decision maker.

6
Q

What is the discount factor in reinforcement learning?

A

A value between 0 and 1 that determines the importance of future rewards.

7
Q

True or false: Temporal Difference Learning combines ideas from dynamic programming and Monte Carlo methods.

A

TRUE

It updates estimates based on other learned estimates without waiting for a final outcome.

8
Q

What is the role of value functions?

A

To estimate how good it is for an agent to be in a given state.

9
Q

Fill in the blank: SARSA stands for _______.

A

State-Action-Reward-State-Action

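The card above names each element of the SARSA tuple; a minimal on-policy sketch shows how they appear in order in the update. The toy environment mirrors the Q-learning sketch and is an illustrative assumption.

```python
import random

# SARSA update: Q(s,a) <- Q(s,a) + alpha * (r + gamma * Q(s',a') - Q(s,a)),
# where a' is the action actually chosen in s' (on-policy, hence S-A-R-S-A).
def epsilon_greedy(Q, s, epsilon=0.1):
    if random.random() < epsilon:
        return random.choice((0, 1))
    return max((0, 1), key=lambda a: Q[(s, a)])

def sarsa(episodes=500, alpha=0.1, gamma=0.9):
    Q = {(s, a): 0.0 for s in (0, 1) for a in (0, 1)}
    for _ in range(episodes):
        s = 0
        a = epsilon_greedy(Q, s)
        for _ in range(10):
            s_next = a                       # toy dynamics: action chooses next state
            r = 1.0 if s_next == 1 else 0.0  # state 1 pays reward (assumption)
            a_next = epsilon_greedy(Q, s_next)
            Q[(s, a)] += alpha * (r + gamma * Q[(s_next, a_next)] - Q[(s, a)])
            s, a = s_next, a_next
    return Q

random.seed(0)
Q = sarsa()
```

The only difference from Q-learning is the target: SARSA uses the value of the next action the agent actually selects, rather than the max over next actions.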
10
Q

Define exploit in the context of reinforcement learning.

A

To choose the best-known action based on current knowledge.

11
Q

What is policy gradient method?

A

A type of reinforcement learning that optimizes the policy directly.

12
Q

True or false: Deep Q-Networks use neural networks to approximate Q-values.

A

TRUE

This approach allows handling high-dimensional state spaces.

13
Q

What does experience replay do?

A

Stores past experiences to improve learning efficiency and stability.

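The experience-replay idea on this card fits in a few lines: store transitions, then sample random minibatches so updates are decorrelated from the most recent trajectory. Capacity and batch size below are illustrative assumptions.

```python
import random
from collections import deque

# A minimal replay buffer sketch: bounded FIFO storage plus uniform sampling.
class ReplayBuffer:
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)  # oldest experiences evicted first

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # uniform random minibatch, independent of insertion order
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)

buf = ReplayBuffer(capacity=100)
for t in range(150):  # overfills the buffer: only the last 100 are kept
    buf.push(t, 0, 0.0, t + 1, False)
batch = buf.sample(32)
```

Deep Q-Networks pair a buffer like this with a neural Q-function, training on sampled minibatches instead of consecutive transitions.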
14
Q

Fill in the blank: Actor-Critic methods involve both an _______ and a critic.

A

actor

15
Q

Define exploration-exploitation tradeoff.

A

The balance between exploring new actions and exploiting known rewarding actions.

16
Q

What is the Bellman equation?

A

A recursive equation that relates the value of a state to the values of its successor states.

17
Q

True or false: Monte Carlo methods require complete episodes to update value estimates.

A

TRUE

These methods average returns from complete episodes for learning.

18
Q

What is a reward shaping technique?

A

Modifying the reward function to make learning easier and faster.

19
Q

Fill in the blank: Dyna-Q integrates learning with _______ and planning.

A

acting (planning in Dyna-Q uses simulated experience from a learned model)

20
Q

Define policy iteration.

A

An algorithm that iteratively improves the policy based on value function updates.

21
Q

What is the value iteration algorithm?

A

An algorithm that computes the optimal policy by iteratively updating value estimates.

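Value iteration is the Bellman equation (card 16) turned into an algorithm: repeatedly apply the optimality backup until the values stop changing. A minimal sketch on a tiny deterministic three-state chain; the transitions and rewards are illustrative assumptions.

```python
# Value iteration: repeatedly apply the Bellman optimality backup
#   V(s) <- max_a [ R(s,a) + gamma * V(s') ]
# on a deterministic 3-state chain (dynamics below are assumptions).
def value_iteration(gamma=0.9, theta=1e-8):
    states = (0, 1, 2)
    # next_state[s][a] and reward[s][a] for actions 0 ("stay") and 1 ("right")
    next_state = {0: (0, 1), 1: (1, 2), 2: (2, 2)}
    reward = {0: (0.0, 0.0), 1: (0.0, 1.0), 2: (0.0, 0.0)}
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            v = max(reward[s][a] + gamma * V[next_state[s][a]] for a in (0, 1))
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < theta:  # stop when no state's value changes meaningfully
            break
    return V

V = value_iteration()
```

Here the only reward is for moving right from state 1, so V(1) converges to 1.0 and V(0) to gamma times that, showing the discount factor at work.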
22
Q

True or false: Hierarchical reinforcement learning breaks tasks into smaller subtasks.

A

TRUE

This approach simplifies complex problems by structuring them hierarchically.

23
Q

What is transfer learning in reinforcement learning?

A

Applying knowledge gained in one task to improve learning in a different but related task.

24
Q

Fill in the blank: Multi-agent reinforcement learning involves _______ agents.

A

multiple

25
Define **curriculum learning**.
A training strategy where tasks are presented in increasing difficulty to improve learning efficiency.
26
What is **inverse reinforcement learning**?
Learning an agent's goals by observing its behavior and inferring the reward function.
27
True or false: **Continuous action spaces** can complicate reinforcement learning.
TRUE. Algorithms must handle infinite action choices, requiring specialized techniques.
28
What is a **state space**?
The set of all possible states in which an agent can find itself.
29
Fill in the blank: **Function approximation** is used to generalize learning across _______.
states
30
Define **sparse rewards**.
A situation where rewards are infrequent, making learning more challenging.
31
What is **bootstrapping** in reinforcement learning?
Using existing value estimates to update other value estimates.
32
True or false: **Reward hacking** is when agents exploit loopholes in reward functions.
TRUE. This can lead to unintended behaviors in agents.
33
What is a **policy network**?
A neural network that outputs action probabilities for given states.
34
Fill in the blank: **Actor** in Actor-Critic methods is responsible for _______.
selecting actions
35
Define **Q-value**.
The expected cumulative (discounted) reward of taking a specific action in a given state and following the policy thereafter.
36
What does **epsilon-greedy** strategy do?
Balances exploration and exploitation by choosing a random action with a small probability epsilon, and the best-known action otherwise.
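The epsilon-greedy rule on this card can be sketched in a few lines; the Q-values and epsilon settings below are illustrative.

```python
import random

def epsilon_greedy(q_values, epsilon=0.1):
    """With probability epsilon, explore (pick a random action);
    otherwise exploit (pick the action with the highest estimated Q-value)."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

greedy_choice = epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0)  # pure exploitation
```

With epsilon=0 the choice is purely greedy; with epsilon=1 it is purely random. Annealing epsilon from high to low over training is a common way to shift from exploration to exploitation.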
37
True or false: **Stochastic policies** can produce different actions for the same state.
TRUE. This introduces randomness in decision-making.
38
What is **policy evaluation**?
The process of determining the value function for a given policy.
39
Fill in the blank: **Deep reinforcement learning** combines deep learning with _______.
reinforcement learning
40
Define **reward function**.
A function that defines the rewards received after taking actions in states.
41
What is **action space**?
The set of all possible actions an agent can take in a given state.
42
True or false: **Deterministic policies** always choose the same action for a state.
TRUE. This contrasts with stochastic policies.
43
What does **state representation** involve?
Encoding the state information in a format suitable for learning algorithms.
44
Fill in the blank: **Temporal difference** methods update values based on _______.
current value estimates of successor states (bootstrapping)
45
Define **exploration strategy**.
A method for deciding how to explore the action space during learning.
46
What is **reward discounting**?
The practice of reducing the value of future rewards to prioritize immediate rewards.
47
True or false: **Off-policy learning** allows learning from actions not taken by the agent.
TRUE. This is useful for learning from historical data.
48
What is **policy optimization**?
The process of improving a policy to maximize expected rewards.
49
Fill in the blank: **State-action pairs** are crucial for learning in _______.
reinforcement learning
50
Define **reinforcement learning**.
A type of machine learning where agents learn by interacting with an environment to maximize rewards.