Reinforcement Learning Flashcards

(13 cards)

1
Q: What is Reinforcement Learning?
A: A learning paradigm where an agent interacts with an environment to take actions that maximise cumulative numeric rewards (the reward hypothesis).
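To make the agent-environment loop concrete, here is a minimal sketch in Python. The toy "walk right" environment, the always-right policy, and all names are illustrative assumptions, not part of the lecture material:

```python
def step(state, action):
    """Toy environment (invented for illustration): +1 reward for moving
    right; the episode ends once the agent reaches state 5."""
    next_state = max(0, state + action)
    reward = 1.0 if action == 1 else 0.0
    done = next_state >= 5
    return next_state, reward, done

def policy(state):
    """Trivial policy: always move right (action = +1)."""
    return 1

# The agent-environment loop: observe state, act, receive reward and next state.
state, total_reward, done = 0, 0.0, False
while not done:
    action = policy(state)
    state, reward, done = step(state, action)
    total_reward += reward

print(total_reward)  # cumulative reward collected over the episode -> 5.0
```

The agent's objective is exactly the quantity accumulated in `total_reward`.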

2
Q: What are the core elements formalised in an RL problem?
A: Agent (decision maker), Environment (the world), Actions, States (observations), and Rewards.

3
Q: What is the difference between environment state and agent state?
A: The environment state is the true underlying state of the world; the agent state is the agent's internal representation, built from what it observes (e.g., partial observations or a history of them).

4
Q: What distinguishes fully observable from partially observable environments?
A: Fully observable: the agent sees the complete environment state. Partially observable: the agent receives only partial observations of the state.

5
Q: What are the three main components of an RL agent?
A: Policy (mapping states to actions), Value function (estimating expected returns), and Model (predicting environment dynamics).
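As a rough sketch, the three components can be pictured as three lookup tables in a tiny tabular setting. The states, actions, and values below are invented purely for illustration:

```python
# Toy tabular agent components; all entries are made up for illustration.
policy = {0: "right", 1: "right"}         # policy: state -> action
value = {0: 1.9, 1: 1.0, 2: 0.0}          # value function: state -> expected return
model = {(0, "right"): (1, 1.0),          # model: (state, action) -> (next state, reward)
         (1, "right"): (2, 1.0)}

s = 0
a = policy[s]               # act with the policy
s_next, r = model[(s, a)]   # plan ahead by querying the model
print(a, s_next, r, value[s_next])
```

An agent need not have all three: model-free agents drop the model, and purely policy-based agents may drop the explicit value function.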

6
Q: Why does RL violate the i.i.d. assumption of supervised learning?
A: Because samples are sequentially correlated and are collected under the agent's own (changing) policy, rather than drawn independently from a fixed, stationary distribution.

7
Q: What is Deep Reinforcement Learning?
A: Using deep neural networks as function approximators for policies and value functions to handle high-dimensional inputs.

8
Q: What is imitation learning?
A: Training agents by mimicking expert demonstrations instead of learning from reward signals.

9
Q: Name two high-profile RL applications mentioned in the lecture.
A: AlphaGo (game playing) and ChatGPT (language-model fine-tuning with RLHF, i.e., rewards derived from human preferences).

10
Q: What is RLHF (Reinforcement Learning from Human Feedback)?
A: A training paradigm where human feedback provides reward signals to fine-tune models, as used in ChatGPT.

11
Q: What is the reward hypothesis?
A: Any goal can be formalised as the maximisation of the expected cumulative reward signal from the environment.
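The "cumulative reward" being maximised is usually the discounted return G_t = r_{t+1} + γ·r_{t+2} + γ²·r_{t+3} + …. A small helper that computes it for a finished episode (the function name and example rewards are illustrative assumptions):

```python
def discounted_return(rewards, gamma=0.9):
    """Discounted return of a reward sequence, folded from the end
    using the recursion G_t = r_{t+1} + gamma * G_{t+1}."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

print(discounted_return([1.0, 0.0, 2.0], gamma=0.5))  # 1 + 0.5*0 + 0.25*2 = 1.5
```

With gamma = 1 this reduces to the plain sum of rewards; gamma < 1 weights near-term rewards more heavily.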

12
Q: What examples of sequential data and tasks motivate RL?
A: Games (Go), robotics control, dialogue systems, and autonomous vehicles, where sequence and decision-making are key.

13
Q: What is the Markov property in the RL context?
A: The assumption that the future is independent of the past given the present state, enabling state-based decision-making.
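Written out, the Markov property says the current state (and action) summarises the entire history for predicting the next state:

```latex
P(S_{t+1} = s' \mid S_t, A_t) = P(S_{t+1} = s' \mid S_1, A_1, \ldots, S_t, A_t)
```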
