Reinforcement Learning Flashcards
(13 cards)
What is Reinforcement Learning?
A learning paradigm in which an agent interacts with an environment, choosing actions so as to maximise the cumulative numeric reward it receives (an approach framed by the reward hypothesis).
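A minimal sketch of the agent-environment loop behind this definition. It assumes a hypothetical `CoinFlipEnv` with simplified `reset`/`step` methods, loosely in the style of Gym-like interfaces but not any library's actual API. The agent observes, acts and receives a reward at every step, and the quantity it cares about is the total reward over the episode.

```python
import random


class CoinFlipEnv:
    """Hypothetical toy environment: the agent guesses a coin flip each step.

    Reward is +1 for a correct guess and 0 otherwise; an episode lasts a
    fixed number of steps. The observation is just the current time step.
    """

    def __init__(self, episode_length=10, seed=0):
        self.episode_length = episode_length
        self.rng = random.Random(seed)
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        coin = self.rng.randint(0, 1)  # 0 = tails, 1 = heads
        reward = 1.0 if action == coin else 0.0
        self.t += 1
        done = self.t >= self.episode_length
        return self.t, reward, done


def run_episode(env, policy):
    """Agent-environment loop: observe, act, receive reward, repeat."""
    obs = env.reset()
    total_reward = 0.0
    done = False
    while not done:
        action = policy(obs)
        obs, reward, done = env.step(action)
        total_reward += reward  # undiscounted cumulative reward
    return total_reward


if __name__ == "__main__":
    env = CoinFlipEnv(episode_length=10, seed=42)
    always_heads = lambda obs: 1  # a trivial policy that ignores the observation
    print(run_episode(env, always_heads))
```

An RL algorithm would adjust `policy` over many episodes to increase this total; the loop itself stays the same.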
What are the core elements formalised in an RL problem?
Agent (the decision maker), Environment (the world the agent acts in), Actions, States (or the agent's observations of them), and Rewards.
What is the difference between environment state and agent state?
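Environment state is the environment's true underlying state, which is generally not fully visible to the agent; agent state is the agent's own internal representation, built as a function of its history of observations, actions and rewards (e.g., the latest observation, the last few observations, or the full history).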
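A small sketch of the distinction, under illustrative assumptions: the hidden environment state is an integer position, the hypothetical `observe` function reveals only whether it is even, and the agent state is the last k observations (one possible state-update function among many). Different environment states can produce identical observations, so the agent has to rely on its own history-based state.

```python
from collections import deque


def observe(env_state):
    """Hypothetical observation function: the agent only sees whether the
    hidden environment state (an integer position) is even."""
    return env_state % 2 == 0


class WindowAgentState:
    """Agent state built from history: here, the last k observations."""

    def __init__(self, k):
        self.window = deque(maxlen=k)  # older observations drop out automatically

    def update(self, observation):
        self.window.append(observation)
        return tuple(self.window)


agent_state = WindowAgentState(k=3)
env_state = 0  # true underlying state, never shown to the agent directly
for _ in range(5):
    env_state += 1
    s = agent_state.update(observe(env_state))
print(env_state, s)  # 5 (False, True, False)
```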
What distinguishes fully observable from partially observable environments?
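Fully observable: the agent directly observes the environment state (observation = environment state = agent state). Partially observable: the agent sees only partial observations of the state and must construct its own agent state, e.g., from the history of observations.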
What are the three main components of an RL agent?
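Policy (mapping states to actions), Value function (estimating the expected return from a state or state-action pair), and Model (predicting the environment's next state and reward). An agent may use one or more of these components.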
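A sketch showing all three components on a hypothetical 3-state chain (states 0, 1, 2; actions "left"/"right"; state 2 absorbing). The policy is a lookup table, the model is a table predicting next state and reward, and the value function is computed from them by iterating the Bellman expectation equation (in-place iterative policy evaluation). The discount factor `GAMMA = 0.9` is an assumption for illustration.

```python
GAMMA = 0.9  # assumed discount factor

# Model: predicts (next_state, reward) for each (state, action).
model = {
    (0, "left"): (0, 0.0), (0, "right"): (1, 0.0),
    (1, "left"): (0, 0.0), (1, "right"): (2, 1.0),
    (2, "left"): (2, 0.0), (2, "right"): (2, 0.0),
}

# Policy: maps each state to an action (deterministic here).
policy = {0: "right", 1: "right", 2: "right"}


def evaluate_policy(policy, model, gamma=GAMMA, sweeps=100):
    """Value function: expected discounted return from each state when
    following the policy, estimated with repeated Bellman backups that
    use the model's predictions."""
    value = {s: 0.0 for s in policy}
    for _ in range(sweeps):
        for s in policy:
            next_s, r = model[(s, policy[s])]
            value[s] = r + gamma * value[next_s]
    return value


print(evaluate_policy(policy, model))
```

State 1 is worth 1.0 (one step from the reward) and state 0 is worth 0.9 (the same reward, discounted once more).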
Why does RL violate the i.i.d. assumption of supervised learning?
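Because consecutive samples are temporally correlated (each state depends on the previous state and action), and the data are collected by the agent's own, changing policy, so they are neither independent nor drawn from a fixed, stationary distribution.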
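A small sketch of the correlation part of this answer, assuming an agent that performs a random walk on a line. States along the trajectory are strongly correlated with their predecessors, while the same states in shuffled order (closer to what an i.i.d. dataset looks like) are not. It does not model the second issue, the distribution shifting as the policy changes.

```python
import random


def lag1_autocorrelation(xs):
    """Pearson correlation between x_t and x_{t+1}."""
    a, b = xs[:-1], xs[1:]
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


rng = random.Random(0)

# States visited by a random-walking agent: each state depends on the last one.
trajectory = [0]
for _ in range(999):
    trajectory.append(trajectory[-1] + rng.choice([-1, 1]))

# The same states with their temporal order destroyed.
shuffled = trajectory[:]
rng.shuffle(shuffled)

print(round(lag1_autocorrelation(trajectory), 3))  # close to 1
print(round(lag1_autocorrelation(shuffled), 3))    # close to 0
```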
What is Deep Reinforcement Learning?
Using deep neural networks as function approximators for policies and value functions to handle high-dimensional inputs.
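A minimal sketch of a neural network used as a policy approximator, written in plain Python so it is self-contained. The sizes (4 state features, 16 hidden units, 2 actions) are arbitrary assumptions, only the forward pass is shown, and the weights are random rather than trained; a real deep RL system would use a deep learning library, more layers, and a training algorithm.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Random weight matrix (n_out rows of n_in weights) and zero biases."""
    weights = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, layer):
    """Fully connected layer: W x + b."""
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def relu(v):
    return [max(0.0, z) for z in v]


def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in v]
    total = sum(exps)
    return [e / total for e in exps]


def policy_network(state, params):
    """pi(a | s): maps a state feature vector to one probability per action."""
    hidden = relu(dense(state, params[0]))
    return softmax(dense(hidden, params[1]))


rng = random.Random(0)
params = [init_layer(4, 16, rng), init_layer(16, 2, rng)]
probs = policy_network([0.1, -0.2, 0.05, 0.3], params)
print(probs)
```

The point of the approximator is generalisation: nearby state vectors produce similar action probabilities, which a lookup table over high-dimensional inputs could not provide.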
What is imitation learning?
Training agents by mimicking expert demonstrations instead of learning from reward signals.
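A sketch of the simplest form of imitation learning, behavioural cloning, which treats expert (state, action) pairs as a supervised dataset. Here the "classifier" is a lookup table predicting the expert's most frequent action per state; the states, actions and `behavioural_cloning` helper are hypothetical. No reward signal appears anywhere.

```python
from collections import Counter, defaultdict


def behavioural_cloning(demonstrations):
    """Fit a policy to expert (state, action) pairs by supervised learning:
    predict the expert's most common action in each demonstrated state."""
    counts = defaultdict(Counter)
    for state, action in demonstrations:
        counts[state][action] += 1
    return {state: c.most_common(1)[0][0] for state, c in counts.items()}


demos = [("A", "right"), ("A", "right"), ("A", "left"), ("B", "up"), ("C", "right")]
policy = behavioural_cloning(demos)
print(policy)  # {'A': 'right', 'B': 'up', 'C': 'right'}
```

A known weakness shows up directly: the cloned policy has nothing to say about states the expert never visited.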
Name two high-profile RL applications mentioned in the lecture.
AlphaGo (playing the board game Go) and ChatGPT (language-model fine-tuning with RLHF, i.e., using human preferences).
What is RLHF (Reinforcement Learning from Human Feedback)?
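A training approach in which human feedback (typically rankings or pairwise comparisons of model outputs) is used to train a reward model, whose scores then serve as the reward signal for fine-tuning the model with RL; this approach was used for ChatGPT.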
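A sketch of the reward-modelling step only, under simplifying assumptions: each response is represented by a hand-made feature vector (hypothetical "helpfulness" and "rudeness" scores) and the reward model is linear. It is fitted to pairwise preferences with the Bradley-Terry objective, where P(chosen preferred over rejected) = sigmoid(r(chosen) - r(rejected)). Real systems use a neural reward model over text and then optimise the language model against it with an RL algorithm; that second stage is not shown.

```python
import math


def reward(w, features):
    """Linear reward model r(x) = w . features(x)."""
    return sum(wi * fi for wi, fi in zip(w, features))


def train_reward_model(preferences, n_features, lr=0.1, epochs=200):
    """Fit w so preferred responses score higher, by gradient ascent on
    log sigmoid(r(chosen) - r(rejected))."""
    w = [0.0] * n_features
    for _ in range(epochs):
        for chosen, rejected in preferences:
            margin = reward(w, chosen) - reward(w, rejected)
            p = 1.0 / (1.0 + math.exp(-margin))
            # d/dw log sigmoid(margin) = (1 - p) * (chosen - rejected)
            for i in range(n_features):
                w[i] += lr * (1.0 - p) * (chosen[i] - rejected[i])
    return w


# Hypothetical human comparisons: (chosen, rejected) as [helpfulness, rudeness].
prefs = [
    ([1.0, 0.0], [0.0, 1.0]),
    ([0.8, 0.1], [0.2, 0.9]),
    ([0.6, 0.0], [0.5, 0.7]),
]
w = train_reward_model(prefs, n_features=2)
print(w)  # positive weight on helpfulness, negative on rudeness
```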
What is the reward hypothesis?
Any goal can be formalised as the maximisation of the expected cumulative (scalar) reward signal received from the environment.
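The "cumulative reward" is usually made precise as the return G_t = R_{t+1} + gamma * R_{t+2} + gamma^2 * R_{t+3} + ..., which satisfies G_t = R_{t+1} + gamma * G_{t+1}. The sketch below assumes a discount factor `gamma` (gamma = 1 gives the plain sum) and an episode that ends after the last reward; `rewards[t]` holds the reward received after the action at step t.

```python
def discounted_returns(rewards, gamma):
    """Compute G_t = R_{t+1} + gamma * G_{t+1} for every step, working backwards
    from the end of the episode (where the return is 0)."""
    returns = [0.0] * len(rewards)
    g = 0.0
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns


print(discounted_returns([1.0, 0.0, 2.0], gamma=0.5))  # [1.5, 1.0, 2.0]
```

Under the reward hypothesis, the agent's goal is to choose actions that maximise the expectation of `returns[0]`.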
What examples of sequential data and tasks motivate RL?
Games (Go), robotics control, dialogue systems, and autonomous vehicles: tasks in which decisions are made in sequence and each action influences later observations and rewards.
What is the Markov property in the RL context?
The assumption that the future is independent of the past given the present state: the current state captures all relevant information from the history, so decisions can be based on the state alone.
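Written formally, using the common notation S_t for the state at time t (a notational assumption, since the cards do not fix one), a state S_t is Markov if and only if conditioning on the whole history adds nothing beyond the current state:

```latex
\mathbb{P}\left[S_{t+1} \mid S_t\right] = \mathbb{P}\left[S_{t+1} \mid S_1, S_2, \ldots, S_t\right]
```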