Q-Learning Network MLM Flashcards

1
Q

Q-Learning

A

Q-Learning is a model-free reinforcement learning algorithm. The goal of Q-learning is to learn a policy, which tells an agent what action to take under what circumstances.

2
Q
Introduction
A

Q-learning is a value-based reinforcement learning algorithm that can be viewed as a sample-based form of value iteration. It’s used to learn the optimal policy for a Markov Decision Process (MDP) when the transition model is not known. The optimal policy is the one that maximizes the expected total (discounted) reward over all successive steps.

3
Q
Action-Value Function
A

In tabular Q-Learning, the value of each state-action pair is represented by a Q-value, stored in a Q-table. The Q-value is the expected return from taking a given action in a given state and following a specific policy thereafter.
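
To make "expected return" concrete, here is a minimal sketch of the return for one observed reward sequence. The function name `discounted_return` and the discount factor `gamma` are illustrative assumptions, not part of the card; a Q-value would be the average of this quantity over many trajectories that start with the same state and action.

```python
def discounted_return(rewards, gamma):
    """Return r_0 + gamma*r_1 + gamma^2*r_2 + ... for one reward sequence."""
    g = 0.0
    # Work backwards so each step applies one extra factor of gamma.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Hypothetical rewards from a single three-step episode.
print(discounted_return([1.0, 0.0, 2.0], gamma=0.5))  # 1.5
```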

4
Q
Q-table
A

The Q-table is a table with one entry per state-action pair. It guides the agent to the best action from a given state, namely the action with the highest Q-value for that state. The table is initialized arbitrarily (often to zeros), and its values are then updated iteratively based on the rewards received for the actions taken.
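
One possible layout, sketched under the assumption that states and actions are small integers: a list of rows (one per state), each holding one Q-value per action. The sizes and the helper `greedy_action` are hypothetical.

```python
N_STATES, N_ACTIONS = 4, 2  # hypothetical problem size

# Arbitrary initialization; all zeros is a common choice.
q_table = [[0.0] * N_ACTIONS for _ in range(N_STATES)]


def greedy_action(q_table, state):
    """Index of the highest Q-value in the row for `state` (first one on ties)."""
    row = q_table[state]
    return max(range(len(row)), key=row.__getitem__)


q_table[2][1] = 0.7  # pretend learning raised this entry
print(greedy_action(q_table, 2))
```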

5
Q
Learning Process
A

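During the learning process, the agent explores the environment, and the Q-values are updated using a rule derived from the Bellman optimality equation. This equation states that the optimal Q-value for a state-action pair is the expected immediate reward plus the discounted maximum Q-value for the next state. Each update moves the current estimate a step toward that target, with the step size set by a learning rate.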
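
The update can be sketched as follows. The target is the immediate reward plus the discounted maximum Q-value of the next state; the estimate then moves part of the way toward it. The learning rate `alpha`, the discount `gamma`, the `done` flag, and the function name are assumptions introduced for illustration.

```python
def q_update(q_table, state, action, reward, next_state,
             alpha=0.1, gamma=0.9, done=False):
    """Apply one tabular Q-learning update in place and return the new value."""
    # A terminal next state has no future value.
    best_next = 0.0 if done else max(q_table[next_state])
    td_target = reward + gamma * best_next
    q_table[state][action] += alpha * (td_target - q_table[state][action])
    return q_table[state][action]


# Hypothetical two-state table: the next state's best value is 2.0.
q = [[0.0, 0.0], [1.0, 2.0]]
print(q_update(q, state=0, action=1, reward=1.0, next_state=1, alpha=0.5))
```

With `alpha=0.5` the entry moves halfway from 0.0 toward the target 1.0 + 0.9 * 2.0 = 2.8.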

6
Q
Exploitation vs Exploration
A

The agent needs to balance exploration (trying out new actions to see their effect) and exploitation (choosing the action with the highest Q-value). This is often managed with an ε-greedy strategy, where the agent chooses a random action with probability ε and the action with the highest Q-value with probability 1-ε.
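
A minimal sketch of ε-greedy selection over one row of Q-values. The function name and the `rng` parameter are assumptions; `rng` only needs `random()` and `randrange()`, which Python's `random` module provides.

```python
import random


def epsilon_greedy(q_row, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))  # explore
    # Exploit: index of the largest Q-value (first one on ties).
    return max(range(len(q_row)), key=q_row.__getitem__)


# Hypothetical Q-values for three actions in one state.
print(epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.1))
```

Setting `epsilon=0.0` gives pure exploitation, and `epsilon=1.0` gives uniformly random actions.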

7
Q
Convergence
A

Under certain conditions, the Q-learning algorithm is guaranteed to converge to the optimal Q-values, and hence to an optimal policy. These conditions include having a finite number of states and actions, each state-action pair being visited infinitely often, and a learning rate that decays appropriately over time.
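
Convergence results for tabular Q-learning also typically require the learning rate to shrink over time (step sizes that sum to infinity while their squares sum to a finite value). One schedule with that property, sketched here as an assumption rather than a prescribed choice, is one over the number of visits to each state-action pair. The names `step_size` and `counts` are hypothetical.

```python
from collections import defaultdict


def step_size(counts, state, action):
    """Record a visit to (state, action) and return 1 / visit count."""
    counts[(state, action)] += 1
    return 1.0 / counts[(state, action)]


counts = defaultdict(int)
# Three visits to the same pair give step sizes 1, 1/2, 1/3.
print([step_size(counts, 0, 1) for _ in range(3)])
```

In practice a small constant learning rate is also common; it trades the formal guarantee for faster adaptation.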

8
Q
Advantages
A

Q-Learning is a model-free approach, meaning it can learn optimal actions just from interactions with the environment without needing a model of the environment’s dynamics. It can handle problems with stochastic transitions and rewards without requiring adaptations.
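
As an illustration of learning from interaction alone, here is a small sketch on a hypothetical five-state corridor. The agent never inspects the dynamics; it only calls `step` and observes the result. The environment, hyperparameters, and random behaviour policy are all assumptions chosen for brevity, and this toy environment is deterministic.

```python
import random

N_STATES, GOAL = 5, 4   # hypothetical corridor: states 0..4, goal at state 4
ACTIONS = (0, 1)        # 0 = move left, 1 = move right


def step(state, action):
    """Environment dynamics; the agent sees only the returned values."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL


def train(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Act randomly; Q-learning is off-policy, so the greedy
            # policy can still be learned from this behaviour.
            action = rng.choice(ACTIONS)
            next_state, reward, done = step(state, action)
            best_next = 0.0 if done else max(q[next_state])
            q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
            state = next_state
    return q


q = train()
# Greedy action in each non-terminal state after training.
policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(GOAL)]
print(policy)
```

The learned greedy policy should choose "right" in every non-terminal state, since that is the shortest path to the only reward.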

9
Q
Applications
A

Q-Learning has been used successfully in various domains including robotics, scheduling, gaming, and resource management.
