RL: Chapter 2: Multi-armed Bandits Flashcards

1
Q

Original form of the k-armed bandit problem

A

You are faced repeatedly with a choice among k different options, or actions.

After each choice, you receive a numerical reward chosen from a stationary distribution that depends on the action you selected.

Your objective is to maximize the expected total reward over some time period, e.g. over 1000 action selections, or time steps.
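A minimal sketch of this setup in Python, assuming a 10-armed testbed with normally distributed rewards and a placeholder random policy (the arm count, reward distributions, and policy are illustrative, not from the card):

```python
import numpy as np

rng = np.random.default_rng(0)
k = 10                        # number of arms (illustrative choice)
q_star = rng.normal(size=k)   # true action values, drawn once: stationary distributions

total_reward = 0.0
for t in range(1000):                         # 1000 action selections, as in the card
    action = rng.integers(k)                  # placeholder policy: pick an arm at random
    reward = rng.normal(loc=q_star[action])   # reward from the selected arm's distribution
    total_reward += reward

print(f"total reward over 1000 steps: {total_reward:.1f}")
```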

2
Q

Greedy actions

A

Any action whose estimated value is greatest at a given time step.
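With an array Q of current action-value estimates (a name assumed here for illustration), a greedy action is just an argmax; a sketch:

```python
import numpy as np

Q = np.array([0.2, 1.5, 0.9, 1.5])   # hypothetical action-value estimates

# np.argmax returns the first maximizer; to treat ties fairly, pick at
# random among all maximizers instead.
greedy = int(np.argmax(Q))                                   # -> 1
tie_broken = np.random.default_rng(0).choice(np.flatnonzero(Q == Q.max()))
print(greedy, tie_broken)
```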

3
Q

Exploiting vs Exploring

A

You are exploiting your current knowledge when you select one of the greedy actions.

You are exploring when you select a nongreedy action, as it enables you to improve your estimate of the nongreedy action’s value.
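One reason exploring pays off: each selection of an action refines that action's value estimate. A sketch using the chapter's incremental sample-average update (the reward values below are invented):

```python
def update_estimate(Q_a, n_a, reward):
    """Incremental sample-average update: Q_{n+1} = Q_n + (R_n - Q_n) / n."""
    return Q_a + (reward - Q_a) / n_a

Q_a, n_a = 0.0, 0
for reward in [1.2, 0.8, 1.1]:   # hypothetical rewards from repeatedly exploring one arm
    n_a += 1
    Q_a = update_estimate(Q_a, n_a, reward)

print(Q_a)   # 1.033... -- exactly the sample average of the observed rewards
```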

4
Q

ε-greedy methods

A

Methods that behave greedily most of the time, but every once in a while, with small probability ε, instead select randomly from among all the actions with equal probability, independently of the action-value estimates.
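A minimal sketch of ε-greedy action selection, assuming array-valued estimates Q and ε = 0.1 (both illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def epsilon_greedy(Q, epsilon):
    """With probability epsilon pick uniformly at random; otherwise act greedily."""
    if rng.random() < epsilon:
        return int(rng.integers(len(Q)))   # explore: ignore the estimates entirely
    return int(np.argmax(Q))               # exploit: a greedy action

Q = np.array([0.2, 1.5, 0.9])              # hypothetical action-value estimates
picks = [epsilon_greedy(Q, epsilon=0.1) for _ in range(10_000)]
# Greedy action 1 is chosen about (1 - eps) + eps/k = 0.933 of the time here,
# since the random selections also land on it 1/k of the time.
print(picks.count(1) / len(picks))
```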

5
Q

Associative search task

A

A task that involves both trial-and-error learning to search for the best actions, and association of these actions with the situations in which they are best.

A.k.a. contextual bandits
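A sketch of this setting under assumed details (two contexts, two actions, ε-greedy within each context); the point is that value estimates are kept per situation, so the learner associates actions with the contexts where they pay best:

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented two-context example: each situation has its own true action values
# (think of a slot machine whose display color signals which arm pays best).
q_star = {"red": np.array([1.0, 0.1]), "green": np.array([0.1, 1.0])}
Q = {c: np.zeros(2) for c in q_star}   # estimates per (situation, action)
N = {c: np.zeros(2) for c in q_star}   # visit counts per (situation, action)

for t in range(5000):
    ctx = rng.choice(["red", "green"])           # the situation observed this step
    if rng.random() < 0.1:                       # epsilon-greedy within that situation
        a = int(rng.integers(2))
    else:
        a = int(np.argmax(Q[ctx]))
    r = rng.normal(loc=q_star[ctx][a])
    N[ctx][a] += 1
    Q[ctx][a] += (r - Q[ctx][a]) / N[ctx][a]     # incremental sample average

print({c: Q[c].round(2) for c in Q})             # estimates approach q_star per context
```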

6
Q

Full Reinforcement Learning Problem

A

Tasks in which the action is allowed to affect the next situation as well as the reward.
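For contrast with the bandit setting, a toy sketch (entirely invented) in which the chosen action determines the next situation as well as the immediate reward:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two-state toy: rewards are higher in state 1, and the action taken now
# decides which state (situation) you face next -- unlike a bandit, where
# the situation never changes no matter what you do.
state = 0
total_reward = 0.0
for t in range(100):
    action = int(rng.integers(2))            # placeholder random policy
    reward = rng.normal(loc=float(state))    # reward depends on the current situation
    state = action                           # ...and the action sets the next situation
    total_reward += reward

print(f"{total_reward:.1f}")
```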
