Options Flashcards

1
Q

What makes RL hard?

A

Delayed reward

Bootstrapping (we are making estimates of estimates)

The need for exploration

2
Q

What is temporal abstraction?

A

Taking smaller actions and aggregating/abstracting them into larger actions.
e.g., instead of taking individual steps through a room, take the single action of moving through the room and out the door

3
Q

What does temporal abstraction help with?

A

Temporal abstraction can help with the problem of delayed rewards by requiring fewer steps between taking actions and achieving rewards

4
Q

What is a temporal abstraction option?

A

The triple (I, Pi, Beta), where
I is the initiation set: the states in which the option may be started
Pi is the option's policy, mapping states to actions
Beta is the termination condition: the probability that the option terminates in each state (often given simply as a set of terminal states)
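
A minimal sketch of the option triple in Python; the class and field names here (Option, initiation_set, policy, termination) are illustrative, not from any particular library:

    from dataclasses import dataclass
    from typing import Any, Callable, Set

    State = Any
    Action = Any

    @dataclass
    class Option:
        initiation_set: Set[State]             # I: states where the option may start
        policy: Callable[[State], Action]      # Pi: maps states to primitive actions
        termination: Callable[[State], float]  # Beta: probability of terminating in a state

        def can_start(self, s: State) -> bool:
            return s in self.initiation_set

    # Example: a "leave the room" option that can start anywhere in the room,
    # walks toward the doorway, and terminates once the doorway is reached.
    leave_room = Option(
        initiation_set={"room_a"},
        policy=lambda s: "move_toward_door",
        termination=lambda s: 1.0 if s == "doorway" else 0.0,
    )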

5
Q

What is an SMDP?

A

A semi-Markov Decision Process. Instead of using a single step size, an SMDP is allowed to make larger, variable-length jumps using options. Once properly defined, an SMDP can be treated as an MDP.
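
A hedged sketch of how an option's variable duration enters a Q-learning-style update for the SMDP: the one-step discount gamma becomes gamma ** k, where k is the number of primitive steps the option ran. The function name and dictionary layout below are assumptions for illustration:

    from collections import defaultdict

    # After executing option o from state s for k steps, collecting the
    # (discounted) cumulative reward r and landing in s_next, update Q(s, o).
    def smdp_q_update(Q, s, o, r, s_next, k, options, alpha=0.1, gamma=0.99):
        best_next = max(Q[(s_next, o2)] for o2 in options)
        target = r + (gamma ** k) * best_next
        Q[(s, o)] += alpha * (target - Q[(s, o)])

    Q = defaultdict(float)  # Q-values indexed by (state, option)
    smdp_q_update(Q, s="room_a", o="leave_room", r=2.5, s_next="hallway", k=7,
                  options=["leave_room", "wander"])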

6
Q

True/False - Temporal abstraction guarantees state space abstraction

A

False - It is possible that we can use temporal abstraction to abstract the state space, but it is not guaranteed

7
Q

What are some benefits of temporal abstraction?

A
  1. Temporally abstracted MDPs inherit optimality guarantees (including convergence and stability)
  2. Allows us to ignore “boring” parts of the state space
  3. May allow for state abstraction (which makes the MDP significantly easier)
8
Q

What is modular reinforcement learning?

A

A subfield of RL that focuses on arbitration processes using goal abstraction, i.e., how to decide between parallel, competing goals (in a predator-prey domain these might be “eat” vs. “don’t get eaten”).

9
Q

What is greatest mass Q-learning?

A

Track multiple goals (each with its own Q-function). For each action, sum its Q-values across all goals and take the action with the largest sum.
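
A small illustrative sketch, assuming one Q-table per goal stored as a NumPy array of shape [n_states, n_actions] (the arrays and numbers are made up):

    import numpy as np

    # Greatest mass: sum each action's Q-values across all goals and pick the
    # action with the largest sum.
    def greatest_mass_action(q_tables, state):
        summed = sum(q[state] for q in q_tables)  # elementwise sum over actions
        return int(np.argmax(summed))

    q_eat = np.array([[1.0, 0.2], [0.5, 0.1]])   # goal 1: eat
    q_flee = np.array([[0.1, 0.7], [0.2, 0.8]])  # goal 2: don't get eaten
    greatest_mass_action([q_eat, q_flee], state=0)  # sums to [1.1, 0.9] -> action 0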

10
Q

What is top Q-learning?

A

Track multiple goals (each with its own Q-function). Take the action with the single highest Q-value across all goals' Q-functions.
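
A small illustrative sketch under the same assumed per-goal Q-table layout as above:

    import numpy as np

    # Top Q: take the action whose Q-value is the single highest across any
    # goal's Q-table in the current state.
    def top_q_action(q_tables, state):
        stacked = np.stack([q[state] for q in q_tables])  # shape [n_goals, n_actions]
        return int(np.argmax(stacked.max(axis=0)))        # best action over all goals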

11
Q

What is negotiated W-learning?

A

Track multiple goals/agents (each with its own Q-function). The agent with the most to lose gets to choose the action, i.e., look at the difference between the best and worst options for each agent and let the agent with the largest difference choose.
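
A simplified sketch of the selection rule described above, again assuming per-goal Q-tables as NumPy arrays; the W value here follows the card's best-vs-worst description rather than a full W-learning implementation:

    import numpy as np

    # Each agent's W is the gap between its best and worst action values in the
    # current state (how much it stands to lose); the agent with the largest W
    # picks its preferred action.
    def negotiated_w_action(q_tables, state):
        w = [q[state].max() - q[state].min() for q in q_tables]
        winner = int(np.argmax(w))
        return int(np.argmax(q_tables[winner][state]))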

12
Q

What is Arrow’s impossibility theorem?

A

It essentially says there is no way to design a ranked voting system over three or more options that satisfies every reasonable fairness criterion. For modular RL this means a fair way to combine multiple goals' preferences may not exist: compatibility between goals is not guaranteed.

13
Q

What is Monte Carlo Tree Search?

A

An algorithm for solving MDPs iteratively by building a search tree from simulated experience. Can be viewed as a policy search algorithm.

Select the best node -> Expand it with untried actions -> Simulate from the new node using a rollout policy -> Backup the result -> Select -> …
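
A compact, simplified sketch of that loop, assuming a hypothetical generative model with actions(s), step(s, a) -> (next_state, reward), and is_terminal(s). It uses UCB1 for selection and a random rollout policy, and for brevity it does not track rewards gathered along the selection path:

    import math
    import random

    class Node:
        def __init__(self, state, parent=None):
            self.state, self.parent = state, parent
            self.children = {}               # action -> child Node
            self.visits, self.total = 0, 0.0

    def uct_child(node, c=1.4):
        # Selection: pick the child maximizing the UCB1 score.
        def score(child):
            return (child.total / (child.visits + 1e-9)
                    + c * math.sqrt(math.log(node.visits + 1) / (child.visits + 1e-9)))
        return max(node.children.values(), key=score)

    def rollout(model, state, depth=20):
        # Simulation: follow a random rollout policy and sum the rewards.
        ret = 0.0
        for _ in range(depth):
            if model.is_terminal(state):
                break
            state, r = model.step(state, random.choice(model.actions(state)))
            ret += r
        return ret

    def mcts(model, root_state, n_iters=1000):
        root = Node(root_state)
        for _ in range(n_iters):
            node = root
            # 1. Select: descend through fully expanded nodes.
            while node.children and len(node.children) == len(model.actions(node.state)):
                node = uct_child(node)
            # 2. Expand: add one untried action, if any.
            untried = [a for a in model.actions(node.state) if a not in node.children]
            if untried and not model.is_terminal(node.state):
                a = random.choice(untried)
                next_state, _ = model.step(node.state, a)
                node.children[a] = Node(next_state, parent=node)
                node = node.children[a]
            # 3. Simulate: rollout from the new node.
            ret = rollout(model, node.state)
            # 4. Backup: propagate the return up to the root.
            while node is not None:
                node.visits += 1
                node.total += ret
                node = node.parent
        # Recommend the most-visited root action.
        return max(root.children, key=lambda a: root.children[a].visits)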

14
Q

What is a way to improve rollout policies in MCTS?

A

Apply constraints that we expect to help the rollouts explore better without requiring much additional knowledge (e.g., avoid getting eaten).
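
A tiny sketch of such a constrained rollout policy, where is_unsafe is a hypothetical domain-specific check (e.g. "this move walks into the predator"):

    import random

    # Filter out actions the constraint flags as unsafe, then sample uniformly
    # from whatever remains (fall back to all actions if none are safe).
    def constrained_rollout_policy(state, actions, is_unsafe):
        safe = [a for a in actions if not is_unsafe(state, a)]
        return random.choice(safe if safe else actions)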

15
Q

What are pros and cons (properties) of MCTS?

A

Pros - Useful for large state spaces; planning time is independent of the number of states
Cons - Requires many samples to get a good estimate; running time is exponential in the horizon, O((|A| * steps)^Horizon)
