W8 Hierarchical RL Flashcards

Question 1

Q

1) Why can hierarchical reinforcement learning (HRL) be faster?
2) Why can hierarchical reinforcement learning be slower?

Answer

A

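1) HRL simplifies problems through abstraction: temporal abstraction (macro-actions that span many time steps) increases sample efficiency.
2) Domain knowledge must be available to define the hierarchy, the extra algorithmic complexity costs time, and the macro-actions themselves have to be designed or learned, which enlarges the set of actions to choose from.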
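One way to see both effects is to count how many action sequences an exhaustive search would have to consider. This is a back-of-the-envelope sketch on a hypothetical corridor task: the goal distance, the macro length and the macros "left x4" / "right x4" are illustrative assumptions, not part of the course material.

```python
def num_action_sequences(choices_per_decision: int, decisions: int) -> int:
    """Number of distinct action sequences an exhaustive search would consider."""
    return choices_per_decision ** decisions

# Hypothetical corridor: the goal lies 12 cells to the right of the start.
DISTANCE = 12
MACRO_LEN = 4  # hypothetical macros "left x4" and "right x4"

# Flat RL: 12 decisions, each choosing between the 2 primitives (left, right).
flat = num_action_sequences(2, DISTANCE)
# Macros only: 12 / 4 = 3 decisions, each choosing between the 2 macros (faster).
macros_only = num_action_sequences(2, DISTANCE // MACRO_LEN)
# Macros added alongside the primitives: 4 choices per decision; if the agent
# still ends up taking 12 one-step decisions, the space to explore grows (slower).
primitives_plus_macros = num_action_sequences(4, DISTANCE)

print(flat, macros_only, primitives_plus_macros)  # 4096 8 16777216
```

The temporal abstraction pays off only when the macros actually shorten the decision horizon; merely adding them to the action set makes the search larger.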

Question 2

Q

Why may hierarchical reinforcement learning give an answer of lesser quality?

Answer

A

Macro-actions may skip over shorter routes that primitive actions would have found.
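A small sketch of this effect, assuming a hypothetical 1-D corridor where the agent has replaced its primitive "right" with a "right x4" macro and kept only the primitive "left". Dijkstra's algorithm finds the cheapest route, with cost counted in primitive steps; the corridor bounds and action sets are illustrative assumptions.

```python
import heapq

def cheapest_route(start, goal, actions, lo=0, hi=20):
    """Dijkstra over corridor cells lo..hi. Each action is (displacement, cost),
    where cost counts the primitive steps the action takes. Returns None if
    the goal cannot be reached with the given actions."""
    dist = {start: 0}
    frontier = [(0, start)]
    while frontier:
        cost, pos = heapq.heappop(frontier)
        if pos == goal:
            return cost
        if cost > dist[pos]:
            continue  # stale queue entry
        for move, step_cost in actions:
            nxt = pos + move
            if lo <= nxt <= hi and cost + step_cost < dist.get(nxt, float("inf")):
                dist[nxt] = cost + step_cost
                heapq.heappush(frontier, (cost + step_cost, nxt))
    return None

PRIMITIVES = [(+1, 1), (-1, 1)]
WITH_MACRO = [(+4, 4), (-1, 1)]  # hypothetical: "right x4" macro plus primitive "left"

print(cheapest_route(0, 6, PRIMITIVES))  # 6
print(cheapest_route(0, 6, WITH_MACRO))  # 10 (overshoot to 8, then step back twice)
```

The macro jumps over cell 6, so the agent must overshoot and walk back, giving a route of 10 primitive steps instead of 6.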

Question 3

Q

Is hierarchical reinforcement learning more general or less general?

Answer

A

More general.
Subtasks reduce brittleness due to overspecialization of policies: policies become more general and can adapt to changes in the environment more easily.

Question 4

Q

What is the options framework?

Answer

A

Whenever a subgoal state is reached, the agent can, instead of following a primitive action of the main policy, follow an option policy: a macro-action consisting of a separate subpolicy aimed specifically at satisfying the subgoal in one large step. In this way macros are incorporated into the reinforcement learning framework.
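Learning which option to pick is commonly done with SMDP Q-learning, which treats a whole option run of k steps as one transition: Q(s, o) ← Q(s, o) + α [R + γ^k max_o' Q(s', o') − Q(s, o)], where R is the discounted reward collected during the run. The sketch below is one minimal tabular version; the state and option names are hypothetical, and a primitive action is simply treated as an option that lasts one step.

```python
from collections import defaultdict

def smdp_q_update(Q, s, o, rewards, s_next, options_at_next, alpha=0.1, gamma=0.99):
    """One SMDP Q-learning update after option `o`, started in `s`, ran for
    k = len(rewards) primitive steps and terminated in `s_next`."""
    k = len(rewards)
    # Discounted reward accumulated while the option was running.
    R = sum(gamma ** i * r for i, r in enumerate(rewards))
    # Value of the best option available where `o` terminated
    # (0 if none is available, e.g. at a terminal state).
    best_next = max((Q[(s_next, o2)] for o2 in options_at_next), default=0.0)
    target = R + gamma ** k * best_next
    Q[(s, o)] += alpha * (target - Q[(s, o)])
    return Q[(s, o)]

Q = defaultdict(float)
# Hypothetical: option "go-to-door" ran 3 steps from "hall", rewarded only on the last step.
value = smdp_q_update(Q, "hall", "go-to-door", [0.0, 0.0, 1.0], "door", ["open-door", "right"])
print(round(value, 5))  # 0.09801
```

The γ^k factor is what makes a long option "cost" more discounting than a single primitive step, so options and primitives can share one Q-table.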

Question 5

Q

What is an option?

Answer

A

An option is a group of actions with a termination condition.
Options take in environment observations and output actions until the termination condition is met.
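The observation-in, action-out loop of an option can be sketched as follows. The toy corridor environment, the option "walk right until cell 5" and the helper names are hypothetical; the termination check uses β(s) as a probability evaluated in each newly reached state.

```python
import random
from typing import Tuple

def corridor_step(state: int, action: int) -> Tuple[int, float]:
    """Hypothetical toy environment: cells 0..10, reward 1.0 on reaching cell 10."""
    nxt = min(max(state + action, 0), 10)
    return nxt, (1.0 if nxt == 10 else 0.0)

def run_option(state, policy, beta, step, rng, max_steps=1000):
    """Run one option: act with `policy` and, after each transition, stop with
    probability beta(new_state). Returns (final_state, rewards_collected)."""
    rewards = []
    for _ in range(max_steps):
        action = policy(state)              # observation in, action out
        state, reward = step(state, action)
        rewards.append(reward)
        if rng.random() < beta(state):      # termination condition
            break
    return state, rewards

# Hypothetical option "walk right until cell 5".
final_state, rewards = run_option(
    state=0,
    policy=lambda s: +1,
    beta=lambda s: 1.0 if s >= 5 else 0.0,
    step=corridor_step,
    rng=random.Random(0),
)
print(final_state, rewards)  # 5 [0.0, 0.0, 0.0, 0.0, 0.0]
```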

Question 6

Q

What are the three elements that an option consists of?

Answer

A

The initiation set 𝐼 ⊆ 𝑆: the states that the option can start from
The subpolicy 𝜋 : 𝑆 × 𝐴 → [0, 1]: the policy internal to this particular option
The termination condition 𝛽 : 𝑆 → [0, 1]: the probability that the option terminates in 𝑠
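The three elements map directly onto a small data structure. This is a minimal sketch, assuming π(a|s) is returned as a dictionary of action probabilities and β(s) as a float; the "go to the door" option and its numbers are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Hashable

State = Hashable
Action = Hashable

@dataclass(frozen=True)
class Option:
    initiation_set: FrozenSet[State]                # I ⊆ S
    policy: Callable[[State], Dict[Action, float]]  # π(a|s) as {action: probability}
    termination: Callable[[State], float]           # β(s) in [0, 1]

    def can_start(self, s: State) -> bool:
        return s in self.initiation_set

    def sample_action(self, s: State, rng: random.Random) -> Action:
        actions, probs = zip(*self.policy(s).items())
        return rng.choices(actions, weights=probs, k=1)[0]

    def terminates(self, s: State, rng: random.Random) -> bool:
        return rng.random() < self.termination(s)

# Hypothetical "go to the door" option in a corridor whose door is at cell 4.
to_door = Option(
    initiation_set=frozenset({0, 1, 2, 3}),
    policy=lambda s: {"right": 0.9, "left": 0.1},
    termination=lambda s: 1.0 if s == 4 else 0.0,
)
rng = random.Random(0)
print(to_door.can_start(2), to_door.can_start(7))              # True False
print(to_door.terminates(4, rng), to_door.terminates(2, rng))  # True False
```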

Question 7

Q

What is a macro?

Answer

A

Macros are combinations of primitive actions, and their use can greatly improve the performance of the policy.
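In the simplest reading, a macro is a fixed sequence of primitive actions that is chosen with a single decision and then executed without re-deciding in between (unlike an option, whose subpolicy looks at each state). The gridworld, its −1 step reward and the macro below are hypothetical.

```python
from typing import Callable, Sequence, Tuple

Pos = Tuple[int, int]
MOVES = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0)}

def grid_step(pos: Pos, action: str) -> Tuple[Pos, float]:
    """Hypothetical gridworld: move one cell, reward -1 per step."""
    dx, dy = MOVES[action]
    return (pos[0] + dx, pos[1] + dy), -1.0

def run_macro(pos: Pos, macro: Sequence[str], step: Callable) -> Tuple[Pos, float]:
    """Apply each primitive action of the macro in order and return
    the final state and the total reward."""
    total = 0.0
    for action in macro:
        pos, reward = step(pos, action)
        total += reward
    return pos, total

UP_RIGHT = ["E", "E", "N"]  # hypothetical macro: one decision, three primitive steps
print(run_macro((0, 0), UP_RIGHT, grid_step))  # ((2, 1), -3.0)
```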

Question 8

Q

What is intrinsic motivation?

Answer

A

An inner drive to explore.
So named to contrast it with classic extrinsic motivation (the conventional RL reward signal).
Often modeled as curiosity.
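A simple way to turn an inner drive to explore into a number is a count-based novelty bonus added to the extrinsic reward, r = r_ext + β / √N(s). This is one common form of intrinsic reward, shown here as a sketch; curiosity methods typically use the prediction error of a learned model instead. The class name and β value are hypothetical.

```python
import math
from collections import Counter

class CountBonus:
    """Intrinsic reward beta / sqrt(N(s)): large for rarely visited states,
    shrinking as a state becomes familiar."""

    def __init__(self, beta: float = 0.1):
        self.beta = beta
        self.counts = Counter()

    def reward(self, state, extrinsic: float) -> float:
        self.counts[state] += 1
        intrinsic = self.beta / math.sqrt(self.counts[state])
        return extrinsic + intrinsic

bonus = CountBonus(beta=1.0)
print([round(bonus.reward("room A", 0.0), 3) for _ in range(4)])  # [1.0, 0.707, 0.577, 0.5]
```

Even when the extrinsic reward stays at zero, the agent still receives a signal that pushes it towards states it has not seen often.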

Question 9

Q

What are the three elements of an option 𝜔?

Answer

A

initiation set I_𝜔: the states that the option can start from
subpolicy 𝜋_𝜔(𝑎|𝑠): the policy internal to this particular option
termination condition 𝛽_𝜔(𝑠): the probability that 𝜔 terminates in 𝑠

Question 10

Q

How do multi agent and hierarchical reinforcement learning fit together?

Answer

A

Agents often work together in teams or other hierarchical structures.

Question 11

Q

What is so special about Montezuma’s Revenge?

Answer

A

The reward signal is sparse. The agent has to take long sequences of actions without the reward changing, so it gets almost no feedback about which actions are useful.
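Why sparse reward is hard can be made concrete with a toy stand-in (not the game itself): a uniformly random left/right walker in a corridor, rewarded only at the far end. The sketch computes the exact probability that it sees any reward within a step budget; the corridor setup and function name are hypothetical.

```python
def prob_reach(length: int, budget: int) -> float:
    """Exact probability that a uniformly random left/right walker starting at
    cell 0 of a corridor (clipped at 0) reaches cell `length` within `budget` steps."""
    dist = {0: 1.0}  # probability mass over cells not yet at the goal
    reached = 0.0
    for _ in range(budget):
        nxt = {}
        for cell, p in dist.items():
            for move in (-1, +1):
                c = max(cell + move, 0)
                if c == length:
                    reached += p / 2  # first (and only) reward signal
                else:
                    nxt[c] = nxt.get(c, 0.0) + p / 2
        dist = nxt
    return reached

# The farther away the only reward, the less likely random exploration finds it.
print(prob_reach(10, 100), prob_reach(40, 100))
```

Until the reward is found, every trajectory looks equally good to the learner, which is why intrinsic motivation and hierarchical subgoals are used for games like Montezuma's Revenge.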

(11 cards)