W8 Hierarchical RL Flashcards

1
Q

1) Why can hierarchical reinforcement learning (HRL) be faster?
2) Why can hierarchical reinforcement learning be slower?

A

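1) HRL simplifies problems through abstraction; temporal abstraction (actions that span many time steps) increases sample efficiency.
2) Domain knowledge needs to be available to define the hierarchy, the extra algorithmic complexity costs time, and macro-actions added alongside the primitive actions enlarge the action space the agent must explore.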

2
Q

Why may hierarchical reinforcement learning find a solution of lower quality?

A

Macro-actions may skip over shorter routes that primitive actions would have found, so the resulting policy can be suboptimal.

3
Q

Is hierarchical reinforcement learning more general or less general?

A

More general.
Subtasks reduce the brittleness caused by overspecialized policies. Policies become more general and can adapt more easily to changes in the environment.

4
Q

What is the options framework?

A

Whenever the agent reaches a state that is a subgoal, it can, apart from taking a primitive action (main policy), follow an option policy: a macro-action consisting of a separate subpolicy aimed specifically at satisfying the subgoal in one large step. In this way, macros are incorporated into the reinforcement learning framework.
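
A minimal sketch of how options extend the agent's choices, assuming states are integer ids and using hypothetical action and option names: primitive actions are always available, while an option is offered only in states that belong to its initiation set.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Union

# Hypothetical primitive action names for a small gridworld.
PRIMITIVE_ACTIONS = ["up", "down", "left", "right"]


@dataclass(frozen=True)
class Option:
    """Minimal stand-in for an option: only a name and an initiation set."""
    name: str
    initiation_set: FrozenSet[int]  # states (integer ids) where it may start


def available_choices(state: int, options: List[Option]) -> List[Union[str, Option]]:
    """Primitive actions are always allowed; an option joins the choice set
    only in states that belong to its initiation set."""
    choices: List[Union[str, Option]] = list(PRIMITIVE_ACTIONS)
    choices.extend(o for o in options if state in o.initiation_set)
    return choices


if __name__ == "__main__":
    to_door = Option("go_to_doorway", frozenset({3, 7}))
    print([c if isinstance(c, str) else c.name for c in available_choices(3, [to_door])])
    # ['up', 'down', 'left', 'right', 'go_to_doorway']
    print(len(available_choices(5, [to_door])))  # 4
```

The main policy then picks among these choices; when it picks an option, control passes to that option's subpolicy until the option terminates.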

5
Q

What is an option?

A

An option is a group of actions with a termination condition.
Options take in environment observations and output actions until the termination condition is met.

6
Q

What are the three elements that an option consists of?

A

The initiation set 𝐼 ⊆ 𝑆: the states that the option can start from
The subpolicy 𝜋 : 𝑆 × 𝐴 → [0, 1], internal to this particular option
The termination condition 𝛽 : 𝑆 → [0, 1]: the probability that the option 𝜔 terminates in state 𝑠
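
The three elements can be sketched in Python as below. This is an illustrative sketch, not a reference implementation: the corridor environment, the `run_option` helper and the example option are hypothetical. The option runs until termination, sampling actions from 𝜋 and stopping when a random draw falls below 𝛽(𝑠) in the state just reached.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Tuple


@dataclass
class Option:
    initiation_set: FrozenSet[int]             # I ⊆ S
    policy: Callable[[int], Dict[str, float]]  # π(·|s): action -> probability
    termination: Callable[[int], float]        # β(s): probability of stopping in s


def step(state: int, action: str) -> int:
    """Hypothetical deterministic corridor with states 0..9."""
    delta = 1 if action == "right" else -1
    return min(9, max(0, state + delta))


def run_option(option: Option, state: int, rng: random.Random,
               max_steps: int = 100) -> Tuple[int, List[str]]:
    """Execute the option from `state` until β says stop (or max_steps)."""
    if state not in option.initiation_set:
        raise ValueError(f"option cannot start in state {state}")
    actions: List[str] = []
    for _ in range(max_steps):
        probs = option.policy(state)
        # Sample an action a ~ π(·|s).
        action = rng.choices(list(probs), weights=list(probs.values()))[0]
        state = step(state, action)
        actions.append(action)
        # Terminate with probability β(s') in the state just reached.
        if rng.random() < option.termination(state):
            break
    return state, actions


if __name__ == "__main__":
    go_right = Option(
        initiation_set=frozenset(range(9)),
        policy=lambda s: {"right": 1.0},
        termination=lambda s: 1.0 if s == 9 else 0.0,
    )
    final_state, taken = run_option(go_right, 2, random.Random(0))
    print(final_state, len(taken))  # 9 7
```

Here 𝛽 is 0 everywhere except the end of the corridor, so the option always walks right until state 9; a stochastic 𝛽 would let it stop earlier.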

7
Q

What is a macro?

A

Macros are sequences of primitive actions executed as a single step; using them can greatly improve the performance of the policy.
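
A minimal sketch of a macro as a fixed sequence of primitive actions, assuming a hypothetical grid with four moves and a hypothetical macro name. Unlike an option, this macro is open-loop: it has no termination condition and does not look at the state while it runs.

```python
from typing import Dict, List, Tuple

# Hypothetical primitive moves on a grid: name -> (dx, dy).
MOVES: Dict[str, Tuple[int, int]] = {
    "up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0),
}

# A macro is a fixed sequence of primitive actions (hypothetical example).
MACROS: Dict[str, List[str]] = {"up_right_twice": ["up", "right", "up", "right"]}


def expand(action: str) -> List[str]:
    """Return the primitive actions that an action stands for."""
    return MACROS.get(action, [action])


def execute(pos: Tuple[int, int], plan: List[str]) -> Tuple[int, int]:
    """Apply a plan that may mix primitive actions and macros."""
    x, y = pos
    for a in plan:
        for prim in expand(a):
            dx, dy = MOVES[prim]
            x, y = x + dx, y + dy
    return x, y


if __name__ == "__main__":
    print(execute((0, 0), ["up_right_twice", "right"]))  # (3, 2)
```

The plan above costs two decisions instead of five, which is where the speed-up from macros comes from.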

8
Q

What is intrinsic motivation?

A

An inner drive to explore.
It is named so to contrast it with classic extrinsic motivation (the conventional RL reward signal).
It is often related to curiosity.
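
One common way to turn this inner drive into a number is a count-based bonus that is added to the extrinsic reward and shrinks as a state is visited more often: r = r_ext + c / √N(s). This is a sketch of that one technique only (the class name and the scale c are assumptions); curiosity-based methods instead reward the prediction error of a learned model.

```python
import math
from collections import Counter
from typing import Hashable


class CountBonus:
    """Count-based intrinsic reward c / sqrt(N(s)), added to the extrinsic reward.

    The bonus is large for novel states and decays as they become familiar."""

    def __init__(self, scale: float = 0.1) -> None:
        self.scale = scale
        self.counts: Counter = Counter()

    def reward(self, state: Hashable, extrinsic: float) -> float:
        self.counts[state] += 1  # N(s) includes the current visit
        intrinsic = self.scale / math.sqrt(self.counts[state])
        return extrinsic + intrinsic


if __name__ == "__main__":
    bonus = CountBonus(scale=1.0)
    print([round(bonus.reward("room_1", 0.0), 3) for _ in range(4)])
    # [1.0, 0.707, 0.577, 0.5]
```

Even when the extrinsic reward stays at zero, the agent still receives a signal that pushes it toward states it has rarely seen.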

9
Q

What are the three elements of an option 𝜔?

A

initialization set I_πœ”: the states that the option can start from
subpolicy πœ‹πœ” (π‘Ž|𝑠): internal to this particular option
terminal condition π›½πœ” (𝑠): tells us if πœ” terminates in s

10
Q

How do multi-agent and hierarchical reinforcement learning fit together?

A

Agents often work together in teams or other hierarchical structures.

11
Q

What is so special about Montezuma’s Revenge?

A

The reward signal is very sparse: the agent must take long sequences of actions, such as walking through rooms, without any change in the reward.
