C5 Flashcards

1
Q

model-based methods

A

the agent first builds its own internal transition model from the environment feedback and uses this local model to find out how actions affect states and rewards. The agent can then generate policy updates from the internal model (planning), without causing changes to the environment

2
Q

advantage of model-based

A

the agent has its own model of the state transitions of the world, so it can learn the best policy for free, without incurring the further cost of acting in the environment => low sample complexity

3
Q

disadvantage of model-based

A

the learned transition function may be inaccurate, and the resulting policy may thus be of low quality => uncertainty and model bias

4
Q

model-based planning and learning

A
  • the agent uses the Q function as behaviour policy to sample a new state and reward from the environment and to update the policy
  • the agent records the new state and reward in a local transition and reward function. We can then choose to sample from the (cheap) local transition function or from the (expensive) environment transition function (see the sketch below)
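A minimal tabular sketch of this loop, assuming a hypothetical env with an env.step(state, action) call that returns (reward, next_state); the names q, model and act are illustrative, not from any specific library:

import random
from collections import defaultdict

q = defaultdict(float)   # Q(s, a); the behaviour policy is derived from this
model = {}               # local model: (state, action) -> (reward, next_state)

def act(state, actions, eps=0.1):
    # epsilon-greedy behaviour policy based on the Q function
    if random.random() < eps:
        return random.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])

def environment_step(env, state, action):
    # expensive: take a real environment sample and record it in the local model
    reward, next_state = env.step(state, action)
    model[(state, action)] = (reward, next_state)
    return reward, next_state

def model_step(state, action):
    # cheap: replay a previously recorded transition from the local model
    return model[(state, action)]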
5
Q

the goal of model-based methods

A
  • to solve larger and more complex problems in the same amount of time (lower sample complexity and a deeper understanding of the environment)
  • to improve the generalization power so much that new classes of problems can be solved
6
Q

why does the model-based agent learn the local transition function?

A

once the accuracy of this local function is good enough, the agent can sample from it to improve the policy, without incurring the cost of actual environment samples

7
Q

what is Dyna?

A

a hybrid approach between model-based and model-free learning

imagination: in addition to planning updates that use the local transition function, it also uses environment samples to update the policy directly

8
Q

what happens when we turn on planning in model-based learning?

A

for each environment sample, we perform N planning steps; the planning amplifies any useful reward information that the agent has learned from the environment and plows it back into the policy function quickly (see the sketch below)
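A Dyna-Q-style sketch of this amplification, reusing the illustrative q, model, act and environment_step names from the earlier sketch; each real environment sample is followed by N cheap planning updates drawn from the local model:

import random

ALPHA, GAMMA, N_PLANNING = 0.1, 0.99, 10

def q_update(s, a, r, s2, actions):
    # one-step Q-learning update, used for both real and planned transitions
    best_next = max(q[(s2, a2)] for a2 in actions)
    q[(s, a)] += ALPHA * (r + GAMMA * best_next - q[(s, a)])

def dyna_step(env, state, actions):
    a = act(state, actions)
    r, s2 = environment_step(env, state, a)   # one expensive environment sample
    q_update(state, a, r, s2, actions)        # direct, model-free update
    for _ in range(N_PLANNING):               # N planning steps from the local model
        (ps, pa), (pr, ps2) = random.choice(list(model.items()))
        q_update(ps, pa, pr, ps2, actions)
    return s2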

9
Q

how can we reduce the uncertainty of the transition model in model-based methods?

A
  • increase the number of environment samples
  • use Gaussian processes, where the dynamics model is learned by giving an estimate of the function and of its uncertainty, with a covariance matrix over the entire dataset
  • use ensemble methods (see the sketch below)
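A small NumPy sketch of the ensemble idea: train several dynamics models on bootstrap resamples of the same transition data and read their disagreement as a measure of model uncertainty. The linear least-squares predictors are only placeholders for whatever model class is used:

import numpy as np

rng = np.random.default_rng(0)

def fit_linear_model(X, Y):
    # least-squares next-state predictor; a stand-in for any regression model
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W

def train_ensemble(X, Y, n_models=5):
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, len(X), size=len(X))   # bootstrap resample of the data
        models.append(fit_linear_model(X[idx], Y[idx]))
    return models

def predict_with_uncertainty(models, x):
    preds = np.stack([x @ W for W in models])
    return preds.mean(axis=0), preds.std(axis=0)     # high std = high model uncertainty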
10
Q

latent models

A

goal: dimensionality reduction
idea: in most high-dimensional problems some elements are less important => abstract these away from the model
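A toy PyTorch sketch of a latent model, with illustrative sizes and module names: an encoder compresses the high-dimensional observation into a small latent state, and the transition model is learned in that latent space:

import torch
import torch.nn as nn

OBS_DIM, LATENT_DIM, ACTION_DIM = 10_000, 32, 4

encoder = nn.Sequential(                 # abstracts away less important elements
    nn.Linear(OBS_DIM, 256), nn.ReLU(), nn.Linear(256, LATENT_DIM))
latent_dynamics = nn.Sequential(         # predicts the next latent state
    nn.Linear(LATENT_DIM + ACTION_DIM, 256), nn.ReLU(), nn.Linear(256, LATENT_DIM))

def predict_next_latent(obs, action_onehot):
    z = encoder(obs)                      # low-dimensional latent state
    return latent_dynamics(torch.cat([z, action_onehot], dim=-1))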

11
Q

what is Model-Predictive Control (MPC)?

A

the model is optimized for a limited time into the future, and then it is re-learned after each environment step. In this way small errors do not get a chance to accumulate and influence the outcome greatly.
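A random-shooting sketch of this idea; model_step(state, action) is assumed to be a learned dynamics model returning (reward, next_state), as in the first sketch. Candidate action sequences are evaluated only over a short horizon, and only the first action of the best sequence is executed before re-planning:

import numpy as np

rng = np.random.default_rng(0)

def mpc_action(state, n_actions, model_step, horizon=5, n_candidates=100):
    best_return, best_first_action = -np.inf, 0
    for _ in range(n_candidates):
        plan = rng.integers(0, n_actions, size=horizon)   # one candidate action sequence
        s, total = state, 0.0
        for a in plan:
            r, s = model_step(s, a)                        # roll out the learned model
            total += r
        if total > best_return:
            best_return, best_first_action = total, plan[0]
    return best_first_action                               # execute only the first action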

12
Q

name 3 methods that perform planning with a neural network

A
  • Value Iteration Networks (VIN)
  • TreeQN
  • Predictron
13
Q

Value Iteration Networks (VIN)

A
  • a differentiable multi-layer convolutional network used for planning in Grid worlds
  • uses backpropagation to learn the value iteration parameters, allowing it to navigate unseen environments
  • can generalize to unknown transition probabilities
14
Q

TreeQN

A

an extended version of VIN that uses observation abstraction to handle irregular shapes

15
Q

end-to-end learning for planning

A

hand-crafted planning algorithms are replaced by differentiable approaches, so the system can learn to plan and make decisions directly from raw input data

16
Q

what is MuZero?

A

a new architecture to learn transition functions for chess, shogi and Go. It can learn the rules of Atari games and board games

17
Q

what is the dynamics model?

A

the combination of the transition function and the reward function

18
Q

what is the difference between planning and learning?

A

with learning the agent acts in the real environment, where actions cannot be undone; with planning the agent acts in its own local model, where actions are reversible

19
Q

how can we improve the weakness of model-based?

A

weakness: when the local model is not accurate

  • improve the model
  • improve the planning procedure
20
Q

how can we improve the model?

A
  • uncertainty modelling
  • latent models
21
Q

how can we improve planning?

A
  • Model-Predictive Control
  • end-to-end planning and learning
22
Q

what is the biggest drawback of MuZero?

A

it requires a lot of computation

23
Q

what is so wonderful about MuZero?

A

it can find out the rules of multiple games by itself