Markov Decision Problems and Dynamic Programming Flashcards

1
Q

A transition has ___

A

The current state, the current action, the reward received, and the next state
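A minimal sketch of how such a transition might be stored (the class and field names are illustrative, not from the deck):

```python
from typing import Any, NamedTuple

class Transition(NamedTuple):
    """One step of interaction: (state, action, reward, next state)."""
    state: Any        # current state s
    action: Any       # action a taken in s
    reward: float     # reward r received for this step
    next_state: Any   # resulting state s'

# Example: taking "right" in state "s0" yields reward 1.0 and lands in "s1".
t = Transition(state="s0", action="right", reward=1.0, next_state="s1")
```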

2
Q

Should we always include all possible and impossible actions for every state?

A

Yes. If an action is impossible in a given state, taking it simply does nothing (the state is unchanged)
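A hypothetical Python sketch of this convention, where an impossible action is treated as a no-op (the `transitions` table and its layout are assumptions for illustration):

```python
def step(state, action, transitions):
    """Apply `action` in `state`.

    `transitions` maps (state, action) -> (reward, next_state).
    An action missing from the table is impossible: it yields zero
    reward and leaves the state unchanged.
    """
    if (state, action) not in transitions:
        return 0.0, state  # impossible action: do nothing
    return transitions[(state, action)]
```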

3
Q

If we know the present, then the future is ___ of the past. This means we don’t need the past to decide what will happen next

A

independent
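This is the Markov property; one standard way to write it is

$$P(S_{t+1} = s' \mid S_t, A_t, S_{t-1}, A_{t-1}, \dots, S_0, A_0) = P(S_{t+1} = s' \mid S_t, A_t)$$

so the distribution of the next state depends only on the current state and action.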

4
Q

To describe our problem we need ___

A

State space
Action space
Reward function
Transition probabilities
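A minimal sketch of how these four ingredients might be bundled together (the class and the tiny two-state example are illustrative assumptions):

```python
from dataclasses import dataclass

@dataclass
class MDP:
    states: list            # state space S
    actions: list           # action space A
    rewards: dict           # (s, a) -> expected reward R(s, a)
    transition_probs: dict  # (s, a, s') -> P(s' | s, a)

# Tiny two-state example: "go" deterministically switches states.
mdp = MDP(
    states=["s0", "s1"],
    actions=["stay", "go"],
    rewards={("s0", "go"): 1.0, ("s0", "stay"): 0.0,
             ("s1", "go"): 0.0, ("s1", "stay"): 0.0},
    transition_probs={("s0", "go", "s1"): 1.0, ("s0", "stay", "s0"): 1.0,
                      ("s1", "go", "s0"): 1.0, ("s1", "stay", "s1"): 1.0},
)
```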

5
Q

It’s better to design the reward in an ___

A

abstract way

6
Q

In practice, while the agent is learning, we should give ___ in the reward to serve as a ___ for the agent

A

hints

guide
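This practice is commonly called reward shaping; a hypothetical sketch in which a sparse task reward is augmented with a progress hint (the function, its arguments, and the bonus term are illustrative assumptions):

```python
def shaped_reward(task_reward, prev_distance_to_goal, distance_to_goal,
                  hint_scale=0.1):
    """Task reward plus a small hint that rewards progress toward the goal."""
    progress_hint = hint_scale * (prev_distance_to_goal - distance_to_goal)
    return task_reward + progress_hint
```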

7
Q

The optimality criterion defines how to select between ___

A

actions

8
Q

The discount factor acts as an ___ rate.

It makes the agent prefer rewards ___ rather than ___

A

inflation
sooner
later
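Concretely, with discount factor $\gamma$ the agent values the discounted return

$$G_t = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}, \qquad 0 \le \gamma < 1,$$

so a reward received $k$ steps in the future is worth only $\gamma^k$ of the same reward received now.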

9
Q

A policy is a mapping from everything the agent has ___ (its ___) to distributions over ___

A

seen so far
history
actions
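A hypothetical sketch of such a policy; the history layout and the two-state action distributions are assumptions for illustration:

```python
import random

def policy(history):
    """Map a history (past states/actions/rewards plus the current state)
    to a probability distribution over actions.

    The history is assumed to end with the current state, and only that
    state is used to pick the distribution here.
    """
    current_state = history[-1]
    if current_state == "s0":
        return {"go": 0.8, "stay": 0.2}
    return {"go": 0.5, "stay": 0.5}

def sample_action(distribution):
    """Draw one action from the policy's distribution."""
    actions, probs = zip(*distribution.items())
    return random.choices(actions, weights=probs, k=1)[0]

# Example: one decision from a history containing only the current state.
a = sample_action(policy(["s0"]))
```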
