Markov Decision Process Flashcards
(3 cards)
1
Q
When can we classify a process as Markovian?
A
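When the future is independent of the past given the present: the next state depends only on the current state (and action), not on the earlier history, i.e. P(s_{t+1} | s_t) = P(s_{t+1} | s_1, ..., s_t)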
2
Q
What are the consequences and possible interpretations of choosing a discount factor (gamma) smaller than 1?
A
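Choosing gamma smaller than one keeps the discounted return finite in infinite-horizon tasks (with bounded rewards it is at most R_max / (1 - gamma)), models time preference (sooner rewards count more than later ones), and accounts for uncertainty about the future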
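The convergence part of this answer can be checked numerically. The sketch below is only an illustration under assumed values (a constant reward of 1 per step and gamma = 0.9, neither of which comes from the card); `discounted_return` is a hypothetical helper name. It shows the partial sums approaching the geometric-series limit 1 / (1 - gamma).

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over a finite reward sequence."""
    total = 0.0
    for t, r in enumerate(rewards):
        total += (gamma ** t) * r
    return total


# Assumed illustration values: reward 1 at every step, gamma = 0.9.
gamma = 0.9
limit = 1.0 / (1.0 - gamma)  # closed form of the infinite geometric series

# A long but finite horizon stands in for the infinite-horizon task.
partial = discounted_return([1.0] * 500, gamma)

print(f"partial sum after 500 steps: {partial:.6f}")
print(f"limit 1 / (1 - gamma):       {limit:.6f}")
```

With gamma = 1 the same sum would grow without bound as the horizon grows, which is why the discount is needed in the infinite-horizon case.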
3
Q
How is the value equation written when using a policy pi?
A
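It is the sum over all possible actions of the probability that pi takes that action in the state, times the expected immediate reward for that action plus the discounted expected value of the possible next states: V_pi(s) = sum_a pi(a|s) [ R(s,a) + gamma * sum_s' P(s'|s,a) V_pi(s') ]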
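One way to make this equation concrete is iterative policy evaluation, which applies the right-hand side repeatedly until the values stop changing. The sketch below is a minimal illustration, not a reference implementation: the two-state MDP, the uniform policy, gamma = 0.5, and the names `bellman_backup` and `evaluate_policy` are all assumptions made up for this example.

```python
def bellman_backup(s, V, pi, P, gamma):
    """Right-hand side of the value equation for one state s."""
    return sum(
        pi[s][a] * sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
        for a in pi[s]
    )


def evaluate_policy(pi, P, gamma, tol=1e-10):
    """Apply the backup to every state until the largest change is below tol."""
    V = {s: 0.0 for s in P}
    while True:
        new_V = {s: bellman_backup(s, V, pi, P, gamma) for s in P}
        delta = max(abs(new_V[s] - V[s]) for s in P)
        V = new_V
        if delta < tol:
            return V


# Hypothetical MDP: P[state][action] = list of (probability, next_state, reward).
P = {
    0: {"stay": [(1.0, 0, 0.0)], "move": [(1.0, 1, 1.0)]},
    1: {"stay": [(1.0, 1, 2.0)], "move": [(1.0, 0, 0.0)]},
}
# Hypothetical policy: pi[state][action] = probability of taking that action.
pi = {
    0: {"stay": 0.5, "move": 0.5},
    1: {"stay": 0.5, "move": 0.5},
}

V = evaluate_policy(pi, P, gamma=0.5)
print(V)
```

The outer sum in `bellman_backup` runs over actions weighted by pi(a|s), and the inner sum runs over next states weighted by P(s'|s,a), matching the two sums in the equation.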