Week 4 - Handling Uncertainty Flashcards
(18 cards)
Certainty assumptions in normal searches
. actions are deterministic
. current state is fully observable
However this doesn't apply in many real-world scenarios
i.e. a Mars rover may intend to travel somewhere but there is a probability of crashing (not deterministic)
and the current state isn't fully observable - a sensor may be wrong
stochastic
the result of an action is uncertain (probabilistic)
how does non-determinism affect state transitions
state transitions are no longer certain: with some probability an outcome other than the intended one may occur
what do Markov chains model
Markov chains model probabilistic transitions between states, where the next state depends only on the current state (via the transition probabilities), NOT THE HISTORY OF STATES
what is the Markov property
the next state is determined only by the current state (via the probabilistic transition between states), not by the history of states
important distinction about Markov chains
THEY DON'T USE ACTIONS, ONLY STATES AND TRANSITIONS BETWEEN THEM
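A minimal sketch in Python of a tiny Markov chain (the weather states and probabilities below are made up for illustration): sampling the next state uses only the current state's row of the transition table, never the history.

```python
import random

# Hypothetical transition probabilities P(next state | current state).
# Note: a Markov chain has no actions, only states and transitions.
transitions = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}

def next_state(current):
    """Sample the next state using only the current state (Markov property)."""
    outcomes = transitions[current]
    return random.choices(list(outcomes), weights=list(outcomes.values()))[0]

state = "sunny"
for _ in range(5):
    state = next_state(state)
    print(state)
```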
what are MDPs (Markov decision processes)
an MDP is a sequential decision-making problem for a fully observable but stochastic environment
we carry on with the notion of Markov chains, but this time the transitions between states depend probabilistically on the chosen action, and states have rewards
what are the assumptions of MDPs
assumptions:
. the Markov property holds
. the probability distribution is stationary (the probabilities don't change over time)
what do we need to model MDPs so we can solve them
. set of all states the world can be in
. T(s, a, s') - transition model that tells us: if we are in state s and take action a, what is the probability we end up in state s'
. initial state of the problem
. reward function R(s) for a given state
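A minimal sketch of how these four ingredients could be written down in Python (the states, probabilities and rewards are invented purely for illustration):

```python
# Hypothetical 3-state MDP, for illustration only.
states = ["s0", "s1", "goal"]          # set of all states the world can be in
initial_state = "s0"                   # initial state of the problem

# T(s, a, s'): probability of ending up in s' after taking action a in state s.
T = {
    ("s0", "go"):   {"s1": 0.8, "s0": 0.2},   # the action may fail and leave us in s0
    ("s0", "stay"): {"s0": 1.0},
    ("s1", "go"):   {"goal": 0.9, "s0": 0.1},
    ("s1", "stay"): {"s1": 1.0},
}

# R(s): reward for being in a given state.
R = {"s0": -0.1, "s1": -0.1, "goal": 1.0}
```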
what is an MDP solution
an MDP solution is not a plan (sequence of actions), as the environment is stochastic and so actions may fail (have unintended outcomes)
hence an MDP solution is a policy
that tells us:
for each state, what is the (optimal) action to take
and hence the optimal policy is the one with the highest expected utility
the states actually visited can differ each time the policy is executed, due to the stochastic nature of the environment
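A minimal sketch (reusing the hypothetical MDP above) of a policy as a state-to-action mapping; running it twice can visit different states because each transition is sampled:

```python
import random

# Hypothetical transition model, as in the earlier sketch.
T = {
    ("s0", "go"): {"s1": 0.8, "s0": 0.2},
    ("s1", "go"): {"goal": 0.9, "s0": 0.1},
}

# A policy maps every non-terminal state to an action.
policy = {"s0": "go", "s1": "go"}

def run_policy(policy, state="s0", max_steps=20):
    """Follow the policy, sampling each transition, until a terminal state."""
    trajectory = [state]
    for _ in range(max_steps):
        if state == "goal":
            break
        outcomes = T[(state, policy[state])]
        state = random.choices(list(outcomes), weights=list(outcomes.values()))[0]
        trajectory.append(state)
    return trajectory

print(run_policy(policy))   # e.g. ['s0', 's1', 'goal']
print(run_policy(policy))   # may differ, e.g. ['s0', 's0', 's1', 'goal']
```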
formula for expected value (expected utility)
E[U] = \sum_{c} P(c) \cdot U(c)
where c ranges over the values the variable can take
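A quick worked example with made-up numbers: if the variable can take value c_1 with probability 0.7 and utility 10, or c_2 with probability 0.3 and utility -20, then

E[U] = 0.7 \cdot 10 + 0.3 \cdot (-20) = 7 - 6 = 1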
bayes rule
P(A \mid B) = \frac{P(A \cap B)}{P(B)}
(strictly this is the definition of conditional probability; Bayes' rule rewrites it as P(A \mid B) = \frac{P(B \mid A) \, P(A)}{P(B)})
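A quick worked example with made-up numbers: if P(A \cap B) = 0.12 and P(B) = 0.3, then P(A \mid B) = 0.12 / 0.3 = 0.4.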
expected utility of taking action a in state s (just read)
E[U \mid s, a] = \sum_{s' \in nei(s)} P(s' \mid s, a) \, U(s')
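A minimal sketch of this sum in Python, with hypothetical transition probabilities and utilities:

```python
# Hypothetical values for one state-action pair, for illustration only.
P = {"s1": 0.8, "s2": 0.2}   # P(s' | s, a) for the neighbours s' of s under action a
U = {"s1": 5.0, "s2": -1.0}  # current utility estimates U(s')

# E[U | s, a] = sum over s' of P(s' | s, a) * U(s')
expected_utility = sum(P[s_next] * U[s_next] for s_next in P)
print(expected_utility)      # 0.8*5.0 + 0.2*(-1.0) = 3.8
```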
properties of an optimal policy
. complete (covers all states)
. optimal (gives the best action for each state s)
. stationary (the action depends only on the current state, not on the time step)
. proper (eventually reaches a terminal state)
explain how value iteration works
set all states to have utility 0, except terminal states
do one round of value iteration, i.e. apply Bellman's update to each state once
keep iterating until U_{i+1}(s) = U_i(s) for all states (the utilities no longer change when you iterate again)
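A minimal sketch of these steps in Python on a tiny hypothetical MDP (states, transitions, rewards and gamma are all invented for illustration; convergence is checked with a small tolerance rather than exact equality):

```python
# Tiny hypothetical MDP, for illustration only.
states = ["s0", "s1", "goal"]
terminal = {"goal"}
actions = {"s0": ["go", "stay"], "s1": ["go", "stay"]}
T = {  # T(s, a) -> {s': P(s' | s, a)}
    ("s0", "go"):   {"s1": 0.8, "s0": 0.2},
    ("s0", "stay"): {"s0": 1.0},
    ("s1", "go"):   {"goal": 0.9, "s0": 0.1},
    ("s1", "stay"): {"s1": 1.0},
}
R = {"s0": -0.1, "s1": -0.1, "goal": 1.0}
gamma = 0.9

# step 1: utilities start at 0, except terminal states (which keep their reward)
U = {s: (R[s] if s in terminal else 0.0) for s in states}

# steps 2-3: apply Bellman's update to every state, repeat until nothing changes
while True:
    U_next = dict(U)
    for s in states:
        if s in terminal:
            continue
        U_next[s] = R[s] + gamma * max(
            sum(p * U[s2] for s2, p in T[(s, a)].items())
            for a in actions[s]
        )
    if all(abs(U_next[s] - U[s]) < 1e-6 for s in states):   # converged
        U = U_next
        break
    U = U_next

print(U)   # converged utilities, e.g. {'s0': ..., 's1': ..., 'goal': 1.0}
```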
how to extract the optimal policy when values have converged
once the values have converged, for each state pick the action with the highest expected utility, i.e. the action maximising \sum_{s'} P(s' \mid s, a) \, U(s')
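A minimal sketch of this extraction step in Python, assuming converged utilities U and the same hypothetical transition model as in the value-iteration sketch:

```python
# Hypothetical converged utilities and transition model, for illustration only.
U = {"s0": 0.55, "s1": 0.79, "goal": 1.0}
actions = {"s0": ["go", "stay"], "s1": ["go", "stay"]}
T = {
    ("s0", "go"):   {"s1": 0.8, "s0": 0.2},
    ("s0", "stay"): {"s0": 1.0},
    ("s1", "go"):   {"goal": 0.9, "s0": 0.1},
    ("s1", "stay"): {"s1": 1.0},
}

# for each state, pick the action with the highest expected utility
policy = {
    s: max(actions[s],
           key=lambda a: sum(p * U[s2] for s2, p in T[(s, a)].items()))
    for s in actions
}
print(policy)   # {'s0': 'go', 's1': 'go'} for these made-up numbers
```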
Bellman's equation
U_{i+1}(s) = R(s) + \gamma \cdot \max_{a \in A} \sum_{s'} P(s' \mid s, a) \cdot U_i(s')
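A quick worked update with made-up numbers: suppose R(s) = -0.1, \gamma = 0.9, and two actions, where a_1 reaches utilities 5 and 1 with probabilities 0.8 and 0.2, and a_2 reaches utility 2 for certain. Then

U_{i+1}(s) = -0.1 + 0.9 \cdot \max(0.8 \cdot 5 + 0.2 \cdot 1,\ 1.0 \cdot 2) = -0.1 + 0.9 \cdot 4.2 = 3.68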
expected utility of a policy
U^{\pi}(s) = E\left[\sum_{t=0}^{\infty} \gamma^{t} R(S_t)\right]
it's saying the discount factor's power increases with each time step, i.e. a reward received t steps in the future is weighted by \gamma^t, so rewards further in the future count for less
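For example (made-up numbers): with \gamma = 0.9 and a reward of 1 at every step, the first few terms of the sum are 1 + 0.9 + 0.81 + 0.729 + \ldots; if the reward stayed at 1 forever the whole sum would converge to 1 / (1 - \gamma) = 10.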