11 - Rational Decisions Over Time Flashcards

1
Q

Difference between search problems and MDPs

A

Search problems aim to find an OPTIMAL ACTION SEQUENCE. MDPs aim to find an OPTIMAL POLICY (a mapping from each state to an action).

2
Q

Optimal policy maximizes the ________.

A

EXPECTED UTILITY

3
Q

Types of utility functions

A

Additive & Discounted utility functions
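
As a quick reference (standard notation: R(s) is the reward received in state s and \gamma is the discount factor):

Additive:    U([s_0, s_1, s_2, \ldots]) = R(s_0) + R(s_1) + R(s_2) + \cdots
Discounted:  U([s_0, s_1, s_2, \ldots]) = R(s_0) + \gamma R(s_1) + \gamma^2 R(s_2) + \cdots, with 0 < \gamma \le 1

Additive utility is simply the special case \gamma = 1.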

4
Q

What is a Markov Decision Process (MDP)?

A

MDP = Markov Chain + Actions + Rewards
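
In tuple form (a standard way to write the same decomposition): an MDP is \langle S, A, P(s' \mid s, a), R(s), \gamma \rangle, where the Markov chain supplies the state set S and the transition probabilities, the actions A condition those transitions, and R assigns a reward to each state.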

5
Q

What is a Partially Observable Markov Decision Process (POMDP)?

A

POMDP = Hidden Markov Model + Actions + Rewards
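
Spelling out the sum (standard components): the hidden Markov model contributes the hidden states S, the transition model P(s' \mid s, a) once actions are added, and the sensor/observation model O(o \mid s); the decision-theoretic part adds the actions A and the reward function R(s).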

6
Q

POMDPs are generalizations of a ________ without ________.

A

POMDPs are generalizations of an MDP without DIRECT STATE OBSERVATIONS.

7
Q

Advantage of MDPs

A

Easy to specify and comparatively cheap to solve

8
Q

Disadvantage of MDPs

A

Assumes perfect knowledge of the state

9
Q

Advantage of POMDPs

A

Allows for uncertainty about the current state and lets the agent learn about it from observations

10
Q

Disadvantage of POMDPs

A

Computationally expensive

11
Q

What problems can be modeled as Markov Decision Processes?

A

Sequential decision problems in fully observable but uncertain (stochastic) environments with discrete states and actions

12
Q

Utility of a state sequence is the sum of all ________.

A

REWARDS OVER THE SEQUENCE (assuming an additive utility function; with discounting, each reward is weighted by a power of \gamma)

13
Q

MDP vs. POMDP - What is more difficult to solve?

A

POMDPs

14
Q

How are POMDPs solved?

A

Solved by conversion to an MDP over BELIEF STATES (probability distributions over the underlying states), which can then be attacked with standard MDP methods.
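
The key step in that conversion is the belief-state update after taking action a and observing o (\alpha is a normalizing constant):

b'(s') = \alpha \, O(o \mid s') \sum_{s} P(s' \mid s, a) \, b(s)

The resulting belief-state MDP is fully observable (the agent always knows its own belief), but its states are continuous probability distributions, which is why POMDPs are so much harder to solve.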

15
Q

Value iteration vs. policy iteration - What converges faster and why?

A

Policy iteration, since a policy can already be optimal before the utility estimates have converged to their exact values, so it typically needs fewer iterations than value iteration.
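
To make the comparison concrete, here is a minimal Python sketch of both algorithms on a made-up three-state toy MDP (all states, transition probabilities, and rewards below are hypothetical example data, not part of the cards):

```python
import random

# Hypothetical toy MDP (illustrative example data only).
states = ["A", "B", "C"]
actions = ["left", "right"]
gamma = 0.9
# Transition model P(s' | s, a): (state, action) -> {next_state: probability}
P = {
    ("A", "left"):  {"A": 0.9, "B": 0.1},
    ("A", "right"): {"B": 0.8, "A": 0.2},
    ("B", "left"):  {"A": 0.8, "B": 0.2},
    ("B", "right"): {"C": 0.8, "B": 0.2},
    ("C", "left"):  {"B": 0.8, "C": 0.2},
    ("C", "right"): {"C": 1.0},
}
R = {"A": 0.0, "B": 0.0, "C": 1.0}  # reward received in each state

def q_value(s, a, U):
    """Expected utility of the next state after doing a in s, under estimates U."""
    return sum(p * U[s2] for s2, p in P[(s, a)].items())

def value_iteration(eps=1e-6):
    """Repeat the Bellman update until the utilities themselves converge."""
    U = {s: 0.0 for s in states}
    while True:
        U_new = {s: R[s] + gamma * max(q_value(s, a, U) for a in actions)
                 for s in states}
        if max(abs(U_new[s] - U[s]) for s in states) < eps:
            return U_new
        U = U_new

def policy_iteration():
    """Alternate policy evaluation and policy improvement; stop as soon as the
    policy is stable, even though the utilities need not be exact yet."""
    pi = {s: random.choice(actions) for s in states}
    U = {s: 0.0 for s in states}
    while True:
        # Simplified (iterative) policy evaluation for the current policy.
        for _ in range(50):
            U = {s: R[s] + gamma * q_value(s, pi[s], U) for s in states}
        # Policy improvement: make pi greedy with respect to U.
        stable = True
        for s in states:
            best = max(actions, key=lambda a: q_value(s, a, U))
            if q_value(s, best, U) > q_value(s, pi[s], U):
                pi[s], stable = best, False
        if stable:
            return pi, U

print(value_iteration())
print(policy_iteration())
```

Value iteration stops only when the utility estimates stop changing; policy iteration stops as soon as the greedy policy stops changing, which usually happens first.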
