CCC Flashcards

1
Q

What are the properties of DEC-POMDPs?

A

Combines elements of game theory and POMDPs
NEXP-complete
Agents can benefit from communication
An optimal solution balances the cost of communicating with the cost of not communicating
Some algorithms, heuristics, and applications are known

2
Q

What is a DEC-POMDP?

A

A POMDP with a set of agents taking actions simultaneously
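The tuple behind that definition can be sketched as a small container type. This is a minimal, illustrative sketch of the standard DEC-POMDP components (agents, states, per-agent actions, transition model, shared reward, per-agent observations, observation function); the class and field names are mine, not from any library.

```python
from dataclasses import dataclass

@dataclass
class DecPOMDP:
    agents: list          # I: the set of agents acting simultaneously
    states: list          # S: world states
    actions: dict         # A_i: action set per agent
    transition: dict      # T(s' | s, joint_action)
    reward: dict          # R(s, joint_action): one reward shared by all agents
    observations: dict    # Omega_i: observation set per agent
    obs_fn: dict          # O(o_1..o_n | s', joint_action)
```

Note the single shared reward: it is what makes the problem cooperative rather than a general game.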

3
Q

What is inverse reinforcement learning?

A

An agent uses the environment and induced behavior to infer a reward function

4
Q

Briefly describe Maximum Likelihood Inverse RL (MLIRL)

A

Guess rewards R -> compute policy pi -> measure Pr(D | pi) -> gradient step on R -> repeat
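The loop above can be sketched numerically. To keep it short, this assumed example uses a one-state MDP (a bandit), so "compute policy" collapses to a Boltzmann (softmax) over the guessed rewards; the demonstration data and all names are illustrative, not from the MLIRL paper.

```python
import math

demos = [0, 0, 1, 0, 2, 0]   # observed demonstrations D (action indices)
n_actions = 3
rewards = [0.0, 0.0, 0.0]    # step 1: guess rewards R
alpha, beta = 0.1, 1.0       # learning rate, Boltzmann temperature

def policy(r):
    # step 2: compute policy pi = softmax(beta * R)
    z = [math.exp(beta * x) for x in r]
    s = sum(z)
    return [x / s for x in z]

def log_likelihood(r):
    # step 3: measure log Pr(D | pi)
    pi = policy(r)
    return sum(math.log(pi[a]) for a in demos)

for _ in range(200):
    pi = policy(rewards)
    counts = [demos.count(a) for a in range(n_actions)]
    # step 4: analytic gradient of log Pr(D | pi) w.r.t. R, then ascend
    grad = [beta * (counts[a] - len(demos) * pi[a]) for a in range(n_actions)]
    rewards = [r + alpha * g for r, g in zip(rewards, grad)]
```

After training, the most-demonstrated action (0) ends up with the highest inferred reward, which is the point of the loop: rewards are adjusted until the induced policy makes the demonstrations likely.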

5
Q

What is the general case (multinomial) probability of an action being optimal in policy shaping?

A

Pr(a | d_a) = C^(delta_a) / [C^(delta_a) + (1 - C)^(delta_a)]

where C is the probability that a feedback label is consistent with the optimal policy and delta_a is the number of positive minus negative labels given to action a

6
Q

What is the probability of an action being optimal in policy shaping when only one action can be optimal (the multi-binomial case)?

A

P(a | d_a) ∝ C^(delta_a) * (1 - C)^[Sum_(j != a) delta_j]
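This case can also be checked numerically: the expression is only proportional to the probability, so normalizing over actions gives a comparable distribution. The function names are illustrative.

```python
def pr_only_one(C, deltas, a):
    # unnormalized: C^delta_a * (1 - C)^(sum of delta_j over j != a)
    rest = sum(d for j, d in enumerate(deltas) if j != a)
    return (C ** deltas[a]) * ((1 - C) ** rest)

def normalized(C, deltas):
    # normalize across actions so the values sum to 1
    raw = [pr_only_one(C, deltas, a) for a in range(len(deltas))]
    z = sum(raw)
    return [x / z for x in raw]
```

Intuitively, evidence for action a now counts twice: directly through C^(delta_a), and indirectly because evidence for any other action j counts against a via the (1 - C) factor.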

7
Q

How do you represent trajectories as an MDP?

A

States: partial sequences
Actions: story actions
Model: player model
Rewards: author evaluations
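The four components above can be laid out as a simple record; everything in this sketch (the example sequences and actions) is made up for illustration.

```python
# Trajectories-as-an-MDP, per the card above: states are partial sequences,
# actions are story actions, the transition model is a player model, and
# rewards come from author evaluations. Contents are illustrative only.
trajectory_mdp = {
    "states": [(), ("intro",), ("intro", "conflict")],   # partial sequences
    "actions": ["conflict", "twist", "resolution"],      # story actions
    "model": "player model: Pr(next event | partial sequence, action)",
    "rewards": "author evaluations of the resulting sequences",
}
```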
