CS7642_Week8 Flashcards

1
Q

From an algorithmic perspective, what makes RL hard?

A
  1. Delayed reward (i.e. weak feedback signals)
  2. Bootstrapping (the need for exploration)
  3. # of states, # of actions (state and action spaces grow massive quickly)
2
Q

What is temporal abstraction?

A

Making the action space bigger by adding temporally extended actions. Think about navigating to the bathroom in a house: instead of saying "take 30 steps to the right, then 10 steps to the left," we might just say "go to the first doorway, walk down the hallway, and open the door on your left."
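
A minimal sketch of the idea, assuming a hypothetical env with a primitive step(action) and a hypothetical at_doorway subgoal test: the macro-action hides a variable number of primitive steps and returns the reward it collected along the way.

```python
def go_to_doorway(env, state, low_level_policy, at_doorway, gamma=0.99):
    """Temporally abstract action: run primitive steps until the subgoal holds.

    env, low_level_policy, and at_doorway are illustrative stand-ins; the
    point is that one high-level "action" spans many primitive time steps.
    """
    total_reward, discount, steps = 0.0, 1.0, 0
    while not at_doorway(state):
        action = low_level_policy(state)      # pick a primitive action
        state, reward = env.step(action)      # one primitive time step
        total_reward += discount * reward     # discount inside the macro-action
        discount *= gamma
        steps += 1
    return state, total_reward, steps         # steps varies run to run
```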

3
Q

Using temporal abstraction, backing up values is slower than it otherwise would be? (True/False)

A

False. If we’re backing up over fewer states/actions, the information propagates back to earlier states more quickly.

4
Q

An option generalizes the idea of an action? (True/False)

A

True. An option is atomic from the agent’s perspective: it’s a wrapper that runs its own subroutine (its own policy until a termination condition fires) and returns the reward gathered in that subroutine, even though the number of time steps taken may be variable.
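
The options framework formalizes this as a triple ⟨I, π, β⟩: an initiation set, an option policy, and a termination probability. A hedged sketch (the types are placeholders for illustration, not course code) showing that a primitive action is just the special case that always terminates after one step:

```python
from dataclasses import dataclass
from typing import Callable, Set

State = int    # placeholder types for illustration
Action = int

@dataclass
class Option:
    init_set: Set[State]                     # I: states where the option can start
    policy: Callable[[State], Action]        # pi: behavior while the option runs
    termination: Callable[[State], float]    # beta: P(option ends | state)

def primitive_option(a: Action, states: Set[State]) -> Option:
    """A regular one-step action as an option: beta = 1 everywhere."""
    return Option(init_set=states,
                  policy=lambda s: a,
                  termination=lambda s: 1.0)
```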

5
Q

What is the difference between a regular MDP and an SMDP (semi-MDP)?

A

An SMDP allows us to take variable-length time steps: each action (option) may run for several primitive steps before the next decision point.

6
Q

We always have to treat MDPs and SMDPs differently? (True/False)

A

False. Once we’ve defined the proper form of the Bellman operator for the SMDP, we can essentially treat the SMDP as we would any regular MDP.

7
Q

The number of steps you take in an SMDP doesn’t matter? (True/False)

A

False. This is why we define SMDPs with a variation of the Bellman operator: reward is discounted step by step within each option, and the next state’s value is discounted by γ^k, where k is the number of steps the option took. With that in place, we can treat the SMDP and MDP as the same, even though the time it takes to accumulate reward in each option is variable.
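
Concretely, the SMDP backup discounts the next state’s value by γ^k, where k is the (variable) number of primitive steps the option ran. A minimal Q-learning-style sketch, assuming r is the reward already discounted step by step inside the option:

```python
def smdp_q_update(Q, s, o, r, s_next, k, options, alpha=0.1, gamma=0.99):
    """SMDP Q-learning backup over options.

    Assumes r = r_1 + gamma*r_2 + ... + gamma**(k-1)*r_k, the discounted
    reward accumulated while option o ran for k primitive steps, and Q is
    a dict keyed by (state, option) pairs.
    """
    target = r + (gamma ** k) * max(Q[(s_next, o2)] for o2 in options)
    Q[(s, o)] += alpha * (target - Q[(s, o)])
```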

8
Q

An SMDP will converge to the optimal solution regardless of the options we design? (True/False)

A

False. This gives us the idea of hierarchical optimality. We can guarantee optimality with respect to the options that we have, but we can’t guarantee that particular set of options is in fact optimal.

9
Q

Monte Carlo Tree Search isn’t useful for large state spaces? (True/False)

A

False. Handling large state spaces is one of its principal benefits: planning time in MCTS is independent of the size of the state space. The trade-offs are that you need lots of samples, and running time grows exponentially in the planning horizon (the depth of the tree).
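
During the search, node selection is commonly driven by the UCB1 rule, which balances exploiting high-value children against exploring rarely visited ones. A minimal sketch (the node attributes are assumptions for illustration, not a specific library’s API):

```python
import math

def ucb1_select(node, c=math.sqrt(2)):
    """Pick the child maximizing mean value plus an exploration bonus."""
    def score(child):
        if child.visits == 0:
            return float("inf")                # try unvisited children first
        exploit = child.value_sum / child.visits
        explore = c * math.sqrt(math.log(node.visits) / child.visits)
        return exploit + explore
    return max(node.children, key=score)
```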

10
Q

What are the three types of abstraction?

A
  1. Temporal abstraction
  2. Goal abstraction
  3. State abstraction
11
Q

In a 2-player zero-sum deterministic game of perfect information, minimax == maximin? (True/False)

A

True.
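
In strategic (matrix) form this says "max of the row minima" equals "min of the column maxima." A tiny numeric check on a payoff matrix with a saddle point, which the pure-strategy form of such a game always has (see card 12):

```python
# Row player maximizes, column player minimizes the row player's payoff.
payoff = [[3, 5],
          [1, 4]]

maximin = max(min(row) for row in payoff)        # row player commits "first"
minimax = min(max(col) for col in zip(*payoff))  # column player commits "first"

assert maximin == minimax == 3                   # the value of the game
```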

12
Q

In a 2-player zero-sum deterministic game of perfect information, there is never an optimal pure strategy for each player? (True/False)

A

False. There is always an optimal pure strategy in a 2-player zero-sum deterministic game of perfect information.

13
Q

In a 2-player zero-sum non-deterministic game of perfect information, minimax != maximin? (True/False)

A

False. The non-determinism is handled by taking expectations at the chance nodes, and minimax == maximin still holds.

14
Q

In a 2-player zero-sum non-deterministic game of hidden information, minimax != maximin? (True/False)

A

True. This is where the minimax/maximin equivalence breaks down for zero-sum games: the von Neumann (minimax) theorem no longer applies in the presence of hidden information.

15
Q

When plotting the probability distributions for two mixed strategies, the intersection is the optimal strategy? (True/False)

A

False. This is similar to the PWLC (piecewise linear and convex) idea: find the maximum minimum on the non-dominated piecewise surface to find the optimal strategy; that point is often, but not necessarily, an intersection.
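
A small numeric sketch for a hypothetical 2x2 zero-sum game: each opponent response is a line in the row player’s mixing probability p, and the optimal strategy maximizes the lower envelope of those lines, which may or may not be where they cross:

```python
# Row player's payoffs; columns are the opponent's two pure responses.
payoff = [[ 1.0, -1.0],
          [-2.0,  3.0]]

def worst_case(p):
    """Lower envelope: worst expected payoff when playing row 0 w.p. p."""
    return min(p * payoff[0][j] + (1 - p) * payoff[1][j] for j in (0, 1))

# Coarse grid search over p (fine for illustration).
best_p = max((i / 1000 for i in range(1001)), key=worst_case)
print(best_p, worst_case(best_p))    # ~0.714, ~0.143: value of the game
```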

16
Q

What type of game is prisoner’s dilemma an example of?

A

A 2-player NON-zero-sum non-deterministic game of hidden information.

17
Q

In a 2-player zero-sum deterministic game of perfect information, it doesn’t matter if agents are rational? (True/False)

A

False. Minimax reasoning assumes rationality: each player recursively assumes the other is doing what’s best for them.

18
Q

What is a Nash Equilibrium?

A

A strategy profile is a Nash equilibrium iff, when you randomly choose one of the players and give them the chance to switch strategies (with everyone else held fixed), they would have no reason to do it.
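
A minimal sketch of that definition for pure strategies in a 2-player bimatrix game (the payoff matrices below are hypothetical inputs): a profile is a Nash equilibrium iff neither player gains by a unilateral switch.

```python
def is_nash(payoff_a, payoff_b, i, j):
    """Check pure profile (i, j): no unilateral deviation helps either player."""
    a_ok = all(payoff_a[i][j] >= payoff_a[i2][j] for i2 in range(len(payoff_a)))
    b_ok = all(payoff_b[i][j] >= payoff_b[i][j2] for j2 in range(len(payoff_b[0])))
    return a_ok and b_ok

# Prisoner's dilemma (index 0 = cooperate, 1 = defect): (defect, defect)
# is an equilibrium, (cooperate, cooperate) is not.
A = [[-1, -9], [ 0, -6]]   # row player's payoffs
B = [[-1,  0], [-9, -6]]   # column player's payoffs
assert is_nash(A, B, 1, 1) and not is_nash(A, B, 0, 0)
```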

19
Q

Nash Equilibrium only applies to pure strategies? (True/False)

A

False. A Nash equilibrium is a profile of strategies, which can be pure or mixed (a mixed strategy is a probability distribution over pure strategies).

20
Q

In an n-player pure-strategy game, if elimination of strictly dominated strategies eliminates all but one combination, that combination is the unique Nash equilibrium? (True/False)

A

True
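
A hedged sketch of the elimination step for one player of a 2-player game (the full procedure alternates between the players until nothing more can be removed); the payoff matrix is a hypothetical input:

```python
def eliminate_dominated_rows(payoff):
    """Repeatedly drop any row strictly dominated by another surviving row."""
    rows = list(range(len(payoff)))
    changed = True
    while changed:
        changed = False
        for r in rows:
            if any(all(payoff[r2][c] > payoff[r][c]
                       for c in range(len(payoff[0])))
                   for r2 in rows if r2 != r):
                rows.remove(r)        # r is strictly dominated
                changed = True
                break                 # restart scan over surviving rows
    return rows

# Prisoner's dilemma for the row player: cooperating (row 0) is dominated.
assert eliminate_dominated_rows([[-1, -9], [0, -6]]) == [1]
```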

21
Q

Any Nash equilibrium will survive elimination of strictly dominated strategies? (True/False)

A

True

22
Q

For any finite game with a finite number of players, each with a finite number of strategies, there exists at least one Nash equilibrium? (True/False)

A

True (possibly in mixed strategies).