W7 Multi-Agent RL Flashcards

1
Q

Why is there so much interest in multi-agent reinforcement learning?

A

Multi-agent settings are much more realistic, because the real world also contains multiple agents that interact, cooperate and compete with each other.

2
Q

What is one of the main challenges of multi-agent reinforcement learning?

A

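1. partial observability
2. non-stationary environments (the other agents are learning too, so from each agent's point of view the environment keeps changing)
3. large state spaces (the joint state and action space grows with the number of agents)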

3
Q

What is a Nash strategy?

A

A strategy that is guaranteed to do no worse than a tie against any opponent strategy (in a symmetric two-player zero-sum game).
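A minimal sketch in Python, assuming rock-paper-scissors as the example game (the card does not name one). Uniform play is a Nash strategy there: its expected payoff against any opponent strategy is 0, a tie, while a fixed strategy such as always rock can be exploited.

```python
import random

# Row player's payoff in rock-paper-scissors (0 = rock, 1 = paper, 2 = scissors):
# +1 win, 0 tie, -1 loss. The game is symmetric and zero-sum (example game, an assumption).
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]

def expected_payoff(p, q):
    """Expected payoff of mixed strategy p against opponent mixed strategy q."""
    return sum(p[i] * q[j] * PAYOFF[i][j] for i in range(3) for j in range(3))

nash = [1 / 3, 1 / 3, 1 / 3]  # uniform play is the Nash strategy of rock-paper-scissors

# Try random opponent strategies: the Nash strategy ties in expectation every time.
rng = random.Random(0)
for _ in range(5):
    weights = [rng.random() for _ in range(3)]
    opponent = [w / sum(weights) for w in weights]
    print(f"|payoff| = {abs(expected_payoff(nash, opponent)):.3f}")  # 0.000 every time

# A non-Nash strategy can be exploited: always rock loses to always paper.
print(expected_payoff([1, 0, 0], [0, 1, 0]))  # -1
```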

4
Q

What is the Nash equilibrium?

A

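A situation in which no agent can gain anything by changing only its own strategy while the others keep theirs. In two-player zero-sum games this corresponds to the minimax solution. A Nash strategy does not try to exploit flaws in the opponent's strategy; it simply wins when the opponent makes mistakes.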
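A minimal sketch that checks every joint action of a two-player game for the equilibrium condition (no player gains by changing only its own action). It uses the Prisoner's dilemma sentences from card 9 as negative rewards; the function name `pure_nash_equilibria` is illustrative, and it only finds pure-strategy equilibria, not mixed ones.

```python
from itertools import product

ACTIONS = ["silent", "confess"]

# Rewards are minus the years in prison from the Prisoner's dilemma card (an assumption
# for this example). REWARDS[(a1, a2)] = (reward of player 1, reward of player 2)
REWARDS = {
    ("silent", "silent"): (-2, -2),
    ("silent", "confess"): (-10, 0),
    ("confess", "silent"): (0, -10),
    ("confess", "confess"): (-5, -5),
}

def pure_nash_equilibria(actions, rewards):
    """Return the joint actions where no player can gain by deviating alone."""
    equilibria = []
    for a1, a2 in product(actions, repeat=2):
        r1, r2 = rewards[(a1, a2)]
        # Best reward player 1 could get by switching, with player 2's action fixed.
        best1 = max(rewards[(d, a2)][0] for d in actions)
        # Best reward player 2 could get by switching, with player 1's action fixed.
        best2 = max(rewards[(a1, d)][1] for d in actions)
        if r1 == best1 and r2 == best2:
            equilibria.append((a1, a2))
    return equilibria

print(pure_nash_equilibria(ACTIONS, REWARDS))  # [('confess', 'confess')]
```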

5
Q

What is a Pareto Optimum?

A

The best possible outcome for the group: we do as well as we can without hurting others, and others do not hurt us. It is the solution concept behind cooperative strategies.

6
Q

What is the Pareto efficient solution?

A

The situation in which no cooperative agent can be made better off without making at least one other agent worse off.
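A minimal sketch, assuming the Prisoner's dilemma rewards from card 9 (minus years in prison): an outcome is Pareto efficient when no other outcome is at least as good for every agent and strictly better for at least one. The helper names are illustrative.

```python
# Prisoner's dilemma rewards (minus years in prison), used here as an assumed example.
REWARDS = {
    ("silent", "silent"): (-2, -2),
    ("silent", "confess"): (-10, 0),
    ("confess", "silent"): (0, -10),
    ("confess", "confess"): (-5, -5),
}

def dominates(u, v):
    """True if outcome u is at least as good as v for everyone and better for someone."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))

def pareto_efficient(rewards):
    """Joint actions whose outcome is not dominated by any other outcome."""
    return [
        joint
        for joint, r in rewards.items()
        if not any(dominates(other, r) for other in rewards.values())
    ]

print(pareto_efficient(REWARDS))
# [('silent', 'silent'), ('silent', 'confess'), ('confess', 'silent')]
```

The Nash equilibrium (confess, confess) is missing from this list: both staying silent is better for both prisoners.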

7
Q

In a competitive multi-agent system, what algorithm can be used to calculate a Nash strategy?

A

Counterfactual Regret Minimization (CFR)
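CFR traverses the game tree of an extensive-form game such as poker and applies regret matching at every information set. As a simplified sketch, the snippet below shows only that regret-matching update, in self-play on rock-paper-scissors (a one-shot game chosen here as an assumption). The initial bias towards rock is an arbitrary choice so that play does not start at the equilibrium. It is the average strategy, not the last one, that approaches the Nash strategy (1/3, 1/3, 1/3).

```python
# Row player's payoff in rock-paper-scissors (0 = rock, 1 = paper, 2 = scissors).
# The matrix is antisymmetric, so the same table serves both players.
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]

def regret_matching(regrets):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return [1 / 3] * 3  # no positive regret: play uniformly

def action_values(opponent):
    """Expected payoff of each pure action against an opponent mixed strategy."""
    return [sum(PAYOFF[a][b] * opponent[b] for b in range(3)) for a in range(3)]

def self_play(iterations=10_000):
    # Player 0 starts with an arbitrary (assumed) bias towards rock.
    regrets = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    strategy_sums = [[0.0] * 3, [0.0] * 3]
    for _ in range(iterations):
        strategies = [regret_matching(regrets[0]), regret_matching(regrets[1])]
        for p in range(2):
            values = action_values(strategies[1 - p])
            expected = sum(s * v for s, v in zip(strategies[p], values))
            for a in range(3):
                # How much better action a would have done than the current strategy.
                regrets[p][a] += values[a] - expected
                strategy_sums[p][a] += strategies[p][a]
    # The average strategy converges to the Nash strategy.
    return [[s / iterations for s in sums] for sums in strategy_sums]

print(self_play())  # both averages end up close to [1/3, 1/3, 1/3]
```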

8
Q

What makes it difficult to calculate the solution for a game of imperfect information?

A

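Players cannot see the hidden information (for example the opponent's cards), so they must reason over every state that is consistent with what they have observed. This greatly increases the size of the state space, and computing the outcomes of all unknown possibilities quickly becomes infeasible.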
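A small counting sketch, assuming heads-up Texas hold'em as the example (it is not named on the card): the agent cannot see the opponent's two hole cards, so every pair of unseen cards is a possible hidden state it has to reason about. The function name is illustrative.

```python
from math import comb

# Assumed example: Texas hold'em with a 52-card deck, opponent holds 2 hidden cards.
DECK = 52

def possible_opponent_hands(known_cards):
    """Number of 2-card hands the opponent could hold, given the cards we can see."""
    return comb(DECK - known_cards, 2)

print(possible_opponent_hands(2))  # before the flop, we see only our 2 cards: 1225
print(possible_opponent_hands(5))  # after the flop (2 own + 3 shared cards): 1081
```

Each of these hands is a different state of the world that looks identical to the agent, and this multiplies at every decision in the game tree.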

9
Q

Describe the 1) Prisoner’s dilemma and the 2) iterated Prisoner’s dilemma.

A

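1) Two prisoners are questioned separately and each chooses to confess or to stay silent:
if both prisoners confess, they both get 5 years in prison
if both stay silent, they both get 2 years in prison
if I confess and the other stays silent, I walk free (and the other gets 10 years)
if I stay silent and the other confesses, I get 10 years
Whatever the other does, confessing gives me a shorter sentence, so both confess and get 5 years each, although staying silent together (2 years each) would be better for both. This is an example of mixed (partly cooperative, partly competitive) behaviour.

2) When the Prisoner's dilemma is played for multiple rounds, the tit-for-tat strategy works well (it won Axelrod's iterated Prisoner's dilemma tournaments): in the first round you cooperate (stay silent); after that you repeat whatever the opponent did in the previous round.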
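A minimal sketch of the iterated game, assuming 10 rounds and an always-defect opponent for comparison (the strategy names are illustrative). Sentences follow the card: 2 years each for mutual silence (C), 5 each for mutual confession (D), 0 and 10 when only one confesses.

```python
# Years in prison for (my move, their move), taken from the card.
# "C" = cooperate (stay silent), "D" = defect (confess).
YEARS = {
    ("C", "C"): 2,
    ("C", "D"): 10,
    ("D", "C"): 0,
    ("D", "D"): 5,
}

def tit_for_tat(opponent_history):
    """Cooperate in the first round, then copy the opponent's previous move."""
    return "C" if not opponent_history else opponent_history[-1]

def always_defect(opponent_history):
    """Illustrative opponent that confesses every round."""
    return "D"

def play(strategy_a, strategy_b, rounds=10):
    """Return the total years in prison for both players (lower is better)."""
    history_a, history_b = [], []
    years_a = years_b = 0
    for _ in range(rounds):
        move_a = strategy_a(history_b)  # each player sees the other's past moves
        move_b = strategy_b(history_a)
        years_a += YEARS[(move_a, move_b)]
        years_b += YEARS[(move_b, move_a)]
        history_a.append(move_a)
        history_b.append(move_b)
    return years_a, years_b

print(play(tit_for_tat, tit_for_tat))    # (20, 20): cooperation in every round
print(play(tit_for_tat, always_defect))  # (55, 45): exploited only in the first round
```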

10
Q

Name two multi-agent card games of imperfect information

A

poker, blackjack, bridge

11
Q

What is the setting with a heterogeneous reward function usually called?

A
12
Q

What is regret?

A

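The regret is the amount of reward that an agent missed by not choosing the action with the highest payoff (measured in hindsight). The regret of a specific action is the reward that action would have earned minus the reward actually earned.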
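A minimal sketch, assuming rock-paper-scissors and a short hypothetical history of moves: the cumulative regret of an action is the reward it would have earned in every round minus the reward actually earned. This per-action form is what regret matching (and CFR) accumulates.

```python
# Payoff of our action (row) against the opponent's action (column) in rock-paper-scissors
# (assumed example game).
PAYOFF = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
ROCK, PAPER, SCISSORS = 0, 1, 2

def cumulative_regrets(our_actions, opponent_actions):
    """Regret of each action: reward it would have earned minus the reward we earned."""
    regrets = [0, 0, 0]
    for ours, theirs in zip(our_actions, opponent_actions):
        earned = PAYOFF[ours][theirs]
        for a in range(3):
            regrets[a] += PAYOFF[a][theirs] - earned
    return regrets

# Hypothetical history: we played rock, rock, paper; the opponent played paper, scissors, paper.
print(cumulative_regrets([ROCK, ROCK, PAPER], [PAPER, SCISSORS, PAPER]))  # [-1, -1, 2]
```

Here the agent earned 0 in total, while always playing scissors would have earned 2, so its regret for scissors is 2.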

13
Q

Name three kinds of strategies that can occur in multi-agent reinforcement learning.

A

CFR
evolutionary strategies
cooperative strategies

14
Q

Name two solution methods that are appropriate for solving mixed strategy games.

A

evolutionary methods and cooperative methods

15
Q

What AI method is named after ant colonies, bee swarms, bird flocks, or fish schools? How does it work in general terms?

A

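Swarm computing

It focuses on emergent behaviour in decentralized, collective, self-organized systems: many simple agents follow local rules, and the group behaviour emerges from their interactions. It introduces forms of communication between agents.

Focus on cooperation and survival of the group (Pareto)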
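Particle swarm optimization is one well-known swarm method (it is not named on the card). In this minimal sketch each particle moves through the search space, pulled towards its own best position and towards the best position found by the whole swarm, which is the shared information the particles communicate. The objective, search bounds and coefficient values are assumptions.

```python
import random

def sphere(position):
    """Assumed objective to minimize: sum of squares, minimum 0 at the origin."""
    return sum(x * x for x in position)

def particle_swarm(objective, dims=2, particles=20, iterations=200, seed=0):
    rng = random.Random(seed)
    w, c1, c2 = 0.7298, 1.49618, 1.49618  # inertia and attraction weights (assumed values)
    pos = [[rng.uniform(-5, 5) for _ in range(dims)] for _ in range(particles)]
    vel = [[0.0] * dims for _ in range(particles)]
    best_pos = [p[:] for p in pos]            # each particle's own best position
    best_val = [objective(p) for p in pos]
    g = min(range(particles), key=lambda i: best_val[i])
    global_pos, global_val = best_pos[g][:], best_val[g]  # knowledge shared by the swarm
    for _ in range(iterations):
        for i in range(particles):
            for d in range(dims):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull towards own best + pull towards the swarm's best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (global_pos[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < global_val:
                    global_pos, global_val = pos[i][:], val
    return global_pos, global_val

print(particle_swarm(sphere))  # best position near (0, 0), best value close to 0
```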

17
Q

Describe the main steps of an evolutionary algorithm.

A

Inspired by the biological processes of reproduction: mutation, recombination and selection. Repeat:
1. Evaluate the fitness of each individual in the population
2. Select the fittest individuals for reproduction
3. Generate new individuals through crossover (recombination) and mutation
4. Replace the least fit individuals by the new individuals

Focus on competition and survival of the fittest (Nash)
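A minimal sketch of the four steps, assuming the toy OneMax problem (fitness = number of 1-bits), truncation selection of the fittest half, one-point crossover and bit-flip mutation; all names and settings are illustrative.

```python
import random

def one_max(genome):
    """Toy fitness (assumption): number of 1-bits, maximum = genome length."""
    return sum(genome)

def evolve(length=20, pop_size=30, generations=60, mutation_rate=0.05, seed=1):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        # 1. Evaluate fitness and sort the population from fittest to least fit.
        population.sort(key=one_max, reverse=True)
        # 2. Select the fittest half as parents.
        parents = population[: pop_size // 2]
        # 3. Create children by one-point crossover followed by bit-flip mutation.
        children = []
        while len(children) < pop_size - len(parents):
            mum, dad = rng.sample(parents, 2)
            cut = rng.randrange(1, length)
            child = mum[:cut] + dad[cut:]
            child = [1 - bit if rng.random() < mutation_rate else bit for bit in child]
            children.append(child)
        # 4. Replace the least fit individuals by the new children.
        population = parents + children
    return max(population, key=one_max)

best = evolve()
print(one_max(best), best)
```

Keeping the parents in the next generation (elitism) means the best fitness never decreases from one generation to the next.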

18
Q

Describe the general form of Hide and Seek and three strategies that emerged from the interactions of the hiders or seekers.

A

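Collaboration. Two teams, hiders and seekers, play in a simulated physics environment with walls, movable boxes and ramps. Hiders are rewarded when they stay out of the seekers' line of sight, seekers when they see the hiders, and the hiders get a short preparation phase at the start. Strategies that emerged include:
1. hiders collaborating to build shelters from boxes
2. seekers using ramps to climb over the shelter walls
3. hiders moving or locking the ramps so that the seekers cannot use them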

19
Q

What is population-based training?

A

It combines evolutionary ideas with RL ideas.

A population of agents (teams) is trained in parallel and they compete against each other; teams that learn good behaviour survive and are copied, while weaker ones are replaced: team-based learning of better and better behaviour.
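A toy sketch of population-based training, assuming a single hyperparameter (the learning rate) and a quadratic loss standing in for agent performance; in a multi-agent setting the ranking would instead come from matches between population members. The bottom quarter copies the top quarter (exploit) and perturbs the copied learning rate by 0.8 or 1.2 (explore); these settings are assumptions.

```python
import random

def loss(theta):
    """Toy objective standing in for an agent's performance (lower is better)."""
    return theta ** 2

def train(theta, lr, steps):
    """A few gradient-descent steps on loss(theta) = theta**2 (gradient 2 * theta)."""
    for _ in range(steps):
        theta -= lr * 2 * theta
    return theta

def population_based_training(size=8, rounds=20, steps=5, seed=0):
    rng = random.Random(seed)
    # Each member is [parameter, learning rate]; learning rates start out random.
    population = [[5.0, rng.uniform(0.001, 0.05)] for _ in range(size)]
    quarter = size // 4
    for _ in range(rounds):
        # Every member trains with its own hyperparameter.
        for member in population:
            member[0] = train(member[0], member[1], steps)
        # Rank the population: best (lowest loss) first.
        population.sort(key=lambda m: loss(m[0]))
        for loser, winner in zip(population[-quarter:], population[:quarter]):
            # Exploit: the weakest members copy parameters and hyperparameters of the best.
            loser[0], loser[1] = winner[0], winner[1]
            # Explore: randomly perturb the copied learning rate.
            loser[1] *= rng.choice([0.8, 1.2])
    return min(population, key=lambda m: loss(m[0]))

print(population_based_training())  # [parameter close to 0, learning rate the population kept]
```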