C6 Flashcards

1
Q

what is zero-sum?

A

competitive games: a win for player A is a loss for player B (the players' payoffs sum to zero)

2
Q

what is decision complexity?

A

the number of end positions that define the value of the initial game position. The more actions there are in a position, the larger the decision complexity

3
Q

what is state space complexity?

A

the number of legal positions reachable from the initial position of a game

4
Q

what is AlphaGo?

A

it is the program that beat the human Go champion; it combines supervised learning from grandmaster games with reinforcement learning from self-play games

5
Q

name the 3 categories of programs that play Go

A

minimax-style programs, MCTS-based programs, and the AlphaGo programs (MCTS combined with deep learning from self-play)

6
Q

what is AlphaGo Zero?

A

it performs tabula rasa learning of Go, based solely on self-play. It plays more strongly than AlphaGo

7
Q

how does a self-learning system work?

A
  1. the searcher uses the evaluation network to estimate reward values and policy actions, and the search results are used to play games against the opponent in self-play
  2. the game results are collected in a buffer, which is used to train the evaluation network
  3. by playing a tournament against a copy of ourselves, a virtuous cycle of ever-increasing improvement is created (see the sketch below)
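
A minimal sketch of this cycle in Python (the names play_one_game and train_step are illustrative placeholders for the searcher-guided game player and the network update, not the actual AlphaGo Zero API):

    import random

    def self_play_cycle(play_one_game, train_step,
                        num_iters=10, games_per_iter=4, batch_size=32):
        """Sketch of the cycle: search -> self-play games -> buffer -> train."""
        buffer = []                                # replay buffer of examples
        for _ in range(num_iters):
            for _ in range(games_per_iter):
                # 1. the searcher plays a game against a copy of itself and
                #    returns (state, policy, outcome) training examples
                buffer.extend(play_one_game())
            # 2. sample the buffer to decorrelate the training examples
            batch = random.sample(buffer, min(len(buffer), batch_size))
            # 3. training the network on its own games closes the cycle
            train_step(batch)
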
8
Q

what are the 3 levels of self-play?

A

playing against a copy of yourself at:
1. move-level: in MCTS playouts our opponent is a copy of ourselves
2. example-level: the input for training the approximator of the policy and reward functions is generated by our own games
3. tournament-level: create a training curriculum that starts tabula rasa and ends at world-champion level

9
Q

2 advantages of MCTS over minimax and alpha-beta

A
  1. it is based on averaging single lines of play (playouts) instead of traversing subtrees recursively
  2. it does not need a heuristic evaluation function: playouts run to the end of the game, where the result is known (win or loss)
10
Q

what are the 4 operations of MCTS?

A
  1. select: traverse the tree from the root to a leaf using UCT
  2. expand: add one new child of the selected node to the tree
  3. playout: play random moves until the end of the game (self-play)
  4. backpropagation: propagate the reward back up the tree (all four steps are sketched in code below)
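
A self-contained toy sketch of all four operations in Python; a Nim-like subtraction game (take 1 or 2 stones, taking the last stone wins) stands in for Go, and the code is illustrative, not AlphaGo's implementation:

    import math, random

    class Node:
        def __init__(self, state, player, parent=None):
            self.state, self.player, self.parent = state, player, parent
            self.children = {}                   # action -> child Node
            self.wins, self.visits = 0.0, 0

    def actions(state):                          # legal moves: take 1 or 2
        return [a for a in (1, 2) if a <= state]

    def uct(child, parent_visits, c_p=1.4):
        if child.visits == 0:
            return float("inf")                  # visit new children first
        return (child.wins / child.visits
                + c_p * math.sqrt(math.log(parent_visits) / child.visits))

    def mcts(root_state, n_iter=2000):
        root = Node(root_state, player=0)
        for _ in range(n_iter):
            node = root
            # 1. select: descend with UCT while nodes are fully expanded
            while node.children and len(node.children) == len(actions(node.state)):
                node = max(node.children.values(),
                           key=lambda c: uct(c, node.visits))
            # 2. expand: add one new child to the tree
            untried = [a for a in actions(node.state) if a not in node.children]
            if untried:
                a = random.choice(untried)
                node.children[a] = Node(node.state - a, 1 - node.player, node)
                node = node.children[a]
            # 3. playout: random moves until the end of the game
            state, to_move = node.state, node.player
            while actions(state):
                state -= random.choice(actions(state))
                to_move = 1 - to_move
            loser = to_move                      # no stones left to take
            # 4. backpropagation: update wins/visits up to the root
            while node:
                node.visits += 1
                if node.player == loser:         # the move into this node
                    node.wins += 1               # was made by the winner
                node = node.parent
        return max(root.children, key=lambda a: root.children[a].visits)

    print(mcts(4))   # typically 1: taking 1 stone leaves the losing position 3
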
11
Q

what does UCT do?

A

it computes the value of a child a, to decide which child to select and explore further
UCT(a) = wins_a / visits_a + C_p * sqrt(ln(visits_parent) / visits_a)

the first term rewards exploitation (a high win rate); the second term rewards exploration (rarely visited children)
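
A small worked example in Python (the visit and win counts are made up for illustration, with C_p = 1.4):

    import math

    def uct(wins, visits, parent_visits, c_p=1.4):
        return wins / visits + c_p * math.sqrt(math.log(parent_visits) / visits)

    # a much-visited strong child vs. a rarely visited weaker one
    print(uct(60, 100, 120))   # ~0.906: high win rate, small exploration bonus
    print(uct(2, 5, 120))      # ~1.770: the exploration bonus dominates
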

12
Q

what is P-UCT?

A

predictor-UCT: a variant of UCT that uses input from the policy head of the deep network as a prior over actions
P-UCT(a) = wins_a / visits_a + C_p * pi(a|s) * sqrt(visits_parent) / (1 + visits_a)
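
A direct transcription of the formula in Python (c_p = 1.0 is an arbitrary illustration value, not AlphaGo Zero's setting):

    import math

    def p_uct(wins, visits, parent_visits, prior, c_p=1.0):
        # prior is pi(a|s) from the policy head; note there is no log and
        # the square root is over the parent visits only
        q = wins / visits if visits else 0.0   # unvisited children: q = 0
        u = c_p * prior * math.sqrt(parent_visits) / (1 + visits)
        return q + u
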

13
Q

what is tabula rasa learning?

A

it is when you start learning with zero knowledge: only self-play and a single neural network

14
Q

what is curriculum learning?

A

the network is trained in many small steps, starting against a very weak opponent. As our level of play increases, so does the difficulty of the moves that our teacher proposes to us.

15
Q

what are the 2 architectural elements of AlphaGo Zero?

A

A neural network and MCTS

16
Q

what is minimax?

A

the root node chooses the child with the maximum value, the next level down chooses the minimum value (the opponent's best reply), and so on, alternating down the tree
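
A minimal sketch in Python over a hand-built game tree (nested lists, with leaf values from the MAX player's point of view; the tree is made up for illustration):

    def minimax(node, maximizing=True):
        if not isinstance(node, list):         # leaf: return its value
            return node
        values = [minimax(child, not maximizing) for child in node]
        return max(values) if maximizing else min(values)

    # the root is MAX; its two children are MIN nodes over the leaves
    print(minimax([[3, 5], [2, 9]]))           # MIN gives 3 and 2; MAX picks 3
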

17
Q

what is the estimated state space complexity of Go?

A

10^170
(chess is 10^47)

18
Q

what are the 2 architectural elements of conventional chess programs?

A

alpha-beta and a heuristic evaluation function

19
Q

what is the biggest problem that was overcome by AlphaGo Zero?

A

instability

20
Q

how was stability achieved in AlphaGo Zero?

A
  • coverage of the state space: playing a large number of games, with MCTS look-ahead
  • correlation between training examples is reduced through an experience replay buffer (see the sketch below)
  • convergence of training is improved by using on-policy MCTS and taking small training steps (a small learning rate)
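
A minimal sketch of the replay-buffer idea in Python (the buffer size and the contents of the example tuples are assumptions for illustration):

    import random
    from collections import deque

    # a bounded buffer: old games age out, and random sampling breaks the
    # correlation between consecutive positions from the same game
    buffer = deque(maxlen=500_000)

    def store_game(examples):          # (state, policy, outcome) tuples
        buffer.extend(examples)

    def sample_batch(batch_size=32):
        return random.sample(buffer, min(len(buffer), batch_size))
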
21
Q

why is AlphaGo Zero faster than AlphaGo?

A

it uses curriculum learning