Study Flashcards
Explain how Google’s AlphaGo system uses its three neural networks.
- 13-layer policy network: trained on records of expert games to predict promising moves; used to narrow the search to a few good candidate moves
- Value network: evaluates a position directly, estimating how likely it is to lead to a win, which reduces the depth of search needed
- Fast policy network: a smaller, much faster move predictor used to play out rollouts (simulated games to the end) during Monte Carlo tree search
Minimax
Knowing the complete game tree is useful for a game playing AI, because it allows the program to pick the best possible move at a given game state. This can be done with the minimax algorithm: At each game turn, the AI figures out which move would minimize the worst-case scenario. To do that, it finds the node in the tree corresponding to the current state of the game. It then picks the action that minimizes the worst possible loss it might suffer. This requires traversing the whole game tree down to nodes representing end-of-game states. The minimax algorithm therefore requires the complete game tree. Great for tic-tac-toe, but not useful for chess, and even less so for Go.
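A minimal sketch of the idea over an explicit game tree (an illustrative representation, not from any particular program): a node is either a number, the terminal pay-off from the maximizing player's point of view, or a list of child nodes, one per legal move.

```python
def minimax(node, maximizing):
    # Terminal position: return its pay-off directly.
    if isinstance(node, (int, float)):
        return node
    # Otherwise back up the children's values: the mover picks the best
    # for themselves, assuming the opponent does the same in reply.
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# Tiny example: the maximizer gets 3 (the opponent answers the first
# branch with 3, and would answer the other branch with 2).
assert minimax([[3, 5], [2, 9]], maximizing=True) == 3
```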
Dynamic game
The order in which players make their choices matters; e.g. X&Os, or a shop sets a price and then the customer decides whether to buy
Static games
Choices are unordered (perhaps simultaneous), e.g. Roshambo, Diplomacy
Zero-sum game
A game is zero-sum if at each terminal of the extensive form of the game, the sum of the players’ pay-offs is zero.
Ovals in extensive forms of games
Denote ‘information sets’ - sets of nodes in the game tree which cannot be distinguished by the player making a choice.
Strategy (in Game Theory speak)
A function which determines which choice a player makes at every possible choice point (even those which do not arise on a particular occasion).
Dominance
If one strategy for a player always gives at least as high a payoff for that player as some other strategy, the first dominates the second.
Normal-form
A more compact notation for the pay-offs of two players in a zero-sum game.
- Rows of the matrix correspond to one player
- Columns correspond to another player
- Each cell shows the pay-off for one player (it is trivial to deduce the other player's payoff: just negate the existing cells); see the example below
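For instance, a sketch of a normal-form matrix for Roshambo (the encoding below is illustrative):

```python
# Rows: player 1's choice; columns: player 2's choice, both in the order
# rock, paper, scissors. Entries are player 1's pay-offs; player 2's
# pay-offs are their negations.
PAYOFF = [
    [ 0, -1,  1],   # rock
    [ 1,  0, -1],   # paper
    [-1,  1,  0],   # scissors
]
```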
A stable solution
A solution is stable if no player wants to move away unilaterally from the solution. At such a stable solution, a game is in equilibrium.
Gain floor of a 2-player zero-sum game (A)
The guaranteed minimum pay-off for player 1 when he picks his best (maximin) strategy i*
Loss ceiling of a 2-player zero-sum game (A)
The best outcome for player 1 (corresponding to the worst for player 2) when player 2 picks his best (minimax) strategy j*
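A minimal sketch of both quantities under the convention above (rows belong to player 1, and entries are player 1's pay-offs):

```python
def gain_floor(A):
    # Player 1's guaranteed minimum: the max over rows of the row minimum.
    return max(min(row) for row in A)

def loss_ceiling(A):
    # The least player 2 can concede: the min over columns of the column maximum.
    return min(max(row[j] for row in A) for j in range(len(A[0])))

# This matrix has a saddle point: playing row 2 / column 2 gives Value(A) = 2.
A = [[3, 1],
     [4, 2]]
assert gain_floor(A) == loss_ceiling(A) == 2
```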
Saddle point
When the gain floor equals the loss ceiling.
In this case, neither player can do better by unilaterally picking a different strategy than i*, j*. This pair of strategies is a stable solution to the game.
A kind of cooperation emerges, even though the game is adversarial.
In a game with a saddle point, each player has a best strategy (or a set of them). The strategies remain best even if the other players know what they are.
What is the ‘Value’ of a game?
The pay-off of its stable solution, i.e. when Gain Floor = Loss Ceiling = Value (A).
Games without a saddle point
In a game without a saddle point, and hence without a stable solution, each pure strategy one player might adopt can, if known to an opponent, be refuted.
Mixed strategy
A mixed strategy involves a random choice among pure strategies.
• Each pure strategy is picked with its own particular probability
• Mixed strategies may be found which, even if known, cannot be refuted (see the sketch below)
• This does apply in once-off games, e.g. the "Prisoner's Dilemma", but is mostly associated with repeatedly-played (iterated) games
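A minimal sketch, reusing the Roshambo PAYOFF matrix from the normal-form example above; the uniform mix earns expected pay-off 0 against every pure reply, so knowing it gives the opponent no edge:

```python
def expected_payoff(A, p, q):
    # Expected pay-off to player 1 when player 1 mixes with probabilities p
    # over rows and player 2 mixes with probabilities q over columns.
    return sum(p[i] * q[j] * A[i][j]
               for i in range(len(A)) for j in range(len(A[0])))

uniform = [1/3, 1/3, 1/3]
for j in range(3):                                  # each pure reply
    pure = [1.0 if k == j else 0.0 for k in range(3)]
    assert abs(expected_payoff(PAYOFF, uniform, pure)) < 1e-9
```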
Type A programs
Work by (or as if by) exploring all legal moves in a game tree to a uniform depth, and backing up evaluations of leaf-node positions.
Type B programs
- choose parts of the tree to explore
- doing selective move generation
- using background knowledge (rules, learned weights, patterns, statistics) to select a few promising lines of play to be explored further, and to discard many poor lines of play to be explored no further
Refinements of Minimax
- alpha-beta (see the sketch after this list), progressive deepening
- transposition tables
- quiescence search
- PVS, Scout, Aspiration Search, MTD(f)
- Killer Heuristic, History Heuristic, Null-move Heuristic
- Selective Extensions
- Pondering (‘thinking’ during an opponent’s time)
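A minimal alpha-beta sketch over the same explicit-tree representation as the minimax sketch above; it returns the same value as minimax while skipping branches that cannot affect the result:

```python
def alphabeta(node, alpha, beta, maximizing):
    if isinstance(node, (int, float)):
        return node
    if maximizing:
        best = float("-inf")
        for child in node:
            best = max(best, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, best)
            if alpha >= beta:    # beta cutoff: the minimizer avoids this line
                break
        return best
    else:
        best = float("inf")
        for child in node:
            best = min(best, alphabeta(child, alpha, beta, True))
            beta = min(beta, best)
            if beta <= alpha:    # alpha cutoff: the maximizer avoids this line
                break
        return best

# Same tree, same answer as plain minimax, but the 9 is never examined.
assert alphabeta([[3, 5], [2, 9]], float("-inf"), float("inf"), True) == 3
```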
Alternatives to Mini-max
- Conspiracy number search
- Proof number search
- B*
- Monte Carlo rollouts
- UCT
Evaluation Function
A number which indicates how good a position is for one player.
- Do not treat it as a mathematical probability
- It is called very many times during search, so it should be fast to compute (see the toy example below)
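A toy sketch for chess, counting material only (the board encoding is illustrative: a sequence of piece letters, upper case for White, lower case for Black):

```python
# Conventional material values; the king is priceless, so score it 0 here.
PIECE_VALUES = {"p": 1, "n": 3, "b": 3, "r": 5, "q": 9, "k": 0}

def evaluate(board):
    # Positive scores favour White, negative favour Black.
    score = 0
    for piece in board:
        value = PIECE_VALUES.get(piece.lower(), 0)
        score += value if piece.isupper() else -value
    return score

assert evaluate("KQRRkqr") == 5   # White is a rook up in this material list
```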
Opening Book
Codification of good opening moves
Chess opening theory is extensively published, and chess programs use the knowledge in these publications
- perhaps augmented by team members who are expert-strength or better at chess
- coded by programmers into a form their program can use
A common strategy of human players confronting computers is to make moves out of book - i.e. not found in the book - in the expectation that the computer will not be able to find the responses which make such moves sub-optimal.
Endgame database
A tabulation of the possible positions in which only a very small number of chessmen remain on the board. For each position, it records the best move.
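Conceptually it is a direct lookup table; a toy sketch (the position keys and entries below are purely hypothetical placeholders, and real endgame databases are compactly indexed rather than literal dictionaries):

```python
ENDGAME_DB = {
    "Ka1 Qb3 / kc1, white to move": "Qb2",   # hypothetical entry
}

def best_endgame_move(position_key):
    # Returns the tabulated best move, or None if the position is not covered.
    return ENDGAME_DB.get(position_key)
```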
Advantages of Iterative Deepening
- If there is time to perform a deeper search, go ahead and do it. If time runs out, with iterative deepening you always have a completed search result ready to play.
- With alpha-beta search, good move ordering gives much bigger savings than random move ordering. Iterative deepening reveals the principal variation up to depth N, which can be used to order moves when searching to depth N+1; in that sense, iterative deepening actually saves time overall (see the sketch below).
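A minimal driver sketch, assuming a hypothetical depth-limited routine search(state, depth) that returns (best_move, value):

```python
import time

def iterative_deepening(state, time_budget_seconds):
    deadline = time.monotonic() + time_budget_seconds
    best_move = None
    depth = 1
    while time.monotonic() < deadline:
        best_move, value = search(state, depth)   # hypothetical helper
        depth += 1
    return best_move   # a completed result is always ready to play
```

(Real engines also check the clock inside the search itself, and feed the depth-N principal variation back in to order moves at depth N+1.)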
Manifestations of the horizon effect
A search that stops at a particular depth has no information about moves beyond that depth except what the static evaluation function returns.
The search compares the outcomes of different lines of play within its horizon. If one line of play results in a loss within the horizon, and another line of play results in a smaller loss within the horizon, the program will think the second line is better.
But this can result merely in delaying the inevitable: making ineffectual sacrifices that just push the bad news over the horizon.
It can also result in failing to realise how advantageous a position is.
How to overcome the ‘Horizon Effect’?
Quiescence Search
The horizon effect occurs because computers only search a certain number of moves ahead. Human players usually have enough intuition to decide whether to abandon a bad-looking move, or search a promising move to a great depth. A quiescence search attempts to emulate this behavior by instructing a computer to search “interesting” positions to a greater depth than “quiet” ones (hence its name) to make sure there are no hidden traps and, usually equivalently, to get a better estimate of its value.
What is the ‘Horizon Effect’?
The horizon effect is a problem in artificial intelligence where, in many games, the number of possible states or positions is immense and computers can only search a small portion of it, typically a few ply down the game tree. Thus, for a computer searching only five ply, there is a possibility that it could make a move that would prove to be detrimental later on (say, after six moves), but it cannot see the consequences because it cannot search that far into the tree.
Quiescence Search
Any sensible criterion may be used to distinguish “quiet” moves from “noisy” moves; high activity (high movement on a chess board, extensive capturing in Go, for example) is commonly used for board games. As the main motive of quiescence search is usually to get a good value out of a poor evaluation function, it may also make sense to detect wide fluctuations in values returned by a simple heuristic evaluator over several ply. Modern chess engines may search certain moves up to 2 or 3 times deeper than the minimum. In highly “unstable” games like Go and Reversi, a rather large proportion of computer time may be spent on quiescence searching.
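A minimal sketch in negamax form, assuming hypothetical helpers evaluate(state), noisy_moves(state) (e.g. captures) and apply(state, move):

```python
def quiescence(state, alpha, beta):
    stand_pat = evaluate(state)        # score for stopping ("standing pat") here
    if stand_pat >= beta:
        return beta                    # already too good: fail-hard cutoff
    alpha = max(alpha, stand_pat)
    for move in noisy_moves(state):    # extend only "noisy" lines deeper
        score = -quiescence(apply(state, move), -beta, -alpha)
        if score >= beta:
            return beta
        alpha = max(alpha, score)
    return alpha
```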
Null-move heuristic
In α-β the Null-Move heuristic is a valuable technique in its own right.
- It also bears on quiescence search
• It applies even in games, like chess, where null moves (“passes”) are not legal!
The idea is that you almost always do better by making a move than you would by allowing your opponent two moves in a row. By imagining what the opponent could do with two moves in a row, you get a lower bound on the value of your position.
• If this lower bound is still greater than β (Hope), you get an early and cheap β cutoff (since null-move generation costs virtually nothing). See the sketch below.
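A minimal sketch in negamax form, with hypothetical helpers moves, apply, pass_turn, in_check and evaluate; R is the conventional depth reduction for the null-move search:

```python
R = 2   # conventional null-move depth reduction

def negamax(state, depth, alpha, beta):
    if depth == 0:
        return evaluate(state)
    # Null-move heuristic: imagine passing, giving the opponent two moves
    # in a row. If even then we score >= beta, a real move should too, so
    # cut off cheaply. (Skipped when in check, where passing is unsound.)
    if depth > R and not in_check(state):
        if -negamax(pass_turn(state), depth - 1 - R, -beta, -beta + 1) >= beta:
            return beta
    best = float("-inf")
    for move in moves(state):
        best = max(best, -negamax(apply(state, move), depth - 1, -beta, -alpha))
        alpha = max(alpha, best)
        if alpha >= beta:
            break
    return best
```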