Study Flashcards

1
Q

Explain how Google’s AlphaGo system uses its three neural networks.

A
  1. 13-layer policy network - biases the probability of a node’s selection in the lead-up to a play-out
  2. Value network - estimates the value of a position reached after ~20 moves of a playout
  3. Fast policy network - simpler than the policy network; used to pick moves during a playout
2
Q

Minimax

A

Knowing the complete game tree is useful for a game-playing AI because it allows the program to pick the best possible move at a given game state. The minimax algorithm does this: at each turn, the AI finds the node in the tree corresponding to the current state of the game and picks the action that minimises its worst possible loss (see the sketch below). This requires traversing the whole game tree down to nodes representing end-of-game states, so minimax needs the complete game tree. Great for tic-tac-toe, but not practical for chess, and even less so for Go.

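A minimal runnable Python sketch of minimax over an explicit game tree (the tree encoding and names are illustrative, not from the lectures):

    # A node is either a number (end-of-game pay-off for the maximising player)
    # or a list of child nodes.
    def minimax(node, maximising):
        if isinstance(node, (int, float)):            # end-of-game state
            return node
        values = [minimax(child, not maximising) for child in node]
        return max(values) if maximising else min(values)

    # Two candidate moves for the maximiser, each answered by two replies:
    tree = [[3, 12], [2, 8]]
    print(minimax(tree, True))                        # -> 3 (best worst-case outcome)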
3
Q

Dynamic game

A

The order in which players make their choices matters; e.g. X&Os (noughts and crosses), or a shop sets a price and then the customer decides whether to buy

4
Q

Static games

A

Choices are unordered (perhaps simultaneous) e.g. Roshambo, Diplomacy

5
Q

Zero-sum game

A

A game is zero-sum if at each terminal of the extensive form of the game, the sum of the players’ pay-offs is zero.

6
Q

Ovals in extensive forms of games

A

Denote ‘information sets’ - sets of nodes in the game tree which cannot be distinguished by the player making a choice.

7
Q

Strategy (in Game Theory speak)

A

A function which determines which choice a player makes at every possible choice point (even those which do not arise on a particular occasion).

If one strategy for a player always gives at least as high a payoff for that player as some other strategy, the first dominates the second.

8
Q

Normal-form

A

A more compact notation for the pay-offs of two players in a zero-sum game.

  • Rows of the matrix correspond to one player
  • Columns correspond to the other player
  • Each cell shows the pay-off for one player (the other player’s pay-off is trivial to deduce: just negate the cell); see the example below
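A minimal illustrative example (Matching Pennies), with the cells showing pay-offs for the row player; the column player’s pay-offs are the negations:

                Heads   Tails
    Heads        +1      -1
    Tails        -1      +1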
9
Q

A stable solution

A

A solution is stable if no player wants to move away unilaterally from the solution. At such a stable solution, a game is in equilibrium.

10
Q

Gain floor of a 2-player zero-sum game (A)

A

the worst outcome for player 1 if he picks his best strategy i*

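In matrix terms (a standard formulation, with A the pay-off matrix for player 1):

    gain floor = max over i of ( min over j of A[i][j] ), attained at i*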
11
Q

Loss ceiling of a 2-player zero-sum game (A)

A

the best outcome for player 1 (corresponding to the worst for player 2) if player 2 picks his best strategy j*

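In the same matrix terms:

    loss ceiling = min over j of ( max over i of A[i][j] ), attained at j*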
12
Q

Saddle point

A

When the Gain Floor equals the Loss Ceiling.

In this case, neither player can do better by unilaterally picking a strategy other than i*, j*. This pair of strategies is a stable solution to the game.

A kind of cooperation emerges, even though the game is adversarial.

In a game with a saddle point, each player has a best strategy (or a set of them). The strategies remain best even if the other players know what they are.

13
Q

What is the ‘Value’ of a game

A

the pay-off of its stable solution, i.e.

when Gain Floor = Loss Ceiling = Value (A)

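In the same matrix terms, when a saddle point exists:

    max over i of ( min over j of A[i][j] ) = min over j of ( max over i of A[i][j] ) = Value(A)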
14
Q

Games without a saddle point

A

In a game without a saddle point, and hence without a stable solution, each pure strategy one player might adopt, if known to an opponent, can be refuted.

15
Q

Mixed strategy

A

A mixed strategy involves a random choice among pure strategies.
• Each pure strategy is picked with its own particular probability
• Mixed strategies may be found which, even if known, cannot be refuted (see the sketch below)
• This does apply in once-off games, e.g. the Prisoner’s Dilemma, but is mostly associated with repeatedly-played (iterated) games

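A small runnable Python check that the uniform mixed strategy for Roshambo cannot be refuted even when the opponent knows it (pay-offs shown for the row player):

    # Row player's pay-offs for (rock, paper, scissors) vs each column choice.
    A = [[ 0, -1,  1],
         [ 1,  0, -1],
         [-1,  1,  0]]
    p = [1/3, 1/3, 1/3]            # the uniform mixed strategy over the rows

    # The expected pay-off against every pure counter-strategy is 0, so
    # knowing p gives the opponent nothing to exploit.
    for j in range(3):
        print(sum(p[i] * A[i][j] for i in range(3)))   # -> 0.0 each time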
16
Q

Type A program

A

Work by (or as if by) exploring all legal moves in a game tree to a uniform depth, and backing up evaluations of leaf-node positions.

17
Q

Type B programs

A
  • choose parts of the tree to explore
  • doing selective move generation
  • using background knowledge (rules, learned weights, patterns, statistics) to select a few promising lines of play to be explored further, and to discard many poor lines of play to be explored no further
18
Q

Refinements of Minimax

A
  • alpha-beta, progressive deepening
  • transposition tables
  • quiescence search
  • PVS, Scout, Aspiration Search, MTD(f)
  • Killer heuristics, History Heuristic, Null-move Heuristic
  • Selective Extensions
  • Pondering (‘thinking’ during an opponent’s time)
19
Q

Alternatives to Mini-max

A
  • Conspiracy number search
  • proof number search
  • B*
  • Monte carlo rollouts
  • UCT
20
Q

Evaluation Function

A

A number which indicates how good a position is for one player.

  • do not treat it as a mathematical probability
  • used heavily in search so should be fast
21
Q

Opening Book

A

Codification of good opening moves

Chess programs use the knowledge in these publications

  • perhaps augmented by team members who are expert-strength or better at chess
  • coded by programmers into a form their program can use

A common strategy of human players confronting computers is to make moves
out of the book - i.e. not found in the book - in the expectation that the computer
will not be able to find the responses which make the move sub-optimal.

22
Q

Endgame database

A

A tabulation of the possible positions in which only a very small number of chessmen remain on the board. For each position, it records the best move.

23
Q

Advantages of Iterative Deepening

A
  1. If there is time to perform a deeper search, go ahead and do it; if not, with iterative deepening you always have a search result ready to play (see the sketch below).
  2. With alpha-beta search, good move ordering gives much bigger savings than random move ordering. Iterative deepening reveals the Principal Variation up to depth N, which can be used to order moves up to depth N when searching to depth N+1, so in that sense iterative deepening actually saves time overall.
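A minimal Python sketch of the iterative-deepening driver loop; fixed_depth_search is an illustrative placeholder for the underlying (e.g. alpha-beta) search:

    import time

    def iterative_deepening(position, time_budget_seconds):
        deadline = time.time() + time_budget_seconds
        best_move = None
        depth = 1
        while time.time() < deadline:
            # Each completed iteration also yields a principal variation that
            # can be used to order moves at the next depth.
            best_move, value = fixed_depth_search(position, depth)
            depth += 1
        return best_move            # a result is always ready to play

(Real engines also abort an iteration mid-search when the clock runs out; this sketch only checks the clock between iterations.)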
24
Q

Manifestations of the horizon effect

A

A search that stops at a particular depth has no information about moves beyond that depth except what the static evaluation function returns.

The search compares the outcomes of different lines of play within its horizon. If one line of play results in a loss within the horizon, and another line of play results in a smaller loss within the horizon, the program will think the second line is better. But this can result merely in delaying the inevitable: making ineffectual sacrifices, just pushing bad news over the horizon. It can also result in failing to realise how advantageous a position is.

25
Q

How to overcome the ‘Horizon Effect’?

A

Quiescence Search

The horizon effect occurs because computers only search a certain number of moves ahead. Human players usually have enough intuition to decide whether to abandon a bad-looking move, or search a promising move to a great depth. A quiescence search attempts to emulate this behavior by instructing a computer to search “interesting” positions to a greater depth than “quiet” ones (hence its name) to make sure there are no hidden traps and, usually equivalently, to get a better estimate of its value.

26
Q

What is the ‘Horizon Effect’?

A

The horizon effect is a problem in artificial intelligence where, in many games, the number of possible states or positions is immense and computers can only search a small portion of it, typically a few ply down the game tree. Thus, for a computer searching only five ply, there is a possibility that it could make a move that would prove to be detrimental later on (say, after six moves), but it cannot see the consequences because it cannot search far into the tree.

27
Q

Quiescence Search

A

Any sensible criterion may be used to distinguish “quiet” moves from “noisy” moves; high activity (high movement on a chess board, extensive capturing in Go, for example) is commonly used for board games. As the main motive of quiescence search is usually to get a good value out of a poor evaluation function, it may also make sense to detect wide fluctuations in values returned by a simple heuristic evaluator over several ply. Modern chess engines may search certain moves up to 2 or 3 times deeper than the minimum. In highly “unstable” games like Go and reversi, a rather large proportion of computer time may be spent on quiescence searching.

28
Q

Null-move heuristic

A

In α-β search the Null-Move heuristic is a valuable technique in its own right.

  • It also bears on quiescence search
  • It applies even in games, like chess, where null moves (“passes”) are not legal!

The idea is that you almost always do better by making a move than you would by allowing your opponent two moves in a row.

By imagining what the opponent could do with two moves in a row, you get a lower bound on the value of your position.

  • If this lower bound is still greater than β (Hope), you get an early and cheap β cutoff, since null-move generation costs virtually nothing (see the sketch below).

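A minimal Python sketch of the null-move idea inside a negamax-style alpha-beta search; the position interface (make_null, unmake_null, make, unmake, legal_moves, evaluate, in_check) and the reduction R are illustrative assumptions, not a particular engine’s API:

    R = 2                                    # illustrative depth reduction

    def alphabeta(pos, depth, alpha, beta):
        if depth == 0:
            return pos.evaluate()
        # Null move: let the opponent move twice in a row, searched to a reduced
        # depth; the result is (roughly) a lower bound on the real value.
        if depth > R and not pos.in_check():
            pos.make_null()
            score = -alphabeta(pos, depth - 1 - R, -beta, -beta + 1)
            pos.unmake_null()
            if score >= beta:
                return beta                  # early, cheap beta cutoff
        for move in pos.legal_moves():
            pos.make(move)
            score = -alphabeta(pos, depth - 1, -beta, -alpha)
            pos.unmake(move)
            if score >= beta:
                return beta
            alpha = max(alpha, score)
        return alpha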
29
Q

Quiescence Search

A

Quiescence search is the idea of, after reaching the main search horizon, running a Turing-like search in which we only expand capturing moves (or sometimes, capturing and checking moves). For games other than chess, the main idea would be to only include moves which make large changes to the evaluation. Such a search must also include “pass” moves in which we decide to stop capturing. So, each call to the evaluation function in the main alpha-beta search would be replaced by the following, a slimmed-down version of alpha-beta that only searches capturing moves, and that allows the search to stop if the current evaluation is already good enough for a fail high:

    // quiescence search
    // call this from the main alpha-beta search in place of eval()
    int quiesce(int alpha, int beta) {
        int best = eval();                 // "stand pat": the option of not capturing further
        if (best >= beta) return best;     // already good enough for a fail high
        if (best > alpha) alpha = best;
        for (each capturing move m) {
            make move m;
            int score = -quiesce(-beta, -alpha);
            unmake move m;
            if (score > best) best = score;
            if (score > alpha) alpha = score;
            if (score >= beta) break;      // beta cutoff
        }
        return best;
    }
Some people also include checks to the king inside the quiescence search, but you have to be careful: because there is no depth parameter, quiescence can search a huge number of nodes. Captures are naturally limited (you can only perform 16 captures before you've run out of pieces to capture) but checks can go on forever and cause an infinite recursion.
30
Q

Null windows

A

If beta = alpha + 1, αβ search will always either fail high or fail low (equivalent to asking ‘is the true value greater than alpha, or not?’).

Such a search is used in NegaScout/PVS to confirm cheaply that subsequent move choices are in fact inferior to the one currently believed best (see the sketch below).

However, when subsequent moves are not in fact inferior, they must be searched again.

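A minimal Python sketch of how NegaScout/PVS uses null windows; the same illustrative position interface as in the other sketches is assumed:

    def pvs(pos, depth, alpha, beta):
        if depth == 0:
            return pos.evaluate()
        first = True
        for move in pos.legal_moves():
            pos.make(move)
            if first:
                score = -pvs(pos, depth - 1, -beta, -alpha)      # full window
                first = False
            else:
                # Null-window probe: "is this move better than alpha at all?"
                score = -pvs(pos, depth - 1, -alpha - 1, -alpha)
                if alpha < score < beta:
                    # Probe failed high: the move is not inferior after all,
                    # so re-search it with a proper window.
                    score = -pvs(pos, depth - 1, -beta, -score)
            pos.unmake(move)
            if score >= beta:
                return beta
            alpha = max(alpha, score)
        return alpha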
31
Q

SCOUT Algorithm

A

Takes to an extreme the idea that minimal-window αβ searches are cheap
Replaces general αβ with a divide-and-conquer approach

32
Q

MTD(f) Algorithm

A

Similar to SCOUT, but makes a better initial guess using Iterative Deepening.

33
Q

Alpha-Beta variants

A

All these variations on αβ may involve repeating searches with different choices for α and β.

This can make savings,
  • but probably only if a transposition table is also used, to allow results of previous partial searches to be reused.
  • For chess, MTD(f) seems to need on average 3-5 iterations to converge, and saves 5%-15% in terms of number of nodes generated.

αβ is fundamentally a depth-first tree search algorithm.
  • Iterative Deepening gives an effect somewhat like breadth-first search.

34
Q

Transposition table

A
  • A table to look up previously computed results from a transposition
  • transposition = encountering the same positions again and again, but from different sequences of moves
35
Q

What information is stored in a TT?

A

Information stored that:

  1. Allows for later recognition of the same position if it recurs (same arrangement of pieces, same player on the move, other game-specific information)
  2. Tells what valuation should be returned if it is still valid (numeric score, fail low/high, CN range)
  3. Allows judging whether or not the valuation is still valid (depth of tree that provided this, if alpha-beta what bounds were in force)
  4. OPTIONAL - What the best move was found by the previous search
36
Q

How is a TT stored?

A
  • hash table using a numeric key

Approaches to collisions:

  1. Replace with new entry - esp. if stored entry relates to old problem
  2. Choose between old and new entry on principled basis - keeping entry with greatest depth/largest search effort/most reliable value/tightest bounds
37
Q

Zobrist hashing

A

Aims to give a fast, repeatable numeric coding of situations arising in games, to allow easy determination of whether two situations are the same (see the sketch below).

  • the hash table holds far fewer than 2^64 entries, so only some of the Zobrist key’s bits are used as the TT hash code; all 64 are kept as a signature
  • codes are updated incrementally by XOR with pre-generated random numbers
  • some implementations use rand(); others use Hamming-distance tricks when choosing the random numbers
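A minimal runnable Python sketch of Zobrist hashing for a chess-like board; the sizes and board encoding are illustrative:

    import random

    random.seed(0)                           # repeatable codes
    N_SQUARES, N_PIECE_KINDS = 64, 12
    zobrist = [[random.getrandbits(64) for _ in range(N_PIECE_KINDS)]
               for _ in range(N_SQUARES)]
    side_to_move_code = random.getrandbits(64)

    def full_key(board):
        # board: dict mapping square index -> piece-kind index
        key = 0
        for square, piece in board.items():
            key ^= zobrist[square][piece]
        return key

    def update_key(key, piece, from_sq, to_sq):
        # Incremental update: XOR the moving piece out of its old square and
        # into its new one, then toggle the side to move.
        key ^= zobrist[from_sq][piece]
        key ^= zobrist[to_sq][piece]
        key ^= side_to_move_code
        return key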
38
Q

Advantages of TT

A
  1. Reduction of search effort (for chess, x5)
  2. Useful to store as an opening book
  3. Can help with move-ordering in alpha-beta even if new depth search is not sufficient
  4. Get extra depth for free
39
Q

Why can a transposition table cause a significant speedup for a program that searches a game tree?

A

Speedup accrues from the reduction in search effort resulting from the re-use of information about positions already searched. In a two-player game with branching factor b, if all moves were independent, on the 3rd half-ply b^2 positions would be duplicated, leaving b^3 - b^2. This represents a substantial saving - b^2 searches of subtrees of size b^(d-3) are avoided. More generally, as depth increases, progressively greater numbers of progressively smaller subtrees need not be searched again.

40
Q

Besides speed-up, what other advantages are there to using a TT?

A
  1. TTs are useful to store as an opening book
  2. Even if the depth for the current entry is not sufficient for the current search, it can still be informative for move ordering in Alpha-Beta search
  3. You may get extra depth for free - in other words, the entry you retrieved from the hashtable may have derived from a search to e.g. depth-7 (from that position) when you only intend to search to depth-3
41
Q

A disadvantage of using a transposition table is the increase in complexity of a search algorithm. Outline the steps that must be added to an alpha-beta search algorithm to make use of a transposition table.

A

Steps to add a TT to an alpha-beta pruning search are as follows (see the sketch after this list):

  1. At the start of the algorithm, perform a TT look-up based on the current position’s hash key.
  2. Check whether an entry exists matching the current node representation, and whether the depth associated with the stored entry is >= the depth required for the current search.
  3. Ideally the entry stores information telling us whether its value is exact, or whether it failed high or low.
  4. If it is an exact value, retrieve and return it.
  5. If not, and the entry records a lower-bound flag, let alpha be the max of the current alpha and this lower bound.
  6. If not, and the entry records an upper-bound flag, let beta be the minimum of the current beta and this upper bound.
  7. If now alpha >= beta, return the value found in the entry.
  8. For storing a new entry or updating an existing one (after the recursive alpha-beta calls), let the value of the entry equal the best value found by alpha-beta.
  9. If that value <= the original alpha for this call, set a flag indicating ‘upper bound’ on the TT entry.
  10. If the best value found >= the original beta, set a flag indicating ‘lower bound’ on the TT entry.
  11. If steps 9 and 10 do not apply, set a flag indicating an exact value in the TT entry.
  12. Add this new/updated entry to the TT hash table.
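A minimal Python sketch of the steps above wrapped around a negamax alpha-beta search; the position interface (hash_key, legal_moves, make, unmake, evaluate) is an illustrative assumption:

    EXACT, LOWERBOUND, UPPERBOUND = 0, 1, 2
    tt = {}                                   # hash key -> entry

    def alphabeta(pos, depth, alpha, beta):
        alpha_orig = alpha
        entry = tt.get(pos.hash_key())
        if entry is not None and entry['depth'] >= depth:        # steps 1-2
            if entry['flag'] == EXACT:                           # steps 3-4
                return entry['value']
            if entry['flag'] == LOWERBOUND:                      # step 5
                alpha = max(alpha, entry['value'])
            elif entry['flag'] == UPPERBOUND:                    # step 6
                beta = min(beta, entry['value'])
            if alpha >= beta:                                    # step 7
                return entry['value']
        if depth == 0:
            return pos.evaluate()
        best = float('-inf')
        for move in pos.legal_moves():
            pos.make(move)
            best = max(best, -alphabeta(pos, depth - 1, -beta, -alpha))
            pos.unmake(move)
            alpha = max(alpha, best)
            if alpha >= beta:
                break
        if best <= alpha_orig:                                   # step 9
            flag = UPPERBOUND
        elif best >= beta:                                       # step 10
            flag = LOWERBOUND
        else:                                                    # step 11
            flag = EXACT
        tt[pos.hash_key()] = {'value': best, 'depth': depth, 'flag': flag}   # steps 8, 12
        return best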
42
Q

GHI problem with TTs

A

Palay, 1983

Since the state space of most games is a directed graph, many game-playing systems detect repeated positions with a transposition table. This approach can reduce search effort by a large margin. However, it suffers from the so-called Graph History Interaction (GHI) problem, which causes errors in games containing repeated positions.

A program can reach the same game state via a different path - a so-called transposition. If the previously cached position was explored deeply enough, the search algorithm does not need to explore the position again. However, if the search space includes cycles, cached results may be unreliable because they ignore the path used to reach the position.

Kishimoto et al. came up with a solution to the GHI problem.

43
Q

TT

A
  • One of the most valuable search enhancements is the
    transposition table
  • A large cache that keeps results of previous searches.
44
Q

Killer heuristic

A
  • we know that alpha-beta search is more efficient with well-ordered moves
  • it often happens in practice that a player has a particularly good move available somewhere in the near future, good in several different situations.
  • if such a killer move can be recognized by the search algorithm and considered first when it is among the generated possible moves, alpha-beta is more likely to cut-off quickly

How to do this:

  1. At each ply in the search tree, keep a count of the number of times each possible move caused a beta cutoff
  2. when a list of moves is generated, consider first the move that has caused most cutoffs at that ply-level
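A minimal Python sketch of the per-ply bookkeeping described above; MAX_PLY and the move representation are illustrative:

    from collections import defaultdict

    MAX_PLY = 64
    cutoff_counts = [defaultdict(int) for _ in range(MAX_PLY)]

    def record_killer_cutoff(ply, move):
        # Called whenever this move causes a beta cutoff at this ply.
        cutoff_counts[ply][move] += 1

    def order_moves_by_killers(moves, ply):
        # Moves that have caused the most cutoffs at this ply are tried first.
        return sorted(moves, key=lambda m: -cutoff_counts[ply][m])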
45
Q

History Heuristic

A

[similar to killer, independent of ply levels]

Similar to the thinking behind Killer Heuristic, a move that has repeatedly proved good, may also be good at other ply levels.

Method:

  1. Maintain (probably separately for each player), a table of moves that have caused beta cut-offs, recording the number of cut-offs for each move, regardless of ply level.
  2. When generating moves for a node, sort them in descending order of the number of recorded cut-offs
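A minimal Python sketch of the history-heuristic variant: one cutoff table per player, shared across all ply levels (names are illustrative):

    from collections import defaultdict

    history = {'white': defaultdict(int), 'black': defaultdict(int)}

    def record_history_cutoff(player, move):
        history[player][move] += 1

    def order_moves_by_history(moves, player):
        return sorted(moves, key=lambda m: -history[player][m])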
46
Q

Fuzzy-move matching

A

With the Killer & History Heuristic, it is in some cases quite likely that other ‘similar’ moves will also be good.

Moves might be compared to known killer moves on the basis of strict identity or by sharing one or more of the piece-from-to elements of the description.

47
Q

Singular Extension

A

Background: part of rationale of minimax is that a static evaluation of a position is more likely to be reliable closer to the end of a game.

Idea of singular extensions is to use a very focused search to several ply deeper in order to get a more reliable evaluation for midgame positions.

Method:
In alpha-beta search when the static evaluation of a position is required, do the following:

  1. Pick the possible next move with the best static evaluation
  2. Generate all possible successors of that ‘singular extension’
  3. Repeat for some predetermined number of ply
  4. Report the evaluation at the end of this play-out sequence back to the alpha-beta routine

Note: the host alpha-beta search should be carried out to a lesser depth if singular extensions are pursued, since substantial extra move generation and evaluation is necessary

48
Q

Conspiracy Numbers

A

-David McAllester

49
Q

PNS

A

Proof Number Search

  • best-first AND-OR tree search algorithm
  • Victor Allis
50
Q

Conspiracy Numbers

A

-David McAllester

Purpose: to continue searching until it is unlikely that the minimax value of the root will change

51
Q

PNS

A

Proof Number Search

  • best-first AND-OR tree search algorithm
  • Victor Allis

Purpose: finding the game-theoretical value in game trees

Based on ideas derived from CN search

PNS aims at proving the true value of the root

PNS specializes conspiracy numbers to AND-OR tree search with binary outcomes, with significantly reduced memory requirements compared to CN search

MAX-nodes are represented as OR-nodes, MIN-nodes as AND-nodes.

A solved tree with value true is called proved, while a solved tree with value false is called disproved.

Evaluation of leaves yields the values false, true or unknown; an unknown leaf can be expanded to become a frontier node. For back-propagation after expansion, an AND-node becomes false if at least one child is false, else unknown if at least one child is unknown, otherwise true. At OR-nodes, false and true are interchanged analogously.

52
Q

Shannon Type A strategy

A
  • brute force
  • Claude Shannon, 1949
  • weak and slow due to the huge branching factor and exponential explosion
  • thus, Shannon favoured searching only a small subset of plausible moves with a Type B strategy
  • however, when alpha-beta came along in the 1970s with all its enhancements, Type A became popular, since the task of classifying and excluding ‘non-plausible’ moves proved difficult and error-prone
53
Q

Shannon Type B strategy

A

rely on knowledge (exclusively or partially)

  • doing selective move generation or analysis of positions
  • some knowledge may be coded in ordinary algorithmic fashion
  • some knowledge may be expressed in terms of patterns
54
Q

Expectation Minimization

A

Minimax is not suitable for evaluating interior nodes in a game tree where, owing to chance factors, certain moves may or may not be possible.

Instead, intermediate game-tree nodes may be used to represent groups of moves that are possible under different chance outcomes; these are given values as if by minimax, and their parent is given a value weighted by the probability of the various outcomes (see the sketch below).

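A minimal runnable Python sketch of this idea (usually called expectiminimax); the tuple-based tree encoding is illustrative:

    # A node is a terminal number, ('max', [children]), ('min', [children]),
    # or ('chance', [(probability, child), ...]).
    def expectiminimax(node):
        if isinstance(node, (int, float)):
            return node
        kind, children = node
        if kind == 'max':
            return max(expectiminimax(c) for c in children)
        if kind == 'min':
            return min(expectiminimax(c) for c in children)
        # chance node: value weighted by the probability of each outcome
        return sum(p * expectiminimax(c) for p, c in children)

    # Example: a chance node with two equally likely outcomes (expected value 2.0).
    tree = ('max', [('chance', [(0.5, 4), (0.5, 0)]), 3])
    print(expectiminimax(tree))                       # -> 3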
55
Q

Danger in self-play

A

The early, near-random strategy at the start of self-play may be very bad

56
Q

Explain UCT.

A
  • Kocsis & Szepesvari (2006)
  • based on UCB1 multi-armed bandit algorithm (Auer et al. 2002)
  • changed landscape of game-playing programs in recent years
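A hedged statement of the UCB1-style rule that UCT applies when descending the tree (C is an exploration constant; N is the parent’s visit count, n_i and w_i a child’s visit and win counts):

    pick the child i maximising   w_i / n_i  +  C * sqrt( ln(N) / n_i )

The first term is the win-rate and the second is the visits-based exploration term referred to elsewhere in these cards.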
57
Q

RAVE

A

Rapid Action Value Estimation - a specialization of AMAF (All Moves As First)

58
Q

AlphaGo Overview

A
  • UCT (Upper Confidence Bounds applied to Trees) is based on the notion that a node has a score made up of 1. a win-rate term and 2. a visits-based term
  • Heavy playouts = a third term is added to bias the probability of a node’s selection based on knowledge/heuristics (e.g. pattern-based)
59
Q

AlphaGo use of DCNNs in UCT Search

A
  1. A 13-layer policy network = used to bias the probability of a node’s selection in the lead-up to a play-out
  2. Value network = used to estimate the value of a position reached after ~20 moves of a playout
  3. Fast policy network = simpler than the policy network which was used to pick a unique move during a playout
60
Q

UCT Node Evaluation in Alpha-Go

A

Score of a node in AlphaGo is made up of:

  1. Win-rate
  2. Visits-based term = encouraging more exploration of moves for which disappointing results to date may be largely due to unlucky playout decisions
  3. Heuristic value computed by the value network at points ~20 moves into the future of playouts
  4. Heuristic value, or bias, computed by the policy network (see the selection-rule sketch below)
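A hedged sketch of the corresponding selection rule reported for AlphaGo (simplified from the published description): at state s, pick the action a maximising

    Q(s, a) + u(s, a),   where u(s, a) is proportional to P(s, a) / (1 + N(s, a))

Here Q combines the value-network evaluations and playout results (items 1 and 3 above), N is the visit count behind item 2, and P is the prior move probability from the policy network (item 4).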
61
Q

Training of DCNNs

A
  1. Supervised learning stage - 30 million positions from high-dan games used as a training set, just one randomly selected position from each game in order to overcome a problem of positive correlation (policy network -> predict expert move with 57% accuracy)
  2. Reinforcement learning stage - policy network was continually improved through self-play against randomly selected previous versions of itself.
62
Q

Hardware for AlphaGo

A

Single-machine & distributed

Single machine: 40 search threads, 48 CPUs, 8 GPUs

Enabled parallelism without locks by artificially setting a leaf node’s score to 0 so that it would not be expanded by another thread, and so avoided interference with other threads

63
Q

Eternity Solution

A

Hybrid Search:

  1. Heuristically guided beam search
  2. DFS
64
Q

Advantages of Iterative Deepening

A
  1. IF there is time to perform a deeper search, go ahead and do it; if not, with iterative deepening, you have a search ready to play.
  2. With AB search, good move ordering gives much bigger savings than random move ordering. Iterative deepening reveals the Principal Variation up to depth N which can be used to order moves up to depth N in searching to depth N + 1.
65
Q

3 DCNNs in AlphaGo

A
  • 13-layer policy network - used to bias the probability of a node’s selection in the lead-up to a play-out
  • Value network - used to estimate the value of a position reached after ~20 moves of a playout
  • Fast policy network - Simpler than the policy network, used to pick a unique move during a playout
66
Q

Singular extensions

A
  • rationale of minimax - static evaluation more likely to be reliable as the game gets closer to its end
  • idea of singular extensions is to use a very focused search to several ply deeper in order to get a more reliable evaluation for midgame positions
  • in AB search, when a static evaluation is required:
    1. pick the possible next move with the best static evaluation
    2. generate all possible successors of that ‘singular extension’
    3. repeat for some predetermined number of ply
    4. report the evaluation at the end of this playout sequence to the AB routine

Note: the host AB search should be carried out to a lesser depth if singular extensions are pursued, since substantial move generation and evaluation is necessary.

67
Q

Pruning

A

Allows us to ignore portions of the tree that make no difference to the final choice

68
Q

Heuristic evaluation function

A

Allows us to approximate the true utility of a state without doing a complete search.

69
Q

Tic-Tac-Toe game tree

A

Relatively small - fewer than 9! terminal nodes

70
Q

Futility pruning

A

Helps decide in advance which moves will cause a beta cutoff in the successor nodes.

71
Q

Null move pruning (aka null move heuristic)

A

Generates a good lower-bound on the value of a position, using a shallow search in which the opponent gets to move twice at the beginning.

  • this lower bound often allows alpha-beta pruning without the expense of a full-depth search
  • was used extensively in the HYDRA system

72
Q

Reasons computer Go-playing programs were weak in the pre-UCT era

A
  1. Branching factor of ~361 - the board is 19 × 19, and moves are allowed onto almost every empty point
  2. Game length: games commonly last for 150-300 (sometimes 350+) moves
  3. Difficulty of writing an evaluation function - the control of territory is often unpredictable until the endgame