Study Flashcards
Explain how Google’s AlphaGo system uses its three neural networks.
- 13-layer policy network: trained on records of expert games to predict promising moves; used to narrow the search to a few good candidate moves
- Value network: evaluates a position directly, estimating how likely it is to lead to a win, which reduces the depth of search needed
- Fast policy network: a smaller, much faster move predictor used to play out rollouts (simulated games to the end) during Monte Carlo tree search
Minimax
Knowing the complete game tree is useful for a game playing AI, because it allows the program to pick the best possible move at a given game state. This can be done with the minimax algorithm: At each game turn, the AI figures out which move would minimize the worst-case scenario. To do that, it finds the node in the tree corresponding to the current state of the game. It then picks the action that minimizes the worst possible loss it might suffer. This requires traversing the whole game tree down to nodes representing end-of-game states. The minimax algorithm therefore requires the complete game tree. Great for tic-tac-toe, but not useful for chess, and even less so for Go.
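A minimal sketch of the idea over an explicit game tree (an illustrative representation, not from any particular program): a node is either a number, the terminal pay-off from the maximizing player's point of view, or a list of child nodes, one per legal move.

```python
def minimax(node, maximizing):
    # Terminal position: return its pay-off directly.
    if isinstance(node, (int, float)):
        return node
    # Otherwise back up the children's values: the mover picks the best
    # for themselves, assuming the opponent does the same in reply.
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# Tiny example: the maximizer gets 3 (the opponent answers the first
# branch with 3, and would answer the other branch with 2).
assert minimax([[3, 5], [2, 9]], maximizing=True) == 3
```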
Dynamic game
The order in which players make their choices matters; e.g. X&Os, or a shop sets a price and then the customer decides whether to buy
Static games
Choices are unordered (perhaps simultaneous), e.g. Roshambo, Diplomacy
Zero-sum game
A game is zero-sum if at each terminal of the extensive form of the game, the sum of the players’ pay-offs is zero.
Ovals in extensive forms of games
Denote ‘information sets’ - sets of nodes in the game tree which cannot be distinguished by the player making a choice.
Strategy (in Game Theory speak)
A function which determines which choice a player makes at every possible choice point (even those which do not arise on a particular occasion).
Dominance
If one strategy for a player always gives at least as high a payoff for that player as some other strategy, the first dominates the second.
Normal-form
A more compact notation for the pay-offs of two players in a zero-sum game.
- Rows of the matrix correspond to one player
- Columns correspond to another player
- Each cell shows the pay-off for one player (it is trivial to deduce the other player's payoff: just negate the existing cells); see the example below
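For instance, a sketch of a normal-form matrix for Roshambo (the encoding below is illustrative):

```python
# Rows: player 1's choice; columns: player 2's choice, both in the order
# rock, paper, scissors. Entries are player 1's pay-offs; player 2's
# pay-offs are their negations.
PAYOFF = [
    [ 0, -1,  1],   # rock
    [ 1,  0, -1],   # paper
    [-1,  1,  0],   # scissors
]
```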
A stable solution
A solution is stable if no player wants to move away unilaterally from the solution. At such a stable solution, a game is in equilibrium.
Gain floor of a 2-player zero-sum game (A)
The guaranteed minimum pay-off for player 1 when he picks his best (maximin) strategy i*
Loss ceiling of a 2-player zero-sum game (A)
The best outcome for player 1 (corresponding to the worst for player 2) when player 2 picks his best (minimax) strategy j*
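A minimal sketch of both quantities under the convention above (rows belong to player 1, and entries are player 1's pay-offs):

```python
def gain_floor(A):
    # Player 1's guaranteed minimum: the max over rows of the row minimum.
    return max(min(row) for row in A)

def loss_ceiling(A):
    # The least player 2 can concede: the min over columns of the column maximum.
    return min(max(row[j] for row in A) for j in range(len(A[0])))

# This matrix has a saddle point: playing row 2 / column 2 gives Value(A) = 2.
A = [[3, 1],
     [4, 2]]
assert gain_floor(A) == loss_ceiling(A) == 2
```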
Saddle point
When the gain floor equals the loss ceiling.
In this case, neither player can do better by unilaterally picking a different strategy than i*, j*. This pair of strategies is a stable solution to the game.
A kind of cooperation emerges, even though the game is adversarial.
In a game with a saddle point, each player has a best strategy (or a set of them). The strategies remain best even if the other players know what they are.
What is the ‘Value’ of a game?
The pay-off of its stable solution, i.e. when Gain Floor = Loss Ceiling = Value (A).
Games without a saddle point
In a game without a saddle point, and hence without a stable solution, each pure strategy one player might adopt can, if known to an opponent, be refuted.
Mixed strategy
A mixed strategy involves a random choice among pure strategies.
• Each pure strategy is picked with its own particular probability
• Mixed strategies may be found which, even if known, cannot be refuted (see the sketch below)
• This does apply in once-off games, e.g. the "Prisoner's Dilemma", but is mostly associated with repeatedly-played (iterated) games
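A minimal sketch, reusing the Roshambo PAYOFF matrix from the normal-form example above; the uniform mix earns expected pay-off 0 against every pure reply, so knowing it gives the opponent no edge:

```python
def expected_payoff(A, p, q):
    # Expected pay-off to player 1 when player 1 mixes with probabilities p
    # over rows and player 2 mixes with probabilities q over columns.
    return sum(p[i] * q[j] * A[i][j]
               for i in range(len(A)) for j in range(len(A[0])))

uniform = [1/3, 1/3, 1/3]
for j in range(3):                                  # each pure reply
    pure = [1.0 if k == j else 0.0 for k in range(3)]
    assert abs(expected_payoff(PAYOFF, uniform, pure)) < 1e-9
```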
Type A programs
Work by (or as if by) exploring all legal moves in a game tree to a uniform depth, and backing up evaluations of leaf-node positions.
Type B programs
- choose parts of the tree to explore
- doing selective move generation
- using background knowledge (rules, learned weights, patterns, statistics) to select a few promising lines of play to be explored further, and to discard many poor lines of play to be explored no further
Refinements of Minimax
- alpha-beta (see the sketch after this list), progressive deepening
- transposition tables
- quiescence search
- PVS, Scout, Aspiration Search, MTD(f)
- Killer Heuristic, History Heuristic, Null-move Heuristic
- Selective Extensions
- Pondering (‘thinking’ during an opponent’s time)
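A minimal alpha-beta sketch over the same explicit-tree representation as the minimax sketch above; it returns the same value as minimax while skipping branches that cannot affect the result:

```python
def alphabeta(node, alpha, beta, maximizing):
    if isinstance(node, (int, float)):
        return node
    if maximizing:
        best = float("-inf")
        for child in node:
            best = max(best, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, best)
            if alpha >= beta:    # beta cutoff: the minimizer avoids this line
                break
        return best
    else:
        best = float("inf")
        for child in node:
            best = min(best, alphabeta(child, alpha, beta, True))
            beta = min(beta, best)
            if beta <= alpha:    # alpha cutoff: the maximizer avoids this line
                break
        return best

# Same tree, same answer as plain minimax, but the 9 is never examined.
assert alphabeta([[3, 5], [2, 9]], float("-inf"), float("inf"), True) == 3
```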
Alternatives to Mini-max
- Conspiracy number search
- Proof number search
- B*
- Monte Carlo rollouts
- UCT
Evaluation Function
A number which indicates how good a position is for one player.
- Do not treat it as a mathematical probability
- It is called very many times during search, so it should be fast to compute (see the toy example below)
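A toy sketch for chess, counting material only (the board encoding is illustrative: a sequence of piece letters, upper case for White, lower case for Black):

```python
# Conventional material values; the king is priceless, so score it 0 here.
PIECE_VALUES = {"p": 1, "n": 3, "b": 3, "r": 5, "q": 9, "k": 0}

def evaluate(board):
    # Positive scores favour White, negative favour Black.
    score = 0
    for piece in board:
        value = PIECE_VALUES.get(piece.lower(), 0)
        score += value if piece.isupper() else -value
    return score

assert evaluate("KQRRkqr") == 5   # White is a rook up in this material list
```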
Opening Book
Codification of good opening moves
Chess opening theory is extensively published, and chess programs use the knowledge in these publications
- perhaps augmented by team members who are expert-strength or better at chess
- coded by programmers into a form their program can use
A common strategy of human players confronting computers is to make moves out of book - i.e. not found in the book - in the expectation that the computer will not be able to find the responses which make such moves sub-optimal.
Endgame database
A tabulation of the possible positions in which only a very small number of chessmen remain on the board. For each position, it records the best move.
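Conceptually it is a direct lookup table; a toy sketch (the position keys and entries below are purely hypothetical placeholders, and real endgame databases are compactly indexed rather than literal dictionaries):

```python
ENDGAME_DB = {
    "Ka1 Qb3 / kc1, white to move": "Qb2",   # hypothetical entry
}

def best_endgame_move(position_key):
    # Returns the tabulated best move, or None if the position is not covered.
    return ENDGAME_DB.get(position_key)
```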
Advantages of Iterative Deepening
- If there is time to perform a deeper search, go ahead and do it. If time runs out, with iterative deepening you always have a completed search result ready to play.
- With alpha-beta search, good move ordering gives much bigger savings than random move ordering. Iterative deepening reveals the principal variation up to depth N, which can be used to order moves when searching to depth N+1; in that sense, iterative deepening actually saves time overall (see the sketch below).
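A minimal driver sketch, assuming a hypothetical depth-limited routine search(state, depth) that returns (best_move, value):

```python
import time

def iterative_deepening(state, time_budget_seconds):
    deadline = time.monotonic() + time_budget_seconds
    best_move = None
    depth = 1
    while time.monotonic() < deadline:
        best_move, value = search(state, depth)   # hypothetical helper
        depth += 1
    return best_move   # a completed result is always ready to play
```

(Real engines also check the clock inside the search itself, and feed the depth-N principal variation back in to order moves at depth N+1.)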
Manifestations of the horizon effect
A search that stops at a particular depth has no information about moves beyond that depth except what the static evaluation function returns.
The search compares the outcomes of different lines of play within its horizon. If one line of play results in a loss within the horizon, and another line of play results in a smaller loss within the horizon, the program will think the second line is better.
But this can result merely in delaying the inevitable: making ineffectual sacrifices that just push the bad news over the horizon.
It can also result in failing to realise how advantageous a position is.
How to overcome the ‘Horizon Effect’?
Quiescence Search
The horizon effect occurs because computers only search a certain number of moves ahead. Human players usually have enough intuition to decide whether to abandon a bad-looking move, or search a promising move to a great depth. A quiescence search attempts to emulate this behavior by instructing a computer to search “interesting” positions to a greater depth than “quiet” ones (hence its name) to make sure there are no hidden traps and, usually equivalently, to get a better estimate of its value.
What is the ‘Horizon Effect’?
The horizon effect is a problem in artificial intelligence where, in many games, the number of possible states or positions is immense and computers can only search a small portion of it, typically a few ply down the game tree. Thus, for a computer searching only five ply, there is a possibility that it could make a move that would prove to be detrimental later on (say, after six moves), but it cannot see the consequences because it cannot search that far into the tree.
Quiescence Search
Any sensible criterion may be used to distinguish “quiet” moves from “noisy” moves; high activity (high movement on a chess board, extensive capturing in Go, for example) is commonly used for board games. As the main motive of quiescence search is usually to get a good value out of a poor evaluation function, it may also make sense to detect wide fluctuations in values returned by a simple heuristic evaluator over several ply. Modern chess engines may search certain moves up to 2 or 3 times deeper than the minimum. In highly “unstable” games like Go and Reversi, a rather large proportion of computer time may be spent on quiescence searching.
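A minimal sketch in negamax form, assuming hypothetical helpers evaluate(state), noisy_moves(state) (e.g. captures) and apply(state, move):

```python
def quiescence(state, alpha, beta):
    stand_pat = evaluate(state)        # score for stopping ("standing pat") here
    if stand_pat >= beta:
        return beta                    # already too good: fail-hard cutoff
    alpha = max(alpha, stand_pat)
    for move in noisy_moves(state):    # extend only "noisy" lines deeper
        score = -quiescence(apply(state, move), -beta, -alpha)
        if score >= beta:
            return beta
        alpha = max(alpha, score)
    return alpha
```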
Null-move heuristic
In α-β the Null-Move heuristic is a valuable technique in its own right.
- It also bears on quiescence search
• It applies even in games, like chess, where null moves (“passes”) are not legal!
The idea is that you almost always do better by making a move than you would by allowing your opponent two moves in a row. By imagining what the opponent could do with two moves in a row, you get a lower bound on the value of your position.
• If this lower bound is still greater than β (Hope), you get an early and cheap β cutoff (since null-move generation costs virtually nothing). See the sketch below.
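A minimal sketch in negamax form, with hypothetical helpers moves, apply, pass_turn, in_check and evaluate; R is the conventional depth reduction for the null-move search:

```python
R = 2   # conventional null-move depth reduction

def negamax(state, depth, alpha, beta):
    if depth == 0:
        return evaluate(state)
    # Null-move heuristic: imagine passing, giving the opponent two moves
    # in a row. If even then we score >= beta, a real move should too, so
    # cut off cheaply. (Skipped when in check, where passing is unsound.)
    if depth > R and not in_check(state):
        if -negamax(pass_turn(state), depth - 1 - R, -beta, -beta + 1) >= beta:
            return beta
    best = float("-inf")
    for move in moves(state):
        best = max(best, -negamax(apply(state, move), depth - 1, -beta, -alpha))
        alpha = max(alpha, best)
        if alpha >= beta:
            break
    return best
```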