Chapter 6 - Adversarial search Flashcards

1
Q

What is a “competitive environment”?

A

A competitive environment is an environment in which two or more agents have conflicting goals, so that they are competing rather than cooperating.

2
Q

Compare competitive task environments in chess and similar games to competitive environments in the real world

A

Real-world competitive environments are very complicated. We therefore concentrate on games, which usually have a simple, well-defined way of representing their state. This is highly advantageous when studying adversarial search.

3
Q

What are the stances we can take when considering multi-agent environments?

A

There are 3 stances we can take when considering multi-agent environments:

1) The economy stance.
We treat all agents in the environment as an aggregate, much as macroeconomics looks at tendencies in a whole economy. This approach works best when there is a very large number of agents. The point is to make predictions about the whole system rather than about the actions of any single agent, so we are interested in TRENDS and overall behaviour, regarding the system as a whole instead of individual agents. In football, we could do this by considering the entire team rather than individual players: for a given formation, how many passes and what pass-completion rate did we manage compared to other formations?

2) The second stance is to treat the other agents simply as part of the environment - a part that makes the environment nondeterministic. In other words, we take a multi-agent environment and treat it like a single-agent environment by folding the other agents into the environment. However, this completely ignores the fact that these agents are our competitors, so we reduce complexity at the cost of performance.
For instance, if an agent is walking outdoors from point A to point B, this stance works well: we only need to react to changes in the environment, keep our distance from other people, and so on. However, if people were actively trying to stop us from reaching the destination, this stance would not be useful, as it has no notion of competition.

3) The third stance is to explicitly model the adversarial agents with the techniques of adversarial game-tree search. This is our main concern.

4
Q

What are deterministic, two-player, turn-taking, perfect information, zero-sum games?

A

These are the games that have received the most research attention within AI.
Perfect information means the task environment is fully observable.

Zero-sum means that what is good for player A is equally bad for player B. There are no win-win situations: the outcome is good for one and bad for the other (with a draw as the neutral case).

5
Q

When we are talking about two-player zero-sum games, what do we mean by “position”?

A

Position is the current state.

6
Q

In two-player zero-sum games, what do we call our players?

A

We call the player who moves first “MAX” and the player that moves second is “MIN”.

7
Q

Formally define a two-player zero-sum game

A

The game has an initial state, S_0, which specifies how the game is set up at the start.

TO-MOVE(s): the player whose turn it is to move in state s. The turn order may simply alternate, or it may depend on something else (e.g. a die roll).

ACTIONS(s): the action set - the set of legal moves in state s.

RESULT(s, a): the state that results from performing action a in state s. This is the transition model. Recall that the transition model requires knowledge of how the world works AND of how the agent's actions affect the world.

IS-TERMINAL(s): a terminal test, which is true if the game is over and false otherwise. The states for which this test is TRUE are called terminal states.

UTILITY(s, p): a utility function, which defines the final numeric value to player p when the game ends in terminal state s. In chess the outcome is a win, loss or draw, with values 1, 0 and 1/2.

These 6 elements formally make up a game.
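
As a concrete illustration (not from the book), the six elements can be sketched as a Python interface; the class and method names below are assumptions, reused by the later code sketches in this deck:

class Game:
    def initial_state(self):           # S_0: how the game is set up
        raise NotImplementedError

    def to_move(self, state):          # TO-MOVE(s): whose turn it is in s
        raise NotImplementedError

    def actions(self, state):          # ACTIONS(s): legal moves in s
        raise NotImplementedError

    def result(self, state, action):   # RESULT(s, a): the transition model
        raise NotImplementedError

    def is_terminal(self, state):      # IS-TERMINAL(s): True if the game is over
        raise NotImplementedError

    def utility(self, state, player):  # UTILITY(s, p): e.g. 1, 1/2 or 0 in chess
        raise NotImplementedError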

8
Q

Consider the formal definition of a “game”. What parts of it are required to make the state space graph?

A

We require the initial state, the transition model and the actions set.

9
Q

What do we mean by superimposing?

A

To superimpose is to lay one thing on top of another. For instance, we can superimpose a search tree on top of a state space graph.

10
Q

What is the “game tree”?

A

The game tree is the complete search tree that follows every sequence of moves until it reaches a terminal state. Of course, not every game tree can actually be completed: some state space graphs are infinite, so there is no guarantee of terminating every path.

11
Q

What can we say about the plan of MAX?

A

Because MAX's plan depends on the moves MIN makes, MAX needs a contingency plan - a conditional plan.

12
Q

When can we use AND-OR search to find a plan?

A

We can use an AND-OR search when the only possible outcomes are win or lose. Recall that an AND-OR search produces a conditional plan.

13
Q

What do we do with games that have more than 2 outcomes?

A

Games with more than two outcomes require a minimax search. Minimax search can be seen as a generalization of AND-OR search.

14
Q

Consider the term "ply". What is it?

A

A ply is one move by one player, taking us one level deeper in the game tree (in chess, a half-move).

15
Q

What is the minimax value?

A

The minimax value of a state is the utility (for MAX) of being in that state. The minimax value of a terminal state is just its utility. In a non-terminal state, MAX prefers to move to a state of maximum minimax value when it is MAX's turn to move, and MIN prefers a state of minimum value (minimum for MAX, and thus maximum for MIN).

Summed up: when MAX is to move next, it naturally wants to move to the position whose minimax value is as high as possible. MIN naturally wants the opposite, and so wants MAX to end up with the lowest possible minimax value. The minimax value of a state ASSUMES that both players play optimally from that state onwards.

16
Q

How do we handle the two players' perspectives on scores in minimax?

A

We fix one end of the scale as good for MAX and the other end as good for MIN. In chess, for instance, 1 is a good score for MAX and 0 is a good score for MIN. This way we never have to switch perspective.

This implies a kind of symmetry between the scores.

17
Q

What is the minimax value?

A

The minimax value is the utility FOR MAX of being in a specific state. Even when it is MIN's turn to move, we still use MAX as the reference point.

The minimax value is thus a prediction: from this specific state, if both players play optimally, the final utility will be X.

18
Q

What is the minimax decision?

A

The minimax decision is the optimal move for the player to move, whether MIN or MAX. The goal naturally differs: the minimax decision for MAX maximizes the utility, while for MIN it minimizes it.

19
Q

What is the assumption of the minimax theory?

A

Minimax assumes that BOTH players play optimally from the specific state, and until the end.

20
Q

Is the minimax decision optimal?

A

Not unconditionally. It rests on the assumption that the opponent also plays optimally; against a fallible opponent, riskier strategies can sometimes achieve a better result.

21
Q

What is the time and space complexity of the minimax search algorithm?

A

If the game tree has depth m and there are b legal moves at each point, the time complexity is O(b^m).

The space complexity is O(bm) if all actions are generated at once (or O(m) if they are generated one at a time).

22
Q

Why is the regular minimax algorithm bad for complex games like chess?

A

It is bad because of its exponential time complexity.

23
Q

Elaborate in detail on the minimax search algorithm

A

First of all, minimax search is an algorithm that returns a single move - the optimal move assuming both players play optimally.

The algorithm is recursive and consists of 3 parts:

1) The main entry point. It basically just initializes the search and returns the final move.

2) MAX-VALUE(game, state).
This is called every time it is MAX's turn to move.

3) MIN-VALUE(game, state). This is called every time it is MIN's turn to move.

MAX-VALUE and MIN-VALUE are very similar; they are basically the same, only with a different comparison.

Consider MAX-VALUE. It works like this:
It is called when MAX is to make a move. It first checks whether the current state is a terminal (game-over) state. If so, it returns the utility of that state (together with a null move, since there is nothing left to play).

If the state is not terminal, the real work starts.
We first set v to negative infinity and move to null. These are sentinel values indicating "not set yet" (a flag or key could also be used). We use negative infinity because every possible value is then an improvement.

Then we loop through all actions available from ACTIONS(s). For each action, we want the value we would get by choosing it, assuming optimal play from then on. Therefore, for each action, we call MIN-VALUE on the resulting state. That call eventually returns a pair (utility, move). In MAX-VALUE's loop we store the returned utility as v2 and compare it with v. If v2 is greater than v, this move is better, so we set v = v2 and move = a (the action that produced it). When we have looped through all the actions, we return the best remaining pair (v, move).

MIN-VALUE works exactly the same way, except that the sentinel is +infinity and the check is "if v2 < v" instead of "if v2 > v".
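
A minimal runnable Python sketch of this recursion (illustrative, not the book's code), assuming the hypothetical game interface sketched earlier (to_move, actions, result, is_terminal, utility):

import math

def minimax_search(game, state):
    # Returns the optimal move for the player to move, assuming optimal play.
    player = game.to_move(state)
    value, move = max_value(game, state, player)
    return move

def max_value(game, state, player):
    if game.is_terminal(state):
        return game.utility(state, player), None
    v, move = -math.inf, None
    for a in game.actions(state):
        v2, _ = min_value(game, game.result(state, a), player)
        if v2 > v:
            v, move = v2, a        # keep the highest value seen so far
    return v, move

def min_value(game, state, player):
    if game.is_terminal(state):
        return game.utility(state, player), None
    v, move = math.inf, None
    for a in game.actions(state):
        v2, _ = max_value(game, game.result(state, a), player)
        if v2 < v:
            v, move = v2, a        # keep the lowest value (best for MIN)
    return v, move

Given a concrete game object, minimax_search(game, game.initial_state()) would return the move to play from the initial position.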

24
Q

How can we take the regular two-player setup and transform it into a multiplayer game?

A

We replace the single value at each node/state (the utility prediction) with a vector of values - one value for each player's perspective. For instance, with 3 players A, B and C, a terminal state could be labelled
(6, 2, 8), meaning the state is worth 6 to A, 2 to B and 8 to C.

This means that the utility function returns a vector of values.

25
Q

When can it be optimal for A and B to make an alliance against C?

A

If A and B are in weak positions while C is in a strong position, an alliance can be good for A and B.

26
Q

What are alpha and beta in alpha-beta pruning?

A

Alpha is the minimum value that the MAX player is already assured of.

Beta is the maximum value that the MIN player is already assured of.

Since alpha is the value MAX is already assured of, consider a MAX node whose first child returns the value 10. The MAX player can then be SURE that it never needs to accept anything lower than 10. MAX picks the highest value among its children, so if a later child (a MIN node) starts exploring an option that returns a value smaller than 10, we can stop: that MIN node will end up returning a value at most that small, and when it is passed up, the MAX node will not pick it, as the alpha value already tells us. Alpha is the floor for MAX - he never has to consider values below it, because one of his children has already guaranteed it.

27
Q

Consider this statement:

“The highest value returned by the children of a MAX node”

What is it?

A

The alpha value.

A MAX node does not need to consider children whose values are lower than the alpha value, because it would just pick the alpha value anyway, as it seeks to maximize.

28
Q

Consider this statement:

“The lowest value returned by the children of a MIN node”

A

The beta value.

A MIN node does not need to consider values that are larger than the beta value, because it would just pick the beta value anyway.

29
Q

Recite MAX-VALUE with alpha-beta pruning

A

MAX-VALUE(game, state, alpha, beta)
if game.isTerminal(state): return (game.utility(state, player), null)

v = -infinity

for each move in actions(state):
1) v2, a2 = MIN-VALUE(game, result(state, move), alpha, beta)
2) if v2 > v then: v = v2, moveReturn = move, alpha = max(alpha, v)
3) if v >= beta: return (v, moveReturn)

return (v, moveReturn)

30
Q

Recite MIN-VALUE with alpha-beta pruning

A

MIN-VALUE(game, state, alpha, beta)
if game.isTerminal(state): return (game.utility(state, player), null)

v = +infinity

for each move in actions(state):
1) v2, a2 = MAX-VALUE(game, result(state, move), alpha, beta)
2) if v2 < v then: v = v2, moveReturn = move, beta = min(beta, v)
3) if v <= alpha: return (v, moveReturn)

return (v, moveReturn)
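
For reference, a runnable Python rendering of the alpha-beta pair from this and the previous card (an illustrative sketch, assuming the hypothetical game interface from the earlier cards):

import math

def alpha_beta_search(game, state):
    player = game.to_move(state)
    value, move = max_value(game, state, player, -math.inf, math.inf)
    return move

def max_value(game, state, player, alpha, beta):
    if game.is_terminal(state):
        return game.utility(state, player), None
    v, move = -math.inf, None
    for a in game.actions(state):
        v2, _ = min_value(game, game.result(state, a), player, alpha, beta)
        if v2 > v:
            v, move = v2, a
            alpha = max(alpha, v)   # best value MAX can already guarantee
        if v >= beta:               # MIN above will never allow this branch
            return v, move
    return v, move

def min_value(game, state, player, alpha, beta):
    if game.is_terminal(state):
        return game.utility(state, player), None
    v, move = math.inf, None
    for a in game.actions(state):
        v2, _ = max_value(game, game.result(state, a), player, alpha, beta)
        if v2 < v:
            v, move = v2, a
            beta = min(beta, v)     # best value MIN can already guarantee
        if v <= alpha:              # MAX above will never allow this branch
            return v, move
    return v, move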

31
Q

What scenario causes the best performance of alpha-beta pruning?

A

If the best options are explored first, we can prune more.

32
Q

What are killer moves?

A

Killer moves are the “best moves”.

33
Q

What is the killer move heuristic?

A

The killer move heuristic is basically just trying the killer moves first.

34
Q

What are transpositions?

A

Transpositions are different permutations of the move sequence that end up in the same position.

35
Q

What is a transposition table?

A

A transposition table is a table that caches the heuristic value of states - essentially a cache mapping previously seen states to their value and best move. If we happen to reach the same state several times during the search, we simply look up the stored result instead of calculating it again.

The power here is that we avoid searching the same subtrees over and over.
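
A minimal sketch of the idea, assuming states are hashable (real engines typically key the table with something like Zobrist hashes); compute_value here stands in for whatever search routine produces the value:

transposition_table = {}

def cached_value(game, state, compute_value):
    key = state                         # or a hash of the position
    if key not in transposition_table:
        transposition_table[key] = compute_value(game, state)
    return transposition_table[key]     # reuse the result if the position recurs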

36
Q

Consider chess: It is well known that a simple minimax, even with alpha beta pruning AND caching using transposition tables, will not work on chess due to the enormous state space. What strategies could then be used with chess?

A

We have 2 types:

Type A strategy: consider a WIDE but shallow portion of the state space - reach all nodes within a certain depth.

Type B strategy: Follow only promising lines, and thus search further but narrower.

Historically, most chess programs have been type A.

37
Q

How can we make use of our limited computation time? That is, since we cannot search to completion, what do we do?

A

We can cut off the search early and apply a heuristic evaluation function to states, effectively treating non-terminal states as if they were terminal.

In other words, we use EVAL instead of UTILITY.

We also replace the terminal test with a cutoff test, which must of course still return true if the state actually is terminal.

38
Q

When we use the heuristic minimax, how can we formally describe the function?

A

H-MINIMAX(s, d) =
EVAL(s, MAX) if IS-CUTOFF(s, d)
max_{a in ACTIONS(s)} H-MINIMAX(RESULT(s, a), d+1) if TO-MOVE(s) = MAX
min_{a in ACTIONS(s)} H-MINIMAX(RESULT(s, a), d+1) if TO-MOVE(s) = MIN

where d is the current depth.
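
A compact Python sketch of exactly this recursion, assuming the hypothetical game object from the earlier sketches also exposes is_cutoff(state, depth) and eval(state, player):

def h_minimax(game, state, depth, max_player):
    # Heuristic minimax value of 'state' for MAX, cut off by is_cutoff/eval.
    if game.is_cutoff(state, depth):
        return game.eval(state, max_player)
    values = [h_minimax(game, game.result(state, a), depth + 1, max_player)
              for a in game.actions(state)]
    # MAX maximizes over successors, MIN minimizes.
    return max(values) if game.to_move(state) == max_player else min(values)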

39
Q

Consider the heuristic function Eval(s, p). What does it return?

A

The heuristic function eval(s,p) will return an estimate of the expected utility of state s to player p.

For terminal states, the estimate must be equal to the utility, and for non-terminal states, the heuristic estimate must be somewhere between the lose-value and win-value.

Utility(loss, p) <= Eval(s, p) <= Utility(win, p)

40
Q

What makes a heuristic good?

A

The computation must be fast, since the entire point is to improve search time.

The heuristic function needs to be strongly connected to the actual chances of winning.

41
Q

What are “features” of the state?

A

Features are domain-specific characteristics of a state that we can use to make decisions and predictions.

In chess, a feature could be the number of pawns, for instance.

If we look at different states that all have the same feature values, we can group them together and call them "categories" or "equivalence classes" of states.

42
Q

What are categories or “equivalence classes” of states?

A

Categories or "equivalence classes" of states are collections of states that all share the same values for their features. For example, one category might contain all two-pawn versus one-pawn endgames.

Any given category will contain some states that lead to wins, some that lead to draws, and some that lead to losses (assuming perfect play).

An evaluation function might not know which individual states lead to which outcome, but it might know the proportion that lead to victory. The algorithm/heuristic can then use this proportion information to make the best decision; it is quite common to use the expected value for this.

43
Q

Why is it not common to use probabilities as the only way of making decisions?

A

Because it requires far more experience (data) than we usually have.

Instead, we usually consider individual features, calculate a numerical contribution from each, and combine the contributions into a total value. In chess, for instance, we can sum a material value and a positional value to get the total. This is called a weighted linear function.

44
Q

What is a weighted linear function?

A

EVAL(s) = w1*f1(s) + w2*f2(s) + ... + wn*fn(s)

Here fi is feature i and wi is the corresponding weight, indicating how important that feature is. The weights need to be normalized so that the final score stays within the range from a loss to a win ([0, 1]).
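
A tiny sketch of such a weighted linear evaluation; the chess-like material weights and feature values below are made up for illustration (in practice they would be scaled/normalized as described above):

def eval_state(features, weights):
    # features: the f_i(s) values, weights: the corresponding w_i
    return sum(w * f for w, f in zip(weights, features))

# hypothetical material-difference features: pawns, knights, bishops, rooks, queens
weights  = [1, 3, 3, 5, 9]
features = [2, 0, 1, 0, 0]             # e.g. up two pawns and a bishop
print(eval_state(features, weights))   # -> 5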

45
Q

Regarding evaluation functions and correlation to winning chances: If state s is twice as likely to result in a win as s’, what is required of the eval function?

A

We only require that eval(s) > eval(s’).

We do NOT require that eval(s) = 2*eval(s’)

46
Q

What is the problem with weighted linear evaluation functions?

A

Linearity assumes the features are independent, which is often not the case. It is therefore often better to use non-linear combinations of features.

For instance, a bishop is worth more in the endgame than in the early game, and a pair of bishops is worth more than the sum of the two individual bishops.

47
Q

What modifications do we need to do with the alpha-beta search to make room for cutting off the search early?

A

First, we replace isTerminal with isCutoff.

Then we add a counter that keeps track of the depth, so that we can cut off the search at a certain depth.

The simple approach is to cut off at some fixed value of d.
A more robust variant is to use iterative deepening.

And of course, we apply the evaluation function when the cutoff is reached:

if game.IS-CUTOFF(state, d) then return game.EVAL(state, player)

48
Q

What can we say about the timing of using an evaluation function?

A

An evaluation function should only be applied to positions that are QUIESCENT. Quiescent positions are positions with no pending move that could drastically swing the evaluation - a hanging queen, for instance.

Therefore, the IS-CUTOFF function should return false for non-quiescent states.

49
Q

What is the horizon effect?

A

The horizon effect occurs when a program senses unavoidable danger and pushes it beyond "the horizon" of its search, ultimately ending up worse off. For instance, if our rook is doomed to be captured, the agent might play stalling moves that merely delay the capture. Those delaying moves can open up other weaknesses, so the player may end up hurting himself even more.

50
Q

How can one mitigate the horizon effect?

A

Using singular extensions. This is about recognizing that some moves are much more important than others, and extending the search depth for those moves.

The technique is ALL ABOUT recognizing which move sequences deserve further analysis. This makes all the difference - it is what separates good chess engines from world-class chess engines.

51
Q

What is a singular extension?

A

A singular extension means extending the search on a specific move because the engine has recognized that the move may bring a significant benefit.

Not all moves are worth searching deeper. A great chess engine recognizes the few moves that are critical and thus focuses on those moves.

52
Q

What is the potential hazard of using cutoff and evaluation functions?

A

We might get the wrong answer: the evaluation function is only an estimate, so the move chosen at the cutoff can be suboptimal.

53
Q

What is the difference between alpha beta pruning and forward pruning as a general principle?

A

Alpha-beta pruning removes only nodes that provably cannot affect the outcome, so we lose nothing by pruning them.

Forward pruning prunes moves that merely APPEAR to be bad, even though they might actually be good. It can, however, reduce computation time significantly.

With forward pruning, we therefore sacrifice some correctness in order to shrink the search space.

Forward pruning is a TYPE B STRATEGY, as it sacrifices width to gain depth.

54
Q

What is one possibility of forward pruning?

A

Beam search: only consider the n best moves according to the evaluation function. This is rather dangerous, however, as there is no guarantee that the truly best move will not be pruned away.
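
A minimal sketch of this kind of forward pruning, assuming the hypothetical game object from the earlier sketches also exposes an eval(state, player) estimate:

def best_n_moves(game, state, player, n):
    # Keep only the n moves that look best according to the evaluation function;
    # the truly best move may be among those discarded - that is the risk.
    moves = game.actions(state)
    return sorted(moves,
                  key=lambda a: game.eval(game.result(state, a), player),
                  reverse=True)[:n]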

55
Q

What is ProbCut?

A

ProbCut is short for probabilistic cut, and is a forward-pruning method.

While alpha-beta search prunes nodes that are PROVABLY outside the current (alpha, beta) window, ProbCut prunes nodes that are PROBABLY outside the window.

ProbCut estimates this probability by doing a shallow search to compute the backed-up value v of a node, and then using past experience to estimate how likely it is that a score of v at depth d in the tree would fall outside (alpha, beta).

56
Q

Recall what move ordering is

A

Move ordering refers to the order in which we consider moves. This order can make or break the algorithm.

Consider alpha-beta: if we always examined the worst moves first, it would perform no better than the regular minimax algorithm.

57
Q

What is “late move reduction”?

A

Late move reduction assumes that the move ordering has been done well, so that the moves we explore later are likely to be the worse ones.

The technique then reduces the depth of the search applied to those later moves. We do not prune them completely, but search them to a reduced depth. If a reduced-depth search happens to return a surprisingly good value, we can redo the search with the full depth.

58
Q

Elaborate on search versus lookup

A

Lookup is much faster and is often the preferred choice when it is available. Consider chess openings, for instance: they are largely standardized, so we can use table lookup instead of searching needlessly.

We can even use statistics to figure out which opening is better, etc.

Long story short, there are scenarios where there is NO REASON to use search.

59
Q

Why is alpha-beta search not good for a game like Go?

A

Because Go has a branching factor larger than 300.

It is also very difficult to find a good evaluation function for Go: material is not a reliable indicator of a winning position.

Instead of alpha-beta, Go programs usually use Monte Carlo tree search.

60
Q

Elaborate on the monte carlo tree search

A

The basic MCTS strategy does not use a heuristic evaluation function. Instead, it runs many simulated games and uses the average utility of states. That is, to estimate the utility of state s, MCTS performs simulations (playouts) from s and looks at the average utility achieved from that state.

61
Q

If a game has either win or lose outcome, what is then the expected utility measured as?

A

A win percentage

62
Q

What is a playout policy in MCTS?

A

A playout policy is the rule for choosing moves during a simulation. We do not want to pick moves at random, since that would estimate a state's value under random play rather than good play; we want the playouts biased towards good moves.

63
Q

What is pure monte carlo search?

A

Pure Monte Carlo search does N simulations starting from the current state of the game and tracks which of the possible moves has the highest win percentage.
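
A minimal sketch of pure Monte Carlo search, assuming the hypothetical game interface from the earlier sketches and a utility of 1 for a win and 0 for a loss:

import random

def pure_monte_carlo(game, state, n_per_move):
    # For each legal move, run n_per_move uniformly random playouts and return
    # the move with the highest win rate for the player to move.
    player = game.to_move(state)

    def playout(s):
        while not game.is_terminal(s):
            s = game.result(s, random.choice(game.actions(s)))
        return game.utility(s, player)

    def win_rate(move):
        start = game.result(state, move)
        return sum(playout(start) for _ in range(n_per_move)) / n_per_move

    return max(game.actions(state), key=win_rate)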

64
Q

Is the pure monte carlo search sufficient?

A

No. For many games it will not converge towards optimal play as N increases, because the state space is too large.

We need a selection policy that focuses the effort on the important parts of the game tree.

65
Q

Recall the steps in the MINIMAX algorithm

A

The minimax algorithm consists of 3 parts:
1) The init function
2) maxValue
3) minValue

1)
function MINIMAX-SEARCH(game, state)
player = game.toMove(state)
v, move = maxValue(game, state)
return (v, move)

2)
function maxValue(game, state)

if game.isTerminal(state): return (game.utility(state, player), null)

v, move = -infinity, null
for a in game.actions(state):
v2, a2 = minValue(game, game.result(state, a))
if v2 > v: v = v2; move = a

"loop done"
return (v, move)

3)
function minValue(game, state)

if game.isTerminal(state):
return (game.utility(state, player), null)

v, move = +infinity, null
for a in game.actions(state):
v2, a2 = maxValue(game, game.result(state, a))
if v2 < v: v = v2; move = a

"loop ended"
return (v, move)

IMPORTANT:
I first thought that a2 was assigned to "move" when v2 > v (or v2 < v), but that is NOT the case. We set move equal to a. Thus a is the returned move; a2 is not really used.

66
Q

Elaborate on how the MINIMAX search algorithm works. Not necessarily the lines of code, but the general work it does

A

MAX starts.

The search goes depth-first all the way down. Along the path, every visited node gets its own v and move values, which are initially left unset (sentinel values).

When the search reaches a node at the very bottom - a terminal state - it returns the utility of that state.
The parent of this terminal node considers this child and all its other children, and then picks the returned value and move it finds best according to its preference (min vs max).

The parent then treats the best value found among its children as ITS OWN utility value, and the process repeats itself as the search unwinds back up.

67
Q

Consider the minimax algorithm. What can it be used for, and how?

A

We can use minimax for any two-player zero-sum game. We apply it to a particular game by supplying that game's utility function. We can use it for chess, for instance.

This goes to show how much the utility/evaluation function and isTerminal/isCutoff really matter.

68
Q

What is the worst part about minimax?

A

The game tree is exponential in its depth.

69
Q

Tell the steps of the alpha-beta-minimax search

A

3 parts, as with regular minimax.

1)
alpha-beta-search(game, state)

player = game.toMove(state)
v, move = maxValue(game, state, -infinity, +infinity)
return move

2)
maxValue(game, state, alpha, beta)

if game.isTerminal(state):
return (game.utility(state, player), null)

v, move = -infinity, null

for a in game.actions(state){
v2, a2 = minValue(game, game.result(state, a), alpha, beta)
if (v2 > v) {
v = v2
move = a
alpha = max(alpha, v)
}
if (v >= beta){
return (v, move)
}
}
return (v, move)

3)
minValue(game, state, alpha, beta)

if game.isTerminal(state){
return (game.utility(state, player), null)
}

v, move = +infinity, null
for a in game.actions(state){
v2, a2 = maxValue(game, game.result(state, a), alpha, beta)
if (v2 < v){
v = v2
move = a
beta = min(beta, v)
}
if (v <= alpha){
return (v, move)
}
}
return (v, move)

Regarding maxValue:
maxValue obviously operates at MAX nodes, so it sets alpha to the highest value it can choose so far. If this MAX node then explores one of its MIN children, and that child finds a grandchild value smaller than alpha, we know the MIN child will return that value or something even lower. That is of no use to us, since the alpha value tells us the MAX node can already choose something higher. Therefore we return early from exploring that particular MIN child.

70
Q

Consider alpha beta pruning. What is it actually?

Consider perfect move ordering. What will the time complexity of alpha-beta search be?

A

It is a pruning technique that guarantees no loss in solution quality.

With perfect move ordering, the time complexity is O(b^(m/2)), where b is the branching factor and m is the depth.

With random move ordering, we get roughly O(b^(3m/4)).

71
Q

Consider monte carlo and a “selection policy”. What are the two elements of the selection policy that we need to consider?

A

1) Exploration of states that have had few playouts.

2) Exploitation of states that have done well in past playouts.

Monte Carlo tree search balances the two by maintaining a search tree and growing it on each iteration of a four-step scheme: selection, expansion, simulation and back-propagation (the steps are spelled out on the next card).

72
Q

Explain the steps we do each iteration of monte carlo tree search

A

1) Selection: starting at the root of the search tree, we move down to one of the children based on the selection policy - for instance to the child with the best win record so far (e.g. 66/71), or to a child that has had few playouts (e.g. 2/11), since little-explored paths might contain better strategies. We follow the selection policy down until we reach a leaf of the search tree, and then move on to the next step.

2) Expansion: we create a new child of the leaf node we just reached. For now we record it as "0/0", because it has not been played out yet.

3) Simulation: we use the newly created node as the starting point for a playout and record who won.

4) Back-propagation: we use the result of the playout to update the statistics of all the nodes on the path taken in the selection step. If black won the playout, every black node on the path is incremented in both wins and number of playouts, while white nodes are only incremented in their number of playouts, since white lost.

We repeat these four steps, and the estimate improves with each repetition.

73
Q

Name one very effective selection policy for Monte Carlo Tree Search

A

UCT: "Upper Confidence bounds applied to Trees".

The policy ranks the possible moves at a given state using an upper-confidence-bound formula called UCB1:

UCB1(n) = U(n)/N(n) + C * sqrt(log N(Parent(n)) / N(n))

where
U(n) is the total utility of all playouts that went through node n,
N(n) is the number of playouts through node n,
Parent(n) is the parent node of n.

Thus U(n)/N(n) is the average utility of n - the exploitation term.
The sqrt part is the exploration term. It has N(n) in the denominator, so it favours nodes with few playouts.

Notice the log part: it is the logarithm of the playout count of n's parent, log N(Parent(n)).

C is a constant that balances exploration and exploitation, often set to about sqrt(2).
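
A direct Python transcription of the formula (an illustrative sketch; the parameter names are assumptions):

import math

def ucb1(U_n, N_n, N_parent, C=math.sqrt(2)):
    # U_n: total utility of playouts through node n
    # N_n: number of playouts through node n
    # N_parent: number of playouts through n's parent
    # Unvisited nodes are usually treated as infinitely attractive so they get tried first.
    if N_n == 0:
        return math.inf
    return U_n / N_n + C * math.sqrt(math.log(N_parent) / N_n)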

74
Q

Explain what the UCT MCTS algorithm returns

A

It returns the move with the highest number of playouts. The UCB1 formula will ensure that the node with the most playouts is almost always the move with the highest average utility as well.

75
Q

When do we generally say that Monte Carlo is better than alpha-beta search?

A

For games with a very large branching factor, we give the edge to Monte Carlo. The same holds when it is difficult to come up with a good evaluation function.

Monte Carlo has the advantage of aggregating over many playouts, so individual miscalculations matter less - they are drowned out by the aggregate.

76
Q

Are there any cases where monte carlo can completely fail?

A

Yes - in cases where a single move can change the course of the entire game. For instance, if Monte Carlo has failed to explore a decisive queen move, its evaluation can be very wrong.

77
Q

Explain very short what we mean by stochastic games

A

Stochastic games are games that involve chance or luck, such as games with dice. In other words, we cannot lay out a single path that would be followed under the assumption of optimal play.

78
Q

What is a game tree?

A

A game tree is a search tree that is superimposed on the state space graph of the game problem.

79
Q

Why can we not use a standard game tree for stochastic games?

A

Because of the uncertainty (stochasticity), we cannot construct a single path of optimal play. We need to include "chance nodes" in addition to the MAX and MIN nodes. Recall that MAX is the player who moves first and wants to maximize the utility, while MIN moves second and aims to minimize it.

What are chance nodes?
Chance nodes are treated as an ordinary layer in the game tree, placed according to the game mechanics. For instance, if a die is thrown every turn, a chance layer is inserted between every pair of move layers, so we get something like: max, chance, min, chance, max, chance, ...

The chance nodes themselves represent nothing more than the random event they stand for. For instance, with a die throw, a MIN or MAX node will have 6 chance-node children, one per outcome. Based on the outcome from the chance layer, we then continue building the tree as usual.

80
Q

For each state where we choose to stop the search (ex due to time constraints), what do we ask ourselves?

A

Who is winning from this position?

To answer this, we use an evaluation function. Recall that if we never have to stop the search early, we use a utility function instead. There is a big difference between the two: using an evaluation function implies that uncertainty is involved.

81
Q

What do we mean by “perfect information” games?

A

Fully observable environments. Like chess. NOT like scrabble.

82
Q

What is the game tree?

A

The game tree is a search tree that follows every sequence of moves all the way to a terminal state. The tree may be infinite if the state space itself is unbounded.

Correction: in practice it might not reach terminal states everywhere because of limited computation time, but in theory it ends when every leaf is a terminal node.

83
Q

Consider the game tree. Is it complete in the sense of exploration?

A

Minimax search does build a complete tree: it is designed to keep going until every leaf node is a terminal state.

We can use a depth limit to avoid expanding every path all the way to terminal states. Even so, the search does not stop as soon as it finds a single terminal state that is a win for MAX; it still has to examine the remaining branches.

84
Q

When can we use an AND-OR search in adversarial search?

A

We can use AND-OR search if the only possible outcomes are win or lose.

85
Q

Given a game tree, how do we determine the optimal strategy?

A

We use the minimax value at the root. That is, we run the minimax search and follow the move sequence that leads to the best/optimal value. There is a key point here: we must complete the minimax search in order to know the best value attainable. In that respect, chess is not an ideal example for minimax, since any state with utility 1 is as good as any other.

86
Q

Why is chess perhaps not a good game to discuss using minimax search?

A

Because chess has many move sequences that lead to victory, and since a win in chess is not graded, we do not really care about optimality as long as we have some winning move sequence.

87
Q

Why does the minimax search need to be complete?

A

The minimax search needs to be complete because when the utility function returns a value, we cannot otherwise know whether a better value was attainable. We have to search exhaustively to make sure the utility/minimax value is as high as it can be.

In the case of chess, we do not really care about optimality as long as we have A winning move sequence, so chess might not be the best game to illustrate minimax.

88
Q

What is the “minimax decision”

A

The minimax decision is the move to make. The optimal move to make.

89
Q

Why is the minimax search exponential?

A

Because, if we assume the branching factor remains constant, we have to check all branch possibilities for every level of depth.

d = depth
b = branching factor

O(b^d) time complexity

90
Q

Consider the running time of minimax search, which is exponential. Can we do anything about this?

A

We cannot remove the exponential growth entirely, but we can halve the exponent using pruning - alpha-beta pruning.

The effectiveness of alpha-beta pruning depends heavily on the order in which states are examined. With perfect move ordering we can achieve O(b^(m/2)), where m is the depth.

This is achievable because perfect move ordering gives an effective branching factor of sqrt(b) when there are b legal moves: sqrt(b)^m = (b^(1/2))^m = b^(m/2).

91
Q

What is the effective branching factor of alpha beta pruned minimax search?

A

Under perfect move ordering, the effective branching factor is about sqrt(b), giving O(b^(d/2)).

With random move ordering, it is close to O(b^(3d/4)).

92
Q

What could we do with move ordering to improve search?

A

We could use dynamic move ordering - for instance, storing knowledge from previous runs, or, if we use iterative deepening, using information from the earlier iterations to order moves in the later ones.

93
Q

What do we do to handle repeated states in adversarial search?

A

We use a transposition table. Recall that transpositions are different permutations of the move sequence that lead to the same state. We store these in a transposition table, which works like a cache for the (heuristic) values of states. This can give a large speed-up by avoiding searching massive subtrees multiple times.

94
Q

Is a type A or a type B strategy more used by humans? Which is used by computers?

A

Historically, most chess programs have used type A.

Humans use type B though, as they tend to skip moves that are useless.

95
Q

What is the difference between alpha beta search and “heuristic alpha beta search”?

A

The heuristic variant cuts off at a certain condition. At this point, a heuristic function called an evaluation function will be applied to estimate the utility of the cutoff state.

96
Q

What do we use to create an evaluation function?

A

We use features. Recall that features of a state are certain characteristics of the state; a feature could be, for instance, the number of pawns. We can also group states into categories, also known as equivalence classes: each category contains all the states that have the same values for all of their features.

97
Q

What is the general rule regarding use of the evaluation function?

A

We never cut off and apply the evaluation function unless the state is quiescent. A quiescent state is one with no pending move that could drastically change the course of the game, such as capturing a queen.

98
Q

What do we mean by saying that “a computer can solve an endgame state position by a policy”?

A

A policy is a mapping from states to moves (a solution). This is applicable to endgames because there are not THAT many of them: we can actually maintain a database of endgame positions and their corresponding optimal moves.

99
Q

What parts of the game definition is required to create the state space graph?

A

We need the initial state, the transition model, and the actions(s) method

100
Q

Define a complete game tree

A

A complete game tree is a tree that follows every sequence of actions/moves all the way to a terminal state. In many games, the complete game tree is infinite.

101
Q

If we consider a game that has only binary outcomes, what simplification can we use and why?

A

We can use AND-OR search.

We can do this because all we need is a contingency plan.

102
Q

Define the minimax value

A

the minimax value is the utility for player MAX of being in some state.

The minimax value of a terminal state is just the utility.

In a non-terminal state, MAX prefers to move to a state of maximum value when it is MAX’s turn to move, and min prefers a state of minimum value.

103
Q

Elaborate on alpha beta pruning

A

Consider a root with a tree of depth 2 below it. Including the root layer, the game tree then has 3 layers.

Assume 3 choices/moves/actions at each state.

When the recursion descends into the first node of the second layer and eventually finishes exploring its children in the third layer, we get a value back: the smallest value among the leaves, since the second layer is a MIN layer. Say the leaves are (3, 12, 8); the MIN node picks the smallest of these, which is 3.

However, since MIN and MAX alternate, we know that this entire second layer is a MIN layer. The MAX nodes above (in this case, only the root) will pick among the MIN nodes and take the largest value there. This is the clue.

Now the second MIN node starts exploring its part of the third layer, and its very first leaf has utility 2 (a leaf's value is returned regardless of whether it sits in a MIN or MAX layer, since it is terminal). When the MIN node sees this value, it can reason as follows:
"The MAX layer above can already choose 3. This value is 2, and whatever else I find, I will return 2 or something even smaller. In any case it will not matter, because the MAX layer above will choose 3 over any value I can return. Therefore there is no reason to explore the rest."

We need a method of remembering the values that serve as these limits.

We use alpha as the value of the best (i.e. highest-value) choice we have found so far at any choice point along the path for MAX. We can think of alpha as "at least ...".

We use beta as the value of the best (i.e. lowest-value) choice we have found so far at any choice point along the path for MIN. We can think of beta as "at most ...".

104
Q

What can we do to make iterative deepening better?

A

We can use a transposition table so that we do not have to recompute the earlier depths, but can look their results up directly until we reach the next depth iteration.

105
Q

How can we use forward pruning to improve search depth?

A

While alpha-beta pruning only cuts away parts of the search space that are irrelevant to the solution, we can use beam search to also prune away moves that merely look bad. The danger is that we might miss some good ones as well.

Beam search keeps the k best moves from a state according to the EVAL function. By doing this, we can search deeper on the moves that look best - for instance those where the material or pawn structure looks much better.

106
Q

What is pure monte carlo search?

Is it sufficient?

A

We do N simulations starting from the current state of the game and track which of the possible moves from that position has the highest win percentage.

Pure Monte Carlo search is not sufficient for all types of games, because for many games it does not converge towards optimal play. We need a selection policy that focuses the computational resources on the important parts of the game tree, balancing two factors:
1) Exploration
2) Exploitation

107
Q

Elaborate on monte carlo tree search. Quickly first, describe why it differs from pure monte carlo search.

A

Pure Monte Carlo search does N simulations from some state and tracks the win percentages; it selects moves (more or less) at random and simply prefers the ones with higher win percentages.

Monte Carlo tree search instead maintains a search tree and grows it on each iteration of the following four steps:

1) Selection
Say we already have some search tree, mid-way through the MCTS procedure.
We use the selection policy to move from the root down towards the leaves; the selection policy determines the route. At a leaf, this step ends.

2) Expansion
From the leaf we have reached, we grow the search tree by adding a new child to this node.

3) Simulation
We run a playout from the newly generated node and record the result, but not the path.

4) Back-propagation
We use the result of the playout to update all the nodes on the path we followed in the selection step.
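
A compact sketch of this four-step loop, assuming the hypothetical game interface from the earlier sketches, UCB1 as the selection policy, and uniformly random playouts as a stand-in playout policy:

import math, random

class Node:
    def __init__(self, state, parent=None, move=None):
        self.state, self.parent, self.move = state, parent, move
        self.children = []
        self.U = 0.0   # total utility of playouts through this node
        self.N = 0     # number of playouts through this node

def ucb1(node, C=math.sqrt(2)):
    # Selection policy: unvisited children are tried first (infinite score).
    if node.N == 0:
        return math.inf
    return node.U / node.N + C * math.sqrt(math.log(node.parent.N) / node.N)

def mcts(game, root_state, iterations=1000):
    root = Node(root_state)
    for _ in range(iterations):
        # 1) Selection: descend while the current node is fully expanded.
        node = root
        while (node.children
               and len(node.children) == len(game.actions(node.state))
               and not game.is_terminal(node.state)):
            node = max(node.children, key=ucb1)
        # 2) Expansion: add one untried child, unless the node is terminal.
        if not game.is_terminal(node.state):
            tried = {child.move for child in node.children}
            untried = [a for a in game.actions(node.state) if a not in tried]
            if untried:
                a = random.choice(untried)
                child = Node(game.result(node.state, a), parent=node, move=a)
                node.children.append(child)
                node = child
        # 3) Simulation: random playout from the new node (a real playout
        #    policy would pick better-than-random moves).
        s = node.state
        while not game.is_terminal(s):
            s = game.result(s, random.choice(game.actions(s)))
        # 4) Back-propagation: update every node on the selection path, scoring
        #    each node from the perspective of the player who moved into it.
        while node is not None:
            node.N += 1
            if node.parent is not None:
                mover = game.to_move(node.parent.state)
                node.U += game.utility(s, mover)
            node = node.parent
    # Play the move with the most playouts, as described in the cards above.
    return max(root.children, key=lambda child: child.N).move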

108
Q

Elaborate on iterative deepening in conjunction with adversarial search.

A

It is close to a must-have.

In games like chess, iterative deepening lets us return the most up-to-date move from the last completed depth-limited search whenever time runs out, which is very flexible.

Iterative deepening also lets us use the evaluation results from one depth to order the moves for the next iteration. Recall that perfect move ordering halves the exponent of alpha-beta search.

When using iterative deepening, we typically also use transposition tables to store previously computed results, making the deeper iterations more efficient.

109
Q

What is pure Monte Carlo search?

A

Pure Monte Carlo search is just random simulations - it does not use a playout policy. It is therefore likely to play poorly.

110
Q

Elaborate on the playout policy of MCTS

A

The playout policy is essentially the most important thing about the monte carlo tree search.

We use a specific playout policy to be able to drive the search towards good simulations, as opposed to selecting random moves. Random moves tend to not converge to good ones.

A playout policy determines what moves to choose.

111
Q

Elaborate on the difference between playout policy and selection policy of the monte carlo tree search

A

The playout policy is about choosing moves while performing the simulation(s). It is like applying general rules of thumb for how we want to play.

The selection policy is about navigating the search tree well: it balances exploitation and exploration. A good selection policy will both explore new parts of the tree and exploit the moves down the tree that appear to be good.

The two policies are used at different phases of MCTS. The playout policy applies to the actual simulation that we run from the newly created node after expansion.
The selection policy determines which path we take down the search tree to reach the node we expand.
