Chapter 6 - Adversarial search Flashcards

1
Q

What is a “competitive environment”?

A

A competitive environment is an environment in which two or more agents have conflicting goals, so that they are competing rather than cooperating.

2
Q

Compare competitive task environments in chess and similar games to competitive environments in the real world

A

Real-world competitive environments are very complicated. We therefore concentrate on games, which usually have a simple, well-defined way of representing their state. This is highly advantageous when studying adversarial search.

3
Q

What are the stances we can take when considering multi-agent environments?

A

There are 3 stances we can take when considering multi-agent environments:

1) The economy stance.
We treat all agents in the environment as an aggregate, much as macroeconomics looks at tendencies in a whole economy. This approach works best when there is a very large number of agents. The point is to make predictions about the whole system rather than about the actions of any single agent, so we are interested in TRENDS and overall behaviour, regarding the system as a whole instead of individual agents. In football, we could do this by considering the entire team rather than individual players: for a given formation, how many passes and what pass-completion rate did we manage compared to other formations?

2) The second stance is to treat the other agents simply as part of the environment - a part that makes the environment nondeterministic. In other words, we take a multi-agent environment and treat it like a single-agent environment by folding the other agents into the environment. However, this completely ignores the fact that these agents are our competitors, so we reduce complexity at the cost of performance.
For instance, if an agent is walking outdoors from point A to point B, this stance works well: we only need to react to changes in the environment, keep our distance from other people, and so on. However, if people were actively trying to stop us from reaching the destination, this stance would not be useful, as it has no notion of competition.

3) The third stance is to explicitly model the adversarial agents with the techniques of adversarial game-tree search. This is our main concern.

4
Q

What are deterministic, two-player, turn-taking, perfect information, zero-sum games?

A

These are the games that have received the most research attention within AI.
Perfect information means the task environment is fully observable.

Zero-sum means that what is good for player A is equally bad for player B. There are no win-win situations: the outcome is good for one and bad for the other (with a draw as the neutral case).

5
Q

When we are talking about two-player zero-sum games, what do we mean by “position”?

A

Position is the current state.

6
Q

In two-player zero-sum games, what do we call our players?

A

We call the player who moves first “MAX” and the player that moves second is “MIN”.

7
Q

Formally define a two-player zero-sum game

A

The game has an initial state, S_0, which specifies how the game is set up at the start.

TO-MOVE(s): the player whose turn it is to move in state s. The turn order may simply alternate, or it may depend on something else (e.g. a die roll).

ACTIONS(s): the action set - the set of legal moves in state s.

RESULT(s, a): the state that results from performing action a in state s. This is the transition model. Recall that the transition model requires knowledge of how the world works AND of how the agent's actions affect the world.

IS-TERMINAL(s): a terminal test, which is true if the game is over and false otherwise. The states for which this test is TRUE are called terminal states.

UTILITY(s, p): a utility function, which defines the final numeric value to player p when the game ends in terminal state s. In chess the outcome is a win, loss or draw, with values 1, 0 and 1/2.

These 6 elements formally make up a game.
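
As a concrete illustration (not from the book), the six elements can be sketched as a Python interface; the class and method names below are assumptions, reused by the later code sketches in this deck:

class Game:
    def initial_state(self):           # S_0: how the game is set up
        raise NotImplementedError

    def to_move(self, state):          # TO-MOVE(s): whose turn it is in s
        raise NotImplementedError

    def actions(self, state):          # ACTIONS(s): legal moves in s
        raise NotImplementedError

    def result(self, state, action):   # RESULT(s, a): the transition model
        raise NotImplementedError

    def is_terminal(self, state):      # IS-TERMINAL(s): True if the game is over
        raise NotImplementedError

    def utility(self, state, player):  # UTILITY(s, p): e.g. 1, 1/2 or 0 in chess
        raise NotImplementedError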

8
Q

Consider the formal definition of a “game”. What parts of it are required to make the state space graph?

A

We require the initial state, the transition model and the actions set.

9
Q

What do we mean by superimposing?

A

To superimpose is to lay one thing on top of another. For instance, we can superimpose a search tree on top of a state space graph.

10
Q

What is the “game tree”?

A

The game tree is the complete search tree that follows every sequence of moves until it reaches a terminal state. Of course, not every game tree can actually be completed: some state space graphs are infinite, so there is no guarantee of terminating every path.

11
Q

What can we say about the plan of MAX?

A

Because MAX's plan depends on the moves MIN makes, MAX needs a contingency plan - a conditional plan.

12
Q

When can we use AND-OR search to find a plan?

A

We can use an AND-OR search when the only possible outcomes are win or lose. Recall that an AND-OR search produces a conditional plan.

13
Q

What do we do with games that have more than 2 outcomes?

A

Games with more than two outcomes require a minimax search. Minimax search can be seen as a generalization of AND-OR search.

14
Q

Consider the term "ply". What is it?

A

A ply is one move by one player, taking us one level deeper in the game tree (in chess, a half-move).

15
Q

What is the minimax value?

A

The minimax value of a state is the utility (for MAX) of being in that state. The minimax value of a terminal state is just its utility. In a non-terminal state, MAX prefers to move to a state of maximum minimax value when it is MAX's turn to move, and MIN prefers a state of minimum value (minimum for MAX, and thus maximum for MIN).

Summed up: when MAX is to move next, it naturally wants to move to the position whose minimax value is as high as possible. MIN naturally wants the opposite, and so wants MAX to end up with the lowest possible minimax value. The minimax value of a state ASSUMES that both players play optimally from that state onwards.

16
Q

How do we handle the two players' perspectives on scores in minimax?

A

We fix one end of the scale as good for MAX and the other end as good for MIN. In chess, for instance, 1 is a good score for MAX and 0 is a good score for MIN. This way we never have to switch perspective.

This implies a kind of symmetry between the scores.

17
Q

What is the minimax value?

A

The minimax value is the utility FOR MAX of being in a specific state. Even when it is MIN's turn to move, we still use MAX as the reference point.

The minimax value is thus a prediction: from this specific state, if both players play optimally, the final utility will be X.

18
Q

What is the minimax decision?

A

The minimax decision is the optimal move for the player to move, whether MIN or MAX. The goal naturally differs: the minimax decision for MAX maximizes the utility, while for MIN it minimizes it.

19
Q

What is the assumption of the minimax theory?

A

Minimax assumes that BOTH players play optimally from the specific state, and until the end.

20
Q

Is the minimax decision optimal?

A

Not unconditionally. It rests on the assumption that the opponent also plays optimally; against a fallible opponent, riskier strategies can sometimes achieve a better result.

21
Q

What is the time and space complexity of the minimax search algorithm?

A

If the game tree has depth m and there are b legal moves at each point, the time complexity is O(b^m).

The space complexity is O(bm) if all actions are generated at once (or O(m) if they are generated one at a time).

22
Q

Why is the regular minimax algorithm bad for complex games like chess?

A

It is bad because of its exponential time complexity.

23
Q

Elaborate in detail on the minimax search algorithm

A

First of all, minimax search is an algorithm that returns a single move - the optimal move assuming both players play optimally.

The algorithm is recursive and consists of 3 parts:

1) The main entry point. It basically just initializes the search and returns the final move.

2) MAX-VALUE(game, state).
This is called every time it is MAX's turn to move.

3) MIN-VALUE(game, state). This is called every time it is MIN's turn to move.

MAX-VALUE and MIN-VALUE are very similar; they are basically the same, only with a different comparison.

Consider MAX-VALUE. It works like this:
It is called when MAX is to make a move. It first checks whether the current state is a terminal (game-over) state. If so, it returns the utility of that state (together with a null move, since there is nothing left to play).

If the state is not terminal, the real work starts.
We first set v to negative infinity and move to null. These are sentinel values indicating "not set yet" (a flag or key could also be used). We use negative infinity because every possible value is then an improvement.

Then we loop through all actions available from ACTIONS(s). For each action, we want the value we would get by choosing it, assuming optimal play from then on. Therefore, for each action, we call MIN-VALUE on the resulting state. That call eventually returns a pair (utility, move). In MAX-VALUE's loop we store the returned utility as v2 and compare it with v. If v2 is greater than v, this move is better, so we set v = v2 and move = a (the action that produced it). When we have looped through all the actions, we return the best remaining pair (v, move).

MIN-VALUE works exactly the same way, except that the sentinel is +infinity and the check is "if v2 < v" instead of "if v2 > v".
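
A minimal runnable Python sketch of this recursion (illustrative, not the book's code), assuming the hypothetical game interface sketched earlier (to_move, actions, result, is_terminal, utility):

import math

def minimax_search(game, state):
    # Returns the optimal move for the player to move, assuming optimal play.
    player = game.to_move(state)
    value, move = max_value(game, state, player)
    return move

def max_value(game, state, player):
    if game.is_terminal(state):
        return game.utility(state, player), None
    v, move = -math.inf, None
    for a in game.actions(state):
        v2, _ = min_value(game, game.result(state, a), player)
        if v2 > v:
            v, move = v2, a        # keep the highest value seen so far
    return v, move

def min_value(game, state, player):
    if game.is_terminal(state):
        return game.utility(state, player), None
    v, move = math.inf, None
    for a in game.actions(state):
        v2, _ = max_value(game, game.result(state, a), player)
        if v2 < v:
            v, move = v2, a        # keep the lowest value (best for MIN)
    return v, move

Given a concrete game object, minimax_search(game, game.initial_state()) would return the move to play from the initial position.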

24
Q

How can we take the regular two-player setup and transform it into a multiplayer game?

A

We replace the single value at each node/state (the utility prediction) with a vector of values - one value for each player's perspective. For instance, with 3 players A, B and C, a terminal state could be labelled
(6, 2, 8), meaning the state is worth 6 to A, 2 to B and 8 to C.

This means that the utility function returns a vector of values.

25
Q

When can it be optimal for A and B to make an alliance against C?

A

If A and B are in weak positions while C is in a strong position, an alliance can be good for A and B.

26
Q

What are alpha and beta in alpha-beta pruning?

A

Alpha is the minimum value that the MAX player is already assured of.

Beta is the maximum value that the MIN player is already assured of.

Since alpha is the value MAX is already assured of, consider a MAX node whose first child returns the value 10. The MAX player can then be SURE that it never needs to accept anything lower than 10. MAX picks the highest value among its children, so if a later child (a MIN node) starts exploring an option that returns a value smaller than 10, we can stop: that MIN node will end up returning a value at most that small, and when it is passed up, the MAX node will not pick it, as the alpha value already tells us. Alpha is the floor for MAX - he never has to consider values below it, because one of his children has already guaranteed it.

27
Q

Consider this statement:

“The highest value returned by the children of a MAX node”

What is it?

A

The alpha value.

A MAX node does not need to consider children whose values are lower than the alpha value, because it would just pick the alpha value anyway, as it seeks to maximize.

28
Q

Consider this statement:

“The lowest value returned by the children of a MIN node”

A

The beta value.

A MIN node does not need to consider values that are larger than the beta value, because it would just pick the beta value anyway.

29
Q

Recite MAX-VALUE with alpha-beta pruning

A

MAX-VALUE(game, state, alpha, beta)
if game.isTerminal(state): return (game.utility(state, player), null)

v = -infinity

for each move in actions(state):
1) v2, a2 = MIN-VALUE(game, result(state, move), alpha, beta)
2) if v2 > v then: v = v2, moveReturn = move, alpha = max(alpha, v)
3) if v >= beta: return (v, moveReturn)

return (v, moveReturn)

30
Q

Recite MIN-VALUE with alpha-beta pruning

A

MIN-VALUE(game, state, alpha, beta)
if game.isTerminal(state): return (game.utility(state, player), null)

v = +infinity

for each move in actions(state):
1) v2, a2 = MAX-VALUE(game, result(state, move), alpha, beta)
2) if v2 < v then: v = v2, moveReturn = move, beta = min(beta, v)
3) if v <= alpha: return (v, moveReturn)

return (v, moveReturn)
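
For reference, a runnable Python rendering of the alpha-beta pair from this and the previous card (an illustrative sketch, assuming the hypothetical game interface from the earlier cards):

import math

def alpha_beta_search(game, state):
    player = game.to_move(state)
    value, move = max_value(game, state, player, -math.inf, math.inf)
    return move

def max_value(game, state, player, alpha, beta):
    if game.is_terminal(state):
        return game.utility(state, player), None
    v, move = -math.inf, None
    for a in game.actions(state):
        v2, _ = min_value(game, game.result(state, a), player, alpha, beta)
        if v2 > v:
            v, move = v2, a
            alpha = max(alpha, v)   # best value MAX can already guarantee
        if v >= beta:               # MIN above will never allow this branch
            return v, move
    return v, move

def min_value(game, state, player, alpha, beta):
    if game.is_terminal(state):
        return game.utility(state, player), None
    v, move = math.inf, None
    for a in game.actions(state):
        v2, _ = max_value(game, game.result(state, a), player, alpha, beta)
        if v2 < v:
            v, move = v2, a
            beta = min(beta, v)     # best value MIN can already guarantee
        if v <= alpha:              # MAX above will never allow this branch
            return v, move
    return v, move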

31
Q

What scenario causes the best performance of alpha-beta pruning?

A

If the best options are explored first, we can prune more.

32
Q

What are killer moves?

A

Killer moves are the “best moves”.

33
Q

What is the killer move heuristic?

A

The killer move heuristic is basically just trying the killer moves first.

34
Q

What are transpositions?

A

Transpositions are different permutations of the move sequence that end up in the same position.

35
Q

What is a transposition table?

A

A transposition table is a table that caches the heuristic value of states - essentially a cache mapping previously seen states to their value and best move. If we happen to reach the same state several times during the search, we simply look up the stored result instead of calculating it again.

The power here is that we avoid searching the same subtrees over and over.
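
A minimal sketch of the idea, assuming states are hashable (real engines typically key the table with something like Zobrist hashes); compute_value here stands in for whatever search routine produces the value:

transposition_table = {}

def cached_value(game, state, compute_value):
    key = state                         # or a hash of the position
    if key not in transposition_table:
        transposition_table[key] = compute_value(game, state)
    return transposition_table[key]     # reuse the result if the position recurs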

36
Q

Consider chess: It is well known that a simple minimax, even with alpha beta pruning AND caching using transposition tables, will not work on chess due to the enormous state space. What strategies could then be used with chess?

A

We have 2 types:

Type A strategy: consider a WIDE but shallow portion of the state space - reach all nodes within a certain depth.

Type B strategy: Follow only promising lines, and thus search further but narrower.

Historically, most chess programs have been type A.

37
Q

How can we make use of our limited computation time? That is, since we cannot search to completion, what do we do?

A

We can cut off the search early and apply a heuristic evaluation function to states, effectively treating non-terminal states as if they were terminal.

In other words, we use EVAL instead of UTILITY.

We also replace the terminal test with a cutoff test, which must of course still return true if the state actually is terminal.

38
Q

When we use the heuristic minimax, how can we formally describe the function?

A

H-MINIMAX(s, d) =
EVAL(s, MAX) if IS-CUTOFF(s, d)
max_{a in ACTIONS(s)} H-MINIMAX(RESULT(s, a), d+1) if TO-MOVE(s) = MAX
min_{a in ACTIONS(s)} H-MINIMAX(RESULT(s, a), d+1) if TO-MOVE(s) = MIN

where d is the current depth.
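
A compact Python sketch of exactly this recursion, assuming the hypothetical game object from the earlier sketches also exposes is_cutoff(state, depth) and eval(state, player):

def h_minimax(game, state, depth, max_player):
    # Heuristic minimax value of 'state' for MAX, cut off by is_cutoff/eval.
    if game.is_cutoff(state, depth):
        return game.eval(state, max_player)
    values = [h_minimax(game, game.result(state, a), depth + 1, max_player)
              for a in game.actions(state)]
    # MAX maximizes over successors, MIN minimizes.
    return max(values) if game.to_move(state) == max_player else min(values)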

39
Q

Consider the heuristic function Eval(s, p). What does it return?

A

The heuristic function eval(s,p) will return an estimate of the expected utility of state s to player p.

For terminal states, the estimate must be equal to the utility, and for non-terminal states, the heuristic estimate must be somewhere between the lose-value and win-value.

Utility(loss, p) <= Eval(s, p) <= Utility(win, p)

40
Q

What makes a heuristic good?

A

The computation must be fast, since the entire point is to improve search time.

The heuristic function needs to be strongly connected to the actual chances of winning.

41
Q

What are “features” of the state?

A

Features are domain-specific characteristics of a state that we can use to make decisions and predictions.

In chess, a feature could be the number of pawns, for instance.

If we look at different states that all have the same feature values, we can group them together and call them "categories" or "equivalence classes" of states.

42
Q

What are categories or “equivalence classes” of states?

A

Categories or "equivalence classes" of states are collections of states that all share the same values for their features. For example, one category might contain all two-pawn versus one-pawn endgames.

Any given category will contain some states that lead to wins, some that lead to draws, and some that lead to losses (assuming perfect play).

An evaluation function might not know which individual states lead to which outcome, but it might know the proportion that lead to victory. The algorithm/heuristic can then use this proportion information to make the best decision; it is quite common to use the expected value for this.

43
Q

Why is it not common to use probabilities as the only way of making decisions?

A

Because it requires far more experience (data) than we usually have.

Instead, we usually consider individual features, calculate a numerical contribution from each, and combine the contributions into a total value. In chess, for instance, we can sum a material value and a positional value to get the total. This is called a weighted linear function.

44
Q

What is a weighted linear function?

A

EVAL(s) = w1*f1(s) + w2*f2(s) + ... + wn*fn(s)

Here fi is feature i and wi is the corresponding weight, indicating how important that feature is. The weights need to be normalized so that the final score stays within the range from a loss to a win ([0, 1]).
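
A tiny sketch of such a weighted linear evaluation; the chess-like material weights and feature values below are made up for illustration (in practice they would be scaled/normalized as described above):

def eval_state(features, weights):
    # features: the f_i(s) values, weights: the corresponding w_i
    return sum(w * f for w, f in zip(weights, features))

# hypothetical material-difference features: pawns, knights, bishops, rooks, queens
weights  = [1, 3, 3, 5, 9]
features = [2, 0, 1, 0, 0]             # e.g. up two pawns and a bishop
print(eval_state(features, weights))   # -> 5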

45
Q

Regarding evaluation functions and correlation to winning chances: If state s is twice as likely to result in a win as s’, what is required of the eval function?

A

We only require that eval(s) > eval(s’).

We do NOT require that eval(s) = 2*eval(s’)

46
Q

What is the problem with weighted linear evaluation functions?

A

Linearity assumes the features are independent, which is often not the case. It is therefore often better to use non-linear combinations of features.

For instance, a bishop is worth more in the endgame than in the early game, and a pair of bishops is worth more than the sum of the two individual bishops.

47
Q

What modifications do we need to do with the alpha-beta search to make room for cutting off the search early?

A

First, we replace isTerminal with isCutoff.

Then we add a counter that keeps track of the depth, so that we can cut off the search at a certain depth.

The simple approach is to cut off at some fixed value of d.
A more robust variant is to use iterative deepening.

And of course, we apply the evaluation function when the cutoff is reached:

if game.IS-CUTOFF(state, d) then return game.EVAL(state, player)

48
Q

What can we say about the timing of using an evaluation function?

A

An evaluation function should only be applied to positions that are QUIESCENT. Quiescent positions are positions with no pending move that could drastically swing the evaluation - a hanging queen, for instance.

Therefore, the IS-CUTOFF function should return false for non-quiescent states.

49
Q

What is the horizon effect?

A

The horizon effect occurs when a program senses unavoidable danger and pushes it beyond "the horizon" of its search, ultimately ending up worse off. For instance, if our rook is doomed to be captured, the agent might play stalling moves that merely delay the capture. Those delaying moves can open up other weaknesses, so the player may end up hurting himself even more.

50
Q

How can one mitigate the horizon effect?

A

Using singular extensions. This is about recognizing that some moves are much more important than others, and extending the search depth for those moves.

The technique is ALL ABOUT recognizing which move sequences deserve further analysis. This makes all the difference - it is what separates good chess engines from world-class chess engines.

51
Q

What is a singular extension?

A

A singular extension means extending the search on a specific move because the engine has recognized that the move may bring a significant benefit.

Not all moves are worth searching deeper. A great chess engine recognizes the few moves that are critical and thus focuses on those moves.

52
Q

What is the potential hazard of using cutoff and evaluation functions?

A

We might get the wrong answer: the evaluation function is only an estimate, so the move chosen at the cutoff can be suboptimal.

53
Q

What is the difference between alpha beta pruning and forward pruning as a general principle?

A

Alpha-beta pruning removes only nodes that provably cannot affect the outcome, so we lose nothing by pruning them.

Forward pruning prunes moves that merely APPEAR to be bad, even though they might actually be good. It can, however, reduce computation time significantly.

With forward pruning, we therefore sacrifice some correctness in order to shrink the search space.

Forward pruning is a TYPE B STRATEGY, as it sacrifices width to gain depth.

54
Q

What is one possibility of forward pruning?

A

Beam search: only consider the n best moves according to the evaluation function. This is rather dangerous, however, as there is no guarantee that the truly best move will not be pruned away.
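
A minimal sketch of this kind of forward pruning, assuming the hypothetical game object from the earlier sketches also exposes an eval(state, player) estimate:

def best_n_moves(game, state, player, n):
    # Keep only the n moves that look best according to the evaluation function;
    # the truly best move may be among those discarded - that is the risk.
    moves = game.actions(state)
    return sorted(moves,
                  key=lambda a: game.eval(game.result(state, a), player),
                  reverse=True)[:n]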

55
Q

What is ProbCut?

A

ProbCut is short for probabilistic cut, and is a forward-pruning method.

While alpha-beta search prunes nodes that are PROVABLY outside the current (alpha, beta) window, ProbCut prunes nodes that are PROBABLY outside the window.

ProbCut estimates this probability by doing a shallow search to compute the backed-up value v of a node, and then using past experience to estimate how likely it is that a score of v at depth d in the tree would fall outside (alpha, beta).

56
Q

Recall what move ordering is

A

Move ordering refers to the order in which we consider moves. This order can make or break the algorithm.

Consider alpha-beta: if we always examined the worst moves first, it would perform no better than the regular minimax algorithm.

57
Q

What is “late move reduction”?

A

Late move reduction assumes that the move ordering has been done well, so that the moves we explore later are likely to be the worse ones.

The technique then reduces the depth of the search applied to those later moves. We do not prune them completely, but search them to a reduced depth. If a reduced-depth search happens to return a surprisingly good value, we can redo the search with the full depth.

58
Q

Elaborate on search versus lookup

A

Lookup is much faster and is often the preferred choice when it is available. Consider chess openings, for instance: they are largely standardized, so we can use table lookup instead of searching needlessly.

We can even use statistics to figure out which opening is better, etc.

Long story short, there are scenarios where there is NO REASON to use search.

59
Q

Why is alpha-beta search not good for a game like Go?

A

Because Go has a branching factor larger than 300.

It is also very difficult to find a good evaluation function for Go: material is not a reliable indicator of a winning position.

Instead of alpha-beta, Go programs usually use Monte Carlo tree search.

60
Q

Elaborate on the monte carlo tree search

A

The basic MCTS strategy does not use a heuristic evaluation function. Instead, it runs many simulated games and uses the average utility of states. That is, to estimate the utility of state s, MCTS performs simulations (playouts) from s and looks at the average utility achieved from that state.

61
Q

If a game has either win or lose outcome, what is then the expected utility measured as?

A

A win percentage

62
Q

What is a playout policy in MCTS?

A

A playout policy is the rule for choosing moves during a simulation. We do not want to pick moves at random, since that would estimate a state's value under random play rather than good play; we want the playouts biased towards good moves.

63
Q

What is pure monte carlo search?

A

Pure Monte Carlo search does N simulations starting from the current state of the game and tracks which of the possible moves has the highest win percentage.
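
A minimal sketch of pure Monte Carlo search, assuming the hypothetical game interface from the earlier sketches and a utility of 1 for a win and 0 for a loss:

import random

def pure_monte_carlo(game, state, n_per_move):
    # For each legal move, run n_per_move uniformly random playouts and return
    # the move with the highest win rate for the player to move.
    player = game.to_move(state)

    def playout(s):
        while not game.is_terminal(s):
            s = game.result(s, random.choice(game.actions(s)))
        return game.utility(s, player)

    def win_rate(move):
        start = game.result(state, move)
        return sum(playout(start) for _ in range(n_per_move)) / n_per_move

    return max(game.actions(state), key=win_rate)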

64
Q

Is the pure monte carlo search sufficient?

A

No. For many games it will not converge towards optimal play as N increases, because the state space is too large.

We need a selection policy that focuses the effort on the important parts of the game tree.

65
Q

Recall the steps in the MINIMAX algorithm

A

The minimax algorithm consists of 3 parts:
1) The init function
2) maxValue
3) minValue

1)
function MINIMAX-SEARCH(game, state)
player = game.toMove(state)
v, move = maxValue(game, state)
return (v, move)

2)
function maxValue(game, state)

if game.isTerminal(state): return (game.utility(state, player), null)

v, move = -infinity, null
for a in game.actions(state):
v2, a2 = minValue(game, game.result(state, a))
if v2 > v: v = v2; move = a

"loop done"
return (v, move)

3)
function minValue(game, state)

if game.isTerminal(state):
return (game.utility(state, player), null)

v, move = +infinity, null
for a in game.actions(state):
v2, a2 = maxValue(game, game.result(state, a))
if v2 < v: v = v2; move = a

"loop ended"
return (v, move)

IMPORTANT:
I first thought that a2 was assigned to "move" when v2 > v (or v2 < v), but that is NOT the case. We set move equal to a. Thus a is the returned move; a2 is not really used.

66
Q

Elaborate on how the MINIMAX search algorithm works. Not necessarily the lines of code, but the general work it does

A

MAX starts.

The search goes depth-first all the way down. Along the path, every visited node gets its own v and move values, which are initially left unset (sentinel values).

When the search reaches a node at the very bottom - a terminal state - it returns the utility of that state.
The parent of this terminal node considers this child and all its other children, and then picks the returned value and move it finds best according to its preference (min vs max).

The parent then treats the best value found among its children as ITS OWN utility value, and the process repeats itself as the search unwinds back up.

67
Q

Consider the minimax algorithm. What can it be used for, and how?

A

We can use minimax for any two-player zero-sum game. We apply it to a particular game by supplying that game's utility function. We can use it for chess, for instance.

This goes to show how much the utility/evaluation function and isTerminal/isCutoff really matter.

68
Q

What is the worst part about minimax?

A

The game tree is exponential in its depth.

69
Q

Tell the steps of the alpha-beta-minimax search

A

3 parts, as with regular minimax.

1)
alpha-beta-search(game, state)

player = game.toMove(state)
v, move = maxValue(game, state, -infinity, +infinity)
return move

2)
maxValue(game, state, alpha, beta)

if game.isTerminal(state):
return (game.utility(state, player), null)

v, move = -infinity, null

for a in game.actions(state){
v2, a2 = minValue(game, game.result(state, a), alpha, beta)
if (v2 > v) {
v = v2
move = a
alpha = max(alpha, v)
}
if (v >= beta){
return (v, move)
}
}
return (v, move)

3)
minValue(game, state, alpha, beta)

if game.isTerminal(state){
return (game.utility(state, player), null)
}

v, move = +infinity, null
for a in game.actions(state){
v2, a2 = maxValue(game, game.result(state, a), alpha, beta)
if (v2 < v){
v = v2
move = a
beta = min(beta, v)
}
if (v <= alpha){
return (v, move)
}
}
return (v, move)

Regarding maxValue:
maxValue obviously operates at MAX nodes, so it sets alpha to the highest value it can choose so far. If this MAX node then explores one of its MIN children, and that child finds a grandchild value smaller than alpha, we know the MIN child will return that value or something even lower. That is of no use to us, since the alpha value tells us the MAX node can already choose something higher. Therefore we return early from exploring that particular MIN child.

70
Q

Consider alpha beta pruning. What is it actually?

Consider perfect move ordering. What will the time complexity of alpha-beta search be?

A

It is a pruning technique that guarantees no loss in solution quality.

With perfect move ordering, the time complexity is O(b^(m/2)), where b is the branching factor and m is the depth.

With random move ordering, we get roughly O(b^(3m/4)).

71
Q

Consider monte carlo and a “selection policy”. What are the two elements of the selection policy that we need to consider?

A

1) Exploration of states that have had few playouts.

2) Exploitation of states that have done well in past playouts.

Monte Carlo tree search balances the two by maintaining a search tree and growing it on each iteration of a four-step scheme: selection, expansion, simulation and back-propagation (the steps are spelled out on the next card).

72
Q

Explain the steps we do each iteration of monte carlo tree search

A

1) Selection: starting at the root of the search tree, we move down to one of the children based on the selection policy - for instance to the child with the best win record so far (e.g. 66/71), or to a child that has had few playouts (e.g. 2/11), since little-explored paths might contain better strategies. We follow the selection policy down until we reach a leaf of the search tree, and then move on to the next step.

2) Expansion: we create a new child of the leaf node we just reached. For now we record it as "0/0", because it has not been played out yet.

3) Simulation: we use the newly created node as the starting point for a playout and record who won.

4) Back-propagation: we use the result of the playout to update the statistics of all the nodes on the path taken in the selection step. If black won the playout, every black node on the path is incremented in both wins and number of playouts, while white nodes are only incremented in their number of playouts, since white lost.

We repeat these four steps, and the estimate improves with each repetition.

73
Q

Name one very effective selection policy for Monte Carlo Tree Search

A

UCT: "Upper Confidence bounds applied to Trees".

The policy ranks the possible moves at a given state using an upper-confidence-bound formula called UCB1:

UCB1(n) = U(n)/N(n) + C * sqrt(log N(Parent(n)) / N(n))

where
U(n) is the total utility of all playouts that went through node n,
N(n) is the number of playouts through node n,
Parent(n) is the parent node of n.

Thus U(n)/N(n) is the average utility of n - the exploitation term.
The sqrt part is the exploration term. It has N(n) in the denominator, so it favours nodes with few playouts.

Notice the log part: it is the logarithm of the playout count of n's parent, log N(Parent(n)).

C is a constant that balances exploration and exploitation, often set to about sqrt(2).
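
A direct Python transcription of the formula (an illustrative sketch; the parameter names are assumptions):

import math

def ucb1(U_n, N_n, N_parent, C=math.sqrt(2)):
    # U_n: total utility of playouts through node n
    # N_n: number of playouts through node n
    # N_parent: number of playouts through n's parent
    # Unvisited nodes are usually treated as infinitely attractive so they get tried first.
    if N_n == 0:
        return math.inf
    return U_n / N_n + C * math.sqrt(math.log(N_parent) / N_n)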

74
Q

Explain what the UCT MCTS algorithm returns

A

It returns the move with the highest number of playouts. The UCB1 formula will ensure that the node with the most playouts is almost always the move with the highest average utility as well.

75
Q

When do we generally say that Monte Carlo is better than alpha-beta search?

A

For games with a very large branching factor, we give the edge to Monte Carlo. The same holds when it is difficult to come up with a good evaluation function.

Monte Carlo has the advantage of aggregating over many playouts, so individual miscalculations matter less - they are drowned out by the aggregate.

76
Q

Are there any cases where monte carlo can completely fail?

A

Yes - in cases where a single move can change the course of the entire game. For instance, if Monte Carlo has failed to explore a decisive queen move, its evaluation can be very wrong.

77
Q

Explain very short what we mean by stochastic games

A

Stochastic games are games that involve chance or luck, such as games with dice. In other words, we cannot lay out a single path that would be followed under the assumption of optimal play.

78
Q

What is a game tree?

A

A game tree is a search tree that is superimposed on the state space graph of the game problem.

79
Q

Why can we not use a standard game tree for stochastic games?

A

Because of the uncertainty (stochasticity), we cannot construct a single path of optimal play. We need to include "chance nodes" in addition to the MAX and MIN nodes. Recall that MAX is the player who moves first and wants to maximize the utility, while MIN moves second and aims to minimize it.

What are chance nodes?
Chance nodes are treated as an ordinary layer in the game tree, placed according to the game mechanics. For instance, if a die is thrown every turn, a chance layer is inserted between every pair of move layers, so we get something like: max, chance, min, chance, max, chance, ...

The chance nodes themselves represent nothing more than the random event they stand for. For instance, with a die throw, a MIN or MAX node will have 6 chance-node children, one per outcome. Based on the outcome from the chance layer, we then continue building the tree as usual.

80
Q

For each state where we choose to stop the search (ex due to time constraints), what do we ask ourselves?

A

Who is winning from this position?

To answer this, we use an evaluation function. Recall that if we never have to stop the search early, we use a utility function instead. There is a big difference between the two: using an evaluation function implies that uncertainty is involved.

81
Q

What do we mean by “perfect information” games?

A

Fully observable environments. Like chess. NOT like scrabble.

82
Q

What is the game tree?

A

The game tree is a search tree that follows every sequence of moves all the way to a terminal state. The tree may be infinite if the state space itself is unbounded.

Correction: in practice it might not reach terminal states everywhere because of limited computation time, but in theory it ends when every leaf is a terminal node.

83
Q

Consider the game tree. Is it complete in the sense of exploration?

A

Minimax search does build a complete tree: it is designed to keep going until every leaf node is a terminal state.

We can use a depth limit to avoid expanding every path all the way to terminal states. Even so, the search does not stop as soon as it finds a single terminal state that is a win for MAX; it still has to examine the remaining branches.

84
Q

When can we use an AND-OR search in adversarial search?

A

We can use AND-OR search if the only possible outcomes are win or lose.

85
Q

Given a game tree, how do we determine the optimal strategy?

A

We use the minimax value at the root. That is, we run the minimax search and follow the move sequence that leads to the best/optimal value. There is a key point here: we must complete the minimax search in order to know the best value attainable. In that respect, chess is not an ideal example for minimax, since any state with utility 1 is as good as any other.

86
Q

Why is chess perhaps not a good game to discuss using minimax search?

A

Because chess has many move sequences that lead to victory, and since a win in chess is not graded, we do not really care about optimality as long as we have some winning move sequence.

87
Q

Why does the minimax search need to be complete?

A

The minimax search needs to be complete because when the utility function returns a value, we cannot otherwise know whether a better value was attainable. We have to search exhaustively to make sure the utility/minimax value is as high as it can be.

In the case of chess, we do not really care about optimality as long as we have A winning move sequence, so chess might not be the best game to illustrate minimax.

88
Q

What is the “minimax decision”

A

The minimax decision is the move to make. The optimal move to make.

89
Q

Why is the minimax search exponential?

A

Because, if we assume the branching factor remains constant, we have to check all branch possibilities for every level of depth.

d = depth
b = branching factor

O(b^d) time complexity

90
Q

Consider the running time of minimax search, which is exponential. Can we do anything about this?

A

We cannot remove the exponential growth entirely, but we can halve the exponent using pruning - alpha-beta pruning.

The effectiveness of alpha-beta pruning depends heavily on the order in which states are examined. With perfect move ordering we can achieve O(b^(m/2)), where m is the depth.

This is achievable because perfect move ordering gives an effective branching factor of sqrt(b) when there are b legal moves: sqrt(b)^m = (b^(1/2))^m = b^(m/2).

91
Q

What is the effective branching factor of alpha beta pruned minimax search?

A

Under perfect move ordering, the effective branching factor is about sqrt(b), giving O(b^(d/2)).

With random move ordering, it is close to O(b^(3d/4)).

92
Q

What could we do with move ordering to improve search?

A

We could use dynamic move ordering - for instance, storing knowledge from previous runs, or, if we use iterative deepening, using information from the earlier iterations to order moves in the later ones.

93
Q

What do we do to handle repeated states in adversarial search?

A

We use a transposition table. Recall that transpositions are different permutations of the move sequence that lead to the same state. We store these in a transposition table, which works like a cache for the (heuristic) values of states. This can give a large speed-up by avoiding searching massive subtrees multiple times.

94
Q

Is a type A or a type B strategy more used by humans? Which is used by computers?

A

Historically, most chess programs have used type A.

Humans use type B though, as they tend to skip moves that are useless.

95
Q

What is the difference between alpha beta search and “heuristic alpha beta search”?

A

The heuristic variant cuts off at a certain condition. At this point, a heuristic function called an evaluation function will be applied to estimate the utility of the cutoff state.

96
Q

What do we use to create an evaluation function?

A

We use features. Recall that features of a state are certain characteristics of the state; a feature could be, for instance, the number of pawns. We can also group states into categories, also known as equivalence classes: each category contains all the states that have the same values for all of their features.

97
Q

What is the general rule regarding use of the evaluation function?

A

We never cut off and apply the evaluation function unless the state is quiescent. A quiescent state is one with no pending move that could drastically change the course of the game, such as capturing a queen.

98
Q

What do we mean by saying that “a computer can solve an endgame state position by a policy”?

A

A policy is a mapping from states to moves (a solution). This is applicable to endgames because there are not THAT many of them: we can actually maintain a database of endgame positions and their corresponding optimal moves.

99
Q

What parts of the game definition is required to create the state space graph?

A

We need the initial state, the transition model, and the actions(s) method

100
Q

Define a complete game tree

A

A complete game tree is a tree that follows every sequence of actions/moves all the way to a terminal state. In many games, the complete game tree is infinite.

101
Q

If we consider a game that has only binary outcomes, what simplification can we use and why?

A

We can use AND-OR search.

We can do this because all we need is a contingency plan.

102
Q

Define the minimax value

A

the minimax value is the utility for player MAX of being in some state.

The minimax value of a terminal state is just the utility.

In a non-terminal state, MAX prefers to move to a state of maximum value when it is MAX’s turn to move, and min prefers a state of minimum value.

103
Q

Elaborate on alpha beta pruning

A

Consider a root with a tree of depth 2 below it. Including the root layer, the game tree then has 3 layers.

Assume 3 choices/moves/actions at each state.

When the recursion descends into the first node of the second layer and eventually finishes exploring its children in the third layer, we get a value back: the smallest value among the leaves, since the second layer is a MIN layer. Say the leaves are (3, 12, 8); the MIN node picks the smallest of these, which is 3.

However, since MIN and MAX alternate, we know that this entire second layer is a MIN layer. The MAX nodes above (in this case, only the root) will pick among the MIN nodes and take the largest value there. This is the clue.

Now the second MIN node starts exploring its part of the third layer, and its very first leaf has utility 2 (a leaf's value is returned regardless of whether it sits in a MIN or MAX layer, since it is terminal). When the MIN node sees this value, it can reason as follows:
"The MAX layer above can already choose 3. This value is 2, and whatever else I find, I will return 2 or something even smaller. In any case it will not matter, because the MAX layer above will choose 3 over any value I can return. Therefore there is no reason to explore the rest."

We need a method of remembering the values that serve as these limits.

We use alpha as the value of the best (i.e. highest-value) choice we have found so far at any choice point along the path for MAX. We can think of alpha as "at least ...".

We use beta as the value of the best (i.e. lowest-value) choice we have found so far at any choice point along the path for MIN. We can think of beta as "at most ...".

104
Q

What can we do to make iterative deepening better?

A

We can use a transposition table so that we do not have to recompute the earlier depths, but can look their results up directly until we reach the next depth iteration.

105
Q

How can we use forward pruning to improve search depth?

A

While alpha-beta pruning only cuts away parts of the search space that are irrelevant to the solution, we can use beam search to also prune away moves that merely look bad. The danger is that we might miss some good ones as well.

Beam search keeps the k best moves from a state according to the EVAL function. By doing this, we can search deeper on the moves that look best - for instance those where the material or pawn structure looks much better.

106
Q

What is pure monte carlo search?

Is it sufficient?

A

We do N simulations starting from the current state of the game and track which of the possible moves from that position has the highest win percentage.

Pure Monte Carlo search is not sufficient for all types of games, because for many games it does not converge towards optimal play. We need a selection policy that focuses the computational resources on the important parts of the game tree, balancing two factors:
1) Exploration
2) Exploitation

107
Q

Elaborate on monte carlo tree search. Quickly first, describe why it differs from pure monte carlo search.

A

Pure Monte Carlo search does N simulations from some state and tracks the win percentages; it selects moves (more or less) at random and simply prefers the ones with higher win percentages.

Monte Carlo tree search instead maintains a search tree and grows it on each iteration of the following four steps:

1) Selection
Say we already have some search tree, mid-way through the MCTS procedure.
We use the selection policy to move from the root down towards the leaves; the selection policy determines the route. At a leaf, this step ends.

2) Expansion
From the leaf we have reached, we grow the search tree by adding a new child to this node.

3) Simulation
We run a playout from the newly generated node and record the result, but not the path.

4) Back-propagation
We use the result of the playout to update all the nodes on the path we followed in the selection step.
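
A compact sketch of this four-step loop, assuming the hypothetical game interface from the earlier sketches, UCB1 as the selection policy, and uniformly random playouts as a stand-in playout policy:

import math, random

class Node:
    def __init__(self, state, parent=None, move=None):
        self.state, self.parent, self.move = state, parent, move
        self.children = []
        self.U = 0.0   # total utility of playouts through this node
        self.N = 0     # number of playouts through this node

def ucb1(node, C=math.sqrt(2)):
    # Selection policy: unvisited children are tried first (infinite score).
    if node.N == 0:
        return math.inf
    return node.U / node.N + C * math.sqrt(math.log(node.parent.N) / node.N)

def mcts(game, root_state, iterations=1000):
    root = Node(root_state)
    for _ in range(iterations):
        # 1) Selection: descend while the current node is fully expanded.
        node = root
        while (node.children
               and len(node.children) == len(game.actions(node.state))
               and not game.is_terminal(node.state)):
            node = max(node.children, key=ucb1)
        # 2) Expansion: add one untried child, unless the node is terminal.
        if not game.is_terminal(node.state):
            tried = {child.move for child in node.children}
            untried = [a for a in game.actions(node.state) if a not in tried]
            if untried:
                a = random.choice(untried)
                child = Node(game.result(node.state, a), parent=node, move=a)
                node.children.append(child)
                node = child
        # 3) Simulation: random playout from the new node (a real playout
        #    policy would pick better-than-random moves).
        s = node.state
        while not game.is_terminal(s):
            s = game.result(s, random.choice(game.actions(s)))
        # 4) Back-propagation: update every node on the selection path, scoring
        #    each node from the perspective of the player who moved into it.
        while node is not None:
            node.N += 1
            if node.parent is not None:
                mover = game.to_move(node.parent.state)
                node.U += game.utility(s, mover)
            node = node.parent
    # Play the move with the most playouts, as described in the cards above.
    return max(root.children, key=lambda child: child.N).move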

108
Q

Elaborate on iterative deepening in conjunction with adversarial search.

A

It is close to a must-have.

In games like chess, iterative deepening lets us return the most up-to-date move from the last completed depth-limited search whenever time runs out, which is very flexible.

Iterative deepening also lets us use the evaluation results from one depth to order the moves for the next iteration. Recall that perfect move ordering halves the exponent of alpha-beta search.

When using iterative deepening, we typically also use transposition tables to store previously computed results, making the deeper iterations more efficient.

109
Q

What is pure Monte Carlo search?

A

Pure Monte Carlo search is just random simulations - it does not use a playout policy. It is therefore likely to play poorly.

110
Q

Elaborate on the playout policy of MCTS

A

The playout policy is essentially the most important thing about the monte carlo tree search.

We use a specific playout policy to be able to drive the search towards good simulations, as opposed to selecting random moves. Random moves tend to not converge to good ones.

A playout policy determines what moves to choose.

111
Q

Elaborate on the difference between playout policy and selection policy of the monte carlo tree search

A

The playout policy is about choosing moves while performing the simulation(s). It is like applying general rules of thumb for how we want to play.

The selection policy is about navigating the search tree well: it balances exploitation and exploration. A good selection policy will both explore new parts of the tree and exploit the moves down the tree that appear to be good.

The two policies are used at different phases of MCTS. The playout policy applies to the actual simulation that we run from the newly created node after expansion.
The selection policy determines which path we take down the search tree to reach the node we expand.
