Chapter 6 - Adversarial search Flashcards
What is a “competitive environment”?
A competitive environment is an environment where multiple agents are competing in the sense that they have different goals
Compare competitive task environments in chess and similar games to real-world competitive environments
Real-world competitive environments are very complicated. We therefore concentrate on games, which often have a very simple way of representing their state. This is highly advantageous when learning about adversarial search.
What are the stances we can take when considering multi-agent environments?
There are three stances we can take when considering multi-agent environments:
1) Economy
By this we mean that we consider all agents in the environment as an aggregate, and look at what can happen the way we look at macroeconomic tendencies. This approach is best when there is a large number of agents. The economy approach is about making predictions on the whole system, rather than actually predicting the actions of a single agent. This makes us interested in TRENDS and overall behavior. The whole point is to regard the system as a whole, rather than considering individual agents. In football, we could do this by considering the entire team rather than individual players. For instance, if we play a certain formation, how many passes and what pass completion rate did we manage compared to other formations?
2) The second stance is to consider adversarial search agents as just a part of the environment - a part that makes the environment non-deterministic. In other words, we take a multi-agent environment and treat it like a single agent environment by considering the other agents as just a part of the environment. However, this will completely miss the fact that these agents are our competitors. So, we reduce complexity at the cost of performance.
So, for instance: If an agent is trying to walk outside from point A to point B, we could use this stance. We only need to react to changes in the environment, keep our distance to other people, etc. However, if people happened to be actively trying to make us not reach the destination, then this stance would not be useful, as it is very limited in regards to its competitive abilities.
3) The third stance is about explicitly modeling the adversarial agents with the techniques of adversarial game-tree search. This is our main consideration.
What are deterministic, two-player, turn-taking, perfect information, zero-sum games?
These are games that have received a LOT of research attention within AI.
Perfect information means a fully observable task environment (TE).
Zero sum game means that what is good for player A is bad for player B. There are no win-win situations. Either the outcome is good for one or the other (or perhaps stalemate).
When we are talking about two-player zero-sum games, what do we mean by “position”?
Position is the current state.
In two-player zero-sum games, what do we call our players?
We call the player who moves first “MAX” and the player that moves second is “MIN”.
Formally define a two-player zero-sum game
The game will have an initial state, S_0. This is the setup.
TO-MOVE(s): The player whose turn it is to move in state s. Turns could simply alternate, or be a function of something else (e.g., a dice roll).
Actions(s): The actions set. The set of legal moves in state s.
Result(s, a): The resulting state we get by performing action a from state s. This is the transition model. Recall that the transition model requires knowledge of how the world works AND how the actions of the agent impact the world.
Is-Terminal(s): A terminal test, which is true if the game is over, and false if the game is not over. We usually call the set of states that indicate a TRUE signal here as “Terminal states”.
Utility(s, p): A utility function which defines the final numeric value to player p when the game ends in terminal state s. In chess, the outcome is either a win, a loss, or a draw, with values 1, 0, and 1/2.
These six elements formally make up a game.
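As a concrete illustration, the six elements can be sketched as a small Python class. The game and all names here are assumptions invented for the example: a pile of 5 stones, each move removes 1 or 2 stones, and whoever takes the last stone wins.

```python
class TakeAwayGame:
    def initial_state(self):
        # S_0: (stones remaining, player to move)
        return (5, "MAX")

    def to_move(self, state):
        # TO-MOVE(s)
        return state[1]

    def actions(self, state):
        # ACTIONS(s): legal moves are taking 1 or 2 stones, if available
        stones, _ = state
        return [a for a in (1, 2) if a <= stones]

    def result(self, state, action):
        # RESULT(s, a): the transition model; turns alternate
        stones, player = state
        return (stones - action, "MIN" if player == "MAX" else "MAX")

    def is_terminal(self, state):
        # IS-TERMINAL(s): the game ends when the pile is empty
        return state[0] == 0

    def utility(self, state, player):
        # UTILITY(s, p): the player who just moved took the last stone
        _, to_move = state
        winner = "MIN" if to_move == "MAX" else "MAX"
        return 1 if winner == player else 0
```

Note that the state space graph only needs the initial state, the actions set, and the transition model, matching the next card.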
Consider the formal definition of a “game”. What parts of it are required to make the state space graph?
We require the initial state, the transition model and the actions set.
What do we mean by superimposing?
Superimpose is to draw something on top of something else. For instance we can superimpose a search tree on top of a state space graph.
What is the “game tree”?
The game tree is the complete search tree that follows every sequence of moves until it reaches a terminal state. This of course means that not all game trees can actually be completed, as some state space graphs are infinite and carry no guarantee of termination.
What can we say about the plan of MAX?
Because MAX's plan will depend on the actions of MIN, we need to construct a contingency plan, i.e. a conditional plan.
When can we use AND-OR search to find a plan?
We can use an AND-OR search when the outcome is binary: win or lose. Recall that the AND-OR search will create a conditional plan.
What do we do with games that have more than 2 outcomes?
Games that have multiple outcomes require a minimax search. A minimax search is a more general AND-OR search.
Consider the term "ply". What is it?
A ply is a single move by one player, taking us one level deeper in the game tree. In a two-player game, a full turn consists of two plies.
What is the minimax value?
The minimax value is the utility for MAX of being in that state. The minimax value of a terminal state is just the utility of the state. In a non-terminal state, MAX prefers to move to a state of maximum value when it is MAX's turn to move, while MIN prefers a state of minimum value (minimum value for MAX, and thus maximum value for MIN).
Summed up: when MAX is to move next, this agent naturally wants to move to the position where the minimax value is as high as possible. The MIN agent naturally does not want this, and thus wants MAX to end up at the lowest possible minimax value. The minimax value of a state ASSUMES that both players play optimally from there on out.
How do we treat perspectives of scores with the minimax?
We consider one end of the score range as good for MAX, and the other end as good for MIN. For instance, in chess, we'd consider 1 a good score for MAX, and 0 a good score for MIN. This way, we don't have to change perspectives all the time.
This implies some symmetry between the scores.
What is the minimax value?
The minimax value is the utility FOR MAX of being in a specific state. This means, that when finding the minimax value when it is MIN’s turn to move, we still use MAX as a reference point.
This means that the minimax value is a prediction of value. It says: From this specific state, if both players play optimally, the final utility will be X.
What is the minimax decision?
The minimax decision is the optimal decision for either MIN or MAX. The decision naturally differs in goal: the minimax decision for MAX maximizes the utility, while for MIN it minimizes it.
What is the assumption of the minimax theory?
Minimax assumes that BOTH players play optimally from the specific state, and until the end.
Is the minimax decision optimal?
Not against every opponent. Minimax is optimal only under the assumption that the opponent also plays optimally. Against a fallible opponent, riskier moves that exploit its mistakes can score better.
What is the time and space complexity of the minimax search algorithm?
If we have a game tree of depth m with b possible moves at every level (branching factor b), then the time complexity is O(b^m).
The space complexity is given as O(bm).
Why is the regular minimax algorithm bad for complex games like chess?
It is bad because of its exponential time complexity.
Elaborate detailed on the minimax search algorithm
First of all, minimax search is an algorithm that will return a single move that is the optimal move assuming both players play optimally.
The algo is recursive and consists of 3 parts:
1) Main init. Basically just initializing the search, and returning the final move.
2) The MAX-VALUE(game, state) part.
This is called every time it is “MAX’s” turn to move.
3) The MIN-VALUE(game, state) part. This is called every time it is “MIN’s” turn to move.
Both MAX-VALUE and MIN-VALUE are very similar. They are basically the same, but with a different comparison.
Consider MAX-VALUE. It works like this:
We call it when MAX is to make a move. It will first check if the current state is a terminal/game-ended state. If so, it returns the utility of that state (and no move).
If the state is not terminal, then we start the real thing.
We first set v to negative infinity and move to null. Negative infinity is a sentinel value meaning "not set yet" (a flag or key could serve the same purpose); it guarantees that every actual value will be better.
Then we loop through all actions available from ACTIONS(s). For each action a, we want the value we'd get by choosing it, assuming optimal play. Therefore, for each action, we call MIN-VALUE on RESULT(s, a). This call will eventually return a pair (v2, move2) of utility and move, and we store these values in the loop. Then we compare v with v2: if v2 is greater than v, this action is better, so we set v = v2 and move = a, the action from the loop (not the move returned from deeper in the tree). When we have looped through all the actions, we return the remaining best pair (v, move), i.e. (utility, move).
MIN-VALUE works exactly the same, except that the sentinel is +infinity and the comparison is "if v2 < v" instead of "if v2 > v".
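The three-part structure described above can be sketched as runnable Python. This is an illustrative implementation, assuming a game tree given explicitly as nested dicts (moves mapped to subtrees) where a leaf is a number giving the utility for MAX:

```python
import math

def minimax_search(tree):
    # Main init: run the search and return only the move.
    value, move = max_value(tree)
    return move

def max_value(node):
    if not isinstance(node, dict):       # terminal test
        return node, None                # utility for MAX, no move
    v, best = -math.inf, None            # sentinel: "not set yet"
    for move, child in node.items():
        v2, _ = min_value(child)
        if v2 > v:
            v, best = v2, move           # store the loop's action, not move2
    return v, best

def min_value(node):
    if not isinstance(node, dict):
        return node, None
    v, best = math.inf, None             # sentinel flipped for MIN
    for move, child in node.items():
        v2, _ = max_value(child)
        if v2 < v:
            v, best = v2, move
    return v, best

# The classic three-move example tree: MIN backs up 3, 2 and 2,
# so MAX's minimax decision is a1 with value 3.
tree = {
    "a1": {"b1": 3, "b2": 12, "b3": 8},
    "a2": {"c1": 2, "c2": 4, "c3": 6},
    "a3": {"d1": 14, "d2": 5, "d3": 2},
}
```

On this tree, `minimax_search(tree)` returns "a1", matching the hand calculation.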
How can we take the regular two-player setup and transform it into a multiplayer game?
We need to swap the single values at each node/state (the utility prediction) with a vector of values. In such a case, we actually use different values for the different perspectives. Meaning, if we have 3 players A, B and C, we’d have a terminal state like this:
{6, 2, 8} to indicate that this state is a 6 for A, 2 for B and 8 for C.
This means that the utility function return a vector of values.
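A minimal sketch of this idea: with utility vectors, the player to move simply picks the successor whose vector maximizes that player's own component. The three-player setup and names are illustrative:

```python
PLAYERS = ("A", "B", "C")

def best_vector(to_move, child_vectors):
    # Each vector holds one utility per player, in PLAYERS order.
    i = PLAYERS.index(to_move)
    return max(child_vectors, key=lambda vec: vec[i])

# If it is B's move and the children promise (6, 2, 8) and (1, 7, 2),
# B chooses the second, since 7 > 2 in B's own component.
choice = best_vector("B", [(6, 2, 8), (1, 7, 2)])
```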
When can it be optimal for A and B to make an alliance against C?
If A and B are in weak positions while C is in a strong position, an alliance can be good for A and B.
What are alpha and beta in alpha-beta pruning?
Alpha is the minimum value that the MAX player can be assured of.
Beta is the maximum value that the MIN player can be assured of.
Since alpha is the minimum value that the MAX player can be assured of, consider a MAX node whose first child returns the value 10. This means the MAX player can be SURE that no value lower than 10 needs to be considered. MAX naturally wants to choose the highest value among its children. Therefore, if the second child (a MIN node) is exploring an option that returns a value smaller than 10, we don't need to continue, since that MIN node will pick that value or something even smaller. When this below-10 value is returned to the MAX node we are considering, MAX will not pick it, as the alpha value already indicates. Alpha is the smallest value that the MAX player can be assured of, meaning he never has to consider accepting anything smaller, as one of his children already returned it as a guaranteed minimum.
Consider this statement:
“The highest value returned by the children of a MAX node”
What is it?
The alpha value.
A MAX node doesn't need to consider values of its children that are lower than the alpha value, because it would just pick the alpha value anyway, as it seeks to maximize.
Consider this statement:
“The lowest value returned by the children of a MIN node”
The beta value.
A MIN node doesn't need to consider values that are larger than this beta value, because it will just pick the beta value anyway.
Recite MAX-VALUE with alpha-beta pruning
MAX-VALUE(game, state, alpha, beta)
if game.IS-TERMINAL(state): return game.UTILITY(state, MAX), null
v = -infinity
for each move in game.ACTIONS(state):
1) v2, move2 = MIN-VALUE(game, game.RESULT(state, move), alpha, beta)
2) if v2 > v then: v = v2, moveReturn = move. ALSO: alpha = max(alpha, v)
3) if v >= beta: return v, moveReturn (prune)
return v, moveReturn
Recite MIN-VALUE with alpha-beta pruning
MIN-VALUE(game, state, alpha, beta)
if game.IS-TERMINAL(state): return game.UTILITY(state, MAX), null
v = +infinity
for each move in game.ACTIONS(state):
1) v2, move2 = MAX-VALUE(game, game.RESULT(state, move), alpha, beta)
2) if v2 < v: v = v2 and moveReturn = move. ALSO: beta = min(beta, v)
3) if v <= alpha: return v, moveReturn (prune)
return v, moveReturn
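The two pseudocode routines above can be sketched as runnable Python, again assuming a game tree given as nested dicts where leaves are utilities for MAX:

```python
import math

def alpha_beta_search(tree):
    value, move = max_value(tree, -math.inf, math.inf)
    return move

def max_value(node, alpha, beta):
    if not isinstance(node, dict):       # terminal test
        return node, None
    v, best = -math.inf, None
    for move, child in node.items():
        v2, _ = min_value(child, alpha, beta)
        if v2 > v:
            v, best = v2, move
            alpha = max(alpha, v)
        if v >= beta:                    # prune: MIN above never allows this
            return v, best
    return v, best

def min_value(node, alpha, beta):
    if not isinstance(node, dict):
        return node, None
    v, best = math.inf, None
    for move, child in node.items():
        v2, _ = max_value(child, alpha, beta)
        if v2 < v:
            v, best = v2, move
            beta = min(beta, v)
        if v <= alpha:                   # prune: MAX above never allows this
            return v, best
    return v, best

tree = {
    "a1": {"b1": 3, "b2": 12, "b3": 8},
    "a2": {"c1": 2, "c2": 4, "c3": 6},
    "a3": {"d1": 14, "d2": 5, "d3": 2},
}
```

Pruning never changes the answer: on this tree the result is still move a1 with value 3, but branches a2 and a3 are cut short after their first leaves.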
What scenario causes the best performance of alpha-beta pruning?
If the best moves are explored first, we can prune the most. With perfect move ordering, alpha-beta only needs to examine O(b^(m/2)) nodes instead of O(b^m).
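A small illustrative experiment (the tree and setup are invented, not from the chapter) that counts leaf evaluations on the same tree under two orderings, showing that exploring the best option first prunes more:

```python
import math

def ab(node, alpha, beta, maximizing, counter):
    # Internal nodes are lists of children; leaves are numbers (MAX's utility).
    if not isinstance(node, list):
        counter[0] += 1                  # one leaf evaluated
        return node
    v = -math.inf if maximizing else math.inf
    for child in node:
        cv = ab(child, alpha, beta, not maximizing, counter)
        if maximizing:
            v = max(v, cv)
            alpha = max(alpha, v)
        else:
            v = min(v, cv)
            beta = min(beta, v)
        if alpha >= beta:                # prune remaining siblings
            break
    return v

good = [[9, 8], [4, 3], [2, 1]]          # best MAX option explored first
bad = [[2, 1], [4, 3], [9, 8]]           # best MAX option explored last

count_good, count_bad = [0], [0]
v_good = ab(good, -math.inf, math.inf, True, count_good)
v_bad = ab(bad, -math.inf, math.inf, True, count_bad)
```

Both orderings compute the same minimax value (8), but the good ordering evaluates only 4 of the 6 leaves while the bad ordering evaluates all 6.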
What are killer moves?
Killer moves are the “best moves”.
What is the killer move heuristic?
The killer move heuristic is basically just trying the killer moves first.
What are transpositions?
Transpositions are different permutations of the move sequence that end up in the same position.
What is a transposition table?
A transposition table is a table that caches the computed value of states, together with the corresponding best move. If we happen to reach the same state several times in our search, we just look up the stored value and move rather than calculating them again.
The power here is that we can avoid searching subtrees over again.
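A sketch of this caching idea on an illustrative take-away game (a pile of stones, each move removes 1 or 2, whoever takes the last stone wins). The transposition table is just a dict from state to (value, best move); all names are assumptions for the example.

```python
import math

table = {}   # (stones, maximizing) -> (value, best_move)

def search(stones, maximizing):
    # Values are always from MAX's perspective (+1 = MAX wins, -1 = MAX loses).
    state = (stones, maximizing)
    if state in table:                   # transposition hit: reuse result
        return table[state]
    if stones == 0:
        # The previous player took the last stone and won.
        result = ((-1, None) if maximizing else (1, None))
    else:
        best_v = -math.inf if maximizing else math.inf
        best_m = None
        for take in (1, 2):
            if take > stones:
                continue
            v, _ = search(stones - take, not maximizing)
            if (maximizing and v > best_v) or (not maximizing and v < best_v):
                best_v, best_m = v, take
        result = (best_v, best_m)
    table[state] = result                # cache before returning
    return result
```

With 4 stones, MAX should take 1 (leaving the opponent a losing pile of 3); any pile size reached again later is a dictionary lookup instead of a fresh subtree search.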
Consider chess: It is well known that a simple minimax, even with alpha beta pruning AND caching using transposition tables, will not work on chess due to the enormous state space. What strategies could then be used with chess?
We have 2 types:
Type A strategy: Consider a WIDE but shallow search tree: reach all nodes within a certain depth.
Type B strategy: Follow only promising lines, and thus search further but narrower.
Historically, most chess programs have been type A.
How can we make use of our limited computation time? Meaning, since we cannot search until completion, what do we do?
We can cut off the search early and apply a heuristic evaluation function to states, effectively treating non-terminal states as if they were terminal.
In other words, we use “eval” instead of “utility”
We also change the terminal test with a cutoff test, which of course must also return true if the state is in fact terminal.
When we use the heuristic minimax, how can we formally describe the function?
H-MINIMAX(s, d) =
if isCutoff(s,d) then Eval(s, MAX)
if TO-Move(s) == MAX then max_{for every action in actions(s)} H-MINIMAX(result(s,a), d+1)
if To-Move(s) == MIN then min_{for every action in actions} H-MINIMAX(result(s,a), d+1)
When d is the depth
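A runnable sketch of H-MINIMAX with a depth cutoff. The game and the Eval function are invented for illustration: the state is (n, player), each move adds 1 or 2 to n, MAX wants n high and MIN wants it low, and Eval(s) is simply n.

```python
class CountingGame:
    def to_move(self, state):
        return state[1]

    def actions(self, state):
        return [1, 2]

    def result(self, state, action):
        n, player = state
        return (n + action, "MIN" if player == "MAX" else "MAX")

    def is_terminal(self, state):
        return state[0] >= 6

def h_minimax(game, state, depth, depth_limit, eval_fn):
    # Is-Cutoff(s, d): true for terminal states OR when the depth limit hits.
    if game.is_terminal(state) or depth >= depth_limit:
        return eval_fn(state)
    values = [h_minimax(game, game.result(state, a), depth + 1,
                        depth_limit, eval_fn)
              for a in game.actions(state)]
    return max(values) if game.to_move(state) == "MAX" else min(values)
```

From (0, "MAX") with depth limit 2: MAX adds 2, MIN replies with 1, and the cutoff evaluation yields 3.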
Consider the heuristic function Eval(s, p). What does it return?
The heuristic function eval(s,p) will return an estimate of the expected utility of state s to player p.
For terminal states, the estimate must be equal to the utility, and for non-terminal states, the heuristic estimate must be somewhere between the lose-value and win-value.
Utility(loss, p) <= Eval(s, p) <= Utility(win, p)
What makes a heuristic good?
The computation must be fast, since the entire point is to improve search time.
The heuristic function needs to be strongly connected to the actual chances of winning.
What are “features” of the state?
Domain specific things that we can utilize. It is certain characteristics that we can use to make decisions and predictions.
In chess, a feature can be the number of pawns, for instance.
If we look at different states that all have the same feature values, we can group them together and call them "categories" or "equivalence classes" of states.
What are categories or “equivalence classes” of states?
Categories or “equivalence classes” of state is a collection of states that all share the same values for features. For example, one category might contain all two-pawn versus one-pawn endgames.
Any given category will contain some states that lead to wins, some that lead to draws, and some that lead to losses (assuming perfect play).
If we have an evaluation function, it might not know what states lead to what outcome. However, it might know the proportion of which lead to victory. Then the algorithm/heuristic can use this proportion information to make the best decision. It is quite common to use the expected value to make decisions.
Why is it not common to use probabilities as the only way of making decisions?
Because it requires more experience (data) than we really have.
Instead, we usually consider individual features and calculate numerical contributions that we then combine into a total value. For instance, in chess, we can use material value and positional value, summed together to find the total value. This is called a weighted linear function.
What is a weighted linear function?
Eval(s) = w1*f1(s) + w2*f2(s) + … + wn*fn(s)
fi is feature i, and wi is the corresponding weight, indicating how important that feature is. The weights need to be normalized so that the final score stays within the range from a loss to a win ([0, 1]).
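A sketch of such an evaluation function in Python. The state layout, features, and weights are illustrative chess-like examples, not values from the text:

```python
def weighted_linear_eval(state, features, weights):
    # Eval(s) = w1*f1(s) + w2*f2(s) + ... + wn*fn(s)
    return sum(w * f(state) for w, f in zip(weights, features))

# Hypothetical 'state': raw piece counts for both sides.
state = {"own_pawns": 5, "opp_pawns": 4, "own_queens": 1, "opp_queens": 1}
features = [
    lambda s: s["own_pawns"] - s["opp_pawns"],     # pawn material balance
    lambda s: s["own_queens"] - s["opp_queens"],   # queen material balance
]
weights = [1.0, 9.0]   # rough classical chess material weights

score = weighted_linear_eval(state, features, weights)
```

Here the only nonzero feature is the one-pawn advantage, so the raw score is 1.0; in practice this weighted sum would then be normalized into the [0, 1] loss-to-win range.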