20+21 Flashcards

1
Q

How do search scenarios change when we introduce multiple agents?

A

We frequently have to model how other agents will react.

2
Q

What is common in two-player games?

A

Alternating turns and a zero-sum structure, meaning anything good for one player is equally bad for the other.

3
Q

What does an adversarial game as a search problem contain?

A

The environment is composed of a set of distinct states and contains:

1. the initial state;
2. the set of possible actions available in a state s;
3. which player has the move in a state;
4. a transition model, giving the state that results from taking a move;
5. a terminal test, which tells us when a game is over (the terminal states);
6. a utility function, which gives the numeric value of a terminal state.

We also need a heuristic evaluation function to estimate the utility of non-terminal states (one possible grouping of these components is sketched below).
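
As a rough illustration, the components above could be grouped into a single game definition like the following (the `Game` class and its method names are my own illustrative choices, not taken from a particular library; utility and evaluation values are given from MAX's point of view):

```python
from abc import ABC, abstractmethod

class Game(ABC):
    """Hypothetical bundle of the components of an adversarial search problem."""

    @abstractmethod
    def initial_state(self):
        """The initial state."""

    @abstractmethod
    def player(self, state):
        """Which player has the move in `state`."""

    @abstractmethod
    def actions(self, state):
        """The set of possible actions when in `state`."""

    @abstractmethod
    def result(self, state, action):
        """Transition model: the state reached by taking `action` in `state`."""

    @abstractmethod
    def is_terminal(self, state):
        """Terminal test: is the game over in `state`?"""

    @abstractmethod
    def utility(self, state):
        """Numeric value of a terminal state (from MAX's point of view)."""

    @abstractmethod
    def evaluate(self, state):
        """Heuristic estimate of the utility of a non-terminal state."""
```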

4
Q

How do we decide which move to make in a two-player game?

A

Find the best move for MAX (ourselves). This is done by:

  1. Searching the tree to some depth, typically with DFS.
  2. Applying a heuristic evaluation function to each node at that depth to estimate the utility of the corresponding state.
  3. Backing up the values of these nodes to their parent nodes.
  4. Continuing to back up values until we have values for the children of the root node, then taking the move with the best value for MAX (see the sketch after this list).
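
A minimal depth-limited minimax sketch, reusing the hypothetical `Game` interface from the previous card (values are from MAX's perspective, so MAX maximises and MIN minimises):

```python
def minimax_value(game, state, depth, maximizing):
    # Terminal states get their exact utility; the depth cutoff gets the heuristic estimate.
    if game.is_terminal(state):
        return game.utility(state)
    if depth == 0:
        return game.evaluate(state)
    # DFS over the children, backing up values to this node.
    child_values = [minimax_value(game, game.result(state, a), depth - 1, not maximizing)
                    for a in game.actions(state)]
    return max(child_values) if maximizing else min(child_values)

def best_move_for_max(game, state, depth):
    # The root is a MAX node: take the action whose child has the largest backed-up value.
    return max(game.actions(state),
               key=lambda a: minimax_value(game, game.result(state, a), depth - 1, False))
```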
5
Q

How does alpha-beta pruning work?

A

Search the tree to depth d with DFS and compute a heuristic value for each node at that depth. As soon as a node can be given a backed-up value, we compute it. As soon as we know the values of some of a MAX node's children, we can compute alpha, the largest of these values; as soon as we know the values of some of a MIN node's children, we can compute beta, the smallest of these values.
An alpha value on a MAX node tells us that its backed-up value will never be less than alpha. A beta value on a MIN node tells us that its backed-up value will never be more than beta.
If we ever find a MAX node whose alpha is greater than or equal to the beta of one of its children, we can prune all unsearched nodes of that child.
If we ever find a MIN node whose beta is less than or equal to the alpha of one of its children, we can prune all unsearched nodes of that child.
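
A minimal alpha-beta sketch in the same style as the minimax example above (the `game` interface is the same hypothetical one; pruning happens exactly when alpha >= beta):

```python
def alphabeta(game, state, depth, alpha, beta, maximizing):
    if game.is_terminal(state):
        return game.utility(state)
    if depth == 0:
        return game.evaluate(state)
    if maximizing:
        value = float("-inf")
        for a in game.actions(state):
            value = max(value, alphabeta(game, game.result(state, a),
                                         depth - 1, alpha, beta, False))
            alpha = max(alpha, value)   # best value found so far for MAX
            if alpha >= beta:           # a MIN ancestor will never let play reach here
                break                   # prune the remaining children
        return value
    else:
        value = float("inf")
        for a in game.actions(state):
            value = min(value, alphabeta(game, game.result(state, a),
                                         depth - 1, alpha, beta, True))
            beta = min(beta, value)     # best value found so far for MIN
            if beta <= alpha:           # a MAX ancestor will never let play reach here
                break                   # prune the remaining children
        return value

# Usage: alphabeta(game, state, depth, float("-inf"), float("inf"), maximizing=True)
```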

6
Q

What is the efficiency of alpha-beta pruning?

A

Worst case: the children of every node are expanded worst-first, so no pruning occurs and we still examine b^d nodes.
Best case: the best move for each player is always examined first, so we search a space of size roughly b^(d/2) instead of b^d (e.g. with b = 35 and d = 8, about 1.5 million nodes instead of over 2 trillion).
In practice the efficiency lies between these two extremes.

7
Q

What do we do for chance-based games?

A

Use expectiminimax: chance nodes back up the expected value of their child nodes, i.e. each child's value weighted by its probability.
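
A minimal illustration of the chance-node step (the `outcomes` format, a list of (probability, child value) pairs, is an assumption made just for this example):

```python
def chance_node_value(outcomes):
    # A chance node backs up the probability-weighted average of its children's values.
    return sum(probability * value for probability, value in outcomes)

# e.g. a 50/50 chance node over continuations worth 3 and 9 backs up 6.0:
# chance_node_value([(0.5, 3), (0.5, 9)]) == 6.0
```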

8
Q

Formalise sequential decision problems.

A

The environment is composed of a set of distinct states and contains:

1. the agent's initial state;
2. a set of possible actions in a state s;
3. a Markovian transition model, which returns the probability distribution over the states that may result from taking an action;
4. a reward associated with each state (which can be zero or negative if wanted).
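
One possible way to bundle these components in code (the `MDP` class and its field names are illustrative assumptions, not from a specific library):

```python
from dataclasses import dataclass

@dataclass
class MDP:
    states: list           # the set of distinct states
    initial_state: object  # the agent's initial state
    actions: dict          # state -> possible actions in that state
    transition: dict       # (state, action) -> {next_state: probability} (Markovian)
    reward: dict           # state -> immediate reward (can be zero or negative)
    gamma: float = 1.0     # discount factor, used by the Bellman equation later
```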

9
Q

What is a Markov decision process?

A

An MDP is a sequential decision problem with actions and a reward for every state, in a fully observable, stochastic environment with a Markovian transition model.
Its solution is a policy, which tells the agent what to do in any given state.

10
Q

What are reward and utility?

A

Reward: the immediate advantage/disadvantage of being in a state.
Utility: the agent's perceived value of being in a given state, taking into account both current and future rewards. This is given by the Bellman equation: the immediate reward plus the discounted utility of the next state.
For stochastic problems this must be the expected utility over the possible next states.
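
Written out (with R(s) the reward of state s, gamma the discount factor, A(s) the available actions, and P(s' | s, a) the Markovian transition model), the Bellman equation for the utility of a state is:

```latex
U(s) = R(s) + \gamma \max_{a \in A(s)} \sum_{s'} P(s' \mid s, a)\, U(s')
```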

11
Q

What does value iteration involve?

A

1. Assign an initial value of 0 to all states.
2. For each state, compute the updated expected utility:
   utility = reward(state) + the discounted expected utility of the states the agent can reach after taking the best action.
3. Repeat step 2 until the utilities converge (see the sketch after this list).
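
A minimal value-iteration sketch, assuming the hypothetical `MDP` structure sketched earlier (states, actions, transition, reward, gamma); "converged" is taken to mean that no utility changes by more than a small tolerance `eps`:

```python
def value_iteration(mdp, eps=1e-6):
    utility = {s: 0.0 for s in mdp.states}            # step 1: every state starts at 0
    while True:
        new_utility = {}
        delta = 0.0
        for s in mdp.states:                          # step 2: Bellman update for each state
            if mdp.actions.get(s):
                best = max(sum(p * utility[s2]
                               for s2, p in mdp.transition[(s, a)].items())
                           for a in mdp.actions[s])
            else:
                best = 0.0                            # no actions available: terminal state
            new_utility[s] = mdp.reward[s] + mdp.gamma * best
            delta = max(delta, abs(new_utility[s] - utility[s]))
        utility = new_utility
        if delta < eps:                               # step 3: repeat until utilities converge
            return utility
```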
