
MODULE 2

• CHAPTER 1: Infrastructure for Search Algorithms and Measuring Problem-Solving Performance

• CHAPTER 2: Adversarial Search and Games

• Game Theory

• Optimal Decision in Games

• Heuristic Alpha-Beta Tree Search

• Monte Carlo Tree Search


Infrastructure for Search Algorithms
• Search algorithms require a data structure to keep track of the search tree that is
being constructed.

• For each node n of the tree, we have a structure that contains four components:

• n.STATE: the state in the state space to which the node corresponds;

• n.PARENT: the node in the search tree that generated this node;

• n.ACTION: the action that was applied to the parent to generate the node;

• n.PATH-COST: the cost, traditionally denoted by g(n), of the path from the initial
state to the node, as indicated by the parent pointers.
• Given the components for a parent node, it is easy to see how to compute the necessary
components for a child node.

The function CHILD-NODE takes a parent node and an action and returns the resulting
child node:
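As a minimal Python sketch of this node structure and of CHILD-NODE, assuming a problem object that exposes result and step_cost methods (illustrative names, not fixed by the slides):

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class Node:
    state: Any                       # n.STATE
    parent: Optional["Node"] = None  # n.PARENT
    action: Any = None               # n.ACTION
    path_cost: float = 0.0           # n.PATH-COST, traditionally g(n)

def child_node(problem, parent, action):
    """CHILD-NODE: build a child node from a parent node and an action."""
    state = problem.result(parent.state, action)
    cost = parent.path_cost + problem.step_cost(parent.state, action)
    return Node(state=state, parent=parent, action=action, path_cost=cost)
```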

• A node is a bookkeeping data structure used to represent the search tree.


• A state corresponds to a configuration of the world.
• Now that we have nodes, we need somewhere to put them.

• The frontier needs to be stored in such a way that the search algorithm can
easily choose the next node to expand according to its preferred strategy.

• The appropriate data structure for this is a queue.

• The operations on a queue are as follows:

• EMPTY?(queue) returns true only if there are no more elements in the queue.

• POP(queue) removes the first element of the queue and returns it.

• INSERT(element, queue) inserts an element and returns the resulting queue.


• Queues are characterized by the order in which they store the inserted nodes.

• Three common variants are the first-in, first-out or FIFO queue, which pops the
oldest element of the queue;

• The last-in, first-out or LIFO queue (also known as a stack), which pops the newest
element of the queue;

• and the priority queue, which pops the element of the queue with the highest priority
according to some ordering function.
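As a rough sketch, the three variants map directly onto standard Python containers (illustrative only, not tied to any particular search implementation):

```python
import heapq
from collections import deque

# FIFO queue: POP removes the oldest element (used by breadth-first search).
fifo = deque()
fifo.append("a")
fifo.append("b")
assert fifo.popleft() == "a"

# LIFO queue (stack): POP removes the newest element (used by depth-first search).
lifo = []
lifo.append("a")
lifo.append("b")
assert lifo.pop() == "b"

# Priority queue: POP removes the element with the best priority
# according to an ordering function (used by best-first search).
pq = []
heapq.heappush(pq, (5, "b"))
heapq.heappush(pq, (2, "a"))
assert heapq.heappop(pq) == (2, "a")
```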

• Repeated states can be detected by keeping the reached states in a hash table, which requires states to be stored in a canonical form, so that logically equal states hash to the same entry. For example, in the traveling salesperson problem, the hash table needs to know that the set of visited cities {Bucharest, Urziceni, Vaslui} is the same as {Urziceni, Vaslui, Bucharest}.
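One way to get such a canonical form in Python is an order-insensitive, hashable set; a tiny sketch of the idea (not a full TSP implementation):

```python
# frozenset is hashable and ignores insertion order, so both keys land
# on the same hash-table entry, as required.
reached = {}
reached[frozenset({"Bucharest", "Urziceni", "Vaslui"})] = "some node"
assert frozenset({"Urziceni", "Vaslui", "Bucharest"}) in reached
```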
Measuring Problem-Solving Performance
• We can evaluate an algorithm’s performance in four ways:

• Completeness: Is the algorithm guaranteed to find a solution when there is one?

• Optimality: Does the strategy find the optimal solution?

• Time complexity: How long does it take to find a solution?

• Space complexity: How much memory is needed to perform the search?

• The typical measure is the size of the state space graph, |V| + |E|, where V is the set of
vertices (nodes) of the graph and E is the set of edges (links).
• Complexity is expressed in terms of three quantities:

• b, the branching factor or maximum number of successors of any node;

• d, the depth of the shallowest goal node (i.e., the number of steps along the
path from the root);

• m, the maximum length of any path in the state space.

• Time is often measured in terms of the number of nodes generated during the search, and space in terms of the maximum number of nodes stored in memory.
Chapter 2: Adversarial Search and Games
• In this chapter we cover competitive environments, in which two or more agents
have conflicting goals, giving rise to adversarial search problems.

• We will concentrate on games, such as chess, Go, and poker. For AI researchers,
the simplified nature of these games is a plus: the state of a game is easy to
represent, and agents are usually restricted to a small number of actions whose
effects are defined by precise rules.

• Physical games, such as croquet and ice hockey, have more complicated
descriptions, a larger range of possible actions, and rather imprecise rules defining
the legality of actions.
Game Theory
• There are at least three stances we can take towards multi-agent environments.

The first stance,

• when there are a very large number of agents, is to consider them in the aggregate as an economy,
allowing us to do things like predict that increasing demand will cause prices to rise, without having
to predict the action of any individual agent.

The second stance,

• is to consider adversarial agents as just a part of the environment, a part that makes the
environment nondeterministic.

• But if we model the adversaries in the same way that, say, rain sometimes falls and sometimes
doesn’t, we miss the idea that our adversaries are actively trying to defeat us, whereas the rain
supposedly has no such intention.
The third stance,

• The third stance involves explicitly modelling adversarial agents using techniques related to
adversarial game-tree search.

• The focus is on defining the optimal move in a restricted class of games using minimax
search.

• Minimax search is a generalized form of AND–OR search, allowing for efficient exploration
of game trees.

• The chapter introduces the concept of pruning, which involves ignoring portions of the search
tree that do not impact the determination of the optimal move.

• Pruning enhances the efficiency of the search process.

• Finally, it discusses the necessity of deciding when to cut off the search, recognizing practical constraints.
Two Player Zero-Sum Games
• The games most commonly studied within AI (such as chess and Go) are what game theorists call
deterministic, two-player, turn-taking, perfect information, zero-sum games.

• “Perfect information” is a synonym for “fully observable,”

• “zero-sum” means that what is good for one player is just as bad for the other: there is no “win-win”
outcome.

• For games we often use the term move as a synonym for “action” and position as a synonym for
“state.”

• We will call our two players MAX and MIN, for reasons that will soon become obvious. MAX
moves first, and then the players take turns moving until the game is over.

• At the end of the game, points are awarded to the winning player and penalties are given to the loser.
A Game can be Formally Defined with the Following Elements:

• S0: The initial state, which specifies how the game is set up at the start.

• TO-MOVE(s): The player whose turn it is to move in state s.

• ACTIONS(s): The set of legal moves in state s.

• RESULT(s, a): The transition model, which defines the state resulting from taking action a in
state s.

• IS-TERMINAL(s): A terminal test, which is true when the game is over and false otherwise.
States where the game has ended are called terminal states.

• UTILITY(s, p): A utility function (also called an objective function or payoff function), which
defines the final numeric value to player p when the game ends in terminal state s. In chess, the
outcome is a win, loss, or draw, with values 1, 0, or 1/2.
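As an illustrative sketch, these six elements map onto a small Python interface; the method names mirror the definitions above, and concrete games would subclass it (this is not code from the slides):

```python
from typing import Any, List

class Game:
    """Skeleton of the formal game definition: S0, TO-MOVE, ACTIONS,
    RESULT, IS-TERMINAL, UTILITY."""

    initial_state: Any                  # S0

    def to_move(self, s) -> str:        # TO-MOVE(s): player to move in s
        raise NotImplementedError

    def actions(self, s) -> List[Any]:  # ACTIONS(s): legal moves in s
        raise NotImplementedError

    def result(self, s, a) -> Any:      # RESULT(s, a): transition model
        raise NotImplementedError

    def is_terminal(self, s) -> bool:   # IS-TERMINAL(s): terminal test
        raise NotImplementedError

    def utility(self, s, p) -> float:   # UTILITY(s, p): e.g. 1, 0, or 1/2
        raise NotImplementedError
```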
Optimal Decision in Games
• MAX wants to find a sequence of actions leading to a win, but MIN has something to say about it.

• This means that MAX’s strategy must be a conditional plan—a contingent strategy specifying a
response to each of MIN’s possible moves.

• In games that have a binary outcome (win or lose), we could use AND–OR search to generate the
conditional plan.

• For games with multiple outcome scores, we need a slightly more general algorithm called minimax
search.

• The possible moves for MAX at the root node are labelled a1, a2, and a3. The possible replies to a1
for MIN are b1, b2, b3, and so on. This particular game ends after one move each by MAX and MIN.

• In game parlance, we say that this tree is one move deep, consisting of two half-moves, each of which is called a ply.
• The utilities of the terminal states in this game range from 2 to 14. The minimax value of a terminal
state is just its utility.

• The optimal strategy can be determined by working out the minimax value of each state in the tree,
which we write as MINIMAX(s).

• The minimax value is the utility (for MAX) of being in that state, assuming that both players play
optimally from there to the end of the game.

• MAX prefers to move to a state of maximum value when it is MAX’s turn to move, and MIN
prefers a state of minimum value (that is, minimum value for MAX and thus maximum value for
MIN).
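These cases combine into the standard recursive definition of the minimax value, reconstructed here since the slides state it only in prose:

```latex
\mathrm{MINIMAX}(s) =
\begin{cases}
\mathrm{UTILITY}(s, \mathrm{MAX}) & \text{if IS-TERMINAL}(s) \\
\max_{a \in \mathrm{ACTIONS}(s)} \mathrm{MINIMAX}(\mathrm{RESULT}(s, a)) & \text{if TO-MOVE}(s) = \mathrm{MAX} \\
\min_{a \in \mathrm{ACTIONS}(s)} \mathrm{MINIMAX}(\mathrm{RESULT}(s, a)) & \text{if TO-MOVE}(s) = \mathrm{MIN}
\end{cases}
```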
• The first MIN node, labelled B, has three successor states with values 3, 12, and 8, so its
minimax value is 3.

• Similarly, the other two MIN nodes have minimax value 2. The root node is a MAX node;
its successor states have minimax values 3, 2, and 2; so it has a minimax value of 3.
The Minimax Search
• A search algorithm that finds the best move for MAX by trying all actions and choosing
the one whose resulting state has the highest MINIMAX value.

• It is a recursive algorithm that proceeds all the way down to the leaves of the tree and
then backs up the minimax values through the tree as the recursion unwinds.

• The minimax algorithm performs a complete depth-first exploration of the game tree.

• If the maximum depth of the tree is m and there are b legal moves at each point, then the
time complexity of the minimax algorithm is O(b^m).

• The space complexity is O(bm) for an algorithm that generates all actions at once, or
O(m) for an algorithm that generates actions one at a time.
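A minimal Python sketch of this recursion over the Game interface sketched earlier, assuming strictly alternating turns (the function names are illustrative):

```python
import math

def minimax_decision(game, state):
    """Return the action for MAX whose resulting state has the
    highest backed-up minimax value."""
    player = game.to_move(state)
    return max(game.actions(state),
               key=lambda a: min_value(game, game.result(state, a), player))

def max_value(game, state, player):
    if game.is_terminal(state):
        return game.utility(state, player)
    v = -math.inf
    for a in game.actions(state):    # depth-first exploration of the tree
        v = max(v, min_value(game, game.result(state, a), player))
    return v

def min_value(game, state, player):
    if game.is_terminal(state):
        return game.utility(state, player)
    v = math.inf
    for a in game.actions(state):
        v = min(v, max_value(game, game.result(state, a), player))
    return v
```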
Optimal Decision in Multiplayer Game
• First, we need to replace the single value for each node with a vector of values.

• For example, in a three-player game with players A, B, and C, a vector (va, vb, vc)
is associated with each node.

• The UTILITY function must return a vector of utilities.

• In the state where player C chooses what to do, each of C's choices leads to a terminal state
with a known utility vector, and C picks the move that maximizes its own component vc.

• This means that if that state, call it X, is reached, subsequent play will lead to the terminal
state whose utility vector C prefers; that vector becomes the backed-up value of X.
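A short sketch of this vector-valued backup; utility_vector is an assumed method (not part of the interface above) returning a mapping from player to utility:

```python
def vector_value(game, state):
    """Back up utility vectors in a multiplayer game: whoever moves
    picks the child whose vector maximizes their own component."""
    if game.is_terminal(state):
        return game.utility_vector(state)   # e.g. {"A": va, "B": vb, "C": vc}
    player = game.to_move(state)
    children = (vector_value(game, game.result(state, a))
                for a in game.actions(state))
    return max(children, key=lambda vec: vec[player])
```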
• Anyone who plays multiplayer games, such as Diplomacy, becomes aware that much more is going
on than in two-player games.

• Multiplayer games usually involve alliances, whether formal or informal, among the players.

• Players A and B find themselves in weak positions, with limited resources and vulnerable territories.

• Player C, on the other hand, is in a stronger position, controlling key territories and having a superior
army.

• Recognizing their vulnerability, players A and B may decide to form an alliance against player C.

• A and B collaborate to attack and weaken player C. This collaboration is a strategic move to ensure
their mutual survival.

• It's within their self-interest to consider betraying the alliance if they believe they can gain an
advantage by turning on each other.
Alpha-Beta Pruning
• Alpha-beta pruning is a modified version of the minimax algorithm, i.e., an optimization technique for
the minimax algorithm.

• Pruning is a technique by which we can compute the correct minimax decision without checking
every node of the game tree.

• It involves two threshold parameters, alpha and beta, for future expansion, which is why it is called
alpha-beta pruning. It is also called the Alpha-Beta algorithm.

• The two parameters can be defined as:

• Alpha: The best (highest-value) choice we have found so far at any point along the path of the
Maximizer. The initial value of alpha is -∞.

• Beta: The best (lowest-value) choice we have found so far at any point along the path of the
Minimizer. The initial value of beta is +∞.

• The main condition required for alpha-beta pruning (the cutoff condition) is: α >= β

 Key points about alpha-beta pruning:

• The Max player will only update the value of alpha.

• The Min player will only update the value of beta.

• While backtracking the tree, node values, rather than alpha and beta values, are passed up to
parent nodes.

• Alpha and beta values are passed down only to child nodes.
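A compact sketch of the algorithm on a toy tree representation, where a leaf is a number and an internal node is a list of children (an illustration, not the slides' pseudocode):

```python
import math

def alphabeta(node, alpha=-math.inf, beta=math.inf, maximizing=True):
    """Alpha-beta on a toy tree: a leaf is a number, an internal node
    is a list of children. Returns the backed-up minimax value."""
    if isinstance(node, (int, float)):   # terminal state: return its utility
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, value)    # Max updates only alpha
            if alpha >= beta:            # the cutoff condition from above
                break                    # prune the remaining children
        return value
    else:
        value = math.inf
        for child in node:
            value = min(value, alphabeta(child, alpha, beta, True))
            beta = min(beta, value)      # Min updates only beta
            if alpha >= beta:
                break
        return value
```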
Working of Alpha-Beta Pruning:

• Step 1: In the first step, the Max player starts from node A, where α = -∞ and β = +∞. These
values of alpha and beta are passed down to node B, where again α = -∞ and β = +∞, and node B
passes the same values to its child D.
• Step 2: At node D, the value of α is calculated, as it is Max's turn. The value of α is compared
first with 2 and then with 3; max(2, 3) = 3 becomes the value of α at node D, and the node value is also 3.

• Step 3: Now the algorithm backtracks to node B, where the value of β changes, as it is Min's turn.
β = +∞ is compared with the backed-up node value: min(∞, 3) = 3, hence at node B now
α = -∞ and β = 3.

• In the next step, the algorithm traverses the next successor of node B, which is node E, and the values
α = -∞ and β = 3 are passed down as well.

• Step 4: At node E, Max takes its turn, and the value of alpha changes. The current value of alpha
is compared with 5, so max(-∞, 5) = 5; hence at node E, α = 5 and β = 3. Since α >= β, the right
successor of E is pruned: the algorithm does not traverse it, and the value at node E becomes 5.
• Step 5: Next, the algorithm again backtracks the tree, from node B to
node A. At node A, the value of alpha is changed to the maximum
available value, max(-∞, 3) = 3, with β = +∞. These two values are now
passed to the right successor of A, which is node C.

• At node C, α = 3 and β = +∞, and the same values are passed on to node F.

• Step 6: At node F, the value of α is again compared with the left child,
which is 0: max(3, 0) = 3. It is then compared with the right child, which is 1:
max(3, 1) = 3. α remains 3, but the node value of F becomes 1.
• Step 7: Node F returns the node value 1 to node C. At C, α = 3 and β = +∞; here the value
of beta changes, since it is compared with 1: min(∞, 1) = 1. Now at C, α = 3 and β = 1,
which again satisfies the condition α >= β, so the next child of C, which is G, is
pruned, and the algorithm does not compute the entire subtree of G.
• Step 8: C now returns the value 1 to A, where the best value for A is max(3, 1) = 3.
The final game tree shows which nodes were computed and which were never
computed. Hence the optimal value for the maximizer is 3 in this example.
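The walkthrough can be checked against the sketch above. The pruned leaves (the right child of E and both children of G) are never given in the text, so the values 9, 7, and 5 below are placeholders; alpha-beta never examines them:

```python
# The tree from the walkthrough: A -> [B, C], B -> [D, E], C -> [F, G].
# D's leaves are 2 and 3, E's visited leaf is 5, F's leaves are 0 and 1.
tree = [[[2, 3], [5, 9]],   # B = [D, E]; E's 9 is a pruned placeholder
        [[0, 1], [7, 5]]]   # C = [F, G]; G's 7 and 5 are pruned placeholders
print(alphabeta(tree))      # -> 3, the optimal value for the maximizer
```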
Move Ordering
• Move ordering is a critical factor in the efficiency of Alpha-Beta Pruning. It allows the best
moves to be explored early in the search.
There are two types of move ordering in alpha-beta pruning:
• Worst ordering: In some cases of alpha-beta pruning, none of the nodes are pruned by the
algorithm, and it works like the standard minimax algorithm.
• Ideal ordering: In some cases of alpha-beta pruning, a large number of the nodes are pruned by the
algorithm.
• To improve the number of cut-offs made by the alpha-beta algorithm, the program attempts
to order moves so that better moves are considered first.
• Alpha-beta pruning effectiveness is highly dependent on move ordering.
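As a sketch of how move ordering plugs in, the alpha-beta routine above can sort children with a heuristic before the loop; the estimate function here is a hypothetical placeholder, not something defined in the slides:

```python
import math

def ordered_alphabeta(node, alpha=-math.inf, beta=math.inf,
                      maximizing=True, estimate=None):
    """Alpha-beta with optional move ordering. `estimate` is a
    hypothetical heuristic scoring a subtree from Max's point of view:
    with a good estimate, cutoffs come early (ideal ordering); with a
    misleading one, the search degrades toward plain minimax (worst
    ordering)."""
    if isinstance(node, (int, float)):
        return node
    children = node
    if estimate is not None:
        # Best-first for the player to move: highest estimates first
        # for Max, lowest estimates first for Min.
        children = sorted(node, key=estimate, reverse=maximizing)
    value = -math.inf if maximizing else math.inf
    for child in children:
        v = ordered_alphabeta(child, alpha, beta, not maximizing, estimate)
        if maximizing:
            value = max(value, v)
            alpha = max(alpha, value)
        else:
            value = min(value, v)
            beta = min(beta, value)
        if alpha >= beta:
            break
    return value
```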
HEURISTIC ALPHA-BETA TREE SEARCH
Monte Carlo Tree Search algorithm
• Games like tic-tac-toe, checkers and chess can arguably be solved using the minimax
algorithm.

• However, things can get a little tricky when there are a large number of potential actions to
be taken at each state. This is because minimax explores all the nodes available.

• It can become frighteningly difficult to solve a complex game like Go in a finite amount of
time.

• Go has a branching factor of approximately 300 i.e. from each state there are around 300
actions possible, whereas chess typically has around 30 actions to choose from.

• Monte Carlo Tree Search, invented in 2007, provides a possible solution.


• The basic MCTS algorithm is simple: a search tree is built, node by node, according to the
outcomes of simulated playouts. The process can be broken down into the following steps:
• Selection: Select good child nodes, starting from the root node R, that represent states
leading to a better overall outcome (a win).

• Expansion: If the selected leaf node L is not a terminal node (i.e. it does not end the game), then create one or
more child nodes and select one (C).

• Simulation (rollout): Run a simulated playout from C until a result is achieved.

• Backpropagation: Update the nodes on the current move sequence with the simulation result.
UCB Value
• UCB1, the upper confidence bound for a child node i, is given by the following formula
(reconstructed here in the form used by the calculations in the iterations below):

UCB1(i) = Vi + 2 * sqrt(ln(N) / ni)

where,
• Vi is the average reward/value of all nodes beneath this node
• N is the number of times the parent node has been visited, and
• ni is the number of times the child node i has been visited
Rollout
• What do we mean by a rollout? From the selected node, we randomly choose an
action at each step and simulate the game until a terminal state is reached, receiving a reward
when the game is over.
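Putting the four phases and UCB1 together, a minimal MCTS sketch might look as follows, reusing the Game interface from earlier; the reward handling is simplified to a single point of view, so this is an illustration rather than a production implementation:

```python
import math
import random

class MCTSNode:
    def __init__(self, state, parent=None, action=None):
        self.state = state
        self.parent = parent
        self.action = action   # move that led here from the parent
        self.children = []
        self.t = 0.0           # total score, the slides' t
        self.n = 0             # visit count, the slides' n

def ucb1(node, c=2.0):
    """UCB1 with the exploration constant 2 used in the calculations below."""
    if node.n == 0:
        return math.inf        # unvisited nodes are always tried first
    return node.t / node.n + c * math.sqrt(math.log(node.parent.n) / node.n)

def mcts(game, root_state, iterations=1000):
    root = MCTSNode(root_state)
    # Rewards are scored from the root player's point of view; a full
    # implementation would track the perspective at every ply.
    root_player = game.to_move(root_state)
    for _ in range(iterations):
        # 1. Selection: descend by maximizing UCB1 until a leaf is reached.
        node = root
        while node.children:
            node = max(node.children, key=ucb1)
        # 2. Expansion: grow the tree by one level at an already-visited leaf.
        if node.n > 0 and not game.is_terminal(node.state):
            node.children = [MCTSNode(game.result(node.state, a), node, a)
                             for a in game.actions(node.state)]
            node = node.children[0]
        # 3. Simulation (rollout): play random moves to a terminal state.
        state = node.state
        while not game.is_terminal(state):
            state = game.result(state, random.choice(game.actions(state)))
        reward = game.utility(state, root_player)
        # 4. Backpropagation: update t and n on the path back to the root.
        while node is not None:
            node.t += reward
            node.n += 1
            node = node.parent
    # Play the move leading to the most-visited child.
    return max(root.children, key=lambda child: child.n).action
```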
• Iteration 1: We start with an initial state S0. Here, we have actions a1 and a2, which
lead to states S1 and S2, each with a total score t and a number of visits n. But how do
we choose between the two child nodes?

• This is where we calculate the UCB values for both child nodes and take whichever node
maximises that value. Since neither node has been visited yet, the second term is infinite
for both; hence, we just take the first node.
• We are now at a leaf node where we need to check whether we have visited it. As it turns
out, we haven’t. In this case, on the basis of the algorithm, we do a rollout all the way
down to the terminal state. Let’s say the value of this rollout is 20.
• Now comes the 4th phase, or the backpropagation phase. The value of the leaf node (20) is
backpropagated all the way to the root node. So now, t = 20 and n = 1 for nodes S1 and S0.
• Iteration 2:

• We go back to the initial state and ask which child node to visit next. Once again, we calculate the UCB values, which
will be 20 + 2 * sqrt(ln(1)/1) = 20 for S1 and infinity for S2. Since S2 has the higher value, we will choose that node.

• A rollout is done at S2, giving the value 10, which is backpropagated to the root node. The value at the root node
is now 30.
Iteration 3:
• S1 now has the higher UCB1 value (20 + 2 * sqrt(ln(2)/1) ≈ 21.7, versus ≈ 11.7 for S2), and hence the expansion is done at S1.
Iteration 4:

• We again have to choose between S1 and S2. The UCB value comes out to be 11.48 for S1 and
12.10 for S2, so S2 is selected for the next expansion.
END OF MODULE 2
