Lab Manual of ISL
“Experiment List”
Exp. No   Experiments Name                                                   Lab Outcome
1         Tutorial exercise for: Design of Intelligent System using PEAS.   LO1
Experiment No. 1
Aim: Tutorial exercise for: Design of Intelligent System using PEAS.
Theory:
PEAS stands for Performance Measures, Environment, Actuators, and Sensors.
• Performance Measure: It is the objective function used to judge the performance of the agent. For example, in the
case of a pick-and-place robot, the number of correct parts in a bin can be the performance measure.
• Environment: It is the real environment in which the agent has to deliberate and act.
• Actuators: These are the tools, equipment or organs with which the agent performs actions in the environment.
They work as the output of the agent.
• Sensors: These are the tools or organs with which the agent captures the state of the environment. They work as
the input to the agent.
Example:
PEAS descriptor for Automated Car Driver:
Performance Measure:
➢ Safety: The automated system should be able to drive the car safely without crashing into anything.
➢ Optimum speed: Automated system should be able to maintain the optimal speed depending upon
the surroundings.
➢ Comfortable journey: Automated system should be able to give a comfortable journey to the end
user.
Environment:
➢ Roads: Automated car driver should be able to drive on any kind of a road ranging from city roads
to highway.
➢ Traffic conditions: You will find different sort of traffic conditions for different type of roads.
Actuators:
➢ Steering wheel: used to direct car in desired directions.
➢ Accelerator, gear: To increase or decrease speed of the car.
Sensors: To take input from the environment while driving the car.
➢ Cameras
➢ Odometer
➢ GPS
➢ speedometer
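The PEAS components above map naturally onto the interfaces of an agent program. Below is a minimal illustrative sketch (Java; all type and member names are my own assumptions, not a standard API) of how the automated car driver's sensors, actuators and decision step could be typed:

// Sensors: the input side of the agent (camera, odometer, GPS, speedometer).
class Percept {
    double speed;           // from the speedometer
    double distanceAhead;   // e.g., estimated from the camera
    double[] gpsPosition;   // from the GPS
}

// Actuators: the output side of the agent (steering wheel, accelerator, gear).
enum Action { STEER_LEFT, STEER_RIGHT, ACCELERATE, BRAKE }

// The agent maps percepts to actions so as to maximise the performance
// measure (safety, optimum speed, comfortable journey).
interface DriverAgent {
    Action decide(Percept percept);
}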
Questions:
1. Medical diagnosis system
2. Part-picking robot
3. Crossword puzzle
4. Taxi Driving
5. Refinery Controller
6. Interactive English Tutor
7. Internet Shopping Agent
Experiment No. 2
Aim: Problem Definition with State Space Representation
Theory:
• Formulate a problem as a state space search by showing the legal problem states, the legal operators, and
the initial and goal states.
• A state is defined by the specification of the values of all attributes of interest in the world
• An operator changes one state into the other; it has a precondition which is the value of certain attributes
prior to the application of the operator, and a set of effects, which are the attributes altered by the operator
• The initial state is where you start
• The goal state is the partial description of the solution
The states of the 8-tile puzzle are the different permutations of the tiles within the frame.
Standard formulation:
States: a state specifies the location of each of the 8 tiles and the blank in one of the nine squares.
Initial state: any state can be designated as the initial state.
Goal: many goal configurations are possible; one such configuration is shown in the figure.
Legal moves (or operators): they generate the legal states that result from trying the four actions:
• Blank moves left
• Blank moves right
• Blank moves up
• Blank moves down
Path cost: Each step costs 1, so the path cost is the number of steps in the path.
The tree diagram showing the search space is shown in the figure.
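This formulation can be written down directly as code. A minimal sketch (Java; the interface and method names are my own illustrative choices) showing how the pieces named above — initial state, goal test, operators with preconditions and effects, and path cost — fit together:

import java.util.List;

// A state-space problem: S is the state type, A the operator/action type.
interface Problem<S, A> {
    S initialState();                 // where you start
    boolean isGoal(S state);          // (partial) description of the solution
    List<A> actions(S state);         // operators whose preconditions hold in this state
    S result(S state, A action);      // effects: the state produced by applying the operator
    int stepCost(S state, A action);  // 1 for every move in the 8-puzzle
}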
Questions:
1. Towers of Hanoi
2. Graphs versus Trees
3. Vacuum world
4. Water jug
Extra:
Implement Water Jug Problem Using Problem Formulation
Aim: Implement water jug problem using BFS or DFS (Un-Informed Search).
Theory:
Problem Statement
For further explanation read Section 3.5 of Chapter 3.
In the water jug problem in Artificial Intelligence, we are provided with two jugs: one with a capacity
of 3 gallons and the other with a capacity of 4 gallons. There is no other measuring equipment
available, and the jugs do not have any markings on them. So, the agent's task here is to get exactly
2 gallons of water into the 4-gallon jug using only these two jugs and no other material. Initially,
both our jugs are empty.
Here, let x denote the 4-gallon jug and y denote the 3-gallon jug.
8.  (x, y)   if x + y ≥ 3 and x > 0   →  (x − (3 − y), 3)   Pour water from the 4-gallon jug into the 3-gallon jug until the 3-gallon jug is full.
9.  (x, y)   if x + y ≤ 4 and y > 0   →  (x + y, 0)         Pour all the water from the 3-gallon jug into the 4-gallon jug.
10. (x, y)   if x + y ≤ 3 and x > 0   →  (0, x + y)         Pour all the water from the 4-gallon jug into the 3-gallon jug.
The listed production rules contain all the actions that could be performed by the agent in transferring the
contents of the jugs. But, to solve the water jug problem in a minimum number of moves, the following
sequence of states should be produced:

S.No.   4-gallon jug (x)   3-gallon jug (y)   Action performed
1       0                  0                  Initial state
2       0                  3                  Fill the 3-gallon jug
3       3                  0                  Pour all water from the 3-gallon jug into the 4-gallon jug
4       3                  3                  Fill the 3-gallon jug again
5       4                  2                  Pour water from the 3-gallon jug into the 4-gallon jug until it is full
6       0                  2                  Empty the 4-gallon jug
7       2                  0                  Pour the remaining water from the 3-gallon jug into the 4-gallon jug

On reaching the 7th state, the 4-gallon jug holds exactly 2 gallons, which is our goal state. Therefore, at this
state, our problem is solved.
Program:
Problem Statement: There are two jugs (with capacities of, say, 3 and 5 liters) and we need to fill them in
such a way that the 5-liter jug ends up containing exactly 4 liters of water.
/* jug1 array holds the successive water levels of the smaller jug and jug2 array holds those of the larger jug */
int jug1[] = new int[count];
int jug2[] = new int[count];
int i = 0;
System.out.println();
}
}
Output:
Enter odd capacity of small tank: 3
Enter odd capacity of large tank: 5
JUG1: 3 JUG2: 0
JUG1: 0 JUG2: 3
JUG1: 3 JUG2: 3
JUG1: 1 JUG2: 5
JUG1: 1 JUG2: 0
JUG1: 0 JUG2: 1
JUG1: 3 JUG2: 1
JUG1: 0 JUG2: 4
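The listing above survives only as a fragment, so here is a compact, self-contained sketch of the same exercise as BFS over jug states (Java). The class name, state encoding and output format are my own choices; the capacities 3 and 5 with target 4 follow the problem statement above, and they are hardcoded for brevity instead of read with a Scanner:

import java.util.*;

public class WaterJugBFS {
    public static void main(String[] args) {
        final int CAP1 = 3, CAP2 = 5, TARGET = 4;      // small jug, large jug, goal in large jug
        Queue<int[]> queue = new LinkedList<>();
        Map<String, String> parent = new HashMap<>();  // state -> previous state, for printing the path
        queue.add(new int[]{0, 0});
        parent.put("0,0", null);
        while (!queue.isEmpty()) {
            int[] s = queue.poll();
            int a = s[0], b = s[1];
            if (b == TARGET) { printPath(parent, a + "," + b); return; }
            // all legal moves: fill either jug, empty either jug, pour in both directions
            int[][] next = {
                {CAP1, b}, {a, CAP2}, {0, b}, {a, 0},
                {a - Math.min(a, CAP2 - b), b + Math.min(a, CAP2 - b)},  // pour small -> large
                {a + Math.min(b, CAP1 - a), b - Math.min(b, CAP1 - a)}   // pour large -> small
            };
            for (int[] n : next) {
                String key = n[0] + "," + n[1];
                if (!parent.containsKey(key)) {        // visit each state only once
                    parent.put(key, a + "," + b);
                    queue.add(n);
                }
            }
        }
        System.out.println("No solution");
    }

    // walk the parent links back to the start and print JUG1/JUG2 levels in order
    static void printPath(Map<String, String> parent, String state) {
        Deque<String> path = new ArrayDeque<>();
        for (String s = state; s != null; s = parent.get(s)) path.push(s);
        for (String s : path) {
            String[] p = s.split(",");
            System.out.println("JUG1: " + p[0] + " JUG2: " + p[1]);
        }
    }
}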
Experiment No 3
Uninformed Search Techniques
Aim: Path finding in maze using depth-first search (DFS).
Theory:
1. Maze generation algorithms are automated methods for the creation of mazes.
2. A maze can be generated by starting with a predetermined arrangement of cells (most
commonly a rectangular grid but other arrangements are possible) with wall sites between
them.
3. This predetermined arrangement can be considered as a connected graph with the edges
representing possible wall sites and the nodes representing cells.
4. The purpose of the maze generation algorithm can then be considered to be making a
subgraph in which it is challenging to find a route between two particular nodes.
Depth-First Search:
1. Depth-first search (DFS) is an algorithm for traversing or searching tree or graph data
structures.
2. One starts at the root (selecting some arbitrary node as the root in the case of a graph) and
explores as far as possible along each branch before backtracking.
Program:
Depth-first search is an algorithm that can be used to generate a maze. The idea is really simple
and easy to implement using a recursive method or a stack.
Basically, you start from a random point and keep digging paths in one of 4 directions (up, right,
down, left) until you can't go any further. Once you are stuck, you take a step back until you find
an open path, and you continue digging from there. It's just the repetition of these steps. First of all,
I would like to explain the general idea a little deeper, which you can apply using your choice of
programming language. After you have the picture in your mind, you can take a look at the
sample code and applet in Java.
Explanation
Create a 2-dimensional int array with odd row and column sizes. 0 represents paths (orange cells)
and 1 represents walls (black cells).
Next, let's set the starting point. Generate odd numbers for row and col. Set that cell to 0. Use row
and col variables to keep track of the current location. On the picture above, it would be row = 3, col
= 5. For clarity, I will be filling the current cell with red.
Now, choose a random direction (up, right, down, or left) to move in. You will always be
moving by 2 cells. The picture above illustrates the current cell moving down. There are a couple of
things you need to check when you move. First, you need to check whether the cell 2 steps ahead in
that direction is outside of the maze. Then, you check whether the cell 2 steps ahead is a path (0) or a
wall (1). If it's a wall, you can move by setting these 2 cells to 0 (path). Update your current location,
which is row = 5, col = 5 at this moment.
As you keep digging as above, you will notice that you reach a dead end. In this case, keep moving
your current cell back to previous cells until you are able to move in a new direction. This is called
backtracking. The current location is at row = 7, col = 7, so you would be moving back to row = 7,
col = 5 in the picture above. You can implement this logic using a recursive method or a stack.
So you keep digging as the picture demonstrates. For a better visual, I changed the color of the arrow
every time it hits a dead end.
Lastly, this is the final result. With this size, it's just natural that the maze is quite simple. The
bigger the maze, the more complicated it will get.
Sample Applet
Using Recursive Method
After you choose your starting point, pass that information to the recursive method. In the recursive
method, you can do the following…
        return maze;
    }
}

/** Generate an array with random directions 1-4 */
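Since the listing above survives only in fragments, here is a compact self-contained sketch of the generator the text describes (Java; class and method names are my own): shuffle the four directions, always dig 2 cells at a time, and let the recursion unwind to backtrack.

import java.util.*;

public class MazeDFS {
    static int[][] maze;

    public static void main(String[] args) {
        int size = 15;                                // odd row/column size
        maze = new int[size][size];
        for (int[] row : maze) Arrays.fill(row, 1);   // 1 = wall, 0 = path
        maze[1][1] = 0;
        dig(1, 1);                                    // start from an odd cell
        for (int[] row : maze) {
            StringBuilder sb = new StringBuilder();
            for (int cell : row) sb.append(cell == 1 ? '#' : ' ');
            System.out.println(sb);
        }
    }

    static void dig(int row, int col) {
        // up, right, down, left in random order
        List<int[]> dirs = Arrays.asList(new int[]{-1, 0}, new int[]{0, 1}, new int[]{1, 0}, new int[]{0, -1});
        Collections.shuffle(dirs);
        for (int[] d : dirs) {
            int nr = row + 2 * d[0], nc = col + 2 * d[1];   // always move by 2 cells
            // check that 2 cells ahead is inside the maze and still a wall
            if (nr > 0 && nr < maze.length && nc > 0 && nc < maze[0].length && maze[nr][nc] == 1) {
                maze[row + d[0]][col + d[1]] = 0;   // open the wall in between
                maze[nr][nc] = 0;                   // open the destination cell
                dig(nr, nc);                        // returning from the recursion = backtracking
            }
        }
    }
}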
Experiment No 4
Aim: Implement 8-puzzle problem with heuristic function using hill climbing (informed search).
Theory:
In an 8-puzzle game, we need to rearrange some tiles to reach a predefined goal state. Consider
the following 8-puzzle board.
This is the goal state where each tile is in correct place. In this game, you will be given a board
where the tiles aren’t in the correct places. You need to move the tiles using the gap to reach the
goal state.
In the above figure, tiles 6, 7 and 8 are misplaced. So f (n) = 3 for this case.
For solving this problem with hill climbing search, we need to set a value for the heuristic. Suppose
the heuristic function h(n) is the lowest possible f(n) reachable from a given state. First, we need to know
all the possible moves from the current state. Then we have to calculate f(n) (the number of misplaced
tiles) for each possible move. Finally, we need to choose the path with the lowest possible f(n) (which
is our h(n), or heuristic).
Consider the figure above. Here, 3 moves are possible from the current state. For each state we
have calculated f (n). From the current state, it is optimal to move to the state with f (n) = 3 as it is
closer to the goal state. So we have our h (n) = 3.
However, do you really think we can guarantee that it will reach the goal state? What will you do
if you reach a state (not the goal state) from which there are no better neighbouring states? This
condition is called a local maximum, and it is the central problem of hill climbing search: we may
get stuck in a local maximum. In this scenario, you need to backtrack to a previous state and perform
the search again from there to get rid of the path leading to the local maximum.
What will happen if we reach a state where all the f(n) values are equal? This condition is called
a plateau. You need to select a state at random and perform the hill climbing search again!
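To make the heuristic concrete, here is a small sketch (Java; names my own) that counts misplaced tiles against this manual's goal state and evaluates the neighbours of the sample input given below, printing the resulting h(n):

import java.util.*;

public class MisplacedTiles {
    static final int[][] GOAL = {{1, 2, 3}, {8, 0, 4}, {7, 6, 5}};   // goal state used in this manual

    // f(n): number of tiles (not counting the gap) out of place w.r.t. GOAL
    static int misplaced(int[][] b) {
        int f = 0;
        for (int i = 0; i < 3; i++)
            for (int j = 0; j < 3; j++)
                if (b[i][j] != 0 && b[i][j] != GOAL[i][j]) f++;
        return f;
    }

    // all states reachable by one move (blank moves up/down/left/right)
    static List<int[][]> neighbours(int[][] b) {
        List<int[][]> result = new ArrayList<>();
        int r = 0, c = 0;
        for (int i = 0; i < 3; i++)
            for (int j = 0; j < 3; j++)
                if (b[i][j] == 0) { r = i; c = j; }
        int[][] moves = {{-1, 0}, {1, 0}, {0, -1}, {0, 1}};
        for (int[] m : moves) {
            int nr = r + m[0], nc = c + m[1];
            if (nr >= 0 && nr < 3 && nc >= 0 && nc < 3) {
                int[][] copy = new int[3][3];
                for (int i = 0; i < 3; i++) copy[i] = b[i].clone();
                copy[r][c] = copy[nr][nc];   // slide the tile into the gap
                copy[nr][nc] = 0;
                result.add(copy);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int[][] board = {{1, 2, 3}, {7, 8, 4}, {6, 0, 5}};   // the sample input of this experiment
        // hill climbing picks the neighbour with the lowest f(n); that value is h(n)
        int best = Integer.MAX_VALUE;
        for (int[][] n : neighbours(board)) best = Math.min(best, misplaced(n));
        System.out.println("h(n) = " + best);
    }
}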
Input:
You will be given the initial state of the board. The input will be given row by row of the 3x3
grid of the 8-puzzle game. A digit from 1 to 8 denotes a tile's number, and a 0 denotes the gap
of the board.
123
784
605
Task 1: Print the total cost (number of steps) needed to reach the goal state (if possible). Report if
you reach a local maximum and get stuck, and print the board state in this scenario.
Task 2: You need to get rid of any local maximum and reach the goal state anyway. Print the total
cost needed to reach the goal state. Mention where you have backtracked to avoid a local maximum
(if any).
Program:
Main.java
public class Main {
    public static void main(String[] args) {
        Eight_Puzzle eight_Puzzle = new Eight_Puzzle();
        eight_Puzzle.initializations();
    }
}
Priority.java
import java.util.Arrays;
public class Priority {
    if (preState != null) {//parent exists
        nodeArray = getParentRemovedNodeArray(nodeArray, preState);//remove parent
    }
Eight_Puzzle.java
import java.util.Random;
import java.util.Stack;
public class Eight_Puzzle {
//solution state of the 8-puzzle game
int goal_state[][] = {
{1, 2, 3},
{8, 0, 4},
{7, 6, 5}
};
//problem board of 8-puzzle game
int game_board[][] = {
{2, 6, 3},
{1, 0, 4},
{8, 7, 5}
};
/* one local maxima input example input
{2, 8, 3},
{1, 6, 4},
{7, 0, 5}
*/
/* one solved input example
{1, 3, 4},
{8, 2, 5},
{0, 7, 6}
{2, 0, 6},
{1, 4, 3},
{8, 7, 5}
//nice backtrack not solved in local maxima
{2, 6, 3},
{1, 0, 4},
{8, 7, 5}
*/
/* one no backtrack local maxima test input example
{1, 4, 0},
{8, 3, 2},
{7, 6, 5}
*/
/* one impossible local maxima test input example
{1, 2, 0},
{8, 3, 4},
{7, 6, 5}
*/
public void initializations() {
    System.out.println("=========================================");
    printState(game_board, "initial problem state");
    System.out.println("initial empty tile position: " + emptyTile_row + ", " + emptyTile_col);
    System.out.println("initial fn (number of misplaced tiles): " + min_fn);
    System.out.println("=========================================");
//start hill climbing search
try {
    hill_climbing_search();
} catch (Exception e) {
    System.out.println("Goal can not be reached, found closest solution state");
    printState(min_fn_node.state, "---------solution state------with min fn " + min_fn);
}
}
//start hill climbing search for 8-puzzle problem
public void hill_climbing_search() throws Exception {
while (true) {
System.out.println(">=============================
==
=========<");
System.out.println("cost/steps: " + (++stepCounter));
System.out.println(" ");
//Priority.preState = game_board;//change pre state
Node lowestPossible_fn_node = getLowestPossible_fn_node();
addToStackState(Priority.neighbors_nodeArray);//add neighbors to stack in high-to-low fn order
printState(lowestPossible_fn_node.state, " new state");
//print all fn values
// System.out.print("all sorted fn of current state: ");
// for (int i = 0; i < Priority.neighbors_nodeArray.length; i++) {
// System.out.print(Priority.neighbors_nodeArray[i].fn + " ");
// }
// System.out.println();
//check for local maxima
int fnCounter = 1;
for (int i = 1; i < Priority.neighbors_nodeArray.length; i++) {
    if (Priority.neighbors_nodeArray[i - 1].fn == Priority.neighbors_nodeArray[i].fn) {//fns are equal
        fnCounter++;
    }
}
if (Priority.neighbors_nodeArray.length != 1 && fnCounter == Priority.neighbors_nodeArray.length) {//all fns are equal, equal chances to choose
System.out.println("---fn's are equal, found in local maxima---");
//backtracking
for (int i = 0; i < Priority.neighbors_nodeArray.length; i++) {
if (stack_state != null) {
System.out.println("pop " + (i + 1));
stack_state.pop();
} else {
System.out.println("empty stack inside loop");
}
}
if (stack_state != null) {
Node gameNode = stack_state.pop();
game_board = gameNode.state;//update game board
Priority.preState = gameNode.parent;//update prestate
locateEmptyTilePosition();//locate empty tile for updated state
printState(game_board, "popped state from all equal fn");
System.out.println("empty tile position: " + emptyTile_row + ",
" + emptyTile_col);
} else {
System.out.println("stack empty inside first lm check");
}
} else {//for backtracking
System.out.println("lowest fn: " + lowestPossible_fn_node.fn);
if (lowestPossible_fn_node.fn == 0) {//no misplaced found
System.out.println(" ");
System.out.println("8-Puzzle has been solved!");
System.out.println(" ");
System.out.println("Total cost/steps to reach the goal: " +
stepCounter);
System.out.println(" ");
break;
}
if (lowestPossible_fn_node.fn <= min_fn) {
    min_fn = lowestPossible_fn_node.fn;
    min_fn_node = lowestPossible_fn_node;//store lowest-fn solution
    if (stack_state != null) {
        Node gameNode = stack_state.pop();
        game_board = gameNode.state;//update game board
Experiment No 5
Adversarial Search
Theory:
For every two-person, zero-sum game with finitely many strategies, there exists a value V and
a mixed strategy for each player, such that
1. Given player 2's strategy, the best payoff possible for player 1 is V, and
2. Given player 1's strategy, the best payoff possible for player 2 is −V.
Equivalently, Player 1's strategy guarantees him a payoff of V regardless of Player 2's strategy,
and similarly Player 2 can guarantee himself a payoff of −V. The name minimax arises because
each player minimizes the maximum payoff possible for the other; since the game is zero-sum,
he also minimizes his own maximum loss (i.e., maximizes his minimum payoff).
Example:
             B chooses B1   B chooses B2   B chooses B3
A chooses A1     +3             −2             +2
A chooses A2     −1              0             +4
A chooses A3     −4             −3             +1
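Worked reading of the table (payoffs to A): the row minima are −2 for A1, −1 for A2 and −4 for A3, so A's maximin (pure) choice is A2 with value −1. The column maxima are +3 for B1, 0 for B2 and +4 for B3, so B's minimax (pure) choice is B2 with value 0. Since −1 ≠ 0 there is no saddle point in pure strategies, which is why the theorem above speaks of mixed strategies.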
Properties of minimax:
• Complete: yes, if the game tree is finite.
• Optimal: yes, against an optimal opponent.
• Time complexity: O(b^m) for a depth-first exploration of the game tree.
• Space complexity: O(bm), where b is the branching factor and m is the maximum depth.
Program:
import java.util.Scanner;
public class minmax {
    // smallest element in row setIndex of a
    public static int min(int a[][], int n, int setIndex) {
        int smallest = a[setIndex][0];
        for (int i = 1; i < n; i++) if (a[setIndex][i] < smallest) smallest = a[setIndex][i];
        return smallest;
    }
    // largest element in row setIndex of a
    public static int max(int a[][], int n, int setIndex) {
        int greatest = a[setIndex][0];
        for (int i = 1; i < n; i++) if (a[setIndex][i] > greatest) greatest = a[setIndex][i];
        return greatest;
    }
    // reconstructed main: reads an n x n payoff matrix and prints the minimum of each row
    public static void main(String[] args) {
        Scanner s = new Scanner(System.in);
        int n = s.nextInt();
        int set[][] = new int[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                set[i][j] = s.nextInt();
        for (int i = 0; i < n; i++)
            System.out.println(min(set, n, i));
        System.out.println("");
    }
}
OUTPUT:
-2
3
4
Process completed.
Experiment No 6
Constraint Satisfaction Problem
Theory: The constraint satisfaction problem (CSP) consists of finding a solution for a constraint
network. This has numerous applications including, e.g., scheduling and timetabling.
Program:
import java.util.*;
System.out.println();
} }
} else
return true;
return false;
}
i = x - 1;
j = y - 1;
while((i>=0)&&(j>=0))   // scan the upper-left diagonal
j = y + 1;
while((i>=0)&&(j<N))    // scan the upper-right diagonal
Output:
****Q***
*******Q
*****Q**
**Q*****
******Q*
*Q******
***Q****
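The program above survives only in part (the diagonal-scanning loops), so for reference here is a minimal self-contained backtracking sketch in the same spirit (Java; the class and method names are my own). It places one queen per row, checking the column and the two upper diagonals, and prints one valid placement in the same asterisk/Q format; the board printed above is simply another valid solution.

public class NQueens {
    static final int N = 8;
    static int[] queenCol = new int[N];   // queenCol[row] = column of the queen in that row

    // is (row, col) attacked by any queen already placed in rows 0..row-1?
    static boolean safe(int row, int col) {
        for (int r = 0; r < row; r++) {
            int c = queenCol[r];
            if (c == col || Math.abs(c - col) == row - r) return false;   // same column or diagonal
        }
        return true;
    }

    static boolean place(int row) {
        if (row == N) return true;                  // all queens placed
        for (int col = 0; col < N; col++) {
            if (safe(row, col)) {
                queenCol[row] = col;
                if (place(row + 1)) return true;    // recurse; fall through = backtrack
            }
        }
        return false;
    }

    public static void main(String[] args) {
        if (place(0)) {
            for (int r = 0; r < N; r++) {
                StringBuilder sb = new StringBuilder("********");
                sb.setCharAt(queenCol[r], 'Q');
                System.out.println(sb);
            }
        }
    }
}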
Experiment No 7
Design of a Planning System Using STRIPS (Block World Problem)
What is STRIPS?
The Stanford Research Institute Problem Solver (STRIPS) is an automated planning technique
that works by executing a domain and problem to find a goal. With STRIPS, you first describe
the world. You do this by providing objects, actions, preconditions, and effects. These are all
the types of things you can do in the game world.
Once the world is described, you then provide a problem set. A problem consists of an initial
state and a goal condition. STRIPS can then search all possible states, starting from the initial
one, executing various actions, until it reaches the goal.
A common language for writing STRIPS domain and problem sets is the Planning Domain
Definition Language (PDDL). PDDL lets you write most of the code with English words, so
that it can be clearly read and (hopefully) well understood. It’s a relatively easy approach to
writing simple AI planning problems.
Problem statement
Design a planning agent for a Blocks World problem. Assume suitable initial state and
final state for the problem.
STRIPS: A planning system – it has rules with a precondition list, a delete list and an add list.

Sequence of actions:
1. Grab C
2. Pickup C
3. Place on table C
4. Grab B
5. Pickup B
6. Stack B on C
7. Grab A
8. Pickup A
9. Stack A on B

Rules:
R1: pickup(x)
R2: putdown(x)
R3: stack(x,y)
R4: unstack(x,y)

Plan:
1. Unstack(C,A)
2. Putdown(C)
3. Pickup(B)
4. Stack(B,C)
5. Pickup(A)
6. Stack(A,B)
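The rules R1–R4 can be encoded directly as precondition/delete/add lists. A small illustrative sketch (Java; the Rule class, the fact strings and the state encoding are all my own assumptions, not STRIPS or PDDL syntax) applying the plan's final operator, Stack(A,B):

import java.util.*;

public class StripsDemo {
    // a STRIPS rule: precondition list, delete list, add list
    static class Rule {
        Set<String> pre, del, add;
        Rule(Set<String> pre, Set<String> del, Set<String> add) {
            this.pre = pre; this.del = del; this.add = add;
        }
    }

    public static void main(String[] args) {
        // world state just before the final step of the plan: A is held, B is on C
        Set<String> state = new HashSet<>(Set.of("holding(A)", "on(B,C)", "clear(B)", "ontable(C)"));

        // stack(A,B): preconditions holding(A), clear(B); delete both; add on(A,B), clear(A), armempty
        Rule stackAB = new Rule(
                Set.of("holding(A)", "clear(B)"),
                Set.of("holding(A)", "clear(B)"),
                Set.of("on(A,B)", "clear(A)", "armempty"));

        if (state.containsAll(stackAB.pre)) {   // precondition check
            state.removeAll(stackAB.del);       // apply the delete list
            state.addAll(stackAB.add);          // apply the add list
        }
        System.out.println(state);              // A now sits on B, which sits on C
    }
}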
Experiment No 8
Probabilistic Reasoning
Theory:
The aim of a probabilistic logic is to combine the capacity of probability theory to handle uncertainty
with the capacity of deductive logic to exploit structure. The result is a richer and more
expressive formalism with a broad range of possible application areas. Probabilistic logics
attempt to find a natural extension of traditional logic truth tables: the results they define are
derived through probabilistic expressions instead. A difficulty with probabilistic logics is that
they tend to multiply the computational complexities of their probabilistic and logical
components. Other difficulties include the possibility of counter-intuitive results, such as those
of Dempster-Shafer theory. The need to deal with a broad variety of contexts and issues has
led to many different proposals.
Probabilistic Reasoning Using Bayesian Learning: The idea of Bayesian learning is to
compute the posterior probability distribution of the target features of a new example
conditioned on its input features and all of the training examples.
Suppose a new case has inputs X=x and has target features, Y; the aim is to compute
P(Y|X=x∧e), where e is the set of training examples. This is the probability distribution of the
target variables given the particular inputs and the examples. The role of a model is to be the
assumed generator of the examples. If we let M be a set of disjoint and covering models, then
reasoning by cases and the chain rule give

P(Y | x∧e) = ∑m∈M P(Y∧m | x∧e)
           = ∑m∈M P(Y | m∧x∧e) × P(m | x∧e)
           = ∑m∈M P(Y | m∧x) × P(m | e).
The first two equalities are theorems from the definition of probability. The last equality
makes two assumptions: the model includes all of the information about the examples that is
necessary for a particular prediction [i.e., P(Y | m ∧x∧e)= P(Y | m ∧x) ], and the model does
not change depending on the inputs of the new example [i.e., P(m|x∧e)= P(m|e)]. This formula
says that we average over the prediction of all of the models, where each model is weighted by
its posterior probability given the examples.
P(m|e) = (P(e|m)×P(m))/(P(e)) .
Thus, the weight of each model depends on how well it predicts the data (the likelihood)
and its prior probability. The denominator, P(e), is a normalizing constant to make sure the
posterior probabilities of the models sum to 1. Computing P(e) can be very difficult when there
are many models.
A set {e1,...,ek} of examples are IID (independent and identically distributed), where the
distribution is given by model m if, for all i and j, examples ei and ej are independent given m,
which means P(ei∧ej|m)=P(ei|m)×P(ej|m). We usually assume that the examples are i.i.d.
Suppose the set of training examples e is {e1,...,ek}. That is, e is the conjunction of the
ei, because all of the examples have been observed to be true. The assumption that the examples
are IID implies

P(e | m) = ∏i P(ei | m).
The set of models may include structurally different models in addition to models that
differ in the values of the parameters. One of the techniques of Bayesian learning is to make
the parameters of the model explicit and to determine the distribution over the parameters.
Example: Consider the simplest learning task under uncertainty. Suppose there is a single
Boolean random variable, Y. One of two outcomes, a and ¬a, occurs for each example. We
want to learn the probability distribution of Y given some examples.
There is a single parameter, φ, that determines the set of all models. Suppose that φ
represents the probability of Y=true. We treat this parameter as a real-valued random variable
on the interval [0,1]. Thus, by definition of φ, P(a|φ)=φ and P(¬a|φ)=1-φ.
Suppose an agent has no prior information about the probability of Boolean variable Y
and no knowledge beyond the training examples. This ignorance can be modelled by giving
the prior probability distribution of the variable φ as a uniform distribution over the interval
[0,1]. This is the probability density function labeled n0=0, n1=0 in the figure.
We can update the probability distribution of φ given some examples. Assume that the
examples, obtained by running a number of independent experiments, are a particular sequence
of outcomes that consists of n0 cases where Y is false and n1 cases where Y is true.
Figure: Beta distribution based on different samples.
The posterior distribution for φ given the training examples can be derived by Bayes'
rule. Let the examples e be the particular sequence of observation that resulted in n1 occurrences
of Y=true and n0 occurrences of Y=false. Bayes' rule gives us
P(φ|e)=(P(e|φ)×P(φ))/(P(e)) .
The denominator is a normalizing constant to make sure the area under the curve is 1.
P(e|φ) = φ^(n1) × (1−φ)^(n0),

because there are n0 cases where Y=false, each with a probability of 1−φ, and n1 cases
where Y=true, each with a probability of φ.
One possible prior probability, P(φ), is a uniform distribution on the interval [0,1]. This
would be reasonable when the agent has no prior information about the probability.
The figure on “Beta distribution based on different samples” gives some posterior
distributions of the variable φ based on different sample sizes, given a uniform prior. The
cases are (n0=1, n1=2), (n0=2, n1=4), and (n0=4, n1=8). Each of these peaks at the same place,
namely at 2/3. More training examples make the curve sharper. In general, the posterior is a beta
distribution,

Beta^(α0,α1)(φ) = K × φ^(α1−1) × (1−φ)^(α0−1),

where K is a normalizing constant that ensures the integral over all values is 1. Thus, the
uniform distribution on [0,1] is the beta distribution Beta^(1,1).
The generalization of the beta distribution to more than two parameters is known as the
Dirichlet distribution. The Dirichlet distribution with two sorts of parameters, the "counts"
α1,...,αk, and the probability parameters p1,...,pk, is

Dirichlet^(α1,...,αk)(p1,...,pk) = K × p1^(α1−1) × ... × pk^(αk−1),

where K is a normalizing constant that ensures the integral over all values is 1; pi is the
probability of the ith outcome (and so 0 ≤ pi ≤ 1) and αi is one more than the count of the ith
outcome. That is, αi = ni+1. Along each dimension (i.e., as each pj varies between 0 and 1), the
Dirichlet distribution looks like the figure.
For many cases, summing over all models weighted by their posterior distribution is
difficult, because the models may be complicated (e.g., if they are decision trees or even belief
networks). However, for the Dirichlet distribution, the expected value for outcome i (averaging
over all pj's) is
(αi)/(∑j αj) .
The reason that the αi parameters are one more than the counts is to make this formula
simple. This fraction is well defined only when the αj are all non-negative and not all are zero.
Thus, the expected value of the n0=1, n1=2 curve is 3/5; for the n0=2, n1=4 case the
expected value is 5/8; and for the n0=4, n1=8 case it is 9/14. As the learner gets more
training examples, this value approaches n/m, where n is the number of cases in which the
outcome is true and m is the total number of examples.
This estimate is better than n/m for a number of reasons. First, it tells us what to do
if the learning agent has no examples: use the uniform prior of 1/2. This is the expected
value of the n=0, m=0 case. Second, consider the case where n=0 and m=3. The agent should
not use P(y)=0, because this says that Y is impossible, and it certainly does not have evidence
for this! The expected value of this curve with a uniform prior is 1/5.
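These expected values are a one-line computation. A tiny sketch (Java; names my own) that reproduces the numbers quoted above via the formula (n1+1)/(n0+n1+2), the expected value of φ under a uniform Beta^(1,1) prior:

public class BetaExpectation {
    // expected value of phi after observing n0 false and n1 true cases,
    // starting from a uniform (Beta 1,1) prior: (n1 + 1) / (n0 + n1 + 2)
    static double expected(int n0, int n1) {
        return (n1 + 1.0) / (n0 + n1 + 2.0);
    }

    public static void main(String[] args) {
        System.out.println(expected(1, 2));   // 3/5  = 0.6
        System.out.println(expected(2, 4));   // 5/8  = 0.625
        System.out.println(expected(4, 8));   // 9/14 ≈ 0.643
        System.out.println(expected(0, 0));   // no examples: uniform prior gives 1/2
        System.out.println(expected(3, 0));   // n=0, m=3 case: 1/5, not 0
    }
}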
An agent does not have to start with a uniform prior; it can start with any prior
distribution. If the agent starts with a prior that is a Dirichlet distribution, its posterior will be
a Dirichlet distribution. The posterior distribution can be obtained by adding the observed
counts to the αi parameters of the prior distribution.
The IID assumption can be represented as a belief network, where each of the ei are
independent given model m. This independence assumption can be represented by the belief
network.
If m is made into a discrete variable, any of the inference methods of the previous
chapter can be used for inference in this network. A standard reasoning technique in such a
network is to condition on all of the observed ei and to query the model variable or an
unobserved ei variable.
The problem with specifying a belief network for a learning problem is that the model
grows with the number of observations. Such a network can be specified before any
observations have been received by using a plate model. A plate model specifies what variables
will be used in the model and what will be repeated in the observations. The plate is drawn as
a rectangle that contains some nodes, and an index (drawn on the bottom right of the plate).
The nodes in the plate are indexed by the index. In the plate model, there are multiple copies
of the variables in the plate, one for each value of the index. The intuition is that there is a pile
of plates, one for each value of the index. The number of plates can be varied depending on the
number of observations and what is queried. In this figure, all of the nodes in the plate share a
common parent. The probability of each copy of a variable in a plate given the parents is the
same for each index.
A plate model lets us specify more complex relationships between the variables. In a
hierarchical Bayesian model, the parameters of the model can depend on other parameters.
Such a model is hierarchical in the sense that some parameters can depend on other parameters.
Example: Suppose a diagnostic assistant agent wants to model the probability that a particular
patient in a hospital is sick with the flu before symptoms have been observed for this patient.
This prior information about the patient can be combined with the observed symptoms of the
patient. The agent wants to learn this probability, based on the statistics about other patients in
the same hospital and about patients at different hospitals. This problem can range from the
cases where a lot of data exists about the current hospital (in which case, presumably, that data
should be used) to the case where there is no data about the particular hospital that the patient
is in. A hierarchical Bayesian model can be used to combine the statistics about the particular
hospital the patient is in with the statistics about the other hospitals.
Suppose that for patient X in hospital H there is a random variable SHX that is true when
the patient is sick with the flu. (Assume that the patient identification number and the hospital
uniquely determine the patient.) There is a value φH for each hospital H that will be used for
the prior probability of being sick with the flu for each patient in H. In a Bayesian model, φH
is treated as a real-valued random variable with domain [0,1]. SHX depends on φH, with
P(SHX|φH)=φH. Assume that φH is distributed according to a beta distribution. We don't assume
that φh1 and φh2 are independent of each other for two hospitals h1 and h2, but that they depend
on hyperparameters. The hyperparameters can be the prior counts α0 and α1. The parameters
depend on the hyperparameters in terms of the conditional probability

P(φhi | α0,α1) = Beta^(α0,α1)(φhi);

α0 and α1 are real-valued random variables, which require some prior distribution.
Sophisticated methods exist to evaluate such networks. However, if the variables are
made discrete, any of the methods of the previous chapter can be used.
In addition to using the posterior distribution of φ to derive the expected value, we can
use it to answer other questions such as: What is the probability that the posterior probability
of φ is in the range [a,b]? In other words, derive P((φ ≥ a ∧φ ≤ b) | e).
P(a ≤ φ ≤ b | e) = (∫[a,b] p^n × (1−p)^(m−n) dp) / (∫[0,1] p^n × (1−p)^(m−n) dp)
This kind of knowledge is used in surveys when it may be reported that a survey is
correct with an error of at most 5%, 19 times out of 20. It is also the same type of information
that is used by probably approximately correct (PAC) learning, which guarantees an error at
most ε at least 1-δ of the time. If an agent chooses the midpoint of the range [a,b], namely
(a+b)/(2), as its hypothesis, it will have error less than or equal to (b-a)/(2), just when the
hypothesis is in [a,b]. The value 1-δ corresponds to P(φ ≥ a ∧φ ≤ b | e). If ε=(b-a)/(2) and δ=1-
P(φ ≥ a ∧φ ≤ b | e), choosing the midpoint will result in an error at most ε in 1-δ of the time.
PAC learning gives worst-case results, whereas Bayesian learning gives the expected number.
Typically, the Bayesian estimate is more accurate, but the PAC results give a guarantee of the
error. The sample complexity required for Bayesian learning is typically much less than that of
PAC learning – many fewer examples are required to expect to achieve the desired accuracy
than are needed to guarantee the desired accuracy.
Experiment No 9
Resolution
Theory:
In propositional logic, the resolution rule takes two clauses containing complementary literals
and produces the clause consisting of all their remaining literals:

a1 ∨ ... ∨ ai ∨ c        b1 ∨ ... ∨ bj ∨ ¬c
---------------------------------------------
a1 ∨ ... ∨ ai ∨ b1 ∨ ... ∨ bj

where all the ai and bi are literals, ¬c is the complement of c, and the dividing line stands for "entails".
The clause produced by the resolution rule is called the resolvent of the two input clauses.
When the two clauses contain more than one pair of complementary literals, the
resolution rule can be applied (independently) for each such pair; however, the result is always
a tautology.
Modus ponens can be seen as a special case of resolution of a one-literal clause and a
two-literal clause.
A Resolution Technique: When coupled with a complete search algorithm, the resolution rule
yields a sound and complete algorithm for deciding the satisfiability of a propositional formula,
and, by extension, the validity of a sentence under a set of axioms.
This resolution technique uses proof by contradiction and is based on the fact that any
sentence in propositional logic can be transformed into an equivalent sentence in conjunctive
normal form. The steps are as follows.
All sentences in the knowledge base and the negation of the sentence to be proved
(the conjecture) are conjunctively connected.
The resulting sentence is transformed into a conjunctive normal form with the conjuncts
viewed as elements in a set, S, of clauses.
For example, (A1 ∨ A2) ∧ (B1 ∨ B2 ∨ B3) ∧ (C1) gives rise to the set S = {A1 ∨ A2, B1 ∨ B2 ∨ B3, C1}.
Algorithm: The resolution rule is applied to all possible pairs of clauses that contain
complementary literals. After each application of the resolution rule, the resulting sentence is
simplified by removing repeated literals. If the sentence contains complementary literals, it is
discarded (as a tautology). If not, and if it is not yet present in the clause set S, it is added to S,
and is considered for further resolution inferences.
If after applying a resolution rule the empty clause is derived, the original formula is
unsatisfiable (or contradictory), and hence, it can be concluded that the initial conjecture
follows from the axioms.
If, on the other hand, the empty clause cannot be derived, and the resolution rule cannot
be applied to derive any more new clauses, the conjecture is not a theorem of the original
knowledge base.
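The saturation procedure just described is short to implement for propositional clauses. A minimal sketch (Java; the encoding of a literal as a signed integer and all names are my own choices): deriving the empty clause reports the clause set unsatisfiable.

import java.util.*;

public class Resolution {
    // resolve two clauses on complementary literals lit / -lit; null if the result is a tautology
    static Set<Integer> resolve(Set<Integer> c1, Set<Integer> c2, int lit) {
        Set<Integer> r = new HashSet<>(c1);
        r.remove(lit);
        for (int l : c2) if (l != -lit) r.add(l);
        for (int l : r) if (r.contains(-l)) return null;   // tautology: discard
        return r;
    }

    // returns true iff the clause set is unsatisfiable (the empty clause is derivable)
    static boolean unsatisfiable(Set<Set<Integer>> clauses) {
        List<Set<Integer>> s = new ArrayList<>(clauses);
        for (int i = 0; i < s.size(); i++)
            for (int j = 0; j < i; j++)
                for (int lit : s.get(i))
                    if (s.get(j).contains(-lit)) {
                        Set<Integer> r = resolve(s.get(i), s.get(j), lit);
                        if (r == null) continue;
                        if (r.isEmpty()) return true;   // empty clause derived
                        if (!s.contains(r)) s.add(r);   // new resolvent: keep resolving
                    }
        return false;   // saturated without deriving the empty clause
    }

    public static void main(String[] args) {
        // clause set {a ∨ b, ¬a, ¬b} with a=1, b=2; this set is unsatisfiable
        Set<Set<Integer>> s = new HashSet<>();
        s.add(new HashSet<>(Set.of(1, 2)));
        s.add(new HashSet<>(Set.of(-1)));
        s.add(new HashSet<>(Set.of(-2)));
        System.out.println(unsatisfiable(s));   // true
    }
}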
One instance of this algorithm is the original Davis–Putnam algorithm that was later
refined into the DPLL algorithm that removed the need for explicit representation of the
resolvents.
This description of the resolution technique uses a set S as the underlying data-structure
to represent resolution derivations. Lists, Trees and Directed Acyclic Graphs are other possible
and common alternatives. Tree representations are more faithful to the fact that the resolution
rule is binary. Together with a sequent notation for clauses, a tree representation also makes it
clear to see how the resolution rule is related to a special case of the cut-rule, restricted to
atomic cut-formulas. However, tree representations are not as compact as set or list
representations, because they explicitly show redundant subderivations of clauses that are used
more than once in the derivation of the empty clause. Graph representations can be as compact
in the number of clauses as list representations and they also store structural information
regarding which clauses were resolved to derive each resolvent.
A simple example:

a ∨ b        ¬a ∨ c
--------------------
b ∨ c

In plain language: Suppose a is false. In order for the premise a ∨ b to be true, b must be
true. Alternatively, suppose a is true. In order for the premise ¬a ∨ c to be true, c must
be true. Therefore, regardless of the falsehood or veracity of a, if both premises hold, then the
conclusion b ∨ c is true.
Resolution in First-Order Logic: In first-order logic, resolution condenses the
traditional syllogisms of logical inference down to a single rule.
To understand how resolution works, consider the following example syllogism of term
logic:
All Greeks are Europeans.
Homer is a Greek.
Therefore, Homer is a European.
In first-order logic, with P(X) meaning "X is a Greek", Q(X) meaning "X is a European", and the
constant a denoting Homer, this becomes:
∀X P(X) → Q(X)
P(a)
Therefore, Q(a)
To recast the reasoning using the resolution technique, first the clauses must be
converted to conjunctive normal form. In this form, all quantification becomes
implicit: universal quantifiers on variables (X, Y, …) are simply omitted as understood,
while existentially quantified variables are replaced by Skolem functions.
Therefore, the premises become the clauses
¬P(X) ∨ Q(X)
P(a)
and the conclusion to derive is Q(a).
So, the question is, how does the resolution technique derive the last clause from the
first two? The rule is simple:
Find two clauses containing the same predicate, where it is negated in one clause but
not in the other.
Perform unification on the two predicates. (If the unification fails, you made a bad
choice of predicates. Go back to the previous step and try again.)
If any unbound variables which were bound in the unified predicates also occur in other
predicates in the two clauses, replace them with their bound values (terms) there as well.
Discard the unified predicates, and combine the remaining ones from the two clauses
into a new clause, also joined by the "∨" operator.
To apply this rule to the above example, we find the predicate P occurs in negated form
¬P(X)
in the first clause, and in non-negated form
P(a)
in the second clause. X is an unbound variable, while a is a bound value (term). Unifying the
two produces the substitution X ↦ a.
Discarding the unified predicates, and applying this substitution to the remaining
predicates (just Q(X), in this case), produces the conclusion:
Q(a)
Suppose we also have the clause ∀X Q(X) → R(X), which in clause form is
¬Q(Y) ∨ R(Y)
(Note that the variable in the second clause was renamed to make it clear that variables in
different clauses are distinct.)
Now, unifying Q(X) in the first clause with ¬Q(Y) in the second clause means
that X and Y become the same variable anyway. Substituting this into the remaining clauses
and combining them gives the conclusion:
¬P(X) ∨ R(X)
The resolution rule, as defined by Robinson, also incorporated factoring, which unifies
two literals in the same clause, before or during the application of resolution as defined above.
The resulting inference rule is refutation complete, in that a set of clauses is unsatisfiable if and
only if there exists a derivation of the empty clause using resolution alone.
Program:
%% Sam's likes and dislikes in food
%% (a minimal fact base, reconstructed to be consistent with the example queries below)
likes(sam,dahl).
likes(sam,chop_suey).
likes(sam,pizza).
likes(sam,chips).
%% Example queries:
%% ?- likes(sam,dahl).
%% ?- likes(sam,chop_suey).
%% ?- likes(sam,pizza).
%% ?- likes(sam,chips).
Output:
3 ?- likes(sam,pizza).
true.
4 ?- likes(sam,idle).
false.
5 ?-