Expert Systems and Applications: AI - Unit III
One of the goals of AI is to understand the concept of intelligence and develop intelligent
computer programs. An example of a computer program that exhibits intelligent behavior is an expert
system (ES). Expert systems are meant to solve real-world problems which require specialized human
expertise and provide expert-quality advice, diagnoses, and recommendations. An ES is basically a software
program or system that tries to perform tasks the way human experts do in a specific problem
domain. It incorporates the concepts and methods of symbolic inference and reasoning, and the use of
knowledge for making these inferences. Expert systems may also be referred to as knowledge-based
expert systems.
Expert systems can process multiple values for any problem parameter. This allows them to pursue more
than one line of reasoning, so the results of incomplete (not fully determined) reasoning can be
presented. Problem solving by these systems is accomplished by applying specific knowledge rather than
specific techniques. This is the key idea behind ES technology. The power of an ES lies in its store of knowledge
regarding the problem domain; the more knowledge a system is provided, the more competent it
becomes.
Building an ES initially requires extracting the relevant knowledge from a human domain expert; this
knowledge is often based on useful rules of thumb and experience rather than absolute certainties. After
extracting knowledge from domain experts, the next step is to represent this knowledge in the system.
Representation of knowledge in a computer is not straightforward and requires special expertise. A
knowledge engineer handles the responsibility of extracting this knowledge and building the ES's
knowledge base. This process of gathering knowledge from a domain expert and codifying it according to
a formalism is called knowledge engineering. The gathering phase is known as knowledge acquisition, which is a
big area of research. A wide variety of techniques have been developed for this purpose. Generally, an
initial prototype based on the information extracted by interviewing the expert is developed. This
prototype is then iteratively refined on the basis of feedback received from the experts and potential users
of the ES. Refinement of a system is possible only if the system is scalable and modifiable and does not
require rewriting of major code.
A simple ES primarily consists of a knowledge base and an inference engine, while features such as
reasoning with uncertainty and explanation of the line of reasoning enhance its capability. Since an ES
uses uncertain or heuristic knowledge, just like humans, its credibility is often questioned.
To be more precise, the different interdependent and overlapping phases involved in building an ES may be
categorized as follows:
• Identification Phase In this phase, the knowledge engineer determines the important features of the problem
with the help of the human domain expert. The parameters determined in this phase include the
type and scope of the problem, the kind of resources required, and the goals and objectives of the ES.
• Conceptualization Phase In this phase, the knowledge engineer and the domain expert decide the concepts,
relations, and control mechanisms needed to describe the problem-solving method. At this stage, issues of
granularity, that is, the level of detail required in the knowledge, are also addressed.
• Formalization Phase This phase involves expressing the key concepts and relations in some framework
supported by the ES-building tool. Formalized knowledge consists of the data structures, inference rules, control
strategies, and languages required for implementation.
• Implementation Phase During this phase, the formalized knowledge is converted into a working computer
program, initially called the prototype of the whole system.
• Testing Phase This phase involves evaluating the performance and utility of the prototype and revising the
system, if required. The domain expert evaluates the prototype system and provides feedback, which helps
the knowledge engineer revise it.
Knowledge Engineering
• Ensuring that the computer has all the knowledge needed to solve a problem.
• Choosing one or more forms to represent the required knowledge.
• Ensuring that the computer can use the knowledge efficiently by selecting appropriate reasoning
methods.
Figure: Interaction between the knowledge engineer and the domain expert for creating an ES. The
knowledge engineer puts questions/queries to the domain expert, who responds with strategies, domain
rules, and domain knowledge; this knowledge is encoded into the expert system, which in turn produces
answers/solutions.
The main role of the knowledge engineer begins only once the problem domain for developing an
ES has been decided. The job of the knowledge engineer involves close collaboration with the domain expert(s)
and the end user(s).
The knowledge engineer extracts general rules from the discussions and interviews held with the
expert(s) and gets them checked by the expert(s) for correctness. The engineer then translates the
knowledge into a computer-usable language and designs an inference engine, which is a reasoning
structure that uses the knowledge appropriately.
The domain knowledge, consisting of both formal, textbook knowledge and experiential knowledge
(obtained from the expert's experience), is entered into the program piece by piece. In the initial stages, the
knowledge engineer may encounter a number of problems: the inference engine may not be right,
the form of knowledge representation may not be appropriate for the kind of knowledge needed for the
task, or the expert may find the pieces of knowledge incorrect.
The development of ES would remain incomplete if it did not involve close collaboration with end users.
The basic development cycle should include the development of an initial prototype and iterative testing
and modification of that prototype by both experts (for checking the validity of the rules) and users (for
checking the performance of the system and explanations for the answers).
For the initial prototype, the knowledge engineer has to take provisional decisions regarding appropriate
knowledge representation (e.g., rules, semantic nets, or frames) and inference methods (e.g., forward
chaining, backward chaining, or both). To test these basic design decisions, the first prototype may be
designed so that it solves only a small part of the overall problem.
There are two ways of building an ES: it can be built either from scratch or by using ES shells or tools.
Currently, expert system shells are available and widely used.
KNOWLEDGE REPRESENTATION
The collection of domain knowledge is called the knowledge base, while the general problem-solving
knowledge is called the inference engine. The most common knowledge
representation scheme for expert systems consists of production rules, or simply rules; they are of the
form if-then, where the if part contains a set of conditions in some logical combination.
Expert systems in which knowledge is represented in the form of rules are called rule-based systems.
Another widely used representation in ES is called the unit (also known as a frame, semantic net, etc.), which
is based upon a more passive view of knowledge.
A unit consists of a list of properties of an entity and associated values for those properties.
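As a minimal illustrative sketch only (the entity, slot names, and values below are hypothetical, not from the source), such a unit can be modeled in Python as a simple property list:

# A unit/frame: an entity, a taxonomic link, and property-value pairs.
frame = {
    "name": "telephone",          # entity described by this unit
    "is_a": "device",             # link to a parent unit
    "properties": {"age": "old", "repaired_before": True},
}

def get_property(f, key):
    """Return a property value stored in the unit, or None if absent."""
    return f["properties"].get(key)

print(get_property(frame, "age"))   # -> old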
Knowledge Base
The knowledge base of an ES consists of knowledge regarding the problem domain in the form of static and
dynamic databases. Static knowledge consists of rules and facts, or any other form of knowledge representation,
which is compiled as a part of the system and does not change during execution of the system.
Dynamic knowledge consists of facts related to a particular consultation of the system, collected by asking
various questions of the user who is consulting the ES. At the beginning of the consultation, the dynamic
knowledge base is empty. As the consultation progresses, the dynamic knowledge base grows and is used in
decision making along with the static knowledge.
Inference Engine
The term inference refers to the process of searching through the knowledge base and deriving new
knowledge. It involves formal reasoning by matching and unification, similar to that performed by a
human expert to solve problems in a specific area of knowledge, using the modus ponens rule. An inference rule
may be defined as a statement that has two parts, an if clause and a then clause. Such rules enable expert
systems to find solutions to diagnostic and prescriptive problems.
Each rule is independent of the others and may be deleted or added without affecting other rules.
The inference mechanism uses a control strategy that determines the order in which rules are applied. There are
mainly two types of reasoning mechanisms that use inference rules: backward chaining and forward
chaining.
The process of forward chaining starts with the available data and uses inference rules to derive
more data until a desired goal is achieved. An inference engine uses facts from the static and dynamic
knowledge bases and searches through the rules until it finds one in which the if clause is known to be
true. The rule is then said to succeed. It then concludes the then clause and adds this information to the
dynamic knowledge base. The inference engine continues to repeat this process until a goal is reached.
Since the available data determines which inference rules are used, this method is also known as the
data-driven method.
As an example of forward chaining, consider the following rules:
Rule 1 If symptoms are headache, sneezing, running nose and sore throat, then patient has cold.
Rule 2 If symptoms are fever, cough and running nose, then patient has measles.
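The following minimal Python sketch (illustrative only; the source gives no code here) applies Rule 1 and Rule 2 by forward chaining, repeatedly firing any rule whose if part is satisfied until no new facts appear:

# Rules mirror Rule 1 and Rule 2 above: (set of conditions, conclusion).
RULES = [
    ({"headache", "sneezing", "running_nose", "sore_throat"}, "cold"),
    ({"fever", "cough", "running_nose"}, "measles"),
]

def forward_chain(facts):
    """Fire any rule whose if part is satisfied until no new fact is added."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in RULES:
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)   # add the then clause to the dynamic KB
                changed = True
    return facts

print(forward_chain({"fever", "cough", "running_nose"}))   # infers "measles"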
Backward chaining starts with a list of goals and works backwards to see if there is data that allows
it to conclude any of these goals. An inference engine using backward chaining searches the
inference rules until it finds one whose then part matches a desired goal. If the if part of that inference rule
is not known to be true, then it is added to the list of goals.
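A matching backward-chaining sketch (again illustrative, over the same two rules) starts from a goal and recurses on the if parts:

# The same two rules as in the forward-chaining sketch.
RULES = [
    ({"headache", "sneezing", "running_nose", "sore_throat"}, "cold"),
    ({"fever", "cough", "running_nose"}, "measles"),
]

def backward_chain(goal, facts):
    """True if goal is a known fact or some rule concluding it succeeds."""
    if goal in facts:
        return True
    for conditions, conclusion in RULES:
        if conclusion == goal and all(backward_chain(c, facts) for c in conditions):
            return True
    return False

print(backward_chain("measles", {"fever", "cough", "running_nose"}))   # True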
Knowledge Acquisition
Knowledge present in an ES may be obtained from many sources, such as textbooks, reports, case
studies, empirical data, and domain experts, who are the most prominent source of knowledge. A knowledge
acquisition module allows the system to acquire more knowledge regarding the problem domain from
experts. Interaction between the knowledge engineer and the domain expert involves a prolonged series of
intense, systematic interviews or carefully designed questionnaires for eliciting expert knowledge. The
knowledge engineer working on a system should be able to extract the expert's methods, procedures, strategies,
and rules of thumb for solving the problem at hand.
Case History
The case history stores the files created by the inference engine using the dynamic database (created
during different consultations of the system) and is used by the learning module to enrich the
knowledge base. Different cases with their solutions are stored in a case base, and these cases are used for
solving problems using case-based reasoning (CBR).
User Interfaces
The user interface of an ES allows the user to communicate with the system in an interactive manner and
helps the system create the working knowledge for the problem that has to be solved. Knowledge may be
entered using an editor or a specially designed user interface. The function of the user interface is to
present questions and information to the user; the user's responses are received and interpreted by the
user interface. Responses of the user may or may not be restricted to a set of possible legal answers. The user
interface checks all responses to ensure that they are of the correct data type.
Table: Dialogue Module of a User Interface
System: Do you have fever?
User: Yes
System: Do you have sore throat?
User: No
System: Do you have cough?
User: Yes
System: Are you suffering from running nose?
User: Yes
System: Are you suffering from headache?
User: No
Explanation Module
Most ESs have explanation facilities that enable users to query the system about why it asked some
question and how it reached some conclusion; these modules are called the How and Why modules.
Dialogue with the Why/How explanation modules invoked:
System: Do you have running nose?
User: Yes
System: I am trying to prove that you have cold. Do you have headache?
User: Yes
System: Do you have sore throat?
User: No
System: Then you don't have cold. Do you have fever?
User: Yes
System: Now I am looking at the possibility of measles. Do you have cough?
User: Yes
System: I can infer measles using the rule "If symptoms are fever, cough, running_nose, then patient has
measles". Measles is concluded.
User: How?
System: Since you have fever, running_nose, and cough, and there is a rule "If symptoms are fever, cough,
running_nose, then patient has measles", measles is concluded for you.
Special Interfaces
Special interfaces may be used in an ES for performing specialized activities, such as handling
uncertainty in knowledge. These interfaces form a major area of expert system research, which involves
methods for reasoning with uncertain data and uncertain knowledge. A point to be kept in mind is that
knowledge is generally incomplete and uncertain. To deal with uncertain knowledge, a confidence
factor or a weight may be associated with a rule. The set of methods for using uncertain knowledge in
combination with uncertain data in the reasoning process is called reasoning with uncertainty. An
important subclass of methods for reasoning with uncertainty is called fuzzy logic, and the systems that use
it are known as fuzzy systems.
The performance of an ES may be evaluated with questions such as the following:
• Does the system make decisions that experts generally agree with?
• Are the inference rules correct and complete?
• Does the control strategy allow the system to consider items in the natural order that the expert prefers?
• Are relevant questions asked to the user in proper order (otherwise it will be an irritating process)?
• Are the explanations given by the ES adequate for describing how and why the conclusions were reached?
• If a formula is a theorem for a particular formal theory, then that formula remains a theorem for any
augmented theory obtained by adding axioms to the theory.
In monotonic reasoning, the world of axioms continually increases in size and keeps on expanding.
An example of the monotonic form of reasoning is predicate logic. It represents a deductive reasoning system
where new facts are derived from known facts.
B. Non-monotonic Systems and Logic
In a non-monotonic system, truths that are present in the system can be retracted whenever
contradictions arise. Hence, the number of axioms can increase as well as decrease. Non-monotonic
reasoning is based on inferences made by applying non-monotonic logic.
Both monotonic and non-monotonic reasoning can best be implemented using a truth maintenance
system (TMS), described earlier. The term truth maintenance is synonymous with the term knowledge base
maintenance and is defined as keeping track of the interrelations between assertions in a knowledge base.
TMSs are companion components to inference systems. The main job of a TMS is to maintain the consistency
of the knowledge being used by the problem solver. The inference engine (IE) solves domain problems based
on its current belief set, while the TMS maintains the currently active belief set.
Figure: Architecture of a problem solver, in which the inference engine (IE) and the TMS together operate
over the knowledge base (KB).
C. Monotonic TMS
The most practical applications of monotonic systems using TMS are qualitative simulation, fault diagnosis,
and search applications.
A monotonic TMS is a general facility for manipulating Boolean constraints on proposition symbols. The
constraints have the form P → Q, where P and Q are proposition symbols that an outside observer can
interpret as representations of statements.
a proof of L. There are two interface functions used to generate such proofs, namely justifying literals and
justifying constraints. Suppose we have the premise set {P, W} and the internal constraint set
{P → Q, (P ∧ W) → R, (Q ∧ R) → S}. Most truth maintenance systems are able to derive S from these
constraints and the premise set; the TMS should provide the justifications for deriving S from the
constraints and premises.
Justifying literals | Derived literal | Justifying constraint
{P, W}              | R               | (P ∧ W) → R
{P}                 | Q               | P → Q
{Q, R}              | S               | (Q ∧ R) → S
The justification functions can produce a tree with the literal S at the root, as shown below. At each node of the
tree, the function justifying literals can be used to get child nodes until one reaches members of the
premise set. The justifications are required to be non-circular; that is, if Q appears in the justification tree
rooted at P, then P must not appear in the justification tree rooted at Q.
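As a small illustrative sketch (the Python representation below is assumed, not taken from the source), the justification tree for S can be expanded down to the premise set:

# Derived literals mapped to their justifying literals; premises end recursion.
JUSTIFYING_LITERALS = {"Q": {"P"}, "R": {"P", "W"}, "S": {"Q", "R"}}
PREMISES = {"P", "W"}

def justification_tree(literal):
    """Expand a derived literal down to members of the premise set."""
    if literal in PREMISES:
        return literal
    return {literal: [justification_tree(c)
                      for c in sorted(JUSTIFYING_LITERALS[literal])]}

print(justification_tree("S"))
# {'S': [{'Q': ['P']}, {'R': ['P', 'W']}]}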
D. Non-monotonic TMS
The basic operation of a TMS is to attach a justification to a fact. A fact can be linked with any
component of program knowledge that is to be connected with other components of program
information.
the class of problems for which a direct algorithmic solution does not exist. The solution of these problems
requires the examination of state spaces.
Some well-known expert system shells and tools are listed below:
ACQUIRE
Arity
ART
CLIPS (C Language Integrated Production System)
FLEX
Gensym’s G2
GURU
HUGIN SYSTEM
Knowledge Craft
K-vision
MailBot
TMYCIN
1. Probability Theory
The term probability is defined as a way of turning an opinion or an expectation into a number lying
between 0 and 1. It basically reflects the likelihood of an event, or a chance that a particular event will
occur. Assume a set S (known as the sample space) representing all possible outcomes of a random
experiment. The probability of an event A is given by
P(A) = (Number of outcomes favorable to A) / (Total number of possible outcomes)
Axioms of Probability
If S represents a sample space and A and B represent events, then the following axioms hold true.
Here, A' represents complement of set A.
• P(A) >= 0
• P(S) = 1
• P(A') = 1 - P(A)
• P(A ∪ B) = P(A) + P(B), if events A and B are mutually exclusive
• P(A ∪ B) = P(A) + P(B) - P(A ∩ B), if A and B are not mutually exclusive. This is called the addition rule of
probability.
Let us prove that P(A ∪ B) = P(A) + P(B) - P(A ∩ B) when A and B are not mutually exclusive.
Proof We can easily see that A ∪ B = (A ∪ A') ∩ (A ∪ B) = A ∪ (A' ∩ B).
Therefore,
P(A ∪ B) = P(A ∪ (A' ∩ B))
= P(A) + P(A' ∩ B) (as A and A' ∩ B are mutually exclusive)
= P(A) + P(A' ∩ B) + P(A ∩ B) - P(A ∩ B) (adding and subtracting P(A ∩ B))
= P(A) + P(B) - P(A ∩ B) (as P(B) = P(A' ∩ B) + P(A ∩ B))
If events A1, ..., An in S are mutually exclusive, then we can write
P(A1 ∪ A2 ∪ ... ∪ An) = P(A1) + P(A2) + ... + P(An)
A. Joint Probability
Joint probability is defined as the probability of the occurrence of two independent events in
conjunction; that is, joint probability refers to the probability of both events occurring together. The joint
probability of A and B is written as P(A ∩ B) or P(A and B). For independent events it may be defined as given below:
P(A and B) = P(A) * P(B)
Two events are said to be independent if the occurrence of one event does not affect the probability of
occurrence of the other.
Consider the example of tossing two fair coins separately. The probability of getting a head (H) on
tossing the first coin is denoted by P(A) = 0.5, and the probability of getting a head on tossing the second
coin is denoted by P(B) = 0.5. The probability of getting H on both the coins is called the joint probability and is
represented as P(A and B). It is calculated as follows:
P(A and B) = P(A) * P(B)
= 0.5 * 0.5
= 0.25
Similarly, the probability of getting a head H on tossing one or both coins can be calculated. It is
called the union of the probabilities P(A) and P(B), and is denoted by P(A ∪ B); it is also written as P(A or B). It
can be calculated for the above example as follows:
In fact, we can compute the probability of any logical combination of A and B in the following manner:
P(A or B) = P(A) + P(B) - P(A and B)
= P(A and B) + P(A and B') + P(A and B) + P(A' and B) - P(A and B)
= P(A and B) + P(A and B') + P(A' and B)
= 0.20 + 0.65 + 0.12
= 0.97
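As a quick arithmetic check, here is a short Python sketch using the joint values quoted above (whose derivation lies on an earlier page of the source):

# Given joint values from the text: P(A and B), P(A and B'), P(A' and B).
p_ab, p_ab_, p_a_b = 0.20, 0.65, 0.12
p_a = p_ab + p_ab_     # marginal probability of A
p_b = p_ab + p_a_b     # marginal probability of B
print(round(p_a + p_b - p_ab, 2))   # addition rule gives 0.97, as in the text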
B. Conditional Probability
The concept of conditional probability relates the probability of one event to the occurrence of
another. It is defined as the probability of the occurrence of an event H (hypothesis) given that an event E
(evidence) is known to have occurred. It is denoted by P(H|E) and may be represented as follows:
P(H|E) = (Number of events favorable to H which are also favorable to E) / (Number of events favorable to E)
= P(H and E) / P(E)
However, this rule cannot be used in cases where P(E) = 0.
If events H and E defined on a sample space S of a random experiment are independent, then
P(H|E) = P(H).
Proof
P(H |E) = P(H and E) / P(E) (by definition)
= P(H) * P(E) / P(E) (using joint probability)
= P(H)
Similarly, we can prove, P(E | H) = P(E)
Example: We are given the probability of any person chosen at random being literate as 0.40 and the probability of
any person chosen at random having age > 60 years as 0.005. Find the probability that a person
chosen at random of age > 60 years is literate.
Solution:
The probability that a given person chosen at random is both literate and has age > 60 years is
calculated as follows:
P(X is literate and age of X > 60) = P(X is literate) * P(age of X > 60)
= 0.40 * 0.005
= 0.002
Then, the probability that a person chosen at random having age > 60 years is literate is calculated using
conditional probability:
P(X is literate | age of X > 60) = P(X is literate and age of X > 60) / P(age of X > 60)
= 0.002/0.005
= 0.4
C. Bayes' Theorem
Bayes' theorem was developed by the mathematician Thomas Bayes in 1763. This theorem provides a
mathematical model for reasoning where prior beliefs are combined with evidence to get estimates of
uncertainty. It relates the conditional and prior probabilities of events H and E. The basic idea is to
compute P(H|E), which represents the probability assigned to H after taking into account the new piece of
evidence E.
Bayes' theorem relates the conditional probabilities of events, which allows us to express P(H|E) in
terms of the probabilities P(E|H), P(E), and P(H). Here, H denotes the hypothesis, while E represents a new
piece of evidence. The relation can be written as
P(H|E) = P(E|H) * P(H) / P(E)
We can derive Bayes' theorem from conditional probability also as shown below.
Example: Suppose we are given the probability that Mike has a cold as 0.25, the probability that Mike was
observed sneezing when he had a cold in the past as 0.90, and the probability that Mike was observed sneezing
when he did not have a cold as 0.20. Find the probability of Mike having a cold given that he sneezes.
Solution
Let us represent 'Mike has a cold' as the hypothesis H and 'Mike sneezes' as the evidence E. We have to
calculate P(H|E). The following probabilities are given.
Probability of the hypothesis that Mike has a cold: P(H) = 0.25
Probability of the fact that Mike was observed sneezing when he had cold is 0.9.
P(Mike was observed sneezing | Mike has a cold) = P(E |H) = 0.90
Probability of the fact that Mike was observed sneezing when he did not have cold is 0.20
P(Mike was observed sneezing |Mike does not have a cold) = P(E |~H) = 0.20
We have to find the probability of Mike having a cold when he was observed sneezing. That is,
P(Mike has a cold | Mike was observed sneezing) = P(H|E)
We use the formula P(H|E) = P(E|H) * P(H) / [P(E|H) * P(H) + P(E|~H) * P(~H)]
Let us compute P(E|H) * P(H) + P(E|~H) * P(~H) = (0.90)(0.25) + (0.20)(0.75) = 0.375
Hence, we obtain P(H|E) = (0.9 * 0.25)/0.375 = 0.6
Therefore, we conclude that Mike's probability of having a cold given that he sneezes is equal to
0.6. Similarly, we can determine his probability of having a cold if he was not sneezing in the following
manner
P(H|~E) = [P(~E|H) * P(H)] / P(~E) = [(1 - 0.9) * 0.25] / (1 - 0.375) = 0.025/0.625 = 0.04
Hence, Mike's probability of having a cold if he was not sneezing is obtained to be equal to 0.04.
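The same calculation can be expressed as a short Python sketch (mirroring the numbers in the example above):

# Prior and likelihoods from the Mike example.
p_h, p_e_given_h, p_e_given_not_h = 0.25, 0.90, 0.20
p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)   # total probability, 0.375
posterior = p_e_given_h * p_h / p_e                     # Bayes' theorem
print(p_e, posterior)   # ~0.375 and 0.6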
Form 1
P(H | E1 and E2) = P(H) * P(E1 | H) * P(E2 | H and E1) / [P(E1) * P(E2 | E1)]
Derivation We now proceed to derive this form using Bayes' theorem and conditional probability.
We get the following using the conditional probability formula:
P(H and E1 and E2) = P(E2 | H and E1) * P(H and E1)
P(H and E1) = P(E1 | H) * P(H), and
P(E1 and E2) = P(E2 | E1) * P(E1)
Now, substituting the above terms in the conditional probability formula
P(H | E1 and E2) = P(H and E1 and E2) / P(E1 and E2)
we get
P(H | E1 and E2) = P(E2 | H and E1) * P(H and E1) / [P(E2 | E1) * P(E1)]
= P(E2 | H and E1) * P(E1 | H) * P(H) / [P(E2 | E1) * P(E1)]
= P(H) * P(E1 | H) * P(E2 | H and E1) / [P(E1) * P(E2 | E1)]
Hence, Form 1 is proved.
Form 2
P(H | E1 and E2) = P(E2 | H and E1) * P(H | E1) / P(E2 | E1)
Derivation From Bayes' theorem, we have
P(E1 | H) * P(H) = P(H | E1) * P(E1)
Substituting this value of P(E1 | H) * P(H) in Form 1, we get
P(H | E1 and E2) = P(H | E1) * P(E1) * P(E2 | H and E1) / [P(E1) * P(E2 | E1)]
This implies
P(H | E1 and E2) = P(E2 | H and E1) * P(H | E1) / P(E2 | E1)
Hence, Form 2 is proved.
Chain Evidence
If E1 is an evidence of hypothesis H and E2 is an evidence of the previous evidence E1, then we can
compute P(H | E1) using the following formula:
P(H | E1) = P(E1 | H) * P(E2 | E1) * P(H) / [P(E1 | E2) * P(E2)]
Let us prove this formula. We know that P(H | E1) and P(E1 | E2), using Bayes' theorem, can be written as given
below:
P(H | E1) = P(E1 | H) * P(H) / P(E1) ... (1)
P(E1 | E2) = P(E2 | E1) * P(E1) / P(E2) ... (2)
From Eq. (2), we get
P(E1) = P(E1 | E2) * P(E2) / P(E2 | E1) ... (3)
Substituting this value of P(E1) in Eq. (1), we get the desired formula for P(H | E1):
P(H | E1) = P(E1 | H) * P(E2 | E1) * P(H) / [P(E1 | E2) * P(E2)]
F. Cumulative Probabilities
For the rules discussed in the preceding section, if we wish to reason about whether the battery
is dead, we should gather all relevant rules and facts. It is very important to combine the probabilities from
the facts and successful rules to get a cumulative probability of the battery being dead. The following two
situations arise:
i. If the subgoals of a rule are probable, then the probability of the rule succeeding should take the
probable subgoals into account.
ii. If rules with the same conclusion have different probabilities, then the overall probability of the conclusion has
to be found.
The first situation is resolved by simply computing the cumulative probability of the conclusion with the help of
the and-combination, assuming that all subgoals are independent:
Prob(A and B and C and ...) = Prob(A) * Prob(B) * Prob(C) * ...
The second situation is handled by using the or-combination to get the overall probability of the predicate in the
head of the rule. If the events are mutually independent, the following formula is used to obtain the OR
probability: Prob(A or B or C or ...) = 1 - [(1 - Prob(A)) * (1 - Prob(B)) * (1 - Prob(C)) * ...]
Rule 1 If the telephone instrument is old and has been repaired several times in the past, then it is 40% sure
that the fault lies with the instrument. It is coded in Prolog in the following manner:
telephone_not_working(0.4) :- ask(tele_history).
Rule 2 If the instrument has fallen on the ground and broken, then it is 80% sure that the fault lies with the
instrument. The rule may be written as
telephone_not_working(0.8) :- ask(telephone_broken).
Rule 3 If there are children in the house who play with the keypad of the telephone, with some probability,
then it is 80% sure that the instrument is faulty because of excessive and unusual usage. The rule may be
written as
telephone_not_working(P) :- ask(children_present, P1), ask(children_keypad, P2),
and_combination([P1, P2, 0.8], P).
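For illustration only, the and_combination and or_combination helpers that these rules assume might look as follows in Python (a sketch under the independence assumptions stated above; the source does not give their implementation):

from functools import reduce

def and_combination(probs):
    """All subgoals must hold: multiply the probabilities (independence)."""
    return reduce(lambda acc, p: acc * p, probs, 1.0)

def or_combination(probs):
    """Any rule may conclude the goal: 1 - product of (1 - p)."""
    return 1.0 - reduce(lambda acc, p: acc * (1.0 - p), probs, 1.0)

print(and_combination([0.7, 0.9, 0.8]))   # ~0.504
print(or_combination([0.4, 0.8]))         # ~0.88: Rules 1 and 2 combined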
• The system using the Bayesian approach needs quite a large amount of probability data to construct a
knowledge base.
Consider a Bayesian belief network with four nodes, where {A, B} represent evidences and {C, D}
represent hypotheses. Here, A and B are unconditional nodes and C and D are conditional nodes.
Arcs between the nodes represent the interdependence of evidences and hypotheses: C is dependent on A,
while D is dependent on both A and B. Nodes A and B are independent of each other. Each node has a
probability attached to it:
P(A) = 0.3
P(B) = 0.6
P(C | A) = 0.4
P(C | ~A) = 0.2
P(D | A, B) = 0.7
These can also be expressed in the form of conditional probability tables, as given below.
Conditional Probability Tables
P(A) = 0.3    P(B) = 0.6

A | P(C | A)
T | 0.4
F | 0.2

A B | P(D | A, B)
T T | 0.7
T F | 0.4
F T | 0.2
F F | 0.01
= 0.12.
Similarly, the values of components of the denominator may be computed as follows.
P(~A, B, C, D) = P(D | ~A, B) * P(C | ~A) * P(B) * P(~A)
= 0.7 * 0.4 * 0.6 * 0.7
= 0.1176
P(~A, ~B, C, D) = P(D | ~A, ~B) * P(C | ~A) * P(~B) * P(~A)
= 0.01 * 0.2 * 0.4 * 0.7
= 0.00056
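A short Python sketch of how such joint entries follow from the network's tables (the parameters are taken from the conditional probability tables above; the chain rule for this network is P(A, B, C, D) = P(A) * P(B) * P(C | A) * P(D | A, B)):

# Network parameters from the tables above.
P_A, P_B = 0.3, 0.6
P_C = {True: 0.4, False: 0.2}                      # P(C=T | A)
P_D = {(True, True): 0.7, (True, False): 0.4,
       (False, True): 0.2, (False, False): 0.01}   # P(D=T | A, B)

def joint(a, b, c, d):
    """Probability of one complete assignment to (A, B, C, D)."""
    pa = P_A if a else 1 - P_A
    pb = P_B if b else 1 - P_B
    pc = P_C[a] if c else 1 - P_C[a]
    pd = P_D[(a, b)] if d else 1 - P_D[(a, b)]
    return pa * pb * pc * pd

print(joint(False, False, True, True))   # ~0.00056, i.e. P(~A, ~B, C, D) above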
Advantages
• It can be used to learn causal relationships, and hence can be used to gain understanding about a
problem domain and to predict the consequences of intervention.
• It is an ideal representation for combining prior knowledge (which often comes in causal form) and
data, because the model has both causal and probabilistic semantics.
• Bayesian statistical methods in conjunction with Bayesian networks offer an efficient and
principled approach for avoiding the over-fitting of data.
Disadvantages
• The probabilities are described as a single numeric point value. This can be a distortion of the
precision that is actually available for supporting evidence.
• There is no way to differentiate between ignorance and uncertainty. These are two distinct
concepts and should be treated as such.
• There exists the computational difficulty of exploring a previously unknown network.
• All the branches must be calculated to find the probability of any one branch of the network. Even
though the resulting description of the network can be computed in linear time, the process
of network discovery itself is an NP-hard task. It might be either too costly to perform or impossible,
given the number and combination of variables.
• The quality and extent of the prior beliefs used in Bayesian inference processing are a major
shortcoming.
• The reliability of a Bayesian network depends on the reliability of the prior knowledge.
• Selecting the proper distribution model to describe the data has a notable effect on the quality of
the resulting network. Therefore, selection of the statistical distribution for modeling the data is
very important.
The measure of belief (MB) is defined as the relative increment of belief in a hypothesis H due to some
evidence E. It may be represented as follows:
MB[H, E] = 1, if P(H) = 1
         = [max(P(H|E), P(H)) - P(H)] / [1 - P(H)], otherwise
The measure of disbelief (MD) is similarly defined as the relative decrement of belief in a given hypothesis
H due to some evidence E. It may be represented as follows:
MD[H, E] = 1, if P(H) = 0
         = [min(P(H|E), P(H)) - P(H)] / [0 - P(H)], otherwise
Alternatively, we can use the following definition for the measure of disbelief in order to get the value of
MD in the range [0, 1]:
MD[H, E] = [P(H) - min(P(H|E), P(H))] / P(H)
For a conjunction of hypotheses, MB and MD are defined as below:
MB[H1 and H2, E] = Min(MB[H1, E], MB[H2, E])
MD[H1 and H2, E] = Max(MD[H1, E], MD[H2, E])
Then, CF[H1 and H2, E] = MB[H1 and H2, E] - MD[H1 and H2, E].
For a disjunction of hypotheses:
MB[H1 or H2, E] = Max(MB[H1, E], MB[H2, E])
MD[H1 or H2, E] = Min(MD[H1, E], MD[H2, E])
Then, CF[H1 or H2, E] = MB[H1 or H2, E] - MD[H1 or H2, E].
Example
Suppose we are given MB[H1, E] = 0.4, MB[H2, E] = 0.3, MD[H1, E] = 0.1, and MD[H2, E] = 0.2.
Compute the CF values.
Solution
As already defined,
MB[H1 and H2, E] = Min(MB[H1, E], MB[H2, E]) = 0.3
MD[H1 and H2, E] = Max(MD[H1, E], MD[H2, E]) = 0.2
MB[H1 or H2, E] = Max(MB[H1, E], MB[H2, E]) = 0.4
MD[H1 or H2, E] = Min(MD[H1, E], MD[H2, E]) = 0.1
Using these values, we can compute CF[H1 and H2, E] and CF[H1 or H2, E] in the following manner:
CF[H1 and H2, E] = 0.3 - 0.2 = 0.1
CF[H1 or H2, E] = 0.4 - 0.1 = 0.3
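The same combination as a small Python sketch (values from the example above):

# MB and MD values for hypotheses H1 and H2 given evidence E.
mb = {"H1": 0.4, "H2": 0.3}
md = {"H1": 0.1, "H2": 0.2}

cf_and = min(mb.values()) - max(md.values())   # CF[H1 and H2, E]
cf_or = max(mb.values()) - min(md.values())    # CF[H1 or H2, E]
print(round(cf_and, 2), round(cf_or, 2))       # 0.1 0.3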
The combination of two belief functions m1 and m2 is called the joint mass, written m3 = m1 ⊕ m2. For
every focal set C = A ∩ B, it is computed as
m3(C) = [Σ m1(A) * m2(B), over all A, B with A ∩ B = C] / [1 - Σ m1(A) * m2(B), over all A, B with A ∩ B = ∅]
The sum over pairs with empty intersection measures the amount of conflict between the two mass sets,
and dividing by the normalization factor has the effect of completely ignoring conflict, attributing any mass
associated with conflict to the null set. Consider the diagram given in the figure representing the lattice of
subsets of the universe U = {A, B, C}, with two basic belief functions m1 and m2 defined on some of its
subsets. The table below shows the joint belief calculated using the formula discussed above.
Combination of m1 and m2 | m2({A, C}) = 0.6   | m2({B, C}) = 0.6  | m2(U) = 0.6
m1({A, B}) = 0.4         | m3({A}) = 0.24     | m3({B}) = 0.32    | m3({A, B}) = 0.16
m1(U) = 0.6              | m3({A, C}) = 0.12  | m3({B, C}) = 0.16 | m3(U) = 0.8
Now suppose we have mutually exclusive hypotheses represented by the set U = {flu, measles, cold, cough}. Let us
define two belief functions m1 and m2 based on the evidence of fever and headache, respectively, as follows:
From the table above, we see that we obtain multiple belief values for the empty set (∅), and its total belief
value is 0.56. So, according to the formula, we have to scale down the remaining values of the non-empty sets by
dividing by the factor (1 - 0.56) = 0.44. Hence, the final belief values may be written as
m5({flu}) = 0.144/0.44 = 0.327
m5({cold}) = 0.084/0.44 = 0.191
m5({flu, cold}) = 0.036/0.44 = 0.082
m5({flu, measles}) = 0.096/0.44 = 0.218
m5({cold, cough}) = 0.056/0.44 = 0.127
m5(X) = 0.024/0.44 = 0.055
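Dempster's rule of combination can be sketched in Python as follows (the two mass functions below are illustrative stand-ins, since the table defining m1 and m2 for this example is not fully reproduced above):

from itertools import product

def dempster(m1, m2):
    """Combine two mass functions given as {frozenset: mass} dictionaries."""
    combined, conflict = {}, 0.0
    for (s1, p1), (s2, p2) in product(m1.items(), m2.items()):
        inter = s1 & s2
        if inter:
            combined[inter] = combined.get(inter, 0.0) + p1 * p2
        else:
            conflict += p1 * p2        # mass falling on the empty set
    # Renormalize by (1 - conflict), as in the scaling by 0.44 above.
    return {s: p / (1.0 - conflict) for s, p in combined.items()}

m1 = {frozenset({"flu", "measles"}): 0.6, frozenset({"flu", "cold"}): 0.4}
m2 = {frozenset({"flu"}): 0.7, frozenset({"cough"}): 0.3}
print(dempster(m1, m2))   # all surviving mass lands on frozenset({'flu'})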
The term belief (or support) of a set A, denoted by bel(A), may be defined formally as follows:
Definition: Belief is the sum of the masses of all subsets B of the set A of interest and may be expressed as
bel(A) = Σ m(B), ∀ B ⊆ A
For example, if X = {A, B, C}, then
bel(X) = m(A) + m(B) + m(C) + m({A, B}) + m({A, C}) + m({B, C}) + m({A, B, C})
A belief interval can also be defined for a subset A. It is represented as the subinterval [bel(A), pl(A)] of [0, 1],
where pl(A) is called the plausibility of A.
Definition: The plausibility of A may be formally defined as the sum of the masses of all sets B that intersect the
set of interest A. It may be expressed as
pl(A) = Σ m(B), ∀ B such that B ∩ A ≠ ∅
The true probability of A, say P(A), is bounded by the support bel(A) and the plausibility pl(A) as follows:
bel(A) <= P(A) <= pl(A)
Some important results using both belief and plausibility measures are listed below:
pl(A) >= bel(A)
pl(∅) = bel(∅) = 0
bel(A) + bel(~A) <= 1
pl(A) + pl(~A) >= 1
bel(U) = pl(U) = 1
pl(A) + bel(~A) = 1
The last result provides another definition of the plausibility measure in terms of the belief measure as
pl(A) = 1 - bel(~A).
Dempster and Shafer suggested the use of the interval [bel(A), pl(A)] to measure the uncertainty of A
(Shafer et al., 1990).