Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                

Expert System and Apllications: Ai - Iii-Unit

Download as pdf or txt
Download as pdf or txt
You are on page 1of 27

AI_III-UNIT III YEAR II SEM

EXPERT SYSTEM AND APLLICATIONS

One of the goals of AI is to understand the concept of intelligence and develop intelligent
computer programs. An example of a computer program that exhibits intelligent behavior is an expert
system (ES). Expert systems are meant to solve real-world problems which require specialized human
expertise and provide expert quality advice, diagnoses, and recommendations. An ES basically a software
program or system that tries to perform tasks similar to human experts in a specific domain of the
problem. It incorporates the concepts and methods of symbolic inference reasoning, and the use of
knowledge for making these inferences. Expert systems may also be referred to as knowledge-based
expert systems.

Expert system can process multiple values for any problem parameter. This causes them to pursue more
than one line of reasoning hence leading to results of incomplete (not fully determined) reasoning being
presented. Problem solving by theses systems is accomplished by applying specific knowledge rather than
specific technique. This is the key idea ES technology. The power ES lies in the store of knowledge
regarding the problem domain; the more knowledge a system is provided, the more competent it
becomes.

1 .PHASES IN BUILDING EXPERT SYSTEMS

Building an ES initially requires extracting the relevant knowledge from a human domain expert; this
knowledge is often based on useful thumb rules and experience rather than absolute certainties. After
extracting knowledge from domain experts, the next step is to represent this knowledge in the system.
Representation of knowledge in a computer is not straight forward and requires special expertise. A
knowledge engineer handles the responsibility of extracting this knowledge and building the ES' s
knowledge base. This process of gathering knowledge from a domain expert and codifying it according to
the formalism is called knowledge engineering. This phase is known as knowledge acquisition, which is a
big area of research. A wide variety of techniques have been developed for this purpose. Generally, an
initial prototype based on the information extracted by interviewing the expert is developed. This proto -
type is then iteratively refined on the basis of the feedback received from the experts and potential users
of the ES. Refinement of a system is possible only if the system is scalable and modifiable and does not
require rewriting of major code.

A simple ES primarily consists of a knowledge base and an inference engine, while features such as
reasoning with uncertainty and explanation of the line of reasoning enhance the capability of ES .Since ES
uses uncertain or heuristic knowledge just like humans, its credibility is often questionable.

To be more precise, the different interdependent and overlapping phases involved in buildi ng an ES may be
categorized as follows:

• Identification Phase In this phase, the knowledge engineer determines important feature of the problem
with the help of the human domain expert. The parameter that are determined in this phases include the
type and scope of the problem, the kind of resource required, and the goal and objective of the ES

• Conceptualization Phase In this phase, knowledge engineer and domain expert decide the concepts,
relations, and control mechanism needed to describe the problem- solving method. At this stage, the
issues of granularity are also addressed, which refers to the details required in the knowledge.
1 VARUNMARAMR AJ
AI_III-UNIT III YEAR II SEM

• Formalization Phase This phase involves expressing the key concepts and relations in some framework
supported by ES building tool . Formalized knowledge consists of data structure, inference rules, control
strategies, and languages required for implementation.

• Implementation Phase During this phase, formalized knowledge is converted to a working computer
program, initially called prototype of the whole system.

• Testing Phase this phase involves evaluating the performance and utility of prototype and revising the
system, if required. The domain expert evaluates the prototype system and provides feedback, which helps
the knowledge engineer to revise it.

Knowledge Engineering

The whole process of building an ES is often referred to as knowledge engineering.

• Ensuring that the computer has all the knowledge needed to solve a problem.
• Choosing one or more forms to represent the required knowledge.
• Ensuring that the computer can use the knowledge efficiently by selecting some of the reasoning
methods.
Questions /Quires Strategies &
domain rules Expert
Domain Knowledge
System
Expert Engineer

Answers/Solutions
Interaction between Knowledge Engineer and Domain Expert for Creating an ES

The main role of the knowledge engineer begins only once the problem of some domain for developing an
ES is decided. The job of the knowledge engineer involve s close collaboration with the domain expert(s)
and the end user(s).
The knowledge engineer will then extract general rules from the discussion and interview held with
expert(s) and gets them checked by the expert(s) for correctness. The engineer then translates the
knowledge into a computer-usable language and designs an inference engine, which is a reasoning
structure that uses the knowledge appropriately.
The domain knowledge, consisting of both formal, textbook knowledge and experiential knowledge
(obtained by the expert' experiences), is entered into the program piece by piece. In the initial stages, the
knowledge engineer may encounter a number of problems such as the inference engine may not be right,
the form of knowledge representation may not be appropriate for the kind of knowledge needed for the
task, or the expert may find the pieces of knowledge incorrect.
The development of ES would remain incomplete if it did not involve close collaboration with end users.
The basic development cycle should include the development of an initial prototype and iterative testing
and modification of that prototype by both experts (for checking the validity of the rules) and users (for
checking the performance of the system and explanations for the answers).
The initial prototype, the knowledge engineer will have to take provisional decisions regarding appropriate
knowledge representation (e.g., rules, semantic net or frames, etc.) and inference methods (e.g., forward
chaining or backward chaining or both). To test these basic design decisions, the first prototype may be so
designed that it only solves a small part of the overall problem.

VARUNMARAMR AJ 2
AI_III-UNIT III YEAR II SEM

There are two ways of building an ES: they can be built from either scratch or by using ES shells or tools.
Currently, expert system shells are available and widely used.

KNOWLEDGE REPRESENTATION

The collection of domain knowledge is called knowledge base, while the general problem solving
knowledge may be called inference engine, user interface, etc. The most common knowledge
representation scheme for expert systems consists of production rules, or simply rules; they are of the
form if –then, where the if part contains a set of conditions in some logical combinations.
Expert systems in which knowledge is represented in the form of rules are called rule based systems.
Another widely used representation in ES is called the unit (also known as frame, semantic net etc.), which
is based upon a more passive view of knowledge.
A unit consists a list of properties of an entity and associated values for those properties.

3. EXPERT SYSTEM ARCHITECTURE


Expert system architecture may be effectively described with the help of a diagram as given in
below figure which contains important components of the system. The user interacts with the system
through a user interface which may use menus, natural language, or any interacts of interaction. Then, an
inference engine is used to reason with the expert knowledge other as the data specific to the problem
being solved. Case-specific data includes both data Well.ded by the user and partial conclusions along with
certainty measures based on this data.
Generally, all expert systems possess an explanation subsystem, which allows the program explain its
reasoning to the user. Some systems also have a knowledge acquisition module that helps the expert or
knowledge engineer to easily update and check the knowledge base.

VARUNMARAMR AJ 3
AI_III-UNIT III YEAR II SEM

Knowledge Base
Knowledge base of an ES consists of knowledge regarding problem domain in the form of static and
dynamic databases. Static knowledge consists of rules and facts, or any other of knowledge representation
which may be compiled as a part of the system and does not change during execution of the system.
Dynamic knowledge consists of facts related to a particular consultation of the system collected by asking
various questions to the user who is consulting the ES. At the beginning of the consultation, the dynamic
knowledge base is empty. As the consultation progresses, dynamic knowledge base grows and is used in
decision making along with static knowledge.

Inference Engine
The term inference refers to the process of searching through knowledge base and deriving new
knowledge. It involves formal reasoning by matching and unification similar to the one performed by
human expert to solve problems in a specific area of knowledge using modus ponen rule. An inference rule
may be defined as a statement that has two parts, an if clause and a then clause. This rule enables expert
systems to find solutions to diagnostic prescriptive problems.
Each rule is independent of others and may be deleted or added without affecting, other rules.
Inference mechanism uses, control strategy that determines the order in which rules are applied. There are
mainly two types of reasoning mechanisms that use inference rules: backward chaining and forward
chaining.
The process of forward chaining starts with the available data and uses inference rules to conclude
more data until a desired goal is achieved. An inference engine uses facts from static and dynamic
knowledge bases and searches through the rules until it finds one in which the if clause is known to be
true. A rule is then said to succeed. It then concludes the then clause and adds this information to the
dynamic knowledge base. The inference engine continues to repeat the process until a goal is reached.
Since the data available determines which inference rules are used, this method is also known as data
driven method.
As an example of the forward chaining, we have the following rules:
Rule 1 If symptoms are headache, sneezing, running nose and sore throat, then patient has cold.
Rule 2 If symptoms are fever, cough and running nose, then patient has measles.
Back-ward chaining starts with a list of goals and works backwards to see if there is data which will
allow it to conclude any of these goals. An inference engine using backward chaining would search the
inference rules until it finds one whose then part matches a desired goal. If the if part of that inference rule
is not known to be true, then it is added to the list of goals.

Knowledge Acquisition
Knowledge present in an ES may be obtained from many sources such as textbooks, reports, case
studies, empirical data, and domain expert which are a prominent source of knowledge. A knowledge
acquisition module allows the system to acquire more knowledge regarding the problem domain from
experts. Interaction between the knowledge engineer and the domain expert involves prolonged series of
intense systematic interviews or using a questionnaire (carefully designed to get expertise knowledge). The
knowledge engineer working on a system should be able to extract expert methods, procedures, strategies,
and thumb rules for solving the problem at hand.

VARUNMARAMR AJ 4
AI_III-UNIT III YEAR II SEM

Case History
Case history stores the files created by inference engine using the dynamic database (created at
the time of different consultation of the system) and is used by the learning module to enrich its
knowledge base. Different cases with solutions are stored in Case Base system and these cases are used for
solving the problem using Case Base Reasoning (CBR).
User Interfaces
User interface of an ES allows user to communicate with the system in an interactive manner and
helps the system in creating working knowledge for the problem that has to be solved. Knowledge may be
entered using some editor or specialized designed user interface. The function of the user , interface is to
present questions and information to the user need to be received and interpreted by the user interface
engine. Responses of the user may or may not be restricted to a set of possi ble legal answers. The user
interface checks all responses to ensure that they are of the correct data type.
Table 8.1 Dialogue Module of a User Interface
System Do you have fever?
User Yes
System Do you have sore throat?
User No
System Do you have cough?
User Yes
System Are you suffering from running nose?
User Yes
System Are you suffering from headache?
User No

Explanation Module
Most of the ES has explanation facilities that enable users to query the system about why it asked; some
questions and how it reached some conclusion; these modules are called How and why.
Explanation Module
invoke Why/How modules
system Do you have running nose?
user Yes
system I am trying to prove that you have cold. Do you have headache?
user Yes
System do you have sore throat?
User No
System Then you don’t have cold .Do you have fever?
User Yes
System Now I am looking at the possibility of measles. Do you have cough?
User Yes
System I can infer measles using the rule "If symptoms are fever, cough, running_nose, then patient has
measles". Measles is concluded.
User How?
System Since you have fever, running_nose, and cough and there is a rule "If symptoms are fever, cough,
running_nose, then patient has measles". So, measles is concluded for you.

VARUNMARAMR AJ 5
AI_III-UNIT III YEAR II SEM

Special Interfaces
Special interfaces may be used in ES for performing specialized activities, such as handling
uncertainty in knowledge. These interfaces form a major area of expert system research which involves
methods for reasoning with uncertain data and uncertain knowledge. A point to be kept in mind regarding
knowledge is that it is generally incomplete and uncertain. To deal with uncertain knowledge, a confidence
factor or a weight may be associated with a rule. The set of methods for using uncertain knowledge in
combination with uncertain data in the reasoning process is called reasoning with uncertainty. Another
important subclass of methods for reasoning with uncertainty is called fuzzy logic and the systems that use
them are known as fuzzy systems.

3. Expert Systems versus Traditional Systems


a. Characteristics of Expert Systems Some
 Expertise: An ES exhibit expert performance, have high level of skill, and possesses adequate
robustness. The high-level expertise and skill of an ES aids in problem solving and makes In cost
effective
 Symbolic reasoning Knowledge in an ES is represented symbolically which can be easily, reformulated
and reasoned.
 Self knowledge A system should be able to explain and examine its own reasoning.
 Learning capability A system should learn from its mistakes and mature as it grows. Flexibility provided
by the ES helps it grow incrementally.
 Ability to provide training Every ES should be capable of providing training by explaining the reasoning
process behind solving a particular problem using relevant knowledge.
 Predictive modeling power This is one of the important features of ES. The system can act as an
information processing model of problem solving. It can explain how new situation led to the change,
which helps users to evaluate the effect of new facts and understand thei r relationship to the solution.

b. Evaluation of Expert Systems


Evaluation of an ES consists of performance and utility evaluation. Performance evaluation consists
of answering various questions such as the ones given below.

• Does the system make decisions that experts generally agree to?
• Are the inference rules correct and complete?
• Does the control strategy allow the system to consider items in the natural order that the expert prefers?
• Are relevant questions asked to the user in proper order (otherwise it will be an irritating process)?
• Are the explanation given by the ES adequate for describing how and why conclusions?

Utility evaluation consists of answering the following questions:

• Does the system help user in some significant way?


• Are the conclusions of the system organized and ordered in a meaningful way?
• Is the system fast enough to satisfy the user?
• Is the user interface friendly enough?

VARUNMARAMR AJ 6
AI_III-UNIT III YEAR II SEM

C .Advantages and Disadvantages of Expert Systems


Developing an expert system as it involves a lot of time and money, one should able to evaluate
whether a problem is suitable for ES solution or not by using the following guidelines:
 Specialized knowledge problem
 High payoff
 Existence of cooperating experts
 Justification of cost involved in developing es
 The types of problems
Advantages
• Helps in preservating scarce expertise.
• Provides consistent answers for repetitive decisions,processes, and tasks.
• Fastens the pace of human professional or semi-professional work.
• Holds and maintains significant levels of information,
• Provides improved quality of decision making.
• Domain experts are not always able to explain their logic and reasoning unlike ES
• Encourages organizations to clarify the logic of their decision- making.
• Leads to major internal cost savings within companies.
• Causes introduction of new products.
• Never forgets to ask a question, unlike a human.
Disadvantages
• Unable to make creative responses as human experts would in unusual circumstance
• Lacks common sense needed in some decision making.
•May cause errors in the knowledge base, and lead to wrong decisions.
•cannot adapt changing environments, unless knowledge base is changed

D. Languages for ES Development


The most prominent language is called LISP (LISt Processing), which is based on lambda calculus. It
is a simple, elegant, and flexible language; most Al research programs are written in LISP. Another AI
programming language, known as Prolog (PROgramming in LOGic), was invented in the early 1970s. Prolog
is based on first-order predicate calculus. Prolog programs behave in a manner similar to rule-based
systems.

4. Truth Maintenance Systems


The Truth maintenance system is a structure which helps in revising sets of beliefs and maintaining the
truth every time a new information contradicts information already present in the system .General -
purpose truth maintenance systems (TMS) have received considerable attention in the past few years.
A monotonic TMS manipulates proposition symbols and Boolean constraints. A non-monotonic TMS allows
for heuristic or non-monotonic relationships between proposition symbols such as 'whenever P is true Q is
likely' or 'if P is true then unless there is evidence to the contrary Q is assumed to be true'.

A. Monotonic system and logic


In monotonic systems, once a fact or piece of knowledge stored in the knowledge base is
identified, it cannot be changed during the process of reasoning. In other words, axioms are not allowed to
change as once a fact is confirmed to be true, it must always remain true and can never be modified.

VARUNMARAMR AJ 7
AI_III-UNIT III YEAR II SEM

• If a formula is a theorem for a particular formal theory, then that formula remains a theorem for any
augmented theory obtained by adding axioms to the theory.
In monotonic reasoning, the world of axioms continually increases in size and keeps on expanding.
An example of monotonic form of reasoning is predicate logic. It represents a deductive re asoning system
where new facts are derived from known facts.
B. Non-monotonic monotonic System and Logic
In Non-monotonic system, truths that are present in the system can be retracted when ever
contradictions arise. Hence, number of axioms can increase as well as decrease. Non-monotonic
reasoning is based on inferences made by applying non-monotonic logic.
Both monotonic and non-monotonic reasoning can be best implemented using TMS described
earlier. The term truth maintenance is synonymous with the term knowledge base maintenance and is
defined as keeping track of the interrelations between assertions in a knowledge base.
TMSs are companion components to inference systems. Main job of TMS is to maintain consistency
of the knowledge being used by the problem solvers. The inference engine solves domain problems based
on its current belief set while TMS maintains the currently active belief set.
PROBLEM SOLVER
IE TMS

KB

C. Monotonic TMS
The most practical applications of monotonic systems using TMS are qualitative simulation, fault diagnosis,
and search applications.
A monotonic TMS is a general facility for manipulating Boolean constraints on proposition symbols. The
constraint has the form P ---> Q where P and Q are proposition symbols that an outside observer can
interpret as representations of the statements.

Functionality of a Monotonic TMS


A TMS stores a set of Boolean constraints, Boolean formulas (premises) and assigns truth values to literals
that satisfy this stored set of constraints. These constraints are known as internal to literals and so do not
appear explicitly as arguments in most of the interface functions.
Add constraint: This interface functions adds a constraint to the internal constraint set. Once a constraint
has been added, it can never be removed.
Follows from: This function takes two arguments, namely, a literal (L) and a premise set (∑) and return the
values yes, no, or unknown. If yes is returned, then the TMS guarantees that L follows from ∑ and the
internal constraints. If no is returned, then the TMS guarantees that L does not follow from ∑, that is, there
exists an interpretation satisfying both the internal constraints and ∑ in which L is false. If the TMS is unable
to determine, then it returns unknown.
Interface Functions: The interface functions compute justifications. If the TMS can determine that L follows
from the internal constraints and a premise set ∑ then one can ask the TMS to justify this fact by producing

VARUNMARAMR AJ 8
AI_III-UNIT III YEAR II SEM

a proof of L. There are two interface functions used to generate such proofs namely justifying literals and
justifying constraints.
we have a premise set = {P, W} and an internal constraint set {P --> Q, (P ∧W) ->R, (Q ∧ R) —> S}. Most
truth maintenance systems are able to derive S from these constraints and the premise set TMS should
provide the justifications of deriving S from constraints and premi ses.
Justifying literals Derived literals Justifying constraints
{P, WI} R (P ∧W) —> R
{P} Q P-> Q
{Q, R} S (Q∧R)->S
Justification functions can produce a tree with literal S at the root as shown below. At each node of the
tree, the function justifying literals can be used to get children des until one reaches members of the
premise set. The justifications are required to be non-circular, that is, if Q appears in the justification tree
rooted at P, then P must not appear in the justification tree rooted at Q.

D. Non-monotonic TMS
The basic operation of a TMS is to attach a justification to a fact. A fact can be linked with any
component of program knowledge which is to be connected with other components of program
information.

Truth Maintenance Table


The term truth maintenance is synonymous with the term knowledge base maintenance and is
defined as keeping track of the interrelations between assertions in a knowledge base. TMS is regarded as
a knowledge-based management system that is activated every the reasoning system or inference system
generates a new truth value. TMS maintains a consistent set of beliefs for the inference engine. The
inference engine pope domain problems based on its current belief set, while TMS maintains a record of
the currently active belief set.
TMS basically operates with two kinds of objects, namely, propositions, which declares different
beliefs, and justifications.

E. Applications of TMS to Search


TMS is also useful in controlling searches. It more specifically controls those searches that are
needed in constraint satisfaction problems. Constraint satisfaction and search problems generally fall into

VARUNMARAMR AJ 9
AI_III-UNIT III YEAR II SEM

the class of problems for which a direct algorithmic solution does not exist. The solution of these problems
requires the examination of state spaces.

7. Applications of Expert System

Planning and Scheduling


 Airline scheduling of flights, ,
 Manufacturing job-shop scheduling,
 Creating plan for applying series of chemical reactions,
 Manufacturing process planning,
 Creating air strike plan projected over several days, etc.
Design and manufacturing.
 Gene cloning,
 Integrated circuits layouts designing,
 Creation of complex organic molecules,
 Modular home building,
 Manufacturing, etc.
Prediction
 Weather prediction for rains, storms, and tornado
 Prediction of crops
 Share market, etc.
Interpretation
 Interpreting data from ICU test equipment for further investigation or monitoring,
 Interpreting intelligent sensors reports to perform situation assessment,
 Interpreting radar images to classify ships, etc.
Financial decision making
 Foreign exchange trading
 Formulating financial strategies
 Giving advices, etc.
Process monitoring and control
 Steel-making and oil-refining industries.
 Nuclear reactor (to detect accident conditions),
 Patient monitoring system in hospitals (for controlling the treatment of the patient in ICUs,
monitoring components to track the behavior of the system, etc.), and so on.
Instruction An ES can offer tutorials and instructions to students by incorporating a student's behavior and
learning capability models and can also evaluate a student's acquired skills.
Debugging
 Suggest the type of maintenance needed to correct faulty equipments,
 Help in debugging large computer programs to improve the performance, etc.
Knowledge Publishing
 Advisors, which counsel a user on appropriate grammatical usage in a text,
 A tax advisor that accompanies a tax preparation program, which advises the user regarding tax
strategy, tactics, and individual tax policy.

VARUNMARAMR AJ 10
AI_III-UNIT III YEAR II SEM

8. LIST OF SHELLS AND TOOLS

 ACQUIRE
 Arity
 ART
 CLIPS(C language Integrated Production System)
 FLEX
 Gensym’s G2
 GURU
 HUGIN SYSTEM
 Knowledge craft
 K-vision
 MailBot
 TMYCIN

VARUNMARAMR AJ 11
AI_III-UNIT III YEAR II SEM

UNCERTAINTY MEASURE: PROBABILITY THEORY

1. Probability Theory
The term probability is defined as a way of turning an opinion or an expectation into a number lying
between 0 and 1. It basically reflects the likelihood of an event, or a chance that a particular event will
occur. Assume a set S (known as a sample space) consisting of independent events, representing all
possible outcomes of a random experiment.
P(A)= (Number of outcomes favorable to A) /(Total number of possible outcomes)
Axioms of Probability
If S represents a sample space and A and B represent events, then the following axioms hold true.
Here, A' represents complement of set A.
•P(A) >=0
•P(S) = 1
•P(A') = 1— P(A)
•P(A U B )= P(A) + P(B), if events A and B are mutually exclusive
•P(A U B ) = P(A) + P(B) — P(A U B), if A and B are not mutually exclusive. This is called addition rule of
probability.
Let us prove that P(A U B ) = P(A) + P(B) — P(A B), if A and B are not mutually exclusive.
Proof We can easily see that A U B = (A U A') (A U B)= A U (A' B)
Therefore,
P(A U B) = P(A U (A' B))
= P(A) + P(A' B) (as A and A' n B are mutually exclusive)
= P(A) + P(A' B) + P(A B) — P(A B) (adding and subtracting P(A B))
= P(A) + P(B) — P(A B) (as P(B) = P(A' B) + P(A B))
If events A1, ..., An in S are mutually exclusive, then we can write
P (A1∪ A2 ...∪An) = P(A1)+P(A2)+••+ P(An)

A. Joint Probability
Joint probability is defined as the probability of occurrence of two independent events in
conjunction. That is, joint probability refers to the probability of both events occurring together. The joint P
of A and B is written as P(A ∩ B) or P(A and B). It may be defined as given below.
P (A and B) = P(A) * P(B)
Two events are said to be independent if the occurrence of one event does not affect the probability of
occurrence of the other.
Consider an example is denoted of two fair coins separately. The probability of getting a head H on
tossing the first coin s denoted by P(A) = 0.5, and the probability of getting a head on tossing the second
coin is denoted by P(B) = 0.5. The probability of getting H on both the coins is called joint probability and is
represented as P(A and B). It is calculated as follows:
P(A and B) = P(A) * P(B)
= 0.5 * 0.5
= 0.25

Similarly, the probability of getting a head H on tossing one or both coins can be calculated. It is
called union of the probabilities P(A) and P(B), and is denoted by P(A U B); it is also written as P(A or B). It
can be calculated for the above example as follows:

VARUNMARAMR AJ 12
AI_III-UNIT III YEAR II SEM

P(A or B) = P(A) + P(B) - P(A) * P(B)


= 0.5 + 0.5 – 0.25
= 0.75
Let us consider another example where a joint probability distribution of two variables A and B are given in
Table 9.1. It should be noted that joint probability distribution for n variables require 2" entries with all
possible combinations.
Table : Joint Probability Distribution
Joint Probabilities A A'
B 0.20 0.12
B' 0.65 0.03

According to the data given in Table ,


P(A and B) = 0.20,
P(A and B') = 0.65,
P(A' and B)= 0.12,
P(A' and B') = 0.03.
From this available data, we can easily compute P(A) and P(B) as shown below.
P(A) = P(A and B) + P(A and B')
= 0.20 + 0.65 = 0.85
P(B)=P(A and B) + P(A' and B)
= 0.20 + 0.12 = 0.32

In fact, we can compute probability of any logical combination of A and Bin the following manner
P(A or B)=P(A) +P(B)-P(A and B)
=P(A and B) + P(A and B') + P(A and B) + P(A' and B) - P(A and B)
= P(A and B) + P(A and B') + P(A and B)
= 0.20 + 0.65 + 0.12
= 0.97

Alternatively, we can calculate P(A or B) as follows:


P(A or B)=1-P((A or B)')
=1-P(A' and B')
=1- 0.03
= 0.97
We can come across situations where the probability of a statement is based on piece of
evidence. This type of probability is called conditional probability.

B. CONDITIONAL PROBABILITY
The concept of conditional probability relates the probability of one event to the occurrence of
another. It is defined as probability of the occurrence of an event H (hypothesis) provided an event E
(evidence) is known to have occurred. It is denoted by P(H|E) and may be represented as follows:
P(H|E)= Number of events favorable to H which are also favorable to E)
(Number of events favorable to E)
=P(H and E) /P(E)
However, this rule cannot be used in cases where P(E) = 0.

VARUNMARAMR AJ 13
AI_III-UNIT III YEAR II SEM

If events H and E defined on a sample space S of a random experiment arc independent, then P(H |
E) = P(H)
Proof
P(H |E) = P(H and E) / P(E) (by definition)
= P(H) * P(E) / P(E) (using joint probability)
= P(H)
Similarly, we can prove, P(E | H) = P(E)

example: We are given probability of any person chosen at random being literate as 0.40 and probability of
any person chosen at random having age > 60 years as 0.005. Find the probability of the fact that a person
chosen at random of age > 60 years is literate.
Solution:
The probability that a given person chosen at random is both literate and has age > 60 years is
calculated as follows:
[p(X is literate and the age of X > 60)] = P(X is literate) * P(Age of X > 60)
= 0.40 * 0.005
= 0.002
Then, the probability that a person chosen at random having age > 60 years is literate is calculated using
conditional probability
P(X is literate| Age of X > 60)] =[(X is literate and the age of X> 60)]/ [P(Age of X > 60)]
= 0.002/0.005
= 0.4
C. BAYES' THEOREM
It was developed by mathematician Thomas Bayes in the year 1763. This theorem provides a
mathematical model for reasoning where prior beliefs are combined with evidence to get estimates of
uncertainty. It relates the conditional probability and probabilities of events H to and E. The basic idea is to
compute P(H|E) that represents the probability assigned to H after taking into account the new piece of
evidence E.
Bayes' theorem relates the conditional probabilities of events which allows us to express p(H|E) in
terms of the probabilities P(E I H), P(E), and P(H). Here, H denotes hypothesis, while E represents a new
piece of evidence. The relation can be written as
P(H|E) =P(E | H) * P(H) / P(E)
We can derive Bayes' theorem from conditional probability also as shown below.

Proof We can express P(H |E) using conditional probability as follows:


P(H |E) = P(H and E) / P(E)
P(H |E) * P(E) = P(H and E) --------------------1

Similarly, P(E I H) can be expressed as

P(E I H) = P(E and H) / P(H)


P(E I H) * P(H) = P(E and H)-----------------------2

From Eqs (1) and (2), we get


P(H I E) * P(E) = P(E I H) * P(H)
Hence, we obtain P(HIE) = P(E| H)*P(H)/ P(E)
This is the Bayes' theorem.
VARUNMARAMR AJ 14
AI_III-UNIT III YEAR II SEM

P(H |E)=P(E | H) * P(H) /P(E | H) * P(H) + P(E | ~H) *p(~H)


: This formula can be derived from the definition of P(H | E) given above.
In order to prove above statement P(E) = P(E | H) * P(H) + P(E | ~H) *p(~H)
we know that P(E)= P (E and H) + P(E and ~H)
using conditional probability,
we obtain
P(E and H) = P(E | H) * P(H)
P(E and ~H) = P(E I -14) * P(~H)
Therefore, p(E) = (E| H) * P(H) + P(E|~H) * P(~H)
Now, substituting the expression for P(E) as P(E | H) * P(H) + P(E |~H) * P(~H) in the definition of P(H | E)
we get
P(H |E)=P(E | H) * P(H) /P(E | H) * P(H) + P(E | ~H) *p(~H)
All the terms that we have come across while discussing Bayes' theorem have conventional names assigned
to them. They are as follows:
• P(H) is known as the prior probability of H. It is called prior because it does not consider any information
regarding E.
• P(H | E) is known as the conditional probability of H, given E. It is also called posterior probability because
it is derived from or depends on the specified value of E.
• P(E |H) is known as the conditional probability of E given H.
• P(E) is the prior probability of E and acts as a normalizing constant.
Some expert systems use Bayesian theory to derive further concepts. For instance, the rule of the
type ‘lf-then' in rule-based systems can incorporate probability as 'if X is true then Y can be conclude with
probability P'.

Example: Suppose we are given the probability of Mike has a cold as 0.25, the probability of mike was
observed sneezing when he had cold in the past as 0.9 and the probability of Mike was observed sneezing
when he did not have cold as 0.20. Find the probability of Mike having a cold given that he sneezes,
Solution
Let us represent Mike has a cold as an hypothesis. H and Mike sneezes as evidence, E. We have to
calculate P(H | E). The following probabilities are given.
Probability of the hypothesis Mike has a cold is 0.25 as P(H) = 0.25
Probability of the fact that Mike was observed sneezing when he had cold is 0.9.
P(Mike was observed sneezing | Mike has a cold) = P(E |H) = 0.90
Probability of the fact that Mike was observed sneezing when he did not have cold is 0.20
P(Mike was observed sneezing |Mike does not have a cold) = P(E |~H) = 0.20
We have to find the probability of Mike having cold, when he was observed sneezing. That is,
P(Mike has a cold I Mike was observed sneezing) = P(H |E)
We use the formula given P (H|E)=P(E| H) * P(H) / P(E | H) * P(H) + P(E |~H)* P(~H)
Let us compute P(E|H) * P(H) + P(E |~H) * P(~ H) = (0.90)(0.25) + (0.20) (0.75) = 0.375
Hence, we obtain P(H |E) = [(0.9 * 0.25)]/0.375 = 0.6
Therefore, we conclude that Mike's probability of having a cold given that he sneezes is equal to
0.6. Similarly, we can determine his probability of having a cold if he was not sneezing in the following
manner
P(H | ~E) = [P(~E I H) * P(H)] / P(~E) = [(1 —0.9)* 0.25]/(1 —0.375) =0.025/0.625 =0.04
Hence, Mike's probability of having a cold if he was not sneezing is obtained to be equal to 0.04.

VARUNMARAMR AJ 15
AI_III-UNIT III YEAR II SEM

D. EXTENSIONS OF BAYES' THEOREM


Different possibilities are discussed given below.
One Hypothesis and Two Evidences
Consider a given hypothesis H and two evidences E1 and E2. The probability of H if both E1 and E2
are true is calculated by using the following formula:
P(H |E1 and E2) = P(E1| H) * P(E2 | H) * p(H) / P(E1 and E2)
We can derive different forms of the formula using Bayes' theorem and conditional probability to compute
P(H I El and E2). These forms are described as follows:

Form 1
P(H|E1 and E2)=P(H) * P(E1 | H) * P(H | E1 and E2) / P(E1) * P(E2 | E1)
Derivation We now proceed to derive this form using Bayes' theorem and conditional probability.
We get the following using conditional probability formula:
P(H and El and E2) = P(E2| H and E1) * P(H and E1)
P(H and El) = P(E1 | H) *P(H) and
P(E1 and E2) = P(E21| E1) * P(E1)
Now, substitute the above terms in the following conditional formula
P(H I El and E2) = P(H and El and E2) / P(E1 and E2)
We get,
P(H El and E2) =P(E2|H and El) * P(H and E1) / P(E2 | El) * P(E1)
=P(E2 | H and E1) * P(E| H) * P(H) /P(E2 | El) * P(E1)
=P(H) * P(E1| H) * P(E2 H and El) /P(E1)*P(E2 | El)
Hence, the it is proved of the form.

Form 2
P(H |E1 and E2) = P(E2 | H and El ) * P(H | El )/ P(E2 | El)
P(E 1 | H) * P(H) = P(H | El) * P(E1)
Substituting the value of ( P(E1|H) * P(H)) in (9.7), we get
P (H E1 and E2) = P(H | E1) * P(E1) * P(E2 | H and El) / P(E1) * P(E2 | E1)
This implies
P(H | El and E2) = P(E2 | H and E1) * P(H |E1) / P(E2 | E1)
Hence, the proof of form 2.

One Hypotheses and Multiple Evidences


If there are n evidences and one hypotheses, then conditional probability is defined as follows:
P(H|E1 and ... and En) = P(H and E1 and ... and En)/ P(E1 and ... and En)
From now on we will write P(H, El , ..., En) instead of P(H and El and ... and En) for the sake of convenience.
So the formula can be rewritten as follows:
P(H| E1..., En) =P(H, E1, ..., En) / P(E1,..., En)
Bayes' theorem to express conditional probability involving n > 2 evidences which are assumed to be
independent of each other is defined as follows:
P(H | E1.., En) = P(E1 | H)*...* P(En | H)* P(H) / P(El, ..., En)

Chain Evidence

VARUNMARAMR AJ 16
AI_III-UNIT III YEAR II SEM

if E1 is an evidence of hypothesis H and E2 is an evidence of the previous evidence E1, then we can
compute P(H | E1) using the following formula:
P(H| E1)= P(E1l H)* P(E2 | El ) * P(H)/P(El |E2)* P(E2)
Let us prove formula ,We know that P(H| E1) and P(E1 | E2) using Bayes theorem can be written as given
below.
P(H| E1)=P(E1|H) * P(H) / P(E1) ---------------1
P(E1 |E2)=P(E2 |E1)* P(E1)/P(E2)--------------2
From Eq (2), we get
P(E1)=P(E1| E2) * P(E2) / P(E2 | E1)-----------3
Substitute the value of P(E1) in Eq (1). We get the desired formula for P(H| E1) as follows:
P(H|E1)=P(E1 | H) * P(E2 | E1) * P(H) / p(E1 | E2)*P(E2)

Multiple Hypotheses and Single Evidence


To increase the utility of probability theory in learning; we must also know the probability of some
hypotheses for a particular evidence.
Assume that there is a set of hypotheses {H1, H2, ..., HK}, and an evidence E for each hypothesis,
then we can compute P(Hi |E) for each hypothesis Hi using the following formula and select the one with
the highest probability:
P(Hi|E)= P(E | Hi) * P(Hi) / P(E)-----------------1
There is another definition for computing P(Hi |E). For any evidence E, we can write
P(E) = E | Hj) * P(Hj)------------2
Substituting the value of P(E) from Eq (2) in Eq (1), we get
P(Hi|E)=P(E | Hi) * P(Hi) / E| Hj) * P(Hi)---------3
From the definition given at (1), we notice that P(E) is a common factor in computing
P(Hi |E).Hence, we can simply look for hypothesis with maximum posterior probability, that is,
Max[P(E|Hi) * P(Hi)] for each Hi. Further, if we assume that each hypothesis is equally probable given no
evidence, that is, P(Hi)= P(Hj) ∀I, j, then we can further reduce this computation by finding Max[P(E Hi)].
This value is known as likelihood of evidence E given hypotheses Hi. Although learning from observations
gives more accurate results, simpler formulas increase efficiency and save valuable computation time.

Multiple Hypotheses and Multiple Evidences


In some situations, we might have multiple hypotheses {H1, H2, ..., Hk} and multiple sources of
evidences {El, E2,… En}. We can generalize the formula given by above Eq to obtain
P( Hi |E1 ..En) = P(E1, ..., En | Hi) * P(Hi) / (E1,..., En | Hj) * P(Hj)
This equation is very complex to compute. This is one of the serious drawbacks of the Bayesian approach.
A large number of probabilities need to be known in advance to apply this formula.

E. Probabilities in rules and facts of rule-based system


In predicate logic, we normally assume that the facts and rules are always true; however, in real-
life situations, facts may also be probably true. A simple way of representing such facts in Prolog is
achieved by putting probability as the last argument of a predicate representing the facts. Probabilities can
be added to both facts and rules.

VARUNMARAMR AJ 17
AI_III-UNIT III YEAR II SEM

F. Cumulative probabilities
for the e rules discussed in the preceding section, if we wish to reason about whether the battery
in dead, we should gather all relevant rules and facts. It is very important to combine the probabilities from
the facts and successful rules to get a cumulative probability of the battery being dead. The following two
situations will arise:
i. If sub goals of a rule are probable, then the probability of the rule to succeed should take care of the
probable sub goals.
ii. If rules with the same conclusion have different probabilities, then the overall probability of the rule has
to be found.

The first situation is resolved by simply computing cumulative probability of the conclusion with the help of
and-combination assuming that all sub goals are independent.
Prob(A and B and C and ...) = Prob(A) * Prob(B) * Prob(C) * …..
The second situation is handled by using or-combination to get the overall probability of predicate in the
head of rule. if events are mutually independent, the following formula is used to obtain the OR
probability: Prob(A or B or C or ...) = 1 — [(1 - Prob(A)) — Prob(B)) (1 — Prob(C))...]

G. Rule-based system using probability:


Example Let us develop a simple rule-based system for the diagnosis of malfunctioning of some equipment,
say a landline telephone, using the concept of probabilities. As the first step, various situations under
which the telephone does not work properly are identified with the help of experts of this domain. For
example, we can think of the following reasons for the malfunctioning of telephones:
• Faulty instrument
• Problem in Exchange
• Broken cable

Rule 1 if the telephone instrument is old and has been repaired several times in the past then it is 40% sure
that the fault lies with the instrument. it is coded in Prolog in the following manner:
tel ephone_not_ working(0.4) ask(tele_history).
Rule 2 if the instrument has fallen on the ground and broken, then it is 80% sure that the fault lies with the
instrument. The rule may be written as
telephone_not_working(0.8) ask(telephone_broken).
Rule 3 if there are children in the house who play with the keypad of the telephone, with some probability,
then it is 80% sure that the instrument is faulty because of excessive and unusual usage. The rule may be
written as
telephone_not_working(P):ask(chiLdren_present, P1), ask(children_keypad, P2), and_combination([P1, P2,
0.8], P).

H. Bayesian method: advantages and disadvantages


Advantages
• Bayesian method is based on a strong theoretical foundation in probability theory; it is currently the most
advanced of all certainty reasoning methods.
• This method has well-defined semantics for decision making.
Disadvantages

VARUNMARAMR AJ 18
AI_III-UNIT III YEAR II SEM

• The system using Bayesian approach needs quite a large amount of probability data to constructa
knowledge base.

2. BAYESIAN BELIEF NETWORKS


The time and storage requirements for large computations become impractical as the value of n
increases. When reasoning with uncertain beliefs human tend to single out a few propositions that are
known to be causally linked. This lead to the lot concept of forming a belief network called a Bayesian
belief network. This belief network is an efficient structure for storing joint probability distribution. In
bayesian networks, only one branch of the tree needs to be traversed. in the case two variables E and H
are only concerned with P(E), P(H E) and P(H, E).
A. Formal Definition of Bayesian Belief Networks
The Bayesian belief network may be formally defined as an acyclic (with no cycle) directed graph
where the nodes of the graph represent evidence or by and an arc connecting two nodes represents
dependence between them.
Variables Joint probability for n variables (dependent or independent) table for n variables. may be
computed as given below.
P(X1……. Xn) =P(Xn | X1……. Xn-1) *P(X1……. Xn-1)
Or
P(X1……. Xn) =P (Xn | X1……. Xn-1) *P(Xn-1 ……. Xn-2)*………* P(X2| X1 )* P(X1 )
JOINT PROBABILITY OF N VARIABLES USING BAYESIAN NETWORK
The joint probability distribution can be written as the local distribution each node and its parents such as
P(X1 …… Xn) = Xi |parent_nodes (X i))
This expression is reduction of joint probability formula of
n variables as some of the terms =i corresponding to independent variables will not be required. if node Xi
has no parents, its local probability distribution is said to be unconditional and it is then written as P(Xi)
instead of P(Xi |parent_nodes (xi)). The nodes having parents are called conditional nodes if the value of
node is observed, then the node is said to be an evidence node. Nodes with no children are termed as
hypotheses nodes, while nodes with no parents are called independent nodes.

where we have four nodes with ( A, B )representing evidences and { C, D} representing hypotheses. Here, A
and B are unconditional no and C and D are conditional nodes.

Bayesian Belief Network Arcs between the nodes represent interdependence of evidences and
hypotheses; such as C is dependent on A; D is dependent on both A and B. Nodes A and B are independent
of each other. Each node has a probability attached to it.
P(A) = 0.3
P(B) = 0.6
P(C | A) = 0.4
P(C | -A) = 0.2
P(D | A, B) = 0.7
19 VARUNMARAMR AJ
AI_III-UNIT III YEAR II SEM

P(D | A. -B) = 0.4


P(D | -A, B) = 0.2
P(D | -A, -B) = 0.01

They can also be expressed in the form of conditional probability tables as given in Table 323
Conditional Probability Table
p(A) P(B) A P(C) A B P(D)
0.3 0.6 T 0.4 T T 0.7
F 0.2 T F 0.4
F T 0.2
F F 0.01

the joint probability for four variables can be computed using


P(A, B, C, D) =P(D| A, B, C) * P(C | A, B) * P(B | A) * P(A)
For example, C is not dependent on B; D is not dependent on C. Hence, P(C | A, B) is reduced to P(C| A) and
P(D | A, B, C) is reduced to P(D | A, B). Further, A and B are independent, therefore, P(B|A) is reduced to
P(B). With these reductions, we obtain the joint probability as
P(A, B, C, D) =P(D| A, B, C) * P(C | A, B) * P(B | A) * P(A)
= 0.7 * 0.4 * 0.6 * 0.3
= 0.0504

B.INFERENCE USING BAYESIAN NETWORK


In order to make network useful for reasoning and problem solving, we should have the mechanism for
computing the influence of a node on any other node
what is the likelihood or probability that the hypothesis is A given C? which is represented as P(A|C)
We can write P(A |C) = P(A,C)/P(C)
where P(A,C) =
where P(C) =
We can now compute the values of numerator and denominator as follows:
= P(A, B, C, D) + P(A,~B,C,D)+P(A,B,C, ~D) + P(A, ~B, C,D) + P(A, ~ B, C, ~D)
= P(A, B, C, D) + P(A, B, C, ~D)+P(A, ~B,C,D) + P(A, ~B,C, ~D) +
P(~A, B, C,D)+ P(~A,~B,C,D) + P(~A, B, C, ~D) + P(~A, ~B, C, ~D)
Each of the joint probabilities on the right-hand-side of both expressions can be easily conipt, using the
data given in conditional probability Table 9.2. The values of each component of the numerator may be
computed as follows:
P(A, B, C, D) = P(D | A, B) * P(C | A) * P(B) * P(A)= 0.7 * 0.4 * 0.6 * 0.3 =0.0504
P(A, ~B, C. D) = P(D | A, ~B) * P(C | A) * P(~B) * P(A) = 0.4 * 0.4* 0.4 * 0.3 =0.0191
P(A, B, C, ~D) = P(~D | A, B) * M| A) * P(B) * P(A) = 0.3 * 0.4 * 0.6 * 0.3 =0.0216
P(A, ~B. C, ~D) = P(~D|A, ~B) * P(C A) * P(~B) * P(A) = 0.6 * 0.4 * 0.4 * 0.3 =0.0288
The value of the numerator
= P(A, B, C, D) + P(A, ~B, C, D) + P(A, B, C,~D) + P(A, ~B, C, ~D)
= 0.0504 + 0.0192 + 0.0216 + 0.0288
VARUNMARAMR AJ 20
AI_III-UNIT III YEAR II SEM

= 0.12.
Similarly, the values of components of the denominator may be computed as follows.
P(~A, B, C, D) = P(D| A, B) * P(C i A) * P(B) * P(~A)
= 0.7 * 0.4 * 0.6 * 0.7
= 0.1176
P(~A,~ B, C, D) =P(D | ~A, ~B) * P(C ~A) * P(~B) * P(~A)
= 0.01 * 0.2 * 0.4* 0.7
= 0.00056

P(~A, B, C, ~D) = P(~D |~A. B) * P(C | ~A) * P(B) * P(~A)


= 0.8 * 0.2 * 0.6 * 0.7
=0.0672
P(~A, ~B, C, ~D)= P(~D| ~A, ~B) * P(C ~A) * P(~B) * P(~A)
= 0.99 * 0.2 * 0.4 * 0.7
= 0.05544
The other four values have already been computed in the numerator and their sum Denominator value =
0.12 + 0.1176 + 0.00056 + 0.0672 + 0.05544 = 0.3608.
Hence, P(A | C) = 0.12/0.3608 = 0.33 (approximately).

Simple bayesian network example:

Inference using Bayesian Belief Network


Using this model one can answer questions using the conditional probability formula as follows:
• "What is the probability that it is an earthquake, given the alarm is ringing?" = P(E| A)
• "What is the probability of burglary, given the alarm is ringing?" = P(B| A)
• "What is the probability of the alarm ringing if both earthquake and ' = P(A |E), burglary occur
• "What is the probability that it is an earthquake, given that there is a tornado'?" = P(E| D)
• "What is the probability of the alarm ringing if both earthquake and tornado occur''
• "What is the probability of the alarm ringing if there is an e = P(A| E) earthquake?"
• "What is the probability that it is a tornado, if there is an earthquake?" = P(D | E)
327

Bayesian Belief Network: Advantages and Disadvantages


Advantages
• Since this model encodes dependencies among all variables, it can easily handle situati ons where
some data entries are missing.
• it is intuitively easier for a human to understand direct dependencies and local distributions than
complete joint distribution.

VARUNMARAMR AJ 21
AI_III-UNIT III YEAR II SEM

 it can be used to learn causal relationships, and hence can be used to gain understand ing about a
problem domain and to predict the consequences of intervention.
• it is an ideal representation for combining prior knowledge (which often comes in causal form) and
data because the model has both causal and probabilistic semantics.
• Bayesian statistical methods in conjunction with Bayesian networks offer an efficient and
principled approach for avoiding over-fitting of data.
Disadvantages
• The probabilities are described as a single numeric point value. This can be a distortion of the
precision that is actually available for supporting evidence.
• There is no way to differentiate between ignorance and uncertainty. These are distinct two
different concepts and be treated as such.
• There exists the computational difficulty of exploring a previously unknown network.
• All the branches must be calculated to find the probability of any branch of the network. Even
though the resulting ability to describe the network can be performed in linear time, this process
of network discovery is an NP-hard task. It might be either too costly to perform or impossible,
given the number and combination of variables.
• The quality and extent of the prior beliefs used in Bayesian inference processing are major short
comings
• -reliability of Bayesian network depends on the reliability of prior knowledge.
• Selecting the proper distribution model to describe the data has a notable effect on the quality of
the resulting network. Therefore, selection of the statistical distribution for modeling the data is
very important.

3. CERTAINTY FACTOR THEORY


Certainty factor theory provides another way of measuring uncertainty by describing a practical way of
compromising on pure Bayesian system. Certainty factor (c F ) is based on number of observations.
The certainty factor is based on some simplifying assumptions for creating confidence measure and
combining these confidences. These assumptions involve representing for and confidence against
separately by measure of belief (MB) and for measure of disbelief’s(MD).
The MB[H, E] is a measure of belief in the range [O, l]in hypothesis H given the evidence E. If evidence
supports it fully then MB [H, E] =1 and it is zero the evidence fails to support the hypothesis. Similarly, MD
[H, E] is a measure of disbelief in the range [0, 1] in hypothesis H given the evidence E. it measures the
extent to which the evidence E supports the negation of the hypothesis H .It is to be noted that MD is not
compliment of MB.
Measure of belief may be intutively defined as follows:
MB[H,E]= (1-P(H))-(1-P(H| E))/ (1-P(H))
=P(H| E)-P(H)/ (1-P(H))
we can modify the above definition to obtain positive value of The modified definition of MB may be
written as

MB[H,E]=

The measure of disbelief (MD) is similarly defined as the relative decrement of belief in a given hypothesis
H due to some evidence E. it may be represented as follows:

VARUNMARAMR AJ 22
AI_III-UNIT III YEAR II SEM

MD[H,E]=
Alternatively we can use the following definition for the measure of disbelief MD in order to get value of
MD in the range [0, 1].

MB[H,E]=

These measures satisfy the following properties:

• if hypothesis H is true assuming E, then MB[H, E] = 1, MD[H, E] = 0, and CF[H, E]= 1


• if hypothesis H is not true assuming E, then MB[H, El = 0, MD[H, E] = 1, and CF[H,E] = - 1
• The sum of CF of belief and disbelief of hypothesis H, given evidence E is 0. That is, CF[H,E]+ CF[-
H,E] = 0

CASE 1 incrementally Acquired Evidence

MB is defined as below

MB[H,E1 and E2]=


Similarly, MD is defined as below


: MD[Hand E1, d E2] =

Now CF can be written as


CF[H, E1 and E2] = MB[H, E1 and E2] — MD[H, E1 and E2]
Example Suppose we make an initial observation E1 that confirms our belief in H with MBR[H E1]and
MD[H. E1] = 0. Consider a second observation E2 that also confirms H with MBR[H,E2]=0.3 and MD[H And
E2) = 0.1. Compute CF[H, E1 and E2].
Solution
Given MB[H, E1] = 0.4 and MD[H, E1] = 0, we get CF[H,E1] = 0.4. The measure of belief of H with combined
observations E1 and E2 is determined in the following manner:
MB[H, E and E2] = MB(H, E 1) + MB(H, E2) * (1 — MB(H, E1))
=0.4+0.3 * (1 —MB(H, E1))
=0.4+0.18
= 0.58
Similarly.
MD[H, E1 and E2] = MD(H, E1) + MD(H, E2) * (1— MD(H, E1))
= MD(H,E2) =0.1
Therefore. CF[H, E1, and E2] = 0.58 — 0.1
= 0.48
Case 2 Combination of two Hypotheses based on same evidence:
Measure of belief can be defined as
MB[ H1, and H2, E] =Min(MB[H1,E], MB[2, E])
MD[H, and H2, E] = Max(MD[H1, E], MD[H2, E])
23
VARUNMARAMR AJ
AI_III-UNIT III YEAR II SEM

Then, CF[H1, and H2, E] = MB[H1 and H2, E] — MD[H1 and H2, E]
Measure of Disbelief can be defined as
MB [H1 or H2, E] = Min(MB [H1, E], MB [H2, E])
MD [H1 or H2, E] = Min(MD [H1, E], MB [H2, E])
Then, CF[H1 or H2, El = MB [H1 or H2, E] — MD[H1 or H2, E]
Example
Suppose we are given MB [H, , E] = 0.4 and ME[H, E] = 0.3; MD [Hi , E] = O. 1 and MD[H2, E] 0.2.
Compute CF.
Solution
As already defined
MB[H1 and H2, El = Min(MB E], MB [H2, El) = 0.3
MD [H1 and H2, E] = Max(MD [H1, E], MD[H2, E])= 0.2
MB[H1 or H2, E] = Max(MB[H1 ,E], MB[H2, E])= 0.4
MD [H1 or H2, E] = Min(MD[H1, E], MD [H2, E]) = 0.1
Using these values, we can compute CF [H, E, and E2] and CF [H, E1 or E2] in the following manner:
CF[H, E1 and E2] = 0.3 — 0.2 = 0.1
CF[H, E1 or E2] = 0.4 —0.1 = 0.3

Case 3 Chained Rule in chained rule


Measure of belief can be defined as
E1 --> E2 —> H
MB[H, E2] =MB1 [H,E2 * Max{ 0, CF(E2, E1 ]).
Measure of Disbelief can be defined as
MD[H, E2] = MD1[H, E2] * Min{ 1, CF[E2,E1]}.
Then, CF[H, E2] = MB[H, E2]— MD [H,E2]

4 DEMPSTER SHAFER THEORY


Dempster-Shafert theory (or the D-S theory) was developed by A P Dempster in 1968 and was extended G
Shafer in 1976. Belief functions allows us to base degrees of belief or confidence for one event on
probabilities of related events. whereas bayesian theory requires probabilities for each event. Dempster –
shafer degree of belief resembles certainty factor .
D-S theory is more attractive because it is relatively flexible and is based on twp ideas namely ,obtaining
degree of belief for one event from probability of related events and obtaining a rule for combining such
degree of belief when they are based on independent items of evidence. Therefore it may also be
considered to be a theory of evidence based on belief of evidence based on belief functions and plausible
reasoning.
The D-S theory uses number in the range [0,1] to indicate amount of belief in a hypothesis for a given
piece of evidence. This number indicates the degree to which the evidence supports the hypothesis.
Implementing D–S theory in a specific problem generally involves solving two related problems. First, we
must sort the uncertainties in the problem into a priori independent items of evidences. Second, we must
carry out Dempster's rule which is computationally feasible.
A degree of belief (also referred to as a mass) is represented as a belief function rather then a Bayesian
probability distribution.

VARUNMARAMR AJ 24
AI_III-UNIT III YEAR II SEM

A. DEMPSTER THEORY FORMALISM


Let Ube the universal set (universe of discourse) of all hypotheses, propositions, or statements
under consideration. The power set P(U), is the set of all possible subsets of U, including the empty set
represented by ∅. The theory of evidence assigns a belief mass to each subset of the power set. A function
m: P(U) -> [0, 1] is called a basic belief assignment (BBA) function. it satisfies the following two axioms:
m(∅) = 0
∑m(A) = 1, ∀ A P(U)
m is also known as a basic probability assignment. The quantity m(A) is a measure of that Portion of m(A) .
n of the total belief committed exactly to A but to no particular subset of A. The value of m(A) is called
mass assigned to A on the unit interval and pertains only to the set A and makes no additional claims
about any subsets of A.

B.DEMPSTER 'S RULE OF COMBINATION


It is a generalization of Baye’s rule. This rule strongly emphasizes the agreement between multiple
sources and ignores all the evidence through a normalization factor.
Assume that m1 and m2 are two belief functions used for representing multiple sources of
evidences for two different hypotheses. Let A, B ⊂U, such that m1 (A) ≠0 m2(B)≠0.The Dempster's rule for
combining two belief functions to generate a function m3 may be defined as
m3(∅) = 0
m3( C) =
∪ ∅

C=A∩B The combination of two belief functions is called joint mass. Here, m3 can also be written
as m1 or m2. The expression ∪ ∅ known as the normalization factor is a measure
of the amount of conflict between the two mass sets. The normalization factor has the effect of complexity
ignoring conflict and attributing any mass associated with conflict to the null set. Consider the diagram
given in Fig representing the lattice subset of the universe, U ={A,B,C}

Two basic belief functions m1 and m2 defined on some subsets. Below table shows Joint Belief calculated
using the formula discussed above.
na3
Combination of m2({A, C})=0.6 m2({B, C})=0.6 m2({∪})=0.6
m1 and m2
M1({A, B})=0.4 m3([A])= 0.24 m3( {B})= 0.32 M3({ A, B})= 0.16
M1({∪})=0.6 m3({AC}), = 0.12 m3 ({B,C}) 0.16 M3(U)= 0.8

we have mutually exclusive hypotheses represented by a set U = { flu, measles, cold, cough}. Let us define
two belief functions ml and m2 based on the evidence of fever and headache, respectively as follows:
VARUNMARAMR AJ 25
AI_III-UNIT III YEAR II SEM

M1,({flu, measles}) = 0.8


M1(U) = 0.2
M2({flu, cold)) = 0.6
M2(U) = 0.4
We can now compute their combination m3 using the values given in
Values of m3
Combination of ml and m2({flu, cold}) = 0.6 m2(U) = 0.4
m2
m1({flu, measles}) = 0.8 m3({ flu)) = 0.48 m3((flu,measles)) = 0.32
m1(U) = 0.2 m3({ flu, cold)) = 0.12 m3(U) = 0.08
We notice that previous belief functions are modified to m3 with the following belief values and are
different from earlier beliefs.
m3({flu})= 0.48
m3({ flu, cold))= 0.12
m3( { flu, measles)) = 0.32
m3(∪) =0.08
Further, if We have another evidence function m4 of sneezing with the following belief values:
M4({ cold, Cough }) = 0.7
m4(∪) = 0.3
Then' the Combination of m3 and m4 gives another belief function as given in Table 9.6.
Table 9.6 Values of New Belief Function
Combination of m1 and m2 m4({cold, cough)) = 0.7 m4(∪)=0.3
m3((flu)) = 0.48 m5(∅)= 0.336 m5({flu})=0.114
m3((flu, cold)) = 0.12 m5(( cold)) = 0.084 m5({ flu, cold))=0.036
m3({flu, measles}) = 0.32 m5(∅) = 0.224 m5({ flu, measles})=0.096
m3(∪)=0.08 m5({cold, cough}) = 0.056 m5(U)=0.024

From above Table, we see that we obtain multiple belief values for an empty Set ( ∅) and its total belief
value is 0.56. So, according to formula, we have to scale down the remaining values of non-empty sets s by
dividing by a factor (1 — 0.56 = 0.44). Hence, the final belief values may be written as
m5({flu}) = (0.144/0.44) = 0.327
m5( { cold })= (0.084/0.44) = 0.191
m5({ flu, cold)) = (0.036/0.44) = 0.082
m5( {flu, measles}) = (0.096/0.44) = 0.218
m5( { cold, cough }) = (0.056/0.44) = 0.127
m5(X) = (0.024/0.44) = 0.055

Handling Different Situations


• If we get an empty set (0) by intersection operation, then we have to redistribute any belief than IS
assigned to ∅ sets proportionately across non-empty sets using the value (1- ∅ )
in the denominator of belief values for non-empty sets.
• While computing a new belief, we may obtain same subset generated from different intersection
process. The value m for this set is computed by summing all such values.

C. Belief and Plausibility


VARUNMARAMR AJ 26
AI_III-UNIT III YEAR II SEM

The term belief (or support) of a set A denoted by bel(A) may be defined formally as follows:
Definition: Belief is the sum of all masses of subsets of set A of interest and may be expressed
bel(A) = ∑m(B),∀ ∀B⊆A
For example, if X = { A, B, C}, then
bel(X) = m(A) + m(B) + m(C) + m({ A, B}) + m({ A, C) + m({B, C)) + m({A, B, C})
A belief interval can also be defined for a subset A. It is represented as sub interval [bel(A), pl(A)] of [0,1],
where pl(A) is called plausibility A.
Definition: The plausibility may be formally defined as the sum of all the masses of the sets B „pit' " set of
interest A. It may be expressed as
pl(A)=∑m(B),∀ B such that B∩A ≠∅
say P(A), and is bounded by support bel(A) and plausibility pl(A) as follows:
bel(A) <=P(A)<=pl(A)
some importent results using both belief and plausibility measures are listed below.
pi(A) >= bel(A)
pl(∅)= bel(∅)=0
bel(A) + bel(-A) <= 1
pl(A) + pl(-A) =>1
bel(U) = pl(U) =1
pl(A) + bel(-A) =1
The last result provides another definition of plausibility measures in terms of belief measure as
pl(A) 1 - bel(A).
Dempster and Shafer suggested use of the interval [bel(A), pl(A)] to measure the uncertainty of A (Shafer,
et. al, 1990).

VARUNMARAMR AJ 27

You might also like