AI & ML Unit 2 Notes

Unit – II

Acting Under Uncertainty:


Uncertainty:
• In knowledge representation, A -> B means that if A is true then B is true. In a situation where we are not sure whether A is true, we cannot express this statement; such a situation is called uncertainty.
• Agents must therefore act under uncertainty.

Causes for uncertainty:

• Information obtained from unreliable sources
• Experimental errors
• Equipment faults
• Temperature variation
• Climate change

Probabilistic Reasoning:

• It is a way of knowledge representation in which the concept of probability is applied to indicate the uncertainty in knowledge.
Need for Probabilistic Reasoning in AI:
✓ When there are unpredictable outcomes
✓ When an unknown error occurs during an experiment

Ways to solve problems with uncertain knowledge:

✓ Bayes' rule
✓ Bayesian Statistics

Probability:

• It can be defined as the chance that an uncertain event will occur.
• The value of a probability always lies between 0 and 1.
• 0 ≤ P(A) ≤ 1, where P(A) is the probability of an event A.
• P(A) = 0 indicates that event A will not occur (it is impossible).
• P(A) = 1 indicates that event A is certain to occur.

Event: Each possible outcome of a variable is called an event.
Sample Space: The collection of all possible events is called the sample space.
Random Variables: Random variables are used to represent events and objects in the real world.
Prior Probability: The probability of an event computed before observing new information.
Posterior Probability: The probability calculated after all new information has been taken into account.

Conditional Probability:

• It is the probability of an event occurring given that another event has already happened: P(A|B) = P(A ∧ B) / P(B).

Bayesian Inference:

• Bayesian inference is a probabilistic approach to machine learning that provides estimates of the probability of specific events.
• Bayesian inference is a statistical method for understanding the uncertainty inherent in prediction problems.
• Bayesian inference can be carried out with Markov chain Monte Carlo (MCMC) algorithms, which combine prior probability distributions with the likelihood function to approximate the posterior.
• The basis of Bayesian inference is the notion of a priori and a posteriori probabilities.
• The a priori (prior) probability is the probability of an event before any evidence is considered.
• The a posteriori (posterior) probability is the probability of an event after taking into account all available evidence.

Bayes' Theorem / Bayes' Rule:

• Bayes' theorem determines the probability of an event with uncertain knowledge.
• It can be derived using the product rule and the conditional probability of event A with a known event B.

P(A|B) is known as the posterior, P(B|A) is called the likelihood, P(A) is called the prior probability, and P(B) is called the marginal probability.
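The formula itself is not reproduced in these notes; in standard notation, Bayes' rule reads:

```latex
% Bayes' rule: posterior = likelihood * prior / marginal
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}
```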

Applications of Bayes' Theorem:

• It is used to calculate the probability of a robot's next step when the step already executed is known.
• It is helpful in weather forecasting.
• It is used to solve the Monty Hall problem.
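As a quick worked illustration of the rule (the numbers below are hypothetical, chosen only to show the arithmetic):

```python
# Minimal worked example of Bayes' rule with made-up numbers.
# Query: P(disease | positive test)
p_disease = 0.01            # prior P(A)
p_pos_given_disease = 0.95  # likelihood P(B | A)
p_pos_given_healthy = 0.06  # P(B | not A)

# Marginal probability of the evidence: P(B) = P(B|A)P(A) + P(B|~A)P(~A)
p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)

# Posterior via Bayes' rule: P(A|B) = P(B|A) P(A) / P(B)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(round(p_disease_given_pos, 3))  # ~0.138
```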

Naïve Bayes Theorem:

• It is a classification technique based on Bayes' theorem together with an independence assumption among the features.
• Under this assumption, the full joint distribution can be written as shown below.
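A standard way to write this factorization (assuming a single cause/class variable and effects/features that are conditionally independent given it):

```latex
P(\text{Cause}, \text{Effect}_1, \ldots, \text{Effect}_n)
  = P(\text{Cause}) \prod_{i=1}^{n} P(\text{Effect}_i \mid \text{Cause})
```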

Bayesian Networks:

• "A Bayesian network is a probabilistic graphical model which represents a set of


variables and their conditional dependencies using a directed acyclic graph."
• It is also called a Bayes network, belief network, decision network, or Bayesian
model.
• Bayesian Network can be used for building models from data and experts
opinions, and it consists of two parts: Directed Acyclic Graph, Table of
conditional probabilities.
• It is used to represent conditional dependencies.
• It can also be used in various tasks including prediction, anomaly detection,
diagnostics, automated insight, reasoning, time series prediction.
• A Bayesian network graph is made up of nodes and Arcs.

• Each node corresponds to a random variable, and a variable can be continuous or discrete.
• Arcs (directed arrows) represent the causal relationships or conditional probabilities between random variables.
• These directed links or arrows connect pairs of nodes in the graph.
• A link indicates that one node directly influences the other node.
• The Bayesian network graph does not contain any cycles; hence, it is known as a directed acyclic graph (DAG).
• The Bayesian network has mainly two components: 1. Causal Component 2.
Actual numbers
• Bayesian network is based on Joint probability distribution and conditional
probability.

Joint probability distribution:

• If the variables are x1, x2, x3, ..., xn, then the probabilities of the different combinations of x1, x2, x3, ..., xn are known as the joint probability distribution.
• By the chain rule, P(x1, x2, x3, ..., xn) can be written in terms of conditional probabilities as: P(x1, x2, ..., xn) = P(x1 | x2, x3, ..., xn) · P(x2 | x3, ..., xn) · ... · P(xn-1 | xn) · P(xn).
• Global semantics defines the full joint distribution as the product of the local conditional distributions (see the formula below).
• Local semantics states that each node is conditionally independent of its nondescendants given its parents.
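The global semantics mentioned above corresponds to the standard Bayesian network factorization:

```latex
P(x_1, \ldots, x_n) = \prod_{i=1}^{n} P\!\left(x_i \mid \mathrm{parents}(X_i)\right)
```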

Example:

The network structure shows that Burglary and Earthquake are the parent nodes of Alarm and directly affect the probability of the alarm going off.

Variables: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls

Conditional Probability Table for Alarm (A):

Conditional Probability Table for JohnCalls:

Conditional Probability Table for MaryCalls:
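The conditional probability tables themselves are not reproduced in these notes. The sketch below uses the CPT values commonly quoted for this classic example (treat the specific numbers as assumptions) and evaluates one entry of the full joint distribution via the network factorization P(B, E, A, J, M) = P(B) P(E) P(A | B, E) P(J | A) P(M | A):

```python
# Burglary network sketch. CPT numbers follow the values commonly used for
# this classic example; treat them as assumed here.
P_B = 0.001                      # P(Burglary = true)
P_E = 0.002                      # P(Earthquake = true)
P_A = {                          # P(Alarm = true | Burglary, Earthquake)
    (True, True): 0.95, (True, False): 0.94,
    (False, True): 0.29, (False, False): 0.001,
}
P_J = {True: 0.90, False: 0.05}  # P(JohnCalls = true | Alarm)
P_M = {True: 0.70, False: 0.01}  # P(MaryCalls = true | Alarm)

def joint(b, e, a, j, m):
    """P(B=b, E=e, A=a, J=j, M=m) via the chain-rule factorization of the network."""
    p = (P_B if b else 1 - P_B) * (P_E if e else 1 - P_E)
    p *= P_A[(b, e)] if a else 1 - P_A[(b, e)]
    p *= P_J[a] if j else 1 - P_J[a]
    p *= P_M[a] if m else 1 - P_M[a]
    return p

# Probability that the alarm sounds and both neighbours call,
# with neither a burglary nor an earthquake:
print(joint(False, False, True, True, True))  # ~0.000628
```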

Applications of Bayesian Networks:


• Spam Filtering
• Biomonitoring
• Image processing
• Turbo code
• Document Classification

Exact Inference in BN:

• In exact inference, we analytically compute the conditional probability distribution over the variables of interest.
• The basic task for any probabilistic inference system is to compute the posterior probability distribution for a set of query variables, given an observed event.
• The notation X denotes the query variable, E denotes the set of evidence variables E1, ..., Em, and Y denotes the nonevidence (hidden) variables.
• Conditional probabilities can be computed by summing terms from the full joint distribution, as shown below.
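In standard notation (with α a normalizing constant and y ranging over the hidden variables), this query is:

```latex
\mathbf{P}(X \mid \mathbf{e}) = \alpha\, \mathbf{P}(X, \mathbf{e})
  = \alpha \sum_{\mathbf{y}} \mathbf{P}(X, \mathbf{e}, \mathbf{y})
```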

• Now, a Bayesian network gives a complete representation of the full joint distribution.
• More specifically, the terms P(x, e, y) in the joint distribution can be written as products of conditional probabilities from the network.
• In the burglary example, answering P(Burglary | JohnCalls = true, MaryCalls = true) by enumeration means adding four terms (one for each combination of Earthquake and Alarm values), each computed by multiplying five numbers.
• In the worst case, where we have to sum out almost all the variables, the complexity of enumeration for a network with n Boolean variables is O(n·2^n).
• The expression, shown below, can be evaluated by looping through the variables in order.

Variable Elimination Algorithm:

• The enumeration algorithm can be improved substantially by eliminating repeated calculations.
• The idea is to do each calculation once and save the results for later use.
• This is a form of dynamic programming.
• It works by evaluating expressions such as the one below in right-to-left order.
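Pulling out factors that do not depend on the inner summation variables gives the form that variable elimination evaluates from right to left:

```latex
\mathbf{P}(B \mid j, m)
  = \alpha\, P(B) \sum_{e} P(e) \sum_{a} P(a \mid B, e)\, P(j \mid a)\, P(m \mid a)
```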

Approximate Inference in BN:

• Given the intractability of exact inference in large networks, we will consider approximate inference
methods.

• This section describes randomized sampling algorithms, also called Monte Carlo algorithms.

• They work by generating random events based on the probabilities in the Bayes net and counting
up the different answers found in those random events.

Direct Sampling methods:

• The primitive element in any sampling algorithm is the generation of samples from a known probability distribution.
• For example, an unbiased coin can be thought of as a random variable Coin with
values (heads, tails) and a prior distribution P(Coin) = (0.5,0.5).
• Sampling from this distribution is exactly like flipping the coin: with probability
0.5 it will return heads, and with probability 0.5 it will return tails.
• Given a source of random numbers r uniformly distributed in the range [0,1], it
is a simple matter to sample any distribution on a single variable, whether
discrete or continuous.
• The idea is to sample each variable in turn, in topological order.
• The probability distribution from which the value is sampled is conditioned on
the values already assigned to the variable’s parents.
• Applying this to the sprinkler network with the topological ordering Cloudy, Sprinkler, Rain gives, for example (a code sketch follows below):
• Sample from P(Cloudy) = {0.5, 0.5}; suppose the value is true.
• Sample from P(Sprinkler | Cloudy = true) = {0.1, 0.9}; suppose the value is false.
• Sample from P(Rain | Cloudy = true) = {0.8, 0.2}; suppose the value is true.
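A minimal sketch of this prior-sampling procedure for the Cloudy/Sprinkler/Rain/WetGrass network; the CPT entries not quoted above (P(Sprinkler | Cloudy = false), P(Rain | Cloudy = false), and the WetGrass table) are the values commonly used in the textbook example and are assumptions here:

```python
import random

# CPTs for the Cloudy -> {Sprinkler, Rain} -> WetGrass network.
# Values not stated in the notes are assumed from the usual textbook example.
P_CLOUDY = 0.5
P_SPRINKLER = {True: 0.1, False: 0.5}               # P(Sprinkler=true | Cloudy)
P_RAIN = {True: 0.8, False: 0.2}                    # P(Rain=true | Cloudy)
P_WET = {(True, True): 0.99, (True, False): 0.90,   # P(WetGrass=true | Sprinkler, Rain)
         (False, True): 0.90, (False, False): 0.0}

def bernoulli(p):
    """Return True with probability p, using a uniform random number in [0, 1)."""
    return random.random() < p

def prior_sample():
    """Sample every variable in topological order, conditioning on sampled parents."""
    cloudy = bernoulli(P_CLOUDY)
    sprinkler = bernoulli(P_SPRINKLER[cloudy])
    rain = bernoulli(P_RAIN[cloudy])
    wet = bernoulli(P_WET[(sprinkler, rain)])
    return cloudy, sprinkler, rain, wet

print(prior_sample())  # e.g. (True, False, True, True)
```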

Rejection Sampling in Bayesian Networks:

• Rejection sampling is a general method for producing samples from a hard-to-sample distribution given an easy-to-sample distribution.
• It can be used to compute conditional probabilities, that is, to determine P(X | e).
• First, it generates samples from the prior distribution specified by the network.
• Then it rejects all samples that do not match the evidence and estimates P(X | e) from the remaining samples, as in the sketch below.
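A compact sketch of rejection sampling for the query P(Rain | Sprinkler = true) on the same network (the P(Sprinkler | Cloudy = false) and P(Rain | Cloudy = false) entries are again assumed from the usual example):

```python
import random

P_CLOUDY = 0.5
P_SPRINKLER = {True: 0.1, False: 0.5}   # P(Sprinkler=true | Cloudy); second entry assumed
P_RAIN = {True: 0.8, False: 0.2}        # P(Rain=true | Cloudy); second entry assumed

def estimate_rain_given_sprinkler(n=100_000):
    """Estimate P(Rain = true | Sprinkler = true) by rejecting inconsistent samples."""
    kept = rain_true = 0
    for _ in range(n):
        cloudy = random.random() < P_CLOUDY
        sprinkler = random.random() < P_SPRINKLER[cloudy]
        if not sprinkler:          # sample disagrees with the evidence: reject it
            continue
        rain = random.random() < P_RAIN[cloudy]
        kept += 1
        rain_true += rain
    return rain_true / kept

print(estimate_rain_given_sprinkler())  # close to 0.3 with these numbers
```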

Markov Chain Monte Carlo (MCMC) Algorithm:

• MCMC generates each event by making a random change to the preceding event.
• It is therefore helpful to think of the network as being in a particular current state
specifying a value for every variable.
• The next state is generated by randomly sampling a value for one of the
nonevidence variables Xi, conditioned on the current values of the variables in
the Markov blanket of Xi.
• MCMC therefore wanders randomly around the state space (the space of possible complete assignments), flipping one variable at a time but keeping the evidence variables fixed.
• Consider the query P(Rain | Sprinkler = true, WetGrass = true) applied to the sprinkler network.
• The evidence variables Sprinkler and WetGrass are fixed to their observed values, and the hidden variables Cloudy and Rain are initialized randomly.
• Thus, the initial state is [Cloudy = true, Sprinkler = true, Rain = false, WetGrass = true]. Now the following steps are executed repeatedly:
• Cloudy is sampled, given the current values of its Markov blanket variables: in this case, we sample from P(Cloudy | Sprinkler = true, Rain = false). Suppose the result is Cloudy = false. Then the new current state is [false, true, false, true]. A code sketch of this procedure follows below.
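A minimal self-contained sketch of this Gibbs-sampling procedure for the query above; the CPT entries not quoted in these notes (P(Sprinkler | Cloudy = false), P(Rain | Cloudy = false), and the WetGrass table) are the values commonly used in the textbook example and should be treated as assumptions here:

```python
import random

# CPTs (entries not stated in the notes are assumed from the usual example).
P_CLOUDY = 0.5
P_SPRINKLER = {True: 0.1, False: 0.5}   # P(Sprinkler=true | Cloudy)
P_RAIN = {True: 0.8, False: 0.2}        # P(Rain=true | Cloudy)
P_WET = {(True, True): 0.99, (True, False): 0.90,
         (False, True): 0.90, (False, False): 0.0}  # P(WetGrass=true | Sprinkler, Rain)

def bern(w_true, w_false):
    """Sample True/False in proportion to two unnormalized weights."""
    return random.random() < w_true / (w_true + w_false)

def gibbs_rain_given_evidence(n=100_000):
    """Estimate P(Rain=true | Sprinkler=true, WetGrass=true) by Gibbs sampling (no burn-in)."""
    sprinkler, wet = True, True                                   # evidence, kept fixed
    cloudy, rain = random.random() < 0.5, random.random() < 0.5   # random initial state
    rain_count = 0
    for _ in range(n):
        # Resample Cloudy from P(Cloudy | sprinkler, rain) ∝ P(C) P(sprinkler|C) P(rain|C)
        w_t = P_CLOUDY * P_SPRINKLER[True] * (P_RAIN[True] if rain else 1 - P_RAIN[True])
        w_f = (1 - P_CLOUDY) * P_SPRINKLER[False] * (P_RAIN[False] if rain else 1 - P_RAIN[False])
        cloudy = bern(w_t, w_f)
        # Resample Rain from P(Rain | cloudy, sprinkler, wet) ∝ P(R|cloudy) P(wet|sprinkler, R)
        w_t = P_RAIN[cloudy] * P_WET[(sprinkler, True)]
        w_f = (1 - P_RAIN[cloudy]) * P_WET[(sprinkler, False)]
        rain = bern(w_t, w_f)
        rain_count += rain
    return rain_count / n

print(gibbs_rain_given_evidence())  # roughly 0.32 with these assumed numbers
```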

Causal Networks:

• A causal network is an acyclic digraph arising from an evolution of a substitution system.
• Each substitution event is a vertex in a causal network.
• Two events that are related by causal dependence, meaning one occurs just before the other, have an edge between the corresponding vertices in the causal network.
• The edge is a directed edge leading from the past event to the future event.
• A causal Bayesian network (CBN) is a graph formed by nodes representing random variables, connected by links denoting causal influence.
• Some causal networks are independent of the choice of evolution; these are called causally invariant.

Structural Causal Models (SCMs):

• An SCM consists of two parts: a graph, which visualizes the causal connections, and equations, which express the details of those connections. A graph is a mathematical construction that consists of vertices (nodes) and edges (links).
• SCMs use a special kind of graph called a Directed Acyclic Graph (DAG), in which all edges are directed and no cycles exist.
• DAGs are a common starting place for causal inference.
• Structurally, Bayesian and causal networks are identical; the difference lies in how their edges are interpreted, since in a causal network each edge denotes a direct causal influence.

• Consider a network with 2 nodes and 1 edge.
• Such a network can serve as either a Bayesian network or a causal network, depending on how the edge is interpreted.

Implementing Causal Inference:


1) The do-operator:

• The do-operator is a mathematical representation of a physical intervention.
• If the model starts with Z → X → Y, then intervening with do(X = x) removes the edge Z → X, and the effect on Y is evaluated in the modified model in which X is set to x.
2) Confounding:

• In this example, age is a confounder of education and wealth.
• Adjusting for age means that when looking at age, education and wealth data, one would compare data points within age groups, not between age groups (see the formula below).
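Written out, "adjusting for age" corresponds to the standard adjustment formula (stated here for completeness; the variable names simply mirror the example, with age as the confounder):

```latex
P(\text{Wealth} \mid do(\text{Education}))
  = \sum_{z} P(\text{Wealth} \mid \text{Education},\, \text{Age} = z)\, P(\text{Age} = z)
```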

3) Estimating Causal Effects:

• Treatment Effect = (Outcome under E) minus (Outcome under C).


• It is the difference between the outcome a child would receive if assigned to treatment E and the outcome that same child would receive if assigned to treatment C.
• These two quantities are called potential outcomes.
