CHAPTER 9: BAYESIAN DECISION THEORY AND BAYESIAN LEARNING

9.1 INTRODUCTION

The Rev. Thomas Bayes (1702-1761) suggested a method for taking actions based on present observations. This method is called Bayesian Decision Theory (BDT). Bayesian Decision Theory is based on Bayes' theorem.

* Basic Concepts in Bayesian Decision Theory

1. Marginal Probability (or Simple Probability) P(A): "The ordinary probability of occurrence of an event A, irrespective of all other events, is called the simple or marginal probability." It is denoted by P(A).

    P(A) = (No. of successful events) / (Total no. of all events)          ...(9.1)

2. Conditional Probability P(A|B): "The probability of occurrence of an event A, when event B has already occurred, is called the conditional probability." It is denoted by P(A|B).

    P(A|B) = P(A ∩ B) / P(B)          ...(9.2)

   where P(A ∩ B) is the probability of both events A and B happening together; for independent events it factors as P(A ∩ B) = P(A) × P(B).

3. Joint Probability P(A, B): The occurrence of two events A and B simultaneously is called the joint probability. It is denoted by P(A, B), P(A ∩ B) or P(A and B).

    P(A ∩ B) = P(A, B) = P(A|B) × P(B)          ...(9.3)

4. Bayes' Theorem: Bayes' theorem was suggested by Rev. Thomas Bayes. It is based on conditional probability and is given as:

    P(A|B) = P(B|A) × P(A) / P(B)          ...(9.4)

   where, P(A|B) = Posterior probability
          P(B|A) = Likelihood probability
          P(A) = Prior probability (probability of A)
          P(B) = Marginal probability (probability of B), also called the Evidence.

* Applications of Bayes' Theorem
1. To calculate the next step of a robot when the already executed step is given.
2. For weather forecasting.

Example 9.1. Calculate the probability of "Fire" when "Smoke" is given, with the data:

    P(Fire) = Prior probability = 0.3
    P(Smoke | Fire) = Likelihood probability = 0.5
    P(Smoke) = Evidence = 0.7

Solution. We have to calculate

    P(Fire | Smoke) = P(Smoke | Fire) × P(Fire) / P(Smoke)          ...(9.5)

Putting in the numerical values,

    P(Fire | Smoke) = (0.5 × 0.3) / 0.7 = 0.15 / 0.7 = 0.214

    P(Fire | Smoke) = 0.214          Ans.

Similarly, we can formulate the problem of rain and clouds.
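The calculation in Example 9.1 is a single application of Eq. (9.4). As a minimal illustration (not part of the textbook), the sketch below codes Bayes' rule as a small Python function and reproduces the fire/smoke numbers; the function name `bayes_posterior` is our own.

```python
def bayes_posterior(likelihood, prior, evidence):
    """Bayes' rule, Eq. (9.4): P(A|B) = P(B|A) * P(A) / P(B)."""
    return likelihood * prior / evidence

# Example 9.1: P(Fire | Smoke) with P(Smoke|Fire)=0.5, P(Fire)=0.3, P(Smoke)=0.7
p_fire_given_smoke = bayes_posterior(likelihood=0.5, prior=0.3, evidence=0.7)
print(round(p_fire_given_smoke, 3))   # 0.214
```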
Example 9.2. Classify and find the fruit which has the attributes {Yellow, Sweet, Long} according to the hypothesis. Use Bayes' theorem to find the "fruit" hypothesis.

    Hypothesis of a fruit = {Yellow, Sweet, Long}

Table 9.1. Training data

Fruit  | Yellow | Sweet | Long | No. of fruits
Orange | 350    | 450   | 0    | 650
Banana | 400    | 300   | 350  | 400
Others | 50     | 100   | 50   | 150
Total  | 800    | 850   | 400  | 1200

Find P(Fruit | Data).

Solution. We know the general formula for Bayes' theorem (Bayes' rule):

    P(A|B) = P(B|A) × P(A) / P(B)

Let us find the attribute probabilities for the "Orange" fruit first. For each attribute, Bayes' rule gives the probability of that attribute given the fruit:

    P(Yellow | Orange) = P(Orange | Yellow) × P(Yellow) / P(Orange)
                       = [(350/800) × (800/1200)] / (650/1200) = 350/650 = 0.53

    P(Sweet | Orange)  = P(Orange | Sweet) × P(Sweet) / P(Orange)
                       = [(450/850) × (850/1200)] / (650/1200) = 450/650 = 0.69

    P(Long | Orange)   = P(Orange | Long) × P(Long) / P(Orange)
                       = [(0/400) × (400/1200)] / (650/1200) = 0

Now calculate the resultant probability for Orange by multiplying all three results (assuming the attributes are independent given the fruit):

    P(Fruit | Orange) = 0.53 × 0.69 × 0 = 0

Similarly, we can calculate the probabilities for the fruit being Banana and Others:

    P(Fruit | Banana) = (400/400) × (300/400) × (350/400) = 1 × 0.75 × 0.87 = 0.65
    P(Fruit | Others) = (50/150) × (100/150) × (50/150) = 0.33 × 0.66 × 0.33 = 0.072

The results are summarized in Table 9.2.

Table 9.2

Fruit  | Resultant probability
Orange | 0
Banana | 0.65  (highest value)
Others | 0.072

The Banana has the highest resultant probability. Therefore, the answer is Banana:

    Banana = {Yellow, Sweet, Long}          Ans.

Example 9.3. (Patient Disease Problem)

Let us consider the following data about a patient:

    Effect = the state of the patient having red dots on the skin,
    Cause = the state of the patient having the Rubella disease.

Given probabilities,

    P(Cause) = 1/1000 = 0.001
    P(Effect) = 1/100 = 0.01
    P(Effect | Cause) = 0.9

Find the value of the probability P(Cause | Effect) using Bayes' rule.

Solution. The standard general formula of Bayes' rule is given as

    P(A|B) = P(B|A) × P(A) / P(B)          ...(9.6)

Here,  P(A) = P(Cause) = 0.001
       P(B) = P(Effect) = 0.01
       P(B|A) = P(Effect | Cause) = 0.9

    P(A|B) = P(Cause | Effect) = (0.9 × 0.001) / 0.01 = 0.09

    P(Cause | Effect) = 0.09          Ans.

A patient who has red dots on the skin has a probability of 0.09 of having the rubella disease.

Example 9.4. (Diagnostic Lab Testing Problem)

In a pathology lab, COVID-19 tests are performed, and not all test results are correct. A diagnostic test has 99% accuracy and 60% of all people have COVID-19. If a patient tests positive, what is the probability that they actually have the disease? Use Bayes' theorem to find the probability.

Solution. The general formula for Bayes' theorem is given as

    P(A|B) = P(B|A) × P(A) / P(B)          ...(9.7)

    P(COVID-19 | Positive) = P(Positive | COVID-19) × P(COVID-19) / P(Positive)

    P(B|A) = P(Positive | COVID-19) = 99% or 0.99
    P(A) = P(COVID-19) = 60% or 0.60
    P(B) = P(Positive) = 0.6 × 0.99 + 0.4 × 0.01 = 0.598   (by the theorem of total probability)

Putting the values into Bayes' formula,

    P(COVID-19 | Positive) = (0.99 × 0.6) / 0.598 = 0.993          Ans.

9.2 BAYES' THEOREM IN TERMS OF POSTERIOR PROBABILITY

Bayes' theorem in terms of the posterior probability is given as:

    P(h|D) = P(D|h) × P(h) / P(D)          ...(9.8)

where, P(h|D) = the posterior probability, or conditional probability of hypothesis h when data D is given
       P(D|h) = the likelihood probability, or conditional probability of data D when hypothesis h is given
       P(h) = the prior probability of hypothesis h, or simple probability of the hypothesis
       P(D) = the prior probability of data D, or simple probability of D.

9.3 MAXIMUM A POSTERIORI (MAP) HYPOTHESIS AND MAXIMUM LIKELIHOOD (ML) HYPOTHESIS

Definition: "The maximally probable hypothesis is called the Maximum A Posteriori (MAP) hypothesis." It is denoted by h_MAP and is given as:

    h_MAP = arg max_{h ∈ H} P(h|D)          ...(9.9)

Putting in the values from Bayes' theorem, we get

    h_MAP = arg max_{h ∈ H} P(D|h) × P(h) / P(D)          ...(9.10)

In Eq. (9.10) we can ignore the denominator term P(D) because it is constant with respect to h. Therefore,

    h_MAP = arg max_{h ∈ H} P(D|h) × P(h)          ...(9.11)

To simplify further, we assume all hypotheses in the hypothesis space H are equiprobable, i.e., P(h_i) = P(h_j). In this case we can write Eq. (9.11) as

    h_ML = arg max_{h ∈ H} P(D|h)          ...(9.12)

where h_ML = the Maximum Likelihood hypothesis
      P(D|h) = the likelihood of the data D when hypothesis h is given.

* Difference between max f(x) and arg max f(x) in mathematics

max f(x):
1. It is the maximum value attained by the function f(x).
2. For example, max of sin θ = 1; it means sin θ has a maximum value of 1.

arg max f(x):
1. It is the value of the argument (the variable x) at which the function f(x) attains its maximum.
2. For example, arg max of sin θ = 90°; sin θ has its maximum value at θ = 90°.
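As a quick check of Example 9.4, and of the total-probability step used to obtain P(Positive), here is a small Python sketch. It only illustrates the arithmetic; the helper name `posterior_from_test` is our own, and, as in the example, the 99% accuracy figure is applied to both the true-positive and the true-negative rate.

```python
def posterior_from_test(sensitivity, specificity, prevalence):
    """P(disease | positive test) via Bayes' rule, with the evidence
    P(positive) expanded by the theorem of total probability."""
    p_positive = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
    return sensitivity * prevalence / p_positive

# Example 9.4: 99% accurate test, 60% prevalence
print(round(posterior_from_test(sensitivity=0.99, specificity=0.99, prevalence=0.60), 3))  # 0.993
```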
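The distinction between max and arg max in Eqs. (9.9)-(9.12) is easy to see in code. The following sketch scores a handful of hypotheses by P(D|h) × P(h) and returns the MAP hypothesis; the priors and likelihoods are toy numbers invented purely for illustration.

```python
# Toy priors P(h) and likelihoods P(D|h) for three hypotheses (illustrative values only)
priors      = {"h1": 0.5, "h2": 0.3, "h3": 0.2}
likelihoods = {"h1": 0.2, "h2": 0.7, "h3": 0.9}

# Eq. (9.11): h_MAP = arg max_h P(D|h) * P(h).  max(scores.values()) would give the
# maximum *score*, while key=scores.get recovers the *argument* (the hypothesis).
scores = {h: likelihoods[h] * priors[h] for h in priors}
h_map  = max(scores, key=scores.get)          # arg max -> "h2" (score 0.21)
best   = scores[h_map]                        # max     ->  0.21

# Eq. (9.12): with equal priors the MAP hypothesis reduces to the ML hypothesis
h_ml = max(likelihoods, key=likelihoods.get)  # "h3"
print(h_map, round(best, 2), h_ml)
```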
9.4 BRUTE-FORCE BAYESIAN CONCEPT LEARNING, OR DERIVATION OF THE EXPRESSION FOR THE POSTERIOR PROBABILITY P(h|D) USING THE BRUTE-FORCE ALGORITHM

The brute-force MAP learning algorithm is:

1. For each hypothesis h in the hypothesis space H, calculate the posterior probability

    P(h|D) = P(D|h) × P(h) / P(D)          ...(9.13)

2. Output the hypothesis h_MAP with the highest posterior probability,

    h_MAP = arg max_{h ∈ H} P(h|D)          ...(9.14)

The brute-force algorithm uses Bayes' theorem. It requires a lot of computation to calculate the probability P(h|D). To calculate P(h|D), we first need to determine P(h) and P(D|h):

    P(h) = 1/|H|   for all h in H          ...(9.15)

where h = a single hypothesis
      H = the set consisting of all hypotheses, H = {h_1, h_2, h_3, ..., h_n}
      P(h) = the prior probability of the hypothesis.

    P(D|h) = 1   if d_i = h(x_i) for every training example in D
           = 0   otherwise          ...(9.16)

where P(D|h) = the conditional probability of the data D when hypothesis h is given
      d_i = the target (data) value of the ith example
      x_i = the ith instance (variable value).

Now put the values of P(h) and P(D|h) into Eq. (9.13). For a hypothesis h that is consistent with D we get

    P(h|D) = (1 × 1/|H|) / P(D)

But P(D) = |VS_{H,D}| / |H|. Therefore, putting this value of P(D) into the expression above,

    P(h|D) = (1/|H|) / (|VS_{H,D}| / |H|) = 1/|VS_{H,D}|          ...(9.17)

where VS_{H,D} is the version space of the hypothesis set H with respect to the data set D.

Finally, Bayes' theorem tells us that the posterior probability P(h|D) is given by

    P(h|D) = 1/|VS_{H,D}|   if h is consistent with D
           = 0              otherwise          ...(9.18)

* Derivation of the probability of the data, P(D) = |VS_{H,D}| / |H|

Let us use the theorem of total probability:

    P(D) = Σ_{h_i ∈ H} P(D|h_i) × P(h_i)
         = Σ_{h_i ∈ VS_{H,D}} 1 × (1/|H|)  +  Σ_{h_i ∉ VS_{H,D}} 0 × (1/|H|)
         = |VS_{H,D}| / |H|          ...(9.19)

Hence proved.

9.5 BAYES' OPTIMAL CLASSIFIER

The Bayes' optimal classifier is a "probabilistic model" which makes the most probable prediction for a new example. The optimal classification of a new instance is the value v_j for which the probability P(v_j|D) is maximum. The equation for the Bayes' optimal classifier is given as:

    Optimal classification = arg max_{v_j ∈ V} Σ_{h_i ∈ H} P(v_j|h_i) × P(h_i|D)          ...(9.20)

where, P(v_j|D) = the probability of value v_j when the data D is given
       P(v_j|h_i) = the probability of value v_j when hypothesis h_i is given
       P(h_i|D) = the probability of hypothesis h_i when the data is given, called the posterior probability.

On average, no other classification method using the same hypothesis space and the same prior knowledge can classify new instances more correctly than the Bayes' optimal classifier.

Example 9.5. (Bayes' Optimal Classification)

Let us consider a hypothesis space H consisting of three hypotheses,

    H = {h_1, h_2, h_3}

Suppose the posterior probabilities of these three hypotheses are:

    posterior probability of h_1, P(h_1|D) = 0.4
    posterior probability of h_2, P(h_2|D) = 0.3
    posterior probability of h_3, P(h_3|D) = 0.3

Now let a new example (instance) occur, with possible values V_j = (positive ⊕, negative ⊖), and

    P(h_1|D) = 0.4,  P(⊖|h_1) = 0,  P(⊕|h_1) = 1
    P(h_2|D) = 0.3,  P(⊖|h_2) = 1,  P(⊕|h_2) = 0
    P(h_3|D) = 0.3,  P(⊖|h_3) = 1,  P(⊕|h_3) = 0

Find whether V is positive or negative, i.e., correctly classify the new value of the variable V.

Solution. Given the values above,

    Σ_{h_i ∈ H} P(⊕|h_i) × P(h_i|D) = 0.4 × 1 + 0.3 × 0 + 0.3 × 0 = 0.4          ...(9.21)

Similarly,

    Σ_{h_i ∈ H} P(⊖|h_i) × P(h_i|D) = 0.4 × 0 + 0.3 × 1 + 0.3 × 1 = 0.6          ...(9.22)

For the correct classification of v_j we use the Bayes' optimal classifier equation:

    arg max_{v_j ∈ {⊕, ⊖}} Σ_{h_i ∈ H} P(v_j|h_i) × P(h_i|D) = ⊖ (negative)          ...(9.23)

Hence the final correct classification of the new example instance is negative, ⊖.
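The weighted-vote sum in Eqs. (9.21)-(9.23) translates directly into a few lines of Python. The sketch below is an illustration only (the data structures are our own choice); it reproduces the numbers of Example 9.5.

```python
# Posterior probabilities P(h_i | D) for the three hypotheses of Example 9.5
posterior = {"h1": 0.4, "h2": 0.3, "h3": 0.3}

# P(v_j | h_i) for the two class values: positive (+) and negative (-)
p_value_given_h = {
    "h1": {"+": 1.0, "-": 0.0},
    "h2": {"+": 0.0, "-": 1.0},
    "h3": {"+": 0.0, "-": 1.0},
}

# Eq. (9.20): score each value by sum_i P(v_j | h_i) * P(h_i | D), then take arg max
scores = {
    v: sum(p_value_given_h[h][v] * posterior[h] for h in posterior)
    for v in ("+", "-")
}
print(scores)                           # {'+': 0.4, '-': 0.6}
print(max(scores, key=scores.get))      # '-'  (negative), as in Example 9.5
```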
9.6 NAIVE BAYES' CLASSIFIER IN MACHINE LEARNING

The Naive Bayes' algorithm is a supervised learning algorithm based on Bayes' theorem. It is used for solving classification problems in machine learning. The Naive Bayes' classifier is a probabilistic classifier: it predicts on the basis of the probability of an event. The name consists of the two words "Naive" and "Bayes", which can be described as follows.

* Naive: Naive means "untrained" or "without experience". The algorithm is called "Naive" because it assumes that the occurrence of a certain feature is independent of the other features.
* Bayes': It is defined on Bayes' theorem, which is given as:

    P(A|B) = P(B|A) × P(A) / P(B)          ...(9.24)

  where, P(A|B) = the posterior probability (probability of event A when B has occurred)
         P(B|A) = the likelihood probability (probability of event B when A has occurred)
         P(A) = the prior probability
         P(B) = the marginal probability.

* Working of the Naive Bayes' Classifier: Let us understand the working of the Naive Bayes' classifier algorithm with an example.

Example 9.6. (Weather Condition) Suppose we have a data set of "Weather Condition" as shown in Table 9.4.

Table 9.4. Weather condition data set

No. | Outlook  | Play
0   | Rainy    | Yes
1   | Sunny    | Yes
2   | Overcast | Yes
3   | Overcast | Yes
4   | Sunny    | No
5   | Rainy    | Yes
6   | Sunny    | Yes
7   | Overcast | Yes
8   | Rainy    | No
9   | Sunny    | No
10  | Sunny    | Yes
11  | Rainy    | No
12  | Overcast | Yes
13  | Overcast | Yes

Step 1. Make a frequency table from the given data set.

Table 9.5. Frequency table

Weather  | Yes | No
Overcast | 5   | 0
Rainy    | 2   | 2
Sunny    | 3   | 2
Total    | 10  | 4

Step 2. Make the likelihood table.

Table 9.6. Likelihood table

Weather  | No          | Yes          | Likelihood
Overcast | 0           | 5            | 5/14 = 0.35
Rainy    | 2           | 2            | 4/14 = 0.29
Sunny    | 2           | 3            | 5/14 = 0.35
All      | 4/14 = 0.29 | 10/14 = 0.71 |

Step 3. Apply Bayes' theorem:

    P(A|B) = P(B|A) × P(A) / P(B)          ...(9.25)

    P(Yes|Sunny) = P(Sunny|Yes) × P(Yes) / P(Sunny)          ...(9.26)

From the likelihood table (Table 9.6) we get

    P(Sunny|Yes) = 3/10 = 0.3,  P(Sunny) = 0.35,  P(Yes) = 0.71

    P(Yes|Sunny) = (0.3 × 0.71) / 0.35 = 0.60          Ans.

Similarly,

    P(No|Sunny) = P(Sunny|No) × P(No) / P(Sunny)          ...(9.27)

Again from likelihood Table 9.6,

    P(Sunny|No) = 2/4 = 0.5,  P(No) = 0.29,  P(Sunny) = 0.35

    P(No|Sunny) = (0.5 × 0.29) / 0.35 = 0.41          Ans.

As P(Yes|Sunny) > P(No|Sunny), i.e., 0.60 > 0.41, we can say that on a sunny day the player will go out to play.

* Advantages of the Naive Bayes' Classifier
  - It is a fast and easy algorithm for classification.
  - It can be used for binary and multi-class classification.
  - It is widely used for text classification problems.

* Disadvantages of the Naive Bayes' Classifier
  - It assumes the features are independent, so it cannot learn relations between features.

9.7 APPLICATIONS OF NAIVE BAYES' CLASSIFIER
1. Real-time prediction
2. Multiclass classification
3. Text classification
4. Spam filtering
5. Sentiment analysis
6. Recommendation systems

Note: A classifier is a machine learning model which segregates different objects on the basis of certain features or variables.
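Example 9.6 can be reproduced with a short script that builds the frequency counts from Table 9.4 and applies Eqs. (9.26)-(9.27). This is a minimal sketch of the computation in the example, not library code; names such as `classify_outlook` are our own.

```python
from collections import Counter

# Table 9.4: (Outlook, Play) pairs
data = [("Rainy", "Yes"), ("Sunny", "Yes"), ("Overcast", "Yes"), ("Overcast", "Yes"),
        ("Sunny", "No"),  ("Rainy", "Yes"), ("Sunny", "Yes"),    ("Overcast", "Yes"),
        ("Rainy", "No"),  ("Sunny", "No"),  ("Sunny", "Yes"),    ("Rainy", "No"),
        ("Overcast", "Yes"), ("Overcast", "Yes")]

n = len(data)
play_counts    = Counter(play for _, play in data)        # {'Yes': 10, 'No': 4}
outlook_counts = Counter(outlook for outlook, _ in data)  # column totals of Table 9.5
joint_counts   = Counter(data)                            # e.g. ('Sunny', 'Yes') -> 3

def posterior(play, outlook):
    """P(play | outlook) = P(outlook | play) * P(play) / P(outlook), Eqs. (9.26)/(9.27)."""
    p_outlook_given_play = joint_counts[(outlook, play)] / play_counts[play]
    p_play    = play_counts[play] / n
    p_outlook = outlook_counts[outlook] / n
    return p_outlook_given_play * p_play / p_outlook

def classify_outlook(outlook):
    return max(("Yes", "No"), key=lambda play: posterior(play, outlook))

print(round(posterior("Yes", "Sunny"), 2))  # 0.6
print(round(posterior("No", "Sunny"), 2))   # 0.4 (Example 9.6 gets 0.41 from rounded intermediates)
print(classify_outlook("Sunny"))            # 'Yes' -> the player goes out to play
```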
9.8 BAYESIAN BELIEF NETWORKS

Definition: "A Bayesian belief network is a probabilistic graphical model. It represents a set of variables and their conditional dependencies using a directed acyclic graph." It is also called a Bayes' network, belief network, decision network or Bayesian model.

The Bayesian network consists of two parts:
1. A directed acyclic graph (DAG)
2. Tables of conditional probabilities

In Fig. 9.1 (Bayesian belief network graph), each node represents a random variable, and the branches (arcs) represent the relations (conditional probabilities) between the random variables.

Bayesian belief networks are based on joint probability and marginal probability.

9.9 JOINT PROBABILITY

The joint probability is a statistical measure which calculates the likelihood of two events occurring together at the same point in time.

    P(X, Y) = P(X ∩ Y) = P(X and Y)

It is denoted by the intersection (∩) between X and Y. For n features given as X_1, X_2, X_3, ..., X_n, the joint probability is given as:

    P(X_1, X_2, X_3, ..., X_n) = ∏_{i=1}^{n} P(X_i | Parents(X_i))          ...(9.28)

where, X_1, ..., X_n = the features of an object
       ∏ = the product operator
       X_i = the ith feature
       Parents(X_i) = the parent nodes of X_i in the network.

9.10 MARGINAL PROBABILITY

The simplest form of probability is known as the marginal probability. The probability of occurrence of an event A, in the presence of all other events, is called the marginal probability. It is an independent probability.

    P(A) = (Favourable events) / (All events)          ...(9.29)

9.11 CONDITIONAL PROBABILITY

The probability of occurrence of an event B when event A has already occurred is called the conditional probability of B.

    P(B|A) = P(A and B) / P(A)          ...(9.30)

9.12 EXPECTATION-MAXIMIZATION (EM) ALGORITHM

The Expectation-Maximization (EM) algorithm is used to find unknown (unseen) variables from the observed (seen) variables of a sample space.

* Latent Variables: The unseen variables (missing data) are called latent variables. The EM algorithm is used to find the latent variables of a data set.
* Maximum Likelihood Estimation: In other words, we can also say that the EM algorithm is used to find or estimate the maximum likelihood of the latent variables.

9.12.1 Detailed Explanation of the EM Algorithm

1. Given a set of incomplete data, start with a set of initialized parameters.
2. Expectation (E) step: Using the observed available data of the data set, estimate or find the values of the missing data. After this step, we get complete data with no missing values.
3. Maximization (M) step: Now use the complete data to update the parameters.
4. Repeat step 2 and step 3 until we converge to the solution.

Fig. 9.2. E-step (estimate missing data) and M-step (update hypothesis/parameters) of the EM algorithm
Fig. 9.3. Flow chart of the EM algorithm: initial values → E-step → M-step → repeat until convergence

9.12.2 Applications of the EM Algorithm
1. It is used to find the missing data (the latent variables) of a data set.
2. It is used for parameter estimation of the Hidden Markov Model.
3. It is used to calculate the Gaussian density of a function.
4. It is used in Natural Language Processing (NLP), computer vision, etc.
5. It is used in the medical field and in structural engineering.

9.12.3 Advantages and Disadvantages of the EM Algorithm

* Advantages
  - It is very easy to implement in machine learning, as it has only two steps, the E-step and the M-step.
  - The solution of the M-step exists in closed form.
  - After each iteration of the EM algorithm, the value of the likelihood is increased.

* Disadvantages
  - It has slow convergence.
  - It converges to a local optimum only.
  - It requires both forward and backward probabilities.
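Going back to the belief-network factorization of Eq. (9.28) in Section 9.9, here is a hedged sketch of a tiny network. The Rain → Sprinkler, (Rain, Sprinkler) → WetGrass structure and all CPT numbers are our own illustrative choices, not from the text: the joint probability of an assignment is simply the product of each node's conditional probability given its parents.

```python
# Illustrative CPTs for a three-node network: Rain -> Sprinkler, (Rain, Sprinkler) -> WetGrass.
# All numbers are made up for the sketch.
p_rain = {True: 0.2, False: 0.8}
p_sprinkler_given_rain = {True: {True: 0.01, False: 0.99},
                          False: {True: 0.40, False: 0.60}}
p_wet_given = {(True, True): 0.99, (True, False): 0.80,
               (False, True): 0.90, (False, False): 0.00}

def joint(rain, sprinkler, wet):
    """Eq. (9.28): P(X1, ..., Xn) = prod_i P(Xi | Parents(Xi))."""
    p = p_rain[rain]
    p *= p_sprinkler_given_rain[rain][sprinkler]
    p_wet_true = p_wet_given[(rain, sprinkler)]
    p *= p_wet_true if wet else (1 - p_wet_true)
    return p

# Probability that it rains, the sprinkler is off and the grass is wet
print(round(joint(rain=True, sprinkler=False, wet=True), 4))  # 0.2 * 0.99 * 0.80 = 0.1584
```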
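Finally, the E-step/M-step loop of Section 9.12.1 can be sketched for a simple case: estimating the means of a two-component one-dimensional Gaussian mixture. This is a generic illustration under our own assumptions (toy data, fixed unit variances, equal mixing weights), not the textbook's example.

```python
import math, random

random.seed(0)
# Toy data drawn from two clusters around 0 and 5 (assumed for illustration)
data = [random.gauss(0, 1) for _ in range(50)] + [random.gauss(5, 1) for _ in range(50)]

def normal_pdf(x, mu):
    # Gaussian density with unit variance
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2 * math.pi)

mu1, mu2 = -1.0, 1.0          # step 1: initialized parameters
for _ in range(30):           # repeat E and M steps until (approximate) convergence
    # E-step: estimate the latent variables, here the responsibility
    # (posterior probability) that each point belongs to component 1
    resp1 = [normal_pdf(x, mu1) / (normal_pdf(x, mu1) + normal_pdf(x, mu2)) for x in data]
    # M-step: update the parameters using the completed data
    mu1 = sum(r * x for r, x in zip(resp1, data)) / sum(resp1)
    mu2 = sum((1 - r) * x for r, x in zip(resp1, data)) / sum(1 - r for r in resp1)

print(round(mu1, 2), round(mu2, 2))   # means close to the true cluster centres 0 and 5
```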
