Unit 3 (DWDM)
Classification is a form of data analysis that extracts models describing important data classes. Such models, called classifiers, predict categorical (discrete, unordered) class labels. For example, we can build a classification model to categorize bank loan applications as either safe or risky. Such analysis can help provide us with a better understanding of the data at large. Many classification methods have been proposed by researchers in machine learning, pattern recognition, and statistics.
Data Mining: Data mining, in general terms, means extracting useful patterns and knowledge from large volumes of data. In the process of data mining, large data sets are first sorted, then patterns are identified and relationships are established to perform data analysis and solve problems.
Classification is a task in data mining that involves assigning a class label to each instance.
There are two main types of classification:
1. Binary classification: classifying instances into two classes, such as “spam” or “not spam”.
2. Multi-class classification: classifying instances into more than two classes.
Classification: It is a data analysis task, i.e., the process of finding a model that describes and distinguishes data classes and concepts. Classification is the problem of identifying to which of a set of categories (subpopulations) a new observation belongs, on the basis of a training set of data containing observations whose category membership is known.
Example: Before starting any project, we need to check its feasibility. In this case, a classifier is required to predict class labels such as ‘Safe’ and ‘Risky’ for adopting the project and further approving it. Classification is a two-step process:
1. Learning Step (Training Phase): Construction of the classification model. Different algorithms are used to build a classifier by making the model learn from the available training set. The model has to be trained to predict accurate results.
2. Classification Step: The model is used to predict class labels for new data; the constructed model is tested on test data to estimate the accuracy of the classification rules. Test data are used to estimate the accuracy of the classification rule. A small code sketch of these two steps is given below.
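The two steps can be illustrated with a short code sketch. This is a minimal example, assuming the scikit-learn library and a made-up loan data set; the attribute values and labels are hypothetical, not taken from the text.

# Minimal sketch of the two-step classification process (learning + classification),
# assuming scikit-learn; the loan data below is hypothetical.
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Hypothetical loan applications: [income_in_thousands, has_collateral (0/1)]
X = [[25, 0], [40, 1], [60, 1], [15, 0], [80, 1], [30, 0], [55, 1], [20, 0]]
y = ["risky", "safe", "safe", "risky", "safe", "risky", "safe", "risky"]

# Step 1 (Learning): build the classifier from the training set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
model = DecisionTreeClassifier().fit(X_train, y_train)

# Step 2 (Classification): predict class labels for test data and estimate accuracy
y_pred = model.predict(X_test)
print("Predicted labels:", list(y_pred))
print("Accuracy on test data:", accuracy_score(y_test, y_pred))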
How Classification Works:
With the help of the bank loan application discussed above, let us understand the working of classification. The data classification process includes two steps, learning and classification, as described above.
Advantages:
Mining Based Methods are cost-effective and efficient
Helps in identifying criminal suspects
Helps in predicting the risk of diseases
Helps banks and financial institutions identify likely defaulters before approving credit cards, loans, etc.
Disadvantages:
Privacy: When data about customers is collected, there is a chance that a company may give some information about its customers to other vendors or use this information for its own profit.
Accuracy Problem: An accurate model must be selected in order to get the best accuracy and results.
APPLICATIONS: Typical applications include credit/loan approval, medical diagnosis (disease risk prediction), fraud detection and target marketing.
General approach to solve a classification problem:
--A classification technique is a systematic approach to build classification models based on a data set.
--Examples are decision tree classifiers, rule-based classifiers, neural networks, support vector machines and naïve Bayes classifiers.
--Each technique employs a learning algorithm to identify a model that best fits the relationship between the attribute set and the class label of the input data.
--A training set, consisting of records whose class labels are known, must be provided. The training set is used to build a classification model, which is then applied to the test set. The test set consists of records whose class labels are unknown.
--Based on the entries in the confusion matrix, the total number of correct predictions made by the model is (f11 + f00) and the total number of incorrect predictions is (f01 + f10).
--Although a confusion matrix provides the information needed to determine how well a classification model performs, summarizing this information with a single number makes it more convenient to compare the performance of different models.
--This can be done using a performance metric.
--Accuracy can be expressed as:
Accuracy = Number of correct predictions / Total number of predictions
         = (f11 + f00) / (f11 + f10 + f00 + f01)
--Equivalently, the error rate can be expressed as:
Error rate = Number of wrong predictions / Total number of predictions
           = (f10 + f01) / (f11 + f10 + f00 + f01)
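As a small illustration, the two metrics can be computed directly from the four confusion-matrix counts; the counts below are made-up numbers, not from the text.

# Accuracy and error rate from the confusion-matrix entries f11, f10, f01, f00.
# The counts used here are hypothetical.
f11, f00 = 40, 45   # correctly predicted records of class 1 and class 0
f10, f01 = 10, 5    # misclassified records

total = f11 + f10 + f00 + f01
accuracy = (f11 + f00) / total
error_rate = (f10 + f01) / total

print("Accuracy   =", accuracy)    # 0.85
print("Error rate =", error_rate)  # 0.15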
Decision Tree Induction:
Decision tree induction is the learning of decision trees from class-labeled training tuples. A decision tree is a flowchart-like tree structure, where each internal node (non-leaf node) denotes a test on an attribute, each branch represents an outcome of the test, and each leaf node (or terminal node) holds a class label. The topmost node in a tree is the root node. Internal nodes are denoted by rectangles, and leaf nodes are denoted by ovals.
“How are decision trees used for classification?” Given a tuple, X, for which the associated class label is unknown, the attribute values of the tuple are tested against the decision tree. A path is traced from the root to a leaf node, which holds the class prediction for that tuple. Decision trees can easily be converted to classification rules.
During tree construction, attribute selection measures are used to select the attribute that best partitions the tuples into distinct classes. When decision trees are built, many of the branches may reflect noise or outliers in the training data. Tree pruning attempts to identify and remove such branches, with the goal of improving classification accuracy on unseen data.
The tree has three types of nodes:
i) A root node has no incoming edges and zero or more outgoing edges.
ii) Internal nodes, each of which has exactly one incoming edge and two or more outgoing edges.
iii) Leaf or terminal nodes, each of which has exactly one incoming edge and no outgoing edges.
Fig: A decision tree for the mammal classification problem
Fig: Classifying an unlabelled vertebrate
Building of a decision tree:
i) Hunt’s algorithm
ii) ID3 (Iterative Dichotomiser 3)
iii) C4.5 (an extension of ID3)
iv) CART (Classification and Regression Trees)
--These algorithms usually employ a greedy strategy that grows a decision tree by making a series of locally optimal decisions about which attribute to use for partitioning the data. One such algorithm is Hunt’s algorithm.
Hunt’s algorithm
--In Hunt’s algorithm, a decision tree is grown in a recursive fashion by partitioning the training records into subsets.
--Let Dt be the set of training records that are associated with node t and y = {y1, y2, …, yc} be the class labels.
--The recursive procedure for Hunt’s algorithm is as follows:
STEP 1
If all the records in Dt belong to the same class yt, then t is a leaf node labelled as yt.
STEP 2
If Dt contains records that belong to more than one class, an attribute test condition is selected to partition the records into smaller subsets. A child node is created for each outcome, and the records in Dt are distributed to the children based on the outcomes. The algorithm is then recursively applied to each child node.
Fig: Training set for predicting borrowers who will default on loan payments
--In the above data set, the class labels of the 10 records are not all the same, so step 1 is not satisfied. We need to construct the decision tree using step 2.
--Select one of the attributes as the root node, say, Home Owner, since Home Owner with entry “yes” does not require any further splitting. There are 3 records with Home Owner = yes and 7 records with Home Owner = no.
--The records with Home Owner = yes are classified, and we now need to classify the other 7 records, i.e., Home Owner = no. The attribute test condition can be applied either on Marital Status or on Annual Income.
--Let us select Marital Status, where we apply a binary split. Here Marital Status = married does not require further splitting.
--The records with Marital Status = married are classified, and we now need to classify the other 4 records, i.e., Home Owner = no and Marital Status = single or divorced.
--The leftover attribute is Annual Income. Here we select a split range, since it is a continuous attribute.
--Now the other 4 records are also classified.
i) It is possible for some of the child nodes created in step 2 to be empty, i.e., there are no records associated with these nodes. In such cases, assign the same class label as the majority class of the training records associated with the parent node; in our example the majority class is “no”, so we assign ‘no’ to the new node.
ii) If all the records in Dt have identical attribute values but different class labels, further splitting is not possible; in such cases, assign the majority class label.
Methods for expressing attribute test conditions:
Decision tree induction algorithms must provide a method for expressing an attribute test condition and its corresponding outcomes for different attribute types.
The following are the methods for expressing attribute test conditions:
i) Binary attributes: The test condition for a binary attribute generates two outcomes, as shown below.
ii) Nominal attributes: Since a nominal attribute can have many values, its test condition can be expressed in two ways, as shown below.
For a multi-way split, the number of outcomes depends on the number of distinct values of the corresponding attribute.
Some algorithms, such as CART, support only binary splits. In such cases we can partition the k attribute values into 2^(k-1) - 1 ways.
For example, Marital Status has k = 3 values, so we can split it in 2^(3-1) - 1 = 3 ways (see the sketch after this list).
iii) Ordinal attributes: These can also produce binary or multi-way splits. Ordinal attribute values can be grouped as long as the grouping does not violate the order property of the attribute values.
In the above example, condition (a) and condition (b) satisfy the order, but condition (c) violates the order property.
iv) Continuous attributes: The test condition can be expressed as a comparison test (A < v) or (A >= v) with binary outcomes, or as a range query with outcomes of the form v_i <= A < v_(i+1), for i = 1, 2, …, k.
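As a small illustration of the 2^(k-1) - 1 binary groupings of a nominal attribute, the snippet below enumerates them for a hypothetical three-valued Marital Status attribute.

# Enumerate the binary partitions of a nominal attribute's values.
# For k distinct values there are 2**(k-1) - 1 such partitions.
from itertools import combinations

def binary_partitions(values):
    values = list(values)
    parts = []
    # (subset, complement) and (complement, subset) describe the same split,
    # so keep only subsets that contain the first value
    for r in range(1, len(values)):
        for left in combinations(values, r):
            if values[0] in left:
                right = tuple(v for v in values if v not in left)
                parts.append((left, right))
    return parts

splits = binary_partitions(["single", "married", "divorced"])
print(len(splits))               # 3  (= 2**(3-1) - 1)
for left, right in splits:
    print(left, "vs", right)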
Measures for selecting the best split:
There are many measures that can be used to determine the best way to split the records.
Let p(i|t) denote the fraction of records belonging to class i at a node t. The measures for selecting the best split are often based on the degree of impurity of the child nodes. The smaller the degree of impurity, the more skewed the class distribution. For example, a node with class distribution (0, 1) has zero impurity, whereas a node with a uniform class distribution (0.5, 0.5) has the highest impurity.
The three impurity measures (Entropy, Gini index and Classification error) attain their maximum values when the class distribution is uniform and their minimum when all the records belong to the same class.
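The text refers to the three impurity measures without writing them out; the standard definitions can be sketched as follows (the class distributions used in the demo are illustrative).

# Standard node-impurity measures, computed from the class fractions p(i|t).
import math

def entropy(p):
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def gini(p):
    return 1 - sum(pi ** 2 for pi in p)

def classification_error(p):
    return 1 - max(p)

for dist in [(0.0, 1.0), (0.5, 0.5)]:          # pure node vs. uniform node
    print(dist, entropy(dist), gini(dist), classification_error(dist))
# (0.0, 1.0) -> 0.0, 0.0, 0.0   (zero impurity)
# (0.5, 0.5) -> 1.0, 0.5, 0.5   (maximum impurity)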
Compare the degree of impurity of the parent node with the degree of impurity of the child nodes. The larger their difference, the better the test condition. The gain, ∆, is a criterion that can be used to determine the goodness of a split:
∆ = I(parent) − Σ_(j=1..k) [ N(v_j) / N ] · I(v_j)
where I(·) is the impurity measure of a given node, N is the total number of records at the parent node, k is the number of attribute values (child nodes), and N(v_j) is the number of records associated with the child node v_j. When entropy is used as the impurity measure, the difference in entropy is known as the information gain, ∆info.
Splitting of binary attributes
Suppose there are two ways to split the data into smaller subsets, say, using attribute A or attribute B. Before splitting, the Gini index is 0.5 since there are equal numbers of records from both classes.
For attribute A, the Gini index of one child node is 1 − [(2/5)^2 + (3/5)^2] = 0.48 and of the other child node is 0.4898.
The average weighted Gini index is (7/12)(0.4898) + (5/12)(0.48) = 0.486.
For attribute B, the average weighted Gini index is 0.375. Since the subsets for attribute B have a smaller Gini index than those for A, attribute B is preferable.
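The weighted-average calculation above can be reproduced with a few lines of code. The child-node class counts used below, (4, 3) and (2, 3), are assumptions implied by the Gini values 0.4898 and 0.48, since the original figure is not shown.

# Weighted Gini index of a binary split; child class counts are assumed from the text's numbers.
def gini(counts):
    n = sum(counts)
    return 1 - sum((c / n) ** 2 for c in counts)

children_A = [(4, 3), (2, 3)]            # assumed counts giving Gini 0.4898 and 0.48
total = sum(sum(c) for c in children_A)  # 12 records in the parent node

weighted = sum(sum(c) / total * gini(c) for c in children_A)
print(round(gini((6, 6)), 3))   # 0.5   -> parent node impurity before splitting
print(round(weighted, 3))       # 0.486 -> average weighted Gini index for attribute A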
Splitting of nominal attributes
A nominal attribute can produce either a binary or a multi-way split.
The computation of the Gini index is the same as for binary attributes. The split with the smaller average Gini index is the better split. In our example, the multi-way split has the lowest Gini index, so it is the best split.
Splitting of continuous attributes
In order to split a continuous attribute, we select a split range.
In our example, the sorted values represent the ascending order of the distinct values of the continuous attribute. Split positions are the midpoints between two adjacent sorted values.
Algorithm for decision tree induction:
i) The createNode() function extends the decision tree by creating a new node. A node in the decision tree has either a test condition, denoted as node.test_cond, or a class label, denoted as node.label.
ii) The find_best_split() function determines which attribute should be selected as the test condition for splitting the training records.
iii) The classify() function determines the class label to be assigned to a leaf node.
iv) The stopping_cond() function is used to terminate the tree-growing process by testing whether all the records have been classified or not.
A skeleton showing how these four routines fit together is sketched after this list.
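The following is a minimal sketch of the recursive tree-growing skeleton built from the four routines above, following the usual textbook outline. The record representation and the trivial find_best_split used here are assumptions for illustration only.

# Skeleton of the recursive tree-growing algorithm using the four routines above.
# Records are dicts with a "label" key; attributes are categorical (assumption).
from collections import Counter

def stopping_cond(records, attributes):
    labels = {r["label"] for r in records}
    return len(labels) <= 1 or not attributes

def classify(records):
    return Counter(r["label"] for r in records).most_common(1)[0][0]

def find_best_split(records, attributes):
    return attributes[0]          # placeholder: a real version would maximize the gain

def create_node():
    return {"test_cond": None, "label": None, "children": {}}

def tree_growth(records, attributes):
    if stopping_cond(records, attributes):
        leaf = create_node()
        leaf["label"] = classify(records)      # leaf node gets the majority class label
        return leaf
    root = create_node()
    root["test_cond"] = find_best_split(records, attributes)
    remaining = [a for a in attributes if a != root["test_cond"]]
    for value in {r[root["test_cond"]] for r in records}:
        subset = [r for r in records if r[root["test_cond"]] == value]
        root["children"][value] = tree_growth(subset, remaining)
    return root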
Model Overfitting:
The errors committed by a classification model are of two types:
i) Training errors
ii) Generalization errors
--Training errors are the misclassification errors committed on the training records.
--Generalization errors are the expected errors of the model on previously unseen records. For example, the class label of a record in the test data is known, but it is wrongly predicted by the model. This type of error is known as a generalization error.
--A good model should have low training errors as well as low testing errors.
--The training and test error rates are both large when the size of the tree is very small. This situation is known as model underfitting.
--When the tree becomes too large, the test error rate increases while the training error rate decreases. This situation is known as model overfitting.
--Consider the training and test sets for the mammal classification problem. Two of the ten training records are mislabeled: bats and whales are classified as non-mammals instead of mammals.
--The class label predicted for {name = ‘human’, body temperature = ‘warm-blooded’, gives birth = ‘yes’, four-legged = ‘no’, hibernates = ‘no’} by the above decision tree is non-mammal. But humans are mammals; the prediction is wrong due to the presence of noise in the data.
Overfitting due to lack of representative samples:
Fig: Training data
--The decision tree for the above training data is as follows:
Fig: Decision tree
Fig: Test set
--From the above decision tree, humans, elephants and dolphins are misclassified, since the tree is constructed from a small number of training records.
Evaluating the performance of a classifier:
--A classification algorithm should be evaluated before using it on real data. The accuracy and error rate are judged by predicting the class labels of test sets whose class labels are already known in advance.
--The following methods are used for evaluating the performance of a classifier:
--Holdout method
--Random subsampling
--Cross-validation
--Bootstrap
Holdout Method:
In this method the original data set is divided into two parts: 50% or 2/3 of the original data is used as the training set, and the remaining 50% or 1/3 as the test set, respectively. The classification model is trained on the training set and then applied to the test set.
The performance of the classification algorithm is based on the number of correct predictions made on the test set.
Limitations
1) Fewer samples are available for training (since the original samples are split).
2) The model is highly dependent on the composition of the training and test sets.
Random Subsampling:
Multiple repetitions of the holdout method are known as random subsampling. Here the original data is divided randomly into a training set and a test set, and the accuracy is calculated as in the holdout method. This random sampling is then repeated k times, and the accuracy is calculated each time. The overall accuracy is:
Overall accuracy = (acc_1 + acc_2 + … + acc_k) / k
--Here acc_i is the model accuracy during the i-th iteration.
Limitations
1) Fewer samples are available for training (since the original samples are split).
2) A record may be used more than once in the training and test sets.
Cross-Validation:
--There are three variations of the cross-validation approach:
a) Two-fold cross-validation
In this approach the data is partitioned into two parts. The first part is used as the training set and the second part as the test set. Then the roles are swapped: the first part is used as the test set and the second as the training set. The total error is the sum of both errors.
b) K-fold cross-validation
In this approach the data is partitioned into k subsets. One of the partitions is used as the test set and the remaining subsets are used as the training set. This process is repeated k times, each partition serving as the test set once, and the total error is the sum over all k runs (a code sketch is given after this list).
c) Leave-one-out approach
In this approach one record is used as the test set and the rest of the records are used as the training set. This process is repeated k times (k = number of records) and the total error is the sum over all k runs. However, this process is computationally very expensive.
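A minimal hand-rolled sketch of k-fold cross-validation on a hypothetical data set (the trivial majority-class "classifier" is a placeholder for illustration):

# Hand-rolled k-fold cross-validation; the data and the "classifier" are placeholders.
from collections import Counter

def k_fold_indices(n_records, k):
    folds = [[] for _ in range(k)]
    for i in range(n_records):
        folds[i % k].append(i)          # deal records into k roughly equal folds
    return folds

def cross_validate(records, labels, k, train_and_score):
    folds = k_fold_indices(len(records), k)
    scores = []
    for i in range(k):
        test_idx = set(folds[i])
        train = [(records[j], labels[j]) for j in range(len(records)) if j not in test_idx]
        test = [(records[j], labels[j]) for j in folds[i]]
        scores.append(train_and_score(train, test))   # one accuracy per fold
    return sum(scores) / k                            # overall (average) accuracy

def majority_classifier(train, test):
    majority = Counter(lbl for _, lbl in train).most_common(1)[0][0]
    return sum(1 for _, lbl in test if lbl == majority) / len(test)

data = [[i] for i in range(10)]
labels = ["yes"] * 6 + ["no"] * 4
print(cross_validate(data, labels, k=5, train_and_score=majority_classifier))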
Bootstrap
In this approach a record may be sampled more than once. This means that when a record is sampled, it is placed back into the original data, so it is likely that the same record may be sampled again and again (sampling with replacement). Consider original data of size N. The probability of a record being chosen for a bootstrap sample is 1 − (1 − 1/N)^N. When N is very large, this probability approaches 1 − e^(−1) ≈ 0.632. The sampling is repeated b times to generate b bootstrap samples.
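The probability statement above can be checked with a short simulation of sampling with replacement; the data size used below is arbitrary.

# Bootstrap sampling: each bootstrap sample is drawn with replacement from the original data.
# On average, a fraction of about 1 - e^(-1) = 0.632 of the records appear in each sample.
import random

N = 1000
data = list(range(N))
b = 50                                  # number of bootstrap samples
fractions = []
for _ in range(b):
    sample = [random.choice(data) for _ in range(N)]   # sample N records with replacement
    fractions.append(len(set(sample)) / N)             # fraction of distinct records chosen

print("theoretical:", 1 - (1 - 1 / N) ** N)   # close to 0.632
print("simulated  :", sum(fractions) / b)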
Classification: Alternative Techniques:
Bayesian Classification:
Bayesian classifiers are statistical classifiers. They can predict class membership probabilities, such as the probability that a given tuple belongs to a particular class. Bayesian classification is based on Bayes’ theorem.
Bayes’ Theorem:
Let X be a data tuple. In Bayesian terms, X is considered “evidence” and it is described by measurements made on a set of n attributes.
Let H be some hypothesis, such as that the data tuple X belongs to a specified class C.
For classification problems, we want to determine P(H|X), the probability that the hypothesis H holds given the “evidence” or observed data tuple X.
P(H|X) is the posterior probability, or a posteriori probability, of H conditioned on X.
Bayes’ theorem is useful in that it provides a way of calculating the posterior probability, P(H|X), from P(H), P(X|H), and P(X):
P(H|X) = P(X|H) P(H) / P(X)
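A tiny numeric illustration of the theorem; the probabilities below are invented for illustration and are not taken from the text.

# Bayes' theorem: P(H|X) = P(X|H) * P(H) / P(X), with made-up numbers.
p_h = 0.3            # prior probability of the hypothesis H (e.g., "buys computer")
p_x_given_h = 0.5    # likelihood of observing the evidence X when H holds
p_x = 0.25           # overall probability of the evidence X

p_h_given_x = p_x_given_h * p_h / p_x
print(p_h_given_x)   # 0.6 -> posterior probability of H given X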
Naïve Bayesian Classification:
The naïve Bayesian classifier, or simple Bayesian classifier, works as follows:
1. Let D be a training set of tuples and their associated class labels. As usual, each tuple is represented by an n-dimensional attribute vector, X = (x1, x2, …, xn), depicting n measurements made on the tuple from n attributes, respectively, A1, A2, …, An.
2. Suppose that there are m classes, C1, C2, …, Cm. Given a tuple, X, the classifier will predict that X belongs to the class having the highest posterior probability, conditioned on X. That is, the naïve Bayesian classifier predicts that tuple X belongs to the class Ci if and only if
P(Ci|X) > P(Cj|X) for 1 <= j <= m, j ≠ i.
Thus we maximize P(Ci|X). The class Ci for which P(Ci|X) is maximized is called the maximum posteriori hypothesis.
3. As P(X) is constant for all classes, only P(X|Ci)P(Ci) needs to be maximized. If the class prior probabilities are not known, then it is commonly assumed that the classes are equally likely, that is, P(C1) = P(C2) = … = P(Cm), and we would therefore maximize P(X|Ci). Otherwise, we maximize P(X|Ci)P(Ci).
4. Given data sets with many attributes, it would be extremely computationally expensive to compute P(X|Ci). In order to reduce computation in evaluating P(X|Ci), the naïve assumption of class-conditional independence is made. This presumes that the values of the attributes are conditionally independent of one another, given the class label of the tuple. Thus,
P(X|Ci) = Π_(k=1..n) P(xk|Ci)
        = P(x1|Ci) × P(x2|Ci) × … × P(xn|Ci)
5. We can easily estimate the probabilities P(x1|Ci), P(x2|Ci), …, P(xn|Ci) from the training tuples.
6. For each attribute, we look at whether the attribute is categorical or continuous-valued. For instance, to compute P(X|Ci), we consider the following:
-- If Ak is categorical, then P(xk|Ci) is the number of tuples of class Ci in D having the value xk for Ak, divided by |Ci,D|, the number of tuples of class Ci in D.
-- If Ak is continuous-valued, then we need to do a bit more work, but the calculation is pretty straightforward; a sketch is given below.
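For continuous-valued attributes, the usual textbook approach, assumed here since the text does not spell it out, is to model the attribute with a Gaussian distribution whose mean and standard deviation are estimated from the tuples of class Ci, and to use the Gaussian density as P(xk|Ci). A minimal sketch with hypothetical values:

# Gaussian estimate of P(xk|Ci) for a continuous attribute (standard textbook approach).
import math

def gaussian(x, mean, std):
    return (1.0 / (math.sqrt(2 * math.pi) * std)) * math.exp(-((x - mean) ** 2) / (2 * std ** 2))

# Hypothetical ages of the training tuples that belong to class Ci:
ages_in_class = [25, 30, 35, 40, 45]
mean = sum(ages_in_class) / len(ages_in_class)
std = math.sqrt(sum((a - mean) ** 2 for a in ages_in_class) / len(ages_in_class))

print(gaussian(38, mean, std))   # estimate of P(age = 38 | Ci)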
Example:
We need to maximize P(X|Ci)P(Ci), for i = 1, 2.
P(Ci), the prior probability of each class, can be computed based on the training tuples.
To compute P(X|Ci), for i = 1, 2, we compute the following conditional probabilities.
Using these probabilities, we obtain
P(X|buys_computer = yes) = P(age = youth | buys_computer = yes)
× P(income = medium | buys_computer = yes)
× P(student = yes | buys_computer = yes)
× P(credit_rating = fair | buys_computer = yes)
= 0.222 × 0.444 × 0.667 × 0.667 = 0.044.
Similarly,
P(X|buys_computer = no) = 0.600 × 0.400 × 0.200 × 0.400 = 0.019.
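The final step, multiplying in the class priors and choosing the larger product, is easy to reproduce in code. The conditional probabilities below are the ones listed above; the priors P(buys_computer = yes) and P(buys_computer = no) are left as placeholders to be computed from the training tuples.

# Naive Bayes decision for the tuple X, using the conditional probabilities given above.
cond_yes = [0.222, 0.444, 0.667, 0.667]   # P(xk | buys_computer = yes) for the 4 attributes
cond_no  = [0.600, 0.400, 0.200, 0.400]   # P(xk | buys_computer = no)

def product(values):
    result = 1.0
    for v in values:
        result *= v
    return result

p_x_given_yes = product(cond_yes)          # about 0.044
p_x_given_no  = product(cond_no)           # about 0.019

# Priors would be computed from the training tuples (placeholders here):
p_yes, p_no = 0.5, 0.5                     # replace with the actual class fractions

score_yes = p_x_given_yes * p_yes
score_no  = p_x_given_no * p_no
print("predicted class:", "yes" if score_yes > score_no else "no")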
Bayesian Belief Networks