Supervised Learning Algorithm
Supervised learning
Classification vs. Prediction
• Supervised learning may be used for classification and
prediction
• Classification:
– predicts categorical class labels (discrete or nominal),
e.g. Yes/No; Treatment A, B, or C; Safe/Risky; etc.
• Prediction:
– models continuous-valued functions, i.e. predicts unknown numerical values
• Typical Applications
– credit approval
– target marketing
– medical diagnosis
– treatment effectiveness analysis
Classification Process Steps
• Model construction (training phase): a classification algorithm analyses the training data and builds a model (classifier)
• Model usage: the learned model is applied to classify unseen data
Training data:
NAME     RANK                 EXPERIENCE   SENIOR
Rk       Assistant Professor  2            no
Dimple   Associate Professor  7            no
KV       Professor            8            yes
Rajesh   Assistant Professor  7            no
Unseen data: (RP, Professor, 4) → Senior?
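As a sketch of the two phases, the following Python snippet applies a rule consistent with this training data; the rule IF RANK = 'Professor' THEN SENIOR = 'yes' is an illustrative model, not necessarily the exact one the original slide intended:

```python
# Training data from the slide; each tuple is (name, rank, experience, senior).
training_data = [
    ("Rk", "Assistant Professor", 2, "no"),
    ("Dimple", "Associate Professor", 7, "no"),
    ("KV", "Professor", 8, "yes"),
    ("Rajesh", "Assistant Professor", 7, "no"),
]

def classify(rank, experience):
    # Model from the construction phase: a rule consistent with the data above.
    return "yes" if rank == "Professor" else "no"

# Model usage phase: classify the unseen tuple (RP, Professor, 4).
print(classify("Professor", 4))  # -> yes
```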
Evaluating Classification Methods
• Accuracy
• Speed
– time to construct the model
– time to use the model
• Robustness
– handling noise and missing values
• Scalability
– efficiency in disk-resident databases
• Interpretability
– understanding and insight provided by the model
• Goodness of rules
– decision tree size
– compactness of classification rules
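Of these criteria, accuracy is the one most commonly quantified; a minimal sketch of how it is typically estimated on held-out test data (the function and sample labels are illustrative):

```python
def accuracy(predicted, actual):
    # Fraction of test tuples whose predicted label matches the true label.
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

print(accuracy(["yes", "no", "yes"], ["yes", "yes", "yes"]))  # -> 0.667 (2 of 3)
```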
Example Dataset
Age          Income   Student   Credit_rating   Buys_computer
youth        high     no        fair            no
youth        high     no        excellent       no
middle_aged  high     no        fair            yes
senior       medium   no        fair            yes
senior       low      yes       fair            yes
senior       low      yes       excellent       no
middle_aged  low      yes       excellent       yes
youth        medium   no        fair            no
youth        low      yes       fair            yes
senior       medium   yes       fair            yes
youth        medium   yes       excellent       yes
middle_aged  medium   no        excellent       yes
middle_aged  high     yes       fair            yes
senior       medium   no        excellent       no
Decision Trees
Decision tree
• A flow-chart-like tree structure
• Internal node denotes a test on an attribute
• Branch represents an outcome of the test
• Example
[Figure: decision tree for the buys_computer data; the root node tests age, and each leaf node carries a class label (yes or no)]
Decision Trees cont..
Decision tree induction
• Many algorithms:
– Hunt's Algorithm (one of the earliest methods)
– ID3 and its successor C4.5; CART, SLIQ, and SPRINT
• Greedy strategy
– Split the records based on an attribute test that optimizes a certain criterion
• Issues
– Determine how to split the records
• How to specify the attribute test condition?
• How to determine the best split?
– Determine when to stop splitting
Algorithm for Decision Tree Induction
• Attribute types: discrete or continuous
• Split types: binary split or multi-way split
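A minimal sketch of the greedy, top-down induction these slides describe (ID3-style), assuming categorical attributes, multi-way splits, and information gain as the criterion defined on the next slide; function and variable names are illustrative:

```python
import math
from collections import Counter

def entropy(labels):
    # Info(D) = -sum(p_i * log2(p_i)) over the class distribution.
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def build_tree(rows, labels, attrs):
    # Stopping conditions: pure node, or no attributes left (majority vote).
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]
    def gain(a):
        # Information gain of a multi-way split on attribute a.
        n = len(rows)
        rem = sum((cnt / n) * entropy([l for r, l in zip(rows, labels) if r[a] == v])
                  for v, cnt in Counter(r[a] for r in rows).items())
        return entropy(labels) - rem
    best = max(attrs, key=gain)            # greedy choice of the best split
    tree = {}
    for v in set(r[best] for r in rows):
        idx = [i for i, r in enumerate(rows) if r[best] == v]
        tree[v] = build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                             [a for a in attrs if a != best])
    return (best, tree)

rows = [{"age": "youth", "student": "no"}, {"age": "youth", "student": "yes"},
        {"age": "middle_aged", "student": "no"}]
print(build_tree(rows, ["no", "yes", "yes"], ["age", "student"]))
# -> ('age', {'youth': ('student', {'no': 'no', 'yes': 'yes'}),
#             'middle_aged': 'yes'})   (key order may vary)
```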
Information gain
• Info(D) = -∑ pi log2(pi), i = 1..m, is the expected information (entropy) needed to classify a tuple in D, where pi = |Ci,D|/|D|
• Splitting D on attribute A into v partitions Dj gives InfoA(D) = ∑ (|Dj|/|D|) Info(Dj), and Gain(A) = Info(D) - InfoA(D); the attribute with the highest gain is chosen for the split
• In the example dataset there are two distinct classes (that is, m = 2). Let class C1 correspond to yes and class C2 correspond to no
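As a quick check of these formulas on the example dataset (9 yes, 5 no; splitting on age gives partitions youth = {2 yes, 3 no}, middle_aged = {4 yes, 0 no}, senior = {3 yes, 2 no}):

```python
import math

def info(counts):
    # Info(D) = -sum(p_i * log2(p_i)) over the class counts.
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts if c)

info_d = info([9, 5])                      # whole dataset: 9 yes, 5 no
info_age = (5/14) * info([2, 3]) + (4/14) * info([4, 0]) + (5/14) * info([3, 2])
print(round(info_d, 3), round(info_d - info_age, 3))  # -> 0.94 0.247
```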
Gini Index (CART)
• Considers a binary split for each attribute
• The Gini index measures the impurity of D as
Gini(D) = 1 - ∑ pi², i = 1..m, where pi is the probability that a tuple in D belongs to class Ci
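For the example dataset above, with 9 tuples in class yes and 5 in class no, this works out to:
Gini(D) = 1 - (9/14)² - (5/14)² ≈ 0.459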
Advantages:
• Inexpensive to construct (requires no parameter setting or domain knowledge)
• Extremely fast at the learning step and at classifying unknown records
• Easy to interpret for small-sized trees
Introduction
• These are statistical classifiers, based on Bayes' theorem:
P(H|X) = P(X|H) P(H) / P(X)
• P(H), P(X|H), and P(X) may be estimated from the given data
Naïve Bayes Classifier
• Let D be a training set of tuples and their associated class
labels
• Suppose that there are m classes, C1, C2, …, Cm. Given a tuple X, the naïve Bayesian classifier predicts that X belongs to the class Ci if and only if
• P(Ci|X) > P(Cj|X) for 1 ≤ j ≤ m, j ≠ i
• Thus we maximize P(Ci|X). The class Ci for which P(Ci|X) is maximized is called the maximum a posteriori hypothesis. By Bayes' theorem:
P(Ci|X) = P(X|Ci) P(Ci) / P(X)
Naïve Bayes Classifier cont..
• As P(X) is constant for all classes, only P(X|Ci)P(Ci) need be maximized
• If the classes are equally likely, that is, P(C1) = P(C2) = … = P(Cm), we would maximize P(X|Ci); otherwise, we maximize P(X|Ci)P(Ci)
• P(Ci) = |Ci,D|/|D|, where |Ci,D| is the number of training tuples of class Ci in D
• A simplifying assumption: the attributes are conditionally independent given the class, so
P(X|Ci) = ∏ P(xk|Ci), k = 1..n, = P(x1|Ci) P(x2|Ci) … P(xn|Ci)
• Here, xk refers to the value of attribute Ak for tuple X
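A minimal sketch of this computation on the example dataset, classifying the tuple X = (age = youth, income = medium, student = yes, credit_rating = fair); the code simply follows the counting rules above:

```python
# Rows follow the example dataset: (age, income, student, credit_rating),
# with buys_computer as the class label.
rows = [
    ("youth", "high", "no", "fair"), ("youth", "high", "no", "excellent"),
    ("middle_aged", "high", "no", "fair"), ("senior", "medium", "no", "fair"),
    ("senior", "low", "yes", "fair"), ("senior", "low", "yes", "excellent"),
    ("middle_aged", "low", "yes", "excellent"), ("youth", "medium", "no", "fair"),
    ("youth", "low", "yes", "fair"), ("senior", "medium", "yes", "fair"),
    ("youth", "medium", "yes", "excellent"),
    ("middle_aged", "medium", "no", "excellent"),
    ("middle_aged", "high", "yes", "fair"), ("senior", "medium", "no", "excellent"),
]
labels = ["no", "no", "yes", "yes", "yes", "no", "yes",
          "no", "yes", "yes", "yes", "yes", "yes", "no"]

def classify(x):
    scores = {}
    for c in set(labels):
        class_rows = [r for r, l in zip(rows, labels) if l == c]
        prior = len(class_rows) / len(rows)        # P(Ci) = |Ci,D| / |D|
        likelihood = 1.0
        for k, xk in enumerate(x):                 # P(X|Ci) = prod_k P(xk|Ci)
            likelihood *= sum(r[k] == xk for r in class_rows) / len(class_rows)
        scores[c] = prior * likelihood             # maximize P(X|Ci) P(Ci)
    return max(scores, key=scores.get)

# P(X|yes)P(yes) ~= 0.028 > P(X|no)P(no) ~= 0.007, so predict yes.
print(classify(("youth", "medium", "yes", "fair")))  # -> 'yes'
```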
KNN
• Numeric attributes are typically min-max normalized before computing distances: v' = (v - minA) / (maxA - minA)
• For nominal attributes, many methods exist; the simplest assigns distance zero if the two values are the same (e.g. the same colour) and one if they differ
KNN cont..
• If the value of either one or both variables is missing, the distance is taken to be 1
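Putting the KNN distance rules together in a minimal Python sketch; the two-attribute scheme (a numeric age in the range 0-100 and a nominal colour) and k = 3 are illustrative assumptions:

```python
import math

def nominal_dist(a, b):
    # Same value -> 0, different -> 1; a missing value (None) -> 1.
    if a is None or b is None:
        return 1.0
    return 0.0 if a == b else 1.0

def numeric_dist(a, b, lo, hi):
    # Min-max normalize both values to [0, 1] before differencing.
    if a is None or b is None:
        return 1.0                            # missing -> maximum distance
    return abs((a - lo) / (hi - lo) - (b - lo) / (hi - lo))

def knn_classify(query, data, k=3):
    # data: list of ((age, colour), label) pairs; query is (age, colour).
    def dist(point):
        return math.sqrt(numeric_dist(query[0], point[0], 0, 100) ** 2 +
                         nominal_dist(query[1], point[1]) ** 2)
    nearest = sorted(data, key=lambda item: dist(item[0]))[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)   # majority vote among k nearest

data = [((25, "red"), "A"), ((30, "red"), "A"),
        ((60, "blue"), "B"), ((55, None), "B")]
print(knn_classify((28, "red"), data))  # -> 'A'
```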
Introduction
• Regression analysis is a statistical technique used to describe
relationships among variables
x (Years of experience):  3   8   9  13   3   6  11  21   1  16
y (Salary, in $1000s):   30  57  64  72  36  43  59  90  20  83
• The least-squares line fitted to these data is y = 23.6 + 3.5x
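The coefficients come from the standard least-squares formulas w1 = ∑(xi - x̄)(yi - ȳ) / ∑(xi - x̄)² and w0 = ȳ - w1·x̄; a quick check in Python:

```python
xs = [3, 8, 9, 13, 3, 6, 11, 21, 1, 16]
ys = [30, 57, 64, 72, 36, 43, 59, 90, 20, 83]
x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
w1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) \
     / sum((x - x_bar) ** 2 for x in xs)
w0 = y_bar - w1 * x_bar
# w1 ~= 3.54 ~= 3.5; with the rounded slope, w0 = 55.4 - 3.5 * 9.1 ~= 23.6
print(round(w1, 1), round(w0, 1))  # -> 3.5 23.2
```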
• Example
• Say two predictor variables or attributes, A1 and A2, describe a tuple X; then the multiple linear regression model is
• y = w0 + w1x1 + w2x2
• The method of least squares can be extended to solve for the weights, but the resulting equations are tedious to solve by hand
• Statistical software packages such as SPSS are typically used instead
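In practice the system of equations is solved numerically; a minimal sketch with NumPy, where the data values are made up for illustration:

```python
import numpy as np

# Illustrative data: columns x1, x2 and target y (values generated from
# y = 1 + 1*x1 + 2*x2, so the fitted weights should recover [1, 1, 2]).
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]])
y = np.array([6.0, 5.0, 12.0, 11.0, 16.0])

# Prepend a column of ones so the intercept w0 is fitted as well.
A = np.column_stack([np.ones(len(X)), X])
w, *_ = np.linalg.lstsq(A, y, rcond=None)
print(np.round(w, 3))  # -> [1. 1. 2.], i.e. y = w0 + w1*x1 + w2*x2
```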
Nonlinear Regression
• To model data that does not show a linear dependence
• e.g. the relationship between the response variable and the predictor variable can be modelled by a polynomial function
• y = w0 + w1x + w2x² + w3x³
• Such a model can be converted to a linear one by substituting x1 = x, x2 = x², x3 = x³ (see the sketch below)
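The cubic above can then be handled with ordinary linear least squares via that substitution; a minimal sketch, with sample data generated from a known cubic purely for illustration:

```python
import numpy as np

# Illustrative data drawn from y = 2 + x - 0.5*x^2 + 0.1*x^3 (no noise).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = 2 + x - 0.5 * x**2 + 0.1 * x**3

# Substitute x1 = x, x2 = x^2, x3 = x^3: the model becomes linear in w.
A = np.column_stack([np.ones_like(x), x, x**2, x**3])
w, *_ = np.linalg.lstsq(A, y, rcond=None)
print(np.round(w, 3))  # -> [ 2.   1.  -0.5  0.1]
```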
Logistic Regression
• The sigmoid (logistic) function:
σ(z) = 1 / (1 + e^(-z))
Logistic regression cont..
• In the limit as z tends to ∞, σ(z) tends to 1, because e^(-z) tends to 0; as z tends to -∞, σ(z) tends to 0
• So it is a monotonic function
• Applying it to a linear model gives the logistic regression model:
y = 1 / (1 + e^(-(w0 + w1x)))
Logistic regression cont..
Decision boundary
• The classifier gives us a class based on a probability score between 0 and 1
• E.g. suppose we have two classes: 1 = red and 0 = orange
• We choose a threshold value; tuples scoring above it are classified as red, and those below it as orange (see the sketch below)
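A minimal sketch of that thresholding; the weights w0 = -4 and w1 = 2 are illustrative, not fitted values:

```python
import math

def sigmoid(z):
    # sigma(z) = 1 / (1 + e^(-z)); maps any real z into (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def predict(x, w0=-4.0, w1=2.0, threshold=0.5):
    score = sigmoid(w0 + w1 * x)             # probability score in (0, 1)
    return "red" if score > threshold else "orange"

print(predict(1.0))  # sigma(-2) ~= 0.12 -> 'orange'
print(predict(3.0))  # sigma(2)  ~= 0.88 -> 'red'
```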