Module 3
• Accuracy
– The accuracy of a classifier refers to the ability of a given
classifier to correctly predict the class label of new or
previously unseen data.
– The accuracy of a predictor refers to how well a given
predictor can guess the value of the predicted attribute for
new or previously unseen data.
• Speed
– Refers to the computational costs involved in generating
and using the given classifier or predictor
(time to construct the model, time to use the model)
• Robustness
– Refers to the ability to make correct predictions given noisy
data and missing values
• Scalability
– Ability to construct classifier or predictor efficiently given
large amounts of data
• Interpretability
– Level of understanding and insight provided by the
model (classifier or predictor)
Classification by Decision Tree Induction
• Decision Tree Induction - learning of decision trees from
class-labeled training tuples
• Decision tree
– A flow-chart-like tree structure
– Internal node denotes a test on an attribute
– Branch represents an outcome of the test
– Leaf nodes represent class labels or class distribution
• Decision tree generation consists of two phases
– Tree construction
• At start, all the training examples are at the root
• Partition examples recursively based on selected
attributes
– Tree pruning
• Identify and remove branches that reflect noise or
outliers
• Use of decision tree: Classifying an unknown sample
– Test the attribute values of the sample against the decision
tree
• How are decision trees used for classification?
• Given a tuple, X, for which the associated class label is
unknown, the attribute values of the tuple are tested against
the decision tree.
• A path is traced from the root to a leaf node, which holds the
class prediction for that tuple (a small traversal sketch follows this list).
• Decision trees can easily be converted to classification rules.
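To make the classification step above concrete, here is a minimal Python sketch of a decision-tree node structure and the root-to-leaf traversal it describes; the Node class, the attribute names, and the example tree (age / student / buys_computer) are this sketch's own assumptions, not part of the module.

```python
# Minimal sketch of classifying a tuple with an already-built decision tree.
# The node structure, attribute names, and example tree are hypothetical.

class Node:
    def __init__(self, attribute=None, branches=None, label=None):
        self.attribute = attribute      # attribute tested at an internal node
        self.branches = branches or {}  # outcome value -> child Node
        self.label = label              # class label if this is a leaf

def classify(node, tuple_x):
    """Trace a path from the root to a leaf; the leaf holds the prediction."""
    while node.label is None:              # stop once a leaf node is reached
        outcome = tuple_x[node.attribute]  # test the attribute value of the tuple
        node = node.branches[outcome]      # follow the branch for that outcome
    return node.label

# Hypothetical tree: test "age" first, then "student" on the "youth" branch.
tree = Node(attribute="age", branches={
    "youth": Node(attribute="student", branches={
        "yes": Node(label="buys_computer=yes"),
        "no":  Node(label="buys_computer=no"),
    }),
    "middle_aged": Node(label="buys_computer=yes"),
    "senior": Node(label="buys_computer=yes"),
})

print(classify(tree, {"age": "youth", "student": "yes"}))  # buys_computer=yes
```

Each root-to-leaf path of such a tree reads directly as a classification rule, e.g. IF age = youth AND student = yes THEN buys_computer = yes, which is what the conversion to classification rules mentioned above amounts to.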
Why are decision tree classifiers so popular?
• Decision Tree
– Construction does not require any domain knowledge or
parameter setting.
– It is appropriate for exploratory knowledge discovery.
– It can handle high-dimensional data.
– The learning and classification steps of decision tree
induction are simple and fast.
– It generally has good accuracy.
– Application areas include medicine, manufacturing and
production, financial analysis, astronomy, molecular
biology, etc.
• In the early 1980s, J. Ross Quinlan, a researcher in machine
learning, developed a decision tree algorithm known as ID3
(Iterative Dichotomiser).
• Quinlan later presented C4.5 (a successor of ID3).
• In 1984, a group of statisticians (L. Breiman, J. Friedman, R.
Olshen, and C. Stone) published the book Classification and
Regression Trees (CART), which described the generation of
binary decision trees.
• ID3 and CART were invented independently of one another at
around the same time, yet follow a similar approach for learning
decision trees from training tuples.
• ID3, C4.5, and CART adopt a greedy (nonbacktracking) approach
in which decision trees are constructed in a top-down recursive
divide-and-conquer manner.
Algorithm for Decision Tree Induction
• Basic algorithm (a greedy algorithm)
– Tree is constructed in a top-down recursive divide-and-conquer manner
– At start, all the training examples are at the root
– Attributes are categorical (if continuous-valued, they are discretized in
advance)
– Examples are partitioned recursively based on selected attributes
– Test attributes are selected on the basis of a heuristic or statistical
measure (e.g., information gain)
• Conditions for stopping partitioning
– All samples for a given node belong to the same class
– There are no remaining attributes for further partitioning – majority
voting is employed for classifying the leaf
– There are no samples left
Basic algorithm for inducing a decision tree from training tuples
Algorithm: Generate decision tree. Generate a decision tree from the training tuples
of data partition D.
• Input: Data partition, D, which is a set of training tuples and their associated class
labels;
• attribute list, the set of candidate attributes;
• Attribute selection method, a procedure to determine the splitting criterion that
“best” partitions the data tuples into individual classes. This criterion consists of a
splitting attribute and, possibly, either a split point or splitting subset.
• Output: A decision tree.
• Method:
• (1) create a node N;
• (2) if tuples in D are all of the same class, C, then
• (3) return N as a leaf node labeled with the class C;
• (4) if attribute list is empty then
• (5) return N as a leaf node labeled with the majority class in D; // majority
voting
Basic algorithm for inducing a decision tree from training tuples (contd.)
• (6) apply Attribute selection method(D, attribute list) to find the “best” splitting
criterion;
• (7) label node N with splitting criterion;
• (8) if splitting attribute is discrete-valued and
multiway splits allowed then // not restricted to binary trees
• (9) attribute list = attribute list - splitting attribute; // remove splitting attribute
• (10) for each outcome j of splitting criterion
• // partition the tuples and grow subtrees for each partition
• (11) let Dj be the set of data tuples in D satisfying outcome j; // a partition
• (12) if Dj is empty then
• (13) attach a leaf labeled with the majority class in D to node N;
• (14) else attach the node returned by Generate decision tree(Dj, attribute list)
to node N;
• endfor
• (15) return N;
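A minimal Python sketch of the recursive procedure above, assuming categorical (discrete-valued) attributes and a caller-supplied attribute selection method as in step (6); the dictionary-based tree representation and all function names are assumptions of the sketch, not part of the original pseudocode.

```python
from collections import Counter

# Sketch of the top-down, recursive, divide-and-conquer procedure above.
# Each tuple is a dict of attribute -> value; `labels` is the parallel list
# of class labels; `attribute_selection_method` scores candidate attributes
# (e.g. by information gain) and returns the "best" splitting attribute.

def generate_decision_tree(D, attribute_list, labels, attribute_selection_method):
    # (2)-(3) all tuples in D belong to the same class: return a leaf
    if len(set(labels)) == 1:
        return {"leaf": labels[0]}

    # (4)-(5) no attributes remain: return a leaf labeled by majority voting
    if not attribute_list:
        return {"leaf": Counter(labels).most_common(1)[0][0]}

    # (6)-(7) find the "best" splitting attribute and label node N with it
    splitting_attribute = attribute_selection_method(D, labels, attribute_list)
    node = {"attribute": splitting_attribute, "branches": {}}

    # (9) discrete-valued attribute, multiway split: remove it from the list
    remaining = [a for a in attribute_list if a != splitting_attribute]

    # (10)-(14) partition the tuples and grow one subtree per outcome
    for outcome in set(t[splitting_attribute] for t in D):
        Dj = [t for t, y in zip(D, labels) if t[splitting_attribute] == outcome]
        yj = [y for t, y in zip(D, labels) if t[splitting_attribute] == outcome]
        if not Dj:
            # (12)-(13) empty partition: leaf with the majority class of D
            # (cannot occur here, since outcomes are drawn from D itself;
            # kept only to mirror the pseudocode)
            node["branches"][outcome] = {"leaf": Counter(labels).most_common(1)[0][0]}
        else:
            # (14) recurse on the partition Dj
            node["branches"][outcome] = generate_decision_tree(
                Dj, remaining, yj, attribute_selection_method)
    return node  # (15)
```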
Attribute Selection Measure
• A heuristic for selecting the splitting criterion that “best”
separates a given data partition, D, of class-labeled training
tuples into individual classes.
• Also known as splitting rules because they determine how the
tuples at a given node are to be split.
• Provides a ranking for each attribute describing the given training
tuples.
• The attribute having the best score for the measure is chosen as
the splitting attribute for the given tuples.
• If the splitting attribute is continuous-valued or if we are
restricted to binary trees then either a split point or a splitting
subset must also be determined as part of the splitting criterion.
• Three popular attribute selection measures:
– Information gain, gain ratio, and the Gini index (a small Gini
index sketch follows this list).
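As an illustration of scoring and ranking attributes, here is a small sketch using the Gini index (the measure associated with CART): Gini(D) = 1 - sum of p_i^2 over the class proportions, and a split on A is scored by the weighted sum of the partitions' Gini values, the lower the better. For simplicity this sketch scores a multiway split on each categorical attribute; the data layout, example values, and function names are its own assumptions.

```python
from collections import Counter

def gini(labels):
    """Gini(D) = 1 - sum_i p_i^2 over the class proportions p_i in D."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_of_split(D, labels, attribute):
    """Weighted Gini of the partitions produced by splitting on `attribute`."""
    n = len(D)
    total = 0.0
    for value in set(t[attribute] for t in D):
        yj = [y for t, y in zip(D, labels) if t[attribute] == value]
        total += (len(yj) / n) * gini(yj)
    return total

def best_attribute_by_gini(D, labels, attribute_list):
    """Rank the candidate attributes; the lowest weighted Gini score wins."""
    return min(attribute_list, key=lambda a: gini_of_split(D, labels, a))

# Hypothetical data: splitting on "age" gives pure partitions, so it wins.
D = [{"age": "youth", "income": "high"},
     {"age": "senior", "income": "low"},
     {"age": "youth", "income": "low"}]
labels = ["no", "yes", "no"]
print(best_attribute_by_gini(D, labels, ["age", "income"]))  # age
```

For information gain the same skeleton applies with entropy in place of Gini and the maximum score chosen instead of the minimum; CART itself restricts splits to binary ones, as noted earlier.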
Information Gain
• ID3 uses Information Gain as attribute selection method.
• Node N holds (represents) the tuples of partition D.
• The attribute with the highest information gain is chosen as
the splitting attribute for node N. This attribute minimizes
the information needed to classify the tuples in the resulting
partitions
• Suppose the tuples in D are to be partitioned on some attribute
A having v distinct values {a1, a2, ..., av}.
• If A is discrete-valued, these values correspond to v outcomes
of a test on A.
• Attribute A can be used to split D into v partitions
{D1,D2,…….,Dv}, where Dj contains tuples in D having
outcome aj of A.
• These partitions correspond to the branches grown from
node N.
• If attribute A is continuous-valued, we must determine the
best split point for A.
• These partitions may be impure (i.e., a partition may contain
tuples from different classes rather than from a single class); a
sketch of the information gain computation follows.
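A minimal sketch of the information gain computation itself, using the standard ID3 definitions Info(D) = -sum_i p_i * log2(p_i), InfoA(D) = sum_j (|Dj|/|D|) * Info(Dj), and Gain(A) = Info(D) - InfoA(D); the data layout (tuples as dicts plus a parallel label list) is an assumption of the sketch.

```python
import math
from collections import Counter

def info(labels):
    """Expected information (entropy) needed to classify a tuple in D:
    Info(D) = -sum_i p_i * log2(p_i)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_after_split(D, labels, attribute):
    """InfoA(D) = sum_j (|Dj|/|D|) * Info(Dj) over the v partitions of D
    produced by the v distinct values of attribute A."""
    n = len(D)
    total = 0.0
    for value in set(t[attribute] for t in D):
        yj = [y for t, y in zip(D, labels) if t[attribute] == value]
        total += (len(yj) / n) * info(yj)
    return total

def information_gain(D, labels, attribute):
    """Gain(A) = Info(D) - InfoA(D); the attribute with the highest gain
    is chosen as the splitting attribute."""
    return info(labels) - info_after_split(D, labels, attribute)

# Hypothetical data: "age" separates the classes perfectly, so Gain = 1.0 bit.
D = [{"age": "youth"}, {"age": "youth"}, {"age": "senior"}, {"age": "senior"}]
labels = ["no", "no", "yes", "yes"]
print(information_gain(D, labels, "age"))  # 1.0
```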
Information Gain - Continuous-Valued Attributes
• Determine the "best" split point for A.
• Sort the values of A in increasing order.
• The midpoint between each pair of adjacent values is a
possible split point.
• Given v values of A, there are v - 1 possible split points.
• The midpoint between adjacent values ai and ai+1 of A is
(ai + ai+1) / 2 (a sketch of evaluating these candidate split points follows).
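A small sketch of generating the v - 1 candidate split points and choosing the one that gives the highest information gain for the binary split A <= split_point versus A > split_point; the helper names and the tiny example values are assumptions of the sketch.

```python
import math
from collections import Counter

def info(labels):
    """Info(D) = -sum_i p_i * log2(p_i)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_split_point(values, labels):
    """Sort the values of A, take the midpoint (ai + ai+1) / 2 of each pair
    of adjacent values as a candidate split point (v - 1 candidates), and
    return the candidate giving the highest information gain for the binary
    split A <= split_point vs. A > split_point."""
    pairs = sorted(zip(values, labels))
    xs = [v for v, _ in pairs]
    ys = [y for _, y in pairs]
    n = len(xs)
    base = info(ys)                       # Info(D) before splitting
    best_gain, best_point = -1.0, None
    for i in range(n - 1):
        if xs[i] == xs[i + 1]:
            continue                      # identical adjacent values: no new midpoint
        midpoint = (xs[i] + xs[i + 1]) / 2.0
        left, right = ys[:i + 1], ys[i + 1:]
        gain = base - (len(left) / n) * info(left) - (len(right) / n) * info(right)
        if gain > best_gain:
            best_gain, best_point = gain, midpoint
    return best_point, best_gain

# Hypothetical ages and class labels: the best split point is 36.0, gain 1.0.
print(best_split_point([25, 32, 40, 51], ["no", "no", "yes", "yes"]))  # (36.0, 1.0)
```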