



                                     Outline
     Introduction
     Basic Algorithm for Decision Tree Induction
     Attribute Selection Measures
       – Information Gain
       – Gain Ratio
       – Gini Index
     Tree Pruning
     Scalable Decision Tree Induction Methods




                               1. Introduction
                           Decision Tree Induction
    The decision tree is one of the most powerful and popular classification and
    prediction algorithms in current use in data mining and machine learning. Its
    attractiveness is due to the fact that, in contrast to neural networks, decision
    trees represent rules. Rules can readily be expressed so that humans can
    understand them, or even be used directly in a database access language such as
    SQL so that records falling into a particular category may be retrieved.

•   A decision tree is a flowchart-like tree structure, where

    – each internal node (non-leaf node, decision node) denotes a test on an attribute

    – each branch represents an outcome of the test

    – each leaf node (or terminal node) indicates the value of the target attribute
      (class) of the examples

    – the topmost node in the tree is the root node








 A decision tree consists of nodes and arcs which connect nodes. To make a
    decision, one starts at the root node, and asks questions to determine
    which arc to follow, until one reaches a leaf node and the decision is made.

 How are decision trees used for classification?
 – Given an instance, X, for which the associated class label is unknown
 – The attribute values of the instance are tested against the decision tree
 – A path is traced from the root to a leaf node, which holds the class prediction
    for that instance (a minimal sketch of this traversal is given below).
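
A minimal sketch of this traversal in Python, assuming the tree is stored as a nested
dictionary {attribute: {value: subtree or class label}}; the attribute names and the toy
tree below are purely illustrative (they only resemble the weather example used later):

    # Classify one instance by tracing a path from the root to a leaf.
    def classify(tree, instance):
        # Internal nodes are single-key dicts {attribute: {value: subtree}};
        # leaves are plain class labels.
        while isinstance(tree, dict):
            attribute = next(iter(tree))
            tree = tree[attribute][instance[attribute]]
        return tree

    toy_tree = {"outlook": {"sunny": {"humidity": {"high": "no", "normal": "yes"}},
                            "overcast": "yes",
                            "rainy": {"windy": {"true": "no", "false": "yes"}}}}

    print(classify(toy_tree, {"outlook": "sunny", "humidity": "normal"}))   # prints: yes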
 Applications
    Decision tree algorithms have been used for classification in many
    application areas, such as:
    – Medicine
    – Manufacturing and production
    – Financial analysis
    – Astronomy
    – Molecular biology.




• Advantages of decision trees
– The construction of decision tree classifiers does not require parameter
   setting.
– Decision trees can handle high dimensional data.
– Easy to interpret for small-sized trees
– The learning and classification steps of decision tree induction
   are simple and fast.
– Accuracy is comparable to other classification techniques for
   many simple data sets
– Convertible to simple and easy to understand classification rules












                 2. Basic Algorithm
              Decision Tree Algorithms
• ID3 algorithm
• C4.5 algorithm
   - A successor of ID3
  – Became a benchmark to which newer supervised learning
  algorithms are often compared.
  – Commercial successor: C5.0
• CART (Classification and Regression Trees) algorithm
  – The generation of binary decision trees
  – Developed by a group of statisticians






                     Basic Algorithm
• Basic algorithm (ID3, C4.5, and CART): a greedy algorithm
– Tree is constructed in a top-down recursive divide-and-
   conquer manner
– At start, all the training examples are at the root
– Attributes are categorical (if continuous-valued, they are
   discretized in advance)
– Examples are partitioned recursively into smaller subsets as
   the tree is being built based on selected attributes
– Test attributes are selected on the basis of a statistical
   measure (e.g., information gain)












                                                      ID3 Algorithm
 function ID3 (I, O, S) {
      /* I is the set of input attributes (non-target attributes)
       * O is the output attribute (the target attribute)
       * S is a set of training data
       * function ID3 returns a decision tree */
      if (S is empty) {
          return a single node with the value "Failure";
      }
      if (all records in S have the same value for the target attribute O) {
          return a single leaf node with that value;
      }
      if (I is empty) {
          return a single node labeled with the most frequent value of O
          found in the records of S;
          /* Note: some examples in this node will be incorrectly classified */
      }
      /* now handle the case where we cannot return a single node */
      compute the information gain for each attribute in I relative to S;
      let A be the attribute with the largest Gain(A, S) of the attributes in I;
      let {aj | j = 1, 2, .., m} be the values of attribute A;
      let {Sj | j = 1, 2, .., m} be the subsets of S when S is partitioned according to the values of A;
      return a tree with the root node labeled A and
      arcs labeled a1, a2, .., am, where the arcs go to the
      subtrees ID3(I - {A}, O, S1), ID3(I - {A}, O, S2), .., ID3(I - {A}, O, Sm);
      /* i.e., ID3 is applied recursively to each subset Sj */
 }
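
The pseudocode can be turned into a short program. Below is a minimal runnable sketch of
ID3 in Python; it assumes each training example is a dict mapping attribute names to
values and that the target attribute name is passed separately. It mirrors the base cases
and the recursive split above, but it is only an illustration, not a reference
implementation.

    import math
    from collections import Counter

    def entropy(examples, target):
        # Info(S): entropy of the class distribution of the examples.
        counts = Counter(ex[target] for ex in examples)
        total = len(examples)
        return sum(-(c / total) * math.log2(c / total) for c in counts.values())

    def information_gain(examples, attr, target):
        # Gain(S, A) = Info(S) - expected information after splitting on attr.
        total = len(examples)
        after = 0.0
        for value in set(ex[attr] for ex in examples):
            subset = [ex for ex in examples if ex[attr] == value]
            after += len(subset) / total * entropy(subset, target)
        return entropy(examples, target) - after

    def id3(examples, attributes, target):
        if not examples:                        # S is empty
            return "Failure"
        labels = [ex[target] for ex in examples]
        if len(set(labels)) == 1:               # all examples share one class
            return labels[0]
        if not attributes:                      # no attributes left: majority class
            return Counter(labels).most_common(1)[0][0]
        # Choose the attribute A with the largest information gain.
        best = max(attributes, key=lambda a: information_gain(examples, a, target))
        tree = {best: {}}
        for value in set(ex[best] for ex in examples):
            subset = [ex for ex in examples if ex[best] == value]
            remaining = [a for a in attributes if a != best]
            tree[best][value] = id3(subset, remaining, target)
        return tree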




               3. Attribute Selection Measures

Which is the best attribute?
  – Want to get the smallest tree
  – choose the attribute that produces the “purest”
      nodes
Three popular attribute selection measures:
  – Information gain
  – Gain ratio
  – Gini index









                         Information gain
• The estimation criterion in the decision tree algorithm is the
     selection of an attribute to test at each decision node in the
     tree.

• The goal is to select the attribute that is most useful for
     classifying examples. A good quantitative measure of the
     worth of an attribute is a statistical property called information
     gain that measures how well a given attribute separates the
     training examples according to their target classification.

•    This measure is used to select among the candidate
     attributes at each step while growing the tree.




Entropy - a measure of homogeneity of the set of examples
    • In order to define information gain precisely, we need to
      define a measure commonly used in information theory,
      called entropy (also written Info(S), the expected information).
    • Given a set S, containing only positive and negative
      examples of some target concept (a 2 class problem), the
      entropy of set S relative to this simple, binary classification
      is defined as:

             Info(S) = Entropy(S) = - Σi pi log2(pi)

    • where pi is the proportion of S belonging to class i. Note that the
      logarithm is base 2 because entropy is a measure of the expected
      encoding length measured in bits.
    • In all calculations involving entropy we define 0 log2(0) to be 0.
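
As a small illustration (a sketch only; the function name is mine, not from any library),
the definition translates directly to code, with the 0 log2(0) = 0 convention handled by
skipping zero proportions:

    import math

    def info(class_proportions):
        # Info(S) = - sum_i p_i log2(p_i); zero proportions contribute 0 by convention.
        return sum(-p * math.log2(p) for p in class_proportions if p > 0)

    print(info([0.5, 0.5]))   # 1.0 bit for an even two-class split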








• For example, suppose S is a collection of 25 examples, including 15
  positive and 10 negative examples [15+, 10-]. Then the entropy of
  S relative to this classification is :

   Entropy(S) = - (15/25) log2 (15/25) - (10/25) log2 (10/25) = 0.970

• Notice that the entropy is 0 if all members of S belong to the same
  class. For example, if all examples are positive,
  Entropy(S) = -1 log2(1) - 0 log2(0) = 0 - 0 = 0.
• Note the entropy is 1 (at its maximum!) when the collection
  contains an equal number of positive and negative examples.
• If the collection contains unequal numbers of positive and
  negative examples, the entropy is between 0 and 1. Figure 1
  shows the form of the entropy function relative to a binary
  classification, as p+ varies between 0 and 1.
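
These three cases can be checked numerically with a standalone snippet (here H takes the
class counts of the collection; note the exact value for [15+, 10-] is about 0.971, which
the slide reports as 0.970):

    import math

    def H(counts):
        # Entropy of a collection given its class counts, with 0 log2(0) = 0.
        total = sum(counts)
        return sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)

    print(H([15, 10]))   # 0.9709... (reported as 0.970 above)
    print(H([25, 0]))    # 0.0  -> all members belong to one class
    print(H([10, 10]))   # 1.0  -> equal numbers of positive and negative examples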




             Figure 1: The entropy function relative to a binary classification, as the proportion of positive
                                          examples p+ varies between 0 and 1.



Entropy of S = Info(S)

- The average amount of information needed to identify the class label of an
  instance in S.
- A measure of the impurity in a collection of training examples.
- The smaller the information required, the greater the purity of the partitions.









•   Information gain measures the expected reduction in entropy caused by
    partitioning the examples according to a given attribute.

•   The information gain, Gain(S, A), of an attribute A relative to a collection of
    examples S, is defined as

                    Gain(S, A) = Entropy(S) - Σ v∈Values(A) ( |Sv| / |S| ) Entropy(Sv)

                    = info(S) - infoA(S)

                    = information needed before splitting - information needed after splitting

•   where Values(A) is the set of all possible values for attribute A, and Sv is
    the subset of S for which attribute A has value v (i.e., Sv = {s ∈ S | A(s) = v}).
    Note the first term in the equation for Gain is just the entropy of the
    original collection S and the second term is infoA(S), the expected value of
    the entropy after S is partitioned using attribute A.
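
A short sketch of this definition in code, written to match the class-count notation used
in the example that follows (e.g. info([2; 3]; [4; 0]; [3; 2])); the function names here
are my own:

    import math

    def info(class_counts):
        # info(S): entropy of a node, given the list of class counts in it.
        total = sum(class_counts)
        return sum(-(c / total) * math.log2(c / total) for c in class_counts if c > 0)

    def info_after_split(branches):
        # infoA(S): expected information, i.e. the size-weighted entropy of the branches.
        total = sum(sum(branch) for branch in branches)
        return sum(sum(branch) / total * info(branch) for branch in branches)

    def gain(parent_counts, branches):
        # Gain(S, A) = info(S) - infoA(S)
        return info(parent_counts) - info_after_split(branches)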




     An example: Weather Data
    The aim of this exercise is to learn how ID3 works. You will do this by building a
    decision tree by hand for a small dataset. At the end of this exercise you should
    understand how ID3 constructs a decision tree using the concept of Information
    Gain. You will be able to use the decision tree you create to make a decision
    about new data.












  •   In this dataset, there are five categorical attributes: outlook, temperature,
      humidity, windy, and play.
  •   We are interested in building a system which will enable us to decide
      whether or not to play the game on the basis of the weather conditions, i.e.
      we wish to predict the value of play using outlook, temperature, humidity,
      and windy.

  •   We can think of the attribute we wish to predict, i.e. play, as the output
      attribute, and the other attributes as input attributes.

  •   In this problem we have 14 examples in which:

      9 examples with play= yes and 5 examples with play = no

      So, S={9,5}, and

  Entropy(S) = info (S) = info([9,5] ) = Entropy(9/14, 5/14)

              = -9/14 log2 9/14 – 5/14 log2 5/14 = 0.940





Consider splitting on the Outlook attribute

Outlook = Sunny
info([2; 3]) = entropy(2/5 ; 3/5 ) = -2/5 log2 2/5
                                     - 3/5 log2 3/5 = 0.971 bits

Outlook = Overcast
info([4; 0]) = entropy(4/4,0/4) = -1 log2 1 - 0 log2 0 = 0 bits

Outlook = Rainy
info([3; 2]) = entropy(3/5,2/5)= - 3/5 log2 3/5 – 2/5 log2 2/5 =0.971 bits

So, the expected information needed to classify objects in all subtrees of the
Outlook attribute is:

info outlook (S) = info([2; 3]; [4; 0]; [3; 2]) = 5/14 * 0.971 + 4/14 * 0 + 5/14 * 0.971
                 = 0.693 bits


information gain = info before split - info after split
gain(Outlook) = info([9; 5]) - info([2; 3]; [4; 0]; [3; 2])
              = 0.940 - 0.693 = 0.247 bits
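
A quick standalone check of the Outlook split above (H is again the entropy of a
class-count list):

    import math

    def H(counts):
        total = sum(counts)
        return sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)

    info_outlook = 5/14 * H([2, 3]) + 4/14 * H([4, 0]) + 5/14 * H([3, 2])
    print(info_outlook)               # 0.6935... -> the 0.693 bits quoted above
    print(H([9, 5]) - info_outlook)   # 0.2467... -> the 0.247 bits quoted above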








Consider splitting on the Temperature attribute

Temperature = Hot
info([2; 2]) = entropy(2/4, 2/4) = -2/4 log2 2/4 - 2/4 log2 2/4
                                 = 1 bit

 Temperature = Mild
info([4; 2]) = entropy(4/6, 2/6) = -4/6 log2 4/6 - 2/6 log2 2/6
             = 0.918 bits

 Temperature = Cool
info([3; 1]) = entropy(3/4, 1/4) = -3/4 log2 3/4 - 1/4 log2 1/4 = 0.811 bits

So, the expected information needed to classify objects in all subtrees of the
Temperature attribute is:
info temperature(S) = info([2; 2]; [4; 2]; [3; 1]) = 4/14 * 1 + 6/14 * 0.918 + 4/14 * 0.811 = 0.911 bits


information gain = info before split - info after split
gain(temperature) = 0.940 - 0.911 = 0.029 bits




  • Completing the remaining attributes in the same way, we get:
    gain(Outlook) = 0.247 bits
    gain(Temperature) = 0.029 bits
    gain(Humidity) = 0.152 bits
    gain(Windy) = 0.048 bits
  • The selected attribute is the one with the largest information
    gain: Outlook.
  • Then we continue splitting each branch in the same way …


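
The four gains can be reproduced with a few lines of Python (restating the small helpers
so the snippet runs on its own). The Outlook and Temperature branch counts are the ones
given on the previous slides; the Humidity and Windy counts are not shown in the lecture,
so they are taken here from the standard 14-example weather (play) dataset and should be
read as an assumption:

    import math

    def info(counts):
        # Entropy of a node given its class counts (0 log2 0 treated as 0).
        total = sum(counts)
        return sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)

    def gain(parent, branches):
        # Gain(S, A) = info(S) - size-weighted info of the branches.
        total = sum(parent)
        after = sum(sum(b) / total * info(b) for b in branches)
        return info(parent) - after

    S = [9, 5]   # 9 examples with play = yes, 5 with play = no
    splits = {
        "Outlook":     [[2, 3], [4, 0], [3, 2]],   # sunny, overcast, rainy
        "Temperature": [[2, 2], [4, 2], [3, 1]],   # hot, mild, cool
        "Humidity":    [[3, 4], [6, 1]],           # high, normal  (assumed counts)
        "Windy":       [[6, 2], [3, 3]],           # false, true   (assumed counts)
    }
    for attribute, branches in splits.items():
        print(attribute, round(gain(S, branches), 3))
    # Outlook 0.247, Temperature 0.029, Humidity 0.152, Windy 0.048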








Splitting the Outlook = Sunny branch further, the gains within that subset are:

gain(Temperature) = 0.571 bits          gain(Humidity) = 0.971 bits

gain(Windy) = 0.020 bits

Humidity has the largest gain, so it is selected to split this branch.






           The output decision tree












ID3 versus C4.5
• ID3 uses information gain
• C4.5 can use either information gain or gain ratio
• C4.5 can deal with
  – numeric/continuous attributes
  – missing values
  – noisy data
• Alternate method: classification and regression
  trees (CART)








                  Decision trees advantages

•   Require little data preparation
•   Are able to handle both categorical and numerical data
•   Are simple to understand and interpret
•   Generate models that can be statistically validated
•   The construction of decision tree classifiers does not
    require parameter setting
•   Decision trees can handle high dimensional data
•   Perform well with large data in a short time
•   The learning and classification steps of decision tree
    induction are simple and fast
•   Accuracy is comparable to other classification techniques
    for many simple data sets
•   Convertible to simple and easy-to-understand classification
    rules




