Entropy and Information Gain for the Decision Tree Algorithm

Decision trees are a popular machine learning algorithm for both classification and regression tasks. They work by recursively splitting the dataset into subsets based on feature values, creating a tree-like structure of decisions that leads to predictions. Here’s an overview of decision trees and some commonly used algorithms:

1. Basic Concept of Decision Trees

• Nodes: Each node represents a feature (or attribute) in the dataset.

• Edges: Each branch from a node represents a decision based on that feature’s value.

• Leaf Nodes: Represent the final output (class or value) after all decisions have been made.

• Root Node: The topmost node in a tree, representing the initial feature or question.
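
As a rough illustration of how these pieces fit together, here is a minimal sketch in Python of a tree node and a prediction walk from root to leaf; the class name, attributes, and helper function are illustrative assumptions, not part of any particular library:

# Minimal sketch of a decision-tree data structure (illustrative names only).
class TreeNode:
    def __init__(self, feature=None, children=None, prediction=None):
        self.feature = feature            # feature tested at this internal node
        self.children = children or {}    # edges: feature value -> child TreeNode
        self.prediction = prediction      # class (or value) returned at a leaf node

    def is_leaf(self):
        return self.prediction is not None

def predict(node, sample):
    """Start at the root and follow the edge matching each feature value until a leaf is reached."""
    while not node.is_leaf():
        node = node.children[sample[node.feature]]
    return node.prediction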

2. Decision Tree Algorithms

a) ID3 (Iterative Dichotomiser 3)

• Developed by Ross Quinlan, ID3 is one of the earliest algorithms.

• Criterion: It uses information gain to decide which feature to split on, favoring splits that result
in the greatest reduction in entropy.

• Limitations: Prone to overfitting and can’t handle numeric data directly without modification.
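
For reference, the quantities ID3 optimizes are the standard entropy and information gain, written here in LaTeX notation:

H(S) = -\sum_{i=1}^{c} p_i \log_2 p_i

\mathrm{Gain}(S, A) = H(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\, H(S_v)

where p_i is the proportion of examples in S that belong to class i, and S_v is the subset of S for which attribute A takes value v.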

b) C4.5

• An extension of ID3, also developed by Quinlan.

• Criterion: Uses gain ratio (information gain normalized by split information), which reduces ID3’s bias toward features with many values, and it handles both continuous and categorical attributes.

• Pruning: Implements pruning to reduce overfitting.

• Handling of Missing Values: C4.5 can handle datasets with missing values more effectively than
ID3.
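
As a rough sketch of how C4.5’s gain ratio differs from plain information gain, the small Python functions below are illustrative (the names entropy and gain_ratio are assumptions for this example, not C4.5’s actual source):

import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(feature_values, labels):
    """Information gain divided by split information (the gist of C4.5's criterion)."""
    n = len(labels)
    gain = entropy(labels)
    for v in set(feature_values):
        subset = [lab for fv, lab in zip(feature_values, labels) if fv == v]
        gain -= (len(subset) / n) * entropy(subset)
    split_info = entropy(list(feature_values))   # entropy of the feature's own value distribution
    return gain / split_info if split_info > 0 else 0.0

Dividing by the split information penalizes features that fragment the data into many small subsets, which plain information gain would otherwise favor.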

c) CART (Classification and Regression Trees)

• Developed by Leo Breiman, CART is widely used in both classification and regression.

• Criterion: For classification, CART uses Gini impurity as the splitting criterion, while for
regression, it uses mean squared error (MSE).

• Binary Splits Only: CART splits the data into exactly two branches at each node, creating binary
trees.

• Pruning: Prunes trees based on a cost-complexity parameter to manage overfitting.
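
As a concrete illustration, scikit-learn’s DecisionTreeClassifier follows the CART approach (binary splits, Gini impurity by default, and cost-complexity pruning via the ccp_alpha parameter); the toy data below is made up for this sketch:

from sklearn.tree import DecisionTreeClassifier

# Made-up toy data: two numeric features (age, income) and a binary class label.
X = [[25, 40000], [35, 60000], [45, 80000], [20, 20000], [50, 90000], [30, 30000]]
y = [0, 1, 1, 0, 1, 0]

clf = DecisionTreeClassifier(
    criterion="gini",   # Gini impurity: 1 - sum(p_i^2) over the class proportions
    ccp_alpha=0.01,     # cost-complexity pruning parameter
)
clf.fit(X, y)
print(clf.predict([[40, 70000]]))   # predicted class for a new example

For regression, DecisionTreeRegressor plays the analogous role, using a squared-error criterion.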

d) CHAID (Chi-Square Automatic Interaction Detector)

• CHAID is used for categorical data and is based on the chi-square test.

• Criterion: Uses statistical tests (chi-square for classification, ANOVA for regression) to determine
splits.

• Multifurcating Splits: Unlike CART, CHAID can create more than two branches (a multiway split) from a single
node.

• Use Cases: Often used for market research and survey analysis.
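
The chi-square test behind CHAID’s splits can be sketched with scipy; the contingency table below (one categorical feature crossed with the class label) is a made-up example:

from scipy.stats import chi2_contingency

# Made-up contingency table: rows = categories of one feature, columns = class counts.
#              class = yes   class = no
# region = A        30            10
# region = B        12            28
table = [[30, 10], [12, 28]]

chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")
# CHAID favors splitting on the feature whose test gives the smallest p-value,
# i.e., the strongest association with the target.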

3. Advantages of Decision Trees

• Interpretability: Easy to understand and visualize, even for non-experts.

• Non-linearity: Can model non-linear relationships.

• Little Data Preprocessing: Requires minimal data preparation; steps such as normalization or scaling are usually unnecessary.

4. Limitations of Decision Trees

• Overfitting: Decision trees can easily overfit, especially with deep trees.

• Instability: Sensitive to small changes in the data, which can lead to vastly different trees (high variance).

• Preference for Certain Features: Splitting criteria such as information gain tend to favor features with many distinct levels.

5. Applications of Decision Trees

• Classification tasks (e.g., spam detection, customer churn prediction)

• Regression tasks (e.g., predicting housing prices)

• Feature selection

To calculate entropy and information gain for building a decision tree, we work through the calculation on a small dataset, use the results to build the tree, and then make a prediction for a given input based on the final decisions.
Details:
https://towardsdatascience.com/decision-tree-in-machine-learning-e380942a4c96
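
A minimal sketch of that calculation is shown below; the small weather-style dataset, the feature names, and the helper functions are assumptions made for this illustration rather than the exact example from the linked article:

import math
from collections import Counter

# Small illustrative dataset (assumed): predict "Play" from two categorical features.
data = [
    {"Outlook": "Sunny",    "Wind": "Weak",   "Play": "No"},
    {"Outlook": "Sunny",    "Wind": "Strong", "Play": "No"},
    {"Outlook": "Overcast", "Wind": "Weak",   "Play": "Yes"},
    {"Outlook": "Rain",     "Wind": "Weak",   "Play": "Yes"},
    {"Outlook": "Rain",     "Wind": "Strong", "Play": "No"},
    {"Outlook": "Overcast", "Wind": "Strong", "Play": "Yes"},
    {"Outlook": "Sunny",    "Wind": "Weak",   "Play": "No"},
    {"Outlook": "Rain",     "Wind": "Weak",   "Play": "Yes"},
]
target = "Play"

def entropy(rows):
    """H(S) = -sum(p_i * log2(p_i)) over the class proportions in rows."""
    counts = Counter(r[target] for r in rows)
    n = len(rows)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def information_gain(rows, feature):
    """Gain(S, A) = H(S) - sum(|S_v|/|S| * H(S_v)) over the values v of the feature."""
    total = entropy(rows)
    n = len(rows)
    for v in set(r[feature] for r in rows):
        subset = [r for r in rows if r[feature] == v]
        total -= (len(subset) / n) * entropy(subset)
    return total

def build_tree(rows, features):
    """Recursively split on the feature with the highest information gain (ID3-style)."""
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1 or not features:          # pure node, or no features left
        return Counter(labels).most_common(1)[0][0]    # leaf: majority class
    best = max(features, key=lambda f: information_gain(rows, f))
    tree = {best: {}}
    for v in set(r[best] for r in rows):
        subset = [r for r in rows if r[best] == v]
        tree[best][v] = build_tree(subset, [f for f in features if f != best])
    return tree

def predict(tree, sample):
    """Walk the nested-dict tree until a leaf label is reached."""
    while isinstance(tree, dict):
        feature = next(iter(tree))
        tree = tree[feature][sample[feature]]
    return tree

print("Entropy of full dataset:", round(entropy(data), 3))
for f in ["Outlook", "Wind"]:
    print(f"Information gain for {f}:", round(information_gain(data, f), 3))

tree = build_tree(data, ["Outlook", "Wind"])
print("Tree:", tree)
print("Prediction:", predict(tree, {"Outlook": "Rain", "Wind": "Weak"}))

On this toy data, Outlook has the larger information gain, so it becomes the root; the Rain branch is then split on Wind, and the sample {"Outlook": "Rain", "Wind": "Weak"} is classified as "Yes".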
