Entropy and Information Gain for Decision Tree Algorithms
Decision trees are supervised learning models used for classification and regression. They work by
recursively splitting the dataset into subsets based on feature values, creating a tree-like
structure of decisions that leads to predictions. Here’s an overview of decision trees and some
commonly used algorithms:
• Root Node: The topmost node in the tree, representing the initial feature or question.
• Internal Nodes: Each internal node represents a feature (attribute) on which the data is split.
• Edges: Each branch from a node represents a decision based on that feature’s value.
• Leaf Nodes: Represent the final output (class or value) after all decisions have been made (a
small sketch of such a tree follows this list).
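To make these components concrete, here is a minimal, hand-written Python sketch of a decision
tree for a hypothetical "play tennis" dataset (the feature names and values are made up for
illustration): the root node tests one feature, each branch follows one of that feature's values,
and each leaf returns a class label.

def predict(sample: dict) -> str:
    # Root node: test the "outlook" feature.
    if sample["outlook"] == "sunny":
        # Internal node: test the "humidity" feature.
        if sample["humidity"] == "high":
            return "no"      # leaf node: class label
        return "yes"         # leaf node
    elif sample["outlook"] == "overcast":
        return "yes"         # leaf node
    else:  # outlook == "rainy"
        # Internal node: test the "windy" feature.
        if sample["windy"]:
            return "no"      # leaf node
        return "yes"         # leaf node

print(predict({"outlook": "sunny", "humidity": "high", "windy": False}))  # -> "no"

A learning algorithm's job is to choose which feature to test at each node; the criteria below
(information gain, gain ratio, Gini impurity, chi-square) are different ways of scoring those
choices.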
a) ID3 (Iterative Dichotomiser 3)
• Criterion: It uses information gain to decide which feature to split on, favoring splits that result
in the greatest reduction in entropy (see the sketch after this subsection).
• Limitations: Prone to overfitting and cannot handle numeric (continuous) features directly
without discretizing them first.
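As a rough sketch of how ID3's criterion can be computed (the toy outlook/play data below is made
up for illustration): entropy is H(S) = -Σ_i p_i log2(p_i), where p_i is the proportion of class i
in S, and the information gain of a feature A is IG(S, A) = H(S) - Σ_v (|S_v|/|S|) H(S_v), where
S_v is the subset of S in which A takes the value v.

from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy H(S) = -sum_i p_i * log2(p_i) over the class labels in S."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """IG(S, A) = H(S) - sum_v (|S_v| / |S|) * H(S_v)."""
    n = len(labels)
    gain = entropy(labels)
    for v in set(feature_values):
        subset = [y for x, y in zip(feature_values, labels) if x == v]
        gain -= len(subset) / n * entropy(subset)
    return gain

# Toy example: ID3 would split on whichever feature has the highest gain.
outlook = ["sunny", "sunny", "overcast", "rainy", "rainy", "overcast"]
play    = ["no",    "no",    "yes",      "yes",   "no",    "yes"]
print(information_gain(outlook, play))  # ~0.667 bits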
b) C4.5
• Criterion: Uses gain ratio (information gain normalized by split information), which reduces
ID3’s bias toward features with many values; C4.5 also handles both continuous and categorical
data (see the sketch after this subsection).
• Handling of Missing Values: C4.5 can handle datasets with missing values more effectively than
ID3.
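One way to sketch the gain ratio, reusing the entropy and information_gain helpers (and the toy
outlook/play data) from the ID3 example above: the gain ratio divides the information gain by the
split information, i.e. the entropy of the partition that the feature itself induces, which
penalizes features with many distinct values.

from collections import Counter
from math import log2

def split_information(feature_values):
    """Entropy of the partition induced by the feature: -sum_v (|S_v|/|S|) * log2(|S_v|/|S|)."""
    n = len(feature_values)
    return -sum((c / n) * log2(c / n) for c in Counter(feature_values).values())

def gain_ratio(feature_values, labels):
    """C4.5's criterion: information gain normalized by split information."""
    si = split_information(feature_values)
    return information_gain(feature_values, labels) / si if si > 0 else 0.0

print(gain_ratio(outlook, play))  # the gain from above divided by the split information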
c) CART (Classification and Regression Trees)
• Developed by Leo Breiman and colleagues, CART is widely used in both classification and
regression.
• Criterion: For classification, CART uses Gini impurity as the splitting criterion, while for
regression, it uses mean squared error (MSE); see the sketch after this subsection.
• Binary Splits Only: CART splits the data into exactly two branches at each node, creating binary
trees.
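A minimal sketch of both CART criteria (the toy labels and values below are made up): Gini
impurity is 1 - Σ_k p_k², and a candidate binary split is scored by the weighted impurity of the
two child nodes; for regression, the MSE around the node mean plays the same role.

from collections import Counter

def gini_impurity(labels):
    """Gini impurity = 1 - sum_k p_k^2, where p_k is the fraction of class k."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def mse(values):
    """Mean squared error around the node mean (CART's regression criterion)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

# Score a candidate binary split by the weighted impurity of its two children.
left, right = ["yes", "yes", "no"], ["no", "no"]
n = len(left) + len(right)
after = len(left) / n * gini_impurity(left) + len(right) / n * gini_impurity(right)
print(gini_impurity(left + right), after)  # impurity before vs. after the split

print(mse([3.0, 5.0, 4.0]))  # regression example: spread of the node's target values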
d) CHAID (Chi-squared Automatic Interaction Detection)
• CHAID is used for categorical data and bases its splits on the chi-square test.
• Criterion: Uses statistical tests (chi-square for categorical targets, F-tests/ANOVA for
continuous targets) to determine splits; a small chi-square example follows this subsection.
• Multifurcating Splits: Unlike CART, CHAID can create branches with multiple splits from a single
node.
• Use Cases: Often used for market research and survey analysis.
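As a rough illustration of the kind of test CHAID applies (the contingency table below is made up,
and SciPy is assumed to be available; a hand-rolled chi-square computation would work just as
well):

from scipy.stats import chi2_contingency

# Rows: the two branches of a candidate split on a categorical feature;
# columns: counts of each target class observed in that branch.
observed = [[30, 10],   # branch 1: 30 "yes", 10 "no"
            [12, 28]]   # branch 2: 12 "yes", 28 "no"

chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")

# CHAID keeps a split (after merging categories that are not significantly
# different from one another) only when the association with the target is
# significant, i.e. the p-value is below a chosen threshold such as 0.05.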
Advantages and limitations of decision trees in general:
• Little Data Preprocessing: Decision trees often require minimal data preparation; feature
normalization or scaling is usually unnecessary.
• Overfitting: Decision trees can easily overfit, especially with deep trees.
• Instability (High Variance): Sensitive to small changes in the data, which can lead to vastly
different trees.
• Preference for Certain Features: Criteria such as information gain tend to favor features with
many distinct levels (one motivation for C4.5’s gain ratio).
• Feature Selection: By splitting on the most informative features first, decision trees perform a
built-in form of feature selection.