This document discusses regression trees and how the ID3 and CART algorithms construct them. ID3 replaces information gain with standard deviation reduction: at each node the standard deviation of the target is calculated, and the attribute with the highest standard deviation reduction is selected. CART instead uses least squares, choosing each split to minimize the residual sum of squares.
The ID3 algorithm can be used to construct a decision tree for regression by replacing Information Gain with Standard Deviation Reduction.
Steps-
1) Calculate the standard deviation (SD) of the class (target) variable
2) Calculate the SD for each attribute
3) Calculate the weighted SD for each attribute
4) Calculate the reduction in SD for each attribute by subtracting its weighted SD from the SD of the class variable
5) Select the attribute with the highest SD reduction
6) Split the dataset on this attribute
7) Repeat the process for each subset produced by the selected attribute

Example- For the attribute Outlook (values sunny, overcast, rain), compute the SD of the target within each value and combine them into a weighted SD; do the same for the remaining attributes, then select the attribute with the highest standard deviation reduction (see the sketch below).
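As a rough illustration of steps 1-5, here is a minimal Python sketch that computes the standard deviation reduction for one categorical attribute. The toy dataset, the attribute name Outlook, and the target Hours are hypothetical stand-ins, not the document's actual example values.

```python
import math

def sd(values):
    """Population standard deviation of a list of target values."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))

def sd_reduction(rows, attr, target):
    """SDR(attr) = SD(target) - sum over values v of (|S_v|/|S|) * SD(target | attr = v)."""
    total_sd = sd([r[target] for r in rows])
    weighted_sd = 0.0
    for v in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == v]
        weighted_sd += (len(subset) / len(rows)) * sd(subset)
    return total_sd - weighted_sd

# Hypothetical toy data: Outlook vs. a continuous target (Hours)
rows = [
    {"Outlook": "sunny",    "Hours": 25},
    {"Outlook": "sunny",    "Hours": 30},
    {"Outlook": "overcast", "Hours": 46},
    {"Outlook": "rain",     "Hours": 45},
    {"Outlook": "rain",     "Hours": 52},
]
print(sd_reduction(rows, "Outlook", "Hours"))  # higher means a better split
```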
Regression Trees- Continue the splitting process until a defined stopping criterion is met, e.g. <=5 tuples in each leaf.
At each leaf, take the average of the continuous label (target) values as the prediction.
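Putting the steps, the stopping criterion, and the leaf averaging together, a minimal recursive sketch could look like the following. It reuses sd and sd_reduction from the sketch above; min_leaf=5 mirrors the "<=5 tuples per leaf" example and is otherwise arbitrary.

```python
def build_tree(rows, attrs, target, min_leaf=5):
    """Grow a regression tree using SD reduction; leaves predict the mean target."""
    targets = [r[target] for r in rows]
    # Stop when the node is small enough, pure, or no attributes remain
    if len(rows) <= min_leaf or not attrs or sd(targets) == 0.0:
        return sum(targets) / len(targets)  # leaf value = average of labels
    best = max(attrs, key=lambda a: sd_reduction(rows, a, target))
    children = {}
    for v in {r[best] for r in rows}:
        subset = [r for r in rows if r[best] == v]
        remaining = [a for a in attrs if a != best]
        children[v] = build_tree(subset, remaining, target, min_leaf)
    return {"attr": best, "children": children}
```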
CART Algorithm For Decision Trees
CART (Classification and Regression Trees)
• Creates binary trees
• Uses the Gini index as its attribute selection measure for classification: Gini(D) = 1 - Σ p_i², where p_i is the proportion of class i in D

CART- Regression Trees
• In classification, CART uses Gini impurity
• In regression, CART uses least squares
• Splits are chosen to minimize the residual sum of squares (RSS) between the observations and the mean in each node: RSS = Σ_{x_i ∈ R1} (y_i - ȳ_{R1})² + Σ_{x_i ∈ R2} (y_i - ȳ_{R2})²

Worked example (continuous attribute Price; a code sketch of this search follows below):
• First, split the data into two regions at index 0 and calculate the RSS; then repeat with the split at index 1, at index 2, and so on, until the RSS has been calculated at the last index.
• The threshold Price < 19 gives the smallest RSS; region R1 contains the 10 data points with Price < 19, so R1 is split further.
• To avoid overfitting, each region is required to contain at least 6 data points; if a region has fewer than 6, splitting in that region stops.
• Repeat the process until the entire tree is constructed.
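As a sketch of this least-squares split search, the Python below scans every candidate threshold on one continuous attribute and returns the split with the smallest total RSS, enforcing the minimum-region-size guard. The function names (rss, best_split) are illustrative assumptions; only the minimum of 6 points per region comes from the text.

```python
def rss(values):
    """Residual sum of squares of target values around their mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(xs, ys, min_size=6):
    """Try every split of a continuous attribute into regions R1 (left) and
    R2 (right); return (smallest total RSS, threshold), or None if every
    candidate would leave a region with fewer than min_size points."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(min_size, len(pairs) - min_size + 1):
        left  = [y for _, y in pairs[:i]]   # R1: x < threshold
        right = [y for _, y in pairs[i:]]   # R2: x >= threshold
        total = rss(left) + rss(right)
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint between neighbors
        if best is None or total < best[0]:
            best = (total, threshold)
    return best
```

Applied recursively to each resulting region, and stopping wherever a region would fall below min_size, this reproduces the walkthrough above: the threshold with the smallest RSS is chosen at each step, and only regions with at least 6 points are split further.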