Random Forest Algorithm Overview and Tuning

Random Forest is a non-parametric algorithm that averages the predictions of many de-correlated decision trees to improve performance over a single tree. It can be used for both classification and regression problems. Random Forest introduces randomness during tree construction by selecting a random subset of features at each split, which provides a more diverse set of trees that reduces variance compared to bagging. Key hyperparameters include the number of trees, number of randomly selected features at each split (mtry), minimum node size, sampling scheme, and whether to use early stopping criteria. Random Forest performance can be tuned by optimizing these hyperparameters, such as through random grid search as demonstrated using the h2o package.


Random Forest

Algorithm class: Non-parametric

Mechanism: Average the predictions of many de-correlated trees

Applicable to: Both classification and regression problems

Random Forest is a generalization of bagging, and typically achieves much better performance
• Essentially, it provides an improvement over bagging via a small tweak
• This tweak reduces the variance when we average the trees
Idea: Split variable randomization

• Follow a similar bagging process, but …
• Each time a split is to be performed, only a random subset of m features is considered as split candidates
  - regression trees: m = p/3
  - classification trees: m = √p
  - m is commonly referred to as mtry

[Figure: Trees produced by bagging vs. trees produced by RF]
Random Forest

Essentially
• Bagging introduces randomness into the rows of the data
• Random forest introduces randomness into the columns (a random subset of features at each split)
• This provides a more diverse set of trees that almost always lowers the prediction error
Out-of-bag (OOB) Performance
• For large enough N, on average ~63% of the original records end up in any bootstrap sample
• i.e. ~37% of the observations are not used in the construction of a particular tree
• These observations are considered OOB and can be used for efficient assessment of model performance (unstructured, but free, cross-validation)
• RF typically has the least variability in prediction accuracy when tuning
• Let's now look at how to implement RF
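The ~63%/37% split can be checked with a quick simulation (a sketch, not from the slides): each bootstrap sample draws N records with replacement, so the chance a given record is excluded is (1 - 1/N)^N → e⁻¹ ≈ 0.368.

```python
# Simulate bootstrap sampling to verify the in-bag / OOB fractions.
import random

random.seed(42)
N = 10_000        # number of records
trials = 50       # number of bootstrap samples to average over

in_bag_fracs = []
for _ in range(trials):
    sample = [random.randrange(N) for _ in range(N)]   # bootstrap: N draws with replacement
    in_bag_fracs.append(len(set(sample)) / N)          # fraction of distinct records drawn

avg_in_bag = sum(in_bag_fracs) / trials
print(f"avg in-bag fraction: {avg_in_bag:.3f}")    # ≈ 0.632
print(f"avg OOB fraction:    {1 - avg_in_bag:.3f}")  # ≈ 0.368
```

The ~37% of records a given tree never sees act as a held-out set for that tree, which is what makes OOB error a "free" performance estimate.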


Implementation of Random Forest

• Simple way: ranger, full grid search
• More advanced: h2o, random grid search & early stopping rules

Ames Housing Example (RF), with ranger package
Direct implementation of RF, no tuning

For a regression tree

Baseline RF model, RMSE ≈ 25,500

Next, we will look at how to tune the hyperparameters to improve the model


Random Forest
Tuning Hyperparameters
Random forests provide good "out-of-the-box" performance, but there are a few hyperparameters we can tune to increase performance:

• # Trees and mtry: typically have the largest impact on predictive accuracy
• Min node size / max depth (tree complexity): some impact on predictive accuracy, but tuning it can increase computational efficiency
• Sampling scheme
Random Forest
Tuning Hyperparameters: # Trees

• Needs to be sufficiently large to stabilize the error rate
• Rule of thumb: start with 10p trees (p = # features in dataset) and adjust as necessary
• More trees provide more robust and stable error estimates and variable importance measures
• Computation time increases linearly with the number of trees
Random Forest
Tuning Hyperparameters: mtry (# split variables)

• Balances low tree correlation against reasonable predictive strength
• Rule-of-thumb defaults:
  - regression: m = p/3
  - classification: m = √p
• Start with 5 values evenly spaced from 2 to p, including the default rule-of-thumb value
• Few relevant predictors: should we ↑ or ↓ mtry?
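The mtry rules of thumb above can be sketched as code (a Python sketch, not from the slides; `mtry_default` and `mtry_grid` are illustrative names):

```python
# Rule-of-thumb mtry defaults and a 5-value candidate grid from 2 to p.
import math

def mtry_default(p: int, task: str) -> int:
    """floor(p/3) for regression, floor(sqrt(p)) for classification."""
    m = p / 3 if task == "regression" else math.sqrt(p)
    return max(1, math.floor(m))

def mtry_grid(p: int, task: str, k: int = 5) -> list[int]:
    """k values evenly spaced from 2 to p, plus the rule-of-thumb default."""
    step = (p - 2) / (k - 1)
    vals = {round(2 + i * step) for i in range(k)}
    vals.add(mtry_default(p, task))
    return sorted(vals)

p = 80  # the processed Ames data has ~80 features
print(mtry_default(p, "regression"))   # 26, matching the slides' default
print(mtry_grid(p, "regression"))
```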

Random Forest
Tuning Hyperparameters: Min node size / Max depth (Tree Complexity)

• Controls the complexity of individual trees
• Rule of thumb:
  - regression default: 5
  - classification default: 1
  - start with 3 values (1, 5, 10)
• A study (Segal, 2004) has shown that adjusting node size can improve performance when the data contains many noisy predictors
• Few relevant predictors: node size ↑
• Very large data sets: node size ↑

[Figure: impact of node size on error growth & run-time reduction]

• If run time is a concern, you can reduce it substantially by increasing node size
Random Forest
Tuning Hyperparameters: Sampling scheme

1. Sample size (default: 100%)
2. Sample with replacement / without replacement (default: with replacement)

Rationale:
• Decreasing the sample size decreases between-tree correlation (more diverse trees)
• Sampling without replacement produces trees that are less biased
  - Ensures observations with low-frequency categories are more likely to be selected
  - Especially important when the data has categories that are imbalanced

Rule of thumb:
• Try 3-4 values of sample size ranging from 25-100%
• Try both sampling with and without replacement
Ames Housing Example (RF), with ranger package (cont'd)
Tuning Strategy Illustration

Hyperparameters in the grid: mtry, min node size, sample scheme

Note: [Link] returns a dataframe with columns mtry, [Link], replace, [Link], rmse (values to be filled)
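The slides build this grid in R; as a hypothetical Python analogue (illustrative names and values), the full Cartesian grid is simply the product of the candidate lists, with an rmse column left empty to be filled during tuning:

```python
# Build a full Cartesian hyperparameter grid with an empty rmse column.
import itertools

mtry            = [2, 22, 26, 41, 60, 80]  # 5 spaced values + the default
min_node_size   = [1, 5, 10]
replace         = [True, False]
sample_fraction = [0.5, 0.63, 0.8]

hyper_grid = [
    {"mtry": m, "min_node_size": n, "replace": r,
     "sample_fraction": s, "rmse": None}
    for m, n, r, s in itertools.product(
        mtry, min_node_size, replace, sample_fraction)
]
print(len(hyper_grid))  # 6 * 3 * 2 * 3 = 108 combinations
```

The size of this product is why a full Cartesian search gets expensive quickly, which motivates the random grid search shown later with h2o.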
Ames Housing Example (RF), with ranger package (cont'd)
Tuning Strategy Illustration

Hyperparameters in the loop: # trees, mtry, node size, sample scheme

Fills rmse in hyper_grid (created by [Link])
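The slides fill hyper_grid's rmse column with a ranger loop in R; here is a hedged Python/scikit-learn sketch of the same step on synthetic data (max_features plays the role of mtry, min_samples_leaf of min node size, max_samples of sample fraction):

```python
# Fit one RF per grid row and record its OOB RMSE, mimicking the ranger loop.
import math
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=300, n_features=10, noise=10, random_state=0)

hyper_grid = [
    {"mtry": 3, "min_node": 5, "sample_frac": 0.8,  "rmse": None},
    {"mtry": 6, "min_node": 1, "sample_frac": 0.63, "rmse": None},
]

for row in hyper_grid:
    rf = RandomForestRegressor(
        n_estimators=100,                # ~10p rule of thumb with p = 10
        max_features=row["mtry"],        # mtry analogue
        min_samples_leaf=row["min_node"],# min node size analogue
        max_samples=row["sample_frac"],  # bootstrap sample fraction
        oob_score=True,                  # free OOB performance estimate
        random_state=0,
    ).fit(X, y)
    oob_mse = ((y - rf.oob_prediction_) ** 2).mean()
    row["rmse"] = math.sqrt(oob_mse)

best = min(hyper_grid, key=lambda r: r["rmse"])
print(best)
```

Using the OOB predictions here mirrors the "free cross-validation" point from the OOB slide: no separate validation split is needed to rank the grid rows.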
Ames Housing Example (RF), with ranger package (cont'd)
Tuning Strategy Illustration: % improvement of RMSE w.r.t. baseline model

RMSE slightly improved over the baseline model

Observations
1. The default mtry = 26 (# features / 3) is nearly sufficient
2. Smaller node sizes perform better (deeper trees)
3. Sample sizes < 100% and sampling without replacement consistently perform better
   • Probably because the data has many high-cardinality & imbalanced categorical features
Ames Housing Example (RF), with h2o package

Benefits of the h2o package:
• Random grid search
  - A full Cartesian hyperparameter search can be computationally expensive
  - Instead, randomly jump from one hyperparameter combination to another
• Can specify early stopping rules
  - E.g. stop when # models trained ≥ a threshold, or after a certain runtime elapses
Ames Housing Example (RF), with h2o package
Baseline h2o RF
• Syntax and results are very similar to the baseline ranger RF
Ames Housing Example (RF), with h2o package
h2o RF with Random Grid Search + Early Stopping Rule (Optional)

Recall that in ranger, we build the hyperparameter grid as a dataframe
Ames Housing Example (RF), with h2o package
h2o RF with Random Grid Search + Early Stopping Rule (Optional)

In h2o, we specify the hyperparameter grid as a list

Random grid-search strategy: "RandomDiscrete"
• Randomly jump from one hyperparameter combination to another

Early stopping criteria for the grid search
• Stop if the last 10 RF models do NOT improve RMSE by 0.1%
• Stop if run time > 5 mins
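The "RandomDiscrete" strategy plus the two grid-level stopping rules can be sketched in plain Python (an illustration of the logic only, not h2o's implementation; `random_discrete_search` and the toy score function are hypothetical):

```python
# Random-discrete search: visit combinations in random order and stop once the
# last `stopping_rounds` models fail to improve the best RMSE by the relative
# `stopping_tolerance`, or once the time budget is exhausted.
import itertools
import random
import time

def random_discrete_search(grid, score_fn, stopping_rounds=10,
                           stopping_tolerance=0.001, max_runtime_secs=300):
    combos = [dict(zip(grid, vals)) for vals in itertools.product(*grid.values())]
    random.shuffle(combos)                       # random, not Cartesian, order
    best_rmse, best_combo = float("inf"), None
    since_improve, start = 0, time.time()
    for combo in combos:
        rmse = score_fn(combo)
        if rmse < best_rmse * (1 - stopping_tolerance):  # >= 0.1% better?
            best_rmse, best_combo, since_improve = rmse, combo, 0
        else:
            since_improve += 1
        if since_improve >= stopping_rounds:             # 10 stale models
            break
        if time.time() - start > max_runtime_secs:       # runtime budget
            break
    return best_combo, best_rmse

# Toy score function standing in for "train an RF, return its CV RMSE".
random.seed(1)
grid = {"mtry": [2, 5, 8], "min_node": [1, 5, 10], "sample_frac": [0.5, 0.8]}
score = lambda c: 25_000 - 100 * c["mtry"] + random.uniform(0, 500)
best, rmse = random_discrete_search(grid, score)
print(best, round(rmse))
```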
Ames Housing Example (RF), with h2o package
h2o RF with Random Grid Search + Early Stopping Rule (Optional)

Early stopping criteria for building one RF


• Stop if the last 10 trees added do NOT improve RMSE by 0.5%
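This per-model rule (stop adding trees once RMSE stops improving) can be mimicked outside h2o too; here is a hedged scikit-learn sketch on assumed synthetic data, growing one forest batch-by-batch with warm_start and watching the OOB RMSE:

```python
# Grow a forest 25 trees at a time; stop when OOB RMSE no longer improves
# by at least 0.5% (the slide's tolerance for adding trees).
import math
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=400, n_features=10, noise=15, random_state=0)

rf = RandomForestRegressor(n_estimators=25, warm_start=True, oob_score=True,
                           random_state=0)
best_rmse, n_trees = float("inf"), 0
for n in range(25, 501, 25):
    rf.set_params(n_estimators=n)
    rf.fit(X, y)                      # warm_start: only the new trees are fit
    oob_rmse = math.sqrt(((y - rf.oob_prediction_) ** 2).mean())
    if oob_rmse < best_rmse * (1 - 0.005):   # improved by >= 0.5%?
        best_rmse, n_trees = oob_rmse, n
    else:
        break                         # stop adding trees
print(n_trees, round(best_rmse, 1))
```

This is consistent with the earlier # Trees slide: more trees only help until the error stabilizes, after which they just cost computation.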
Ames Housing Example (RF), with h2o package
h2o RF with Random Grid Search + Early Stopping Rule (Optional)

Note: with early stopping, results may NOT be reproducible (the number of models searched will differ across laptops of different speeds)

Assessed 66 models; best CV RMSE = 24,670

This is near-optimal, and the random grid search is more efficient
Feature Interpretation
For RF: 2 approaches to variable importance
At this point, you do not need to know the details; just know there are 2 measures:

Impurity (same as CART)
• Based on the average total reduction in MSE

Permutation (applicable to all ML models; we will discuss it in more detail later)
• Permute a feature to random values and see how this affects MSE
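Both measures are available outside ranger as well; as a hedged scikit-learn sketch on synthetic data (not the slides' Ames example), the two importance vectors usually agree on the top features:

```python
# Compare impurity-based and permutation-based variable importance.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

X, y = make_regression(n_samples=300, n_features=6, n_informative=2,
                       random_state=0)
rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

impurity = rf.feature_importances_                     # impurity (MSE reduction)
perm = permutation_importance(rf, X, y, n_repeats=5,   # permutation-based
                              random_state=0).importances_mean

print("impurity top feature:   ", int(np.argmax(impurity)))
print("permutation top feature:", int(np.argmax(perm)))
```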
Feature Interpretation
E.g. using ranger

Feature Interpretation
Typically, similar variables appear at the top under the two approaches
• Can conclude the top 3 important variables: Overall_Qual, Gr_Liv_Area, Neighborhood
Summary

CART
  Hyperparameters: tree depth, node size, cp
  Unique features: simple to interpret
  RMSE: -
  Packages demonstrated: rpart; caret (method = "rpart")

Random Forest
  Hyperparameters: # trees (~10p); mtry (# split vars, p/3 or √p); node size (tree complexity); sampling scheme (sample size, with/without replacement)
  Unique features: subsample rows/cols; early stopping (in adding trees)
  RMSE: ~24,000
  Packages demonstrated: ranger; h2o (algorithm = "randomForest")

End
