Support Vector, Decision Tree and Random Forest Regression
Support Vector Regression (SVR) is a type of machine learning algorithm used for regression analysis. The goal of SVR is
to find a function that approximates the relationship between the input variables and a continuous target variable, while
minimizing the prediction error.
SVR seeks to find a hyperplane that best fits the data points in a continuous space. This is achieved by mapping the input
variables to a high-dimensional feature space and finding the hyperplane that maximizes the margin (distance) between the
hyperplane and the closest data points, while also minimizing the prediction error.
SVR can handle non-linear relationships between the input variables and the target variable by using a kernel function to
map the data to a higher-dimensional space.
•Kernel: A kernel helps us find a hyperplane in a higher-dimensional space without increasing the computational cost.
Usually, the computational cost increases as the dimension of the data increases. This increase in dimension is needed
when we cannot find a separating hyperplane in the given dimension and have to move to a higher one.
•Hyperplane: In SVM classification, this is the separating line between two data classes. In Support Vector Regression,
it is the line used to predict the continuous output.
•Decision Boundary: A decision boundary can be thought of as a demarcation line (for simplification) on one side of
which lie the positive examples and on the other side the negative examples. Examples that fall exactly on this line may
be classified as either positive or negative.
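To make these ideas concrete, here is a minimal sketch of fitting an SVR model with scikit-learn; the RBF kernel, C, and epsilon values below are illustrative choices, not recommendations.

import numpy as np
from sklearn.svm import SVR

# Toy 1-D regression data: a sine curve with a little noise.
rng = np.random.RandomState(0)
X = np.sort(5 * rng.rand(80, 1), axis=0)
y = np.sin(X).ravel() + 0.1 * rng.randn(80)

# The RBF kernel implicitly maps the inputs to a higher-dimensional space;
# epsilon sets the half-width of the tube around the fitted hyperplane.
model = SVR(kernel="rbf", C=10.0, epsilon=0.1)
model.fit(X, y)

print(model.predict([[2.5]]))  # prediction for a new input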
Support Vector Regression
Consider these two red lines as the decision boundary and the green line as the hyperplane. Our objective in SVR is to
consider only the points that lie within the decision boundary lines. The best-fit line is the hyperplane that contains the
maximum number of points within this boundary.
The first thing to understand is the decision boundary (the red lines mentioned above). Consider these lines as being at
some distance, say 'a', from the hyperplane: they are the lines drawn at distance '+a' and '-a' from it. This 'a' is
referred to as epsilon.
Assuming that the equation of the hyperplane is:
y = wx + b (equation of the hyperplane)
then the equations of the decision boundaries become:
wx + b = +a
wx + b = -a
Thus, any hyperplane that satisfies our SVR should satisfy:
-a < y - (wx + b) < +a
Our main aim here is to choose a decision boundary at distance 'a' from the original hyperplane such that the data points
closest to the hyperplane, i.e. the support vectors, lie within that boundary.
Hence, we take only those points that are within the decision boundary and have the least error, i.e. lie within the
margin of tolerance. This gives us a better-fitting model.
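As a hedged illustration of the condition -a < y - (wx + b) < +a, the short sketch below checks which points of a toy dataset fall inside the epsilon tube of an assumed linear hyperplane; the weights w, b and the value of a are made up for the example.

import numpy as np

# Assumed (hypothetical) hyperplane y = wx + b and tube half-width a (epsilon).
w, b, a = 2.0, 1.0, 0.5

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 3.4, 4.9, 7.8])

residual = y - (w * x + b)          # y - (wx + b)
inside_tube = np.abs(residual) < a  # points satisfying -a < y - (wx + b) < +a

print(residual)     # [ 0.1  0.4 -0.1  0.8]
print(inside_tube)  # [ True  True  True False]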
DECISION TREE Regression
•Root Node: It is the topmost node in the tree, which represents the complete dataset. It is the starting point of the decision-
making process.
•Leaf/Terminal Node: A node without any child nodes that indicates a class label or a numerical value.
•Splitting: The process of splitting a node into two or more sub-nodes using a split criterion and a selected feature.
•Branch/Sub-Tree: A subsection of the decision tree that starts at an internal node and ends at the leaf nodes.
•Parent Node: The node that divides into one or more child nodes.
•Child Node: The nodes that emerge when a parent node is split.
•Pruning: The process of removing branches from the tree that do not provide any additional information or lead to
overfitting.
DECISION TREE Regression
Decision tree regression works by partitioning the feature space into regions and predicting the target variable as the average (or median) value of the
training samples in each region.
Building a Decision Tree
1.Root Node Selection: The feature that provides the best split (according to the chosen criterion) is selected as the root node.
2.Splitting: The dataset is split into subsets based on the value of the selected feature.
3.Recursive Splitting: The splitting process is repeated recursively for each subset until a stopping criterion is met. This criterion could be a maximum tree
depth, minimum samples per leaf, or other hyperparameters.
4.Leaf Node Prediction: When a stopping criterion is reached, the average (or median) value of the target variable in each leaf node is used as the prediction
for new instances falling into that leaf.
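A minimal scikit-learn sketch of these four steps is given below; the max_depth and min_samples_leaf values are example stopping criteria, not recommendations.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy data with two input features and a continuous target.
rng = np.random.RandomState(0)
X = rng.rand(200, 2)
y = 3 * X[:, 0] + np.sin(6 * X[:, 1])

# Stopping criteria (step 3): maximum depth and minimum samples per leaf.
tree = DecisionTreeRegressor(max_depth=4, min_samples_leaf=10)
tree.fit(X, y)

# Each leaf predicts the mean target value of the training samples that fall into it (step 4).
print(tree.predict([[0.5, 0.5]]))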
VARIANCE
In decision tree regression, variance refers to the variability or spread of the target variable within each node of the tree. It can be used as a measure of
impurity when deciding how to split the data at each node. The most common measure of the variance within a node is the
mean squared error (MSE) of the samples around the node's mean.
DECISION TREE Regression
Variance Reduction
Variance reduction measures how much the variance of the target variable decreases as a result of splitting the data on a
particular feature at a particular node in the tree. It is computed by comparing the variance of the target variable before and
after the split: variance reduction = Var(parent) - Σ (n_child / n_parent) · Var(child), summed over the child nodes.
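As a small worked sketch with made-up numbers, the snippet below computes the variance of a parent node, the weighted variance of the two child nodes produced by a candidate split, and the resulting variance reduction.

import numpy as np

# Hypothetical target values reaching a node, split on some feature threshold.
parent = np.array([2.0, 3.0, 3.5, 8.0, 9.0, 10.0])
left   = np.array([2.0, 3.0, 3.5])    # samples with feature <= threshold
right  = np.array([8.0, 9.0, 10.0])   # samples with feature > threshold

var_parent = parent.var()
weighted_child_var = (len(left) / len(parent)) * left.var() \
                   + (len(right) / len(parent)) * right.var()

variance_reduction = var_parent - weighted_child_var
print(var_parent, weighted_child_var, variance_reduction)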
Random Forest Regression
A Random Forest is like a group decision-making team in machine learning. It combines the opinions of many “trees”
(individual models) to make better predictions, creating a more robust and accurate overall model. In other words, it
builds multiple decision trees during training. Each decision tree is constructed from a random subset of features
and a random subset of the training data points. This randomness helps to ensure that the individual trees are diverse and
not overly correlated with each other. Three common ways of combining models are listed below; Random Forests use the
first of them, bagging.
1.Bagging (Bootstrap Aggregating): This method involves
training multiple models on random subsets of the training
data. The predictions from the individual models are then
combined, typically by averaging.
2.Boosting: This method involves training a sequence of
models, where each subsequent model focuses on the errors
made by the previous model. The predictions are combined
using a weighted voting scheme.
3.Stacking: This method involves using the predictions from
one set of models as input features for another model. The
final prediction is made by the second-level model.
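A minimal scikit-learn sketch of random forest regression (bagging applied to decision trees) is shown below; the n_estimators and max_features settings are illustrative, not recommendations.

import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Toy data with three input features and a noisy continuous target.
rng = np.random.RandomState(0)
X = rng.rand(300, 3)
y = 2 * X[:, 0] - X[:, 1] + 0.1 * rng.randn(300)

# Each tree is trained on a bootstrap sample of the rows and considers a
# random subset of features at each split; the trees' predictions are averaged.
forest = RandomForestRegressor(n_estimators=100, max_features="sqrt", random_state=0)
forest.fit(X, y)

print(forest.predict([[0.5, 0.5, 0.5]]))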