UNIT 3 Regression
● Simple Regression
o Simple linear regression should be used when there is only a single independent variable.
● Multiple Regression
o Used to predict a continuous dependent variable based on
multiple independent variables.
o Multiple linear regression should be used when there are multiple independent variables (a quick sketch contrasting simple and multiple regression appears after this list).
● Nonlinear Regression
o The relationship between the dependent variable and the independent variable(s) follows a nonlinear pattern.
o Provides flexibility in modeling a wide range of functional forms.
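The practical difference between simple and multiple linear regression is simply the number of feature columns. A minimal sketch, assuming scikit-learn and NumPy are available (the data values are invented for illustration):

import numpy as np
from sklearn.linear_model import LinearRegression

# Simple linear regression: a single independent variable (one feature column).
X_simple = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.1, 4.2, 5.9, 8.1])  # continuous target
simple_model = LinearRegression().fit(X_simple, y)

# Multiple linear regression: several independent variables (several columns).
X_multi = np.array([[1.0, 5.0], [2.0, 3.0], [3.0, 8.0], [4.0, 1.0]])
multi_model = LinearRegression().fit(X_multi, y)

print(simple_model.coef_, simple_model.intercept_)  # one slope
print(multi_model.coef_, multi_model.intercept_)    # one slope per feature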
Regression Algorithms
There are many different types of regression algorithms, but some of the
most common include:
● Linear Regression
o Linear regression is one of the simplest and most widely used statistical models. It assumes a linear relationship between the independent and dependent variables, meaning the change in the dependent variable is proportional to the change in the independent variables (see the linear regression sketch after this list).
● Polynomial Regression
o Polynomial regression is used to model nonlinear relationships between the dependent variable and the independent variables. It adds polynomial terms to the linear regression model to capture more complex relationships (see the polynomial regression sketch after this list).
● Support Vector Regression (SVR)
o Support vector regression (SVR) is a type of regression
algorithm that is based on the support vector machine (SVM)
algorithm. SVM is a type of algorithm that is used for
classification tasks, but it can also be used for regression
tasks. SVR works by finding a hyperplane that minimizes the
sum of the squared residuals between the predicted and actual
values.
● Decision Tree Regression
o Decision tree regression is a type of regression algorithm that
builds a decision tree to predict the target value. A decision
tree is a tree-like structure that consists of nodes and
branches. Each node represents a decision, and each branch
represents the outcome of that decision. The goal of
decision tree regression is to build a tree that can accurately
predict the target value for new data points.
● Random Forest Regression
o Random forest regression is an ensemble method that
combines multiple decision trees to predict the target value.
Ensemble methods are a type of machine learning algorithm
that combines multiple models to improve the performance of
the overall model. Random forest regression works by
building a large number of decision trees, each of which is
trained on a different subset of the training data. The final
prediction is made by averaging the predictions of all of the
trees.
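Linear regression: a minimal sketch, assuming NumPy is available, that fits y ≈ intercept + slope·x by ordinary least squares on invented synthetic data:

import numpy as np

# Synthetic data roughly following y = 3 + 2x plus noise (invented for illustration).
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 3.0 + 2.0 * x + rng.normal(scale=1.0, size=x.shape)

# Ordinary least squares: a column of ones lets the intercept be estimated too.
A = np.column_stack([np.ones_like(x), x])
coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
intercept, slope = coeffs

# The fitted slope is how much y changes per unit change in x.
print(f"y ~ {intercept:.2f} + {slope:.2f} * x")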
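Polynomial regression: a hedged sketch, assuming scikit-learn is available; polynomial terms are generated from the original feature and a plain linear regression is fitted on them.

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Synthetic data with a quadratic trend (invented for illustration).
rng = np.random.default_rng(1)
X = np.linspace(-3, 3, 40).reshape(-1, 1)
y = 0.5 * X.ravel() ** 2 - X.ravel() + rng.normal(scale=0.3, size=40)

# degree=2 adds x and x^2 terms; the final estimator is still linear in those terms.
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X, y)
print(model.predict([[1.5]]))  # prediction at x = 1.5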
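Support vector regression: a minimal sketch, assuming scikit-learn is available; epsilon sets the width of the error-insensitive margin and C controls the penalty for points falling outside it.

import numpy as np
from sklearn.svm import SVR

# Small synthetic dataset (invented for illustration).
rng = np.random.default_rng(2)
X = np.sort(rng.uniform(0, 5, size=(60, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=60)

# An RBF kernel lets SVR fit a nonlinear function; errors within epsilon are not penalized.
svr = SVR(kernel="rbf", C=10.0, epsilon=0.1)
svr.fit(X, y)
print(svr.predict([[2.5]]))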
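Decision tree regression: a brief sketch, assuming scikit-learn is available; max_depth limits how many decisions the tree can chain together.

from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data from scikit-learn's helper (values are arbitrary).
X, y = make_regression(n_samples=200, n_features=4, noise=10.0, random_state=0)

# A shallow tree: each leaf predicts the mean target of the training samples that reach it.
tree = DecisionTreeRegressor(max_depth=3, random_state=0)
tree.fit(X, y)
print(tree.predict(X[:3]))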
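Random forest regression: a sketch under the same assumptions (scikit-learn available, synthetic data); each tree is trained on a bootstrap sample and the predictions are averaged.

from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic data, partly held out so the forest can be checked on unseen points.
X, y = make_regression(n_samples=500, n_features=8, noise=15.0, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# 100 trees, each trained on a bootstrap sample of the training data; predictions are averaged.
forest = RandomForestRegressor(n_estimators=100, random_state=1)
forest.fit(X_train, y_train)
print(forest.score(X_test, y_test))  # R^2 on the held-out data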
Regularized Linear Regression Techniques
● Ridge Regression
o Ridge regression is a type of linear regression used to prevent overfitting. Overfitting occurs when the model learns the training data too well and is unable to generalize to new data. Ridge regression counters this by adding a penalty term to the loss function, the sum of the squared weights (an L2 penalty), which shrinks the coefficients toward zero (see the ridge sketch after this list).
● Lasso Regression
o Lasso regression is another type of linear regression used to prevent overfitting. It adds a penalty term to the loss function, the sum of the absolute values of the weights (an L1 penalty), which shrinks some weights all the way to zero and therefore also performs feature selection (see the lasso sketch after this list).
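Ridge regression: a minimal sketch, assuming scikit-learn is available; alpha scales the L2 penalty, and larger values shrink the coefficients more strongly.

from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

# Synthetic data where only a few of the 20 features are truly informative.
X, y = make_regression(n_samples=100, n_features=20, n_informative=5,
                       noise=10.0, random_state=2)

ridge = Ridge(alpha=1.0)   # alpha controls the strength of the L2 penalty
ridge.fit(X, y)
print(ridge.coef_[:5])     # shrunken, but generally non-zero, coefficients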
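Lasso regression: the same setup with an L1 penalty; note how many coefficients end up exactly zero, which is the feature-selection effect described above.

from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Same kind of synthetic data; only a few features actually matter.
X, y = make_regression(n_samples=100, n_features=20, n_informative=5,
                       noise=10.0, random_state=3)

lasso = Lasso(alpha=1.0)   # alpha controls the strength of the L1 penalty
lasso.fit(X, y)
print("non-zero coefficients:", int((lasso.coef_ != 0).sum()))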
Characteristics of Regression
Here are the main characteristics of regression:
● Continuous Target Variable: Regression deals with predicting
continuous target variables that represent numerical values.
Examples include predicting house prices, forecasting sales figures, or
estimating patient recovery times.
● Error Measurement: Regression models are evaluated on how well they minimize the error between the predicted and actual values of the target variable. Common error metrics include mean absolute error (MAE), mean squared error (MSE), and root mean squared error (RMSE); a short sketch of computing them appears after this list.
● Model Complexity: Regression models range from simple linear
models to more complex nonlinear models. The choice of model
complexity depends on the complexity of the relationship between the
input features and the target variable.
● Overfitting and Underfitting: Regression models are susceptible to overfitting (fitting noise in the training data and generalizing poorly) and underfitting (using a model too simple to capture the underlying relationship).
● Interpretability: The interpretability of regression models varies
depending on the algorithm used. Simple linear models are highly
interpretable, while more complex models may be more difficult to
interpret.
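As referenced under Error Measurement, a short sketch of computing MAE, MSE, and RMSE, assuming scikit-learn is available; the predicted and actual values here are invented for illustration.

import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_true = np.array([3.0, 5.0, 7.5, 10.0])  # actual target values (toy data)
y_pred = np.array([2.5, 5.5, 7.0, 11.0])  # model predictions (toy data)

mae = mean_absolute_error(y_true, y_pred)  # average absolute error
mse = mean_squared_error(y_true, y_pred)   # average squared error
rmse = np.sqrt(mse)                        # same units as the target
print(f"MAE={mae:.3f}  MSE={mse:.3f}  RMSE={rmse:.3f}")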
Examples
Predicting the age of a person is a regression task, since age is a continuous numerical value.
Advantages of Regression
● Robust to outliers (for some algorithms, such as tree-based methods)
● Can handle both linear and nonlinear relationships, depending on the algorithm
Disadvantages of Regression
● Assumes linearity (in the case of linear regression)
● Sensitive to multicollinearity among the independent variables
● May not be suitable for highly complex relationships