Data Science

Regression analysis is a statistical method used to model relationships between variables. It allows prediction of continuous target variables from inputs and understanding how the target changes with each input. There are different types including linear, logistic, polynomial, support vector, decision tree, and random forest regression. Each type is suited to different problem domains like trends analysis, forecasting, and classification.


Regression Analysis

Regression analysis is a statistical method for modeling the relationship between a
dependent (target) variable and one or more independent (predictor) variables.
More specifically, regression analysis helps us understand how the value of the
dependent variable changes with respect to one independent variable while the
other independent variables are held fixed. It predicts continuous/real values such as
temperature, age, salary, price, etc.

Some examples of regression are:

 Prediction of rainfall using temperature and other factors
 Determining market trends
 Prediction of road accidents due to rash driving

Terminologies Related to Regression Analysis:

Dependent Variable: The main factor in regression analysis that we want to
predict or understand is called the dependent variable. It is also called the target
variable.
Independent Variable: The factors that affect the dependent variable, or that are
used to predict its value, are called independent variables, also known as
predictors.
Outliers: An outlier is an observation with either a very low or very high value
compared with the other observed values. An outlier may distort the results, so it
should be handled carefully.
Multicollinearity: If the independent variables are highly correlated with each
other, the condition is called multicollinearity. It should not be present in the
dataset, because it creates problems when ranking the variables by how strongly
they affect the target.
Underfitting and Overfitting: If our model works well on the training dataset
but not on the test dataset, the problem is called overfitting. If the model does
not perform well even on the training dataset, the problem is called underfitting.

Below are some other reasons for using regression analysis:

 Regression estimates the relationship between the target and the independent
variables.
 It is used to find trends in data.
 It helps to predict real/continuous values.
 By performing regression, we can determine the most important factor, the
least important factor, and how each factor affects the target.

Types of Regression
There are various types of regression used in data science and machine
learning. Each type has its own importance in different scenarios, but at the core,
all regression methods analyze the effect of the independent variables on the
dependent variable.

Here we are discussing some important types of regression which are given below:
 Linear Regression
 Logistic Regression
 Polynomial Regression
 Support Vector Regression
 Decision Tree Regression
 Random Forest Regression
 Ridge Regression
 Lasso Regression
Linear Regression:
 Linear regression is a statistical regression method used for
predictive analysis.
 It is one of the simplest regression algorithms, and it models the relationship
between continuous variables.
 It is used for solving regression problems in machine learning.
Below is the mathematical equation for linear regression:
Y = aX + b
Here, Y = dependent variable (target variable),
X = independent variable (predictor variable),
a and b are the linear coefficients (slope and intercept).
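As a minimal sketch (using made-up toy data), the coefficients of Y = aX + b can be estimated with the ordinary least-squares formulas a = cov(X, Y) / var(X) and b = mean(Y) - a * mean(X):

```python
import numpy as np

# Toy data (assumed for illustration): Y grows roughly linearly with X.
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([30.0, 35.2, 39.9, 45.1, 50.0])

# Least-squares estimates for Y = aX + b:
# slope a = cov(X, Y) / var(X), intercept b = mean(Y) - a * mean(X)
a = np.cov(X, Y, bias=True)[0, 1] / np.var(X)
b = Y.mean() - a * X.mean()

print(round(a, 2), round(b, 2))
```

The same fit can be obtained with any regression library; the closed form above just makes the two coefficients explicit.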

Some popular applications of linear regression are:

o Analyzing trends and sales estimates
o Salary forecasting
o Real estate prediction
o Arriving at ETAs in traffic

Logistic Regression:
Logistic regression is another supervised learning algorithm, used to solve
classification problems. In classification problems, the dependent variable is in
a binary or discrete format, such as 0 or 1.
The logistic regression algorithm works with categorical variables such as 0 or 1, Yes
or No, True or False, Spam or Not Spam, etc.
There are three types of logistic regression:
 Binary (0/1, pass/fail)
 Multinomial (cats, dogs, lions)
 Ordinal (low, medium, high)
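As an illustrative sketch (toy data assumed), binary logistic regression can be fitted with plain gradient descent on the log loss, then thresholded at 0.5 to get class labels:

```python
import numpy as np

# Toy binary data (assumed): label is 1 when the feature exceeds ~2.5.
x = np.array([0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

w, b = 0.0, 0.0   # parameters of the model P(y=1) = sigmoid(w*x + b)
lr = 0.5
for _ in range(5000):
    p = 1.0 / (1.0 + np.exp(-(w * x + b)))   # predicted P(y=1)
    # Gradient of the log loss with respect to w and b:
    w -= lr * np.mean((p - y) * x)
    b -= lr * np.mean(p - y)

preds = (1.0 / (1.0 + np.exp(-(w * x + b))) > 0.5).astype(int)
print(preds.tolist())
```

In practice a library implementation (e.g. with regularization and a proper solver) would be used; the loop above only shows the mechanics.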

Polynomial Regression:
Polynomial regression is a type of regression that models a non-linear dataset
using a model that is still linear in its coefficients.
It is similar to multiple linear regression, but it fits a non-linear curve between the
values of x and the corresponding conditional values of y.
Suppose a dataset consists of datapoints arranged in a non-linear fashion; in such
a case, linear regression will not fit those datapoints well. To cover such
datapoints, we need polynomial regression.

Support Vector Regression:

Support Vector Machine (SVM) is a supervised learning algorithm that can be used for
regression as well as classification problems. When it is used for regression
problems, it is termed Support Vector Regression (SVR).
Kernel: A function used to map lower-dimensional data into a higher-dimensional
space.
Hyperplane: In SVM classification, it is the separating boundary between two classes;
in SVR, it is the line that predicts the continuous variable and covers most of
the datapoints.
Boundary lines: The two lines drawn on either side of the hyperplane, which
create a margin for the datapoints.
Support vectors: The datapoints nearest to the hyperplane that define the
margin.

Decision Tree Regression:

Decision Tree is a supervised learning algorithm that can be used for solving both
classification and regression problems.
It can handle both categorical and numerical data.
Decision tree regression builds a tree-like structure in which each internal node
represents a "test" on an attribute, each branch represents an outcome of the test, and
each leaf node represents the final decision or result.
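A depth-1 regression tree (a "stump") shows the core mechanic in a few lines: try every candidate split, and keep the one that minimizes the squared error of predicting each side's mean. This is a toy sketch with made-up data, not a full tree learner:

```python
import numpy as np

# Fit a one-split regression tree: each leaf predicts the mean of
# the targets that fall into it; the split minimizes total squared error.
def fit_stump(x, y):
    best = (np.inf, None, None, None)
    for t in np.unique(x)[:-1]:
        left, right = y[x <= t], y[x > t]
        sse = ((left - left.mean())**2).sum() + ((right - right.mean())**2).sum()
        if sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    return best[1:]   # (threshold, left-leaf value, right-leaf value)

x = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
y = np.array([5.0, 5.2, 4.8, 20.0, 19.8, 20.2])
thr, left_val, right_val = fit_stump(x, y)
print(thr, round(left_val, 2), round(right_val, 2))
```

A real decision tree applies this split search recursively to each side until a stopping criterion (depth, minimum samples) is reached.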

Ridge Regression:
Ridge regression is a robust version of linear regression in which a
small amount of bias is introduced so that we can get better long-term predictions.
The amount of bias added to the model is known as the ridge regression penalty.
This penalty term is computed by multiplying lambda by the squared weight of
each individual feature (an L2 penalty).
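As a sketch (synthetic data assumed), ridge regression has the closed form w = (XᵀX + λI)⁻¹Xᵀy, and increasing λ visibly shrinks the coefficients relative to ordinary least squares:

```python
import numpy as np

# Synthetic data: y depends linearly on three features, plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=50)

# Closed-form ridge solution: w = (X^T X + lambda * I)^-1 X^T y.
def ridge(X, y, lam):
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

w_ols = ridge(X, y, lam=0.0)    # lambda = 0 recovers ordinary least squares
w_reg = ridge(X, y, lam=10.0)   # the L2 penalty shrinks the weights
print(np.linalg.norm(w_reg) < np.linalg.norm(w_ols))
```

The shrinkage trades a little bias for lower variance, which is the "better long-term predictions" effect described above.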

Lasso Regression:
Lasso regression is another regularization technique used to reduce the complexity of the
model.
It is similar to ridge regression, except that the penalty term contains the
absolute values of the weights instead of their squares (an L1 penalty).
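One practical consequence of the L1 penalty is that it can set small coefficients exactly to zero, unlike ridge, which only shrinks them. This shows up in the soft-thresholding operator used by coordinate-descent lasso solvers; a minimal sketch with assumed example weights:

```python
import numpy as np

# Soft-thresholding: the per-coordinate update inside coordinate-descent
# lasso. Coefficients with magnitude below lam become exactly zero,
# so lasso also performs feature selection.
def soft_threshold(w, lam):
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)

w = np.array([2.5, -0.3, 0.05, -1.2])
print(soft_threshold(w, lam=0.5).tolist())
```

Here the two coefficients smaller than lam in magnitude are zeroed out, while the larger ones are shrunk toward zero by lam.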
