
Regression in machine learning

Regression in machine learning refers to a supervised learning technique where the goal is to predict a continuous numerical value based on one or more independent features. It finds relationships between variables so that predictions can be made. We have two types of variables present in regression:

 Dependent Variable (Target): The variable we are trying to predict, e.g., house price.

 Independent Variables (Features): The input variables that influence the prediction, e.g., locality, number of rooms.

A regression problem arises when the output variable is a real or continuous value, such as “salary” or “weight”. Many different regression models can be used, but the simplest among them is linear regression.

Types of Regression

Regression can be classified into different types based on the number of predictor variables and the nature of the relationship between variables:

1. Simple Linear Regression

Linear regression is one of the simplest and most widely used statistical models. It assumes that there is a linear relationship between the independent and dependent variables, meaning that the change in the dependent variable is proportional to the change in the independent variables. For example, predicting the price of a house based on its size.

2. Multiple Linear Regression

Multiple linear regression extends simple linear regression by using multiple independent variables to predict the target variable. For example, predicting the price of a house based on multiple features such as size, location, number of rooms, etc.

3. Polynomial Regression

Polynomial regression is used to model non-linear relationships between the dependent variable and the independent variables. It adds polynomial terms to the linear regression model to capture more complex relationships. For example, when we want to predict a non-linear trend like population growth over time, we use polynomial regression.

4. Ridge & Lasso Regression

Ridge and lasso regression are regularized versions of linear regression that help avoid overfitting by penalizing large coefficients. When there is a risk of overfitting due to too many features, we use these types of regression algorithms.

5. Support Vector Regression (SVR)

SVR is a regression algorithm based on the Support Vector Machine (SVM). SVM is primarily used for classification tasks, but it can also be used for regression. SVR works by finding a function (hyperplane) that fits the data within a specified error margin, keeping the deviations between the predicted and actual values as small as possible.

6. Decision Tree Regression

Decision tree regression uses a tree-like structure to make decisions, where each branch of the tree represents a decision and the leaves represent outcomes. For example, we use decision tree regression when predicting customer behavior based on features like age, income, etc.

7. Random Forest Regression

Random Forest is an ensemble method that builds multiple decision trees, where each tree is trained on a different subset of the training data. The final prediction is made by averaging the predictions of all of the trees. For example, customer churn or sales data can be modelled using this approach; a small sketch of several of these model types follows.
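As a quick illustration of several of the regression types above, the following is a minimal sketch assuming NumPy and scikit-learn are installed; the synthetic data, model settings and values are illustrative only.

import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR

rng = np.random.default_rng(42)
X = rng.uniform(5, 30, size=(200, 1))            # e.g., house size (illustrative units)
y = 12.0 * X.ravel() + rng.normal(0, 15, 200)    # price with noise (illustrative units)

models = {
    "Simple linear": LinearRegression(),
    "Polynomial (degree 2)": make_pipeline(PolynomialFeatures(degree=2), LinearRegression()),
    "Ridge": Ridge(alpha=1.0),
    "Lasso": Lasso(alpha=1.0, max_iter=10000),
    "SVR": SVR(kernel="rbf", C=100.0),
    "Decision tree": DecisionTreeRegressor(max_depth=4),
    "Random forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

for name, model in models.items():
    model.fit(X, y)                               # each model learns the feature -> target mapping
    print(f"{name}: predicted price for size 12 = {model.predict([[12.0]])[0]:.1f}")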

Regression Evaluation Metrics

Evaluation in machine learning measures the performance of a model. Here are some popular evaluation metrics for regression:

 Mean Absolute Error (MAE): The average absolute difference between the predicted and actual values of the target variable.

 Mean Squared Error (MSE): The average squared difference between the predicted and actual values of the target variable.

 Root Mean Squared Error (RMSE): The square root of the mean squared error.

 Huber Loss: A hybrid loss function that transitions from MAE to MSE for larger errors, providing a balance between robustness and MSE's sensitivity to outliers.

 R2-Score: Higher values indicate a better fit, typically ranging from 0 to 1.
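To make the metrics above concrete, here is a minimal sketch assuming NumPy and scikit-learn; the actual and predicted values are hand-picked for illustration, and the Huber delta of 1.0 is an arbitrary choice.

import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = np.array([3.0, 5.0, 7.5, 10.0])
y_pred = np.array([2.8, 5.4, 7.0, 9.5])

mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)                      # RMSE is the square root of MSE
r2 = r2_score(y_true, y_pred)

# Huber loss: quadratic for small errors, linear (MAE-like) beyond delta
delta = 1.0
err = np.abs(y_true - y_pred)
huber = np.mean(np.where(err <= delta, 0.5 * err ** 2, delta * (err - 0.5 * delta)))

print(f"MAE={mae:.3f}  MSE={mse:.3f}  RMSE={rmse:.3f}  R2={r2:.3f}  Huber={huber:.3f}")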

Applications of Regression

 Predicting prices: Used to predict the price of a house based on its size, location and other features.

 Forecasting trends: Used to forecast the sales of a product based on historical sales data.

 Identifying risk factors: Used to identify risk factors for heart disease based on patient medical data.

 Making decisions: Used to recommend which stock to buy based on market data.

Advantages of Regression

 Easy to understand and interpret.

 Robust to outliers.

 Can handle both linear and non-linear relationships.

Disadvantages of Regression

 Assumes linearity.

 Sensitive to situations where two or more independent variables are highly correlated with each other, i.e., multicollinearity.

 May not be suitable for highly complex relationships.

Classification vs Regression in Machine Learning

Classification and regression are two primary tasks in supervised machine learning, where the key difference lies in the nature of the output: classification deals with discrete outcomes (e.g., yes/no, categories), while regression handles continuous values (e.g., price, temperature).

Both approaches require labeled data for training but differ in their
objectives—classification aims to find decision boundaries that separate
classes, whereas regression focuses on finding the best-fitting line to
predict numerical outcomes. Understanding these distinctions helps in
selecting the right approach for specific machine learning tasks.

For example, classification can determine whether an email is spam or not, classify images as “cat” or “dog,” or predict weather conditions like “sunny,” “rainy,” or “cloudy” using a decision boundary, while regression models are used to predict house prices based on features like size and location, or to forecast stock prices over time with a straight fit line.

Decision Boundary vs Best-Fit Line

When teaching the difference between classification and regression in machine learning, a key concept to focus on is the decision boundary (used in classification) versus the best-fit line (used in regression). These are fundamental tools that help models make predictions, but they serve distinctly different purposes.

1. Decision Boundary in Classification

A decision boundary is a surface or line that separates data points into different classes in a feature space. It can be linear (a straight line) or non-linear (a curve), depending on the complexity of the data and the algorithm used. For example:

 A linear decision boundary might separate two classes in a 2D space with a straight line (e.g., logistic regression).

 A more complex model may create non-linear boundaries to better fit intricate datasets.

During training, the classifier learns to partition the feature space by finding a boundary that minimizes classification errors.

 For binary classification, this boundary separates data points into two groups (e.g., spam vs. non-spam emails).

 In multi-class classification, multiple boundaries are created to separate more than two classes.

The decision boundary is not inherent to the training data but rather depends on the classifier used; we will learn more about classifiers in the next chapter.

2. Best-Fit Line in Regression

In regression, a best-fit line (or regression line) represents the relationship between the independent variables (inputs) and a dependent variable (output). It is used to predict continuous numerical values, capturing trends and relationships within the data and allowing for accurate predictions of continuous variables. The best-fit line can be linear or non-linear:

 A straight line is used for linear regression.

 Curves are used for more complex regressions, like polynomial regression.

The plot demonstrates regression, where both linear and polynomial models are used to predict continuous target values based on the input feature, in contrast to classification, which would create decision boundaries to separate discrete classes.
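Since that plot is not reproduced in this text, a comparable figure can be generated with the sketch below, assuming NumPy, scikit-learn and Matplotlib; the synthetic quadratic trend and the degree-2 polynomial are illustrative choices.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = np.linspace(0, 10, 80).reshape(-1, 1)
y = 0.5 * X.ravel() ** 2 - X.ravel() + rng.normal(0, 2, 80)   # non-linear trend plus noise

linear = LinearRegression().fit(X, y)
poly = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(X, y)

x_plot = X.ravel()
plt.scatter(x_plot, y, s=10, label="data")
plt.plot(x_plot, linear.predict(X), label="linear fit")
plt.plot(x_plot, poly.predict(X), label="polynomial fit (degree 2)")
plt.xlabel("input feature")
plt.ylabel("continuous target")
plt.legend()
plt.show()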

Classification Algorithms

There are different types of classification algorithms that have been developed over time to give the best results for classification tasks. Don't worry if they seem overwhelming at first; we'll dive deeper into each algorithm, one by one, in the upcoming chapters.

 Logistic Regression

 Decision Tree

 Random Forest

 K – Nearest Neighbors

 Support Vector Machine

 Naive Bayes

Regression Algorithms

There are different types of regression algorithms that have been developed over time to give the best results for regression tasks.

 Lasso Regression

 Ridge Regression

 XGBoost Regressor

 LGBM Regressor

Comparison between Classification and Regression

Feature | Classification | Regression
Output type | Discrete categories (e.g., “spam” or “not spam”) | Continuous numerical value (e.g., price, temperature)
Goal | To predict which category a data point belongs to | To predict an exact numerical value based on input data
Example problems | Email spam detection, image recognition, customer sentiment analysis | House price prediction, stock market forecasting, sales prediction
Evaluation metrics | Precision, Recall, and F1-Score | Mean Squared Error, R2-Score, MAPE and RMSE
Decision boundary | Clearly defined boundaries between different classes | No distinct boundaries; focuses on finding the best-fit line
Common algorithms | Logistic Regression, Decision Trees, Support Vector Machines (SVM) | Linear Regression, Polynomial Regression, Decision Trees (with regression objective)

Classification vs Regression : Conclusion

Classification trees are employed when there's a need to categorize the dataset into distinct classes associated with the response variable. Often, these classes are binary, such as “Yes” or “No,” and they are mutually exclusive. While there are instances where there may be more than two classes, a modified version of the classification tree algorithm is used in those scenarios.

On the other hand, regression trees are utilized when dealing with
continuous response variables. For instance, if the response variable
represents continuous values like the price of an object or the
temperature for the day, a regression tree is the appropriate choice.

There are situations where a blend of regression and classification approaches is necessary. For instance, ordinal regression comes into play when dealing with ranked or ordered categories, while multi-label classification is suitable for cases where data points can be associated with multiple classes at the same time.

Linear Regression in Machine learning

Linear regression is a statistical method used to model the relationship between a dependent variable and one or more independent variables. It provides valuable insights for prediction and data analysis. This article will explore its types, assumptions, implementation, advantages, and evaluation metrics.

Understanding Linear Regression

Linear regression is also a type of supervised machine-learning algorithm that learns from labelled datasets and maps the data points to the most optimized linear functions, which can be used for prediction on new datasets. It computes the linear relationship between the dependent variable and one or more independent features by fitting a linear equation to the observed data. It predicts continuous output variables based on the independent input variables.

For example, if we want to predict house prices we consider various factors such as house age, distance from the main road, location, area and number of rooms; linear regression uses all these parameters to predict the house price, as it considers a linear relation between all these features and the price of the house.

Why Linear Regression is Important?

The interpretability of linear regression is one of its greatest strengths. The model's equation offers clear coefficients that illustrate the influence of each independent variable on the dependent variable, enhancing our understanding of the underlying relationships. Its simplicity is a significant advantage; linear regression is transparent, easy to implement, and serves as a foundational concept for more advanced algorithms.

Now that we have discussed why linear regression is important, we will discuss how it works, based on the best-fit line in regression.

What is the best Fit Line?

Our primary objective while using linear regression is to locate the best-fit
line, which implies that the error between the predicted and actual values
should be kept to a minimum. There will be the least error in the best-fit
line.

The best Fit Line equation provides a straight line that represents the
relationship between the dependent and independent variables. The slope
of the line indicates how much the dependent variable changes for a unit
change in the independent variable(s).

Here, Y is called the dependent or target variable and X is called the independent variable, also known as the predictor of Y. There are many types of functions or modules that can be used for regression. A linear function is the simplest type of function. Here, X may be a single feature or multiple features representing the problem.

Linear regression performs the task of predicting a dependent variable value (y) based on a given independent variable (x); hence the name linear regression. In this example, X (input) is the work experience and Y (output) is the salary of a person. The regression line is the best-fit line for our model.

In linear regression, some assumptions are made to ensure the reliability of the model's results.

Hypothesis function in Linear Regression

Assumptions are:

 Linearity: It assumes that there is a linear relationship between the independent and dependent variables. This means that changes in the independent variable lead to proportional changes in the dependent variable.

 Independence: The observations should be independent from each other; that is, the errors from one observation should not influence the others.

As we have discussed, our independent feature is the experience, i.e., X, and the respective salary Y is the dependent variable. Let's assume there is a linear relationship between X and Y; then the salary can be predicted using:

Ŷ = θ1 + θ2X

where θ1 is the intercept and θ2 is the coefficient (slope) of X.

Types of Linear Regression

When there is only one independent feature it is known as Simple Linear Regression or Univariate Linear Regression, and when there is more than one feature it is known as Multiple Linear Regression or Multivariate Regression.

1. Simple Linear Regression

Simple linear regression is the simplest form of linear regression and it involves only one independent variable and one dependent variable. The equation for simple linear regression is:

y = β0 + β1X
where:

 Y is the dependent variable

 X is the independent variable

 β0 is the intercept

 β1 is the slope
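As a small worked example of this equation, the sketch below estimates β0 and β1 with the standard least-squares formulas, assuming only NumPy; the data values are illustrative.

import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])            # e.g., house size (illustrative units)
y = np.array([150.0, 200.0, 240.0, 310.0, 350.0])  # e.g., price (illustrative units)

x_mean, y_mean = X.mean(), y.mean()
beta_1 = np.sum((X - x_mean) * (y - y_mean)) / np.sum((X - x_mean) ** 2)  # slope
beta_0 = y_mean - beta_1 * x_mean                                         # intercept

print(f"fitted line: y = {beta_0:.2f} + {beta_1:.2f} * X")
print("prediction for X = 6:", beta_0 + beta_1 * 6)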

Assumptions of Simple Linear Regression

Linear regression is a powerful tool for understanding and predicting the behavior of a variable; however, it needs to meet a few conditions in order to produce accurate and dependable results.

1. Linearity: The independent and dependent variables have a linear relationship with one another. This implies that changes in the dependent variable follow those in the independent variable(s) in a linear fashion. This means that there should be a straight line that can be drawn through the data points. If the relationship is not linear, then linear regression will not be an accurate model.

2. Independence: The observations in the dataset are independent of each other. This means that the value of the dependent variable for one observation does not depend on the value of the dependent variable for another observation. If the observations are not independent, then linear regression will not be an accurate model.
3. Homoscedasticity: Across all levels of the independent variable(s), the variance of the errors is constant. This indicates that the amount of the independent variable(s) has no impact on the variance of the errors. If the variance of the residuals is not constant, then linear regression will not be an accurate model.

4. Normality: The residuals should be normally distributed. This means that the residuals should follow a bell-shaped curve. If the residuals are not normally distributed, then linear regression will not be an accurate model.

Use Case of Simple Linear Regression

 In a case study evaluating student performance, analysts use simple linear regression to examine the relationship between study hours and exam scores. By collecting data on the number of hours students studied and their corresponding exam results, the analysts developed a model that reveals a correlation: for each additional hour spent studying, students' exam scores increased by an average of 5 points. This case highlights the utility of simple linear regression in understanding and improving academic performance.

 Another case study focuses on marketing and sales, where businesses use simple linear regression to forecast sales based on historical data, particularly examining how factors like advertising expenditure influence revenue. By collecting data on past advertising spending and corresponding sales figures, analysts develop a regression model that captures the relationship between these variables. For instance, the analysis may reveal that for every additional dollar spent on advertising, sales increase by $10. This predictive capability enables companies to optimize their advertising strategies and allocate resources effectively.

The goal of the algorithm is to find the best-fit line equation that can predict the values based on the independent variables.

In regression, a set of records is present with X and Y values, and these values are used to learn a function, so that if you want to predict Y from an unknown X this learned function can be used. In regression we have to find the value of Y, so a function is required that predicts a continuous Y, given X as independent features.

Assumptions of Multiple Linear Regression

For Multiple Linear Regression, all four of the assumptions from Simple Linear Regression apply. In addition, below are a few more:

1. No multicollinearity: There is no high correlation between the independent variables. This indicates that there is little or no correlation between the independent variables. Multicollinearity occurs when two or more independent variables are highly correlated with each other, which can make it difficult to determine the individual effect of each variable on the dependent variable. If there is multicollinearity, then multiple linear regression will not be an accurate model.

2. Additivity: The model assumes that the effect of changes in a predictor variable on the response variable is consistent regardless of the values of the other variables. This assumption implies that there is no interaction between variables in their effects on the dependent variable.

3. Feature Selection: In multiple linear regression, it is essential to carefully select the independent variables that will be included in the model. Including irrelevant or redundant variables may lead to overfitting and complicate the interpretation of the model.

4. Overfitting: Overfitting occurs when the model fits the training data too closely, capturing noise or random fluctuations that do not represent the true underlying relationship between variables. This can lead to poor generalization performance on new, unseen data.

Multiple linear regression sometimes faces issues like multicollinearity.

Multicollinearity

Multicollinearity is a statistical phenomenon where two or more independent variables in a multiple regression model are highly correlated, making it difficult to assess the individual effects of each variable on the dependent variable.

Detecting Multicollinearity includes two techniques:

 Correlation Matrix: Examining the correlation matrix among the independent variables is a common way to detect multicollinearity. High correlations (close to 1 or -1) indicate potential multicollinearity.

 VIF (Variance Inflation Factor): VIF is a measure that quantifies how much the variance of an estimated regression coefficient increases when the predictors are correlated. A high VIF (typically above 10) suggests multicollinearity.
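The sketch below illustrates both checks on deliberately correlated synthetic predictors, assuming pandas, NumPy and statsmodels are available; the column names and the correlation strength are illustrative.

import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools.tools import add_constant

rng = np.random.default_rng(1)
size = rng.normal(100, 20, 200)
rooms = size / 25 + rng.normal(0, 0.5, 200)       # deliberately correlated with size
age = rng.uniform(0, 50, 200)
X = pd.DataFrame({"size": size, "rooms": rooms, "age": age})

print(X.corr())                                    # correlations close to +/-1 are suspicious

X_const = add_constant(X)                          # VIF is computed per predictor column
for i, col in enumerate(X_const.columns):
    if col != "const":
        print(col, round(variance_inflation_factor(X_const.values, i), 2))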

Use Case of Multiple Linear Regression

Multiple linear regression allows us to analyze the relationship between multiple independent variables and a single dependent variable. Here are some use cases:

 Real Estate Pricing: In real estate, MLR is used to predict property prices based on multiple factors such as location, size, number of bedrooms, etc. This helps buyers and sellers understand market trends and set competitive prices.

 Financial Forecasting: Financial analysts use MLR to predict stock prices or economic indicators based on multiple influencing factors such as interest rates, inflation rates and market trends. This enables better investment strategies and risk management.

 Agricultural Yield Prediction: Farmers can use MLR to estimate crop yields based on several variables like rainfall, temperature, soil quality and fertilizer usage. This information helps in planning agricultural practices for optimal productivity.

 E-commerce Sales Analysis: An e-commerce company can utilize MLR to assess how various factors such as product price, marketing promotions and seasonal trends impact sales.

Now that we have understood linear regression, its assumptions and its types, we will learn how to build a linear regression model.

Cost function for Linear Regression

As we discussed earlier regarding the best-fit line in linear regression, it is not easy to obtain in real-life cases, so we need to calculate the errors that affect it. These errors need to be calculated in order to mitigate them. The difference between the predicted value Ŷ and the true value Y is called the cost function or the loss function.

In linear regression, the Mean Squared Error (MSE) cost function is employed, which calculates the average of the squared errors between the predicted values ŷi and the actual values yi. The purpose is to determine the optimal values for the intercept θ1 and the coefficient of the input feature θ2, providing the best-fit line for the given data points. The linear equation expressing this relationship is ŷi = θ1 + θ2xi.

The MSE cost function can be calculated as:

Cost function J(θ1, θ2) = (1/n) Σi (ŷi − yi)^2

Utilizing the MSE function, the iterative process of gradient descent is applied to update the values of θ1 and θ2. This ensures that the MSE value converges to the global minimum, signifying the most accurate fit of the linear regression line to the dataset.

This process involves continuously adjusting the parameters θ1 and θ2 based on the gradients calculated from the MSE. The final result is a linear regression line that minimizes the overall squared differences between the predicted and actual values, providing an optimal representation of the underlying relationship in the data.

Now that we have calculated the loss function, we need to optimize the model to mitigate this error, and that is done through gradient descent.

Gradient Descent for Linear Regression

A linear regression model can be trained using the optimization
algorithm gradient descent by iteratively modifying the model’s
parameters to reduce the mean squared error (MSE) of the model on a
training dataset. To update θ1 and θ2 values in order to reduce the Cost
function (minimizing RMSE value) and achieve the best-fit line the model
uses Gradient Descent. The idea is to start with random θ1 and θ2 values
and then iteratively update the values, reaching minimum cost.

A gradient is nothing but a derivative that defines the effect on the output of the function of a small variation in the inputs.

Let's differentiate the cost function J with respect to θ1 and θ2:

∂J/∂θ1 = (2/n) Σi (ŷi − yi)
∂J/∂θ2 = (2/n) Σi (ŷi − yi) xi

Finding the coefficients of a linear equation that best fits the training data is the objective of linear regression. By moving in the direction of the negative gradient of the Mean Squared Error with respect to the coefficients, the coefficients can be updated. If α is the learning rate, the intercept and the coefficient of X are updated as:

θ1 = θ1 − α (∂J/∂θ1)
θ2 = θ2 − α (∂J/∂θ2)
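To make the update rule concrete, here is a minimal batch gradient-descent sketch for simple linear regression, assuming NumPy; the data, learning rate and iteration count are illustrative.

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])       # e.g., years of experience
y = np.array([30.0, 35.0, 45.0, 48.0, 55.0])  # e.g., salary (illustrative units)

theta1, theta2 = 0.0, 0.0   # intercept and slope, starting values
alpha = 0.01                # learning rate
n = len(x)

for _ in range(5000):
    y_hat = theta1 + theta2 * x                  # hypothesis: y_hat = theta1 + theta2 * x
    error = y_hat - y
    grad_theta1 = (2.0 / n) * np.sum(error)      # dJ/dtheta1
    grad_theta2 = (2.0 / n) * np.sum(error * x)  # dJ/dtheta2
    theta1 -= alpha * grad_theta1                # move against the gradient
    theta2 -= alpha * grad_theta2

print(f"intercept theta1 = {theta1:.3f}, slope theta2 = {theta2:.3f}")
print("final MSE:", np.mean((theta1 + theta2 * x - y) ** 2))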

Evaluation Metrics for Linear Regression

A variety of evaluation measures can be used to determine the strength of any linear regression model. These assessment metrics often give an indication of how well the model is producing the observed outputs.

The most common measurements are:

Mean Square Error (MSE)

Mean Squared Error (MSE) is an evaluation metric that calculates the average of the squared differences between the actual and predicted values for all the data points. The difference is squared to ensure that negative and positive differences don't cancel each other out.

MSE = (1/n) Σi (yi − ŷi)^2

Here,

 n is the number of data points.

 yi is the actual or observed value for the ith data point.

 ŷi is the predicted value for the ith data point.

MSE is a way to quantify the accuracy of a model's predictions. MSE is sensitive to outliers, as large errors contribute significantly to the overall score.

Mean Absolute Error (MAE)

Mean Absolute Error is an evaluation metric used to calculate the accuracy of a regression model. MAE measures the average absolute difference between the predicted values and the actual values.

Mathematically, MAE is expressed as:

MAE = (1/n) Σi |Yi − Ŷi|

Here,

 n is the number of observations.

 Yi represents the actual values.

 Ŷi represents the predicted values.

A lower MAE value indicates better model performance. It is not sensitive to outliers, as we consider absolute differences.

Root Mean Squared Error (RMSE)

The square root of the residuals’ variance is the Root Mean Squared Error.
It describes how well the observed data points match the expected
values, or the model’s absolute fit to the data.

RMSE is not as good a metric as R-squared. Root Mean Squared Error can fluctuate when the units of the variables vary, since its value is dependent on the variables' units (it is not a normalized measure).

Coefficient of Determination (R-squared)

The R-squared metric is a measure of the proportion of variance in the dependent variable that is explained by the independent variables in the model.

Adjusted R-Squared Error

Adjusted R2 measures the proportion of variance in the dependent variable that is explained by the independent variables in a regression model. Adjusted R-squared accounts for the number of predictors in the model and penalizes the model for including irrelevant predictors that don't contribute significantly to explaining the variance in the dependent variable.

Mathematically, adjusted R2 is expressed as:

Adjusted R2 = 1 − [(1 − R2)(n − 1) / (n − k − 1)]

Here,

 n is the number of observations

 k is the number of predictors in the model

 R2 is the coefficient of determination

Adjusted R-squared helps to prevent overfitting. It penalizes the model for additional predictors that do not contribute significantly to explaining the variance in the dependent variable.
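Since adjusted R2 is not a built-in scikit-learn metric, the sketch below computes it from r2_score using the formula above, assuming NumPy and scikit-learn; the synthetic data and the choice of three predictors are illustrative.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(7)
X = rng.normal(size=(100, 3))                          # n = 100 observations, k = 3 predictors
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(0, 0.5, 100)

y_pred = LinearRegression().fit(X, y).predict(X)

n, k = X.shape
r2 = r2_score(y, y_pred)
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)          # adjusted R2 formula above

print(f"R2 = {r2:.3f}, adjusted R2 = {adj_r2:.3f}")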

While evaluation metrics help us measure the performance of a model, regularization helps in improving that performance by addressing overfitting and enhancing generalization.

Regularization Techniques for Linear Models

Lasso Regression (L1 Regularization)

Lasso Regression is a technique used for regularizing a linear regression model; it adds a penalty term to the linear regression objective function to prevent overfitting.

The objective function after applying lasso regression is:

J(θ) = (1/n) Σi (ŷi − yi)^2 + λ Σj |θj|

 The first term is the least squares loss, representing the squared difference between predicted and actual values.

 The second term is the L1 regularization term; it penalizes the sum of the absolute values of the regression coefficients θj.

Ridge Regression (L2 Regularization)

Ridge regression is a linear regression technique that adds a regularization term to the standard linear objective. Again, the goal is to prevent overfitting by penalizing large coefficients in the linear regression equation. It is useful when the dataset has multicollinearity, where predictor variables are highly correlated.

The objective function after applying ridge regression is:

J(θ) = (1/n) Σi (ŷi − yi)^2 + λ Σj θj^2

 The first term is the least squares loss, representing the squared difference between predicted and actual values.

 The second term is the L2 regularization term; it penalizes the sum of the squares of the regression coefficients θj.

Elastic Net Regression

Elastic Net Regression is a hybrid regularization technique that combines the power of both L1 and L2 regularization in the linear regression objective.

 The first term is the least squares loss.

 The second term is the L1 (lasso) regularization term, and the third is the L2 (ridge) regularization term.

 λ is the overall regularization strength.

 α controls the mix between L1 and L2 regularization.
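A minimal sketch of all three regularized models using scikit-learn follows; the synthetic data, alpha and l1_ratio values are illustrative, chosen only to show how the penalties affect the learned coefficients.

import numpy as np
from sklearn.linear_model import Lasso, Ridge, ElasticNet

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 5))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.1, 100)   # only two informative features

for model in (Lasso(alpha=0.1), Ridge(alpha=1.0), ElasticNet(alpha=0.1, l1_ratio=0.5)):
    model.fit(X, y)
    # Lasso tends to shrink uninformative coefficients all the way to zero
    print(type(model).__name__, np.round(model.coef_, 3))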

Logistic Regression in Machine Learning

What is Logistic Regression?

Logistic regression is a supervised machine learning algorithm used for classification tasks, where the goal is to predict the probability that an instance belongs to a given class or not. Logistic regression is a statistical algorithm which analyzes the relationship between two data factors. The article explores the fundamentals of logistic regression, its types and implementations.

Logistic regression is used for binary classification, where we use the sigmoid function, which takes the independent variables as input and produces a probability value between 0 and 1.

For example, suppose we have two classes, Class 0 and Class 1. If the value of the logistic function for an input is greater than 0.5 (the threshold value) then it belongs to Class 1; otherwise it belongs to Class 0. It is referred to as regression because it is an extension of linear regression, but it is mainly used for classification problems.

Key Points:

 Logistic regression predicts the output of a categorical dependent variable. Therefore, the outcome must be a categorical or discrete value.

 It can be either Yes or No, 0 or 1, true or False, etc., but instead of giving the exact values 0 and 1, it gives probabilistic values which lie between 0 and 1.

 In logistic regression, instead of fitting a regression line, we fit an “S”-shaped logistic function, which predicts two maximum values (0 or 1).

Types of Logistic Regression

On the basis of the categories, logistic regression can be classified into three types:

1. Binomial: In binomial logistic regression, there can be only two possible types of the dependent variable, such as 0 or 1, Pass or Fail, etc.

2. Multinomial: In multinomial logistic regression, there can be 3 or more possible unordered types of the dependent variable, such as “cat”, “dog”, or “sheep”.

3. Ordinal: In ordinal logistic regression, there can be 3 or more possible ordered types of the dependent variable, such as “low”, “medium”, or “high”.

Assumptions of Logistic Regression

We will explore the assumptions of logistic regression, as understanding these assumptions is important to ensure that we apply the model appropriately. The assumptions include:

1. Independent observations: Each observation is independent of the others, meaning there is no correlation between any input variables.

2. Binary dependent variable: It takes the assumption that the dependent variable must be binary or dichotomous, meaning it can take only two values. For more than two categories, softmax functions are used.

3. Linear relationship between independent variables and log odds: The relationship between the independent variables and the log odds of the dependent variable should be linear.

4. No outliers: There should be no outliers in the dataset.

5. Large sample size: The sample size is sufficiently large.

Understanding Sigmoid Function

So far, we’ve covered the basics of logistic regression, but now let’s focus
on the most important function that forms the core of logistic regression.

 The sigmoid function is a mathematical function used to map the predicted values to probabilities.

 It maps any real value into another value within the range of 0 and 1. The output of logistic regression must be between 0 and 1, which cannot go beyond this limit, so it forms a curve like the “S” form.

 The S-form curve is called the sigmoid function or the logistic function.

 In logistic regression, we use the concept of a threshold value, which defines the probability of either 0 or 1. Values above the threshold tend towards 1, and values below the threshold tend towards 0.
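A minimal sketch of the sigmoid function and the 0.5 threshold rule, assuming only NumPy; the input values are illustrative.

import numpy as np

def sigmoid(z):
    # Map any real value z to a probability in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])
probs = sigmoid(z)
labels = (probs > 0.5).astype(int)   # values above the threshold go to class 1

print(probs)    # approximately [0.018 0.269 0.5 0.731 0.982]
print(labels)   # [0 0 0 1 1]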

How does Logistic Regression work?


The logistic regression model transforms the linear regression function
continuous value output into categorical value output using a sigmoid
function, which maps any real-valued set of independent variables input
into a value between 0 and 1. This function is known as the logistic
function.

Equation of Logistic Regression:

The odds are the ratio of something occurring to something not occurring. They differ from probability, as probability is the ratio of something occurring to everything that could possibly occur. So the odds will be:

odds = p(x) / (1 − p(x))

Taking the natural logarithm of the odds gives the log-odds (logit), which logistic regression models as a linear combination of the independent variables.
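As an illustration of these quantities, the sketch below fits a binary logistic regression with scikit-learn on synthetic data (an assumed setup) and recovers the probability, odds and log-odds for a single input.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.3, 200) > 0).astype(int)

clf = LogisticRegression().fit(X, y)

x_new = np.array([[0.4, -0.2]])
p = clf.predict_proba(x_new)[0, 1]     # probability of class 1
odds = p / (1 - p)                     # odds of class 1 versus class 0
log_odds = np.log(odds)                # equals the linear part w.x + b

print(f"p = {p:.3f}, odds = {odds:.3f}, log-odds = {log_odds:.3f}")
print("linear part:", clf.decision_function(x_new)[0])   # should match the log-odds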

Likelihood Function for Logistic Regression

The predicted probabilities will be:

 For y = 1, the predicted probability is: p(X; b, w) = p(x)

 For y = 0, the predicted probability is: 1 − p(X; b, w) = 1 − p(x)

Terminologies involved in Logistic Regression

Here are some common terms involved in logistic regression:

 Independent variables: The input characteristics or predictor factors applied to the dependent variable's predictions.

 Dependent variable: The target variable in a logistic regression model, which we are trying to predict.

 Logistic function: The formula used to represent how the independent and dependent variables relate to one another. The logistic function transforms the input variables into a probability value between 0 and 1, which represents the likelihood of the dependent variable being 1 or 0.

 Odds: The ratio of something occurring to something not occurring. It is different from probability, as probability is the ratio of something occurring to everything that could possibly occur.

 Log-odds: The log-odds, also known as the logit function, is the natural logarithm of the odds. In logistic regression, the log-odds of the dependent variable are modeled as a linear combination of the independent variables and the intercept.

 Coefficient: The logistic regression model's estimated parameters, which show how the independent and dependent variables relate to one another.

 Intercept: A constant term in the logistic regression model, which represents the log-odds when all independent variables are equal to zero.

 Maximum likelihood estimation: The method used to estimate the coefficients of the logistic regression model, which maximizes the likelihood of observing the data given the model.

Multinomial Logistic Regression:

Target variable can have 3 or more possible types which are not ordered
(i.e. types have no quantitative significance) like “disease A” vs
“disease B” vs “disease C”.

How to Evaluate Logistic Regression Model?

So far, we've covered the implementation of logistic regression. Now, let's dive into the evaluation of logistic regression and understand why it's important.

Evaluating the model helps us assess its performance and ensure it generalizes well to new data.

We can evaluate the logistic regression model using the following metrics:

 Area Under the Receiver Operating Characteristic Curve (AUC-ROC): The ROC curve plots the true positive rate against the false positive rate at various thresholds. AUC-ROC measures the area under this curve, providing an aggregate measure of a model's performance across different classification thresholds.

 Area Under the Precision-Recall Curve (AUC-PR): Similar to AUC-ROC, AUC-PR measures the area under the precision-recall curve, providing a summary of a model's performance across different precision-recall trade-offs.
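A minimal sketch of both metrics using scikit-learn follows, with an assumed synthetic binary dataset; roc_auc_score gives AUC-ROC, and average_precision_score is used here as the usual summary of the precision-recall curve.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, average_precision_score

X, y = make_classification(n_samples=500, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

clf = LogisticRegression().fit(X_train, y_train)
probs = clf.predict_proba(X_test)[:, 1]     # scores needed for threshold-based curves

print("AUC-ROC:", roc_auc_score(y_test, probs))
print("AUC-PR :", average_precision_score(y_test, probs))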

Differences Between Linear and Logistic Regression

Now let's dive into the key differences between linear regression and logistic regression and evaluate how they differ from each other.

The difference between linear regression and logistic regression is that the linear regression output is a continuous value that can be anything, while logistic regression predicts the probability that an instance belongs to a given class or not.

Linear Regression | Logistic Regression
Used to predict the continuous dependent variable using a given set of independent variables. | Used to predict the categorical dependent variable using a given set of independent variables.
Used for solving regression problems. | Used for solving classification problems.
We predict the value of continuous variables. | We predict the values of categorical variables.
We find the best-fit line. | We find the S-curve.
The least squares estimation method is used for estimation of accuracy. | The maximum likelihood estimation method is used for estimation of accuracy.
The output must be a continuous value, such as price, age, etc. | The output must be a categorical value such as 0 or 1, Yes or No, etc.
A linear relationship between the dependent and independent variables is required. | A linear relationship is not required.
There may be collinearity between the independent variables. | There should be little to no collinearity between the independent variables.