
Regression in machine learning

Regression in machine learning refers to a supervised learning technique where the goal is to predict a continuous numerical value based on one or more independent features. It finds relationships between variables so that predictions can be made. We have two types of variables present in regression:

 Dependent Variable (Target): The variable we are trying to predict, e.g., house price.

 Independent Variables (Features): The input variables that influence the prediction, e.g., locality, number of rooms.

A regression problem arises when the output variable is a real or continuous value, such as “salary” or “weight”. Many different regression models can be used, but the simplest among them is linear regression.

Types of Regression

Regression can be classified into different types based on the number of predictor variables and the nature of the relationship between variables:

1. Simple Linear Regression

Linear regression is one of the simplest and most widely used statistical models. It assumes that there is a linear relationship between the independent and dependent variables, meaning that the change in the dependent variable is proportional to the change in the independent variables. For example, predicting the price of a house based on its size.

2. Multiple Linear Regression

Multiple linear regression extends simple linear regression by using multiple independent variables to predict the target variable. For example, predicting the price of a house based on multiple features such as size, location, number of rooms, etc.

3. Polynomial Regression

Polynomial regression is used to model non-linear relationships between the dependent variable and the independent variables. It adds polynomial terms to the linear regression model to capture more complex relationships. For example, when we want to predict a non-linear trend like population growth over time, we use polynomial regression.

4. Ridge & Lasso Regression

Ridge and lasso regression are regularized versions of linear regression that help avoid overfitting by penalizing large coefficients. When there is a risk of overfitting due to too many features, we use these types of regression algorithms.

5. Support Vector Regression (SVR)

SVR is a regression algorithm based on the Support Vector Machine (SVM). SVM is primarily used for classification tasks, but it can also be used for regression. SVR works by finding a function (hyperplane) that fits the data within a specified error margin, keeping the deviations between the predicted and actual values as small as possible.

6. Decision Tree Regression

Decision tree regression uses a tree-like structure to make decisions, where each branch of the tree represents a decision and the leaves represent outcomes. For example, we use decision tree regression when predicting customer behavior based on features like age, income, etc.

7. Random Forest Regression

Random Forest is an ensemble method that builds multiple decision trees, where each tree is trained on a different subset of the training data. The final prediction is made by averaging the predictions of all of the trees. For example, customer churn or sales data can be modelled using this approach; a small sketch of several of these model types follows.
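As a quick illustration of several of the regression types above, the following is a minimal sketch assuming NumPy and scikit-learn are installed; the synthetic data, model settings and values are illustrative only.

import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR

rng = np.random.default_rng(42)
X = rng.uniform(5, 30, size=(200, 1))            # e.g., house size (illustrative units)
y = 12.0 * X.ravel() + rng.normal(0, 15, 200)    # price with noise (illustrative units)

models = {
    "Simple linear": LinearRegression(),
    "Polynomial (degree 2)": make_pipeline(PolynomialFeatures(degree=2), LinearRegression()),
    "Ridge": Ridge(alpha=1.0),
    "Lasso": Lasso(alpha=1.0, max_iter=10000),
    "SVR": SVR(kernel="rbf", C=100.0),
    "Decision tree": DecisionTreeRegressor(max_depth=4),
    "Random forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

for name, model in models.items():
    model.fit(X, y)                               # each model learns the feature -> target mapping
    print(f"{name}: predicted price for size 12 = {model.predict([[12.0]])[0]:.1f}")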

Regression Evaluation Metrics

Evaluation in machine learning measures the performance of a model. Here are some popular evaluation metrics for regression:

 Mean Absolute Error (MAE): The average absolute difference between the predicted and actual values of the target variable.

 Mean Squared Error (MSE): The average squared difference between the predicted and actual values of the target variable.

 Root Mean Squared Error (RMSE): The square root of the mean squared error.

 Huber Loss: A hybrid loss function that transitions from MAE to MSE for larger errors, providing a balance between robustness and MSE's sensitivity to outliers.

 R2-Score: Higher values indicate a better fit, typically ranging from 0 to 1.
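To make the metrics above concrete, here is a minimal sketch assuming NumPy and scikit-learn; the actual and predicted values are hand-picked for illustration, and the Huber delta of 1.0 is an arbitrary choice.

import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = np.array([3.0, 5.0, 7.5, 10.0])
y_pred = np.array([2.8, 5.4, 7.0, 9.5])

mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)                      # RMSE is the square root of MSE
r2 = r2_score(y_true, y_pred)

# Huber loss: quadratic for small errors, linear (MAE-like) beyond delta
delta = 1.0
err = np.abs(y_true - y_pred)
huber = np.mean(np.where(err <= delta, 0.5 * err ** 2, delta * (err - 0.5 * delta)))

print(f"MAE={mae:.3f}  MSE={mse:.3f}  RMSE={rmse:.3f}  R2={r2:.3f}  Huber={huber:.3f}")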

Applications of Regression

 Predicting prices: Used to predict the price of a house based on its size, location and other features.

 Forecasting trends: Used to forecast the sales of a product based on historical sales data.

 Identifying risk factors: Used to identify risk factors for heart disease based on patient medical data.

 Making decisions: Used to recommend which stock to buy based on market data.

Advantages of Regression

 Easy to understand and interpret.

 Robust to outliers.

 Can handle both linear and non-linear relationships.

Disadvantages of Regression

 Assumes linearity.

 Sensitive to situations where two or more independent variables are highly correlated with each other, i.e., multicollinearity.

 May not be suitable for highly complex relationships.

Classification vs Regression in Machine Learning

Classification and regression are two primary tasks in supervised machine learning, where the key difference lies in the nature of the output: classification deals with discrete outcomes (e.g., yes/no, categories), while regression handles continuous values (e.g., price, temperature).

Both approaches require labeled data for training but differ in their
objectives—classification aims to find decision boundaries that separate
classes, whereas regression focuses on finding the best-fitting line to
predict numerical outcomes. Understanding these distinctions helps in
selecting the right approach for specific machine learning tasks.

For example, classification can determine whether an email is spam or not, classify images as “cat” or “dog,” or predict weather conditions like “sunny,” “rainy,” or “cloudy” using a decision boundary, while regression models are used to predict house prices based on features like size and location, or to forecast stock prices over time with a straight fit line.

Decision Boundary vs Best-Fit Line

When teaching the difference between classification and regression in machine learning, a key concept to focus on is the decision boundary (used in classification) versus the best-fit line (used in regression). These are fundamental tools that help models make predictions, but they serve distinctly different purposes.

1. Decision Boundary in Classification

A decision boundary is a surface or line that separates data points into different classes in a feature space. It can be linear (a straight line) or non-linear (a curve), depending on the complexity of the data and the algorithm used. For example:

 A linear decision boundary might separate two classes in a 2D space with a straight line (e.g., logistic regression).

 A more complex model may create non-linear boundaries to better fit intricate datasets.

During training, the classifier learns to partition the feature space by finding a boundary that minimizes classification errors.

 For binary classification, this boundary separates data points into two groups (e.g., spam vs. non-spam emails).

 In multi-class classification, multiple boundaries are created to separate more than two classes.

The decision boundary is not inherent to the training data but rather depends on the classifier used; we will learn more about classifiers in the next chapter.

2. Best-Fit Line in Regression

In regression, a best-fit line (or regression line) represents the relationship between the independent variables (inputs) and a dependent variable (output). It is used to predict continuous numerical values, capturing trends and relationships within the data and allowing for accurate predictions of continuous variables. The best-fit line can be linear or non-linear:

 A straight line is used for linear regression.

 Curves are used for more complex regressions, like polynomial regression.

The plot demonstrates regression, where both linear and polynomial models are used to predict continuous target values based on the input feature, in contrast to classification, which would create decision boundaries to separate discrete classes.
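Since that plot is not reproduced in this text, a comparable figure can be generated with the sketch below, assuming NumPy, scikit-learn and Matplotlib; the synthetic quadratic trend and the degree-2 polynomial are illustrative choices.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = np.linspace(0, 10, 80).reshape(-1, 1)
y = 0.5 * X.ravel() ** 2 - X.ravel() + rng.normal(0, 2, 80)   # non-linear trend plus noise

linear = LinearRegression().fit(X, y)
poly = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(X, y)

x_plot = X.ravel()
plt.scatter(x_plot, y, s=10, label="data")
plt.plot(x_plot, linear.predict(X), label="linear fit")
plt.plot(x_plot, poly.predict(X), label="polynomial fit (degree 2)")
plt.xlabel("input feature")
plt.ylabel("continuous target")
plt.legend()
plt.show()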

Classification Algorithms

There are different types of classification algorithms that have been developed over time to give the best results for classification tasks. Don't worry if they seem overwhelming at first; we'll dive deeper into each algorithm, one by one, in the upcoming chapters.

 Logistic Regression

 Decision Tree

 Random Forest

 K – Nearest Neighbors

 Support Vector Machine

 Naive Bayes

Regression Algorithms

There are different types of regression algorithms that have been developed over time to give the best results for regression tasks.

 Lasso Regression

 Ridge Regression

 XGBoost Regressor

 LGBM Regressor

Comparison between Classification and Regression

Feature | Classification | Regression
Output type | Discrete categories (e.g., “spam” or “not spam”) | Continuous numerical value (e.g., price, temperature)
Goal | To predict which category a data point belongs to | To predict an exact numerical value based on input data
Example problems | Email spam detection, image recognition, customer sentiment analysis | House price prediction, stock market forecasting, sales prediction
Evaluation metrics | Precision, Recall, and F1-Score | Mean Squared Error, R2-Score, MAPE and RMSE
Decision boundary | Clearly defined boundaries between different classes | No distinct boundaries; focuses on finding the best-fit line
Common algorithms | Logistic Regression, Decision Trees, Support Vector Machines (SVM) | Linear Regression, Polynomial Regression, Decision Trees (with regression objective)

Classification vs Regression : Conclusion

Classification trees are employed when there's a need to categorize the dataset into distinct classes associated with the response variable. Often, these classes are binary, such as “Yes” or “No,” and they are mutually exclusive. While there are instances where there may be more than two classes, a modified version of the classification tree algorithm is used in those scenarios.

On the other hand, regression trees are utilized when dealing with
continuous response variables. For instance, if the response variable
represents continuous values like the price of an object or the
temperature for the day, a regression tree is the appropriate choice.

There are situations where a blend of regression and classification approaches is necessary. For instance, ordinal regression comes into play when dealing with ranked or ordered categories, while multi-label classification is suitable for cases where data points can be associated with multiple classes at the same time.

Linear Regression in Machine learning

Linear regression is a statistical method used to model the relationship between a dependent variable and one or more independent variables. It provides valuable insights for prediction and data analysis. This article will explore its types, assumptions, implementation, advantages, and evaluation metrics.

Understanding Linear Regression

Linear regression is also a type of supervised machine-learning algorithm that learns from labelled datasets and maps the data points to the most optimized linear functions, which can be used for prediction on new datasets. It computes the linear relationship between the dependent variable and one or more independent features by fitting a linear equation to the observed data. It predicts continuous output variables based on the independent input variables.

For example, if we want to predict house prices we consider various factors such as house age, distance from the main road, location, area and number of rooms; linear regression uses all these parameters to predict the house price, as it considers a linear relation between all these features and the price of the house.

Why Linear Regression is Important?

The interpretability of linear regression is one of its greatest strengths. The model's equation offers clear coefficients that illustrate the influence of each independent variable on the dependent variable, enhancing our understanding of the underlying relationships. Its simplicity is a significant advantage; linear regression is transparent, easy to implement, and serves as a foundational concept for more advanced algorithms.

Now that we have discussed why linear regression is important, we will discuss how it works, based on the best-fit line in regression.

What is the best Fit Line?

Our primary objective while using linear regression is to locate the best-fit
line, which implies that the error between the predicted and actual values
should be kept to a minimum. There will be the least error in the best-fit
line.

The best Fit Line equation provides a straight line that represents the
relationship between the dependent and independent variables. The slope
of the line indicates how much the dependent variable changes for a unit
change in the independent variable(s).

Here, Y is called the dependent or target variable and X is called the independent variable, also known as the predictor of Y. There are many types of functions or modules that can be used for regression. A linear function is the simplest type of function. Here, X may be a single feature or multiple features representing the problem.

Linear regression performs the task of predicting a dependent variable value (y) based on a given independent variable (x); hence the name linear regression. In this example, X (input) is the work experience and Y (output) is the salary of a person. The regression line is the best-fit line for our model.

In linear regression, some assumptions are made to ensure the reliability of the model's results.

Hypothesis function in Linear Regression

Assumptions are:

 Linearity: It assumes that there is a linear relationship between the independent and dependent variables. This means that changes in the independent variable lead to proportional changes in the dependent variable.

 Independence: The observations should be independent from each other; that is, the errors from one observation should not influence the others.

As we have discussed, our independent feature is the experience, i.e., X, and the respective salary Y is the dependent variable. Let's assume there is a linear relationship between X and Y; then the salary can be predicted using:

Ŷ = θ1 + θ2X

where θ1 is the intercept and θ2 is the coefficient (slope) of X.

Types of Linear Regression

When there is only one independent feature it is known as Simple Linear Regression or Univariate Linear Regression, and when there is more than one feature it is known as Multiple Linear Regression or Multivariate Regression.

1. Simple Linear Regression

Simple linear regression is the simplest form of linear regression and it involves only one independent variable and one dependent variable. The equation for simple linear regression is:

y = β0 + β1X
where:

 Y is the dependent variable

 X is the independent variable

 β0 is the intercept

 β1 is the slope
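As a small worked example of this equation, the sketch below estimates β0 and β1 with the standard least-squares formulas, assuming only NumPy; the data values are illustrative.

import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])            # e.g., house size (illustrative units)
y = np.array([150.0, 200.0, 240.0, 310.0, 350.0])  # e.g., price (illustrative units)

x_mean, y_mean = X.mean(), y.mean()
beta_1 = np.sum((X - x_mean) * (y - y_mean)) / np.sum((X - x_mean) ** 2)  # slope
beta_0 = y_mean - beta_1 * x_mean                                         # intercept

print(f"fitted line: y = {beta_0:.2f} + {beta_1:.2f} * X")
print("prediction for X = 6:", beta_0 + beta_1 * 6)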

Assumptions of Simple Linear Regression

Linear regression is a powerful tool for understanding and predicting the behavior of a variable; however, it needs to meet a few conditions in order to produce accurate and dependable results.

1. Linearity: The independent and dependent variables have a linear relationship with one another. This implies that changes in the dependent variable follow those in the independent variable(s) in a linear fashion. This means that there should be a straight line that can be drawn through the data points. If the relationship is not linear, then linear regression will not be an accurate model.

2. Independence: The observations in the dataset are independent of each other. This means that the value of the dependent variable for one observation does not depend on the value of the dependent variable for another observation. If the observations are not independent, then linear regression will not be an accurate model.
3. Homoscedasticity: Across all levels of the independent variable(s), the variance of the errors is constant. This indicates that the amount of the independent variable(s) has no impact on the variance of the errors. If the variance of the residuals is not constant, then linear regression will not be an accurate model.

4. Normality: The residuals should be normally distributed. This means that the residuals should follow a bell-shaped curve. If the residuals are not normally distributed, then linear regression will not be an accurate model.

Use Case of Simple Linear Regression

 In a case study evaluating student performance, analysts use simple linear regression to examine the relationship between study hours and exam scores. By collecting data on the number of hours students studied and their corresponding exam results, the analysts developed a model that reveals a correlation: for each additional hour spent studying, students' exam scores increased by an average of 5 points. This case highlights the utility of simple linear regression in understanding and improving academic performance.

 Another case study focuses on marketing and sales, where businesses use simple linear regression to forecast sales based on historical data, particularly examining how factors like advertising expenditure influence revenue. By collecting data on past advertising spending and corresponding sales figures, analysts develop a regression model that captures the relationship between these variables. For instance, the analysis may reveal that for every additional dollar spent on advertising, sales increase by $10. This predictive capability enables companies to optimize their advertising strategies and allocate resources effectively.

The goal of the algorithm is to find the best-fit line equation that can predict the values based on the independent variables.

In regression, a set of records is present with X and Y values, and these values are used to learn a function, so that if you want to predict Y from an unknown X this learned function can be used. In regression we have to find the value of Y, so a function is required that predicts a continuous Y, given X as independent features.

Assumptions of Multiple Linear Regression

For Multiple Linear Regression, all four of the assumptions from Simple Linear Regression apply. In addition, below are a few more:

1. No multicollinearity: There is no high correlation between the independent variables. This indicates that there is little or no correlation between the independent variables. Multicollinearity occurs when two or more independent variables are highly correlated with each other, which can make it difficult to determine the individual effect of each variable on the dependent variable. If there is multicollinearity, then multiple linear regression will not be an accurate model.

2. Additivity: The model assumes that the effect of changes in a predictor variable on the response variable is consistent regardless of the values of the other variables. This assumption implies that there is no interaction between variables in their effects on the dependent variable.

3. Feature Selection: In multiple linear regression, it is essential to carefully select the independent variables that will be included in the model. Including irrelevant or redundant variables may lead to overfitting and complicate the interpretation of the model.

4. Overfitting: Overfitting occurs when the model fits the training data too closely, capturing noise or random fluctuations that do not represent the true underlying relationship between variables. This can lead to poor generalization performance on new, unseen data.

Multiple linear regression sometimes faces issues like multicollinearity.

Multicollinearity

Multicollinearity is a statistical phenomenon where two or more independent variables in a multiple regression model are highly correlated, making it difficult to assess the individual effects of each variable on the dependent variable.

Detecting Multicollinearity includes two techniques:

 Correlation Matrix: Examining the correlation matrix among the independent variables is a common way to detect multicollinearity. High correlations (close to 1 or -1) indicate potential multicollinearity.

 VIF (Variance Inflation Factor): VIF is a measure that quantifies how much the variance of an estimated regression coefficient increases when the predictors are correlated. A high VIF (typically above 10) suggests multicollinearity.
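The sketch below illustrates both checks on deliberately correlated synthetic predictors, assuming pandas, NumPy and statsmodels are available; the column names and the correlation strength are illustrative.

import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools.tools import add_constant

rng = np.random.default_rng(1)
size = rng.normal(100, 20, 200)
rooms = size / 25 + rng.normal(0, 0.5, 200)       # deliberately correlated with size
age = rng.uniform(0, 50, 200)
X = pd.DataFrame({"size": size, "rooms": rooms, "age": age})

print(X.corr())                                    # correlations close to +/-1 are suspicious

X_const = add_constant(X)                          # VIF is computed per predictor column
for i, col in enumerate(X_const.columns):
    if col != "const":
        print(col, round(variance_inflation_factor(X_const.values, i), 2))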

Use Case of Multiple Linear Regression

Multiple linear regression allows us to analyze the relationship between multiple independent variables and a single dependent variable. Here are some use cases:

 Real Estate Pricing: In real estate, MLR is used to predict property prices based on multiple factors such as location, size, number of bedrooms, etc. This helps buyers and sellers understand market trends and set competitive prices.

 Financial Forecasting: Financial analysts use MLR to predict stock prices or economic indicators based on multiple influencing factors such as interest rates, inflation rates and market trends. This enables better investment strategies and risk management.

 Agricultural Yield Prediction: Farmers can use MLR to estimate crop yields based on several variables like rainfall, temperature, soil quality and fertilizer usage. This information helps in planning agricultural practices for optimal productivity.

 E-commerce Sales Analysis: An e-commerce company can utilize MLR to assess how various factors such as product price, marketing promotions and seasonal trends impact sales.

Now that we have understood linear regression, its assumptions and its types, we will learn how to build a linear regression model.

Cost function for Linear Regression

As we discussed earlier regarding the best-fit line in linear regression, it is not easy to obtain in real-life cases, so we need to calculate the errors that affect it. These errors need to be calculated in order to mitigate them. The difference between the predicted value Ŷ and the true value Y is called the cost function or the loss function.

In linear regression, the Mean Squared Error (MSE) cost function is employed, which calculates the average of the squared errors between the predicted values ŷi and the actual values yi. The purpose is to determine the optimal values for the intercept θ1 and the coefficient of the input feature θ2, providing the best-fit line for the given data points. The linear equation expressing this relationship is ŷi = θ1 + θ2xi.

The MSE cost function can be calculated as:

Cost function J(θ1, θ2) = (1/n) Σi (ŷi − yi)^2

Utilizing the MSE function, the iterative process of gradient descent is applied to update the values of θ1 and θ2. This ensures that the MSE value converges to the global minimum, signifying the most accurate fit of the linear regression line to the dataset.

This process involves continuously adjusting the parameters θ1 and θ2 based on the gradients calculated from the MSE. The final result is a linear regression line that minimizes the overall squared differences between the predicted and actual values, providing an optimal representation of the underlying relationship in the data.

Now that we have calculated the loss function, we need to optimize the model to mitigate this error, and that is done through gradient descent.

Gradient Descent for Linear Regression

A linear regression model can be trained using the optimization
algorithm gradient descent by iteratively modifying the model’s
parameters to reduce the mean squared error (MSE) of the model on a
training dataset. To update θ1 and θ2 values in order to reduce the Cost
function (minimizing RMSE value) and achieve the best-fit line the model
uses Gradient Descent. The idea is to start with random θ1 and θ2 values
and then iteratively update the values, reaching minimum cost.

A gradient is nothing but a derivative that defines the effect on the output of the function of a small variation in the inputs.

Let's differentiate the cost function J with respect to θ1 and θ2:

∂J/∂θ1 = (2/n) Σi (ŷi − yi)
∂J/∂θ2 = (2/n) Σi (ŷi − yi) xi

Finding the coefficients of a linear equation that best fits the training data is the objective of linear regression. By moving in the direction of the negative gradient of the Mean Squared Error with respect to the coefficients, the coefficients can be updated. If α is the learning rate, the intercept and the coefficient of X are updated as:

θ1 = θ1 − α (∂J/∂θ1)
θ2 = θ2 − α (∂J/∂θ2)
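To make the update rule concrete, here is a minimal batch gradient-descent sketch for simple linear regression, assuming NumPy; the data, learning rate and iteration count are illustrative.

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])       # e.g., years of experience
y = np.array([30.0, 35.0, 45.0, 48.0, 55.0])  # e.g., salary (illustrative units)

theta1, theta2 = 0.0, 0.0   # intercept and slope, starting values
alpha = 0.01                # learning rate
n = len(x)

for _ in range(5000):
    y_hat = theta1 + theta2 * x                  # hypothesis: y_hat = theta1 + theta2 * x
    error = y_hat - y
    grad_theta1 = (2.0 / n) * np.sum(error)      # dJ/dtheta1
    grad_theta2 = (2.0 / n) * np.sum(error * x)  # dJ/dtheta2
    theta1 -= alpha * grad_theta1                # move against the gradient
    theta2 -= alpha * grad_theta2

print(f"intercept theta1 = {theta1:.3f}, slope theta2 = {theta2:.3f}")
print("final MSE:", np.mean((theta1 + theta2 * x - y) ** 2))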

Evaluation Metrics for Linear Regression

A variety of evaluation measures can be used to determine the strength of any linear regression model. These assessment metrics often give an indication of how well the model is producing the observed outputs.

The most common measurements are:

Mean Square Error (MSE)

Mean Squared Error (MSE) is an evaluation metric that calculates the average of the squared differences between the actual and predicted values for all the data points. The difference is squared to ensure that negative and positive differences don't cancel each other out.

MSE = (1/n) Σi (yi − ŷi)^2

Here,

 n is the number of data points.

 yi is the actual or observed value for the ith data point.

 ŷi is the predicted value for the ith data point.

MSE is a way to quantify the accuracy of a model's predictions. MSE is sensitive to outliers, as large errors contribute significantly to the overall score.

Mean Absolute Error (MAE)

Mean Absolute Error is an evaluation metric used to calculate the accuracy of a regression model. MAE measures the average absolute difference between the predicted values and the actual values.

Mathematically, MAE is expressed as:

MAE = (1/n) Σi |Yi − Ŷi|

Here,

 n is the number of observations.

 Yi represents the actual values.

 Ŷi represents the predicted values.

A lower MAE value indicates better model performance. It is not sensitive to outliers, as we consider absolute differences.

Root Mean Squared Error (RMSE)

The square root of the residuals’ variance is the Root Mean Squared Error.
It describes how well the observed data points match the expected
values, or the model’s absolute fit to the data.

RMSE is not as good a metric as R-squared. Root Mean Squared Error can fluctuate when the units of the variables vary, since its value is dependent on the variables' units (it is not a normalized measure).

Coefficient of Determination (R-squared)

The R-squared metric is a measure of the proportion of variance in the dependent variable that is explained by the independent variables in the model.

Adjusted R-Squared Error

Adjusted R2 measures the proportion of variance in the dependent variable that is explained by the independent variables in a regression model. Adjusted R-squared accounts for the number of predictors in the model and penalizes the model for including irrelevant predictors that don't contribute significantly to explaining the variance in the dependent variable.

Mathematically, adjusted R2 is expressed as:

Adjusted R2 = 1 − [(1 − R2)(n − 1) / (n − k − 1)]

Here,

 n is the number of observations

 k is the number of predictors in the model

 R2 is the coefficient of determination

Adjusted R-squared helps to prevent overfitting. It penalizes the model for additional predictors that do not contribute significantly to explaining the variance in the dependent variable.
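Since adjusted R2 is not a built-in scikit-learn metric, the sketch below computes it from r2_score using the formula above, assuming NumPy and scikit-learn; the synthetic data and the choice of three predictors are illustrative.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(7)
X = rng.normal(size=(100, 3))                          # n = 100 observations, k = 3 predictors
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(0, 0.5, 100)

y_pred = LinearRegression().fit(X, y).predict(X)

n, k = X.shape
r2 = r2_score(y, y_pred)
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)          # adjusted R2 formula above

print(f"R2 = {r2:.3f}, adjusted R2 = {adj_r2:.3f}")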

While evaluation metrics help us measure the performance of a model, regularization helps in improving that performance by addressing overfitting and enhancing generalization.

Regularization Techniques for Linear Models

Lasso Regression (L1 Regularization)

Lasso Regression is a technique used for regularizing a linear regression model; it adds a penalty term to the linear regression objective function to prevent overfitting.

The objective function after applying lasso regression is:

J(θ) = (1/n) Σi (ŷi − yi)^2 + λ Σj |θj|

 The first term is the least squares loss, representing the squared difference between predicted and actual values.

 The second term is the L1 regularization term; it penalizes the sum of the absolute values of the regression coefficients θj.

Ridge Regression (L2 Regularization)

Ridge regression is a linear regression technique that adds a regularization term to the standard linear objective. Again, the goal is to prevent overfitting by penalizing large coefficients in the linear regression equation. It is useful when the dataset has multicollinearity, where predictor variables are highly correlated.

The objective function after applying ridge regression is:

J(θ) = (1/n) Σi (ŷi − yi)^2 + λ Σj θj^2

 The first term is the least squares loss, representing the squared difference between predicted and actual values.

 The second term is the L2 regularization term; it penalizes the sum of the squares of the regression coefficients θj.

Elastic Net Regression

Elastic Net Regression is a hybrid regularization technique that combines the power of both L1 and L2 regularization in the linear regression objective.

 The first term is the least squares loss.

 The second term is the L1 (lasso) regularization term, and the third is the L2 (ridge) regularization term.

 λ is the overall regularization strength.

 α controls the mix between L1 and L2 regularization.
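A minimal sketch of all three regularized models using scikit-learn follows; the synthetic data, alpha and l1_ratio values are illustrative, chosen only to show how the penalties affect the learned coefficients.

import numpy as np
from sklearn.linear_model import Lasso, Ridge, ElasticNet

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 5))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.1, 100)   # only two informative features

for model in (Lasso(alpha=0.1), Ridge(alpha=1.0), ElasticNet(alpha=0.1, l1_ratio=0.5)):
    model.fit(X, y)
    # Lasso tends to shrink uninformative coefficients all the way to zero
    print(type(model).__name__, np.round(model.coef_, 3))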

Logistic Regression in Machine Learning

What is Logistic Regression?

Logistic regression is a supervised machine learning algorithm used for classification tasks, where the goal is to predict the probability that an instance belongs to a given class or not. Logistic regression is a statistical algorithm which analyzes the relationship between two data factors. The article explores the fundamentals of logistic regression, its types and implementations.

Logistic regression is used for binary classification, where we use the sigmoid function, which takes the independent variables as input and produces a probability value between 0 and 1.

For example, suppose we have two classes, Class 0 and Class 1. If the value of the logistic function for an input is greater than 0.5 (the threshold value) then it belongs to Class 1; otherwise it belongs to Class 0. It is referred to as regression because it is an extension of linear regression, but it is mainly used for classification problems.

Key Points:

 Logistic regression predicts the output of a categorical dependent variable. Therefore, the outcome must be a categorical or discrete value.

 It can be either Yes or No, 0 or 1, true or False, etc., but instead of giving the exact values 0 and 1, it gives probabilistic values which lie between 0 and 1.

 In logistic regression, instead of fitting a regression line, we fit an “S”-shaped logistic function, which predicts two maximum values (0 or 1).

Types of Logistic Regression

On the basis of the categories, logistic regression can be classified into three types:

1. Binomial: In binomial logistic regression, there can be only two possible types of the dependent variable, such as 0 or 1, Pass or Fail, etc.

2. Multinomial: In multinomial logistic regression, there can be 3 or more possible unordered types of the dependent variable, such as “cat”, “dog”, or “sheep”.

3. Ordinal: In ordinal logistic regression, there can be 3 or more possible ordered types of the dependent variable, such as “low”, “medium”, or “high”.

Assumptions of Logistic Regression

We will explore the assumptions of logistic regression, as understanding these assumptions is important to ensure that we apply the model appropriately. The assumptions include:

1. Independent observations: Each observation is independent of the others, meaning there is no correlation between any input variables.

2. Binary dependent variable: It takes the assumption that the dependent variable must be binary or dichotomous, meaning it can take only two values. For more than two categories, softmax functions are used.

3. Linear relationship between independent variables and log odds: The relationship between the independent variables and the log odds of the dependent variable should be linear.

4. No outliers: There should be no outliers in the dataset.

5. Large sample size: The sample size is sufficiently large.

Understanding Sigmoid Function

So far, we’ve covered the basics of logistic regression, but now let’s focus
on the most important function that forms the core of logistic regression.

 The sigmoid function is a mathematical function used to map the predicted values to probabilities.

 It maps any real value into another value within the range of 0 and 1. The output of logistic regression must be between 0 and 1, which cannot go beyond this limit, so it forms a curve like the “S” form.

 The S-form curve is called the sigmoid function or the logistic function.

 In logistic regression, we use the concept of a threshold value, which defines the probability of either 0 or 1. Values above the threshold tend towards 1, and values below the threshold tend towards 0.
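A minimal sketch of the sigmoid function and the 0.5 threshold rule, assuming only NumPy; the input values are illustrative.

import numpy as np

def sigmoid(z):
    # Map any real value z to a probability in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])
probs = sigmoid(z)
labels = (probs > 0.5).astype(int)   # values above the threshold go to class 1

print(probs)    # approximately [0.018 0.269 0.5 0.731 0.982]
print(labels)   # [0 0 0 1 1]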

How does Logistic Regression work?


The logistic regression model transforms the linear regression function
continuous value output into categorical value output using a sigmoid
function, which maps any real-valued set of independent variables input
into a value between 0 and 1. This function is known as the logistic
function.

Equation of Logistic Regression:

The odds are the ratio of something occurring to something not occurring. They differ from probability, as probability is the ratio of something occurring to everything that could possibly occur. So the odds will be:

odds = p(x) / (1 − p(x))

Taking the natural logarithm of the odds gives the log-odds (logit), which logistic regression models as a linear combination of the independent variables.
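As an illustration of these quantities, the sketch below fits a binary logistic regression with scikit-learn on synthetic data (an assumed setup) and recovers the probability, odds and log-odds for a single input.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.3, 200) > 0).astype(int)

clf = LogisticRegression().fit(X, y)

x_new = np.array([[0.4, -0.2]])
p = clf.predict_proba(x_new)[0, 1]     # probability of class 1
odds = p / (1 - p)                     # odds of class 1 versus class 0
log_odds = np.log(odds)                # equals the linear part w.x + b

print(f"p = {p:.3f}, odds = {odds:.3f}, log-odds = {log_odds:.3f}")
print("linear part:", clf.decision_function(x_new)[0])   # should match the log-odds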

Likelihood Function for Logistic Regression

The predicted probabilities will be:

 For y = 1, the predicted probability is: p(X; b, w) = p(x)

 For y = 0, the predicted probability is: 1 − p(X; b, w) = 1 − p(x)

Terminologies involved in Logistic Regression

Here are some common terms involved in logistic regression:

 Independent variables: The input characteristics or predictor factors applied to the dependent variable's predictions.

 Dependent variable: The target variable in a logistic regression model, which we are trying to predict.

 Logistic function: The formula used to represent how the independent and dependent variables relate to one another. The logistic function transforms the input variables into a probability value between 0 and 1, which represents the likelihood of the dependent variable being 1 or 0.

 Odds: The ratio of something occurring to something not occurring. It is different from probability, as probability is the ratio of something occurring to everything that could possibly occur.

 Log-odds: The log-odds, also known as the logit function, is the natural logarithm of the odds. In logistic regression, the log-odds of the dependent variable are modeled as a linear combination of the independent variables and the intercept.

 Coefficient: The logistic regression model's estimated parameters, which show how the independent and dependent variables relate to one another.

 Intercept: A constant term in the logistic regression model, which represents the log-odds when all independent variables are equal to zero.

 Maximum likelihood estimation: The method used to estimate the coefficients of the logistic regression model, which maximizes the likelihood of observing the data given the model.

Multinomial Logistic Regression:

Target variable can have 3 or more possible types which are not ordered
(i.e. types have no quantitative significance) like “disease A” vs
“disease B” vs “disease C”.

How to Evaluate Logistic Regression Model?

So far, we've covered the implementation of logistic regression. Now, let's dive into the evaluation of logistic regression and understand why it's important.

Evaluating the model helps us assess its performance and ensure it generalizes well to new data.

We can evaluate the logistic regression model using the following metrics:

 Area Under the Receiver Operating Characteristic Curve (AUC-ROC): The ROC curve plots the true positive rate against the false positive rate at various thresholds. AUC-ROC measures the area under this curve, providing an aggregate measure of a model's performance across different classification thresholds.

 Area Under the Precision-Recall Curve (AUC-PR): Similar to AUC-ROC, AUC-PR measures the area under the precision-recall curve, providing a summary of a model's performance across different precision-recall trade-offs.
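A minimal sketch of both metrics using scikit-learn follows, with an assumed synthetic binary dataset; roc_auc_score gives AUC-ROC, and average_precision_score is used here as the usual summary of the precision-recall curve.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, average_precision_score

X, y = make_classification(n_samples=500, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

clf = LogisticRegression().fit(X_train, y_train)
probs = clf.predict_proba(X_test)[:, 1]     # scores needed for threshold-based curves

print("AUC-ROC:", roc_auc_score(y_test, probs))
print("AUC-PR :", average_precision_score(y_test, probs))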

Differences Between Linear and Logistic Regression

Now let's dive into the key differences between linear regression and logistic regression and evaluate how they differ from each other.

The difference between linear regression and logistic regression is that the linear regression output is a continuous value that can be anything, while logistic regression predicts the probability that an instance belongs to a given class or not.

Linear Regression | Logistic Regression
Used to predict the continuous dependent variable using a given set of independent variables. | Used to predict the categorical dependent variable using a given set of independent variables.
Used for solving regression problems. | Used for solving classification problems.
We predict the value of continuous variables. | We predict the values of categorical variables.
We find the best-fit line. | We find the S-curve.
The least squares estimation method is used for estimation of accuracy. | The maximum likelihood estimation method is used for estimation of accuracy.
The output must be a continuous value, such as price, age, etc. | The output must be a categorical value such as 0 or 1, Yes or No, etc.
A linear relationship between the dependent and independent variables is required. | A linear relationship is not required.
There may be collinearity between the independent variables. | There should be little to no collinearity between the independent variables.