Linear and Logistic Regression

Linear regression is a supervised machine learning algorithm used for predicting continuous outcomes based on input features, exemplified by housing price prediction. It assumes a linear relationship between variables and provides interpretable coefficients. Logistic regression, on the other hand, is used for binary classification problems, predicting probabilities of outcomes using the sigmoid function.

Uploaded by chaudharysidhu11

Need of Linear Regression

Linear regression is a fundamental machine learning algorithm used for predicting a continuous outcome variable based on one or more input features.
Need for linear regression, with an example:
Example: Housing Price Prediction
Imagine you are working on a real estate project where the goal is to predict the price of a house based on various features such as square footage, number of bedrooms, and neighborhood. The dataset might look like this:

Square Footage | Bedrooms | Neighborhood | Price
1500           | 2        | Urban        | 300,000
2000           | 3        | Suburban     | 400,000
1200           | 2        | Rural        | 250,000
Why use linear regression here
• Continuous Outcome: Linear regression is suitable when the outcome
variable (price in this case) is continuous. It models the relationship
between the input features and the continuous output.
• Linear Relationship: Linear regression assumes a linear relationship
between the input features and the outcome. In the real estate example,
you may expect that as the square footage increases, the price also tends
to increase. Linear regression helps quantify and model such linear
relationships.
• Interpretability: Linear regression provides interpretable coefficients for
each input feature, indicating the strength and direction of their influence
on the outcome. For example, a positive coefficient for square footage
suggests that an increase in square footage is associated with a higher
house price.
Linear regression
• Linear regression is a supervised machine learning algorithm used for
predicting a continuous outcome variable (also called the dependent
variable) based on one or more predictor variables (independent
variables). It assumes a linear relationship between the input
variables and the output.
• The basic idea behind linear regression is to find the best-fitting
straight line (linear equation) that minimizes the sum of the squared
differences between the observed and predicted values. The
equation for a simple linear regression with one independent variable
is typically represented as:
y = mx + b
• Where:
• y is the dependent variable (the variable you are trying to predict).
• x is the independent variable (the variable used to make predictions).
• m is the slope of the line, representing the relationship between x
and y.
• b is the y-intercept, the point where the line crosses the y-axis.
• For multiple linear regression with more than one independent
variable, the equation is extended to:
y = b0 + b1x1 + b2x2 + … + bnxn
where b0 is the intercept and b1, …, bn are the coefficients of the
independent variables x1, …, xn.
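As a minimal sketch of solving for such coefficients (using the three housing rows from earlier as a toy dataset; with only three observations this is purely illustrative, not a real model), the intercept and weights can be found by least squares:

```python
import numpy as np

# Toy dataset: square footage and bedrooms for three houses (from the
# housing table above), with prices as the target.
X = np.array([[1500, 2],
              [2000, 3],
              [1200, 2]], dtype=float)
y = np.array([300_000, 400_000, 250_000], dtype=float)

# Prepend a column of ones so the intercept b0 is estimated too.
X1 = np.hstack([np.ones((len(X), 1)), X])

# Solve the least-squares problem (equivalent to the normal equation
# b = (X^T X)^{-1} X^T y, but numerically stable).
b, *_ = np.linalg.lstsq(X1, y, rcond=None)
print(b)  # [b0, b1, b2]
```

With three observations and three coefficients the fit is exact, so the fitted plane passes through every row of this toy dataset.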
Basic steps to implement linear regression
• Data Collection: Gather a dataset with relevant information.
• Data Preprocessing: Clean the data, handle missing values, and preprocess
features if needed.
• Feature Selection: Choose the relevant features that might influence the
dependent variable.
• Split Data: Divide the dataset into a training set and a testing set.
• Model Training: Use the training set to find the best-fitting line by
adjusting the coefficients.
• Model Evaluation: Evaluate the model's performance on the testing set.
• Prediction: Use the trained model to make predictions on new or unseen
data.
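The steps above can be sketched end to end with scikit-learn (assuming it is installed); the dataset here is synthetic, generated from y = 3x + 5, so the expected fit is known in advance:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Data collection (synthetic): one feature, noise-free linear target y = 3x + 5
X = np.arange(10, dtype=float).reshape(-1, 1)
y = 3 * X.ravel() + 5

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Model training: least-squares fit of the line
model = LinearRegression().fit(X_train, y_train)

# Model evaluation: R^2 score on the held-out test set (here close to 1.0,
# since the data is noise-free)
print(model.score(X_test, y_test))

# Prediction on new, unseen data (should be close to 3 * 12 + 5 = 41)
print(model.predict([[12.0]]))
```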
Numerical Example
• Consider a dataset with one independent variable (x) and one
dependent variable (y).
• We'll use the equations mentioned earlier to find the slope (m) and y-
intercept (b) for the best-fitting line.
• Suppose we have the following dataset:
• x: 1, 2, 3, 4, 5
• y: 2, 4, 5, 4, 5
We want to find the equation of the line y=mx+b that best fits this data.
Using the least-squares formulas:
m = (nΣxy − ΣxΣy) / (nΣx² − (Σx)²) = (5·66 − 15·20) / (5·55 − 15²) = 30/50 = 0.6
b = (Σy − mΣx) / n = (20 − 0.6·15) / 5 = 2.2
So the best-fitting line is y = 0.6x + 2.2.
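The slope and intercept for this dataset can also be computed directly from the least-squares formulas in plain Python:

```python
# Dataset from the example above
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)

sx, sy = sum(xs), sum(ys)                      # sums of x and y
sxy = sum(x * y for x, y in zip(xs, ys))       # sum of x*y
sxx = sum(x * x for x in xs)                   # sum of x^2

m = (n * sxy - sx * sy) / (n * sxx - sx * sx)  # slope
b = (sy - m * sx) / n                          # intercept
print(m, b)  # 0.6 2.2
```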
Need of Logistic Regression
• Logistic regression is valuable for binary classification problems,
especially when you need to model the probability of an event
occurring based on multiple input features.
• It is a supervised machine learning classification model.
• The dependent variable is categorical and binary (0 or 1).
• Example: independent variable: Study Hours; dependent variable: Exam Result.
Study Hours | Exam Result
2           | 0
4           | 0
6           | 0
8           | 1
10          | 1
Imagine you are working on a medical diagnosis project where the goal is to predict
whether a patient has a particular medical condition based on some features. For simplicity,
let's consider a binary outcome: 1 if the patient has the condition and 0 if the patient does
not.
Now, suppose you have a dataset with the following features and outcomes:

Age | Blood Pressure | Cholesterol Level | Outcome
40  | 120            | 200               | 1
55  | 140            | 250               | 0
60  | 130            | 180               | 1

It is not possible to apply linear regression in such a case: its output is an unbounded continuous value, not a probability confined to [0, 1].

The logistic regression model uses the sigmoid function to map the linear combination of
features to a probability in the range [0, 1]. The sigmoid function ensures that the predicted
probabilities are well-behaved and can be interpreted as the likelihood of belonging to a specific
class.
Logistic Regression
• Logistic regression is a supervised machine learning algorithm used
for binary classification problems, where the outcome variable
(dependent variable) is categorical and has only two possible classes,
often denoted as 0 and 1. It's named "regression," but it's primarily
used for classification tasks.
• The logistic regression model predicts the probability that a given
input belongs to a particular class. Unlike linear regression, where the
output is a continuous value, logistic regression uses the logistic
function (also known as the sigmoid function) to squash the output
into the range of [0, 1].
• The logistic function, σ(z) = 1 / (1 + e^(−z)), maps any real-valued
number z to the range [0, 1], which can be interpreted as a
probability. The output of the logistic regression model can be
interpreted as the probability that the given input belongs to class 1.
• The logistic regression model makes predictions by comparing
the output probability to a threshold (usually 0.5). If the
predicted probability is greater than or equal to the threshold,
the input is classified as belonging to class 1; otherwise, it is
classified as belonging to class 0.
• The training process involves finding the optimal values for the
coefficients (b0, b1, …, bn) that minimize the difference between
the predicted probabilities and the actual class labels in the
training data. This is typically done through an optimization
algorithm, such as gradient descent.
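The prediction rule described above can be sketched in a few lines (the coefficients b0 and b1 here are hypothetical placeholders, not fitted values):

```python
import math

def sigmoid(z):
    # Logistic function: maps any real z into (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients for a single feature (e.g. study hours)
b0, b1 = -4.0, 1.0

def predict(x, threshold=0.5):
    p = sigmoid(b0 + b1 * x)           # probability of class 1
    return 1 if p >= threshold else 0  # compare against the threshold

print(predict(2), predict(8))  # 0 1
```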
Key steps in logistic regression:
• Data Collection: Gather a dataset with input features and corresponding binary
class labels.
• Data Preprocessing: Clean the data, handle missing values, and preprocess
features if needed.
• Feature Selection: Choose the relevant features that might influence the binary
outcome.
• Split Data: Divide the dataset into a training set and a testing set.
• Model Training: Use the training set to find the optimal coefficients through an
optimization algorithm.
• Model Evaluation: Evaluate the model's performance on the testing set using
metrics like accuracy, precision, recall, or F1 score.
• Prediction: Use the trained model to make predictions on new or unseen data.
Suppose we have a dataset with information about whether students
pass (1) or fail (0) an exam based on the number of hours they studied.

• Feature X: Hours Studied


• Target y: Result (Pass or Fail)
Types of Logistic Regression
• On the basis of the categories, Logistic Regression can be classified
into three types:
• Binomial: In binomial Logistic regression, there can be only two
possible types of the dependent variables, such as 0 or 1, Pass or Fail,
etc.
• Multinomial: In multinomial Logistic regression, there can be 3 or
more possible unordered types of the dependent variable, such as
“cat”, “dog”, or “sheep”.
• Ordinal: In ordinal Logistic regression, there can be 3 or more
possible ordered types of dependent variables, such as “low”,
“Medium”, or “High”.
Assumptions of Logistic Regression
• Independent observations: Each observation is independent of the others,
meaning the outcome of one observation is not correlated with any other.
• Binary dependent variable: It assumes the dependent variable is binary
or dichotomous, meaning it can take only two values. For more than two
categories, the softmax function (multinomial logistic regression) is used.
• Linearity relationship between independent variables and log odds: The
relationship between the independent variables and the log odds of the
dependent variable should be linear.
• No outliers: There should be no outliers in the dataset.
• Large sample size: The sample size is sufficiently large to give stable
coefficient estimates.
Logistic Regression Numerical example

Study Hours | Exam Result
2           | 0
3           | 0
4           | 0
5           | 1
6           | 1
7           | 1
8           | 1