Linear and Logistic Regression
The logistic regression model uses the sigmoid function to map the linear combination of
features to a probability in the range [0, 1]. The sigmoid function keeps the predicted
probabilities bounded between 0 and 1, so they can be interpreted as the likelihood that an
input belongs to a specific class.
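As a quick illustration, the short Python sketch below (plain NumPy, with hypothetical, unfitted coefficient values) shows how the sigmoid squashes a linear combination of features into (0, 1):

import numpy as np

def sigmoid(z):
    # The sigmoid maps any real number z to a value in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical (not fitted) coefficients and a single input with two features
b0, b1, b2 = -1.0, 0.8, 0.3
x1, x2 = 2.0, 1.5
z = b0 + b1 * x1 + b2 * x2   # linear combination of the features
p = sigmoid(z)               # predicted probability of class 1
print(p)                     # approximately 0.74 for these values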
Logistic Regression
• Logistic regression is a supervised machine learning algorithm used
for binary classification problems, where the outcome variable
(dependent variable) is categorical and has only two possible classes,
often denoted as 0 and 1. It's named "regression," but it's primarily
used for classification tasks.
• The logistic regression model predicts the probability that a given
input belongs to a particular class. Unlike linear regression, where the
output is a continuous value, logistic regression uses the logistic
function (also known as the sigmoid function) to squash the output
into the range [0, 1].
• The logistic (sigmoid) function, σ(z) = 1 / (1 + e^(-z)), maps any
real-valued number z to the range [0, 1], which can be interpreted
as a probability. Here z is the linear combination of the inputs,
z = b0 + b1x1 + … + bnxn. The output of the logistic regression
model can be interpreted as the probability that the given input
belongs to class 1.
• The logistic regression model makes predictions by comparing
the output probability to a threshold (usually 0.5). If the
predicted probability is greater than or equal to the threshold,
the input is classified as belonging to class 1; otherwise, it is
classified as belonging to class 0.
• The training process involves finding the optimal values for the
coefficients (b0, b1, …, bn) that minimize a loss function, typically
the log loss (binary cross-entropy), which measures the difference
between the predicted probabilities and the actual class labels in
the training data. This minimization is usually performed with an
optimization algorithm such as gradient descent, as in the sketch
below.
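A minimal NumPy sketch of this training loop, assuming a feature matrix X and 0/1 labels y, using batch gradient descent on the log loss (the learning rate and epoch count are illustrative defaults, not tuned values):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=1000):
    # X: (n_samples, n_features), y: (n_samples,) with values 0 or 1
    n_samples, n_features = X.shape
    b = np.zeros(n_features + 1)                   # b[0] plays the role of the intercept b0
    Xb = np.hstack([np.ones((n_samples, 1)), X])   # prepend a column of 1s for the intercept
    for _ in range(epochs):
        p = sigmoid(Xb @ b)                        # predicted probabilities
        gradient = Xb.T @ (p - y) / n_samples      # gradient of the log loss
        b -= lr * gradient                         # gradient descent step
    return b

def predict(X, b, threshold=0.5):
    # Apply the decision rule: class 1 if the probability meets the threshold
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    return (sigmoid(Xb @ b) >= threshold).astype(int)

The predict function mirrors the thresholding rule described above: probabilities at or above 0.5 map to class 1, everything else to class 0.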
Key steps in logistic regression:
• Data Collection: Gather a dataset with input features and corresponding binary
class labels.
• Data Preprocessing: Clean the data, handle missing values, and preprocess
features if needed.
• Feature Selection: Choose the relevant features that might influence the binary
outcome.
• Split Data: Divide the dataset into a training set and a testing set.
• Model Training: Use the training set to find the optimal coefficients through an
optimization algorithm.
• Model Evaluation: Evaluate the model's performance on the testing set using
metrics like accuracy, precision, recall, or F1 score.
• Prediction: Use the trained model to make predictions on new or unseen data.
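Putting these steps together, a minimal scikit-learn sketch of the split / train / evaluate / predict workflow (synthetic data stands in for a real, already-preprocessed dataset):

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.model_selection import train_test_split

# Synthetic binary-classification data standing in for a collected, preprocessed dataset
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# Split Data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Model Training (scikit-learn fits the coefficients with its own optimizer)
model = LogisticRegression()
model.fit(X_train, y_train)

# Model Evaluation on the held-out test set
y_pred = model.predict(X_test)
print("accuracy :", accuracy_score(y_test, y_pred))
print("precision:", precision_score(y_test, y_pred))
print("recall   :", recall_score(y_test, y_pred))
print("f1 score :", f1_score(y_test, y_pred))

# Prediction on new, unseen data (here the first test row as a stand-in)
print("new prediction:", model.predict(X_test[:1]))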
Suppose we have a dataset with information about whether students
pass (1) or fail (0) an exam based on the number of hours they studied.
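A toy sketch of this scenario, using purely hypothetical hours-studied values for illustration (not real data):

import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: hours studied -> pass (1) / fail (0)
hours = np.array([[0.5], [1.0], [1.5], [2.0], [2.5], [3.0], [3.5], [4.0], [4.5], [5.0]])
passed = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

model = LogisticRegression()
model.fit(hours, passed)

# Probability of passing after 2.75 hours of study, then the 0.5-threshold decision
prob = model.predict_proba([[2.75]])[0, 1]
label = model.predict([[2.75]])[0]
print(prob, label)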