Data Analytics Using R

Logistic regression

Compiled and Presented by:

Dr. Chetna Arora
Logistic regression
• Logistic regression is a method used to model the relationship between a result
that has two possible outcomes (like "yes" or "no") and one or more factors that
might influence this result. These factors are the independent variables (also called
predictors or features): the things you measure or observe to see how they affect
the dependent variable (the binary outcome).
For example, let’s say you are trying to predict whether a student passes or fails an
exam (the result, which has two outcomes: pass/fail).
• The factors that might influence this result could be:
Study hours (the more hours a student studies, the more likely they are to pass).
Attendance (students who attend more classes might be more likely to pass).
Previous grades (students with higher grades in the past might be more likely to
pass).
These factors (study hours, attendance, previous grades) are the independent
variables, and the result (pass or fail) is the dependent variable. Logistic regression
helps determine how these factors combine to influence the outcome, and it
estimates the probability of a student passing or failing based on these factors.
• When we simply refer to "logistic regression," we are often talking about binary
logistic regression.
• In logistic regression, the independent variables (IVs) can indeed have more than two options or
values.

Binary dependent variable (DV): In logistic regression, the DV must have only two possible
outcomes (e.g., "yes" or "no", "success" or "failure").

Independent variables (IVs): The IVs can be of various types:

Binary IV: An IV with two categories (e.g., male/female, yes/no).


Categorical IV with more than two levels: An IV can have multiple categories (e.g., marital status:
single, married, divorced). In this case, you'd create dummy variables (binary variables for each
category).
Continuous IV: An IV with a wide range of numeric values (e.g., age, income, hours of study).
Ordinal IV: An IV with ordered categories (e.g., education level: high school < bachelor’s <
master’s).
So, while the DV is always binary in basic logistic regression, the IVs can have any number of
options (categories or numeric values). For example:

IV1 (binary): Gender (male/female).


IV2 (ordinal): Education level (high school, bachelor’s, master’s).
IV3 (continuous): Age (in years).
IV4 (categorical): Region (North, South, East, West).
Logistic regression models can handle this variety of IVs, and the coefficients estimated for each
IV show how they impact the probability of the binary outcome (DV).
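In R, a categorical IV only needs to be stored as a factor: glm() builds the dummy (indicator) variables for you. A minimal sketch with made-up variable names and values, just to show the mechanics:

# Hypothetical data: one binary DV, one categorical IV, one continuous IV
df <- data.frame(
  passed = c(1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0),
  region = factor(rep(c("North", "South", "East", "West"), 3)),
  age    = c(21, 34, 27, 45, 30, 23, 40, 29, 33, 26, 38, 31)
)

# glm() expands 'region' into dummy variables automatically
fit <- glm(passed ~ region + age, data = df, family = binomial)

# To see the dummy coding explicitly:
model.matrix(~ region, data = df)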
• Binary Logistic Regression: This is used when you have only
two options for your dependent variable (DV). For example,
predicting whether a student will pass (yes/no).
• Multinomial Logistic Regression: When your dependent
variable has more than two categories that are not ordered.
For instance, if you’re predicting the type of fruit someone
likes: apple, banana, or orange. Here, there’s no ranking; they
are just different categories.
• Ordinal Logistic Regression: This is used when your
dependent variable has more than two categories that are
ordered. For example, if you’re predicting levels of education:
high school, bachelor’s, master’s. Here, the categories have a
clear order (from lower to higher education).
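Each variant is fitted with a different R function. A brief sketch with made-up data and variable names, purely to show which functions are involved (nnet and MASS ship with R):

# Small synthetic example (data and names are made up for illustration)
set.seed(1)
n <- 60
dat <- data.frame(
  x     = rnorm(n),
  fruit = factor(sample(c("apple", "banana", "orange"), n, replace = TRUE)),
  edu   = factor(sample(c("high school", "bachelor's", "master's"), n, replace = TRUE),
                 levels = c("high school", "bachelor's", "master's"), ordered = TRUE)
)

# Multinomial logistic regression (unordered DV) -- nnet package
library(nnet)
multi_fit <- multinom(fruit ~ x, data = dat)

# Ordinal logistic regression (ordered DV) -- MASS package
library(MASS)
ord_fit <- polr(edu ~ x, data = dat)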
Why can’t we use Linear Regression?
• Linear Regression
• Output Range: The dependent variable (Y) can take any value from −∞ to +∞. This is
suitable for continuous outcomes, like predicting a person’s height or salary.
Negative infinity: Y can be as low as we can imagine, with no lower limit. For example,
a net salary could in principle be very low (taking debt into account). Positive infinity:
Similarly, Y can be arbitrarily high, with no upper limit (like an account balance). In
real life, height is always positive and would never approach negative infinity, but the
linear regression framework applies to any continuous variable, even if it has natural
limits in practice.
• Nature of Prediction: The model fits a line to minimize the difference between the
predicted values and actual values (least squares), making it effective for numeric
predictions.
• Logistic Regression
• Output: The dependent variable (Y) is binary, meaning it can only be 0 or 1. This
represents categories like "yes/no," "success/failure," or any two distinct groups.
• Nature of Prediction: Logistic regression predicts the probability of Y being 1 (success).
It uses the logistic function to map any real-valued number into a range between 0 and 1,
ensuring that predictions are valid probabilities.
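The practical difference shows up in the fitted values: a linear model can return "probabilities" below 0 or above 1, while the logistic model cannot. A quick sketch with made-up data:

# Made-up binary data for illustration
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
y <- c(0, 0, 0, 0, 1, 0, 1, 1, 1, 1)

lm_fit  <- lm(y ~ x)                          # straight line, unbounded
glm_fit <- glm(y ~ x, family = binomial)      # sigmoid, bounded in (0, 1)

range(predict(lm_fit))                        # can fall outside [0, 1]
range(predict(glm_fit, type = "response"))    # always stays within (0, 1)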
• Odds:
• In statistics, odds represent the ratio of the probability
of an event happening to the probability of it not
happening. It's often used to describe binary outcomes
(like success/failure, yes/no).
• Odds = P(event) / (1 − P(event))
• Where:
• P(event) is the probability of the event happening (a
value between 0 and 1).
• 1−P(event) is the probability of the event not happening.
• Example:
• If the probability of an event (say, winning a
game) is 0.8 (80%), then the odds of winning
are:
• Odds of winning = 0.8 / (1 − 0.8) = 0.8 / 0.2 = 4
• This means the odds are 4 to 1 in favor of
winning. For every 4 chances of winning,
there's 1 chance of not winning.
• Log Odds:
• The log odds (also called the "logit") is simply
the natural logarithm of the odds. It's used in
logistic regression to convert probabilities,
which are bounded between 0 and 1, into a
range that can be positive or negative, making
it easier to model with a linear equation.
• Log Odds = ln(P(event) / (1 − P(event)))
• Example:
• If the odds of winning a game are 4 (as in the
previous example), then the log odds are:
• Log Odds = ln(4) ≈ 1.386
• This means a log odds of 1.386 corresponds to
a 4-to-1 chance of winning the game. When we
say the odds are 4-to-1, it means the event is 4
times more likely to happen than not happen.
• Relationship Between Probability, Odds, and
Log Odds:
• If you have probability, you can calculate odds:
• Odds = P / (1 − P)
• If you have odds, you can calculate log odds:
• Log Odds=ln⁡(Odds)
• To go from log odds back to probability:
• P = e^(Log Odds) / (1 + e^(Log Odds))
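These conversions are easy to verify in R; a small sketch using the 0.8 example from the earlier slide (base R's plogis()/qlogis() perform the same conversions):

# Probability -> odds -> log odds, and back again
p <- 0.8
odds <- p / (1 - p)        # 0.8 / 0.2 = 4
log_odds <- log(odds)      # ln(4) ≈ 1.386, same as qlogis(p)

# Log odds back to probability
p_back <- exp(log_odds) / (1 + exp(log_odds))   # ≈ 0.8, same as plogis(log_odds)
p_back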
• Logistic Regression Models Log Odds:
• The logistic regression equation predicts the log odds of the
dependent variable (usually binary, like 0 or 1) rather than
directly predicting the probability itself. So, understanding log
odds helps you grasp the essence of the logistic regression
model.
• The logistic regression equation:
• Log Odds = ln(P / (1 − P)) = β0 + β1X1 + β2X2 + ⋯ + βnXn
• ​Here, the left-hand side is the log odds of the event happening,
and the right-hand side is a linear equation involving the
predictors (X1,X2,…,Xn) and their corresponding coefficients
(β1,β2,…,βn).
• Logistic regression transforms the log odds back
into a probability using the logistic (sigmoid)
function. This is important because we want to
predict probabilities (values between 0 and 1)
but the linear regression model works best with
unbounded values (which log odds provide).
• Once you have the log odds, you can convert
them to probabilities with this equation:
• P(y=1 | X) = 1 / (1 + e^−(β0 + β1X1 + ⋯ + βnXn))
• Where:
• P(y=1∣X) is the probability that the output y is 1 given
the input X.
• e is the base of the natural logarithm.
• β0 is the intercept (constant term).
• β1, β2, …, βn are the coefficients for the predictor
variables X1, X2, …, Xn respectively.
• X1,X2,…,Xn are the input features (independent
variables).
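Putting the two equations together in R: the intercept and slopes below are made-up values purely for illustration, not estimates from any real model.

# Assumed coefficients (for illustration only)
b0 <- -3.5; b1 <- 0.17; b2 <- -0.20
x1 <- 30;   x2 <- 8                      # one observation's predictor values

log_odds <- b0 + b1 * x1 + b2 * x2       # right-hand side of the logit equation
prob     <- 1 / (1 + exp(-log_odds))     # sigmoid; equivalent to plogis(log_odds)
prob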
Steps to Perform Binary Logistic Regression
in R
• Install and Load Necessary Packages: If you haven’t already, you
may want to install the dplyr package for data manipulation.

install.packages("dplyr")   # Install dplyr package
library(dplyr)              # Load dplyr package
Prepare Your Data: Ensure your data is in a data frame format,
with the dependent variable as a factor.

# Sample data frame
data <- data.frame(outcome    = c(0, 1, 0, 1, 1, 0, 1, 0, 1, 0),
                   predictor1 = c(22, 25, 30, 35, 40, 22, 30, 35, 28, 32),
                   predictor2 = c(5, 6, 8, 9, 10, 7, 8, 9, 5, 6))

# Convert outcome to factor
data$outcome <- as.factor(data$outcome)

Fit the Logistic Regression Model: Use the glm()
function to fit the model.

# Fit the logistic regression model
model <- glm(outcome ~ predictor1 + predictor2,
             data = data, family = binomial)
• model
Call:  glm(formula = outcome ~ predictor1 + predictor2, family = binomial, data = data)

Coefficients:
(Intercept)   predictor1   predictor2
    -3.4926       0.1664      -0.2041

Degrees of Freedom: 9 Total (i.e. Null);  7 Residual
Null Deviance:     13.86
Residual Deviance: 12.8      AIC: 18.8
• Call: This indicates the model call that was executed.
• glm: This is the function used to fit a Generalized Linear Model.
• formula = outcome ~ predictor1 + predictor2: This specifies the
relationship between the dependent variable (outcome) and the
independent variables (predictor1 and predictor2). It means that the
model predicts the outcome based on predictor1 and predictor2.
• family = binomial: This indicates that the model is a logistic
regression model, which is appropriate for binary outcomes (e.g.,
success/failure, yes/no).
• data = data: This specifies the dataset used to fit the model.
• Coefficients: (Intercept) = -3.4926, predictor1 = 0.1664, predictor2 = -0.2041
• This section lists the estimated coefficients for the model:
– (Intercept): The intercept (β₀) is -3.4926. This is the log odds of
the outcome when all predictors are 0. In the context of logistic
regression, it represents the baseline log odds.
– predictor1: The coefficient for predictor1 is 0.1664. This indicates
that for every one-unit increase in predictor1, the log odds of the
outcome increase by 0.1664, holding all other variables constant.
– predictor2: The coefficient for predictor2 is -0.2041. This indicates
that for every one-unit increase in predictor2, the log odds of the
outcome decrease by 0.2041, holding all other variables constant.
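Because these coefficients are on the log-odds scale, it is common to exponentiate them to get odds ratios, which are often easier to read; a short sketch using the model fitted above:

# Odds ratios: exponentiate the log-odds coefficients
exp(coef(model))
# e.g. exp(0.1664) ≈ 1.18: each one-unit increase in predictor1 multiplies
# the odds of outcome = 1 by about 1.18, holding predictor2 constant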
• Degrees of Freedom: 9 Total (i.e. Null); 7 Residual
• Degrees of Freedom: This refers to the number of
independent values that can vary in the analysis.
– Total (i.e. Null): This is the total degrees of freedom for
the model, which is 9 in this case. It represents the total
number of observations minus 1 (n - 1), where n is the
total number of data points.
– Residual: This is the degrees of freedom remaining after
fitting the model, which is 7 in this case. It is the total (null)
degrees of freedom minus the number of predictors estimated
(9 − 2 = 7); equivalently, the number of observations minus the
number of parameters estimated, including the intercept (10 − 3 = 7).
• Null Deviance: 13.86
• Null Deviance: This measures how well the
response variable is predicted by a model with
only the intercept (no predictors). It reflects
how much variation exists in the outcome
when only the mean outcome is used as a
predictor. A higher null deviance indicates
more variation.
• Residual Deviance: 12.8
• Residual Deviance: This measures how well the response variable is
predicted by the model that includes the predictors (predictor1 and
predictor2). A lower residual deviance compared to the null deviance
indicates that the predictors improve the model fit.
• AIC: 18.8
• AIC (Akaike Information Criterion): This is a measure of the relative
quality of the statistical model for a given set of data. It penalizes the
complexity of the model (i.e., the number of parameters) to avoid
overfitting. Lower AIC values indicate a better-fitting model. The AIC can
be used to compare different models: the one with the lowest AIC is
generally preferred.
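To check whether the predictors actually improve on the intercept-only model, the drop from the null deviance to the residual deviance can be tested with a chi-square (likelihood-ratio) test; a sketch, using the model object fitted earlier:

# Likelihood-ratio test: full model vs intercept-only model
lr_stat <- model$null.deviance - model$deviance    # 13.86 - 12.8 ≈ 1.06
lr_df   <- model$df.null - model$df.residual       # 9 - 7 = 2
pchisq(lr_stat, df = lr_df, lower.tail = FALSE)    # p-value of the improvement

# Analysis of deviance for the terms added one at a time
anova(model, test = "Chisq")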
View model summary
• summary(model)
Make Predictions: You can use the model to predict probabilities
of success.


# Predict probabilities
data$predicted_probabilities <- predict(model, type = "response")

Classify Outcomes: Convert probabilities to binary outcomes
based on a cutoff (commonly 0.5).

data$predicted_outcomes <- ifelse(data$predicted_probabilities > 0.5, 1, 0)
Evaluate Model Performance: Check the
accuracy of your predictions.
# Evaluate accuracy (confusion matrix of actual vs predicted)
table(data$outcome, data$predicted_outcomes)
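Overall accuracy can be computed from the same comparison; note that outcome was converted to a factor earlier, so it is converted back to numeric here:

# Proportion of observations classified correctly
accuracy <- mean(data$predicted_outcomes ==
                 as.numeric(as.character(data$outcome)))
accuracy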
• You can visualize the predicted probabilities to see how well the
model fits:

# Plot predicted probabilities
library(ggplot2)
ggplot(data, aes(x = predictor1, y = predicted_probabilities)) +
  geom_point() +
  geom_smooth(method = "glm", method.args = list(family = "binomial")) +
  labs(title = "Predicted Probabilities from Logistic Regression",
       x = "Predictor 1", y = "Predicted Probability")
Summary
