Python Machine Learning - Logistic Regression
Python Machine Learning - Logistic Regression
Dark mode
Dark code
HTML CSS JAVASCRIPT SQL PYTHON JAVA PHP BOOTSTRAP HOW TO W3.CSS C C++ C# REACT R JQUERY DJANGO
atp ot b Scatte
Matplotlib Bars
ADVERTISEMENT
Matplotlib Histograms
Matplotlib Pie Charts
Machine Learning
Getting Started
Mean Median Mode
Machine Learning - Logistic Regression
Standard Deviation
❮ Previous Next ❯
Percentile
Data Distribution
Normal Data Distribution
On this page, W3schools.com collaborates with NYC Data Science Academy, to deliver digital training content to our students.
Scatter Plot
Linear Regression
Logistic Regression
Polynomial Regression
Multiple Regression
Scale
Logistic regression aims to solve classification problems. It does this by predicting categorical outcomes, unlike linear regression
Train/Test
that predicts a continuous outcome.
Decision Tree
Confusion Matrix In the simplest case there are two outcomes, which is called binomial, an example of which is predicting if a tumor is malignant
Hierarchical Clustering or benign. Other cases have more than two outcomes to classify, in this case it is called multinomial. A common example for
Logistic Regression multinomial logistic regression would be predicting the class of an iris flower between 3 different species.
Grid Search
Here we will be using basic logistic regression to predict a binomial variable. This means it has only two possible outcomes.
Categorical Data
K-means
Bootstrap Aggregation
Cross Validation
How does it work?
In Python we have modules that will do the work for us. Start by importing the NumPy module.
import numpy
#Note: X has to be reshaped into a column from a row for the LogisticRegression() function to work.
#y represents whether or not the tumor is cancerous (0 for "No", 1 for "Yes").
y = numpy.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
We will use a method from the sklearn module, so we will have to import that module as well:
From the sklearn module we will use the LogisticRegression() method to create a logistic regression object.
This object has a method called fit() that takes the independent and dependent values as parameters and fills the regression
object with data that describes the relationship:
logr = linear_model.LogisticRegression()
logr.fit(X,y)
Now we have a logistic regression object that is ready to whether a tumor is cancerous based on the tumor size:
ADVERTISEMENT
import numpy
from sklearn import linear_model
logr = linear_model.LogisticRegression()
logr.fit(X,y)
Result
[0]
Run example »
We have predicted that a tumor with a size of 3.46mm will not be cancerous.
ADVERTISEMENT
Coefficient
In logistic regression the coefficient is the expected change in log-odds of having the outcome per unit change in X.
This does not have the most intuitive understanding so let's use it to create something that makes more sense, odds.
Example
See the whole example in action:
import numpy
from sklearn import linear_model
logr = linear_model.LogisticRegression()
logr.fit(X,y)
log_odds = logr.coef_
odds = numpy.exp(log_odds)
print(odds)
Result
[4.03541657]
Run example »
This tells us that as the size of a tumor increases by 1mm the odds of it being a cancerous tumor increases by 4x.
Probability
The coefficient and intercept values can be used to find the probability that each tumor is cancerous.
Create a function that uses the model's coefficient and intercept values to return a new value. This new value represents
probability that the given observation is a tumor:
def logit2prob(logr,x):
log_odds = logr.coef_ * x + logr.intercept_
odds = numpy.exp(log_odds)
probability = odds / (1 + odds)
return(probability)
Function Explained
To find the log-odds for each observation, we must first create a formula that looks similar to the one from linear regression,
extracting the coefficient and the intercept.
odds = numpy.exp(log_odds)
Now that we have the odds, we can convert it to probability by dividing it by 1 plus the odds.
Let us now use the function with what we have learned to find out the probability that each tumor is cancerous.
Example
See the whole example in action:
import numpy
from sklearn import linear_model
X = numpy.array([3.78, 2.44, 2.09, 0.14, 1.72, 1.65, 4.92, 4.37, 4.96, 4.52, 3.69, 5.88]).reshape(-1,1)
y = numpy.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
logr = linear_model.LogisticRegression()
logr.fit(X,y)
print(logit2prob(logr, X))
Result
[[0.60749955]
[0.19268876]
[0.12775886]
[0.00955221]
[0.08038616]
[0.07345637]
[0.88362743]
[0.77901378]
[0.88924409]
[0.81293497]
[0.57719129]
[0.96664243]]
Run example »
Results Explained
3.78 0.61 The probability that a tumor with the size 3.78cm is cancerous is 61%.
2.44 0.19 The probability that a tumor with the size 2.44cm is cancerous is 19%.
2.09 0.13 The probability that a tumor with the size 2.09cm is cancerous is 13%.
ADVERTISEMENT
ADVERTISEMENT
FORUM | ABOUT
W3Schools is optimized for learning and training. Examples might be simplified to improve reading and learning. Tutorials, references, and examples are constantly reviewed to avoid errors, but we
cannot warrant full correctness of all content. While using W3Schools, you agree to have read and accepted our terms of use, cookie and privacy policy.