Logistic Regression
Classification
Logistic Regression
• Logistic regression is a machine learning classification algorithm used to predict the probability of a categorical dependent variable.
• In logistic regression, the dependent variable is a binary variable that contains data coded as 1 (yes, success, etc.) or 0 (no, failure, etc.).
• In other words, the logistic regression model predicts P(y = 1) as a function of x.
Remember the Hypothesis of Linear Regression
Logistic Regression
• Logistic regression simply applies a transformation on top of the linear regression hypothesis.
• For logistic regression, focusing on binary classification here, we have class 0 and class 1.
• To compare with the target, we want to constrain predictions to values between 0 and 1.
• That's why the sigmoid function is applied to the raw model output: it gives the model the ability to predict probabilities.
LOGISTIC REGRESSION
• WHAT HYPOTHESIS FUNCTION RETURNS IS THE
PROBABILITY THAT Y = 1, GIVEN X,
PARAMETERIZED BY Θ, WRITTEN AS: H(X) = P(Y =
1|X; Θ).
• DECISION BOUNDARY CAN BE DESCRIBED AS:
PREDICT 1, IF ΘᵀX ≥ 0 → H(X) ≥ 0.5; PREDICT 0,
IF ΘᵀX < 0 → H(X) < 0.5.
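As a quick sketch (not part of the slides), the hypothesis and decision boundary above can be written out directly; the names `h` and `predict` are illustrative:

```python
import math

def h(theta, x):
    """Hypothesis: the sigmoid of the linear combination theta^T x."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return 1.0 / (1.0 + math.exp(-z))

def predict(theta, x):
    """Decision boundary: predict 1 when theta^T x >= 0, i.e. h(x) >= 0.5."""
    return 1 if h(theta, x) >= 0.5 else 0
```

Note that the 0.5 threshold on h(x) is exactly the 0 threshold on θᵀx, since sigmoid(0) = 0.5.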
Logistic Function
• The logistic function, also called the sigmoid function, was developed by statisticians to describe properties of population growth in ecology: rising quickly, then maxing out at the carrying capacity of the environment. It's an S-shaped curve that can take any real-valued number and map it into a value between 0 and 1, but never exactly at those limits: 1 / (1 + e^-value)
• Here e is the base of the natural logarithm (Euler's number, or the exp() function in your spreadsheet) and value is the actual numerical value that you want to transform.
Logistic Function
• Plot of the numbers between -5 and 5 transformed into the range 0 to 1 using the logistic function.
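The values behind that plot can be reproduced numerically; a minimal sketch, with `logistic` as an illustrative name:

```python
import math

def logistic(value):
    # 1 / (1 + e^-value): an S-shaped map from any real number into (0, 1)
    return 1.0 / (1.0 + math.exp(-value))

# the integers -5..5 transformed into the range (0, 1)
points = [(v, logistic(v)) for v in range(-5, 6)]
```

Feeding `points` to any plotting library reproduces the S-shaped curve: values near -5 map close to 0, values near 5 close to 1, and 0 maps to exactly 0.5.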
Logistic Regression
• Input values (x) are combined linearly using weights or coefficient values (referred to by the Greek letter beta) to predict an output value (y).
• A key difference from linear regression is that the output value being modeled is a binary value (0 or 1) rather than a numeric value.
• Below is an example logistic regression equation:
• y = e^(b0 + b1*x) / (1 + e^(b0 + b1*x))
• where y is the predicted output, b0 is the bias or intercept term, and b1 is the coefficient for the single input value (x).
• Each column in your input data has an associated b coefficient (a constant real value) that must be learned from your training data.
• The actual representation of the model that you would store in memory or in a file is the set of coefficients in the equation (the beta values, or b's).
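The example equation above can be sketched as a one-line function (the name `predict_probability` is illustrative); it is algebraically the same as applying the sigmoid to b0 + b1*x:

```python
import math

def predict_probability(x, b0, b1):
    """y = e^(b0 + b1*x) / (1 + e^(b0 + b1*x))"""
    z = b0 + b1 * x
    return math.exp(z) / (1.0 + math.exp(z))
```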
Technical Interlude
• Logistic regression models the probability of the default class (e.g. the first class).
• For example, if we are modeling people's sex as male or female from their height, then the first class could be male, and the logistic regression model could be written as the probability of male given a person's height, or more formally:
• P(sex = male | height)
• Written another way, we are modeling the probability that an input (X) belongs to the default class (Y = 1), which we can write formally as:
• P(X) = P(Y = 1 | X)
LOG ODDS
• LOGISTIC REGRESSION IS A LINEAR METHOD, BUT THE PREDICTIONS ARE TRANSFORMED USING THE
LOGISTIC FUNCTION.
• THE IMPACT OF THIS IS THAT WE CAN NO LONGER UNDERSTAND THE PREDICTIONS AS A LINEAR
COMBINATION OF THE INPUTS AS WE CAN WITH LINEAR REGRESSION, FOR EXAMPLE, THE MODEL
CAN BE STATED AS: P(X) = E^(B0 + B1*X) / (1 + E^(B0 + B1*X))
• WE CAN TURN AROUND THE ABOVE EQUATION AS FOLLOWS (REMEMBER WE CAN REMOVE THE E
FROM ONE SIDE BY ADDING A NATURAL LOGARITHM (LN) TO THE OTHER): NEXT POINT
• LN(P(X) / 1 – P(X)) = B0 + B1 * X
• THIS IS USEFUL BECAUSE WE CAN SEE THAT THE CALCULATION OF THE OUTPUT ON THE RIGHT IS
LINEAR AGAIN (JUST LIKE LINEAR REGRESSION), AND THE INPUT ON THE LEFT IS A LOG OF THE
PROBABILITY OF THE DEFAULT CLASS.
Log Odds
• ln(p(X) / (1 − p(X))) = b0 + b1 * X
• The ratio inside the logarithm on the left is called the odds of the default class.
• Odds are calculated as the probability of the event divided by the probability of not the event, e.g. 0.8 / (1 − 0.8), which gives odds of 4. So we could instead write: ln(odds) = b0 + b1 * X
• Because the odds are log-transformed, we call this left-hand side the log-odds, or the logit.
• We can move the exponent back to the right and write it as:
• odds = e^(b0 + b1 * X)
• All of this helps us understand that the model is indeed still a linear combination of the inputs, but that this linear combination relates to the log-odds of the default class.
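The 0.8 example above, round-tripped through odds, log-odds, and back:

```python
import math

p = 0.8
odds = p / (1 - p)          # probability of the event / probability of not the event, ≈ 4
log_odds = math.log(odds)   # the left-hand side of ln(odds) = b0 + b1*X
back_to_odds = math.exp(log_odds)              # odds = e^(b0 + b1*X)
back_to_p = back_to_odds / (1 + back_to_odds)  # recover the probability
```

Going from the linear output b0 + b1*X back to a probability is exactly the last two lines, which is why the sigmoid appears in the model equation.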
Cost Function
• Linear regression uses least-squared error as its loss function, which gives a convex graph, so we can complete the optimization by finding its vertex as the global minimum.
• However, that is no longer an option for logistic regression. Since the hypothesis has changed, least-squared error computed with the sigmoid function applied to the raw model output results in a non-convex graph with local minimums.
Cost Function
• Intuitively, we want to assign more punishment when predicting 1 while the actual is 0, and when predicting 0 while the actual is 1.
• The loss function of logistic regression, called logistic loss, does exactly this.
• If y = 1, looking at the plot on the left: when the prediction is 1, the cost is 0; when the prediction is 0, the learning algorithm is punished with a very large cost.
• Similarly, if y = 0, the plot on the right shows that predicting 0 has no punishment, but predicting 1 has a large cost.
Cost Function
• Another advantage of this loss function is that although we look at it for y = 1 and y = 0 separately, it can be written as one single formula, which brings convenience for calculation:
• cost(h(x), y) = −[y log(h(x)) + (1 − y) log(1 − h(x))]
• So the cost function of the model is the average over all training data samples:
• J(θ) = −(1/m) Σᵢ [yᵢ log(h(xᵢ)) + (1 − yᵢ) log(1 − h(xᵢ))]
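The single-formula logistic loss and the cost summed over training samples can be sketched as follows (function names are illustrative):

```python
import math

def logistic_loss(y, p):
    """Single-sample logistic loss: -[y*log(p) + (1 - y)*log(1 - p)]."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def cost(ys, ps):
    """Cost J: the average logistic loss over all training samples."""
    return sum(logistic_loss(y, p) for y, p in zip(ys, ps)) / len(ys)
```

Note the behavior matches the plots: a confident correct prediction costs almost nothing, while a confident wrong one (e.g. y = 1 but p near 0) is punished with a very large cost.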
Optimization
• With the right learning algorithm, we can start to fit the model by minimizing J(θ) as a function of θ to find the optimal parameters.
• We can still apply gradient descent as the optimization algorithm.
• It takes the partial derivative of J with respect to θ (the slope of J), and updates θ on each iteration with a selected learning rate α until gradient descent has converged.
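A minimal sketch of gradient descent for a single-input logistic model; the hyperparameters (α = 0.1, 5000 iterations) and the toy data in the usage note are illustrative assumptions, not prescriptions:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gradient_descent(xs, ys, alpha=0.1, iters=5000):
    """Fit (b0, b1) for a single input by minimizing J(theta)."""
    b0 = b1 = 0.0
    m = len(xs)
    for _ in range(iters):
        # partial derivatives of J with respect to b0 and b1
        g0 = sum(sigmoid(b0 + b1 * x) - y for x, y in zip(xs, ys)) / m
        g1 = sum((sigmoid(b0 + b1 * x) - y) * x for x, y in zip(xs, ys)) / m
        # step each parameter against its slope, scaled by the learning rate
        b0 -= alpha * g0
        b1 -= alpha * g1
    return b0, b1
```

On a tiny separable dataset such as xs = [-2, -1, 1, 2], ys = [0, 0, 1, 1], the fitted model pushes predictions for the positive class toward 1 and the negative class toward 0.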
Newton's Method
• Newton's method is another popular optimization algorithm, one that applies a different approach to reach the global minimum of the cost function.
• Similar to gradient descent, we first take the partial derivative of J(θ), which is the slope of J(θ), and denote it f(θ).
• Instead of decreasing θ by a chosen learning rate α multiplied by f(θ), Newton's method gets an updated θ at the point where the tangent line of f(θ) at the previous θ intersects the x axis.
• After a number of iterations, Newton's method will converge at f(θ) = 0.
Newton's Method
• In the simplified plot, starting from the right, the yellow dotted line is the tangent of f(θ) at θ0.
• It determines the position of θ1, and the distance from θ0 to θ1 is Δ.
• This process repeats until it finds the optimal θ that satisfies f(θ) = 0, which is θ3 in this plot.
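The tangent-line update can be sketched in one dimension. Here J(θ) = (θ − 3)² is an assumed toy cost (not from the slides), so its slope is f(θ) = 2(θ − 3), and the minimum of J sits where f(θ) = 0:

```python
def newtons_method(f, f_prime, theta, iters=25):
    """Each step moves theta to where the tangent line of f at the
    current theta crosses the x axis: theta -= f(theta) / f'(theta)."""
    for _ in range(iters):
        theta = theta - f(theta) / f_prime(theta)
    return theta

# Toy cost J(theta) = (theta - 3)^2 has slope f(theta) = 2*(theta - 3),
# so the minimum of J is the root of f at theta = 3.
f = lambda t: 2.0 * (t - 3.0)
f_prime = lambda t: 2.0
optimum = newtons_method(f, f_prime, theta=10.0)
```

For the logistic cost, f′(θ) becomes the Hessian of J, which is why Newton's method takes larger, curvature-aware steps than gradient descent.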
Maximum Likelihood Estimation
• Maximum-likelihood estimation is a common learning algorithm used by a variety of machine learning algorithms, although it does make assumptions about the distribution of your data.
• The best coefficients would result in a model that predicts a value very close to 1 (e.g. male) for the default class and a value very close to 0 (e.g. female) for the other class.
• The intuition for maximum likelihood in logistic regression is that a search procedure seeks values for the coefficients (beta values) that minimize the error between the probabilities predicted by the model and those in the data (e.g. a probability of 1 if the data is the primary class).
Prepare Data for Logistic Regression
• The assumptions logistic regression makes about the distribution and relationships in your data are much the same as the assumptions made in linear regression.
• Binary output variable: this might be obvious, as we have already mentioned it, but logistic regression is intended for binary (two-class) classification problems. It predicts the probability of an instance belonging to the default class, which can be snapped into a 0 or 1 classification.
• Remove noise: logistic regression assumes no error in the output variable (y); consider removing outliers and possibly misclassified instances from your training data.
• Remove correlated inputs: like linear regression, the model can overfit if you have multiple highly correlated inputs. Consider calculating the pairwise correlations between all inputs and removing the highly correlated ones.
• Linear relationship: the model does assume a linear relationship between the input variables and the (log-odds of the) output. Data transforms of your input variables that better expose this linear relationship can result in a more accurate model.
• Failure to converge: it is possible for the maximum-likelihood estimation process that learns the coefficients to fail to converge. This can happen if there are many highly correlated inputs in your data or the data is very sparse (e.g. lots of zeros in your input data).
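The pairwise-correlation check suggested above can be sketched with a hand-rolled Pearson correlation (in practice a library routine such as numpy's corrcoef does the same thing over all columns at once):

```python
def pearson_r(a, b):
    """Pairwise Pearson correlation between two input columns."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    sd_a = sum((x - mean_a) ** 2 for x in a) ** 0.5
    sd_b = sum((y - mean_b) ** 2 for y in b) ** 0.5
    return cov / (sd_a * sd_b)
```

Columns whose correlation is near ±1 carry nearly the same information; dropping one of each such pair is the cleanup step the slide recommends.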

More Related Content

What's hot

Introduction to Maximum Likelihood Estimator
Introduction to Maximum Likelihood EstimatorIntroduction to Maximum Likelihood Estimator
Introduction to Maximum Likelihood Estimator
Amir Al-Ansary
 
Logistic Regression in Python | Logistic Regression Example | Machine Learnin...
Logistic Regression in Python | Logistic Regression Example | Machine Learnin...Logistic Regression in Python | Logistic Regression Example | Machine Learnin...
Logistic Regression in Python | Logistic Regression Example | Machine Learnin...
Edureka!
 
Machine Learning With Logistic Regression
Machine Learning  With Logistic RegressionMachine Learning  With Logistic Regression
Machine Learning With Logistic Regression
Knoldus Inc.
 
Logistic regression
Logistic regressionLogistic regression
Logistic regression
DrZahid Khan
 
Linear Regression Analysis | Linear Regression in Python | Machine Learning A...
Linear Regression Analysis | Linear Regression in Python | Machine Learning A...Linear Regression Analysis | Linear Regression in Python | Machine Learning A...
Linear Regression Analysis | Linear Regression in Python | Machine Learning A...
Simplilearn
 
Logistic regression
Logistic regressionLogistic regression
Logistic regression
Venkata Reddy Konasani
 
Support vector machines (svm)
Support vector machines (svm)Support vector machines (svm)
Support vector machines (svm)
Sharayu Patil
 
Logistic regression : Use Case | Background | Advantages | Disadvantages
Logistic regression : Use Case | Background | Advantages | DisadvantagesLogistic regression : Use Case | Background | Advantages | Disadvantages
Logistic regression : Use Case | Background | Advantages | Disadvantages
Rajat Sharma
 
Logistic Regression Analysis
Logistic Regression AnalysisLogistic Regression Analysis
Logistic Regression Analysis
COSTARCH Analytical Consulting (P) Ltd.
 
Decision Trees
Decision TreesDecision Trees
Decision Trees
Student
 
Principal component analysis and lda
Principal component analysis and ldaPrincipal component analysis and lda
Principal component analysis and lda
Suresh Pokharel
 
Logistic regression
  Logistic regression  Logistic regression
Logistic regression
Learnbay Datascience
 
Principal Component Analysis (PCA) and LDA PPT Slides
Principal Component Analysis (PCA) and LDA PPT SlidesPrincipal Component Analysis (PCA) and LDA PPT Slides
Principal Component Analysis (PCA) and LDA PPT Slides
AbhishekKumar4995
 
Logistic Regression | Logistic Regression In Python | Machine Learning Algori...
Logistic Regression | Logistic Regression In Python | Machine Learning Algori...Logistic Regression | Logistic Regression In Python | Machine Learning Algori...
Logistic Regression | Logistic Regression In Python | Machine Learning Algori...
Simplilearn
 
Logistic regression
Logistic regressionLogistic regression
Logistic regression
VARUN KUMAR
 
ML - Multiple Linear Regression
ML - Multiple Linear RegressionML - Multiple Linear Regression
ML - Multiple Linear Regression
Andrew Ferlitsch
 
Lasso regression
Lasso regressionLasso regression
Lasso regression
Masayuki Tanaka
 
Linear regression with gradient descent
Linear regression with gradient descentLinear regression with gradient descent
Linear regression with gradient descent
Suraj Parmar
 
Methods of Optimization in Machine Learning
Methods of Optimization in Machine LearningMethods of Optimization in Machine Learning
Methods of Optimization in Machine Learning
Knoldus Inc.
 
K - Nearest neighbor ( KNN )
K - Nearest neighbor  ( KNN )K - Nearest neighbor  ( KNN )
K - Nearest neighbor ( KNN )
Mohammad Junaid Khan
 

What's hot (20)

Introduction to Maximum Likelihood Estimator
Introduction to Maximum Likelihood EstimatorIntroduction to Maximum Likelihood Estimator
Introduction to Maximum Likelihood Estimator
 
Logistic Regression in Python | Logistic Regression Example | Machine Learnin...
Logistic Regression in Python | Logistic Regression Example | Machine Learnin...Logistic Regression in Python | Logistic Regression Example | Machine Learnin...
Logistic Regression in Python | Logistic Regression Example | Machine Learnin...
 
Machine Learning With Logistic Regression
Machine Learning  With Logistic RegressionMachine Learning  With Logistic Regression
Machine Learning With Logistic Regression
 
Logistic regression
Logistic regressionLogistic regression
Logistic regression
 
Linear Regression Analysis | Linear Regression in Python | Machine Learning A...
Linear Regression Analysis | Linear Regression in Python | Machine Learning A...Linear Regression Analysis | Linear Regression in Python | Machine Learning A...
Linear Regression Analysis | Linear Regression in Python | Machine Learning A...
 
Logistic regression
Logistic regressionLogistic regression
Logistic regression
 
Support vector machines (svm)
Support vector machines (svm)Support vector machines (svm)
Support vector machines (svm)
 
Logistic regression : Use Case | Background | Advantages | Disadvantages
Logistic regression : Use Case | Background | Advantages | DisadvantagesLogistic regression : Use Case | Background | Advantages | Disadvantages
Logistic regression : Use Case | Background | Advantages | Disadvantages
 
Logistic Regression Analysis
Logistic Regression AnalysisLogistic Regression Analysis
Logistic Regression Analysis
 
Decision Trees
Decision TreesDecision Trees
Decision Trees
 
Principal component analysis and lda
Principal component analysis and ldaPrincipal component analysis and lda
Principal component analysis and lda
 
Logistic regression
  Logistic regression  Logistic regression
Logistic regression
 
Principal Component Analysis (PCA) and LDA PPT Slides
Principal Component Analysis (PCA) and LDA PPT SlidesPrincipal Component Analysis (PCA) and LDA PPT Slides
Principal Component Analysis (PCA) and LDA PPT Slides
 
Logistic Regression | Logistic Regression In Python | Machine Learning Algori...
Logistic Regression | Logistic Regression In Python | Machine Learning Algori...Logistic Regression | Logistic Regression In Python | Machine Learning Algori...
Logistic Regression | Logistic Regression In Python | Machine Learning Algori...
 
Logistic regression
Logistic regressionLogistic regression
Logistic regression
 
ML - Multiple Linear Regression
ML - Multiple Linear RegressionML - Multiple Linear Regression
ML - Multiple Linear Regression
 
Lasso regression
Lasso regressionLasso regression
Lasso regression
 
Linear regression with gradient descent
Linear regression with gradient descentLinear regression with gradient descent
Linear regression with gradient descent
 
Methods of Optimization in Machine Learning
Methods of Optimization in Machine LearningMethods of Optimization in Machine Learning
Methods of Optimization in Machine Learning
 
K - Nearest neighbor ( KNN )
K - Nearest neighbor  ( KNN )K - Nearest neighbor  ( KNN )
K - Nearest neighbor ( KNN )
 

Similar to Logistic regression

UNIT1-part2.pptx
UNIT1-part2.pptxUNIT1-part2.pptx
UNIT1-part2.pptx
AshokRachapalli1
 
Logistic regression (blyth 2006) (simplified)
Logistic regression (blyth 2006) (simplified)Logistic regression (blyth 2006) (simplified)
Logistic regression (blyth 2006) (simplified)
MikeBlyth
 
Logmodels2
Logmodels2Logmodels2
Logmodels2
Logmodels2Logmodels2
BOOLEAN ALGEBRA & LOGIC GATE
BOOLEAN ALGEBRA & LOGIC GATEBOOLEAN ALGEBRA & LOGIC GATE
BOOLEAN ALGEBRA & LOGIC GATE
Ideal Eyes Business College
 
07 logistic regression and stochastic gradient descent
07 logistic regression and stochastic gradient descent07 logistic regression and stochastic gradient descent
07 logistic regression and stochastic gradient descent
Subhas Kumar Ghosh
 
2. Linear regression with one variable.pptx
2. Linear regression with one variable.pptx2. Linear regression with one variable.pptx
2. Linear regression with one variable.pptx
Emad Nabil
 
Lecture 5 - Gradient Descent, a lecture in subject module Statistical & Machi...
Lecture 5 - Gradient Descent, a lecture in subject module Statistical & Machi...Lecture 5 - Gradient Descent, a lecture in subject module Statistical & Machi...
Lecture 5 - Gradient Descent, a lecture in subject module Statistical & Machi...
Maninda Edirisooriya
 
Standards math 8
Standards math 8Standards math 8
Standards math 8
Carisa Koch
 
Linear Regression and Logistic Regression in ML
Linear Regression and Logistic Regression in MLLinear Regression and Logistic Regression in ML
Linear Regression and Logistic Regression in ML
Kumud Arora
 
Simple linear regression (Updated).pptx
Simple linear regression (Updated).pptxSimple linear regression (Updated).pptx
Simple linear regression (Updated).pptx
discountglasstx
 
K-map Digital Logic Design DLD Theory l
K-map  Digital Logic Design DLD Theory lK-map  Digital Logic Design DLD Theory l
K-map Digital Logic Design DLD Theory l
khanyz4884
 
simplex method -1.pptx
simplex method -1.pptxsimplex method -1.pptx
simplex method -1.pptx
KulsumPaleja1
 
Regression
RegressionRegression
Regression
RAVI PRASAD K.J.
 
application of partial differentiation
application of partial differentiationapplication of partial differentiation
application of partial differentiation
eteaching
 
Ekreg ho-11-spatial ec 231112
Ekreg ho-11-spatial ec 231112Ekreg ho-11-spatial ec 231112
Ekreg ho-11-spatial ec 231112
Catur Purnomo
 
Regression Analysis
Regression AnalysisRegression Analysis
Regression Analysis
somimemon
 
How to use Logistic Regression in GIS using ArcGIS and R statistics
How to use Logistic Regression in GIS using ArcGIS and R statisticsHow to use Logistic Regression in GIS using ArcGIS and R statistics
How to use Logistic Regression in GIS using ArcGIS and R statistics
Omar F. Althuwaynee
 
Linear regression in machine learning
Linear regression in machine learningLinear regression in machine learning
Linear regression in machine learning
Shajun Nisha
 
Quantitative Analysis Homework Help
Quantitative Analysis Homework HelpQuantitative Analysis Homework Help
Quantitative Analysis Homework Help
Excel Homework Help
 

Similar to Logistic regression (20)

UNIT1-part2.pptx
UNIT1-part2.pptxUNIT1-part2.pptx
UNIT1-part2.pptx
 
Logistic regression (blyth 2006) (simplified)
Logistic regression (blyth 2006) (simplified)Logistic regression (blyth 2006) (simplified)
Logistic regression (blyth 2006) (simplified)
 
Logmodels2
Logmodels2Logmodels2
Logmodels2
 
Logmodels2
Logmodels2Logmodels2
Logmodels2
 
BOOLEAN ALGEBRA & LOGIC GATE
BOOLEAN ALGEBRA & LOGIC GATEBOOLEAN ALGEBRA & LOGIC GATE
BOOLEAN ALGEBRA & LOGIC GATE
 
07 logistic regression and stochastic gradient descent
07 logistic regression and stochastic gradient descent07 logistic regression and stochastic gradient descent
07 logistic regression and stochastic gradient descent
 
2. Linear regression with one variable.pptx
2. Linear regression with one variable.pptx2. Linear regression with one variable.pptx
2. Linear regression with one variable.pptx
 
Lecture 5 - Gradient Descent, a lecture in subject module Statistical & Machi...
Lecture 5 - Gradient Descent, a lecture in subject module Statistical & Machi...Lecture 5 - Gradient Descent, a lecture in subject module Statistical & Machi...
Lecture 5 - Gradient Descent, a lecture in subject module Statistical & Machi...
 
Standards math 8
Standards math 8Standards math 8
Standards math 8
 
Linear Regression and Logistic Regression in ML
Linear Regression and Logistic Regression in MLLinear Regression and Logistic Regression in ML
Linear Regression and Logistic Regression in ML
 
Simple linear regression (Updated).pptx
Simple linear regression (Updated).pptxSimple linear regression (Updated).pptx
Simple linear regression (Updated).pptx
 
K-map Digital Logic Design DLD Theory l
K-map  Digital Logic Design DLD Theory lK-map  Digital Logic Design DLD Theory l
K-map Digital Logic Design DLD Theory l
 
simplex method -1.pptx
simplex method -1.pptxsimplex method -1.pptx
simplex method -1.pptx
 
Regression
RegressionRegression
Regression
 
application of partial differentiation
application of partial differentiationapplication of partial differentiation
application of partial differentiation
 
Ekreg ho-11-spatial ec 231112
Ekreg ho-11-spatial ec 231112Ekreg ho-11-spatial ec 231112
Ekreg ho-11-spatial ec 231112
 
Regression Analysis
Regression AnalysisRegression Analysis
Regression Analysis
 
How to use Logistic Regression in GIS using ArcGIS and R statistics
How to use Logistic Regression in GIS using ArcGIS and R statisticsHow to use Logistic Regression in GIS using ArcGIS and R statistics
How to use Logistic Regression in GIS using ArcGIS and R statistics
 
Linear regression in machine learning
Linear regression in machine learningLinear regression in machine learning
Linear regression in machine learning
 
Quantitative Analysis Homework Help
Quantitative Analysis Homework HelpQuantitative Analysis Homework Help
Quantitative Analysis Homework Help
 

Recently uploaded

Procurement and Contract Strategy in Malaysia
Procurement and Contract Strategy in MalaysiaProcurement and Contract Strategy in Malaysia
Procurement and Contract Strategy in Malaysia
SingLingLim1
 
buy a fake University of London diploma supplement
buy a fake University of London diploma supplementbuy a fake University of London diploma supplement
buy a fake University of London diploma supplement
GlethDanold
 
Computer Graphics - Cartesian Coordinate System.pdf
Computer Graphics - Cartesian Coordinate System.pdfComputer Graphics - Cartesian Coordinate System.pdf
Computer Graphics - Cartesian Coordinate System.pdf
Amol Gaikwad
 
Computer Vision and GenAI for Geoscientists.pptx
Computer Vision and GenAI for Geoscientists.pptxComputer Vision and GenAI for Geoscientists.pptx
Computer Vision and GenAI for Geoscientists.pptx
Yohanes Nuwara
 
AI INTERACTION WITH HUMAN IN DAILY LIFE (1).pptx
AI INTERACTION WITH HUMAN IN DAILY LIFE (1).pptxAI INTERACTION WITH HUMAN IN DAILY LIFE (1).pptx
AI INTERACTION WITH HUMAN IN DAILY LIFE (1).pptx
MoinKhan447017
 
Future Networking v Energy Limits ICTON 2024 Bari Italy
Future Networking v Energy Limits ICTON 2024 Bari ItalyFuture Networking v Energy Limits ICTON 2024 Bari Italy
Future Networking v Energy Limits ICTON 2024 Bari Italy
University of Hertfordshire
 
Electrical Engineering, DC - AC Machines
Electrical Engineering, DC - AC MachinesElectrical Engineering, DC - AC Machines
Electrical Engineering, DC - AC Machines
Jason J Pulikkottil
 
Classification of optical fibers and Modes of Optical Fiber
Classification of optical fibers and Modes of Optical FiberClassification of optical fibers and Modes of Optical Fiber
Classification of optical fibers and Modes of Optical Fiber
ShailajaUdtewar3
 
SM_5th-SEM_Cse_Mobile-Computing.pdf_________________
SM_5th-SEM_Cse_Mobile-Computing.pdf_________________SM_5th-SEM_Cse_Mobile-Computing.pdf_________________
SM_5th-SEM_Cse_Mobile-Computing.pdf_________________
smarakd64
 
Failure Engineering - Architecting Resilient API's
Failure Engineering - Architecting Resilient API'sFailure Engineering - Architecting Resilient API's
Failure Engineering - Architecting Resilient API's
Akash Saxena
 
Good Energy Haus: PHN Presents Building Electrification, A Passive House Symp...
Good Energy Haus: PHN Presents Building Electrification, A Passive House Symp...Good Energy Haus: PHN Presents Building Electrification, A Passive House Symp...
Good Energy Haus: PHN Presents Building Electrification, A Passive House Symp...
TE Studio
 
DOUBLE SKIN FACADE PRESENTATION SLIDE.pdf
DOUBLE SKIN FACADE PRESENTATION SLIDE.pdfDOUBLE SKIN FACADE PRESENTATION SLIDE.pdf
DOUBLE SKIN FACADE PRESENTATION SLIDE.pdf
shakyabhumika51
 
Fuel-Dlivery-Project PowerPoint presentations
Fuel-Dlivery-Project  PowerPoint presentationsFuel-Dlivery-Project  PowerPoint presentations
Fuel-Dlivery-Project PowerPoint presentations
jithujithin657
 
一比一原版(surrey毕业证书)英国萨里大学毕业证如何办理
一比一原版(surrey毕业证书)英国萨里大学毕业证如何办理一比一原版(surrey毕业证书)英国萨里大学毕业证如何办理
一比一原版(surrey毕业证书)英国萨里大学毕业证如何办理
g1toa2w
 
Structural Dynamics and Earthquake Engineering
Structural Dynamics and Earthquake EngineeringStructural Dynamics and Earthquake Engineering
Structural Dynamics and Earthquake Engineering
tushardatta
 
The X Window System Graphical User Interface
The X Window System Graphical User InterfaceThe X Window System Graphical User Interface
The X Window System Graphical User Interface
hindirahuerfano
 
Software Requirement Engineering Analyzing the Problem.pdf
Software Requirement Engineering Analyzing the Problem.pdfSoftware Requirement Engineering Analyzing the Problem.pdf
Software Requirement Engineering Analyzing the Problem.pdf
jeevaakatiravanhod
 
Trends in digital era-Programming Knowledge
Trends in digital era-Programming KnowledgeTrends in digital era-Programming Knowledge
Trends in digital era-Programming Knowledge
DrJSathyaPriyaPhd
 
Thesis on Assessment of Landslide Prone Area and Their Consequences Due to C...
Thesis on Assessment of Landslide Prone Area and Their Consequences  Due to C...Thesis on Assessment of Landslide Prone Area and Their Consequences  Due to C...
Thesis on Assessment of Landslide Prone Area and Their Consequences Due to C...
ErBamBhandari
 
A Case of Unrecognized Peripartum Cardiomyopathy Which Was Noticed During Eme...
A Case of Unrecognized Peripartum Cardiomyopathy Which Was Noticed During Eme...A Case of Unrecognized Peripartum Cardiomyopathy Which Was Noticed During Eme...
A Case of Unrecognized Peripartum Cardiomyopathy Which Was Noticed During Eme...
CrimsonPublishers-SBB
 

Recently uploaded (20)

Procurement and Contract Strategy in Malaysia
Procurement and Contract Strategy in MalaysiaProcurement and Contract Strategy in Malaysia
Procurement and Contract Strategy in Malaysia
 
buy a fake University of London diploma supplement
buy a fake University of London diploma supplementbuy a fake University of London diploma supplement
buy a fake University of London diploma supplement
 
Computer Graphics - Cartesian Coordinate System.pdf
Computer Graphics - Cartesian Coordinate System.pdfComputer Graphics - Cartesian Coordinate System.pdf
Computer Graphics - Cartesian Coordinate System.pdf
 
Computer Vision and GenAI for Geoscientists.pptx
Computer Vision and GenAI for Geoscientists.pptxComputer Vision and GenAI for Geoscientists.pptx
Computer Vision and GenAI for Geoscientists.pptx
 
AI INTERACTION WITH HUMAN IN DAILY LIFE (1).pptx
AI INTERACTION WITH HUMAN IN DAILY LIFE (1).pptxAI INTERACTION WITH HUMAN IN DAILY LIFE (1).pptx
AI INTERACTION WITH HUMAN IN DAILY LIFE (1).pptx
 
Future Networking v Energy Limits ICTON 2024 Bari Italy
Future Networking v Energy Limits ICTON 2024 Bari ItalyFuture Networking v Energy Limits ICTON 2024 Bari Italy
Future Networking v Energy Limits ICTON 2024 Bari Italy
 
Electrical Engineering, DC - AC Machines
Electrical Engineering, DC - AC MachinesElectrical Engineering, DC - AC Machines
Electrical Engineering, DC - AC Machines
 
Classification of optical fibers and Modes of Optical Fiber
Classification of optical fibers and Modes of Optical FiberClassification of optical fibers and Modes of Optical Fiber
Classification of optical fibers and Modes of Optical Fiber
 
SM_5th-SEM_Cse_Mobile-Computing.pdf_________________
SM_5th-SEM_Cse_Mobile-Computing.pdf_________________SM_5th-SEM_Cse_Mobile-Computing.pdf_________________
SM_5th-SEM_Cse_Mobile-Computing.pdf_________________
 
Failure Engineering - Architecting Resilient API's
Failure Engineering - Architecting Resilient API'sFailure Engineering - Architecting Resilient API's
Failure Engineering - Architecting Resilient API's
 
Good Energy Haus: PHN Presents Building Electrification, A Passive House Symp...
Good Energy Haus: PHN Presents Building Electrification, A Passive House Symp...Good Energy Haus: PHN Presents Building Electrification, A Passive House Symp...
Good Energy Haus: PHN Presents Building Electrification, A Passive House Symp...
 
DOUBLE SKIN FACADE PRESENTATION SLIDE.pdf
DOUBLE SKIN FACADE PRESENTATION SLIDE.pdfDOUBLE SKIN FACADE PRESENTATION SLIDE.pdf
DOUBLE SKIN FACADE PRESENTATION SLIDE.pdf
 
Fuel-Dlivery-Project PowerPoint presentations
Fuel-Dlivery-Project  PowerPoint presentationsFuel-Dlivery-Project  PowerPoint presentations
Fuel-Dlivery-Project PowerPoint presentations
 
一比一原版(surrey毕业证书)英国萨里大学毕业证如何办理
一比一原版(surrey毕业证书)英国萨里大学毕业证如何办理一比一原版(surrey毕业证书)英国萨里大学毕业证如何办理
一比一原版(surrey毕业证书)英国萨里大学毕业证如何办理
 
Structural Dynamics and Earthquake Engineering
Structural Dynamics and Earthquake EngineeringStructural Dynamics and Earthquake Engineering
Structural Dynamics and Earthquake Engineering
 
The X Window System Graphical User Interface
The X Window System Graphical User InterfaceThe X Window System Graphical User Interface
The X Window System Graphical User Interface
 
Software Requirement Engineering Analyzing the Problem.pdf
Software Requirement Engineering Analyzing the Problem.pdfSoftware Requirement Engineering Analyzing the Problem.pdf
Software Requirement Engineering Analyzing the Problem.pdf
 
Trends in digital era-Programming Knowledge
Trends in digital era-Programming KnowledgeTrends in digital era-Programming Knowledge
Trends in digital era-Programming Knowledge
 
Thesis on Assessment of Landslide Prone Area and Their Consequences Due to C...
Thesis on Assessment of Landslide Prone Area and Their Consequences  Due to C...Thesis on Assessment of Landslide Prone Area and Their Consequences  Due to C...
Thesis on Assessment of Landslide Prone Area and Their Consequences Due to C...
 
A Case of Unrecognized Peripartum Cardiomyopathy Which Was Noticed During Eme...
A Case of Unrecognized Peripartum Cardiomyopathy Which Was Noticed During Eme...A Case of Unrecognized Peripartum Cardiomyopathy Which Was Noticed During Eme...
A Case of Unrecognized Peripartum Cardiomyopathy Which Was Noticed During Eme...
 

Logistic regression

CAPITAL LETTER BETA) TO PREDICT AN OUTPUT VALUE (Y).
• A KEY DIFFERENCE FROM LINEAR REGRESSION IS THAT THE OUTPUT VALUE BEING MODELED IS A BINARY VALUE (0 OR 1) RATHER THAN A NUMERIC VALUE.
• BELOW IS AN EXAMPLE LOGISTIC REGRESSION EQUATION:
• Y = E^(B0 + B1*X) / (1 + E^(B0 + B1*X))
• WHERE Y IS THE PREDICTED OUTPUT, B0 IS THE BIAS OR INTERCEPT TERM AND B1 IS THE COEFFICIENT FOR THE SINGLE INPUT VALUE (X).
• EACH COLUMN IN YOUR INPUT DATA HAS AN ASSOCIATED B COEFFICIENT (A CONSTANT REAL VALUE) THAT MUST BE LEARNED FROM YOUR TRAINING DATA.
• THE ACTUAL REPRESENTATION OF THE MODEL THAT YOU WOULD STORE IN MEMORY OR IN A FILE IS THE SET OF COEFFICIENTS IN THE EQUATION (THE BETA VALUES OR B’S).
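As an illustrative sketch (not part of the original slides), the equation above can be evaluated directly; the function name `predict` and the coefficients are hypothetical:

```python
import math

def predict(b0, b1, x):
    """Logistic regression prediction for a single input:
    y = e^(b0 + b1*x) / (1 + e^(b0 + b1*x)),
    which is algebraically the same as 1 / (1 + e^-(b0 + b1*x))."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

# With b0 = 0 and b1 = 1, the prediction at x = 0 is exactly 0.5,
# the point where the decision boundary sits.
print(predict(0.0, 1.0, 0.0))
```

Note that the output is always a probability strictly between 0 and 1, never exactly at the limits, exactly as described for the logistic function.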
TECHNICAL INTERLUDE
• LOGISTIC REGRESSION MODELS THE PROBABILITY OF THE DEFAULT CLASS (E.G. THE FIRST CLASS).
• FOR EXAMPLE, IF WE ARE MODELING PEOPLE’S SEX AS MALE OR FEMALE FROM THEIR HEIGHT, THEN THE FIRST CLASS COULD BE MALE AND THE LOGISTIC REGRESSION MODEL COULD BE WRITTEN AS THE PROBABILITY OF MALE GIVEN A PERSON’S HEIGHT, OR MORE FORMALLY:
• P(SEX=MALE|HEIGHT)
• WRITTEN ANOTHER WAY, WE ARE MODELING THE PROBABILITY THAT AN INPUT (X) BELONGS TO THE DEFAULT CLASS (Y=1), WHICH WE CAN WRITE FORMALLY AS:
• P(X) = P(Y=1|X)
LOG ODDS
• LOGISTIC REGRESSION IS A LINEAR METHOD, BUT THE PREDICTIONS ARE TRANSFORMED USING THE LOGISTIC FUNCTION.
• THE IMPACT OF THIS IS THAT WE CAN NO LONGER UNDERSTAND THE PREDICTIONS AS A LINEAR COMBINATION OF THE INPUTS AS WE CAN WITH LINEAR REGRESSION. FOR EXAMPLE, THE MODEL CAN BE STATED AS: P(X) = E^(B0 + B1*X) / (1 + E^(B0 + B1*X))
• WE CAN REARRANGE THE ABOVE EQUATION AS FOLLOWS (REMEMBER WE CAN REMOVE THE E FROM ONE SIDE BY TAKING THE NATURAL LOGARITHM (LN) OF THE OTHER):
• LN(P(X) / (1 – P(X))) = B0 + B1 * X
• THIS IS USEFUL BECAUSE WE CAN SEE THAT THE CALCULATION OF THE OUTPUT ON THE RIGHT IS LINEAR AGAIN (JUST LIKE LINEAR REGRESSION), AND THE QUANTITY ON THE LEFT IS THE LOG OF THE ODDS OF THE DEFAULT CLASS.
LOG ODDS
• LN(P(X) / (1 – P(X))) = B0 + B1 * X
• THE RATIO INSIDE THE LOG IS CALLED THE ODDS OF THE DEFAULT CLASS.
• ODDS ARE CALCULATED AS THE PROBABILITY OF THE EVENT DIVIDED BY THE PROBABILITY OF NOT THE EVENT, E.G. 0.8/(1-0.8), WHICH GIVES ODDS OF 4. SO WE COULD INSTEAD WRITE: LN(ODDS) = B0 + B1 * X
• BECAUSE THE ODDS ARE LOG-TRANSFORMED, WE CALL THIS LEFT-HAND SIDE THE LOG-ODDS OR THE LOGIT.
• WE CAN MOVE THE EXPONENT BACK TO THE RIGHT AND WRITE IT AS:
• ODDS = E^(B0 + B1 * X)
• ALL OF THIS HELPS US UNDERSTAND THAT THE MODEL IS INDEED STILL A LINEAR COMBINATION OF THE INPUTS, BUT THAT THIS LINEAR COMBINATION RELATES TO THE LOG-ODDS OF THE DEFAULT CLASS.
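The 0.8 worked example above can be checked numerically; this is a sketch, and the variable names are hypothetical:

```python
import math

p = 0.8
odds = p / (1 - p)        # probability of event / probability of not-event = 4.0
log_odds = math.log(odds) # the logit: ln(p / (1 - p))

# Inverting with the logistic function recovers the original probability,
# confirming that the log-odds and the probability are two views of one model.
p_back = 1 / (1 + math.exp(-log_odds))
print(odds, log_odds, p_back)
```

Going from probability to log-odds and back without loss is exactly why the linear part B0 + B1 * X can live on the log-odds scale.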
COST FUNCTION
• LINEAR REGRESSION USES LEAST SQUARED ERROR AS ITS LOSS FUNCTION, WHICH GIVES A CONVEX GRAPH, SO WE CAN COMPLETE THE OPTIMIZATION BY FINDING ITS VERTEX AS THE GLOBAL MINIMUM.
• HOWEVER, THAT IS NO LONGER AN OPTION FOR LOGISTIC REGRESSION. SINCE THE HYPOTHESIS HAS CHANGED, LEAST SQUARED ERROR COMPUTED ON THE SIGMOID OF THE RAW MODEL OUTPUT WOULD RESULT IN A NON-CONVEX GRAPH WITH LOCAL MINIMA.
COST FUNCTION
• INTUITIVELY, WE WANT TO ASSIGN MORE PUNISHMENT WHEN PREDICTING 1 WHILE THE ACTUAL IS 0, AND WHEN PREDICTING 0 WHILE THE ACTUAL IS 1.
• THE LOSS FUNCTION OF LOGISTIC REGRESSION DOES EXACTLY THIS; IT IS CALLED LOGISTIC LOSS.
• IF Y = 1, LOOKING AT THE PLOT ON THE LEFT: WHEN THE PREDICTION = 1, THE COST = 0; WHEN THE PREDICTION = 0, THE LEARNING ALGORITHM IS PUNISHED BY A VERY LARGE COST.
• SIMILARLY, IF Y = 0, THE PLOT ON THE RIGHT SHOWS THAT PREDICTING 0 HAS NO PUNISHMENT BUT PREDICTING 1 HAS A LARGE COST.
COST FUNCTION
• ANOTHER ADVANTAGE OF THIS LOSS FUNCTION IS THAT ALTHOUGH WE LOOK AT IT BY Y = 1 AND Y = 0 SEPARATELY, IT CAN BE WRITTEN AS ONE SINGLE FORMULA, WHICH BRINGS CONVENIENCE FOR CALCULATION:
• COST(H(X), Y) = −Y * LOG(H(X)) − (1 − Y) * LOG(1 − H(X))
• SO THE COST FUNCTION OF THE MODEL IS THE SUMMATION OVER ALL M TRAINING DATA SAMPLES:
• J(Θ) = (1/M) * Σ [ −Y⁽ⁱ⁾ * LOG(H(X⁽ⁱ⁾)) − (1 − Y⁽ⁱ⁾) * LOG(1 − H(X⁽ⁱ⁾)) ]
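The single-formula behavior can be sketched in a few lines; `log_loss` is a hypothetical helper name, not from the slides:

```python
import math

def log_loss(y, p):
    """Logistic loss for one sample: -y*log(p) - (1-y)*log(1-p).
    When y = 1 the second term vanishes; when y = 0 the first does,
    so this one expression covers both cases from the plots."""
    return -y * math.log(p) - (1 - y) * math.log(1 - p)

# A confident correct prediction costs almost nothing;
# a confident wrong prediction is punished heavily.
print(log_loss(1, 0.99))  # small cost
print(log_loss(1, 0.01))  # very large cost
```

Averaging this quantity over the training set gives J(Θ), the convex objective that the optimizers below minimize.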
OPTIMIZATION
• WITH THE RIGHT LEARNING ALGORITHM, WE CAN START TO FIT THE MODEL BY MINIMIZING J(Θ) AS A FUNCTION OF Θ TO FIND THE OPTIMAL PARAMETERS.
• WE CAN STILL APPLY GRADIENT DESCENT AS THE OPTIMIZATION ALGORITHM.
• IT TAKES THE PARTIAL DERIVATIVE OF J WITH RESPECT TO Θ (THE SLOPE OF J), AND UPDATES Θ ON EACH ITERATION WITH A SELECTED LEARNING RATE Α UNTIL GRADIENT DESCENT HAS CONVERGED.
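A minimal gradient-descent sketch for the one-feature model, assuming a toy dataset and hypothetical names (`fit`, `xs`, `ys`); a real implementation would add vectorization and a convergence check:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit(xs, ys, alpha=0.1, iters=5000):
    """Gradient descent on J(theta) for y = sigmoid(b0 + b1*x).
    The gradient of the logistic loss w.r.t. each coefficient is the
    mean of (prediction - y), weighted by the corresponding input."""
    b0, b1 = 0.0, 0.0
    m = len(xs)
    for _ in range(iters):
        g0 = sum(sigmoid(b0 + b1 * x) - y for x, y in zip(xs, ys)) / m
        g1 = sum((sigmoid(b0 + b1 * x) - y) * x for x, y in zip(xs, ys)) / m
        b0 -= alpha * g0  # step against the slope, scaled by the learning rate
        b1 -= alpha * g1
    return b0, b1

# Toy separable data: class 0 at small x, class 1 at large x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [0, 0, 1, 1]
b0, b1 = fit(xs, ys)
```

After fitting, predictions below the midpoint of the data fall under 0.5 and those above it exceed 0.5, matching the decision-boundary rule stated earlier.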
NEWTON’S METHOD
• NEWTON’S METHOD IS ANOTHER POPULAR OPTIMIZATION ALGORITHM; IT APPLIES A DIFFERENT APPROACH TO REACH THE GLOBAL MINIMUM OF THE COST FUNCTION.
• SIMILAR TO GRADIENT DESCENT, WE FIRST TAKE THE PARTIAL DERIVATIVE OF J(Θ), WHICH IS THE SLOPE OF J(Θ), AND DENOTE IT F(Θ).
• INSTEAD OF DECREASING Θ BY A CHOSEN LEARNING RATE Α MULTIPLIED BY F(Θ), NEWTON’S METHOD GETS AN UPDATED Θ AT THE POINT WHERE THE TANGENT LINE OF F(Θ) AT THE PREVIOUS Θ INTERSECTS THE X-AXIS.
• AFTER A NUMBER OF ITERATIONS, NEWTON’S METHOD WILL CONVERGE AT F(Θ) = 0.
NEWTON’S METHOD
• SEE THE SIMPLIFIED PLOT: STARTING FROM THE RIGHT, THE YELLOW DOTTED LINE IS THE TANGENT OF F(Θ) AT Θ0.
• IT DETERMINES THE POSITION OF Θ1, AND THE DISTANCE FROM Θ0 TO Θ1 IS Δ.
• THIS PROCESS REPEATS UNTIL IT FINDS THE OPTIMAL Θ THAT SATISFIES F(Θ) = 0, WHICH IS Θ3 IN THIS PLOT.
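The tangent-line update described above can be sketched on a simple stand-in function rather than the actual F(Θ) from the cost; the root-finding step is the same. `newton` and the example f are hypothetical:

```python
def newton(f, f_prime, theta, iters=20):
    """Newton's method root finding: each step jumps to where the
    tangent line of f at the current theta crosses the x-axis:
        theta_new = theta - f(theta) / f'(theta)
    Applied to f = J', converging to f(theta) = 0 minimizes J."""
    for _ in range(iters):
        theta = theta - f(theta) / f_prime(theta)
    return theta

# Illustrative stand-in: find the root of f(theta) = theta^2 - 2,
# i.e. sqrt(2), starting from theta0 = 2 on the right of the root.
root = newton(lambda t: t * t - 2, lambda t: 2 * t, theta=2.0)
print(root)
```

Each iteration corresponds to one Δ step in the plot: Θ0 → Θ1 → Θ2 → … until F(Θ) is effectively zero.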
MAXIMUM LIKELIHOOD ESTIMATION
• MAXIMUM-LIKELIHOOD ESTIMATION IS A COMMON LEARNING ALGORITHM USED BY A VARIETY OF MACHINE LEARNING ALGORITHMS, ALTHOUGH IT DOES MAKE ASSUMPTIONS ABOUT THE DISTRIBUTION OF YOUR DATA.
• THE BEST COEFFICIENTS WOULD RESULT IN A MODEL THAT PREDICTS A VALUE VERY CLOSE TO 1 (E.G. MALE) FOR THE DEFAULT CLASS AND A VALUE VERY CLOSE TO 0 (E.G. FEMALE) FOR THE OTHER CLASS.
• THE INTUITION FOR MAXIMUM LIKELIHOOD IN LOGISTIC REGRESSION IS THAT A SEARCH PROCEDURE SEEKS VALUES FOR THE COEFFICIENTS (BETA VALUES) THAT MINIMIZE THE ERROR BETWEEN THE PROBABILITIES PREDICTED BY THE MODEL AND THOSE IN THE DATA (E.G. A PROBABILITY OF 1 IF THE DATA IS THE PRIMARY CLASS).
PREPARE DATA FOR LOGISTIC REGRESSION
• THE ASSUMPTIONS MADE BY LOGISTIC REGRESSION ABOUT THE DISTRIBUTION AND RELATIONSHIPS IN YOUR DATA ARE MUCH THE SAME AS THE ASSUMPTIONS MADE IN LINEAR REGRESSION.
• BINARY OUTPUT VARIABLE: THIS MIGHT BE OBVIOUS, AS WE HAVE ALREADY MENTIONED IT, BUT LOGISTIC REGRESSION IS INTENDED FOR BINARY (TWO-CLASS) CLASSIFICATION PROBLEMS. IT WILL PREDICT THE PROBABILITY OF AN INSTANCE BELONGING TO THE DEFAULT CLASS, WHICH CAN BE SNAPPED INTO A 0 OR 1 CLASSIFICATION.
• REMOVE NOISE: LOGISTIC REGRESSION ASSUMES NO ERROR IN THE OUTPUT VARIABLE (Y); CONSIDER REMOVING OUTLIERS AND POSSIBLY MISCLASSIFIED INSTANCES FROM YOUR TRAINING DATA.
• REMOVE CORRELATED INPUTS: LIKE LINEAR REGRESSION, THE MODEL CAN OVERFIT IF YOU HAVE MULTIPLE HIGHLY CORRELATED INPUTS. CONSIDER CALCULATING THE PAIRWISE CORRELATIONS BETWEEN ALL INPUTS AND REMOVING THE HIGHLY CORRELATED ONES.
• LINEAR RELATIONSHIP: THE MODEL DOES ASSUME A LINEAR RELATIONSHIP BETWEEN THE INPUT VARIABLES AND THE OUTPUT. DATA TRANSFORMS OF YOUR INPUT VARIABLES THAT BETTER EXPOSE THIS LINEAR RELATIONSHIP CAN RESULT IN A MORE ACCURATE MODEL.
• FAIL TO CONVERGE: IT IS POSSIBLE FOR THE MAXIMUM LIKELIHOOD ESTIMATION PROCESS THAT LEARNS THE COEFFICIENTS TO FAIL TO CONVERGE. THIS CAN HAPPEN IF THERE ARE MANY HIGHLY CORRELATED INPUTS IN YOUR DATA OR IF THE DATA IS VERY SPARSE (E.G. LOTS OF ZEROS IN YOUR INPUT DATA).
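The "remove correlated inputs" check above can be sketched with a plain Pearson correlation; the data and the threshold of 0.95 are hypothetical choices, not from the slides:

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length columns."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

# x2 is (by construction here) a near-duplicate of x1, so the pair
# exceeds the chosen threshold and one column would be dropped.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.1, 4.0, 6.2, 8.1, 9.9]
r = pearson(x1, x2)
print(r, r > 0.95)
```

In practice you would compute this for every pair of input columns and keep only one column from each highly correlated pair before fitting.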