Datamining Lecture6
LECTURE 6
Linear Regression
Logistic Regression
Neural Networks
A Little Bit of Deep Learning
CSC177
Dr. Victor Chen
Regression Problem
• The problem of predicting continuous values is called a regression problem.
• General approach: find a continuous function that models the observed data points.
Linear Regression with one input:
y = α + β·x
Linear Regression with k inputs:
y = α + β1·x1 + β2·x2 + … + βk·xk
β (slope) = Δy/Δx
α (y-intercept)
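As a minimal sketch of fitting the one-input model y = α + β·x, ordinary least squares with NumPy (the data values below are made up for illustration):

```python
import numpy as np

# Hypothetical data generated exactly from y = 2 + 3x, so the fit is exact
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 + 3.0 * x

# Stack a column of ones so the first fitted coefficient is the intercept alpha
X = np.column_stack([np.ones_like(x), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
alpha, beta = coef
print(alpha, beta)  # alpha ≈ 2.0 (y-intercept), beta ≈ 3.0 (slope Δy/Δx)
```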
Sum of Squares of Error (SSE) is the sum of all the squared errors: the squared differences between each observed value of the dependent variable and its predicted value.
Sum of Squares of Regression (SSR) is the sum of squared differences between each predicted value and the data mean.
R-Squared score
• The R-Squared score can be used as a single summary number to measure the quality of a linear regression model.
• The value of R² can range between 0 and 1.
• The higher the R², the more accurate the regression model is.
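One common way to compute R² is 1 − SSE/SST, where SST is the total sum of squares around the data mean. A small sketch with made-up predictions:

```python
import numpy as np

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.8, 5.1, 7.2, 8.9])  # hypothetical model predictions

sse = np.sum((y_true - y_pred) ** 2)          # Sum of Squares of Error
sst = np.sum((y_true - y_true.mean()) ** 2)   # total squared deviation from the data mean
r2 = 1.0 - sse / sst                          # R-squared: 1 means a perfect fit
print(round(r2, 4))  # 0.995
```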
Nonlinear Regression
Nonlinear functions can also be fit as
regressions
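For example, a polynomial can be fit by least squares; this is still linear regression in the coefficients, applied to the features [1, x, x²]. A sketch with noise-free synthetic data:

```python
import numpy as np

# Quadratic data: y = 1 + 2x + 0.5x^2 (synthetic, noise-free for clarity)
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
y = 1.0 + 2.0 * x + 0.5 * x ** 2

# polyfit does a least-squares polynomial fit; degree 2 recovers the curve.
coeffs = np.polyfit(x, y, deg=2)  # highest degree first: [0.5, 2.0, 1.0]
print(np.round(coeffs, 6))
```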
• Haiquan Chen, Wei-Shinn Ku, Haixun Wang, Liang Tang, Min-Te Sun: Scaling Up Markov Logic Probabilistic Inference for Social Graphs. IEEE Trans. Knowl. Data Eng. 29(2): 433-445 (2017)
Experimental validation
Now Logistic Regression…
Linear Regression Doesn’t Work
• A linear function/regression is not a good fit for classification.
• It may produce probabilities beyond [0, 1].
P(C|x) = 1 / (1 + e^(−(α + β⋅x)))
Q: What is the logistic regression model for more than one dimension?
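For more than one dimension, β⋅x becomes the dot product β1·x1 + … + βk·xk. A minimal sketch of the model (the parameter values below are made up):

```python
import numpy as np

def logistic(x, alpha, beta):
    """P(C|x) = 1 / (1 + exp(-(alpha + beta . x))).
    In more than one dimension, beta . x is the dot product
    beta1*x1 + ... + betak*xk."""
    return 1.0 / (1.0 + np.exp(-(alpha + np.dot(beta, x))))

# One-dimensional case: at alpha + beta*x = 0 the probability is exactly 0.5
print(logistic(np.array([0.0]), alpha=0.0, beta=np.array([1.0])))  # 0.5

# Two-dimensional case: beta has one weight per feature (hypothetical values)
p = logistic(np.array([1.0, 2.0]), alpha=0.5, beta=np.array([0.3, -0.2]))
print(0.0 <= p <= 1.0)  # the output always stays within [0, 1]
```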
Logistic Regression
• For a 2-class problem, the probability threshold is set to 0.5:
• If the predicted probability >= 0.5, predict “y = 1”;
• If the predicted probability < 0.5, predict “y = 0”.
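The 0.5 decision rule can be sketched as:

```python
def predict_class(p, threshold=0.5):
    # 2-class decision rule: predict y = 1 when the predicted
    # probability reaches the threshold, otherwise predict y = 0.
    return 1 if p >= threshold else 0

print(predict_class(0.73))  # 1
print(predict_class(0.31))  # 0
```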
How β affects the model
Compare Two Models In One Dimension
Coefficients
𝛽1 = −1.9
𝛽2 = −0.4
𝛼 = 13.04
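Using the coefficients above, the two one-dimensional models can be compared directly; the more negative slope (β1 = −1.9) produces a sharper transition from P ≈ 1 to P ≈ 0 than the flatter one (β2 = −0.4). A sketch (the evaluation point x = 9 is chosen only for illustration):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

alpha = 13.04
beta1, beta2 = -1.9, -0.4   # slopes of the two models above

x = 9.0
p1 = sigmoid(alpha + beta1 * x)   # steeper model: already near 0 here
p2 = sigmoid(alpha + beta2 * x)   # flatter model: still near 1 here
print(round(p1, 3), round(p2, 3))
```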
Estimating the coefficients
• Use the gradient descent algorithm to find near-optimal coefficients for linear/logistic regression.
Gradient descent for two parameters
Gradient descent implementation
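A minimal sketch of gradient descent for the two parameters (α, β) of y = α + β·x, minimizing SSE; the data, learning rate, and iteration count below are illustrative choices:

```python
import numpy as np

# Synthetic data from y = 1 + 2x, so the true parameters are known
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1.0 + 2.0 * x

alpha, beta = 0.0, 0.0   # start from zero
lr = 0.02                # fixed learning rate
for _ in range(5000):
    err = (alpha + beta * x) - y
    # Partial derivatives of SSE (up to a constant factor) w.r.t. alpha and beta
    grad_alpha = err.sum()
    grad_beta = (err * x).sum()
    alpha -= lr * grad_alpha
    beta -= lr * grad_beta

print(round(alpha, 3), round(beta, 3))  # converges close to 1.0 and 2.0
```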
Sklearn implementation
• http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html
• http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html
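A short usage sketch of both sklearn estimators on tiny made-up datasets (assumes scikit-learn is installed):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Linear regression: data generated exactly from y = 1 + 2x
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
lin = LinearRegression().fit(X, y)
print(lin.intercept_, lin.coef_)    # ≈ 1.0 and [2.0]

# Logistic regression: a well-separated 2-class toy problem
Xc = np.array([[0.0], [1.0], [4.0], [5.0]])
yc = np.array([0, 0, 1, 1])
log = LogisticRegression().fit(Xc, yc)
print(log.predict([[0.5], [4.5]]))  # [0 1]
print(log.predict_proba([[4.5]]))   # class-membership probabilities
```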
Logistic/Linear Regression Advantages
• Linear regression produces a continuous output.
• Logistic regression produces class membership with a predicted probability.
• The coefficients can be used to understand feature importance.
• Both work well on relatively large datasets.
Neural networks
• Logistic regression can be considered the simplest form of a neural network, which is a collection of perceptrons.
• A perceptron can be seen as an analogy to a biological neuron.
Perceptron (Neuron)
2-class classification with one neuron
2-class classification with two neurons,
one for each class
Putting multiple neurons in parallel we can predict multiple classes
sigmoid vs softmax
• The sigmoid function is used for two-class logistic regression.
• The softmax function is used for multiclass logistic regression.
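The two are closely related: for two classes, softmax over the scores [z, 0] equals sigmoid(z). A sketch verifying this (the score z = 1.7 is an arbitrary example value):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - np.max(z))   # subtract the max for numerical stability
    return e / e.sum()

# Two-class case: softmax([z, 0])[0] = e^z / (e^z + 1) = sigmoid(z)
z = 1.7
print(round(sigmoid(z), 6), round(softmax(np.array([z, 0.0]))[0], 6))
```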
Softmax implementation
What is softmax([1, 2, 3])?
• y1 = e¹ / (e¹ + e² + e³) ≈ 0.09
• y2 = e² / (e¹ + e² + e³) ≈ 0.24
• y3 = e³ / (e¹ + e² + e³) ≈ 0.67
Output: [0.09, 0.24, 0.67]
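The worked example above can be reproduced with a small softmax implementation:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))  # shift by the max for numerical stability
    return e / e.sum()

probs = softmax(np.array([1.0, 2.0, 3.0]))
print(np.round(probs, 2))  # [0.09 0.24 0.67]
print(probs.sum())         # the outputs sum to 1, so they form a distribution
```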