Linear Regression with One Variable
Cost Function
Gradient Descent
Housing Prices
(Figure: scatter plot of housing prices; price on the vertical axis (dependent variable), size on the horizontal axis (independent variable).)
Supervised Learning: the "right answers" (labeled data) are given.
Regression: predict a continuous valued output (the price).
Notation:
m = number of training examples
x's = "input" variable / features
y's = "output" variable / "target" variable
(x, y) = one training example (one row)
$(x^{(i)}, y^{(i)})$ = the i-th training example

Example (from the training set): $x^{(1)} = 2104$, $y^{(2)} = 232$, $x^{(4)} = 852$
Training set → Learning algorithm → h;  given an input x, the hypothesis h outputs a predicted y.
The job of a learning algorithm is to output a function, usually denoted by a lowercase h, where h stands for "hypothesis".
Linear Equations
$y = f(x) = \theta_0 + \theta_1 x$
$\theta_1$ = slope = change in Y over change in X ($\Delta Y / \Delta X$)
$\theta_0$ = Y-intercept
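As a quick illustrative sketch (the function and the numbers below are not from the slides), the line $y = \theta_0 + \theta_1 x$ can be written as a small Python function, with the slope recovered as the change in Y over the change in X between any two points on the line:

```python
def line(x, theta0, theta1):
    """Straight line y = theta0 + theta1 * x."""
    return theta0 + theta1 * x

# Illustrative values: Y-intercept 1.5, slope 0.5
theta0, theta1 = 1.5, 0.5

# Slope recovered as "change in Y over change in X" between two points on the line
x1, x2 = 2.0, 6.0
slope = (line(x2, theta0, theta1) - line(x1, theta0, theta1)) / (x2 - x1)
print(slope)                       # 0.5, i.e. theta1
print(line(0.0, theta0, theta1))   # 1.5, i.e. the Y-intercept theta0
```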
Types of Regression Models
(Figures: a scatter plot showing a positive linear relationship, and a scatter plot showing a relationship that is NOT linear.)
The cost function lets us figure out how to fit the best possible straight line to our data.
Training Set

Size in feet² (x)    Price ($) in 1000's (y)
2104                 460
1416                 232
1534                 315
852                  178
...                  ...

Hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x$
How to choose the $\theta_i$'s?
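As a minimal Python sketch (names and the placeholder parameter values are illustrative, not from the slides), the training set above can be held in arrays and the hypothesis evaluated for some guessed θ; choosing good values for θ₀ and θ₁ is exactly what the cost function will address:

```python
import numpy as np

# Training set from the table: size in feet^2 (x) and price in $1000's (y)
x = np.array([2104.0, 1416.0, 1534.0, 852.0])
y = np.array([460.0, 232.0, 315.0, 178.0])

def h(x, theta0, theta1):
    """Hypothesis h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x

# Placeholder parameters, just to show the mechanics of making predictions
theta0, theta1 = 0.0, 0.2
print(h(x, theta0, theta1))   # predicted price (in $1000's) for each example
```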
Scatter plot
(Figure: scatter plot of Y against X.)
Thinking Challenge
How would you draw a line through the points?
How do you determine which line 'fits best'?
(Figures: the same scatter plot repeated with different candidate lines: slope and intercept unchanged, then the intercept changed with the slope unchanged, then the slope changed as well.)
Least Squares
• 1. "Best fit" means the difference between actual Y values and predicted Y values is a minimum. So square the errors!

$\sum_{i=1}^{m} \left( Y_i - h(x_i) \right)^2 = \sum_{i=1}^{m} \hat{\varepsilon}_i^{\,2}$

• 2. Least squares minimizes the sum of the squared differences (errors), the SSE.
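A sketch of the SSE computation in Python, reusing the illustrative housing arrays from the earlier sketch (parameter values below are arbitrary examples):

```python
import numpy as np

# Same illustrative training data as in the earlier sketch
x = np.array([2104.0, 1416.0, 1534.0, 852.0])
y = np.array([460.0, 232.0, 315.0, 178.0])

def sse(theta0, theta1, x, y):
    """Sum of squared errors between actual y and predicted theta0 + theta1 * x."""
    residuals = y - (theta0 + theta1 * x)
    return np.sum(residuals ** 2)

# Least squares prefers the parameters with the smaller SSE
print(sse(0.0, 0.20, x, y))
print(sse(0.0, 0.25, x, y))
```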
Least Squares Graphically

LS minimizes $\sum_{i=1}^{n} \hat{\varepsilon}_i^{\,2} = \hat{\varepsilon}_1^{\,2} + \hat{\varepsilon}_2^{\,2} + \hat{\varepsilon}_3^{\,2} + \hat{\varepsilon}_4^{\,2}$

For a single point, e.g. the second one: $Y_2 = \theta_0 + \theta_1 X_2 + \hat{\varepsilon}_2$

The fitted line: $h_\theta(x_i) = \theta_0 + \theta_1 x_i$

(Figure: four data points with vertical residuals $\hat{\varepsilon}_1, \dots, \hat{\varepsilon}_4$ measured to the fitted line.)
Least Squared Errors Linear Regression

(Figures: a simple hypothesis and a visualization of its cost function; the squared errors to be minimized are shown with the line evaluated at $\theta_1 = 2$ and at $\theta_1 = 1$.)
Parameters: $\theta_0, \theta_1$

Cost Function:
$J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$

Goal: minimize $J(\theta_0, \theta_1)$ over $\theta_0, \theta_1$
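A minimal Python sketch of this cost function (the array values reuse the illustrative housing data from earlier; the θ values passed in are arbitrary):

```python
import numpy as np

def compute_cost(theta0, theta1, x, y):
    """J(theta0, theta1) = (1 / (2m)) * sum_i (h_theta(x_i) - y_i)^2."""
    m = len(y)
    predictions = theta0 + theta1 * x      # h_theta(x) for every example
    return np.sum((predictions - y) ** 2) / (2 * m)

# Illustrative usage on the housing data from the earlier sketches
x = np.array([2104.0, 1416.0, 1534.0, 852.0])
y = np.array([460.0, 232.0, 315.0, 178.0])
print(compute_cost(0.0, 0.2, x, y))
```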
Gradient Descent
Gradient descent is an iterative solution, and not only for linear regression; it's actually used all over the place in machine learning.

Outline:
• Start with some $\theta_0, \theta_1$
• Keep changing $\theta_0, \theta_1$ to reduce $J(\theta_0, \theta_1)$
(Figure: surface plot of $J(\theta_0, \theta_1)$; red means high, blue means low. From one starting point, gradient descent reaches a local minimum; with a different starting point, it can reach a different local minimum.)
Gradient Descent Algorithm

repeat until convergence {
  $\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$
}

Where:
• := is the assignment operator
• $\alpha$ is the learning rate, which basically defines how big the steps are during the descent
Update both parameters simultaneously:

temp0 := $\theta_0 - \alpha \frac{\partial}{\partial \theta_0} J(\theta_0, \theta_1)$
temp1 := $\theta_1 - \alpha \frac{\partial}{\partial \theta_1} J(\theta_0, \theta_1)$
$\theta_0$ := temp0
$\theta_1$ := temp1
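One way the simultaneous update might look in code; this is a sketch, and grad0, grad1, and alpha are illustrative names for the two partial derivatives and the learning rate:

```python
def gradient_step(theta0, theta1, grad0, grad1, alpha):
    """One simultaneous update: both new values are computed from the *old*
    theta0 and theta1 before either parameter is overwritten."""
    temp0 = theta0 - alpha * grad0   # temp0 := theta0 - alpha * dJ/dtheta0
    temp1 = theta1 - alpha * grad1   # temp1 := theta1 - alpha * dJ/dtheta1
    return temp0, temp1              # theta0 := temp0, theta1 := temp1
```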
(Figure: $J(\theta_1)$ plotted against $\theta_1$. The derivative $\frac{d}{d\theta_1} J(\theta_1)$ is a positive slope on one side of the minimum and a negative slope on the other, so the update $\theta_1 := \theta_1 - \alpha \frac{d}{d\theta_1} J(\theta_1)$ moves $\theta_1$ toward the minimum in both cases.)
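With illustrative numbers: if $\alpha = 0.1$ and the derivative at the current $\theta_1$ is $+4$ (positive slope), the update $\theta_1 := \theta_1 - 0.1 \cdot 4$ moves $\theta_1$ down by 0.4; if the derivative is $-4$ (negative slope), $\theta_1 := \theta_1 - 0.1 \cdot (-4)$ moves $\theta_1$ up by 0.4. Either way the step is toward the minimum.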
$\frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1) = \frac{\partial}{\partial \theta_j} \, \frac{1}{2m} \sum_{i=1}^{m} \left( h(x_i) - y_i \right)^2$

$j = 0:\quad \frac{\partial}{\partial \theta_0} J(\theta_0, \theta_1) = \frac{1}{m} \sum_{i=1}^{m} \left( h(x_i) - y_i \right)$

$j = 1:\quad \frac{\partial}{\partial \theta_1} J(\theta_0, \theta_1) = \frac{1}{m} \sum_{i=1}^{m} \left( h(x_i) - y_i \right) x_i$
Gradient descent algorithm ("Batch" Gradient Descent)

repeat until convergence {
  $\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)$
  $\theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}$
}

"Batch": each step of gradient descent uses all $m$ training examples.
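A sketch of the full batch gradient descent loop for one-variable linear regression, using the update rules above; the learning rate, iteration count, and example data are illustrative choices, not values from the slides:

```python
import numpy as np

def batch_gradient_descent(x, y, alpha=0.1, num_iters=2000):
    """Fit h_theta(x) = theta0 + theta1 * x by batch gradient descent.
    "Batch" because every step uses all m training examples."""
    m = len(y)
    theta0, theta1 = 0.0, 0.0
    for _ in range(num_iters):
        errors = (theta0 + theta1 * x) - y          # h_theta(x^(i)) - y^(i)
        grad0 = np.sum(errors) / m                  # dJ/dtheta0
        grad1 = np.sum(errors * x) / m              # dJ/dtheta1
        # Simultaneous update of both parameters
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1

# Illustrative usage on small, roughly linear data with scaled inputs
x = np.array([-1.0, -0.3, 0.2, 1.1])
y = np.array([0.5, 1.1, 1.6, 2.4])
print(batch_gradient_descent(x, y))   # approximately the least-squares line
```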
Example: the fit after several iterations of gradient descent
(Figures: Iterations 1 through 11 of gradient descent, each showing the current fit over the same data; x ranges from −2.0 to 2.0, y from about 0.1 to 0.6.)
Thanks