
Machine learning Algorithms

Linear regression with one variable


 Model Representation

 Cost Function

 Gradient Descent
Housing Prices

[Figure: scatter plot of housing prices — price ($) in 1000's on the dependent (y) axis, house size on the independent (x) axis]

Supervised Learning: "right answers" or labeled data are given.
Regression: predict a continuous-valued output (price).

Training set of housing prices (Portland, OR):

Size in feet² (x)     Price ($) in 1000's (y)
2104                  460
1416                  232
1534                  315
852                   178
...                   ...
(m training examples)

Notation:
m = number of training examples
x's = "input" variable / features
y's = "output" variable / "target" variable
(x, y) = one training example (one row)
(x^(i), y^(i)) = the i-th training example

Example: x^(1) = 2104, y^(2) = 232, x^(4) = 852
Training set → Learning algorithm → h;   x → h → estimated y

The job of a learning algorithm is to output a function, usually denoted by a
lowercase h, where h stands for hypothesis.

The job of the hypothesis function is to take the value of x and try to output
the estimated value of y. So h is a function that maps from x's to y's.
How do we represent h ?

Linear Equations

y = f(x) = θ0 + θ1·x

θ1 = slope = change in Y (ΔY) / change in X (ΔX)
θ0 = Y-intercept
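As a concrete illustration of this representation, here is a minimal Python sketch of the one-variable hypothesis; the function name and the parameter values are illustrative, not from the slides.

```python
def h(x, theta0, theta1):
    """Hypothesis for linear regression with one variable: h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x

# Illustrative parameters: intercept 50, slope 0.1 maps a 2104 sq-ft house
# to a predicted price of 260.4 (in $1000's)
print(h(2104, 50, 0.1))
```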
Types of Regression Models

[Figure: four example scatter plots — Positive Linear Relationship, Negative Linear Relationship, Relationship NOT Linear, No Relationship]

The cost function lets us figure out how to fit the best possible straight line
to our data.
Training Set

Size in feet² (x)     Price ($) in 1000's (y)
2104                  460
1416                  232
1534                  315
852                   178
...                   ...

Hypothesis: hθ(x) = θ0 + θ1·x

How to choose the θi's?
Scatter plot

• 1. Plot of all (Xi, Yi) pairs
• 2. Suggests how well the model will fit

[Figure: scatter plot of the (x, y) pairs]
Thinking Challenge

How would you draw a line through the points?
How do you determine which line ‘fits best’?

[Figure: the same scatter plot repeated with different candidate lines — one with the intercept changed but the slope unchanged, and one with the slope changed]
Least Squares

• 1. ‘Best fit’ means the difference between the actual Y values and the
predicted Y values is a minimum. So square the errors!

    Σ_{i=1}^{m} (Yi − h(xi))² = Σ_{i=1}^{m} ε̂i²

• 2. Least squares minimizes the Sum of the Squared Errors (SSE).
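As an illustration of the squared-error idea, the sketch below computes the sum of squared errors (SSE) of a candidate line on the housing data above; the function name and the candidate parameters are assumptions for this example.

```python
def sse(xs, ys, theta0, theta1):
    """Sum of squared differences between actual y values and predicted y values."""
    return sum((y - (theta0 + theta1 * x)) ** 2 for x, y in zip(xs, ys))

sizes  = [2104, 1416, 1534, 852]   # x: size in feet^2
prices = [460, 232, 315, 178]      # y: price in $1000's

# SSE for an illustrative candidate line h(x) = 0 + 0.2*x
print(sse(sizes, prices, 0.0, 0.2))
```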
Least Squares Graphically

LS minimizes Σ_{i=1}^{4} ε̂i² = ε̂1² + ε̂2² + ε̂3² + ε̂4²

[Figure: four data points around the fitted line hθ(xi) = θ0 + θ1·xi, with the vertical residuals ε̂1 … ε̂4 marked]
Least Squared Errors Linear Regression

Minimize the squared differences between the predictions on the training set and
the actual values.

Idea: choose θ0, θ1 so that hθ(x) is close to y for our training examples (x, y).

Minimize J(θ0, θ1) = (1/2m) Σ_{i=1}^{m} (hθ(x^(i)) − y^(i))²
Cost function visualization

Consider a simple case of the hypothesis by setting θ0 = 0, so h becomes
hθ(x) = θ1·x. Each value of θ1 corresponds to a different hypothesis, since it is
the slope of the line: each value gives a different line passing through the
origin (the y-intercept θ0 is nulled out).

Using the three training points (1,1), (2,2), (3,3) implied by the J(0.5)
calculation:

At θ1 = 2,    J(2)   = (1/(2·3)) (1² + 2² + 3²)     ≈ 2.33
At θ1 = 1,    J(1)   = 0
At θ1 = 0.5,  J(0.5) = (1/(2·3)) (0.5² + 1² + 1.5²) ≈ 0.58

Plotting points like this further gives the following graph for the cost function,
which depends on the parameter θ1; each value of θ1 corresponds to a different
hypothesis.

[Figure: plot of J(θ1) against θ1, a bowl-shaped curve with its minimum at θ1 = 1]
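Assuming the three training points (1,1), (2,2), (3,3) implied by the J(0.5) arithmetic, this sketch reproduces the cost values above; the function and variable names are mine.

```python
def J(theta1, xs, ys):
    """Cost for the simplified hypothesis h(x) = theta1 * x (theta0 = 0)."""
    m = len(xs)
    return sum((theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

xs, ys = [1, 2, 3], [1, 2, 3]
for t1 in (2.0, 1.0, 0.5):
    print(f"J({t1}) = {J(t1, xs, ys):.2f}")
# Prints approximately: J(2.0) = 2.33, J(1.0) = 0.00, J(0.5) = 0.58
```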
Cost function visualization

What is the optimal value of θ1, i.e. the one that minimizes J(θ1)?

It is clear that the best value is θ1 = 1, since J(1) = 0, which is the minimum.

How do we find the best value for θ1 in general?

Plotting? Not practical, especially in high dimensions.

The solutions:
1. Analytical solution: not applicable for large datasets (a sketch of the
   one-variable closed form is shown below).
2. Numerical solution, e.g. gradient descent.
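For contrast with the numerical route, here is a sketch of the analytical solution for the one-variable case, using the standard closed-form least-squares formulas; this closed form is textbook material, not something shown on these slides.

```python
def fit_closed_form(xs, ys):
    """Closed-form least-squares fit for one variable:
    theta1 = sum((x - x_mean)*(y - y_mean)) / sum((x - x_mean)^2)
    theta0 = y_mean - theta1 * x_mean
    """
    m = len(xs)
    x_mean = sum(xs) / m
    y_mean = sum(ys) / m
    theta1 = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
              / sum((x - x_mean) ** 2 for x in xs))
    theta0 = y_mean - theta1 * x_mean
    return theta0, theta1

# Illustrative use on the housing data
print(fit_closed_form([2104, 1416, 1534, 852], [460, 232, 315, 178]))
```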
Hypothesis:      hθ(x) = θ0 + θ1·x

Parameters:      θ0, θ1

Cost Function:   J(θ0, θ1) = (1/2m) Σ_{i=1}^{m} (hθ(x^(i)) − y^(i))²

Goal:            minimize J(θ0, θ1) over θ0, θ1
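A minimal Python sketch of this cost function; the variable names and the example parameter values are illustrative.

```python
def cost(theta0, theta1, xs, ys):
    """J(theta0, theta1) = 1/(2m) * sum over i of (h_theta(x_i) - y_i)^2."""
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

# Example on the housing data with an illustrative candidate (theta0, theta1)
print(cost(0.0, 0.2, [2104, 1416, 1534, 852], [460, 232, 315, 178]))
```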
Gradient Descent

An iterative solution, used not only in linear regression; it is actually used
all over the place in machine learning.

 Objective: minimize any function (here, the cost function J)

Have some function J(θ0, θ1)
Want min over θ0, θ1 of J(θ0, θ1)

Outline:
• Start with some θ0, θ1
• Keep changing θ0, θ1 to reduce J(θ0, θ1) until we hopefully end up at a minimum

Imagine that this is the landscape of a grassy park, and you want to get to the
lowest point in the park as rapidly as possible.

[Figure: surface plot of J(θ0, θ1); red means high, blue means low. From one
starting point, gradient descent walks downhill to a local minimum; with a
different starting point, it can end up at a different local minimum.]
Gradient Descent Algorithm

repeat until convergence {
    θj := θj − α · ∂/∂θj J(θ0, θ1)      (for j = 0 and j = 1)
}

• Where
  o := is the assignment operator
  o α is the learning rate, which basically defines how big the steps are
    during the descent
  o ∂/∂θj J(θ0, θ1) is the partial derivative term
  o j = 0, 1 represents the feature index number

Also, the parameters should be updated simultaneously, i.e. (as sketched in code
below),

    temp0 := θ0 − α · ∂/∂θ0 J(θ0, θ1)
    temp1 := θ1 − α · ∂/∂θ1 J(θ0, θ1)
    θ0 := temp0
    θ1 := temp1

[Figure: contour plot of J(θ0, θ1) shown alongside the update rules]
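The sketch below illustrates the simultaneous-update pattern in Python. The partial derivatives are estimated numerically here purely as a stand-in; the analytic derivatives for linear regression are derived on the following slides. The names and the example cost are illustrative.

```python
def numerical_partial(J, thetas, j, eps=1e-6):
    """Central-difference estimate of dJ/dtheta_j (stand-in for the analytic derivative)."""
    plus, minus = list(thetas), list(thetas)
    plus[j] += eps
    minus[j] -= eps
    return (J(*plus) - J(*minus)) / (2 * eps)

def gradient_step(J, theta0, theta1, alpha):
    """One simultaneous update: compute both temporaries first, then assign."""
    temp0 = theta0 - alpha * numerical_partial(J, (theta0, theta1), 0)
    temp1 = theta1 - alpha * numerical_partial(J, (theta0, theta1), 1)
    return temp0, temp1   # theta0 := temp0, theta1 := temp1

# Example with a simple bowl-shaped cost J(t0, t1) = t0^2 + t1^2
print(gradient_step(lambda t0, t1: t0**2 + t1**2, 1.0, 2.0, alpha=0.1))
```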
Intuition with a single parameter:   θ1 := θ1 − α · d/dθ1 J(θ1)

When the slope d/dθ1 J(θ1) is positive:  θ1 := θ1 − α·(+ve), so θ1 decreases,
moving toward the minimum.

When the slope is negative:  θ1 := θ1 − α·(−ve), so θ1 increases, again moving
toward the minimum.

[Figure: two J(θ1) curves, one with the current θ1 to the right of the minimum
(positive slope) and one with θ1 to the left (negative slope)]
Gradient descent algorithm applied to the Linear Regression Model

repeat until convergence {
    θj := θj − α · ∂/∂θj J(θ0, θ1)      (for j = 0 and j = 1)
}

∂/∂θj J(θ0, θ1) = ∂/∂θj [ (1/2m) Σ_{i=1}^{m} (hθ(x^(i)) − y^(i))² ]
                = ∂/∂θj [ (1/2m) Σ_{i=1}^{m} (θ0 + θ1·x^(i) − y^(i))² ]

j = 0:   ∂/∂θ0 J(θ0, θ1) = (1/m) Σ_{i=1}^{m} (hθ(x^(i)) − y^(i))
j = 1:   ∂/∂θ1 J(θ0, θ1) = (1/m) Σ_{i=1}^{m} (hθ(x^(i)) − y^(i)) · x^(i)
Gradient descent algorithm

repeat until convergence {
    θ0 := θ0 − α · (1/m) Σ_{i=1}^{m} (hθ(x^(i)) − y^(i))
    θ1 := θ1 − α · (1/m) Σ_{i=1}^{m} (hθ(x^(i)) − y^(i)) · x^(i)
}
"Batch" Gradient Descent

"Batch": Each step of gradient descent


uses all the training examples.
repeat until convergence {
m
9o : Oo

1
am w {i )
£ (M* ) y )
-

m
9\ 91 i
am E { ho { x {l)
)- y ) - x
{i)
^
}
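Putting the pieces together, here is a sketch of batch gradient descent for one-variable linear regression, run on the small (1,1), (2,2), (3,3) example used earlier; the learning rate and iteration count are illustrative choices, not values from the slides.

```python
def batch_gradient_descent(xs, ys, alpha=0.1, iterations=1000):
    """Each step uses all m training examples ("batch"), with a simultaneous update."""
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        temp0 = theta0 - alpha * sum(errors) / m
        temp1 = theta1 - alpha * sum(e * x for e, x in zip(errors, xs)) / m
        theta0, theta1 = temp0, temp1
    return theta0, theta1

# On the (1,1), (2,2), (3,3) example the fit approaches theta0 = 0, theta1 = 1
print(batch_gradient_descent([1, 2, 3], [1, 2, 3]))
```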
Example after implementing some iterations of gradient descent
[Figure: Iterations 1 through 11 — a sequence of plots of the fitted line over the
same data, showing the line gradually adjusting to fit the points as gradient
descent proceeds]
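To reproduce this kind of iteration-by-iteration picture, one can record the parameters after every step and redraw the line each time, roughly as in the sketch below (the recording logic and all names are mine, built on the batch update above).

```python
def gradient_descent_with_history(xs, ys, alpha=0.1, iterations=11):
    """Run gradient descent and keep (theta0, theta1) after every iteration for plotting."""
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    history = []
    for _ in range(iterations):
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        theta0, theta1 = (theta0 - alpha * sum(errors) / m,
                          theta1 - alpha * sum(e * x for e, x in zip(errors, xs)) / m)
        history.append((theta0, theta1))
    return history

for i, (t0, t1) in enumerate(gradient_descent_with_history([1, 2, 3], [1, 2, 3]), start=1):
    print(f"Iteration {i}: h(x) = {t0:.3f} + {t1:.3f} x")
```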
Thanks
