Linear Regression with Multiple Variables
Training set (housing prices):

    Size x1 (feet^2)    Price y (in $1000s)
    2104                460
    1416                …
    1534                …
    852                 178

Notation: $x_1^{(4)} = 852$, the value of feature 1 (size) in the 4th training example.
Hypothesis:
Previously (one variable): $h_\theta(x) = \theta_0 + \theta_1 x$
Now (multiple variables): $h_\theta(x) = \theta_0 x_0 + \theta_1 x_1 + \cdots + \theta_n x_n = \theta^T x$, using the convention $x_0 = 1$.
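With $x_0 = 1$ the hypothesis is a single dot product, so all predictions can be computed at once. A minimal NumPy sketch (the parameter values are illustrative, not from the original):

```python
import numpy as np

# Design matrix: one row per training example, first column is x0 = 1.
X = np.array([[1.0, 2104.0],
              [1.0, 1416.0],
              [1.0, 1534.0],
              [1.0,  852.0]])
theta = np.array([0.0, 0.2])   # illustrative parameter values

# h_theta(x) = theta^T x, evaluated for every example at once
predictions = X @ theta
print(predictions)
```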
Feature Scaling
Idea: Make sure features are on a similar scale.
E.g. $x_1 = $ size (0–2000 feet$^2$), $x_2 = $ number of bedrooms (1–5).
Dividing each feature by its range:
$x_1 = \dfrac{\text{size (feet}^2)}{2000}$,  $x_2 = \dfrac{\text{number of bedrooms}}{5}$
so that $0 \le x_1 \le 1$ and $0 \le x_2 \le 1$.
More generally, aim to get every feature into approximately a $-1 \le x_i \le 1$ range. For example:
$0 \le x_1 \le 3$ (fine)
$-2 \le x_2 \le 0.5$ (fine)
$-100 \le x_3 \le 100$ (too large; rescale)
$-0.00001 \le x_4 \le 0.00001$ (too small; rescale)
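As a sketch, the divide-by-the-maximum scaling above is one line in NumPy (the feature values are illustrative):

```python
import numpy as np

# Columns: size in feet^2, number of bedrooms (illustrative values)
X = np.array([[2104.0, 3.0],
              [1416.0, 2.0],
              [ 852.0, 2.0]])

# Divide each feature by its maximum so every column lands in [0, 1].
X_scaled = X / X.max(axis=0)
print(X_scaled)
```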
Mean normalization
Replace $x_i$ with $x_i - \mu_i$ to make features have approximately zero mean (do not apply to $x_0 = 1$).
E.g. $x_1 = \dfrac{\text{size} - 1000}{2000}$,  $x_2 = \dfrac{\#\text{bedrooms} - 2}{5}$,
giving $-0.5 \le x_1 \le 0.5$ and $-0.5 \le x_2 \le 0.5$.
In general: $x_i \leftarrow \dfrac{x_i - \mu_i}{s_i}$, where $\mu_i$ is the average of $x_i$ in the training set and $s_i$ is the range of $x_i$ (max minus min, or the standard deviation).
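A minimal sketch of mean normalization in NumPy, using the same illustrative data as above:

```python
import numpy as np

def mean_normalize(X):
    """Subtract each feature's training-set mean and divide by its range,
    so every column ends up roughly in [-0.5, 0.5] with zero mean."""
    mu = X.mean(axis=0)                  # mu_i: average of feature i
    s = X.max(axis=0) - X.min(axis=0)    # s_i: range of feature i
    return (X - mu) / s, mu, s

X = np.array([[2104.0, 3.0],
              [1416.0, 2.0],
              [ 852.0, 2.0]])
X_norm, mu, s = mean_normalize(X)
print(X_norm)
```

The returned $\mu_i$ and $s_i$ must be applied to any new example before predicting, so that training and prediction use the same scaling.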
- "Debugging": How to make sure gradient
descent is working correctly.
min J ( 6 )
e if gradient is working properly
then J(Ɵ) should decrease
after every iteration.
J(Ɵ1
) J(Ɵ2
J(Ɵ3
θ1 )
)
θ2
θ3
300 400
No. of iterations
Making sure gradient descent is working correctly:
If $J(\theta)$ increases as the number of iterations grows, gradient descent is not working. Use a smaller $\alpha$.
[Plots: $J(\theta)$ rising with the number of iterations when $\alpha$ is too large; and scatter data over roughly $-30 \le X \le 50$ fitted with polynomials of increasing degree: $\hat{Y} = \theta_0 + \theta_1 X$, $\hat{Y} = \theta_0 + \theta_1 X + \theta_2 X^2$, $\hat{Y} = \theta_0 + \theta_1 X + \theta_2 X^2 + \theta_3 X^3$.]
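A sketch of batch gradient descent that records $J(\theta)$ at every iteration, so the convergence check above can be plotted (all names are illustrative, assuming the usual squared-error cost):

```python
import numpy as np

def compute_cost(X, y, theta):
    """J(theta) = 1/(2m) * sum of squared errors."""
    m = len(y)
    error = X @ theta - y
    return (error @ error) / (2 * m)

def gradient_descent(X, y, alpha=0.01, iters=400):
    m, n = X.shape
    theta = np.zeros(n)
    J_history = []
    for _ in range(iters):
        gradient = (X.T @ (X @ theta - y)) / m
        theta -= alpha * gradient              # simultaneous update of all theta_j
        J_history.append(compute_cost(X, y, theta))
    return theta, J_history
```

If `J_history` is not decreasing at every step, $\alpha$ is too large.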
Polynomial regression
[Plot: Price ($y$) against Size ($x$), with a quadratic fit $h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_1^2$ and a cubic fit $h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_1^2 + \theta_3 x_1^3$.]
Choosing the cubic model:
$h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_3 = \theta_0 + \theta_1(\text{size}) + \theta_2(\text{size})^2 + \theta_3(\text{size})^3$
with
$x_1 = (\text{size})$, range 1–1,000
$x_2 = (\text{size})^2$, range 1–1,000,000
$x_3 = (\text{size})^3$, range 1–1,000,000,000
Because these ranges differ by orders of magnitude, feature scaling becomes especially important here.
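A sketch of constructing and scaling these polynomial features in NumPy (the size values are illustrative):

```python
import numpy as np

size = np.array([1.0, 250.0, 500.0, 1000.0])   # illustrative sizes

# x1 = size, x2 = size^2, x3 = size^3
X = np.column_stack([size, size**2, size**3])

# The columns span roughly 1e3, 1e6, and 1e9, so mean-normalize
# before running gradient descent:
X_norm = (X - X.mean(axis=0)) / (X.max(axis=0) - X.min(axis=0))
print(X_norm)
```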
Generalization in Machine Learning
Overfitting and underfitting are the two biggest causes of poor performance in machine learning algorithms.
Over-fitting
Overfitting refers to a model that models the training data too well: it learns the noise in the training data rather than just the underlying relationship, and so performs poorly on new data.
The sweet spot is the point just before the error on the test dataset starts to increase, where the model has good skill on both the training dataset and the unseen test dataset.
Both overfitting and underfitting can lead to poor model performance. But by
far the most common problem in applied machine learning is overfitting.
k-fold cross validation allows you to train and test your model k times on different subsets of the training data, and build up an estimate of the performance of a machine learning model on unseen data.
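A minimal k-fold sketch using scikit-learn's KFold splitter (assuming scikit-learn is available; the data here is synthetic and purely illustrative):

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.random((100, 2))                          # illustrative features
y = 3 * X[:, 0] + 2 * X[:, 1] + rng.normal(0, 0.1, 100)

kf = KFold(n_splits=5, shuffle=True, random_state=0)
scores = []
for train_idx, test_idx in kf.split(X):
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    scores.append(model.score(X[test_idx], y[test_idx]))   # R^2 on the held-out fold

print(np.mean(scores))   # averaged estimate of performance on unseen data
```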
Example
[Figures: a classification scatter of x's and o's separated by decision boundaries of increasing complexity, and three panels plotted against Time, labelled Underfitting, Appropriate fitting, and Overfitting.]