
Machine learning Algorithms

Linear regression with multiple variables


Polynomial regression

Size (feet²)   Price ($1000)
     x               y
   2104            460
   1416            232
   1534            315
    852            178
Multiple features (variables)

Size (feet²)   # bedrooms   # floors   Age (years)   Price ($1000)
    x1             x2          x3          x4             y
   2104             5           1          45            460
   1416             3           2          40            232
   1534             3           2          30            315
    852             2           1          36            178
    ...            ...         ...         ...           ...

Notation:
• n = number of features (here n = 4)
• m = number of training examples (here m = 47)
• x^(i) = the input (features) of the i-th training example, e.g. x^(2) = [1416, 3, 2, 40]
• x_j^(i) = the value of feature j in the i-th training example, e.g. x_3^(2) = 2
Another example from the same table: x^(4) is the feature vector of the 4th training example, and x_1^(4) = 852 (its size in feet²).
Hypothesis:
Previously (one variable):   hθ(x) = θ0 + θ1·x
Now (multiple variables):    hθ(x) = θ0 + θ1·x1 + θ2·x2 + ... + θn·xn

The weights used by the model indicate the effect of each descriptive feature on the predictions returned by the model.

Example:   hθ(x) = 80 + 0.1·x1 + 3·x2 + 0.01·x3 − 2·x4

where 80 is the base price, 0.1 weights the size (x1), 3 the number of bedrooms (x2), 0.01 the number of floors (x3), and −2 the age of the house (x4).
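To make this concrete, here is a minimal sketch (my own illustration, not code from the slides) that evaluates this hypothesis for the first house in the table above using NumPy:

```python
import numpy as np

# Example weights from the slide: base price, size, bedrooms, floors, age
theta = np.array([80.0, 0.1, 3.0, 0.01, -2.0])

# First house in the table: x0 = 1 (intercept), 2104 sq ft,
# 5 bedrooms, 1 floor, 45 years old
x = np.array([1.0, 2104.0, 5.0, 1.0, 45.0])

price = theta @ x    # h_theta(x) = theta^T x
print(price)         # predicted price in $1000s
```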
Hypothesis:   hθ(x) = θᵀx = θ0·x0 + θ1·x1 + θ2·x2 + ... + θn·xn     (with x0 = 1)
Parameters:   θ0, θ1, ..., θn

Gradient descent for multiple variables (n ≥ 1):

Repeat {
    θj := θj − α · (1/m) · Σ_{i=1..m} (hθ(x^(i)) − y^(i)) · x_j^(i)
}   (simultaneously update θj for every j = 0, ..., n)
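A minimal NumPy sketch of this update rule (an illustration under my own naming and conventions, not code from the slides):

```python
import numpy as np

def gradient_descent(X, y, alpha=0.01, num_iters=1000):
    """Batch gradient descent for multivariate linear regression.
    X: (m, n+1) design matrix whose first column is all ones (x0 = 1).
    y: (m,) target vector."""
    m, n_plus_1 = X.shape
    theta = np.zeros(n_plus_1)
    for _ in range(num_iters):
        predictions = X @ theta            # h_theta(x) for every training example
        errors = predictions - y           # h_theta(x^(i)) - y^(i)
        gradient = (X.T @ errors) / m      # one component per theta_j
        theta = theta - alpha * gradient   # simultaneous update of all theta_j
    return theta
```

The vectorized form computes all n+1 partial derivatives at once, which is equivalent to looping over j in the update rule above.
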
Feature Scaling
Idea: Make sure features are on a similar scale.
E.g.  x1 = size (0–2000 feet²)
      x2 = number of bedrooms (1–5)

Divide each feature by its range:

      x1 = size (feet²) / 2000
      x2 = number of bedrooms / 5

so that 0 ≤ x1 ≤ 1 and 0 ≤ x2 ≤ 1.

More generally, when performing feature scaling, we often want to get every feature into approximately a −1 to +1 range:

      −1 ≤ xi ≤ 1

(The feature x0 is always equal to 1, so it is already in that range.)

Examples:

      0 ≤ x1 ≤ 3                  acceptable
      −2 ≤ x2 ≤ 0.5               acceptable
      −100 ≤ x3 ≤ 100             too large; rescale
      −0.00001 ≤ x4 ≤ 0.00001     too small; rescale
Mean normalization
Replace xi with xi − μi to make features have approximately zero mean (do not apply to x0 = 1).

E.g.  x1 = (size − 1000) / 2000
      x2 = (# bedrooms − 2) / 5

giving −0.5 ≤ x1 ≤ 0.5 and −0.5 ≤ x2 ≤ 0.5.

In general:

      x1 := (x1 − μ1) / s1

where μ1 is the average of x1 in the training set and s1 is the range (max − min) of x1.
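A small NumPy sketch of mean normalization (illustrative only; the helper name and the use of the table values as data are my own choices, not from the slides):

```python
import numpy as np

def mean_normalize(X):
    """Scale each column of X to roughly [-0.5, 0.5]:
    subtract the column mean and divide by the column range (max - min)."""
    mu = X.mean(axis=0)                  # mu_j: average of feature j
    s = X.max(axis=0) - X.min(axis=0)    # s_j: range of feature j
    return (X - mu) / s, mu, s

# Example: size (sq ft) and number of bedrooms from the table above
X = np.array([[2104, 5],
              [1416, 3],
              [1534, 3],
              [ 852, 2]], dtype=float)

X_norm, mu, s = mean_normalize(X)
print(X_norm)   # features now have approximately zero mean and similar scale
```

The same mu and s must be reused to normalize any new example before making a prediction.
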
- "Debugging": How to make sure gradient
descent is working correctly.

- How to choose learning rate a


Choosing a
a too small a too large
Increasing value for J ( 6 )
slow convergence

• May overshoot the minimum


• May fail to converge
• May even diverge

To see if gradient descent is working, print out J(θ) at each iteration:
• The value should decrease at each iteration.
• If it doesn't, adjust α.

Making sure gradient descent is working correctly:

We want min_θ J(θ). If gradient descent is working properly, then J(θ) should decrease after every iteration.

[Figure: J(θ) plotted against the number of iterations, decreasing and levelling off as the number of iterations grows.]
Making sure gradient descent is working correctly:

[Figure: J(θ) increasing with the number of iterations, a sign that gradient descent is not working; use a smaller α.]

- For sufficiently small α, J(θ) should decrease on every iteration.
- But if α is too small, gradient descent can be slow to converge.

Effect of α on convergence:

• The yellow plot shows divergence of the algorithm when the learning rate is very high and the learning steps overshoot.

• The green plot shows the case where the learning rate is not as large as in the previous case, but is high enough that the steps keep oscillating around a point which is not the minimum.

• The red plot is the optimal curve for the cost drop: it drops steeply at first and then saturates very close to the optimum value.

• The blue plot uses the smallest value of α and converges very slowly, because the steps taken by the algorithm in each update are very small.
Summary:
- If α is too small: slow convergence.
- If α is too large: J(θ) may not decrease on every iteration; it may not converge.

To choose a good value of α, run the algorithm with a range of values such as

      ..., 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, ...

and plot the learning curve to see whether the value should be increased or decreased.
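One way to do this in practice is to record J(θ) after every iteration for several values of α and compare the curves. The rough sketch below uses synthetic, already-scaled data and my own helper names; none of it comes from the slides:

```python
import numpy as np

def compute_cost(X, y, theta):
    """J(theta) = (1/2m) * sum((X theta - y)^2)."""
    m = len(y)
    errors = X @ theta - y
    return (errors @ errors) / (2 * m)

def cost_history(X, y, alpha, num_iters=200):
    """Run gradient descent and record J(theta) after every iteration."""
    m, n = X.shape
    theta = np.zeros(n)
    history = []
    for _ in range(num_iters):
        theta = theta - alpha * (X.T @ (X @ theta - y)) / m
        history.append(compute_cost(X, y, theta))
    return history

# Synthetic, already-scaled data (for illustration only)
rng = np.random.default_rng(0)
X = np.c_[np.ones(50), rng.uniform(-0.5, 0.5, size=(50, 2))]
y = X @ np.array([2.0, 1.0, -1.0]) + 0.1 * rng.standard_normal(50)

for alpha in [0.001, 0.003, 0.01, 0.03, 0.1, 0.3]:
    print(alpha, cost_history(X, y, alpha)[-1])   # final J(theta) per learning rate
```

Plotting each history against the iteration number reproduces the learning curves described above: too-small α decreases slowly, too-large α oscillates or diverges.
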
The sign of the coefficient of the highest-order regressor determines the direction of the curvature.

      Linear:     Y′ = b0 + b1·X
      Quadratic:  Y′ = b0 + b1·X + b2·X²
      Cubic:      Y′ = b0 + b1·X + b2·X² + b3·X³

[Figure: example curves for the linear, quadratic, and cubic models with a positive highest-order coefficient.]

With a negative highest-order coefficient, the curvature flips:

      Linear:     Y′ = b0 − b1·X
      Quadratic:  Y′ = b0 + b1·X − b2·X²
      Cubic:      Y′ = b0 + b1·X + b2·X² − b3·X³

[Figure: the corresponding curves with a negative highest-order coefficient.]
Polynomial regression

Fitting the price (y) as a polynomial function of the size (x):

      hθ(x) = θ0 + θ1·x1 + θ2·x2
      hθ(x) = θ0 + θ1·x1 + θ2·x2 + θ3·x3
            = θ0 + θ1·(size) + θ2·(size)² + θ3·(size)³

with  x1 = (size),  x2 = (size)²,  x3 = (size)³.

Note the ranges of these features: if size is 1–1000, then (size)² is 1–1,000,000 and (size)³ is 1–1,000,000,000, so feature scaling becomes very important.
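A brief sketch of building these polynomial features and scaling them in NumPy (an illustration using the sizes from the earlier table; the variable names are my own, not from the slides):

```python
import numpy as np

sizes = np.array([2104.0, 1416.0, 1534.0, 852.0])   # size of each house

# Polynomial features: x1 = size, x2 = size^2, x3 = size^3
X_poly = np.column_stack([sizes, sizes**2, sizes**3])

# The columns now have wildly different ranges, so mean-normalize each one
mu = X_poly.mean(axis=0)
s = X_poly.max(axis=0) - X_poly.min(axis=0)
X_scaled = (X_poly - mu) / s

# Add the intercept column x0 = 1; X_design can now be used with the same
# gradient descent update as ordinary multivariate linear regression
X_design = np.c_[np.ones(len(sizes)), X_scaled]
print(X_design)
```
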
Generalization in Machine Learning

The goal of a good machine learning model is to generalize well from the training data to any data from the problem domain. This allows us to make predictions in the future on data the model has never seen: "learning general concepts from specific examples."

There is terminology used in machine learning to describe how well a model learns and generalizes to new data, namely overfitting and underfitting.

Overfitting and underfitting are the two biggest causes of poor performance in machine learning algorithms.
Over-fitting
Overfitting refers to a model that models the training data too well.

Overfitting happens when a model learns the detail and noise in the training data to the extent that it negatively impacts the performance of the model on new data. This means that the noise or random fluctuations in the training data are picked up and learned as concepts by the model. The problem is that these concepts do not apply to new data and negatively impact the model's ability to generalize.

Decision trees, for example, are a very flexible machine learning algorithm and are therefore prone to overfitting the training data.
Under-fitting
Underfitting refers to a model that can neither model the training data nor generalize to new data.

An underfit machine learning model is not a suitable model, and this is usually obvious because it performs poorly even on the training data.
A Good Fit in Machine Learning
Ideally, you want to select a model at the sweet spot between underfitting and overfitting.

The sweet spot is the point just before the error on the test dataset starts to increase, where the model has good skill on both the training dataset and the unseen test dataset.

Both overfitting and underfitting can lead to poor model performance, but by far the most common problem in applied machine learning is overfitting.

One way to limit overfitting is to use a resampling technique such as k-fold cross-validation to estimate model accuracy.

k-fold cross-validation allows you to train and test your model k times on different subsets of the training data and build up an estimate of the performance of the model on unseen data.
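As a concrete illustration, a minimal k-fold cross-validation run for a linear regression model might look like the sketch below. It uses scikit-learn and synthetic housing-style data; the data and parameter choices are my own assumptions, not material from the slides:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic housing-style data: 100 examples, 4 features
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(100, 4))
y = X @ np.array([0.1, 3.0, 0.01, -2.0]) + 80 + 0.5 * rng.standard_normal(100)

model = LinearRegression()
cv = KFold(n_splits=5, shuffle=True, random_state=0)

# Train and evaluate the model k = 5 times, each time holding out a different fold
scores = cross_val_score(model, X, y, cv=cv, scoring="r2")
print(scores)          # one R^2 score per held-out fold
print(scores.mean())   # estimate of performance on unseen data
```
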
Example

*\ X o o o o oo
X oo o o X o oo o X o
X OoO X 0 0 0G o
'

X
Xo Xo , X O X X X _-
x o>po X X \J3 oo X D L '
*XXX x\ x Xx X X XXX
X XX X xx
V X
-
Under flcclng. .
A ppr u p ir j Lfi l ittirtfl v«r ritting
>

MUL irnpk lei


^ . -
( fDfiefiKinfc f M
raplAin ihr- ^ n ncc 'i gnori tn br cru &J
Example 2

[Figure: three plots of values over time fitted with curves of increasing complexity, labelled Underfitted, Good Fit/Robust, and Overfitted.]

Thanks
