
Introduction of Machine / Deep Learning

Hung-yi Lee 李宏毅
Machine Learning
≈ Looking for a Function
• Speech Recognition:  f(an audio clip) = "How are you"
• Image Recognition:   f(an image) = "Cat"
• Playing Go:          f(a board position) = "5-5" (next move)

Different types of Functions
Regression: The function outputs a scalar.
    e.g. predict tomorrow's PM2.5: f(PM2.5 today, temperature, concentration of O3) = PM2.5 of tomorrow

Classification: Given options (classes), the function outputs the correct one.
    e.g. spam filtering: f(an email) = Yes/No
Different types of Functions
Classification: Given options (classes), the function outputs the correct one.
    e.g. playing Go: the function takes a position on the board and outputs the next move;
    each position on the board is a class (19 x 19 classes).

Structured Learning: create something with structure (an image, a document)
    — going beyond regression and classification.
How to find a function?
A Case Study
YouTube Channel

https://www.youtube.com/c/HungyiLeeNTU
The function we want to find …

    y = f(...)    where y is the no. of views on 2/26

1. Function with Unknown Parameters

    Model:  y = b + w·x_1   (based on domain knowledge)

    y:   no. of views on 2/26
    x_1: no. of views on 2/25 (the feature)
    w and b are unknown parameters (learned from data): w is the weight, b is the bias.
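To make the later steps concrete, here is a minimal Python sketch of this one-feature model (my own illustration; the names `model`, `x1`, `w`, `b` are not from the slides):

```python
def model(x1, w, b):
    """Predicted no. of views for tomorrow, given today's views x1 (in thousands)."""
    return b + w * x1
```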
2. Define Loss from Training Data
• Loss is a function of the parameters: L(b, w)
• Loss: how good a set of values is.

    e.g. L(0.5k, 1): the model y = b + w·x_1 with b = 0.5k and w = 1, i.e. y = 0.5k + 1·x_1. How good is it?

    Data from 2017/01/01 – 2020/12/31 (daily views): 4.8k, 4.9k, 7.5k, …, 3.4k, 9.8k

    Prediction for 01/02 from 01/01:  y = 0.5k + 1 × 4.8k = 5.3k;   label ŷ = 4.9k (the true views on 01/02)
    e_1 = |y − ŷ| = 0.4k
    Likewise, predicting 01/03 from 01/02:  y = 0.5k + 1 × 4.9k = 5.4k;   label ŷ = 7.5k
    e_2 = |y − ŷ| = 2.1k
    … and similarly e_3, e_4, … for every day in the training period.
    Collect the error e for every day and average:

        Loss:  L = (1/N) Σ_n e_n

    e = |y − ŷ|      →  L is the mean absolute error (MAE)
    e = (y − ŷ)^2    →  L is the mean square error (MSE)
    If y and ŷ are both probability distributions → cross-entropy
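A small numpy sketch (an illustration, not code from the slides) of how L(b, w) could be computed over a series of daily view counts; the toy numbers below are made up:

```python
import numpy as np

views = np.array([4.8, 4.9, 7.5, 3.4, 9.8])   # toy daily views in thousands (real data: 2017-2020)

def loss(b, w, views, kind="mae"):
    """Average error of y = b + w * x1, where x1 = views on day n and the label is views on day n+1."""
    x1 = views[:-1]                       # feature: today's views
    y_hat = views[1:]                     # label: tomorrow's true views
    y = b + w * x1                        # model prediction
    e = np.abs(y - y_hat) if kind == "mae" else (y - y_hat) ** 2
    return e.mean()                       # L = (1/N) * sum_n e_n

print(loss(0.5, 1.0, views))              # L(0.5k, 1) on the toy data
```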
2. Define Loss from Training Data
    Model: y = b + w·x_1
• Loss is a function of the parameters: L(b, w)
    Error Surface: evaluate L for many different (b, w) pairs and plot the result as a contour map;
    some regions have small L, others large L.

3. Optimization:   w*, b* = arg min_{w,b} L

Gradient Descent
• (Randomly) pick an initial value w^0
• Compute ∂L/∂w |_{w = w^0}
    negative slope → increase w;   positive slope → decrease w
(Source of image: http://chico386.pixnet.net/album/photo/171572850)
• Update:   w^1 ← w^0 − η · ∂L/∂w |_{w = w^0}
    η is the learning rate, a hyperparameter (set by us rather than learned).
• Update w iteratively:   w^0 → w^1 → w^2 → … → w^T
    Gradient descent can stop at a local minimum of L rather than at the global minimum —
    but does a local minimum truly cause the problem?
3. Optimization:   w*, b* = arg min_{w,b} L

• (Randomly) pick initial values w^0, b^0
• Compute the gradients and update:

    w^1 ← w^0 − η · ∂L/∂w |_{w = w^0, b = b^0}
    b^1 ← b^0 − η · ∂L/∂b |_{w = w^0, b = b^0}

    (Computing the gradients can be done in one line in most deep learning frameworks.)

• Update w and b iteratively
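A runnable sketch of these updates for the one-feature model (my own illustration; MSE is used here because its gradient is smooth, while the slides report MAE numbers):

```python
import numpy as np

views = np.array([4.8, 4.9, 7.5, 3.4, 9.8])      # toy daily views in thousands
x1, y_hat = views[:-1], views[1:]                # feature: today's views; label: tomorrow's views

w, b = 0.0, 0.0                                  # initial values w^0, b^0 (could also be random)
eta = 0.01                                       # learning rate eta (a hyperparameter)

for step in range(1000):
    y = b + w * x1                               # prediction
    dL_dw = 2 * np.mean((y - y_hat) * x1)        # dL/dw for the MSE loss
    dL_db = 2 * np.mean(y - y_hat)               # dL/db for the MSE loss
    w, b = w - eta * dL_dw, b - eta * dL_db      # w <- w - eta*dL/dw,  b <- b - eta*dL/db

print(w, b)                                      # parameters after 1000 updates
```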
Model: y = b + w·x_1

3. Optimization:   w*, b* = arg min_{w,b} L

    Starting from some (w^0, b^0), repeatedly compute ∂L/∂w and ∂L/∂b and move by
    (−η ∂L/∂w, −η ∂L/∂b) across the error surface.

    Result:  w* = 0.97,  b* = 0.1k,  L(w*, b*) = 0.48k
Machine Learning is so simple ……

    y = b + w·x_1    →    w* = 0.97,  b* = 0.1k,  L(w*, b*) = 0.48k

    Step 1: function with unknown parameters
    Step 2: define loss from training data
    Step 3: optimization

    These three steps together are the training procedure.

Training:  y = 0.1k + 0.97·x_1 achieves the smallest loss L = 0.48k
on the data of 2017 – 2020 (the training data).
How about the data of 2021 (unseen during training)?
    On the 2021 data the loss is L′ = 0.58k.
    [Plot of daily views (k), 2021/01/01 – 2021/02/14 — red: real no. of views; blue: views estimated by y = 0.1k + 0.97·x_1.]
Using the views of more past days as features:

                                       2017 – 2020      2021
    y = b + w·x_1                      L  = 0.48k       L′ = 0.58k
    y = b + Σ_{j=1}^{7}  w_j x_j       L  = 0.38k       L′ = 0.49k
    y = b + Σ_{j=1}^{28} w_j x_j       L  = 0.33k       L′ = 0.46k
    y = b + Σ_{j=1}^{56} w_j x_j       L  = 0.32k       L′ = 0.46k

    Learned parameters for the 7-day model:
    b = 0.05k,  w_1 = 0.79,  w_2 = −0.31,  w_3 = 0.12,  w_4 = −0.01,  w_5 = −0.10,  w_6 = 0.30,  w_7 = 0.18
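A sketch (my own; the helper name `make_dataset` is hypothetical) of how such multi-day feature vectors can be built from the series of daily views:

```python
import numpy as np

def make_dataset(views, k):
    """Features x = views of the past k days; label y_hat = views of the following day."""
    X = np.array([views[i:i + k] for i in range(len(views) - k)])
    y_hat = np.array(views[k:])
    return X, y_hat

views = [4.8, 4.9, 7.5, 3.4, 9.8, 5.3, 6.1, 7.0, 6.4, 5.9]   # toy data in thousands
X, y_hat = make_dataset(views, k=7)        # k = 7, 28 or 56 as in the table above
y = X @ np.full(7, 0.1) + 0.05             # y = b + sum_j w_j x_j with made-up w_j and b
```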
Linear models

Linear models are too simple … we need more sophisticated models.
    For y = b + w·x_1, different w only change the slope and different b only shift the line:
    the relation between x_1 and y is always a straight line.

    This severe limitation of linear models is called Model Bias.
    We need a more flexible model!
red curve = constant + sum of a set of blue "hard sigmoid" curves

All Piecewise Linear Curves
    = constant + sum of a set of such hard sigmoid curves
    More pieces require more hard sigmoid curves.

Beyond Piecewise Linear?
    Approximate a continuous curve y(x_1) by a piecewise linear curve.
    To have a good approximation, we need sufficient pieces.
red curve = constant + sum of a set of hard sigmoids
How to represent a hard sigmoid? Approximate it with a Sigmoid Function:

    y = c · 1 / (1 + e^{−(b + w·x_1)}) = c · sigmoid(b + w·x_1)

    Different w  →  change the slope
    Different b  →  shift the curve left/right
    Different c  →  change the height
red curve = sum of a set of sigmoids + constant

    y = b + Σ_i c_i · sigmoid(b_i + w_i x_1)

    e.g. red curve 0 = constant b + curve 1 + curve 2 + curve 3, where
    curve 1 is c_1 sigmoid(b_1 + w_1 x_1), curve 2 is c_2 sigmoid(b_2 + w_2 x_1), curve 3 is c_3 sigmoid(b_3 + w_3 x_1).
New Model: More Features

    one feature:        y = b + w·x_1          →    y = b + Σ_i c_i sigmoid(b_i + w_i x_1)
    several features:   y = b + Σ_j w_j x_j    →    y = b + Σ_i c_i sigmoid(b_i + Σ_j w_ij x_j)

    j: 1, 2, 3  indexes the features;   i: 1, 2, 3  indexes the sigmoids
Writing out the arguments of the sigmoids (w_ij: weight of feature x_j for the i-th sigmoid):

    r_1 = b_1 + w_11 x_1 + w_12 x_2 + w_13 x_3
    r_2 = b_2 + w_21 x_1 + w_22 x_2 + w_23 x_3
    r_3 = b_3 + w_31 x_1 + w_32 x_2 + w_33 x_3
    y = b + Σ_i c_i sigmoid(b_i + Σ_j w_ij x_j),    i: 1,2,3;   j: 1,2,3

    In matrix form:

    [r_1]   [b_1]   [w_11 w_12 w_13] [x_1]
    [r_2] = [b_2] + [w_21 w_22 w_23] [x_2]
    [r_3]   [b_3]   [w_31 w_32 w_33] [x_3]

    r = b + W x
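In code this is a single matrix-vector product; a short sketch with arbitrary numbers:

```python
import numpy as np

W = np.array([[0.2, -0.1, 0.4],
              [0.5,  0.3, -0.2],
              [-0.3, 0.1, 0.6]])     # weights w_ij (made-up values)
b_vec = np.array([0.1, -0.2, 0.3])   # biases b_i
x = np.array([4.8, 4.9, 7.5])        # features x_1, x_2, x_3

r = b_vec + W @ x                    # r = b + W x
```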

    Pass each r_i through a sigmoid to get a_i:

        a_i = sigmoid(r_i) = 1 / (1 + e^{−r_i})

    or, for the whole vector at once:   a = σ(r)
    Finally, weight each a_i by c_i, sum them up, and add the constant b:

        y = b + c^T a

    where   a = σ(r)   and   r = b + W x
    Putting it all together:

        y = b + c^T σ(b + W x)

Function with unknown parameters

        y = b + c^T σ(b + W x),    x: the feature vector

    The unknown parameters are W, the vector b, c^T and the scalar b. Collect them all
    (e.g. the rows of W, the entries of b, c and b) into one long vector:

        θ = [θ_1, θ_2, θ_3, …]^T
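A compact numpy sketch (my own illustration, with random made-up parameters) of the whole function y = b + c^T σ(b + W x):

```python
import numpy as np

def sigmoid(r):
    return 1.0 / (1.0 + np.exp(-r))

def forward(x, W, b_vec, c, b):
    """y = b + c^T sigmoid(b_vec + W x)"""
    r = b_vec + W @ x          # r = b + W x
    a = sigmoid(r)             # a = sigma(r)
    return b + c @ a           # y = b + c^T a

rng = np.random.default_rng(0)                       # 3 sigmoids, 3 features, random parameters
W, b_vec, c = rng.normal(size=(3, 3)), rng.normal(size=3), rng.normal(size=3)
y = forward(np.array([4.8, 4.9, 7.5]), W, b_vec, c, b=0.1)
```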
Back to ML Framework

    Step 1: function with unknown parameters → Step 2: define loss from training data → Step 3: optimization

        y = b + c^T σ(b + W x)
Loss
• Loss is a function of the parameters: L(θ)
• Loss means how good a set of values is.

    Given a set of values θ: feed a feature x into y = b + c^T σ(b + W x), compare the
    prediction y with the label ŷ to get the error e, and average over the training data:

        Loss:  L = (1/N) Σ_n e_n
Back to ML Framework

    Step 1: function with unknown parameters → Step 2: define loss from training data → Step 3: optimization

        y = b + c^T σ(b + W x)
Optimization of New Model

    θ* = arg min_θ L,    where θ = [θ_1, θ_2, θ_3, …]^T

• (Randomly) pick initial values θ^0
• Compute the gradient:

        g = ∇L(θ^0) = [∂L/∂θ_1 |_{θ=θ^0},  ∂L/∂θ_2 |_{θ=θ^0},  …]^T

• Update:   θ^1 ← θ^0 − η g,   i.e.   θ_i^1 ← θ_i^0 − η ∂L/∂θ_i |_{θ=θ^0} for every i
Optimization of New Model
    θ* = arg min_θ L

• (Randomly) pick initial values θ^0
• Compute gradient g = ∇L(θ^0);   θ^1 ← θ^0 − η g
• Compute gradient g = ∇L(θ^1);   θ^2 ← θ^1 − η g
• Compute gradient g = ∇L(θ^2);   θ^3 ← θ^2 − η g
  …
Optimization of New Model
    θ* = arg min_θ L

    In practice, the N training examples are divided into batches of size B.
    Each update uses the loss L^k computed on one batch only:

• (Randomly) pick initial values θ^0
• Compute gradient g = ∇L^1(θ^0) on batch 1;   update θ^1 ← θ^0 − η g
• Compute gradient g = ∇L^2(θ^1) on batch 2;   update θ^2 ← θ^1 − η g
• Compute gradient g = ∇L^3(θ^2) on batch 3;   update θ^3 ← θ^2 − η g
  …

    1 epoch = see all the batches once
Optimization of New Model
    Example 1
    • 10,000 examples (N = 10,000)
    • Batch size is 10 (B = 10)
    How many updates in 1 epoch?   10,000 / 10 = 1,000 updates

    Example 2
    • 1,000 examples (N = 1,000)
    • Batch size is 100 (B = 100)
    How many updates in 1 epoch?   1,000 / 100 = 10 updates
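A runnable sketch (my own, reusing the one-feature linear model and made-up data) that shows the batch/epoch bookkeeping described above:

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.uniform(3.0, 10.0, size=1000)                      # N = 1,000 toy examples
y_hat = 0.97 * x1 + 0.1 + rng.normal(0.0, 0.3, size=1000)   # toy labels

w, b, eta, B = 0.0, 0.0, 0.01, 100                          # batch size B = 100

for epoch in range(5):
    order = rng.permutation(len(x1))                        # shuffle, then walk through the batches
    for start in range(0, len(x1), B):                      # 1,000 / 100 = 10 updates per epoch
        idx = order[start:start + B]
        xb, yb = x1[idx], y_hat[idx]
        y = b + w * xb                                      # prediction on this batch only
        g_w = 2 * np.mean((y - yb) * xb)                    # gradient of the batch loss L^k
        g_b = 2 * np.mean(y - yb)
        w, b = w - eta * g_w, b - eta * g_b                 # one update per batch
# 1 epoch = all batches seen once
```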
Back to ML Framework

    Step 1: function with unknown parameters → Step 2: define loss from training data → Step 3: optimization

        y = b + c^T σ(b + W x)

More variety of models …

Sigmoid → ReLU
    How do we represent a hard sigmoid exactly? As the sum of two Rectified Linear Units (ReLU):

        c  · max(0, b  + w  x_1)
      + c′ · max(0, b′ + w′ x_1)

    Each ReLU is flat at 0 and then rises linearly; adding two of them gives a hard sigmoid.
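A tiny numpy check (my own, with one particular choice of c, b, w, c′, b′, w′) that two ReLUs add up to a hard sigmoid:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def hard_sigmoid(x1):
    """Rises linearly from 0 to 1 between x1 = 0 and x1 = 1."""
    # c=1, b=0, w=1  and  c'=-1, b'=-1, w'=1
    return 1.0 * relu(0.0 + 1.0 * x1) + (-1.0) * relu(-1.0 + 1.0 * x1)

x1 = np.linspace(-2.0, 3.0, 11)
print(hard_sigmoid(x1))   # 0 below 0, the identity in between, 1 above 1
```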
Sigmoid → ReLU

    y = b + Σ_i c_i sigmoid(b_i + Σ_j w_ij x_j)
            ↓  (replace the activation function)
    y = b + Σ_{2i} c_i max(0, b_i + Σ_j w_ij x_j)     (2i terms: each hard sigmoid needs two ReLUs)

    sigmoid and ReLU are called activation functions. Which one is better?


Experimental Results

    y = b + Σ_{2i} c_i max(0, b_i + Σ_j w_ij x_j)

                   linear    10 ReLU    100 ReLU    1000 ReLU
    2017 – 2020    0.32k     0.32k      0.28k       0.27k
    2021           0.46k     0.45k      0.43k       0.43k
Back to ML Framework

    Step 1: function with unknown parameters → Step 2: define loss from training data → Step 3: optimization

        y = b + c^T σ(b + W x)

Even more variety of models …

    Apply the same kind of transformation again: feed the outputs a of the first layer into
    another set of weights and biases, with either sigmoid or ReLU in between:

        a  = σ(b  + W  x)
        a′ = σ(b′ + W′ a)
        …and so on, layer after layer.
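A sketch (my own, with arbitrary sizes and random parameters) of stacking these transformations:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def deep_forward(x, layers, c, b_out):
    """a = act(b + W x), then a' = act(b' + W' a), ..., finally y = b_out + c^T a."""
    a = x
    for W, b_vec in layers:
        a = relu(b_vec + W @ a)          # sigmoid could be used here instead of ReLU
    return b_out + c @ a

rng = np.random.default_rng(0)
layers = [(rng.normal(size=(100, 56)), rng.normal(size=100)),    # layer 1: 56 features -> 100 units
          (rng.normal(size=(100, 100)), rng.normal(size=100))]   # layer 2: 100 -> 100 units
y = deep_forward(rng.normal(size=56), layers, rng.normal(size=100), b_out=0.1)
```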
Experimental Results
• Loss for multiple hidden layers
• 100 ReLU for each layer
• Input features are the no. of views in the past 56 days

                   1 layer   2 layers   3 layers   4 layers
    2017 – 2020    0.28k     0.18k      0.14k      0.10k
    2021           0.43k     0.39k      0.38k      0.44k

    [Plot for the 3-layer model: views (k), 2021/01/01 – 2021/02/14 — red: real no. of views; blue: estimated no. of views.]
Back to ML Framework

    Step 1: function with unknown parameters → Step 2: define loss from training data → Step 3: optimization

        y = b + c^T σ(b + W x)

It is not fancy enough. Let's give it a fancy name!

    Each sigmoid or ReLU in the network is called a Neuron; many connected neurons form a
    Neural Network. (This mimics human brains … (???))
    The layers of neurons between the input and the output are called hidden layers.
    Many hidden layers means the network is Deep → Deep Learning


Deep = Many hidden layers
    AlexNet (2012):    8 layers,  16.4% error
    VGG (2014):       19 layers,   7.3% error
    GoogleNet (2014): 22 layers,   6.7% error
    (Source: http://cs231n.stanford.edu/slides/winter1516_lecture8.pdf)

Deep = Many hidden layers

    Residual Net (2015): 152 layers, with a special structure — taller than Taipei 101 (101 floors).

    Error rates: AlexNet (2012) 16.4% → VGG (2014) 7.3% → GoogleNet (2014) 6.7% → Residual Net (2015) 3.57%

    Why do we want a "Deep" network, not a "Fat" network?
Why don’t we go deeper?
• Loss for multiple hidden layers
• 100 ReLU for each layer
• input features are the no. of views in the past 56
days
1 layer 2 layer 3 layer 4 layer
2017 – 2020 0.28k 0.18k 0.14k 0.10k
2021 0.43k 0.39k 0.38k 0.44k
Why don’t we go deeper?
• Loss for multiple hidden layers
• 100 ReLU for each layer
• input features are the no. of views in the past 56
days
1 layer 2 layer 3 layer 4 layer
2017 – 2020 0.28k 0.18k 0.14k 0.10k
2021 0.43k 0.39k 0.38k 0.44k

Better on training data, worse on unseen data


Overfitting
Let’s predict no. of views today!
• If we want to select a model for predicting no. of
views today, which one will you use?
1 layer 2 layer 3 layer 4 layer
2017 – 2020 0.28k 0.18k 0.14k 0.10k
2021 0.43k 0.39k 0.38k 0.44k

We will talk about model selection next time. J


To learn more ……

    Basic Introduction: https://youtu.be/Dr-WRlEFefw
    Backpropagation (computing gradients in an efficient way): https://youtu.be/ibJpTrp5mcE