
Artificial Neural Networks from Scratch

Learn to build a neural network from scratch.

o Focus on multi-level feedforward neural networks (multi-layer perceptrons)
Training large neural networks is one of the most important
workloads in large-scale parallel and distributed systems
o Programming assignments throughout the semester will use this.
What do (deep) neural networks do?

 Learning (highly) non-linear functions.

X1  X2  X1 XOR X2
0   0   0
0   1   1
1   0   1
1   1   0

[Figure: the four input points (0,0), (0,1), (1,0), (1,1) plotted in the plane, labeled with their XOR outputs]
Logic XOR (⊕) operation
Artificial neural network example

A neural network consists of layers of artificial neurons and
connections between them.

[Figure: a network with an input layer, hidden layers 1-3, and an output layer]

Each connection is associated with a weight.
Training of a neural network is to get to the right weights (and
biases) such that the error across the training data is minimized.
Training a neural network

A neural network is trained with m training samples
(X(1), Y(1)), (X(2), Y(2)), ……, (X(m), Y(m))
where X(i) is an input vector and Y(i) is an output vector.
Training objective: minimize the prediction error (loss), for example
E = Σ_{i=1..m} 1/2 * ||O(i) - Y(i)||^2
where O(i) is the predicted output vector for the input vector X(i).

Approach: gradient descent (stochastic gradient descent, batch gradient descent, mini-
batch gradient descent).
o Use the error to adjust the weight values to reduce the loss. The adjustment amount is proportional to
the contribution of each weight to the loss – given an error, adjust each weight a little to reduce the
error.
Stochastic gradient descent

Given one training sample (X, Y):

Compute the output O of the neural network
Training objective: minimize the prediction error (loss) – there are
different ways to define the error. The following is an example:
E = 1/2 * ||O - Y||^2

Estimate how much each weight w contributes to the error: ∂E/∂w

Update the weight by w = w - α * ∂E/∂w. Here α is the learning rate.
Algorithm for learning artificial neural
network

Initialize the weights

Training
o For each training sample (X, Y), use forward propagation to compute the neural
network output vector O
o Compute the error E (various definitions)
o Use backward propagation to compute ∂E/∂w for each weight w
o Update w = w - α * ∂E/∂w
o Repeat until E is sufficiently small.
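
This loop maps directly onto code. Below is a minimal C++ sketch of the procedure for a toy model with a single weight and a one-dimensional "network", so every step of the algorithm is visible; the data, learning rate, and iteration count are made-up example values, not part of the course code.

#include <cstdio>

int main() {
    // Toy training set for the target Y = 2 * X (made-up example data).
    const int m = 4;
    double X[m] = {1.0, 2.0, 3.0, 4.0};
    double Y[m] = {2.0, 4.0, 6.0, 8.0};

    double w = 0.0;             // initialize the weight
    const double alpha = 0.05;  // learning rate

    for (int epoch = 0; epoch < 200; epoch++) {
        double E = 0.0;
        for (int i = 0; i < m; i++) {
            double O = w * X[i];            // forward propagation
            double diff = O - Y[i];
            E += 0.5 * diff * diff;         // error E = 1/2 (O - Y)^2
            double dE_dw = diff * X[i];     // backward propagation (chain rule)
            w = w - alpha * dE_dw;          // update w = w - alpha * dE/dw
        }
        if (E < 1e-10) break;               // repeat until E is sufficiently small
    }
    printf("learned w = %f (the target is 2.0)\n", w);
    return 0;
}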
A single neuron
[Figure: a single neuron. Inputs X1(i), …, Xm(i) are multiplied by weights w1, …, wm, summed together with the bias b, and the weighted sum is passed through an activation function]

An artificial neuron has two components: (1) a weighted sum and
(2) an activation function.
o Many activation functions: Sigmoid, ReLU, etc.
Sigmoid function

sigmoid(x) = 1 / (1 + e^(-x))
The derivative of the sigmoid function:
sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x))
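
For reference, a small C++ helper sketch of the sigmoid and its derivative (not taken from the course code):

#include <cstdio>
#include <cmath>

// sigmoid(x) = 1 / (1 + e^(-x))
double sigmoid(double x) { return 1.0 / (1.0 + exp(-x)); }

// sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x))
double sigmoid_deriv(double x) {
    double s = sigmoid(x);
    return s * (1.0 - s);
}

int main() {
    // At x = 0: sigmoid(0) = 0.5 and sigmoid'(0) = 0.25 (used in the worked example later).
    printf("sigmoid(0) = %f, sigmoid'(0) = %f\n", sigmoid(0.0), sigmoid_deriv(0.0));
    return 0;
}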



Training for the logic AND with a single
neuron
In general, one neuron can be trained to realize a linearly separable function.
The logic AND function is linearly separable:

X1  X2  X1 AND X2
0   0   0
0   1   0
1   0   0
1   1   1

[Figure: the four input points (0,0), (0,1), (1,0), (1,1); a single straight line separates (1,1) from the other points]
Logic AND (∧) operation
Training for the logic AND with a single
neuron
[Figure: a single neuron with w1 = 0, w2 = 0, b = 0 and inputs X1 = 0, X2 = 1; the weighted sum is s = w1*X1 + w2*X2 + b = 0 and the output is O = Sigmoid(0) = 0.5]

Consider training data input (X1 = 0, X2 = 1), output Y = 0.

NN Output O = Sigmoid(0) = 0.5
Error: E = 1/2 * (O - Y)^2 = 1/2 * (0.5)^2 = 0.125
To update w1, w2, and b, gradient descent needs to compute ∂E/∂w1, ∂E/∂w2, and ∂E/∂b
Chain rule for calculating ∂E/∂w1, ∂E/∂w2, and ∂E/∂b
[Figure: the same neuron as above: w1 = 0, w2 = 0, b = 0, X1 = 0, X2 = 1, s = 0, O = Sigmoid(0) = 0.5]

If a variable z depends on the variable y, which itself depends on the
variable x, then z depends on x as well, via the intermediate variable
y. The chain rule is a formula that expresses the derivative as:
dz/dx = (dz/dy) * (dy/dx)
Training for the logic AND with a single
neuron
[Figure: the same neuron: w1 = 0, w2 = 0, b = 0, X1 = 0, X2 = 1, s = 0, O = Sigmoid(0) = 0.5]

 =
 = = sigmoid(s) (1-sigmoid(s)) = 0.5 (1-0.5) = 0.25, = = 0
 To update : = 0 – 0.1*0.5*0.25*0 = 0
 Assume rate = 0.1
Training for the logic AND with a single
neuron
[Figure: the same neuron: w1 = 0, w2 = 0, b = 0, X1 = 0, X2 = 1, s = 0, O = Sigmoid(0) = 0.5]

 =
 = = sigmoid(s) (1-sigmoid(s)) = 0.5 (1-0.5) = 0.25, = = 1
 To update : = 0 – 0.1*0.5*0.25*1 = -0.0125
Training for the logic AND with a single
neuron
[Figure: the same neuron: w1 = 0, w2 = 0, b = 0, X1 = 0, X2 = 1, s = 0, O = Sigmoid(0) = 0.5]

 =
 = = sigmoid(s) (1-sigmoid(s)) = 0.5 (1-0.5) = 0.25, = 1
 To update b: b = 0 – 0.1*0.5*0.25*1 = -0.0125
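
These hand computations can be checked with a few lines of C++. The sketch below simply recomputes the three gradients and the updated parameters for the sample (X1 = 0, X2 = 1, Y = 0) with all parameters initialized to 0:

#include <cstdio>
#include <cmath>

double sigmoid(double x) { return 1.0 / (1.0 + exp(-x)); }

int main() {
    double w1 = 0, w2 = 0, b = 0;          // initial weights and bias
    double X1 = 0, X2 = 1, Y = 0;          // the training sample used above
    double alpha = 0.1;                    // learning rate

    double s = w1 * X1 + w2 * X2 + b;      // weighted sum = 0
    double O = sigmoid(s);                 // output = 0.5
    double dE_dO = O - Y;                  // 0.5, from E = 1/2 (O - Y)^2
    double dO_ds = O * (1 - O);            // 0.25, sigmoid derivative

    double dE_dw1 = dE_dO * dO_ds * X1;    // 0
    double dE_dw2 = dE_dO * dO_ds * X2;    // 0.125
    double dE_db  = dE_dO * dO_ds;         // 0.125

    w1 -= alpha * dE_dw1;                  // stays 0
    w2 -= alpha * dE_dw2;                  // becomes -0.0125
    b  -= alpha * dE_db;                   // becomes -0.0125
    printf("w1 = %g, w2 = %g, b = %g\n", w1, w2, b);
    return 0;
}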
Training for the logic AND with a single
neuron
[Figure: the neuron after one update step: w1 = 0, w2 = -0.0125, b = -0.0125]

This process is repeated until the error is sufficiently small.

The initial weights should be randomized. Gradient descent can get stuck in a local optimum.
See lect7/one.cpp for training the logic AND operation with a single neuron (a sketch of such a trainer is shown below).
Note: the logic XOR operation is not linearly separable and cannot be trained with one neuron.
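
lect7/one.cpp itself is not reproduced here; the following is a minimal sketch of what a single-neuron AND trainer can look like. The learning rate and iteration count are arbitrary choices, and real code would randomize the initial weights:

#include <cstdio>
#include <cmath>

double sigmoid(double x) { return 1.0 / (1.0 + exp(-x)); }

int main() {
    // Truth table for logic AND.
    double X[4][2] = {{0,0},{0,1},{1,0},{1,1}};
    double Y[4]    = {0, 0, 0, 1};

    double w1 = 0.0, w2 = 0.0, b = 0.0;   // real code would randomize these
    const double alpha = 0.5;             // learning rate (arbitrary choice)

    for (int iter = 0; iter < 20000; iter++) {
        for (int i = 0; i < 4; i++) {
            double s = w1 * X[i][0] + w2 * X[i][1] + b;
            double O = sigmoid(s);
            // Chain rule: dE/dw = (O - Y) * O * (1 - O) * input
            double delta = (O - Y[i]) * O * (1.0 - O);
            w1 -= alpha * delta * X[i][0];
            w2 -= alpha * delta * X[i][1];
            b  -= alpha * delta;
        }
    }
    for (int i = 0; i < 4; i++) {
        double O = sigmoid(w1 * X[i][0] + w2 * X[i][1] + b);
        printf("%g AND %g -> %.3f\n", X[i][0], X[i][1], O);
    }
    return 0;
}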
Multi-level feedforward neural networks

A multi-level feedforward neural network is a neural network that consists of
multiple levels (layers) of neurons. Each level can have many neurons, and the connections
between neurons in different levels do not form loops.
o Information moves in one direction (forward) from input nodes, through hidden nodes,
to output nodes.
One artificial neuron can only realize a linearly separable function.
Many levels of neurons combine these simple functions and can be trained to realize arbitrarily
complex functions.
o One hidden layer (with a sufficiently large number of neurons) can be trained to approximate any
continuous function.
Multi-level feedforward neural networks
examples
 A layer of neurons that do not directly connect to outputs is called a
hidden layer.
[Figure: a network with an input layer, hidden layers 1-3, and an output layer]
Build a 3-level neural network from scratch

3 levels: input level, hidden level, output level

o Other assumptions: fully connected between layers, all neurons use the sigmoid
function sigmoid(x) = 1 / (1 + e^(-x)) as the activation function.
Notations:
o N0: size of the input level. Input: IN[N0]
o N1: size of the hidden layer
o N2: size of the output layer. Output: OO[N2]
Build a 3-level neural network from scratch

Notations:
o N0, N1, N2: sizes of the input layer, hidden layer, and output layer, respectively
o W0[N0][N1]: weights from the input layer to the hidden layer. W0[i][j] is the weight from input unit i to
hidden unit j. B1[N1]: hidden layer biases.
o W1[N1][N2]: weights from the hidden layer to the output layer. W1[i][j] is the weight from hidden unit i to
output unit j. B2[N2]: output layer biases.
o HS[N1], HO[N1]: hidden layer weighted sums and outputs; OS[N2], OO[N2]: output layer weighted sums and outputs
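
In C++ these notations map directly onto a set of arrays. The sketch below (with arbitrary example sizes) declares the data used by the forward- and backward-propagation sketches later in this document; it is an illustration, not the layout used in 3level.cpp.

#include <cstdlib>

// Example sizes only; a real program would choose these to match the task.
const int N0 = 2;       // input layer size
const int N1 = 4;       // hidden layer size
const int N2 = 2;       // output layer size

double IN[N0];          // input vector
double W0[N0][N1];      // weights: input -> hidden
double B1[N1];          // hidden layer biases
double HS[N1], HO[N1];  // hidden layer weighted sums and outputs
double W1[N1][N2];      // weights: hidden -> output
double B2[N2];          // output layer biases
double OS[N2], OO[N2];  // output layer weighted sums and outputs

// Random initialization in [-1, 1], as the earlier slide suggests.
void init_weights() {
    for (int i = 0; i < N0; i++)
        for (int j = 0; j < N1; j++)
            W0[i][j] = 2.0 * rand() / RAND_MAX - 1.0;
    for (int j = 0; j < N1; j++) B1[j] = 0.0;
    for (int i = 0; i < N1; i++)
        for (int j = 0; j < N2; j++)
            W1[i][j] = 2.0 * rand() / RAND_MAX - 1.0;
    for (int j = 0; j < N2; j++) B2[j] = 0.0;
}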
3-level feedforward neural network

[Figure: the 3-level network, drawn bottom-up.
Input layer (N0 units): input IN[N0]
Weights: W0[N0][N1]
Hidden layer (N1 units): biases B1[N1], weighted sums HS[N1], outputs HO[N1]
Weights: W1[N1][N2]
Output layer (N2 units): biases B2[N2], weighted sums OS[N2], outputs OO[N2]]
Forward propagation (compute OO and E)

Compute the hidden layer weighted sums: HS
o HS[j] = Σ_{i=0..N0-1} IN[i] * W0[i][j] + B1[j]
o In matrix form: HS = IN * W0 + B1

Compute the hidden layer outputs: HO
o HO[j] = sigmoid(HS[j])
o In matrix form: HO = sigmoid(HS)
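
Continuing the array-based sketch from above, the hidden-layer forward pass can be written as follows (a sketch, not the actual 3level.cpp):

#include <cmath>

double sigmoid(double x) { return 1.0 / (1.0 + exp(-x)); }

// HS[j] = sum_i IN[i] * W0[i][j] + B1[j];  HO[j] = sigmoid(HS[j])
// Uses the arrays declared in the earlier sketch (IN, W0, B1, HS, HO).
void forward_hidden() {
    for (int j = 0; j < N1; j++) {
        HS[j] = B1[j];
        for (int i = 0; i < N0; i++)
            HS[j] += IN[i] * W0[i][j];
        HO[j] = sigmoid(HS[j]);
    }
}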
Forward propagation

From the input (IN[N0]), compute the output (OO[N2]) and the error E.

Compute the output layer weighted sums: OS
o OS[j] = Σ_{i=0..N1-1} HO[i] * W1[i][j] + B2[j]
o In matrix form: OS = HO * W1 + B2

Compute the final output: OO
o OO[j] = sigmoid(OS[j])
o In matrix form: OO = sigmoid(OS)

Let us use the mean square error: E = 1/2 * Σ_{j=0..N2-1} (OO[j] - Y[j])^2
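
The output layer and the error follow the same pattern (again continuing the earlier sketch; Y here is the desired output vector for the current sample):

// OS[j] = sum_i HO[i] * W1[i][j] + B2[j];  OO[j] = sigmoid(OS[j])
void forward_output() {
    for (int j = 0; j < N2; j++) {
        OS[j] = B2[j];
        for (int i = 0; i < N1; i++)
            OS[j] += HO[i] * W1[i][j];
        OO[j] = sigmoid(OS[j]);
    }
}

// Mean square error E = 1/2 * sum_j (OO[j] - Y[j])^2
double compute_error(const double *Y) {
    double E = 0.0;
    for (int j = 0; j < N2; j++)
        E += 0.5 * (OO[j] - Y[j]) * (OO[j] - Y[j]);
    return E;
}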


Backward propagation

The goal is to compute ∂E/∂W1, ∂E/∂B2, ∂E/∂W0, and ∂E/∂B1.

∂E/∂OO[i] = OO[i] - Y[i]
In matrix form: ∂E/∂OO = OO - Y
This can be stored in an array dE_OO[N2];
Backward propagation

The goal is to compute ∂E/∂W1, ∂E/∂B2, ∂E/∂W0, and ∂E/∂B1.

∂E/∂OO is done
∂E/∂OS[i] = ∂E/∂OO[i] * ∂OO[i]/∂OS[i] = dE_OO[i] * OO[i] * (1 - OO[i])

In matrix form: ∂E/∂OS = dE_OO ∘ OO ∘ (1 - OO) (element-wise products)
This can be stored in an array dE_OS[N2];
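
In code, these two arrays can be filled in a single loop (continuing the earlier sketch):

// The two arrays named on this and the previous slide.
double dE_OO[N2], dE_OS[N2];

void output_layer_deltas(const double *Y) {
    for (int i = 0; i < N2; i++) {
        dE_OO[i] = OO[i] - Y[i];                      // dE/dOO[i] = OO[i] - Y[i]
        dE_OS[i] = dE_OO[i] * OO[i] * (1.0 - OO[i]);  // dE/dOS[i]
    }
}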
Backward propagation

The goal is to compute ∂E/∂W1, ∂E/∂B2, ∂E/∂W0, and ∂E/∂B1.

∂E/∂OO and ∂E/∂OS are done
Since OS[j] = Σ_{k=0..N1-1} HO[k] * W1[k][j] + B2[j], we have ∂OS[j]/∂HO[i] = W1[i][j]

Hence, ∂E/∂HO[i] = Σ_{j=0..N2-1} ∂E/∂OS[j] * W1[i][j]
Backward propagation

The goal is to compute ∂E/∂W1, ∂E/∂B2, ∂E/∂W0, and ∂E/∂B1.

∂E/∂OO, ∂E/∂OS are done
[Figure: hidden unit i feeds every output weighted sum OS[0], …, OS[N2-1] through the weights W1[i][0], …, W1[i][N2-1], so it contributes to the error through every output unit]

Hence, ∂E/∂HO[i] = Σ_{j=0..N2-1} ∂E/∂OS[j] * W1[i][j]
Backward propagation

The goal is to compute ∂E/∂W1, ∂E/∂B2, ∂E/∂W0, and ∂E/∂B1.

∂E/∂OO, ∂E/∂OS are done

In matrix form: ∂E/∂HO = W1 * ∂E/∂OS (this can be stored in an array dE_HO[N1])
Backward propagation

The goal is to compute ∂E/∂W1, ∂E/∂B2, ∂E/∂W0, and ∂E/∂B1.

∂E/∂OO, ∂E/∂OS, ∂E/∂HO are done

OS[j] = HO[0]*W1[0][j] + … + HO[i]*W1[i][j] + … + B2[j], so ∂OS[j]/∂W1[i][j] = HO[i]
∂E/∂W1[i][j] = ∂E/∂OS[j] * ∂OS[j]/∂W1[i][j] = ∂E/∂OS[j] * HO[i]
Backward propagation

The goal is to compute ∂E/∂W1, ∂E/∂B2, ∂E/∂W0, and ∂E/∂B1.

∂E/∂OO, ∂E/∂OS, ∂E/∂HO are done
∂E/∂B2[j] = ∂E/∂OS[j] * ∂OS[j]/∂B2[j] = ∂E/∂OS[j]
Backward propagation

The goal is to compute ∂E/∂W1, ∂E/∂B2, ∂E/∂W0, and ∂E/∂B1.

∂E/∂OS, ∂E/∂HO, ∂E/∂W1, ∂E/∂B2 are done
Once ∂E/∂HO is computed, we can repeat the process for the hidden layer
by replacing OO with HO, OS with HS, B2 with B1, and W1 with W0 in
the differentiation. Also, the input is IN[N0] and the output is
HO[N1].
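
Putting the pieces together, here is a sketch of the backward pass and the gradient-descent update for both layers, continuing the earlier array-based sketch (alpha is the learning rate):

double dE_HO[N1], dE_HS[N1];

// Backward propagation and gradient-descent update for both layers.
void backward_and_update(const double *Y, double alpha) {
    output_layer_deltas(Y);   // fills dE_OO and dE_OS (previous sketch)

    // dE/dHO[i] = sum_j dE/dOS[j] * W1[i][j], then one more chain-rule step:
    // dE/dHS[i] = dE/dHO[i] * HO[i] * (1 - HO[i])
    for (int i = 0; i < N1; i++) {
        dE_HO[i] = 0.0;
        for (int j = 0; j < N2; j++)
            dE_HO[i] += dE_OS[j] * W1[i][j];
        dE_HS[i] = dE_HO[i] * HO[i] * (1.0 - HO[i]);
    }

    // Output layer: dE/dW1[i][j] = dE/dOS[j] * HO[i], dE/dB2[j] = dE/dOS[j]
    for (int j = 0; j < N2; j++) {
        for (int i = 0; i < N1; i++)
            W1[i][j] -= alpha * dE_OS[j] * HO[i];
        B2[j] -= alpha * dE_OS[j];
    }

    // Hidden layer (the same formulas one level down):
    // dE/dW0[i][j] = dE/dHS[j] * IN[i], dE/dB1[j] = dE/dHS[j]
    for (int j = 0; j < N1; j++) {
        for (int i = 0; i < N0; i++)
            W0[i][j] -= alpha * dE_HS[j] * IN[i];
        B1[j] -= alpha * dE_HS[j];
    }
}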
Summary

[Figure: layers chained together: X → Layer 1 → H1 → Layer 2 → H2 → Layer 3 → Y; the gradients ∂E/∂(layer output) and ∂E/∂(layer input) flow backward from the output toward the input]

The output of a layer is the input of the next layer.

Backward propagation uses results from forward propagation.
o For each layer, ∂E/∂(layer input) is computed from ∂E/∂(layer output) together with the
values (weighted sums, activations) saved during forward propagation, and it becomes
∂E/∂(layer output) for the previous layer.
Training for the logic XOR and AND with a 6-unit 2-level neural network
The logic XOR function is not linearly separable (it can't be trained with
lect8/one.cpp). See 3level.cpp (a sketch in the same spirit appears below).

X1  X2  X1 XOR X2
0   0   0
0   1   1
1   0   1
1   1   0

[Figure: the four input points (0,0), (0,1), (1,0), (1,1) with the XOR and AND decision boundaries]
Logic XOR (⊕) operation
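
3level.cpp itself is not reproduced here. As an illustration only, the earlier sketches can be tied together into a small program that trains the example network (N0 = 2, N1 = 4, N2 = 2), with output 0 learning XOR and output 1 learning AND. The learning rate and iteration count are arbitrary; with an unlucky random initialization the training can get stuck in a local optimum, as noted earlier.

#include <cstdio>

int main() {
    // Four training samples; target output 0 is XOR, target output 1 is AND.
    double X[4][2] = {{0,0},{0,1},{1,0},{1,1}};
    double T[4][2] = {{0,0},{1,0},{1,0},{0,1}};

    init_weights();
    for (int iter = 0; iter < 100000; iter++) {
        for (int s = 0; s < 4; s++) {
            IN[0] = X[s][0];
            IN[1] = X[s][1];
            forward_hidden();
            forward_output();
            backward_and_update(T[s], 0.5);
        }
    }
    for (int s = 0; s < 4; s++) {
        IN[0] = X[s][0];
        IN[1] = X[s][1];
        forward_hidden();
        forward_output();
        printf("%g %g -> XOR %.3f  AND %.3f\n", X[s][0], X[s][1], OO[0], OO[1]);
    }
    return 0;
}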
Summary
 Briefly discuss multi-level feedforward neural networks
 The training of neural networks
Following 3level.cpp, one should be able to write a program for any
multi-level feedforward neural network.
