Perceptron
Foundations of Data Analysis
April 12, 2022
History of Perceptron
■ Frank Rosenblatt
■ 1928–1971
■ Invented the perceptron algorithm
History of Perceptron
■ Mark I Perceptron (1958)
■ 20 × 20 pixel camera
■ Hardware, not software!
"an electronic computer that [the Navy] expects will be able to walk, talk,
see, write, reproduce itself and be conscious of its existence"
- NY Times, 1958
Perceptron Learning Algorithm
■ First neural network learning model in the 1960s
■ Simple and limited (single-layer model)
■ Basic concepts are similar to multi-layer models
What is Perceptron?
The goal of the perceptron algorithm is to find a hyperplane
(decision boundary) that separates a set of data into two classes,
Class 0 and Class 1.
• Binary classifier
• Supervised learning
Perceptron
With bias term b, the perceptron assigns a point x to Class 1 or Class 0:

f (x) = 1, if w · x + b > 0
        0, otherwise

where w · x = Σ_{i=1..n} wi xi (dot product)
Perceptron
Inputs x1, …, xn with weights w1, …, wn are combined into the net
activation w · x, and the activation function produces the output z:

z = 1, if w · x > θ
    0, if w · x ≤ θ

where w · x = Σ_{i=1..n} wi xi

• Learning means finding weights such that an objective function is minimized
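The forward pass described above can be sketched in a few lines of Python (the function name `perceptron_output` is illustrative, not from the slides):

```python
def perceptron_output(x, w, theta):
    """Perceptron forward pass: threshold the weighted sum w . x at theta."""
    net = sum(wi * xi for wi, xi in zip(w, x))  # net activation w . x
    return 1 if net > theta else 0

# Example: two inputs, threshold 0.5
print(perceptron_output([1.0, 0.0], [0.7, 0.3], 0.5))  # net = 0.7 > 0.5 -> 1
print(perceptron_output([0.0, 1.0], [0.7, 0.3], 0.5))  # net = 0.3 <= 0.5 -> 0
```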
Activation Function
Outputs the label given an input or a set of inputs.

Step (sign) function:
f (x) := sgn(x) = 1, if x ≥ 0
                 −1, if x < 0

ReLU (rectified linear unit):
f (x) = max(0, x)

Sigmoid function:
f (x) := σ(x) = 1 / (1 + e^(−ax))
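The three activation functions above can be written directly in Python (the slope parameter `a` of the sigmoid defaults to 1 here, an illustrative choice):

```python
import math

def sgn(x):
    """Step (sign) activation: +1 for x >= 0, -1 otherwise."""
    return 1 if x >= 0 else -1

def relu(x):
    """Rectified linear unit: max(0, x)."""
    return max(0.0, x)

def sigmoid(x, a=1.0):
    """Logistic sigmoid with slope parameter a: 1 / (1 + e^(-a*x))."""
    return 1.0 / (1.0 + math.exp(-a * x))

print(sgn(-2), relu(-2), sigmoid(0.0))  # -1 0.0 0.5
```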
Perceptron as a Single Layer Neuron
Examples
Example 1: inputs x = (.9, .1), weights w = (.5, −.3), threshold θ = 0.2. z = ?
Example 2: inputs x = (.8, .6, .2), weights w = (.4, .2, −.5), threshold θ = 0.1. z = ?

z = 1, if w · x > θ
    0, if w · x ≤ θ
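A quick way to check these examples in Python (the pairing of inputs with weights is read off the slide's diagram, so treat it as an assumption):

```python
def perceptron_output(x, w, theta):
    """Return the net activation w . x and the thresholded output z."""
    net = sum(wi * xi for wi, xi in zip(w, x))
    return net, 1 if net > theta else 0

# Example 1: net = .9*.5 + .1*(-.3) = 0.42 > 0.2, so z = 1
print(perceptron_output([0.9, 0.1], [0.5, -0.3], 0.2))
# Example 2: net = .8*.4 + .6*.2 + .2*(-.5) = 0.34 > 0.1, so z = 1
print(perceptron_output([0.8, 0.6, 0.2], [0.4, 0.2, -0.5], 0.1))
```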
How to Learn Perceptron?

f (x) = 1, if w · x + b > 0
        0, otherwise

w, b are unknown parameters

■ In supervised learning the network has its output
compared with known correct answers
■ Supervised learning = learning with a teacher
Perceptron Learning Rules
■ Consider linearly separable problems
■ How to find appropriate weights?
■ Compare the output o with the desired value d (the given label)
and update each weight:

wi(new) = wi(old) + Δwi,   where Δwi = η (d − o) xi

η is called the learning rate, with 0 < η ≤ 1

Perceptron Convergence Theorem: guaranteed to find a solution
in finite time if a solution exists
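The update rule above can be sketched as a small training loop in Python (the data set and function name are illustrative; the bias is folded in as a fixed leading input of 1.0):

```python
def train_perceptron(data, eta=0.5, epochs=20):
    """Perceptron learning rule: w_i <- w_i + eta * (d - o) * x_i.

    data: list of (x, d) pairs, where each x starts with 1.0 for the bias.
    """
    n = len(data[0][0])
    w = [0.0] * n
    for _ in range(epochs):
        for x, d in data:
            o = 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0
            for i in range(n):
                w[i] += eta * (d - o) * x[i]
    return w

# Linearly separable toy data (AND-like): class 1 only when both inputs are 1
data = [([1.0, 0, 0], 0), ([1.0, 0, 1], 0), ([1.0, 1, 0], 0), ([1.0, 1, 1], 1)]
w = train_perceptron(data)
preds = [1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0 for x, _ in data]
print(preds)  # [0, 0, 0, 1] -- reproduces the labels, as the theorem promises
```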
Perceptron Learning Rules
■ The algorithm converges to the correct classification
if and only if the training data is linearly separable
■ When assigning a value to η we must keep in mind two
conflicting requirements
■ Averaging of past inputs to provide stable weights
estimates, which requires small η
■ Fast adaptation with respect to real changes in the
underlying distribution, which requires large η
Linear Separability
Limited Functionality of Hyperplane
A single hyperplane cannot separate some simple datasets (the classic
example is XOR), which motivates multilayer networks.
Multilayer Network
Input layer, hidden layer, and output layer; each unit computes a
thresholded weighted sum of the previous layer's outputs:

o1 = sgn( Σ_{i=0..n} w1i xi )
o2 = sgn( Σ_{i=0..n} w2i xi )
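This layered forward pass can be sketched in Python (the weights and layer sizes are illustrative; x0 = 1.0 serves as the bias input, matching the sum starting at i = 0):

```python
def sgn(x):
    """Sign activation: +1 for x >= 0, -1 otherwise."""
    return 1 if x >= 0 else -1

def layer(x, W):
    """One layer: unit j outputs sgn(sum_i W[j][i] * x[i])."""
    return [sgn(sum(wji * xi for wji, xi in zip(wj, x))) for wj in W]

# Illustrative network: 2 inputs (plus bias) -> 2 hidden units -> 1 output
x = [1.0, 0.4, -0.7]                              # x0 = 1.0 is the bias input
W_hidden = [[0.1, 0.8, 0.3], [-0.2, 0.5, 0.9]]
W_output = [[0.3, 0.6, -0.4]]
h = layer(x, W_hidden)
o = layer([1.0] + h, W_output)                    # prepend bias for next layer
print(h, o)  # [1, -1] [1]
```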