Introduction to
Artificial Intelligence (AI)
4. Learning Algorithms
Motivation
Real-world example:
• Fish packing plant: separate sea bass from salmon using optical sensing
• Features: physical differences such as length, lightness, width, number and shape of fins, position of the mouth
• Noise: variations in lighting, position of the fish on the conveyor, "static" due to the electronics
Motivation
Histograms for the length feature for the two categories
Motivation
Histograms for the lightness feature for the two categories
Decision boundary
The two features of lightness and width for sea bass and salmon
How would our system automatically determine the decision boundary?

Loss
Loss is a function of the error over the training data.
Error is the difference between a single actual value and a single predicted value.
Loss
Regression loss functions

Mean square loss (RMSE), Python code:
import numpy as np
def rmse(predictions, targets):
    differences = predictions - targets
    differences_squared = differences ** 2
    mean_of_differences_squared = differences_squared.mean()
    rmse_val = np.sqrt(mean_of_differences_squared)
    return rmse_val

Mean absolute loss (MAE), Python code:
import numpy as np
def mae(predictions, targets):
    differences = predictions - targets
    absolute_differences = np.absolute(differences)
    mean_absolute_differences = absolute_differences.mean()
    return mean_absolute_differences
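A quick check of both functions on a small, made-up set of predictions and targets (the numbers are illustrative only, not from the slides):

Python code:
predictions = np.array([2.5, 0.0, 2.0, 8.0])
targets = np.array([3.0, -0.5, 2.0, 7.0])
print(rmse(predictions, targets))   # ~0.61
print(mae(predictions, targets))    # 0.5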
Learning algorithms
 1950s - 1970s: Early Foundations
 1957: Perceptron (Frank Rosenblatt) – One of the earliest neural networks, designed for binary classification.
 1960s: K-nearest neighbors (KNN) – A simple instance-based learning method developed for classification tasks.
 1980s: The Rise of Neural Networks
 1980: Multi-layer perceptron training by backpropagation – Developed by Paul Werbos, later popularized in the 1980s for training neural networks.
 1990s: Advancements in Ensemble Methods and Optimization
 1995: Random Forest (Leo Breiman) – A decision-tree-based ensemble learning technique that reduces overfitting.
 1995: Support Vector Machines gain practical relevance with the advent of kernel methods.

(Margin note: 1969: Strassen's algorithm for matrix multiplication. All of these algorithms were implemented on the CPU.)
Learning algorithms
 2000s: Kernel Methods and Probabilistic Models
 2001: AdaBoost – An adaptive boosting method developed by Yoav Freund and Robert Schapire.
 2010s: Deep Learning Revolution
 2012: AlexNet (Krizhevsky et al.) – A deep convolutional neural network that won the ImageNet competition, leading to breakthroughs in computer vision.
 2014: Generative Adversarial Networks (GANs) (Ian Goodfellow et al.) – Introduced a new framework for generating synthetic data through adversarial learning.
 2017: Transformers (Vaswani et al.) – Revolutionized natural language processing (NLP) by eliminating the need for recurrent neural networks.
 2020s: Scalable AI and Further Innovations
 2020: GPT-3 (OpenAI) – A large-scale transformer-based model demonstrating significant progress in language understanding and generation.

(Margin note: 2006: NVIDIA releases CUDA. 2009: Andrew Ng uses GPUs to accelerate the training of large neural networks.)
Random forest

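This slide is a figure only. As an illustration that is not part of the original slides, a random forest can be trained in a few lines with scikit-learn; the synthetic dataset and hyperparameters below are assumptions:

Python code:
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic two-class data standing in for the fish features (length, lightness, ...)
X, y = make_classification(n_samples=500, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 100 decision trees, each trained on a bootstrap sample with random feature subsets
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)
print(forest.score(X_test, y_test))  # test accuracy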
AdaBoost
 'Boosting': a family of algorithms which converts weak learners (e.g., decision stumps or shallow decision trees) into strong learners.

$H(x) = \mathrm{sign}\left(\sum_{i=1}^{n} \alpha_i h_i(x)\right)$

$h_i(x)$: the weak learners
$\alpha_i$: the weight of learner $i$
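A small sketch of the combination rule above, with hypothetical decision stumps and weights chosen only for illustration (not learned by the AdaBoost procedure itself):

Python code:
import numpy as np

# Three hypothetical decision stumps h_i(x), each returning +1 or -1
stumps = [
    lambda x: 1 if x[0] > 0.5 else -1,
    lambda x: 1 if x[1] > 0.2 else -1,
    lambda x: 1 if x[0] + x[1] > 1.0 else -1,
]
alphas = np.array([0.8, 0.4, 0.3])  # learner weights (illustrative)

def strong_classifier(x):
    # H(x) = sign( sum_i alpha_i * h_i(x) )
    votes = np.array([h(x) for h in stumps])
    return int(np.sign(np.dot(alphas, votes)))

print(strong_classifier(np.array([0.7, 0.1])))  # prints +1 for this input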
AdaBoost
 Weak learners for image recognition: Haar filters (common features)
 160,000+ possible features associated with each 24 x 24 window
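A minimal sketch of evaluating one two-rectangle Haar-like feature on a 24 x 24 window using an integral image; the rectangle coordinates are arbitrary assumptions, not the slide's specific features:

Python code:
import numpy as np

window = np.random.rand(24, 24)            # a grayscale 24 x 24 image window
ii = window.cumsum(axis=0).cumsum(axis=1)  # integral image: ii[y, x] = sum of window[:y+1, :x+1]

def rect_sum(ii, top, left, h, w):
    # Sum of pixels in the rectangle using 4 integral-image lookups
    def at(y, x):
        return ii[y, x] if y >= 0 and x >= 0 else 0.0
    return (at(top + h - 1, left + w - 1) - at(top - 1, left + w - 1)
            - at(top + h - 1, left - 1) + at(top - 1, left - 1))

# Two-rectangle (edge) feature: sum of left half minus sum of right half
left_sum = rect_sum(ii, top=4, left=4, h=8, w=4)
right_sum = rect_sum(ii, top=4, left=8, h=8, w=4)
feature_value = left_sum - right_sum
print(feature_value)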
Cascade filter
A cascade chains several boosted classifiers in stages: most negative windows are rejected by the early, cheap stages, so only promising windows reach the later, more complex stages.
Cascade filter
Prepare data
 Negative images: images which do not contain the target object
 Positive images: images which contain the target object
 A proportion of 2:1 or higher between negative and positive samples is considered acceptable.
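For context (not from the slides), OpenCV ships pretrained Haar cascades; a minimal usage sketch, where the input image path is a placeholder:

Python code:
import cv2

# Load a pretrained Haar cascade (face detector shipped with OpenCV)
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

image = cv2.imread("input.jpg")                    # placeholder path
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Slide a window over the image at multiple scales; the early cascade stages
# reject most negative windows cheaply
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in faces:
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("output.jpg", image)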
Biological Neurons

◉ A typical biological neuron is composed of:


○ A cell body;
○ Dendrites: input channels
○ Axon: output cable; it usually branches.

Biological Neurons

◉ The major job of a neuron:
○ It receives information, usually in the form of electrical pulses, from many other neurons.
○ It sums these inputs in a complex dynamic way.
○ It sends out information in the form of a stream of electrical impulses down its axon and on to many other neurons.
○ The connections (synapses) are crucial for excitation, inhibition or modulation of the cells.
○ Learning is possible by adjusting the synapses!

How to build a mathematical model of the neuron?
Model of a neuron

◉ Simplest model: a system mapping inputs to outputs

inputs → [ System ] → outputs
Model of a neuron

input x → [ System ] → output y

Relationship: $y = \sum_{i=1}^{m} x_i$

But:
• The neuron only fires when it is sufficiently excited
• The firing rate has an upper bound
Model of a neuron

◉ Modified model:
○ b: threshold (bias) → the neuron will not fire until its total input is "high" enough.

Based upon this model, is it possible for the inputs to inhibit the activation of the neuron?
The synaptic weights!
Model of a neuron

$u_k = \sum_{i=1}^{m} w_i x_i$
$v_k = u_k + b_k$
$y_k = \varphi(u_k + b_k)$
Model of a Neuron

◉ Three basic components for the model of a neuron:
○ A set of synapses or connecting links: each characterized by a weight or strength of its own.
○ An adder for summing the input signals, weighted by the respective synapses of the neuron (a linear combiner).
○ An activation function for limiting the amplitude of the neuron output.
◉ Mathematical model:

$u_k = \sum_{i=1}^{m} w_i x_i$,  $y_k = \varphi(u_k + b_k)$
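The model above written directly in NumPy; the weights, bias and the choice of a logistic activation are illustrative assumptions:

Python code:
import numpy as np

def phi(v):
    # activation (squash) function; here the logistic function
    return 1.0 / (1.0 + np.exp(-v))

def neuron(x, w, b):
    u = np.dot(w, x)   # linear combiner: u_k = sum_i w_i * x_i
    v = u + b          # add the bias: v_k = u_k + b_k
    return phi(v)      # output: y_k = phi(u_k + b_k)

x = np.array([0.5, -1.0, 2.0])   # inputs (illustrative)
w = np.array([0.4, 0.3, -0.2])   # synaptic weights (illustrative)
print(neuron(x, w, b=0.1))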
Type of Activation (Squash) Functions

◉ Threshold function (McCulloch-Pitts model, 1943)

◉ Piecewise-linear function
Type of Activation (Squash) Functions

◉ Logistic function

◉ Hyperbolic tangent function
Type of Activation (Squash) Functions

◉ Gaussian functions

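Sketches of the activation functions listed above, using one common parameterization of each (the exact forms on the original slides may differ):

Python code:
import numpy as np

def threshold(v):                 # McCulloch-Pitts: 1 if v >= 0 else 0
    return np.where(v >= 0, 1.0, 0.0)

def piecewise_linear(v):          # linear in [-0.5, 0.5], clipped to [0, 1] outside
    return np.clip(v + 0.5, 0.0, 1.0)

def logistic(v, a=1.0):           # 1 / (1 + exp(-a v))
    return 1.0 / (1.0 + np.exp(-a * v))

def tanh_fn(v):                   # hyperbolic tangent, output in (-1, 1)
    return np.tanh(v)

def gaussian(v, sigma=1.0):       # exp(-v^2 / (2 sigma^2))
    return np.exp(-v**2 / (2.0 * sigma**2))

v = np.linspace(-3, 3, 7)
for f in (threshold, piecewise_linear, logistic, tanh_fn, gaussian):
    print(f.__name__, np.round(f(v), 2))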
Network Architectures

◉ Network architecture defines how nodes are connected.

Learning in neural networks

◉ Learning is a process by which the free parameters of a neural network are adapted through a process of stimulation by the environment in which the network is embedded.
◉ Process of learning:
○ The NN is stimulated by an environment.
○ The NN undergoes changes in its free parameters as a result of this stimulation.
○ The NN responds in a new way to the environment because of the changes that have occurred in its internal structure.

How can the network adjust the weights?
Simplest neural network: Perceptron

◉ Perceptron is built around the McCulloch-Pitts model.

Perceptron
◉ Goal: to correctly classify the set of externally applied stimuli x1, x2, ..., xm into one of two classes, C1 and C2.

The input vector x(n) and the weight vector w(n), where n denotes the iteration step.
Perceptron

◉ Output of the neuron: y(n) = 1 if wᵀ(n)x(n) > 0, and y(n) = 0 otherwise.

◉ What is the decision boundary?
Decision boundary

◉ m = 1: ?
◉ m = 2: ?
◉ m = 3: ?
(The boundary wᵀx = 0 is a point for m = 1, a line for m = 2, and a plane, more generally a hyperplane, for m ≥ 3.)

◉ How to choose the proper weights?
Selection of weights

Two basic methods can be employed to select a suitable weight vector:
◉ Off-line calculation of weights (without learning)
○ Possible if the system is relatively simple
◉ Learning procedure
○ The weight vector is determined from a given (training) set of input-output vectors (exemplars) in such a way as to achieve the best classification of the training vectors
Off-line calculation of weights

Example: truth table of NAND
Three points (0,0), (0,1) and (1,0) belong to one class, and (1,1) belongs to the other class.

The decision boundary is the straight line described by the following equation:
x1 + x2 = 1.5, or equivalently −x1 − x2 + 1.5 = 0

 w = (1.5, −1, −1)
Is the decision line unique for this problem? (A quick numerical check of w follows below.)
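A quick check of the hand-calculated weight vector w = (1.5, −1, −1), with the bias first and then the two input weights, against the NAND truth table:

Python code:
import numpy as np

w = np.array([1.5, -1.0, -1.0])          # [bias, w1, w2] from the calculation above

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    x = np.array([1.0, x1, x2])          # augmented input with a constant 1 for the bias
    y = 1 if np.dot(w, x) > 0 else 0     # threshold activation
    print((x1, x2), "->", y)             # NAND: 1, 1, 1, 0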
Perceptron Learning

◉ If C1 and C2 are linearly separable, there exists a weight vector w such that wᵀx > 0 for every input x in C1 and wᵀx ≤ 0 for every input x in C2.

◉ Given a training set X of labeled examples (x, d), where d is the desired output for x:
◉ Training target: find a weight vector w such that the perceptron can correctly classify the training set X.
Perceptron Learning

◉ Feed a pattern x to the perceptron with weight vector w; it will produce a binary output y (1 or 0). First consider the case y = 0 (i.e., wᵀx ≤ 0).

◉ If the correct label (all the labels of the training samples are known) is d = 0, should we update the weights? No: the output already matches the label.
◉ If the desired output is d = 1, assume the new weight vector is w' = w + Δw; we then want w'ᵀx > wᵀx so that the perceptron moves towards firing for x.

◉ But how to choose Δw?
Perceptron Learning

◉ If the true label is d = 1 and the perceptron makes a mistake (y = 0), its synaptic weights are adjusted by
w(n+1) = w(n) + η x(n), with learning rate η > 0.
Perceptron Learning

◉ Now consider the case y = 1 (i.e., wᵀx > 0).
◉ We only adjust the weights when the perceptron makes a mistake, i.e., when the true label is d = 0.

◉ If the true label is d = 0 and the perceptron makes a mistake (y = 1), its synaptic weights are adjusted by
w(n+1) = w(n) − η x(n)
Perceptron Learning

◉ To unify this algorithm:
○ consider the error signal e = d − y
○ the error signal when d = 1 and y = 0: e = 1 − 0 = 1
○ the error signal when d = 0 and y = 1: e = 0 − 1 = −1
◉ Then both cases become a single update rule:
w(n+1) = w(n) + η e(n) x(n)
Perceptron Learning

◉ Algorithm Perceptron
Start with a randomly chosen weight vector w(1);
while there exist input vectors that are misclassified by w(n) do
    Let x(n) be a misclassified input vector;
    Update the weight vector to w(n+1) = w(n) + η e(n) x(n);
    Increment n;
end-while
Perceptron Learning

◉ Example: consider a simple classification problem where the input space is one-dimensional (a real line):
○ Class 1 (d = 1): x = 0.5, 2
○ Class 2 (d = 0): x = −1, −2

◉ Solution: run the algorithm above; a worked sketch in code follows.
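A sketch of the algorithm applied to this one-dimensional example; the zero initial weights and the learning rate η = 1 are assumptions for illustration:

Python code:
import numpy as np

# Training set: augmented inputs [1, x] and desired labels d
X = np.array([[1, 0.5], [1, 2.0], [1, -1.0], [1, -2.0]], dtype=float)
d = np.array([1, 1, 0, 0])

w = np.array([0.0, 0.0])   # initial weight vector [b, w1] (assumed)
eta = 1.0                  # learning rate (assumed)

for epoch in range(100):
    errors = 0
    for x, target in zip(X, d):
        y = 1 if np.dot(w, x) > 0 else 0
        e = target - y                 # error signal e = d - y
        if e != 0:
            w = w + eta * e * x        # w(n+1) = w(n) + eta * e(n) * x(n)
            errors += 1
    if errors == 0:                    # converged: all samples classified correctly
        break

print(w)   # with these assumptions it converges to [0., 1.5], i.e. the boundary x = 0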
Perceptron Convergence Theorem

◉ Perceptron Convergence Theorem:
If C1 and C2 are linearly separable, then after a finite number of steps the weights stop changing, i.e., the algorithm converges to a separating weight vector.
Multilayer Perceptrons

◉ Multilayer perceptrons (MLPs)
○ A generalization of the single-layer perceptron
◉ Consist of
○ An input layer
○ One or more hidden layers of computation nodes
○ An output layer of computation nodes
◉ Architectural graph of a multilayer perceptron with two hidden layers
Backpropagation

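A minimal sketch of backpropagation (not the slide's original figure) for a tiny two-layer network with sigmoid activations, trained on XOR; the layer sizes, learning rate and epoch count are illustrative assumptions:

Python code:
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# XOR data: 4 samples, 2 features
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
d = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 4)), np.zeros((1, 4))   # input -> hidden
W2, b2 = rng.normal(size=(4, 1)), np.zeros((1, 1))   # hidden -> output
eta = 0.5                                            # learning rate (assumed)

for epoch in range(10000):
    # Forward pass
    h = sigmoid(X @ W1 + b1)
    y = sigmoid(h @ W2 + b2)

    # Backward pass: propagate the error signal layer by layer (squared-error loss)
    delta2 = (y - d) * y * (1 - y)
    delta1 = (delta2 @ W2.T) * h * (1 - h)

    # Gradient-descent weight updates
    W2 -= eta * h.T @ delta2; b2 -= eta * delta2.sum(axis=0, keepdims=True)
    W1 -= eta * X.T @ delta1; b1 -= eta * delta1.sum(axis=0, keepdims=True)

print(np.round(y, 2))  # should approach [0, 1, 1, 0] (may vary with initialization)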
Data Augmentation
Data augmentation is a technique that is used to create new artificial data from already existing data sets.

Motivation
Underfitting
• The model is too simple to capture the underlying trend of the data, so it performs poorly even on the training data.
• Reasons:
  • Low variation of the data and a highly biased model.
  • The model developed cannot handle complex data.
  • Small size of the training dataset.
  • Training data is of poor quality, containing noise.
Overfitting
• The model works well with the training data but performs poorly with the testing data: when a model is trained with lots of data, it starts to pick up noise and incorrect data entries.
• Reasons:
  • High variation of the data and low bias.
  • The model created is too complex and advanced.
  • The size of the training data is high.
Data Augmentation
Data augmentation methods (a code sketch follows this list):
Geometric transformation
• Flipping
• Cropping
• Rotating
• Zooming
Color transformation
• Brightness
• Darkness
• Sharpness
• Saturation
• Color augmentation
AI generative
• Generative Adversarial Networks
• Variational Auto-Encoders
• Neural Style Transfer
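A sketch of a few of the geometric and color transformations above using torchvision; the pipeline, parameters and the image path are illustrative assumptions, not the slide's own code:

Python code:
from torchvision import transforms
from PIL import Image

augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),       # flipping
    transforms.RandomRotation(degrees=15),        # rotating
    transforms.RandomResizedCrop(size=224),       # cropping / zooming
    transforms.ColorJitter(brightness=0.2,        # brightness / darkness
                           contrast=0.2,
                           saturation=0.2),       # saturation
])

image = Image.open("fish.jpg")                           # placeholder path
augmented_versions = [augment(image) for _ in range(5)]  # 5 new artificial samples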
Benefits of Neural Networks

◉ High computational power


○ Generalization : Producing reasonable outputs for inputs not
encountered during training (learning).
○ Has a massively parallel distributed structure.

◉ Useful properties and capabilities


○ Nonlinearity : Most physical systems are nonlinear
○ Adaptivity (plasticity): Has built-in capability to adapt their
synaptic weights to changes in the environment
○ Fault tolerance : If a neuron or its connecting links are damaged,
the overall response may still be ok (due to the distributed nature
of information stored in a network).

Limitations of Neural Networks

◉ Fully connected → different from biological neurons
◉ The input size is enormous
◉ Weights cannot be shared

First breakthrough convolutional neural network: AlexNet (2012)

CNN layers (a minimal code sketch follows):
◉ Convolution layer
◉ Activation layer
• If the activation derivative saturates at 0, the weights of the deeper layers remain unchanged (vanishing gradients)
◉ Pooling layer
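A minimal sketch of the three layer types named above (convolution, activation, pooling) as a tiny PyTorch model; the channel counts and input size are illustrative assumptions, not AlexNet itself:

Python code:
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1),  # convolution layer
    nn.ReLU(),                                                            # activation layer (non-saturating)
    nn.MaxPool2d(kernel_size=2),                                          # pooling layer
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2),
    nn.Flatten(),
    nn.Linear(32 * 8 * 8, 10),                                            # classifier head
)

x = torch.randn(1, 3, 32, 32)     # one 32 x 32 RGB image (illustrative)
print(model(x).shape)             # torch.Size([1, 10])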
