OPTICAL CHARACTER RECOGNITION (OCR) IN PYTHON

COURSE CONTENT

• OCR with Tesseract
• Techniques for pre-processing images
• OCR with EAST
• Training an OCR from scratch
• Artificial Neural Networks and Convolutional Neural Networks
• EasyOCR for natural scenarios
• OCR in videos
• Projects
  • Searching for specific terms
  • Scanner + OCR
  • License plate reading
TEXT DETECTION – CONTROLLED SCENARIOS

Source: https://www.pyimagesearch.com/2017/07/17/credit-card-ocr-with-opencv-and-python/
TEXT DETECTION – NATURAL SCENARIOS

Source: Mancas-Thillou and Gosselin

Challenges in natural scenes: what to do?
• Lighting conditions
• Blur
• Resolution
• Viewing angle
• Non-planar objects
• Objects that are not paper
• Noise in the image
• Unknown layout
• Slanted text
• Different letters (fonts)

Source: https://www.rsipvision.com/real-time-ocr/
TESSERACT

• It is an OCR (Optical Character Recognition) engine
• It originally started as a PhD project at Hewlett-Packard (HP) laboratories, tied to HP's scanner work
• It was developed by HP from 1984; the team worked on it until 1994
• It was ported to Windows in 1996
• In 2005 it was released to the community as an open source project
• In 2006 its development began to be sponsored by Google
• Since 2006 it has been considered one of the best and most popular OCR tools
• Official repository: github.com/tesseract-ocr/tesseract
TESSERACT

• The first version of Tesseract only supported English
• The second version added Brazilian Portuguese, French, Italian, German and Dutch
• The third version dramatically expanded support to include ideographic (symbolic) languages such as Japanese and Chinese, as well as right-to-left languages such as Arabic and Hebrew
• The fourth version supports over 100 languages and scripts
• Pytesseract: https://pypi.org/project/pytesseract/ (a minimal usage sketch follows below)
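A minimal pytesseract sketch of the basic usage; the input file name and the 'por' language pack are placeholders, and Tesseract itself must be installed separately:

```python
import cv2
import pytesseract

image = cv2.imread('page.png')                       # placeholder input file
rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)         # OpenCV loads BGR; convert to RGB
text = pytesseract.image_to_string(rgb, lang='por')  # e.g. 'por' for Portuguese
print(text)
```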
TESSERACT

Source: Balaaji Parthasarathy


TESSERACT

To recognize an image containing a single character, we normally use a Convolutional Neural Network (CNN).
Source: Krut Patel

A text of arbitrary length is a sequence of characters, and for sequences it is more interesting to use RNNs (Recurrent Neural Networks). LSTM (Long Short-Term Memory) is a popular form of RNN.

[Figure: LSTM architecture. Source: Colah's blog]
TESSERACT

Source: Tesseract 3 OCR process paper


TESSERACT

• Tesseract has a function to detect the orientation of the text in the image, as well as the script it is written in
• This option is called OSD (Orientation and Script Detection)
• It detects whether the text in the image is rotated
• If it is rotated, we can apply some kind of preprocessing before recognition (see the sketch below)
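A short OSD sketch through pytesseract (the file name is a placeholder):

```python
import pytesseract
from PIL import Image

# image_to_osd returns orientation and script information as plain text
osd = pytesseract.image_to_osd(Image.open('rotated.png'))
print(osd)  # includes lines such as 'Rotate: 90' and 'Script: Latin'
```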
THRESHOLDING

• Thresholding (binarization) is the simplest method of image segmentation
• It consists of separating an image into regions of interest and non-interest by choosing a cut-off point (called the threshold)
SIMPLE THRESHOLDING

The new value of each pixel is determined by comparing its intensity with the cut-off point (threshold): any pixel with intensity less than or equal to the threshold becomes black, and any pixel with intensity greater than the threshold becomes white (see the OpenCV sketch below).
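A minimal OpenCV sketch of simple thresholding (the file name and the cut-off of 127 are placeholders):

```python
import cv2

gray = cv2.imread('text.jpg', cv2.IMREAD_GRAYSCALE)
# Pixels <= 127 become 0 (black); pixels above 127 become 255 (white)
value, binary = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)
```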
OTSU METHOD

Otsu's method chooses the threshold automatically from the image histogram; it works best when the histogram is bimodal (two clear peaks).

[Figure: example of a histogram of a bimodal image]
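In OpenCV, Otsu's method is selected through an extra flag; passing 0 as the threshold lets the algorithm pick the cut-off itself:

```python
import cv2

gray = cv2.imread('text.jpg', cv2.IMREAD_GRAYSCALE)
# THRESH_OTSU ignores the 0 below and computes the threshold from the histogram
value, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
print(value)  # the threshold Otsu selected
```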
ADAPTIVE THRESHOLDING (GAUSSIAN)

• The threshold is calculated for small regions of the image, so different thresholds are obtained for different regions (using the mean of each region)
• Gaussian: also uses the standard deviation, considering the variation in the pixels; it is applied via convolution (a sketch follows below)
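A sketch of adaptive Gaussian thresholding in OpenCV; the block size and the constant C below are typical values, not prescriptions:

```python
import cv2

gray = cv2.imread('uneven_lighting.jpg', cv2.IMREAD_GRAYSCALE)
# Each pixel is compared with a Gaussian-weighted mean of its 11x11
# neighborhood minus a constant C (here 9)
binary = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                               cv2.THRESH_BINARY, 11, 9)
```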
RESIZING

Scale factor:
> 1 increases the size
< 1 decreases the size
sx = sy: uniform scale factor (no distortion)

[Figure: 183% enlargement, shown before interpolation, after interpolation, and with no interpolation]

RESIZING

OpenCV options
• INTER_NEAREST - nearest-neighbor interpolation. Widely used because it is the fastest
• INTER_LINEAR - bilinear interpolation (the default); generally good for zooming in and out of images
• INTER_AREA - uses the pixel area ratio. The preferred method for image reduction, as it gives good results
• INTER_CUBIC - bicubic interpolation (4x4 neighboring pixels). Generally better results than the previous options
• INTER_LANCZOS4 - Lanczos interpolation (8x8 neighboring pixels). Among these algorithms, the one with the best quality results (see the sketch below)
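In code, the interpolation is chosen through the interpolation argument of cv2.resize; fx = fy keeps the scale uniform (the 1.83 below mirrors the 183% example, and the file name is a placeholder):

```python
import cv2

image = cv2.imread('input.png')
# Enlarging: INTER_CUBIC or INTER_LANCZOS4 usually give the best quality
bigger = cv2.resize(image, None, fx=1.83, fy=1.83,
                    interpolation=cv2.INTER_CUBIC)
# Shrinking: INTER_AREA is usually the preferred choice
smaller = cv2.resize(image, None, fx=0.5, fy=0.5,
                     interpolation=cv2.INTER_AREA)
```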
RESIZING

Comparison between interpolation algorithms – increasing the size

original (50x50) area (400x400) nearest (400x400) linear (400x400) cubic (400x400) lanczos4 (400x400)

http://tanbakuchi.com/posts/comparison-of-openv-interpolation-algorithms/
RESIZING

Comparison between interpolation algorithms – decreasing the size

original (400x400) area (50x50) nearest (50x50) linear (50x50) cubic (50x50) lanczos4 (50x50)

http://tanbakuchi.com/posts/comparison-of-openv-interpolation-algorithms/
MORPHOLOGICAL OPERATIONS – EROSION AND DILATION

[Figure: erosion shrinks the white regions of a binary image; dilation expands them]

MORPHOLOGICAL OPERATIONS – OPENING AND CLOSING

• Opening: erosion followed by dilation (removes small noise)
• Closing: dilation followed by erosion (fills small holes)

[Figure: original image with its opening and closing results; an OpenCV sketch follows below]
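A short OpenCV sketch of the four operations (the kernel size and file name are illustrative):

```python
import cv2
import numpy as np

binary = cv2.imread('binary_text.png', cv2.IMREAD_GRAYSCALE)
kernel = np.ones((3, 3), np.uint8)

eroded  = cv2.erode(binary, kernel, iterations=1)
dilated = cv2.dilate(binary, kernel, iterations=1)
# Opening (erosion then dilation) removes small white noise;
# closing (dilation then erosion) fills small holes inside the characters
opened = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)
closed = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)
```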
NOISE REMOVAL WITH BLUR

Spatial filters are implemented through kernels (masks/arrays) with odd dimensions. Filters can be:
• Low-pass - used for blurring
• High-pass - used for sharpening
CONVOLUTION
• Convolution is a mathematical operation performed on two matrices, which produces a third matrix as its result
• The primary matrix is the image to be processed; the processing is driven by a second matrix called the "kernel", or mask
• Depending on the kernel values it is possible to obtain filters of different types, such as blur and sharpen

[Figure: the image matrix convolved with a kernel]
BLUR WITH AVERAGE

[Figure: original image, the averaging kernel, and the blurred result]
GAUSSIAN BLUR

[Figure: graphic representation of a 21x21 Gaussian filter]
BLUR WITH MEDIAN

The pixels below the kernel are sorted and the middle value (the median) is chosen as the new value of the central (main) pixel.
BILATERAL FILTER
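The bilateral filter smooths the image while preserving edges, because it weights neighbors by intensity difference as well as spatial distance. A sketch of the four blur filters in OpenCV (kernel sizes and the file name are illustrative):

```python
import cv2

image = cv2.imread('noisy.jpg')
average   = cv2.blur(image, (5, 5))                # plain mean of the 5x5 neighborhood
gaussian  = cv2.GaussianBlur(image, (5, 5), 0)     # Gaussian weights; sigma derived from size
median    = cv2.medianBlur(image, 5)               # middle value of the sorted neighborhood
bilateral = cv2.bilateralFilter(image, 9, 75, 75)  # edge-preserving smoothing
```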
TEXT DETECTION

Text detection before recognition is an essential step.

[Figure: examples of images where there would be no need for prior detection]
EAST – TEXT DETECTOR

EAST (Efficient and Accurate Scene Text detector) is a deep learning model, published in 2017 by Zhou et al.
• It uses convolutional layers to extract features from images and thus detect the presence of text
• It only identifies the location of the text; it is not an OCR that converts it into characters
• It is one of the most accurate text detection techniques
• It is also one of the best known, gaining popularity after the release of OpenCV 3.4.2, which made it easy to implement through the DNN module
EAST – TEXT DETECTOR

• Features at different processing levels are merged (a Fully Convolutional Network), leading to the identification of the bounding boxes where text appears in the image
• It has only two stages: the FCN (Fully Convolutional Network) and NMS (non-maximum suppression), which discards weak and overlapping bounding boxes (a DNN-module sketch follows below)

[Figure: EAST architecture. Source: Zhou et al.]
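A minimal sketch of running EAST through OpenCV's DNN module. The model file name refers to the commonly distributed pre-trained network, and the two output layer names are the ones usually quoted for it; treat both as assumptions about your setup:

```python
import cv2

# Assumed file name of the pre-trained EAST model (not bundled with OpenCV)
net = cv2.dnn.readNet('frozen_east_text_detection.pb')

image = cv2.imread('scene.jpg')  # placeholder input
# EAST requires input dimensions that are multiples of 32
blob = cv2.dnn.blobFromImage(image, 1.0, (320, 320),
                             (123.68, 116.78, 103.94), swapRB=True, crop=False)
net.setInput(blob)
scores, geometry = net.forward(['feature_fusion/Conv_7/Sigmoid',  # text confidence
                                'feature_fusion/concat_3'])       # box geometry
# The scores and geometry maps are then decoded into rotated boxes and
# filtered with non-maximum suppression (NMS)
```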


ADDITIONAL READING

EAST Paper
https://arxiv.org/abs/1704.03155v2

Mancas-Thillou and Gosselin
https://www.tcts.fpms.ac.be/publications/regpapers/2007/VS_cmtbg2007.pdf

Tesseract OCR
https://github.com/tesseract-ocr/tesseract

EAST geometry
https://stackoverflow.com/questions/55583306/decoding-geometry-output-of-east-text-detection
ARTIFICIAL NEURAL NETWORKS

[Figure: biological neuron, showing dendrites, cell body, axon and axon terminals; the synapse is the junction between the axon terminals of one neuron and the dendrites of the next]
ARTIFICIAL NEURON
sum = Σ (xi * wi)

Inputs: 1, 7, 5; weights: 0.8, 0.1, 0

sum = (1 * 0.8) + (7 * 0.1) + (5 * 0)
sum = 0.8 + 0.7 + 0
sum = 1.5

Step function: greater than or equal to 1 → 1, otherwise → 0
Output = 1
ARTIFICIAL NEURON

sum = Σ (xi * wi)

Inputs: -1, 7, 5; weights: 0.8, 0.1, 0

sum = (-1 * 0.8) + (7 * 0.1) + (5 * 0)
sum = -0.8 + 0.7 + 0
sum = -0.1

Step function: Output = 0
“AND” OPERATOR

X1 X2 Class

0 0 0

0 1 0

1 0 0

1 1 1
ARTIFICIAL NEURON

Training the perceptron for the AND operator, starting with both weights at 0 (step function: sum >= 1 → 1, otherwise 0):

x1 = 0, x2 = 0: 0 * 0 + 0 * 0 = 0 → output 0 (error 0 – 0 = 0)
x1 = 0, x2 = 1: 0 * 0 + 1 * 0 = 0 → output 0 (error 0 – 0 = 0)
x1 = 1, x2 = 0: 1 * 0 + 0 * 0 = 0 → output 0 (error 0 – 0 = 0)
x1 = 1, x2 = 1: 1 * 0 + 1 * 0 = 0 → output 0 (error 1 – 0 = 1)

weight(n + 1) = weight(n) + (learningRate * input * error)
weight(n + 1) = 0 + (0.1 * 1 * 1) = 0.1
ARTIFICIAL NEURON

After repeating the update over several epochs, both weights reach 0.5:

x1 = 0, x2 = 0: 0 * 0.5 + 0 * 0.5 = 0 → output 0 (error 0 – 0 = 0)
x1 = 0, x2 = 1: 0 * 0.5 + 1 * 0.5 = 0.5 → output 0 (error 0 – 0 = 0)
x1 = 1, x2 = 0: 1 * 0.5 + 0 * 0.5 = 0.5 → output 0 (error 0 – 0 = 0)
x1 = 1, x2 = 1: 1 * 0.5 + 1 * 0.5 = 1 → output 1 (error 1 – 1 = 0)

All errors are zero, so the perceptron has learned the AND operator. A worked sketch of this training loop follows below.
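A minimal sketch of the training loop from these slides (learning rate 0.1, step threshold 1):

```python
inputs  = [(0, 0), (0, 1), (1, 0), (1, 1)]
targets = [0, 0, 0, 1]          # AND operator
weights = [0.0, 0.0]
learning_rate = 0.1

def step(total):
    return 1 if total >= 1 else 0

for epoch in range(20):
    total_error = 0
    for (x1, x2), target in zip(inputs, targets):
        prediction = step(x1 * weights[0] + x2 * weights[1])
        error = target - prediction
        total_error += abs(error)
        # weight(n + 1) = weight(n) + (learningRate * input * error)
        weights[0] += learning_rate * x1 * error
        weights[1] += learning_rate * x2 * error
    if total_error == 0:        # converged: every row classified correctly
        break

print(weights)  # converges to approximately [0.5, 0.5]
```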
SINGLE LAYER PERCEPTRON

[Figure: the AND and OR problems are linearly separable, so a single line divides the two classes; XOR is not linearly separable, so a single-layer perceptron cannot learn it]
MULTILAYER PERCEPTRON

[Figure: multilayer perceptron; inputs x1 and x2 connect through weights to a hidden layer of neurons, each computing a sum followed by an activation function, and then through a second set of weights to the output neuron]
SIGMOID FUNCTION

y = 1 / (1 + e^(-x))

Values in the range from 0 to 1:
• If x is high, the result will be approximately 1
• If x is low, the result will be approximately 0
• No negative number is returned
“XOR” OPERATOR

X1 X2 Class

0 0 0

0 1 1

1 0 1

1 1 0
ARTIFICIAL NEURON

sum = Σ (xi * wi)        y = 1 / (1 + e^(-sum))

Forward pass for the XOR network: 2 inputs, 3 hidden neurons, 1 output. Hidden weights per neuron: (-0.424, 0.358), (-0.740, -0.577), (-0.961, -0.469); output weights: -0.017, -0.893, 0.148.

Input x1 = 0, x2 = 0:
Hidden 1: 0 * (-0.424) + 0 * 0.358 = 0 → activation 0.5
Hidden 2: 0 * (-0.740) + 0 * (-0.577) = 0 → activation 0.5
Hidden 3: 0 * (-0.961) + 0 * (-0.469) = 0 → activation 0.5
ARTIFICIAL NEURON

Input x1 = 0, x2 = 1:
Hidden 1: 0 * (-0.424) + 1 * 0.358 = 0.358 → activation 0.589
Hidden 2: 0 * (-0.740) + 1 * (-0.577) = -0.577 → activation 0.360
Hidden 3: 0 * (-0.961) + 1 * (-0.469) = -0.469 → activation 0.385
ARTIFICIAL NEURON

Input x1 = 1, x2 = 0:
Hidden 1: 1 * (-0.424) + 0 * 0.358 = -0.424 → activation 0.396
Hidden 2: 1 * (-0.740) + 0 * (-0.577) = -0.740 → activation 0.323
Hidden 3: 1 * (-0.961) + 0 * (-0.469) = -0.961 → activation 0.277
ARTIFICIAL NEURON

Input x1 = 1, x2 = 1:
Hidden 1: 1 * (-0.424) + 1 * 0.358 = -0.066 → activation 0.484
Hidden 2: 1 * (-0.740) + 1 * (-0.577) = -1.317 → activation 0.211
Hidden 3: 1 * (-0.961) + 1 * (-0.469) = -1.430 → activation 0.193
ARTIFICIAL NEURON

Output neuron (weights -0.017, -0.893, 0.148), input (0, 0), hidden activations (0.5, 0.5, 0.5):
sum = 0.5 * (-0.017) + 0.5 * (-0.893) + 0.5 * 0.148 = -0.381 → activation 0.406
ARTIFICIAL NEURON

Input (0, 1), hidden activations (0.589, 0.360, 0.385):
sum = 0.589 * (-0.017) + 0.360 * (-0.893) + 0.385 * 0.148 = -0.274 → activation 0.432
ARTIFICIAL NEURON

Input (1, 0), hidden activations (0.396, 0.323, 0.277):
sum = 0.396 * (-0.017) + 0.323 * (-0.893) + 0.277 * 0.148 = -0.254 → activation 0.437
ARTIFICIAL NEURON

Input (1, 1), hidden activations (0.484, 0.211, 0.193):
sum = 0.484 * (-0.017) + 0.211 * (-0.893) + 0.193 * 0.148 = -0.168 → activation 0.458
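The four forward passes above can be reproduced with a few lines of numpy; the weight values are the ones from the slides:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

# 2 inputs -> 3 hidden units (each column is one hidden neuron)
hidden_weights = np.array([[-0.424, -0.740, -0.961],
                           [ 0.358, -0.577, -0.469]])
# 3 hidden units -> 1 output
output_weights = np.array([[-0.017], [-0.893], [0.148]])

hidden = sigmoid(X @ hidden_weights)
prediction = sigmoid(hidden @ output_weights)
print(prediction.round(3))  # approximately [0.406, 0.432, 0.437, 0.458]
```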
ERROR (LOSS)

• Simplest algorithm
• error = expectedOutput – prediction

x1 x2 Class Prediction Error
0 0 0 0.406 -0.406
0 1 1 0.432 0.568
1 0 1 0.437 0.563
1 1 0 0.458 -0.458

Mean absolute error = 0.49


GRADIENT

min C(w1, w2, ..., wn)

Calculate the partial derivatives to know the gradient direction, and move the weights in the opposite direction (descent).
GRADIENT

• Find the combination of weights where the error is as small as possible
• The gradient is calculated to know how much to adjust the weights; the slope of the error curve is obtained with partial derivatives

[Figure: error as a function of a weight w, showing a local minimum and the global minimum]
GRADIENT (DERIVATIVE)

For the sigmoid y = 1 / (1 + e^(-x)), the derivative can be computed from the output itself:

d = y * (1 – y)
DELTA PARAMETER

[Figure: the delta combines the error with the derivative of the activation function; the gradient gives the slope of the error curve at the current weight w]
ARTIFICIAL NEURON

DeltaOutput = Error * SigmoidDerivative

Input (0, 0): activation 0.406, error = 0 – 0.406 = -0.406
Derivative (sigmoid) = 0.406 * (1 – 0.406) = 0.241
DeltaOutput = -0.406 * 0.241 = -0.098
ARTIFICIAL NEURON

Input (0, 1): activation 0.432, error = 1 – 0.432 = 0.568
Derivative (sigmoid) = 0.245
DeltaOutput = 0.568 * 0.245 = 0.139
ARTIFICIAL NEURON

Input (1, 0): activation 0.437, error = 1 – 0.437 = 0.563
Derivative (sigmoid) = 0.246
DeltaOutput = 0.563 * 0.246 = 0.139
ARTIFICIAL NEURON

Input (1, 1): activation 0.458, error = 0 – 0.458 = -0.458
Derivative (sigmoid) = 0.248
DeltaOutput = -0.458 * 0.248 = -0.114
ARTIFICIAL NEURON

DeltaHidden = SigmoidDerivative * weight * DeltaOutput

Input (0, 0): all hidden sums are 0, so each sigmoid derivative is 0.25; DeltaOutput = -0.098
Hidden 1: 0.25 * (-0.017) * (-0.098) = 0.000
Hidden 2: 0.25 * (-0.893) * (-0.098) = 0.022
Hidden 3: 0.25 * 0.148 * (-0.098) = -0.004
ARTIFICIAL NEURON

Input (0, 1): hidden sums 0.358, -0.577, -0.469, giving sigmoid derivatives 0.242, 0.230, 0.236; DeltaOutput = 0.139
Hidden 1: 0.242 * (-0.017) * 0.139 = -0.001
Hidden 2: 0.230 * (-0.893) * 0.139 = -0.029
Hidden 3: 0.236 * 0.148 * 0.139 = 0.005
ARTIFICIAL NEURON

Input (1, 0): hidden sums -0.424, -0.740, -0.961, giving sigmoid derivatives 0.239, 0.219, 0.200; DeltaOutput = 0.139
Hidden 1: 0.239 * (-0.017) * 0.139 = -0.001
Hidden 2: 0.219 * (-0.893) * 0.139 = -0.027
Hidden 3: 0.200 * 0.148 * 0.139 = 0.004
ARTIFICIAL NEURON

Input (1, 1): hidden sums -0.066, -1.317, -1.430, giving sigmoid derivatives 0.250, 0.167, 0.156; DeltaOutput = -0.114
Hidden 1: 0.250 * (-0.017) * (-0.114) = 0.000
Hidden 2: 0.167 * (-0.893) * (-0.114) = 0.017
Hidden 3: 0.156 * 0.148 * (-0.114) = -0.003
BACKPROPAGATION

weight(n + 1) = weight(n) * momentum + (input * delta * learningRate)


ARTIFICIAL NEURON

For the first hidden→output weight, sum input * delta over the four training rows (hidden activations 0.5, 0.589, 0.396, 0.484 and DeltaOutputs -0.098, 0.139, 0.139, -0.114):

0.5 * (-0.098) + 0.589 * 0.139 + 0.396 * 0.139 + 0.484 * (-0.114) = 0.032
ARTIFICIAL NEURON

Second hidden→output weight (hidden activations 0.5, 0.360, 0.323, 0.211):

0.5 * (-0.098) + 0.360 * 0.139 + 0.323 * 0.139 + 0.211 * (-0.114) = 0.022
ARTIFICIAL NEURON

Third hidden→output weight (hidden activations 0.5, 0.385, 0.277, 0.193):

0.5 * (-0.098) + 0.385 * 0.139 + 0.277 * 0.139 + 0.193 * (-0.114) = 0.021
ARTIFICIAL NEURON

Learning rate = 0.3, momentum = 1; input * delta sums: 0.032, 0.022, 0.021

weight(n + 1) = weight(n) * momentum + (input * delta * learningRate)

(-0.017 * 1) + 0.032 * 0.3 = -0.007
(-0.893 * 1) + 0.022 * 0.3 = -0.886
(0.148 * 1) + 0.021 * 0.3 = 0.154
ARTIFICIAL NEURON

For the input→hidden weights of hidden neuron 1, sum input * DeltaHidden over the four rows (deltas 0.000, -0.001, -0.001, 0.000):

x1: 0 * 0.000 + 0 * (-0.001) + 1 * (-0.001) + 1 * 0.000 = -0.000 (rounded)
x2: 0 * 0.000 + 1 * (-0.001) + 0 * (-0.001) + 1 * 0.000 = -0.000 (rounded)
ARTIFICIAL NEURON

Hidden neuron 2 (deltas 0.022, -0.029, -0.027, 0.017):

x1: 0 * 0.022 + 0 * (-0.029) + 1 * (-0.027) + 1 * 0.017 = -0.010
x2: 0 * 0.022 + 1 * (-0.029) + 0 * (-0.027) + 1 * 0.017 = -0.012
ARTIFICIAL NEURON

Hidden neuron 3 (deltas -0.004, 0.005, 0.004, -0.003):

x1: 0 * (-0.004) + 0 * 0.005 + 1 * 0.004 + 1 * (-0.003) = 0.001
x2: 0 * (-0.004) + 1 * 0.005 + 0 * 0.004 + 1 * (-0.003) = 0.002
ARTIFICIAL NEURON

Updating the input→hidden weights (learning rate = 0.3, momentum = 1):

(-0.424 * 1) + (-0.000) * 0.3 = -0.424
(0.358 * 1) + (-0.000) * 0.3 = 0.358
(-0.740 * 1) + (-0.010) * 0.3 = -0.743
(-0.577 * 1) + (-0.012) * 0.3 = -0.581
(-0.961 * 1) + 0.001 * 0.3 = -0.961
(-0.469 * 1) + 0.002 * 0.3 = -0.468

A complete sketch of this backpropagation step follows below.
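One full backpropagation iteration for the XOR network, in numpy, using the slide values (learning rate 0.3, momentum 1):

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(y):      # derivative expressed from the output y
    return y * (1 - y)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [0]])

hidden_weights = np.array([[-0.424, -0.740, -0.961],
                           [ 0.358, -0.577, -0.469]])
output_weights = np.array([[-0.017], [-0.893], [0.148]])
learning_rate, momentum = 0.3, 1.0

# Forward pass
hidden = sigmoid(X @ hidden_weights)
prediction = sigmoid(hidden @ output_weights)

# Deltas (computed with the weights before the update)
error = y - prediction
delta_output = error * sigmoid_derivative(prediction)
delta_hidden = sigmoid_derivative(hidden) * (delta_output @ output_weights.T)

# weight(n + 1) = weight(n) * momentum + (input * delta * learningRate)
output_weights = output_weights * momentum + hidden.T @ delta_output * learning_rate
hidden_weights = hidden_weights * momentum + X.T @ delta_hidden * learning_rate

print(output_weights.round(3))  # approximately [-0.007, -0.886, 0.154]
```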
BIAS

• The bias allows the neuron to produce different values even if all inputs are zero
• It shifts (changes) the output of the activation function

[Figure: the XOR network with an extra bias unit, an input fixed at 1 connected to each neuron through its own weight]
ERROR (LOSS)

• Simplest algorithm
• error = expected output – prediction

x1 x2 Class Prediction Error
0 0 0 0.406 -0.406
0 1 1 0.432 0.568
1 0 1 0.437 0.563
1 1 0 0.458 -0.458

Mean absolute error = 0.49


MEAN SQUARED ERROR (MSE) AND ROOT MEAN SQUARED ERROR (RMSE)

x1 x2 Class Prediction Squared error
0 0 0 0.406 (0 – 0.406)² = 0.164
0 1 1 0.432 (1 – 0.432)² = 0.322
1 0 1 0.437 (1 – 0.437)² = 0.316
1 1 0 0.458 (0 – 0.458)² = 0.209

Sum = 1.011
MSE = 1.011 / 4 = 0.252
RMSE = √0.252 ≈ 0.502
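The same numbers in numpy (the slides round each squared error before summing, which slightly shifts the last digits):

```python
import numpy as np

targets     = np.array([0, 1, 1, 0])
predictions = np.array([0.406, 0.432, 0.437, 0.458])

mse  = np.mean((targets - predictions) ** 2)
rmse = np.sqrt(mse)
print(round(mse, 3), round(rmse, 3))  # ≈ 0.254 0.504 at full precision
```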
ARTIFICIAL NEURON

Batch gradient descent: calculates the error for all rows (the full dataset) and only then updates the weights.

Stochastic gradient descent: calculates the error for each row and updates the weights immediately afterwards.

Example dataset (credit risk; the Risk column is one-hot encoded as three binary outputs):
Credit history Debts Properties Income Risk

3 1 1 1 100
2 1 1 2 100
2 2 1 2 010
2 2 1 3 100
2 2 1 3 001
2 2 2 3 001
3 2 1 1 100
3 2 2 3 010
1 2 1 3 001
1 1 2 3 001
1 1 1 1 100
1 1 1 2 010
1 1 1 3 001
3 1 1 2 100
GRADIENT DESCENT

• Stochastic
  • Helps prevent local minima
  • Faster (no need to load all rows into memory)
• Mini-batch gradient descent
  • Chooses a number of rows at a time to calculate the error and update the weights (a sketch follows below)
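A sketch of the mini-batch idea; X_train, y_train and batch_size are placeholders for your own data and choice of batch size:

```python
import numpy as np

def minibatches(X, y, batch_size):
    order = np.random.permutation(len(X))       # shuffle the rows each epoch
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        yield X[idx], y[idx]

# for X_batch, y_batch in minibatches(X_train, y_train, batch_size=32):
#     ...forward pass, deltas and weight update computed on this batch only...
```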
STEP FUNCTION

0 or 1
SIGMOID FUNCTION

y = 1 / (1 + e^(-x))

Values in the range from 0 to 1


HYPERBOLIC TANGENT FUNCTION

y = (e^x – e^(-x)) / (e^x + e^(-x))

Values in the range from -1 to 1


RELU FUNCTION (RECTIFIED LINEAR UNITS)

Y = max(0, x)

Values >= 0
SOFTMAX FUNCTION

y_i = e^(x_i) / Σ_j e^(x_j)

Source: https://deepnotes.io/category/cnn-series
LINEAR FUNCTION
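All of the activation functions in this section, written as small numpy helpers (the step threshold of 1 mirrors the earlier perceptron slides; the usual convention is 0):

```python
import numpy as np

def step(x):    return np.where(x >= 1, 1, 0)  # threshold from the perceptron slides
def sigmoid(x): return 1 / (1 + np.exp(-x))    # range (0, 1)
def tanh(x):    return np.tanh(x)              # range (-1, 1)
def relu(x):    return np.maximum(0, x)        # max(0, x), values >= 0
def linear(x):  return x                       # identity

def softmax(x):
    e = np.exp(x - np.max(x))  # subtract the max for numerical stability
    return e / e.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))  # probabilities that sum to 1
```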
PIXELS

A 32 x 32 image has 1,024 pixels; with 3 color channels that is 3,072 inputs

• A CNN does not feed all pixels directly into the classifier
• It still applies a dense neural network, but at the beginning it transforms the data
• The question that transformation answers: what are the most important features?
CONVOLUTIONAL NEURAL NETWORK STEPS

1. Convolution operation
2. Pooling
3. Flattening
4. Dense neural network

(Section roadmap: Part 1 covers biological fundamentals and the single layer perceptron; Part 2 the multi-layer perceptron; Part 3 the libraries Pybrain, Sklearn, TensorFlow and PyTorch.)
STEP 1: CONVOLUTION OPERATION

Image (7x7):        Feature detector (3x3):
0 0 0 0 0 0 0       1 0 0
0 1 0 0 0 1 0       1 0 1
0 0 0 0 0 0 0       0 1 1
0 0 0 1 0 1 1
0 1 0 1 1 0 0
0 1 0 1 1 0 1
0 1 0 0 0 1 1

The detector slides over the image; at each position the overlapping values are multiplied element by element and summed, producing one cell of the feature map.

First position (top-left 3x3 window):
0*1 + 0*0 + 0*0 + 0*1 + 1*0 + 0*1 + 0*0 + 0*1 + 0*1 = 0
STEP 1: CONVOLUTION OPERATION (CONTINUED)

Sliding the detector one column at a time along the first row of windows:

Position 2: 0*1 + 0*0 + 0*0 + 1*1 + 0*0 + 0*1 + 0*0 + 0*1 + 0*1 = 1
Position 3: 0*1 + 0*0 + 0*0 + 0*1 + 0*0 + 0*1 + 0*0 + 0*1 + 0*1 = 0
Position 4: 0*1 + 0*0 + 0*0 + 0*1 + 0*0 + 1*1 + 0*0 + 0*1 + 0*1 = 1

The first row of the feature map is therefore 0 1 0 1 0.
STEP 1: CONVOLUTION OPERATION

After sliding over every position, the complete feature map (5x5) is:

0 1 0 1 0
0 2 1 1 2
1 2 2 3 1
1 3 3 3 2
1 3 1 3 5

Last position (bottom-right window): 1*1 + 0*0 + 0*0 + 1*1 + 0*0 + 1*1 + 0*0 + 1*1 + 1*1 = 5
STEP 1: CONVOLUTION OPERATION – RELU

After the convolution, the ReLU activation max(0, x) is applied to the feature map, zeroing any negative responses (in this example all values are already non-negative).
STEP 1: CONVOLUTIONAL LAYER

• Applying several different feature detectors to the same image produces several feature maps
• The convolutional layer is the set of feature maps
• The network will decide (learn) which feature detectors to use (a numpy sketch of the operation follows below)
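A numpy sketch that reproduces the sliding-window computation above and prints the same 5x5 feature map:

```python
import numpy as np

image = np.array([[0,0,0,0,0,0,0],
                  [0,1,0,0,0,1,0],
                  [0,0,0,0,0,0,0],
                  [0,0,0,1,0,1,1],
                  [0,1,0,1,1,0,0],
                  [0,1,0,1,1,0,1],
                  [0,1,0,0,0,1,1]])
kernel = np.array([[1,0,0],
                   [1,0,1],
                   [0,1,1]])   # the feature detector

h, w = kernel.shape
rows, cols = image.shape[0] - h + 1, image.shape[1] - w + 1
feature_map = np.zeros((rows, cols), dtype=int)
for i in range(rows):
    for j in range(cols):
        # element-wise multiply the window by the detector and sum
        feature_map[i, j] = np.sum(image[i:i+h, j:j+w] * kernel)

print(feature_map)                  # last row ends in 5, as in the slides
relu = np.maximum(0, feature_map)   # ReLU keeps only non-negative responses
```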
STEP 2: POOLING

Max pooling slides a 2x2 window over the feature map (stride 2) and keeps only the largest value in each window:

Feature map (5x5):    Max-pooled map (3x3):
0 1 0 1 0             2 1 2
0 2 1 1 2             3 3 2
1 2 2 3 1             3 3 5
1 3 3 3 2
1 3 1 3 5

For example, the first window covers 0, 1, 0, 2, whose maximum is 2; the bottom-right value 5 forms a truncated window of its own at the edge.
CONVOLUTIONAL NEURAL NETWORK – POOLING

Putting the steps together: the image is convolved with several feature detectors, producing feature maps, and max pooling reduces each map (here from 5x5 to 3x3) while keeping the strongest responses (see the sketch below).
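The same pooling in numpy (window 2x2, stride 2, with truncated windows at the edges):

```python
import numpy as np

feature_map = np.array([[0,1,0,1,0],
                        [0,2,1,1,2],
                        [1,2,2,3,1],
                        [1,3,3,3,2],
                        [1,3,1,3,5]])

size, stride = 2, 2
rows = -(-feature_map.shape[0] // stride)   # ceiling division
cols = -(-feature_map.shape[1] // stride)
pooled = np.zeros((rows, cols), dtype=int)
for i in range(rows):
    for j in range(cols):
        window = feature_map[i*stride:i*stride+size, j*stride:j*stride+size]
        pooled[i, j] = window.max()         # keep the strongest response

print(pooled)  # [[2 1 2] [3 3 2] [3 3 5]]
```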
STEP 3: FLATTENING

The pooled feature map is flattened into a single column vector, which becomes the input of the dense network:

2 1 2
3 3 2   →   2, 1, 2, 3, 3, 2, 3, 3, 5
3 3 5
STEP 4: DENSE NEURAL NETWORK

[Figure: the flattened vector feeds a fully connected (dense) network; activations propagate through the hidden layers, and the output layer produces one probability per class, e.g. 65%, 5% and 30%]
CONVOLUTIONAL NEURAL NETWORK

[Figure: the complete pipeline: input image → convolution with several feature detectors → feature maps → ReLU → max pooling → flattening → dense neural network → class probabilities]

Training using gradient descent

In addition to adjusting the weights, the feature detector is also changed
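As a closing sketch, the whole pipeline expressed with tf.keras; the layer sizes are illustrative assumptions, not values from the slides:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu',
                           input_shape=(32, 32, 3)),   # convolution + ReLU
    tf.keras.layers.MaxPooling2D((2, 2)),              # pooling
    tf.keras.layers.Flatten(),                         # flattening
    tf.keras.layers.Dense(128, activation='relu'),     # dense network
    tf.keras.layers.Dense(3, activation='softmax'),    # class probabilities
])
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])
# model.fit(...) trains the dense weights and the feature detectors together
```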
