Improved Optical Character Recognition With Deep Neural Network
Abstract—Optical Character Recognition (OCR) plays an important role in the retrieval of information from pixel-based images into searchable and machine-editable text formats. In old or poorly printed documents, printed characters are typically broken and blurred, making character recognition potentially far more complex. In this work, a deep neural network based on Inception V3 is used to train and perform OCR. The Inception V3 network is trained with 53,342 noisy character images collected from receipts and newspapers. Our experimental results show that the proposed deep neural network achieves significantly better recognition accuracy on poor-quality text images, resulting in an overall 21.5% reduction in error rate compared to existing OCRs.

Index Terms—OCR (Optical Character Recognition), Deep Learning, Transfer Learning

I. INTRODUCTION

Text character recognition commonly deals with the recognition of optically processed characters and is also known as optical character recognition (OCR). The basic idea of OCR is to convert any handwritten or printed text into data files that can be edited and read by a machine. With OCR, any article or book can be scanned directly and the image can then easily be converted to text using a computer. The OCR system has two major advantages: the ability to increase productivity by reducing staff involvement and the ability to store text efficiently. Generally, the areas where OCR can be applied include postal departments, banks, the publishing industry, government agencies, education, finance and health care [1]. A typical OCR system consists of three main steps: image acquisition and preprocessing, feature extraction, and classification [1]. The image preprocessing phase cleans up and enhances the image through noise removal, correction, binarization, dilation, color adjustment and text segmentation. Feature extraction captures information from the acquired text image to be used for classification. In the classification phase, each portion of the segmented text in the document image is mapped to its equivalent textual representation.
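As a concrete illustration of the preprocessing phase, a minimal sketch using OpenCV is given below; the function name, filter size and kernel size are illustrative assumptions rather than the exact pipeline used in this work.

import cv2
import numpy as np

def preprocess_character_image(path):
    """Illustrative preprocessing: noise removal, binarization and dilation."""
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    # Noise removal with a small median filter
    denoised = cv2.medianBlur(img, 3)
    # Binarization with Otsu's threshold (text becomes white on black)
    _, binary = cv2.threshold(denoised, 0, 255,
                              cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    # Dilation to reconnect the strokes of broken characters
    kernel = np.ones((2, 2), np.uint8)
    return cv2.dilate(binary, kernel, iterations=1)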
There are several existing OCR solutions that are commonly used in machine learning and pattern recognition. However, recognizing broken or faded English characters remains a challenging problem. The performance of OCR depends directly on the quality of the input image or document, which makes character recognition in scene images potentially far more complicated. In addition, poor-quality English characters are typically obtained from old printed documents, and some are caused by damaged print cartridges. Unfortunately, training samples of this kind are yet to be found in existing solutions. In order to recognize poor-quality English characters, an improved OCR with sufficient training data is needed.

In transfer learning, training samples can be used to pre-train a network in the source domain, and the learned characteristics can then be transferred to benefit the training process of a second network in the target domain. In recent years, traditional methods in the field of OCR research have been almost entirely replaced with deep learning methods such as Convolutional Neural Networks (CNN). Oquab et al. showed that image representations learned with a CNN on a large annotated dataset can be transferred effectively to other visual recognition tasks with a limited amount of training data [2]. Yejun Tang et al. proposed adding an adaptation layer to a CNN for transfer learning, which achieves a performance improvement in historical Chinese character recognition tasks [3]. Inspired by these works, we propose to apply a deep neural network with transfer learning to broken English character recognition.

II. METHODOLOGY

A. OCR Design Model

The methodology adopted in this paper is inspired by the OCR system of Yejun Tang et al. [3]. Although their work focused on Chinese text characters, the transfer learning concept can be applied to English text characters with the same aims of reducing training time and improving recognition accuracy. In this paper, a pre-trained model is used together with transfer learning to enhance the recognition results and speed up the training process. The proposed OCR system is designed with the help of various modules, as shown in Fig. 1.

Fig. 1. The proposed OCR model
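As an illustration of this transfer-learning setup, the following is a minimal sketch assuming a TensorFlow/Keras environment; the ImageNet weights, input size and number of character classes are assumptions, and the classifier head is deliberately simplified here (the final layer configuration is discussed in the experiments below).

import tensorflow as tf

NUM_CLASSES = 62  # placeholder: e.g. digits plus upper/lower-case letters

# Source-domain network: Inception V3 pre-trained on ImageNet (an assumption;
# the paper only states that a pre-trained model is used).
base = tf.keras.applications.InceptionV3(
    include_top=False, weights="imagenet",
    input_shape=(299, 299, 3), pooling="avg")
base.trainable = False  # reuse the learned features, train only the new head

# Target-domain head: a new classifier for the English character classes.
outputs = tf.keras.layers.Dense(NUM_CLASSES, activation="softmax")(base.output)
model = tf.keras.Model(inputs=base.input, outputs=outputs)
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])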
TABLE II
EXPERIMENT ON THE NUMBER OF FULLY CONNECTED LAYERS

Number of Layers | Layer Size (1st, 2nd, 3rd) | Accuracy (%) | Time (mins)
1 | 256              | 63.0 | 11.72
1 | 512              | 62.9 | 14.20
1 | 1024             | 64.5 | 20.30
1 | 2048             | 64.1 | 31.15
1 | 4096             | 64.4 | 54.02
2 | 1024, 256        | 61.1 | 21.60
2 | 2048, 256        | 61.9 | 35.23
2 | 4096, 1024       | 65.8 | 75.61
2 | 1024, 4096       | 65.6 | 44.28
2 | 4096, 2048       | 66.6 | 105.05
3 | 4096, 2048, 1024 | 61.4 | 105.73
3 | 4096, 1024, 256  | 61.6 | 75.32
3 | 1024, 4096, 1024 | 60.9 | 63.40
3 | 4096, 1024, 1024 | 64.4 | 81.88
3 | 4096, 4096, 1024 | 61.0 | 158.2

TABLE III
ITERATION VS ACCURACY AND TIME USAGE

Iteration No | Accuracy (%) | Time (mins)
1000    | 40.40 | 1.50
5000    | 63.80 | 7.48
10000   | 63.80 | 18.32
50000   | 65.50 | 73.50
100000  | 63.20 | 146.13
150000  | 65.10 | 221.57
250000  | 65.70 | 470.62

4) Activation Function: The purpose of an activation function is to convert the input signal of a node in a neural network into an output signal [6]. Without activation functions, neural networks would not be able to learn and model complicated information such as videos, audio, speech and images [7]. Different types of activation function can be applied, such as ReLU (Rectified Linear Unit), sigmoid and tanh. We conducted accuracy measurements using ReLU, sigmoid and tanh. Experimental results show that ReLU has the highest accuracy (73.1%), while sigmoid and tanh achieved 67.5% and 68.8% accuracy respectively. ReLU does not suffer from the vanishing gradient problem, as its gradient is constant (equal to 1) for positive inputs. The only disadvantage of ReLU is that it can cause overfitting, but this can be addressed by using dropout. The final network implementation is as follows: we use the Inception V3 network model with two fully connected layers (1st: 4096, 2nd: 2048), train the final network for 50,000 iterations with a batch size of 128 and a dropout factor of 0.2, and use ReLU as the activation function for the proposed OCR system.
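A minimal sketch of this final configuration, again assuming a TensorFlow/Keras environment; the optimizer, loss and number of classes are assumptions, since only the layer sizes, dropout factor, batch size and iteration count are specified above.

import tensorflow as tf

NUM_CLASSES = 62  # assumption: not specified in the text above

base = tf.keras.applications.InceptionV3(
    include_top=False, weights="imagenet",
    input_shape=(299, 299, 3), pooling="avg")
base.trainable = False

# Final configuration from the experiments: two fully connected layers
# (4096 and 2048 units), ReLU activations and dropout of 0.2.
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(4096, activation="relu"),
    tf.keras.layers.Dropout(0.2),  # dropout counteracts ReLU overfitting
    tf.keras.layers.Dense(2048, activation="relu"),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",  # optimizer choice is an assumption
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Train for 50,000 iterations (e.g. 1000 steps/epoch x 50 epochs) with a
# batch size of 128 on the noisy character dataset (not shown here):
# model.fit(train_ds.batch(128).repeat(), steps_per_epoch=1000, epochs=50)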
IV. RESULTS

A. Benchmark with Existing OCR

The proposed OCR is benchmarked against an existing OCR, a9t9 (also known as OCR.space). The a9t9 OCR was released in 2017 and its API is available online [8]. a9t9 OCR supports image dimensions from 40 by 40 pixels up to 2600 by 2600 pixels. Since a9t9 OCR does not provide any standard testing data in
[Figure, panel (a): Accuracy (%) by defect type: white line distortion 59.7, black line distortion 62.9, printer head damage 67.7, split-up/division 67.7, ink leakage 72.6, pepper noise 77.4, edge distortion 83.9, corner missing 90.3]
V. CONCLUSION