Automatically Detect and Recognize Text in Natural Images
S.GANESH-U17EC216
DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING
Abstract
foreground [1]. Image segmentation is the next critical step, which is used to cluster the pixels of the image in order to provide information about the surface of the image. Feature extraction, which comes after image segmentation, is applied in order to extract all the useful characteristics of character features and to reduce the number of errors in the character recognition process.

Replication of the human reading process with the help of machines has been an old area of research in the fields of pattern recognition and machine learning [Cle65]. In spite of this early start, machine reading, or optical character recognition (OCR), of degraded text (broken or merged characters and the presence of noise) without human intervention remains an elusive goal. This thesis introduces a new segmentation-free OCR approach using a combination of artificial neural networks (ANNs) and hidden Markov models (HMMs) for degraded text recognition. In addition, it provides novel applications of ANNs and HMMs in document image analysis and recognition. The thesis also contributes to the field of cognitive psychology by presenting new psychophysical experiments to determine the impact of overall word shape and the importance of letter positions during the visual recognition of words.

... instantly to reduce wait times in banks. OCR systems enable form processing tools to extract and read the relevant information from paper-based forms. Medical professionals have to deal with a large volume of forms containing important information about their patients. It is useful to keep up with all this information by putting it into a central database digitally, so that the information can be accessed efficiently as required. Large-scale digitization projects need efficient OCR systems to convert millions of printed books and documents into digital archives. Digital archives provide searchable access to the content and easy backup facilities, and they eliminate the need for physical storage of printed documents.

OCR technology is widely used in many other fields, such as mail sorting, education, finance, and government or private offices. It automates the reading of addresses on letters and parcels for efficient mail disbursement. It facilitates the digital archiving of conference proceedings and journals to make them available for on-line access. Invoice imaging tools help many businesses keep track of their financial records. In offices, it simplifies the collection of data from printed documents for analysis and further usage. In short, OCR technology has revolutionized the document management process in a wide range of industries by turning a scanned document image into a computer-readable text document.
OCR systems transform a two-dimensional image of text, which could contain machine-printed or handwritten text, ideally in any script, from its image representation into machine-readable text. OCR systems usually work as a pipeline, and several steps take place before the actual text recognition [Bre08]. A typical OCR system may comprise preprocessing, layout analysis, character recognition, and language modeling.
Preprocessing normally includes binarization, noise removal, skew correction, and, optionally, script and orientation detection. Layout analysis identifies text columns, text blocks, text lines, and the reading order of the text page.

Figure 1.1: A block diagram of a typical OCR system. The diagram shows the different intermediate processing steps during the OCR process.
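The pipeline described above (preprocessing, layout analysis, character recognition) can be sketched as a chain of stage functions. The implementations below are toy placeholders of our own for illustration, not the methods of the system described here; only the stage names come from the text.

```python
# Toy sketch of an OCR pipeline: each stage is a plain function, and the
# pipeline simply feeds the output of one stage into the next.
# All stage bodies are illustrative placeholders, not real OCR algorithms.

def binarize(image, threshold=128):
    """Preprocessing: map grayscale pixels (0-255) to 0/1 by a global threshold."""
    return [[1 if px >= threshold else 0 for px in row] for row in image]

def layout_analysis(binary):
    """Layout analysis placeholder: treat the whole page as a single text block."""
    return [binary]

def recognize_characters(block):
    """Character recognition placeholder: count horizontal foreground runs.

    A real recognizer would classify glyph images; here each maximal run of
    foreground pixels in a row is counted as one 'character'.
    """
    count = 0
    for row in block:
        prev = 0
        for px in row:
            if px == 1 and prev == 0:
                count += 1
            prev = px
    return count

def ocr_pipeline(image):
    binary = binarize(image)
    blocks = layout_analysis(binary)
    return sum(recognize_characters(b) for b in blocks)

page = [
    [0, 200, 200, 0, 0, 250, 0],  # two foreground runs
    [0, 0, 180, 0, 0, 0, 0],      # one foreground run
]
print(ocr_pipeline(page))  # -> 3
```

In a full system each stage would carry richer structure (text columns, lines, reading order) between steps, but the control flow is the same: a fixed sequence of transformations applied before language modeling cleans up the recognized text.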
The database has been developed on a diverse collection of document images from scientific, legal ...

... character segmentation significantly affects character recognition accuracies. Sophisticated character ...
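The evaluation formulas that the clause below ("where T and E are ...") originally referred to are not legible in this copy. A reconstruction in standard ICDAR-style evaluation notation follows; this is our sketch, not a verbatim restoration from the source, with m(r, R) denoting the best match between a rectangle r and a set of rectangles R:

```latex
% Reconstruction (ours) of the ICDAR-style precision, recall, and f-measure:
% m(r, R) is the best match between a rectangle r and a set of rectangles R.
p = \frac{\sum_{r_e \in E} m(r_e, T)}{|E|}, \qquad
r = \frac{\sum_{r_t \in T} m(r_t, E)}{|T|}, \qquad
f = \frac{1}{\alpha / p + (1 - \alpha) / r}
```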
where T and E are the sets of ground-truth and estimated rectangles, respectively. The standard f-measure was used to combine the precision and recall figures into a single measure of quality. The relative weights of these are controlled by a parameter α, which we set to 0.5 to give equal weight to precision and recall.

The comparison between the precision, recall, and f-measure of the different algorithms tested on the ICDAR database is shown in Table 1. In order to determine the importance of the stroke width information (Section 3.1) and the geometric filtering (Section 3.2), we additionally ran the algorithm on the test set in two more configurations: configuration #1 had all the stroke width values less than ∞ set to 5 (changing this constant did not affect the results significantly), and configuration #2 had the geometric filtering turned off. In both cases, the precision and recall dropped (p=0.66, r=0.55 in configuration #1; p=0.65, r=0.5 in configuration #2). This shows the importance of the information provided by the SWT. In Figure 7 we show typical cases where text was not detected.

In order to compare our results with [7], we have implemented the comparison measures proposed there. Our algorithm's performance is as follows: the Word Recall rate is 79.04%, and the Stroke Precision is 79.59% (since our definition of a stroke is different from [7], we counted connected components inside and outside the ground-truth rectangles). Additionally, we counted Pixel Precision, the number of pixels inside the ground-truth rectangles divided by the total number of detected pixels. This ratio is 90.39%, which outperforms the results shown in [7]. In addition to providing results on the ICDAR database, we propose a new benchmark database for text detection in natural images [26]. The database, which will be made freely downloadable from our website, consists of 307 color images with sizes ranging from 1024x1360 to 1024x768.

The database is much harder than ICDAR, due to the presence of vegetation and of repeating patterns, such as windows, that are virtually indistinguishable from text without OCR. Our algorithm's performance on this database is as follows: precision: 0.54, recall: 0.42, f-measure: 0.47. Again, in measuring these values we followed the methodology described in [8]. Since one of the byproducts of our algorithm is a letter mask, this mask can be used as a text segmentation mask. In order to evaluate the usability
of the text segmentation produced by our algorithm, we presented an off-the-shelf OCR package with several natural images containing text and, additionally, with the binarized images representing the text-background segmentation. The results of the OCR in both cases are shown in Figure 11.

... FOR VIDEO TECHNOLOGY, VOL. 12, NO. 4, APRIL 2002, pp. 256-268

[5] A. Jain, B. Yu, "Automatic Text Location in Images and Video Frames", Pattern Recognition 31(12): 2055-2076 (1998)
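As a quick check of the figures reported in the evaluation above: at α = 0.5 the α-weighted f-measure reduces to the harmonic mean of precision and recall. A minimal sketch (our own helper function, not code from the original work):

```python
# alpha-weighted f-measure combining precision p and recall r.
# With alpha = 0.5 this is the harmonic mean of p and r.

def f_measure(p, r, alpha=0.5):
    """Combine precision p and recall r into a single quality score."""
    return 1.0 / (alpha / p + (1.0 - alpha) / r)

# Figures reported for the new benchmark database: p = 0.54, r = 0.42.
print(round(f_measure(0.54, 0.42), 2))  # -> 0.47, matching the reported f-measure
```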