Text Extraction From Digital Images With Text To Speech Conversion and Language Translation
ISSN No:-2456-2165
Abstract:- In this age of digital information, there is a growing need to digitize anything and everything. Today most information is available either on paper or in the form of photographs. Tracking and modifying information in images is inconvenient and time-consuming, so text in images needs to be extracted into editable text. Technologies for extracting text exist, but they mainly work against clean backgrounds and appear to generate erroneous results. Thus, there is a need for a system that extracts text from general backgrounds with greater accuracy. The extracted text can be used as input to a Text-to-Speech (TTS) system, which converts the text into a speech signal, and can later be translated into the desired language. The market is flooded with text extraction, TTS, and translation products, but these are often separate products. Our aim is to combine all these text manipulations in one product that can also sync with existing text-modifying products.

Keywords:- Optical Character Recognition, Firebase ML Kit, Binarization, TTS, Discrete Contour Evolution.

I. INTRODUCTION

Information today is highly graphical and is stored in the form of images or videos. Yet current technology is limited in its ability to retrieve text from such images. That is why text extraction plays a vital role in many applications, including information retrieval, digital libraries, multimedia systems, and many more. Text can be of immense use if it can be converted to audio, especially for reducing reliance on vision. Text-to-speech (TTS) is the process of producing spoken words from text. As the world has grown into a global village, diversity in native languages should not make anyone experience life as an outsider. Thus, converting text to the user's desired language can be of great use in a different linguistic environment.

Text manipulation has always been in demand. Text-to-speech systems were initially developed to assist the visually impaired by offering a computer-generated spoken voice that reads text aloud. This is beneficial especially for blind people, as they can understand what is written by hearing it. Language translation is very helpful, especially for understanding signboards.

In this paper, we propose a system that extracts text from a given picture and then converts it into speech; the text can also be translated into many other languages. The system tries to combine the needed text manipulations in a single product.

II. EXISTING METHODOLOGY

The existing method for text-to-speech conversion requires a web camera to acquire an image, which is converted into a text document using Optical Character Recognition (OCR). The next stage involves natural language processing and digital signal processing to convert the text into speech using a Text-to-Speech (TTS) synthesizer. The following paragraphs discuss various methods in the field.

The method proposed in [1] combines scene text detection and scene text recognition algorithms. An image containing text is given as input, and preprocessing methods are used to remove noise. Binarization helps in identifying text in the image. If any data is lost during preprocessing, thinning and scaling are performed by a connectivity algorithm. The approach in [1] uses a character descriptor to separate text from the image. The detected text is converted using a descriptor and a wavelet feature. The sibling of each character is calculated using an adjacent character grouping algorithm. Stroke-related features are extracted using skeletons and character boundaries. There are three main steps in the implementation, as follows. Given a synthesized patch from a training set, the character boundary and character skeleton are obtained by applying discrete contour evolution (DCE) and skeleton pruning on the basis of DCE.

The robust algorithm in [2] is a new text detection and extraction method that overcomes the weaknesses of previous approaches. The input image is first transformed into a binary image and edge detection is applied. Instead of a simple thresholding method, maximally stable extremal regions (MSER) are detected. These regions contain the text components and are marked as white pixels. However, the resulting binary image does not reveal the exact boundaries of the text. For this reason, the MSER binary image is enhanced by performing a thresholding operation on each connected component. Edges are then detected and fed into a stroke width detector, where strokes, stroke widths, and connected components are found and filtered. This algorithm proved effective on blurred and noisy images as well.
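
To make the "connected components are found and filtered" step more concrete, the following is a minimal sketch in pure Python. It is not the implementation from [2]: the function names, the choice of 4-connectivity, and the filter thresholds (`min_area`, `max_aspect`) are assumptions made for illustration only. It labels the white regions of a small binary image and discards regions that are too small or too elongated to look like characters.

```python
from collections import deque


def connected_components(binary):
    """Return a list of components; each is a list of (row, col) foreground pixels.

    `binary` is a list of equal-length rows of 0/1 values.
    4-connectivity is an assumption of this sketch.
    """
    rows = len(binary)
    cols = len(binary[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and binary[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(pixels)
    return components


def filter_components(components, min_area=3, max_aspect=5.0):
    """Keep components whose area and bounding-box aspect ratio look text-like.

    Both thresholds are illustrative assumptions, not values taken from [2].
    """
    kept = []
    for pixels in components:
        ys = [p[0] for p in pixels]
        xs = [p[1] for p in pixels]
        height = max(ys) - min(ys) + 1
        width = max(xs) - min(xs) + 1
        aspect = max(height, width) / min(height, width)
        if len(pixels) >= min_area and aspect <= max_aspect:
            kept.append(pixels)
    return kept


if __name__ == "__main__":
    image = [
        [1, 1, 0, 0, 0, 0, 0, 0],
        [1, 1, 0, 0, 1, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0],
        [1, 1, 1, 1, 1, 1, 1, 1],
    ]
    found = connected_components(image)
    print(len(found), "components found,", len(filter_components(found)), "kept")
```

In a real system the filter would more plausibly use stroke width statistics, as described for [2]; area and aspect ratio are used here only because they keep the sketch short.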
1) Image Capturing: In this step, the input image is acquired using a phone camera. Focused images captured on the go, or images already stored in the device's memory, can be used at this stage. The goal is to extract text accurately from general scenes in these digital images by means of digital image processing techniques.

Fig. 1 Proposed Architecture
V. CONCLUSION
REFERENCES