Smart-Image-to-Text-and-Text-to-Speech-Reorganization-Using-Machine-Learning
AN INTERNATIONAL SCHOLARLY || MULTIDISCIPLINARY || OPEN ACCESS || INDEXING IN ALL MAJOR DATABASE & METADATA
Abstract—This project combines optical character recognition (OCR) and text-to-speech (TTS). By establishing a voice interface with computers, a framework of this kind helps people who are visually impaired. Image-to-text and text-to-speech conversion uses OCR to read text and numbers in more than 20 languages from an image and converts them to speech. Both a voice processing module and an image processing module are implemented in this project. Numerous methods have been used in the past, such as the edge-based method, the connected-component method, the texture-based method, and the mathematical morphology method; however, they have significant limitations when measured by precision, F-score, and recall. Text of this kind appears in magazines, photographs, newspapers, banners, and other media. Current developments in natural language processing and image processing focus on intelligent systems that enhance quality of life. This paper proposes an efficient method for text detection and extraction from images and for text-to-speech conversion. The input image is first enhanced by grayscale conversion. The text regions of the enhanced image are then located with the maximally stable extremal regions (MSER) feature detector. Next, geometric filtering is used together with the stroke width transform (SWT) to collect and filter the text regions in the image: the geometric properties and the stroke width transform are used to remove the non-text MSER regions. Individual letters are then grouped into text sequences, which are subsequently split into words. OCR is used to digitize the words. In the final phase, the text is converted to speech by feeding it through our text-to-speech synthesizer. The proposed method is tested on images ranging from documents to natural scenes. Promising results demonstrate the accuracy and robustness of the proposed framework and support its practical use in real-world applications.

Keywords— Image Processing, Text Recognition and Extraction, Maximally Stable Extremal Regions (MSER), Optical Character Recognition (OCR), Stroke Width Transform (SWT), Text-to-Speech (TTS) Synthesizer

I. INTRODUCTION

Sequence modelling has shown state-of-the-art performance on tasks that require understanding of documents and natural language. Form-based document understanding is a growing research area because of its practical potential for automatically transforming unstructured text input into structured information that gives insight into a document's contents. However, because layout patterns in form-like documents vary widely, serialising tokens correctly can be difficult in practice. We suggest FormNet, a structure-aware sequence model that improves the serialisation of forms. First, we create Rich Attention, which uses the spatial relationship between tokens to calculate attention scores more accurately. Second, we create Super-Tokens for each word by embedding representations from their neighbouring tokens through graph convolutions. As a result, FormNet explicitly restores local syntactic information that may have been lost.

II. LITERATURE REVIEW

1) Title: - A Novel Method based on Character Segmentation for Slant Chinese Screen-render Text Detection and Recognition
Author: - Tianlun Zheng, Xiaofeng Wang, Xin Yuan, and Shiqin Wang
Methodology: - Chinese characters are first extracted using vertical projection and error correction, and are then recognized by convolutional neural networks based on the Inception module. The proposed model can effectively segment Chinese characters from screen-rendered images and significantly reduces the training time.
Database Used: - ICDAR 2013 Robust Reading Competition - Chinese (RRC-2013): a dataset containing over 2,000 high-resolution images of natural scenes and born-digital images with bounding boxes around the text regions.
Source of impaired speech: - There are several sources of impaired speech that can affect Urdu-text detection and recognition in natural scene images using deep learning. Some of the most common sources of impaired speech include background noise, accent or dialect, and speech disorders.
Observation: - One of the main observations of the paper is that traditional segmentation-free approaches may not be effective for slant Chinese text, as they do not account for deformation and overlapping of characters. This highlights the need for a new approach that can handle these challenges.

2) Title: - A Novel Method based on Character Segmentation for Slant Chinese Screen-render Text Detection and Recognition
Author: - Tianlun Zheng, Xiaofeng Wang, Xin Yuan, and Shiqin Wang
Methodology: - Screen-rendered text has broad application prospects in medical records, dictionary screen capture, and screen-assisted reading. However, Chinese screen-rendered text always has the challenges of small font size and low resolution, and a screen-rendered text image captured in a natural scene will have a certain tilt angle. These all pose great challenges for screen text recognition. This paper proposes a method based on character segmentation.
Database Used: - ICDAR 2017 Robust Reading Competition - Chinese (RRC-2017): a dataset containing over 2,000 high-resolution images of natural scenes and born-digital images with bounding boxes around the text regions.
Source of impaired speech: - There are several sources of impaired speech that can affect Urdu-text detection and recognition in natural scene images using deep learning. Some of the most common sources of impaired speech include background noise, accent or dialect, and speech disorders.

3) Title: - Novel Approach for Image Text Recognition and Translation
Author: - Srinandan Komondor, Y. Mohana Roopa, M. Madhu Bala
Methodology: - One of today's most pressing problems is to accurately translate the text present in an image into human-readable text. It has been gaining attention because of the extensive work done by the computer vision community. The main concept behind this technology is optical character recognition (OCR). With the help of OCR, text in electronic documents can be searched and recognized and easily converted into human-readable form.
Database Used: - The paper is likely inspired by the increasing demand for image text recognition and translation systems that can process text in multiple languages from images captured in the real world. This demand is driven by the growing use of social media and the internet, which generates large amounts of multilingual image content that requires accurate and efficient processing.
Source of impaired speech: - The researchers conducted experiments to evaluate the accuracy and effectiveness of the proposed approach. They tested the system using images with text in English, Spanish, and Chinese, and evaluated the system's ability to recognize and translate the text into other languages. The results showed that the proposed approach achieved high accuracy in recognizing text in images and translating it into different languages.
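The vertical-projection step named in the methodology of the first reviewed paper can be illustrated with a minimal sketch. It assumes the text line has already been binarized into rows of 0 (background) and 1 (ink); the function names `vertical_projection` and `segment_columns` and the `min_width` parameter are hypothetical, and the error-correction step and the Inception-based recognizer from that paper are not reproduced here.

```python
from typing import List, Tuple


def vertical_projection(binary: List[List[int]]) -> List[int]:
    """Count the ink pixels (value 1) in every column of a binary image."""
    if not binary:
        return []
    width = len(binary[0])
    return [sum(row[x] for row in binary) for x in range(width)]


def segment_columns(binary: List[List[int]], min_width: int = 1) -> List[Tuple[int, int]]:
    """Split a text line into character candidates at empty columns.

    Returns (start, end) column ranges with `end` exclusive. Runs narrower
    than `min_width` are dropped as noise (an illustrative assumption).
    """
    profile = vertical_projection(binary)
    segments: List[Tuple[int, int]] = []
    start = None
    for x, count in enumerate(profile):
        if count > 0 and start is None:
            start = x                      # a run of ink columns begins
        elif count == 0 and start is not None:
            if x - start >= min_width:
                segments.append((start, x))
            start = None                   # the run ended at an empty column
    # Close a run that reaches the right edge of the image
    if start is not None and len(profile) - start >= min_width:
        segments.append((start, len(profile)))
    return segments


# A tiny 3 x 10 "text line" with three ink runs separated by blank columns
line = [
    [0, 1, 1, 0, 0, 1, 0, 0, 1, 1],
    [0, 1, 0, 0, 0, 1, 0, 0, 0, 1],
    [0, 1, 1, 0, 0, 1, 0, 0, 1, 1],
]
print(vertical_projection(line))   # [0, 3, 2, 0, 0, 3, 0, 0, 2, 3]
print(segment_columns(line))       # [(1, 3), (5, 6), (8, 10)]
```

Plain vertical projection only separates characters that have an empty column between them, which is consistent with the paper's observation that slanted or overlapping characters need additional handling.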
The text-to-speech device is made up of two main components: the image processing module and the voice processing module. The image processing module uses the camera to capture photos and converts them into text. The voice processing module then turns the text into audio and processes it with explicit physical properties so that the sound can be perceived. In other words, OCR changes the file type by producing a .txt file from the .jpg image, and the voice processing module converts that .txt file to speech. OCR (optical character recognition) is a technology that recognizes characters using an optical system. The camera functions as the equivalent of the eye, while the computer serves as the equivalent of the human mind when it comes to processing images.

A. AIM:

This work proposes an effective approach for text recognition and extraction from images and for text-to-speech conversion.

B. OBJECTIVES:

To detect and extract text from images and convert it into a digital form of speech for an effective medium of communication. To extract information (text), convert it into digital form, and recite it accordingly. To be an effective medium for communication.

IV. SOFTWARE DESIGN

The software used in Smart Image to Text & Text to Speech Recognition using Machine Learning involves several components that work together to enable this functionality. Fig. 4.1 shows the system architecture.

1. Unit Testing: - Unit testing is the testing of individual software units of the application. It is done after the completion of an individual unit and before integration. Unit testing involves the design of test cases that validate that the internal program logic is functioning properly and that program inputs produce valid outputs. All decision branches and internal code flow should be validated. It is structural testing that relies on knowledge of the unit's construction and is invasive. Unit tests perform basic tests at component level and test a specific business process, application, and/or system configuration. Unit tests ensure that each unique path of a business process performs accurately to the documented specifications and contains clearly defined inputs and expected results.

2. Integration Testing: - Integration tests are designed to test integrated software components to determine whether they run as one program. Testing is event driven and is more concerned with the basic outcome of screens or fields. Integration tests demonstrate that, although the components were individually satisfactory, as shown by successful unit testing, the combination of components is correct and consistent. Integration testing is specifically aimed at exposing the problems that arise from the combination of components.
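The two-module design (image processing, then voice processing) and the unit and integration tests described above can be sketched together as follows. This is an illustrative assumption, not the authors' implementation: the class `ImageToSpeechPipeline`, the stub engines `fake_ocr` and `fake_tts`, and the test function names are all hypothetical, and the stubs stand in for a real OCR engine and speech synthesizer, which the paper does not name.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical interfaces for the two pluggable engines
OcrEngine = Callable[[bytes], str]      # image bytes -> recognized text
SpeechEngine = Callable[[str], bytes]   # text -> audio bytes


@dataclass
class ImageToSpeechPipeline:
    ocr: OcrEngine
    tts: SpeechEngine

    def image_to_text(self, image: bytes) -> str:
        """Image processing module: run OCR and normalize whitespace."""
        if not image:
            raise ValueError("empty image")
        return " ".join(self.ocr(image).split())

    def text_to_speech(self, text: str) -> bytes:
        """Voice processing module: synthesize audio for non-empty text."""
        if not text:
            raise ValueError("no text to speak")
        return self.tts(text)

    def run(self, image: bytes) -> bytes:
        """Full path: image -> text -> speech."""
        return self.text_to_speech(self.image_to_text(image))


def fake_ocr(image: bytes) -> str:
    # Stub: pretends every image contains the same two words
    return "  Hello \n world  "


def fake_tts(text: str) -> bytes:
    # Stub: "audio" is just the UTF-8 bytes of the text
    return text.encode("utf-8")


def test_unit_image_to_text() -> None:
    # Unit test: one module in isolation, including its error branch
    pipeline = ImageToSpeechPipeline(ocr=fake_ocr, tts=fake_tts)
    assert pipeline.image_to_text(b"img") == "Hello world"
    try:
        pipeline.image_to_text(b"")
    except ValueError:
        pass
    else:
        raise AssertionError("empty image should be rejected")


def test_integration_image_to_speech() -> None:
    # Integration test: both modules combined produce consistent output
    pipeline = ImageToSpeechPipeline(ocr=fake_ocr, tts=fake_tts)
    assert pipeline.run(b"img") == b"Hello world"


test_unit_image_to_text()
test_integration_image_to_speech()
```

Passing the engines in as callables keeps each module testable on its own, so the unit tests can cover every decision branch with stubs while the integration test checks only that the combined path is consistent.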
Fig. 5.6: - English to Hindi Language Conversion Interface Screen

In future, this work can be extended to detect text from video or in real time, and the detected text can be automatically saved in WordPad or any other editable format for further use.

1. Taking Over Education
Another wild reality is that it could possibly overthrow some forms of

2. No More Typing
The last point is that typing could become a skill of the past if users continue down this road. Text-to-speech converters are big, but what if speech-to-text takes over? Typing out a message or paper would be a thing of the past, because users could just use their voice. That makes sense in some regards, because most people can probably speak much faster than they can type. But there are certain drawbacks that could hinder the expansion of this idea.

[3] R. Bhardwaj and K. D. Jain, "Text extraction from natural scene images using machine learning," 2019 International Conference on Machine Learning, Big Data, Cloud and Parallel Computing (COMITCon), Allahabad, India, 2019, pp. 56-61, doi: 10.1109/COMITCon.2019.8884491.
[4] https://towardsdatascience.com/image-to-text-recognition-using-deep-learning-6b2e8d6b5f70
[5] https://www.analyticsvidhya.com/blog/2021/01/ocr-with-deep-learning-and-opencv-for-image-to-text-conversion/
[7] R. Shrestha and M. Dahal, "Deep Learning: A Comprehensive Guide to Neural Network Methods in Natural Language Processing, Image Recognition, and Voice Recognition," Packt Publishing, 2019.
[8] A. Géron, "Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems," O'Reilly Media, 2019.