Visvesvaraya Technological University: Design of An Automatic Reader For The Visually Impaired Using Raspberry Pi
2020-2021
CERTIFICATE
DECLARATION
We, the students of the 7th semester B.E. in Electrical and Electronics Engineering, The Oxford
College of Engineering, Bengaluru, hereby declare that the project work entitled “DESIGN
OF AN AUTOMATIC READER FOR THE VISUALLY IMPAIRED USING
RASPBERRY Pi” has been carried out and submitted in partial fulfilment of the requirements for
the award of the degree of Bachelor of Engineering in ELECTRICAL AND
ELECTRONICS ENGINEERING of Visvesvaraya Technological University, Belagavi,
during the year 2022-2023.
CHETHAN V R 1OX19EE005
SURYANARAYANAREDDY 1OX19EE034
BHARATHKUMAR K P 1OX19EE043
ACKNOWLEDGEMENT
The satisfaction and euphoria that accompany the successful completion of any task would be
incomplete without the mention of the people who made it possible with continuous guidance and
encouragement and crowned our effort with success.
We have great pleasure in expressing our deep sense of gratitude to Late Shri. S. NARASARAJU,
Founder Chairman, and we consider ourselves proud to be a part of the Oxford family, the institution
that stood by us in all our endeavours. We also express our gratitude to Shri. S. N. V. L.
NARASIMHA RAJU, Chairman, The Oxford Educational Institutions, and Dr. K M
RAVIKUMAR, Director, The Oxford Educational Institutions, for providing all the facilities that
made our work a better one.
We would like to express our gratitude to Dr. N KANNAN, Principal, The Oxford College of
Engineering, for providing a congenial environment to work in. We heartily thank our
beloved HOD, Dr. V. S. BHARATH, Department of EEE, for his encouragement and support.
Words are inadequate in offering our thanks to our guide Dr. V. S. BHARATH, Professor & HOD,
and to the Project Coordinator Mr. Jayakumar N, Assistant Professor, Department of EEE, with
profound gratefulness for their moral inspiration, encouragement and valuable guidance throughout
the course of this work.
We also thank all the staff members of the Electrical and Electronics Engineering department and all
those who have directly and indirectly helped us with their valuable suggestions towards the
successful completion of this project. Last but not the least, we thank our beloved parents for their
support and encouragement in successfully completing this task.
CHETHAN V R 1OX19EE005
SURYANARAYANAREDDY 1OX19EE034
BHARATHKUMAR K P 1OX19EE043
DESIGN OF AN AUTOMATIC READER FOR THE VISUALLY IMPAIRED USING RASPBERRY Pi
CHAPTER – 1
ABSTRACT
This project is an automatic document reader for visually impaired people, developed on the
Raspberry Pi processor board. The board controls peripherals such as a camera and a speaker,
which act as the interface between the system and the user. Optical Character Recognition
(OCR) technology is used to identify printed characters using image-sensing devices and
computer programming; the OCR process can be carried out using either online or offline methods.
OCR converts images of typed, handwritten or printed text into machine-encoded text, and this
encoded text is then converted into audio output (speech) by text-to-speech synthesis. The
Raspberry Pi translates the printed document into text files using the Tesseract library and
Python programming; the page images are captured and prepared using the OpenCV library and
the Python programming language, and the recognized text is finally delivered as audio output.
Keywords: Character recognition, Low power, Document Image Analysis (DIA), Raspberry Pi
4B, Speech Output, OCR based book reader, OpenCV, Python Programming.
CHAPTER – 2
INTRODUCTION
Today there is a huge amount of written material available everywhere, and converting it all into
Braille is not a practical option, given the vastness of the available content and the poor return
on such a huge investment. But that would mean denying the blind access to these huge
quantities of scholarly material. A suitable and viable alternative for this predicament is
designing a smart device which can convert this print media into speech and play it out for the
visually impaired. Such a device is radical in its kind and a huge benefit; although the initial
investment may be high, in the long run it is quite cost-effective. The main objective of this
project is to make use of such a system, specifically the Raspberry Pi along with its ancillaries
working in unison, to convert text into speech and thus assist the visually impaired.
The proposed algorithm uses a camera module to capture the desired text, which is then
converted into a grayscale image and a binary representation [1]. From this image the individual
characters are extracted and recognized, all of which is carried out by the optical character
recognition algorithm. After the stages of scanning, pre-processing, segmentation and feature
extraction, the scanned text is finally ready to be read out by means of the speaker connected to
the Pi module. Even though such systems exist, most of them are in crude forms, and developing
a commercially viable setup will be a huge aid for the visually impaired, giving them access to
unprecedented amounts of text and written media. Such a system, which involves only a one-time
investment, is thus a vital assistive tool. The main objective of this project is to convert print and
written media into playable audio with high efficiency. A unique addition in this device is the
ability to record speech in memory and replay these audio files at a convenient time.
There are many existing solutions to the problem of assisting individuals who are blind to read;
however, none of them provides efficient reading. We focus on improving the competence of
blind people by providing them with a solution where the details are given in the form of an
audio signal. The Raspberry Pi-based reader is an automatic document reader for visually
impaired people using OCR technology. The proposed project uses a camera-based assistive
device which can be used by individuals to read printed text. The scheme is to implement an
embedded-system-based image capturing technique using the Raspberry Pi board. The design is
inspired by prior research with visually impaired people; it is small and portable, which helps
achieve results with little setup. Here, we have put forward a text read-out system for visually
impaired people. OCR and text-to-speech synthesis are used to convert images into audio output
(speech). The proposed apparatus has a camera which acts as the input device for digitization,
and this digitized script is processed by the OCR software module. A procedure is followed for
recognition of the characters and the line of reading. On the software side, the OpenCV
(Open-Source Computer Vision) libraries are employed to capture the image of the text and to
recognize the characters. The final identified text is sent to an output device based on the choice
of the user, such as a headset connected to the Raspberry Pi.
CHAPTER – 3
3.1 BRAILLE
Braille is a reading and writing system for blind and vision impaired people. It is made
up of raised dots that can be 'read' by touch. The basic component is a rectangular 'cell'
of six dots, arranged in two vertical columns of three dots.
Each dot arrangement represents a different letter or number. For example, the letter 'A'
is a single dot (the top dot of the left column), 'B' is two vertically aligned dots (the first
and second dots of the left column), while 'C' is two horizontally aligned dots (the top dot
in both columns).
In this way, braille offers 63 different dot combinations to form the alphabet, numbers,
punctuation marks and abbreviations. Braille is used around the world in many
languages. Just about any written information can be presented in braille including
books, music, mathematics and knitting patterns.
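As an illustration of the encoding just described, the short Python sketch below (illustrative only, not part of the project software) represents a braille cell as a set of the six dot positions and confirms that a six-dot cell yields 2^6 - 1 = 63 non-empty combinations:

from itertools import combinations

# Dot positions: 1-3 down the left column, 4-6 down the right column.
DOTS = (1, 2, 3, 4, 5, 6)

# Letter patterns taken from the description above (a small illustrative subset).
LETTERS = {
    "A": {1},       # single dot at the top of the left column
    "B": {1, 2},    # two vertically aligned dots in the left column
    "C": {1, 4},    # two horizontally aligned dots at the top of both columns
}

# Every non-empty subset of the six dots: 2**6 - 1 = 63 combinations.
total = sum(1 for r in range(1, 7) for _ in combinations(DOTS, r))
print(total)   # prints 63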
Grade 1 - the braille alphabet, numbers and punctuation. This is equivalent to the print
alphabet. People learning braille usually start with Grade 1. However, this form takes up
a lot of space, which makes Grade 1 braille books much bulkier than print books.
Grade 2 - braille that, in addition to the alphabet, uses abbreviations and contractions
(similar to that of shorthand). Grade 2 braille is used for more complicated texts, such as
novels and large documents, because it takes up less space. For example, the word
'braille' is written as 'brl'. The shorter words mean less finger travel across a line and a
faster reading speed. Grade 2 is the most popular form of braille.
A braille display - this is a piece of equipment connected to the computer that, working
together with screen-reading software, presents screen text to the user via one line of
refreshable braille.
A braille embosser - this is a type of printer that prints text in braille dots. It relies on a
braille translator to translate text.
A braille keyboard - this is a keyboard consisting of six keys for producing braille dots, a
space bar, carriage return and backspace key. It allows the user to type in braille.
Scanners - text can be converted into braille using a scanner and a computerised braille
translation program.
Tele braille III - this device attaches to a telephone typewriter (TTY). The TTY is a
small screen and typewriter that is used in place of the telephone handset, so that the
conversation is typed rather than spoken. The Tele braille III transcribes the written text
and displays it in braille.
• Braille is not used only to transcribe and write books and publications.
• It’s recommended to learn braille by touch if you’re losing your vision but still have
some sight remaining.
CHAPTER – 4
LITERATURE SURVEY
CHAPTER – 5
BLOCK DIAGRAM
CHAPTER – 6
PROPOSED SYSTEM
CHAPTER – 7
FLOW OF PROCESS
7.1 IMAGE CAPTURING
This is the first step, in which the device is moved over the printed page and the inbuilt camera
captures images of the text. The high-resolution camera ensures that the captured image quality
is good enough for fast and clear recognition.
7.2 PRE-PROCESSING
The pre-processing stage consists of three steps: skew correction, linearization and noise removal.
The captured image is checked for skewing, since the image may be skewed towards the left or
the right. The image is first brightened and binarized. The skew-detection function checks for an
orientation angle within ±15 degrees; if skew is detected, a simple image rotation is carried out
until the text lines match the true horizontal axis, which produces a skew-corrected image. The
noise introduced during capturing, or due to poor quality of the page, has to be removed before
further processing.
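A minimal OpenCV sketch of this pre-processing stage is shown below. It only illustrates the steps named above (denoising, binarization and skew correction within roughly ±15 degrees); the thresholds and the deskewing method are assumed choices, not the project's exact implementation:

import cv2
import numpy as np

def preprocess(image_bgr):
    # Grayscale, denoise and binarize the captured page.
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    gray = cv2.medianBlur(gray, 3)                    # simple noise removal
    binary = cv2.threshold(gray, 0, 255,
                           cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]

    # Estimate the skew angle from the minimum-area rectangle around the ink pixels.
    coords = np.column_stack(np.where(binary < 128)).astype(np.float32)
    angle = cv2.minAreaRect(coords)[-1]
    angle = -(90 + angle) if angle < -45 else -angle
    angle = max(-15.0, min(15.0, angle))              # only correct within +/-15 degrees

    # Rotate so the text lines align with the true horizontal axis.
    h, w = binary.shape
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    return cv2.warpAffine(binary, M, (w, h),
                          flags=cv2.INTER_CUBIC, borderValue=255)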
7.3 SEGMENTATION
After pre-processing, the noise-free image is passed to the segmentation phase. Segmentation is an
operation that seeks to decompose an image of a sequence of characters into sub-images of
individual symbols (characters). The binarized image is checked for inter-line spaces; if inter-line
spaces are detected, the image is segmented into sets of paragraphs across the inter-line gaps. The
lines in the paragraphs are scanned for horizontal space intersections with respect to the
background, and a histogram of the image is used to detect the width of the horizontal lines. The
lines are then scanned vertically for vertical space intersections, where histograms are used to
detect the width of the words. Finally, the words are decomposed into characters using
character-width computation.
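The line-level part of this segmentation can be sketched with simple projection histograms, as below; splitting lines into words and characters follows the same pattern with a vertical projection. This is an assumed illustration of the approach described above, not the project's exact code:

import numpy as np

def segment_lines(binary):
    # Histogram of dark (text) pixels per row of the binarized page.
    ink_per_row = np.sum(binary < 128, axis=1)
    lines, start = [], None
    for y, ink in enumerate(ink_per_row):
        if ink > 0 and start is None:
            start = y                            # entering a text line
        elif ink == 0 and start is not None:
            lines.append(binary[start:y, :])     # inter-line gap reached
            start = None
    if start is not None:
        lines.append(binary[start:, :])
    return lines

# Words and characters are isolated the same way by projecting each line
# vertically (axis=0) and splitting on runs of empty columns.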
CHAPTER – 8
RELATED WORK
In one study, the authors proposed a prototype which helps a blind person listen to text
images in the English and Tamil languages. Reading is performed by taking images of the text
and converting the image to audio output in the above-mentioned languages. This was done
with the help of a Raspberry Pi 3 Model B, a web camera, the Tesseract OCR (Optical
Character Recognition) engine and the Google Speech API (Application Program Interface)
as the text-to-speech engine. The disadvantage of the system is that it produces unclear output
with an incorrect regional accent and also has problems with speech in the Tamil language.
Another work proposed a smart book reader for the visually challenged based on optical
character recognition, using the Raspberry Pi 3 kit and the Raspberry Pi camera module.
Google's Tesseract is used for OCR and Pico is used for text-to-speech in that project. In the
pre-processing stage the method uses binarization, de-noising, deskewing and segmentation
techniques to improve image clarity. In other cases, a mobile application allows blind people
to “read” text by using a photo-to-speech application: a combination of OCR and
text-to-speech (TTS) frameworks is integrated, and with the help of a smartphone the user
takes a picture and hears the text that exists in the picture. A drawback is that it does not
provide any automatic system for capturing images.
Generally, optical character recognition (OCR) recognizes text from captured image data,
converting a scanned or photographed document into an electronic transcript. The digital
text is synthesized into voice using text-to-speech (TTS) technology and played through an
audio system. One such system is constructed using a Raspberry Pi, an HD camera and a
Bluetooth headset.
In another work, the authors proposed a model which enables a user to hear any text in real
time without the effort of reading. The whole process is built on OCR (Optical Character
Recognition) and TTS (Text-to-Speech) frameworks combined on a Raspberry Pi v2. The
disadvantage of the system is that the captured image was sometimes blurred, so the OCR
occasionally gave wrong results. A further proposed system claims to read text present
anywhere to assist blind persons; its disadvantage is spelling problems in the OCR output.
Another work proposed a camera-based label reader for blind persons to read any text. A
camera is used to capture the image of the text or board; the image is then pre-processed and
the label is separated from the processed image with the help of the OpenCV library. After
the text is identified, it is pronounced through voice output. A motion-based method is applied
to detect the object or the text written on a board, hoarding or any other surface.
Another proposal is a self-assistive device where live streaming speech is sent to the Google
API; after the speech is converted to text, the result is spoken through a speaker and
displayed on an LCD screen. However, a good internet connection is needed for this method.
Finally, the authors of another work designed a voice-based navigation system for blind
people in a hospital environment. With the help of ultrasonic sensors and an RFID reader
interfaced with a Raspberry Pi 3, an obstacle-avoidance system is designed to locate the
exact place in the hospital. Most of these models depend on a good internet connection, and
most of them have OCR problems. Owing to these shortcomings, our proposed method does
not depend on the internet at all, and additional processing steps are included for better
OCR results.
CHAPTER – 9
CIRCUIT DIAGRAM
CHAPTER – 10
ARCHITECTURE OF RASPBERRY Pi
The Raspberry Pi is a credit-card-sized minicomputer that plugs into a computer monitor or TV
and uses a standard keyboard and mouse. The Raspberry Pi 2 and Raspberry Pi 3 are two
common models. The hardware components of the Raspberry Pi include 4 USB ports, 40 GPIO
pins for input or output, a CSI camera interface, a full-size HDMI port, a DSI display interface,
an SoC (system on a chip), a LAN controller, a micro-SD card slot, Bluetooth 4.1, an audio jack
and video socket, a 5 V micro-USB power connector and an Ethernet port. The power supply
unit supplies electrical energy to the output loads. In real time, the camera feeds its images to a
computer or computer network, often via USB, Ethernet or Wi-Fi. The Raspberry Pi board can be
connected to projectors, monitors and TVs through an HDMI-to-VGA converter.
Raspbian is a free operating system based on Debian, developed for the Raspberry Pi. The
operating system is the set of basic programs and services that makes the Raspberry Pi run.
Many versions of Raspbian are available, such as Raspbian Stretch and Raspbian Jessie.
As of the latest update, Raspbian uses PIXEL (Pi Improved X-Window Environment,
Lightweight) as its default desktop environment.
CHAPTER – 11
HARDWARE IMPLEMENTATION
11.1 Raspberry Pi 3 Model B
The Raspberry Pi is nowadays very important in embedded system development; it makes
development very fast, and one can build a demo project within hours. Here we use the
Raspberry Pi 3 Model B as the processing platform, as it acts as both processor and controller.
It supports a Debian-based OS and is hence a portable pocket computer.
The Raspberry Pi 3 Model B is about ten times more powerful than the first generation. It has
wireless LAN and Bluetooth connectivity, a quad-core ARM Cortex-A53 processor, a
credit-card-sized single-board form factor, and 40 general-purpose input and output pins. Its
main interfaces are listed below:
• Ethernet port
• Combined 3.5 mm audio jack and composite video
• Camera interface (CSI)
• Display interface (DSI)
• micro-SD card slot
The Raspberry Pi camera is used to capture the image of the printed text. The camera can be
plugged directly into the camera port of the Raspberry Pi 3 Model B. It is lightweight and portable.
SPECIFICATION:
11.3 SPEAKER/HEADPHONE
A rectangular plywood board hosts the mechanism, the camera and the Raspberry Pi system
together tightly, holding the page so that the image quality is good enough for the OCR to
extract all the words clearly.
The power supply is the device that supplies electrical energy to the output loads. It gives a
regulated supply of +5 V with an output current capability of 100 mA.
A press-button, or simply a button, is a simple switch mechanism for controlling some aspect of
a machine or a process. The press button is used to activate the program, and a headset is used
for the audio output. Press buttons are typically made of hard materials such as plastic or metal.
The surface of the button is usually flat or shaped to accommodate the human finger, so it can be
easily depressed or pushed.
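A small sketch of how such a push-button can trigger the reading program on the Raspberry Pi is given below, using the standard RPi.GPIO library. The pin number (BCM 17) and the pull-up wiring are assumptions for illustration only:

import time
import RPi.GPIO as GPIO

BUTTON_PIN = 17                                              # assumed BCM pin; button wired to GND

GPIO.setmode(GPIO.BCM)
GPIO.setup(BUTTON_PIN, GPIO.IN, pull_up_down=GPIO.PUD_UP)    # pressed -> reads LOW

try:
    while True:
        GPIO.wait_for_edge(BUTTON_PIN, GPIO.FALLING)   # block until the button is pressed
        print("Button pressed: capture the page and start reading")
        time.sleep(0.3)                                # crude debounce
finally:
    GPIO.cleanup()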
CHAPTER – 12
SOFTWARE IMPLEMENTATION
Software is the set of instructions and programs that decides the functionality of the hardware.
In our design we have used Python 3, and we have also installed the OpenCV library, Tesseract
OCR and gTTS. In our project the Raspberry Pi is instructed to do its task using Python, as it is
an easy-to-use and user-friendly language with a lot of features and packages available. There is
no need to install Python separately, because Python 3 comes pre-installed on the Raspberry Pi
together with pip3 as its package installer.
The control logic of the program is summarized below; the full algorithm listings were damaged
during extraction, so only the recoverable steps are kept here.
Input: C, the number of presses of the push-button, used to choose the language (English, Bengali
or Hindi), where cl1 = 1 stands for English, cl2 = 2 for Bengali and cl3 = 3 for Hindi; B1press, the
state of the push-button; I, the captured image; and O, the recognized text output.
Steps: each press of the button increments the counter (C = C + 1). If C = 0 or C > 3, the selection
is invalid and the program exits. If the captured image I exists, it is passed to the OCR stage, and
if the recognized output O exists, it is passed on to the text-to-speech stage.
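A minimal Python sketch of this click-count language selection is shown below; the Tesseract language codes ("eng", "ben", "hin") are assumptions based on the standard Tesseract language data packages:

import sys

# Click counts mapped to languages, following the cl1/cl2/cl3 assignment above.
LANGUAGES = {1: "eng", 2: "ben", 3: "hin"}

def select_language(clicks):
    # A count of 0 or more than 3 is invalid, so the program exits.
    if clicks == 0 or clicks > 3:
        print("Invalid selection, exiting.")
        sys.exit(1)
    return LANGUAGES[clicks]

print(select_language(3))   # three presses selects Hindi ("hin")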
OpenCV stands for Open-Source Computer Vision Library; it contains a set of algorithms and
special inbuilt functions that handle computer vision tasks. It supports a wide variety of
programming languages, the most commonly used being Python. We can install the OpenCV
library on the Raspberry Pi by typing “sudo apt-get install python3-opencv” in a terminal
window. Once it is successfully installed, one can run Python code that imports OpenCV
through its cv2 module.
In this project, OpenCV is used to capture the image of the book page using the Pi camera and
to apply its inbuilt functions for pre-processing, such as deskewing, noise removal and
binarization, so that we get a clear image for conversion to text by the OCR module.
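The sketch below shows, under assumptions, how a single frame could be grabbed and cleaned with OpenCV before it is handed to the OCR module; a USB webcam at index 0 is assumed (the Pi camera module can equally be read through the picamera library):

import cv2

cap = cv2.VideoCapture(0)                 # assumed USB webcam at index 0
ok, frame = cap.read()
cap.release()
if not ok:
    raise RuntimeError("Could not read a frame from the camera")

# Grayscale, denoise and Otsu binarization before OCR.
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
gray = cv2.medianBlur(gray, 3)
page = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
cv2.imwrite("page.png", page)             # cleaned image passed on to the OCR stage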
OCR plays an important role in this module. OCR, or Optical Character Recognition, is a
technology used to recognize text from printed or scanned documents through an optical
mechanism. Tesseract is an optical character recognition engine; to install it, type the
“sudo apt-get install tesseract-ocr” command in the terminal.
OCR software is used to convert an image into text format: it converts images of typed,
handwritten or printed text into machine-encoded text. It is used by blind and visually impaired
people and also for applications such as automatic number-plate recognition. Tesseract is an
OCR engine based on matrix matching. Tesseract was selected for its flexibility and
extensibility across machines, and because many active research communities continue to
develop the engine; for the same reason, Tesseract OCR can support 149 languages. In this
project we identify the English alphabet as well as the two other languages used here, Bengali
and Hindi.
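One common way to call Tesseract from Python is the pytesseract wrapper (installed with “pip3 install pytesseract”); whether the project calls Tesseract this way or directly through the command line is not stated, so the sketch below is only illustrative. The language argument can be "eng", "ben" or "hin", provided the matching Tesseract language data packages are installed:

from PIL import Image
import pytesseract                      # Python wrapper around the Tesseract engine

# "page.png" is the pre-processed page image produced by the OpenCV stage.
text = pytesseract.image_to_string(Image.open("page.png"), lang="eng")
print(text)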
The recognized text is finally delivered in the form of audio output through the speaker. TTS
(Text-to-Speech) is a form of speech synthesis that converts text into audio output.
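The software list above names gTTS, and the conclusion mentions the offline eSpeak engine; a hedged sketch combining the two is shown below (the mpg123 player is an assumed extra utility for playing the saved MP3 file):

import os
from gtts import gTTS                    # gTTS needs an internet connection

def speak(text, lang="en"):
    # Try the online gTTS voice first, then fall back to the offline eSpeak engine.
    try:
        gTTS(text=text, lang=lang).save("out.mp3")
        os.system("mpg123 out.mp3")      # assumes mpg123 is installed
    except Exception:
        os.system('espeak "{}"'.format(text.replace('"', "")))

speak("Hello, this page is now being read aloud.")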
CHAPTER – 13
SIMULATION ENVIRONMENT
The image-to-text and text-to-speech conversion is done by the OCR software installed on the
Raspberry Pi. The conversion performed by the OCR can be simulated in MATLAB. The
simulation includes the following processes: binary conversion, complementation, character
bordering, segmentation and labelling, and character skeletonization.
The following image, captured by the webcam, contains the sample word. This image is in
JPEG format and has to be converted into text.
In this step the sample image is converted into binary format. The image, which initially is a
three-channel colour image, is converted into a two-dimensional (single-channel) image. Binary 0
represents the black colour of the characters and binary 1 represents the white background.
The area of the text is bordered and the boundary of each character is isolated. The boundary of
each character is computed by the program, and the character pixels, with intensity values
ranging from 0 to 255, are stored in memory in the database.
The isolated blocks of characters are segmented and are automatically labelled for identity.
Image segmentation is the process of partitioning a digital image into multiple segments (sets
of pixels, also known as super pixels).
The result of image segmentation is a set of segments that collectively cover the entire image,
or a set of contours extracted from the image (see edge detection). Each of the pixels in a
region is similar with respect to some characteristic or computed property, such as color,
intensity, or texture. Adjacent regions are significantly different with respect to the same
characteristics. Connected-component labelling is used in computer vision to detect
connected regions in binary digital images, although color images and data with higher
dimensionality can also be processed. When integrated into an image recognition system or
human-computer interaction interface, connected component labelling can operate on a
variety of information. Blob extraction is generally performed on the resulting binary image
from a thresholding step. Blobs may be counted, filtered, and tracked.
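A short OpenCV sketch of connected-component labelling on the binarized page is given below; the area threshold used to discard noise blobs is an illustrative choice:

import cv2

binary = cv2.imread("page.png", cv2.IMREAD_GRAYSCALE)
blobs = cv2.bitwise_not(binary)          # make the ink white so it is labelled as foreground
num, labels, stats, centroids = cv2.connectedComponentsWithStats(blobs, connectivity=8)

for i in range(1, num):                  # label 0 is the background
    x, y, w, h, area = stats[i]
    if area > 20:                        # skip tiny noise blobs
        print("blob %d: box=(%d,%d,%d,%d) area=%d" % (i, x, y, w, h, area))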
13.5 FORMING CHARACTER SKELETON
Skeletonization is a process for reducing foreground regions in a binary image to a skeletal
remnant that largely preserves the extent and connectivity of the original region while
throwing away most of the original foreground pixels. To see how this works, imagine that
the foreground regions in the input binary image are made of some uniform slow-burning
material.
Light fires simultaneously at all points along the boundary of this region and watch the fire
move into the interior. At points where the fire travelling from two different boundaries
meets itself, the fire will extinguish itself, and the points at which this happens form the
so-called ‘quench line’.
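Skeletonization as described here can be reproduced with the scikit-image library (an assumed extra dependency, not listed among the project's packages):

import cv2
import numpy as np
from skimage.morphology import skeletonize

binary = cv2.imread("page.png", cv2.IMREAD_GRAYSCALE)
ink = binary < 128                       # True where the character strokes are

skeleton = skeletonize(ink)              # reduce each stroke to its one-pixel 'quench line'
cv2.imwrite("skeleton.png", skeleton.astype(np.uint8) * 255)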
CHAPTER – 14
METHODOLOGY
The power supply is given to the 5 V micro-USB connector of the Raspberry Pi through a
switched-mode power supply. The web camera is connected to the USB port of the
Raspberry Pi, the audio output is taken from the audio jack, and the internet is connected
through the Ethernet port. The page to be read is placed on a base and the camera is focused
to capture the image. The captured image is processed and converted to text by the software.
The software library converts the given input into the desired language text. The text is then
converted into speech using the text-to-speech conversion module. The final output is
delivered through the speaker; the speaker can also be replaced by a headset connected to the
audio jack.
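Putting the stages of this chapter together, a compact end-to-end sketch is shown below. It is an assumed glue script (camera index, file names and the mpg123 player are illustrative), not the project's exact program:

import os
import cv2
import pytesseract
from gtts import gTTS

# Capture -> pre-process -> OCR -> text-to-speech -> speaker.
cap = cv2.VideoCapture(0)
ok, frame = cap.read()
cap.release()
if not ok:
    raise RuntimeError("Camera capture failed")

gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
page = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]

text = pytesseract.image_to_string(page, lang="eng")
if text.strip():
    gTTS(text=text, lang="en").save("speech.mp3")
    os.system("mpg123 speech.mp3")       # played through the audio jack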
CHAPTER – 15
OBJECTIVES
Portable System
CHAPTER – 16
CONCLUSION AND FUTURE WORK
The project ‘Design of an Automatic Reader for the Visually Impaired using Raspberry Pi’
is a stepwise improvement over some similar projects. The model is built from different
parts: the image is captured using a Raspberry Pi camera, the text is recognized using the
Tesseract OCR framework, and the text is then read out through the eSpeak TTS engine. To
create a more efficient model with a good outcome, processing and optimization are
important. Sometimes the OCR gives incorrect text due to processing problems, and for that
reason the final result can be meaningless text; pre-processing is therefore an important part
of the whole system. If the characters are clear and large there is no problem, but poor
lighting, small characters or the presence of images in between the text gives somewhat
unexpected results. In our proposed method, thanks to binarization, the system gives better
results for the English, Bengali and Hindi languages; also, there is no need for the internet.
The proposed model is already in a feasible state of use thanks to its fixed, box-type design.
However, there is some future work planned for this model:
• The recognition process for large texts is sometimes slow; for that reason, introducing
a distributed technique is reserved for future work.
• At present the model can read only the English, Bengali and Hindi languages; in the
future it should support many other languages.
• Menu options can be added that allow the user to play and pause the audio containing
the synthesized text.
• If any image or diagram is present in between the text, recognizing that image or
diagram is an important task, which is also reserved for future work.
CHAPTER – 17
REFERENCES
1. Amal Jojie, Ashbin George, Dhanya Dhanalal, Nayana J, “Book Reader for Blind”, IOSR
Journal of Engineering (IOSRJEN).
2. S. Aditi, SP. Annapoorani, A. Kanchana, “Book Reader Using Raspberry Pi for Visually
Impaired”, International Research Journal of Engineering and Technology (IRJET), Volume
05, Issue 03, March 2018.
7. Rahul R. Patil, Adumbral R. Misal, Ketan R. Nalawade, “Survey Paper on Text Recognition
Using Image Processing”, International Journal of Advanced Research in Electronics and
Communication Engineering (IJARECE), Volume 04, Issue 03, March 2015.
8. Praveen Choudhary, Dr. Vipin Kumar Jain, “Text Extraction from an Image by using Digital
Image Processing”, International Research Journal of Computer Science (IRJCS), Volume 05,
Issue 07, July 2018.
10. Anush Goel, Akash Sehrawat, Ankush Patil, Prashant Chougule, Supriya Khatavkar,
“Raspberry Pi Based Reader for Blind People”, International Research Journal of Engineering
and Technology (IRJET), Volume 05, Issue 06, June 2018.
12. Agalya, A., B. Nagaraj, and K. Rajasekaran, “Concentration control of continuous stirred
tank reactor using particle swarm optimization algorithm”, Trans Eng Sci 1, no. 4 (2013): 57-63.
13. Aaron James S, Sanjana S, Monisha M, “OCR based automatic book reader for the
visually impaired using Raspberry Pi”, International Journal of Innovative Research in
Computer and Communication Engineering, Volume 04, Issue 7, January 2016.
15. Esra Ali Hassan, “Smart Glasses for the Visually Impaired People”, Computers Helping
People with Special Needs, 15th International Conference, ICCHP, July 2016.