Module # 10C - Text Recognition with Tesseract OCR
Legacy Tesseract 3.x depended on a multi-stage process in which we can
differentiate the following steps:
Word finding
Line finding
Character classification
To install Tesseract on the Raspberry Pi, type the following commands in the Raspberry
Pi CLI. Make sure you are in the same environment in which OpenCV is installed.
sudo apt install tesseract-ocr
sudo apt install libtesseract-dev
sudo pip install pytesseract
tesseract --version
import pytesseract
from PIL import Image
import cv2
img = cv2.imread('para.jpg',cv2.IMREAD_COLOR)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # convert to grayscale to reduce detail
gray = cv2.bilateralFilter(gray, 11, 17, 17)
original = pytesseract.image_to_string(gray, config='')
print(original)
Before running the above code, make sure that you have saved a JPG image named
para.jpg in your root folder, since the cv2.imread() call in the code reads 'para.jpg'.
If we want to convert our recognized text into speech, then we need a text-to-speech
converter. For that we can install pyttsx3 as follows:
1. Go to the Anaconda prompt and type conda install pip . This will install pip in the
current conda environment.
2. Then type pip install pyttsx3 to install the package.
To check the installation, run the code below in your Jupyter Notebook; you should hear
a voice saying 'I will speak this text'.
import pyttsx3
engine = pyttsx3.init()
engine.say("I will speak this text")
engine.runAndWait()
Now, by adding a few extra lines of code, we can convert our recognized text into
speech, applying the OCR + TTS technique.
import pytesseract
from PIL import Image
import cv2
import pyttsx3
engine = pyttsx3.init()
img = cv2.imread('para.jpg', cv2.IMREAD_COLOR)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # convert to grayscale to reduce detail
gray = cv2.bilateralFilter(gray, 11, 17, 17)
original = pytesseract.image_to_string(gray, config='')
print(original)
engine.say(original)
engine.runAndWait()
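The string that image_to_string returns often contains blank lines and stray whitespace that make the speech output choppy. An optional cleanup step (pure string handling, no extra libraries) can be applied before engine.say(); the sample string below is only a stand-in for real OCR output.

```python
def clean_ocr_text(raw):
    # str.split() with no arguments splits on any run of whitespace
    # (spaces, newlines, form feeds) and drops empty pieces.
    return ' '.join(raw.split())

# Stand-in for pytesseract.image_to_string() output:
sample = "Hello   world\n\nthis is\nOCR output\n\x0c"
print(clean_ocr_text(sample))  # Hello world this is OCR output
```

In the code above you would then call engine.say(clean_ocr_text(original)) instead of engine.say(original).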
You can give three important flags for Tesseract to work, and these are -l , --oem , and
--psm. They are passed through the config argument of pytesseract's
image_to_string method (used in the second-last line of the first code example).
By default, Tesseract expects a full page of text when it segments an image. If you are
only trying to OCR a small region, try a different segmentation mode using the --psm
argument. There are 14 modes available (0-13):
0: Orientation and script detection (OSD) only.
1: Automatic page segmentation with OSD.
2: Automatic page segmentation, but no OSD, or OCR.
3: Fully automatic page segmentation, but no OSD (default).
4: Assume a single column of text of variable sizes.
5: Assume a single uniform block of vertically aligned text.
6: Assume a single uniform block of text.
7: Treat the image as a single text line.
8: Treat the image as a single word.
9: Treat the image as a single word in a circle.
10: Treat the image as a single character.
11: Sparse text. Find as much text as possible in no particular order.
12: Sparse text with OSD.
13: Raw line. Treat the image as a single text line, bypassing Tesseract-specific hacks.
By default, Tesseract fully automates the page segmentation but does not perform
orientation and script detection.
There is also one more important argument, the OCR engine mode (--oem). Tesseract 4
has two OCR engines — the legacy Tesseract engine and an LSTM engine. There are
four modes of operation, chosen using the --oem option:
0: Legacy engine only.
1: Neural nets LSTM engine only.
2: Legacy + LSTM engines.
3: Default, based on what is available.
There are several ways a page of text can be analysed. The Tesseract API provides
several page segmentation modes for cases where you want to run OCR on only a small
region, in different orientations, etc.
To change your page segmentation mode, set the --psm argument in your custom
config string to any of the supported mode codes (0-13).
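Putting the three flags together, a custom config string can be built and passed as one argument. The particular values below are only an illustrative combination, not the only sensible choice; -l eng also assumes the English traineddata is installed.

```python
# Illustrative choices; tune these per image:
lang = 'eng'   # language pack to use
oem = 3        # 3 = default, based on what is available
psm = 6        # 6 = assume a single uniform block of text
custom_config = f'-l {lang} --oem {oem} --psm {psm}'
print(custom_config)  # -l eng --oem 3 --psm 6
```

It would then be used as pytesseract.image_to_string(gray, config=custom_config) in the earlier examples.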
import cv2
import pytesseract
from picamera.array import PiRGBArray
from picamera import PiCamera
import time
camera = PiCamera()
camera.resolution = (640, 480)
camera.framerate = 30
rawCapture = PiRGBArray(camera, size=(640, 480))
time.sleep(0.1)  # give the camera time to warm up
for frame in camera.capture_continuous(rawCapture, format="bgr", use_video_port=True):
    image = frame.array
    cv2.imshow("Frame", image)
    key = cv2.waitKey(1) & 0xFF
    rawCapture.truncate(0)  # clear the stream for the next frame
    if key == ord("s"):  # press 's' to OCR the current frame
        text = pytesseract.image_to_string(image)
        print(text)
        cv2.waitKey(0)
        break
cv2.destroyAllWindows()