
Module # 10C - Text Recognition with Tesseract OCR

The document provides a comprehensive guide on Optical Character Recognition (OCR) using Tesseract, detailing its processes, installation instructions for different platforms, and code examples for text recognition from images. It explains the functionality of Tesseract, including its neural network capabilities and configuration options for language and segmentation modes. Additionally, it covers integrating text-to-speech conversion with recognized text using the pyttsx3 library.

Uploaded by

Haya
Copyright
© All Rights Reserved

Optical Character Recognition (OCR)

Text Recognition with Tesseract

RASPBERRY PI COURSE GUIDE

thingsRoam Academy Contact: +92-308-1222240 academy.thingsroam.com


Email: academy@thingsroam.com

Optical Character Recognition (OCR):


OCR stands for Optical Character Recognition. An OCR system transforms a two-dimensional
image of text, which may be machine-printed or handwritten, from its image representation
into machine-readable text. To be as accurate as possible, OCR is generally carried out
as a sequence of sub-processes:

• Preprocessing of the Image
• Text Localization
• Character Segmentation
• Character Recognition
• Post Processing
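
As a rough illustration of two of these sub-processes (preprocessing and character segmentation), the sketch below binarizes a tiny grayscale grid with a fixed threshold and then splits it into character candidates using a vertical projection profile. It is a toy in plain Python, not how Tesseract implements these stages; the function names binarize and segment_columns, the threshold of 128 and the sample grid are all assumptions made for the example.

```python
def binarize(gray, threshold=128):
    """Preprocessing: map each pixel to 1 (ink) if darker than threshold, else 0."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def segment_columns(binary):
    """Character segmentation: return (start, end) column ranges that contain ink.

    A column belongs to a character when at least one pixel in it is ink;
    runs of empty columns separate neighbouring characters.
    """
    if not binary:
        return []
    width = len(binary[0])
    profile = [sum(row[x] for row in binary) for x in range(width)]

    segments, start = [], None
    for x, ink in enumerate(profile):
        if ink and start is None:
            start = x                      # a new character begins
        elif not ink and start is not None:
            segments.append((start, x))    # the character ended at column x - 1
            start = None
    if start is not None:                  # character touching the right edge
        segments.append((start, width))
    return segments


# Hypothetical 3x7 grayscale image: two dark strokes on a light background.
sample = [
    [255,  10,  20, 255, 255,  30, 255],
    [255,  15, 255, 255, 255,  25, 255],
    [255,  12,  18, 255, 255,  35, 255],
]
print(segment_columns(binarize(sample)))
```

For the sample grid this prints [(1, 3), (5, 6)], one column range per stroke. On real images a fixed threshold is rarely enough, which is why OCR engines use more robust preprocessing.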

What is Tesseract OCR?


Tesseract is an open-source text recognition (OCR) engine, available under the Apache
2.0 license. It can be used directly from the command line or, for programmers, through
an API to extract printed text from images. It supports a wide variety of languages.
It can be used with its built-in layout analysis to recognize text within a large
document, or in conjunction with an external text detector to recognize text from an
image of a single text line.
Tesseract 4.00 includes a new neural network subsystem configured as a text line
recognizer. Recognizing an image that contains a single character is typically done with
a Convolutional Neural Network (CNN). Text of arbitrary length, however, is a sequence of
characters, and sequence problems are usually handled with Recurrent Neural Networks
(RNNs). Long Short-Term Memory (LSTM) is a popular form of RNN.

Legacy Tesseract 3.x depended on a multi-stage process in which the following steps can
be distinguished:

• Word finding
• Line finding
• Character classification

To install Tesseract on a laptop, run the following commands in the Anaconda Command
Prompt. Make sure you are in the same environment in which OpenCV is installed.

conda install -c conda-forge tesseract
conda install -c conda-forge pytesseract

To install Tesseract on the Raspberry Pi, type the following commands in the Raspberry Pi
terminal. Make sure you are in the same environment in which OpenCV is installed.

sudo apt install tesseract-ocr
sudo apt install libtesseract-dev
sudo pip install pytesseract

To check Tesseract's installation, type the following command in the terminal:

tesseract --version
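
The same check can be made from Python before any OCR code runs. The sketch below assumes the tesseract executable is expected on the PATH; tesseract_version is a hypothetical helper name, and it returns None rather than raising an error when the binary is missing.

```python
import shutil
import subprocess


def tesseract_version():
    """Return the first line of `tesseract --version`, or None if not installed."""
    exe = shutil.which("tesseract")        # look the binary up on the PATH
    if exe is None:
        return None
    result = subprocess.run([exe, "--version"], capture_output=True, text=True)
    # Some Tesseract builds print the version banner on stderr instead of stdout.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else None


version = tesseract_version()
print(version if version else "Tesseract was not found on the PATH")
```

If the helper returns None, pytesseract will not be able to run OCR either, so it is worth fixing the installation first.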

Code for Text Recognition from a Saved Picture:

import pytesseract
from PIL import Image
import cv2

img = cv2.imread('para.jpg', cv2.IMREAD_COLOR)
if img is None:  # cv2.imread returns None when the file cannot be read
    raise FileNotFoundError('para.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # convert to grey to reduce detail
gray = cv2.bilateralFilter(gray, 11, 17, 17)  # smooth noise while keeping edges
original = pytesseract.image_to_string(gray, config='')
print(original)

Before running the above code, make sure that you have saved an image named para.jpg in
the folder from which you run the script, because line 5 of the code reads 'para.jpg'
using a relative path.

To convert the recognized text into speech, we need a text-to-speech (TTS) converter.
For that we can install pyttsx3 as follows:
1. Go to the Anaconda prompt and type conda install pip. This will install pip in the
current conda environment.

2. After step 1, type pip install pyttsx3.

To check the installation, run the code below in your Jupyter Notebook; you should hear
a voice saying ‘I will speak this text’.

import pyttsx3
engine = pyttsx3.init()
engine.say("I will speak this text")
engine.runAndWait()

Now, by adding a few extra lines of code, we can convert the recognized text into speech,
combining OCR with TTS.

import pytesseract
from PIL import Image
import cv2
import pyttsx3

engine = pyttsx3.init()

img = cv2.imread('para.jpg', cv2.IMREAD_COLOR)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # convert to grey to reduce detail
gray = cv2.bilateralFilter(gray, 11, 17, 17)  # smooth noise while keeping edges
original = pytesseract.image_to_string(gray, config='')
print(original)
engine.say(original)  # queue the recognized text for speech
engine.runAndWait()   # block until the speech has finished
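
OCR output often contains hard line breaks, words hyphenated across lines and stray blank lines, which can make the speech engine pause in odd places. As a small post-processing step, the recognized text can be tidied before it is passed to engine.say. The helper below is only a sketch and is not part of pytesseract or pyttsx3; its name clean_for_speech and the exact rules are assumptions.

```python
import re


def clean_for_speech(text):
    """Tidy raw OCR output so it reads as continuous prose."""
    text = re.sub(r"-\s*\n\s*", "", text)   # rejoin words hyphenated across lines
    text = re.sub(r"\s+", " ", text)        # collapse newlines and repeated spaces
    return text.strip()


raw = "Optical char-\nacter recognition\n\nturns images   into text.\n"
print(clean_for_speech(raw))
```

For the sample string this prints "Optical character recognition turns images into text." In the program above it would be used as engine.say(clean_for_speech(original)). Note that the first rule also joins genuine compounds that happen to break at a hyphen, so it is a trade-off rather than a safe default.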

Three important flags control how Tesseract works: -l, --oem and --psm.

• The -l flag controls the language of the input text.
• The --oem argument, or OCR Engine Mode, controls the type of algorithm used by
Tesseract.
• The --psm argument controls the automatic Page Segmentation Mode used by Tesseract.

These flags are passed through the config argument of pytesseract's image_to_string
method (the second-to-last line of the first code listing):

config = "-l eng --oem 1 --psm 7"

original = pytesseract.image_to_string(gray, config=config)
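
Because the config value is just a string, a typo in it is easy to miss. The sketch below builds the string from its three parts and rejects values outside the ranges this guide lists (OEM 0 to 3, PSM 0 to 13). build_config is a hypothetical helper, not a pytesseract function, and its defaults simply mirror the default modes named in this guide.

```python
def build_config(lang="eng", oem=3, psm=3):
    """Build a Tesseract config string from the -l, --oem and --psm flags."""
    if oem not in range(4):
        raise ValueError("oem must be between 0 and 3")
    if psm not in range(14):
        raise ValueError("psm must be between 0 and 13")
    return f"-l {lang} --oem {oem} --psm {psm}"


print(build_config("eng", oem=1, psm=7))   # -l eng --oem 1 --psm 7
```

The result can be passed directly as the config argument of pytesseract.image_to_string.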

By default, Tesseract expects a page of text when it segments an image. If you only want
to OCR a small region, try a different segmentation mode using the --psm argument. There
are 14 modes available (0 to 13), listed under "Page segmentation modes" in this guide.
By default (mode 3), Tesseract fully automates the page segmentation but does not perform
orientation and script detection.

• PSM – Page Segmentation Mode
• OEM – OCR Engine Mode (type of algorithm used by Tesseract)

The other important argument is the OCR engine mode (--oem). Tesseract 4 has two OCR
engines: the legacy Tesseract engine and the LSTM engine. The --oem option selects one
of four modes of operation.

OEM Mode:

0 Legacy engine only.
1 Neural nets LSTM engine only.
2 Legacy + LSTM engines.
3 Default, based on what is available.

Page segmentation modes

There are several ways a page of text can be analysed. The Tesseract API provides
several page segmentation modes for running OCR on only a small region, on text in
different orientations, and so on.

Tesseract supports the following page segmentation modes:

0 Orientation and script detection (OSD) only.
1 Automatic page segmentation with OSD.
2 Automatic page segmentation, but no OSD, or OCR.
3 Fully automatic page segmentation, but no OSD. (Default)
4 Assume a single column of text of variable sizes.
5 Assume a single uniform block of vertically aligned text.
6 Assume a single uniform block of text.
7 Treat the image as a single text line.
8 Treat the image as a single word.
9 Treat the image as a single word in a circle.
10 Treat the image as a single character.
11 Sparse text. Find as much text as possible in no particular order.
12 Sparse text with OSD.
13 Raw line. Treat the image as a single text line, bypassing hacks that are Tesseract-specific.

To change the page segmentation mode, set the --psm argument in your custom config
string to any of the mode codes listed above.
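
Remembering the numeric codes is error-prone, so one option is to give the modes you use most often a name. The sketch below covers only a few of the modes from the list above; the PSM enum, the MODE_FOR_REGION mapping and psm_config are hypothetical names chosen for this example, and the choice of mode for each kind of region is a reasonable starting point rather than a rule.

```python
from enum import IntEnum


class PSM(IntEnum):
    """A few of the page segmentation modes listed above."""
    AUTO = 3
    SINGLE_BLOCK = 6
    SINGLE_LINE = 7
    SINGLE_WORD = 8
    SINGLE_CHAR = 10
    SPARSE_TEXT = 11


# Hypothetical mapping from the kind of image crop to a suitable mode.
MODE_FOR_REGION = {
    "page": PSM.AUTO,
    "paragraph": PSM.SINGLE_BLOCK,
    "line": PSM.SINGLE_LINE,
    "word": PSM.SINGLE_WORD,
    "character": PSM.SINGLE_CHAR,
    "scattered": PSM.SPARSE_TEXT,
}


def psm_config(region, lang="eng"):
    """Return a config string whose --psm matches the kind of region."""
    mode = MODE_FOR_REGION[region]
    return f"-l {lang} --psm {int(mode)}"


print(psm_config("line"))   # -l eng --psm 7
```

For example, a cropped image of a licence plate or a single label would typically use psm_config("line"), while a full scanned page would keep the default.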

Code for Text Recognition with Raspberry Pi Camera:

import cv2
import pytesseract
from picamera.array import PiRGBArray
from picamera import PiCamera

camera = PiCamera()
camera.resolution = (640, 480)
camera.framerate = 30

rawCapture = PiRGBArray(camera, size=(640, 480))

# Show a live preview; press "s" to run OCR on the current frame.
for frame in camera.capture_continuous(rawCapture, format="bgr", use_video_port=True):
    image = frame.array
    cv2.imshow("Frame", image)
    key = cv2.waitKey(1) & 0xFF

    rawCapture.truncate(0)  # clear the stream for the next frame

    if key == ord("s"):
        text = pytesseract.image_to_string(image)
        print(text)
        cv2.imshow("Frame", image)
        cv2.waitKey(0)  # keep the captured frame on screen until a key is pressed
        break

cv2.destroyAllWindows()
