Module # 10C - Text Recognition with Tesseract OCR
Legacy Tesseract 3.x depended on a multi-stage process in which we can
differentiate the following steps:
Word finding
Line finding
Character classification
To install Tesseract on the Raspberry Pi, type the following commands in the Raspberry
Pi CLI. Make sure you are in the same environment in which OpenCV is installed.
sudo apt install tesseract-ocr
sudo apt install libtesseract-dev
sudo pip install pytesseract
tesseract --version
import pytesseract
from PIL import Image
import cv2
img = cv2.imread('para.jpg',cv2.IMREAD_COLOR)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # convert to grayscale to reduce detail
gray = cv2.bilateralFilter(gray, 11, 17, 17)
original = pytesseract.image_to_string(gray, config='')
print(original)
Before running the above code, make sure that you have saved a JPG image named
para.jpg in your root folder, since the cv2.imread() call in the code reads 'para.jpg'.
If we want to convert our recognized text into speech, then we need a text-to-speech
converter. For that we can install pyttsx3 as follows:
1. Go to the Anaconda prompt and type conda install pip . This will install pip in the
current conda environment.
2. Then type pip install pyttsx3 to install the package.
To check the installation, run the code below in your Jupyter Notebook; you should hear
a voice saying 'I will speak this text'.
import pyttsx3
engine = pyttsx3.init()
engine.say("I will speak this text")
engine.runAndWait()
Now, by adding a few extra lines of code, we can convert our recognized text into
speech, applying the OCR + TTS technique.
import pytesseract
from PIL import Image
import cv2
import pyttsx3
engine = pyttsx3.init()
img = cv2.imread('para.jpg', cv2.IMREAD_COLOR)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # convert to grayscale to reduce detail
gray = cv2.bilateralFilter(gray, 11, 17, 17)
original = pytesseract.image_to_string(gray, config='')
print(original)
engine.say(original)
engine.runAndWait()
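The string that image_to_string returns often contains blank lines and stray whitespace that make the speech output choppy. An optional cleanup step (pure string handling, no extra libraries) can be applied before engine.say(); the sample string below is only a stand-in for real OCR output.

```python
def clean_ocr_text(raw):
    # str.split() with no arguments splits on any run of whitespace
    # (spaces, newlines, form feeds) and drops empty pieces.
    return ' '.join(raw.split())

# Stand-in for pytesseract.image_to_string() output:
sample = "Hello   world\n\nthis is\nOCR output\n\x0c"
print(clean_ocr_text(sample))  # Hello world this is OCR output
```

In the code above you would then call engine.say(clean_ocr_text(original)) instead of engine.say(original).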
You can give three important flags for Tesseract to work, and these are -l , --oem , and
--psm. They are passed through the config argument of pytesseract's
image_to_string method (used in the second-last line of the first code example).
By default, Tesseract expects a full page of text when it segments an image. If you are
only trying to OCR a small region, try a different segmentation mode using the --psm
argument. There are 14 modes available (0-13):
0: Orientation and script detection (OSD) only.
1: Automatic page segmentation with OSD.
2: Automatic page segmentation, but no OSD, or OCR.
3: Fully automatic page segmentation, but no OSD (default).
4: Assume a single column of text of variable sizes.
5: Assume a single uniform block of vertically aligned text.
6: Assume a single uniform block of text.
7: Treat the image as a single text line.
8: Treat the image as a single word.
9: Treat the image as a single word in a circle.
10: Treat the image as a single character.
11: Sparse text. Find as much text as possible in no particular order.
12: Sparse text with OSD.
13: Raw line. Treat the image as a single text line, bypassing Tesseract-specific hacks.
By default, Tesseract fully automates the page segmentation but does not perform
orientation and script detection.
There is also one more important argument, the OCR engine mode (--oem). Tesseract 4
has two OCR engines — the legacy Tesseract engine and an LSTM engine. There are
four modes of operation, chosen using the --oem option:
0: Legacy engine only.
1: Neural nets LSTM engine only.
2: Legacy + LSTM engines.
3: Default, based on what is available.
There are several ways a page of text can be analysed. The Tesseract API provides
several page segmentation modes for cases where you want to run OCR on only a small
region, in different orientations, etc.
To change your page segmentation mode, set the --psm argument in your custom
config string to any of the supported mode codes (0-13).
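Putting the three flags together, a custom config string can be built and passed as one argument. The particular values below are only an illustrative combination, not the only sensible choice; -l eng also assumes the English traineddata is installed.

```python
# Illustrative choices; tune these per image:
lang = 'eng'   # language pack to use
oem = 3        # 3 = default, based on what is available
psm = 6        # 6 = assume a single uniform block of text
custom_config = f'-l {lang} --oem {oem} --psm {psm}'
print(custom_config)  # -l eng --oem 3 --psm 6
```

It would then be used as pytesseract.image_to_string(gray, config=custom_config) in the earlier examples.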
import cv2
import pytesseract
from picamera.array import PiRGBArray
from picamera import PiCamera
import time
camera = PiCamera()
camera.resolution = (640, 480)
camera.framerate = 30
rawCapture = PiRGBArray(camera, size=(640, 480))
time.sleep(0.1)  # give the camera time to warm up
for frame in camera.capture_continuous(rawCapture, format="bgr", use_video_port=True):
    image = frame.array
    cv2.imshow("Frame", image)
    key = cv2.waitKey(1) & 0xFF
    rawCapture.truncate(0)  # clear the stream for the next frame
    if key == ord("s"):  # press 's' to OCR the current frame
        text = pytesseract.image_to_string(image)
        print(text)
        cv2.waitKey(0)
        break
cv2.destroyAllWindows()