Code Snippets

This document contains three examples of using the python-tesseract OCR library. The first loads an image, initializes the Tesseract API, computes the mean confidence, and prints the recognized text. The second initializes the API, sets a character whitelist, runs OCR on an in-memory image buffer, and prints the word confidence data. The third enhances an image with Sobel edge detection, blurring, and thresholding before using Tesseract to recognize the text.

Uploaded by

FIKRUL ISLAMY


python-tesseract

Code Snippets
Example 1
import cv2.cv as cv
import tesseract

image = cv.LoadImage("foo.png", cv.CV_LOAD_IMAGE_GRAYSCALE)
api = tesseract.TessBaseAPI()
api.Init(".", "eng", tesseract.OEM_DEFAULT)
#api.SetPageSegMode(tesseract.PSM_SINGLE_WORD)
api.SetPageSegMode(tesseract.PSM_AUTO)
tesseract.SetCvImage(image, api)
text = api.GetUTF8Text()
conf = api.MeanTextConf()
print text
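Example 1 computes `conf` but prints only the text. Tesseract documents `MeanTextConf()` as an average confidence between 0 and 100, so a caller might use it to decide whether to trust the result. The following is a minimal sketch of that idea in plain Python. The helper name `accept_ocr_result` and the default threshold of 60 are hypothetical and are not part of python-tesseract.

```python
def accept_ocr_result(text, conf, min_conf=60):
    """Return the stripped text if conf (0-100) meets min_conf, else None.

    Hypothetical helper: neither the name nor the default threshold comes
    from python-tesseract; tune min_conf for your own images.
    """
    if not 0 <= conf <= 100:
        raise ValueError("confidence must be in the range 0-100")
    if conf < min_conf:
        return None
    return text.strip()


# Stand-in values for what api.GetUTF8Text() and api.MeanTextConf() might return.
print(accept_ocr_result("Hello world\n\n", 87))
print(accept_ocr_result("H3l1o w0r1d\n\n", 35))
```

In Example 1 the call would look like `accept_ocr_result(text, conf)`, placed after `conf=api.MeanTextConf()`.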

Example 2
import tesseract

api = tesseract.TessBaseAPI()
api.Init(".", "eng", tesseract.OEM_DEFAULT)
api.SetVariable("tessedit_char_whitelist", "0123456789abcdefghijklmnopqrstuvwxyz")
api.SetPageSegMode(tesseract.PSM_AUTO)

mImgFile = "eurotext.jpg"
mBuffer = open(mImgFile, "rb").read()
result = tesseract.ProcessPagesBuffer(mBuffer, len(mBuffer), api)
print "result(ProcessPagesBuffer)=", result
intPtr = api.AllWordConfidences()
print str(intPtr)
pyPtrRaw = tesseract.cdata(intPtr, 300)
print len(pyPtrRaw), len(mBuffer)
pyPtr = [ord(data) for i, data in enumerate(pyPtrRaw)]
print pyPtr
tesseract.delete_intp(intPtr)
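Example 2 applies `ord` to every byte of the raw buffer. Tesseract's C++ `AllWordConfidences()` returns an array of ints between 0 and 100, terminated by -1, so if the buffer holds native C ints a byte-by-byte list shows the individual bytes of each int instead of one value per word. The sketch below decodes such a buffer with the standard `struct` module. It assumes that `tesseract.cdata` returns a plain byte string of native ints. The helper name `decode_word_confidences` is hypothetical, and the input here is synthetic, not real OCR output.

```python
import struct


def decode_word_confidences(raw):
    """Decode a byte string of native C ints, stopping at the -1 terminator."""
    int_size = struct.calcsize("i")
    confidences = []
    # Walk the buffer one whole int at a time; ignore a trailing partial int.
    for offset in range(0, len(raw) - int_size + 1, int_size):
        (value,) = struct.unpack_from("i", raw, offset)
        if value == -1:  # assumed end-of-array marker
            break
        confidences.append(value)
    return confidences


# Synthetic stand-in for the bytes returned by tesseract.cdata(intPtr, 300):
# three word confidences, the terminator, then unrelated trailing memory.
raw = struct.pack("5i", 91, 78, 85, -1, 12345)
print(decode_word_confidences(raw))
```

Stopping at the terminator also avoids reading past the end of the array, which a fixed length such as 300 bytes cannot guarantee.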

Example 3
How to do image edge enhancement and Tesseract OCR

import tesseract
import cv2
import cv2.cv as cv
import numpy as np

scale = 1
delta = 0
ddepth = cv2.CV_16S

gray = cv2.imread("an91cut.jpg")

### trim the edges
cut_offset = 23
gray = gray[cut_offset:-cut_offset, cut_offset:-cut_offset]

### convert to gray color
gray = cv2.cvtColor(gray, cv2.COLOR_BGR2GRAY)

### edge enhancing by Sobeling
# Gradient-X
grad_x = cv2.Sobel(gray, ddepth, 1, 0, ksize=3, scale=scale, delta=delta, borderType=cv2.BORDER_DEFAULT)
#grad_x = cv2.Scharr(gray,ddepth,1,0)

# Gradient-Y
grad_y = cv2.Sobel(gray, ddepth, 0, 1, ksize=3, scale=scale, delta=delta, borderType=cv2.BORDER_DEFAULT)
#grad_y = cv2.Scharr(gray,ddepth,0,1)

abs_grad_x = cv2.convertScaleAbs(grad_x)  # converting back to uint8
abs_grad_y = cv2.convertScaleAbs(grad_y)
gray = cv2.addWeighted(abs_grad_x, 0.5, abs_grad_y, 0.5, 0)

### Blurring
image1 = cv2.medianBlur(gray, 5)
image1[image1 < 50] = 255
image1 = cv2.GaussianBlur(image1, (31, 13), 0)
color_offset = 230
image1[image1 >= color_offset] = 255
image1[image1 < color_offset] = 0  # black

#### Insert White Border
offset = 30
height, width = image1.shape
image1 = cv2.copyMakeBorder(image1, offset, offset, offset, offset, cv2.BORDER_CONSTANT, value=(255, 255, 255))
cv2.namedWindow("Test")
cv2.imshow("Test", image1)
cv2.imwrite("an91cut_decoded.jpg", image1)
cv2.waitKey(0)
cv2.destroyWindow("Test")

### tesseract OCR
api = tesseract.TessBaseAPI()
api.Init(".", "eng", tesseract.OEM_DEFAULT)
#api.SetPageSegMode(tesseract.PSM_AUTO)
# as suggested by zdenko podobny <zdenop@gmail.com>,
# using PSM_SINGLE_BLOCK will be more reliable for ocr-ing a line of word.
api.SetPageSegMode(tesseract.PSM_SINGLE_BLOCK)
height1, width1 = image1.shape
channel1 = 1
image = cv.CreateImageHeader((width1, height1), cv.IPL_DEPTH_8U, channel1)
# the last argument is the row step in bytes
cv.SetData(image, image1.tostring(), image1.dtype.itemsize * channel1 * (width1))
tesseract.SetCvImage(image, api)
text = api.GetUTF8Text()
conf = api.MeanTextConf()
image = None
print "..............."
print "Ocred Text: %s" % text
print "Confidence Level: %d %%" % conf

Source: http://code.google.com/p/python-tesseract/wiki/CodeSnippets (CodeSnippets - python-tesseract - Reusable Code Snippets - python w..., 2/4/2014 5:24 PM)
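The last argument to `cv.SetData` in Example 3 is the length of one image row in bytes, computed as item size times channel count times width. The sketch below walks through that arithmetic with a plain byte buffer standing in for `image1.tostring()`. The helper name `row_step_bytes` and the tiny 3x4 image are hypothetical and chosen only for illustration.

```python
def row_step_bytes(width, channels=1, itemsize=1):
    """Bytes occupied by one image row in a tightly packed buffer."""
    return itemsize * channels * width


# Stand-in for a 3-row, 4-column, single-channel uint8 image (1 byte per pixel).
height, width = 3, 4
pixels = bytes(bytearray([
    0, 0, 255, 255,
    0, 255, 255, 0,
    255, 255, 0, 0,
]))

step = row_step_bytes(width)
# A packed buffer must contain exactly step * height bytes.
assert len(pixels) == step * height

# Slice the flat buffer back into rows using the step.
rows = [pixels[r * step:(r + 1) * step] for r in range(height)]
print(step, [list(bytearray(row)) for row in rows])
```

For the single-channel 8-bit image in Example 3, the step is simply the image width; a packed 3-channel image of the same width would need three times as many bytes per row.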

Figure: an91cut.jpg, enhanced image
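The enhanced image comes from two final steps in Example 3: a hard threshold at `color_offset` (pixels at or above it become white, all others black) and a constant white border. The sketch below reproduces that logic on nested Python lists so it runs without OpenCV or NumPy. The function names `binarize` and `add_border` are hypothetical stand-ins for the NumPy mask assignments and `cv2.copyMakeBorder`, and the sample pixel values are made up.

```python
def binarize(image, cutoff=230):
    """Map pixels >= cutoff to 255 (white) and all others to 0 (black)."""
    return [[255 if px >= cutoff else 0 for px in row] for row in image]


def add_border(image, offset, value=255):
    """Surround the image with `offset` pixels of a constant value."""
    width = len(image[0]) + 2 * offset
    blank = [value] * width
    side = [value] * offset
    top = [list(blank) for _ in range(offset)]
    middle = [side + list(row) + side for row in image]
    bottom = [list(blank) for _ in range(offset)]
    return top + middle + bottom


# Tiny grayscale stand-in for the blurred image in Example 3.
blurred = [
    [250, 120, 240],
    [229, 230, 10],
]
framed = add_border(binarize(blurred), offset=1)
for row in framed:
    print(row)
```

Example 3 uses a cutoff of 230 and a 30-pixel border; the one-pixel border here only keeps the printed output small.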

