AWS Lambda functions to extract text from various binary formats.
Updated Feb 7, 2018 - Python
Build a RAG preprocessing pipeline
Recognize page content of a PDF as text using Tesseract and Ghostscript.
A powerful and user-friendly tool based on OCRmyPDF, offering a seamless GUI for conversion of image-based PDFs into searchable text.
A simple, reliable script for fast, high-quality OCR on a PDF.
Example Django (Python) project with modules for OCR, PDF to OCR PDF conversion, text similarity/dissimilarity, and PDF to PNG conversion.
PDF OCR service in Docker.
Use Optical Character Recognition technology to convert scanned PDFs into TXT files locally.
A utility that collects in one place several operations commonly performed on PDF files.
A tool to compare and merge PDFs, display their differences, and run OCR on them.