Document Prediction

Prerequisites

Processing document data depends on the optical character recognition (OCR) engine Tesseract.

For Ubuntu users, you can install Tesseract by running:

sudo apt install tesseract-ocr

For macOS users with MacPorts, run:

sudo port install tesseract

or, if you use Homebrew, run:

brew install tesseract

For Windows users, an installer is available from Tesseract at UB-Mannheim. To run Tesseract from any location, you may need to add the directory containing the Tesseract binaries to the PATH environment variable.

For additional support, refer to the official Tesseract installation instructions.
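After installing, it can help to confirm that the `tesseract` executable is actually reachable on PATH, which is the usual failure mode on Windows. The following is a minimal sketch using only the Python standard library; the helper names `find_tesseract` and `tesseract_version` are hypothetical and not part of AutoGluon or Tesseract. It looks the executable up with `shutil.which` and reads the first line of `tesseract --version`.

```python
import shutil
import subprocess
from typing import Optional


def find_tesseract(executable: str = "tesseract") -> Optional[str]:
    """Return the full path of the Tesseract executable, or None if it is not on PATH."""
    return shutil.which(executable)


def tesseract_version(executable: str = "tesseract") -> Optional[str]:
    """Return the first line of `tesseract --version`, or None if Tesseract is unavailable."""
    path = find_tesseract(executable)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    # Some Tesseract builds print the version banner to stderr instead of stdout.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else None


if __name__ == "__main__":
    version = tesseract_version()
    if version is None:
        print("tesseract was not found on PATH.")
    else:
        print(f"Found {version} at {find_tesseract()}")
```

If the script reports that Tesseract was not found even though it is installed, the install directory is most likely missing from PATH.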

Quick Start

AutoMM for Scanned Document Classification (document_classification.html)

How to use AutoMM to build a scanned document classifier.

Classifying PDF Documents with AutoMM (pdf_classification.html)

How to use AutoMM to build a PDF document classifier.
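Both tutorials start from a table with one row per document: a column of file paths and a label column. The sketch below shows that shape and hands it to `MultiModalPredictor`. It is illustrative only, under these assumptions: the column names `doc_path` and `label`, the helper names, and the example file names are hypothetical; it assumes AutoMM infers that the path column holds document images or PDFs; and it omits any hyperparameters the tutorials may set. Check the linked tutorials for their exact data format and settings.

```python
import pandas as pd


def build_document_table(records, path_column="doc_path", label_column="label"):
    """Turn (file path, label) pairs into a DataFrame with one row per document."""
    return pd.DataFrame(records, columns=[path_column, label_column])


def train_document_classifier(train_df, label_column="label", time_limit=120):
    """Fit an AutoMM predictor on a table of document paths and labels."""
    # Imported inside the function so the table helper works without AutoGluon installed.
    from autogluon.multimodal import MultiModalPredictor

    predictor = MultiModalPredictor(label=label_column)
    # time_limit is in seconds; AutoMM is assumed to infer the document column type.
    predictor.fit(train_data=train_df, time_limit=time_limit)
    return predictor


if __name__ == "__main__":
    # Hypothetical file names and labels, for illustration only.
    train_df = build_document_table(
        [
            ("docs/invoice_001.png", "invoice"),
            ("docs/letter_001.png", "letter"),
        ]
    )
    print(train_df)
```

Keeping the AutoGluon import inside `train_document_classifier` lets the data-preparation step be checked on its own before a (potentially long) training run.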