Abstract
In this paper we present some initial results of the OCRopodium project, which aims to build a scalable workflow for OCR of historical collections. Large-scale digitisation projects dealing with text-based historical material face challenges that commercial software does not cater for well. Open source tools allow for better customisation to match these requirements, particularly with regard to character model training and per-project language modelling.
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bryant, M., Blanke, T., Hedges, M., Palmer, R. (2010). Open Source Historical OCR: The OCRopodium Project. In: Lalmas, M., Jose, J., Rauber, A., Sebastiani, F., Frommholz, I. (eds) Research and Advanced Technology for Digital Libraries. ECDL 2010. Lecture Notes in Computer Science, vol 6273. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15464-5_72
DOI: https://doi.org/10.1007/978-3-642-15464-5_72
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15463-8
Online ISBN: 978-3-642-15464-5
eBook Packages: Computer Science, Computer Science (R0)