Open Source Historical OCR: The OCRopodium Project

Bryant, Michael; Blanke, Tobias; Hedges, Mark; Palmer, Richard

doi:10.1007/978-3-642-15464-5_72

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6273)

Included in the following conference series:

International Conference on Theory and Practice of Digital Libraries

Abstract

In this paper we present some initial results of the OCRopodium project, which aims to build a scalable workflow for OCR of historical collections. Large-scale digitisation projects dealing with text-based historical material face challenges that are not well catered for by commercial software. Open source tools allow for better customisation to match these requirements, particularly with regard to character model training and per-project language modelling.

References

Breuel, T.M.: The OCRopus open source OCR system (2007)
The OpenFst project, http://www.openfst.org

Author information

Authors and Affiliations

Centre for e-Research, King’s College London,
Michael Bryant, Tobias Blanke, Mark Hedges & Richard Palmer

Editor information

Editors and Affiliations

Dept. of Computer Science, University of Glasgow, 17 Lilybank Gardens, G12 8QQ, Glasgow, UK
Mounia Lalmas & Joemon Jose
Vienna University of Technology, 1040, Vienna, Austria
Andreas Rauber
Istituto di Scienza e Tecnologia dell’Informazione, Consiglio Nazionale delle Ricerche, Via G Moruzzi 1, 56124, Pisa, Italy
Fabrizio Sebastiani
University of Glasgow, G12 8QQ, Glasgow, UK
Ingo Frommholz

Cite this paper

Bryant, M., Blanke, T., Hedges, M., Palmer, R. (2010). Open Source Historical OCR: The OCRopodium Project. In: Lalmas, M., Jose, J., Rauber, A., Sebastiani, F., Frommholz, I. (eds) Research and Advanced Technology for Digital Libraries. ECDL 2010. Lecture Notes in Computer Science, vol 6273. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15464-5_72

DOI: https://doi.org/10.1007/978-3-642-15464-5_72
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15463-8
Online ISBN: 978-3-642-15464-5
eBook Packages: Computer Science, Computer Science (R0)
