Word matching using single closed contours for indexing handwritten historical documents

  • Original Paper
International Journal of Document Analysis and Recognition (IJDAR)

Abstract

Effective indexing is crucial for providing convenient access to scanned versions of large collections of historically valuable handwritten manuscripts. Because traditional handwriting recognizers based on optical character recognition (OCR) do not perform well on historical documents, a holistic word recognition approach has recently gained popularity as an attractive and more straightforward solution (Lavrenko et al., in Proceedings of Document Image Analysis for Libraries (DIAL’04), pp. 278–287, 2004). Such techniques attempt to recognize words from scalar and profile-based features extracted from whole word images. In this paper, we propose a new approach to holistic word recognition for historical handwritten manuscripts that matches word contours instead of whole images or word profiles. The new method consists of robust extraction of closed word contours followed by an elastic contour matching technique originally proposed for general shapes (Adamek and O’Connor, IEEE Trans. Circuits Syst. Video Technol., no. 5, 2004). We demonstrate that multiscale contour-based descriptors can effectively capture intrinsic word features without any segmentation of words into smaller subunits. Our experiments show a recognition accuracy of 83%, which considerably exceeds the performance of other systems reported in the literature.
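To make the idea of elastically matching two closed contours concrete, the following is a minimal sketch and not the authors' method. It assumes each word contour is already available as an ordered list of (x, y) points. In place of the multiscale descriptor mentioned in the abstract, it uses a simple centroid-distance signature and plain dynamic time warping (DTW), tried over every cyclic starting point. All function names are hypothetical.

```python
import math


def centroid_distance_signature(contour):
    """Distance of each contour point to the centroid, divided by the mean
    distance so the signature does not depend on scale (an assumption of
    this sketch, not a descriptor taken from the paper)."""
    n = len(contour)
    cx = sum(x for x, _ in contour) / n
    cy = sum(y for _, y in contour) / n
    dists = [math.hypot(x - cx, y - cy) for x, y in contour]
    mean = sum(dists) / n
    return [d / mean for d in dists]


def dtw_cost(a, b):
    """Classic dynamic time warping cost between two 1-D sequences."""
    inf = float("inf")
    prev = [inf] * (len(b) + 1)
    prev[0] = 0.0
    for i in range(1, len(a) + 1):
        curr = [inf] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Allow a match, or stretching either sequence by one sample.
            curr[j] = step + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[len(b)]


def closed_contour_distance(c1, c2):
    """Smallest DTW cost over all cyclic starting points of the second
    contour, since a closed contour has no natural first point."""
    s1 = centroid_distance_signature(c1)
    s2 = centroid_distance_signature(c2)
    return min(dtw_cost(s1, s2[k:] + s2[:k]) for k in range(len(s2)))


square = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
# Same shape, three times larger, traced from a different starting point.
big_square = [(3 * x, 3 * y) for x, y in square[3:] + square[:3]]
rectangle = [(0, 0), (2, 0), (4, 0), (4, 1), (4, 2), (2, 2), (0, 2), (0, 1)]

print(closed_contour_distance(square, big_square))
print(closed_contour_distance(square, rectangle))
```

In this sketch the scaled, re-started square should be at essentially zero distance from the original, while the rectangle should not. Trying every starting point multiplies the DTW cost by the contour length, so a practical system would need a cheaper alignment strategy.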

References

  1. Lavrenko, V., Rath, T., Manmatha, R.: Holistic word recognition for handwritten historical documents. In: Proceedings Document Image Analysis for Libraries (DIAL’04), pp. 278–287 (2004)

  2. Adamek, T., O’Connor, N.E.: A multiscale representation method for non-rigid shapes with a single closed contour. IEEE Trans. Circuits Syst. Video Technol., special issue on Audio and Video Analysis for Multimedia Interactive Services, no. 5 (2004)

  3. Rath, T.M., Manmatha, R., Lavrenko, V.: A search engine for historical manuscript images. In: Proceedings SIGIR Conference (SIGIR’04), July (2004)

  4. Tomai, C.I., Zhang, B., Govindaraju, V.: Transcript mapping for historic handwritten document images. In: Proceedings 8th International Workshop on Frontiers in Handwriting Recognition, August, pp. 413–418 (2002)

  5. Rath, T., Manmatha, R.: Features for word spotting in historical manuscripts. In: Proceedings Conference ICDAR’03, vol. 1, pp. 218–222 (2003)

  6. Rath, T., Manmatha, R.: Word image matching using dynamic time warping. In: Proceedings Conference CVPR’03, vol. 2, pp. 521–527 (2003)

  7. Marinai, S., Dengel, A. (eds.): Document Analysis Systems VI. In: Proceedings 6th International Workshop DAS’04, Florence, Italy, Sept. Springer-Verlag, LNCS 3163 (2004)

  8. Harding, S.M., Croft, W.B., Weir, C.: Probabilistic retrieval of OCR degraded text using n-grams. In: Proceedings of the 1st European Conference on Research and Advanced Technology for Digital Libraries, Pisa, Italy, Sept., pp. 345–359 (1997)

  9. Tappert, C.C., Suen, C.Y., Wakahara, T.: The state of the art in on-line handwriting recognition. IEEE Trans. Pattern Anal. Machine Intell. 12(8), 787–808 (1990)

  10. Cheng, D., Yan, H.: Recognition of handwritten digits based on contour information. Pattern Recognit. 31(3), 235–255 (1998)

  11. Belongie, S., Malik, J., Puzicha, J.: Shape matching and object recognition using shape contexts. IEEE Trans. Pattern Anal. Machine Intell. 24(4), 509–522 (2002)

  12. Favata, J.T.: Character model word recognition. In: 5th International Workshop on Frontiers in Handwriting Recognition, Essex, England, Sept. 1996, pp. 437–440 (1996)

  13. Madhvanath, S., Govindaraju, V.: The role of holistic paradigms in handwritten word recognition. IEEE Trans. Pattern Anal. Machine Intell. 23(2), 149–164 (2001)

  14. Farag, R.F.: Word-level recognition of cursive script. IEEE Trans. Comput. C-28(2), 172–175 (1979)

  15. Dzuba, G., Filatov, A., Gershuny, D., Kil, I.: Handwritten word recognition-the approach proved by practice. In: Proceedings of the Sixth International Workshop Frontiers in Handwriting Recognition, pp. 99–111 (1998)

  16. Madhvanath, S., Kim, G., Govindaraju, V.: Chaincode contour processing for handwritten word recognition. IEEE Trans. Pattern Anal. Machine Intell. 21(9), 928–932 (1999)

  17. Madhvanath, S., Kleinberg, E., Govindaraju, V.: Holistic verification of handwritten phrases. IEEE Trans. Pattern Anal. Machine Intell. 21(12), 1344–1356 (1999)

  18. Manmatha, R., Srimal, N.: Scale space technique for word segmentation in handwritten manuscripts. In: Proceedings of the 2nd International Conference on Scale-Space Theories in Computer Vision, Corfu, Greece, Sept. 1999, pp. 22–33 (1999)

  19. Niblack, W.: An Introduction to Digital Image Processing. Prentice Hall, Englewood Cliffs (1986)

  20. Sauvola, J., Seppänen, T., Haapakoski, S., Pietikäinen, M.: Adaptive document binarization. In: Proceedings International Conference on Document Analysis and Recognition, vol. 1, pp. 147–152 (1997)

  21. Vincent, L.: Morphological grayscale reconstruction in image analysis: applications and efficient algorithms. IEEE Trans. Image Process. 2(2), 176–201 (1993)

  22. He, J., Do, Q.D.M., Downton, A.C., Kim, J.H.: A comparison of binarization methods for historical archive documents. In: Proceedings of the 7th International Conference on Document Analysis and Recognition (ICDAR’2005), Seoul, Korea, Aug. (2005)

  23. Adamek, T., O’Connor, N.E., Murphy, N.: Multi-scale representation and optimal matching of non-rigid shapes. In: Proceedings of the 4th International Workshop on Content-Based Multimedia Indexing, CBMI’05, Riga, Latvia, June (2005)

  24. Myers, C.M., Rabiner, L.R., Rosenberg, A.E.: Performance tradeoff in dynamic time warping algorithms for isolated word recognition. IEEE Trans. Acoust., Speech, Signal Process. 28(12) (1980)

  25. Mokhtarian, F., Mackworth, A.K.: A theory of multiscale, curvature-based shape representation for planar curves. IEEE Trans. Pattern Anal. Machine Intell. 14(8), 789–805 (1992)

  26. Jeannin, S., Bober, M.: Description of core experiments for MPEG-7 motion/shape. In: MPEG-7, ISO/IEC/JTC1/SC29/WG11/MPEG99/N2690, Seoul, March (1999)

  27. Manmatha, R., Croft, W.B.: Word spotting: Indexing handwritten archives. In: Proceedings of the Intelligent Multi-media Information Retrieval Collection, M. Maybury (ed.), AAAI/MIT Press (1997)

  28. Kane, S., Lehman, A., Partridge, E.: Indexing George Washington’s handwritten manuscripts. Technical Report MM-34, Center for Intelligent Information Retrieval, University of Massachusetts Amherst (2001)

  29. Adamek, T., O’Connor, N.: Efficient contour-based shape representation and matching. In: Proceedings of the 5th ACM SIGMM International Workshop on Multimedia Information Retrieval (MIR’03), Berkeley, CA, Nov. (2003)

  30. Huang, L.K., Wang, M.J.J.: Efficient shape matching through model-based shape recognition. Pattern Recognit. 29(2), 207–215 (1996)

  31. Latecki, L.J., Lakämper, R., Eckhardt, U.: Shape descriptors for non-rigid shapes with a single closed contour. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR’00), pp. 424–429 (2000)

  32. ISO/IEC MPEG7 eXperimentation model (XM software), www.lis.ei.tum.de/research/bv/topics/mmdb/e_mpeg7.html

Author information

Corresponding author

Correspondence to Tomasz Adamek.

About this article

Cite this article

Adamek, T., O’Connor, N.E. & Smeaton, A.F. Word matching using single closed contours for indexing handwritten historical documents. IJDAR 9, 153–165 (2007). https://doi.org/10.1007/s10032-006-0024-y

  • DOI: https://doi.org/10.1007/s10032-006-0024-y