Abstract
In this paper we address the problem of text extraction, enhancement and recognition in digital video. Compared with optical character recognition (OCR) from document images, text extraction and recognition in digital video presents several new challenges. First, the text in video is often embedded in complex backgrounds, making text extraction and separation difficult. Second, image data contained in video frames is often digitized and/or subsampled at a much lower resolution than is typical for document images. As a result, most commercial OCR software cannot recognize text extracted from video. We have implemented a hybrid wavelet/neural network segmenter to extract text regions, and we use a two-stage enhancement scheme prior to recognition. First, we use Shannon interpolation to raise the image resolution; second, we postprocess the block with normal/inverse text classification and adaptive thresholding. Experimental results show that our text extraction scheme can extract both scene text and graphical text robustly, and that reasonable OCR results are achieved after enhancement.
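The two-stage enhancement described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the interpolation kernel support, the thresholding rule, or the normal/inverse classifier. The sketch assumes a direct Whittaker-Shannon (sinc) sum over all samples, a local mean-plus-scaled-deviation threshold, and a simple foreground-fraction heuristic for polarity. The function names and the parameters `factor`, `window`, and `k` are hypothetical.

```python
import numpy as np


def sinc_upsample_1d(x, factor):
    """Upsample a 1-D signal by an integer factor using a sinc-weighted sum."""
    x = np.asarray(x, dtype=float)
    n = np.arange(x.size)                    # original sample positions
    t = np.arange(x.size * factor) / factor  # denser output positions
    # Each output sample is a sinc-weighted sum of all input samples.
    kernel = np.sinc(t[:, None] - n[None, :])
    return kernel @ x


def sinc_upsample_2d(img, factor):
    """Apply the 1-D interpolation separably along rows, then columns."""
    rows = np.apply_along_axis(sinc_upsample_1d, 1, img, factor)
    return np.apply_along_axis(sinc_upsample_1d, 0, rows, factor)


def local_threshold(img, window=15, k=-0.2):
    """Binarize with a per-pixel threshold of local mean + k * local std.

    Assumed rule for illustration; `window` must be odd.
    """
    img = np.asarray(img, dtype=float)
    pad = window // 2
    padded = np.pad(img, pad, mode="reflect")
    windows = np.lib.stride_tricks.sliding_window_view(padded, (window, window))
    mean = windows.mean(axis=(-2, -1))
    std = windows.std(axis=(-2, -1))
    return img > mean + k * std


def normalize_polarity(mask):
    """Hypothetical normal/inverse heuristic: assume text strokes cover
    less than half of the block, and flip the mask otherwise."""
    return ~mask if mask.mean() > 0.5 else mask


if __name__ == "__main__":
    block = np.zeros((16, 16))
    block[4:12, 6:10] = 1.0                  # toy low-resolution "stroke"
    enlarged = sinc_upsample_2d(block, 4)    # stage 1: raise resolution
    binary = normalize_polarity(local_threshold(enlarged))  # stage 2
    print(enlarged.shape, binary.dtype)
```

The direct sinc sum costs O(N^2) per row or column, which is acceptable for small text blocks; a windowed kernel or FFT zero-padding would be the usual choice for larger images.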
Copyright information
© 1999 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Li, H., Doermann, D., Kia, O. (1999). Text Extraction, Enhancement and OCR in Digital Video. In: Lee, SW., Nakano, Y. (eds) Document Analysis Systems: Theory and Practice. DAS 1998. Lecture Notes in Computer Science, vol 1655. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48172-9_29
DOI: https://doi.org/10.1007/3-540-48172-9_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-66507-6
Online ISBN: 978-3-540-48172-0
eBook Packages: Springer Book Archive