Document Localization and Classification As Stages of a Document Recognition System
Pages 699 - 716
Abstract
This article presents approaches and methods for document image analysis that were developed and used by researchers of the scientific school of V. L. Arlazarov to solve the problems of type identification and localization for documents with a known structure. It describes the principles for building solutions that emerged as input data became more complex and performance requirements became stricter. The methods presented trace the school's scientific path from scanned images to photographs and video stream frames, and from the most general classes of documents, tied to text structure, to the strictest ones, based on visual features.
Published In
Pattern Recognition and Image Analysis, 2023, Vol. 33, No. 4, pp. 699–716. ISSN 1054-6618. © Pleiades Publishing, Ltd., 2023.
Publisher
Springer-Verlag
Berlin, Heidelberg
Publication History
Published: 01 December 2023
Accepted: 20 January 2023
Revision received: 20 January 2023
Received: 20 January 2023
Qualifiers
- Research-article