research-article

Document rectification and illumination correction using a patch-based CNN

Published: 08 November 2019

Abstract

We propose a novel learning method to rectify document images with various distortion types from a single input image. Unlike previous learning-based methods, our approach first learns the distortion flow on input image patches rather than on the entire image. We then present a robust technique that stitches the patch results into the rectified document by processing in the gradient domain. Furthermore, we propose a second network to correct uneven illumination, further improving readability and OCR accuracy. Because the distortion within smaller image patches is less complex, our patch-based approach, followed by stitching and illumination correction, can significantly improve overall accuracy on both synthetic and real datasets.
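
To illustrate why stitching in the gradient domain is attractive, the following is a minimal 1D sketch in Python. It is not the method from the paper: it assumes, for illustration only, that each patch predicts a displacement profile that is correct up to an unknown per-patch constant offset, so patch values disagree in overlaps while their gradients agree. The function name `stitch_1d` and the patch layout are hypothetical. Averaging the gradients where patches overlap and integrating the result removes the offsets; in 1D this cumulative sum is the least-squares solution, whereas a 2D version would require solving a Poisson-type linear system.

```python
from typing import List, Tuple


def stitch_1d(patches: List[Tuple[int, List[float]]], length: int) -> List[float]:
    """Stitch overlapping 1D patches in the gradient domain.

    Each patch is (start_index, values). Patch values are assumed to be
    correct only up to an unknown per-patch constant offset (an assumption
    made for this sketch).
    """
    grad_sum = [0.0] * (length - 1)  # accumulated forward differences
    grad_cnt = [0] * (length - 1)    # number of patches covering each difference
    for start, values in patches:
        for i in range(len(values) - 1):
            grad_sum[start + i] += values[i + 1] - values[i]
            grad_cnt[start + i] += 1
    if any(c == 0 for c in grad_cnt):
        raise ValueError("patches must cover every adjacent sample pair")
    # Average the gradients wherever patches overlap.
    grad = [s / c for s, c in zip(grad_sum, grad_cnt)]
    # Integrate the averaged gradients, fixing the first sample at zero.
    out = [0.0]
    for g in grad:
        out.append(out[-1] + g)
    return out


# Ground-truth profile is 0.5 * x for x = 0..7.
patch_a = (0, [3.0, 3.5, 4.0, 4.5, 5.0])  # samples 0..4, shifted by +3
patch_b = (3, [0.5, 1.0, 1.5, 2.0, 2.5])  # samples 3..7, shifted by -1
stitched = stitch_1d([patch_a, patch_b], length=8)
print(stitched)
```

Blending the raw patch values directly would leave a visible step between the two patches in this example, while the gradient-domain result recovers the profile up to a single global constant.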

Supplementary Material

ZIP File (a168-li.zip)
Supplemental files.

Published In

ACM Transactions on Graphics, Volume 38, Issue 6
December 2019
1292 pages
ISSN: 0730-0301
EISSN: 1557-7368
DOI: 10.1145/3355089
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. convolutional neural networks
  2. deep learning
  3. document image rectification

Qualifiers

  • Research-article
