research-article

Document rectification and illumination correction using a patch-based CNN

Authors:

Pedro V. Sander

ACM Transactions on Graphics (TOG), Volume 38, Issue 6

Article No.: 168, Pages 1 - 11

https://doi.org/10.1145/3355089.3356563

Published: 08 November 2019

Abstract

We propose a novel learning method to rectify document images with various distortion types from a single input image. Unlike previous learning-based methods, our approach first learns the distortion flow on input image patches rather than on the entire image. We then present a robust technique that stitches the patch results into the rectified document by processing in the gradient domain. Furthermore, we propose a second network to correct uneven illumination, further improving readability and OCR accuracy. Because the distortion within a smaller image patch is less complex, our patch-based approach, followed by stitching and illumination correction, can significantly improve overall accuracy on both synthetic and real datasets.
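To make the idea of stitching in the gradient domain concrete, the following is a minimal 1D toy sketch and not the authors' implementation. It assumes that each patch estimates the flow only up to an unknown constant offset, so overlapping patches disagree in raw values but agree in their finite differences. The function names `split_into_patches` and `stitch_in_gradient_domain`, the patch size, and the stride are all hypothetical choices made for illustration. A 2D version would typically replace the cumulative sum with a Poisson-style least-squares solve.

```python
import numpy as np

def split_into_patches(signal, patch_size, stride):
    """Return (start, patch) pairs of overlapping windows over a 1D signal."""
    starts = range(0, len(signal) - patch_size + 1, stride)
    return [(s, signal[s:s + patch_size]) for s in starts]

def stitch_in_gradient_domain(patches, length):
    """Average per-patch finite differences, then integrate them.

    Differencing removes each patch's constant offset, so overlapping
    patches can be merged consistently in the gradient domain.
    """
    grad_sum = np.zeros(length - 1)
    grad_count = np.zeros(length - 1)
    for start, values in patches:
        grad = np.diff(values)
        grad_sum[start:start + len(grad)] += grad
        grad_count[start:start + len(grad)] += 1
    if np.any(grad_count == 0):
        raise ValueError("patches do not cover the whole signal")
    mean_grad = grad_sum / grad_count
    # Integration leaves one global constant free; it is fixed to 0 here.
    return np.concatenate([[0.0], np.cumsum(mean_grad)])

# Toy "flow" along one scanline (assumed data, for illustration only).
true_flow = np.sin(np.linspace(0.0, 3.0, 33))

# Give every patch a different unknown offset to mimic inconsistent estimates.
patches = [
    (start, patch + 0.5 * i)
    for i, (start, patch) in enumerate(split_into_patches(true_flow, 9, 4))
]

stitched = stitch_in_gradient_domain(patches, len(true_flow))
```

In this toy setup the stitched result matches the original signal up to a single global constant, even though the raw patch values conflict wherever patches overlap.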

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Image and video acquisition
        Computational photography
  2. Computer graphics
    1. Image manipulation
      1. Image processing

Published In

ACM Transactions on Graphics Volume 38, Issue 6

December 2019

1292 pages

ISSN:0730-0301

EISSN:1557-7368

DOI:10.1145/3355089

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States
