research-article

Layout-aware Single-image Document Flattening

Authors:

Dong-Ming Yan

ACM Transactions on Graphics, Volume 43, Issue 1

Article No.: 9, Pages 1–17

https://doi.org/10.1145/3627818

Published: 02 November 2023

Abstract

Single-image rectification of document deformation is a challenging task. Although several recent deep learning-based methods have attempted to solve this problem, they fail to achieve satisfactory results on document images with complex deformations. In this article, we propose a new, efficient framework for document flattening. Our main insight is that most layout primitives in a document have rectangular outlines, so unwarping a local layout primitive is essentially the same problem as unwarping the entire document. The former task is easier to solve than the latter because the texture within a primitive is more consistent and its deformation is relatively smooth. On this basis, we propose a layout-aware deep model that works in a divide-and-conquer manner. First, we employ a transformer-based segmentation module to obtain the layout information of the input document. Then, a new regression module predicts the global and local UV maps. Finally, we design an effective merging algorithm that corrects the global prediction with local details. Both quantitative and qualitative experimental results demonstrate that our framework performs favorably against state-of-the-art methods. In addition, the currently available public document flattening datasets cover a limited range of 3D paper shapes and provide no layout annotations, and there is no general geometric correction metric. We therefore build a new large-scale synthetic dataset, using a fully automatic rendering method to generate deformed documents with diverse shapes and exact layout segmentation labels. We also propose a new geometric correction metric based on our paired document UV maps. Code and dataset will be released at https://github.com/BunnySoCrazy/LA-DocFlatten.
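The abstract refers to predicting UV maps and to scoring geometric correction with paired UV maps, but it does not spell out the map convention or the metric's formula. The following is a minimal NumPy sketch under stated assumptions, not the authors' implementation: it assumes a UV map is a backward map that stores, for every pixel of the flattened output, normalized (u, v) coordinates in [0, 1] into the distorted input image, and it resamples the input with bilinear interpolation. The helper names `bilinear_sample`, `unwarp_with_uv`, and `uv_endpoint_error` are hypothetical, and the endpoint-error score is a generic stand-in for comparing two UV maps, not the metric proposed in the article.

```python
import numpy as np


def bilinear_sample(image, x, y):
    """Bilinearly sample an (H, W) or (H, W, C) image at float pixel coords.

    x holds column coordinates, y holds row coordinates; both are clamped
    to the image bounds so out-of-range samples repeat the border.
    """
    h, w = image.shape[:2]
    x = np.clip(x, 0.0, w - 1.0)
    y = np.clip(y, 0.0, h - 1.0)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = x - x0  # fractional offset along columns
    wy = y - y0  # fractional offset along rows
    if image.ndim == 3:  # broadcast weights over the channel axis
        wx = wx[..., None]
        wy = wy[..., None]
    top = image[y0, x0] * (1 - wx) + image[y0, x1] * wx
    bottom = image[y1, x0] * (1 - wx) + image[y1, x1] * wx
    return top * (1 - wy) + bottom * wy


def unwarp_with_uv(distorted, uv):
    """Flatten `distorted` with a backward UV map of shape (H_out, W_out, 2).

    ASSUMPTION: uv[..., 0] is u (horizontal) and uv[..., 1] is v (vertical),
    both normalized to [0, 1] over the distorted image.
    """
    h, w = distorted.shape[:2]
    x = uv[..., 0] * (w - 1)
    y = uv[..., 1] * (h - 1)
    return bilinear_sample(distorted.astype(np.float64), x, y)


def uv_endpoint_error(uv_pred, uv_gt, mask=None):
    """Mean Euclidean distance between predicted and ground-truth UV coords.

    Hypothetical stand-in metric; `mask` optionally restricts the average
    to valid (e.g. document) pixels.
    """
    err = np.linalg.norm(uv_pred - uv_gt, axis=-1)
    if mask is not None:
        err = err[mask]
    return float(err.mean())


if __name__ == "__main__":
    img = np.arange(12, dtype=np.float64).reshape(3, 4)
    # Identity UV map: each output pixel samples the same input pixel.
    v, u = np.meshgrid(np.linspace(0, 1, 3), np.linspace(0, 1, 4), indexing="ij")
    identity_uv = np.stack([u, v], axis=-1)
    assert np.allclose(unwarp_with_uv(img, identity_uv), img)
    print(uv_endpoint_error(identity_uv, identity_uv))  # prints 0.0
```

With an identity UV map the output reproduces the input, which is a convenient sanity check before plugging in predicted maps. Backward mapping is used in this sketch because every output pixel receives exactly one sample, which avoids the holes that forward splatting of input pixels can leave.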

Supplementary Material

3627818-supp (3627818-supp.pdf), 44.38 KB



Index Terms

Layout-aware Single-image Document Flattening
1. Applied computing
  1. Document management and text processing
    1. Document capture
      1. Document analysis
2. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
        Image segmentation
      2. Computer vision representations
        Image representations
  2. Computer graphics
    1. Image manipulation
      1. Image-based rendering

Index terms have been assigned to the content through auto-classification.


Published In

ACM Transactions on Graphics Volume 43, Issue 1

February 2024

211 pages

EISSN:1557-7368

DOI:10.1145/3613512

Editor:
Carol O'Sullivan
Trinity College Dublin, Ireland

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 02 November 2023

Online AM: 13 October 2023

Accepted: 25 September 2023

Revised: 10 July 2023

Received: 08 September 2022

Published in TOG Volume 43, Issue 1

Funding Sources

NSFC
Youth Innovation Promotion Association of the Chinese Academy of Sciences

Article Metrics

Total Citations: 4
Total Downloads: 862
Downloads (last 12 months): 862
Downloads (last 6 weeks): 95

Reflects downloads up to 06 Oct 2024.

Cited By

Li Y, Xu H, Kitamura Y, Tag B, Fujita K, Kostakos V, Kay J, Hoang T. 2024. Towards using Eye Gaze Redirection in Immersive Reading Tasks for Visual Fatigue Reduction. Companion of the 2024 on ACM International Joint Conference on Pervasive and Ubiquitous Computing, 607–611. DOI: 10.1145/3675094.3678474. Online publication date: 5-Oct-2024.
https://dl.acm.org/doi/10.1145/3675094.3678474

Zhu Z, Tang Z, Gao L. 2024. Table image dewarping with key element segmentation. International Journal on Document Analysis and Recognition 27, 3, 349–362. DOI: 10.1007/s10032-024-00480-z. Online publication date: 15-Jul-2024.
https://dl.acm.org/doi/10.1007/s10032-024-00480-z

Kumari P, Das S. 2024. Am I readable? Transfer learning based document image rectification. International Journal on Document Analysis and Recognition 27, 3, 433–446. DOI: 10.1007/s10032-024-00476-9. Online publication date: 22-May-2024.
https://dl.acm.org/doi/10.1007/s10032-024-00476-9

Zhang W, Wang Q, Huang K, Gu X, Guo F. 2024. Coarse-to-Fine Document Image Registration for Dewarping. Document Analysis and Recognition – ICDAR 2024, 343–358. DOI: 10.1007/978-3-031-70546-5_20. Online publication date: 11-Sep-2024.
https://doi.org/10.1007/978-3-031-70546-5_20
