research-article

Layout-aware Single-image Document Flattening

Published: 02 November 2023

Abstract

Single-image rectification of document deformation is a challenging task. Although some recent deep learning-based methods have attempted to solve this problem, they cannot achieve satisfactory results on document images with complex deformations. In this article, we propose a new, efficient framework for document flattening. Our main insight is that most layout primitives in a document have rectangular outlines, so unwarping a local layout primitive is essentially the same problem as unwarping the entire document. The former is clearly easier to solve than the latter because local regions have more consistent texture and relatively smooth deformation. On this basis, we propose a layout-aware deep model that works in a divide-and-conquer manner. First, we employ a transformer-based segmentation module to obtain the layout information of the input document. Then a new regression module predicts the global and local UV maps. Finally, we design an effective merging algorithm that corrects the global prediction with local details. Both quantitative and qualitative experimental results demonstrate that our framework performs favorably against state-of-the-art methods. In addition, the currently available public document flattening datasets cover a limited range of 3D paper shapes, provide no layout annotations, and lack a general geometric correction metric. Therefore, we build a new large-scale synthetic dataset, using a fully automatic rendering method to generate deformed documents with diverse shapes and exact layout segmentation labels. We also propose a new geometric correction metric based on our paired document UV maps. Code and dataset will be released at https://github.com/BunnySoCrazy/LA-DocFlatten.
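The abstract describes the pipeline only at a high level. As a rough illustration of how a predicted UV map is commonly applied, the sketch below assumes the map is a backward map (each output pixel stores the normalized (x, y) location in the warped photo to sample from) and applies it with bilinear sampling. It also scores a prediction against ground truth with a mean endpoint error and combines global and local maps with a layout mask. All function names here are hypothetical; `merge_local` is a naive stand-in, not the authors' merging algorithm, and the endpoint error is not necessarily the metric proposed in the paper. If the network's UV map is a forward map (warped pixel to flat texture coordinate), it would first have to be inverted into a backward map.

```python
from typing import List, Tuple

Image = List[List[float]]                 # grayscale image, image[row][col]
Grid = List[List[Tuple[float, float]]]    # grid[row][col] = (x, y), normalized to [0, 1]


def bilinear_sample(img: Image, x: float, y: float) -> float:
    """Sample img at continuous pixel coordinates (x, y), clamped to the border."""
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bot = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bot * fy


def unwarp(warped: Image, backward_map: Grid) -> Image:
    """Flatten `warped`: each output pixel looks up its (assumed) source location."""
    h, w = len(warped), len(warped[0])
    return [[bilinear_sample(warped, u * (w - 1), v * (h - 1)) for (u, v) in row]
            for row in backward_map]


def mean_endpoint_error(pred: Grid, gt: Grid) -> float:
    """Hypothetical UV-map metric: mean Euclidean distance between paired coordinates.
    Assumes non-empty maps of equal size."""
    total, n = 0.0, 0
    for prow, grow in zip(pred, gt):
        for (pu, pv), (gu, gv) in zip(prow, grow):
            total += ((pu - gu) ** 2 + (pv - gv) ** 2) ** 0.5
            n += 1
    return total / n


def merge_local(global_map: Grid, local_map: Grid, mask: List[List[bool]]) -> Grid:
    """Naive hypothetical merge: inside a layout region (mask True) keep the local
    prediction, elsewhere keep the global one."""
    return [[loc if m else glo for glo, loc, m in zip(grow, lrow, mrow)]
            for grow, lrow, mrow in zip(global_map, local_map, mask)]


def identity_map(h: int, w: int) -> Grid:
    """Backward map that samples every pixel from itself (requires h, w >= 2)."""
    return [[(c / (w - 1), r / (h - 1)) for c in range(w)] for r in range(h)]
```

For example, `unwarp(img, identity_map(h, w))` returns the input image (up to floating-point rounding), since every output pixel samples its own location; a real predicted map would instead point each flattened pixel back to a displaced position in the photo.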

Supplementary Material

3627818-supp.pdf: supplementary material (PDF)


Cited By

  • (2024) Towards using Eye Gaze Redirection in Immersive Reading Tasks for Visual Fatigue Reduction. Companion of the 2024 on ACM International Joint Conference on Pervasive and Ubiquitous Computing, 607-611. DOI: 10.1145/3675094.3678474. Online publication date: 5-Oct-2024.
  • (2024) Table image dewarping with key element segmentation. International Journal on Document Analysis and Recognition 27:3, 349-362. DOI: 10.1007/s10032-024-00480-z. Online publication date: 15-Jul-2024.
  • (2024) Am I readable? Transfer learning based document image rectification. International Journal on Document Analysis and Recognition 27:3, 433-446. DOI: 10.1007/s10032-024-00476-9. Online publication date: 22-May-2024.
  • (2024) Coarse-to-Fine Document Image Registration for Dewarping. Document Analysis and Recognition - ICDAR 2024, 343-358. DOI: 10.1007/978-3-031-70546-5_20. Online publication date: 11-Sep-2024.


Information

Published In

ACM Transactions on Graphics, Volume 43, Issue 1
February 2024
211 pages
EISSN: 1557-7368
DOI: 10.1145/3613512

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 02 November 2023
Online AM: 13 October 2023
Accepted: 25 September 2023
Revised: 10 July 2023
Received: 08 September 2022
Published in TOG Volume 43, Issue 1


Author Tags

  1. Document image rectification
  2. document layout analysis
  3. deep neural networks
  4. geometric models

Qualifiers

  • Research-article

Funding Sources

  • National Natural Science Foundation of China (NSFC)
  • Youth Innovation Promotion Association of the Chinese Academy of Sciences


Bibliometrics

Article Metrics

  • Downloads (last 12 months): 862
  • Downloads (last 6 weeks): 95
Reflects downloads up to 06 Oct 2024

