DOI: 10.1145/3595916.3626448
research-article
Open access

RanLayNet: A Dataset for Document Layout Detection used for Domain Adaptation and Generalization

Published: 01 January 2024

Abstract

Large ground-truth datasets and recent advances in deep learning have been useful for layout detection. However, because these datasets offer limited layout diversity, training on them requires a sizable number of annotated instances, which are expensive and time-consuming to produce. As a result, differences between the source and target domains may significantly degrade model performance. To address this problem, domain adaptation approaches have been developed that use a small quantity of labeled data to adapt the model to the target domain. In this work, we introduce RanLayNet, a synthetic document dataset enriched with automatically assigned labels denoting the spatial positions, ranges, and types of layout elements. Our primary aim is a versatile dataset for training models that are robust and adaptable to diverse document formats. Through empirical experiments, we demonstrate that a deep layout identification model trained on our dataset outperforms a model trained solely on actual documents. Moreover, we conduct a comparative analysis by fine-tuning inference models using both the PubLayNet and IIIT-AR-13K datasets on the DocLayNet dataset. Our findings show that models enriched with our dataset perform best, achieving mAP95 scores of 0.398 and 0.588 for the TABLE class in the scientific document domain.
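
The idea of automatically assigned labels can be illustrated with a minimal sketch. It is not the authors' generation pipeline, which the abstract does not describe. It assumes that a synthetic page is built by placing rectangular elements at random, so each element's position, extent, and type are known at placement time. The class names, page size, and every function name below are hypothetical.

```python
import random
from dataclasses import dataclass

# Hypothetical class names, chosen for illustration only.
CLASSES = ("text", "table", "figure")


@dataclass(frozen=True)
class Box:
    """One labeled layout element: type plus top-left corner and size."""
    label: str
    x: int
    y: int
    w: int
    h: int


def overlaps(a: Box, b: Box) -> bool:
    """Return True if two axis-aligned boxes share any interior area."""
    return (a.x < b.x + b.w and b.x < a.x + a.w
            and a.y < b.y + b.h and b.y < a.y + a.h)


def random_layout(page_w, page_h, n_elements, rng, max_tries=1000):
    """Place up to n_elements non-overlapping boxes on a blank page.

    The label of each box is recorded when the box is placed, so no
    manual annotation step is needed (assumed design, see lead-in).
    """
    boxes = []
    for _ in range(max_tries):
        if len(boxes) == n_elements:
            break
        w = rng.randint(page_w // 8, page_w // 2)
        h = rng.randint(page_h // 8, page_h // 3)
        x = rng.randint(0, page_w - w)
        y = rng.randint(0, page_h - h)
        candidate = Box(rng.choice(CLASSES), x, y, w, h)
        # Rejection sampling: keep the candidate only if it is disjoint
        # from every box placed so far.
        if not any(overlaps(candidate, b) for b in boxes):
            boxes.append(candidate)
    return boxes


# A seeded generator makes the sampled layout reproducible.
layout = random_layout(800, 1000, 4, random.Random(0))
for box in layout:
    print(box)
```

Rejection sampling is used here only because it is simple. A real generator would also have to render content into each box.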

Supplementary Material

PDF File (Appendix_RanLayNet_v1.pdf)
Appendix: RanLayNet: A Dataset for Document Layout Detection used for Domain Adaptation and Generalization. Paper ID: 98


          Published In

          MMAsia '23: Proceedings of the 5th ACM International Conference on Multimedia in Asia
          December 2023
          745 pages
          ISBN:9798400702051
          DOI:10.1145/3595916
          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Author Tags

          1. Domain Adaptation
          2. Domain Generalization
          3. Image Processing
          4. Object Detection

          Qualifiers

          • Research-article
          • Research
          • Refereed limited

          Conference

          MMAsia '23
          MMAsia '23: ACM Multimedia Asia
          December 6 - 8, 2023
          Tainan, Taiwan

          Acceptance Rates

          Overall Acceptance Rate 59 of 204 submissions, 29%
