RanLayNet: A Dataset for Document Layout Detection used for Domain Adaptation and Generalization

Authors:

Siddhesh S Bangar,

Rajiv Ratn Shah,

Shin'Ichi Satoh

MMAsia '23: Proceedings of the 5th ACM International Conference on Multimedia in Asia

Article No.: 74, Pages 1 - 6

https://doi.org/10.1145/3595916.3626448

Published: 01 January 2024


Abstract

Large ground-truth datasets and recent advances in deep learning have been useful for layout detection. However, because the layout diversity of these datasets is restricted, training on them requires a sizable number of annotated instances, which is both expensive and time-consuming. As a result, differences between the source and target domains may significantly degrade model performance. To address this problem, domain adaptation approaches have been developed that use a small quantity of labeled data to adjust the model to the target domain. In this research, we introduce a synthetic document dataset called RanLayNet, enriched with automatically assigned labels denoting the spatial positions, ranges, and types of layout elements. The primary aim is to develop a versatile dataset for training models that are robust and adaptable to diverse document formats. Through empirical experimentation, we demonstrate that a deep layout identification model trained on our dataset performs better than a model trained solely on actual documents. Moreover, we conduct a comparative analysis by fine-tuning inference models using both the PubLayNet and IIIT-AR-13K datasets on the DocLayNet dataset. Our findings indicate that models enriched with our dataset are well suited to such tasks, achieving mAP95 scores of 0.398 and 0.588 for the TABLE class in the scientific document domain.
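The idea of a synthetic page whose labels are assigned automatically can be illustrated with a minimal Python sketch. It is not the authors' generation pipeline, which the abstract does not describe. The element classes, page size, size ranges, non-overlap rule, and the names `LayoutElement`, `random_layout`, and `to_annotations` are all assumptions made for illustration. The sketch places random rectangles on a blank page and derives each element's position, extent, and type directly from the placement, so no manual annotation is needed.

```python
import random
from dataclasses import dataclass

# Hypothetical label set; the abstract does not enumerate the actual classes.
ELEMENT_TYPES = ["text", "title", "table", "figure", "list"]


@dataclass(frozen=True)
class LayoutElement:
    kind: str    # type of layout element
    x: int       # left edge in pixels
    y: int       # top edge in pixels
    width: int   # horizontal extent in pixels
    height: int  # vertical extent in pixels


def overlaps(a: LayoutElement, b: LayoutElement) -> bool:
    """Return True if the two rectangles share any interior area."""
    return not (
        a.x + a.width <= b.x or b.x + b.width <= a.x
        or a.y + a.height <= b.y or b.y + b.height <= a.y
    )


def random_layout(page_w=800, page_h=1000, n_elements=5, seed=None, max_tries=1000):
    """Place up to n_elements non-overlapping random rectangles on a page."""
    rng = random.Random(seed)
    placed = []
    for _ in range(max_tries):
        if len(placed) == n_elements:
            break
        w = rng.randint(page_w // 8, page_w // 2)
        h = rng.randint(page_h // 10, page_h // 3)
        x = rng.randint(0, page_w - w)
        y = rng.randint(0, page_h - h)
        candidate = LayoutElement(rng.choice(ELEMENT_TYPES), x, y, w, h)
        # Reject candidates that collide with an element already on the page.
        if not any(overlaps(candidate, p) for p in placed):
            placed.append(candidate)
    return placed


def to_annotations(elements):
    """Derive labels from the placement: class name plus [x, y, width, height]."""
    return [
        {"category": e.kind, "bbox": [e.x, e.y, e.width, e.height]}
        for e in elements
    ]


if __name__ == "__main__":
    for annotation in to_annotations(random_layout(seed=0)):
        print(annotation)
```

Because the generator decides where each element goes, the bounding box and class are known exactly at creation time. A real pipeline would also need to render content into each box, which this sketch omits.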

Supplementary Material

PDF File (Appendix_RanLayNet_v1.pdf)

Appendix: RanLayNet: A Dataset for Document Layout Detection used for Domain Adaptation and Generalization. Paper ID: 98


Published In

MMAsia '23: Proceedings of the 5th ACM International Conference on Multimedia in Asia

December 2023

745 pages

ISBN: 9798400702051

DOI: 10.1145/3595916

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

MMAsia '23

Sponsor:

SIGMM

MMAsia '23: ACM Multimedia Asia

December 6 - 8, 2023

Tainan, Taiwan

Acceptance Rates

Overall Acceptance Rate 59 of 204 submissions, 29%
