RanLayNet: A Dataset for Document Layout Detection used for Domain Adaptation and Generalization

Anand, Avinash; Jaiswal, Raj; Gupta, Mohit; Bangar, Siddhesh S; Bhuyan, Pijush; Lal, Naman; Singh, Rajeev; Jha, Ritika; Shah, Rajiv Ratn; Satoh, Shin'ichi

doi:10.1145/3595916.3626448

arXiv:2404.09530 (cs)

[Submitted on 15 Apr 2024 (v1), last revised 19 Apr 2024 (this version, v2)]

Abstract: Large ground-truth datasets and recent advances in deep learning techniques have been useful for layout detection. However, because of the restricted layout diversity of these datasets, training on them requires a sizable number of annotated instances, which is both expensive and time-consuming. As a result, differences between the source and target domains may significantly degrade the performance of these models. To address this problem, domain adaptation approaches have been developed that use a small quantity of labeled data to adapt the model to the target domain. In this research, we introduce a synthetic document dataset called RanLayNet, enriched with automatically assigned labels denoting the spatial positions, ranges, and types of layout elements. The primary aim is to develop a versatile dataset capable of training models that are robust and adaptable to diverse document formats. Through empirical experimentation, we demonstrate that a deep layout identification model trained on our dataset outperforms a model trained solely on actual documents. Moreover, we conduct a comparative analysis by fine-tuning inference models trained on the PubLayNet and IIIT-AR-13K datasets on the DocLayNet dataset. Our findings indicate that models enriched with our dataset are best suited to such tasks, achieving mAP95 scores of 0.398 and 0.588 in the scientific document domain for the TABLE class.
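
The abstract does not describe how RanLayNet pages are generated, so the following is only a minimal sketch of the general idea of a synthetic layout with automatically assigned labels: random, non-overlapping boxes are placed on a page, and each box carries its type and position as a label by construction. The class names, page size, box-size ranges, and annotation format are all assumptions chosen for illustration, not the authors' method.

```python
import random
from dataclasses import dataclass

# Hypothetical element types; the abstract names only the TABLE class explicitly.
ELEMENT_TYPES = ["text", "title", "table", "figure"]


@dataclass
class LayoutBox:
    label: str   # type of layout element
    x: int       # left edge, in page units
    y: int       # top edge, in page units
    width: int
    height: int


def overlaps(a: LayoutBox, b: LayoutBox) -> bool:
    """Return True if the two axis-aligned boxes share any interior area."""
    return not (
        a.x + a.width <= b.x or b.x + b.width <= a.x
        or a.y + a.height <= b.y or b.y + b.height <= a.y
    )


def random_layout(page_w=612, page_h=792, n_elements=5, max_tries=200, seed=None):
    """Place up to n_elements random non-overlapping boxes on a page.

    Rejection sampling: a candidate box is kept only if it does not
    overlap any box already placed. Sizes and limits are assumptions.
    """
    rng = random.Random(seed)
    boxes = []
    tries = 0
    while len(boxes) < n_elements and tries < max_tries:
        tries += 1
        w = rng.randint(page_w // 6, page_w // 2)
        h = rng.randint(page_h // 12, page_h // 4)
        x = rng.randint(0, page_w - w)
        y = rng.randint(0, page_h - h)
        candidate = LayoutBox(rng.choice(ELEMENT_TYPES), x, y, w, h)
        if not any(overlaps(candidate, b) for b in boxes):
            boxes.append(candidate)
    return boxes


def to_annotations(boxes):
    """Emit labels in an assumed [x, y, width, height] bounding-box format."""
    return [
        {"category": b.label, "bbox": [b.x, b.y, b.width, b.height]}
        for b in boxes
    ]


if __name__ == "__main__":
    for annotation in to_annotations(random_layout(seed=0)):
        print(annotation)
```

Because the generator decides where each element goes, the labels cost nothing to produce, which is the property the abstract contrasts with expensive manual annotation. Rendering actual content into each box is left out of this sketch.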

Comments:	8 pages, 6 figures, MMAsia 2023 Proceedings of the 5th ACM International Conference on Multimedia in Asia
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2404.09530 [cs.CV]
	(or arXiv:2404.09530v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2404.09530
Journal reference:	In Proceedings of the 5th ACM International Conference on Multimedia in Asia 2023. Association for Computing Machinery, NY, USA, Article 74, pp. 1-6
Related DOI:	https://doi.org/10.1145/3595916.3626448

Submission history

From: Mohit Gupta
[v1] Mon, 15 Apr 2024 07:50:15 UTC (7,272 KB)
[v2] Fri, 19 Apr 2024 06:44:18 UTC (7,272 KB)
