DocSegTr: An Instance-Level End-to-End Document Image Segmentation Transformer

Biswas, Sanket; Banerjee, Ayan; Lladós, Josep; Pal, Umapada

arXiv:2201.11438 (cs)

[Submitted on 27 Jan 2022 (v1), last revised 21 Sep 2022 (this version, v2)]

Abstract: Understanding documents with rich layouts is an essential step towards information extraction. Business intelligence processes often require the extraction of useful semantic content from documents at a large scale for subsequent decision-making tasks. In this context, instance-level segmentation of different document objects (titles, sections, figures, etc.) has emerged as an interesting problem for the document analysis and understanding community. To advance research in this direction, we present a transformer-based model called DocSegTr for end-to-end instance segmentation of complex layouts in document images. The method adapts a twin attention module for semantic reasoning, which makes it highly computationally efficient compared with the state-of-the-art. To the best of our knowledge, this is the first work on transformer-based document segmentation. Extensive experimentation on competitive benchmarks such as PubLayNet, PRIMA, Historical Japanese (HJ) and TableBank demonstrates that our model achieves comparable or better segmentation performance than existing state-of-the-art approaches, with average precision of 89.4, 40.3, 83.4 and 93.3, respectively. This simple and flexible framework could serve as a promising baseline for instance-level recognition tasks in document images.

Comments:	Preprint
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2201.11438 [cs.CV]
	(or arXiv:2201.11438v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2201.11438

Submission history

From: Sanket Biswas
[v1] Thu, 27 Jan 2022 10:50:22 UTC (5,276 KB)
[v2] Wed, 21 Sep 2022 15:58:41 UTC (26,507 KB)
