DOI: 10.1145/3539618.3591886

Form-NLU: Dataset for the Form Natural Language Understanding

Published: 18 July 2023
Abstract

    Compared to general document analysis tasks, form document structure understanding and retrieval are challenging. Form documents are typically produced by two types of authors: a form designer, who develops the form structure and keys, and a form user, who fills in form values based on the provided keys. Hence, if a form user gets confused, the filled-in values may not align with the form designer's intention (structure and keys). In this paper, we introduce Form-NLU, the first dataset for form structure understanding and key-value information extraction, designed to interpret the form designer's intent and the alignment of user-written values with it. It consists of 857 form images, 6k form keys and values, and 4k table keys and values. The dataset covers three form types, digital, printed, and handwritten, spanning diverse form appearances and layouts. We propose a robust form key-value information extraction framework based on positional and logical relations. Using Form-NLU, we first examine strong object detection models for form layout understanding, then evaluate the key information extraction task on the dataset, providing fine-grained results for different types of forms and keys. Furthermore, we combine our framework with an off-the-shelf PDF layout extraction tool and demonstrate its feasibility in real-world cases.
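    This page does not reproduce the dataset's annotation schema, so the snippet below is only a minimal sketch of the kind of record the abstract describes: designer-defined keys, user-written values, and explicit links pairing them. All field names (image, form_type, entities, links) and the bounding-box convention are assumptions, not the paper's actual format.

```python
# Hypothetical sketch of a Form-NLU-style annotation record.
# Field names and the [x1, y1, x2, y2] box convention are assumptions.

record = {
    "image": "form_0001.png",
    "form_type": "digital",  # the dataset covers digital, printed, handwritten
    "entities": [
        {"id": 0, "role": "key",   "text": "Company Name", "bbox": [40, 120, 260, 150]},
        {"id": 1, "role": "value", "text": "Acme Pty Ltd",  "bbox": [280, 120, 520, 150]},
    ],
    "links": [(0, 1)],  # designer key -> user-written value
}

def key_value_pairs(rec):
    """Resolve key/value links into (key text, value text) tuples."""
    by_id = {e["id"]: e for e in rec["entities"]}
    return [(by_id[k]["text"], by_id[v]["text"]) for k, v in rec["links"]]

print(key_value_pairs(record))  # [('Company Name', 'Acme Pty Ltd')]
```

    Representing key-value pairings as explicit links, rather than relying on reading order, is what lets a model check whether a user-written value actually aligns with the designer's intended key.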

    Supplemental Material

    MP4 File
    Form understanding presents notable challenges compared with general document understanding because of the uncertainty introduced when multiple parties author a single document. We introduce Form-NLU to tackle these challenges, focusing on form structure understanding and key-value information extraction. The dataset comprises 857 form images, encompassing 6,000 form keys and values and 4,000 table keys and values, and covers digital, printed, and handwritten forms with the diverse appearances and layouts that appear in the real world. To address the task, we propose a robust form key-value information extraction framework leveraging positional and logical relations. The evaluation assesses object detection models for form layout understanding, examines key information extraction performance across various form types and keys, and explores the feasibility of an off-the-shelf PDF layout extraction tool in real-world scenarios.
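    As a rough illustration of the layout-understanding step evaluated in the paper, the sketch below runs an off-the-shelf torchvision Faster R-CNN over a form image to propose candidate regions. The paper's actual detection models, label set, and score threshold are not given on this page, so the detector, the 0.5 threshold, and the dummy input are all stand-ins.

```python
# Stand-in for the form-layout detection step, using a COCO-pretrained
# Faster R-CNN from torchvision; the paper's own detectors and
# form-specific labels (keys, values, tables, etc.) are not reproduced here.
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

model = fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()

# Dummy 3-channel tensor standing in for a scanned or rendered form page.
image = torch.rand(3, 800, 600)

with torch.no_grad():
    (pred,) = model([image])  # one dict per input image

# Keep confident detections as candidate key/value regions (assumed threshold).
keep = pred["scores"] > 0.5
print(pred["boxes"][keep].shape, pred["labels"][keep])
```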


    Cited By

    • (2023) Workshop on Document Intelligence Understanding. Proceedings of the 32nd ACM International Conference on Information and Knowledge Management. DOI: 10.1145/3583780.3615312, pp. 5273-5276. Online publication date: 21-Oct-2023.
    • (2023) PDF-VQA: A New Dataset for Real-World VQA on PDF Documents. Machine Learning and Knowledge Discovery in Databases: Applied Data Science and Demo Track. DOI: 10.1007/978-3-031-43427-3_35, pp. 585-601. Online publication date: 18-Sep-2023.


      Published In

      SIGIR '23: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval
      July 2023
      3567 pages
      ISBN: 9781450394086
      DOI: 10.1145/3539618


      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. datasets
      2. form understanding
      3. natural language understanding

      Qualifiers

      • Research-article

      Conference

      SIGIR '23

      Acceptance Rates

      Overall Acceptance Rate 792 of 3,983 submissions, 20%

      Article Metrics

      • Downloads (last 12 months): 187
      • Downloads (last 6 weeks): 41
      Reflects downloads up to 26 Jul 2024

