short-paper

Workshop on Document Intelligence Understanding

Authors:

Eun-Jung Holden

CIKM '23: Proceedings of the 32nd ACM International Conference on Information and Knowledge Management

Pages 5273 - 5276

https://doi.org/10.1145/3583780.3615312

Published: 21 October 2023

Abstract

Document understanding and information extraction comprise a range of tasks for automatically understanding a document and extracting valuable information from it. Demand for document understanding has recently risen across domains such as business, law, and medicine, where it can improve the efficiency of work that involves large numbers of documents. This workshop aims to bring together researchers and industry developers working on document intelligence and on understanding diverse document types, in order to advance techniques for automatic document processing and understanding. We also release a data challenge on the recently introduced document-level VQA dataset, PDFVQA. The PDFVQA challenge examines a model's structural and contextual understanding at the level of the full document, spanning multiple consecutive pages, by including questions whose answers are sequences extracted from several pages of the document. The task helps move document understanding from the single-page level to the full-document level.

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
2. Information systems
  1. Information retrieval

Published In

CIKM '23: Proceedings of the 32nd ACM International Conference on Information and Knowledge Management

October 2023

5508 pages

ISBN:9798400701245

DOI:10.1145/3583780

General Chairs:
Ingo Frommholz, University of Wolverhampton, UK
Frank Hopfgartner, University of Koblenz, Germany
Mark Lee, University of Birmingham, UK
Michael Oakes, University of Birmingham, UK

Program Chairs:
Mounia Lalmas, Spotify, UK
Min Zhang, Tsinghua University, China
Rodrygo Santos, Federal University of Minas Gerais, Brazil

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Short-paper

Funding Sources

Google Research

Conference

CIKM '23

CIKM '23: The 32nd ACM International Conference on Information and Knowledge Management

October 21 - 25, 2023

Birmingham, United Kingdom

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%
