DOI: 10.1145/3394171.3413511
research-article

One-shot Text Field labeling using Attention and Belief Propagation for Structure Information Extraction

Published: 12 October 2020

Abstract

Structured information extraction from document images usually consists of three steps: text detection, text recognition, and text field labeling. While text detection and text recognition have been studied extensively and improved considerably in the literature, text field labeling is less explored and still faces many challenges. Existing learning-based methods for text field labeling usually require a large number of labeled examples to train a separate model for each document type. However, collecting and labeling large numbers of document images is difficult, and sometimes impossible because of privacy concerns. Deploying a separate model for each document type also consumes substantial resources. To address these challenges, we explore one-shot learning for text field labeling. Existing one-shot methods for this task are mostly rule-based and struggle with two cases: fields in crowded regions with few landmarks, and fields consisting of multiple separate text regions. To alleviate these problems, we propose a novel end-to-end trainable deep approach for one-shot text field labeling that uses an attention mechanism to transfer layout information between document images. We further apply a conditional random field (CRF) to the transferred layout information to refine the field labels. We collected and annotated a real-world one-shot field labeling dataset covering a wide variety of document types and conducted extensive experiments to examine the effectiveness of the proposed model. To stimulate research in this direction, the collected dataset and the one-shot model will be released (https://github.com/AlibabaPAI/one_shot_text_labeling).
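The two stages named in the abstract, attention-based transfer from the single labeled document followed by CRF refinement (which the title indicates uses belief propagation), can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the region features, the plain dot-product attention, the Potts-style pairwise potential, the neighbour graph, and the function names `transfer_labels` and `loopy_bp` are all illustrative assumptions. The abstract describes the real model as end-to-end trainable; nothing is learned here.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def transfer_labels(query_feats, support_feats, support_labels, num_classes, temperature=1.0):
    """Hypothetical attention-based label transfer from one labeled (support) document.

    query_feats:    (N, D) features of text regions in the unlabeled document
    support_feats:  (M, D) features of text regions in the labeled document
    support_labels: (M,)   integer field label of each support region
    Returns an (N, num_classes) array whose rows are label distributions.
    """
    scores = query_feats @ support_feats.T / temperature  # (N, M) similarities
    attn = softmax(scores, axis=1)                        # each query region attends over support regions
    onehot = np.eye(num_classes)[support_labels]          # (M, C) one-hot support labels
    return attn @ onehot                                  # (N, C), rows sum to 1


def loopy_bp(unary, edges, pairwise, num_iters=10):
    """Sum-product (loopy) belief propagation on a pairwise CRF.

    unary:    (N, C) strictly positive float node potentials
    edges:    list of distinct undirected (i, j) pairs, e.g. neighbouring text regions
    pairwise: (C, C) symmetric potential shared by all edges (an assumption)
    Returns (N, C) normalized beliefs: exact marginals on trees, approximate otherwise.
    """
    n, c = unary.shape
    neighbors = {k: [] for k in range(n)}
    messages = {}
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
        messages[(i, j)] = np.full(c, 1.0 / c)  # message from i to j
        messages[(j, i)] = np.full(c, 1.0 / c)

    for _ in range(num_iters):
        new_messages = {}
        for (i, j) in messages:
            # Unary at i times every incoming message except the one from j.
            prod = unary[i].copy()
            for k in neighbors[i]:
                if k != j:
                    prod *= messages[(k, i)]
            # m_{i->j}(x_j) = sum over x_i of psi(x_i, x_j) * prod(x_i)
            msg = pairwise.T @ prod
            new_messages[(i, j)] = msg / msg.sum()
        messages = new_messages  # synchronous update

    beliefs = unary.copy()
    for i in range(n):
        for k in neighbors[i]:
            beliefs[i] *= messages[(k, i)]
    return beliefs / beliefs.sum(axis=1, keepdims=True)


if __name__ == "__main__":
    # Toy data: the labeled document has three regions with field labels 0, 1, 2;
    # the query regions are noisy copies of them.
    rng = np.random.default_rng(0)
    support = np.eye(3) * 5.0
    labels = np.array([0, 1, 2])
    query = support + 0.1 * rng.standard_normal((3, 3))

    soft = transfer_labels(query, support, labels, num_classes=3)
    # Potts-style potential: a generic smoothness prior chosen only for illustration.
    potts = np.exp(np.eye(3))
    # A small epsilon keeps every unary potential strictly positive.
    refined = loopy_bp(soft + 1e-6, [(0, 1), (1, 2)], potts)
    print(refined.argmax(axis=1))
```

Belief propagation is exact on tree-structured graphs such as the three-region chain above; on graphs with cycles, which are typical for 2-D document layouts, it only approximates the marginals.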

Supplementary Material

MP4 File (3394171.3413511.mp4)
The recorded talk for the paper "One-shot Text Field labeling using Attention and Belief Propagation for Structure Information Extraction".

Information

Published In

MM '20: Proceedings of the 28th ACM International Conference on Multimedia
October 2020
4889 pages
ISBN: 9781450379885
DOI: 10.1145/3394171
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. structure information extraction
  2. text field labeling

Qualifiers

  • Research-article

Funding Sources

  • China Postdoctoral Science Foundation

Conference

MM '20
Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 21
  • Downloads (last 6 weeks): 1
Reflects downloads up to 12 Nov 2024

Cited By

  • (2024) A Robust Framework for One-Shot Key Information Extraction via Deep Partial Graph Matching. IEEE Transactions on Image Processing 33, pp. 1070-1079. DOI: 10.1109/TIP.2024.3357251. Online publication date: 2024.
  • (2024) A Robust Component-Based Template Matching Approach Using Document Layout Graph for Extracting Information. Advances and Trends in Artificial Intelligence. Theory and Applications, pp. 10-22. DOI: 10.1007/978-981-97-4677-4_2. Online publication date: 10-Jul-2024.
  • (2024) One-Shot Transformer-Based Framework for Visually-Rich Document Understanding. Document Analysis and Recognition - ICDAR 2024, pp. 244-261. DOI: 10.1007/978-3-031-70533-5_15. Online publication date: 8-Sep-2024.
  • (2023) Visual information extraction deep learning method: a critical review. Journal of Image and Graphics 28(8), pp. 2276-2297. DOI: 10.11834/jig.220904. Online publication date: 2023.
  • (2023) Enhancing Visually-Rich Document Understanding via Layout Structure Modeling. Proceedings of the 5th ACM International Conference on Multimedia in Asia Workshops, pp. 1-10. DOI: 10.1145/3611380.3628554. Online publication date: 6-Dec-2023.
  • (2023) Enhancing Visually-Rich Document Understanding via Layout Structure Modeling. Proceedings of the 31st ACM International Conference on Multimedia, pp. 4513-4523. DOI: 10.1145/3581783.3612327. Online publication date: 26-Oct-2023.
  • (2022) Scalable and Cost-effective Serverless Architecture for Information Extraction Workflows. Proceedings of the 2nd Workshop on High Performance Serverless Computing, pp. 15-23. DOI: 10.1145/3526060.3535458. Online publication date: 30-Jun-2022.
  • (2022) Dual-VIE: Dual-Level Graph Attention Network for Visual Information Extraction. PRICAI 2022: Trends in Artificial Intelligence, pp. 422-434. DOI: 10.1007/978-3-031-20862-1_31. Online publication date: 4-Nov-2022.
  • (2021) Fine-Grained Language Identification in Scene Text Images. Proceedings of the 29th ACM International Conference on Multimedia, pp. 4573-4581. DOI: 10.1145/3474085.3475615. Online publication date: 17-Oct-2021.
  • (2021) StrucTexT: Structured Text Understanding with Multi-Modal Transformers. Proceedings of the 29th ACM International Conference on Multimedia, pp. 1912-1920. DOI: 10.1145/3474085.3475345. Online publication date: 17-Oct-2021.
