
Knowledge-Based Neural Pre-training for Intelligent Document Management

  • Conference paper
AIxIA 2021 – Advances in Artificial Intelligence (AIxIA 2021)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13196)


Abstract

Banks are usually large and complex companies that face a number of challenges to support the rapid and effective sharing of information and content across their organizations. Extracting complex metadata from raw bank documents is therefore central to support intelligent data indexing, information circulation and to promote more complex predictive capabilities, e.g., compliance assessment problems. In this paper, we present a weakly-supervised neural methodology for creating semantic metadata from bank documents. It exploits a neural pre-training method optimized against legacy semantic resources able to minimize the training effort. We studied an application to business process design and management in banks and tested the method on documents from the Italian banking community. The measured impact of the proposed training approach to process-related metadata creation confirms its applicability.
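The abstract describes a weakly-supervised setup in which legacy semantic resources (such as a banking-process taxonomy, see the notes below) supply noisy labels that reduce the manual annotation effort before neural training. As a purely illustrative sketch of that weak-labeling idea, one could map documents to taxonomy categories by matching category keywords; the taxonomy entries, category names, and documents here are hypothetical and not taken from the paper, whose actual pipeline fine-tunes a pre-trained transformer encoder.

```python
# Hypothetical sketch: assign noisy process-category labels to raw documents
# using a legacy taxonomy (category -> descriptive keywords). The resulting
# weak labels could then feed the pre-training of a neural encoder.
from collections import Counter
from typing import Optional

# Illustrative taxonomy fragment (not the ABI Lab taxonomy).
TAXONOMY = {
    "credit": ["loan", "mortgage", "credit"],
    "compliance": ["audit", "regulation", "compliance"],
}

def weak_label(text: str) -> Optional[str]:
    """Return the category whose keywords occur most often, or None."""
    tokens = Counter(text.lower().split())
    scores = {cat: sum(tokens[kw] for kw in kws)
              for cat, kws in TAXONOMY.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

docs = [
    "Internal audit report on regulation updates",
    "Mortgage and loan approval workflow",
]
labels = [weak_label(d) for d in docs]  # → ["compliance", "credit"]
```

Such labels are noisy by construction, which is why the paper frames the methodology as weakly supervised rather than fully supervised.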



Notes

  1. https://huggingface.co/idb-ita/gilberto-uncased-from-camembert.

  2. It is available at: https://www.abilab.it/tassonomia-processi-bancari.

  3. Decision functions f other than \(f_{desc}\), as well as the adoption of the Sibling Recognition task, had no significant impact on performance. Negation also provided only a small improvement in Recall (0.83 vs. 0.84).


Acknowledgment

This research was developed in the context of the H2020 INFINITECH project (EC grant agreement number 856632). We would like to thank the "Istituto di Analisi dei Sistemi ed Informatica - Antonio Ruberti" (IASI) for supporting the experiments through access to dedicated computing resources.

Author information

Correspondence to Danilo Croce or Roberto Basili.

Appendix

This appendix reports the results of the assessment analysis from Sect. 5. Each cell of the matrices compares a pair of annotators from the pool \(A1\)–\(A9\) (in which ABILaBERT is also included). For example, in Table 4 the value in the fourth row and first column is the \(Precision=0.52\) obtained by analyst A4 when compared against the "gold-standard" annotation of A1. Similarly, in Table 6 the element in the first row and third column is the \(F1=0.82\) of ABILaBERT when compared with the annotations of A3.

Table 4. Precision of the assessment analysis.
Table 5. Recall of the assessment analysis.
Table 6. F1 of the assessment analysis.
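The pairwise scheme behind Tables 4–6 (one annotator's labels taken as gold, another's scored against them) can be sketched as follows. The annotator labels and scoring over per-document label sets are illustrative assumptions, not the paper's data.

```python
# Hypothetical sketch of a pairwise assessment cell: score one annotator's
# label sets (pred) against another's taken as gold, with micro-averaged
# Precision, Recall, and F1.

def prf1(gold, pred):
    """Micro-averaged P/R/F1 over per-document label sets."""
    tp = sum(len(g & p) for g, p in zip(gold, pred))  # matching labels
    n_pred = sum(len(p) for p in pred)
    n_gold = sum(len(g) for g in gold)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Two illustrative annotators labelling three documents with process categories.
a1 = [{"credit"}, {"compliance"}, {"credit", "compliance"}]
a2 = [{"credit"}, {"credit"}, {"credit"}]
p, r, f = prf1(gold=a1, pred=a2)  # one cell of a Table-4/5/6-style matrix
```

Repeating this for every ordered pair of annotators fills one matrix per metric, as in Tables 4, 5, and 6.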


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Margiotta, D., Croce, D., Rotoloni, M., Cacciamani, B., Basili, R. (2022). Knowledge-Based Neural Pre-training for Intelligent Document Management. In: Bandini, S., Gasparini, F., Mascardi, V., Palmonari, M., Vizzari, G. (eds) AIxIA 2021 – Advances in Artificial Intelligence. AIxIA 2021. Lecture Notes in Computer Science, vol. 13196. Springer, Cham. https://doi.org/10.1007/978-3-031-08421-8_39


  • DOI: https://doi.org/10.1007/978-3-031-08421-8_39


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-08420-1

  • Online ISBN: 978-3-031-08421-8

  • eBook Packages: Computer Science (R0)
