Abstract
Banks are usually large and complex companies that face a number of challenges in supporting the rapid and effective sharing of information and content across their organizations. Extracting complex metadata from raw bank documents is therefore central to supporting intelligent data indexing and information circulation, and to enabling more advanced predictive capabilities, e.g., compliance assessment. In this paper, we present a weakly-supervised neural methodology for creating semantic metadata from bank documents. It exploits a neural pre-training method, optimized against legacy semantic resources, that minimizes the training effort. We studied an application to business process design and management in banks and tested the method on documents from the Italian banking community. The measured impact of the proposed training approach on process-related metadata creation confirms its applicability.
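The pre-training procedure itself is not detailed on this page. As a rough, hypothetical sketch of the underlying idea, the snippet below shows one way weak supervision could be derived from a legacy semantic resource: node descriptions taken from a banking process taxonomy act as silver-labelled examples for fine-tuning a BERT-like classifier, which is then used to assign process-related metadata to raw documents. The model checkpoint, label set and example texts are invented for illustration and are not the paper's actual ABILaBERT configuration.

```python
# Illustrative sketch only: weak supervision derived from a legacy taxonomy.
# Model name, labels and texts are placeholders, not the ABILaBERT setup.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Toy "silver" examples: taxonomy node descriptions paired with a coarse
# process family, so no manual annotation of bank documents is needed.
taxonomy_examples = [
    ("Gestione dei bonifici SEPA verso controparti estere", "payments"),
    ("Adeguata verifica della clientela ai fini antiriciclaggio", "compliance"),
    ("Apertura e chiusura di conti correnti retail", "accounts"),
]
labels = sorted({lab for _, lab in taxonomy_examples})
label2id = {lab: i for i, lab in enumerate(labels)}

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=len(labels)
)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for _ in range(3):  # a few passes over the silver data
    for text, lab in taxonomy_examples:
        batch = tokenizer(text, return_tensors="pt", truncation=True)
        loss = model(**batch, labels=torch.tensor([label2id[lab]])).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

# The adapted model can then tag a raw bank document with process metadata.
model.eval()
doc = "Il documento descrive le procedure di adeguata verifica antiriciclaggio."
with torch.no_grad():
    logits = model(**tokenizer(doc, return_tensors="pt", truncation=True)).logits
print(labels[int(logits.argmax(dim=-1))])
```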
Notes
- 1.
- 2. It is available at: https://www.abilab.it/tassonomia-processi-bancari.
- 3. Decision functions \(f\) other than \(f_{desc}\), as well as the adoption of the Sibling Recognition task, had no significant impact on performance. Negation also provided only a marginal improvement in Recall (0.83 vs. 0.84).
Acknowledgment
This research was developed in the context of the H2020 INFINITECH project (EC grant agreement number 856632). We would like to thank the “Istituto di Analisi dei Sistemi ed Informatica - Antonio Ruberti” (IASI) for supporting the experiments by providing access to dedicated computing resources.
Appendix
This appendix reports the results of the assessment analysis from Sect. 5. Each cell of the matrices contains the comparison between two members of the pool of analysts \(A1\)–\(A9\), in which ABILaBERT is also included. As an example, in Table 4 the value in the fourth row, first column is the \(Precision=0.52\) obtained by analyst A4 when compared against the “gold-standard” annotation of A1. As another example, in Table 6 the element in the first row, third column is the \(F1=0.82\) of ABILaBERT when compared with the annotations of A3.
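As a minimal, hypothetical sketch of how such pairwise matrices can be computed, assume each analyst (and ABILaBERT) produces a set of (document, label) annotations; the row annotator is then scored against the column annotator taken as gold standard. All names and annotations below are illustrative, not the paper's data.

```python
# Pairwise agreement matrix: row annotator scored against column annotator
# taken as gold. Annotation sets below are toy examples for illustration.
from itertools import product

def prf(pred, gold):
    """Precision, recall and F1 of one annotation set against a gold set."""
    tp = len(pred & gold)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

annotations = {
    "A1": {("doc1", "payments"), ("doc2", "compliance")},
    "A2": {("doc1", "payments"), ("doc2", "accounts")},
    "ABILaBERT": {("doc1", "payments"), ("doc2", "compliance")},
}

analysts = sorted(annotations)
# Cell (row, col): metrics of `row` evaluated against `col` as gold standard.
matrix = {
    (row, col): prf(annotations[row], annotations[col])
    for row, col in product(analysts, analysts) if row != col
}
for (row, col), (p, r, f1) in matrix.items():
    print(f"{row} vs {col}: P={p:.2f} R={r:.2f} F1={f1:.2f}")
```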
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Margiotta, D., Croce, D., Rotoloni, M., Cacciamani, B., Basili, R. (2022). Knowledge-Based Neural Pre-training for Intelligent Document Management. In: Bandini, S., Gasparini, F., Mascardi, V., Palmonari, M., Vizzari, G. (eds.) AIxIA 2021 – Advances in Artificial Intelligence. AIxIA 2021. Lecture Notes in Computer Science, vol. 13196. Springer, Cham. https://doi.org/10.1007/978-3-031-08421-8_39
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-08420-1
Online ISBN: 978-3-031-08421-8