Extraction of Numerical Facts from German Texts to Enrich Internal Audit Data

Schumann, Gerrit; Marx Gómez, Jorge

doi:10.1007/978-3-031-56576-2_16

Part of the book series: Progress in IS (PROIS)

Included in the following conference series:

International Conference on Technological Advancement in Embedded and Mobile Systems

Abstract

For internal auditors, large-scale automated data processing is usually feasible only for structured data. Unstructured data, such as facts contained in texts, are often processed manually and on a sample basis, which increases the risk of overlooking relevant information during an audit. To address this risk, we present an approach that extracts numerical facts, together with their associated entities and relations, from German texts and converts them into a format that audit tools can process. The algorithm developed for this purpose is rule-based and was evaluated on 4637 sentences from 50 German annual reports. The results show that in more than 75% of all cases, the entity and relation of a numeric value within a sentence were determined correctly.
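
To make the idea of rule-based numerical fact extraction concrete, the following is a minimal Python sketch. It is not the algorithm evaluated in the chapter, whose rules are not given in this abstract. The verb lexicon, the regular expressions, the function names, and the CSV columns are all assumptions chosen for illustration, and the rule only covers subject-first sentences in which the entity precedes the relation verb.

```python
import csv
import io
import re

# German number format: "." groups thousands, "," marks decimals (e.g. 1.234,5).
NUMBER = re.compile(
    r"(?P<value>\d{1,3}(?:\.\d{3})+(?:,\d+)?|\d+(?:,\d+)?)"
    r"(?:\s*(?P<scale>Tsd\.|Mio\.|Mrd\.))?"
    r"(?:\s*(?P<unit>Euro|EUR|€|%|Prozent))?"
)
# Hypothetical, deliberately tiny lexicon of relation verbs.
RELATION = re.compile(r"\b(stieg|sank|betrug|lag)\b")
SCALES = {"Tsd.": 1e3, "Mio.": 1e6, "Mrd.": 1e9}
UNITS = {"Euro": "EUR", "EUR": "EUR", "€": "EUR", "%": "percent", "Prozent": "percent"}


def parse_german_number(text):
    """Convert a German-formatted number string such as '1.234,5' to a float."""
    return float(text.replace(".", "").replace(",", "."))


def extract_facts(sentence):
    """Return one fact per numeric value that follows the first relation verb.

    Assumed rule: everything before the verb is treated as the entity.
    """
    relation = RELATION.search(sentence)
    if relation is None:
        return []
    entity = sentence[:relation.start()].strip()
    facts = []
    for match in NUMBER.finditer(sentence, relation.end()):
        value = parse_german_number(match["value"]) * SCALES.get(match["scale"], 1)
        facts.append({
            "entity": entity,
            "relation": relation.group(1),
            "value": value,
            "unit": UNITS.get(match["unit"], ""),
        })
    return facts


def to_csv(facts):
    """Serialize facts as CSV text, a tabular format that audit tools can import."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["entity", "relation", "value", "unit"])
    writer.writeheader()
    writer.writerows(facts)
    return buffer.getvalue()


facts = extract_facts("Der Umsatz stieg auf 12,5 Mio. Euro.")
print(to_csv(facts))
```

A dependency parser, such as the spaCy German pipeline referenced in the notes, would be needed to handle verb-second sentences, subordinate clauses, and several entities per sentence; the regular expressions here only stand in for that step.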

Notes

1. At the time of the conference in November 2022.
2. https://spacy.io/usage/linguistic-features#sbd
3. https://spacy.io/models/de#de_dep_news_trf
4. https://huggingface.co/bert-base-german-cased
5. https://spacy.io/usage/linguistic-features#:~:text=useful%20tool%20for-,information%20extraction,-%2C%20especially%20when%20combined

Author information

Authors and Affiliations

University of Oldenburg, Oldenburg, Germany
Gerrit Schumann & Jorge Marx Gómez

Corresponding author

Correspondence to Gerrit Schumann.

Editor information

Editors and Affiliations

University of Oldenburg, Oldenburg, Germany
Jorge Marx Gómez
The Nelson Mandela African Institution of Science and Technology, Arusha, Tanzania
Anael Elikana Sam
The Nelson Mandela African Institution of Science and Technology, Arusha, Tanzania
Devotha Godfrey Nyambo

About this paper

Cite this paper

Schumann, G., Marx Gómez, J. (2024). Extraction of Numerical Facts from German Texts to Enrich Internal Audit Data. In: Marx Gómez, J., Elikana Sam, A., Godfrey Nyambo, D. (eds) Artificial Intelligence Tools and Applications in Embedded and Mobile Systems. ICTA-EMOS 2022. Progress in IS. Springer, Cham. https://doi.org/10.1007/978-3-031-56576-2_16

DOI: https://doi.org/10.1007/978-3-031-56576-2_16
Published: 30 June 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-56575-5
Online ISBN: 978-3-031-56576-2
eBook Packages: Business and Management, Business and Management (R0)
