
Extraction of Numerical Facts from German Texts to Enrich Internal Audit Data

  • Conference paper
Artificial Intelligence Tools and Applications in Embedded and Mobile Systems (ICTA-EMOS 2022)

Part of the book series: Progress in IS (PROIS)

Abstract

For internal auditors, large-scale automated data processing is usually possible only for structured data. Unstructured data, such as facts contained in texts, are often processed manually and on a sample basis, which increases the risk of overlooking relevant information during an audit. To address this risk, we present an approach that extracts numerical facts, along with their associated entities and relations, from German texts and converts them into a format that audit tools can process. The algorithm developed for this purpose follows a rule-based logic and was evaluated on 4637 sentences from 50 German annual reports. The results show that in more than 75% of all cases, the entity and relation of a numeric value within the sentence were determined correctly.
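
To make the idea of turning numerical facts into a tool-readable format concrete, here is a minimal Python sketch. It is not the authors' algorithm. It assumes a simple regular expression for German number formats, a small hypothetical verb list (`RELATION_VERBS`) to guess the relation, and the text before that verb as the entity. All names (`NumericFact`, `extract_numeric_facts`, `to_csv`) are invented for illustration, and bare numbers without a scale word or unit (such as years) are deliberately skipped.

```python
import csv
import io
import re
from dataclasses import asdict, dataclass

# German notation: "." separates thousands, "," separates decimals.
NUMBER = r"\d{1,3}(?:\.\d{3})+(?:,\d+)?|\d+(?:,\d+)?"
SCALE = {"Tsd.": 1e3, "Mio.": 1e6, "Mrd.": 1e9}
PATTERN = re.compile(
    rf"(?P<number>{NUMBER})\s*(?P<scale>Tsd\.|Mio\.|Mrd\.)?\s*(?P<unit>€|EUR|%|Prozent)?"
)

# Hypothetical, deliberately tiny verb lexicon used to guess the relation.
RELATION_VERBS = ("stieg", "sank", "betrug", "lag")


@dataclass
class NumericFact:
    entity: str
    relation: str
    value: float
    unit: str


def parse_german_number(text):
    """Convert a German-formatted number such as '1.234,5' to a float."""
    return float(text.replace(".", "").replace(",", "."))


def find_relation(sentence):
    """Return (entity, relation) based on the first known verb, else ('', '')."""
    words = sentence.split()
    for i, word in enumerate(words):
        verb = word.strip(".,;")
        if verb in RELATION_VERBS:
            return " ".join(words[:i]), verb
    return "", ""


def extract_numeric_facts(sentence):
    """Extract numbers that carry a scale word or unit from one sentence."""
    entity, relation = find_relation(sentence)
    facts = []
    for match in PATTERN.finditer(sentence):
        scale, unit = match.group("scale"), match.group("unit")
        if not scale and not unit:
            continue  # simplification: ignore bare numbers such as years
        value = parse_german_number(match.group("number")) * SCALE.get(scale, 1)
        facts.append(NumericFact(entity, relation, value, unit or ""))
    return facts


def to_csv(facts):
    """Serialize facts as CSV text, one row per numeric fact."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["entity", "relation", "value", "unit"])
    writer.writeheader()
    for fact in facts:
        writer.writerow(asdict(fact))
    return buffer.getvalue()


if __name__ == "__main__":
    sentence = "Der Umsatz stieg im Jahr 2021 auf 12,5 Mio. €."
    print(to_csv(extract_numeric_facts(sentence)))
```

A rule set of this kind is easy to inspect and extend, but it would miss most real sentence structures. A dependency parser, such as the spaCy German model referenced in the notes, would be the more plausible basis for resolving entities and relations.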

Notes

  1. At the time of the conference in November 2022.

  2. https://spacy.io/usage/linguistic-features#sbd

  3. https://spacy.io/models/de#de_dep_news_trf

  4. https://huggingface.co/bert-base-german-cased

  5. https://spacy.io/usage/linguistic-features#:~:text=useful%20tool%20for-,information%20extraction,-%2C%20especially%20when%20combined

Author information

Corresponding author

Correspondence to Gerrit Schumann.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Schumann, G., Marx Gómez, J. (2024). Extraction of Numerical Facts from German Texts to Enrich Internal Audit Data. In: Marx Gómez, J., Elikana Sam, A., Godfrey Nyambo, D. (eds) Artificial Intelligence Tools and Applications in Embedded and Mobile Systems. ICTA-EMOS 2022. Progress in IS. Springer, Cham. https://doi.org/10.1007/978-3-031-56576-2_16
