DOI: 10.1145/3360901.3364439
short-paper

Understanding Financial Transaction Documents using Natural Language Processing

Published: 23 September 2019

Abstract

In this paper, we share our experiences creating an NLP-based AI platform for finance, AppZen (http://www.appzen.com). AppZen's auditing technology is used by over 500 enterprise customers, including multiple Fortune 500 companies, to audit employee expenses. The technology can process, analyze, and identify relationships between various kinds of transaction documents, such as receipts, invoices, contracts, and purchase orders. Each type of transaction document requires custom processing and analysis because of the diversity in its language and structure. Contracts typically require a deep understanding of the content, such as identifying sentence structures, entities, and the relationships between them, whereas receipts and invoices are semi-structured and require a different kind of processing. We elaborate on the challenges we have experienced and on the use of NLP in conjunction with a lightweight semantic layer to alleviate these challenges.



    Published In

    K-CAP '19: Proceedings of the 10th International Conference on Knowledge Capture
    September 2019
    281 pages
    ISBN:9781450370080
    DOI:10.1145/3360901
    General Chairs: Mayank Kejriwal, Pedro Szekely
    Program Chair: Raphaël Troncy
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. feature engineering
    2. financial auditing
    3. information extraction
    4. nlp

    Qualifiers

    • Short-paper

    Conference

    K-CAP '19: Knowledge Capture Conference
    November 19 - 21, 2019
    Marina Del Rey, CA, USA

    Acceptance Rates

    Overall Acceptance Rate 55 of 198 submissions, 28%
