short-paper

Understanding Financial Transaction Documents using Natural Language Processing

Authors: Prateek Jain, Kunal Verma, Aniket Gaikwad, Pramod Gadde

K-CAP '19: Proceedings of the 10th International Conference on Knowledge Capture

Pages 255 - 258

https://doi.org/10.1145/3360901.3364439

Published: 23 September 2019

Abstract

In this paper, we share our experiences building AppZen (http://www.appzen.com), an NLP-based AI platform for finance. AppZen's auditing technology is used by over 500 enterprise customers, including multiple Fortune 500 companies, to audit employee expenses. The technology can process, analyze, and identify relationships between various kinds of transaction documents, such as receipts, invoices, contracts, and purchase orders. Each type of transaction document requires custom processing and analysis because of the diversity in the language and structure of the documents. Contracts typically require a deep understanding of the content, such as identifying sentence structures, entities, and the relationships between them, whereas receipts and invoices are semi-structured and require a different kind of processing. We elaborate on the challenges we have encountered and on the use of NLP in conjunction with a lightweight semantic layer to alleviate them.
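The idea that each document type needs its own processing path can be illustrated with a minimal sketch. Nothing below comes from AppZen's system: the function names, the regular expressions, and the `PROCESSORS` table are hypothetical, and sentence splitting merely stands in for the deeper contract analysis the abstract mentions. Purchase orders are left out because the abstract does not say how they are handled.

```python
import re
from typing import Callable, Dict

# Hypothetical field patterns for semi-structured documents (receipts, invoices).
TOTAL_RE = re.compile(r"total\s*[:\-]?\s*\$?\s*(\d+(?:\.\d{2})?)", re.IGNORECASE)
DATE_RE = re.compile(r"\b(\d{4}-\d{2}-\d{2})\b")


def process_semi_structured(text: str) -> dict:
    """Pull labeled fields out of a receipt or invoice with simple patterns."""
    total = TOTAL_RE.search(text)
    date = DATE_RE.search(text)
    return {
        "total": float(total.group(1)) if total else None,
        "date": date.group(1) if date else None,
    }


def process_contract(text: str) -> dict:
    """Split a contract into sentences as a stand-in for deeper parsing."""
    parts = re.split(r"(?<=[.!?])\s+", text)
    return {"sentences": [p.strip() for p in parts if p.strip()]}


# Assumed routing table: one handler per document type.
PROCESSORS: Dict[str, Callable[[str], dict]] = {
    "receipt": process_semi_structured,
    "invoice": process_semi_structured,
    "contract": process_contract,
}


def process_document(doc_type: str, text: str) -> dict:
    """Route a document to the handler registered for its type."""
    try:
        handler = PROCESSORS[doc_type]
    except KeyError:
        raise ValueError(f"unsupported document type: {doc_type}") from None
    return handler(text)
```

A call such as `process_document("receipt", "Cafe Uno\n2019-07-20\nTotal: $12.50")` would go through the pattern-based path, while a contract would go through the sentence-level path. A routing table keeps the type-specific logic separate, which is one plausible way to accommodate the diversity in language and structure described above.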

Index Terms

Understanding Financial Transaction Documents using Natural Language Processing
1. Applied computing
  1. Enterprise computing
    1. Enterprise ontologies, taxonomies and vocabularies

Published In

K-CAP '19: Proceedings of the 10th International Conference on Knowledge Capture

September 2019

281 pages

ISBN:9781450370080

DOI:10.1145/3360901

General Chairs: Mayank Kejriwal (University of Southern California Information Sciences Institute, USA), Pedro Szekely (University of Southern California Information Sciences Institute, USA)

Program Chair: Raphaël Troncy (EURECOM, France)

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

K-CAP '19: Knowledge Capture Conference

November 19 - 21, 2019

Marina Del Rey, CA, USA

Sponsor: SIGAI

Acceptance Rates

Overall Acceptance Rate 55 of 198 submissions, 28%

