Abstract
Fraud exists in all walks of life, and detecting and preventing it is an important research question for many stakeholders in society. The rise of big data and artificial intelligence has created new opportunities to use advanced machine learning models to detect fraud. This chapter provides a comprehensive overview of the challenges in detecting fraud using machine learning. We use a three-part framework (data, method, and evaluation criterion) to review some of the practical considerations that may affect the implementation of machine learning models to predict fraud. We then review selected papers from the academic literature across different disciplines that can help address some of these fraud detection challenges. Finally, we suggest promising future directions for this line of research. Because accounting fraud is an important class of fraud, we discuss all of these issues in the context of accounting fraud detection.
We thank Kai Guo for research assistance.
Notes
- 3. See Boute et al. (2022) for a more in-depth discussion of the use of AI in financial services.
- 4. In a 10-year review of corporate accounting fraud commissioned by the Committee of Sponsoring Organizations of the Treadway Commission (COSO), Beasley et al. (2010) find that the total cumulative misstatement or misappropriation amounted to nearly $120 billion across 300 fraud cases with available information (a mean of nearly $400 million per case); see also Beasley et al. (1999).
- 5. See SAS No. 99 (American Institute of Certified Public Accountants, 2002) for a discussion of this issue in a U.S. context.
- 7. See Zhang et al. (2015) for a good discussion of these issues.
- 13. Supervised models “learn” from labeled data: to train a supervised model, one presents both fraudulent and non-fraudulent records that have been labeled as such. Unsupervised models, by contrast, must “learn” the structure of the data on their own, without labels.
- 17. This section relies heavily on Bao et al. (2020).
- 20. This section borrows heavily from the online appendix of Bao et al. (2020).
- 21. Recurrent neural networks (RNNs) are artificial neural networks in which the connections between nodes form a directed graph along a temporal sequence, which allows them to capture temporal dynamics in sequential data.
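Note 13 distinguishes supervised from unsupervised models. The minimal Python sketch below illustrates that difference on a handful of invented records. The supervised part fits a logistic regression (our choice for illustration, not a model prescribed by the chapter) using the fraud labels; the unsupervised part ignores the labels entirely and scores each record by the sum of its squared per-feature z-scores. The data and the helper names `train_logistic`, `predict_proba`, and `anomaly_scores` are hypothetical, not an API from the chapter or from any library.

```python
import math
from statistics import mean, pstdev

# Hypothetical toy data: each row is one firm-year described by two
# standardized financial ratios. The values are invented for illustration.
X = [
    [0.1, 0.2], [0.0, -0.1], [-0.2, 0.1], [0.3, 0.0],  # labeled non-fraud
    [2.0, 1.8], [1.7, 2.2],                             # labeled fraud
]
y = [0, 0, 0, 0, 1, 1]  # 1 = fraudulent, 0 = non-fraudulent


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(w, b, x):
    """Estimated probability that record x is fraudulent."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Supervised: fit a logistic regression by batch gradient descent.

    The labels in y drive every update, so the model learns what
    separates records labeled fraudulent from those labeled non-fraudulent.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # d(log-loss)/d(linear score)
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def anomaly_scores(X):
    """Unsupervised: score records by their distance from the bulk of the data.

    Uses the sum of squared per-feature z-scores; no labels are consulted.
    """
    cols = list(zip(*X))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against zero variance
    return [sum(((xj - m) / s) ** 2 for xj, m, s in zip(x, mus, sds)) for x in X]


if __name__ == "__main__":
    w, b = train_logistic(X, y)
    print("P(fraud) for a new record:", round(predict_proba(w, b, [2.0, 2.0]), 3))
    scores = anomaly_scores(X)
    ranked = sorted(range(len(X)), key=lambda i: scores[i], reverse=True)
    print("Records ranked from most to least anomalous:", ranked)
```

On this toy sample the two records labeled as fraud also receive the highest anomaly scores, even though `anomaly_scores` never sees `y`. In real data, unusual is not the same as fraudulent, so the two approaches can disagree.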
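Note 21 defines recurrent neural networks. As a rough illustration, assuming the simplest (Elman-style) recurrence h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h), the sketch below runs such a network over a short sequence, for example a hypothetical series of a firm's periodic feature vectors. The weights are arbitrary placeholders rather than trained values, and `rnn_forward` and `matvec` are hypothetical helper names; real applications would use a deep learning library and learned parameters.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(M: Matrix, v: Vector) -> Vector:
    """Multiply matrix M (rows x cols) by vector v (length cols)."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def rnn_forward(xs: List[Vector], W_xh: Matrix, W_hh: Matrix, b_h: Vector) -> List[Vector]:
    """Run a minimal Elman-style RNN over a sequence of input vectors.

    At each step t: h_t = tanh(W_xh @ x_t + W_hh @ h_{t-1} + b_h).
    W_hh is the directed connection from the hidden state at t-1 to the
    hidden state at t, so the same weights are reused along the sequence.
    Returns the hidden state after every step.
    """
    h = [0.0] * len(b_h)  # initial hidden state h_0
    states = []
    for x in xs:
        pre = [a + r + c for a, r, c in zip(matvec(W_xh, x), matvec(W_hh, h), b_h)]
        h = [math.tanh(z) for z in pre]
        states.append(h)
    return states


# Arbitrary placeholder weights (untrained): 2 input features, 3 hidden units.
W_xh = [[0.5, -0.3], [0.2, 0.8], [-0.4, 0.1]]
W_hh = [[0.1, 0.0, 0.2], [-0.3, 0.4, 0.0], [0.0, 0.2, 0.5]]
b_h = [0.0, 0.1, -0.1]

# Hypothetical sequence, e.g. three consecutive periods with two features each.
xs = [[1.0, 0.0], [0.5, -0.5], [0.0, 1.0]]

if __name__ == "__main__":
    for t, h in enumerate(rnn_forward(xs, W_xh, W_hh, b_h), start=1):
        print(f"h_{t} =", [round(v, 3) for v in h])
```

Because the hidden state at step t feeds into step t+1 through `W_hh`, changing the first input changes the final hidden state. With `W_hh` set to zero the recurrence disappears, and each step depends only on its own input.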
References
Abbasi, A., Albrecht, C., Vance, A., & Hansen, J. (2012). MetaFraud: A meta-learning framework for detecting financial fraud. MIS Quarterly, 36(4), 1293–1327.
American Institute of Certified Public Accountants. (2002). Consideration of fraud in a financial statement audit. Statement on Auditing Standards No. 99. New York.
Amiram, D., Bozanic, Z., & Rouen, E. (2015). Financial statement errors: Evidence from the distributional properties of financial statement numbers. Review of Accounting Studies, 20, 1540–1593.
Ashton, R. H. (1974). Behavioral implications of information overload in managerial accounting reports. Cost and Management, 48(4), 37–40.
Baltrušaitis, T., Ahuja, C., & Morency, L. P. (2018). Multimodal machine learning: A survey and taxonomy. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(2), 423–443.
Bao, Y., Ke, B., Li, B., Yu, Y. J., & Zhang, J. (2020). Detecting accounting fraud in publicly traded US firms using a machine learning approach. Journal of Accounting Research, 58(1), 199–235.
Beasley, M. S. (1996). An empirical analysis of the relation between the board of director composition and financial statement fraud. The Accounting Review, 71, 443–465.
Beasley, M. S., Carcello, J. V., & Hermanson, D. R. (1999). Fraudulent financial reporting: 1987–1997. An analysis of U.S. public companies. Sponsored by the Committee of Sponsoring Organizations of the Treadway Commission (COSO).
Beasley, M. S., Carcello, J. V., Hermanson, D. R., & Neal, T. L. (2010). Fraudulent financial reporting: 1998–2007. An analysis of U.S. public companies. Sponsored by the Committee of Sponsoring Organizations of the Treadway Commission (COSO).
Bekker, J., & Davis, J. (2020). Learning from positive and unlabeled data: A survey. Machine Learning, 109(4), 719–760.
Benbasat, I., & Taylor, R. N. (1982). Behavioral aspects of information processing for the design of management information systems. IEEE Transactions on Systems, Man, and Cybernetics, 12(4), 439–450.
Beneish, M. D. (1997). Detecting GAAP violation: Implications for assessing earnings management among firms with extreme financial performance. Journal of Accounting and Public Policy, 16, 271–309.
Beneish, M. D. (1999). The detection of earnings manipulation. Financial Analysts Journal, 55, 24–36.
Beutel, A., Akoglu, L., & Faloutsos, C. (2015). Graph-based user behavior modeling: From prediction to fraud detection. Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. pp. 2309–2310.
Boute, R. N., Gijsbrechts, J., & Van Mieghem, J. A. (2022). Digital lean operations: Smart automation and artificial intelligence in financial services. In V. Babich, J. Birge, & G. Hilary (Eds.), Innovative technology at the interface of finance and operations. Springer Series in Supply Chain Management. Springer Nature.
Brazdil, P., Giraud-Carrier, C., Soares, C., & Vilalta, R. (2008). Metalearning: Applications to data mining. Springer Science & Business Media.
Brazel, J. F., Jones, K. L., & Zimbelman, M. F. (2009). Using nonfinancial measures to assess fraud risk. Journal of Accounting Research, 47(5), 1135–1166.
Brown, N. C., Crowley, R. M., & Elliott, W. B. (2020). What are you saying? Using topic to detect financial misreporting. Journal of Accounting Research, 58, 237–291.
Burns, N., & Kedia, S. (2006). The impact of performance-based compensation on misreporting. Journal of Financial Economics, 79, 35–67.
Cao, S., Yang, X., Chen, C., Zhou, J., Li, X., & Qi, Y. (2019). TitAnt: Online real-time transaction fraud detection in Ant Financial. arXiv preprint arXiv:1906.07407.
Cecchini, M., Aytug, H., Koehler, G. J., & Pathak, P. (2010). Making words work: Using financial text as a predictor of financial events. Decision Support Systems, 50(1), 164–175.
Chen, X., Hilary, G., & Tian, X. (2020). Mandatory data breach transparency and insider trading. Working paper.
Citron, D. K. (2008). Technological due process. Washington University Law Review, 85, 1249.
Darrough, M., Huang, R., & Zhao, S. (2020). Spillover effects of fraud allegations and investor sentiment. Contemporary Accounting Research, 37, 982–1014.
Davidson, R., Dey, A., & Smith, A. (2015). Executives’ “off-the-job” behavior, corporate culture, and financial reporting risk. Journal of Financial Economics, 117(1), 5–28.
de Roux, D., Perez, B., Moreno, A., Villamil, M. D. P., & Figueroa, C. (2018). Tax fraud detection for under-reporting declarations using an unsupervised machine learning approach. Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. pp. 215–222.
Dechow, P. M., Sloan, R. G., & Sweeney, A. P. (1995). Detecting earnings management. The Accounting Review, 70(2), 193–226.
Dechow, P. M., Sloan, R. G., & Sweeney, A. P. (1996). Causes and consequences of earnings manipulation: An analysis of firms subject to enforcement actions by the SEC. Contemporary Accounting Research, 13, 1–36.
Dechow, P. M., Ge, W., Larson, C. R., & Sloan, R. G. (2011). Predicting material accounting misstatements. Contemporary Accounting Research, 28(1), 17–82.
Dong, W., Liao, S., & Zhang, Z. (2018). Leveraging financial social media data for corporate fraud detection. Journal of Management Information Systems, 35(2), 461–487.
Dutta, I., Dutta, S., & Raahemi, B. (2017). Detecting financial restatements using data mining techniques. Expert Systems with Applications, 90, 374–393.
Dyck, A., Morse, A., & Zingales, L. (2020). How pervasive is corporate fraud? Working paper, University of Toronto.
Efendi, J., Srivastava, A., & Swanson, E. P. (2007). Why do corporate managers misstate financial statements? The role of option compensation and other factors. Journal of Financial Economics, 85, 667–708.
Ernst & Young. (2010). Driving ethical growth: New markets, new challenges. 11th Global Fraud Survey. Retrieved from https://linomartins.files.wordpress.com/2011/12/2011th_global_fraud_survey.pdf.
Fawcett, T. (2006). An introduction to ROC analysis. Pattern Recognition Letters, 27, 861–874.
Fiore, U., De Santis, A., Perla, F., Zanetti, P., & Palmieri, F. (2019). Using generative adversarial networks for improving classification effectiveness in credit card fraud detection. Information Sciences, 479, 448–455.
Garip, F. (2020). What failure to predict life outcomes can teach us. Proceedings of the National Academy of Sciences, 117(15), 8234–8235.
Glancy, F. H., & Yadav, S. B. (2011). A computational model for financial reporting fraud detection. Decision Support Systems, 50(3), 595–601.
Green, P., & Choi, J. H. (1997). Assessing the risk of management fraud through neural network technology. Auditing: A Journal of Practice & Theory, 16, 14–29.
Guo, J., Liu, G., Zuo, Y., & Wu, J. (2018). Learning sequential behavior representations for fraud detection. 2018 IEEE international conference on data mining (ICDM). IEEE, pp. 127–136.
Hajek, P., & Henriques, R. (2017). Mining corporate annual reports for intelligent detection of financial statement fraud: A comparative study of machine learning methods. Knowledge-Based Systems, 128, 139–152.
Hastie, T., Tibshirani, R., & Friedman, J. H. (2009). The elements of statistical learning. Springer.
He, H., & Ma, Y. (2013). Imbalanced learning: Foundations, algorithms, and applications. Wiley.
Healy, P. M. (1985). The effect of bonus schemes on accounting decisions. Journal of Accounting and Economics, 7(1), 85–107.
Hobson, J. L., Mayew, W. J., & Venkatachalam, M. (2012). Analyzing speech to detect financial misreporting. Journal of Accounting Research, 50(2), 349–392.
Hoi, S. C., Sahoo, D., Lu, J., & Zhao, P. (2018). Online learning: A comprehensive survey. arXiv preprint arXiv:1802.02871.
Hu, B., Zhang, Z., Shi, C., Zhou, J., Li, X., & Qi, Y. (2019). Cash-out user detection based on attributed heterogeneous information network with a hierarchical attention mechanism. Proceedings of the AAAI Conference on Artificial Intelligence. pp. 946–953.
Humpherys, S. L., Moffitt, K. C., Burns, M. B., Burgoon, J. K., & Felix, W. F. (2011). Identification of fraudulent financial statements using linguistic credibility analysis. Decision Support Systems, 50(3), 585–594.
Iselin, E. R. (1988). The effects of information load and information diversity on decision quality in a structured decision task. Accounting, Organizations and Society, 13(2), 147–164.
Järvelin, K., & Kekäläinen, J. (2002). Cumulated gain-based evaluation of IR techniques. ACM Transactions on Information Systems, 20, 422–446.
Johnson, S. A., Ryan, H. E., & Tian, Y. S. (2009). Managerial incentives and corporate fraud: The sources of incentives matter. Review of Finance, 13, 115–145.
Karpoff, J. M., Lee, D. S., & Martin, G. S. (2008). The costs to firms of cooking the books. Journal of Financial and Quantitative Analysis, 43(3), 581–612.
Karpoff, J. M., Koester, A., Lee, D. S., & Martin, G. S. (2017). Proxies and databases in financial misconduct research. The Accounting Review, 92(6), 129–163.
Kleinberg, J., Ludwig, J., Mullainathan, S., & Obermeyer, Z. (2015). Prediction policy problems. American Economic Review: Papers & Proceedings, 105(5), 491–495.
KPMG Peat Marwick. (1998). Fraud survey. KPMG Peat Marwick.
Larcker, D. F., Richardson, S. A., & Tuna, I. (2007). Corporate governance, accounting outcomes, and organizational performance. The Accounting Review, 82(4), 963–1008.
Larcker, D. F., & Zakolyukina, A. A. (2012). Detecting deceptive discussions in conference calls. Journal of Accounting Research, 50, 495–540.
Li, H., Liu, B., Mukherjee, A., & Shao, J. (2014). Spotting fake reviews using positive-unlabeled learning. Computación y Sistemas, 18(3), 467–475.
Liang, C., Liu, Z., Liu, B., Zhou, J., Li, X., & Yang, S. (2019). Uncovering insurance fraud conspiracy with network learning. Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval. pp. 1181–1184.
Lin, J., Hwang, M., & Becker, J. (2003). A fuzzy neural network for assessing the risk of fraudulent financial reporting. Managerial Auditing Journal, 18, 657–665.
Liu, S., Hooi, B., & Faloutsos, C. (2019). A contrast metric for fraud detection in rich graphs. IEEE Transactions on Knowledge and Data Engineering, 31(12), 2235–2248.
Ngai, E. W., Hu, Y., Wong, Y. H., Chen, Y., & Sun, X. (2011). The application of data mining techniques in financial fraud detection: A classification framework and an academic review of literature. Decision Support Systems, 50(3), 559–569.
Oentaryo, R., Lim, E.-P., Finegold, M., Lo, D., Zhu, F., Phua, C., et al. (2014). Detecting click fraud in online advertising: A data mining approach. The Journal of Machine Learning Research, 15(1), 99–140.
Perols, J. L., Bowen, R. M., Zimmermann, C., & Samba, B. (2017). Finding needles in a haystack: Using data analytics to improve fraud prediction. The Accounting Review, 92, 221–245.
Purda, L., & Skillicorn, D. (2015). Accounting variables, deception, and a bag of words: Assessing the tools of fraud detection. Contemporary Accounting Research, 32(3), 1193–1223.
Salganik, M. J., Lundberg, I., Kindel, A., Ahearn, C., Al-Ghoneim, K., Almaatouq, A., Altschul, D., Brand, J., Carnegie, N., Compton, R., Datta, D., Davidson, T., Filippova, A., Gilroy, C., Goode, B., Jahani, E., Kashyap, R., Kirchner, A., McKay, S., et al. (2020). Measuring the predictability of life outcomes with a scientific mass collaboration. Proceedings of the National Academy of Sciences, 117.
Shah, N., Lamba, H., Beutel, A., & Faloutsos, C. (2017). The many faces of link fraud. 2017 IEEE International Conference on Data Mining (ICDM). IEEE, pp. 1069–1074.
Shmueli, G. (2010). To explain or to predict. Statistical Science, 25, 289–310.
Van Vlasselaer, V., Eliassi-Rad, T., Akoglu, L., Snoeck, M., & Baesens, B. (2017). Gotcha! Network-based fraud detection for social security fraud. Management Science, 63(9), 3090–3110.
Varian, H. R. (2014). Big data: New tricks for econometrics. Journal of Economic Perspectives, 28, 3–28.
Wang, D., Lin, J., Cui, P., Jia, Q., Wang, Z., Fang, Y., et al. (2019a). A Semi-supervised Graph Attentive Network for Financial Fraud Detection. 2019 IEEE International Conference on Data Mining (ICDM). IEEE, pp. 598–607.
Wang, J., Wen, R., Wu, C., Huang, Y., & Xiong, J. (2019b). FdGars: Fraudster detection via graph convolutional networks in online app review system. Companion Proceedings of The 2019 World Wide Web Conference. pp. 310–316.
Wang, Y., Wang, L., Li, Y., He, D., Chen, W., & Liu, T.-Y. (2013). A theoretical analysis of NDCG ranking measures. Proceedings of the 26th Annual Conference on Learning Theory.
Wang, Y., & Xu, W. (2018). Leveraging deep learning with LDA-based text analytics to detect automobile insurance fraud. Decision Support Systems, 105, 87–95.
Whiting, D. G., Hansen, J. V., McDonald, J. B., Albrecht, C., & Albrecht, W. S. (2012). Machine learning methods for detecting patterns of management fraud. Computational Intelligence, 28, 505–527.
Xu, C., Zhang, J., & Sun, Z. (2017). Online reputation fraud campaign detection in user ratings. IJCAI, 3873–3879.
Yuan, S., Wu, X., Li, J., & Lu, A. (2017). Spectrum-based deep neural networks for fraud detection. Proceedings of the 2017 ACM on Conference on Information and Knowledge Management. pp. 2419–2422.
Zhang, J., Yang, X., & Appelbaum, D. (2015). Toward effective big data analysis in continuous auditing. Accounting Horizons, 29(2), 469–476.
Zhang, Y.-L., Zhou, J., Zheng, W., Feng, J., Li, L., Liu, Z., et al. (2019). Distributed deep forest and its application to automatic detection of cash-out fraud. ACM Transactions on Intelligent Systems and Technology (TIST), 10(5), 1–19.
Zheng, P., Yuan, S., Wu, X., Li, J., & Lu, A. (2019). One-class adversarial nets for fraud detection. Proceedings of the AAAI Conference on Artificial Intelligence. pp. 1286–1293.
Zhong, Q., Liu, Y., Ao, X., Hu, B., Feng, J., Tang, J., et al. (2020). Financial defaulter detection on online credit payment via multi-view attributed heterogeneous information network. Proceedings of The Web Conference 2020. pp. 785–795.
Zhu, Y., Xi, D., Song, B., Zhuang, F., Chen, S., Gu, X., et al. (2020). Modeling users’ behavior sequences with hierarchical explainable network for cross-domain fraud detection. Proceedings of The Web Conference 2020. pp. 928–938.
Copyright information
© 2022 Editors
About this chapter
Cite this chapter
Bao, Y., Hilary, G., Ke, B. (2022). Artificial Intelligence and Fraud Detection. In: Babich, V., Birge, J.R., Hilary, G. (eds) Innovative Technology at the Interface of Finance and Operations. Springer Series in Supply Chain Management, vol 11. Springer, Cham. https://doi.org/10.1007/978-3-030-75729-8_8
DOI: https://doi.org/10.1007/978-3-030-75729-8_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-75728-1
Online ISBN: 978-3-030-75729-8
eBook Packages: Economics and Finance (R0)