
FrauDetector+: An Incremental Graph-Mining Approach for Efficient Fraudulent Phone Call Detection

Published: 28 August 2018

Abstract

In recent years, telecommunication fraud has become more rampant internationally with the development of modern technology and global communication. Because of the rapid growth in the volume of call logs, the task of fraudulent phone call detection is confronted with big data issues in real-world implementations. Although our previous work, FrauDetector, addressed this problem and achieved some promising results, it can be further enhanced because it focuses only on fraud detection accuracy, whereas efficiency and scalability are not top priorities. Other known approaches for fraudulent call number detection suffer from long training times or cannot accurately detect fraudulent phone calls in real time. The learning process of FrauDetector is likewise too time-consuming to support real-world applications. Although we have attempted to accelerate the learning process of FrauDetector by parallelization, the parallelized learning process, namely PFrauDetector, still cannot afford the computing cost. In this article, we propose a highly efficient incremental graph-mining-based fraudulent phone call detection approach, namely FrauDetector+, which can automatically label fraudulent phone numbers with a “fraud” tag, a crucial prerequisite for distinguishing fraudulent phone call numbers from nonfraudulent ones. FrauDetector+ initially generates smaller, more manageable subnetworks from the original graph and performs a parallelized weighted HITS algorithm for a significant speed increase in the graph learning module. It adopts a novel aggregation approach to generate a trust (or experience) value for each phone number (or user) based on their respective local values. After the initial procedure, we can incrementally update the trust (or experience) value for each phone number (or user) whenever a new fraudulent phone number is identified. An efficient fraud-centric hash structure is constructed to support fast real-time detection of fraudulent phone numbers in the detection module.
We conduct a comprehensive experimental study based on real datasets collected through an antifraud mobile application called Whoscall. The results demonstrate a significantly improved efficiency of our approach compared with FrauDetector as well as superior performance against other major classifier-based methods.
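
To make the weighted HITS step mentioned in the abstract more concrete, the following is a minimal, single-machine sketch of a generic weighted HITS iteration on a bipartite user/phone-number graph. It is not the authors' implementation: the function name `weighted_hits`, the `(user, number, weight)` edge format, the use of L2 normalization, and the mapping of hub scores to users and authority scores to phone numbers are all assumptions made for illustration. The subnetwork generation, parallelization, aggregation, and incremental update described in the abstract are not shown.

```python
import math
from collections import defaultdict


def _normalize(scores):
    """Scale scores in place to unit L2 norm (no-op if all scores are zero)."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    if norm > 0:
        for key in scores:
            scores[key] /= norm


def weighted_hits(edges, iterations=100, tol=1e-12):
    """Generic weighted HITS on a bipartite graph (illustrative sketch).

    edges: iterable of (user, number, weight) tuples, weight > 0.
    Returns (hub, auth): hub scores per user, authority scores per number.
    The hub/authority role assignment is an assumption, not taken from the paper.
    """
    edges = list(edges)  # the edge list is traversed several times
    hub = {u: 1.0 for u, _, _ in edges}
    auth = {n: 1.0 for _, n, _ in edges}

    for _ in range(iterations):
        # Authority update: weighted sum of the hub scores pointing at each number.
        new_auth = defaultdict(float)
        for u, n, w in edges:
            new_auth[n] += w * hub[u]
        _normalize(new_auth)

        # Hub update: weighted sum of the authority scores each user points at.
        new_hub = defaultdict(float)
        for u, n, w in edges:
            new_hub[u] += w * new_auth[n]
        _normalize(new_hub)

        # Largest change in any score since the previous iteration.
        delta = max(
            [abs(new_hub[u] - hub[u]) for u in hub]
            + [abs(new_auth[n] - auth[n]) for n in auth],
            default=0.0,
        )
        hub, auth = dict(new_hub), dict(new_auth)
        if delta < tol:
            break

    return hub, auth
```

In this sketch an edge weight could stand for any call-log statistic, such as call frequency or duration; which statistic the paper actually uses is not stated in the abstract.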



      Published In

      ACM Transactions on Knowledge Discovery from Data  Volume 12, Issue 6
      December 2018
      327 pages
      ISSN:1556-4681
      EISSN:1556-472X
      DOI:10.1145/3271478

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 28 August 2018
      Accepted: 01 June 2018
      Revised: 01 June 2018
      Received: 01 December 2016
      Published in TKDD Volume 12, Issue 6


      Author Tags

      1. Telecommunication fraud
      2. fraudulent phone call detection
      3. incremental learning
      4. parallelized weighted HITS algorithm
      5. trust value mining

      Qualifiers

      • Research-article
      • Research
      • Refereed

      Funding Sources

      • Ministry of Science and Technology, Taiwan, R.O.C.
