Using relational knowledge discovery to prevent securities fraud

Published: 21 August 2005

Abstract

We describe an application of relational knowledge discovery to a key regulatory mission of the National Association of Securities Dealers (NASD). NASD is the world's largest private-sector securities regulator, with responsibility for preventing and discovering misconduct among securities brokers. Our goal was to help focus NASD's limited regulatory resources on the brokers who are most likely to engage in securities violations. Using statistical relational learning algorithms, we developed models that rank brokers with respect to the probability that they would commit a serious violation of securities regulations in the near future. Our models incorporate organizational relationships among brokers (e.g., past coworker), which domain experts consider important but which have not been easy to use until now. The learned models were subjected to an extensive evaluation that used more than 18 months of data unseen by the model developers and required over two person-weeks of effort by NASD staff. Model predictions were found to correlate highly with the subjective evaluations of experienced NASD examiners. Furthermore, in all performance measures, our models performed as well as or better than the handcrafted rules that are currently in use at NASD.
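
As a rough illustration of the idea in the abstract (ranking brokers by a risk score that draws on relationships such as "past coworker"), the following is a minimal Python sketch. It is not the authors' method: the paper's author tags point to relational probability trees, whereas this toy uses a hand-weighted linear score. The broker IDs, the edge list, the `prior_violation` flags, and the weights `w_own` and `w_rel` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical toy data (not from the paper): broker IDs, undirected
# "past coworker" links, and whether each broker has a prior violation.
past_coworker_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]
prior_violation = {"A": False, "B": True, "C": True, "D": False, "E": False}


def coworker_violation_rate(edges, violations):
    """Relational feature: fraction of a broker's past coworkers with a prior violation."""
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    rates = {}
    for broker in violations:
        linked = neighbors[broker]
        # Brokers with no recorded coworkers get a rate of 0.0 (an assumption).
        rates[broker] = (
            sum(violations[n] for n in linked) / len(linked) if linked else 0.0
        )
    return rates


def rank_brokers(edges, violations, w_own=0.5, w_rel=0.5):
    """Rank brokers by a hand-weighted score (assumed weights, not a learned model)."""
    rates = coworker_violation_rate(edges, violations)
    scores = {
        b: w_own * float(violations[b]) + w_rel * rates[b] for b in violations
    }
    # Highest score first; ties are broken alphabetically by broker ID.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


ranking = rank_brokers(past_coworker_edges, prior_violation)
for broker, score in ranking:
    print(f"{broker}: {score:.3f}")
```

In this toy, broker A has no prior violation but still ranks above D and E because both of A's past coworkers do. That is the kind of signal a purely per-broker feature set would miss.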

Published In

KDD '05: Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
August 2005
844 pages
ISBN: 159593135X
DOI: 10.1145/1081870
Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

1. fraud detection
2. relational probability trees
3. statistical relational learning

Conference

KDD05

Acceptance Rates

Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%
