Article

Using relational knowledge discovery to prevent securities fraud

Authors:

Jennifer Neville,

Özgür Şimşek,

John Komoroske,

Henry Goldberg

KDD '05: Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining

Pages 449 - 458

https://doi.org/10.1145/1081870.1081922

Published: 21 August 2005

Abstract

We describe an application of relational knowledge discovery to a key regulatory mission of the National Association of Securities Dealers (NASD). NASD is the world's largest private-sector securities regulator, with responsibility for preventing and discovering misconduct among securities brokers. Our goal was to help focus NASD's limited regulatory resources on the brokers who are most likely to engage in securities violations. Using statistical relational learning algorithms, we developed models that rank brokers with respect to the probability that they would commit a serious violation of securities regulations in the near future. Our models incorporate organizational relationships among brokers (e.g., past coworker), which domain experts consider important but have not been easily used before now. The learned models were subjected to an extensive evaluation using more than 18 months of data unseen by the model developers and comprising over two person-weeks of effort by NASD staff. Model predictions were found to correlate highly with the subjective evaluations of experienced NASD examiners. Furthermore, in all performance measures, our models performed as well as or better than the handcrafted rules that are currently in use at NASD.
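To make the idea of ranking brokers with a relational feature concrete, here is a minimal sketch in Python. It is not the authors' model: the abstract says only that statistical relational learning algorithms were used. The broker ids, the toy violation records, the coworker links, and the equal 0.5/0.5 weighting are all hypothetical, chosen only to show how a "past coworker" relationship could feed into a score that orders brokers.

```python
from collections import defaultdict

# Hypothetical toy data (not from the paper): broker id -> past violation on record.
past_violation = {"b1": True, "b2": False, "b3": False, "b4": True, "b5": False}

# Hypothetical past-coworker links, given as undirected pairs.
coworker_pairs = [("b1", "b2"), ("b1", "b4"), ("b2", "b3"), ("b4", "b5")]


def build_neighbors(pairs):
    """Turn undirected pairs into an adjacency map of broker -> set of coworkers."""
    neighbors = defaultdict(set)
    for a, b in pairs:
        neighbors[a].add(b)
        neighbors[b].add(a)
    return neighbors


def coworker_violation_rate(broker, neighbors, violations):
    """Relational feature: fraction of a broker's past coworkers with a violation."""
    linked = neighbors.get(broker, set())
    if not linked:
        return 0.0
    return sum(violations[c] for c in linked) / len(linked)


def rank_brokers(violations, pairs):
    """Order brokers from highest to lowest illustrative score.

    The score is an assumed equal-weight mix of the broker's own record and
    the coworker feature; a learned model would replace this hand-set formula.
    Ties are broken by broker id so the ordering is deterministic.
    """
    neighbors = build_neighbors(pairs)
    scores = {
        b: 0.5 * float(violations[b])
        + 0.5 * coworker_violation_rate(b, neighbors, violations)
        for b in violations
    }
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


ranking = rank_brokers(past_violation, coworker_pairs)
for broker, score in ranking:
    print(broker, score)
```

In this toy data, broker b5 has no violation on record but ranks above b2 and b3 because its only past coworker does. That is the kind of signal an attribute-only score would miss.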

Index Terms

Using relational knowledge discovery to prevent securities fraud
1. Computing methodologies
  1. Machine learning
2. Information systems
  1. Information systems applications
    1. Data mining

Published In

KDD '05: Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining

August 2005

844 pages

ISBN:159593135X

DOI:10.1145/1081870

General Chair: Robert Grossman, University of Illinois at Chicago & Open Data Partners, USA

Program Chairs: Roberto Bayardo, IBM Almaden Research, USA; Kristin Bennett, RPI, USA

Copyright © 2005 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

KDD05: The Eleventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

August 21 - 24, 2005

Chicago, Illinois, USA

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Article Metrics

Total Citations: 66

Total Downloads: 1,079

Downloads (last 12 months): 14

Downloads (last 6 weeks): 2

Reflects downloads up to 06 Oct 2024
