Multi-level relationship outlier detection

Authors:

Akiko Campbell,

Jian Pei

International Journal of Business Intelligence and Data Mining, Volume 7, Issue 4

Pages 253 - 273

https://doi.org/10.1504/IJBIDM.2012.051713

Published: 01 January 2012

Abstract

Relationship management is critical in business. In particular, it is important to detect abnormal relationships, such as fraudulent relationships between service providers and consumers. Surprisingly, the literature contains no systematic study on detecting relationship outliers; specifically, no existing method can detect and handle relationship outliers between groups and individuals in groups. In this paper, we tackle this important problem by developing a simple yet effective model. The major novelty is that we identify two types of outliers and devise efficient detection algorithms. Our experiments on both real and synthetic data confirm the effectiveness, efficiency and scalability of our approach. The techniques reported in this paper have been in production in a large-scale business application.

Published In

International Journal of Business Intelligence and Data Mining, Volume 7, Issue 4

January 2012

107 pages

ISSN: 1743-8195

EISSN: 1743-8187

Publisher

Inderscience Publishers

Geneva 15, Switzerland
