Multi-level relationship outlier detection

Published: 01 January 2012

Abstract

Relationship management is critical in business. In particular, it is important to detect abnormal relationships, such as fraudulent relationships between service providers and consumers. Surprisingly, the literature contains no systematic study of relationship outlier detection. Specifically, no existing method can detect and handle relationship outliers between groups and individuals in groups. In this paper, we tackle this important problem by developing a simple yet effective model. The major novelty is that we identify two types of outliers and devise efficient algorithms to detect them. Our experiments on both real and synthetic data confirm the effectiveness, efficiency, and scalability of our approach. The techniques reported in this paper have been deployed in production in a large-scale business application.

Cited By

  • (2021) Multilayer Social Network Overlapping Community Detection Algorithm Based on Trust Relationship. Wireless Communications & Mobile Computing, 2021. DOI: 10.1155/2021/9268039. Online publication date: 1-Jan-2021
  • (2019) Community Detection in Multi-Layer Networks Using Joint Nonnegative Matrix Factorization. IEEE Transactions on Knowledge and Data Engineering, 31:2 (273-286). DOI: 10.1109/TKDE.2018.2832205. Online publication date: 16-Jul-2019

Published In

International Journal of Business Intelligence and Data Mining, Volume 7, Issue 4
January 2012
107 pages
ISSN: 1743-8195
EISSN: 1743-8187

Publisher

Inderscience Publishers
Geneva 15, Switzerland
