Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
research-article
Public Access

Fast, Accurate, and Flexible Algorithms for Dense Subtensor Mining

Published: 23 January 2018 Publication History

Abstract

Given a large-scale and high-order tensor, how can we detect dense subtensors in it? Can we spot them in near-linear time but with quality guarantees? Extensive previous work has shown that dense subtensors, as well as dense subgraphs, indicate anomalous or fraudulent behavior (e.g., lockstep behavior in social networks). However, available algorithms for detecting dense subtensors are not satisfactory in terms of speed, accuracy, and flexibility. In this work, we propose two algorithms, called M-Zoom and M-Biz, for fast and accurate dense-subtensor detection with various density measures. M-Zoom gives a lower bound on the density of detected subtensors, while M-Biz guarantees the local optimality of detected subtensors. M-Zoom and M-Biz can be combined, giving the following advantages: (1) Scalable: scale near-linearly with all aspects of tensors and are up to 114× faster than state-of-the-art methods with similar accuracy, (2) Provably accurate: provide a guarantee on the lowest density and local optimality of the subtensors they find, (3) Flexible: support multi-subtensor detection and size bounds as well as diverse density measures, and (4) Effective: successfully detected edit wars and bot activities in Wikipedia, and spotted network attacks from a TCP dump with near-perfect accuracy (AUC = 0.98).

References

[1]
Leman Akoglu, Mary McGlohon, and Christos Faloutsos. 2010. Oddball: Spotting anomalies in weighted graphs. In Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer, 410--421.
[2]
Reid Andersen and Kumar Chellapilla. 2009. Finding dense subgraphs with size bounds. In Proceedings of International Workshop on Algorithms and Models for the Web-Graph. Springer, 25--37.
[3]
Brett W. Bader and Tamara G. Kolda. 2007. Efficient MATLAB computations with sparse and factored tensors. SIAM Journal on Scientific Computing 30, 1 (2007), 205--231.
[4]
Bahman Bahmani, Ashish Goel, and Kamesh Munagala. 2014. Efficient primal-dual graph algorithms for mapreduce. In Proceedings of International Workshop on Algorithms and Models for the Web-Graph. Springer, 59--78.
[5]
Bahman Bahmani, Ravi Kumar, and Sergei Vassilvitskii. 2012. Densest subgraph in streaming and MapReduce. Proceedings of the VLDB Endowment 5, 5 (2012), 454--465.
[6]
Oana Denisa Balalau, Francesco Bonchi, T. H. Chan, Francesco Gullo, and Mauro Sozio. 2015. Finding subgraphs with maximum total density and limited overlap. In Proceedings of the 8th ACM International Conference on Web Search and Data Mining. ACM, 379--388.
[7]
James Bennett and Stan Lanning. 2007. The Netflix prize. In Proceedings of KDD Cup and Workshop, Vol. 2007. ACM, 35.
[8]
Alex Beutel, Wanhong Xu, Venkatesan Guruswami, Christopher Palow, and Christos Faloutsos. 2013. Copycatch: Stopping group attacks by spotting lockstep behavior in social networks. In Proceedings of the 22nd International Conference on World Wide Web. ACM, 119--130.
[9]
Sayan Bhattacharya, Monika Henzinger, Danupon Nanongkai, and Charalampos Tsourakakis. 2015. Space-and time-efficient algorithm for maintaining dense subgraphs on one-pass dynamic streams. In Proceedings of the 47th Annual ACM Symposium on Theory of Computing. ACM, 173--182.
[10]
Loïc Cerf, Jérémy Besson, Céline Robardet, and Jean-François Boulicaut. 2008. Data peeler: Contraint-based closed pattern mining in n-ary relations. In Proceedings of the 2008 SIAM International Conference on Data Mining. SIAM, 37--48.
[11]
Moses Charikar. 2000. Greedy approximation algorithms for finding dense components in a graph. In Proceedings of International Workshop on Approximation Algorithms for Combinatorial Optimization. Springer, 84--95.
[12]
Gideon Dror, Noam Koenigstein, Yehuda Koren, and Markus Weimer. 2012. The yahoo! music dataset and KDD-cup’11. In Proceedings of KDD Cup 2011. ACM, 3--18.
[13]
Alessandro Epasto, Silvio Lattanzi, and Mauro Sozio. 2015. Efficient densest subgraph computation in evolving graphs. In Proceedings of the 24th International Conference on World Wide Web. ACM, 300--310.
[14]
Esther Galbrun, Aristides Gionis, and Nikolaj Tatti. 2016. Top-k overlapping densest subgraphs. Data Mining and Knowledge Discovery 30, 5 (2016), 1134--1165.
[15]
David Gibson, Ravi Kumar, and Andrew Tomkins. 2005. Discovering large dense subgraphs in massive graphs. In Proceedings of the 31st International Conference on Very Large Data Bases. VLDB Endowment, 721--732.
[16]
Andrew V. Goldberg. 1984. Finding a Maximum Density Subgraph. University of California Berkeley, CA.
[17]
Bryan Hooi, Kijung Shin, Hyun Ah Song, Alex Beutel, Neil Shah, and Christos Faloutsos. 2017. Graph-based fraud detection in the face of camouflage. ACM Transactions on Knowledge Discovery from Data 11, 4 (2017), 44.
[18]
Dmitry I. Ignatov, Sergei O. Kuznetsov, Jonas Poelmans, and Leonid E. Zhukov. 2013. Can triconcepts become triclusters?International Journal of General Systems 42, 6 (2013), 572--593.
[19]
Meng Jiang, Alex Beutel, Peng Cui, Bryan Hooi, Shiqiang Yang, and Christos Faloutsos. 2015. A general suspiciousness metric for dense blocks in multimodal data. In Proceedings of the 2015 IEEE International Conference on Data Mining (ICDM’15). IEEE, 781--786.
[20]
Meng Jiang, Peng Cui, Alex Beutel, Christos Faloutsos, and Shiqiang Yang. 2014a. Catchsync: Catching synchronized behavior in large directed graphs. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 941--950.
[21]
Meng Jiang, Peng Cui, Alex Beutel, Christos Faloutsos, and Shiqiang Yang. 2014b. Inferring strange behavior from connectivity pattern in social networks. In Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer, 126--138.
[22]
Ravi Kannan and V. Vinay. 1999. Analyzing the Structure of Large Graphs. Rheinische Friedrich-Wilhelms-Universität Bonn.
[23]
Samir Khuller and Barna Saha. 2009. On finding dense subgraphs. In International Colloquium on Automata, Languages, and Programming. Springer, 597--608.
[24]
Tamara G. Kolda and Brett W. Bader. 2009. Tensor decompositions and applications. SIAM Review 51, 3 (2009), 455--500.
[25]
Jérôme Kunegis. 2013. Konect: The Koblenz network collection. In Proceedings of the 22nd International Conference on World Wide Web. ACM, 1343--1350.
[26]
Victor E. Lee, Ning Ruan, Ruoming Jin, and Charu Aggarwal. 2010. A survey of algorithms for dense subgraph discovery. In Proceedings of Managing and Mining Graph Data. Springer, 303--336.
[27]
Koji Maruhashi, Fan Guo, and Christos Faloutsos. 2011. Multiaspectforensics: Pattern mining on large-scale heterogeneous networks with tensor analysis. In Proceedings of 2011 International Conference on Advances in Social Networks Analysis and Mining (ASONAM’11). IEEE, 203--210.
[28]
Julian McAuley, Rahul Pandey, and Jure Leskovec. 2015. Inferring networks of substitutable and complementary products. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 785--794.
[29]
Andrew McGregor, David Tench, Sofya Vorotnikova, and Hoa T. Vu. 2015. Densest subgraph in dynamic graph streams. In Proceedings of International Symposium on Mathematical Foundations of Computer Science. Springer, 472--482.
[30]
Alan Mislove, Massimiliano Marcon, Krishna P. Gummadi, Peter Druschel, and Bobby Bhattacharjee. 2007. Measurement and analysis of online social networks. In Proceedings of the 7th ACM SIGCOMM Conference on Internet Measurement. ACM, 29--42.
[31]
Shashank Pandit, Duen Horng Chau, Samuel Wang, and Christos Faloutsos. 2007. Netprobe: A fast and scalable system for fraud detection in online auction networks. In Proceedings of the 16th International Conference on World Wide Web. ACM, 201--210.
[32]
B. Aditya Prakash, Ashwin Sridharan, Mukund Seshadri, Sridhar Machiraju, and Christos Faloutsos. 2010. Eigenspokes: Surprising patterns and scalable community chipping in large graphs. In Proceedings of Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer, 435--448.
[33]
Jouni K. Seppänen and Heikki Mannila. 2004. Dense itemsets. In Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 683--688.
[34]
Neil Shah, Alex Beutel, Brian Gallagher, and Christos Faloutsos. 2014. Spotting suspicious link behavior with fbox: An adversarial perspective. In Proceedings of 2014 IEEE International Conference on Data Mining (ICDM’14). IEEE, 959--964.
[35]
Kijung Shin, Tina Eliassi-Rad, and Christos Faloutsos. 2016a. CoreScope: Graph mining using k-core analysis - patterns, anomalies and algorithms. In Proceedings of the 2016 IEEE 16th International Conference on Data Mining (ICDM’16). IEEE, 469--478.
[36]
Kijung Shin, Bryan Hooi, and Christos Faloutsos. 2016b. M-Zoom: Fast dense-block detection in tensors with quality guarantees. In Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 264--280.
[37]
Kijung Shin, Bryan Hooi, Jisu Kim, and Christos Faloutsos. 2017a. D-cube: Dense-block detection in terabyte-scale tensors. In Proceedings of the 10th ACM International Conference on Web Search and Data Mining. ACM, 681--689.
[38]
Kijung Shin, Bryan Hooi, Jisu Kim, and Christos Faloutsos. 2017b. DenseAlert: Incremental dense-subtensor detection in tensor streams. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 1057--1066.
[39]
Charalampos Tsourakakis, Francesco Bonchi, Aristides Gionis, Francesco Gullo, and Maria Tsiarli. 2013. Denser than the densest subgraph: Extracting optimal quasi-cliques with quality guarantees. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 104--112.

Cited By

View all
  • (2024)Collaborative Fraud Detection on Large Scale Graph Using Secure Multi-Party ComputationProceedings of the 33rd ACM International Conference on Information and Knowledge Management10.1145/3627673.3679863(1473-1482)Online publication date: 21-Oct-2024
  • (2024)Interrelated Dense Pattern Detection in Multilayer NetworksIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2024.339868336:11(6462-6476)Online publication date: Nov-2024
  • (2023)Hierarchical Dense Pattern Detection in TensorsACM Transactions on Knowledge Discovery from Data10.1145/357702217:6(1-29)Online publication date: 28-Feb-2023
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Transactions on Knowledge Discovery from Data
ACM Transactions on Knowledge Discovery from Data  Volume 12, Issue 3
June 2018
360 pages
ISSN:1556-4681
EISSN:1556-472X
DOI:10.1145/3178546
Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 23 January 2018
Accepted: 01 October 2017
Revised: 01 June 2017
Received: 01 December 2016
Published in TKDD Volume 12, Issue 3

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Tensor
  2. anomaly detection
  3. dense subtensor
  4. fraud detection

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)57
  • Downloads (Last 6 weeks)7
Reflects downloads up to 09 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Collaborative Fraud Detection on Large Scale Graph Using Secure Multi-Party ComputationProceedings of the 33rd ACM International Conference on Information and Knowledge Management10.1145/3627673.3679863(1473-1482)Online publication date: 21-Oct-2024
  • (2024)Interrelated Dense Pattern Detection in Multilayer NetworksIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2024.339868336:11(6462-6476)Online publication date: Nov-2024
  • (2023)Hierarchical Dense Pattern Detection in TensorsACM Transactions on Knowledge Discovery from Data10.1145/357702217:6(1-29)Online publication date: 28-Feb-2023
  • (2023)Telecom Fraud Detection via Hawkes-Enhanced Sequence ModelIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2022.315080335:5(5311-5324)Online publication date: 1-May-2023
  • (2023)W3Detector: Detecting Fraudulent Online Sellers Based on Temporal and Spacial Information2023 International Conference on Machine Learning and Applications (ICMLA)10.1109/ICMLA58977.2023.00280(1845-1851)Online publication date: 15-Dec-2023
  • (2023)Hypercore decomposition for non-fragile hyperedges: concepts, algorithms, observations, and applicationsData Mining and Knowledge Discovery10.1007/s10618-023-00956-237:6(2389-2437)Online publication date: 8-Aug-2023
  • (2022)Abnormal User Detection via Multiview Graph Clustering in the Mobile e-Commerce NetworkWireless Communications and Mobile Computing10.1155/2022/37668102022(1-17)Online publication date: 9-Aug-2022
  • (2022)Blackmarket-Driven Collusion on Online Media: A SurveyACM/IMS Transactions on Data Science10.1145/35179312:4(1-37)Online publication date: 17-May-2022
  • (2022)Finding a Concise, Precise, and Exhaustive Set of Near Bi-Cliques in Dynamic GraphsProceedings of the Fifteenth ACM International Conference on Web Search and Data Mining10.1145/3488560.3498390(908-916)Online publication date: 11-Feb-2022
  • (2022)Two-stage anomaly detection algorithm via dynamic community evolution in temporal graphApplied Intelligence10.1007/s10489-021-03109-452:11(12222-12240)Online publication date: 2-Feb-2022
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Get Access

Login options

Full Access

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media