DOI: 10.1145/3318464.3389697
research-article
Open access

Efficient Algorithms for Densest Subgraph Discovery on Large Directed Graphs

Published: 31 May 2020

    Abstract

    Given a directed graph G, the directed densest subgraph (DDS) problem asks for a subgraph of G whose density is the highest among all subgraphs of G. The DDS problem is fundamental to a wide range of applications, such as fraud detection, community mining, and graph compression. However, existing DDS solutions suffer from efficiency and scalability problems: on a three-thousand-edge graph, one of the best exact algorithms takes three days to complete. In this paper, we develop an efficient and scalable DDS solution. We introduce the notion of the [x, y]-core, which is a dense subgraph of G, and show that the densest subgraph can be accurately located through the [x, y]-core with theoretical guarantees. Based on the [x, y]-core, we develop exact and approximation algorithms. We have performed an extensive evaluation of our approaches on eight large real datasets. The results show that our proposed solutions are up to six orders of magnitude faster than the state of the art.
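
    The abstract does not spell out the density measure or the [x, y]-core definition, so the following Python sketch rests on two assumptions taken from the wider DDS literature rather than from this page: density of a vertex-set pair (S, T) is |E(S, T)| / sqrt(|S| * |T|), and an [x, y]-core is the maximal (S, T) in which every vertex of S has out-degree at least x and every vertex of T has in-degree at least y. The function names are hypothetical, and the exhaustive search only illustrates the problem statement; it is not the paper's algorithm.

```python
from itertools import combinations
from math import sqrt


def density(edges, S, T):
    """Assumed density of the (S, T)-induced subgraph: |E(S, T)| / sqrt(|S| * |T|)."""
    if not S or not T:
        return 0.0
    count = sum(1 for u, v in edges if u in S and v in T)
    return count / sqrt(len(S) * len(T))


def nonempty_subsets(items):
    """Yield every non-empty subset of items as a frozenset."""
    items = sorted(items)
    for r in range(1, len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def brute_force_dds(edges):
    """Try every (S, T) pair. Exponential time, so usable on tiny graphs only."""
    sources = {u for u, _ in edges}
    targets = {v for _, v in edges}
    best = (0.0, frozenset(), frozenset())
    for S in nonempty_subsets(sources):
        for T in nonempty_subsets(targets):
            d = density(edges, S, T)
            if d > best[0]:
                best = (d, S, T)
    return best


def xy_core(edges, x, y):
    """Peel edges until every remaining source has out-degree >= x and every
    remaining target has in-degree >= y (assumed definition; expects x, y >= 1)."""
    live = set(edges)
    changed = True
    while changed:
        changed = False
        out_deg, in_deg = {}, {}
        for u, v in live:
            out_deg[u] = out_deg.get(u, 0) + 1
            in_deg[v] = in_deg.get(v, 0) + 1
        keep = {(u, v) for u, v in live if out_deg[u] >= x and in_deg[v] >= y}
        if keep != live:
            live, changed = keep, True
    return {u for u, _ in live}, {v for _, v in live}
```

    On the toy edge list [(0, 2), (0, 3), (1, 2), (1, 3), (3, 4)], brute_force_dds returns density 2.0 with S = {0, 1} and T = {2, 3}, and xy_core with x = y = 2 returns the same pair. This small case is consistent with the abstract's claim that the densest subgraph can be located through an [x, y]-core, though a single example proves nothing in general.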

    Supplementary Material

    Source Code (3318464.3389697_source_code.zip)
    Read me (3318464.3389697_readme.pdf)
    MP4 File (3318464.3389697.mp4)
    Presentation Video

    Published In

    SIGMOD '20: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data
    June 2020
    2925 pages
    ISBN:9781450367356
    DOI:10.1145/3318464
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. densest subgraph discovery
    2. directed graph

    Qualifiers

    • Research-article

    Conference

    SIGMOD/PODS '20

    Acceptance Rates

    Overall Acceptance Rate 785 of 4,003 submissions, 20%

    Article Metrics

    • Downloads (Last 12 months)401
    • Downloads (Last 6 weeks)50

    Cited By

    • (2024) A Counting-based Approach for Efficient k-Clique Densest Subgraph Discovery. Proceedings of the ACM on Management of Data, 2:3 (1-27). DOI: 10.1145/3654922. Online publication date: 30-May-2024
    • (2024) A Similarity-based Approach for Efficient Large Quasi-clique Detection. Proceedings of the ACM on Web Conference 2024 (401-409). DOI: 10.1145/3589334.3645374. Online publication date: 13-May-2024
    • (2024) A Unified and Scalable Algorithm Framework of User-Defined Temporal $(k,\mathcal{X})$-Core Query. IEEE Transactions on Knowledge and Data Engineering (1-15). DOI: 10.1109/TKDE.2023.3349310. Online publication date: 2024
    • (2024) Unified Dense Subgraph Detection: Fast Spectral Theory Based Algorithms. IEEE Transactions on Knowledge and Data Engineering, 36:3 (1356-1370). DOI: 10.1109/TKDE.2023.3272574. Online publication date: Mar-2024
    • (2024) Maximizing $(k,L)$-Core With Edge Augmentation in Multilayer Graphs. IEEE Transactions on Computational Social Systems, 11:3 (3931-3943). DOI: 10.1109/TCSS.2023.3332091. Online publication date: Jun-2024
    • (2024) Diversity-Optimized Group Extraction in Social Networks. IEEE Transactions on Computational Social Systems, 11:1 (756-769). DOI: 10.1109/TCSS.2022.3224935. Online publication date: Mar-2024
    • (2024) On Searching Maximum Directed $(k, \ell)$-Plex. 2024 IEEE 40th International Conference on Data Engineering (ICDE) (2570-2583). DOI: 10.1109/ICDE60146.2024.00202. Online publication date: 13-May-2024
    • (2024) Influence maximization on hypergraphs via multi-hop influence estimation. Information Processing & Management, 61:3 (103683). DOI: 10.1016/j.ipm.2024.103683. Online publication date: May-2024
    • (2024) Efficient and effective algorithms for densest subgraph discovery and maintenance. The VLDB Journal. DOI: 10.1007/s00778-024-00855-y. Online publication date: 8-May-2024
    • (2023) Scalable Time-Range k-Core Query on Temporal Graphs. Proceedings of the VLDB Endowment, 16:5 (1168-1180). DOI: 10.14778/3579075.3579089. Online publication date: 1-Jan-2023