On Directed Densest Subgraph Discovery

Published: 15 November 2021

Abstract

Given a directed graph G, the directed densest subgraph (DDS) problem is to find a subgraph of G whose density is the highest among all subgraphs of G. The DDS problem is fundamental to a wide range of applications, such as fraud detection, community mining, and graph compression. However, existing DDS solutions suffer from efficiency and scalability problems: on a 3,000-edge graph, one of the best exact algorithms takes three days to complete. In this article, we develop an efficient and scalable DDS solution. We introduce the notion of the [x, y]-core, a dense subgraph of G, and show that the densest subgraph can be accurately located through the [x, y]-core with theoretical guarantees. Based on the [x, y]-core, we develop exact and approximation algorithms. We further study two related problems, maintaining the DDS over dynamic directed graphs and finding the weighted DDS on weighted directed graphs, and develop efficient non-trivial algorithms for both by extending our DDS algorithms. We have evaluated our approaches extensively on 15 large real datasets. The results show that our proposed solutions are up to six orders of magnitude faster than the state of the art.
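
The abstract does not spell out the density measure or the [x, y]-core, so the following Python sketch rests on assumptions taken from the standard directed-densest-subgraph literature: the density of a vertex pair (S, T) is assumed to be |E(S, T)| / sqrt(|S| |T|), and an [x, y]-core is assumed to be the largest pair (S, T) in which every vertex of S has at least x out-edges into T and every vertex of T has at least y in-edges from S. The function names `density`, `xy_core`, and `brute_force_dds` are hypothetical, and the brute-force search is only a reference for tiny graphs, not one of the article's algorithms.

```python
from itertools import combinations
from math import sqrt


def density(edges, S, T):
    """Assumed directed density: |E(S, T)| / sqrt(|S| * |T|)."""
    if not S or not T:
        return 0.0
    crossing = sum(1 for u, v in edges if u in S and v in T)
    return crossing / sqrt(len(S) * len(T))


def xy_core(edges, x, y):
    """Peel vertices until every u in S keeps >= x out-edges into T
    and every v in T keeps >= y in-edges from S (assumed definition)."""
    S = {u for u, _ in edges}
    T = {v for _, v in edges}
    while True:
        out_deg = {u: 0 for u in S}
        in_deg = {v: 0 for v in T}
        for u, v in edges:
            if u in S and v in T:
                out_deg[u] += 1
                in_deg[v] += 1
        drop_S = {u for u in S if out_deg[u] < x}
        drop_T = {v for v in T if in_deg[v] < y}
        if not drop_S and not drop_T:
            return S, T
        S -= drop_S
        T -= drop_T


def brute_force_dds(edges):
    """Exhaustive search over all (S, T) pairs; exponential, toy graphs only."""
    nodes = sorted({w for e in edges for w in e})
    subsets = [set(c) for r in range(1, len(nodes) + 1)
               for c in combinations(nodes, r)]
    return max(((density(edges, S, T), S, T)
                for S in subsets for T in subsets),
               key=lambda item: item[0])


# Toy graph (made up for illustration): two sources pointing at two sinks,
# plus one stray edge 4 -> 5.
edges = [(1, 3), (1, 4), (2, 3), (2, 4), (4, 5)]
best_density, best_S, best_T = brute_force_dds(edges)
core_S, core_T = xy_core(edges, 2, 2)
print(best_density, best_S, best_T)
print(core_S, core_T)
```

On this toy graph the [2, 2]-core and the exhaustively found densest pair are the same (S, T), which illustrates, without proving, the idea that a suitable [x, y]-core can locate the densest subgraph while examining far fewer candidates than an exhaustive search.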

Published In

ACM Transactions on Database Systems, Volume 46, Issue 4
December 2021
169 pages
ISSN: 0362-5915
EISSN: 1557-4644
DOI: 10.1145/3492445

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 15 November 2021
Accepted: 01 August 2021
Revised: 01 July 2021
Received: 01 August 2020
Published in TODS Volume 46, Issue 4

Author Tags

  1. Directed graph
  2. densest subgraph discovery

Qualifiers

  • Research-article
  • Refereed

Funding Sources

  • Research Grants Council of Hong Kong
  • University of Hong Kong
  • Innovation and Technology Commission of Hong Kong
  • HKU-TCL Joint Research Center for Artificial Intelligence
  • NSFC
  • CUHK-SZ
  • Natural Sciences and Engineering Research Council of Canada (NSERC)
  • ARC
