DOI: 10.1145/3178876.3186111

Provable and Practical Approximations for the Degree Distribution using Sublinear Graph Samples

Published: 23 April 2018

    Abstract

    The degree distribution is one of the most fundamental properties used in the analysis of massive graphs. There is a large literature on graph sampling, where the goal is to estimate properties (especially the degree distribution) of a large graph through a small, random sample. Estimating the degree distribution of real-world graphs poses a significant challenge, due to their heavy-tailed nature and the large variance in degrees. We design a new algorithm, SADDLES, for this problem, using recent mathematical techniques from the field of sublinear algorithms. The SADDLES algorithm gives provably accurate outputs for all values of the degree distribution. For the analysis, we define two fatness measures of the degree distribution, called the h-index and the z-index. We prove that SADDLES is sublinear in the graph size when these indices are large. A corollary of this result is a provably sublinear algorithm for any degree distribution bounded below by a power law. We deploy our new algorithm on a variety of real datasets and demonstrate its excellent empirical behavior. In all instances, we get extremely accurate approximations for all values in the degree distribution by observing at most 1% of the vertices. This is a major improvement over the state-of-the-art sampling algorithms, which typically sample more than 10% of the vertices to give comparable results. We also observe that the h- and z-indices of real graphs are large, validating our theoretical analysis.
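
To make the quantities in the abstract concrete, the following is a minimal Python sketch. It is not the SADDLES algorithm, which the abstract does not specify. It illustrates the cumulative degree count for a threshold d, a Hirsch-style h-index applied to a degree sequence (an assumed reading of "h-index", since the abstract does not state the definition), and a naive uniform vertex-sampling estimator of the kind the abstract contrasts against. All function names are hypothetical.

```python
import random
from typing import Sequence


def ccdh(degrees: Sequence[int], d: int) -> int:
    """Exact number of vertices whose degree is at least d."""
    return sum(1 for x in degrees if x >= d)


def h_index(degrees: Sequence[int]) -> int:
    """Largest h such that at least h vertices have degree >= h.

    Assumption: a Hirsch-style h-index applied to the degree sequence;
    the abstract names the measure but does not define it.
    """
    ranked = sorted(degrees, reverse=True)
    h = 0
    for rank, deg in enumerate(ranked, start=1):
        if deg >= rank:
            h = rank
        else:
            break
    return h


def naive_ccdh_estimate(
    degrees: Sequence[int], d: int, sample_size: int, rng: random.Random
) -> float:
    """Estimate ccdh(d) from uniform vertex samples drawn with replacement.

    This is a plain baseline estimator, not SADDLES: it scales the
    fraction of sampled vertices with degree >= d by the vertex count.
    """
    n = len(degrees)
    hits = sum(
        1 for _ in range(sample_size) if degrees[rng.randrange(n)] >= d
    )
    return n * hits / sample_size
```

On a heavy-tailed degree sequence, the few high-degree vertices are rarely hit by uniform sampling, so the naive estimate has high variance in the tail for small sample sizes. That is the difficulty the abstract attributes to the large variance in degrees.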


    Information & Contributors

    Information

    Published In

    WWW '18: Proceedings of the 2018 World Wide Web Conference
    April 2018
    2000 pages
    ISBN:9781450356398
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Sponsors

    • IW3C2: International World Wide Web Conference Committee

    Publisher

    International World Wide Web Conferences Steering Committee

    Republic and Canton of Geneva, Switzerland

    Author Tags

    1. degree distribution
    2. graphs
    3. sampling
    4. sublinear

    Qualifiers

    • Research-article

    Funding Sources

    • Azrieli Foundation
    • Israel Science Foundation
    • Blavatnik fund
    • Laboratory Directed Research and Development program at Sandia National Laboratories
    • NSF TRIPODS
    • Simons Institute

    Conference

    WWW '18
    Sponsor:
    • IW3C2
    WWW '18: The Web Conference 2018
    April 23 - 27, 2018
    Lyon, France

    Acceptance Rates

    WWW '18 Paper Acceptance Rate 170 of 1,155 submissions, 15%;
    Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (last 12 months): 99
    • Downloads (last 6 weeks): 18

    Cited By

    • (2024) EXGC: Bridging Efficiency and Explainability in Graph Condensation. Proceedings of the ACM on Web Conference 2024, 721-732. DOI: 10.1145/3589334.3645551. Online publication date: 13-May-2024
    • (2024) Brave the Wind and the Waves: Discovering Robust and Generalizable Graph Lottery Tickets. IEEE Transactions on Pattern Analysis and Machine Intelligence 46:5, 3388-3405. DOI: 10.1109/TPAMI.2023.3342184. Online publication date: May-2024
    • (2024) Denoising Item Graph With Disentangled Learning for Recommendation. IEEE Transactions on Knowledge and Data Engineering 36:7, 2942-2955. DOI: 10.1109/TKDE.2024.3361482. Online publication date: Jul-2024
    • (2023) Robust Graph Structure Learning with Virtual Nodes Construction. Mathematics 11:6, 1397. DOI: 10.3390/math11061397. Online publication date: 13-Mar-2023
    • (2023) Interpretable Sparsification of Brain Graphs: Better Practices and Effective Designs for Graph Neural Networks. Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 1223-1234. DOI: 10.1145/3580305.3599394. Online publication date: 6-Aug-2023
    • (2023) DeMEtRIS: Counting (near)-Cliques by Crawling. Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining, 312-320. DOI: 10.1145/3539597.3570438. Online publication date: 27-Feb-2023
    • (2023) Interactive Activities Initiation through Retrieving Hidden Social Information Networks. 2023 IEEE International Conference on Data Mining (ICDM), 538-547. DOI: 10.1109/ICDM58522.2023.00063. Online publication date: 1-Dec-2023
    • (2023) The nature and nurture of network evolution. Nature Communications 14:1. DOI: 10.1038/s41467-023-42856-5. Online publication date: 3-Nov-2023
    • (2023) A matrix completion bootstrap method for estimating scale-free network degree distribution. Knowledge-Based Systems, 110803. DOI: 10.1016/j.knosys.2023.110803. Online publication date: Jul-2023
    • (2022) Bring orders into uncertainty. Proceedings of the 36th ACM International Conference on Supercomputing, 1-14. DOI: 10.1145/3524059.3532379. Online publication date: 28-Jun-2022
