DOI: 10.1145/3447548.3467354

Bavarian: Betweenness Centrality Approximation with Variance-Aware Rademacher Averages

Published: 14 August 2021

Abstract

We present Bavarian, a collection of sampling-based algorithms for approximating the Betweenness Centrality (BC) of all vertices in a graph. Our algorithms use Monte-Carlo Empirical Rademacher Averages (MCERAs), a concept from statistical learning theory, to efficiently compute tight bounds on the maximum deviation of the estimates from the exact values. The MCERAs provide a sample-dependent approximation guarantee much stronger than the state of the art, thanks to their use of variance-aware probabilistic tail bounds. The flexibility of the MCERAs allows us to introduce a unifying framework that can be instantiated with existing sampling-based estimators of BC, thus allowing a fair comparison between them, decoupled from the sample-complexity results with which they were originally introduced. Additionally, we prove novel sample-complexity results showing that, for all estimators, the sample size sufficient to achieve a desired approximation guarantee depends on the vertex-diameter of the graph, an easy-to-bound characteristic quantity. We also present progressive-sampling algorithms and extensions to other centrality measures, such as percolation centrality. Our extensive experimental evaluation of Bavarian shows the improvement over the state of the art made possible by the MCERAs, and it allows us to assess the different trade-offs between sample size and accuracy guarantee offered by the different estimators.
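
To make the central quantity concrete, the following is a minimal Python sketch of a k-trial MCERA: the average, over k independent vectors of Rademacher (uniform ±1) variables, of the supremum over the function family of the signed sample mean. It is an illustration under stated assumptions, not Bavarian's implementation. The function name `mcera`, the layout of `values` (one row per vertex, one column per sample), and the toy numbers are hypothetical, and the variance-aware tail bounds that turn this quantity into an approximation guarantee are omitted.

```python
import random


def mcera(values, k, rng):
    """Return the k-trial Monte-Carlo Empirical Rademacher Average.

    values[f][i] is the value of the f-th function of the family on the
    i-th sample. In the BC setting one may think of one function per
    vertex, with values in [0, 1] (an assumption of this sketch).
    """
    m = len(values[0])  # sample size
    total = 0.0
    for _ in range(k):
        # One vector of m independent Rademacher variables.
        sigma = [rng.choice((-1, 1)) for _ in range(m)]
        # Supremum over the family of the sigma-weighted sample mean.
        total += max(
            sum(s * v for s, v in zip(sigma, row)) / m
            for row in values
        )
    return total / k


# Hypothetical toy input: 3 vertices (rows) evaluated on 4 samples (columns).
toy = [
    [0.0, 0.5, 1.0, 0.0],
    [1.0, 0.0, 0.0, 0.5],
    [0.0, 0.0, 0.0, 0.0],
]
print(mcera(toy, k=100, rng=random.Random(42)))
```

Because the supremum is taken separately for every Rademacher vector, the result depends only on the sample at hand, which is what makes the resulting guarantee sample-dependent. Increasing `k` reduces the Monte-Carlo noise of the estimate at the cost of more passes over the sample.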

Published In

KDD '21: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining
August 2021
4259 pages
ISBN: 9781450383325
DOI: 10.1145/3447548
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. concentration bounds
  2. dynamic graphs
  3. percolation centrality
  4. random sampling
  5. sample complexity
  6. statistical learning theory

Qualifiers

  • Research-article

Conference

KDD '21
Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%
