research-article

Planting trees for scalable and efficient canonical hub labeling

Published: 09 December 2019

Abstract

Hub labeling is widely used to improve the latency and throughput of Point-to-Point Shortest Distance (PPSD) queries in graph databases. However, constructing a hub labeling, even with the state-of-the-art Pruned Landmark Labeling (PLL) algorithm, is computationally intensive. Moreover, PLL has a sequential dependency on the root order: labels produced for earlier roots are used to prune the searches from later roots, which makes it challenging to parallelize. As a result, existing parallel approaches often suffer from increased label size, poor scalability, and an inability to process large weighted graphs.
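To make the query and the root-order dependency concrete, here is a minimal Python sketch of sequential PLL on an undirected graph with non-negative edge weights. It follows the general pruned-Dijkstra idea of Akiba et al. rather than any code from this paper; the adjacency-list representation and the names `query` and `pruned_landmark_labeling` are illustrative assumptions. Each label maps a hub to its distance, and a PPSD query takes the minimum of d(u, h) + d(h, v) over hubs h shared by both labels. The same query drives pruning, so the search from each root reads labels written by all roots ranked before it.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical representation: undirected graph as adjacency lists with
# non-negative weights; every edge must be listed in both directions.
Graph = Dict[int, List[Tuple[int, float]]]
Labels = Dict[int, Dict[int, float]]  # vertex -> {hub: distance to hub}

INF = float("inf")


def query(labels: Labels, u: int, v: int) -> float:
    """PPSD query: min over common hubs h of d(u, h) + d(h, v)."""
    lu, lv = labels[u], labels[v]
    if len(lu) > len(lv):
        lu, lv = lv, lu  # iterate over the smaller label
    return min((d + lv[h] for h, d in lu.items() if h in lv), default=INF)


def pruned_landmark_labeling(graph: Graph, order: List[int]) -> Labels:
    """Sequential PLL sketch: one pruned Dijkstra search per root, in rank order."""
    labels: Labels = {v: {} for v in graph}
    for root in order:
        dist = {root: 0.0}
        heap = [(0.0, root)]
        while heap:
            d, v = heapq.heappop(heap)
            if d > dist[v]:
                continue  # stale heap entry
            # Pruning consults labels written by EARLIER roots: this is the
            # sequential root-order dependency.
            if query(labels, root, v) <= d:
                continue
            labels[v][root] = d
            for w, wt in graph[v]:
                nd = d + wt
                if nd < dist.get(w, INF):
                    dist[w] = nd
                    heapq.heappush(heap, (nd, w))
    return labels
```

Because a root's pruning decisions are only correct once every earlier root has finished, the outer loop cannot simply be split across threads without either admitting redundant labels or re-checking them afterward, which is one way label size can grow under naive parallelization.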
In this paper, we develop novel algorithms that construct the Canonical Hub Labeling (CHL), which is guaranteed to be minimal for a given vertex order, on shared-memory and distributed-memory parallel systems in a scalable and efficient manner. Our key contribution, the PLaNT algorithm, provides an embarrassingly parallel approach to label construction that scales well beyond the limits of current practice. Our approach is the first to employ a collaborative label partitioning scheme across multiple nodes of a cluster, enabling completely in-memory labeling and parallel querying on massive graphs whose labels cannot fit on a single node.
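The abstract does not spell out how PLaNT removes the root-order dependency, so the following is only an illustrative sketch of building each root's labels independently, not the paper's algorithm. It relies on the standard characterization of a canonical labeling: hub h belongs to L(v) exactly when no vertex ranked above h lies on any shortest path between h and v. Each root therefore runs an unpruned Dijkstra search that tracks, for every vertex, the highest-ranked vertex seen on any shortest path, and needs no labels from other roots. Assumptions: strictly positive edge weights, exact (for example integer-valued) distances so that ties compare equal, and rank given by position in `order`; `labels_from_root` and `plant_labels` are hypothetical names, and the real PLaNT may differ, for instance in how it terminates searches early.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Graph = Dict[int, List[Tuple[int, float]]]  # undirected, both directions listed
Labels = Dict[int, Dict[int, float]]         # vertex -> {hub: distance}


def labels_from_root(graph: Graph, rank: Dict[int, int], root: int) -> List[Tuple[int, float]]:
    """Return (v, d(root, v)) for every v whose canonical label contains root.

    best[v] is the smallest rank (most important vertex) on ANY shortest
    root->v path. Assumes strictly positive weights, so all shortest-path
    predecessors of v are settled before v, and exact distances, so ties
    compare equal.
    """
    dist = {root: 0.0}
    best = {root: rank[root]}
    heap = [(0.0, root)]
    settled = set()
    found = []
    while heap:
        d, v = heapq.heappop(heap)
        if v in settled:
            continue  # stale heap entry
        settled.add(v)
        if best[v] == rank[root]:
            found.append((v, d))  # nothing ranked above root on any shortest path
        for w, wt in graph[v]:
            nd = d + wt
            cand = min(best[v], rank[w])
            if nd < dist.get(w, float("inf")):
                dist[w], best[w] = nd, cand  # strictly shorter path replaces best
                heapq.heappush(heap, (nd, w))
            elif nd == dist[w]:
                best[w] = min(best[w], cand)  # tie: merge another shortest path
    return found


def plant_labels(graph: Graph, order: List[int], workers: int = 4) -> Labels:
    """Build labels with one independent, unpruned search per root.

    `order` must list every vertex; position 0 is the highest rank.
    """
    rank = {v: i for i, v in enumerate(order)}
    labels: Labels = {v: {} for v in graph}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # No search reads labels produced by another root, so roots may run in any order.
        for root, pairs in pool.map(lambda r: (r, labels_from_root(graph, rank, r)), order):
            for v, d in pairs:
                labels[v][root] = d
    return labels
```

Threads keep the example short; under CPython's global interpreter lock they will not speed up this pure-Python search, so a process pool or a compiled implementation would be needed for real gains. The trade-off the sketch illustrates is extra work per root (no label-based pruning) in exchange for removing all cross-root dependencies.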
On a single node with 72 threads, our shared-memory algorithm is up to 47.4X faster than sequential PLL. Its labeling time is comparable to that of the state-of-the-art shared-memory paraPLL, while its label size is 17% smaller on average.
PLaNT demonstrates superior parallel scalability. It can process significantly larger graphs and construct labelings orders of magnitude faster than the state-of-the-art distributed paraPLL. Compared to the best shared-memory parallel algorithm, it achieves up to 9.5X speedup on a 64-node cluster.
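The collaborative label partitioning scheme can be illustrated with a toy, single-process simulation. This sketch assumes that labels are split by hub, with every entry for hub h stored on node `h % num_nodes`; that rule, the in-memory "nodes", and the names `partition_by_hub`, `local_query`, and `distributed_query` are hypothetical stand-ins, not the paper's scheme. Because all entries for a given hub land on the same node, each hub common to u and v is visible to exactly one node, so a MIN reduction over per-node partial answers returns the same distance as a query against the full labels.

```python
from typing import Dict, List

Labels = Dict[int, Dict[int, float]]  # vertex -> {hub: distance}
INF = float("inf")


def partition_by_hub(labels: Labels, num_nodes: int) -> List[Labels]:
    """Hypothetical rule: all entries for hub h live on node h % num_nodes."""
    parts: List[Labels] = [{v: {} for v in labels} for _ in range(num_nodes)]
    for v, hubs in labels.items():
        for h, d in hubs.items():
            parts[h % num_nodes][v][h] = d
    return parts


def local_query(part: Labels, u: int, v: int) -> float:
    """Partial answer using only the hubs stored on one node."""
    lu, lv = part[u], part[v]
    return min((d + lv[h] for h, d in lu.items() if h in lv), default=INF)


def distributed_query(parts: List[Labels], u: int, v: int) -> float:
    # Each simulated node answers from its own slice; a MIN reduction combines them.
    return min(local_query(p, u, v) for p in parts)
```

In a real cluster, the final `min` would be a reduction across nodes (for example an MPI reduction with `MPI_MIN`), and each node would only need memory for its own slice of every label.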

Cited By

  • (2024) Optimization Study of Terrace Restoration Sampling Governance Model Based on Fuzzy Clustering Algorithm. 2024 Asia-Pacific Conference on Software Engineering, Social Network Analysis and Intelligent Computing (SSAIC), pp. 288-291. DOI: 10.1109/SSAIC61213.2024.00060. Online publication date: 10-Jan-2024.
  • (2024) Scalable Distance Labeling Maintenance and Construction for Dynamic Small-World Networks. 2024 IEEE 40th International Conference on Data Engineering (ICDE), pp. 4573-4585. DOI: 10.1109/ICDE60146.2024.00348. Online publication date: 13-May-2024.
  • (2024) Exact and Approximate Hierarchical Hub Labeling. WALCOM: Algorithms and Computation, pp. 194-211. DOI: 10.1007/978-981-97-0566-5_15. Online publication date: 18-Mar-2024.

Published In

Proceedings of the VLDB Endowment, Volume 13, Issue 4
December 2019
167 pages
ISSN: 2150-8097

Publisher

VLDB Endowment

Publication History

Published in PVLDB Volume 13, Issue 4
