TAPER: query-aware, partition-enhancement for large, heterogenous graphs

Published: 01 June 2017

Abstract

Graph partitioning has long been seen as a viable approach to addressing Graph DBMS scalability. A partitioning, however, may introduce extra query processing latency unless it is sensitive to a specific query workload and optimised to minimise inter-partition traversals for that workload. It should also be possible to incrementally adjust the partitioning in reaction to changes in the graph topology, the query workload, or both. Because of their complexity, current partitioning algorithms fall short of one or both of these requirements, as they are designed for offline use and as one-off operations. The TAPER system aims to address both requirements whilst leveraging existing partitioning algorithms. TAPER takes any given initial partitioning as a starting point and iteratively adjusts it by swapping chosen vertices across partitions, heuristically reducing the probability of inter-partition traversals for a given workload of path queries. Iterations are inexpensive thanks to time and space optimisations in the underlying support data structures. We evaluate TAPER on two different large test graphs and over realistic query workloads. Our results indicate that, given a hash-based partitioning, TAPER reduces the number of inter-partition traversals by ~80%; given an unweighted Metis partitioning, by ~30%. These reductions are achieved within eight iterations and with the additional advantage of being workload-aware and usable online.
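
To make the idea of workload-aware refinement concrete, the following is a minimal sketch in Python. It is not TAPER's actual algorithm or data structures, which the abstract does not specify. It assumes a workload given as a list of (vertex path, frequency) pairs, and it uses greedy single-vertex moves under a partition size cap as a simplification of vertex swapping. The names `cut_traversals`, `refine`, and `max_size` are hypothetical, and the default of eight iterations merely echoes the figure quoted above.

```python
from collections import Counter, defaultdict


def cut_traversals(paths, part):
    """Count workload traversals that cross a partition boundary.

    paths: list of (vertex_sequence, frequency) pairs (assumed workload format).
    part:  dict mapping each vertex to its partition id.
    """
    total = 0
    for seq, freq in paths:
        for u, v in zip(seq, seq[1:]):
            if part[u] != part[v]:
                total += freq
    return total


def refine(paths, part, k, max_size, iterations=8):
    """Greedily move vertices towards the partition they traverse to most.

    Illustrative heuristic only: a vertex moves when that strictly reduces
    the weighted cut and the target partition has room (size < max_size).
    """
    part = dict(part)  # work on a copy; the initial partitioning is untouched

    # Workload-weighted adjacency: how often each vertex pair is traversed.
    weight = defaultdict(Counter)
    for seq, freq in paths:
        for u, v in zip(seq, seq[1:]):
            if u != v:  # self-loops never cross partitions
                weight[u][v] += freq
                weight[v][u] += freq

    sizes = Counter(part.values())
    for _ in range(iterations):
        moved = False
        for v in sorted(weight):
            cur = part[v]
            # Total traversal weight from v into each partition.
            pull = Counter()
            for n, w in weight[v].items():
                pull[part[n]] += w
            best, best_gain = cur, 0
            for p in range(k):
                gain = pull[p] - pull[cur]
                if gain > best_gain and sizes[p] < max_size:
                    best, best_gain = p, gain
            if best != cur:
                part[v] = best
                sizes[cur] -= 1
                sizes[best] += 1
                moved = True
        if not moved:  # converged early
            break
    return part


# Usage: a toy workload over four vertices and an alternating initial placement.
workload = [(["a", "b", "c"], 5), (["c", "d"], 1)]
initial = {"a": 0, "b": 1, "c": 0, "d": 1}
refined = refine(workload, initial, k=2, max_size=3)
print(cut_traversals(workload, initial), cut_traversals(workload, refined))
```

Because every accepted move strictly lowers the weighted cut, the loop cannot oscillate. The size cap stands in for a balance constraint, which a real system would need so that all vertices do not collapse into one partition.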

Cited By

  • Helios. Proceedings of the 2019 International Conference on Management of Data, pp. 1820-1822 (2019). doi:10.1145/3299869.3300103. Online publication date: 25-Jun-2019
  • Experimental Analysis of Streaming Algorithms for Graph Partitioning. Proceedings of the 2019 International Conference on Management of Data, pp. 1375-1392 (2019). doi:10.1145/3299869.3300076. Online publication date: 25-Jun-2019

Published In

Distributed and Parallel Databases  Volume 35, Issue 2
June 2017
112 pages

Publisher

Kluwer Academic Publishers

United States

Author Tags

  1. Graph databases
  2. Graph repartitioning
  3. Workload mining

Qualifiers

  • Article
