research-article

A general-purpose query-centric framework for querying big graphs

Published: 01 March 2016

Abstract

Pioneered by Google's Pregel, many distributed systems have been developed for large-scale graph analytics. These systems employ a user-friendly "think like a vertex" programming model, and exhibit good scalability for tasks where the majority of graph vertices participate in computation. However, the design of these systems can seriously under-utilize the resources in a cluster for processing light-workload graph queries, where only a small fraction of vertices need to be accessed. In this work, we develop a new open-source system, called Quegel, for querying big graphs. Quegel treats queries as first-class citizens in its design: users only need to specify the Pregel-like algorithm for a generic query, and Quegel processes light-workload graph queries on demand, using a novel superstep-sharing execution model to effectively utilize the cluster resources. Quegel further provides a convenient interface for constructing graph indexes, which significantly improve query performance but are not supported by existing graph-parallel systems. Our experiments verified that Quegel is highly efficient in answering various types of graph queries and is up to orders of magnitude faster than existing systems.
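The superstep-sharing idea described in the abstract can be illustrated with a toy, single-process sketch. This is not Quegel's interface or implementation: the function name `run_shared_supersteps`, the `capacity` parameter, and the choice of hop-distance (BFS) queries are all assumptions made for illustration. It only shows the general pattern: several light queries advance together in each superstep, each touching only its own active vertices, and waiting queries are admitted at superstep boundaries.

```python
from collections import deque


def run_shared_supersteps(graph, queries, capacity=2):
    """Answer hop-distance queries (source, target) with shared supersteps.

    graph: dict mapping a vertex to a list of its out-neighbours.
    queries: list of (source, target) pairs, admitted in order.
    capacity: hypothetical cap on the number of queries in flight.
    Returns (answers, supersteps); answers[i] is the hop distance for
    queries[i], or None if the target is unreachable.
    """
    pending = deque(enumerate(queries))
    in_flight = {}  # query id -> per-query state (frontier, visited set, ...)
    answers = {}
    supersteps = 0

    while pending or in_flight:
        # Admit waiting queries at the superstep boundary.
        while pending and len(in_flight) < capacity:
            qid, (src, dst) = pending.popleft()
            in_flight[qid] = {"dst": dst, "frontier": {src},
                              "seen": {src}, "dist": 0}

        supersteps += 1  # one superstep serves every in-flight query

        for qid in list(in_flight):
            q = in_flight[qid]
            if q["dst"] in q["frontier"]:
                answers[qid] = q["dist"]  # target reached, query finishes
                del in_flight[qid]
                continue

            # Only this query's active vertices (its frontier) do any work.
            next_frontier = set()
            for v in q["frontier"]:
                for w in graph.get(v, []):
                    if w not in q["seen"]:
                        q["seen"].add(w)
                        next_frontier.add(w)

            if not next_frontier:
                answers[qid] = None  # frontier exhausted, target unreachable
                del in_flight[qid]
            else:
                q["frontier"] = next_frontier
                q["dist"] += 1

    return [answers[i] for i in range(len(queries))], supersteps


if __name__ == "__main__":
    chain = {1: [2], 2: [3], 3: [4], 4: []}
    print(run_shared_supersteps(chain, [(1, 3), (2, 4), (4, 1)]))
```

In this sketch the per-superstep overhead (a synchronization barrier in a real cluster) is paid once for all in-flight queries rather than once per query, which is the intuition behind sharing supersteps among light-workload queries.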


Published In

Proceedings of the VLDB Endowment, Volume 9, Issue 7
March 2016
96 pages
ISSN: 2150-8097

Publisher

VLDB Endowment

Qualifiers

• Research-article
