Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
10.1145/3035918.3064012acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article

ZipG: A Memory-efficient Graph Store for Interactive Queries

Published: 09 May 2017 Publication History

Abstract

We present ZipG, a distributed memory-efficient graph store for serving interactive graph queries. ZipG achieves memory efficiency by storing the input graph data using a compressed representation. What differentiates ZipG from other graph stores is its ability to execute a wide range of graph queries directly on this compressed representation. ZipG can thus execute a larger fraction of queries in main memory, achieving query interactivity. ZipG exposes a minimal API that is functionally rich enough to implement published functionalities from several industrial graph stores. We demonstrate this by implementing and evaluating graph queries from Facebook TAO, LinkBench, Graph Search and several other workloads on top of ZipG. On a single server with 244GB memory, ZipG executes tens of thousands of queries from these workloads for raw graph data over half a TB; this leads to an order of magnitude (sometimes as much as 23×) higher throughput than Neo4j and Titan. We get similar gains in distributed settings compared to Titan.

References

[1]
Breadth First Search. https://en.wikipedia.org/wiki/Breadth-first_search.
[2]
Building a follower model from scratch. https://engineering.pinterest.com/blog/building-follower-model-scratch.
[3]
Demining the "Join Bomb" with graph queries. http://neo4j.com/blog/demining-the-join-bomb-with-graph-queries/.
[4]
FlockDB. https://github.com/twitter/flockdb.
[5]
Function Shipping: Separating Logical and Physical Tiers. https://docs.oracle.com/cd/A87860_01/doc/appdev.817/a86030/adx16nt4.htm.
[6]
gMark Queries for LDBC Social Network Benchmark. https://github.com/graphMark/gmark/tree/master/demo/social/social-translated.
[7]
Introducing FlockDB. https://blog.twitter.com/2010/introducing-flockdb.
[8]
Introducing Graph Search Beta. http://newsroom.fb.com/news/2013/01/introducing-graph-search-beta/.
[9]
LinkBench. https://github.com/facebookarchive/linkbench.
[10]
Microsoft GraphView. https://github.com/facebookarchive/linkbench.
[11]
Neo4j. http://neo4j.com/.
[12]
Neo4j Pushes Graph DB Limits Past a Quadrillion Nodes. https://www.datanami.com/2016/04/26/neo4j-pushes-graph-db-limits-past-quadrillion-nodes/.
[13]
openCypher. http://www.opencypher.org.
[14]
OrientDB. http://orientdb.com/.
[15]
Property Graph Model. https://github.com/tinkerpop/blueprints/wiki/Property-Graph-Model.
[16]
Sparksee by Sparsity Technologies. http://www.sparsity-technologies.com.
[17]
Suffix Array. http://en.wikipedia.org/wiki/Suffix_array.
[18]
Titan. http://thinkaurelius.github.io/titan/.
[19]
Titan Data Model. http://s3.thinkaurelius.com/docs/titan/current/data-model.html.
[20]
Virtuoso Universal Server. http://virtuoso.openlinksw.com.
[21]
R. Agarwal, A. Khandelwal, and I. Stoica. Succinct: Enabling Queries on Compressed Data. In USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2015.
[22]
A. Ahmed, N. Shervashidze, S. Narayanamurthy, V. Josifovski, and A. J. Smola. Distributed Large-scale Natural Graph Factorization. In ACM International Conference on World Wide Web (WWW), 2013.
[23]
A. Anand, C. Muthukrishnan, S. Kappes, A. Akella, and S. Nath. Cheap and large cams for high performance data-intensive networked systems. In NSDI, 2010.
[24]
T. G. Armstrong, V. Ponnekanti, D. Borthakur, and M. Callaghan. Linkbench: A database benchmark based on the facebook social graph. In ACM International Conference on Management of Data (SIGMOD), 2013.
[25]
G. Bagan, A. Bonifati, R. Ciucanu, G. H. L. Fletcher, A. Lemay, and N. Advokaat. Generating flexible workloads for graph databases. Proceedings of the VLDB Endowment (PVLDB), 9(13):1457--1460, 2016.
[26]
P. Barceló Baeza. Querying graph databases. In ACM Symposium on Principles of Database Systems (PODS), pages 175--188, 2013.
[27]
Bharat, Krishna and Broder, Andrei and Henzinger, Monika and Kumar, Puneet and Venkatasubramanian, Suresh. The connectivity server: Fast access to linkage information on the web. Computer networks and ISDN Systems, 30, 1998.
[28]
P. Boldi and S. Vigna. The Webgraph Framework I: Compression Techniques. In International Conference on World Wide Web (WWW), 2004.
[29]
N. Bronson, Z. Amsden, G. Cabrera, P. Chakka, P. Dimov, H. Ding, J. Ferris, A. Giardullo, S. Kulkarni, H. Li, et al. TAO: Facebook\textquoterights Distributed Data Store for the Social Graph. In USENIX Annual Technical Conference (ATC), 2013.
[30]
D. Calvanese, G. De Giacomo, M. Lenzerini, and M. Y. Vardi. Rewriting of regular expressions and regular path queries. In ACM Symposium on Principles of Database Systems (PODS), pages 194--204, 1999.
[31]
F. Chang, J. Dean, S. Ghemawat, W. C. Hsieh, D. A. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. E. Gruber. Bigtable: A distributed storage system for structured data. ACM Transactions on Computer Systems (TOCS), 26(2):4, 2008.
[32]
F. Chierichetti, R. Kumar, S. Lattanzi, M. Mitzenmacher, A. Panconesi, and P. Raghavan. On Compressing Social Networks. In ACM International Conference on Knowledge Discovery and Data Mining (KDD), 2009.
[33]
I. F. Cruz, A. O. Mendelzon, and P. T. Wood. A graphical query language supporting recursion. In ACM International Conference on Management of Data (SIGMOD), pages 323--330, 1987.
[34]
C. Curino, E. Jones, Y. Zhang, and S. Madden. Schism: a Workload-Driven Approach to Database Replication and Partitioning. Proceedings of the VLDB Endowment, 2010.
[35]
A. Dubey, G. D. Hill, R. Escriva, and E. G. Sirer. Weaver: A High-Performance, Transactional Graph Store Based on Refinable Timestamps. CoRR, abs/1509.08443, 2015.
[36]
O. Erling, A. Averbuch, J. Larriba-Pey, H. Chafi, A. Gubichev, A. Prat, M.-D. Pham, and P. Boncz. The ldbc social network benchmark: Interactive workload. In ACM International Conference on Management of Data (SIGMOD), pages 619--630, 2015.
[37]
W. Fan, J. Li, X. Wang, and Y. Wu. Query Preserving Graph Compression. In ACM International Conference on Management of Data (SIGMOD), 2012.
[38]
J. E. Gonzalez, R. S. Xin, A. Dave, D. Crankshaw, M. J. Franklin, and I. Stoica. GraphX: Graph Processing in a Distributed Dataflow Framework. In USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2014.
[39]
C. Hernández and G. Navarro. Compression of Web and Social Graphs supporting Neighbor and Community Queries. In ACM Workshop on Social Network mining and Analysis (SNAKDD), 2011.
[40]
C. Hernández and G. Navarro. Compressed Representations for Web and Social Graphs. Knowledge and Information Systems, 40(2), 2014.
[41]
Jure Leskovec and Andrej Krevl. SNAP Datasets: Stanford Large Network Dataset Collection. http://snap.stanford.edu/data, 2014.
[42]
A. Khandelwal, R. Agarwal, and I. Stoica. BlowFish: Dynamic Storage-Performance Tradeoff in Data Stores. In USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2016.
[43]
A. Kyrola, G. E. Blelloch, and C. Guestrin. GraphChi: Large-Scale Graph Computation on Just a PC. In USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2012.
[44]
Lakshman, Avinash and Malik, Prashant. Cassandra: A Decentralized Structured Storage System. ACM SIGOPS Operating Systems Review, 44, 2010.
[45]
K. Lang. Finding good nearly balanced cuts in power law graphs. Preprint, 2004.
[46]
J. Leskovec, K. J. Lang, A. Dasgupta, and M. W. Mahoney. Community Structure in Large Networks: Natural Cluster Sizes and the Absence of Large Well-Defined Clusters. Internet Mathematics, 2009.
[47]
L. Libkin and D. Vrgo\vc. Regular path queries on graphs with data. In ACM International Conference on Database Theory (ICDT), pages 74--85, 2012.
[48]
Y. Low, J. E. Gonzalez, A. Kyrola, D. Bickson, C. E. Guestrin, and J. Hellerstein. GraphLab: A New Framework For Parallel Machine Learning. arXiv preprint arXiv:1408.2041, 2014.
[49]
A. Maccioni and D. J. Abadi. Scalable pattern matching over compressed graphs via dedensification. In ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), 2016.
[50]
U. Manber and G. Myers. Suffix Arrays: A new method for on-line string searches. SIAM Journal on Computing (SICOMP), 1993.
[51]
N. Martinez-Bazan, S. Gómez-Villamor, and F. Escalé-Claveras. DEX: A high-performance graph database management system. In IEEE Data Engineering Workshops (ICDEW), 2011.
[52]
H. Maserrat and J. Pei. Neighbor Query Friendly Compression of Social Networks. In ACM International Conference on Knowledge Discovery and Data Mining (KDD), 2010.
[53]
J. C. Mogul, F. Douglis, A. Feldmann, and B. Krishnamurthy. Potential benefits of delta encoding and data compression for HTTP. In ACM SIGCOMM Computer Communication Review, 1997.
[54]
B. Shao, H. Wang, and Y. Li. Trinity: A Distributed Graph Engine on a Memory Cloud. In ACM International Conference on Management of Data (SIGMOD). ACM, 2013.
[55]
J. Shun and G. E. Blelloch. Ligra: A Lightweight Graph Processing Framework for Shared Memory. In ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoP), 2013.
[56]
J. Shun, L. Dhulipala, and G. Blelloch. Smaller and Faster: Parallel Processing of Compressed Graphs with Ligra
[57]
. In IEEE Data Compression Conference (DCC), 2015.
[58]
Y. J. Song, M. K. Aguilera, R. Kotla, and D. Malkhi. RPC Chains: Efficient Client-server Communication in Geodistributed Systems. In USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2009.
[59]
R. Wang, C. Conrad, and S. Shah. Using Set Cover to Optimize a Large-Scale Low Latency Distributed Graph. In Workshop on Hot Topics in Cloud Computing (HotCloud), 2013.

Cited By

View all
  • (2024)LSGraph: A Locality-centric High-performance Streaming Graph EngineProceedings of the Nineteenth European Conference on Computer Systems10.1145/3627703.3650076(33-49)Online publication date: 22-Apr-2024
  • (2024)PDG: A Prefetcher for Dynamic Graph UpdatingIEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems10.1109/TCAD.2023.333588043:4(1246-1259)Online publication date: Apr-2024
  • (2024)PlatoD2GL: An Efficient Dynamic Deep Graph Learning System for Graph Neural Network Training on Billion-Scale Graphs2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00191(2421-2434)Online publication date: 13-May-2024
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
SIGMOD '17: Proceedings of the 2017 ACM International Conference on Management of Data
May 2017
1810 pages
ISBN:9781450341974
DOI:10.1145/3035918
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 09 May 2017

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. graph compression
  2. graph store
  3. interactive graph queries

Qualifiers

  • Research-article

Funding Sources

  • DHS Award
  • NSF CISE Expeditions Award

Conference

SIGMOD/PODS'17
Sponsor:

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)42
  • Downloads (Last 6 weeks)4
Reflects downloads up to 15 Oct 2024

Other Metrics

Citations

Cited By

View all
  • (2024)LSGraph: A Locality-centric High-performance Streaming Graph EngineProceedings of the Nineteenth European Conference on Computer Systems10.1145/3627703.3650076(33-49)Online publication date: 22-Apr-2024
  • (2024)PDG: A Prefetcher for Dynamic Graph UpdatingIEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems10.1109/TCAD.2023.333588043:4(1246-1259)Online publication date: Apr-2024
  • (2024)PlatoD2GL: An Efficient Dynamic Deep Graph Learning System for Graph Neural Network Training on Billion-Scale Graphs2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00191(2421-2434)Online publication date: 13-May-2024
  • (2024)Lossy Compression of Adjacency Matrices by Graph Filter BanksICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)10.1109/ICASSP48485.2024.10448045(9386-9390)Online publication date: 14-Apr-2024
  • (2023)DomainNet: Homograph Detection and Understanding in Data Lake DisambiguationACM Transactions on Database Systems10.1145/361291948:3(1-40)Online publication date: 12-Sep-2023
  • (2023)CompressGraph: Efficient Parallel Graph Analytics with Rule-Based CompressionProceedings of the ACM on Management of Data10.1145/35886841:1(1-31)Online publication date: 30-May-2023
  • (2023)Enabling Efficient Random Access to Hierarchically Compressed Text Data on Diverse GPU PlatformsIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2023.329434134:10(2699-2717)Online publication date: Oct-2023
  • (2023)Practice of Streaming Processing of Dynamic Graphs: Concepts, Models, and SystemsIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2021.313167734:6(1860-1876)Online publication date: 1-Jun-2023
  • (2022)Optimizing random access to hierarchically-compressed data on GPUProceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis10.5555/3571885.3571908(1-15)Online publication date: 13-Nov-2022
  • (2022)Programmable multi-dimensional table filters for line rate network functionsProceedings of the ACM SIGCOMM 2022 Conference10.1145/3544216.3544266(649-662)Online publication date: 22-Aug-2022
  • Show More Cited By

View Options

Get Access

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media