DOI: 10.5555/3323298.3323322
Article

GRAPHONE: a data store for real-time analytics on evolving graphs

Published: 25 February 2019

Abstract

There is a growing need to perform real-time analytics on evolving graphs in order to deliver the value of big data to users. The key requirement of such applications is a data store that efficiently supports their diverse data access patterns while concurrently ingesting fine-grained updates at high velocity. Unfortunately, current graph systems, whether graph databases or analytics engines, are not designed to achieve high performance for both operations. To address this challenge, we have designed and developed GRAPHONE, a graph data store that combines two complementary graph storage formats (edge list and adjacency list) and uses dual versioning to decouple graph computations from updates. Importantly, it presents a new data abstraction, GraphView, that enables data access at two different granularities with only a small amount of data duplication. Experimental results show that GRAPHONE achieves an ingestion rate two to three orders of magnitude higher than graph databases, while delivering algorithmic performance comparable to a static graph system. Compared to a state-of-the-art dynamic graph system, GRAPHONE delivers a 5.36× higher update rate and over 3× better analytics performance.
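To make the combination of an edge list and an adjacency list more concrete, here is a toy sketch under loudly stated assumptions. It is not GRAPHONE's implementation: the class and method names (`HybridGraphStore`, `archive`, `static_view`, `stream_view`) and the `archive_threshold` batching parameter are hypothetical, and the store is insert-only, single-threaded, and in-memory. New edges are appended to a log, which keeps ingestion cheap; once enough accumulate, they are archived in one batch into per-vertex adjacency lists, and a per-vertex degree snapshot is recorded at each archive step. Because adjacency lists are only ever appended to, truncating each list at a recorded degree recovers an older version, which loosely mirrors the idea of analytics reading a stable snapshot while updates continue. The two view methods illustrate access at two granularities: whole adjacency lists versus individual recent edges.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class HybridGraphStore:
    """Toy hybrid store: an append-only edge log plus adjacency lists.

    Hypothetical illustration only; not GRAPHONE's actual data layout.
    Insert-only, single-threaded, in-memory.
    """
    archive_threshold: int = 4                      # hypothetical batching parameter
    edge_log: list = field(default_factory=list)    # (src, dst) pairs in arrival order
    archived_upto: int = 0                          # log offset already moved into adjacency
    adjacency: dict = field(default_factory=lambda: defaultdict(list))
    # Per-vertex degrees recorded at each archive step. Since adjacency lists
    # only grow, a recorded degree marks the end of that version's prefix.
    versions: list = field(default_factory=list)

    def add_edge(self, src: int, dst: int) -> None:
        # Ingestion is a cheap append to the edge log.
        self.edge_log.append((src, dst))
        if len(self.edge_log) - self.archived_upto >= self.archive_threshold:
            self.archive()

    def archive(self) -> None:
        # Move the not-yet-archived log suffix into the adjacency lists in one batch,
        # then record a degree snapshot for this new version.
        for src, dst in self.edge_log[self.archived_upto:]:
            self.adjacency[src].append(dst)
        self.archived_upto = len(self.edge_log)
        self.versions.append({v: len(nbrs) for v, nbrs in self.adjacency.items()})

    def static_view(self, version: int = -1) -> dict:
        # Coarse-grained view: adjacency lists as of an archived version,
        # obtained by truncating each list at that version's recorded degree.
        degrees = self.versions[version] if self.versions else {}
        return {v: self.adjacency[v][:d] for v, d in degrees.items()}

    def stream_view(self, since: int = 0) -> list:
        # Fine-grained view: raw edges from the log starting at offset `since`.
        return self.edge_log[since:]


if __name__ == "__main__":
    store = HybridGraphStore(archive_threshold=3)
    for edge in [(0, 1), (0, 2), (1, 2), (2, 0)]:
        store.add_edge(*edge)
    print(store.static_view())                    # edges archived so far
    print(store.stream_view(store.archived_upto))  # edges still only in the log
```

In this sketch, batching the log-to-adjacency move amortizes the cost of touching many adjacency lists, at the price that the newest edges are visible only through the log view until the next archive step.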

Published In

Guide Proceedings
FAST'19: Proceedings of the 17th USENIX Conference on File and Storage Technologies
February 2019
373 pages
ISBN:9781931971485

Sponsors

  • VMware
  • NetApp
  • Amazon
  • Google Inc.
  • NSF

Publisher

USENIX Association

United States

Qualifiers

  • Article

Article Metrics

  • Downloads (last 12 months): 0
  • Downloads (last 6 weeks): 0
Reflects downloads up to 18 Aug 2024

Cited By

  • (2024) LSGraph: A Locality-centric High-performance Streaming Graph Engine. Proceedings of the Nineteenth European Conference on Computer Systems, pages 33-49. DOI: 10.1145/3627703.3650076. Online publication date: 22-Apr-2024.
  • (2021) Symmetric continuous subgraph matching with bidirectional dynamic programming. Proceedings of the VLDB Endowment, 14:8, pages 1298-1310. DOI: 10.14778/3457390.3457395. Online publication date: 21-Oct-2021.
  • (2021) Teseo and the analysis of structural dynamic graphs. Proceedings of the VLDB Endowment, 14:6, pages 1053-1066. DOI: 10.14778/3447689.3447708. Online publication date: 12-Apr-2021.
  • (2021) Towards Next-Generation Cybersecurity with Graph AI. ACM SIGOPS Operating Systems Review, 55:1, pages 61-67. DOI: 10.1145/3469379.3469386. Online publication date: 6-Jun-2021.
  • (2021) SpZip. Proceedings of the 48th Annual International Symposium on Computer Architecture, pages 1069-1082. DOI: 10.1109/ISCA52012.2021.00087. Online publication date: 14-Jun-2021.
  • (2019) SIMD-X. Proceedings of the 2019 USENIX Conference on Usenix Annual Technical Conference, pages 411-427. DOI: 10.5555/3358807.3358843. Online publication date: 10-Jul-2019.
  • (2019) CGraph. ACM Transactions on Storage, 15:2, pages 1-26. DOI: 10.1145/3319406. Online publication date: 20-Apr-2019.
  • (2019) GraphM. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pages 1-14. DOI: 10.1145/3295500.3356143. Online publication date: 17-Nov-2019.