DOI: 10.1145/2882903.2882938
research-article

Hybrid Pulling/Pushing for I/O-Efficient Distributed and Iterative Graph Computing

Published: 14 June 2016

Abstract

Billion-node graphs are rapidly growing in size in many applications such as online social networks. Most graph algorithms generate a large number of messages during iterative computations. Vertex-centric distributed systems usually store graph data and message data on disk to improve scalability. Currently, these distributed systems with disk-resident data take a push-based approach to handle messages. This works well if few messages reside on disk. Otherwise, it is I/O-inefficient due to expensive random writes. By contrast, the existing memory-resident pull-based approach individually pulls messages for each vertex on demand. Although it can be used to avoid disk operations regarding messages, expensive I/O costs are incurred by random and frequent access to vertices.
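To make the push/pull distinction concrete, here is a minimal in-memory sketch of one superstep done both ways. It is illustrative only and is not taken from the paper: the toy graph, the PageRank-style message, and all function names are assumptions, and nothing here touches disk.

```python
from collections import defaultdict

# Toy directed graph: vertex -> list of out-neighbors (hypothetical data).
OUT_EDGES = {0: [1, 2], 1: [2], 2: [0], 3: [2]}
VALUES = {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0}


def message(src):
    """Value a vertex sends along each out-edge (PageRank-style share)."""
    return VALUES[src] / len(OUT_EDGES[src])


def push_superstep():
    """Sender-driven: each vertex writes a message into every receiver's inbox."""
    inbox = defaultdict(list)
    for src, dsts in OUT_EDGES.items():
        for dst in dsts:
            # Writes are scattered across receivers; on disk these become random writes.
            inbox[dst].append(message(src))
    return {v: sum(inbox[v]) for v in VALUES}


def pull_superstep():
    """Receiver-driven: each vertex reads from its in-neighbors on demand."""
    in_edges = defaultdict(list)
    for src, dsts in OUT_EDGES.items():
        for dst in dsts:
            in_edges[dst].append(src)
    # No inbox is materialized, but every receiver looks up its senders individually.
    return {v: sum(message(src) for src in in_edges[v]) for v in VALUES}
```

Both functions compute the same per-vertex sums; they differ only in which side drives the data access, which is where the I/O trade-off described above comes from.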
This paper proposes a hybrid solution to support switching between push and pull adaptively, to obtain optimal performance for distributed systems with disk-resident data in different scenarios. We first employ a new block-centric technique (b-pull) to improve the I/O-performance of pulling messages, although the iterative computation is vertex-centric. I/O costs of data accesses are shifted from the receiver side where messages are written/read by push to the sender side where graph data are read by b-pull. Graph data are organized by clustering vertices and edges to achieve high I/O-efficiency in b-pull. Second, we design a seamless switching mechanism and a prominent performance prediction method to guarantee efficiency when switching between push and b-pull. We conduct extensive performance studies to confirm the effectiveness of our proposals over existing up-to-date solutions using a broad spectrum of real-world graphs.
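The two ideas in this paragraph, pulling for a whole block of receivers at once and switching modes based on a cost prediction, can be sketched roughly as follows. This is a toy under stated assumptions, not the paper's b-pull layout or its prediction method, neither of which is specified in the abstract. The block map, the cost constants, and the names `pull_for_block` and `choose_mode` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy data: out-edges, vertex values, and a vertex -> block map.
OUT_EDGES = {0: [1, 2], 1: [2, 3], 2: [0], 3: [2]}
VALUES = {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0}
BLOCK_OF = {0: "A", 1: "A", 2: "B", 3: "B"}


def pull_for_block(target_block):
    """Serve one request for a whole block of receivers with a single
    scan over the sender-side edges, combining messages per receiver."""
    combined = defaultdict(float)
    for src, dsts in OUT_EDGES.items():
        share = VALUES[src] / len(dsts)
        for dst in dsts:
            if BLOCK_OF[dst] == target_block:
                combined[dst] += share
    return dict(combined)


def choose_mode(num_messages, buffer_capacity, num_edges,
                random_write_cost=10.0, sequential_read_cost=1.0):
    """Pick 'push' or 'pull' from a made-up cost estimate (assumed constants).

    Push pays for messages that overflow the in-memory buffer;
    pull pays for reading the edges on the sender side.
    """
    spilled = max(0, num_messages - buffer_capacity)
    push_cost = spilled * random_write_cost
    pull_cost = num_edges * sequential_read_cost
    return "push" if push_cost <= pull_cost else "pull"
```

With these assumed constants, `choose_mode(100, 1000, 500)` selects push because no messages spill to disk, while `choose_mode(10000, 1000, 500)` selects pull because the estimated cost of spilled messages exceeds the cost of scanning the edges.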

Information & Contributors

Information

Published In

SIGMOD '16: Proceedings of the 2016 International Conference on Management of Data
June 2016
2300 pages
ISBN:9781450335317
DOI:10.1145/2882903
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. I/O-efficient
  2. distributed graph computing
  3. pull
  4. push

Qualifiers

  • Research-article

Conference

SIGMOD/PODS'16: International Conference on Management of Data
June 26 - July 1, 2016
San Francisco, California, USA

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 21
  • Downloads (Last 6 weeks): 3
Reflects downloads up to 20 Feb 2025

Citations

Cited By

  • (2024) I/O-Efficient Multi-Criteria Shortest Paths Query Processing on Large Graphs. IEEE Transactions on Knowledge and Data Engineering 36:11 (6430-6446). DOI: 10.1109/TKDE.2024.3386906. Online publication date: Nov-2024
  • (2023) Lightweight Streaming Graph Partitioning by Fully Utilizing Knowledge from Local View. 2023 IEEE 43rd International Conference on Distributed Computing Systems (ICDCS) (614-625). DOI: 10.1109/ICDCS57875.2023.00079. Online publication date: Jul-2023
  • (2022) DGIC: A Distributed Graph Inference Computing Framework Suitable For Encoder-Decoder GNN. Proceedings of the 2022 6th International Conference on Innovation in Artificial Intelligence (148-153). DOI: 10.1145/3529466.3529493. Online publication date: 4-Mar-2022
  • (2022) Parallel Query Processing: To Separate Communication from Computation. Proceedings of the 2022 International Conference on Management of Data (1447-1461). DOI: 10.1145/3514221.3526164. Online publication date: 10-Jun-2022
  • (2022) GGraph: An Efficient Structure-Aware Approach for Iterative Graph Processing. IEEE Transactions on Big Data 8:5 (1182-1194). DOI: 10.1109/TBDATA.2020.3019641. Online publication date: 1-Oct-2022
  • (2022) EC-Graph: A Distributed Graph Neural Network System with Error-Compensated Compression. 2022 IEEE 38th International Conference on Data Engineering (ICDE) (648-660). DOI: 10.1109/ICDE53745.2022.00053. Online publication date: May-2022
  • (2022) DRPS: efficient disk-resident parameter servers for distributed machine learning. Frontiers of Computer Science: Selected Publications from Chinese Universities 16:4. DOI: 10.1007/s11704-021-0445-2. Online publication date: 1-Aug-2022
  • (2021) DFOGraph. Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (474-476). DOI: 10.1145/3437801.3441622. Online publication date: 17-Feb-2021
  • (2021) ScaleG: A Distributed Disk-based System for Vertex-centric Graph Processing. IEEE Transactions on Knowledge and Data Engineering (1-1). DOI: 10.1109/TKDE.2021.3101057. Online publication date: 2021
  • (2021) Distributed Graph Processing: Techniques and Systems. Web and Big Data. APWeb-WAIM 2020 International Workshops (14-23). DOI: 10.1007/978-981-16-0479-9_2. Online publication date: 1-Apr-2021
