research-article

Hybrid Pulling/Pushing for I/O-Efficient Distributed and Iterative Graph Computing

Authors:

Jeffrey Xu Yu

SIGMOD '16: Proceedings of the 2016 International Conference on Management of Data

Pages 479 - 494

https://doi.org/10.1145/2882903.2882938

Published: 14 June 2016

Abstract

Billion-node graphs are rapidly growing in size in many applications such as online social networks. Most graph algorithms generate a large number of messages during iterative computations. Vertex-centric distributed systems usually store graph data and message data on disk to improve scalability. Currently, these distributed systems with disk-resident data take a push-based approach to handling messages. This works well if few messages reside on disk. Otherwise, it is I/O-inefficient due to expensive random writes. By contrast, the existing memory-resident pull-based approach pulls messages individually for each vertex on demand. Although it can avoid disk operations for messages, it incurs expensive I/O costs from random and frequent accesses to vertices.

This paper proposes a hybrid solution that switches adaptively between push and pull to obtain optimal performance for distributed systems with disk-resident data in different scenarios. First, we employ a new block-centric technique (b-pull) to improve the I/O performance of pulling messages, while the iterative computation remains vertex-centric. I/O costs of data accesses are shifted from the receiver side, where messages are written and read by push, to the sender side, where graph data are read by b-pull. Graph data are organized by clustering vertices and edges to achieve high I/O efficiency in b-pull. Second, we design a seamless switching mechanism and a performance prediction method to guarantee efficiency when switching between push and b-pull. We conduct extensive performance studies on a broad spectrum of real-world graphs to confirm the effectiveness of our proposals over existing state-of-the-art solutions.
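The adaptive switching idea can be illustrated with a toy decision rule. The sketch below is not the paper's prediction method, which the abstract does not specify. It assumes a deliberately simple cost model: push pays for messages that overflow an in-memory buffer (a random write plus a later read), while b-pull pays for scanning edge blocks on the sender side. All names (`SuperstepStats`, `push_cost`, `b_pull_cost`, `choose_mode`) and all cost constants are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SuperstepStats:
    """Hypothetical per-superstep estimates (not taken from the paper)."""
    messages: int         # messages expected in the next superstep
    buffer_capacity: int  # messages the receiver can keep in memory
    edge_blocks: int      # edge blocks a sender must scan to serve pulls


def push_cost(s: SuperstepStats,
              random_write_cost: float = 4.0,
              read_back_cost: float = 1.0) -> float:
    """Assumed push cost: only messages that overflow the buffer touch disk."""
    spilled = max(0, s.messages - s.buffer_capacity)
    return spilled * (random_write_cost + read_back_cost)


def b_pull_cost(s: SuperstepStats, block_read_cost: float = 50.0) -> float:
    """Assumed b-pull cost: sequential reads of edge blocks on the sender side."""
    return s.edge_blocks * block_read_cost


def choose_mode(s: SuperstepStats) -> str:
    """Pick the mode with the lower estimated I/O cost; ties go to push."""
    return "push" if push_cost(s) <= b_pull_cost(s) else "b-pull"


if __name__ == "__main__":
    trace = [
        SuperstepStats(messages=1_000, buffer_capacity=10_000, edge_blocks=100),
        SuperstepStats(messages=1_000_000, buffer_capacity=10_000, edge_blocks=100),
        SuperstepStats(messages=11_000, buffer_capacity=10_000, edge_blocks=100),
    ]
    print([choose_mode(s) for s in trace])  # ['push', 'b-pull', 'push']
```

Under these assumed constants, push wins while messages fit in memory or barely overflow it, and b-pull wins once spilled messages dominate, which mirrors the abstract's observation that push works well only if few messages reside on disk.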

Published In

SIGMOD '16: Proceedings of the 2016 International Conference on Management of Data

June 2016, 2300 pages

ISBN: 9781450335317

DOI: 10.1145/2882903

General Chairs: Fatma Özcan (IBM Research, USA), Georgia Koutrika (HP Labs, USA)

Program Chair: Sam Madden (Massachusetts Institute of Technology, USA)

Copyright © 2016 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMOD: ACM Special Interest Group on Management of Data

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

National Natural Science Foundation of China
Research Grants Council of the Hong Kong SAR
China Scholarship Council
the National Basic Research Program of China (973 Program)

Conference

SIGMOD/PODS'16: International Conference on Management of Data

Sponsor: SIGMOD

June 26 - July 1, 2016

San Francisco, California, USA

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%
