Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
10.1145/3458817.3476192acmconferencesArticle/Chapter ViewAbstractPublication PagesscConference Proceedingsconference-collections
research-article

Whale: efficient one-to-many data partitioning in RDMA-assisted distributed stream processing systems

Published: 13 November 2021 Publication History

Abstract

To process large-scale real-time data streams, existing distributed stream processing systems (DSPSs) leverage different stream partitioning strategies. The one-to-many data partitioning strategy plays an important role in various applications. With one-to-many data partitioning, an upstream processing instance sends a generated tuple to a potentially large number of downstream processing instances. Existing DSPSs leverage an instance-oriented communication mechanism, where an upstream instance transmits a tuple to different downstream instances separately. However, in one-to-many data partitioning, multiple downstream instances typically run on the same machine to exploit multi-core resources. Therefore, a DSPS actually sends a data item to a machine multiple times, raising significant unnecessary costs for serialization and communication. We show that such a mechanism can lead to serious performance bottleneck due to CPU overload.
To address the problem, we design and implement Whale, an efficient RDMA (Remote Direct Memory Access) assisted distributed stream processing system. Two factors contribute to the efficiency of this design. First, we propose a novel RDMA-assisted stream multicast scheme with a self-adjusting non-blocking tree structure to alleviate the CPU workloads of an upstream instance during one-to-many data partitioning. Second, we re-design the communication mechanism in existing DSPSs by replacing the instance-oriented communication with a new worker-oriented communication scheme, which saves significant costs for redundant serialization and communication. We implement Whale on top of Apache Storm and conduct comprehensive experiments to evaluate its performance with large-scale real world datasets. The results show that Whale achieves 56.6× improvement of system throughput and 97% reduction of processing latency compared to existing designs.

Supplementary Material

MP4 File (Whale Efficient One-to-Many Data Partitioning in RDMA-Assisted Distributed Stream Processing Systems 232 Afternoon 6.mp4)
Presentation video

References

[1]
Gaia Initiative. https://outreach.didichuxing.com/research/opendata/en, 2020.
[2]
Kafka. http://kafka.apache.org, 2020.
[3]
Jonathan Behrens, Sagar Jha, Ken Birman, and Edward Tremel. RDMC: A reliable RDMA multicast for large objects. In Proceedings of the 2018 48th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), Luxembourg City, Luxembourg, June 25--28, 2018.
[4]
Paris Carbone, Asterios Katsifodimos, Stephan Ewen, Volker Markl, Seif Haridi, and Kostas Tzoumas. Apache Flink™: Stream and batch processing in a single engine. IEEE Data Engineering Bulletin, vol. 38, no. 4, pp. 28--38, 2015.
[5]
Hanhua Chen, Hai Jin, and Shaoliang Wu. Minimizing inter-server communications by exploiting self-similarity in online social networks. IEEE Transactions on Parallel and Distributed Systems, 27(4):1116--1130, 2016.
[6]
Hanhua Chen, Fan Zhang, and Hai Jin. Popularity-aware differentiated distributed stream processing on skewed streams. In Proceedings of the 25th IEEE International Conference on Network Protocols (ICNP), Toronto, ON, Canada, October 10--13, 2017.
[7]
Bugra Gedik. Partitioning functions for stateful data parallelism in stream processing. Proceedings of the VLDB Endowment, vol. 23, no. 4, pp. 517--539, 2014.
[8]
Nusrat Sharmin Islam, Mohammad Wahidur Rahman, Jithin Jose, Raghunath Rajachandrasekar, Hao Wang, Hari Subramoni, Chet Murthy, and Dhabaleswar K Panda. High performance RDMA-based design of HDFS over infiniband. In Proceedings of the 2012 International Conference on High Performance Computing Networking, Storage and Analysis (SC), Salt Lake City, UT, USA, November 11--15, 2012.
[9]
Gabriela Jacques-Silva, Ran Lei, Luwei Cheng, Guoqiang Jerry Chen, Kuen Ching, Tanji Hu, Yuan Mei, Kevin Wilfong, Rithin Shetty, Serhat Yilmaz, Anirban Banerjee, Benjamin Heintz, Shridar Iyer, and Anshul Jaiswal. Providing streaming joins as a service at facebook. Proceedings of the VLDB Endowment, vol. 11, no. 12, pp. 1809--1821, 2018.
[10]
Jeyhun Karimov, Tilmann Rabl, Asterios Katsifodimos, Roman Samarev, Henri Heiskanen, and Volker Markl. Benchmarking distributed stream data processing systems. In Proceedings of the 34th IEEE International Conference on Data Engineering (ICDE), Paris, France, April 16--19, 2018.
[11]
David G. Kendall. Stochastic processes occurring in the theory of queues and their analysis by the method of the imbedded markov chain. The Annals of Mathematical Statistics, vol. 24, no. 3, pp. 338--354, 1953.
[12]
Martin Kleppmann and Jay Kreps. Kafka, samza and the unix philosophy of distributed data. IEEE Data Engineering Bulletin, vol. 38, no. 4, pp. 4--14, 2015.
[13]
Sanjeev Kulkarni, Nikunj Bhagat, Maosong Fu, Vikas Kedigehalli, Christopher Kellogg, Sailesh Mittal, Jignesh M. Patel, Karthik Ramasamy, and Siddarth Taneja. Twitter heron: Stream processing at scale. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data (SIGMOD), Melbourne, Victoria, Australia, May 31 - June 4, 2015.
[14]
Luo Mai, Kai Zeng, Rahul Potharaju, Le Xu, Steve Suh, Shivaram Venkataraman, Paolo Costa, Terry Kim, Saravanam Muthukrishnan, Vamsi Kuppa, Sudheer Dhulipalla, and Sriram Rao. Chi: A scalable and programmable control plane for distributed stream processing systems. Proceedings of the VLDB Endowment, vol. 11, no. 10, pp. 1303--1316, 2018.
[15]
Michael G. Moore and Mark A. Davenport. Estimation of poisson arrival processes under linear models. IEEE Transactions on Information Theory, vol. 65, no. 6, pp. 3555--3564, 2019.
[16]
Muhammad Anis Uddin Nasir, Gianmarco De Francisci Morales, David García-Soriano, Nicolas Kourtellis, and Marco Serafini. The power of both choices: Practical load balancing for distributed stream processing engines. In Proceedings of the 31st IEEE International Conference on Data Engineering (ICDE), Seoul, South Korea, April 13--17, 2015.
[17]
Gengbiao Shen, Qing Li, Shuo Ai, Yong Jiang, Mingwei Xu, and Xuya Jia. How powerful switches should be deployed: A precise estimation based on queuing theory. In Proceedings of 2019 IEEE Conference on Computer Communications (INFOCOM), Paris, France, April 29 - May 2, 2019.
[18]
Maomeng Su, Mingxing Zhang, Kang Chen, Zhenyu Guo, and Yongwei Wu. RFP: when RPC is faster than server-bypass with RDMA. In Proceedings of the 12th European Conference on Computer Systems (EuroSys), 2017.
[19]
Yongxin Tong, Yuqiang Chen, Zimu Zhou, Lei Chen, Jie Wang, Qiang Yang, Jieping Ye, and Weifeng Lv. The simpler the better: a unified approach to predicting original taxi demands based on large-scale online platform. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD), Halifax, NS, Canada, August 13--17, 2017.
[20]
Ankit Toshniwal, Siddarth Taneja, Amit Shukla, Karthikeyan Ramasamy, Jignesh M. Patel, Sanjeev Kulkarni, Jason Jackson, Krishna Gade, Maosong Fu, Jake Donham, Nikunj Bhagat, Sailesh Mittal, and Dmitriy V. Ryaboy. Storm@twitter. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data (SIGMOD), Snowbird, UT, USA, June 22--27, 2014.
[21]
Jean Vuillemin. A data structure for manipulating priority queues. Communications of the ACM, vol. 21, no. 4, pp. 309--315, 1978.
[22]
Xingda Wei, Jiaxin Shi, Yanzhe Chen, Rong Chen, and Haibo Chen. Fast in-memory transaction processing using RDMA and HTM. In Proceedings of the 25th Symposium on Operating Systems Principles (SOSP), Monterey, CA, USA, October 4--7, 2015.
[23]
Jian Yang, Joseph Izraelevitz, and Steven Swanson. Filemr: Rethinking RDMA networking for scalable persistent memory. In Proceedings of the 17th USENIX Symposium on Networked Systems Design and Implementation (NSDI), Santa Clara, CA, USA, February 25--27, 2020.
[24]
Seokwoo Yang, Siwoon Son, Mi-Jung Choi, and Yang-Sae Moon. Performance improvement of Apache Storm using infiniband RDMA. The Journal of Supercomputing, vol. 75, no. 10, pp. 6804--6830, 2019.
[25]
Bairen Yi, Jiacheng Xia, Li Chen, and Kai Chen. Towards zero copy dataflows using RDMA. In Proceedings of the 2017 ACM Conference on Special Interest Group on Data Communication (SIGCOMM), pages 28--30, Los Angeles, CA, USA, August 21--25, 2017.
[26]
Steffen Zeuch, Bonaventura Del Monte, Jeyhun Karimov, Clemens Lutz, Manuel Renz, Jonas Traub, Sebastian Breß, Tilmann Rabl, and Volker Markl. Analyzing efficient stream processing on modern hardware. Proceedings of the VLDB Endowment, vol. 12, no. 5, pp. 516--530, 2019.
[27]
Yibo Zhu, Haggai Eran, Daniel Firestone, Chuanxiong Guo, Marina Lipshteyn, Yehonatan Liron, Jitendra Padhye, Shachar Raindel, Mohamad Haj Yahia, and Ming Zhang. Congestion control for large-scale RDMA deployments. In Proceedings of the 2015 ACM Conference on Special Interest Group on Data Communication (SIGCOMM), London, United Kingdom, August 17--21, 2015.

Index Terms

  1. Whale: efficient one-to-many data partitioning in RDMA-assisted distributed stream processing systems

    Recommendations

    Comments

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    SC '21: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
    November 2021
    1493 pages
    ISBN:9781450384421
    DOI:10.1145/3458817
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Sponsors

    In-Cooperation

    • IEEE CS

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 13 November 2021

    Permissions

    Request permissions for this article.

    Check for updates

    Badges

    Author Tags

    1. distributed stream processing system
    2. one-to-many data partition
    3. remote direct memory access (RDMA)

    Qualifiers

    • Research-article

    Funding Sources

    Conference

    SC '21
    Sponsor:

    Acceptance Rates

    Overall Acceptance Rate 1,516 of 6,373 submissions, 24%

    Upcoming Conference

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • 0
      Total Citations
    • 445
      Total Downloads
    • Downloads (Last 12 months)45
    • Downloads (Last 6 weeks)5
    Reflects downloads up to 09 Nov 2024

    Other Metrics

    Citations

    View Options

    Get Access

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media