Whale: efficient one-to-many data partitioning in RDMA-assisted distributed stream processing systems

Authors: Jie Tan, Hanhua Chen, Yonghui Wang, Hai Jin

SC '21: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis

Article No.: 101, Pages 1 - 12

https://doi.org/10.1145/3458817.3476192

Published: 13 November 2021

Abstract

To process large-scale real-time data streams, existing distributed stream processing systems (DSPSs) leverage different stream partitioning strategies. The one-to-many data partitioning strategy plays an important role in various applications. With one-to-many data partitioning, an upstream processing instance sends each generated tuple to a potentially large number of downstream processing instances. Existing DSPSs use an instance-oriented communication mechanism, in which an upstream instance transmits a tuple to each downstream instance separately. However, in one-to-many data partitioning, multiple downstream instances typically run on the same machine to exploit multi-core resources. As a result, a DSPS sends the same data item to a machine multiple times, incurring significant unnecessary serialization and communication costs. We show that such a mechanism can lead to a serious performance bottleneck due to CPU overload.
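The redundancy described above can be made concrete with a small counting sketch. It is only an illustration under assumed conditions: the placement of eight downstream instances on two machines and the name `sends_per_machine` are hypothetical and do not come from the paper.

```python
from collections import Counter


def sends_per_machine(placement):
    """Count how many copies of one tuple each machine receives.

    placement maps a downstream instance id to the machine hosting it
    (a hypothetical layout chosen for illustration).
    """
    return Counter(placement.values())


# Hypothetical layout: 8 downstream instances spread over 2 machines.
placement = {i: f"machine-{i % 2}" for i in range(8)}
per_machine = sends_per_machine(placement)

# Instance-oriented communication: one serialization and one send per instance.
instance_oriented_sends = sum(per_machine.values())

# A tuple only needs to reach each machine once.
distinct_machines = len(per_machine)

print(dict(per_machine))
print(instance_oriented_sends, distinct_machines)
```

In this assumed layout the upstream instance performs eight serialize-and-send operations even though only two machines need the tuple, and the gap widens as more instances share a machine.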

To address the problem, we design and implement Whale, an efficient RDMA (Remote Direct Memory Access) assisted distributed stream processing system. Two factors contribute to the efficiency of this design. First, we propose a novel RDMA-assisted stream multicast scheme with a self-adjusting non-blocking tree structure to alleviate the CPU workload of an upstream instance during one-to-many data partitioning. Second, we redesign the communication mechanism of existing DSPSs by replacing instance-oriented communication with a new worker-oriented communication scheme, which eliminates the significant costs of redundant serialization and communication. We implement Whale on top of Apache Storm and conduct comprehensive experiments to evaluate its performance with large-scale real-world datasets. The results show that Whale achieves a 56.6× improvement in system throughput and a 97% reduction in processing latency compared to existing designs.
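The two ideas in this paragraph can be sketched roughly as follows. This is not Whale's implementation: the abstract does not specify the message layout or how the tree adjusts itself, so the sketch assumes a simple per-worker message (one shared payload plus a list of local target instance ids) and a static fixed-fanout relay tree in place of the self-adjusting non-blocking structure. RDMA transport is not modeled, and the names `worker_oriented_messages`, `multicast_tree`, and the `fanout` parameter are hypothetical.

```python
import pickle
from collections import defaultdict


def worker_oriented_messages(tup, placement):
    """Serialize the tuple once and build one message per worker.

    placement maps instance id -> worker id (hypothetical layout). Each
    message carries the shared payload plus the ids of the instances on
    that worker that should receive the tuple.
    """
    payload = pickle.dumps(tup)  # serialized once, reused for every worker
    targets = defaultdict(list)
    for instance, worker in sorted(placement.items()):
        targets[worker].append(instance)
    return {worker: (payload, ids) for worker, ids in targets.items()}


def multicast_tree(source, workers, fanout):
    """Build a static relay tree in which each node has at most `fanout` children.

    This is a plain fixed-fanout tree used as a stand-in; it does not
    adjust itself at runtime.
    """
    nodes = [source] + list(workers)
    children = defaultdict(list)
    for i in range(1, len(nodes)):
        parent = nodes[(i - 1) // fanout]
        children[parent].append(nodes[i])
    return dict(children)


# Hypothetical layout: 16 downstream instances hosted by 4 workers.
placement = {i: f"worker-{i % 4}" for i in range(16)}
msgs = worker_oriented_messages(("key", 42), placement)
tree = multicast_tree("source", sorted(msgs), fanout=2)

print(len(msgs), "messages instead of", len(placement))
print(tree)
```

Under these assumptions the source serializes the tuple once and sends it to two workers, which relay it to the other two, instead of serializing and sending sixteen times. Each worker then hands the tuple to its local instances using the id list in its message.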

Supplementary Material

Presentation video (MP4, 209.73 MB)
