DOI: 10.1145/3098583.3098591

Performance Isolation Anomalies in RDMA

Published: 09 August 2017

Abstract

To meet the increasing throughput and latency demands of modern applications, many operators are rapidly deploying RDMA in their datacenters. At the same time, developers are re-designing their software to take advantage of RDMA's benefits for individual applications. However, when it comes to RDMA's performance, many simple questions remain open.
In this paper, we consider the performance isolation characteristics of RDMA. Specifically, we conduct three sets of experiments in a controlled environment, covering three combinations of one throughput-sensitive flow and one latency-sensitive flow. We observe large discrepancies in RDMA performance with and without a competing flow, and we describe our progress in identifying plausible root causes.

Supplementary Material

WEBM File (performanceisolationanomaliesinrdma.webm)

    Published In

    KBNets '17: Proceedings of the Workshop on Kernel-Bypass Networks
    August 2017
    59 pages
    ISBN:9781450350532
    DOI:10.1145/3098583

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. RDMA
    2. fairness
    3. performance isolation

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    SIGCOMM '17
    SIGCOMM '17: ACM SIGCOMM 2017 Conference
    August 21, 2017
    Los Angeles, CA, USA
