DOI: 10.1145/3503222.3507711
research-article

The benefits of general-purpose on-NIC memory

Published: 22 February 2022

    Abstract

    We propose using the small, newly available on-NIC memory ("nicmem") to keep pace with the rapidly increasing performance of NICs. We motivate the proposal by accelerating two workload classes: NFV and key-value stores. Because NFV workloads frequently operate on the headers of incoming packets rather than on their data, we introduce a new packet-processing architecture that splits the two, keeping the data on nicmem when possible and thus reducing PCIe traffic, memory bandwidth, and CPU processing time. This approach shortens NFV latency by up to 23% and increases its throughput by up to 19%. Similarly, because key-value stores commonly exhibit skewed access distributions, we introduce a new network-stack mechanism that lets applications keep frequently accessed items on nicmem. This design shortens memcached latency by up to 43% and increases its throughput by up to 80%.
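    The header/data split described in the abstract can be illustrated with a toy accounting model. This is only a sketch under stated assumptions, not the paper's implementation: the 64-byte header size, the names `SplitStats`, `receive`, and `simulate`, and the rule that payloads are never released from nicmem are all hypothetical choices made for illustration. The model counts how many bytes cross PCIe when payloads are parked on the NIC while space lasts, and falls back to sending the whole packet to the host otherwise.

```python
from dataclasses import dataclass

HEADER_BYTES = 64  # assumed header size handed to the CPU; not taken from the paper


@dataclass
class SplitStats:
    pcie_bytes: int = 0    # bytes moved from the NIC to host memory over PCIe
    nicmem_bytes: int = 0  # payload bytes parked in on-NIC memory


def receive(packet_len: int, nicmem_free: int, stats: SplitStats) -> int:
    """Account for one received packet; return the nicmem space left afterwards."""
    header = min(packet_len, HEADER_BYTES)
    payload = packet_len - header
    stats.pcie_bytes += header          # headers always go to the host
    if payload <= nicmem_free:
        stats.nicmem_bytes += payload   # payload stays on the NIC
        return nicmem_free - payload
    stats.pcie_bytes += payload         # fallback: payload crosses PCIe too
    return nicmem_free


def simulate(packet_lens, nicmem_capacity: int) -> SplitStats:
    """Run the toy model over a sequence of packet lengths (no payload release)."""
    stats = SplitStats()
    free = nicmem_capacity
    for length in packet_lens:
        free = receive(length, free, stats)
    return stats


if __name__ == "__main__":
    split = simulate([1500] * 4, nicmem_capacity=3000)
    baseline = simulate([1500] * 4, nicmem_capacity=0)
    print(split.pcie_bytes, baseline.pcie_bytes)
```

    In a real data path, nicmem space would be reclaimed once a packet is transmitted or dropped; the model omits this to keep the accounting easy to follow.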

    Cited By

    • (2024) Triton: A Flexible Hardware Offloading Architecture for Accelerating Apsara vSwitch in Alibaba Cloud. Proceedings of the ACM SIGCOMM 2024 Conference, 750-763. DOI: 10.1145/3651890.3672224. Online publication date: 4-Aug-2024
    • (2024) Intel Accelerators Ecosystem: An SoC-Oriented Perspective: Industry Product. 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA), 848-862. DOI: 10.1109/ISCA59077.2024.00066. Online publication date: 29-Jun-2024
    • (2024) HADES: Hardware-Assisted Distributed Transactions in the Age of Fast Networks and SmartNICs. 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA), 785-800. DOI: 10.1109/ISCA59077.2024.00062. Online publication date: 29-Jun-2024


      Published In

      ASPLOS '22: Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems
      February 2022
      1164 pages
      ISBN: 9781450392051
      DOI: 10.1145/3503222
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 22 February 2022

      Author Tags

      1. NIC
      2. hardware/software co-design
      3. operating system

      Qualifiers

      • Research-article

      Conference

      ASPLOS '22

      Acceptance Rates

      Overall Acceptance Rate 535 of 2,713 submissions, 20%

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (Last 12 months): 230
      • Downloads (Last 6 weeks): 24
      Reflects downloads up to 12 Aug 2024

      Citations

      Cited By

      • (2024) Triton: A Flexible Hardware Offloading Architecture for Accelerating Apsara vSwitch in Alibaba Cloud. Proceedings of the ACM SIGCOMM 2024 Conference, pp. 750-763. DOI: 10.1145/3651890.3672224. Online publication date: 4-Aug-2024
      • (2024) Intel Accelerators Ecosystem: An SoC-Oriented Perspective: Industry Product. 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA), pp. 848-862. DOI: 10.1109/ISCA59077.2024.00066. Online publication date: 29-Jun-2024
      • (2024) HADES: Hardware-Assisted Distributed Transactions in the Age of Fast Networks and SmartNICs. 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA), pp. 785-800. DOI: 10.1109/ISCA59077.2024.00062. Online publication date: 29-Jun-2024
      • (2024) DmRPC: Disaggregated Memory-aware Datacenter RPC for Data-intensive Applications. 2024 IEEE 40th International Conference on Data Engineering (ICDE), pp. 3796-3809. DOI: 10.1109/ICDE60146.2024.00291. Online publication date: 13-May-2024
      • (2023) Deploying Computational Storage for HTAP DBMSs Takes More Than Just Computation Offloading. Proceedings of the VLDB Endowment 16:6, pp. 1480-1493. DOI: 10.14778/3583140.3583161. Online publication date: 1-Feb-2023
      • (2023) NeuroLPM - Scaling Longest Prefix Match Hardware with Neural Networks. Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 886-899. DOI: 10.1145/3613424.3623769. Online publication date: 28-Oct-2023
      • (2023) Marlin: A Concurrent and Write-Optimized B+-tree Index on Disaggregated Memory. Proceedings of the 52nd International Conference on Parallel Processing, pp. 695-704. DOI: 10.1145/3605573.3605576. Online publication date: 7-Aug-2023
      • (2023) Host Efficient Networking Stack Utilizing NIC DRAM. Proceedings of the 7th Asia-Pacific Workshop on Networking, pp. 8-14. DOI: 10.1145/3600061.3600070. Online publication date: 29-Jun-2023
      • (2023) SplitRPC: A {Control + Data} Path Splitting RPC Stack for ML Inference Serving. Proceedings of the ACM on Measurement and Analysis of Computing Systems 7:2, pp. 1-26. DOI: 10.1145/3589974. Online publication date: 22-May-2023
      • (2023) Contiguitas: The Pursuit of Physical Memory Contiguity in Datacenters. Proceedings of the 50th Annual International Symposium on Computer Architecture, pp. 1-15. DOI: 10.1145/3579371.3589079. Online publication date: 17-Jun-2023
