The benefits of general-purpose on-NIC memory

Authors: Boris Pismenny, Liran Liss, Adam Morrison, Dan Tsafrir

ASPLOS '22: Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems

Pages 1130 - 1147

https://doi.org/10.1145/3503222.3507711

Published: 22 February 2022

Abstract

We propose to use the small, newly available on-NIC memory ("nicmem") to keep pace with the rapidly increasing performance of NICs. We motivate our proposal by accelerating two workload classes: NFV and key-value stores. Because NFV workloads frequently operate on the headers, rather than the data, of incoming packets, we introduce a new packet-processing architecture that splits the two, keeping the data on nicmem when possible and thus reducing PCIe traffic, memory bandwidth, and CPU processing time. Our approach consequently shortens NFV latency by up to 23% and increases its throughput by up to 19%. Similarly, because key-value stores commonly exhibit skewed access distributions, we introduce a new network stack mechanism that lets applications keep frequently accessed items on nicmem. Our design shortens memcached latency by up to 43% and increases its throughput by up to 80%.
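The header/data split described in the abstract can be illustrated with a toy accounting model. The sketch below is not the paper's implementation: `NicmemPool`, `forward_pcie_bytes`, the pool capacity, and the packet and header sizes are all hypothetical names and values chosen for illustration, and the model ignores descriptors, completions, and other PCIe overheads. It only counts payload and header bytes that would cross PCIe when a network function forwards a packet, assuming the payload stays on the NIC whenever the pool has room.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NicmemPool:
    """Toy model of a small on-NIC buffer pool (capacity is a made-up figure)."""
    capacity: int
    used: int = 0

    def try_alloc(self, nbytes: int) -> bool:
        # Refuse the allocation when the payload would not fit.
        if self.used + nbytes > self.capacity:
            return False
        self.used += nbytes
        return True

    def free(self, nbytes: int) -> None:
        self.used -= nbytes


def forward_pcie_bytes(packet_len: int, header_len: int,
                       pool: Optional[NicmemPool]) -> int:
    """Bytes crossing PCIe to receive and then retransmit one packet.

    Without a pool (or when the pool is full) the whole packet travels
    NIC -> host and host -> NIC. With room in the pool, only the header
    makes the round trip and the payload never leaves the NIC.
    """
    payload_len = packet_len - header_len
    if pool is not None and pool.try_alloc(payload_len):
        crossed = 2 * header_len
        pool.free(payload_len)  # payload buffer is released after transmit
        return crossed
    return 2 * packet_len


if __name__ == "__main__":
    pool = NicmemPool(capacity=2048)
    print("baseline:", forward_pcie_bytes(1500, 64, None))
    print("split:   ", forward_pcie_bytes(1500, 64, pool))
```

In this model the fallback path matters as much as the fast path: the abstract says data is kept on nicmem "when possible", so a full pool simply degrades to the conventional whole-packet transfer.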

Index Terms

The benefits of general-purpose on-NIC memory
1. Networks
  1. Network components
    1. End nodes
      1. Network adapters

Information & Contributors

Information

Published In

ASPLOS '22: Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems

February 2022

1164 pages

ISBN:9781450392051

DOI:10.1145/3503222

General Chairs: Babak Falsafi (EPFL, Switzerland), Michael Ferdman (Stony Brook University, USA)

Program Chairs: Shan Lu (University of Chicago, USA), Tom Wenisch (University of Michigan, USA / Google, USA)

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

In-Cooperation

SIGBED: ACM Special Interest Group on Embedded Systems

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 22 February 2022

Qualifiers

Research-article

Conference

ASPLOS '22

ASPLOS '22: 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems

February 28 - March 4, 2022

Lausanne, Switzerland

Acceptance Rates

Overall Acceptance Rate 535 of 2,713 submissions, 20%

Bibliometrics

Total Citations: 15
Total Downloads: 1,181
Downloads (Last 12 months): 230
Downloads (Last 6 weeks): 24

Reflects downloads up to 12 Aug 2024

Citations

Cited By

Li X, Jiang X, Yang Y, Chen L, Wang Y, Wang C, Xu C, Lv Y, Yang B, Wu T, Gao H, Chen Z, Qiao Y, Ding H, Dong Y, Yang H, Song J, Lu J, Zhang P, Wei C, Zhang Z, Chen W, He Q, Zhu S, Sekar V, Yu M, Seneviratne A, Veitch D. 2024. Triton: A Flexible Hardware Offloading Architecture for Accelerating Apsara vSwitch in Alibaba Cloud. In Proceedings of the ACM SIGCOMM 2024 Conference. 750–763. https://dl.acm.org/doi/10.1145/3651890.3672224 Online publication date: 4-Aug-2024.

Yuan Y, Wang R, Ranganathan N, Rao N, Kumar S, Lantz P, Sanjeepan V, Cabrera J, Kwatra A, Sankaran R, Jeong I, Kim N. 2024. Intel Accelerators Ecosystem: An SoC-Oriented Perspective: Industry Product. In 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA). 848–862. https://doi.org/10.1109/ISCA59077.2024.00066 Online publication date: 29-Jun-2024.

Kokolis A, Psistakis A, Reidys B, Huang J, Torrellas J. 2024. HADES: Hardware-Assisted Distributed Transactions in the Age of Fast Networks and SmartNICs. In 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA). 785–800. https://doi.org/10.1109/ISCA59077.2024.00062 Online publication date: 29-Jun-2024.

Zhang J, Chen X, Zhang Y, Wang Z. 2024. DmRPC: Disaggregated Memory-aware Datacenter RPC for Data-intensive Applications. In 2024 IEEE 40th International Conference on Data Engineering (ICDE). 3796–3809. https://doi.org/10.1109/ICDE60146.2024.00291 Online publication date: 13-May-2024.

Lee K, Jo I, Ahn J, Lee H, Lee H, Sul W, Jung H. 2023. Deploying Computational Storage for HTAP DBMSs Takes More Than Just Computation Offloading. Proceedings of the VLDB Endowment 16:6. 1480–1493. https://dl.acm.org/doi/10.14778/3583140.3583161 Online publication date: 1-Feb-2023.

Rashelbach A, de Paula I, Silberstein M. 2023. NeuroLPM - Scaling Longest Prefix Match Hardware with Neural Networks. In Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture. 886–899. https://dl.acm.org/doi/10.1145/3613424.3623769 Online publication date: 28-Oct-2023.

An H, Wang F, Feng D, Zou X, Liu Z, Zhang J. 2023. Marlin: A Concurrent and Write-Optimized B+-tree Index on Disaggregated Memory. In Proceedings of the 52nd International Conference on Parallel Processing. 695–704. https://dl.acm.org/doi/10.1145/3605573.3605576 Online publication date: 7-Aug-2023.

Lee B, Lee D, Ok J, Yoon W, Moon S. 2023. Host Efficient Networking Stack Utilizing NIC DRAM. In Proceedings of the 7th Asia-Pacific Workshop on Networking. 8–14. https://dl.acm.org/doi/10.1145/3600061.3600070 Online publication date: 29-Jun-2023.

Kumar A, Sivasubramaniam A, Zhu T. 2023. SplitRPC: A {Control + Data} Path Splitting RPC Stack for ML Inference Serving. Proceedings of the ACM on Measurement and Analysis of Computing Systems 7:2. 1–26. https://dl.acm.org/doi/10.1145/3589974 Online publication date: 22-May-2023.

Zhao K, Xue K, Wang Z, Schatzberg D, Yang L, Manousis A, Weiner J, Van Riel R, Sharma B, Tang C, Skarlatos D, Solihin Y, Heinrich M. 2023. Contiguitas: The Pursuit of Physical Memory Contiguity in Datacenters. In Proceedings of the 50th Annual International Symposium on Computer Architecture. 1–15. https://dl.acm.org/doi/10.1145/3579371.3589079 Online publication date: 17-Jun-2023.