Architecting to achieve a billion requests per second throughput on a single key-value store server platform

Authors: Michael Kaminsky, David G. Andersen, Sukhan Lee, and Pradeep Dubey

ISCA '15: Proceedings of the 42nd Annual International Symposium on Computer Architecture

June 2015

Pages 476 - 488

https://doi.org/10.1145/2749469.2750416

Published: 13 June 2015

Abstract

Distributed in-memory key-value stores (KVSs), such as memcached, have become a critical data serving layer in modern Internet-oriented datacenter infrastructure. Their performance and efficiency directly affect the QoS of web services and the efficiency of datacenters. Traditionally, these systems have had significant overheads from inefficient network processing, OS kernel involvement, and concurrency control. Two recent research thrusts have focused on improving key-value performance. Hardware-centric research has started to explore specialized platforms, including FPGAs, for KVSs; results demonstrated an order of magnitude increase in throughput and energy efficiency over stock memcached. Software-centric research revisited the KVS application to address fundamental software bottlenecks and to exploit the full potential of modern commodity hardware; these efforts, too, showed orders of magnitude improvement over stock memcached.

We aim to architect high-performance, efficient KVS platforms, and start with a rigorous architectural characterization across system stacks over a collection of representative KVS implementations. Our detailed full-system characterization not only identifies the critical hardware/software ingredients for high-performance KVSs, but also leads to guided optimizations atop a recent design to achieve a record-setting throughput of 120 million requests per second (MRPS) on a single commodity server. Our implementation delivers 9.2X the performance (RPS) and 2.8X the system energy efficiency (RPS/watt) of the best published FPGA-based claims. We craft a set of design principles for future platform architectures, and via detailed simulations demonstrate the capability of achieving a billion RPS with a single server constructed following our principles.
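As a rough back-of-envelope reading of the figures quoted above, the following sketch works out what the stated ratios imply. The variable names are illustrative and do not come from the paper. The power ratio in particular rests on an assumption the abstract does not confirm: that the 9.2X and 2.8X figures are both measured against the same FPGA baseline.

```python
# Figures quoted in the abstract.
measured_mrps = 120.0       # throughput on one commodity server, in MRPS
perf_ratio_vs_fpga = 9.2    # RPS relative to the best published FPGA claims
eff_ratio_vs_fpga = 2.8     # RPS/watt relative to the best published FPGA claims
target_mrps = 1000.0        # one billion RPS expressed in MRPS

# Throughput of the FPGA baseline implied by the 9.2X ratio.
implied_fpga_mrps = measured_mrps / perf_ratio_vs_fpga

# Further speedup needed to go from the measured result to a billion RPS.
gap_to_target = target_mrps / measured_mrps

# ASSUMPTION: both ratios refer to the same FPGA system.
# Efficiency is RPS / watt, so the power ratio is the RPS ratio
# divided by the efficiency ratio.
implied_power_ratio = perf_ratio_vs_fpga / eff_ratio_vs_fpga

print(f"Implied FPGA baseline: {implied_fpga_mrps:.2f} MRPS")
print(f"Speedup needed for 1 billion RPS: {gap_to_target:.2f}x")
print(f"Implied power ratio (server / FPGA): {implied_power_ratio:.2f}x")
```

Under these assumptions, the measured result sits roughly an order of magnitude above the implied FPGA baseline and a little under an order of magnitude below the billion-RPS target that the simulations address.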

Published In

ISCA '15: Proceedings of the 42nd Annual International Symposium on Computer Architecture, June 2015, 768 pages
ISBN: 9781450334020
DOI: 10.1145/2749469
General Chair: Debbie Marr (Intel)
Program Chair: David Albonesi (Cornell)

ACM SIGARCH Computer Architecture News, Volume 43, Issue 3S, ISCA'15, June 2015, 745 pages
ISSN: 0163-5964
DOI: 10.1145/2872887
Editor: Doug DeGroot

Copyright © 2015 Owner/Author.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Sponsors

IEEE TCCA: IEEE Computer Society Technical Committee on Computer Architecture
SIGARCH: ACM Special Interest Group on Computer Architecture

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

Korea government

Conference

ISCA '15: The 42nd Annual International Symposium on Computer Architecture

June 13 - 17, 2015

Portland, Oregon

Acceptance Rates

Overall Acceptance Rate 543 of 3,203 submissions, 17%
