DOI: 10.1145/3330345.3330353

SDC: a software defined cache for efficient data indexing

Published: 26 June 2019

Abstract

CPU caches bridge the processor-memory performance gap and thereby enable high-performance computing. Because cache capacity is limited, a cache is most effective when it (1) avoids caching data that are unlikely to be accessed and (2) identifies and caches data that would otherwise cost a program multiple memory accesses to reach. Unfortunately, existing cache architectures fall short on both counts. First, to exploit spatial locality cost-effectively, they adopt a relatively large, fixed-size cache line as the caching unit, so much of the space in a cache line can be wasted when data locality is weak. Second, for ease of use, the cache is designed to be transparent to programs, which prevents programs from fully exploiting its performance potential.
To address these problems, we propose a high-performance Software Defined Cache (SDC) architecture that provides a simple and generic key-value abstraction. The abstraction (1) allows data to be cached at a granularity smaller than a cache line and (2) enables programs to explicitly insert, retrieve, and invalidate data in the cache with new instructions. By giving a program the ability to explicitly use the cache as a lookaside key-value buffer, SDC enables a much more efficient cache without disruptively changing the existing cache organization and without substantially increasing hardware cost. We have prototyped SDC on the gem5 simulator and evaluated it with various data index structures and workloads. Experimental results show that SDC can improve cache performance for these workloads by up to 5.3× over the current cache design.
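The lookaside key-value usage pattern described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the SDC hardware design or its instruction set: the class name `LookasideKVCache`, the method names, the LRU replacement policy, and the dictionary standing in for an index structure are all hypothetical and chosen for illustration.

```python
from collections import OrderedDict


class LookasideKVCache:
    """Hypothetical software model of a small key-value lookaside buffer.

    The three methods mirror the insert / retrieve / invalidate operations
    named in the abstract. LRU eviction is an assumption made here; the
    abstract does not state a replacement policy.
    """

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._entries = OrderedDict()  # oldest entry first

    def insert(self, key, value):
        # Refresh recency if the key is already cached, then store the value.
        if key in self._entries:
            self._entries.move_to_end(key)
        self._entries[key] = value
        # Evict the least recently used entry when over capacity.
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)

    def retrieve(self, key):
        # Return the cached value, or None on a miss.
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)
        return self._entries[key]

    def invalidate(self, key):
        # Drop the entry if present; a missing key is not an error.
        self._entries.pop(key, None)


def lookup(index, cache, key, stats):
    """Consult the lookaside cache first, then fall back to the index."""
    value = cache.retrieve(key)
    if value is not None:
        stats["hits"] += 1
        return value
    stats["misses"] += 1
    # Stand-in for an index traversal that would cost several memory accesses.
    value = index.get(key)
    if value is not None:
        cache.insert(key, value)
    return value
```

In this model a program that updates or deletes an item in its index would call `invalidate` on the same key so that a later `retrieve` cannot return a stale value; how SDC itself handles such cases is not covered by the abstract.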

Cited By

  • (2024) EKRM: Efficient Key-Value Retrieval Method to Reduce Data Lookup Overhead for Redis. Euro-Par 2024: Parallel Processing, 166-179. DOI: 10.1007/978-3-031-69577-3_12. Online publication date: 26-Aug-2024
  • (2021) Hardware-Based Address-Centric Acceleration of Key-Value Store. 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 736-748. DOI: 10.1109/HPCA51647.2021.00067. Online publication date: Feb-2021
  • (2020) Put an elephant into a fridge. Proceedings of the VLDB Endowment 13:9, 1540-1554. DOI: 10.14778/3397230.3397247. Online publication date: 26-Jun-2020

Published In

ICS '19: Proceedings of the ACM International Conference on Supercomputing
June 2019
533 pages
ISBN:9781450360791
DOI:10.1145/3330345
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. data indexing
  2. key value
  3. software-defined cache

Qualifiers

  • Research-article

Conference

ICS '19
Acceptance Rates

Overall Acceptance Rate 629 of 2,180 submissions, 29%
