DOI: 10.1145/3373376.3378513

CryoCache: A Fast, Large, and Cost-Effective Cache Architecture for Cryogenic Computing

Published: 13 March 2020

Abstract

Cryogenic computing, which runs a computer at extremely low temperatures (e.g., 77K), is a highly promising way to dramatically improve a computer's performance and power efficiency thanks to significantly reduced leakage power and wire resistance. However, computer architects face fundamental challenges in developing and deploying cryogenic-optimal architectural units because the cost-effectiveness and feasibility of such units (e.g., device and cooling costs vs. speedup, energy, and area savings) are poorly understood, and so is how to architect them.
In this paper, we propose CryoCache, a cost-effective, technology-feasible, cryogenic-optimal cache architecture running at 77K. To this end, we first thoroughly analyze the cost-effectiveness and feasibility of various on-chip memory cell technologies running at 77K. Based on the analysis, we architect cryogenic-optimal caches with conventional, technology-feasible 6T-SRAM and 3T-eDRAM cells whose performance, area, and power benefits at 77K clearly outweigh their cooling costs. Our evaluations show that our example CryoCache architecture achieves 2 times faster cache access and 2 times larger capacity than conventional caches running at room temperature. To the best of our knowledge, this is the first work to propose a fast, large, and cost-effective cache architecture that can be applied to cryogenic computing.
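
The claim that power benefits at 77K must outweigh cooling costs can be illustrated with a simple break-even check. The sketch below is not the paper's model: the function names and the `cooling_overhead` parameter (watts of cooling input per watt of heat removed) are hypothetical, and the numbers in the usage lines are made up for illustration only.

```python
def total_power_with_cooling(device_power_w: float, cooling_overhead: float) -> float:
    """Device power plus the power spent removing its heat.

    cooling_overhead is an assumed, illustrative parameter: watts of cooling
    input per watt of heat removed. It is not a value taken from the paper.
    """
    if device_power_w < 0 or cooling_overhead < 0:
        raise ValueError("power and overhead must be non-negative")
    return device_power_w * (1.0 + cooling_overhead)


def is_power_cost_effective(room_power_w: float, cryo_power_w: float,
                            cooling_overhead: float) -> bool:
    """True if the cooled cache, cooling included, draws less than the
    room-temperature cache (which is assumed to need no extra cooling)."""
    return total_power_with_cooling(cryo_power_w, cooling_overhead) < room_power_w


def break_even_power_ratio(cooling_overhead: float) -> float:
    """Largest cryo/room device-power ratio at which the two designs tie."""
    if cooling_overhead < 0:
        raise ValueError("overhead must be non-negative")
    return 1.0 / (1.0 + cooling_overhead)


# Hypothetical numbers: 4 W of cooling per watt of heat removed.
ratio = break_even_power_ratio(4.0)               # device power must fall below this ratio
ok = is_power_cost_effective(10.0, 1.0, 4.0)      # 1 W cold device vs. 10 W warm device
not_ok = is_power_cost_effective(10.0, 3.0, 4.0)  # 3 W cold device costs 15 W in total
```

Under these assumptions, a cooled cache is power-cost-effective only if its own power drops by more than the factor implied by the cooling overhead; the speedup and capacity gains reported in the abstract are separate benefits that this power-only check does not capture.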

Published In

ASPLOS '20: Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems
March 2020
1412 pages
ISBN:9781450371025
DOI:10.1145/3373376
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. cryogenic cache
  2. cryogenic computing
  3. modeling
  4. simulation
  5. technology comparison and analysis

Qualifiers

  • Research-article

Conference

ASPLOS '20

Acceptance Rates

Overall Acceptance Rate 535 of 2,713 submissions, 20%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 228
  • Downloads (Last 6 weeks): 33
Reflects downloads up to 30 Aug 2024

Cited By

  • (2024) CoolDC: A Cost-Effective Immersion-Cooled Datacenter with Workload-Aware Temperature Scaling. ACM Transactions on Architecture and Code Optimization. DOI: 10.1145/3664925. Online publication date: 14-May-2024
  • (2024) A Benchmark of Cryo-CMOS Embedded SRAM/DRAMs in 40-nm CMOS. IEEE Journal of Solid-State Circuits 59:7 (2042-2054). DOI: 10.1109/JSSC.2024.3385696. Online publication date: Jul-2024
  • (2023) QIsim: Architecting 10+K Qubit QC Interfaces Toward Quantum Supremacy. Proceedings of the 50th Annual International Symposium on Computer Architecture (1-16). DOI: 10.1145/3579371.3589036. Online publication date: 17-Jun-2023
  • (2023) Is the Future Cold or Tall? Design Space Exploration of Cryogenic and 3D Embedded Cache Memory. 2023 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) (134-144). DOI: 10.1109/ISPASS57527.2023.00022. Online publication date: Apr-2023
  • (2023) CSDB-eDRAM: A 16Kb Energy-Efficient 4T CSDB Gain Cell eDRAM with over 16.6s Retention Time and 49.23uW/Kb at 4.2K for Cryogenic Computing. 2023 IEEE International Symposium on Circuits and Systems (ISCAS) (1-5). DOI: 10.1109/ISCAS46773.2023.10181628. Online publication date: 21-May-2023
  • (2022) CryoWire: wire-driven microarchitecture designs for cryogenic computing. Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (903-917). DOI: 10.1145/3503222.3507749. Online publication date: 28-Feb-2022
  • (2022) XQsim. Proceedings of the 49th Annual International Symposium on Computer Architecture (366-382). DOI: 10.1145/3470496.3527417. Online publication date: 18-Jun-2022
  • (2022) JBNN: A Hardware Design for Binarized Neural Networks using Single-Flux-Quantum Circuits. IEEE Transactions on Computers (1-12). DOI: 10.1109/TC.2022.3215085. Online publication date: 2022
  • (2022) AFS: Accurate, Fast, and Scalable Error-Decoding for Fault-Tolerant Quantum Computers. 2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA) (259-273). DOI: 10.1109/HPCA53966.2022.00027. Online publication date: Apr-2022
  • (2021) Embedded Memories for Cryogenic Applications. Electronics 11:1 (61). DOI: 10.3390/electronics11010061. Online publication date: 25-Dec-2021
