DOI: 10.1145/2934583.2934589
research-article
Public Access

Voltage Noise Induced DRAM Soft Error Reduction Technique for 3D-CPUs

Published: 08 August 2016

Abstract

Three-dimensional integration enables stacking DRAM on top of a CPU, providing high bandwidth and short latency. However, non-uniform voltage fluctuations and local thermal hotspots in the CPU layers couple into the DRAM layers, causing a non-uniform distribution of bit-cell leakage and, in turn, of bit flips. We propose a performance-power-resilience simulation framework to capture DRAM soft errors in 3D multi-core CPU systems. We also investigate a dynamic resilience management (DRM) scheme that adaptively tunes the CPU's operating points to adjust the DRAM's voltage noise and thermal conditions at runtime. The DRM scheme uses dynamic frequency scaling to implement a resilience borrow-in strategy, which effectively enhances DRAM resilience without sacrificing performance.
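
The idea of tuning CPU operating points at runtime to protect the stacked DRAM can be illustrated with a toy control step. This is only a minimal sketch and not the paper's DRM algorithm: the frequency levels, the per-core `risk` estimate, the `budget` threshold, and the `slack` factor are all hypothetical names and values introduced here. The sketch assumes some external model supplies, for each core, an estimate of the soft-error risk in the DRAM region stacked above it.

```python
from typing import List, Sequence

# Hypothetical discrete frequency levels in GHz (illustrative, not from the paper).
FREQ_LEVELS = (1.0, 1.5, 2.0, 2.5, 3.0)


def drm_step(level_idx: List[int], risk: Sequence[float],
             budget: float, slack: float = 0.5,
             n_levels: int = len(FREQ_LEVELS)) -> List[int]:
    """Return new per-core frequency-level indices for one control interval.

    risk[i] is an assumed, externally supplied estimate of the soft-error
    risk in the DRAM region above core i; budget is the tolerated risk.
    """
    new_idx = []
    for idx, r in zip(level_idx, risk):
        if r > budget and idx > 0:
            # Over budget: step the core down one level to reduce noise and heat.
            idx -= 1
        elif r < slack * budget and idx < n_levels - 1:
            # Ample margin: step the core up one level to recover throughput.
            idx += 1
        new_idx.append(idx)
    return new_idx
```

For example, `drm_step([2, 2, 2], [0.9, 0.1, 0.5], budget=0.8)` slows the first core, speeds up the second, and leaves the third unchanged. Letting cores with spare margin run faster while risky cores slow down is one plausible reading of how resilience could improve without a net performance loss; the paper's actual borrow-in policy may differ.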


Cited By

  • (2023) Thermal Management for 3D-Stacked Systems via Unified Core-Memory Power Regulation. ACM Transactions on Embedded Computing Systems 22:5s (1-26). DOI: 10.1145/3608040. Online publication date: 31-Oct-2023
  • (2017) Low-Power Clock Tree Synthesis for 3D-ICs. ACM Transactions on Design Automation of Electronic Systems 22:3 (1-24). DOI: 10.1145/3019610. Online publication date: 5-Apr-2017
  • (2017) TSV-Based 3-D ICs: Design Methods and Tools. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 36:10 (1593-1619). DOI: 10.1109/TCAD.2017.2666604. Online publication date: Oct-2017

Published In

ISLPED '16: Proceedings of the 2016 International Symposium on Low Power Electronics and Design
August 2016
392 pages
ISBN:9781450341851
DOI:10.1145/2934583
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ISLPED '16
ISLPED '16: International Symposium on Low Power Electronics and Design
August 8 - 10, 2016
San Francisco Airport, CA, USA

Acceptance Rates

ISLPED '16 Paper Acceptance Rate 60 of 190 submissions, 32%;
Overall Acceptance Rate 398 of 1,159 submissions, 34%
