
A programmable memory controller for the DDRx interfacing standards

Published: 20 December 2013

Abstract

Modern memory controllers employ sophisticated address mapping, command scheduling, and power management optimizations to alleviate the adverse effects of DRAM timing and resource constraints on system performance. A promising way of improving the versatility and efficiency of these controllers is to make them programmable—a proven technique that has seen wide use in other control tasks, ranging from DMA scheduling to NAND Flash and directory control. Unfortunately, the stringent latency and throughput requirements of modern DDRx devices have rendered such programmability largely impractical, confining DDRx controllers to fixed-function hardware.
This article presents the instruction set architecture (ISA) and hardware implementation of PARDIS, a programmable memory controller that can meet the performance requirements of a high-speed DDRx interface. The proposed controller is evaluated by mapping previously proposed DRAM scheduling, address mapping, refresh scheduling, and power management algorithms onto PARDIS. Simulation results show that the average performance of PARDIS comes within 8% of fixed-function hardware for each of these techniques; moreover, by enabling application-specific optimizations, PARDIS improves system performance by 6 to 17% and reduces DRAM energy by 9 to 22% over four existing memory controllers.
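To make the idea of programmable address mapping concrete, here is a minimal sketch in Python. It is not the PARDIS ISA or its hardware, neither of which this page describes in detail. The field names, bit widths, and the two layouts are assumptions chosen for illustration only. The point is that when the mapping is data rather than fixed wiring, the same routine can apply different DRAM address layouts.

```python
from typing import Dict, List, Tuple

# A layout lists (field name, width in bits) from least to most significant bit.
# Both layouts are hypothetical; neither is taken from the article.
Layout = List[Tuple[str, int]]

PAGE_INTERLEAVED: Layout = [("column", 10), ("bank", 3), ("rank", 1), ("row", 14)]
BANK_FIRST: Layout = [("bank", 3), ("column", 10), ("rank", 1), ("row", 14)]


def map_address(block_addr: int, layout: Layout) -> Dict[str, int]:
    """Split an address into DRAM coordinates by peeling fields off the low bits."""
    coords: Dict[str, int] = {}
    for name, width in layout:
        coords[name] = block_addr & ((1 << width) - 1)  # keep the low `width` bits
        block_addr >>= width  # drop the bits just consumed
    return coords


if __name__ == "__main__":
    addr = (5 << 10) | 3  # 5123
    print(map_address(addr, PAGE_INTERLEAVED))
    print(map_address(addr, BANK_FIRST))
```

Under these assumed layouts, address 5123 maps to column 3 of bank 5 with the first layout and to column 640 of bank 3 with the second, so swapping the layout table changes which accesses land in the same bank without changing the mapping routine.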


Published In

ACM Transactions on Computer Systems, Volume 31, Issue 4
December 2013
90 pages
ISSN:0734-2071
EISSN:1557-7333
DOI:10.1145/2542150
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 20 December 2013
Accepted: 01 June 2013
Revised: 01 June 2013
Received: 01 December 2012
Published in TOCS Volume 31, Issue 4

Author Tags

  1. Programmable
  2. memory controller

Qualifiers

  • Research-article
  • Research
  • Refereed

Cited By

  • (2021) Programmable FPGA-based Memory Controller. 2021 IEEE Symposium on High-Performance Interconnects (HOTI), 43-51. DOI: 10.1109/HOTI52880.2021.00020. Online publication date: Aug-2021.
  • (2020) Semeru. Proceedings of the 14th USENIX Conference on Operating Systems Design and Implementation, 261-280. DOI: 10.5555/3488766.3488781. Online publication date: 4-Nov-2020.
  • (2020) A survey on attack vectors in stack cache memory. Integration. DOI: 10.1016/j.vlsi.2020.02.004. Online publication date: Feb-2020.
  • (2015) Recent advances in computer architecture. Proceedings of the 7th USENIX Conference on Theory and Practice of Provenance, 8-8. DOI: 10.5555/2814579.2814587. Online publication date: 8-Jul-2015.
  • (2015) Power-Efficient Instancy Aware DRAM Scheduling. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences E98.A:4, 942-953. DOI: 10.1587/transfun.E98.A.942. Online publication date: 2015.
