
Memory Ordering: A Value-Based Approach

Published: 02 March 2004

Abstract

Conventional out-of-order processors employ a multi-ported, fully-associative load queue to guarantee correct memory reference order both within a single thread of execution and across threads in a multiprocessor system. As improvements in process technology and pipelining lead to higher clock frequencies, scaling this complex structure to accommodate a larger number of in-flight loads becomes difficult if not impossible. Furthermore, each access to this complex structure consumes excessive amounts of energy. In this paper, we solve the associative load queue scalability problem by completely eliminating the associative load queue. Instead, data dependences and memory consistency constraints are enforced by simply re-executing load instructions in program order prior to retirement. Using heuristics to filter the set of loads that must be re-executed, we show that our replay-based mechanism enables a simple, scalable, and energy-efficient FIFO load queue design with no associative lookup functionality, while sacrificing only a negligible amount of performance and cache bandwidth.
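The replay idea in the abstract can be illustrated with a small software model. The following is a minimal sketch, not the paper's hardware design: the class `FifoLoadQueue`, the `Load` record, the `needs_replay` flag (standing in for whatever filtering heuristics are used), and the dictionary used as memory are all hypothetical names chosen for illustration. It shows a load queue that is only ever accessed at its head and tail, with ordering violations detected by comparing values at retirement rather than by searching addresses.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Load:
    addr: int
    value: int                 # value seen when the load first executed
    needs_replay: bool = True  # hypothetical filter outcome (assumption)


class FifoLoadQueue:
    """Toy FIFO load queue: no address search, only head/tail access."""

    def __init__(self, memory):
        self.memory = memory   # dict standing in for the cache hierarchy
        self.queue = deque()

    def execute(self, addr, needs_replay=True):
        # Possibly premature execution: record the value that was observed.
        load = Load(addr, self.memory.get(addr, 0), needs_replay)
        self.queue.append(load)
        return load

    def retire_oldest(self):
        # Loads reach this point in program order.
        load = self.queue.popleft()
        if not load.needs_replay:
            return "retired"
        # Replay: read memory again and compare values, not addresses.
        if self.memory.get(load.addr, 0) == load.value:
            return "retired"
        # Mismatch: discard younger loads so they can be re-fetched.
        self.queue.clear()
        return "squash"


memory = {0x10: 1}
lq = FifoLoadQueue(memory)
lq.execute(0x10)
memory[0x10] = 2   # a store or remote write lands after the load executed
first = lq.retire_oldest()
lq.execute(0x10)
second = lq.retire_oldest()
print(first, second)  # prints "squash retired"
```

In this sketch the first load observed a stale value, so its replay at retirement detects the mismatch and triggers a squash. The second load sees the up-to-date value and retires normally. A load whose `needs_replay` flag is cleared skips the second memory access entirely, which is the role the filtering heuristics play in limiting extra cache bandwidth.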

Published In

ACM SIGARCH Computer Architecture News, Volume 32, Issue 2
ISCA 2004
March 2004
373 pages
ISSN: 0163-5964
DOI: 10.1145/1028176

ISCA '04: Proceedings of the 31st annual international symposium on Computer architecture
June 2004
373 pages
ISBN: 0769521436

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 02 March 2004
Published in SIGARCH Volume 32, Issue 2

Qualifiers

  • Article

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 30
  • Downloads (Last 6 weeks): 2
Reflects downloads up to 08 Feb 2025

Citations

Cited By

  • (2024) PipeGen: Automated Transformation of a Single-Core Pipeline into a Multicore Pipeline for a Given Memory Consistency Model. Proceedings of the 2024 International Conference on Parallel Architectures and Compilation Techniques. 10.1145/3656019.3676889 (1-13). Online publication date: 14-Oct-2024
  • (2016) Performance of Dynamic Instruction Window Resizing for a Given Power Budget under DVFS Control. IEICE Transactions on Information and Systems. 10.1587/transinf.2015EDP7325. E99.D:2 (341-350). Online publication date: 2016
  • (2015) Revisiting Clustered Microarchitecture for Future Superscalar Cores. ACM Transactions on Architecture and Code Optimization. 10.1145/2800787. 12:3 (1-22). Online publication date: 31-Aug-2015
  • (2015) Submitted to IEEE Transactions on Parallel and Distributed Systems Special Issue on CMP Architectures. IEEE Transactions on Parallel and Distributed Systems. 10.1109/TPDS.2007.1080 (1-1). Online publication date: 2015
  • (2013) Exploring memory consistency for massively-threaded throughput-oriented processors. ACM SIGARCH Computer Architecture News. 10.1145/2508148.2485940. 41:3 (201-212). Online publication date: 23-Jun-2013
  • (2013) Exploring memory consistency for massively-threaded throughput-oriented processors. Proceedings of the 40th Annual International Symposium on Computer Architecture. 10.1145/2485922.2485940 (201-212). Online publication date: 23-Jun-2013
  • (2012) Active Store Window: Enabling Far Store-Load Forwarding with Scalability and Complexity-Efficiency. Journal of Computer Science and Technology. 10.1007/s11390-012-1263-7. 27:4 (769-780). Online publication date: 12-Jul-2012
  • (2011) Hybrid timing-address oriented load-store queue filtering for an x86 architecture. IET Computers & Digital Techniques. 10.1049/iet-cdt.2010.0004. 5:2 (145). Online publication date: 2011
  • (2009) BulkCompiler. Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture. 10.1145/1669112.1669131 (133-144). Online publication date: 12-Dec-2009
  • (2008) A Two-Level Load/Store Queue Based on Execution Locality. ACM SIGARCH Computer Architecture News. 10.1145/1394608.1382171. 36:3 (25-36). Online publication date: 1-Jun-2008
