DOI: 10.5555/998680.1006709
Article

Memory Ordering: A Value-Based Approach

Published: 02 March 2004

Abstract

Conventional out-of-order processors employ a multi-ported, fully-associative load queue to guarantee correct memory reference order both within a single thread of execution and across threads in a multiprocessor system. As improvements in process technology and pipelining lead to higher clock frequencies, scaling this complex structure to accommodate a larger number of in-flight loads becomes difficult if not impossible. Furthermore, each access to this complex structure consumes excessive amounts of energy. In this paper, we solve the associative load queue scalability problem by completely eliminating the associative load queue. Instead, data dependences and memory consistency constraints are enforced by simply re-executing load instructions in program order prior to retirement. Using heuristics to filter the set of loads that must be re-executed, we show that our replay-based mechanism enables a simple, scalable, and energy-efficient FIFO load queue design with no associative lookup functionality, while sacrificing only a negligible amount of performance and cache bandwidth.
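
The replay idea described in the abstract can be illustrated with a small software model. This is a minimal sketch under stated assumptions, not the paper's design: the names `LoadEntry`, `FifoLoadQueue`, `execute_load`, `retire_oldest`, and the `needs_replay` flag are hypothetical, memory is modeled as a plain Python dict, and the value comparison at replay time is an assumption drawn from the paper's title rather than from the abstract's wording. The filtering heuristics themselves are not specified in the abstract, so the sketch only takes their outcome as a boolean.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class LoadEntry:
    address: int
    premature_value: int       # value seen when the load executed out of order
    needs_replay: bool = True  # hypothetical outcome of a filtering heuristic


class FifoLoadQueue:
    """Toy model of a load queue that is only pushed and popped in order."""

    def __init__(self, memory):
        self.memory = memory    # dict of address -> value, standing in for the cache
        self.entries = deque()  # FIFO only: never searched by address
        self.replays = 0        # number of loads actually re-executed

    def execute_load(self, address, needs_replay=True):
        # Speculative (possibly premature) execution of a load.
        value = self.memory.get(address, 0)
        self.entries.append(LoadEntry(address, value, needs_replay))
        return value

    def retire_oldest(self):
        # Returns True if the oldest load may retire, False if its value
        # changed since it first executed and recovery would be needed.
        entry = self.entries.popleft()
        if not entry.needs_replay:
            return True
        self.replays += 1
        current = self.memory.get(entry.address, 0)
        return current == entry.premature_value


memory = {0x10: 1, 0x20: 2}
queue = FifoLoadQueue(memory)
queue.execute_load(0x10)
queue.execute_load(0x20)
memory[0x20] = 99                  # a conflicting store becomes visible
ok_first = queue.retire_oldest()   # True: value unchanged
ok_second = queue.retire_oldest()  # False: replayed value differs
```

In this model the queue never looks up an entry by address; ordering violations are detected only by comparing the replayed value against the one recorded earlier, which is what lets the structure stay a simple FIFO.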


    Published In

    ISCA '04: Proceedings of the 31st annual international symposium on Computer architecture
    June 2004
    373 pages
    ISBN: 0769521436
    • ACM SIGARCH Computer Architecture News, Volume 32, Issue 2
      ISCA 2004
      March 2004
      373 pages
      ISSN: 0163-5964
      DOI: 10.1145/1028176

    Publisher

    IEEE Computer Society

    United States

    Qualifiers

    • Article

    Conference

    ISCA04
    Acceptance Rates

    ISCA '04 Paper Acceptance Rate 31 of 217 submissions, 14%;
    Overall Acceptance Rate 543 of 3,203 submissions, 17%

    Article Metrics

    • Downloads (last 12 months): 30
    • Downloads (last 6 weeks): 2
    Reflects downloads up to 08 Feb 2025

    Cited By

    • (2018) InvisiSpec. Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, pages 428-441. DOI: 10.1109/MICRO.2018.00042. Online publication date: 20-Oct-2018
    • (2018) The superfluous load queue. Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, pages 95-107. DOI: 10.1109/MICRO.2018.00017. Online publication date: 20-Oct-2018
    • (2017) Non-Speculative Load-Load Reordering in TSO. ACM SIGARCH Computer Architecture News 45:2, pages 187-200. DOI: 10.1145/3140659.3080220. Online publication date: 24-Jun-2017
    • (2017) Non-Speculative Load-Load Reordering in TSO. Proceedings of the 44th Annual International Symposium on Computer Architecture, pages 187-200. DOI: 10.1145/3079856.3080220. Online publication date: 24-Jun-2017
    • (2015) Revisiting Clustered Microarchitecture for Future Superscalar Cores. ACM Transactions on Architecture and Code Optimization 12:3, pages 1-22. DOI: 10.1145/2800787. Online publication date: 31-Aug-2015
    • (2013) Exploring memory consistency for massively-threaded throughput-oriented processors. ACM SIGARCH Computer Architecture News 41:3, pages 201-212. DOI: 10.1145/2508148.2485940. Online publication date: 23-Jun-2013
    • (2013) Exploring memory consistency for massively-threaded throughput-oriented processors. Proceedings of the 40th Annual International Symposium on Computer Architecture, pages 201-212. DOI: 10.1145/2485922.2485940. Online publication date: 23-Jun-2013
    • (2009) BulkCompiler. Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture, pages 133-144. DOI: 10.1145/1669112.1669131. Online publication date: 12-Dec-2009
    • (2009) Design and optimization of the store vectors memory dependence predictor. ACM Transactions on Architecture and Code Optimization 6:4, pages 1-33. DOI: 10.1145/1596510.1596514. Online publication date: 29-Oct-2009
    • (2009) Decoupled store completion/silent deterministic replay. ACM SIGARCH Computer Architecture News 37:3, pages 245-254. DOI: 10.1145/1555815.1555786. Online publication date: 20-Jun-2009
