DOI: 10.1145/2749469.2750409
research-article

LaZy Superscalar

Published: 13 June 2015

    Abstract

    LaZy Superscalar is a processor architecture that delays the execution of fetched instructions until their results are needed by other instructions. This approach eliminates dead instructions and provides the means to fuse dependent instructions across multiple control dependencies by explicitly tracking control and data dependencies through a matrix-based scheduler. We present this novel redesign of the scheduling, recovery, and commit mechanisms and evaluate the performance of the proposed architecture. Our simulations using the SPEC 2006 benchmark suite indicate that LaZy Superscalar can achieve significant speed-ups while providing respectable power savings compared to a conventional superscalar processor.
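
The demand-driven idea in the abstract can be illustrated with a small software sketch. This is not the paper's hardware design: it is a toy model, assuming a straight-line instruction trace in which only side-effecting instructions (such as stores) create demand. It ignores control dependencies, memory dependencies, and instruction fusion. The names `Instr`, `dependency_matrix`, and `demanded` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Instr:
    dest: Optional[str]          # destination register, or None (e.g. a store)
    srcs: Tuple[str, ...]        # source registers read by the instruction
    side_effect: bool = False    # assumption: only these instructions create demand


def dependency_matrix(trace: List[Instr]) -> List[List[bool]]:
    """dep[i][j] is True when instruction i reads a value produced by instruction j."""
    n = len(trace)
    dep = [[False] * n for _ in range(n)]
    last_writer = {}  # register name -> index of its most recent producer
    for i, ins in enumerate(trace):
        for reg in ins.srcs:
            if reg in last_writer:
                dep[i][last_writer[reg]] = True
        if ins.dest is not None:
            last_writer[ins.dest] = i
    return dep


def demanded(trace: List[Instr], dep: List[List[bool]]) -> List[bool]:
    """Mark the instructions whose results are eventually needed.

    Producers always precede their consumers, so one backward scan is enough:
    by the time index i is visited, all of its consumers have been processed.
    """
    needed = [ins.side_effect for ins in trace]
    for i in range(len(trace) - 1, -1, -1):
        if needed[i]:
            for j in range(i):
                if dep[i][j]:
                    needed[j] = True
    return needed


trace = [
    Instr(dest="r1", srcs=()),                       # 0: r1 = load a
    Instr(dest="r2", srcs=("r1",)),                  # 1: r2 = r1 + 1
    Instr(dest="r2", srcs=()),                       # 2: r2 = load b (overwrites r2)
    Instr(dest=None, srcs=("r2",), side_effect=True) # 3: store r2
]
needed = demanded(trace, dependency_matrix(trace))
print(needed)  # [False, False, True, True]
```

In this trace, instruction 1 is dead because its result is overwritten before any read, and instruction 0 becomes dead in turn because its only consumer is never demanded. A lazy scheme in this simplified sense would execute only instructions 2 and 3.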

    Cited By

    • (2017) A mechanism for energy-efficient reuse of decoding and scheduling of x86 instruction streams. Proceedings of the Conference on Design, Automation & Test in Europe, pp. 1472-1477. DOI: 10.5555/3130379.3130724. Online publication date: 27-Mar-2017
    • (2019) MANIC. Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, pp. 670-684. DOI: 10.1145/3352460.3358277. Online publication date: 12-Oct-2019

    Published In

    ISCA '15: Proceedings of the 42nd Annual International Symposium on Computer Architecture
    June 2015
    768 pages
    ISBN: 9781450334020
    DOI: 10.1145/2749469

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Conference

    ISCA '15

    Acceptance Rates

    Overall Acceptance Rate 543 of 3,203 submissions, 17%
