DOI: 10.1145/1183401.1183427

A scalable low power issue queue for large instruction window processors

Published: 28 June 2006

    Abstract

    Large instruction windows and issue queues are key to exploiting greater instruction-level parallelism in out-of-order superscalar processors. However, the cycle time and energy consumption of conventional large monolithic issue queues are high. Previous efforts to reduce cycle time segment the issue queue and pipeline wakeup. Unfortunately, this results in significant IPC loss. Other proposals, which address energy efficiency by avoiding only the unnecessary tag comparisons, do not reduce broadcasts. These schemes also increase the issue latency.

    To address both these issues comprehensively, we propose the Scalable Lowpower Issue Queue (SLIQ). SLIQ augments a pipelined issue queue with direct indexing to mitigate the problem of delayed wakeups while reducing the cycle time. Also, the SLIQ design naturally leads to significant energy savings by reducing both the number of tag broadcasts and the number of comparisons required.

    A 2-segment SLIQ incurs an average IPC loss of 0.2% over the entire SPEC CPU2000 suite, while achieving a 25.2% reduction in issue latency when compared to a monolithic 128-entry issue queue for an 8-wide superscalar processor. An 8-segment SLIQ improves scalability by reducing the issue latency by 38.3% while incurring an IPC loss of only 2.3%. Further, the 8-segment SLIQ significantly reduces the energy consumption and energy-delay product by 48.3% and 67.4% respectively on average.
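    As a rough consistency check on the reported averages, the energy-delay product (EDP) is by definition energy multiplied by delay, so an energy reduction and an EDP reduction together imply a delay ratio. The sketch below is illustrative only: the function name is hypothetical, and it assumes the two percentages can be combined as if they described a single design point, which averages over a benchmark suite do not strictly guarantee. The abstract also does not say which delay measure enters the EDP.

```python
# Illustrative arithmetic only; not taken from the paper.
# Assumption: EDP = energy * delay, applied to the averaged figures
# quoted in the abstract as if they were one design point.

def implied_delay_ratio(energy_reduction: float, edp_reduction: float) -> float:
    """Return delay_new / delay_old implied by EDP = energy * delay.

    Both arguments are fractional reductions (0.483 means 48.3% lower).
    """
    energy_ratio = 1.0 - energy_reduction  # energy_new / energy_old
    edp_ratio = 1.0 - edp_reduction        # edp_new / edp_old
    return edp_ratio / energy_ratio


# Figures quoted for the 8-segment SLIQ in the abstract.
ratio = implied_delay_ratio(energy_reduction=0.483, edp_reduction=0.674)
print(f"implied delay ratio:     {ratio:.3f}")
print(f"implied delay reduction: {(1.0 - ratio) * 100:.1f}%")
```

    Under these assumptions the implied delay reduction is close to 37%, the same order as the 38.3% issue latency reduction quoted for the 8-segment configuration. The two numbers need not match exactly, since each is an average.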

    Published In

    ICS '06: Proceedings of the 20th annual international conference on Supercomputing
    June 2006
    385 pages
    ISBN:1595932828
    DOI:10.1145/1183401
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. complexity-effective architecture
    2. issue logic
    3. low-power architecture
    4. wakeup logic

    Qualifiers

    • Article

    Conference

    ICS06
    ICS06: International Conference on Supercomputing 2006
    June 28 - July 1, 2006
    Cairns, Queensland, Australia

    Acceptance Rates

    ICS '06 Paper Acceptance Rate 37 of 141 submissions, 26%;
    Overall Acceptance Rate 629 of 2,180 submissions, 29%

    Cited By

    • (2021) Omegaflow. Proceedings of the 35th ACM International Conference on Supercomputing, pp. 152-163. DOI: 10.1145/3447818.3460367. Online publication date: 3-Jun-2021
    • (2010) Forwardflow. ACM SIGARCH Computer Architecture News 38:3, pp. 14-25. DOI: 10.1145/1816038.1815966. Online publication date: 19-Jun-2010
    • (2010) Forwardflow. Proceedings of the 37th Annual International Symposium on Computer Architecture, pp. 14-25. DOI: 10.1145/1815961.1815966. Online publication date: 19-Jun-2010
    • (2007) Indirect Tag Search Mechanism for Instruction Window Energy Reduction. 7th IEEE International Conference on Computer and Information Technology (CIT 2007), pp. 841-846. DOI: 10.1109/CIT.2007.98. Online publication date: Oct-2007
