Addressing instruction fetch bottlenecks by using an instruction register file

Published: 13 June 2007

Abstract

The Instruction Register File (IRF) is an architectural extension that provides improved access to frequently occurring instructions. An optimizing compiler can exploit an IRF by packing an application's instructions, resulting in decreased code size, reduced energy consumption, and improved execution time, primarily due to a smaller footprint in the instruction cache. The nature of the IRF also allows the execution of packed instructions to overlap with instruction fetch, thus providing a means of tolerating increased fetch latencies, such as those experienced with encrypted ICs or in the presence of low-power L0 caches. Although previous research has focused on the direct benefits of instruction packing, this paper explores the use of the increased fetch bandwidth provided by packed instructions. Small L0 caches improve energy efficiency but can increase execution time due to frequent cache misses. We show that this penalty can be significantly reduced by overlapping the execution of packed instructions with miss stalls. The IRF can also be used to supply additional instructions to a more aggressive execution engine, effectively reducing dependence on instruction cache bandwidth. This can improve energy efficiency and provides additional flexibility for evaluating various design tradeoffs in a pipeline with asymmetric instruction bandwidth. Thus, we show that the IRF is a complementary technique, operating as a buffer that tolerates fetch bottlenecks as well as providing additional fetch bandwidth for an aggressive pipeline backend.
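
The overlap of packed-instruction execution with miss stalls can be illustrated with a toy cycle-count model. This is only a sketch under simplifying assumptions and is not the simulation setup used in the paper: the pipeline is assumed to execute one instruction per cycle, each fetched word carries either one ordinary instruction or several packed IRF references, and the fetch of the next word is assumed to start as soon as the current word arrives. The function names and the (instructions, stall) tuple layout are hypothetical.

```python
# Toy model (assumption, not the paper's simulator): each fetched word is a
# tuple (n_instr, stall), where n_instr is the number of instructions the word
# supplies (1 = ordinary instruction, >1 = packed IRF references) and stall is
# the number of extra cycles needed to fetch that word (0 = L0 hit).

def cycles_no_overlap(words):
    """Every fetch stall is fully exposed: the pipeline waits, then executes."""
    return sum(n_instr + stall for n_instr, stall in words)


def cycles_with_overlap(words):
    """Stalls for the next word are hidden behind the remaining instructions
    of the previously fetched packed word."""
    total = 0
    slack = 0  # execution cycles of the previous word that can hide a stall
    for n_instr, stall in words:
        exposed = max(0, stall - slack)  # part of the stall that is not hidden
        total += exposed + n_instr
        # While this word's instructions execute, the next fetch is already in
        # flight, so n_instr - 1 cycles of its latency can be absorbed.
        slack = n_instr - 1
    return total


# Hypothetical instruction stream: packed words of 4 and 3 instructions mixed
# with ordinary instructions, and several L0 misses.
stream = [(1, 0), (4, 1), (1, 1), (3, 2), (1, 1)]

# A stream with no packed words gains nothing from overlap in this model.
unpacked = [(1, 0), (1, 1), (1, 1)]

if __name__ == "__main__":
    print("no overlap:  ", cycles_no_overlap(stream))
    print("with overlap:", cycles_with_overlap(stream))
```

In this model a stall is hidden only when the preceding word is packed, which mirrors the abstract's point that the benefit comes from executing packed instructions during the miss; a stream of purely unpacked instructions costs the same number of cycles either way.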

Cited By

  • (2010) Fine-grain dynamic instruction placement for L0 scratch-pad memory. Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems, pp. 137-146. DOI: 10.1145/1878921.1878943. Online publication date: 24-Oct-2010
  • (2008) Enhancing the effectiveness of utilizing an instruction register file. 2008 IEEE International Symposium on Parallel and Distributed Processing, pp. 1-5. DOI: 10.1109/IPDPS.2008.4536413. Online publication date: Apr-2008

Published In

LCTES '07: Proceedings of the 2007 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
June 2007
258 pages
ISBN:9781595936325
DOI:10.1145/1254766
  • ACM SIGPLAN Notices, Volume 42, Issue 7
    Proceedings of the 2007 LCTES conference
    July 2007
    241 pages
    ISSN:0362-1340
    EISSN:1558-1160
    DOI:10.1145/1273444

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. L0/filter cache
  2. instruction packing
  3. instruction register file

Qualifiers

  • Article

Acceptance Rates

Overall Acceptance Rate 116 of 438 submissions, 26%
