Addressing instruction fetch bottlenecks by using an instruction register file

Authors: Stephen Roderick Hines, David Whalley

LCTES '07: Proceedings of the 2007 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems

Pages 165 - 174

https://doi.org/10.1145/1254766.1254800

Published: 13 June 2007

Abstract

The Instruction Register File (IRF) is an architectural extension for providing improved access to frequently occurring instructions. An optimizing compiler can exploit an IRF by packing an application's instructions, resulting in decreased code size, reduced energy consumption, and improved execution time, primarily due to a smaller footprint in the instruction cache. The nature of the IRF also allows the execution of packed instructions to overlap with instruction fetch, thus providing a means for tolerating increased fetch latencies, such as those introduced by encrypted instruction caches (ICs) or by low-power L0 caches. Although previous research has focused on the direct benefits of instruction packing, this paper explores the use of the increased fetch bandwidth provided by packed instructions. Small L0 caches improve energy efficiency but can increase execution time due to frequent cache misses. We show that this penalty can be significantly reduced by overlapping the execution of packed instructions with miss stalls. The IRF can also be used to supply additional instructions to a more aggressive execution engine, effectively reducing dependence on instruction cache bandwidth. This can improve energy efficiency while also providing flexibility for evaluating design tradeoffs in a pipeline with asymmetric instruction bandwidth. Thus, we show that the IRF is a complementary technique, operating as a buffer that tolerates fetch bottlenecks and as a source of additional fetch bandwidth for an aggressive pipeline backend.
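
The overlap of packed-instruction execution with miss stalls can be illustrated with a toy cycle count. The sketch below is not the paper's simulator or evaluation method. It assumes a single-issue machine in which each fetched word carries one or more operations (more than one when the word is a packed instruction referencing IRF entries), operations execute one per cycle, and a fetch costs one cycle plus any miss stall. The function name `total_cycles`, the `(ops, stall)` encoding, and the example numbers are all hypothetical.

```python
from typing import List, Tuple

# Each fetched word is (ops, stall):
#   ops   = operations it supplies (1 for a plain instruction,
#           >1 for a packed instruction; an assumption of this toy model)
#   stall = extra fetch cycles caused by a cache miss (0 on a hit)
Word = Tuple[int, int]


def total_cycles(words: List[Word], overlap: bool) -> int:
    """Count cycles to fetch and execute all words in order.

    overlap=False: the next fetch's miss stall is paid after the
                   previous word's operations finish.
    overlap=True:  the next fetch proceeds while the previous word's
                   operations execute, so a stall can be hidden.
    """
    if not words:
        return 0
    first_ops, first_stall = words[0]
    cycles = 1 + first_stall          # nothing to overlap with the first fetch
    prev_ops = first_ops
    for ops, stall in words[1:]:
        if overlap:
            # Execution of the previous word and the next fetch run together.
            cycles += max(prev_ops, 1 + stall)
        else:
            # Execution and the miss stall are serialized.
            cycles += prev_ops + stall
        prev_ops = ops
    return cycles + prev_ops          # drain the last word's operations


# Hypothetical stream: two packed words, two fetches that miss (stall = 2).
words = [(1, 0), (4, 0), (1, 2), (3, 0), (1, 2)]
print(total_cycles(words, overlap=False))  # 15
print(total_cycles(words, overlap=True))   # 11
```

In this model a stream of plain instructions gains nothing from overlap, because a one-operation word leaves no spare execution cycles in which to hide a stall; the benefit appears only when a packed word precedes a fetch that misses.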

Cited By

Park J, Balfour J, Dally W, Kathail V, Tatge R, Barua R (2010). Fine-grain dynamic instruction placement for L0 scratch-pad memory. Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems, 137-146. Online publication date: 24-Oct-2010.
https://dl.acm.org/doi/10.1145/1878921.1878943
Whalley D, Tyson G (2008). Enhancing the effectiveness of utilizing an instruction register file. 2008 IEEE International Symposium on Parallel and Distributed Processing, 1-5. Online publication date: Apr-2008.
https://doi.org/10.1109/IPDPS.2008.4536413

Published In

LCTES '07: Proceedings of the 2007 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems

June 2007

258 pages

ISBN:9781595936325

DOI:10.1145/1254766

General Chair: Santosh Pande (Georgia Institute of Technology, USA)
Program Chair: Zhiyuan Li (Purdue University, USA)

ACM SIGPLAN Notices Volume 42, Issue 7
Proceedings of the 2007 LCTES conference
July 2007
241 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/1273444

Copyright © 2007 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

LCTES 07: ACM SIGBED-SIGPLAN Conference on Languages, Compilers and Tools for Embedded Systems

June 13 - 15, 2007

San Diego, California, USA

Acceptance Rates

Overall Acceptance Rate 116 of 438 submissions, 26%

Bibliometrics

Total Citations: 2
Total Downloads: 322
Downloads (Last 12 months): 3
Downloads (Last 6 weeks): 1

Reflects downloads up to 15 Oct 2024