PICA: Processor Idle Cycle Aggregation for Energy-Efficient Embedded Systems

Authors:

Aviral Shrivastava

ACM Transactions on Embedded Computing Systems (TECS), Volume 11, Issue 2

Article No.: 26, Pages 1 - 27

https://doi.org/10.1145/2220336.2220338

Published: 01 July 2012

Abstract

Processor Idle Cycle Aggregation (PICA) is a promising approach to low-power processor execution in which small memory stalls are aggregated into large ones, making it profitable to switch the processor into a low-power mode. We extend the previous approach in three dimensions. First, we develop a static analysis for the PICA technique and present optimal parameters for five common types of loops based on steady-state analysis. Second, to remedy the weakness of software-only control in varying environments, we enhance PICA with a minimal hardware extension that ensures correct execution for any loop and any parameters, which greatly facilitates exploration-based parameter tuning. Third, we demonstrate that PICA can be applied to certain types of nested loops with variable bounds, which broadens its applicability. We validate our analytical model against simulation-based optimization and show, through experiments on embedded application benchmarks, that our technique can be applied to a wide range of loops, with an average energy reduction of 20% compared to execution without PICA.
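The intuition behind aggregation, that many short stalls cannot pay for a mode switch while one long stall can, can be illustrated with a toy break-even model. The sketch below is not the paper's analytical model: the function `stall_energy`, its parameters, and every number in it are hypothetical values chosen only to make the arithmetic easy to follow.

```python
def stall_energy(stall_lengths, p_idle, p_sleep, transition_cycles, transition_energy):
    """Energy spent across stalls, sleeping only when it is cheaper.

    All parameters are illustrative assumptions, not values from the paper:
      p_idle            energy per cycle while stalled in normal mode
      p_sleep           energy per cycle in low-power mode
      transition_cycles cycles consumed entering and leaving low-power mode
      transition_energy energy consumed by that mode switch
    """
    total = 0.0
    for n in stall_lengths:
        stay_cost = p_idle * n
        # A stall shorter than the transition time cannot use low-power mode.
        if n > transition_cycles:
            sleep_cost = transition_energy + p_sleep * (n - transition_cycles)
            total += min(stay_cost, sleep_cost)
        else:
            total += stay_cost
    return total


# Hypothetical machine parameters.
params = dict(p_idle=1.0, p_sleep=0.25, transition_cycles=180, transition_energy=150.0)

# The same 2000 stall cycles, either scattered or aggregated.
scattered = stall_energy([20] * 100, **params)   # 100 stalls of 20 cycles each
aggregated = stall_energy([2000], **params)      # one stall of 2000 cycles

print("scattered:", scattered)
print("aggregated:", aggregated)
```

Under these assumed numbers, none of the 20-cycle stalls is long enough to cover the mode switch, so the processor idles at full cost, whereas the single aggregated stall spends most of its cycles in the low-power mode.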

Index Terms

1. Computer systems organization
  1. Embedded and cyber-physical systems
  2. Real-time systems

Published In

ACM Transactions on Embedded Computing Systems Volume 11, Issue 2

July 2012

342 pages

ISSN:1539-9087

EISSN:1558-3465

DOI:10.1145/2220336

Copyright © 2012 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Journal Family

ACM Journals for the Design of Smart and Connected Systems

Publication History

Published: 01 July 2012

Accepted: 01 May 2011

Revised: 01 February 2011

Received: 01 July 2008

Qualifiers

Research-article
Research
Refereed
