Code size reduction technique and implementation for software-pipelined DSP applications

Authors:

Qingfeng Zhuge,

Edwin H.-M. Sha

ACM Transactions on Embedded Computing Systems (TECS), Volume 2, Issue 4

Pages 590 - 613

https://doi.org/10.1145/950162.950168

Published: 01 November 2003

Abstract

Software pipelining is extensively used to exploit the instruction-level parallelism of loops, but it also significantly expands code size. For embedded systems with very limited on-chip memory resources, code size becomes one of the most important optimization concerns. This paper presents the theoretical foundation of code size reduction for software-pipelined loops based on the retiming concept. We propose a general Code-size REDuction technique (CRED) for various kinds of processors. Our CRED algorithms integrate code size reduction with software pipelining. The experimental results show the effectiveness of the CRED technique in both code size reduction and exploration of the code size/performance trade-off space.
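To make the code size concern concrete, the following minimal Python sketch contrasts a two-stage loop, a software-pipelined version with an explicit prologue and epilogue, and a version in which both are folded back into the loop body by guarding each stage. It is an illustration only and should not be read as the CRED algorithm itself: the function names, the two-stage loop body, and the guard conditions are all assumptions made for this example.

```python
def original_loop(x, c):
    """Reference loop: each iteration runs stage A, then stage B."""
    out = []
    for i in range(len(x)):
        a = x[i] * c       # stage A
        out.append(a + 1)  # stage B (depends on A of the same iteration)
    return out


def pipelined_loop(x, c):
    """Software-pipelined form: B of iteration i overlaps A of iteration i + 1.

    Stage code is duplicated outside the loop as a prologue and an epilogue,
    which is the source of the code size expansion.
    """
    n = len(x)
    out = []
    if n == 0:
        return out
    a = x[0] * c            # prologue: stage A of iteration 0
    for i in range(n - 1):  # steady state
        out.append(a + 1)   # stage B of iteration i
        a = x[i + 1] * c    # stage A of iteration i + 1
    out.append(a + 1)       # epilogue: stage B of the last iteration
    return out


def guarded_loop(x, c):
    """Same schedule with prologue and epilogue folded into the loop.

    Each stage appears once and is guarded by a condition on the trip
    counter, at the cost of one extra trip (n + 1 instead of n).
    """
    n = len(x)
    out = []
    a = None
    for t in range(n + 1):
        if t >= 1:          # guard: stage B of iteration t - 1
            out.append(a + 1)
        if t < n:           # guard: stage A of iteration t
            a = x[t] * c
    return out
```

All three functions compute the same result; for example, each returns `[4, 7, 10, 13]` for `x = [1, 2, 3, 4]` and `c = 3`. The guarded form trades duplicated stage code for one extra loop trip and per-stage conditions, a simple instance of the code size/performance trade-off the abstract mentions.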

Index Terms

Code size reduction technique and implementation for software-pipelined DSP applications
1. Hardware
  1. Communication hardware, interfaces and storage
    1. Signal processing systems
2. Software and its engineering
  1. Software notations and tools
    1. Compilers
      1. Source code generation

Published In

ACM Transactions on Embedded Computing Systems Volume 2, Issue 4

November 2003

165 pages

ISSN:1539-9087

EISSN:1558-3465

DOI:10.1145/950162

Copyright © 2003 ACM.

Publisher

Association for Computing Machinery

New York, NY, United States

Journal Family

ACM Journals for the Design of Smart and Connected Systems

Article Metrics

Total Citations: 24
Total Downloads: 1,059
Downloads (Last 12 months): 2
Downloads (Last 6 weeks): 0

Reflects downloads up to 15 Oct 2024

Cited By

Jiang, W., Zhuge, Q., Chen, X., Yang, L., Yi, J., and Sha, E. 2016. Properties of Self-Timed Ring Architectures for Deadlock-Free and Consistent Configuration Reaching Maximum Throughput. Journal of Signal Processing Systems 84, 1, 123--137. Online publication date: 1-Jul-2016.
https://dl.acm.org/doi/10.1007/s11265-015-0984-6
Jiang, W., Zhuge, Q., Yi, J., Yang, L., and Sha, E. 2014. On self-timed ring for consistent mapping and maximum throughput. In 2014 IEEE 20th International Conference on Embedded and Real-Time Computing Systems and Applications, 1--9. Online publication date: Aug-2014.
https://doi.org/10.1109/RTCSA.2014.6910511
BenSaleh, M. 2013. Loop Transformations for Power Consumption Reduction in Wireless Sensor Networks Memory. In Proceedings of the 2013 European Modelling Symposium, 647--651. Online publication date: 20-Nov-2013.
https://dl.acm.org/doi/10.1109/EMS.2013.108
Du, J., Wang, Y., Zhuge, Q., Hu, J., and Sha, E. 2013. Efficient Loop Scheduling for Chip Multiprocessors with Non-Volatile Main Memory. Journal of Signal Processing Systems 71, 3, 261--273. Online publication date: 1-Jun-2013.
https://dl.acm.org/doi/10.1007/s11265-012-0703-5
Wang, Y., Du, J., Hu, J., Zhuge, Q., and Sha, E. 2012. Loop scheduling optimization for chip-multiprocessors with non-volatile main memory. In 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 1553--1556. Online publication date: Mar-2012.
https://doi.org/10.1109/ICASSP.2012.6288188
Li, J. 2011. Bibliography. In Real-Time Embedded Systems, 187--207. Online publication date: 7-Jun-2011.
https://doi.org/10.1201/b10935-12
Jung, Y. 2011. Hardware/Software Co-reconfigurable Instruction Decoder for Adaptive Multi-core DSP Architectures. Journal of Signal Processing Systems 62, 3, 273--285. Online publication date: 1-Mar-2011.
https://dl.acm.org/doi/10.1007/s11265-010-0461-1
Liu, H., Shao, Z., Wang, M., and Chen, P. 2008. Overhead-Aware System-Level Joint Energy and Performance Optimization for Streaming Applications on Multiprocessor Systems-on-Chip. In Proceedings of the 2008 Euromicro Conference on Real-Time Systems, 92--101. Online publication date: 2-Jul-2008.
https://dl.acm.org/doi/10.1109/ECRTS.2008.18
Qiu, M., Sha, E., Liu, M., Lin, M., Hua, S., and Yang, L. 2008. Energy minimization with loop fusion and multi-functional-unit scheduling for multidimensional DSP. Journal of Parallel and Distributed Computing 68, 4, 443--455. Online publication date: 1-Apr-2008.
https://dl.acm.org/doi/10.1016/j.jpdc.2007.06.014
Hua, G., Wang, M., Shao, Z., Liu, H., and Xue, C. 2007. Real-time loop scheduling with energy optimization via DVS and ABB for multi-core embedded system. In Proceedings of the 2007 international conference on Embedded and ubiquitous computing, 1--12. Online publication date: 17-Dec-2007.
https://dl.acm.org/doi/10.5555/1780745.1780747