Abstract
Loop distribution and loop fusion are two effective loop transformation techniques for optimizing the execution of programs in DSP applications. In this paper, we propose a new technique that combines loop distribution with direct loop fusion and improves timing performance without jeopardizing code size. We first develop the loop distribution theorems, which state the legality conditions of loop distribution for multi-level nested loops. We show that if the summation of the edge weights of a dependence cycle satisfies a certain condition, then the statements involved in the dependence cycle can be distributed; otherwise, they should be put in the same loop after loop distribution. Then, we propose the technique of maximum loop distribution with direct loop fusion. The experimental results show that our technique reduces the execution time of the transformed loops by 21.0% compared to the original loops, and reduces their code size by 7.0% on average compared to the original loops.
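As an illustration of the two transformations named above, the following minimal sketch distributes a single-level loop whose two statements form no dependence cycle. It is not taken from the paper: the function names, the arrays, and the loop body are hypothetical, and it shows only the textbook acyclic case, not the paper's edge-weight condition for dependence cycles.

```python
def single_loop(x):
    """One loop holding two statements, S1 and S2 (hypothetical example)."""
    n = len(x)
    a = [0] * n
    b = [0] * n
    for i in range(1, n):
        a[i] = x[i] + 1        # S1
        b[i] = a[i - 1] * 2    # S2 reads the value S1 wrote one iteration earlier
    return a, b


def distributed_loops(x):
    """The same computation after loop distribution: one loop per statement.

    The only dependence runs from S1 to S2, so there is no dependence
    cycle and running all of S1 before any of S2 preserves the results.
    """
    n = len(x)
    a = [0] * n
    b = [0] * n
    for i in range(1, n):
        a[i] = x[i] + 1        # S1 for every iteration
    for i in range(1, n):
        b[i] = a[i - 1] * 2    # S2 for every iteration
    return a, b


if __name__ == "__main__":
    data = [3, 1, 4, 1, 5]
    print(single_loop(data))
    print(distributed_loops(data))
```

Fusing the two loops of `distributed_loops` directly, that is, merging their bodies under the shared bounds without shifting iterations, gives back the body of `single_loop`. In this sketch that is legal because S2 only reads an element of `a` written in an earlier iteration. If S2 read `a[i + 1]` instead, the fused loop would see a value that S1 had not yet written, and the results would change.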
This work is partially supported by TI University Program, NSF EIA-0103709, Texas ARP 009741-0028-2001, NSF CCF-0309461, NSF IIS-0513669, Microsoft, USA.
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Liu, M., Zhuge, Q., Shao, Z., Xue, C., Qiu, M., Sha, E.H.M. (2005). Loop Distribution and Fusion with Timing and Code Size Optimization for Embedded DSPs. In: Yang, L.T., Amamiya, M., Liu, Z., Guo, M., Rammig, F.J. (eds) Embedded and Ubiquitous Computing – EUC 2005. EUC 2005. Lecture Notes in Computer Science, vol 3824. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11596356_15
DOI: https://doi.org/10.1007/11596356_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-30807-2
Online ISBN: 978-3-540-32295-5
eBook Packages: Computer Science, Computer Science (R0)