Abstract
Mapping applications to FPGAs involves exploring a potentially large space of design choices, with long and error-prone design cycles. Automated compiler analysis and transformation techniques aim to improve the productivity of this mapping process by shortening the design cycles while still leading to good designs. Scalar replacement, also known as register promotion, leads to designs that reduce the number of external memory accesses, and thus the execution time, through the use of storage resources. In this paper we present the combination of loop transformation techniques, namely loop unrolling, loop splitting, and loop interchange, with scalar replacement to enable partial data reuse in computations expressed as tightly nested loops, which are pervasive in image processing algorithms. We describe an accurate performance model for designs that exploit partial data reuse. Our experimental results reveal that the model accurately captures the non-trivial execution effects of pipelined implementations in the presence of partial data reuse, which arise from the need to fill up data buffers. The model thus allows a compiler to explore a large design space with high accuracy, ultimately allowing compiler tools to find better designs than brute-force approaches.
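As an illustration of the idea behind scalar replacement for windowing computations, the following minimal sketch compares a naive 3-tap window sum with a version that keeps the window in scalar variables standing in for registers. It is not taken from the paper: the function names, the 1-D window, and the read counters are assumptions made purely for illustration, and the counters only approximate external memory accesses.

```python
def window_sum_naive(a):
    """3-tap window sum; every operand is re-read from 'memory' each iteration."""
    reads = 0
    out = []
    for i in range(len(a) - 2):
        out.append(a[i] + a[i + 1] + a[i + 2])
        reads += 3  # three array reads per output
    return out, reads


def window_sum_scalar_replaced(a):
    """Same computation, with the window held in scalars (hypothetical registers)."""
    if len(a) < 3:
        return [], 0
    # Fill-up phase: the first two values must be loaded before any output.
    r0, r1 = a[0], a[1]
    reads = 2
    out = []
    for i in range(2, len(a)):
        r2 = a[i]          # the only new read per iteration
        reads += 1
        out.append(r0 + r1 + r2)
        r0, r1 = r1, r2    # rotate the register window
    return out, reads
```

For a 10-element input, both functions produce the same 8 outputs, but the naive version performs 24 reads while the register version performs 10, two of which belong to the fill-up phase before the first output is available.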
This research is supported by the Inha University Research Grant 34394.
Copyright information
© 2007 Springer Berlin Heidelberg
About this paper
Cite this paper
Park, J., Diniz, P.C. (2007). Partial Data Reuse for Windowing Computations: Performance Modeling for FPGA Implementations. In: Diniz, P.C., Marques, E., Bertels, K., Fernandes, M.M., Cardoso, J.M.P. (eds) Reconfigurable Computing: Architectures, Tools and Applications. ARC 2007. Lecture Notes in Computer Science, vol 4419. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71431-6_10
DOI: https://doi.org/10.1007/978-3-540-71431-6_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-71430-9
Online ISBN: 978-3-540-71431-6
eBook Packages: Computer Science, Computer Science (R0)