Abstract
Hardware synthesis from dataflow graphs of signal processing systems is a growing research area as the focus shifts to high-level design methodologies. For data-intensive systems, dataflow-based synthesis can lead to inefficient memory usage because synchronous dataflow is restrictive and cannot easily model data reuse. This paper explores how changes to the dataflow graph can be used to drive both the on-chip and off-chip memory organisation, and how the resulting memory architectures can be mapped to a hardware implementation. By exploiting the data reuse inherent in many image processing algorithms and by creating memory hierarchies, off-chip memory bandwidth can be reduced by a factor of a thousand relative to the original dataflow-graph-level specification of a motion estimation algorithm, with a minimal increase in memory size. This analysis is verified using results gathered from an implementation of the motion estimation algorithm on a Xilinx Virtex-4 FPGA, where the delay between the memories and processing elements drops from 14.2 ns to 1.878 ns through refinement of the memory architecture. Care must be taken when modelling these algorithms, however, as inefficiencies in these models can easily translate into overuse of hardware resources.
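As a rough illustration of why data reuse matters for off-chip bandwidth, the sketch below counts pixel fetches per block for full-search block matching under two simple access models. It is not the analysis used in the paper and does not reproduce the factor-of-a-thousand figure. The block size `n`, the search range `p`, the function names, and both access models are assumptions chosen for illustration.

```python
# Hypothetical access-count model for full-search block matching.
# Assumptions (not from the paper): square n x n blocks, search range of
# +/- p pixels in each direction, one off-chip access per pixel fetched.

def accesses_no_reuse(n: int, p: int) -> int:
    """Every candidate fetches both the current and reference block off-chip."""
    candidates = (2 * p + 1) ** 2
    return candidates * 2 * n * n

def accesses_with_window_buffer(n: int, p: int) -> int:
    """Current block and search window are each fetched once into on-chip buffers."""
    current_block = n * n
    search_window = (n + 2 * p) ** 2
    return current_block + search_window

def reduction_factor(n: int, p: int) -> float:
    """Ratio of off-chip accesses without reuse to accesses with on-chip buffering."""
    return accesses_no_reuse(n, p) / accesses_with_window_buffer(n, p)

if __name__ == "__main__":
    n, p = 16, 16  # illustrative values only
    print(accesses_no_reuse(n, p))
    print(accesses_with_window_buffer(n, p))
    print(round(reduction_factor(n, p), 1))
```

In this simplified model the on-chip cost of the saving is a buffer of `n * n + (n + 2 * p) ** 2` pixels per block. Reusing the overlap between the search windows of neighbouring blocks would reduce off-chip traffic further, at the cost of a more complex buffer.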
Acknowledgements
This work was supported by the Engineering and Physical Sciences Research Council ICT grant EP/C000676/1.
Cite this article
Fischaber, S., Woods, R. & McAllister, J. SoC Memory Hierarchy Derivation from Dataflow Graphs. J Sign Process Syst 60, 345–361 (2010). https://doi.org/10.1007/s11265-009-0380-1
DOI: https://doi.org/10.1007/s11265-009-0380-1