Abstract
Application-specific instruction-set processors (ASIPs) are a good alternative for accelerating video processing, but the productivity gap that comes with such a new technology may prevent designers from exploiting it fully. Video processing SoCs need a flexibility that pure hardware architectures do not offer, while pure software solutions do not meet video processing performance constraints. ASIP design can therefore offer a good tradeoff between performance and flexibility. Video processing algorithms often exhibit intrinsic parallelism that specialized ASIP instructions can accelerate. In this paper, we propose a new approach for exploiting sequences of tightly coupled specialized instructions in ASIP design for video processing. Our approach avoids costly data communications by applying data grouping and data reuse, and it accelerates an algorithm's critical loops by transforming them according to a new intermediate representation. This representation is then optimized, and opportunities for loop parallelism are explored. The approach has been applied to video processing algorithms such as the ELA deinterlacer and the 2D-DCT. Experimental results show speedups of up to 18× on the considered applications, while the hardware overhead in terms of additional logic gates was found to be between 18% and 59%.
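For readers unfamiliar with the ELA (edge-based line average) deinterlacer named above, the following is a minimal sketch of the textbook three-direction variant, written in plain Python. It is not the authors' ASIP implementation; the function name `ela_interpolate_line` and the border handling (plain vertical average at the first and last pixel) are assumptions made for illustration. Each missing pixel is interpolated along whichever of the vertical and two diagonal directions shows the smallest difference between the line above and the line below. Note that the three-pixel windows read for neighbouring output pixels overlap, which is the kind of access pattern where data grouping and data reuse can plausibly pay off.

```python
def ela_interpolate_line(upper, lower):
    """Interpolate one missing scan line from the lines above and below it.

    Textbook 3-direction ELA sketch (illustrative, not the paper's design).
    `upper` and `lower` are equal-length sequences of integer pixel values.
    """
    n = len(upper)
    if len(lower) != n:
        raise ValueError("upper and lower lines must have the same length")

    out = [0] * n
    for j in range(n):
        # Assumed border rule: fall back to the vertical average.
        if j == 0 or j == n - 1:
            out[j] = (upper[j] + lower[j]) // 2
            continue

        # (absolute difference, sum of the two pixels) per direction.
        # Vertical is listed first so that ties resolve to vertical.
        candidates = [
            (abs(upper[j] - lower[j]), upper[j] + lower[j]),
            (abs(upper[j - 1] - lower[j + 1]), upper[j - 1] + lower[j + 1]),
            (abs(upper[j + 1] - lower[j - 1]), upper[j + 1] + lower[j - 1]),
        ]
        _, total = min(candidates, key=lambda c: c[0])
        out[j] = total // 2
    return out
```

On a diagonal edge such as `upper = [100, 100, 0, 0, 0]` and `lower = [100, 100, 100, 100, 0]`, the sketch follows the edge instead of blurring it, which is the behaviour that distinguishes ELA from a plain line average.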
About this article
Cite this article
Mbaye, M.M., Bélanger, N., Savaria, Y. et al. A Novel Application-specific Instruction-set Processor Design Approach for Video Processing Acceleration. J VLSI Sign Process Syst Sign Image Video Technol 47, 297–315 (2007). https://doi.org/10.1007/s11265-007-0050-0