Abstract
Shuffle operations are among the most common operations in SIMD-based embedded system architectures. In this paper we study different families of shuffle operations that frequently occur in embedded applications running on SIMD architectures. These shuffle operations are used to drive the design of a custom shuffler for domain-specific SIMD processors. The energy efficiency of various crossbar-based custom shufflers is analyzed and compared with that of the widely used full crossbar. We show that by customizing the crossbar to implement only the specific shuffle operations required in the target application domain, we can reduce the energy consumption of shuffle operations by up to 80%. We also illustrate the trade-offs between flexibility and energy efficiency of custom shufflers and show that customization offers reasonable benefits without compromising the flexibility required for the target application domain.
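As a rough illustration of the idea of restricting a crossbar to the shuffles a domain actually needs, the sketch below models a shuffle as a per-lane source index and counts how many (output, input) connections a small set of patterns uses, compared with the N × N connections of a full crossbar. The lane count, the four example patterns, and the function names are assumptions made for this sketch; they are not taken from the paper, and the crosspoint count is only a structural proxy, not the paper's energy model.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical representation: pattern[out_lane] = in_lane that feeds it.
Pattern = List[int]


def apply_shuffle(data: List[int], pattern: Pattern) -> List[int]:
    """Route data through a shuffle: output lane i takes input lane pattern[i]."""
    return [data[src] for src in pattern]


def required_crosspoints(patterns: Dict[str, Pattern]) -> Set[Tuple[int, int]]:
    """Collect the (output, input) connections used by at least one pattern."""
    return {(out, src) for p in patterns.values() for out, src in enumerate(p)}


N = 8  # assumed number of SIMD lanes

# Illustrative pattern set (an assumption, not the paper's shuffle families).
patterns: Dict[str, Pattern] = {
    "identity": list(range(N)),
    "reverse": list(range(N - 1, -1, -1)),
    "rotate_by_1": [(i + 1) % N for i in range(N)],
    "even_odd_split": [0, 2, 4, 6, 1, 3, 5, 7],
}

full_crosspoints = N * N  # a full crossbar connects every output to every input
custom_crosspoints = len(required_crosspoints(patterns))

if __name__ == "__main__":
    lanes = [10, 11, 12, 13, 14, 15, 16, 17]
    for name, pattern in patterns.items():
        print(name, apply_shuffle(lanes, pattern))
    print("full crossbar crosspoints:  ", full_crosspoints)
    print("custom crossbar crosspoints:", custom_crosspoints)
```

A customized shuffler built this way can only perform the listed patterns, which is the flexibility side of the trade-off the abstract mentions.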
Copyright information
© 2007 Springer Berlin Heidelberg
About this paper
Cite this paper
Raghavan, P. et al. (2007). A Customized Cross-Bar for Data-Shuffling in Domain-Specific SIMD Processors. In: Lukowicz, P., Thiele, L., Tröster, G. (eds) Architecture of Computing Systems - ARCS 2007. ARCS 2007. Lecture Notes in Computer Science, vol 4415. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71270-1_5
DOI: https://doi.org/10.1007/978-3-540-71270-1_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-71267-1
Online ISBN: 978-3-540-71270-1
eBook Packages: Computer Science, Computer Science (R0)