A Customized Cross-Bar for Data-Shuffling in Domain-Specific SIMD Processors

Raghavan, Praveen; Munaga, Satyakiran; Ramos, Estela Rey; Lambrechts, Andy; Jayapala, Murali; Catthoor, Francky; Verkest, Diederik

doi:10.1007/978-3-540-71270-1_5

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4415)

Included in the following conference series:

International Conference on Architecture of Computing Systems

Abstract

Shuffle operations are among the most common operations in SIMD-based embedded system architectures. In this paper we study different families of shuffle operations that frequently occur in embedded applications running on SIMD architectures. These shuffle operations are used to drive the design of a custom shuffler for domain-specific SIMD processors. The energy efficiency of various crossbar-based custom shufflers is analyzed and compared with that of the widely used full crossbar. We show that by customizing the crossbar to implement only the specific shuffle operations required in the target application domain, we can reduce the energy consumption of shuffle operations by up to 80%. We also illustrate the trade-offs between flexibility and energy efficiency of custom shufflers and show that customization offers reasonable benefits without compromising the flexibility required for the target application domain.
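To make the idea of customization concrete, the following minimal sketch models a shuffle as an index pattern and counts how many output-to-input connections (crosspoints) a crossbar would need to support a given set of patterns. Everything in it is an illustrative assumption rather than the paper's method: the 8-lane width, the four example patterns, and the function names are hypothetical, and the crosspoint count is only a crude structural proxy, not the energy model behind the reported figures.

```python
from typing import List, Sequence, Set, Tuple


def apply_shuffle(data: Sequence, pattern: Sequence[int]) -> List:
    """Return the shuffled vector: output lane i reads input lane pattern[i]."""
    return [data[src] for src in pattern]


def crosspoints_needed(patterns: Sequence[Sequence[int]],
                       width: int) -> Set[Tuple[int, int]]:
    """Collect every (output lane, input lane) connection used by any pattern."""
    points: Set[Tuple[int, int]] = set()
    for pattern in patterns:
        if len(pattern) != width:
            raise ValueError("pattern length must equal the SIMD width")
        for out_lane, in_lane in enumerate(pattern):
            if not 0 <= in_lane < width:
                raise ValueError("input lane index out of range")
            points.add((out_lane, in_lane))
    return points


WIDTH = 8  # assumed SIMD width, chosen only for illustration

# Hypothetical shuffle patterns; the paper's actual families are not listed here.
interleave = [0, 4, 1, 5, 2, 6, 3, 7]
reverse = [7, 6, 5, 4, 3, 2, 1, 0]
rotate_by_one = [1, 2, 3, 4, 5, 6, 7, 0]
broadcast_lane0 = [0] * WIDTH

custom = crosspoints_needed(
    [interleave, reverse, rotate_by_one, broadcast_lane0], WIDTH)
full_crossbar = WIDTH * WIDTH  # a full crossbar connects every output to every input

print(f"custom crosspoints: {len(custom)} of {full_crossbar}")
print(apply_shuffle(list("abcdefgh"), interleave))
```

In this toy model the custom shuffler needs far fewer connections than the full crossbar, at the cost of supporting only the listed patterns, which mirrors the flexibility versus efficiency trade-off the abstract describes.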

Author information

Authors and Affiliations

IMEC vzw, Kapeldreef 75, Heverlee 3001, Belgium
Praveen Raghavan, Satyakiran Munaga, Estela Rey Ramos, Andy Lambrechts, Murali Jayapala, Francky Catthoor & Diederik Verkest
ESAT, Kasteelpark Arenberg 10, K. U. Leuven, Heverlee 3001, Belgium
Praveen Raghavan, Satyakiran Munaga, Andy Lambrechts, Francky Catthoor & Diederik Verkest
Electrical Engineering, Universidade de Vigo, Spain
Estela Rey Ramos
Electrical Engineering, Vrije Universiteit Brussels, Belgium
Diederik Verkest

Editor information

Paul Lukowicz, Lothar Thiele, Gerhard Tröster

Cite this paper

Raghavan, P. et al. (2007). A Customized Cross-Bar for Data-Shuffling in Domain-Specific SIMD Processors. In: Lukowicz, P., Thiele, L., Tröster, G. (eds) Architecture of Computing Systems - ARCS 2007. ARCS 2007. Lecture Notes in Computer Science, vol 4415. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71270-1_5

DOI: https://doi.org/10.1007/978-3-540-71270-1_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-71267-1
Online ISBN: 978-3-540-71270-1
eBook Packages: Computer Science, Computer Science (R0)
