Scalability Evaluation of a Polymorphic Register File: A CG Case Study

Ciobanu, Cătălin B.; Martorell, Xavier; Kuzmanov, Georgi K.; Ramirez, Alex; Gaydadjiev, Georgi N.

doi:10.1007/978-3-642-19137-4_2

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 6566))

Included in the following conference series:

International Conference on Architecture of Computing Systems

Abstract

We evaluate the scalability of a Polymorphic Register File using the Conjugate Gradient (CG) method as a case study. We focus on a heterogeneous multi-processor architecture and take into account critical parameters such as cache bandwidth and memory latency. We compare the performance of 256 Polymorphic Register File-augmented workers against a single Cell PowerPC Processor Unit (PPU). In this scenario, simulation results suggest that absolute speedups of up to 200 times can be obtained for the Sparse Matrix Vector Multiplication kernel. Moreover, when an equal number of workers in the range 1-256 is employed, our design is between 1.7 and 4.2 times faster than a Cell PPU-based system. Furthermore, we study the impact of memory latency and cache bandwidth on the sustainable speedups of the system considered. Our tests suggest that a 128-worker configuration requires the caches to deliver 1638.4 GB/sec in order to preserve 80% of its peak speedup.
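For readers unfamiliar with the workload, the following is a minimal sketch of the textbook Conjugate Gradient iteration, showing where the Sparse Matrix Vector Multiplication kernel sits inside each step. It is plain Python for illustration only and is not the implementation evaluated in the paper. The function names (`spmv_csr`, `dot`, `conjugate_gradient`), the CSR storage layout, and the tolerance are all assumptions introduced here.

```python
import math


def spmv_csr(values, col_idx, row_ptr, x):
    """Return y = A @ x for a sparse matrix A stored in CSR form (assumed layout)."""
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):
        acc = 0.0
        # Non-zeros of this row occupy values[row_ptr[row]:row_ptr[row + 1]].
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


def dot(a, b):
    """Dot product of two equally sized vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def conjugate_gradient(values, col_idx, row_ptr, b, tol=1e-10, max_iter=100):
    """Solve A x = b for a symmetric positive-definite A, starting from x = 0."""
    x = [0.0] * len(b)
    r = list(b)            # residual r = b - A x, with x = 0
    p = list(r)            # initial search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if math.sqrt(rs_old) < tol:
            break
        q = spmv_csr(values, col_idx, row_ptr, p)   # the SpMV kernel
        alpha = rs_old / dot(p, q)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# Usage: A = [[4, 1], [1, 3]] in CSR form, b = [1, 2].
x = conjugate_gradient([4.0, 1.0, 1.0, 3.0], [0, 1, 0, 1], [0, 2, 4], [1.0, 2.0])
print(x)
```

Each iteration performs one sparse matrix-vector product plus a handful of dot products and vector updates. On large sparse systems the product typically accounts for most of the work per iteration, which is consistent with the abstract reporting speedups for that kernel.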

Author information

Authors and Affiliations

Computer Engineering Laboratory, Electrical Engineering Department, Delft University of Technology, The Netherlands
Cătălin B. Ciobanu, Georgi K. Kuzmanov & Georgi N. Gaydadjiev
Universitat Politècnica de Catalunya, Spain
Xavier Martorell & Alex Ramirez
Barcelona Supercomputing Center, Spain
Xavier Martorell & Alex Ramirez

Editor information

Editors and Affiliations

Institut für Datentechnik und Kommunikationsnetze, Hans-Sommer-Straße 66, 38106, Braunschweig, Germany
Mladen Berekovic
Dipartimento di elettronica e informazione, Via Ponzio 34/5, 20133, Milano, Italy
William Fornaciari & Cristina Silvano
Johann Wolfgang Goethe-Universität Frankfurt, Robert-Mayer-Straße 11-15, 60325, Frankfurt am Main, Germany
Uwe Brinkschulte

Cite this paper

Ciobanu, C.B., Martorell, X., Kuzmanov, G.K., Ramirez, A., Gaydadjiev, G.N. (2011). Scalability Evaluation of a Polymorphic Register File: A CG Case Study. In: Berekovic, M., Fornaciari, W., Brinkschulte, U., Silvano, C. (eds) Architecture of Computing Systems - ARCS 2011. ARCS 2011. Lecture Notes in Computer Science, vol 6566. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19137-4_2

DOI: https://doi.org/10.1007/978-3-642-19137-4_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-19136-7
Online ISBN: 978-3-642-19137-4
eBook Packages: Computer Science (R0)
