research-article

Open access

Symmetry in Software Synthesis

Authors:

Jeronimo Castrillon

ACM Transactions on Architecture and Code Optimization (TACO), Volume 14, Issue 2

Article No.: 20, Pages 1 - 26

https://doi.org/10.1145/3095747

Published: 21 July 2017

Abstract

With the surge of multi- and many-core systems, much research has focused on algorithms for mapping and scheduling on these complex platforms. Large classes of these algorithms face scalability problems, which is why diverse methods are commonly used to reduce the search space. While most such approaches leverage the inherent symmetry of architectures and applications, they do so in a problem-specific and intuitive way. However, intuitive approaches become impractical with growing hardware complexity, such as Network-on-Chip interconnects or heterogeneous cores. In this article, we present a formal framework that algorithmically determines the inherent local and global symmetry of architectures and applications and leverages it for problems in software synthesis. Our approach is based on the mathematical theory of groups and a generalization called inverse semigroups. We evaluate our approach in two state-of-the-art mapping frameworks. Even for today's platforms with a handful of cores and for moderate-sized benchmarks, our approach consistently reduces the overall execution time of these algorithms. We obtain a speedup of more than 10× for one use case and save 10% of the time in another.
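The core idea of the abstract, that mappings related by a symmetry of the platform need not be evaluated separately, can be illustrated with a small sketch. Assuming a hypothetical 2×2 mesh of four cores and an application with three tasks, the Python below enumerates the topology's automorphisms by brute force and keeps one canonical representative per orbit of task-to-core mappings. The platform, the core types, and the function names (`automorphisms`, `canonical`, `representatives`) are illustrative assumptions, not taken from the article. The brute-force enumeration stands in for the dedicated graph-automorphism and permutation-group algorithms a real tool would use, and the sketch covers only global symmetry, not the local (inverse-semigroup) symmetry the article also treats.

```python
from itertools import permutations, product

# Hypothetical 2x2 mesh: cores 0..3 with undirected links (illustrative assumption,
# not a platform from the article).
CORES = list(range(4))
LINKS = {frozenset(e) for e in [(0, 1), (0, 2), (1, 3), (2, 3)]}


def automorphisms(nodes, edges, kinds=None):
    """Brute-force every node permutation that maps links onto links and,
    if `kinds` is given, only exchanges cores of the same type."""
    autos = []
    for image in permutations(nodes):
        perm = dict(zip(nodes, image))
        # Heterogeneity: a symmetry may not swap cores of different types.
        if kinds and any(kinds[v] != kinds[perm[v]] for v in nodes):
            continue
        # Topology: the permuted link set must equal the original link set.
        if {frozenset(perm[v] for v in e) for e in edges} == edges:
            autos.append(perm)
    return autos


def canonical(mapping, group):
    """Orbit representative: lexicographically smallest image of the mapping."""
    return min(tuple(g[core] for core in mapping) for g in group)


def representatives(num_tasks, nodes, group):
    """One representative per symmetry class of task-to-core mappings."""
    return {canonical(m, group) for m in product(nodes, repeat=num_tasks)}


if __name__ == "__main__":
    homo = automorphisms(CORES, LINKS)
    print(len(homo), len(CORES) ** 3, len(representatives(3, CORES, homo)))  # 8 64 10

    # Same topology, but cores 0 and 1 are "big" and cores 2 and 3 are "little".
    kinds = {0: "big", 1: "big", 2: "little", 3: "little"}
    hetero = automorphisms(CORES, LINKS, kinds)
    print(len(hetero), len(representatives(3, CORES, hetero)))  # 2 32
```

On this toy platform, the 64 possible mappings collapse into 10 symmetry classes. Labelling the cores as big or little shrinks the symmetry group to 2 elements and leaves 32 classes, showing how heterogeneity can reduce the symmetry available for pruning. These numbers describe the toy example only, not the results reported in the article.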

Supplementary Material

TACO1402-20 (taco1402-20.pdf): slide deck associated with this paper (1.47 MB)

Published In

ACM Transactions on Architecture and Code Optimization Volume 14, Issue 2

June 2017

259 pages

ISSN:1544-3566

EISSN:1544-3973

DOI:10.1145/3086564

Editor:
Koen De Bosschere
Ghent University

Copyright © 2017 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 21 July 2017

Accepted: 01 May 2017

Revised: 01 May 2017

Received: 01 November 2016

Published in TACO Volume 14, Issue 2

Qualifiers

Research-article
Research
Refereed

Funding Sources

Center for Advancing Electronics Dresden (cfaed)
Graduiertenkolleg Experimentelle und konstruktive Algebra (GK EukA)

Bibliometrics & Citations

Article Metrics

Total Citations: 15

Total Downloads: 486

Downloads (last 12 months): 81

Downloads (last 6 weeks): 13

Reflects downloads up to 10 Nov 2024

Cited By

L. Müller, N. Schumacher, L. Steffen, and C. Haubelt. 2024. Generative Design of the Architecture Platform in Multiprocessor System Design. Electronics 13, 7 (2024), 1404. Online publication date: 8 Apr 2024.
https://doi.org/10.3390/electronics13071404

J. Castrillon, K. Desnos, A. Goens, and C. Menard. 2023. Dataflow Models of Computation for Programming Heterogeneous Multicores. In Handbook of Computer Architecture, 1-40. Online publication date: 28 Sep 2023.
https://doi.org/10.1007/978-981-15-6401-7_45-2

A. Goens, T. Nicolai, and J. Castrillon. 2022. mpsym: Improving Design-Space Exploration of Clustered Manycores With Arbitrary Topologies. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 41, 6 (2022), 1592-1605. Online publication date: Jun 2022.
https://doi.org/10.1109/TCAD.2021.3102512

J. Castrillon, K. Desnos, A. Goens, and C. Menard. 2022. Dataflow Models of Computation for Programming Heterogeneous Multicores. In Handbook of Computer Architecture, 1-40. Online publication date: 28 Jan 2022.
https://doi.org/10.1007/978-981-15-6401-7_45-1

A. Pimentel. 2022. Methodologies for Design Space Exploration. In Handbook of Computer Architecture, 1-31. Online publication date: 27 Jan 2022.
https://doi.org/10.1007/978-981-15-6401-7_23-1

L. Müller, K. Neubauer, and C. Haubelt. 2021. Exploiting Similarity in Evolutionary Product Design for Improved Design Space Exploration. In Embedded Computer Systems: Architectures, Modeling, and Simulation, 33-49. Online publication date: 4 Jul 2021.
https://dl.acm.org/doi/10.1007/978-3-031-04580-6_3

W. Sheng, J. Castrillon, and R. Leupers. 2021. Software Compilation and Optimization Techniques for Heterogeneous Multi-core Platforms. In Multi-Processor System-on-Chip 2, 203-235. Online publication date: 28 Apr 2021.
https://doi.org/10.1002/9781119818410.ch10

R. Khasanov, J. Castrillon, G. Di Natale, and F. Fummi. 2020. Energy-efficient runtime resource management for adaptable multi-application mapping. In Proceedings of the 23rd Conference on Design, Automation and Test in Europe, 909-914. Online publication date: 9 Mar 2020.
https://dl.acm.org/doi/10.5555/3408352.3408558

R. Khasanov and J. Castrillon. 2020. Energy-efficient Runtime Resource Management for Adaptable Multi-application Mapping. In 2020 Design, Automation & Test in Europe Conference & Exhibition (DATE), 909-914. Online publication date: Mar 2020.
https://doi.org/10.23919/DATE48585.2020.9116381

H. Bouraoui, J. Castrillon, and C. Jerad. 2019. Comparing Dataflow and OpenMP Programming for Speaker Recognition Applications. In Proceedings of the 10th and 8th Workshop on Parallel Programming and Run-Time Management Techniques for Many-core Architectures and Design Tools and Architectures for Multicore Embedded Computing Platforms, 1-6. Online publication date: 21 Jan 2019.
https://dl.acm.org/doi/10.1145/3310411.3310417
