Erbium: a deterministic, concurrent intermediate representation to map data-flow tasks to scalable, persistent streaming processes

Authors: Cupertino Miranda, Philippe Dumont, Marc Duranton

CASES '10: Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems

Pages 11 - 20

https://doi.org/10.1145/1878921.1878924

Published: 24 October 2010

Abstract

Tuning applications for multicore systems involves subtle concurrency concepts and target-dependent optimizations. This paper advocates for a streaming execution model, called ER, where persistent processes communicate and synchronize through a multi-consumer [...] processing applications, we demonstrate the scalability and efficiency advantages of streaming compared to data-driven scheduling. To exploit these benefits in compilers for parallel languages, we propose an intermediate representation enabling the compilation of data-flow tasks into streaming processes. This intermediate representation also facilitates the application of classical compiler optimizations to concurrent programs.
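
As a rough illustration of the contrast drawn in the abstract, the sketch below shows persistent processes in the general streaming sense: each stage is started once and stays alive for the whole stream, blocking on a bounded FIFO instead of being scheduled as a fresh task per data item. This is not the ER model or the Erbium intermediate representation. The names `producer`, `squarer`, `run_pipeline`, and `SENTINEL`, and the use of Python's `queue.Queue` as the communication buffer, are assumptions made for illustration only.

```python
import queue
import threading

SENTINEL = None  # hypothetical end-of-stream marker (an assumption, not from the paper)


def producer(out_q, n):
    """Persistent process: pushes n items, then signals end of stream."""
    for i in range(n):
        out_q.put(i)  # blocks while the bounded buffer is full
    out_q.put(SENTINEL)


def squarer(in_q, out_q):
    """Persistent process: started once, lives for the whole stream."""
    while True:
        item = in_q.get()  # blocks until data is available
        if item is SENTINEL:
            out_q.put(SENTINEL)  # forward end of stream downstream
            return
        out_q.put(item * item)


def run_pipeline(n, capacity=4):
    """Connect two persistent stages with bounded FIFOs and collect the output."""
    a = queue.Queue(maxsize=capacity)
    b = queue.Queue(maxsize=capacity)
    stages = [
        threading.Thread(target=producer, args=(a, n)),
        threading.Thread(target=squarer, args=(a, b)),
    ]
    for t in stages:
        t.start()
    results = []
    while True:
        item = b.get()
        if item is SENTINEL:
            break
        results.append(item)
    for t in stages:
        t.join()
    return results
```

Because each queue here has a single producer and a single consumer, FIFO order is preserved and `run_pipeline(5)` returns `[0, 1, 4, 9, 16]` regardless of how the threads interleave.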

Index Terms

Erbium: a deterministic, concurrent intermediate representation to map data-flow tasks to scalable, persistent streaming processes
1. Software and its engineering
  1. Software notations and tools
    1. Compilers

Published In

CASES '10: Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems

October 2010

276 pages

ISBN:9781605589039

DOI:10.1145/1878921

Program Chairs:
Vinod Kathail (USA),
Reid Tatge (Texas Instruments, USA),
Rajeev Barua (University of Maryland, College Park, USA)

Copyright © 2010 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

In-Cooperation

CEDA
IEEE CAS
IEEE CS

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

ESWeek '10: Sixth Embedded Systems Week

October 24 - 29, 2010

Scottsdale, Arizona, USA

Acceptance Rates

Overall Acceptance Rate 52 of 230 submissions, 23%

Cited By

Bonifati A, Tommasini R, Barcelo P, Sanchez-Pi N, Meliou A, Sudarshan S (2024) An Overview of Continuous Querying in (Modern) Data Systems. Companion of the 2024 International Conference on Management of Data, pp. 605-612. DOI: 10.1145/3626246.3654679. Online publication date: 9-Jun-2024
https://dl.acm.org/doi/10.1145/3626246.3654679
Susungi A, Tadonki C (2021) Intermediate Representations for Explicitly Parallel Programs. ACM Computing Surveys 54:5, pp. 1-24. DOI: 10.1145/3452299. Online publication date: 25-May-2021
https://dl.acm.org/doi/10.1145/3452299
Mastoras A, Gross T (2019) Efficient and Scalable Execution of Fine-Grained Dynamic Linear Pipelines. ACM Transactions on Architecture and Code Optimization 16:2, pp. 1-26. DOI: 10.1145/3307411. Online publication date: 18-Apr-2019
https://dl.acm.org/doi/10.1145/3307411
Mastoras A, Gross T (2018) Unifying Fixed Code Mapping, Communication, Synchronization and Scheduling Algorithms for Efficient and Scalable Loop Pipelining. IEEE Transactions on Parallel and Distributed Systems 29:9, pp. 2136-2149. DOI: 10.1109/TPDS.2018.2817207. Online publication date: 1-Sep-2018
https://doi.org/10.1109/TPDS.2018.2817207
Aldinucci M, Danelutto M, Kilpatrick P, Torquati M (2017) Fastflow: High‐Level and Efficient Streaming on Multicore. Programming multi‐core and many‐core computing systems, pp. 261-280. DOI: 10.1002/9781119332015.ch13. Online publication date: 27-Jan-2017
https://doi.org/10.1002/9781119332015.ch13
Sbîrlea D, Shirako J, Newton R, Sarkar V (2016) SCnC. International Journal of Parallel Programming 44:2, pp. 233-256. DOI: 10.1007/s10766-015-0353-x. Online publication date: 1-Apr-2016
https://dl.acm.org/doi/10.1007/s10766-015-0353-x
Soulé R, Hirzel M, Gedik B, Grimm R (2016) River. Software—Practice & Experience 46:7, pp. 891-929. DOI: 10.1002/spe.2338. Online publication date: 1-Jul-2016
https://dl.acm.org/doi/10.1002/spe.2338
Belwal M, Sudarshan TSB (2015) Intermediate representation for heterogeneous multi-core: A survey. 2015 International Conference on VLSI Systems, Architecture, Technology and Applications (VLSI-SATA), pp. 1-6. DOI: 10.1109/VLSI-SATA.2015.7050496. Online publication date: Jan-2015
https://doi.org/10.1109/VLSI-SATA.2015.7050496
Selva M, Morel L, Marquet K, Frenot S (2015) A Monitoring System for Runtime Adaptations of Streaming Applications. Proceedings of the 2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, pp. 27-34. DOI: 10.1109/PDP.2015.53. Online publication date: 4-Mar-2015
https://dl.acm.org/doi/10.1109/PDP.2015.53
Posadas H, Nicolás A, Peñil P, Villar E, Broekaert F, Bourdelles M, Cohen A, Lazarescu M, Lavagno L, Terechko A, Glassee M, Prieto M (2014) Improving the design flow for parallel and heterogeneous architectures running real-time applications. Microprocessors & Microsystems 38:8, pp. 960-975. DOI: 10.1016/j.micpro.2014.05.003. Online publication date: 1-Nov-2014
https://dl.acm.org/doi/10.1016/j.micpro.2014.05.003