DOI: 10.1145/1878921.1878924
research-article

Erbium: a deterministic, concurrent intermediate representation to map data-flow tasks to scalable, persistent streaming processes

Published: 24 October 2010

Abstract

Tuning applications for multicore systems involves subtle concurrency concepts and target-dependent optimizations. This paper advocates for a streaming execution model, called ER, where persistent processes communicate and synchronize through a multi-consumer [...] processing applications, we demonstrate the scalability and efficiency advantages of streaming compared to data-driven scheduling. To exploit these benefits in compilers for parallel languages, we propose an intermediate representation enabling the compilation of data-flow tasks into streaming processes. This intermediate representation also facilitates the application of classical compiler optimizations to concurrent programs.

    Published In

    CASES '10: Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems
    October 2010
    276 pages
    ISBN:9781605589039
    DOI:10.1145/1878921
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Sponsors

    In-Cooperation

    • CEDA
    • IEEE CAS
    • IEEE CS

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. compiler intermediate representation
    2. data flow
    3. kahn networks
    4. runtime system
    5. shared memory
    6. stream

    Qualifiers

    • Research-article

    Conference

    ESWeek '10: Sixth Embedded Systems Week
    October 24 - 29, 2010
    Scottsdale, Arizona, USA

    Acceptance Rates

    Overall Acceptance Rate 52 of 230 submissions, 23%

    Cited By

    • (2024) An Overview of Continuous Querying in (Modern) Data Systems. Companion of the 2024 International Conference on Management of Data, pages 605-612. DOI: 10.1145/3626246.3654679. Online publication date: 9-Jun-2024
    • (2021) Intermediate Representations for Explicitly Parallel Programs. ACM Computing Surveys 54:5, pages 1-24. DOI: 10.1145/3452299. Online publication date: 25-May-2021
    • (2019) Efficient and Scalable Execution of Fine-Grained Dynamic Linear Pipelines. ACM Transactions on Architecture and Code Optimization 16:2, pages 1-26. DOI: 10.1145/3307411. Online publication date: 18-Apr-2019
    • (2018) Unifying Fixed Code Mapping, Communication, Synchronization and Scheduling Algorithms for Efficient and Scalable Loop Pipelining. IEEE Transactions on Parallel and Distributed Systems 29:9, pages 2136-2149. DOI: 10.1109/TPDS.2018.2817207. Online publication date: 1-Sep-2018
    • (2017) Fastflow: High‐Level and Efficient Streaming on Multicore. Programming multi‐core and many‐core computing systems, pages 261-280. DOI: 10.1002/9781119332015.ch13. Online publication date: 27-Jan-2017
    • (2016) SCnC. International Journal of Parallel Programming 44:2, pages 233-256. DOI: 10.1007/s10766-015-0353-x. Online publication date: 1-Apr-2016
    • (2016) River. Software—Practice & Experience 46:7, pages 891-929. DOI: 10.1002/spe.2338. Online publication date: 1-Jul-2016
    • (2015) Intermediate representation for heterogeneous multi-core: A survey. 2015 International Conference on VLSI Systems, Architecture, Technology and Applications (VLSI-SATA), pages 1-6. DOI: 10.1109/VLSI-SATA.2015.7050496. Online publication date: Jan-2015
    • (2015) A Monitoring System for Runtime Adaptations of Streaming Applications. Proceedings of the 2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, pages 27-34. DOI: 10.1109/PDP.2015.53. Online publication date: 4-Mar-2015
    • (2014) Improving the design flow for parallel and heterogeneous architectures running real-time applications. Microprocessors & Microsystems 38:8, pages 960-975. DOI: 10.1016/j.micpro.2014.05.003. Online publication date: 1-Nov-2014
