Compiler techniques for scalable performance of stream programs on multicore architectures

January 2010

Author:
Michael I. Gordon
Massachusetts Institute of Technology
Adviser:
Saman Amarasinghe
Massachusetts Institute of Technology

Publisher:
Massachusetts Institute of Technology
201 Vassar Street, W59-200 Cambridge, MA
United States

Order Number: AAI0822890

Abstract

Given the ubiquity of multicore processors, there is an acute need to enable the development of scalable parallel applications without unduly burdening programmers. Currently, programmers are asked not only to explicitly expose parallelism but also to concern themselves with issues of granularity, load-balancing, synchronization, and communication. This thesis demonstrates that when algorithmic parallelism is expressed in the form of a stream program, a compiler can effectively and automatically manage the parallelism. Our compiler assumes responsibility for low-level architectural details, transforming implicit algorithmic parallelism into a mapping that achieves scalable parallel performance for a given multicore target.

Stream programming is characterized by regular processing of sequences of data, and it is a natural expression of algorithms in the areas of audio, video, digital signal processing, networking, and encryption. Streaming computation is represented as a graph of independent computation nodes that communicate explicitly over data channels. Our techniques operate on contiguous regions of the stream graph where the input and output rates of the nodes are statically determinable. Within a static region, the compiler first automatically adjusts the granularity and then exploits data, task, and pipeline parallelism in a holistic fashion. We introduce techniques that data-parallelize nodes that operate on overlapping sliding windows of their input, translating serializing state into minimal and parametrized inter-core communication. Finally, for nodes that cannot be data-parallelized due to state, we are the first to apply software-pipelining techniques at a coarse granularity to exploit pipeline parallelism between stateful nodes.
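To make the idea of data-parallelizing a sliding-window node concrete, the following is a minimal sketch, not the thesis's algorithm. It assumes a hypothetical node that peeks at `len(weights)` items and pops one item per firing, and it splits the output range across workers by handing each worker an input slice that overlaps its neighbor's by `peek - 1` items. The function names and the use of a thread pool are illustrative assumptions; the thesis instead maps this overlap to inter-core communication on the target architecture.

```python
from concurrent.futures import ThreadPoolExecutor


def sliding_filter(data, weights):
    """Sequential reference: each firing peeks len(weights) items, pops 1, pushes 1."""
    peek = len(weights)
    return [
        sum(w * x for w, x in zip(weights, data[i:i + peek]))
        for i in range(len(data) - peek + 1)
    ]


def fissed_sliding_filter(data, weights, workers):
    """Illustrative data-parallel version (hypothetical helper, not from the thesis).

    The outputs are divided into contiguous chunks. A worker producing
    `count` outputs needs `count + peek - 1` inputs, so adjacent slices
    share `peek - 1` items; this sharing stands in for the inter-core
    communication described in the abstract.
    """
    peek = len(weights)
    n_out = len(data) - peek + 1
    if n_out <= 0:
        return []
    chunk = -(-n_out // workers)  # ceiling division
    slices = []
    for start in range(0, n_out, chunk):
        count = min(chunk, n_out - start)
        slices.append(data[start:start + count + peek - 1])
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda s: sliding_filter(s, weights), slices))
    # Concatenating the per-worker outputs in order restores the stream.
    return [y for part in parts for y in part]


if __name__ == "__main__":
    data = list(range(20))
    weights = [1, 2, 3]
    assert fissed_sliding_filter(data, weights, 4) == sliding_filter(data, weights)
```

The overlap grows with the window size rather than with the chunk size, which is why larger chunks (coarser granularity) reduce the relative cost of the shared items in this simplified model.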

Our framework is evaluated in the context of the StreamIt programming language. StreamIt is a high-level stream programming language that has been shown to improve programmer productivity in implementing streaming algorithms. We employ the StreamIt Core benchmark suite of 12 real-world applications to demonstrate the effectiveness of our techniques for varying multicore architectures. For a 16-core distributed memory multicore, we achieve a 14.9x mean speedup. For benchmarks that include sliding-window computation, our sliding-window data-parallelization techniques are required to enable scalable performance for a 16-core SMP multicore (14x mean speedup) and a 64-core distributed shared memory multicore (52x mean speedup). (Copies available exclusively from MIT Libraries, Rm. 14-0551, Cambridge, MA 02139-4307. Ph. 617-253-5668; Fax 617-253-1690.)


Recommendations

Efficient Compilation of Stream Programs for Heterogeneous Architectures: A Model-Checking based approach
SCOPES '15: Proceedings of the 18th International Workshop on Software and Compilers for Embedded Systems

Stream programming based on the synchronous data flow (SDF) model naturally exposes data, task and pipeline parallelism. Statically scheduling stream programs for homogeneous architectures has been an area of extensive research. With graphic processing ...

Massively LDPC Decoding on Multicore Architectures

Unlike usual VLSI approaches necessary for the computation of intensive Low-Density Parity-Check (LDPC) code decoders, this paper presents flexible software-based LDPC decoders. Algorithms and data structures suitable for parallel computing are proposed ...

Synergistic execution of stream programs on multicores with accelerators
LCTES '09: Proceedings of the 2009 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems

The StreamIt programming model has been proposed to exploit parallelism in streaming applications on general purpose multicore architectures. The StreamIt graphs describe task, data and pipeline parallelism which can be exploited on accelerators such as ...
