DOI: 10.1145/2742854.2742872
research-article

Scheduling stream programs with improving arithmetic unit usage on NoC-based VLIW multi-core architectures

Published: 06 May 2015

Abstract

The stream programming model has received considerable interest because it naturally exposes task, data, and pipeline parallelism. Much research has concentrated on scheduling stream programs on multi-core systems. However, little of it considers arithmetic unit utilization, which is a vital factor in determining the performance of multi-core systems. This paper focuses on scheduling stream programs on NoC-based VLIW multi-core architectures, aiming to improve performance by increasing arithmetic unit utilization. The proposed scheme has three phases. First, the stream program is replicated into multiple threads to provide enough parallel kernels. Second, parallel kernels are grouped, and the operators of each kernel group are scheduled together for high arithmetic unit utilization. Third, a hierarchical integer linear programming (ILP)-based methodology maps kernel groups onto cores so as to minimize the maximum per-core workload. A set of benchmarks is used for evaluation. Experimental results show that, compared with two existing scheduling schemes, the proposed scheme can significantly improve the performance of the multi-core processor.
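
The objective of the third phase (assign kernel groups to cores so that the most heavily loaded core carries as little work as possible) can be illustrated with a small sketch. This is not the paper's hierarchical ILP formulation: it swaps in a plain exhaustive search that is only practical for tiny instances, and it ignores NoC communication cost. The function name `map_groups_min_max_load` and the workload numbers are hypothetical, chosen for illustration only.

```python
from itertools import product


def map_groups_min_max_load(workloads, num_cores):
    """Try every assignment of kernel groups to cores and keep the one
    whose most heavily loaded core has the smallest total workload.

    Illustrative stand-in for an ILP solver: cost is O(num_cores ** n),
    and communication between cores is not modeled (an assumption).
    """
    best_assignment, best_peak = None, float("inf")
    for assignment in product(range(num_cores), repeat=len(workloads)):
        loads = [0] * num_cores
        for group, core in enumerate(assignment):
            loads[core] += workloads[group]
        peak = max(loads)  # the quantity the mapping tries to minimize
        if peak < best_peak:
            best_assignment, best_peak = assignment, peak
    return best_assignment, best_peak


# Hypothetical per-group workloads (e.g. cycles per steady-state iteration).
workloads = [7, 5, 4, 3, 1]
assignment, peak = map_groups_min_max_load(workloads, num_cores=2)
print(assignment, peak)
```

For realistic problem sizes the search space grows exponentially, which is one reason a formulation that a solver can prune, such as an ILP, is the more natural choice.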

Cited By

  • (2022) A Loop Optimization Method for Dataflow Architecture. 2022 IEEE 24th Int Conf on High Performance Computing & Communications; 8th Int Conf on Data Science & Systems; 20th Int Conf on Smart City; 8th Int Conf on Dependability in Sensor, Cloud & Big Data Systems & Application (HPCC/DSS/SmartCity/DependSys), pages 202-211. DOI: 10.1109/HPCC-DSS-SmartCity-DependSys57074.2022.00059. Online publication date: Dec-2022
  • (2016) Run-time phase prediction for a reconfigurable VLIW processor. Proceedings of the 2016 Conference on Design, Automation & Test in Europe, pages 1634-1639. DOI: 10.5555/2971808.2972188. Online publication date: 14-Mar-2016

Published In

CF '15: Proceedings of the 12th ACM International Conference on Computing Frontiers
May 2015
413 pages
ISBN: 9781450333580
DOI: 10.1145/2742854
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. arithmetic unit utilization
  2. multi-core
  3. scheduling
  4. stream programs

Qualifiers

  • Research-article

Conference

CF'15: Computing Frontiers Conference
May 18-21, 2015
Ischia, Italy

Acceptance Rates

CF '15 Paper Acceptance Rate 33 of 96 submissions, 34%;
Overall Acceptance Rate 273 of 785 submissions, 35%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 5
  • Downloads (Last 6 weeks): 0

Reflects downloads up to 22 Sep 2024
