DOI:10.1145/1375657.1375679
research-article

Optimizing scientific application loops on stream processors

Published: 12 June 2008

Abstract

This paper describes a graph coloring compiler framework that allocates on-chip SRF (Stream Register File) storage to optimize scientific applications on stream processors. The framework first applies enabling optimizations such as loop unrolling to expose stream reuse and opportunities for maximizing parallelism, i.e., overlapping kernel execution with memory transfers. The three SRF management tasks are then solved in a unified manner via graph coloring: (1) placing streams in the SRF, (2) exploiting stream use, and (3) maximizing parallelism. We evaluate the framework by running nine representative scientific computing kernels on our FT64 stream processor. Our preliminary results show that compiler management achieves an average speedup of 2.3x over First-Fit allocation. Compared with running the same benchmarks on Itanium 2, an average speedup of 2.1x is observed.
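
The abstract does not spell out the coloring algorithm itself, so the following is only a minimal sketch of the general idea of allocating storage by coloring an interference graph. It is not the paper's method. It assumes, purely for illustration, that every stream occupies one equally sized SRF slot and that two streams interfere when their live ranges (measured in kernel steps) overlap. The names `build_interference`, `color_streams`, `live`, and `num_slots` are hypothetical.

```python
from itertools import combinations


def build_interference(live_ranges):
    """Build an interference graph from live ranges.

    live_ranges maps a stream name to (first_step, last_step), inclusive.
    Two streams interfere if their ranges overlap (illustrative assumption).
    """
    graph = {name: set() for name in live_ranges}
    for a, b in combinations(live_ranges, 2):
        (a0, a1), (b0, b1) = live_ranges[a], live_ranges[b]
        if a0 <= b1 and b0 <= a1:  # closed intervals overlap
            graph[a].add(b)
            graph[b].add(a)
    return graph


def color_streams(graph, num_slots):
    """Greedy coloring: a color is an SRF slot index (hypothetical model).

    Streams are visited most-constrained first (ties broken by name).
    A stream with no free slot is recorded as spilled to memory.
    """
    assignment, spilled = {}, []
    for node in sorted(graph, key=lambda n: (-len(graph[n]), n)):
        used = {assignment[m] for m in graph[node] if m in assignment}
        free = [s for s in range(num_slots) if s not in used]
        if free:
            assignment[node] = free[0]
        else:
            spilled.append(node)
    return assignment, spilled


# Four streams handed along a chain of kernels; neighbours overlap by one step.
live = {"a": (0, 1), "b": (1, 2), "c": (2, 3), "d": (3, 4)}
graph = build_interference(live)
slots, spilled = color_streams(graph, num_slots=2)
print(slots, spilled)
```

In this toy case two slots are enough for four streams because non-overlapping streams share a slot. A real allocator would also have to handle streams of different sizes and the overlap of kernel execution with memory transfers, which this sketch leaves out.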


Published In

LCTES '08: Proceedings of the 2008 ACM SIGPLAN-SIGBED conference on Languages, compilers, and tools for embedded systems
June 2008
180 pages
ISBN:9781605581040
DOI:10.1145/1375657
  • ACM SIGPLAN Notices, Volume 43, Issue 7
    LCTES '08
    July 2008
    167 pages
    ISSN:0362-1340
    EISSN:1558-1160
    DOI:10.1145/1379023

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. data reuse
  2. graph coloring
  3. loop optimization
  4. prefetching
  5. software-managed cache
  6. stream processor
  7. streaming

Qualifiers

  • Research-article

Acceptance Rates

Overall Acceptance Rate 116 of 438 submissions, 26%

Cited By

  • (2018) Optimizing modulo scheduling to achieve reuse and concurrency for stream processors. The Journal of Supercomputing 59:3 (1229-1251). DOI:10.1007/s11227-010-0522-z. Online publication date: 31-Dec-2018
  • (2012) Storage Allocation for Streaming-Based Register File. Energy-Aware Memory Management for Embedded Multimedia Systems (151-194). DOI:10.1201/b11418-6. Online publication date: 4-Jan-2012
  • (2012) Comparability Graph Coloring for Optimizing Utilization of Software-Managed Stream Register Files for Stream Processors. ACM Transactions on Architecture and Code Optimization 9:1 (1-30). DOI:10.1145/2133382.2133387. Online publication date: 1-Mar-2012
  • (2011) Loop fusion and reordering for register file optimization on stream processors. Proceedings of the 2011 ACM Symposium on Applied Computing (560-565). DOI:10.1145/1982185.1982306. Online publication date: 21-Mar-2011
  • (2010) Reuse-aware modulo scheduling for stream processors. Proceedings of the Conference on Design, Automation and Test in Europe (1112-1117). DOI:10.5555/1870926.1871197. Online publication date: 8-Mar-2010
  • (2010) Reuse-aware modulo scheduling for stream processors. 2010 Design, Automation & Test in Europe Conference & Exhibition (DATE 2010) (1112-1117). DOI:10.1109/DATE.2010.5456975. Online publication date: Mar-2010
  • (2010) Managing Data-Objects in Dynamically Reconfigurable Caches. Journal of Computer Science and Technology 25:2 (232-245). DOI:10.1007/s11390-010-9320-6. Online publication date: 16-Mar-2010
  • (2009) SARA. Proceedings of the 7th IEEE/ACM international conference on Hardware/software codesign and system synthesis (41-50). DOI:10.1145/1629435.1629442. Online publication date: 11-Oct-2009
  • (2009) Comparability graph coloring for optimizing utilization of stream register files in stream processors. ACM SIGPLAN Notices 44:4 (111-120). DOI:10.1145/1594835.1504195. Online publication date: 14-Feb-2009
  • (2009) Comparability graph coloring for optimizing utilization of stream register files in stream processors. Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming (111-120). DOI:10.1145/1504176.1504195. Online publication date: 14-Feb-2009
