Article

Phi-Predication for light-weight if-conversion

Authors:

Jeanne Ferrante

CGO '03: Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization

Pages 179 - 190

Published: 23 March 2003

Abstract

Predicated execution can eliminate hard-to-predict branches and help enable instruction-level parallelism. In many current predication variants, the result update is conditional on the outcome of the guarding predicate. However, conditionally writing a register creates a naming problem for an out-of-order processor and can stall the issuing of instructions. The problem arises because multiple predicated definitions may reach a use, and which one applies remains unresolved until the prior predicate values are computed.

In this paper we focus on a light-weight form of predication, Phi-Predication, in which every predicated instruction writes a result value to its register regardless of the predicate value (i.e., even if it is false). The predicate therefore does not guard the writing of the result register; instead, it acts as a form of selection between two input registers. This eliminates the naming problem for an out-of-order processor. Our Phi-Predicated ISA is derived from the predicated features of the Multiflow ISA, with extensions to efficiently predicate complex control flow. Our compiler modifications also expand upon prior techniques to provide efficient code generation. We examine the use of Phi-Predication for an in-order and an out-of-order architecture and compare its performance to that of select-op and IA-64 ISA predication.
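The difference between a guarded write and a selecting write can be illustrated with a small register-file model. The following is only a sketch under stated assumptions: the helper names (`guarded_op`, `phi_op`, `if_converted_guarded`, `if_converted_phi`), the operand layout, and the register names are hypothetical and are not taken from the paper's ISA. It if-converts `x = a + b if c else a - b` in both styles and records whether each instruction wrote its destination.

```python
# Hypothetical model, not the paper's ISA: registers are entries in a dict.

def guarded_op(regs, pred, dst, value):
    """Conventional predication: dst is written only when pred is true."""
    if pred:
        regs[dst] = value
        return True   # destination was written
    return False      # destination keeps its old value


def phi_op(regs, pred, dst, value, fallback):
    """Selecting write (assumed form): dst is always written.

    The predicate chooses between the computed value and the contents
    of another input register, here called `fallback`.
    """
    regs[dst] = value if pred else regs[fallback]
    return True       # destination is written regardless of pred


def if_converted_guarded(a, b, c):
    regs = {"x": 0}
    p = bool(c)
    # Two guarded definitions of x reach the later use of x;
    # which one wrote x is unknown until p has been computed.
    writes = [
        guarded_op(regs, p, "x", a + b),
        guarded_op(regs, not p, "x", a - b),
    ]
    return regs["x"], writes


def if_converted_phi(a, b, c):
    regs = {"t": 0, "x": 0}
    p = bool(c)
    regs["t"] = a - b  # else-path value, computed unconditionally
    # A single definition of x that always writes its destination.
    writes = [phi_op(regs, p, "x", a + b, "t")]
    return regs["x"], writes


if __name__ == "__main__":
    for c in (True, False):
        print(c, if_converted_guarded(5, 3, c), if_converted_phi(5, 3, c))
```

In the guarded version the final definition of `x` depends on which of the two instructions actually wrote it, whereas in the selecting version the last instruction to name `x` as its destination is always the producer. That is the property the abstract credits with removing the naming problem.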

Index Terms

Phi-Predication for light-weight if-conversion
1. Computing methodologies
  1. Modeling and simulation
    1. Model development and analysis
      1. Modeling methodologies
2. Software and its engineering
  1. Software notations and tools
    1. Compilers
      1. Runtime environments
  2. Software organization and properties
    1. Contextual software domains
      1. Operating systems
        Memory management
        Garbage collection

Information & Contributors

Information

Published In

CGO '03: Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization

March 2003

349 pages

ISBN:076951913X

General Chairs: Richard Johnson (Transmeta), Tom Conte (NC State University)

Program Chair: Wen-mei Hwu (University of Illinois at Urbana-Champaign)

Copyright © 2003 Institute of Electrical and Electronics Engineers, Inc. All rights reserved.

Sponsors

IEEE Computer Society TC-uARCH
SIGMICRO: ACM Special Interest Group on Microarchitectural Research and Processing

Publisher

IEEE Computer Society

United States

Qualifiers

Article

Conference

CGO03

Sponsor:

SIGMICRO

CGO03: First Annual International IEEE/ACM Symposium on Code Generation and Optimization 2003

March 23 - 26, 2003

San Francisco, California, USA

Acceptance Rates

Overall Acceptance Rate 312 of 1,061 submissions, 29%

Bibliometrics & Citations

Bibliometrics

Total Citations: 6
Total Downloads: 358
Downloads (last 12 months): 3
Downloads (last 6 weeks): 0

Reflects downloads up to 25 Dec 2024

Cited By

Han K, Ahn J, Choi K (2013) Power-Efficient Predication Techniques for Acceleration of Control Flow Execution on CGRA. ACM Transactions on Architecture and Code Optimization 10:2 (1-25). Online publication date: 1-May-2013.
https://dl.acm.org/doi/10.1145/2459316.2459319

Collingbourne P, Cadar C, Kelly P, Kirsch C, Heiser G (2011) Symbolic crosschecking of floating-point and SIMD code. Proceedings of the sixth conference on Computer systems (315-328). Online publication date: 10-Apr-2011.
https://dl.acm.org/doi/10.1145/1966445.1966475

Hohenauer M, Engel F, Leupers R, Ascheid G, Meyr H, Bette G, Singh B, Sciuto D (2008) Retargetable code optimization for predicated execution. Proceedings of the conference on Design, automation and test in Europe (1492-1497). Online publication date: 10-Mar-2008.
https://dl.acm.org/doi/10.1145/1403375.1403734

Anantaraman A, Rotenberg E (2006) Non-uniform program analysis & repeatable execution constraints. ACM SIGBED Review 3:1 (17-22). Online publication date: 1-Jan-2006.
https://dl.acm.org/doi/10.1145/1279711.1279716

Tang Y, Deng K, Wang X, Dou Y, Zhou X (2005) RIMP. Proceedings of the 6th international conference on Advanced Parallel Processing Technologies (71-80). Online publication date: 27-Oct-2005.
https://dl.acm.org/doi/10.1007/11573937_10

Chuang W, Calder B, Banerjee U, Gallivan K, Gonzalez A (2003) Predicate prediction for efficient out-of-order execution. Proceedings of the 17th annual international conference on Supercomputing (183-192). Online publication date: 23-Jun-2003.
https://dl.acm.org/doi/10.1145/782814.782840
