DOI: 10.5555/776261.776281

Phi-Predication for light-weight if-conversion

Published: 23 March 2003

Abstract

Predicated execution can eliminate hard-to-predict branches and help to enable instruction-level parallelism. In many current predication variants, the result update is conditional on the outcome of the guarding predicate. However, conditional writing of a register creates a naming problem for an out-of-order processor, and can stall the issuing of instructions. This problem arises because multiple predicated definitions may reach a use, and which one actually supplies the value remains unresolved until the prior predicate values are computed.

In this paper we focus on a light-weight form of predication, Phi-Predication, where all predicated instructions write a result value to their register regardless of the predicate value (i.e., even if it is false). Therefore, the predicate does not guard the writing of the result register; it instead acts as a form of selection between two input registers. This eliminates the naming problem for an out-of-order processor. Our Phi-Predicated ISA is derived from the predicated features of the Multiflow ISA, with extensions to efficiently predicate complex control flow. Our compiler modifications also expand upon prior techniques to provide efficient code generation. We examine the use of Phi-Predication for an in-order and out-of-order architecture and compare its performance to using select-op and IA-64 ISA predication.
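The difference between the two write policies described in the abstract can be illustrated with a toy register-file model. This is only a minimal sketch under stated assumptions: the function names (`guarded_write`, `phi_write`, `if_convert_guarded`, `if_convert_phi`) and the dictionary-based register file are hypothetical, and the paper's actual instruction formats are not given in the abstract.

```python
# Toy model of two predication styles. All names here are hypothetical
# illustrations; they are not taken from the paper's ISA.

def guarded_write(regs, pred, dst, value):
    """Conventional predication: dst is written only when pred is true.

    When pred is false, dst keeps whatever an earlier definition left
    there, so a later reader of dst cannot know its producer until pred
    has been computed.
    """
    if pred:
        regs[dst] = value


def phi_write(regs, pred, dst, src_true, src_false):
    """Phi-style write: dst is ALWAYS written.

    The predicate only selects which of two input registers supplies
    the value, so this instruction is unconditionally the producer of dst.
    """
    regs[dst] = regs[src_true] if pred else regs[src_false]


def if_convert_guarded(cond, a, b):
    # "if cond: x = a else: x = b" as two guarded definitions of x.
    regs = {"x": None, "a": a, "b": b}
    guarded_write(regs, cond, "x", regs["a"])
    guarded_write(regs, not cond, "x", regs["b"])
    return regs["x"]


def if_convert_phi(cond, a, b):
    # The same source code as a single always-writing definition of x.
    regs = {"x": None, "a": a, "b": b}
    phi_write(regs, cond, "x", "a", "b")
    return regs["x"]
```

Both functions return the same value for any `cond`, but the guarded version has two candidate definitions of `x` reaching the final read, whereas the phi-style version has exactly one. That single, unconditional producer is what lets a register renamer assign the destination without waiting for the predicate.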




Published In

CGO '03: Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
March 2003
349 pages
ISBN: 076951913X

Publisher

IEEE Computer Society

United States

Qualifiers

  • Article

Conference

CGO03
Acceptance Rates

Overall Acceptance Rate 312 of 1,061 submissions, 29%

Cited By

  • (2013) Power-Efficient Predication Techniques for Acceleration of Control Flow Execution on CGRA. ACM Transactions on Architecture and Code Optimization 10:2 (1-25). DOI: 10.1145/2459316.2459319. Online publication date: 1-May-2013
  • (2011) Symbolic crosschecking of floating-point and SIMD code. Proceedings of the sixth conference on Computer systems (315-328). DOI: 10.1145/1966445.1966475. Online publication date: 10-Apr-2011
  • (2008) Retargetable code optimization for predicated execution. Proceedings of the conference on Design, automation and test in Europe (1492-1497). DOI: 10.1145/1403375.1403734. Online publication date: 10-Mar-2008
  • (2006) Non-uniform program analysis & repeatable execution constraints. ACM SIGBED Review 3:1 (17-22). DOI: 10.1145/1279711.1279716. Online publication date: 1-Jan-2006
  • (2005) RIMP. Proceedings of the 6th international conference on Advanced Parallel Processing Technologies (71-80). DOI: 10.1007/11573937_10. Online publication date: 27-Oct-2005
  • (2003) Predicate prediction for efficient out-of-order execution. Proceedings of the 17th annual international conference on Supercomputing (183-192). DOI: 10.1145/782814.782840. Online publication date: 23-Jun-2003
