The available instruction level parallelism (ILP) is extremely limited within basic blocks of non-numeric programs (1) (2) (3). An effective VLIW or superscalar processor must optimize and schedule instructions across basic block boundaries to achieve higher performance. An effective structure for ILP compilation is the superblock (4). The formation and optimization of superblocks increase ILP available to the scheduler along important execution paths by systematically removing constraints due to the unimportant paths. Superblock scheduling is then applied to extract the available ILP and map it to the processor resources.
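As an illustrative sketch (not drawn from the dissertation itself), superblock formation can be pictured in C: a join point in the control flow is removed by duplicating the tail block, so the frequently executed trace has a single entry and can be optimized without regard for side entrances. Block names and the hot/cold split here are hypothetical.

```c
/* Original control flow: block D is a join point reached from both
 * B (the frequent path) and C (the infrequent path), so optimization
 * along A->B->D must preserve the side entrance from C. */
int original(int x) {
    int y;
    if (x > 0)          /* A */
        y = x * 2;      /* B: frequent path */
    else
        y = -x;         /* C: infrequent path */
    return y + 1;       /* D: join point */
}

/* After tail duplication: the trace A->B->D' forms a superblock with
 * a single entry and no side entrances; D is duplicated onto the
 * infrequent path. */
int superblock(int x) {
    if (x > 0) {
        int y = x * 2;
        return y + 1;   /* D': inside the superblock */
    }
    int y = -x;
    return y + 1;       /* duplicated tail on the infrequent path */
}
```

Both versions compute the same result; the transformation only restructures the control flow so that later optimization and scheduling along the hot trace are unconstrained by the cold path.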
The major technique employed to achieve compact superblock schedules is speculative execution: executing an instruction before knowing that its execution is required. Such an instruction is referred to as a speculative instruction. In general, speculative execution may be engineered at run time through dynamic scheduling or at compile time. Superblock techniques utilize compile-time engineered speculative execution, or speculative code motion. A compiler may utilize speculative code motion to achieve higher performance in three major ways. First, in regions of the program where insufficient ILP exists to fully utilize the processor resources, speculative instructions may fill otherwise idle issue slots. Second, instructions that start long dependence chains may be executed early to shorten critical paths. Finally, long-latency instructions may be initiated early to overlap their execution with useful computation. Speculative execution is employed by virtually all aggressive scheduling techniques. For example, Tirumalai et al. showed that modulo scheduling of while loops depends on speculative support to achieve high performance (5); without speculative support, very little execution overlap between loop iterations is achieved.
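The second benefit above, starting a dependence chain before its guarding branch resolves, can be sketched in C. This is an illustrative example, not code from the dissertation; the function and array names are hypothetical.

```c
/* Before speculation: the computation on b[i] issues only after the
 * branch on a[i] resolves, so the dependence chain starts late. */
int unspeculated(const int *a, const int *b, int i) {
    if (a[i] != 0)
        return b[i] + 1;
    return 0;
}

/* After speculative code motion: b[i] + 1 is computed above the
 * branch, shortening the critical path when the branch is taken; the
 * result is simply discarded when the branch falls through. */
int speculated(const int *a, const int *b, int i) {
    int t = b[i] + 1;   /* speculative instruction */
    if (a[i] != 0)
        return t;
    return 0;
}
```

The two versions are equivalent whenever the speculative load cannot fault; handling the case where it can fault is exactly the problem the speculation models discussed below must solve.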
This dissertation discusses the problems that must be addressed to perform compile-time speculation for acyclic global scheduling, classifies existing speculation models by how they solve these problems, and presents two new compiler-controlled speculation models: write-back suppression speculation and safe speculation.