Author:
Description:
Pre-execution is a novel latency-tolerance technique in which one or more helper threads run ahead of the main computation and trigger long-latency delinquent events early, so that the main thread makes forward progress without stalling. The most important issue in pre-execution is how to construct effective helper threads that get ahead quickly and compute the delinquent events accurately. Because constructing helper threads by hand is error-prone and cumbersome for a programmer, automating this task is essential if pre-execution is to be widely used across real-world workloads. In this thesis, we study compiler-based pre-execution, which constructs prefetching helper threads using a source-level compiler. We first introduce several compiler algorithms that optimize the helper threads: program slicing removes noncritical code that is not needed to compute the delinquent loads, prefetch conversion reduces blocking in the helper threads by converting delinquent loads into nonblocking prefetches, and loop parallelization speculatively parallelizes the targeted code region so that more memory accesses overlap. In addition to these algorithms for speeding up the helper threads, we propose algorithms that select the right loops as pre-execution regions and pick the best thread initiation scheme for invoking helper threads. We implement all of these algorithms in the Stanford University Intermediate Format (SUIF) compiler infrastructure to automatically generate effective helper threads at the program source level. Furthermore, we replace the external tools that our most aggressive compiler framework uses for program slicing and offline profiling with static algorithms, which reduces the complexity of the compiler implementation. We conduct a thorough evaluation of the compiler-generated helper threads using a simulator that models a research SMT processor.
Our experimental results show compiler-based pre-execution effectively eliminates the cache misses and improves ...
Contributors:
Yeung, Donald ; Digital Repository at the University of Maryland ; University of Maryland (College Park, Md.) ; Electrical Engineering
Year of Publication:
2004-10-22
Document Type:
Dissertation ; [Doctoral and postdoctoral thesis]
Language:
en_US
Subjects:
Engineering ; Electronics and Electrical ; Compiler ; Pre-execution ; Physical Experimentation ; Multithreading ; SMT ; Data Prefetching
DDC:
005 Computer programming, programs & data (computed)
Content Provider:
University of Maryland: Digital Repository (DRUM)
- URL: https://drum.lib.umd.edu/
- Continent: North America
- Country: us
- Latitude / Longitude: 38.986918 / -76.942554
- Number of documents: 32,385
- Open Access: 204 (1%)
- Type: Academic publications
- System: DSpace
- Content provider indexed in BASE since:
- BASE URL: https://www.base-search.net/Search/Results?q=coll:ftunivmaryland