Withdraw
Loading…
Architecting, programming, and evaluating an on-chip incoherent multi-processor memory hierarchy
Kim, Wooil
Loading…
Permalink
https://hdl.handle.net/2142/89128
Description
- Title
- Architecting, programming, and evaluating an on-chip incoherent multi-processor memory hierarchy
- Author(s)
- Kim, Wooil
- Issue Date
- 2015-12-02
- Director of Research (if dissertation) or Advisor (if thesis)
- Torrellas, Josep
- Doctoral Committee Chair(s)
- Torrellas, Josep
- Committee Member(s)
- Snir, Marc
- Padua, David
- Gropp, William
- Sadayappan, P.
- Department of Study
- Computer Science
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- incoherent cache hierarchy
- scratchpad hierarchy
- compiler-directed coherence
- Runnemede architecture
- Abstract
- New architectures for extreme-scale computing need to be designed for higher energy efficiency than current systems. The DOE-funded Traleika Glacier architecture is a recently-proposed extreme-scale manycore that radically simplifies the architecture, and proposes a cluster-based on-chip memory hierarchy without hardware cache coherence. Programming for such an environment, which can use scratchpads or incoherent caches, is challenging. Hence, this thesis focuses on architecting, programming, and evaluating an on-chip incoherent multiprocessor memory hierarchy. This thesis starts by examining incoherent multiprocessor caches. It proposes ISA support for data movement in such an environment, and two relatively user-friendly programming approaches that use the ISA. The ISA support is largely based on writeback and self-invalidation instructions, while the programming approaches involve shared-memory programming either inside a cluster only, or across clusters. The thesis also includes compiler transformations for such an incoherent cache hierarchy. Our simulation results show that, with our approach, the execution of applications on incoherent cache hierarchies can deliver reasonable performance. For execution within a cluster, the average execution time of our applications is only 2% higher than with hardware cache coherence. For execution across multiple clusters, our applications run on average 20% faster than a naive scheme that pushes all the data to the last-level shared cache. Compiler transformations for both regular and irregular applications are shown to deliver substantial performance increases. This thesis then considers scratchpads. It takes the design in the Traleika Glacier architecture and performs a simulation-based evaluation. It shows how the hardware exploits available concurrency from parallel applications. However, it also shows the limitations of the current software stack, which lacks smart memory management and high-level hints for the scheduler.
- Graduation Semester
- 2015-12
- Type of Resource
- text
- Permalink
- http://hdl.handle.net/2142/89128
- Copyright and License Information
- Copyright 2015 Wooil Kim
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at IllinoisDissertations and Theses - Computer Science
Dissertations and Theses from the Dept. of Computer ScienceManage Files
Loading…
Edit Collection Membership
Loading…
Edit Metadata
Loading…
Edit Properties
Loading…
Embargoes
Loading…