Improved multithreading techniques for hiding communication latency in multiprocessors

B Boothe, A Ranade - Proceedings of the 19th Annual International Symposium on Computer Architecture, 1992 - dl.acm.org
Shared memory multiprocessors are considered among the easiest parallel computers to program. However, building shared memory machines with thousands of processors has proved difficult because of the inevitably long memory latencies. Much previous research has focused on cache coherency techniques, but it remains unclear whether caches can obtain sufficiently high hit rates. In this paper we present improved multithreading techniques that can easily tolerate latencies of hundreds of cycles, yet require only a small number of threads per processor. High performance is achieved by introducing an explicit context switch instruction that a simple optimizing compiler can use to group several shared accesses together. This grouping of shared accesses dramatically reduces the frequency of context switches compared to simpler multithreading models. The combination of our techniques achieves efficiencies of 80% or higher on a broad set of applications.
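To make the grouping idea concrete, here is a minimal sketch under stated assumptions; it is not the authors' compiler or their performance model. The function `insert_switches` is a hypothetical compiler pass over a toy instruction format: it places an explicit `("switch",)` just before the first instruction that touches the result of an outstanding shared load, so shared loads issued back to back share one context switch (it never reorders instructions). The function `efficiency` is a simple textbook-style analytical model, also assumed here: a thread does `run` useful cycles per shared access, a switch costs `switch` cycles, a shared access takes `latency` cycles, `group` accesses are issued per switch with fully overlapped latencies, and the full latency is conservatively charged after the switch. All names, parameters, and numbers are hypothetical and illustrative, not the paper's measurements.

```python
import math
from typing import List, Tuple

# Toy instruction format (hypothetical): ("load_shared", dest, addr),
# ("op", dest, src1, src2, ...), and the explicit ("switch",) instruction.
Instr = Tuple[str, ...]


def insert_switches(block: List[Instr]) -> List[Instr]:
    """Place an explicit switch just before the first instruction that touches
    the result of an outstanding shared load, so back-to-back shared loads
    share a single context switch. Instructions are not reordered."""
    out: List[Instr] = []
    pending = set()  # destinations of shared loads issued since the last switch
    for ins in block:
        operands = set(ins[1:])  # conservative: any read or write of a pending value
        if pending & operands:
            out.append(("switch",))
            pending.clear()
        out.append(ins)
        if ins[0] == "load_shared":
            pending.add(ins[1])
    if pending:
        out.append(("switch",))  # wait for loads still in flight at the end of the block
    return out


def efficiency(n_threads: int, run: float, latency: float, switch: float, group: int = 1) -> float:
    """Fraction of processor cycles doing useful work in a simple analytical model."""
    if n_threads < 1 or group < 1:
        raise ValueError("n_threads and group must be at least 1")
    busy = group * run                        # useful cycles between two switches
    period = busy + switch + latency          # cycles before the same thread can run again
    saturated = busy / (busy + switch)        # enough threads: only switch overhead is lost
    unsaturated = n_threads * busy / period   # too few threads: the processor idles
    return min(saturated, unsaturated)


def threads_to_saturate(run: float, latency: float, switch: float, group: int = 1) -> int:
    """Smallest thread count at which latency is fully hidden in this model."""
    busy = group * run
    return math.ceil((busy + switch + latency) / (busy + switch))


if __name__ == "__main__":
    block = [
        ("load_shared", "a", "x"),
        ("load_shared", "b", "y"),
        ("op", "c", "a", "b"),
        ("load_shared", "d", "c"),  # address depends on c
        ("op", "e", "d", "c"),
    ]
    grouped = insert_switches(block)
    # 2 switches, versus 3 if every shared load triggered its own switch
    print(sum(1 for ins in grouped if ins[0] == "switch"))
    # Hypothetical parameters: run = 10, latency = 200, switch = 10 cycles, 8 threads.
    for g in (1, 4):
        print(g, round(efficiency(8, 10, 200, 10, g), 3), threads_to_saturate(10, 200, 10, g))
```

With these made-up parameters the model gives about 36% efficiency when every shared access forces a switch (11 threads would be needed to hide the latency), but 80% when four accesses are grouped per switch (5 threads suffice). This illustrates, under the model's assumptions, why grouping both lowers switch overhead and reduces the number of threads needed per processor.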