Computational power of pipelined memory hierarchies

Authors:

Gianfranco Bilardi,

Kattamuri Ekanadham,

Pratap Pattnaik

SPAA '01: Proceedings of the thirteenth annual ACM symposium on Parallel algorithms and architectures

Pages 144 - 152

https://doi.org/10.1145/378580.378615

Published: 03 July 2001

Abstract

We define a model of computation, called the Pipelined Hierarchical Random Access Machine with access function a(x), denoted the a(x)-PH-RAM. In this model, a processor interacts with a memory that accepts requests at a constant rate and satisfies each request to location x within a(x) units of time.
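As a rough illustration of the model, the sketch below compares a batch of independent requests on a pipelined memory with the same batch on a memory that serves one request at a time. It assumes that one request is issued per time step and that a request to location x completes exactly a(x) steps after it is issued. The function names and the batch scenario are hypothetical and are not taken from the paper.

```python
from typing import Callable, Sequence


def pipelined_time(addresses: Sequence[int], a: Callable[[int], float]) -> float:
    """Completion time when request i is issued at step i (one per step)
    and a request to address x returns a(x) steps after it is issued."""
    return max((i + a(x) for i, x in enumerate(addresses, start=1)), default=0.0)


def sequential_time(addresses: Sequence[int], a: Callable[[int], float]) -> float:
    """Completion time when each request must finish before the next is issued."""
    return float(sum(a(x) for x in addresses))


def linear(x: int) -> float:
    """Linear access function a(x) = x (an assumed example)."""
    return float(x)


addresses = list(range(1, 9))              # locations 1..8, all independent
print(pipelined_time(addresses, linear))   # 16.0
print(sequential_time(addresses, linear))  # 36.0
```

The gap between the two totals is the latency that pipelining can hide when requests do not depend on one another. Dependent requests would not overlap this way.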

We investigate memory management strategies that lead to time-efficient implementations of arbitrary computations on a PH-RAM. We begin by developing the so-called pipelined decomposition-tree memory management strategy, which can be tuned to the memory access function. Specifically, for a linear or sublinear access function a(x), we define the concept of latency-hiding depth d_a(x) and show how any computation of N operations can be implemented on an a(x)-PH-RAM in time T(N) = O(N d_a(N)). In particular, T(N) = O(N log N) if a(x) = O(x), T(N) = O(N log log N) if a(x) = O(x^β) with 0 < β < 1, and T(N) = O(N log* N) if a(x) = O(log x).
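To get a feel for how the three upper bounds differ, the sketch below evaluates the per-operation factor T(N)/N for each case. It assumes base-2 logarithms and ignores the constants hidden by the O-notation, so the numbers indicate growth rates only. The helper names are hypothetical, and `log_star` is the standard iterated logarithm, not the paper's definition of d_a(x).

```python
import math


def log_star(n: float) -> int:
    """Iterated logarithm: how many times log2 must be applied until n <= 1."""
    count = 0
    while n > 1:
        n = math.log2(n)
        count += 1
    return count


def per_op_factor(n: int, access: str) -> float:
    """Growth of T(N)/N for the three access-function classes in the abstract."""
    if access == "linear":   # a(x) = O(x)      -> log N
        return math.log2(n)
    if access == "power":    # a(x) = O(x^beta) -> log log N
        return math.log2(math.log2(n))
    if access == "log":      # a(x) = O(log x)  -> log* N
        return float(log_star(n))
    raise ValueError(f"unknown access class: {access}")


for exponent in (16, 32, 64):
    n = 2 ** exponent
    print(exponent, [per_op_factor(n, k) for k in ("linear", "power", "log")])
```

For N = 2^16 the three factors are 16, 4, and 4, which shows how slowly the sublinear cases grow compared with the linear one.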

We develop lower-bound techniques that allow us to establish existential lower bounds on PH-RAMs. In particular, we exhibit computations for which T(N) = Ω(N log N / log log N) when a(x) = Ω(x), T(N) = Ω(N log log N) when a(x) = Ω(x^β) with 0 < β < 1, and T(N) = Ω(N log* N) when a(x) = Ω(log x).

The stated lower bounds show that the pipelined decomposition-tree strategy is existentially optimal for the latter two cases but indicate the potential for a modest, O(log log N) improvement for linear access functions. To realize this potential, a superpipelined decomposition-tree memory manager is proposed, which achieves T(N) = O(N log N / log log N).
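The size of the gap for linear access functions follows from dividing the pipelined upper bound by the existential lower bound. The short calculation below is a sketch that ignores constant factors:

```latex
\[
\frac{N \log N}{\,N \log N / \log\log N\,} \;=\; \log\log N
\]
```

This ratio is the O(log log N) factor that the superpipelined memory manager recovers.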

The pipelined decomposition-tree strategy can also be tuned to the computation, in order to exploit its temporal locality as characterized by the width parameters [9]. When the latter are suitably bounded, T(N) = O(N) on any PH-RAM with linear or sublinear access function. Finally, we discuss how performance could benefit from parallelism in the data-dependence dag of the computation or from architectural enhancements, such as block-transfer primitives, and formulate various questions that deserve further investigation.

References

[1] A. Aggarwal, B. Alpern, A.K. Chandra and M. Snir. A Model for Hierarchical Memory. In Proc. of the 19th ACM Symposium on Theory of Computing, (1987), 305-314.
