Abstract
We define a model of computation, called the Pipelined Hierarchical Random Access Machine with access function a(x), denoted the a(x)-PH-RAM. In this model, a processor interacts with a memory that can accept requests at a constant rate and satisfy each request to location x within a(x) units of time.
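As a rough illustration of the model (a sketch under stated assumptions, not the formal definition), the following Python fragment compares the completion time of a batch of requests issued back to back, one per time unit, with the time needed when each request waits for the previous response. The function names, the one-request-per-unit issue rate, the sample access function a(x) = x + 1, and the assumption that the requests are mutually independent are all illustrative choices.

```python
def pipelined_time(addresses, a):
    """Completion time when one request is issued per time unit.

    Request i (0-based) is issued at time i and is satisfied by time
    i + a(x_i); the batch finishes when the last response arrives.
    Assumes the requests do not depend on one another.
    """
    return max((i + a(x) for i, x in enumerate(addresses)), default=0)


def serialized_time(addresses, a):
    """Completion time when each request waits for the previous response."""
    return sum(a(x) for x in addresses)


def linear_access(x):
    # Hypothetical access function a(x) = x + 1 (linear latency).
    return x + 1


addresses = list(range(8))
print(pipelined_time(addresses, linear_access))   # 15
print(serialized_time(addresses, linear_access))  # 36
```

With pipelining, the batch costs roughly the largest single latency plus the number of requests, rather than the sum of all latencies; dependences among requests, which this sketch ignores, are what limit that saving in general.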
We investigate memory management strategies that lead to time-efficient implementations of arbitrary computations on a PH-RAM. We begin by developing the so-called pipelined decomposition-tree memory management strategy, which can be tuned to the memory access function. Specifically, for a linear or sublinear access function a(x), we define the concept of latency-hiding depth d_a(x) and show how any computation of N operations can be implemented on an a(x)-PH-RAM in time T(N) = O(N d_a(N)). In particular, T(N) = O(N log N) if a(x) = O(x), T(N) = O(N log log N) if a(x) = O(x^β) with 0 < β < 1, and T(N) = O(N log* N) if a(x) = O(log x).
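To give a feel for how far apart the three per-operation factors in these upper bounds are, here is a minimal Python sketch. It assumes base-2 logarithms and ignores the constants hidden by the O-notation; the helper names are hypothetical. It does not compute the latency-hiding depth itself, whose definition is not given in this abstract.

```python
import math


def log_star(n):
    """Iterated logarithm (base 2): how many times log2 must be applied
    before the value drops to at most 1."""
    count = 0
    x = float(n)
    while x > 1.0:
        x = math.log2(x)
        count += 1
    return count


def bound_factors(n):
    """Per-operation factors of the three upper bounds, constants ignored.

    Requires n > 1 so that log2(log2(n)) is defined.
    """
    return {
        "log N": math.log2(n),
        "log log N": math.log2(math.log2(n)),
        "log* N": log_star(n),
    }


print(bound_factors(2 ** 16))
```

For N = 2^16 the factors are 16, 4, and 4 respectively; the gap between log log N and log* N only opens up for much larger N.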
We develop lower-bound techniques that establish existential lower bounds on PH-RAMs. In particular, we exhibit computations for which T(N) = Ω(N log N / log log N) when a(x) = Ω(x), T(N) = Ω(N log log N) when a(x) = Ω(x^β) with 0 < β < 1, and T(N) = Ω(N log* N) when a(x) = Ω(log x).
The stated lower bounds show that the pipelined decomposition-tree strategy is existentially optimal for the latter two cases, but they indicate the potential for a modest O(log log N) improvement for linear access functions. To realize this potential, we propose a superpipelined decomposition-tree memory manager, which achieves T(N) = O(N log N / log log N).
The pipelined decomposition-tree strategy can also be tuned to the computation, in order to exploit its temporal locality as characterized by the width parameters [9]. When the latter are suitably bounded, T(N) = O(N) on any PH-RAM with a linear or sublinear access function. Finally, we discuss how performance could benefit from parallelism in the data-dependence dag of the computation or from architectural enhancements, such as block-transfer primitives, and we formulate several questions that deserve further investigation.