DOI: 10.1145/378580.378615
Article

Computational power of pipelined memory hierarchies

Published: 03 July 2001

Abstract

We define a model of computation, called the Pipelined Hierarchical Random Access Machine with access function a(x), denoted the a(x)-PH-RAM. In this model, a processor interacts with a memory that can accept requests at a constant rate and satisfy each request to location x within a(x) units of time.
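The timing rule of the model can be illustrated with a small sketch. It assumes, beyond what the abstract states, that exactly one request is issued per time unit, that the requests do not depend on one another, and that a request to address x issued at time t completes at time t + a(x). The function names are hypothetical and are not taken from the paper.

```python
from typing import Callable, Sequence


def pipelined_time(addresses: Sequence[int], a: Callable[[int], float]) -> float:
    """Completion time when request i is issued at time i (assumed rate of
    one request per time unit) and returns a(x) time units later."""
    finish = 0.0
    for issue_time, x in enumerate(addresses):
        finish = max(finish, issue_time + a(x))
    return finish


def unpipelined_time(addresses: Sequence[int], a: Callable[[int], float]) -> float:
    """Completion time when each request must finish before the next starts."""
    return float(sum(a(x) for x in addresses))


if __name__ == "__main__":
    linear = lambda x: float(x)  # a(x) = x, a linear access function
    batch = [1, 2, 3, 4]
    print(pipelined_time(batch, linear), unpipelined_time(batch, linear))
```

With independent requests the pipelined time is governed by the slowest outstanding request rather than by the sum of all latencies. Data dependences between operations, which the sketch ignores, are what make latency hiding nontrivial for arbitrary computations.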
We investigate memory management strategies that lead to time-efficient implementations of arbitrary computations on a PH-RAM. We begin by developing the so-called pipelined decomposition-tree memory management strategy, which can be tuned to the memory access function. Specifically, for a linear or sublinear access function a(x), we define the concept of latency-hiding depth d_a(x) and show how any computation of N operations can be implemented on an a(x)-PH-RAM in time T(N) = O(N d_a(N)). In particular, T(N) = O(N log N) if a(x) = O(x), T(N) = O(N log log N) if a(x) = O(x^β) with 0 < β < 1, and T(N) = O(N log* N) if a(x) = O(log x).
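To give a feel for how the three upper bounds differ, the following sketch evaluates their growth terms with all hidden constants dropped. The base-2 logarithm and the helper names are assumptions made for illustration. The abstract does not define the latency-hiding depth, so the code only evaluates the three stated closed forms.

```python
import math


def log_star(n: float) -> int:
    """Iterated logarithm (base 2): how many times log2 must be applied
    before the value drops to 1 or below."""
    count = 0
    while n > 1:
        n = math.log2(n)
        count += 1
    return count


def upper_bound_shapes(n: int) -> dict:
    """Growth terms of the three stated upper bounds, constants ignored."""
    return {
        "a(x) = O(x)": n * math.log2(n),
        "a(x) = O(x^beta)": n * math.log2(math.log2(n)),
        "a(x) = O(log x)": n * log_star(n),
    }


if __name__ == "__main__":
    for label, value in upper_bound_shapes(2 ** 16).items():
        print(f"{label:>18}: {value:.0f}")
```

The slowdown relative to the N operations of the computation is the factor multiplying n in each entry, and it shrinks as the access function grows more slowly.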
We develop lower bound techniques that allow us to establish existential lower bounds on PH-RAMs. In particular, we exhibit computations for which T(N) = Ω(N log N / log log N) when a(x) = Ω(x), T(N) = Ω(N log log N) when a(x) = Ω(x^β) with 0 < β < 1, and T(N) = Ω(N log* N) when a(x) = Ω(log x).
The stated lower bounds show that the pipelined decomposition-tree strategy is existentially optimal for the latter case but indicate the potential for a modest, O(log log N) improvement for linear access functions. To realize this potential, a superpipelined decomposition-tree memory manager is proposed, which achieves T(N) = O(N log N / log log N).
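How modest the improvement is can be checked numerically. The sketch below takes the ratio of the two growth terms for linear access functions, ignoring constants and assuming base-2 logarithms; the function name is hypothetical.

```python
import math


def linear_access_gap(n: int) -> float:
    """Ratio of the N log N term to the N log N / log log N term."""
    pipelined = n * math.log2(n)
    superpipelined = n * math.log2(n) / math.log2(math.log2(n))
    return pipelined / superpipelined


if __name__ == "__main__":
    for exponent in (16, 64, 256):
        print(f"N = 2^{exponent}: gap factor ~ {linear_access_gap(2 ** exponent):.2f}")
```

The ratio equals log log N, which is 4 for N = 2^16 and only about 8 for N = 2^256, so the gain stays small over any realistic problem size.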
The pipelined decomposition-tree strategy can also be tuned to the computation, in order to exploit its temporal locality as characterized by the width parameters [9]. When the latter are suitably bounded, T(N) = O(N) on any PH-RAM with linear or sublinear access function. Finally, we discuss how performance could benefit from parallelism in the data-dependence dag of the computation or from architectural enhancements, such as block-transfer primitives, and formulate various questions that deserve further investigation.

References

[1]
A. Aggarwal, B. Alpern, A.K. Chandra and M. Snir. A Model for Hierarchical Memory. In Proc. of the 19th ACM Symposium on Theory of Computing, (1987), 305-314.
[2]
A. Aggarwal, A.K. Chandra and M. Snir. Hierarchical Memory with Block Transfer. In Proc. 28th Annual Symposium on Foundations of Computer Science, (1987), 204-216.
[3]
B. Alpern, L. Carter, E. Feig and T. Selker. The Uniform Memory Hierarchy Model of Computation. In Algorithmica, vol. 12, (1994), 72-129.
[4]
N.M. Amato, J. Perdue, A. Pietracaprina, G. Pucci, and M. Mathis. Predicting performance on SMP's. A case study: The SGI Power Challenge. In Proc. International Parallel and Distributed Processing Symposium, IPDPS 2000, Cancun, MEX, pages 729-737, May 2000.
[5]
G. Bilardi and F.P. Preparata. Horizons of Parallel Computation. Journal on Parallel and Distributed Computing, Vol. 27, 172-182, 1995.
[6]
G. Bilardi and F. Preparata. Processor-time tradeoffs under bounded-speed message propagation. Part I: upper bounds. Theory of Computing Systems, Vol. 30, 523-546, 1997.
[7]
G. Bilardi and F. Preparata. Processor-time tradeoffs under bounded-speed message propagation. Part II: lower bounds. Theory of Computing Systems, Vol. 32, 531-559, 1999.
[8]
G. Bilardi, E. Peserico. An Approach toward an Analytical Characterization of Locality and its Portability. IWIA 2000, International Workshop on Innovative Architectures, Maui, Hawaii, January 2001.
[9]
G. Bilardi, E. Peserico. A Characterization of Temporal Locality and its Portability Across Memory Hierarchies. ICALP 2001, International Colloquium on Automata, Languages, and Programming, Crete, July 2001.
[10]
G. Bilardi, A. Pietracaprina, and P. D'Alberto. On the space and access complexity of computation dags. 26th Workshop on Graph-Theoretic Concepts in Computer Science, Konstanz, Germany, June 2000.
[11]
S.A. Cook and R.A. Reckhow. Time Bounded Random Access Machines. Journal of Computer and System Sciences, 7:354-375, 1973.
[12]
J. Fotheringham. Dynamic storage allocation in the ATLAS computer, including an automatic use of a backing store. Communications of the ACM, 4, 10, pp. 435-436, (1961).
[13]
M. Frigo, C.E. Leiserson, H. Prokop and S. Ramachandran. Cache-Oblivious Algorithms. Proc. 40th Annual Symposium on Foundations of Computer Science, (1999).
[14]
F. Luccio and L. Pagli. A model of sequential computation with pipelined access to memory. Mathematical Systems Theory, 26(4), 343-356, 1993.
[15]
R.L. Mattson, J. Gecsei, D.R. Slutz, and I.L. Traiger. Evaluation Techniques for Storage Hierarchies. IBM Systems Journal, No. 2, 78-117, 1970.
[16]
V. Milutinovic and M. Valero (Guest Eds.). Special Issue on Cache Memory and Related Problems. IEEE Transactions on Computers, February 1999.
[17]
S.S. Przybylski. Cache and Memory Hierarchy Design. A Performance Directed Approach. Morgan Kaufmann Publishers, Inc. Palo Alto, CA 1990.
[18]
J.E. Savage. Models of Computation. Exploring the Power of Computing. Addison-Wesley, Reading, MA, 1998.
[19]
J.S. Vitter. External Memory Algorithms. Invited paper in Proc. 6th Annual European Symposium on Algorithms, (G. Bilardi et al. Eds.), Springer Verlag, Venice, August 1998, 1-25.
[20]
M. Wolfe. High Performance Compilers for Parallel Computing. Addison-Wesley, 1995.
[21]
R.C. Whaley and J.J. Dongarra. Automatically Tuned Linear Algebra Software. http://www.netlib.org/atlas/index.html

Published In

SPAA '01: Proceedings of the thirteenth annual ACM symposium on Parallel algorithms and architectures
July 2001
340 pages
ISBN:1581134096
DOI:10.1145/378580
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

SPAA01

Acceptance Rates

SPAA '01 Paper Acceptance Rate 34 of 93 submissions, 37%;
Overall Acceptance Rate 447 of 1,461 submissions, 31%

Cited By

  • (2009) On approximating the ideal random access machine by physical machines. Journal of the ACM (JACM) 56:5 (1-57). DOI: 10.1145/1552285.1552288. Online publication date: 21-Aug-2009
  • (2006) The Speculative Prefetcher and Evaluator Processor for Pipelined Memory Hierarchies. International Workshop on Innovative Architecture for Future Generation High Performance Processors and Systems (IWIA'06) (29-43). DOI: 10.1109/IWIAS.2006.38. Online publication date: Jan-2006
  • (2005) An Address Dependence Model of Computation for Hierarchical Memories with Pipelined Transfer. Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 8 - Volume 09. DOI: 10.1109/IPDPS.2005.85. Online publication date: 4-Apr-2005
  • (2002) Optimal organizations for pipelined hierarchical memories. Proceedings of the fourteenth annual ACM symposium on Parallel algorithms and architectures (109-116). DOI: 10.1145/564870.564886. Online publication date: 10-Aug-2002
  • (2001) Pipelined memory hierarchies: scalable organizations and application performance. Innovative Architecture for Future Generation High-Performance Processors and Systems IWIA-01 (45-52). DOI: 10.1109/IWIA.2001.955196. Online publication date: 2001
  • (2001) An approach towards an analytical characterization of locality and its portability. Innovative Architecture for Future Generation High-Performance Processors and Systems IWIA-01 (37-44). DOI: 10.1109/IWIA.2001.955195. Online publication date: 2001
  • (2001) A Characterization of Temporal Locality and Its Portability across Memory Hierarchies. Automata, Languages and Programming (128-139). DOI: 10.1007/3-540-48224-5_11. Online publication date: 4-Jul-2001
