DOI: 10.1145/1378533.1378573
Research article

Fundamental parallel algorithms for private-cache chip multiprocessors

Published: 14 June 2008

Abstract

In this paper, we study parallel algorithms for private-cache chip multiprocessors (CMPs), focusing on methods for foundational problems that are scalable with the number of cores. By focusing on private-cache CMPs, we show that we can design efficient algorithms that need no additional assumptions about the way cores are interconnected, for we assume that all inter-processor communication occurs through the memory hierarchy. We study several fundamental problems, including prefix sums, selection, and sorting, which often form the building blocks of other parallel algorithms. Indeed, we present two sorting algorithms, a distribution sort and a mergesort. Our algorithms are asymptotically optimal in terms of parallel cache accesses and space complexity under reasonable assumptions about the relationships between the number of processors, the size of memory, and the size of cache blocks. In addition, we study sorting lower bounds in a computational model, which we call the parallel external-memory (PEM) model, that formalizes the essential properties of our algorithms for private-cache CMPs.
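
To make the flavor of the prefix-sums problem in this setting concrete, here is a minimal sketch of a standard two-pass blocked scan. It is an illustration only, not the algorithm from the paper: the function name `blocked_prefix_sums`, the parameters `num_procs` and `block_size`, and the simple block-read count are all assumptions introduced for this example, and the processors are simulated sequentially.

```python
from itertools import accumulate
from math import ceil


def blocked_prefix_sums(data, num_procs, block_size):
    """Two-pass prefix sums over `num_procs` simulated processors.

    Returns (prefix_sums, input_block_reads_by_busiest_processor).
    Hypothetical helper for illustration; not taken from the paper.
    """
    n = len(data)
    if n == 0:
        return [], 0
    chunk = ceil(n / num_procs)
    # Each simulated processor owns one contiguous chunk of the input.
    chunks = [data[i:i + chunk] for i in range(0, n, chunk)]

    # Pass 1: every processor sums its own chunk (one scan of the chunk).
    local_totals = [sum(c) for c in chunks]

    # Combine step: exclusive prefix over the per-chunk totals.
    offsets = [0] + list(accumulate(local_totals))[:-1]

    # Pass 2: every processor rescans its chunk, starting from its offset.
    result = []
    for c, offset in zip(chunks, offsets):
        result.extend(offset + s for s in accumulate(c))

    # Assumed cost measure: a chunk is read in ceil(chunk / block_size)
    # blocks per pass, and there are two passes. The combine step and
    # output writes are ignored here.
    block_reads = 2 * ceil(chunk / block_size)
    return result, block_reads
```

Calling `blocked_prefix_sums([3, 1, 4, 1, 5, 9, 2, 6], num_procs=3, block_size=2)` splits the input into chunks of at most three elements, so each simulated processor touches only its own contiguous range. This is the property that lets private caches work independently.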


    Published In

    SPAA '08: Proceedings of the twentieth annual symposium on Parallelism in algorithms and architectures
    June 2008
    380 pages
    ISBN:9781595939739
    DOI:10.1145/1378533
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. parallel external memory
    2. pem
    3. private-cache cmp

    Qualifiers

    • Research-article

    Conference

    SPAA08

    Acceptance Rates

    Overall Acceptance Rate 447 of 1,461 submissions, 31%

    Article Metrics

    • Downloads (last 12 months): 11
    • Downloads (last 6 weeks): 0
    Reflects downloads up to 13 Nov 2024

    Cited By

    • (2024) The All Nearest Smaller Values Problem Revisited in Practice, Parallel and External Memory. Proceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures, pages 259-268. DOI: 10.1145/3626183.3659979. Online publication date: 17-Jun-2024.
    • (2022) Automatic HBM Management. Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures, pages 147-159. DOI: 10.1145/3490148.3538570. Online publication date: 11-Jul-2022.
    • (2022) Beyond Binary Search: Parallel In-Place Construction of Implicit Search Tree Layouts. IEEE Transactions on Computers 71:5, pages 1104-1116. DOI: 10.1109/TC.2021.3075392. Online publication date: 1-May-2022.
    • (2022) Efficient Parallel Cache-Oblivious Sorting Algorithms. 2022 International Conference on Electrical, Computer, Communications and Mechatronics Engineering (ICECCME), pages 1-7. DOI: 10.1109/ICECCME55909.2022.9988160. Online publication date: 16-Nov-2022.
    • (2021) External-memory Dictionaries in the Affine and PDAM Models. ACM Transactions on Parallel Computing 8:3, pages 1-20. DOI: 10.1145/3470635. Online publication date: 20-Sep-2021.
    • (2020) How to Manage High-Bandwidth Memory Automatically. Proceedings of the 32nd ACM Symposium on Parallelism in Algorithms and Architectures, pages 187-199. DOI: 10.1145/3350755.3400233. Online publication date: 6-Jul-2020.
    • (2018) The Parallel Persistent Memory Model. Proceedings of the 30th on Symposium on Parallelism in Algorithms and Architectures, pages 247-258. DOI: 10.1145/3210377.3210381. Online publication date: 11-Jul-2018.
    • (2018) Report from the Fourth Workshop on Algorithms and Systems for MapReduce and Beyond (BeyondMR '17). ACM SIGMOD Record 46:4, pages 44-48. DOI: 10.1145/3186549.3186561. Online publication date: 22-Feb-2018.
    • (2018) Analysis of classic algorithms on highly-threaded many-core architectures. Future Generation Computer Systems 82, pages 528-543. DOI: 10.1016/j.future.2017.02.007. Online publication date: May-2018.
    • (2018) Cache Oblivious Sparse Matrix Multiplication. LATIN 2018: Theoretical Informatics, pages 437-447. DOI: 10.1007/978-3-319-77404-6_32. Online publication date: 13-Mar-2018.
