DOI: 10.1145/605397.605400
Article

Speculative synchronization: applying thread-level speculation to explicitly parallel applications

Published: 01 October 2002

    Abstract

    Barriers, locks, and flags are synchronizing operations widely used by programmers and parallelizing compilers to produce race-free parallel programs. Often, these operations are placed suboptimally, either because of conservative assumptions about the program or merely for code simplicity.

    We propose Speculative Synchronization, which applies the philosophy behind Thread-Level Speculation (TLS) to explicitly parallel applications. Speculative threads execute past active barriers, busy locks, and unset flags instead of waiting. The proposed hardware checks for conflicting accesses and, if a violation is detected, the offending speculative thread is rolled back to the synchronization point and restarted on the fly. TLS's principle of always keeping a safe thread is key to our proposal: in any speculative barrier, lock, or flag, the existence of one or more safe threads at all times guarantees forward progress, even in the presence of access conflicts or speculative buffer overflow. Our proposal requires simple hardware and no programming effort. Furthermore, it can coexist with conventional synchronization at run time.

    We use simulations to evaluate five compiler- and hand-parallelized applications. Our results show a reduction in the time lost to synchronization of 34% on average, and a reduction in overall program execution time of 7.4% on average.
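
    The rollback behavior described in the abstract can be illustrated with a small software model. This is only a toy sketch under stated assumptions, not the hardware design from the paper: the names `SpeculativeThread`, `safe_store`, `critical_section`, and `demo` are hypothetical, memory is modeled as a Python dict, and a conflict is simplified to "the safe thread writes an address the speculative thread has read or written". The sketch shows a thread running past a busy lock with buffered writes, being squashed on a conflict, re-executing from the synchronization point, and committing once it becomes safe.

```python
class SpeculativeThread:
    """Toy model of one thread executing speculatively past a busy lock."""

    def __init__(self, memory):
        self.memory = memory      # shared state: address -> value
        self.read_set = set()     # addresses read while speculative
        self.write_buffer = {}    # speculative writes, hidden from other threads
        self.restarts = 0         # number of rollbacks so far

    def load(self, addr):
        self.read_set.add(addr)
        # A thread sees its own buffered writes before shared memory.
        if addr in self.write_buffer:
            return self.write_buffer[addr]
        return self.memory[addr]

    def store(self, addr, value):
        # Speculative writes stay in the buffer until commit.
        self.write_buffer[addr] = value

    def conflicts_with(self, addr):
        return addr in self.read_set or addr in self.write_buffer

    def squash(self):
        # Roll back to the synchronization point: discard speculative state.
        self.read_set.clear()
        self.write_buffer.clear()
        self.restarts += 1

    def commit(self):
        # The thread has become safe: make buffered writes visible.
        self.memory.update(self.write_buffer)
        self.read_set.clear()
        self.write_buffer.clear()


def safe_store(memory, addr, value, speculative_threads):
    """Write by the safe thread; squash speculative threads that touched addr."""
    memory[addr] = value
    for thread in speculative_threads:
        if thread.conflicts_with(addr):
            thread.squash()


def critical_section(thread):
    """Hypothetical critical section: increment a shared counter."""
    thread.store("counter", thread.load("counter") + 1)


def demo():
    memory = {"counter": 0, "other": 0}
    spec = SpeculativeThread(memory)
    critical_section(spec)                      # run past the busy lock
    safe_store(memory, "counter", 10, [spec])   # lock owner writes: conflict
    critical_section(spec)                      # re-execute after the squash
    spec.commit()                               # lock released, thread is safe
    return memory["counter"], spec.restarts
```

    In this model the safe thread always writes directly to memory and is never squashed, which mirrors the forward-progress argument in the abstract: whatever happens to the speculative threads, at least one thread keeps making progress.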

    Published In

    ASPLOS X: Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
    October 2002
    318 pages
    ISBN: 1581135742
    DOI: 10.1145/605397
    • ACM SIGOPS Operating Systems Review, Volume 36, Issue 5
      December 2002
      296 pages
      ISSN: 0163-5980
      DOI: 10.1145/635508
    • ACM SIGARCH Computer Architecture News, Volume 30, Issue 5
      Special Issue: Proceedings of the 10th annual conference on Architectural Support for Programming Languages and Operating Systems
      December 2002
      296 pages
      ISSN: 0163-5964
      DOI: 10.1145/635506
    • ACM SIGPLAN Notices, Volume 37, Issue 10
      October 2002
      296 pages
      ISSN: 0362-1340
      EISSN: 1558-1160
      DOI: 10.1145/605432
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Qualifiers

    • Article

    Conference

    ASPLOS02

    Acceptance Rates

    ASPLOS X Paper Acceptance Rate 24 of 175 submissions, 14%;
    Overall Acceptance Rate 535 of 2,713 submissions, 20%
