Abstract
The advent of multicores presents a promising opportunity for speeding up the execution of sequential programs through their parallelization. In this paper we present a novel solution for efficiently supporting software-based speculative parallelization of sequential loops on multicore processors. The execution model we employ is based upon state separation, an approach that maintains the speculative state of parallel threads separately from the non-speculative state of the computation. If speculation is successful, the results produced by parallel threads in speculative state are committed by copying them into the computation’s non-speculative state. If misspeculation is detected, no costly state recovery mechanisms are needed, as the speculative state can simply be discarded. Techniques are proposed to reduce the cost of data copying between non-speculative and speculative state and to carry out misspeculation detection efficiently. We apply this approach to the speculative parallelization of loops in several sequential programs, which results in significant speedups on a Dell PowerEdge 1900 server with two Intel Xeon quad-core processors.
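The state-separation model described above can be illustrated with a minimal sketch. It is not the paper's implementation: the names (SpeculativeState, try_commit, run_speculative_loop) are hypothetical, and detecting misspeculation by comparing per-variable version numbers is an assumption made for illustration. Each iteration works on private copies, a single main thread commits results in sequential order, and a failed check simply drops the private copies.

```python
from concurrent.futures import ThreadPoolExecutor


class SpeculativeState:
    """Private workspace of one speculative iteration (hypothetical helper)."""

    def __init__(self, shared):
        self.shared = shared      # non-speculative state; never written here
        self.local = {}           # speculative copies of values
        self.read_versions = {}   # version of each value at copy-in time
        self.written = set()      # keys this iteration has modified

    def read(self, key):
        if key not in self.local:             # copy in on first access only
            value, version = self.shared[key]
            self.local[key] = value
            self.read_versions[key] = version
        return self.local[key]

    def write(self, key, value):
        self.local[key] = value               # stays private until commit
        self.written.add(key)


def try_commit(shared, spec):
    """Copy speculative results into shared state if no copied-in value is stale."""
    for key, version in spec.read_versions.items():
        if shared[key][1] != version:
            return False                      # misspeculation: just discard spec
    for key in spec.written:
        _, version = shared.get(key, (None, 0))
        shared[key] = (spec.local[key], version + 1)
    return True


def run_speculative_loop(body, iterations, shared, workers=4):
    """Run body(i, spec) speculatively in batches; commit in sequential order."""

    def run(i):
        spec = SpeculativeState(shared)
        body(i, spec)
        return spec

    pending = list(iterations)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while pending:
            # Shared state is only read while the batch runs; the main thread
            # commits after every iteration in the batch has finished.
            specs = list(pool.map(run, pending[:workers]))
            committed = 0
            for spec in specs:
                if not try_commit(shared, spec):
                    break                     # re-execute this and later ones
                committed += 1
            pending = pending[committed:]
```

In this sketch the shared state is a dictionary mapping each key to a (value, version) pair. The first iteration of every batch always commits, because nothing is committed between its copy-in and its check, so the loop makes progress even when every later iteration in the batch misspeculates.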
Acknowledgments
This work is supported by NSF grants CNS-0810906, CNS-0751961, CCF-0753470, and CNS-0751949 to the University of California, Riverside.
Open Access
This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
Rights and permissions
Open Access This is an open access article distributed under the terms of the Creative Commons Attribution Noncommercial License (https://creativecommons.org/licenses/by-nc/2.0), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
About this article
Cite this article
Tian, C., Feng, M., Nagarajan, V. et al. Speculative Parallelization of Sequential Loops on Multicores. Int J Parallel Prog 37, 508–535 (2009). https://doi.org/10.1007/s10766-009-0111-z
DOI: https://doi.org/10.1007/s10766-009-0111-z