research-article

Parallelizing Sequential Programs with Statistical Accuracy Tests

Published: 01 May 2013
  • Abstract

    We present QuickStep, a novel system for parallelizing sequential programs. Unlike standard parallelizing compilers (which are designed to preserve the semantics of the original sequential computation), QuickStep is instead designed to generate (potentially nondeterministic) parallel programs that produce acceptably accurate results acceptably often. The freedom to generate parallel programs whose output may differ (within statistical accuracy bounds) from the output of the sequential program enables a dramatic simplification of the compiler, a dramatic increase in the range of applications that it can parallelize, and a significant expansion in the range of parallel programs that it can legally generate.
    Results from our benchmark set of applications show that QuickStep can automatically generate acceptably accurate and efficient parallel programs: the automatically generated parallel versions of five of our six benchmark applications run between 5.0 and 7.8 times faster on eight cores than the original sequential versions. These applications and parallelizations contain features (such as the use of modern object-oriented programming constructs or desirable parallelizations with infrequent but acceptable data races) that place them inherently beyond the reach of standard approaches.
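To make "acceptably accurate results acceptably often" concrete, the following is a minimal sketch of one way such a statistical accuracy test could be structured. It is not QuickStep's implementation. It assumes a relative-error distortion metric, simulates a racy parallel reduction by randomly dropping updates, and uses a Wald-style sequential probability ratio test to decide whether the candidate meets an accuracy bound often enough. All names (`distortion`, `racy_sum`, `sprt_accepts`, `accepts_parallelization`) and all thresholds are hypothetical choices made for illustration.

```python
import math
import random
from typing import Callable, List


def distortion(reference: float, observed: float) -> float:
    """Relative error of a candidate output against the sequential output."""
    if reference == 0.0:
        return abs(observed)
    return abs(observed - reference) / abs(reference)


def racy_sum(values: List[float], rng: random.Random, loss_rate: float) -> float:
    """Stand-in for a parallel reduction with data races (an assumption):
    each update is lost independently with probability loss_rate."""
    return sum(v for v in values if rng.random() >= loss_rate)


def sprt_accepts(trial: Callable[[], bool], p0: float = 0.90, p1: float = 0.99,
                 alpha: float = 0.05, beta: float = 0.05,
                 max_trials: int = 1000) -> bool:
    """Sequential probability ratio test on a stream of pass/fail trials.

    H0: a run is acceptably accurate with probability p0 (too unreliable).
    H1: a run is acceptably accurate with probability p1 (reliable enough).
    Returns True only if the evidence favors H1.
    """
    upper = math.log((1 - beta) / alpha)   # cross this: accept H1
    lower = math.log(beta / (1 - alpha))   # cross this: accept H0
    llr = 0.0                              # running log-likelihood ratio
    for _ in range(max_trials):
        if trial():
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return True
        if llr <= lower:
            return False
    return False  # undecided after max_trials: conservatively reject


def accepts_parallelization(loss_rate: float, bound: float, seed: int = 0) -> bool:
    """Test a simulated parallelization against the sequential result."""
    rng = random.Random(seed)
    values = [float(i % 7 + 1) for i in range(1000)]
    reference = sum(values)  # output of the sequential program
    return sprt_accepts(
        lambda: distortion(reference, racy_sum(values, rng, loss_rate)) <= bound
    )


if __name__ == "__main__":
    print(accepts_parallelization(loss_rate=0.0, bound=0.01))
    print(accepts_parallelization(loss_rate=0.5, bound=0.01))
```

The sequential test stops as soon as the evidence is strong enough in either direction, so a clearly bad candidate is rejected after very few runs while a good one needs only enough runs to reach the upper threshold.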



    Published In

    ACM Transactions on Embedded Computing Systems, Volume 12, Issue 2s
    Special Section on Probabilistic Embedded Computing
    May 2013
    269 pages
    ISSN:1539-9087
    EISSN:1558-3465
    DOI:10.1145/2465787
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Publication History

    Published: 01 May 2013
    Accepted: 01 November 2011
    Revised: 01 September 2011
    Received: 01 July 2011
    Published in TECS Volume 12, Issue 2s


    Author Tags

    1. Parallelization
    2. accuracy
    3. interactive
    4. trade-off

    Qualifiers

    • Research-article
    • Research
    • Refereed


    Article Metrics

    • Downloads (last 12 months): 8
    • Downloads (last 6 weeks): 2
    Reflects downloads up to 27 Jul 2024


    Cited By

    • (2023) SPLENDID: Supporting Parallel LLVM-IR Enhanced Natural Decompilation for Interactive Development. Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3, 679-693. DOI: 10.1145/3582016.3582058. Online publication date: 25-Mar-2023
    • (2023) Program State Element Characterization. Proceedings of the 21st ACM/IEEE International Symposium on Code Generation and Optimization, 199-211. DOI: 10.1145/3579990.3580011. Online publication date: 17-Feb-2023
    • (2022) An Introduction to the Approximate Computing Paradigm. Approximate Computing and its Impact on Accuracy, Reliability and Fault-Tolerance, 11-22. DOI: 10.1007/978-3-031-15717-2_2. Online publication date: 17-Nov-2022
    • (2022) Accuracy-Aware Compilers. Approximate Computing Techniques, 177-214. DOI: 10.1007/978-3-030-94705-7_7. Online publication date: 3-Jan-2022
    • (2021) Reverse engineering for reduction parallelization via semiring polynomials. Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation, 820-834. DOI: 10.1145/3453483.3454079. Online publication date: 19-Jun-2021
    • (2021) Functional Approximation and Approximate Parallelization with the ACCEPT compiler. 2021 IEEE 33rd International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), 188-197. DOI: 10.1109/SBAC-PAD53543.2021.00030. Online publication date: Oct-2021
    • (2021) An approximate-computing empowered green 6G downlink. Physical Communication 49:C. DOI: 10.1016/j.phycom.2021.101444. Online publication date: 1-Dec-2021
    • (2020) Exploiting Errors for Efficiency. ACM Computing Surveys 53:3, 1-39. DOI: 10.1145/3394898. Online publication date: 12-Jun-2020
    • (2020) PANDORA. ACM Transactions on Embedded Computing Systems 19:5, 1-17. DOI: 10.1145/3391899. Online publication date: 11-Nov-2020
    • (2020) Perspective. Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, 351-367. DOI: 10.1145/3373376.3378458. Online publication date: 9-Mar-2020
