Simultaneous Inspection: Hiding the Overhead of Inspector-Executor Style Dynamic Parallelization

Brinkers, Daniel; Veldema, Ronald; Philippsen, Michael

doi:10.1007/978-3-319-17473-0_7

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8967)

Included in the following conference series:

International Workshop on Languages and Compilers for Parallel Computing

Abstract

A common approach to dynamic parallelization of loops at runtime is the inspector-executor pattern. The inspector first runs the loop without any side effects to analyze whether there are data dependences that would prevent parallel execution. Only if no such dependences are found does the executor phase actually run the loop iterations in parallel. In previous work, the overhead of the inspection must either be amortized by the parallel execution or is wasted entirely if the loop turns out to be non-parallelizable.
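As a minimal sketch of the pattern described above (not the authors' implementation), consider a hypothetical loop body `a[w[i]] = a[r[i]] + 1`. The names `is_parallelizable`, `run_loop`, `w`, `r`, and `len` are assumptions made for illustration. The inspector only reads the index arrays and never touches `a`; the executor runs in parallel only when OpenMP is enabled at compile time, otherwise the pragma is skipped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Inspector: side-effect-free dry run over the index arrays.
 * Returns true if no element of a[] is written by one iteration and
 * read or written by a different iteration. */
bool is_parallelizable(const int *w, const int *r, int n, int len) {
    int *writer = malloc((size_t)len * sizeof *writer);
    if (!writer) return false;            /* be conservative on failure */
    for (int k = 0; k < len; k++) writer[k] = -1;

    bool ok = true;
    /* Pass 1: record the writing iteration of each element. */
    for (int i = 0; i < n && ok; i++) {
        assert(w[i] >= 0 && w[i] < len);
        if (writer[w[i]] != -1) ok = false;   /* two iterations write it */
        else writer[w[i]] = i;
    }
    /* Pass 2: a read of an element written by another iteration. */
    for (int i = 0; i < n && ok; i++) {
        assert(r[i] >= 0 && r[i] < len);
        int wi = writer[r[i]];
        if (wi != -1 && wi != i) ok = false;
    }
    free(writer);
    return ok;
}

/* Executor: parallel only if the inspector found no dependences. */
void run_loop(int *a, const int *w, const int *r, int n, int len) {
    if (is_parallelizable(w, r, n, len)) {
#ifdef _OPENMP
#pragma omp parallel for
#endif
        for (int i = 0; i < n; i++) a[w[i]] = a[r[i]] + 1;
    } else {
        for (int i = 0; i < n; i++) a[w[i]] = a[r[i]] + 1;
    }
}
```

In this sketch the inspection cost is paid before any useful work starts, which is exactly the overhead that must be amortized or is lost when the loop turns out to be sequential.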

In this paper we propose to run the inspection phase simultaneously with an instrumented sequential version of the loop. This way we can reduce and hide the overhead in the case of a non-parallelizable loop. We discuss what needs to be done so that the sequentially executed iterations do not invalidate the inspector’s concurrent work (if they do, the whole loop must be executed sequentially).
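The idea can be sketched as follows, under simplifying assumptions that are not taken from the paper: the hypothetical loop body is `a[w[i]] = a[r[i]] + 1`, the index arrays `w` and `r` are never modified by the loop (so sequential iterations cannot invalidate the inspector's work), and the instrumentation is just a poll of a shared flag between iterations. All names (`inspection_t`, `inspector`, `run_simultaneous`) are illustrative.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

enum { PENDING, PARALLEL_OK, DEPENDENT };

typedef struct {
    const int *w, *r;     /* index arrays, assumed read-only in the loop */
    int n, len;
    atomic_int verdict;   /* written once by the inspector thread */
} inspection_t;

/* Inspector thread: dry run over the index arrays only. */
void *inspector(void *arg) {
    inspection_t *in = arg;
    int result = PARALLEL_OK;
    int *writer = malloc((size_t)in->len * sizeof *writer);
    if (!writer) { atomic_store(&in->verdict, DEPENDENT); return NULL; }
    for (int k = 0; k < in->len; k++) writer[k] = -1;
    for (int i = 0; i < in->n && result == PARALLEL_OK; i++) {
        assert(in->w[i] >= 0 && in->w[i] < in->len);
        if (writer[in->w[i]] != -1) result = DEPENDENT;
        else writer[in->w[i]] = i;
    }
    for (int i = 0; i < in->n && result == PARALLEL_OK; i++) {
        assert(in->r[i] >= 0 && in->r[i] < in->len);
        int wi = writer[in->r[i]];
        if (wi != -1 && wi != i) result = DEPENDENT;
    }
    free(writer);
    atomic_store(&in->verdict, result);
    return NULL;
}

/* Runs the loop; returns how many iterations executed sequentially. */
int run_simultaneous(int *a, const int *w, const int *r, int n, int len) {
    inspection_t in = { .w = w, .r = r, .n = n, .len = len };
    atomic_init(&in.verdict, PENDING);
    pthread_t tid;
    int have_thread = pthread_create(&tid, NULL, inspector, &in) == 0;
    if (!have_thread) atomic_store(&in.verdict, DEPENDENT);

    /* Instrumented sequential loop: useful work proceeds while the
     * inspector runs; the verdict is polled between iterations. */
    int i = 0;
    for (; i < n && atomic_load(&in.verdict) != PARALLEL_OK; i++)
        a[w[i]] = a[r[i]] + 1;
    int sequential = i;

    /* Reached with iterations left only if the verdict is PARALLEL_OK. */
#ifdef _OPENMP
#pragma omp parallel for
#endif
    for (int j = sequential; j < n; j++)
        a[w[j]] = a[r[j]] + 1;

    if (have_thread) pthread_join(tid, NULL);
    return sequential;
}
```

If the inspector reports a dependence, the sequential loop simply runs to completion, so the inspection costs little beyond the polling. How many iterations run sequentially in the parallelizable case depends on thread timing.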

Our measurements show that if a loop cannot be executed in parallel, the overhead is below 1.6% compared to the runtime of the original sequential loop. If the loop is parallelizable, we see speedups of up to a factor of 3.6 on a quad-core processor.

Notes

1. Note that, as inspectors do not modify memory, the dry run does not set b[400] to 0. This tricks the inspector into working with a wrong a[b[400]]. What matters, however, is that the loop is still flagged as non-parallelizable, because the dependence on &b[400] is found instead.
2. Note that the gotos in the macros of Figs. 3 and 5 are slightly different for the simultaneous inspection: they are delayed until the interrupted iteration finishes.
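The situation in Note 1 can be illustrated with a small sketch. The loop shape below is hypothetical (the paper's actual example is not reproduced here): one iteration stores to `b[400]`, and a later iteration updates `a[b[400]]`. The names `trace`, `record`, `inspect_loop`, and `has_dependence` are assumptions for illustration. The dry run logs addresses without performing stores, so it sees a stale `b[400]`, yet the conflict on `&b[400]` itself is still detected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_LOG 64

typedef struct { const void *addr; int iter; bool is_write; } access_t;
static access_t trace[MAX_LOG];
static size_t trace_len;

void record(const void *addr, int iter, bool is_write) {
    assert(trace_len < MAX_LOG);
    trace[trace_len++] = (access_t){ addr, iter, is_write };
}

/* Dry run of the hypothetical loop: logs accesses, performs no stores. */
void inspect_loop(const int *a, int a_len, const int *b, int n) {
    trace_len = 0;
    for (int i = 0; i < n; i++) {
        if (i == 1)
            record(&b[400], i, true);       /* real loop: b[400] = 0 */
        if (i == 2) {
            record(&b[400], i, false);      /* read of b[400] */
            int stale = b[400];             /* old value, store was skipped */
            assert(stale >= 0 && stale < a_len);
            record(&a[stale], i, false);    /* real loop: a[b[400]] += 1 */
            record(&a[stale], i, true);
        }
    }
}

/* Same address, different iterations, at least one write. */
bool has_dependence(void) {
    for (size_t x = 0; x < trace_len; x++)
        for (size_t y = x + 1; y < trace_len; y++)
            if (trace[x].addr == trace[y].addr &&
                trace[x].iter != trace[y].iter &&
                (trace[x].is_write || trace[y].is_write))
                return true;
    return false;
}
```

With `b[400]` initially 5, the inspector logs `&a[5]` rather than the `&a[0]` that the real loop would touch, but the write to and read of `&b[400]` in different iterations already mark the loop as non-parallelizable.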

Author information

Authors and Affiliations

Programming Systems Group, Friedrich-Alexander University Erlangen-Nürnberg (FAU), Erlangen, Germany
Daniel Brinkers, Ronald Veldema & Michael Philippsen

Corresponding author

Correspondence to Daniel Brinkers.

Editor information

Editors and Affiliations

Intel Corporation, Santa Clara, California, USA
James Brodman
Intel Corporation, Santa Clara, California, USA
Peng Tu

About this paper

Cite this paper

Brinkers, D., Veldema, R., Philippsen, M. (2015). Simultaneous Inspection: Hiding the Overhead of Inspector-Executor Style Dynamic Parallelization. In: Brodman, J., Tu, P. (eds) Languages and Compilers for Parallel Computing. LCPC 2014. Lecture Notes in Computer Science, vol 8967. Springer, Cham. https://doi.org/10.1007/978-3-319-17473-0_7

DOI: https://doi.org/10.1007/978-3-319-17473-0_7
Published: 01 May 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-17472-3
Online ISBN: 978-3-319-17473-0
eBook Packages: Computer Science, Computer Science (R0)
