Simultaneous Inspection: Hiding the Overhead of Inspector-Executor Style Dynamic Parallelization

  • Conference paper
Languages and Compilers for Parallel Computing (LCPC 2014)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8967)

Abstract

A common approach to dynamic parallelization of loops at runtime is the inspector-executor pattern. The inspector first runs the loop without any side effects to analyze whether there are data dependences that would prevent parallel execution. Only if no such dependences are found does the executor phase actually run the loop iterations in parallel. In previous work, the overhead of the inspection must either be amortized by the parallel execution or is completely wasted if the loop turns out to be non-parallelizable.
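
To make the pattern concrete, the following is a minimal sketch of a classic inspector-executor in Python. It is not the implementation from the paper. The helpers `make_loop`, `inspect_loop`, and `run_loop`, the address tuples, and the example loop `a[b[i]] += c[i]` are all assumptions chosen for illustration, and Python threads only illustrate the control flow here, not the speedup.

```python
from concurrent.futures import ThreadPoolExecutor

def make_loop(a, b, c):
    """Hypothetical example loop: a[b[i]] += c[i] with index array b."""
    def body(i):
        a[b[i]] += c[i]
    def reads_of(i):            # addresses iteration i would read
        return [("b", i), ("c", i), ("a", b[i])]
    def writes_of(i):           # addresses iteration i would write
        return [("a", b[i])]
    return body, reads_of, writes_of

def inspect_loop(n, reads_of, writes_of):
    """Dry run without side effects: True if no cross-iteration dependence."""
    writers = {}                # address -> iteration that writes it
    for i in range(n):
        for addr in writes_of(i):
            if writers.setdefault(addr, i) != i:
                return False    # two iterations write the same address
    for i in range(n):
        for addr in reads_of(i):
            w = writers.get(addr)
            if w is not None and w != i:
                return False    # one iteration reads what another writes
    return True

def run_loop(n, body, reads_of, writes_of, workers=4):
    """Executor: run in parallel only if the inspector found no dependence."""
    if inspect_loop(n, reads_of, writes_of):
        with ThreadPoolExecutor(max_workers=workers) as pool:
            list(pool.map(body, range(n)))
        return "parallel"
    for i in range(n):          # inspection effort is wasted in this case
        body(i)
    return "sequential"
```

With a permutation as index array the loop is run in parallel. With repeated indices the inspector reports a dependence, and the time spent on inspection is pure overhead, which is the cost the paper sets out to hide.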

In this paper we propose to run the inspection phase simultaneously with an instrumented sequential version of the loop. This way we can reduce and hide the overhead in case of a non-parallelizable loop. We discuss what needs to be done so that the sequentially executed iterations do not invalidate the inspector’s concurrent work (in which case the whole loop must be executed sequentially).
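
The idea can be sketched as follows, under stated assumptions. An inspector thread analyzes the loop while the main thread already executes instrumented sequential iterations. Once the inspector finishes, the remaining iterations run in parallel only if the verdict is positive and the sequential writes did not touch anything the inspector saw other iterations read. This conservative validity check, the function `simultaneous_loop`, and the example loop are illustrative assumptions, not the mechanism described in the paper.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def indirect_update_loop(a, b, c):
    """Hypothetical example loop: a[b[i]] += c[i]."""
    def body(i):
        a[b[i]] += c[i]
    def reads_of(i):
        return [("b", i), ("c", i), ("a", b[i])]
    def writes_of(i):
        return [("a", b[i])]
    return body, reads_of, writes_of

def simultaneous_loop(n, body, reads_of, writes_of, workers=4):
    readers = {}                      # address -> iterations seen reading it
    verdict = {"parallel": False}
    done = threading.Event()

    def inspector():                  # side-effect-free dry run of all iterations
        writers, ok = {}, True
        for i in range(n):
            for addr in writes_of(i):
                ok &= writers.setdefault(addr, i) == i
            for addr in reads_of(i):
                readers.setdefault(addr, set()).add(i)
        for addr, its in readers.items():
            w = writers.get(addr)
            ok &= w is None or its == {w}
        verdict["parallel"] = ok
        done.set()

    t = threading.Thread(target=inspector)
    t.start()
    seq_written = {}                  # address -> sequential iteration that wrote it
    i = 0
    while i < n and not done.is_set():
        for addr in writes_of(i):     # instrumentation: log the writes
            seq_written[addr] = i
        body(i)                       # useful sequential work hides the inspection
        i += 1
    t.join()
    # Assumed conservative check: sequential writes must not hit addresses
    # that the inspector saw being read by any other iteration.
    valid = all(readers.get(addr, set()) <= {j} for addr, j in seq_written.items())
    if verdict["parallel"] and valid:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            list(pool.map(body, range(i, n)))
    else:
        for k in range(i, n):         # fall back to plain sequential execution
            body(k)
    return i                          # iterations executed before the verdict
```

How many iterations run sequentially before the verdict depends on thread timing, but the final contents of `a` are the same in every case. If the loop is not parallelizable, the sequential loop has already made progress, so little time is lost.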

Our measurements show that if a loop cannot be executed in parallel, the overhead is below 1.6% compared to the runtime of the original sequential loop. If the loop is parallelizable, we see speedups of up to a factor of 3.6 on a quad-core processor.

Notes

  1. Note that as inspectors do not modify memory, the dry-run does not set b[400] to 0. This tricks the inspector into working with a wrong a[b[400]]. But what matters is that the loop is flagged as non-parallelizable since the dependence on &b[400] is found instead.

  2. Note that the gotos in the macros of Figs. 3 and 5 are slightly different for the simultaneous inspection: they are delayed until the interrupted iteration finishes.
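
The situation in note 1 can be illustrated with a small sketch. The loop shape below is hypothetical, since the paper's example figure is not reproduced here: iteration 0 sets `b[400] = 0`, and every iteration reads `a[b[400]]` and stores it in `x[i]`. The helpers `dry_run` and `has_cross_iteration_dependence` are likewise assumptions for illustration.

```python
def dry_run(n, b):
    """Inspector dry run: log accessed addresses, never modify memory."""
    reads, writes = {}, {}          # address -> set of iterations
    for i in range(n):
        if i == 0:
            # The real loop would execute b[400] = 0 here.
            # The inspector only logs the write.
            writes.setdefault(("b", 400), set()).add(i)
        idx = b[400]                # stale value, because nothing was written
        reads.setdefault(("b", 400), set()).add(i)
        reads.setdefault(("a", idx), set()).add(i)   # possibly the wrong address
        writes.setdefault(("x", i), set()).add(i)
    return reads, writes

def has_cross_iteration_dependence(reads, writes):
    """True if a written address is touched by more than one iteration."""
    for addr, ws in writes.items():
        if len(ws | reads.get(addr, set())) > 1:
            return True
    return False
```

With `b[400]` initially 7, the dry run logs a read of `a[7]` rather than `a[0]`. The loop is still flagged as non-parallelizable, because iteration 0 writes `b[400]` and the later iterations read it.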

Author information

Corresponding author

Correspondence to Daniel Brinkers.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Brinkers, D., Veldema, R., Philippsen, M. (2015). Simultaneous Inspection: Hiding the Overhead of Inspector-Executor Style Dynamic Parallelization. In: Brodman, J., Tu, P. (eds) Languages and Compilers for Parallel Computing. LCPC 2014. Lecture Notes in Computer Science, vol 8967. Springer, Cham. https://doi.org/10.1007/978-3-319-17473-0_7

  • DOI: https://doi.org/10.1007/978-3-319-17473-0_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-17472-3

  • Online ISBN: 978-3-319-17473-0

  • eBook Packages: Computer Science, Computer Science (R0)