Abstract
A common approach to the dynamic parallelization of loops at runtime is the inspector-executor pattern. The inspector first runs the loop without any side effects to determine whether there are data dependences that would prevent parallel execution. Only if no such dependences are found does the executor phase actually run the loop iterations in parallel. In previous work, the overhead of the inspection either has to be amortized by the parallel execution or is completely wasted if the loop turns out to be non-parallelizable.
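The classic pattern described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the example loop `a[idx[i]] += 1`, the helper names `inspect` and `run_loop`, and the tuple encoding of addresses are all assumptions made for this sketch. Python threads only illustrate the structure here; in CPython they do not yield real speedups for such a loop.

```python
from concurrent.futures import ThreadPoolExecutor


def inspect(n, reads_of, writes_of):
    """Dry run without side effects: only accessed addresses are recorded.

    Returns True if no address is written in one iteration and
    read or written in a different one.
    """
    writer = {}   # address -> iteration that writes it
    readers = {}  # address -> set of iterations that read it
    for i in range(n):
        for addr in writes_of(i):
            if writer.get(addr, i) != i:
                return False  # output dependence
            if readers.get(addr, set()) - {i}:
                return False  # anti dependence
            writer[addr] = i
        for addr in reads_of(i):
            if writer.get(addr, i) != i:
                return False  # flow dependence
            readers.setdefault(addr, set()).add(i)
    return True


def run_loop(a, idx):
    """Hypothetical example loop: a[idx[i]] += 1 for every i."""
    n = len(idx)

    def reads_of(i):
        return [("idx", i), ("a", idx[i])]

    def writes_of(i):
        return [("a", idx[i])]

    def body(i):
        a[idx[i]] += 1

    if inspect(n, reads_of, writes_of):
        # Executor phase: iterations are independent, run them in parallel.
        with ThreadPoolExecutor(max_workers=4) as pool:
            list(pool.map(body, range(n)))
        return "parallel"
    # Inspection was wasted: fall back to the original sequential loop.
    for i in range(n):
        body(i)
    return "sequential"
```

With `idx` a permutation, every iteration touches a different element of `a` and the executor runs in parallel; with repeated entries in `idx`, the inspector reports a dependence and the time spent inspecting is pure overhead.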
In this paper we propose to run the inspection phase simultaneously with an instrumented sequential version of the loop. This way we can reduce and hide the overhead in the case of a non-parallelizable loop. We discuss what needs to be done so that the sequentially executed iterations do not invalidate the inspector’s concurrent work (if they do, the whole loop has to be executed sequentially).
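One possible way to picture this idea is sketched below. It is an illustration under stated assumptions, not the mechanism of the paper: the example loop `a[idx[i]] += 1`, the function `run_simultaneous`, the write log `seq_writes`, and the invalidation check by set intersection are all hypothetical. The sequential loop starts immediately in a worker thread while the inspector analyzes the loop on the calling thread.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_simultaneous(a, idx):
    """Inspect the loop a[idx[i]] += 1 while it already runs sequentially."""
    n = len(idx)
    stop = threading.Event()  # set by the inspector to take over the loop
    progress = {"next": 0}    # first iteration not yet executed sequentially
    seq_writes = set()        # instrumentation: addresses written so far

    def body(i):
        a[idx[i]] += 1

    def instrumented_sequential():
        i = 0
        while i < n and not stop.is_set():
            body(i)
            seq_writes.add(("a", idx[i]))  # log the write for later checks
            i += 1
        progress["next"] = i

    worker = threading.Thread(target=instrumented_sequential)
    worker.start()

    # Inspector: side-effect free, runs concurrently with the worker.
    inspector_reads = set()  # addresses the analysis relied on
    targets = set()
    parallelizable = True
    for i in range(n):
        inspector_reads.add(("idx", i))
        if idx[i] in targets:  # two iterations update the same element
            parallelizable = False
            break
        targets.add(idx[i])

    if parallelizable:
        stop.set()  # ask the sequential loop to hand over
    worker.join()   # otherwise it simply finishes the whole loop
    if not parallelizable:
        return "sequential"

    start = progress["next"]
    if seq_writes & inspector_reads:
        # Sequential writes touched data the inspector read: its result
        # cannot be trusted, so the rest of the loop runs sequentially.
        for i in range(start, n):
            body(i)
        return "sequential"
    with ThreadPoolExecutor(max_workers=4) as pool:
        list(pool.map(body, range(start, n)))
    return "parallel"
```

In the non-parallelizable case the sequential loop is never interrupted, so the inspection cost is hidden behind useful work. In this sketch the loop writes only to `a` while the inspector reads only `idx`, so the invalidation branch is never taken; it is included to show where such a check would sit.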
Our measurements show that if a loop cannot be executed in parallel, the overhead is below 1.6% compared to the runtime of the original sequential loop. If the loop is parallelizable, we see speedups of up to a factor of 3.6 on a quad-core processor.
Notes
- 1. Note that as inspectors do not modify memory, the dry run does not set b[400] to 0. This tricks the inspector into working with a wrong a[b[400]]. But what matters is that the loop is still flagged as non-parallelizable, since the dependence on &b[400] is found instead.
- 2.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Brinkers, D., Veldema, R., Philippsen, M. (2015). Simultaneous Inspection: Hiding the Overhead of Inspector-Executor Style Dynamic Parallelization. In: Brodman, J., Tu, P. (eds) Languages and Compilers for Parallel Computing. LCPC 2014. Lecture Notes in Computer Science, vol 8967. Springer, Cham. https://doi.org/10.1007/978-3-319-17473-0_7
DOI: https://doi.org/10.1007/978-3-319-17473-0_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-17472-3
Online ISBN: 978-3-319-17473-0
eBook Packages: Computer Science (R0)