Hybridizing and Relaxing Dependence Tracking for Efficient Parallel Runtime Support

Published: 30 August 2017

Abstract

It is notoriously challenging to develop parallel software systems that are both scalable and correct. Runtime support for parallelism (such as multithreaded record and replay, data race detectors, transactional memory, and enforcement of stronger memory models) helps achieve these goals, but existing commodity solutions slow programs substantially in order to track (i.e., detect or control) an execution’s cross-thread dependencies accurately. Prior work tracks cross-thread dependencies either “pessimistically,” slowing every program access, or “optimistically,” allowing for lightweight instrumentation of most accesses but dramatically slowing accesses that are conflicting (i.e., involved in cross-thread dependencies).
This article presents two novel approaches that seek to improve the performance of dependence tracking. Hybrid tracking (HT) hybridizes pessimistic and optimistic tracking by overcoming a fundamental mismatch between these two kinds of tracking. HT uses an adaptive, profile-based policy to make runtime decisions about switching between pessimistic and optimistic tracking. Relaxed tracking (RT) attempts to reduce optimistic tracking’s overhead on conflicting accesses by tracking dependencies in a “relaxed” way, meaning that not all dependencies are tracked accurately, while still preserving both program semantics and runtime support’s correctness. To demonstrate the usefulness and potential of HT and RT, we build runtime support based on the two approaches. Our evaluation shows that both approaches offer performance advantages over existing approaches, but challenges and opportunities for further improvement remain.
HT and RT are distinct solutions to the same problem. It is easier to build runtime support based on HT than on RT, although RT does not incur the overhead of online profiling. This article presents the two approaches together to inform and inspire future designs for efficient parallel runtime support.
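The abstract describes HT's adaptive, profile-based policy only at a high level. As a rough illustration of the idea of choosing a tracking mode from runtime profiles, the following is a minimal Python sketch under stated assumptions, not the article's actual design. The names `ObjectProfile`, `on_access`, `WINDOW`, and `CONFLICT_RATIO` are hypothetical, the threshold values are arbitrary, and the sketch simplifies "conflict" to mean any access by a thread other than the object's previous accessor (read sharing is ignored).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Mode(Enum):
    OPTIMISTIC = "optimistic"    # cheap non-conflicting accesses, costly conflicts
    PESSIMISTIC = "pessimistic"  # every access pays a synchronization cost


# Hypothetical tuning parameters (assumptions, not taken from the article).
WINDOW = 8            # accesses per profiling window
CONFLICT_RATIO = 0.5  # conflict fraction above which the object goes pessimistic


@dataclass
class ObjectProfile:
    """Hypothetical per-object profile used to choose a tracking mode."""
    mode: Mode = Mode.OPTIMISTIC
    accesses: int = 0
    conflicts: int = 0
    last_thread: Optional[int] = None


def on_access(profile: ObjectProfile, thread_id: int) -> Mode:
    """Record one access and return the mode in effect after it.

    Simplification: an access by a thread other than the previous
    accessor counts as a conflict.
    """
    profile.accesses += 1
    if profile.last_thread is not None and profile.last_thread != thread_id:
        profile.conflicts += 1
    profile.last_thread = thread_id

    # At the end of each window, re-decide the mode from the observed ratio,
    # then reset the counters so the policy can adapt in either direction.
    if profile.accesses >= WINDOW:
        ratio = profile.conflicts / profile.accesses
        profile.mode = (Mode.PESSIMISTIC if ratio > CONFLICT_RATIO
                        else Mode.OPTIMISTIC)
        profile.accesses = 0
        profile.conflicts = 0
    return profile.mode


if __name__ == "__main__":
    private = ObjectProfile()
    for _ in range(WINDOW):
        on_access(private, thread_id=0)      # object used by one thread only
    shared = ObjectProfile()
    for i in range(WINDOW):
        on_access(shared, thread_id=i % 2)   # object ping-pongs between two threads
    print(private.mode, shared.mode)
```

In this sketch a thread-local object stays optimistic, while an object that alternates between threads is switched to pessimistic once a profiling window ends. It models only the switching decision; it does not address the mismatch between pessimistic and optimistic tracking that HT overcomes.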

Published In

ACM Transactions on Parallel Computing, Volume 4, Issue 2
Special Issue: Invited papers from PPoPP 2016, Part 2
June 2017
154 pages
ISSN:2329-4949
EISSN:2329-4957
DOI:10.1145/3134419
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 30 August 2017
Accepted: 01 May 2017
Revised: 01 April 2017
Received: 01 December 2016
Published in TOPC Volume 4, Issue 2

Author Tags

  1. Dynamic analysis
  2. concurrency correctness
  3. data races
  4. dependence tracking
  5. runtime support for parallelism
  6. synchronization

Qualifiers

  • Research-article
  • Research
  • Refereed

Bibliometrics

Article Metrics

  • Total citations: 0
  • Total downloads: 238
  • Downloads (last 12 months): 69
  • Downloads (last 6 weeks): 20
Reflects downloads up to 03 Feb 2025
