Noninvasive Java Concurrency with Deuce STM 1.0
Guy Korland
“Multi Core Tools”, CMP09

Outline: Motivation, Deuce, Implementation, TL2, LSA, Benchmarks, Summary, References
Motivation
Problem I

    Process 1            Process 2
    a = acc.get()
    a = a + 100
                         b = acc.get()
                         b = b + 50
                         acc.set(b)
    acc.set(a)

    ... Lost Update! ...

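The interleaving on this slide can be replayed deterministically in a single thread. The sketch below is only an illustration: `LostUpdateDemo` and its `Account` class are hypothetical names, and the "fix" uses `java.util.concurrent.atomic.AtomicLong` from the JDK rather than anything from Deuce.

```java
import java.util.concurrent.atomic.AtomicLong;

final class LostUpdateDemo {
    /** Hypothetical account exposing the get/set pair used on the slide. */
    static final class Account {
        private long balance;
        long get() { return balance; }
        void set(long value) { balance = value; }
    }

    /** Replays the slide's interleaving step by step and returns the final balance. */
    static long replayInterleaving() {
        Account acc = new Account();
        long a = acc.get();   // process 1 reads the old balance
        a = a + 100;
        long b = acc.get();   // process 2 reads the same old balance
        b = b + 50;
        acc.set(b);           // process 2 writes its result
        acc.set(a);           // process 1 overwrites it: the +50 is lost
        return acc.get();
    }

    /** The same two updates performed as indivisible read-modify-write steps. */
    static long atomicUpdates() {
        AtomicLong acc = new AtomicLong();
        acc.addAndGet(100);
        acc.addAndGet(50);
        return acc.get();
    }

    public static void main(String[] args) {
        System.out.println("interleaved get/set: " + replayInterleaving());
        System.out.println("atomic updates:      " + atomicUpdates());
    }
}
```

With a starting balance of 0, the replayed interleaving ends at 100 instead of 150, while the atomic version keeps both updates.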
Problem II

    Process 1            Process 2
    lock(A)              lock(B)
    lock(B)              lock(A)

    ... Deadlock! ...

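One conventional way to avoid this cycle with plain locks is to acquire them in a fixed global order. The sketch below assumes a hypothetical `Account` with a numeric `id` used for ordering; it illustrates the convention that lock-based code has to follow and is not part of Deuce.

```java
import java.util.concurrent.locks.ReentrantLock;

final class LockOrderingDemo {
    /** Hypothetical account; the id defines the global lock order. */
    static final class Account {
        final int id;
        long balance;
        final ReentrantLock lock = new ReentrantLock();
        Account(int id, long balance) { this.id = id; this.balance = balance; }
    }

    /** Always locks the account with the smaller id first, whatever the transfer direction. */
    static void transfer(Account from, Account to, long amount) {
        Account first = from.id < to.id ? from : to;
        Account second = first == from ? to : from;
        first.lock.lock();
        try {
            second.lock.lock();
            try {
                from.balance -= amount;
                to.balance += amount;
            } finally {
                second.lock.unlock();
            }
        } finally {
            first.lock.unlock();
        }
    }

    /** Two threads transfer in opposite directions; returns the combined balance. */
    static long runOpposingTransfers(int rounds) {
        Account a = new Account(1, 1000);
        Account b = new Account(2, 1000);
        Thread t1 = new Thread(() -> { for (int i = 0; i < rounds; i++) transfer(a, b, 1); });
        Thread t2 = new Thread(() -> { for (int i = 0; i < rounds; i++) transfer(b, a, 1); });
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return a.balance + b.balance;
    }

    public static void main(String[] args) {
        System.out.println("total after opposing transfers: " + runOpposingTransfers(10000));
    }
}
```

The ordering rule is exactly the kind of convention the later "Why Locking Doesn’t Scale?" slide complains about: nothing in the language enforces it.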
The Problem
  • Cannot exploit cheap threads
  • Today’s software: non-scalable methodologies
  • Today’s hardware: poor support for scalable synchronization; low-level support (CAS, TAS, MemBar…)

Why Locking Doesn’t Scale?
  • Not robust
  • Relies on conventions
  • Hard to use
  • Conservative
  • Deadlocks
  • Lost wake-ups
  • Not composable

Outline: Motivation, Solutions, Deuce, Implementation, TL2, LSA, Benchmarks, Summary, References
Solutions I – Domain specific
  • MATLAB – concurrency behind the scenes.
  • SQL/XQuery/XPath – the DB will handle it…
  • HTML, ASP, PHP, JSP… – (almost) stateless.
  • Fortress [Sun], X10 [IBM], Chapel [UW]… – implicit concurrency.
  • Remember Cobol!
  • Domain too specific

Solutions II – Actor Model (Share nothing model)
Carl Hewitt, Peter Bishop and Richard Steiger, A Universal Modular Actor Formalism for Artificial Intelligence [IJCAI 1973].
  • An actor, on message:
      • no shared data
      • send messages to other actors
      • create new actors
  • Where can we find it? Simula, Smalltalk, Scala, Haskell, F#, Erlang...
  • Functional languages

Solutions II – Actor Model (Share nothing model)

Actors in Erlang:

    -module(counter).
    -export([run/0, counter/1]).

    run() ->
        S = spawn(counter, counter, [0]),
        send_msgs(S, 100000),
        S.

    counter(Sum) ->
        receive
            {inc, Amount} -> counter(Sum+Amount)
        end.

    send_msgs(_, 0) -> true;
    send_msgs(S, Count) ->
        S ! {inc, 1},
        send_msgs(S, Count-1).

  • Is it really easier?
  • What about performance?
  • Will functional languages ever be functional?
  • Java/.NET/C++ rules!!! (maybe Ruby)

Solutions III – STM
Nir Shavit, Dan Touitou, Software Transactional Memory [PODC95]

    synchronized { <instructions> }

    atomic { <instructions> }

    l.lock(); <instructions> l.unlock();

What is a transaction?
  • Atomicity – all or nothing
  • Consistency – consistent state (before & after)
  • Isolation – others can’t see intermediate state.
  • Durability – persistent (not needed when only memory state changes). Or maybe we do want it?

The Brief History of STM
  • 1993 STM (Shavit, Touitou)
  • 1997 Trans Support TM (Moir)
  • 2003 DSTM (Herlihy et al)
  • 2003 WSTM (Fraser, Harris)
  • 2003 OSTM (Fraser, Harris)
  • 2004 ASTM (Marathe et al)
  • 2004 T-Monitor (Jagannathan …)
  • 2004 HybridTM (Moir)
  • 2004 Meta Trans (Herlihy, Shavit)
  • 2004 Soft Trans (Ananian, Rinard)
  • 2005 Lock-OSTM (Ennals)
  • 2005 McTM (Saha et al)
  • 2005 TL (Dice, Shavit)
  • 2006 AtomJava (Hindman…)
  • 2006 LSA (Riegel et al)
  • 2006 TL2 (Dice, Shavit, Shalev)
  • 2006 DSTM2 (Herlihy, Luchangco)
  • 2007 Tanger
  • 2008 Rock (Sun)
  • 2009 Deuce (Korland et al)

DSTM2
Maurice Herlihy et al, A flexible framework … [OOPSLA06]

    @atomic
    public interface INode {
        int getValue();
        void setValue(int value);
        INode getNext();
        void setNext(INode value);
    }

    Factory<INode> factory = Thread.makeFactory(INode.class);

    result = Thread.doIt(new Callable<Boolean>() {
        public Boolean call() {
            return intSet.insert(value);
        }
    });

  • Limited to objects.
  • Very intrusive.
  • Doesn’t support libraries.
  • Bad performance (fork).

JVSTM
João Cachopo and António Rito-Silva, Versioned boxes as the basis for memory transactions [SCOOL05]

    public class Account {
        private VBox<Long> balance = new VBox<Long>();

        public @Atomic void withdraw(long amount) {
            balance.put(balance.get() - amount);
        }
    }

  • Doesn’t support libraries.
  • Less intrusive.
  • Need to “announce” shared fields.

Atom-Java
B. Hindman and D. Grossman, Atomicity via source-to-source translation [MSPC06]

    public void update(double value) {
        Atomic {
            commission += value;
        }
    }

  • Adds a reserved word.
  • Needs precompilation.
  • Doesn’t support libraries.
  • Even less intrusive.

Multiverse
Peter Veentjer, 2009

    @TmEntity
    public class Stack<E> {
        private Node<E> head;

        public void push(E item) {
            head = new Node(item, head);
        }
    }

    @TmEntity
    public static class Node<E> {
        final E value;
        final Node parent;

        Node(E value, Node prev) {
            this.value = value;
            this.parent = prev;
        }
    }

  • Doesn’t support libraries.
  • Limited to objects.

DATM-J
Hany E. Ramadan et al., Dependence-aware transactional memory [MICRO08]

    Transaction tx = new Transaction(id);
    boolean done = false;
    while (!done) {
        try {
            tx.BeginTransaction();
            // txnl code
            done = tx.CommitTransaction();
        } catch (AbortException e) {
            tx.AbortTransaction();
            done = false;
        }
    }

  • Explicit transaction.
  • Explicit retry.

Deuce STM
  • Java STM framework
  • @Atomic methods
  • Field-based access
      • More scalable than object-based.
      • More efficient than word-based.
  • Supports external libraries
      • Can be part of a transaction
  • No reserved words
      • No need for new compilers (existing IDEs can be used)
  • Research tool
      • API for developing and testing new algorithms.

Deuce - API

    public class Bank {
        final private static double MAXIMUM_TRANSACTION = 1000;
        private double commission = 0;

        @Atomic(retries=64)
        public void transaction(Account ac1, Account ac2, double amount) {
            ac1.balance -= (amount + commission);
            ac2.balance += amount;
        }

        @Atomic
        public void update(double value) {
            commission += value;
        }
    }

Deuce - Overview
Deuce - Running
  • -javaagent:deuceAgent.jar
      • Dynamic bytecode manipulation.
  • -Xbootclasspath/p:rt.jar
      • Offline instrumentation to support the boot classloader.

    java -javaagent:deuceAgent.jar -cp "myjar.jar" MyMain

Implementation
  • ASM – bytecode manipulation
  • Online & offline
  • Fields

        private double commission;
        final static public long commission__ADDRESS ...

      • Relative address (-1 if final).

        final static public Object __CLASS_BASE__ ...

      • Marks the class base for static field access.

Implementation
  • Methods
      • @Atomic methods
          • Replace the method with a transaction retry loop.
          • Add another, instrumented method.
      • Non-atomic methods
          • Duplicate each with an instrumented version.

Implementation

    @Atomic
    public void update(double value) {
        commission += value;
    }

In bytecode:

    @Atomic
    public void update(double value) {
        double tmp = commission;
        commission = tmp + value;
    }

Implementation

    public void update(double value, Context c) {
        double tmp;
        if (commission__ADDRESS < 0) {   // final field
            tmp = commission;
        } else {
            c.beforeRead(this, commission__ADDRESS);
            tmp = c.onRead(this, commission, commission__ADDRESS);
        }
        c.onWrite(this, tmp + value, commission__ADDRESS);
    }

  • JIT removes it (the final-field check).

Implementation

    public void update(double value, Context c) {
        c.beforeRead(this, commission__ADDRESS);
        double tmp = c.onRead(this, commission, commission__ADDRESS);
        c.onWrite(this, tmp + value, commission__ADDRESS);
    }

Implementation

    public void update(double value) {
        Context context = ContextDelegator.getContext();
        for (int i = retries; i > 0; --i) {
            context.init();
            try {
                update(value, context);
                if (context.commit()) return;
            } catch (TransactionException e) {
                context.rollback();
                continue;
            } catch (Throwable t) {
                if (context.commit()) throw t;
            }
        }
        throw new TransactionException();
    }

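The shape of this generated retry loop can be exercised outside Deuce with a stand-in context. In the sketch below every name (`RetryLoopDemo`, the trimmed `Context` interface, `FlakyContext`, `atomically`) is hypothetical and only mirrors the slide; it omits the slide's `catch (Throwable t)` branch for user exceptions.

```java
import java.util.function.Consumer;

final class RetryLoopDemo {
    /** Hypothetical, trimmed-down stand-in for the Context shown on the slides. */
    interface Context {
        void init();
        boolean commit();
        void rollback();
    }

    /** Stand-in for the exception that signals an aborted or exhausted transaction. */
    static final class TransactionException extends RuntimeException {}

    /** Runs the body until commit succeeds, giving up after the given number of attempts. */
    static void atomically(Context ctx, int retries, Consumer<Context> body) {
        for (int i = retries; i > 0; --i) {
            ctx.init();
            try {
                body.accept(ctx);
                if (ctx.commit()) return;   // success: leave the loop
            } catch (TransactionException e) {
                ctx.rollback();             // conflict detected mid-transaction: retry
            }
        }
        throw new TransactionException();   // retries exhausted
    }

    /** Test context whose commit fails a fixed number of times before succeeding. */
    static final class FlakyContext implements Context {
        private int failuresLeft;
        int attempts;
        FlakyContext(int failures) { this.failuresLeft = failures; }
        public void init() { attempts++; }
        public boolean commit() { return failuresLeft-- <= 0; }
        public void rollback() { }
    }

    /** Returns how many attempts were needed when the first commits fail. */
    static int attemptsNeeded(int failures, int retries) {
        FlakyContext ctx = new FlakyContext(failures);
        atomically(ctx, retries, c -> { });
        return ctx.attempts;
    }

    public static void main(String[] args) {
        System.out.println("attempts with 2 failed commits: " + attemptsNeeded(2, 64));
    }
}
```

Bounding the loop with `retries` matches the `@Atomic(retries=64)` parameter on the API slide: a transaction that keeps conflicting eventually surfaces as an exception instead of spinning forever.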
Implementation

    public interface Context {
        void init(int atomicBlockId);
        boolean commit();
        void rollback();

        void beforeReadAccess(Object obj, long field);
        Object onReadAccess(Object obj, Object value, long field);
        int onReadAccess(Object obj, int value, long field);
        long onReadAccess(Object obj, long value, long field);
        …

        void onWriteAccess(Object obj, Object value, long field);
        void onWriteAccess(Object obj, int value, long field);
        void onWriteAccess(Object obj, long value, long field);
        …
    }

TL2 (Transactional Locking II)
Dave Dice, Ori Shalev and Nir Shavit [DISC06]
  • CTL – commit-time locking
  • Start
      • Sample global version-clock
  • Run through a speculative execution
      • Collect write-set & read-set
  • End
      • Lock the write-set
      • Increment global version-clock
      • Validate the read-set
      • Commit and release the locks

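The steps listed for TL2 can be sketched in a few dozen lines of plain Java. This is a simplified illustration under stated assumptions, not Deuce's or the paper's implementation: all names (`Tl2Sketch`, `Cell`, `Tx`, `Abort`) are hypothetical, cells hold a single `long`, and the lock bit is packed into the low bit of a per-cell version word.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

/** Simplified TL2-style STM over long-valued cells (illustrative names only). */
final class Tl2Sketch {
    static final AtomicLong CLOCK = new AtomicLong();          // global version-clock

    static final class Cell {
        final AtomicLong versionedLock = new AtomicLong();     // (version << 1) | lockBit
        volatile long value;
        Cell(long initial) { value = initial; }
    }

    /** Thrown when a read cannot be served consistently. */
    static final class Abort extends RuntimeException {}

    static final class Tx {
        final long readVersion = CLOCK.get();                  // Start: sample the clock
        final List<Cell> readSet = new ArrayList<>();
        final Map<Cell, Long> writeSet = new LinkedHashMap<>();

        long read(Cell c) {
            Long buffered = writeSet.get(c);
            if (buffered != null) return buffered;             // read our own pending write
            long pre = c.versionedLock.get();
            long v = c.value;
            long post = c.versionedLock.get();
            if (pre != post || (pre & 1) != 0 || (pre >>> 1) > readVersion) throw new Abort();
            readSet.add(c);
            return v;
        }

        void write(Cell c, long v) { writeSet.put(c, v); }     // buffered until commit

        boolean commit() {
            if (writeSet.isEmpty()) return true;               // read-only: nothing to publish
            List<Cell> locked = new ArrayList<>();
            try {
                for (Cell c : writeSet.keySet()) {             // 1. lock the write-set
                    long cur = c.versionedLock.get();
                    if ((cur & 1) != 0 || !c.versionedLock.compareAndSet(cur, cur | 1)) return false;
                    locked.add(c);
                }
                long writeVersion = CLOCK.incrementAndGet();   // 2. increment the clock
                for (Cell c : readSet) {                       // 3. validate the read-set
                    long cur = c.versionedLock.get();
                    boolean lockedByOther = (cur & 1) != 0 && !writeSet.containsKey(c);
                    if (lockedByOther || (cur >>> 1) > readVersion) return false;
                }
                for (Map.Entry<Cell, Long> e : writeSet.entrySet()) {   // 4. commit and release
                    e.getKey().value = e.getValue();
                    e.getKey().versionedLock.set(writeVersion << 1);
                }
                locked.clear();
                return true;
            } finally {
                // On failure, release whatever was locked without changing versions.
                for (Cell c : locked) c.versionedLock.set(c.versionedLock.get() & ~1L);
            }
        }
    }

    public static void main(String[] args) {
        Cell x = new Cell(10);
        Tx t1 = new Tx();
        long seen = t1.read(x);
        Tx t2 = new Tx();
        t2.write(x, 99);
        System.out.println("t2 commit: " + t2.commit());
        t1.write(x, seen + 1);
        System.out.println("t1 commit: " + t1.commit() + ", x = " + x.value);
    }
}
```

In the `main` scenario, `t1` read `x` before `t2` committed a newer version, so `t1` fails read-set validation and would be retried by a loop like the one generated for `@Atomic` methods.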
LSA (Lazy Snapshot Algorithm)
Torvald Riegel, Pascal Felber and Christof Fetzer [DISC06]
  • ETL – encounter-time locking
  • Start
      • Sample global version-clock
  • Run through a speculative execution
      • Lock on write access
      • Collect read-set & write-set
      • On validation error try to extend snapshot
  • End
      • Increment global version-clock
      • Validate the read-set
      • Commit and release the locks

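The two differences from commit-time locking, locking on first write access and extending the snapshot rather than aborting, can be sketched as follows. This is a rough illustration under the same assumptions as a toy cell-based STM: the names (`LsaSketch`, `Cell`, `Tx`, `Abort`) are hypothetical, writes are buffered and published at commit, and the code does not claim to match the published algorithm or Deuce's implementation in detail.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

/** Simplified LSA-style STM with encounter-time locking (illustrative names only). */
final class LsaSketch {
    static final AtomicLong CLOCK = new AtomicLong();           // global version-clock

    static final class Cell {
        final AtomicLong versionedLock = new AtomicLong();      // (version << 1) | lockBit
        volatile long value;
        Cell(long initial) { value = initial; }
    }

    static final class Abort extends RuntimeException {}

    static final class Tx {
        long snapshot = CLOCK.get();                            // Start: sample the clock
        final Map<Cell, Long> readSet = new LinkedHashMap<>();  // cell -> version read
        final Map<Cell, Long> writeSet = new LinkedHashMap<>(); // cell -> buffered value

        /** True if every cell read so far still carries the version we saw. */
        private boolean readSetValid() {
            for (Map.Entry<Cell, Long> e : readSet.entrySet()) {
                long cur = e.getKey().versionedLock.get();
                boolean lockedByOther = (cur & 1) != 0 && !writeSet.containsKey(e.getKey());
                if (lockedByOther || (cur >>> 1) != e.getValue()) return false;
            }
            return true;
        }

        /** Tries to move the snapshot forward instead of aborting outright. */
        private void extendSnapshot() {
            long now = CLOCK.get();
            if (!readSetValid()) { rollback(); throw new Abort(); }
            snapshot = now;
        }

        long read(Cell c) {
            Long buffered = writeSet.get(c);
            if (buffered != null) return buffered;
            while (true) {
                long pre = c.versionedLock.get();
                long v = c.value;
                long post = c.versionedLock.get();
                if (pre != post || (pre & 1) != 0) { rollback(); throw new Abort(); }
                if ((pre >>> 1) > snapshot) { extendSnapshot(); continue; }  // too new: extend, re-read
                readSet.put(c, pre >>> 1);
                return v;
            }
        }

        void write(Cell c, long v) {
            if (!writeSet.containsKey(c)) {                     // lock on first write access
                long cur = c.versionedLock.get();
                if ((cur & 1) != 0 || !c.versionedLock.compareAndSet(cur, cur | 1)) {
                    rollback();
                    throw new Abort();
                }
            }
            writeSet.put(c, v);
        }

        boolean commit() {
            if (writeSet.isEmpty()) return true;                // read-only: snapshot stayed consistent
            long writeVersion = CLOCK.incrementAndGet();        // increment the clock
            if (!readSetValid()) { rollback(); return false; }  // validate the read-set
            for (Map.Entry<Cell, Long> e : writeSet.entrySet()) {
                e.getKey().value = e.getValue();
                e.getKey().versionedLock.set(writeVersion << 1); // publish and unlock
            }
            return true;
        }

        /** Releases all locks taken so far and forgets buffered state. */
        void rollback() {
            for (Cell c : writeSet.keySet()) c.versionedLock.set(c.versionedLock.get() & ~1L);
            writeSet.clear();
            readSet.clear();
        }
    }

    public static void main(String[] args) {
        Cell x = new Cell(1);
        Cell y = new Cell(2);
        Tx t1 = new Tx();
        long a = t1.read(x);
        Tx t2 = new Tx();
        t2.write(y, 20);
        t2.commit();
        long b = t1.read(y);          // y is newer than t1's snapshot: extension kicks in
        t1.write(x, a + b);
        System.out.println("t1 commit: " + t1.commit() + ", x = " + x.value);
    }
}
```

In the `main` scenario a TL2-style read of `y` would abort because `y` is newer than the start timestamp; here the snapshot is extended because nothing `t1` had read changed in the meantime.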
Benchmarks  (Azul – Vega2 – 2 x 46)
Benchmarks  (SuperMicro – 2 x Quad Intel)
Benchmarks  (Sun UltraSPARC T2 Plus – 2 x Quad x 8 HT )
Summary
  • Simple API
      • @Atomic
  • No changes to Java
      • No reserved words
  • Open source
      • On Google Code
  • Shows nice scalability
  • Field based

References
  • Homepage - http://www.deucestm.org
  • Project - http://code.google.com/p/deuce/
  • Wikipedia - http://en.wikipedia.org/wiki/Software_transactional_memory
  • TL2 - http://research.sun.com/scalable
  • LSA-STM - http://tmware.org/lsastm


Editor's Notes

  1. Maramba – 128 threads ($17,995). Vega 3 – 864 processors.
  2. Domain-specific solutions are too specific to rule them all. New concurrent languages are too different from the widely used languages. X10 adds an async command; Chapel adds send/recv.
  3. Working with messages can also lead to deadlock, and it is not intuitive. Everything is immutable; functional languages are very hard to work with in real applications. Imperative programming is too common.
  4. No need for durability, since we’re changing memory state. Remark: maybe we do want it…