The Binary Compatibility Challenge
Martin Odersky 
Typesafe and EPFL
The Problem in a Nutshell 
• Binary compatibility has been an issue ever since Scala 
became popular. 
• Causes grief when building, friction for upgrading. 
• The community has learned to deal with this by 
becoming more conservative. 
• But this makes it harder to innovate and improve. 
Break your client’s builds vs Freeze, and stop improving 
Is there no third way? 
What is Binary Compatibility? 
Binary compatibility ≠ Source compatibility 
Source & binary incompatible:
    // Client
    object Client { Server.msg.length }
    // Server, before
    object Server { val msg = "abc" }
    // Server, after
    object Server { val msg = Some("abc") }
What is Binary Compatibility? 
Binary compatibility ≠ Source compatibility 
Source incompatible, binary compatible:
    // Client
    object Client {
      import a._, b._
      val x: String = 1
    }
    // Library, before
    object a { implicit def f(x: Int): String = x.toString }
    object b
    // Library, after
    object a { implicit def f(x: Int): String = x.toString }
    object b { implicit def g(x: Int): String = "abc" }
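Why the change above stays binary compatible can be made concrete with a small sketch. It assumes the "before" version of the library; the member name `explicit` is hypothetical and only there for comparison. The typer resolves the conversion once, at compile time, so the compiled client holds a direct call to `a.f` and never looks at `b` again.

```scala
import scala.language.implicitConversions

object a { implicit def f(x: Int): String = x.toString }
object b // "before" version: no implicits yet

object Client {
  import a._, b._
  // The typer picks a.f here; the choice is fixed in the compiled code.
  val x: String = 1
  // Hypothetical member: the call the compiled client effectively contains.
  val explicit: String = a.f(1)
}
```

Adding `g` to `b` later makes the two conversions ambiguous when `Client` is recompiled from source, but an already compiled `Client` keeps calling `a.f` and still links.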
What is Binary Compatibility? 
Binary compatibility ≠ Source compatibility 
Source compatible, binary incompatible:
    // Client
    object Apple extends Edible { def joules = 500000 }
    // Library, before
    trait Edible { def joules: Double }
    // Library, after
    trait Edible {
      def joules: Double
      def calories = joules * 4.184
    }
→ Need to recompile on Java 1-7.
In Java 8 it’s more complex but fundamentally the same.
What is Binary Compatibility? 
Binary compatibility ≠ Source compatibility 
Source compatible, binary incompatible:
    // Source
    object Apple extends Edible { def joules = 500000 }
    trait Edible {
      def joules: Double
      def calories = joules * 4.184
    }
    // What the compiler generates
    trait Edible {
      def joules: Double
      def calories: Double
    }
    object Edible$class {
      def calories($this: Edible): Double = $this.joules * 4.184
    }
    object Apple extends Edible {
      def joules = 500000.0
      def calories: Double = Edible$class.calories(this)
    }
→ Need to recompile on Java 1-7.
In Java 8 it’s more complex but fundamentally the same.
Other Issues 
Compiler optimizations and bug fixes can affect binary 
compatibility. 
Example: Implementation of lazy values. 
    // Source
    trait Edible {
      def joules: Double
      lazy val calories = joules * 4.184
    }
    object Apple extends Edible { def joules = 500000 }
    // What the compiler generates
    object Apple extends Edible {
      def joules = 500000.0
      private var initFlags: BitSet
      private var cals: Double = _
      def calories = {
        if (!initFlags(N)) {
          cals = Edible$class.initCals(this)
          initFlags(N) = true
        }
        cals
      }
    }
Previously: 1 bit per lazy val.
To avoid deadlocks: 2 bits.
→ All offsets change!
Compiler Pipeline 
Parser 
Typer 
SyntheticMethods 
SuperAccessors 
RefChecks 
ElimRepeated 
ElimLocals 
ExtensionMethods 
TailRec 
PatternMatcher 
ExplicitOuter 
Erasure 
Mixin 
Memoize 
LazyVals 
CapturedVars 
Constructors 
LambdaLift 
Flatten 
RestoreScopes 
Cleanup 
(more phases)
GenBCode
[Diagram labels: Source, Symbols, JVM Byte-code]
Lots of scope for things to go wrong!
Where It Breaks 
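[Diagram: A.class is compiled against C.class, which is built from C.scala. A binary incompatible source change to C.scala produces a new C.class that no longer matches A.class.]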
Why Is This Such a Big Problem? 
[Diagram: MyApplication depends on DustyLegacyLib (too old, can’t rebuild), which is built against Scala Library 2.10. A binary incompatible source change to Seq.scala separates Scala Library 2.10 from 2.11, so MyApplication can’t upgrade to Scala 2.11!]
Not Just A Problem with Scala-Library 
[Diagram: MyApplication depends on DustyLegacyLib (can’t rebuild), which is built against Akka 3.2. A binary incompatible source change to Actor.scala separates Akka 3.2 from 3.3, so MyApplication can’t upgrade to Akka 3.3!]
Not Just A Problem with Scala-Library 
[Diagram: MyApplication depends on DustyLegacyLib (can’t rebuild), which is built against shapeless 2.0. A binary incompatible source change to unions.scala separates shapeless 2.0 from 2.1, so MyApplication can’t upgrade to shapeless 2.1!]
Dealing With It So Far 
“MiMa” tool can detect binary incompatibilities. 
Scala release policy: 
– Minor versions need to be (forwards and backwards) binary 
compatible. 
– Major versions are allowed to break binary compatibility. 
– Major versions are released rarely (18+ months between them). 
Problem: 
– 3rd party libraries need similar policies but often don’t enforce 
them. 
– Innovation is stifled. 
– Simple fixes have to wait for a long time to get in. 
– Lots of dev cycles spent on dealing with binary compatibility.
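For third-party libraries, the MiMa check can be wired into an sbt build. The fragment below is a sketch using the sbt-mima-plugin; the plugin version and the `"com.example" %% "mylib" % "1.0.0"` coordinates are assumptions and need to be replaced with real values.

```scala
// project/plugins.sbt
// Plugin version is an assumption; use a current release.
addSbtPlugin("com.typesafe" % "sbt-mima-plugin" % "1.1.3")

// build.sbt
// Hypothetical coordinates of the previously released artifact to compare against.
mimaPreviousArtifacts := Set("com.example" %% "mylib" % "1.0.0")
```

Running `sbt mimaReportBinaryIssues` then compares the current class files with the listed release and reports binary incompatibilities.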
What Do Others Do? 
Java: 
• Language close to JVM bytecode. 
• Innovation happens on JVM level. 
– Either in the JVM itself or through reflection. 
– E.g. Java 8 lambdas, default methods. 
• Libraries are frozen when they appear. 
– E.g. java.util.Date 
• Language is restricted in terms of extensibility 
– E.g. interface1, interface2, ... interface7 in Eclipse.
What Do Others Do? 
OSGi: Allow multiple versions of a library in an application 
[Diagram: MyApplication and DustyLegacyLib run against Scala Library 2.10, while a rebuilt MyApplication runs against Scala Library 2.11.]
• Fragile, requires serious classloader magic. 
• Few frameworks beyond Eclipse have bought in.
What Do Others Do? 
C/C++: 
• Relies on Linker for more flexibility in interfaces. 
• Not that great a story either (cf. DLL Hell).
What Do Others Do? 
Clojure: 
• Builds from source.
What Do Others Do? 
JavaScript: 
• Builds from source.
What Do Others Do? 
Python: 
• Builds from source.
What Do Others Do? 
Ruby: 
• Builds from source.
What Do Others Do? 
Go: 
• Builds from source.
Why Can’t Scala Build from Source? 
No standard Build Tool 
Should we standardize on SBT, Gradle, Maven, Ivy, Ant? 
Reproducible builds are rare. 
Chicken and egg problem: 
Because everyone is used to binary builds, nobody* invests in 
making builds reproducible. 
*Not quite true: Typesafe has invested in a community build and can now 
build more than 1M lines of community projects. But it’s a huge 
effort.
What We Need 
• An interchange format 
that captures the essence 
of Scala dependencies. 
• This cannot be the JVM 
bytecode format 
• Nor can it be source 
The Idea 
Use Typed Trees as an 
interchange format. 
– More robust than source. 
– More stable than JVM 
bytecode. 
– Efficient?
New Compiler Pipeline 
Parser 
Typer 
SyntheticMethods 
SuperAccessors 
RefChecks 
ElimRepeated 
ElimLocals 
ExtensionMethods 
TailRec 
PatternMatcher 
ExplicitOuter 
Erasure 
Mixin 
Memoize 
LazyVals 
CapturedVars 
Constructors 
LambdaLift 
Flatten 
RestoreScopes 
Cleanup 
GenBCode 
[Diagram: Source → Frontend (Parser to SuperAccessors) → Typed Trees → Backend (RefChecks to GenBCode) → Bytecode]
How To Build 
Before the change:
    A.class | ATree | HashA1
    B.class | BTree | HashB1
    C.class | CTree | HashC1
After a source change to C.scala:
    A.class | ATree | HashA2
    B.class | BTree | HashB2   (rebuild from BTree)
    C.class | CTree | HashC2   (rebuild)
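One possible reading of the build scheme is that a hash over each unit's typed tree decides which class files are stale. The sketch below follows that assumption; `TreeHash`, `hash`, and `needsBackendRebuild` are hypothetical names, SHA-256 is an arbitrary choice, and the talk does not specify any of these details.

```scala
import java.security.MessageDigest

object TreeHash {
  // Hex-encoded SHA-256 of a unit's serialized typed tree.
  def hash(serializedTree: Array[Byte]): String =
    MessageDigest.getInstance("SHA-256")
      .digest(serializedTree)
      .map(b => "%02x".format(b & 0xff))
      .mkString

  // Units with at least one direct dependency whose tree hash changed.
  // These can be regenerated from their own stored trees, without their source.
  def needsBackendRebuild(
      deps: Map[String, Set[String]],
      oldHashes: Map[String, String],
      newHashes: Map[String, String]): Set[String] = {
    val changed = newHashes.keySet.filter(u => oldHashes.get(u) != newHashes.get(u))
    deps.collect { case (unit, ds) if ds.exists(changed) => unit }.toSet
  }
}
```

In this reading, only the backend has to run again for a dependent unit, because its typed tree is already stored next to the class file.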
More Robust Than Source 
Source 
Parser 
Typer 
SyntheticMethods 
SuperAccessors 
Typed Trees 
Frontend 
Ø Resolve Names 
Ø scan packages 
Ø handle imports 
Ø establish implicit scopes 
Ø Resolve Overloading 
Ø Find Implicits 
Ø Apply Conversions 
Ø Infer Type Parameters 
Ø Assign Types to Trees 
(5374 lines) 
A lot can go wrong here!
More Robust Than Source 
RefChecks 
ElimRepeated 
ElimLocals 
ExtensionMethods 
TailRec 
PatternMatcher 
ExplicitOuter 
Erasure 
Mixin 
Memoize 
LazyVals 
CapturedVars 
Constructors 
LambdaLift 
Flatten 
RestoreScopes 
Cleanup 
GenBCode 
Typed Trees 
Backend Bytecode 
Assign Types 
(311 lines)
More Resilient Than Bytecode 
Can 
– add fields and methods to traits 
– add lazy vals anywhere 
– change compilation scheme in any way necessary. 
None of these would be binary compatible! 
Can also 
– add or remove implicits 
– add methods anywhere 
– change imports 
All of these could be source incompatible!
Efficient? 
Can typed trees be efficient enough to build million+ line 
systems? 
Possible issues: 
• Size of trees 
– on disk 
– in memory 
• Transformation time 
Potential Issue: Tree Size 
    xs.filter(_ == 0)
becomes:
    Apply
      Select
        Ident "xs"
        "filter"
      ::
        Block
          DefDef "anonfun"
            ::
              ::
                ValDef "$x"
                  TypeTree: Int
                Nil
              Nil
            Apply
              Select
                Ident "$x"
                "=="
              ::
                Literal 0
                Nil
          Literal "anonfun"
        Nil
16 nodes, not counting types 
17 chars
Back of the Envelope Calculation: 
16 nodes 
Average size of node: 32 bytes 
512 bytes total. 
Double that to include type info. 
è 16 bytes source à 1KB tree (factor 64 blow-up). 
For a 1M line system 
30MB source à 2GB trees. 
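The estimate can be replayed step by step. The sketch below only restates the numbers from the slide; `TreeSizeEstimate` and its field names are hypothetical.

```scala
object TreeSizeEstimate {
  val nodes = 16
  val bytesPerNode = 32
  val treeBytes = nodes * bytesPerNode   // 512 bytes without types
  val withTypes = treeBytes * 2          // 1024 bytes, about 1KB
  val sourceBytes = 16
  val blowUp = withTypes / sourceBytes   // factor 64
  val sourceMB = 30
  val treesMB = sourceMB * blowUp        // 1920MB, roughly 2GB
}
```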
A More Compact Representation 
    Apply (34)
      SelectTermWithSig (9)
        Ident (3) "xs"
        "filter"
        "Function1 - Boolean"
      Closure (23)
        ParamDef (7)
          "x"
          TypeRef "scala.Int"
        Apply (14)
          SelectTermWithSig (9)
            Ident (3) "x$"
            "=="
            "Integer - Boolean"
          Literal (3) 0
Still navigable, because inner nodes contain the size of the total tree derived from them. 
Types or symbols are given at the leaves. 
Types of inner nodes are reconstituted using the TypeAssigner.
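The point about navigability can be illustrated with a toy size-prefixed encoding. This is a sketch under assumed conventions, not the format proposed in the talk: every node is written as a 1-byte tag, a 4-byte payload size, and the payload, and `Node`, `Leaf`, `Branch`, and `CompactTree` are hypothetical names.

```scala
import java.nio.ByteBuffer

// Hypothetical mini-format: [tag: 1 byte][size: 4 bytes][payload],
// where size is the byte length of the payload (all nested children).
sealed trait Node
final case class Leaf(value: Int) extends Node
final case class Branch(children: List[Node]) extends Node

object CompactTree {
  val LeafTag: Byte = 1
  val BranchTag: Byte = 2

  def encode(n: Node): Array[Byte] = {
    val payload: Array[Byte] = n match {
      case Leaf(v)      => ByteBuffer.allocate(4).putInt(v).array()
      case Branch(kids) => kids.map(encode).foldLeft(Array.emptyByteArray)(_ ++ _)
    }
    val tag: Byte = n match {
      case _: Leaf   => LeafTag
      case _: Branch => BranchTag
    }
    ByteBuffer.allocate(5 + payload.length)
      .put(tag)
      .putInt(payload.length)
      .put(payload)
      .array()
  }

  // Offset of the node following the one at `offset`, computed from
  // the size field alone, without decoding the payload.
  def skip(bytes: Array[Byte], offset: Int): Int =
    offset + 5 + ByteBuffer.wrap(bytes, offset + 1, 4).getInt()

  // Value of the node at `offset`, assuming that node is a Leaf.
  def leafAt(bytes: Array[Byte], offset: Int): Int =
    ByteBuffer.wrap(bytes, offset + 5, 4).getInt()
}
```

Because each inner node records the size of everything below it, a reader can jump over a whole subtree with `skip` and decode only the parts it needs.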
Speed 
Transformation + byte-code generation amounts to ~ 60% 
of total compile time. 
We can speed this up by 
– fusing phases, reducing amount of intermediate trees, 
– using a fast type assigner, instead of a slow typer, 
– building different files in parallel. 
Besides, can use incremental compilation. 
– Compile only those units that depend on changed libraries. 
– Need to do that only once. 
Other Benefits 
Optimization 
– Typed trees are a great format for interprocedural analyses 
– Inlining across compilation units made simple 
– Inlining without binary compatibility issues 
Program Analysis 
– Typed trees are close to source, but easy to traverse 
– Ideal for context-dependent program analyses such as FindBugs 
– Ideal for instrumentation 
Portability 
– Typed trees allow retargeting to different backends, as long as 
dependencies exist. 
– Allow libraries to be used on JVM, JS, LLVM... without needing 
explicit recompilation. 
Common Intermediate Format 
[Diagram labels: dotc Frontend, scalac Frontend, New Backend, Old Backend, GenBCode, Bytecode]
Conclusion 
Typed trees can fix the binary compatibility problem and they offer lots of other benefits, too. 
Let’s start the work to make them real!
