Keynote slides from @dmandelin's highly regarded "Know Your Engines - How to Make Your JavaScript Fast"
1. Know Your Engines
How to Make Your JavaScript Fast
Dave Mandelin
June 15, 2011
O’Reilly Velocity
2. 5 years of progress...
[Chart: JavaScript run time vs. C (y-axis 0 to 10) in 2006, 2008, and 2011]
one program on one popular browser: 10x faster!
3. ...lost in an instant!
function f() {
var sum = 0;
for (var i = 0; i < N; ++i) {
sum += i;
}
}
function f() {
eval("");
var sum = 0;
for (var i = 0; i < N; ++i) {
sum += i;
}
}
4. ...lost in an instant!
[Chart: run time (y-axis 0 to 80) of f() without eval vs. with eval]
with eval("") up to 10x slower!
5. Making JavaScript Fast
Or, Not Making JavaScript Slow
How JITs make JavaScript not slow
How not to ruin animation with pauses
How to write JavaScript that’s not slow
12. Inside the 2006 JS Engine
[Diagram: Front End, Interpreter, Garbage Collector, Standard Library, DOM]
// JavaScript source
e.innerHTML = n + " items";
// bytecode (AST in some engines)
tmp_0 = add var_1 str_3
setprop var_0 'innerHTML' tmp_0
Interpreter: Run the bytecode
Garbage Collector: Reclaim memory
DOM: Set innerHTML
13. Why it’s hard to make JS fast
Because
JavaScript is an untyped language.
untyped = no type declarations
14. Operations in an untyped language
x = y + z can mean many things
• if y and z are numbers, numeric addition
• if y and z are strings, concatenation
• and many other cases; y and z can have different types
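A few concrete cases of what + can mean, as a quick sketch. The helper name describeAdd is made up for illustration; it only reports the type and value of the result.

```javascript
// Hypothetical helper: returns the type and value of y + z as a string.
function describeAdd(y, z) {
  var x = y + z;
  return typeof x + ": " + String(x);
}

console.log(describeAdd(1, 2));         // number: 3
console.log(describeAdd("a", "b"));     // string: ab
console.log(describeAdd(1, "2"));       // string: 12
console.log(describeAdd(1, undefined)); // number: NaN
console.log(describeAdd({}, 1));        // string: [object Object]1
```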
18. Engine-Internal Types
JS engines use finer-grained types internally.
JavaScript type    Engine type
number             32-bit* integer
                   64-bit floating-point
object             Different shapes:
                   { a: 1 }
                   { a: 1, b: 2 }
                   { a: get ... }
                   { a: 1, __proto__ = new C }
19. Values in an untyped language
Because JavaScript is untyped, the interpreter needs boxed values.
              Boxed                    Unboxed
Purpose       Storage                  Computation
Examples      (INT, 55)                55
              (STRING, "foo")          "foo"
Definition    (type tag, C++ value)    C++ value
only boxed values can be stored in variables,
only unboxed values can be computed with (+, *, etc)
29. Running Code in the Interpreter
Here’s what the interpreter must do to execute x = y + z:
‣ read the operation x = y + z from memory
‣ read the boxed inputs y and z from memory
‣ check the types of y and z and choose the action
‣ unbox y and z
‣ execute the action (this is the only real work!)
‣ box the output x
‣ write the boxed output x to memory
Everything else is overhead.
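The steps above can be sketched as a toy interpreter step. This is an illustrative model only, not real engine code: the names box and step, the instruction format, and the INT/STRING tags are assumptions, and integer overflow is ignored.

```javascript
// Toy model: a boxed value is a (type tag, payload) pair.
function box(tag, value) { return { tag: tag, value: value }; }

// One interpreter step for an instruction { op, dst, lhs, rhs }.
function step(instr, vars) {
  // read the boxed inputs from memory
  var y = vars[instr.lhs];
  var z = vars[instr.rhs];
  var result;
  // check the types and choose the action
  if (instr.op === "add" && y.tag === "INT" && z.tag === "INT") {
    // unbox, execute the action (the only real work), box the output
    result = box("INT", y.value + z.value);
  } else if (instr.op === "add" && y.tag === "STRING" && z.tag === "STRING") {
    result = box("STRING", y.value + z.value);
  } else {
    throw new Error("case not handled in this sketch");
  }
  // write the boxed output to memory
  vars[instr.dst] = result;
}

var vars = { y: box("INT", 55), z: box("INT", 1) };
step({ op: "add", dst: "x", lhs: "y", rhs: "z" }, vars);
console.log(vars.x.tag, vars.x.value); // INT 56
```

Only the single addition inside each branch does useful work; the reads, tag checks, boxing, and the write surround it on every operation.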
36. Inside the 2011 JS Engine
[Diagram: JavaScript source -> Front End -> bytecode/AST]
THE SLOW ZONE: Garbage Collector, DOM, Interpreter, Standard Library
JIT Compiler: Compile to x86/x64/ARM, x86/x64/ARM runs on the CPU (Fast!)
Type-Specializing JIT Compiler (Ultra Fast!)
39. Running Code with the JIT (All Major Browsers)
The basic JIT compiler on x = y + z:
‣ read the operation x = y + z from memory (CPU does it for us!)
‣ read the inputs y and z from memory
‣ check the types of y and z and choose the action
‣ unbox y and z
‣ execute the action
‣ box the output x
‣ write the output x to memory (JIT code can keep things in registers)
43. Choosing the action in the JIT
• Many cases for operators like +
• Engines generate fast JIT code for “common cases”
• number + number
• string + string
• “Rare cases” run in the slow zone
• number + undefined
44. JITs for Regular Expressions (All Major Browsers)
• There is a separate JIT for regular expressions
• Regular expressions are generally faster than manual search
• Still in the slow zone:
• Some complex regexes (example: backreferences)
• Building result arrays (test much faster than exec)
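The difference between test and exec in the last bullet, as a small sketch. The pattern and input string are made up for illustration; the speed difference itself depends on the engine and is not measured here.

```javascript
var re = /(\d+)-(\d+)/;
var s = "range 10-20";

// test() only answers yes/no, so no result array has to be built.
var found = re.test(s);

// exec() builds an array holding the match and each capture group.
var m = re.exec(s);

console.log(found);      // true
console.log(m[1], m[2]); // 10 20
```

When only a yes/no answer is needed, test avoids building the result array that exec returns.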
48. Object Properties
function f(obj) {
  return obj.a + 1;
}
• Need to search obj for a property named a (slow)
• May need to search prototype chain up several levels (super-slow)
• Finally, once we’ve found it, get the property value (fast!)
54. ICs: a mini-JIT for objects (All Major Browsers)
• Properties become fast with inline caching (we prefer IC)
• Basic plan:
1. First time around, search for the property in the Slow Zone
2. But record the steps done to actually get the property
3. Then JIT a little piece of code that does just that
64. ICs: Example
Example Code
var obj1 = { a: 1, b: 2, c: 3 };   // shape=12, b in position 1
var obj2 = { b: 2 };               // shape=15, b in position 0
function f(obj) {
  return obj.b + 1;
}
Generated JIT Code
  ...
  jump slowPropAccess
continue_1:
  ...
icStub_1:
  compare obj.shape, 12
  jumpIfFalse slowPropAccess
  load obj.props[1]
  jump continue_1
icStub_2:
  compare obj.shape, 15
  jumpIfFalse slowPropAccess
  load obj.props[0]
  jump continue_1
slowPropAccess:
  ... set up call
  call ICGetProp ; C++ Slow Zone
  jump continue_1
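The same idea can be simulated in plain JavaScript. This is a toy model under stated assumptions, not engine code: objects are modeled as { shape, props } records, and slotOfB, stubs, and getB are hypothetical names standing in for the shape table, the generated IC stubs, and the property read site.

```javascript
// Toy model: the shape id determines the slot index of every property.
var obj1 = { shape: 12, props: [1, 2, 3] }; // { a: 1, b: 2, c: 3 }, b in slot 1
var obj2 = { shape: 15, props: [2] };       // { b: 2 }, b in slot 0

// Hypothetical shape table used by the slow path to find the slot of "b".
var slotOfB = { 12: 1, 15: 0 };

var stubs = [];    // one { shape, slot } entry per generated IC stub
var slowCalls = 0; // how often we fell back to the "Slow Zone"

function getB(obj) {
  // Fast path: each stub is a shape compare plus a direct load.
  for (var i = 0; i < stubs.length; ++i) {
    if (obj.shape === stubs[i].shape) return obj.props[stubs[i].slot];
  }
  // Slow path: full lookup, then record a stub for this shape.
  slowCalls++;
  var slot = slotOfB[obj.shape];
  stubs.push({ shape: obj.shape, slot: slot });
  return obj.props[slot];
}

function f(obj) { return getB(obj) + 1; }

console.log(f(obj1), f(obj2), f(obj1), f(obj2)); // 3 3 3 3
console.log(slowCalls);                          // 2
```

Each shape pays for the slow lookup once; later reads of the same shape take only the compare and the indexed load.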
67. These are fast because of ICs
Global Variable Access
var q = 4;
var r;
function f(obj) {
  r = q;
}
Direct Property Access
var obj1 = { a: 1, b: 2, c: 3 };
var obj2 = { b: 2 };
function f(obj) {
  obj2.b = obj1.c;
}
Closure Variable Access
var f = function() {
  var x = 1;
  var g = function() {
    var sum = 0;
    for (var i = 0; i < N; ++i) {
      sum += x;
    }
    return sum;
  }
  return g();
}
73. Prototypes don’t hurt much
function A(x) {
  this.x = x;
}
function B(y) {
  this.y = y;
}
B.prototype = new A;
function C(z) {
  this.z = z;
}
C.prototype = new B;
[Diagram: new C(1), new C(2), new C(3) share the proto new B, whose proto is new A]
Shape of new C objects determines prototype
-> IC can generate code that checks shape,
then reads directly from prototype without walking
78. Many Shapes Slow Down ICs
What happens if many shapes of obj are passed to f?
function f(obj) {
  return obj.p;
}
ICs end up looking like this:
jumpIf shape != 12
read for shape 12
jumpIf shape != 15
read for shape 15
jumpIf shape != 6
read for shape 6
...
jumpIf shape != 16
read for shape 16
jumpIf shape != 22
read for shape 22
jumpIf shape != 3
read for shape 3
79. Many shapes in practice
[Chart: nanoseconds/iteration (0 to 100) vs. # of shapes at property read site (1, 2, 8, 16, 32, 100, 200) for IE, Opera, Chrome, Firefox, Safari]
IE: Slow Zone for 2+ shapes
Opera: # of shapes doesn’t matter!
Chrome: more shapes -> slower
Firefox, Safari: slower with more shapes, but levels off in Slow Zone
85. Deeply Nested Closures are Slower
var f = function() {
  var x;
  var g = function() {
    var h = function() {
      var y;
      var i = function () {
        var j = function() {
          z = x + y;
[Diagram: the first call to f and the second call to f each create their own f call object, h call object, and j call object]
• Prototype chains don’t slow us down, but deep closure nesting does. Why?
• Every call to f generates a unique closure object to hold x.
• The engine must walk up to x each time
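One possible workaround, sketched under the assumption that the closed-over value does not change during the loop: copy it into a local variable once, so the hot loop reads a local instead of reaching through the enclosing scopes. The names outer and localX are made up, and whether this helps at all depends on the engine, so measure before relying on it.

```javascript
var N = 1000;

var outer = function () {
  var x = 2;
  var g = function () {
    var h = function () {
      // Copy the closed-over variable into a local once, outside the loop.
      var localX = x;
      var sum = 0;
      for (var i = 0; i < N; ++i) {
        sum += localX;
      }
      return sum;
    };
    return h();
  };
  return g();
};

console.log(outer()); // 2000
```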
90. Properties in the Slow Zone
Undefined Property (Fast on Firefox, Chrome)
var a = {};
a.x;
Scripted Getter (Fast on IE)
var a = { get x() { return 1; } };
a.x;
DOM Access (I only tested .id, so take with a grain of salt; other properties may differ)
var a = document.getElementById("foo");
a.id;
Scripted Setter
var a = { set x(y) { this.x_ = y; } };
a.x = 1;
96. Types FTW!
If only JavaScript had type declarations...
➡ The JIT would know the type of every local variable
➡ Know exactly what action to use (no type checks)
➡ Local variables don’t need to be boxed (or unboxed)
We call this kind of JIT a
type-specializing JIT
100. But JS doesn’t have types
• Problem: JS doesn’t have type declarations
• won’t have them any time soon
• we don’t want to wait
• Solution: run the program for a bit, monitor types
• Then recompile optimized for those types
103. Running with the Type-Specializing JIT (Firefox 3.5+, Chrome 11+)
On x = y + z:
‣ read the operation x = y + z from memory
‣ read the inputs y and z from memory
‣ check the types of y and z and choose the action
‣ unbox y and z
‣ execute the action
‣ box the output x
‣ write the output x to memory
105. Further Optimization 1
Automatic Inlining
original code
function getPop(city) {
  return popdata[city.id];
}
for (var i = 0; i < N; ++i) {
  total += getPop(city);
}
JIT compiles as if you wrote this
for (var i = 0; i < N; ++i) {
  total += popdata[city.id];
}
107. Further Optimization 2
Loop Invariant Code Motion (LICM, “hoisting”)
original code
for (var i = 0; i < N; ++i) {
  total += a[i] *
    (1 + options.tax);
}
JIT compiles as if you wrote this
var f = 1 + options.tax;
for (var i = 0; i < N; ++i) {
  total += a[i] * f;
}
110. Optimize Only Hot Code
• Type-specializing JITs can have a hefty startup cost
• Need to collect the type information
• Advanced compiler optimizations take longer to run
• Therefore, type specialization is applied selectively
• Only on hot code
• Tracemonkey: hot = 70 iterations
• Crankshaft: hot = according to a profiler
• Only if judged to be worthwhile (incomprehensible heuristics)
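A toy model of "monitor types, then specialize only hot code" may make the flow concrete. Everything here is an assumption for illustration: makeAdder, HOT_THRESHOLD (set to the 70 from the slide), and the guard are hypothetical, and both paths compute the same y + z, so this shows the control flow only, not any speedup.

```javascript
// Toy model, not engine code.
var HOT_THRESHOLD = 70;

function makeAdder() {
  var calls = 0;
  var onlyNumbers = true;
  var specialized = false;

  function generic(y, z) { return y + z; }

  return {
    add: function (y, z) {
      if (specialized) {
        // Guard: the specialized path is only valid for the monitored types.
        if (typeof y === "number" && typeof z === "number") return y + z;
        specialized = false; // types changed: fall back to the generic path
        onlyNumbers = false;
        return generic(y, z);
      }
      // Monitoring phase: record whether only numbers have been seen.
      if (typeof y !== "number" || typeof z !== "number") onlyNumbers = false;
      if (++calls >= HOT_THRESHOLD && onlyNumbers) specialized = true;
      return generic(y, z);
    },
    isSpecialized: function () { return specialized; }
  };
}

var adder = makeAdder();
for (var i = 0; i < 100; ++i) adder.add(i, 1);
console.log(adder.isSpecialized()); // true
console.log(adder.add("a", "b"));   // ab
console.log(adder.isSpecialized()); // false
```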
114. Current Limitations
• What happens if the types change after compiling?
• Just a few changes -> recompile, slight slowdown
• Many changes -> give up and deoptimize to basic JIT
• Array elements, object properties, and closed-over variables
• Usually still boxed
• Still need to check type and unbox on get, box on set
• Typed arrays might help, but support is not always there yet
• JS semantics require overflow checks for integer math
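Why the overflow check is needed, and what a typed array changes, in a short sketch. The variable names are made up; the behavior shown follows from JavaScript numbers being doubles and from Int32Array storing 32-bit integers.

```javascript
var maxInt32 = 2147483647;

// JS numbers behave like doubles, so the sum must not wrap around. An engine
// that holds maxInt32 as a 32-bit integer has to detect the overflow and
// switch to a floating-point representation.
console.log(maxInt32 + 1);       // 2147483648

// The |0 idiom explicitly asks for a 32-bit integer result, which wraps.
console.log((maxInt32 + 1) | 0); // -2147483648

// Typed arrays fix the element type up front.
var ints = new Int32Array(2);
ints[0] = maxInt32;
ints[1] = ints[0] + 1;           // the stored value wraps to 32 bits
console.log(ints[1]);            // -2147483648

var doubles = new Float64Array(1);
doubles[0] = 0.5;
console.log(doubles[0]);         // 0.5
```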
118. Type Inference
• Trying to get rid of the last few instances of boxing
(from before: array and object properties)
• Idea: use static program analysis to prove types
• of object props, array elements, called functions
• or, almost prove types, and also prove minimal checks needed
121. Type Inference Example
Type inference gets this...
var a = [];
for (var i = 0; i < N; ++i) {
  a[i] = i * i;
}
var sum = 0;
for (var i = 0; i < N; ++i) {
  sum += a[i];
}
“i is always a number,
so i * i is always a number,
so a[_] is always a number!”
...but not this.
var a = [];
for (var i = 0; i < N; ++i) {
  if (i % 2)
    a[i] = i * i;
  else
    a[i] = "foo";
}
var sum = 0;
for (var i = 0; i < N; ++i) {
  if (i % 2)
    sum += a[i];
}
122. Type-stable JavaScript
The key to running faster in future JITs is
type-stable JavaScript.
This means JavaScript where you could
declare a single engine-internal type for each variable.
126. Type-stable JS: examples
Type-stable
var g = 34;
var o1 = { a: 56 };
var o2 = { a: 99 };
for (var i = 0; i < 10; ++i) {
  var o = i % 2 ? o1 : o2;
  g += o.a;
}
g = 0;
NOT type-stable
var g = 34;
var o1 = { a: 56 };
var o2 = { z: 22, a: 56 };
for (var i = 0; i < 10; ++i) {
  var o = i % 2 ? o1 : o2;
  g += o.a;      // Different shapes
}
g = "hello";     // Type change
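One way to repair the NOT type-stable version, as a sketch: build both objects through a single constructor so they get the same properties in the same order, and keep g a number. The constructor name Item and the placeholder value 0 for z are assumptions; how an engine actually assigns shapes is engine-specific.

```javascript
// Hypothetical constructor: every Item gets the same properties in the same
// order, so all instances can share one shape.
function Item(a, z) {
  this.a = a;
  this.z = z; // always present, even when there is no meaningful value
}

var o1 = new Item(56, 0);
var o2 = new Item(56, 22);

var g = 34; // g stays a number throughout
for (var i = 0; i < 10; ++i) {
  var o = i % 2 ? o1 : o2;
  g += o.a;
}
console.log(g); // 594
```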
131. What Allocates Memory?
Objects
new Object();
new MyConstructor();
{ a: 4, b: 5 }
Object.create();
Arrays
new Array();
[ 1, 2, 3, 4 ];
Strings
new String("hello");
"<p>" + e.innerHTML + "</p>"
Function Objects
var x = function () { ... }
new Function(code);
Closure Environments
function outer(name) {
  var x = name;
  return function inner() {
    return "Hi, " + name;
  }
}
name is stored in an implicitly created object!
135. GC Pauses Your Program!
[Timeline: JavaScript running, then GC running while JS is paused]
• Basic GC algorithm (mark and sweep)
• Traverse all reachable objects (from locals, window, DOM)
• Recycle objects that are not reachable
• The JS program is paused during GC for safe traversal
• Pauses may be long: 100 ms or more
• Serious problem for animation
• Can also be a drag on general performance
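Since allocation is what eventually triggers a collection, one common way to ease GC pressure in an animation loop is to reuse objects instead of creating new ones every frame. The sketch below assumes a made-up per-frame computation; positionAt, positionInto, and scratch are hypothetical names, and the benefit should be measured rather than assumed.

```javascript
// Allocating version: creates a new object on every call.
function positionAt(t) {
  return { x: Math.cos(t), y: Math.sin(t) };
}

// Reusing version: writes into a caller-provided object instead.
function positionInto(out, t) {
  out.x = Math.cos(t);
  out.y = Math.sin(t);
  return out;
}

var scratch = { x: 0, y: 0 }; // allocated once, outside the frame loop
var sumX = 0;
for (var frame = 0; frame < 1000; ++frame) {
  var p = positionInto(scratch, frame / 60);
  sumX += p.x;
}
console.log(p === scratch); // true: the loop itself allocates no new objects
```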
144. Reducing Pauses with Science 1
Generational GC (Chrome)
Idea: Optimize for creating many short-lived objects
Create objects in a frequently collected nursery area
Promote long-lived objects to a rarely collected tenured area
[Timeline, Simple GC: JavaScript running, then GC running while JS is paused]
[Timeline, Generational GC: JavaScript running, with nursery collections (<100 us) and a tenured collection]
fewer pauses!
145. Generational GC by Example
scavenging young generation (aka nursery)
mark-and-sweep tenured generation
Message
Message
Array
146. Generational GC by Example
scavenging young generation (aka nursery)
Point
mark-and-sweep tenured generation
Message
Message
Array
147. Generational GC by Example
scavenging young generation (aka nursery)
Point Point
mark-and-sweep tenured generation
Message
Message
Array
148. Generational GC by Example
scavenging young generation (aka nursery)
Point Point Line
mark-and-sweep tenured generation
Message
Message
Array
149. Generational GC by Example
scavenging young generation (aka nursery)
Point Point Line
a b
mark-and-sweep tenured generation
Message
Message
Array
150. Generational GC by Example
scavenging young generation (aka nursery)
Point Point Line Point
a b
mark-and-sweep tenured generation
Message
Message
Array
151. Generational GC by Example
scavenging young generation (aka nursery)
Point Point Line Point Message
a b
mark-and-sweep tenured generation
Message
Message
Array
152. Generational GC by Example
scavenging young generation (aka nursery)
Point Point Line Point Message
a b
mark-and-sweep tenured generation
Message
Message
Array
153. Generational GC by Example
scavenging young generation (aka nursery)
Point Point Line Point Message Point
a b
mark-and-sweep tenured generation
Message
Message
Array
154. Generational GC by Example
scavenging young generation (aka nursery)
Point Point Line Point Point
a b
mark-and-sweep tenured generation
Message
Message
Message
Array
155. Generational GC by Example
scavenging young generation (aka nursery)
mark-and-sweep tenured generation
Message
Message
Message
Array
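The walkthrough above can be mimicked with a toy model. This is only a sketch under loud assumptions, not how any real engine lays out its heap: `ToyHeap`, `allocate`, `scavenge`, and the nursery capacity of 6 are hypothetical names and values, and liveness is modeled with an explicit `roots` set instead of tracing from the stack.

```javascript
// Toy model of a generational heap. All names and sizes are illustrative.
function ToyHeap(nurseryCapacity) {
  this.nurseryCapacity = nurseryCapacity;
  this.nursery = [];      // young generation, collected often
  this.tenured = [];      // old generation, collected rarely
  this.roots = new Set(); // objects the program still references
  this.scavenges = 0;
}

// Allocate in the nursery; scavenge first if the nursery is full.
ToyHeap.prototype.allocate = function (name, keepAlive) {
  if (this.nursery.length >= this.nurseryCapacity) {
    this.scavenge();
  }
  var obj = { name: name };
  this.nursery.push(obj);
  if (keepAlive) this.roots.add(obj);
  return obj;
};

// Scavenge: promote survivors to the tenured area, drop everything else.
ToyHeap.prototype.scavenge = function () {
  for (var i = 0; i < this.nursery.length; ++i) {
    var obj = this.nursery[i];
    if (this.roots.has(obj)) this.tenured.push(obj);
  }
  this.nursery = [];
  this.scavenges += 1;
};

var heap = new ToyHeap(6);
heap.tenured.push({ name: "Message" }, { name: "Message" }, { name: "Array" });
heap.allocate("Point", false);
heap.allocate("Point", false);
heap.allocate("Line", false);
heap.allocate("Point", false);
heap.allocate("Message", true); // the only long-lived object
heap.allocate("Point", false);
heap.allocate("Point", false);  // nursery was full, so this triggers a scavenge
```

After the last call, the one surviving `Message` has been promoted and the five short-lived objects are gone without ever being copied, which is why short-lived garbage is cheap in this scheme.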
160. Reducing Pauses with Science II
Incremental GC (current research @Mozilla)
Idea: Do a little bit of GC traversal at a time
[Timeline diagram; legend: JavaScript running / GC running, JS paused]
Simple GC
Incremental GC: shorter pauses!
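The "little bit of traversal at a time" idea can be sketched as a marking loop with a work budget. This is a toy under stated assumptions: `IncrementalMarker`, `step`, and the `children` field are hypothetical names, not an engine API, and the sketch ignores the write barriers a real incremental collector needs when the program mutates the object graph between slices.

```javascript
// Toy incremental marker. Names are illustrative, not an engine API.
function IncrementalMarker(roots) {
  this.worklist = roots.slice(); // objects still to visit
  this.marked = new Set();       // objects known to be live
}

// Do at most `budget` units of marking work, then yield to the program.
// Returns true once marking has finished.
IncrementalMarker.prototype.step = function (budget) {
  while (budget > 0 && this.worklist.length > 0) {
    var obj = this.worklist.pop();
    budget -= 1;
    if (this.marked.has(obj)) continue;
    this.marked.add(obj);
    var children = obj.children || [];
    for (var i = 0; i < children.length; ++i) {
      this.worklist.push(children[i]);
    }
  }
  return this.worklist.length === 0;
};

// Small object graph: root -> a -> b, root -> c; `garbage` is unreachable.
var b = { name: "b" };
var a = { name: "a", children: [b] };
var c = { name: "c" };
var root = { name: "root", children: [a, c] };
var garbage = { name: "garbage" };

var marker = new IncrementalMarker([root]);
var slices = 1;
while (!marker.step(2)) {
  slices += 1; // in a real engine, JavaScript would run between slices
}
```

Each call to `step` is one short pause; the total work is the same as a single full traversal, but it is spread over several slices.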
164. Reducing Pauses in Practice
• For all GCs
• Fewer live objects -> shorter pauses (if not incremental),
less time spent in GC
• For simple GCs
• Lower allocation rate (objects/second) -> less frequent pauses
• For generational GCs
• Short-lived objects don’t affect pause frequency
• Long-lived objects cost extra (promotion = copying)
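One way to lower the allocation rate is to avoid temporary objects in a hot loop. The sketch below is illustrative only; `sumDistancesAllocating` and `sumDistancesNoAlloc` are hypothetical names, and how much this helps depends on the engine and its collector.

```javascript
// Allocates one temporary object per iteration.
function sumDistancesAllocating(xs, ys) {
  var total = 0;
  for (var i = 0; i < xs.length; ++i) {
    var p = { x: xs[i], y: ys[i] }; // new object every time around the loop
    total += Math.sqrt(p.x * p.x + p.y * p.y);
  }
  return total;
}

// Same result with no per-iteration allocation: plain locals instead.
function sumDistancesNoAlloc(xs, ys) {
  var total = 0;
  for (var i = 0; i < xs.length; ++i) {
    var x = xs[i], y = ys[i];
    total += Math.sqrt(x * x + y * y);
  }
  return total;
}
```

Both functions compute the same sum; the second simply gives the collector nothing to clean up.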
166. Performance Faults
• Performance fault: when a tiny change hurts performance
• Sometimes, just makes one statement slower
• Other times, deoptimizes the entire function!
• Reasons we have performance faults
• bug, tends to get fixed quickly
• “rare” case, will get fixed if not rare
• hard to optimize, RSN...
171. Strings
• In the Slow Zone, but some things are faster than you might think
• .substring() is fast, O(1)
• Don’t need to copy characters, just point within original
• Concatenation is also optimized
• Batch up inputs in a rope or concat tree, concat all at once
• Performance fault: prepending (Chrome, Opera)
// Prepending example
var s = "";
for (var i = 0; i < 100; ++i) {
  s = i + s;
}
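The prepending loop on this slide can often be rewritten to append instead. A minimal sketch, assuming the pieces can be produced in reverse order; `buildByPrepending` and `buildByAppending` are hypothetical names, and whether appending is actually faster depends on the engine.

```javascript
// Prepending, as in the slide's example.
function buildByPrepending(n) {
  var s = "";
  for (var i = 0; i < n; ++i) {
    s = i + s;
  }
  return s;
}

// Same string, built by walking the indices backwards and appending.
function buildByAppending(n) {
  var s = "";
  for (var i = n - 1; i >= 0; --i) {
    s += i;
  }
  return s;
}
```

Both functions return the same string, so the rewrite is a drop-in replacement when the order of production can be flipped.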
172. Arrays
fast: dense array
var a = [];
for (var i = 0; i < 100; ++i) {
  a[i] = 0;
}
3-15x slower: sparse array
var a = [];
a[10000] = 0;
for (var i = 0; i < 100; ++i) {
  a[i] = 0;
}
a.x = 7; // Fx, IE only
Want a fast array?
‣ Make sure it’s dense
‣ 0..N fill or push fill is always dense
‣ Huge gaps are always sparse
‣ N..0 fill is sparse on Firefox
‣ adding a named property is sparse on Firefox, IE
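The rules on this slide suggest two habits: fill from index 0 upward, and keep extra data next to the array rather than on it. A small sketch; `denseFill` and `makeTagged` are hypothetical helpers written for illustration.

```javascript
// Builds [f(0), f(1), ..., f(n-1)] front to back, so there are never gaps.
function denseFill(n, f) {
  var a = [];
  for (var i = 0; i < n; ++i) {
    a.push(f(i));
  }
  return a;
}

// Keeps metadata in a wrapper object instead of as a named
// property on the array itself (the `a.x = 7` case on the slide).
function makeTagged(n, tag) {
  return {
    tag: tag,
    items: denseFill(n, function () { return 0; })
  };
}
```

For example, `makeTagged(100, 7)` gives a 100-element zero-filled array in `items` with the tag stored beside it.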
174. Iteration over Arrays
fastest: index iteration
// This runs entirely in JIT code,
// so it’s really fast.
for (var i = 0; i < a.length; ++i) {
  sum += a[i];
}
3-15x slower: functional style
// This makes N function calls,
// and most JITs don’t optimize
// through C++ reduce().
sum = a.reduce(function(a, b) {
  return a + b; });
20-80x slower: for-in
// This calls a C++ function to
// navigate the property list.
for (var i in a) {
  sum += a[i];
}
175. Functions
• Function calls use ICs, so they are fast
• Manual inlining can still help sometimes
• Key performance faults:
• f.call() - 1.3-35x slower than f()
• f.apply() - 5-50x slower than f()
• arguments - often very slow, but varies
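As an illustration of steering around these faults, a variadic function called through `apply()` can often be replaced by a function that takes an array. This is a sketch with hypothetical names (`sumVariadic`, `sumArray`); the slowdown figures above come from the slides and are not measured here.

```javascript
// Variadic style: relies on `arguments`, flagged above as often slow.
function sumVariadic() {
  var total = 0;
  for (var i = 0; i < arguments.length; ++i) {
    total += arguments[i];
  }
  return total;
}

// Array-parameter style: an ordinary indexed loop over a real array.
function sumArray(values) {
  var total = 0;
  for (var i = 0; i < values.length; ++i) {
    total += values[i];
  }
  return total;
}

var data = [1, 2, 3, 4];
var viaApply = sumVariadic.apply(null, data); // f.apply() plus arguments
var viaDirect = sumArray(data);               // plain f() call
```

Both calls compute the same value; the second avoids both `apply()` and `arguments`.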
177. Creating Objects
Creating objects is slow
Doesn’t matter too much how you create or populate
Exception: Constructors on Chrome are fast
function Cons(x, y, z) {
this.x = x;
this.y = y;
this.z = z;
}
for (var i = 0; i < N; ++i)
new Cons(i, i + 1, i * 2);
183. OOP Styling
Prototype
function Point(x, y) {
  this.x = x;
  this.y = y;
}
Point.prototype = {
  distance: function(pt2) ...
}
Information-Hiding
function Point(x, y) {
  return {
    distance: function(pt2) ...
  }
}
Instance Methods
function Point(x, y) {
  this.x = x;
  this.y = y;
  this.distance = function(pt2) ...
}
Prototype style is much faster to create
(each closure creates a function object)
Using the objects is about the same
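The creation-cost difference comes from how many function objects exist. The sketch below makes that visible; `ProtoPoint` and `ClosurePoint` are hypothetical names, and the Euclidean body of `distance` is an assumption, since the slides elide it.

```javascript
// Prototype style: one shared `distance` function for all instances.
function ProtoPoint(x, y) {
  this.x = x;
  this.y = y;
}
ProtoPoint.prototype.distance = function (pt2) {
  var dx = this.x - pt2.x, dy = this.y - pt2.y;
  return Math.sqrt(dx * dx + dy * dy); // assumed body, not from the slides
};

// Instance-method style: a new closure is created for every instance.
function ClosurePoint(x, y) {
  this.x = x;
  this.y = y;
  this.distance = function (pt2) {
    var dx = x - pt2.x, dy = y - pt2.y;
    return Math.sqrt(dx * dx + dy * dy); // assumed body, not from the slides
  };
}

var p1 = new ProtoPoint(0, 0), p2 = new ProtoPoint(3, 4);
var c1 = new ClosurePoint(0, 0), c2 = new ClosurePoint(3, 4);
var sharedMethod = p1.distance === p2.distance;    // same function object
var separateMethods = c1.distance !== c2.distance; // one function per instance
```

Calling `distance` gives the same answer in both styles; only the number of function objects allocated at construction time differs.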
184. Exceptions
• Exceptions assumed to be rare in perf-sensitive code
• running a try statement is free on most browsers
• throw/catch is really slow
• There are many performance faults around exceptions
• just having a try statement deoptimizes on some browsers
• try-finally is a perf fault on some
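One way to act on these points is to replace throw/catch in a hot loop with an ordinary branch. A sketch with hypothetical names (`parsePositive`, `sumWithThrow`, `sumWithCheck`); the validity rule is invented for illustration.

```javascript
// Hypothetical parser that throws on bad input.
function parsePositive(text) {
  var n = Number(text);
  if (!(n > 0)) throw new Error("not a positive number: " + text);
  return n;
}

// Uses throw/catch for control flow inside the hot loop.
function sumWithThrow(items) {
  var total = 0;
  for (var i = 0; i < items.length; ++i) {
    try {
      total += parsePositive(items[i]);
    } catch (e) {
      // skip invalid items
    }
  }
  return total;
}

// Same result with a plain branch and no try statement in the loop.
function sumWithCheck(items) {
  var total = 0;
  for (var i = 0; i < items.length; ++i) {
    var n = Number(items[i]);
    if (n > 0) total += n;
  }
  return total;
}
```

Both versions skip invalid items; the second never throws and keeps the try statement out of the hot function.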
185. eval and with
Short version:
Do not use anywhere near performance sensitive code!
Mind-Bogglingly Awful
5-100x slower than using a function call
var sum = 0;
for (var i = 0; i < N; ++i) {
  sum = eval("sum + i");
}
Still Terrible
2-10x slower than without eval
var sum = 0;
eval("");
for (var i = 0; i < N; ++i) {
  sum = eval("sum + i");
}
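For comparison, here is the function-call alternative that the "5-100x slower" figure refers to, next to the eval version. The names `sumWithEval`, `sumWithFunction`, and `add`, and the value of `N`, are assumptions made so the sketch runs on its own.

```javascript
var N = 1000; // illustrative loop bound

// eval inside the loop, as on the slide.
function sumWithEval() {
  var sum = 0;
  for (var i = 0; i < N; ++i) {
    sum = eval("sum + i");
  }
  return sum;
}

// The same computation through an ordinary function call.
function add(a, b) {
  return a + b;
}
function sumWithFunction() {
  var sum = 0;
  for (var i = 0; i < N; ++i) {
    sum = add(sum, i);
  }
  return sum;
}
```

Both functions return the same total; only the second leaves the engine free to optimize the loop.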
191. Top 5 Things to Know
5. Avoid eval, with, exceptions near perf-sensitive code
4. Avoid creating objects in hot loops
3. Use dense arrays (know what causes sparseness)
2. Write type-stable code
1. ...
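Item 2 can be illustrated with a variable that changes type partway through a loop. This is one reading of "type-stable", sketched with hypothetical names (`sumUnstable`, `sumStable`); engines differ in how much an unstable variable costs.

```javascript
// Type-unstable: `result` starts as undefined and only later becomes a number.
function sumUnstable(values) {
  var result;
  for (var i = 0; i < values.length; ++i) {
    result = (result === undefined ? 0 : result) + values[i];
  }
  return result;
}

// Type-stable: `result` is a number from the first assignment onward.
function sumStable(values) {
  var result = 0;
  for (var i = 0; i < values.length; ++i) {
    result += values[i];
  }
  return result;
}
```

For non-empty numeric input both return the same sum; the stable version also returns a number for an empty array, where the unstable one returns `undefined`.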
192. Talk To Us
JS engine developers want to help you. Tell us about:
• Performance faults you run into
• Exciting apps that require fast JS
• Anything interesting you discover about JS performance
Editor's Notes
JavaScript now runs 10-100x faster than 5 years ago, fast on all major browsers
Developers using it for new apps: interactive movies, games, photo editing, slides
I’m going to explain how it works to help you get the most out of these engines