Know Your Engines
How to Make Your JavaScript Fast

        Dave Mandelin
         June 15, 2011
        O’Reilly Velocity
5 years of progress...
[Chart: run time vs. C (y-axis, 0 to 10) for JavaScript and C in 2006, 2008, and 2011]

One program on one popular browser: 10x faster!
...lost in an instant!
function f() {
  var sum = 0;
  for (var i = 0; i < N; ++i) {
    sum += i;
  }
}




function f() {
  eval("");
  var sum = 0;
  for (var i = 0; i < N; ++i) {
    sum += i;
  }
}
[Chart: run time of f (y-axis, 0 to 80), without eval vs. with eval]

With eval("") up to 10x slower!
Making JavaScript Fast
     Or, Not Making JavaScript Slow


How JITs make JavaScript not slow

How not to ruin animation with pauses

How to write JavaScript that’s not slow
The 2006 JavaScript Engine
Inside the 2006 JS Engine
// JavaScript source
e.innerHTML = n + " items";

// bytecode (AST in some engines)
tmp_0 = add var_1 str_3
setprop var_0 'innerHTML' tmp_0

Components:
  Front End            JavaScript source -> bytecode
  Interpreter          Run the bytecode
  Garbage Collector    Reclaim memory
  DOM                  Set innerHTML
  Standard Library
Why it’s hard to make JS fast

                 Because
   JavaScript is an untyped language.


    untyped = no type declarations
Operations in an untyped language
          x = y + z can mean many things



  •   if y and z are numbers, numeric addition

  •   if y and z are strings, concatenation

  •   and many other cases; y and z can have different types
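
A few of those cases, written out as a small sketch. The helper name add and the sample operands are made up for illustration; the comments state what standard JavaScript semantics give for each pair of operand types.

```javascript
// What `y + z` does depends on the runtime types of both operands.
function add(y, z) {
  return y + z;
}

const results = {
  numbers: add(1, 2),                 // numeric addition: 3
  strings: add("a", "b"),             // concatenation: "ab"
  mixed: add(1, "2"),                 // the number is converted to a string: "12"
  withUndefined: add(1, undefined),   // undefined converts to NaN, so the sum is NaN
  withObject: add(1, { valueOf() { return 41; } }), // valueOf is called first: 42
};

console.log(results);
```

The engine cannot tell which of these cases applies by looking at the source of add alone; it has to check at run time.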
Engine-Internal Types
JS engines use finer-grained types internally.

 JavaScript type    Engine type

   number           32-bit* integer
                    64-bit floating-point

   object           { a: 1 }
                    { a: 1, b: 2 }
                    { a: get ... }
                    { a: 1, __proto__: new C }
                    (different shapes)
Values in an untyped language
Because JavaScript is untyped, the interpreter needs boxed values.

                 Boxed                    Unboxed
  Purpose        Storage                  Computation
  Examples       (INT, 55)                55
                 (STRING, "foo")          "foo"
  Definition     (type tag, C++ value)    C++ value

Only boxed values can be stored in variables;
only unboxed values can be computed with (+, *, etc.).
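
A minimal sketch of the idea, modeled in JavaScript rather than in the engine's C++. The tag names INT and STRING come from the table; the box and unbox helpers and the object layout are assumptions made for illustration, not how any particular engine stores values.

```javascript
// Hypothetical type tags; real engines define their own encodings.
const INT = "INT";
const STRING = "STRING";

// box: pair a raw value with its type tag so it can be stored in a variable slot.
function box(value) {
  if (typeof value === "string") return { tag: STRING, value };
  if (Number.isInteger(value)) return { tag: INT, value };
  throw new TypeError("this sketch only boxes integers and strings");
}

// unbox: recover the raw value so it can be computed with.
function unbox(boxed) {
  return boxed.value;
}

// Variables hold boxed values only.
const variables = { y: box(55), z: box("foo") };

console.log(variables.y.tag);           // INT
console.log(variables.z.tag);           // STRING
console.log(unbox(variables.y) + 1);    // 56
```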
Running Code in the Interpreter
 Here’s what the interpreter must do to execute x = y + z:

 ‣   read the operation x = y + z from memory
 ‣   read the boxed inputs y and z from memory
 ‣   check the types of y and z and choose the action
 ‣   unbox y and z
 ‣   execute the action   <- This is the only real work!
 ‣   box the output x
 ‣   write the boxed output x to memory

 Everything else is overhead.
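
The seven steps can be sketched as a toy interpreter step. Everything here is hypothetical (the memory object, the op record, the tag names, and the step function); a real interpreter is written in C++ and handles many more cases. The point is only to show how little of the code is the addition itself.

```javascript
// Toy memory holding boxed values: each value carries a type tag.
const memory = {
  y: { tag: "INT", value: 2 },
  z: { tag: "INT", value: 40 },
};

// Toy encoding of the operation x = y + z.
const op = { kind: "add", dest: "x", left: "y", right: "z" };

function step(op, memory) {
  // 1. read the operation (here: the fields of `op`)
  // 2. read the boxed inputs from memory
  const left = memory[op.left];
  const right = memory[op.right];

  // 3. check the types and choose the action
  let result;
  if (left.tag === "INT" && right.tag === "INT") {
    // 4. unbox, 5. execute (the only real work), 6. box the output
    result = { tag: "INT", value: left.value + right.value };
  } else if (left.tag === "STRING" && right.tag === "STRING") {
    result = { tag: "STRING", value: left.value + right.value };
  } else {
    throw new TypeError("case not handled in this sketch");
  }

  // 7. write the boxed output to memory
  memory[op.dest] = result;
}

step(op, memory);
console.log(memory.x); // { tag: 'INT', value: 42 }
```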
The 2011 JavaScript Engine
Inside the 2011 JS Engine
JavaScript source -> Front End -> bytecode/AST

THE SLOW ZONE:
  Interpreter
  Garbage Collector
  DOM
  Standard Library

JIT Compiler                      Compile to x86/x64/ARM, run on the CPU: Fast!
Type-Specializing JIT Compiler    Ultra Fast!
Running Code with the JIT (All Major Browsers)
The basic JIT compiler on x = y + z:

‣   read the operation x = y + z from memory   <- CPU does it for us!
‣   read the inputs y and z from memory
‣   check the types of y and z and choose the action
‣   unbox y and z
‣   execute the action
‣   box the output x
‣   write the output x to memory   <- JIT code can keep things in registers
Choosing the action in the JIT
•   Many cases for operators like +

•   Engines generate fast JIT code for “common cases”
    •   number + number

    •   string + string

•   “Rare cases” run in the slow zone
    •   number + undefined
JITs for Regular Expressions (All Major Browsers)
•   There is a separate JIT for regular expressions

•   Regular expressions are generally faster than manual search

•   Still in the slow zone:

    •   Some complex regexes (example: backreferences)

    •   Building result arrays (test is much faster than exec)
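
A short sketch of the test/exec distinction. The pattern and the sample strings are invented for illustration. Both calls are standard RegExp methods: test returns only a boolean, while exec builds a result array holding the match and its capture groups.

```javascript
const pattern = /item-(\d+)/;
const lines = ["item-1", "other", "item-22"];

// test: only a yes/no answer is needed, so no result array has to be built.
const matching = lines.filter((line) => pattern.test(line));

// exec: builds a result array; use it when the captured groups are actually needed.
const ids = [];
for (const line of lines) {
  const match = pattern.exec(line);
  if (match !== null) {
    ids.push(Number(match[1]));
  }
}

console.log(matching); // [ 'item-1', 'item-22' ]
console.log(ids);      // [ 1, 22 ]
```

When the code only branches on whether a string matches, test avoids the array-building work the slide places in the slow zone.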
Object Properties
function f(obj) {
  return obj.a + 1;
}

•   Need to search obj for a property named a                (slow)

•   May need to search prototype chain up several levels     (super-slow)

•   Finally, once we’ve found it, get the property value     (fast!)
ICs: a mini-JIT for objects (All Major Browsers)

•   Properties become fast with inline caching (we prefer IC)

•   Basic plan:
    1. First time around, search for the property in the Slow Zone
    2. But record the steps done to actually get the property
    3. Then JIT a little piece of code that does just that
ICs: Example
Example Code
var obj1 = { a: 1, b: 2, c: 3 };   // shape=12, b in position 1
var obj2 = { b: 2 };               // shape=15, b in position 0

function f(obj) {
  return obj.b + 1;
}

Generated JIT Code
 ...
 jump slowPropAccess
continue_1:
 ...

slowPropAccess:
 ... set up call
 call ICGetProp ; C++ Slow Zone
 jump continue_1

icStub_1:
 compare obj.shape, 12
 jumpIfFalse slowPropAccess
 load obj.props[1]
 jump continue_1

icStub_2:
 compare obj.shape, 15
 jumpIfFalse slowPropAccess
 load obj.props[0]
 jump continue_1
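
The control flow of those stubs can be modeled in plain JavaScript. This is a sketch under loud assumptions: shapeOf, makeCachedGetter, and the use of the ordered key list as a "shape" are all invented here, and the model is not faster than a normal property read. It only mirrors the sequence "check shape, load from a known position, else fall back to the slow path and add a stub".

```javascript
// Hypothetical shape id: the ordered list of own property names.
function shapeOf(obj) {
  return Object.keys(obj).join(",");
}

// Returns a reader for one property name with its own list of stubs,
// standing in for the inline cache at a single property access site.
function makeCachedGetter(name) {
  const stubs = new Map(); // shape -> position of `name` within that shape
  let slowPathCalls = 0;

  function get(obj) {
    const position = stubs.get(shapeOf(obj));
    if (position !== undefined) {
      // Stub hit: the shape matched, so load directly from the recorded position.
      return Object.values(obj)[position];
    }
    // Slow path: search for the property, then record how it was found.
    slowPathCalls++;
    const index = Object.keys(obj).indexOf(name);
    if (index === -1) return undefined; // missing properties are not cached in this sketch
    stubs.set(shapeOf(obj), index);
    return obj[name];
  }

  get.stats = () => ({ stubs: stubs.size, slowPathCalls });
  return get;
}

const getB = makeCachedGetter("b");
const obj1 = { a: 1, b: 2, c: 3 }; // b in position 1
const obj2 = { b: 2 };             // b in position 0

function f(obj) {
  return getB(obj) + 1;
}

f(obj1); f(obj2); f(obj1); f(obj2);
console.log(getB.stats()); // { stubs: 2, slowPathCalls: 2 }
```

After the first call with each shape, later calls with the same shape never reach the slow path, which is the behavior the two icStub blocks describe.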
These are fast because of ICs
Global Variable Access
var q = 4;
var r;

function f(obj) {
  r = q;
}

Direct Property Access
var obj1 = { a: 1, b: 2, c: 3 };
var obj2 = { b: 2 };

function f(obj) {
  obj2.b = obj1.c;
}

Closure Variable Access
var f = function() {
  var x = 1;
  var g = function() {
    var sum = 0;
    for (var i = 0; i < N; ++i) {
      sum += x;
    }
    return sum;
  }
  return g();
}
Prototypes don’t hurt much
function A(x) {
  this.x = x;
}

function B(y) {
  this.y = y;
}
B.prototype = new A;

function C(z) {
  this.z = z;
}
C.prototype = new B;

Prototype chain: new C(1), new C(2), new C(3) -> proto -> new B -> proto -> new A

Shape of new C objects determines prototype
-> IC can generate code that checks shape,
   then reads directly from prototype without walking
Many Shapes Slow Down ICs
What happens if many shapes of obj are passed to f?
function f(obj) {
  return obj.p;
}

ICs end up looking like this:
jumpIf shape != 12
read for shape 12

jumpIf shape != 15
read for shape 15

jumpIf shape != 6
read for shape 6

jumpIf shape != 16
read for shape 16

jumpIf shape != 22
read for shape 22

jumpIf shape != 3
read for shape 3

...
Many shapes in practice
[Chart: nanoseconds/iteration (y-axis, 0 to 100) vs. # of shapes at property read site (x-axis: 1, 2, 8, 16, 32, 100, 200), one line each for IE, Opera, Chrome, Firefox, and Safari]

IE: Slow Zone for 2+ shapes
Opera: # of shapes doesn’t matter!
Chrome: more shapes -> slower
Firefox, Safari: slower with more shapes, but levels off in Slow Zone
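
A rough way to try a measurement like this yourself is sketched below. It is not the benchmark behind the chart: makeObjects and nanosPerIteration are hypothetical helpers, Date.now() has only millisecond resolution, and because all runs share the single read site in f, later runs also see the shapes from earlier ones. Treat the numbers as indicative at best, and run each shape count in a fresh process for cleaner results.

```javascript
// The property read site under test.
function f(obj) {
  return obj.p;
}

// Build objects that all have `p` but differ in their first property name,
// so that each one has a different layout.
function makeObjects(shapeCount) {
  const objects = [];
  for (let s = 0; s < shapeCount; s++) {
    const obj = {};
    obj["extra" + s] = s;
    obj.p = 1;
    objects.push(obj);
  }
  return objects;
}

// Average cost of one call to f, in nanoseconds, when cycling through the objects.
function nanosPerIteration(shapeCount, iterations) {
  const objects = makeObjects(shapeCount);
  let sum = 0;
  const start = Date.now();
  for (let i = 0; i < iterations; i++) {
    sum += f(objects[i % shapeCount]);
  }
  const elapsedMs = Date.now() - start;
  if (sum !== iterations) throw new Error("unexpected sum");
  return (elapsedMs * 1e6) / iterations;
}

for (const shapes of [1, 2, 8, 16, 32, 100, 200]) {
  console.log(shapes + " shapes: " + nanosPerIteration(shapes, 1e6).toFixed(1) + " ns/iteration");
}
```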
Deeply Nested Closures are Slower
var f = function() {
 var x;
 var g = function() {
  var h = function() {
    var y;
    var i = function () {
      var j = function() {
       z = x + y;

[Diagram: each call to f (first call, second call) creates its own chain of call objects: f call object, h call object, j call object]

•   Prototype chains don’t slow us down, but deep closure nesting does. Why?

•   Every call to f generates a unique closure object to hold x.

•   The engine must walk up to x each time
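One way to limit this cost is to copy a closed-over variable into a local before a hot loop, so the scope chain is walked once rather than on every iteration. A minimal sketch; the names makeScaledSum, scale, and localScale are invented for illustration, and whether it helps at all depends on the engine:

```javascript
// Hypothetical example: `scale` lives in makeScaledSum's call object,
// three closure levels above the loop that uses it.
function makeScaledSum(values) {
  var scale = 2;
  return function level1() {
    return function level2() {
      return function level3() {
        // Read the outer variable once, then use the local copy in the loop.
        var localScale = scale;
        var sum = 0;
        for (var i = 0; i < values.length; ++i) {
          sum += values[i] * localScale;
        }
        return sum;
      };
    };
  };
}

var scaledSum = makeScaledSum([1, 2, 3])()()();
```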
Properties in the Slow Zone
Undefined Property (Fast on Firefox, Chrome)

var a = {};
a.x;

DOM Access (I only tested .id, so take with a grain of salt; other properties may differ)

var a = document.getElementById("foo");
a.id;

Scripted Getter (Fast on IE)

var a = { get x() { return 1; } };
a.x;

Scripted Setter

var a = { set x(y) { this.x_ = y; } };
a.x = 1;
The Type-Specializing JIT
                        Firefox 3.5+
                      (Tracemonkey)

                       Chrome 11+
                       (Crankshaft)
Types FTW!
        If only JavaScript had type declarations...

➡ The JIT would know the type of every local variable
 ➡ Know exactly what action to use (no type checks)
   ➡ Local variables don’t need to be boxed (or unboxed)
                  We call this kind of JIT a
                 type-specializing JIT
But JS doesn’t have types

• Problem: JS doesn’t have type declarations
 • won’t have them any time soon
 • we don’t want to wait
• Solution: run the program for a bit, monitor types
• Then recompile optimized for those types
Running with the Type-Specializing JIT
                                                          Firefox 3.5+
                         On x = y + z:                    Chrome 11+
   ‣   read the operation x = y + z from memory
   ‣   read the inputs y and z from memory
   ‣   check the types of y and z and choose the action
   ‣   unbox y and z
   ‣   execute the action
   ‣   box the output x
   ‣   write the output x to memory
Further Optimization 1
Automatic Inlining

original code

function getPop(city) {
  return popdata[city.id];
}

for (var i = 0; i < N; ++i) {
  total += getPop(city);
}

JIT compiles as if you wrote this

for (var i = 0; i < N; ++i) {
  total += popdata[city.id];
}
Further Optimization 2
Loop Invariant Code Motion (LICM, “hoisting”)

original code

for (var i = 0; i < N; ++i) {
  total += a[i] *
        (1 + options.tax);
}

JIT compiles as if you wrote this

var f = 1 + options.tax;
for (var i = 0; i < N; ++i) {
  total += a[i] * f;
}
Optimize Only Hot Code
•   Type-specializing JITs can have a hefty startup cost
    •   Need to collect the type information

    •   Advanced compiler optimizations take longer to run

•   Therefore, type specialization is applied selectively
    •   Only on hot code

        •   Tracemonkey: hot = 70 iterations

        •   Crankshaft: hot = according to a profiler

    •   Only if judged to be worthwhile (incomprehensible heuristics)
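The monitor-then-specialize cycle can be imitated in plain JavaScript. This is a toy model only, not how Tracemonkey or Crankshaft are implemented: makeAdder, HOT_THRESHOLD, and the mode field are invented for illustration, and the threshold of 70 simply echoes the Tracemonkey figure above. The model runs a generic path while recording operand types, switches to a guarded fast path once the code is hot, and falls back to the generic path if the guard ever fails.

```javascript
var HOT_THRESHOLD = 70; // assumption: borrowed from the Tracemonkey bullet

function makeAdder() {
  var calls = 0;
  var onlyNumbersSeen = true;
  var state = { mode: "generic" };

  // Generic path: handles any operand types and records what it sees.
  function genericAdd(y, z) {
    if (typeof y !== "number" || typeof z !== "number") {
      onlyNumbersSeen = false;
    }
    calls += 1;
    if (calls >= HOT_THRESHOLD && onlyNumbersSeen) {
      state.mode = "specialized"; // hot and type-stable: switch paths
    }
    return y + z;
  }

  // Specialized path: a cheap guard, then the number-only case.
  // (In a real JIT the fast case would be unboxed machine arithmetic.)
  function specializedAdd(y, z) {
    if (typeof y !== "number" || typeof z !== "number") {
      state.mode = "generic"; // guard failed: fall back
      onlyNumbersSeen = false;
    }
    return y + z;
  }

  state.add = function (y, z) {
    return state.mode === "specialized" ? specializedAdd(y, z)
                                        : genericAdd(y, z);
  };
  return state;
}

var adder = makeAdder();
var total = 0;
for (var i = 0; i < 100; ++i) {
  total = adder.add(total, i);
}
var modeAfterWarmup = adder.mode;      // number-only calls made it hot
var mixed = adder.add("a", 1);         // a string operand fails the guard
var modeAfterTypeChange = adder.mode;
```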
Current Limitations
• What happens if the types change after compiling?
    •   Just a few changes -> recompile, slight slowdown
    •   Many changes -> give up and deoptimize to basic JIT
•   Array elements, object properties, and closed-over variables
    •   Usually still boxed
    •   Still need to check type and unbox on get, box on set
    •   Typed arrays might help, but support is not always there yet
•   JS semantics require overflow checks for integer math
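The overflow check exists because JavaScript numbers must behave like doubles even when an engine stores them as 32-bit integers internally. A small sketch of the observable behavior, plus a typed array whose elements all have one fixed numeric type (variable names are illustrative):

```javascript
var maxInt32 = 2147483647; // largest 32-bit signed integer

// The language requires the mathematically correct result, so an engine
// using int32 internally must notice the overflow and switch to a double.
var widened = maxInt32 + 1;

// `| 0` explicitly requests 32-bit wraparound instead.
var wrapped = (maxInt32 + 1) | 0;

// Every element of a Float64Array is a double, so reads never need a
// per-element type check; other values are converted on store.
var samples = new Float64Array(4);
samples[0] = 1.5;
samples[1] = "2"; // converted to the number 2
var sampleSum = samples[0] + samples[1];
```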
Type Inference for JITs
                          Current
                          Research
                          @Mozilla
Type Inference

•   Trying to get rid of the last few instances of boxing
    (from before: array and object properties)

•   Idea: use static program analysis to prove types

    •   of object props, array elements, called functions

    •   or, almost prove types, and also prove minimal checks needed
Type Inference Example
Type inference gets this...

var a = [];
for (var i = 0; i < N; ++i) {
  a[i] = i * i;
}

var sum = 0;
for (var i = 0; i < N; ++i) {
  sum += a[i];
}

“i is always a number,
so i * i is always a number,
so a[_] is always a number!”

...but not this.

var a = [];
for (var i = 0; i < N; ++i) {
  if (i % 2)
    a[i] = i * i;
  else
    a[i] = "foo";
}

var sum = 0;
for (var i = 0; i < N; ++i) {
  if (i % 2)
    sum += a[i];
}
Type-stable JavaScript
      The key to running faster in future JITs is

                type-stable JavaScript.

        This means JavaScript where you could
declare a single engine-internal type for each variable.
Type-stable JS: examples
Type-stable

var g = 34;

var o1 = { a: 56 };
var o2 = { a: 99 };

for (var i = 0; i < 10; ++i) {
  var o = i % 2 ? o1 : o2;
  g += o.a;
}

g = 0;

NOT type-stable

var g = 34;

var o1 = { a: 56 };
var o2 = { z: 22, a: 56 };

for (var i = 0; i < 10; ++i) {
  var o = i % 2 ? o1 : o2;
  g += o.a;      // Different shapes
}

g = "hello";     // Type change
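One way to repair the NOT type-stable version, sketched under the assumption that an object's shape depends on which properties it has and the order they were added in: give both objects the same properties in the same order, and keep g a number throughout. The gAfterLoop variable is added only to make the result easy to inspect.

```javascript
var g = 34;

// Same property names, added in the same order, so both objects can
// share one shape (assumption about how the engine assigns shapes).
var o1 = { a: 56, z: 0 };
var o2 = { a: 56, z: 22 };

for (var i = 0; i < 10; ++i) {
  var o = i % 2 ? o1 : o2;
  g += o.a;
}

var gAfterLoop = g;

g = 0; // still a number, so the type of g never changes
```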
Garbage Collection
What Allocates Memory?
Objects

new Object();
new MyConstructor();
{ a: 4, b: 5 }
Object.create(proto);

Arrays

new Array();
[ 1, 2, 3, 4 ];

Strings

new String("hello");
"<p>" + e.innerHTML + "</p>"

Function Objects

var x = function () { ... }
new Function(code);

Closure Environments

function outer(name) {
  var x = name;
  return function inner() {
    return "Hi, " + name;
  }
}

name is stored in an implicitly created object!
GC Pauses Your Program!
[Timeline: JavaScript running, then GC running with JS paused]

•   Basic GC algorithm (mark and sweep)
    •   Traverse all reachable objects (from locals, window, DOM)
    •   Recycle objects that are not reachable

•   The JS program is paused during GC for safe traversal
•   Pauses may be long: 100 ms or more
    •   Serious problem for animation
    •   Can also be a drag on general performance
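The mark-and-sweep idea can be sketched over a toy heap. This is an illustration only, not engine code: the heap array, allocate, mark, and sweep are hypothetical helpers, and real collectors work on raw memory rather than JavaScript objects.

```javascript
// Toy heap: every allocated object is recorded so the sweep can visit it.
var heap = [];

function allocate(name) {
  var obj = { name: name, refs: [], marked: false };
  heap.push(obj);
  return obj;
}

// Mark: traverse everything reachable from the roots.
function mark(roots) {
  var pending = roots.slice();
  while (pending.length > 0) {
    var obj = pending.pop();
    if (!obj.marked) {
      obj.marked = true;
      for (var i = 0; i < obj.refs.length; ++i) {
        pending.push(obj.refs[i]);
      }
    }
  }
}

// Sweep: keep marked objects, recycle the rest, clear marks for next time.
function sweep() {
  var live = [];
  for (var i = 0; i < heap.length; ++i) {
    if (heap[i].marked) {
      heap[i].marked = false;
      live.push(heap[i]);
    }
  }
  var freed = heap.length - live.length;
  heap = live;
  return freed;
}

var root = allocate("window");
var kept = allocate("kept");
var garbage = allocate("garbage");
var cycleA = allocate("cycleA");
var cycleB = allocate("cycleB");
root.refs.push(kept);
cycleA.refs.push(cycleB);
cycleB.refs.push(cycleA); // an unreachable cycle is still collected

mark([root]);
var freedCount = sweep();
var liveNames = heap.map(function (o) { return o.name; });
```

The work in mark grows with the number of reachable objects, which is why fewer live objects mean shorter pauses for a collector of this kind.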
Reducing Pauses with Science 1
Generational GC (Chrome)

Idea: Optimize for creating many short-lived objects

Create objects in a frequently collected nursery area
Promote long-lived objects to a rarely collected tenured area

[Timeline, simple GC: JavaScript running, then GC running with JS paused]
[Timeline, generational GC: JavaScript running, interrupted by short nursery collections (<100 us) and an occasional tenured collection: fewer pauses!]
Generational GC by Example
scavenging young generation (aka nursery)
mark-and-sweep tenured generation

[Animation: Point, Point, Line (with fields a and b), Point, Message, and Point objects are allocated one after another in the nursery, while the tenured generation already holds two Messages and an Array. When the nursery is scavenged, the surviving Message is promoted to the tenured generation and the rest of the nursery is reclaimed.]
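Reducing Pauses with Science 2
Incremental GC (Current Research @Mozilla)

Idea: Do a little bit of GC traversal at a time

[Timeline, simple GC: JavaScript running, then GC running with JS paused]
[Timeline, incremental GC: GC work split into small slices between stretches of running JavaScript: shorter pauses!]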
Reducing Pauses in Practice
•   For all GCs
    •   Fewer live objects -> shorter pauses (if not incremental), less time spent in GC

•   For simple GCs
    •   Lower allocation rate (objects/second) -> less frequent pauses

•   For generational GCs
    •   Short-lived objects don’t affect pause frequency

    •   Long-lived objects cost extra (promotion = copying)
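Lowering the allocation rate often comes down to not creating a temporary object on every loop iteration. A minimal sketch with hypothetical function names; both versions compute the same centroid, but the second reuses a single scratch object:

```javascript
// Allocating version: one short-lived object per iteration.
function centroidAllocating(xs, ys) {
  var sumX = 0, sumY = 0;
  for (var i = 0; i < xs.length; ++i) {
    var p = { x: xs[i], y: ys[i] }; // new object each time around
    sumX += p.x;
    sumY += p.y;
  }
  return { x: sumX / xs.length, y: sumY / xs.length };
}

// Reusing version: one scratch object for the whole loop.
function centroidReusing(xs, ys) {
  var scratch = { x: 0, y: 0 };
  var sumX = 0, sumY = 0;
  for (var i = 0; i < xs.length; ++i) {
    scratch.x = xs[i];
    scratch.y = ys[i];
    sumX += scratch.x;
    sumY += scratch.y;
  }
  return { x: sumX / xs.length, y: sumY / xs.length };
}

var c1 = centroidAllocating([1, 2, 3], [4, 5, 6]);
var c2 = centroidReusing([1, 2, 3], [4, 5, 6]);
```

With a generational collector the first version may cost little, since the temporaries die in the nursery; with a simple collector it makes pauses more frequent.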
JavaScript Engines in Practice
Performance Faults
•   Performance fault: when a tiny change hurts performance

    •   Sometimes, just makes one statement slower

    •   Other times, deoptimizes the entire function!

•   Reasons we have performance faults

    •   bug, tends to get fixed quickly

    •   “rare” case, will get fixed if not rare

    •   hard to optimize, RSN...
Strings
•   In the Slow Zone, but some things are faster than you might think

•   .substring() is fast, O(1)

    •   Don’t need to copy characters, just point within original

•   Concatenation is also optimized

    •   Batch up inputs in a rope or concat tree, concat all at once

    •   Performance fault: prepending (Chrome, Opera)

// Prepending example
var s = "";
for (var i = 0; i < 100; ++i) {
  s = i + s;
}
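Where the order of the pieces is known in advance, a prepending loop can often be rewritten as an appending loop that walks the inputs in reverse. A sketch of that rewrite for the prepending example (the variable names are illustrative, and any speedup depends on the browser):

```javascript
// Prepending: each step puts the new piece in front.
var prepended = "";
for (var i = 0; i < 100; ++i) {
  prepended = i + prepended;
}

// Same string built by appending, visiting the indices in reverse order.
var appended = "";
for (var j = 99; j >= 0; --j) {
  appended += j;
}
```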
Arrays
fast: dense array

var a = [];
for (var i = 0; i < 100; ++i) {
  a[i] = 0;
}

3-15x slower: sparse array

var a = [];
a[10000] = 0;
for (var i = 0; i < 100; ++i) {
  a[i] = 0;
}
a.x = 7; // Fx, IE only

Want a fast array?

‣ Make sure it’s dense
‣ 0..N fill or push fill is always dense
‣ Huge gaps are always sparse
‣ N..0 fill is sparse on Firefox
‣ adding a named property is sparse on Firefox, IE
Iteration over Arrays
fastest: index iteration

// This runs entirely in JIT code,
// so it’s really fast.
for (var i = 0; i < a.length; ++i) {
  sum += a[i];
}

3-15x slower: functional style

// This makes N function calls,
// and most JITs don’t optimize
// through C++ reduce().
sum = a.reduce(function(a, b) {
  return a + b; });

20-80x slower: for-in

// This calls a C++ function to
// navigate the property list.
for (var i in a) {
  sum += a[i];
}
Functions
•   Function calls use ICs, so they are fast

    •   Manual inlining can still help sometimes

•   Key performance faults:

    • f.call() - 1.3-35x slower than f()
    • f.apply() - 5-50x slower than f()
    • arguments - often very slow, but varies
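A common way around these faults is to pass an array to a function with a named parameter instead of spreading values through f.apply() and reading arguments. A sketch with hypothetical function names; the two calls return the same result:

```javascript
// Variadic version: reads the arguments object and is invoked via apply().
function sumArgs() {
  var total = 0;
  for (var i = 0; i < arguments.length; ++i) {
    total += arguments[i];
  }
  return total;
}

// Array version: one named parameter, plain direct call.
function sumArray(values) {
  var total = 0;
  for (var i = 0; i < values.length; ++i) {
    total += values[i];
  }
  return total;
}

var data = [1, 2, 3, 4];
var viaApply = sumArgs.apply(null, data);
var viaDirectCall = sumArray(data);
```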
Creating Objects
              Creating objects is slow
Doesn’t matter too much how you create or populate


    Exception: Constructors on Chrome are fast
             function Cons(x, y, z) {
               this.x = x;
               this.y = y;
               this.z = z;
             }

             for (var i = 0; i < N; ++i)
              new Cons(i, i + 1, i * 2);
OOP Styling

Prototype
function Point(x, y) {
  this.x = x;
  this.y = y;
}
Point.prototype = {
  distance: function(pt2) ...
};

Information-Hiding
function Point(x, y) {
  return {
    distance: function(pt2) ...
  }
}

Instance Methods
function Point(x, y) {
  this.x = x;
  this.y = y;
  this.distance = function(pt2) ...
}

Prototype style is much faster to create: in the other two styles,
each new Point creates a fresh function object for distance
Using the objects costs about the same in all three styles
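
A minimal sketch of why the prototype style is cheaper to create. The slide elides the body of distance, so the Euclidean distance below is an assumption for illustration only, and the names ProtoPoint and InstancePoint are hypothetical.

```javascript
// Prototype style: one distance function shared by every instance.
function ProtoPoint(x, y) {
  this.x = x;
  this.y = y;
}
// Assumed body: the slide elides it; Euclidean distance is used for illustration.
ProtoPoint.prototype.distance = function (pt2) {
  var dx = this.x - pt2.x;
  var dy = this.y - pt2.y;
  return Math.sqrt(dx * dx + dy * dy);
};

// Instance-method style: every constructor call allocates a new function object.
function InstancePoint(x, y) {
  this.x = x;
  this.y = y;
  this.distance = function (pt2) {
    var dx = this.x - pt2.x;
    var dy = this.y - pt2.y;
    return Math.sqrt(dx * dx + dy * dy);
  };
}

var p1 = new ProtoPoint(0, 0);
var p2 = new ProtoPoint(3, 4);
var q1 = new InstancePoint(0, 0);
var q2 = new InstancePoint(3, 4);

console.log(p1.distance === p2.distance); // true: shared via the prototype
console.log(q1.distance === q2.distance); // false: one function object per instance
console.log(p1.distance(p2), q1.distance(q2)); // 5 5
```

Both styles compute the same result when called; the difference is the extra allocation each time an InstancePoint is constructed.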
Exceptions
•   Exceptions are assumed to be rare in perf-sensitive code

    •   running a try statement is free on most browsers

    •   throw/catch is really slow

•   There are many performance faults around exceptions

    •   just having a try statement deoptimizes on some browsers

    •   try-finally is a perf fault on some
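
One way to act on the points above, sketched under the assumption that the engine behaves as the slide describes: keep the hot loop in a function that contains no try statement, handle expected conditions with ordinary branches, and put the try/catch in a thin wrapper that runs once per call. The function names sumPositive and safeSumPositive are hypothetical.

```javascript
// Hot loop: no try statement in this function.
function sumPositive(values) {
  var sum = 0;
  for (var i = 0; i < values.length; ++i) {
    var v = values[i];
    // Expected, frequent condition: use a plain branch, not throw/catch.
    if (v < 0) continue;
    sum += v;
  }
  return sum;
}

// Thin wrapper: the try statement is entered once per call, not once per iteration.
function safeSumPositive(values) {
  try {
    return sumPositive(values);
  } catch (e) {
    // Rare case, for example values is null, so reading values.length throws.
    return 0;
  }
}

console.log(safeSumPositive([1, -2, 3])); // 4
console.log(safeSumPositive(null));       // 0
```

Whether the split actually helps depends on the engine and version, so it is worth measuring before restructuring real code.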
eval and with
Short version:
Do not use anywhere near performance sensitive code!

Mind-Bogglingly Awful
5-100x slower than using a function call
var sum = 0;
for (var i = 0; i < N; ++i) {
  sum = eval("sum + i");
}

Still Terrible
2-10x slower than without eval
var sum = 0;
eval("");
for (var i = 0; i < N; ++i) {
  sum = eval("sum + i");
}
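
For comparison, a sketch of the eval loop next to the function-call version the slide measures it against. The value of N and the helper name add are assumptions; the slides leave N undefined.

```javascript
var N = 1000; // assumed loop bound for illustration

// Pattern from the slide: eval runs on every iteration.
function sumWithEval() {
  var sum = 0;
  for (var i = 0; i < N; ++i) {
    sum = eval("sum + i");
  }
  return sum;
}

// Replacement: an ordinary function call does the same work.
function add(a, b) {
  return a + b;
}

function sumWithCall() {
  var sum = 0;
  for (var i = 0; i < N; ++i) {
    sum = add(sum, i);
  }
  return sum;
}

console.log(sumWithEval(), sumWithCall()); // 499500 499500
```

The two functions return the same value; only the second keeps eval out of the function that contains the loop.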
Top 5 Things to Know
5. Avoid eval, with, and exceptions near perf-sensitive code
4. Avoid creating objects in hot loops
3. Use dense arrays (know what causes sparseness)
2. Write type-stable code
1. ...
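
Items 4, 3, and 2 can be sketched together as follows. This is an illustration under stated assumptions, not a benchmark: N is an arbitrary size, the variable names are hypothetical, and the two hole-creating patterns shown are commonly cited causes of sparseness; which of them a given engine actually treats as sparse varies.

```javascript
var N = 1000; // assumed size for illustration

// 3. Dense array: fill indexes 0..N-1 in order, leaving no holes.
var dense = [];
for (var i = 0; i < N; ++i) {
  dense[i] = i * i;
}

// Two ways to end up with holes: writing far past the end, and delete.
var holey = [];
holey[N - 1] = 1;        // indexes 0..N-2 are holes
var punched = [1, 2, 3];
delete punched[1];       // index 1 becomes a hole; length stays 3

console.log(0 in dense, 0 in holey, 1 in punched); // true false false

// 2. Type-stable: sum only ever holds a number, and so does every element read.
var sum = 0;
for (var j = 0; j < N; ++j) {
  sum += dense[j];
}

// 4. Reuse one scratch object instead of allocating a new one per iteration.
var scratch = { x: 0, y: 0 };
var total = 0;
for (var k = 0; k < N; ++k) {
  scratch.x = k;
  scratch.y = k + 1;
  total += scratch.x * scratch.y;
}
```

Reusing scratch also keeps a single object shape flowing through the loop, which fits the type-stability advice in item 2.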
Talk To Us

JS engine developers want to help you. Tell us about:

   •   Performance faults you run into

   •   Exciting apps that require fast JS

   •   Anything interesting you discover about JS performance


  • 1. Know Your Engines How to Make Your JavaScript Fast Dave Mandelin June 15, 2011 O’Reilly Velocity
  • 2. 5 years of progress... 10 JavaScript 7.5 C run time vs. C 5 2.5 0 2006 2008 2011 one program on one popular browser: 10x faster!
  • 3. ...lost in an instant! function f() { var sum = 0; for (var i = 0; i < N; ++i) { sum += i; } } function f() { eval(“”); var sum = 0; for (var i = 0; i < N; ++i) { sum += i; } }
  • 4. ...lost in an instant! function f() { 80 var sum = 0; for (var i = 0; i < N; ++i) { sum += i; 60 } } 40 20 function f() { eval(“”); 0 without eval with eval var sum = 0; for (var i = 0; i < N; ++i) { sum += i; with eval(“”) up to } } 10x slower!
  • 5. Making JavaScript Fast Or, Not Making JavaScript Slow How JITs make JavaScript not slow How not to ruin animation with pauses How to write JavaScript that’s not slow
  • 7. Inside the 2006 JS Engine DOM Standard Front End Interpreter Library Garbage Collector
  • 8. Inside the 2006 JS Engine // JavaScript source e.innerHTML = n + “ items”; DOM Standard Front End Interpreter Library Garbage Collector
  • 9. Inside the 2006 JS Engine // JavaScript source e.innerHTML = n + “ items”; DOM Standard Front End Interpreter Library // bytecode (AST in some engines) Garbage tmp_0 = add var_1 str_3 Collector setprop var_0 ‘innerHTML’ tmp_0
  • 10. Inside the 2006 JS Engine // JavaScript source e.innerHTML = n + “ items”; DOM Standard Front End Interpreter Library Run the bytecode // bytecode (AST in some engines) Garbage tmp_0 = add var_1 str_3 Collector setprop var_0 ‘innerHTML’ tmp_0
  • 11. Inside the 2006 JS Engine // JavaScript source e.innerHTML = n + “ items”; DOM Standard Front End Interpreter Library Run the bytecode Reclaim memory // bytecode (AST in some engines) Garbage tmp_0 = add var_1 str_3 Collector setprop var_0 ‘innerHTML’ tmp_0
  • 12. Inside the 2006 JS Engine Set innerHTML // JavaScript source e.innerHTML = n + “ items”; DOM Standard Front End Interpreter Library Run the bytecode Reclaim memory // bytecode (AST in some engines) Garbage tmp_0 = add var_1 str_3 Collector setprop var_0 ‘innerHTML’ tmp_0
  • 13. Why it’s hard to make JS fast Because JavaScript is an untyped language. untyped = no type declarations
  • 14. Operations in an untyped language x = y + z can mean many things • if y and z are numbers, numeric addition • if y and z are strings, concatenation • and many other cases; y and z can have different types
  • 15. Engine-Internal Types JS engines use finer-grained types internally. JavaScript type number object
  • 16. Engine-Internal Types JS engines use finer-grained types internally. JavaScript type Engine type number 32-bit* integer 64-bit floating-point object
  • 17. Engine-Internal Types JS engines use finer-grained types internally. JavaScript type Engine type number 32-bit* integer 64-bit floating-point { a: 1 } { a: 1, b: 2 } object { a: get ... } { a: 1, __proto__ = new C }
  • 18. Engine-Internal Types JS engines use finer-grained types internally. JavaScript type Engine type number 32-bit* integer 64-bit floating-point { a: 1 } { a: 1, b: 2 } Different object { a: get ... } shapes { a: 1, __proto__ = new C }
  • 19. Values in an untyped language Because JavaScript is untyped, the interpreter needs boxed values. Boxed Unboxed Purpose Storage Computation Examples (INT, 55) 55 (STRING, “foo”) “foo” Definition (type tag, C++ value) C++ value only boxed values can be stored in variables, only unboxed values can be computed with (+, *, etc)
  • 20. Running Code in the Interpreter Here’s what the interpreter must do to execute x = y + z:
  • 21. Running Code in the Interpreter Here’s what the interpreter must do to execute x = y + z: ‣ read the operation x = y + z from memory
  • 22. Running Code in the Interpreter Here’s what the interpreter must do to execute x = y + z: ‣ read the operation x = y + z from memory ‣ read the boxed inputs y and z from memory
  • 23. Running Code in the Interpreter Here’s what the interpreter must do to execute x = y + z: ‣ read the operation x = y + z from memory ‣ read the boxed inputs y and z from memory ‣ check the types of y and z and choose the action
  • 24. Running Code in the Interpreter Here’s what the interpreter must do to execute x = y + z: ‣ read the operation x = y + z from memory ‣ read the boxed inputs y and z from memory ‣ check the types of y and z and choose the action ‣ unbox y and z
  • 25. Running Code in the Interpreter Here’s what the interpreter must do to execute x = y + z: ‣ read the operation x = y + z from memory ‣ read the boxed inputs y and z from memory ‣ check the types of y and z and choose the action ‣ unbox y and z ‣ execute the action
  • 26. Running Code in the Interpreter Here’s what the interpreter must do to execute x = y + z: ‣ read the operation x = y + z from memory ‣ read the boxed inputs y and z from memory ‣ check the types of y and z and choose the action ‣ unbox y and z ‣ execute the action ‣ box the output x
  • 27. Running Code in the Interpreter Here’s what the interpreter must do to execute x = y + z: ‣ read the operation x = y + z from memory ‣ read the boxed inputs y and z from memory ‣ check the types of y and z and choose the action ‣ unbox y and z ‣ execute the action ‣ box the output x ‣ write the boxed output x to memory
  • 28. Running Code in the Interpreter Here’s what the interpreter must do to execute x = y + z: ‣ read the operation x = y + z from memory ‣ read the boxed inputs y and z from memory ‣ check the types of y and z and choose the action ‣ unbox y and z This is the only real work! ‣ execute the action ‣ box the output x ‣ write the boxed output x to memory
  • 29. Running Code in the Interpreter Here’s what the interpreter must do to execute x = y + z: ‣ read the operation x = y + z from memory ‣ read the boxed inputs y and z from memory ‣ check the types of y and z and choose the action ‣ unbox y and z This is the only real work! ‣ execute the action ‣ box the output x Everything else is ‣ write the boxed output x to memory overhead.
  • 31. Inside the 2011 JS Engine Garbage Collector DOM Interpreter JavaScript source Standard Library Front End bytecode/AST
  • 32. Inside the 2011 JS Engine Garbage Collector DOM Interpreter JavaScript source Standard Library Front End JIT Compiler Compile to x86/x64/ARM bytecode/AST
  • 33. Inside the 2011 JS Engine Garbage Collector DOM Interpreter JavaScript source Standard Library Fast! x86/x64/ARM Front End JIT Compiler Compile to x86/x64/ARM CPU bytecode/AST
  • 34. Inside the 2011 JS Engine Garbage Collector DOM Interpreter JavaScript source Standard Library Fast! x86/x64/ARM Front End JIT Compiler Compile to x86/x64/ARM CPU Type-Specializing bytecode/AST JIT Compiler Ultra Fast!
  • 35. Inside the 2011 JS Engine Garbage Collector DOM Interpreter JavaScript source Standard Library Fast! x86/x64/ARM Front End JIT Compiler Compile to x86/x64/ARM CPU Type-Specializing bytecode/AST JIT Compiler Ultra Fast!
  • 36. Inside the 2011 JS Engine THE Garbage DOM Collector SLOW ZONE Interpreter JavaScript source Standard Library Fast! x86/x64/ARM Front End JIT Compiler Compile to x86/x64/ARM CPU Type-Specializing bytecode/AST JIT Compiler Ultra Fast!
  • 37. Running Code with the JIT All Major The basic JIT compiler on x = y + z: Browsers ‣ read the operation x = y + z from memory ‣ read the inputs y and z from memory ‣ check the types of y and z and choose the action ‣ unbox y and z ‣ execute the action ‣ box the output x ‣ write the output x to memory
  • 38. Running Code with the JIT All Major The basic JIT compiler on x = y + z: Browsers ‣ read the operation x = y + z from memory CPU does it for us! ‣ read the inputs y and z from memory ‣ check the types of y and z and choose the action ‣ unbox y and z ‣ execute the action ‣ box the output x ‣ write the output x to memory
  • 39. Running Code with the JIT All Major The basic JIT compiler on x = y + z: Browsers ‣ read the operation x = y + z from memory CPU does it for us! ‣ read the inputs y and z from memory ‣ check the types of y and z and choose the action ‣ unbox y and z ‣ execute the action ‣ box the output x ‣ write the output x to memory JIT code can keep things in registers
  • 40. Choosing the action in the JIT
  • 41. Choosing the action in the JIT • Many cases for operators like +
  • 42. Choosing the action in the JIT • Many cases for operators like + • Engines generate fast JIT code for “common cases” • number + number • string + string
  • 43. Choosing the action in the JIT • Many cases for operators like + • Engines generate fast JIT code for “common cases” • number + number • string + string • “Rare cases” run in the slow zone • number + undefined
  • 44. JITs for Regular Expressions All Major Browsers • There is a separate JIT for regular expressions • Regular expressions are generally faster than manual search • Still in the slow zone: • Some complex regexes (example: backreferences) • Building result arrays (test much faster than exec)
  • 45. Object Properties function f(obj) { return obj.a + 1; }
  • 46. Object Properties function f(obj) { return obj.a + 1; } • Need to search obj for a property named a slow
  • 47. Object Properties function f(obj) { return obj.a + 1; } • Need to search obj for a property named a slow • May need to search prototype chain up several levels super-slow
  • 48. Object Properties function f(obj) { return obj.a + 1; } • Need to search obj for a property named a slow • May need to search prototype chain up several levels super-slow • Finally, once we’ve found it, get the property value fast!
  • 49. ICs: a mini-JIT for objects All Major Browsers
  • 50. ICs: a mini-JIT for objects All Major Browsers • Properties become fast with inline caching (we prefer IC)
  • 51. ICs: a mini-JIT for objects All Major Browsers • Properties become fast with inline caching (we prefer IC) • Basic plan:
  • 52. ICs: a mini-JIT for objects All Major Browsers • Properties become fast with inline caching (we prefer IC) • Basic plan: 1. First time around, search for the property in the Slow Zone
  • 53. ICs: a mini-JIT for objects All Major Browsers • Properties become fast with inline caching (we prefer IC) • Basic plan: 1. First time around, search for the property in the Slow Zone 2. But record the steps done to actually get the property
  • 54. ICs: a mini-JIT for objects All Major Browsers • Properties become fast with inline caching (we prefer IC) • Basic plan: 1. First time around, search for the property in the Slow Zone 2. But record the steps done to actually get the property 3. Then JIT a little piece of code that does just that
  • 55. ICs: Example Example Code var obj1 = { a: 1, b: 2, c: 3 }; var obj2 = { b: 2 }; function f(obj) { return obj.b + 1; }
  • 56. ICs: Example Example Code var obj1 = { a: 1, b: 2, c: 3 }; var obj2 = { b: 2 }; function f(obj) { return obj.b + 1; } Generated JIT Code ... jump slowPropAccess slowPropAccess: continue_1: ... set up call ... call ICGetProp ; C++ Slow Zone jump continue_1
  • 57. ICs: Example Example Code shape=12, in position 1 var obj1 = { a: 1, b: 2, c: 3 }; var obj2 = { b: 2 }; function f(obj) { return obj.b + 1; } Generated JIT Code ... jump slowPropAccess slowPropAccess: continue_1: ... set up call ... call ICGetProp ; C++ Slow Zone jump continue_1
  • 58. ICs: Example Example Code icStub_1: shape=12, in position 1 compare obj.shape, 12 var obj1 = { a: 1, b: 2, c: 3 }; jumpIfFalse slowPropAccess var obj2 = { b: 2 }; load obj.props[1] jump continue_1 function f(obj) { return obj.b + 1; } Generated JIT Code ... jump slowPropAccess slowPropAccess: continue_1: ... set up call ... call ICGetProp ; C++ Slow Zone jump continue_1
  • 59. ICs: Example Example Code icStub_1: shape=12, in position 1 compare obj.shape, 12 var obj1 = { a: 1, b: 2, c: 3 }; jumpIfFalse slowPropAccess var obj2 = { b: 2 }; load obj.props[1] jump continue_1 function f(obj) { return obj.b + 1; } Generated JIT Code ... jump slowPropAccess slowPropAccess: continue_1: ... set up call ... call ICGetProp ; C++ Slow Zone jump continue_1
  • 60. ICs: Example Example Code icStub_1: shape=12, in position 1 compare obj.shape, 12 var obj1 = { a: 1, b: 2, c: 3 }; jumpIfFalse slowPropAccess var obj2 = { b: 2 }; load obj.props[1] jump continue_1 function f(obj) { return obj.b + 1; } Generated JIT Code ... jump slowPropAccess slowPropAccess: continue_1: ... set up call ... call ICGetProp ; C++ Slow Zone jump continue_1
  • 61. ICs: Example Example Code icStub_1: shape=12, in position 1 compare obj.shape, 12 var obj1 = { a: 1, b: 2, c: 3 }; jumpIfFalse slowPropAccess var obj2 = { b: 2 }; load obj.props[1] shape=15, in position 0 jump continue_1 function f(obj) { return obj.b + 1; } Generated JIT Code ... jump slowPropAccess slowPropAccess: continue_1: ... set up call ... call ICGetProp ; C++ Slow Zone jump continue_1
  • 62. ICs: Example Example Code icStub_1: shape=12, in position 1 compare obj.shape, 12 var obj1 = { a: 1, b: 2, c: 3 }; jumpIfFalse slowPropAccess var obj2 = { b: 2 }; load obj.props[1] shape=15, in position 0 jump continue_1 function f(obj) { return obj.b + 1; } icStub_2: compare obj.shape, 15 jumpIfFalse slowPropAccess Generated JIT Code load obj.props[0] jump continue_1 ... jump slowPropAccess slowPropAccess: continue_1: ... set up call ... call ICGetProp ; C++ Slow Zone jump continue_1
  • 63. ICs: Example Example Code icStub_1: shape=12, in position 1 compare obj.shape, 12 var obj1 = { a: 1, b: 2, c: 3 }; jumpIfFalse slowPropAccess var obj2 = { b: 2 }; load obj.props[1] shape=15, in position 0 jump continue_1 function f(obj) { return obj.b + 1; } icStub_2: compare obj.shape, 15 jumpIfFalse slowPropAccess Generated JIT Code load obj.props[0] jump continue_1 ... jump slowPropAccess slowPropAccess: continue_1: ... set up call ... call ICGetProp ; C++ Slow Zone jump continue_1
  • 64. ICs: Example Example Code icStub_1: shape=12, in position 1 compare obj.shape, 12 var obj1 = { a: 1, b: 2, c: 3 }; jumpIfFalse slowPropAccess var obj2 = { b: 2 }; load obj.props[1] shape=15, in position 0 jump continue_1 function f(obj) { return obj.b + 1; } icStub_2: compare obj.shape, 15 jumpIfFalse slowPropAccess Generated JIT Code load obj.props[0] jump continue_1 ... jump slowPropAccess slowPropAccess: continue_1: ... set up call ... call ICGetProp ; C++ Slow Zone jump continue_1
  • 65. These are fast because of ICs Global Variable Access var q = 4; var r; function f(obj) { r = q; }
  • 66. These are fast because of ICs Global Variable Access var q = 4; var r; function f(obj) { r = q; } Direct Property Access var obj1 = { a: 1, b: 2, c: 3 }; var obj2 = { b: 2 }; function f(obj) { obj2.b = obj1.c; }
  • 67. These are fast because of ICs Global Variable Access Closure Variable Access var q = 4; var f = function() { var r; var x = 1; var g = function() { function f(obj) { var sum = 0; r = q; for (var i = 0; i < N; ++i) { } sum += x; } return sum; Direct Property Access } return g(); var obj1 = { a: 1, b: 2, c: 3 }; } var obj2 = { b: 2 }; function f(obj) { obj2.b = obj1.c; }
  • 68. Prototypes don’t hurt much function A(x) { this.x = x; } function B(y) { this.y = y; } B.prototype = new A; function C(z) { this.z = z; } C.prototype = new B;
  • 69. Prototypes don’t hurt much new A function A(x) { this.x = x; } new B function B(y) { proto this.y = y; } new C(1) B.prototype = new A; function C(z) { this.z = z; } C.prototype = new B;
  • 70. Prototypes don’t hurt much new A function A(x) { this.x = x; } new B function B(y) { proto this.y = y; } new C(1) new C(2) B.prototype = new A; function C(z) { this.z = z; } C.prototype = new B;
  • 71. Prototypes don’t hurt much new A function A(x) { this.x = x; } new B function B(y) { proto this.y = y; } new C(1) new C(2) new C(3) B.prototype = new A; function C(z) { this.z = z; } C.prototype = new B;
  • 72. Prototypes don’t hurt much new A function A(x) { this.x = x; } new B function B(y) { proto this.y = y; } new C(1) new C(2) new C(3) B.prototype = new A; function C(z) { this.z = z; Shape of new C objects determines prototype } C.prototype = new B;
  • 73. Prototypes don’t hurt much new A function A(x) { this.x = x; } new B function B(y) { proto this.y = y; } new C(1) new C(2) new C(3) B.prototype = new A; function C(z) { this.z = z; Shape of new C objects determines prototype } C.prototype = new B; -> IC can generate code that checks shape, then reads directly from prototype without walking
  • 74. Many Shapes Slow Down ICs What happens if many shapes of obj are passed to f? function f(obj) { return obj.p; } ICs end up looking like this:
  • 75. Many Shapes Slow Down ICs What happens if many shapes of obj are passed to f? function f(obj) { return obj.p; } ICs end up looking like this: jumpIf shape != 12 read for shape 12
  • 76. Many Shapes Slow Down ICs What happens if many shapes of obj are passed to f? function f(obj) { return obj.p; } ICs end up looking like this: jumpIf shape != 12 read for shape 12 jumpIf shape != 15 read for shape 15
  • 77. Many Shapes Slow Down ICs What happens if many shapes of obj are passed to f? function f(obj) { return obj.p; } ICs end up looking like this: jumpIf shape != 12 read for shape 12 jumpIf shape != 15 read for shape 15 jumpIf shape != 6 read for shape 6
  • 78. Many Shapes Slow Down ICs What happens if many shapes of obj are passed to f? function f(obj) { return obj.p; } ICs end up looking like this: ... jumpIf shape != 12 jumpIf shape != 16 read for shape 12 read for shape 16 jumpIf shape != 15 jumpIf shape != 22 read for shape 15 read for shape 22 jumpIf shape != 6 jumpIf shape != 3 read for shape 6 read for shape 3
  • 79. Many shapes in practice 100 IE IE Slow Zone for 2+ shapes Opera Chrome 75 Opera # of shapes doesn’t matter! nanoseconds/iteration Firefox Safari 50 Chrome more shapes -> slower Firefox 25 slower with more shapes, but levels off in Slow Zone Safari 0 1 2 8 16 32 100 200 # of shapes at property read site
  • 80. Deeply Nested Closures are Slower var f = function() { var x; var g = function() { var h = function() { var y; var i = function () { var j = function() { z = x + y;
  • 81. Deeply Nested Closures are Slower var f = function() { f call object var x; var g = function() { var h = function() { h call object var y; var i = function () { var j = function() { j call object z = x + y; First call to f
  • 82. Deeply Nested Closures are Slower var f = function() { f call object f call object var x; var g = function() { var h = function() { h call object h call object var y; var i = function () { var j = function() { j call object j call object z = x + y; First call to f Second call to f
  • 83. Deeply Nested Closures are Slower var f = function() { f call object f call object var x; var g = function() { var h = function() { h call object h call object var y; var i = function () { var j = function() { j call object j call object z = x + y; First call to f Second call to f • Prototype chains don’t slow us down, but deep closure nesting does. Why?
  • 84. Deeply Nested Closures are Slower var f = function() { f call object f call object var x; var g = function() { var h = function() { h call object h call object var y; var i = function () { var j = function() { j call object j call object z = x + y; First call to f Second call to f • Prototype chains don’t slow us down, but deep closure nesting does. Why? • Every call to f generates a unique closure object to hold x.
  • 85. Deeply Nested Closures are Slower var f = function() { f call object f call object var x; var g = function() { var h = function() { h call object h call object var y; var i = function () { var j = function() { j call object j call object z = x + y; First call to f Second call to f • Prototype chains don’t slow us down, but deep closure nesting does. Why? • Every call to f generates a unique closure object to hold x. • The engine must walk up to x each time
  • 86. Properties in the Slow Zone
  • 87. Properties in the Slow Zone Undefined Property (Fast on Firefox, Chrome) var a = {}; a.x;
  • 88. Properties in the Slow Zone Undefined Property (Fast on Firefox, Chrome) var a = {}; a.x; DOM Access (I only tested .id, so take with a grain of salt-- other properties may differ) var a = document.getByElementId(“foo”); a.id;
  • 89. Properties in the Slow Zone Undefined Property Scripted Getter (Fast on Firefox, Chrome) (Fast on IE) var a = {}; var a = { x: get() { return 1; } }; a.x; a.x; DOM Access (I only tested .id, so take with a grain of salt-- other properties may differ) var a = document.getByElementId(“foo”); a.id;
  • 90. Properties in the Slow Zone Undefined Property Scripted Getter (Fast on Firefox, Chrome) (Fast on IE) var a = {}; var a = { x: get() { return 1; } }; a.x; a.x; DOM Access Scripted Setter (I only tested .id, so take with a grain of salt-- other properties may differ) var a = { x: set(y) { this.x_ = y; } }; a.x = 1; var a = document.getByElementId(“foo”); a.id;
  • 91. The Type-Specializing JIT Firefox 3.5+ (Tracemonkey) Chrome 11+ (Crankshaft)
  • 92. Types FTW! If only JavaScript had type declarations...
  • 93. Types FTW! If only JavaScript had type declarations... ➡ The JIT would know the type of every local variable
  • 94. Types FTW! If only JavaScript had type declarations... ➡ The JIT would know the type of every local variable ➡ Know exactly what action to use (no type checks)
  • 95. Types FTW! If only JavaScript had type declarations... ➡ The JIT would know the type of every local variable ➡ Know exactly what action to use (no type checks) ➡ Local variables don’t need to be boxed (or unboxed)
  • 96. Types FTW! If only JavaScript had type declarations... ➡ The JIT would know the type of every local variable ➡ Know exactly what action to use (no type checks) ➡ Local variables don’t need to be boxed (or unboxed) We call this kind of JIT a type-specializing JIT
  • 97. But JS doesn’t have types
  • 98. But JS doesn’t have types • Problem: JS doesn’t have type declarations • won’t have them any time soon • we don’t want to wait
  • 99. But JS doesn’t have types • Problem: JS doesn’t have type declarations • won’t have them any time soon • we don’t want to wait • Solution: run the program for a bit, monitor types
  • 100. But JS doesn’t have types • Problem: JS doesn’t have type declarations • won’t have them any time soon • we don’t want to wait • Solution: run the program for a bit, monitor types • Then recompile optimized for those types
  • 101. Running with the Type-Specializing JIT Firefox 3.5+ On x = y + z: Chrome 11+ ‣ read the operation x = y + z from memory ‣ read the inputs y and z from memory ‣ check the types of y and z and choose the action ‣ unbox y and z ‣ execute the action ‣ box the output x ‣ write the output x to memory
  • 102. Running with the Type-Specializing JIT Firefox 3.5+ On x = y + z: Chrome 11+ ‣ read the operation x = y + z from memory ‣ read the inputs y and z from memory ‣ check the types of y and z and choose the action ‣ unbox y and z ‣ execute the action ‣ box the output x ‣ write the output x to memory
  • 103. Running with the Type-Specializing JIT Firefox 3.5+ On x = y + z: Chrome 11+ ‣ read the operation x = y + z from memory ‣ read the inputs y and z from memory ‣ check the types of y and z and choose the action ‣ unbox y and z ‣ execute the action ‣ box the output x ‣ write the output x to memory
  • 104. Further Optimization 1 Automatic Inlining original code function getPop(city) { return popdata[city.id]; } for (var i = 0; i < N; ++i) { total += getPop(city); }
  • 105. Further Optimization 1 Automatic Inlining original code JIT compiles as if function getPop(city) { you wrote this return popdata[city.id]; } for (var i = 0; i < N; ++i) { total += popdata[city.id]; for (var i = 0; i < N; ++i) { } total += getPop(city); }
  • 106. Further Optimization 2 Loop Invariant Code Motion (LICM, “hoisting”) original code for (var i = 0; i < N; ++i) { total += a[i] * (1 + options.tax); }
  • 107. Further Optimization 2 Loop Invariant Code Motion (LICM, “hoisting”) original code JIT compiles as if you wrote this for (var i = 0; i < N; ++i) { var f = 1 + options.tax; total += a[i] * for (var i = 0; i < N; ++i) { (1 + options.tax); total += a[i] * f; } }
  • 109. Optimize Only Hot Code • Type-specializing JITs can have a hefty startup cost • Need to collect the type information • Advanced compiler optimizations take longer to run
  • 110. Optimize Only Hot Code • Type-specializing JITs can have a hefty startup cost • Need to collect the type information • Advanced compiler optimizations take longer to run • Therefore, type specialization is applied selectively • Only on hot code • Tracemonkey: hot = 70 iterations • Crankshaft: hot = according to a profiler • Only if judged to be worthwhile (incomprehensible heuristics)
  • 112. Current Limitations • What happens if the types change after compiling? • Just a few changes -> recompile, slight slowdown • Many changes -> give up and deoptimize to basic JIT
  • 113. Current Limitations • What happens if the types change after compiling? • Just a few changes -> recompile, slight slowdown • Many changes -> give up and deoptimize to basic JIT • Array elements, object properties, and closed-over variables • Usually still boxed • Still need to check type and unbox on get, box on set • Typed arrays might help, but support is not always there yet
  • 114. Current Limitations • What happens if the types change after compiling? • Just a few changes -> recompile, slight slowdown • Many changes -> give up and deoptimize to basic JIT • Array elements, object properties, and closed-over variables • Usually still boxed • Still need to check type and unbox on get, box on set • Typed arrays might help, but support is not always there yet • JS semantics require overflow checks for integer math
  • 115. Type Inference for JITs Current Research @Mozilla
  • 117. Type Inference • Trying to get rid of the last few instances of boxing (from before: array and object properties)
  • 118. Type Inference • Trying to get rid of the last few instances of boxing (from before: array and object properties) • Idea: use static program analysis to prove types • of object props, array elements, called functions • or, almost prove types, and also prove minimal checks needed
  • 119. Type Inference Example var a = []; for (var i = 0; i < N; ++i) { a[i] = i * i; ] var sum = 0; for (var i = 0; i < N; ++i) { sum += a[i]; } Type inference gets this...
  • 120. Type Inference Example var a = []; for (var i = 0; i < N; ++i) { a[i] = i * i; ] var sum = 0; for (var i = 0; i < N; ++i) { sum += a[i]; } Type inference gets this... “i is always a number, so i * i is always a number, so a[_] is always a number!”
  • 121. Type Inference Example var a = []; var a = []; for (var i = 0; i < N; ++i) { for (var i = 0; i < N; ++i) { a[i] = i * i; if (i % 2) ] a[i] = i * i; else var sum = 0; a[i] = “foo”; for (var i = 0; i < N; ++i) { ] sum += a[i]; } var sum = 0; for (var i = 0; i < N; ++i) { if (i % 2) Type inference gets this... sum += a[i]; } “i is always a number, so i * i is always a number, ...but not this. so a[_] is always a number!”
  • 122. Type-stable JavaScript The key to running faster in future JITs is type-stable JavaScript. This means JavaScript where you could declare a single engine-internal type for each variable.
  • 123. Type-stable JS: examples Type-stable var g = 34; var o1 = { a: 56 }; var o2 = { a: 99 }; for (var i = 0; i < 10; ++i) { var o = i % 2 ? o1 : o2; g += o.a; } g = 0;
  • 124. Type-stable JS: examples Type-stable NOT type-stable var g = 34; var g = 34; var o1 = { a: 56 }; var o1 = { a: 56 }; var o2 = { a: 99 }; var o2 = { z: 22, a: 56 }; for (var i = 0; i < 10; ++i) { for (var i = 0; i < 10; ++i) { var o = i % 2 ? o1 : o2; var o = i % 2 ? o1 : o2; g += o.a; g += o.a; } } g = 0; g = “hello”;
  • 125. Type-stable JS: examples Type-stable NOT type-stable var g = 34; var g = 34; var o1 = { a: 56 }; var o1 = { a: 56 }; var o2 = { a: 99 }; var o2 = { z: 22, a: 56 }; for (var i = 0; i < 10; ++i) { for (var i = 0; i < 10; ++i) { var o = i % 2 ? o1 : o2; var o = i % 2 ? o1 : o2; g += o.a; g += o.a; Different shapes } } g = 0; g = “hello”;
  • 126. Type-stable JS: examples Type-stable NOT type-stable var g = 34; var g = 34; var o1 = { a: 56 }; var o1 = { a: 56 }; var o2 = { a: 99 }; var o2 = { z: 22, a: 56 }; for (var i = 0; i < 10; ++i) { for (var i = 0; i < 10; ++i) { var o = i % 2 ? o1 : o2; var o = i % 2 ? o1 : o2; g += o.a; g += o.a; Different shapes } } g = 0; g = “hello”; Type change
  • 128. What Allocates Memory? Objects new Object(); new MyConstructor(); { a: 4, b: 5 } Object.create(); Arrays new Array(); [ 1, 2, 3, 4 ]; Strings new String(“hello”); “<p>” + e.innerHTML + “</p>”
  • 129. What Allocates Memory? Objects Function Objects new Object(); var x = function () { ... } new MyConstructor(); new Function(code); { a: 4, b: 5 } Object.create(); Arrays new Array(); [ 1, 2, 3, 4 ]; Strings new String(“hello”); “<p>” + e.innerHTML + “</p>”
  • 130. What Allocates Memory? Objects Function Objects new Object(); var x = function () { ... } new MyConstructor(); new Function(code); { a: 4, b: 5 } Object.create(); Arrays Closure Environments new Array(); function outer(name) { [ 1, 2, 3, 4 ]; var x = name; return function inner() { return “Hi, “ + name; Strings } } new String(“hello”); “<p>” + e.innerHTML + “</p>”
  • 131. What Allocates Memory? Objects Function Objects new Object(); var x = function () { ... } new MyConstructor(); new Function(code); { a: 4, b: 5 } Object.create(); Arrays Closure Environments new Array(); function outer(name) { [ 1, 2, 3, 4 ]; var x = name; return function inner() { return “Hi, “ + name; Strings } } new String(“hello”); name is stored in an “<p>” + e.innerHTML + “</p>” implicitly created object!
  • 135. GC Pauses Your Program!
      [Timeline diagram: periods of "JavaScript running" alternate with periods of "GC running", during which JS is paused]
      • Basic GC algorithm (mark and sweep)
          • Traverse all reachable objects (from locals, window, DOM)
          • Recycle objects that are not reachable
      • The JS program is paused during GC for safe traversal
      • Pauses may be long: 100 ms or more
          • Serious problem for animation
          • Can also be a drag on general performance
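The mark-and-sweep steps above can be sketched as a toy collector over a hand-built object graph. This is an illustration only, not how any browser engine implements its GC: the `heap` array, `alloc`, `mark`, `sweep` and `collect` are all hypothetical names, and the "objects" are plain records with an explicit `refs` list.

```javascript
// Toy heap: every allocated record is tracked so the sweep can visit it.
var heap = [];

function alloc(name) {
  var obj = { name: name, refs: [], marked: false };
  heap.push(obj);
  return obj;
}

// Mark phase: traverse everything reachable from the roots.
function mark(roots) {
  var stack = roots.slice();
  while (stack.length > 0) {
    var obj = stack.pop();
    if (obj.marked) continue;      // already visited (also handles cycles)
    obj.marked = true;
    for (var i = 0; i < obj.refs.length; ++i) {
      stack.push(obj.refs[i]);
    }
  }
}

// Sweep phase: keep marked records, recycle the rest, reset the marks.
function sweep() {
  var live = [];
  for (var i = 0; i < heap.length; ++i) {
    if (heap[i].marked) {
      heap[i].marked = false;
      live.push(heap[i]);
    }
  }
  var freed = heap.length - live.length;
  heap = live;
  return freed;                    // number of records recycled
}

function collect(roots) {
  mark(roots);
  return sweep();
}
```

The work in `mark` grows with the number of reachable records, which is why a program with many live objects sees longer pauses when the whole traversal runs in one go.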
  • 144. Reducing Pauses with Science I: Generational GC (Chrome)
      • Idea: Optimize for creating many short-lived objects
      • Create objects in a frequently collected nursery area
      • Promote long-lived objects to a rarely collected tenured area
      [Timeline diagram: with a simple GC, JS is paused each time the GC runs; with a generational GC there are fewer pauses, made up of frequent nursery collections (<100 µs) and occasional tenured collections]
  • 145-155. Generational GC by Example
      [Animation: the young generation (aka nursery) is collected by scavenging; the tenured generation is collected by mark-and-sweep and initially holds Message, Message, Array. New objects (Point, Point, Line, Point, Message, Point) are allocated in the nursery, with references a and b from Line. At the nursery collection the surviving Message is promoted to the tenured generation and the nursery is emptied.]
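The nursery-then-promote flow in this animation can be imitated with two arrays. This is a deliberately simplified sketch: the names `nursery`, `tenured`, `allocate` and `collectNursery` are hypothetical, and the `survivors` argument is assumed to list the live nursery objects directly, where a real scavenger would find them by traversing from the roots.

```javascript
var nursery = [];   // young generation: collected often
var tenured = [];   // old generation: collected rarely

// All new objects start out in the nursery.
function allocate(kind) {
  var obj = { kind: kind };
  nursery.push(obj);
  return obj;
}

// Minor collection: promote (copy) survivors, then empty the nursery.
// Assumption: `survivors` already lists the live nursery objects.
function collectNursery(survivors) {
  var promoted = 0;
  for (var i = 0; i < nursery.length; ++i) {
    if (survivors.indexOf(nursery[i]) !== -1) {
      tenured.push(nursery[i]);
      ++promoted;
    }
  }
  nursery = [];     // everything not promoted is simply dropped
  return promoted;
}
```

In this sketch dead objects cost nothing at collection time beyond the scan, while each survivor costs a copy, which matches the slide deck's later point that long-lived objects cost extra under a generational GC.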
  • 160. Reducing Pauses with Science II: Incremental GC (current research @Mozilla)
      • Idea: Do a little bit of GC traversal at a time
      [Timeline diagram: with a simple GC, JS is paused for the whole time the GC runs; with an incremental GC the same work is split into slices, giving shorter pauses]
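"A little bit of traversal at a time" can be sketched as a marker that does a bounded amount of work per call. The name `createMarker` and the `budget` parameter are hypothetical. The sketch also assumes the object graph does not change between slices; a real incremental collector has to cope with the program mutating the graph while marking is in progress, which this example ignores.

```javascript
// Returns a marker whose step(budget) marks at most `budget` objects
// and reports whether the traversal has finished.
function createMarker(roots) {
  var pending = roots.slice();
  return {
    step: function (budget) {
      while (budget > 0 && pending.length > 0) {
        var obj = pending.pop();
        if (obj.marked) continue;
        obj.marked = true;
        --budget;
        for (var i = 0; i < obj.refs.length; ++i) {
          pending.push(obj.refs[i]);
        }
      }
      return pending.length === 0;   // true once nothing is left to visit
    }
  };
}

// Build a chain n0 -> n1 -> n2 -> n3 -> n4 to traverse.
var nodes = [];
for (var k = 0; k < 5; ++k) {
  nodes.push({ refs: [], marked: false });
  if (k > 0) nodes[k - 1].refs.push(nodes[k]);
}

// Mark two objects per slice; the program would run between slices.
var marker = createMarker([nodes[0]]);
var slices = 0;
var done = false;
while (!done) {
  done = marker.step(2);
  ++slices;
}
```

The total marking work is unchanged; it is only spread over several short slices in place of one long pause.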
  • 164. Reducing Pauses in Practice
      • For all GCs
          • Fewer live objects -> shorter pauses (if not incremental), less time spent in GC
      • For simple GCs
          • Lower allocation rate (objects/second) -> less frequent pauses
      • For generational GCs
          • Short-lived objects don’t affect pause frequency
          • Long-lived objects cost extra (promotion = copying)
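One way to lower the allocation rate is to reuse a scratch object in a hot loop rather than allocating a new one per iteration. The two functions below are hypothetical illustrations that compute the same result; whether the reuse is worth the loss in clarity depends on the engine and on how hot the loop really is.

```javascript
// Allocating version: one fresh object per iteration.
function sumLengthsAllocating(xs, ys) {
  var total = 0;
  for (var i = 0; i < xs.length; ++i) {
    var p = { x: xs[i], y: ys[i] };
    total += Math.sqrt(p.x * p.x + p.y * p.y);
  }
  return total;
}

// Reusing version: a single scratch object for the whole loop.
function sumLengthsReusing(xs, ys) {
  var total = 0;
  var p = { x: 0, y: 0 };
  for (var i = 0; i < xs.length; ++i) {
    p.x = xs[i];
    p.y = ys[i];
    total += Math.sqrt(p.x * p.x + p.y * p.y);
  }
  return total;
}
```

Under a generational GC the allocating version may already be cheap, since the temporary objects die in the nursery; the reuse matters most on engines with a simple collector.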
  • 166. Performance Faults
      • Performance fault: when a tiny change hurts performance
          • Sometimes, just makes one statement slower
          • Other times, deoptimizes the entire function!
      • Reasons we have performance faults
          • bug, tends to get fixed quickly
          • “rare” case, will get fixed if not rare
          • hard to optimize, RSN...
  • 171. Strings
      • In the Slow Zone, but some things are faster than you might think
      • .substring() is fast, O(1)
          • Don’t need to copy characters, just point within original
      • Concatenation is also optimized
          • Batch up inputs in a rope or concat tree, concat all at once
          • Performance fault: prepending (Chrome, Opera)
      // Prepending example
      var s = "";
      for (var i = 0; i < 100; ++i) {
        s = i + s;
      }
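Where the order of the pieces is known up front, a prepending loop can often be rewritten as an appending loop that walks the inputs in reverse. The function names below are hypothetical; both build the same string.

```javascript
// Prepending: the pattern flagged as a performance fault on some engines.
function buildByPrepending(n) {
  var s = "";
  for (var i = 0; i < n; ++i) {
    s = i + s;
  }
  return s;
}

// Appending: same result, produced by iterating from the other end.
function buildByAppending(n) {
  var s = "";
  for (var i = n - 1; i >= 0; --i) {
    s += i;
  }
  return s;
}
```

This only restates the loop; whether it is measurably faster depends on how a given engine represents concatenated strings.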
  • 172. Arrays
      Want a fast array?
        ‣ Make sure it’s dense
        ‣ 0..N fill or push fill is always dense
        ‣ Huge gaps are always sparse
        ‣ N..0 fill is sparse on Firefox
        ‣ adding a named property is sparse on Firefox, IE
      fast: dense array
        var a = [];
        for (var i = 0; i < 100; ++i) {
          a[i] = 0;
        }
      3-15x slower: sparse array
        var a = [];
        a[10000] = 0;
        for (var i = 0; i < 100; ++i) {
          a[i] = 0;
        }
        a.x = 7; // Fx, IE only
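The difference between the two fills is visible from JavaScript itself, even though the dense or sparse representation is an engine detail. In this sketch `makeDense` and `makeWithGap` are hypothetical helper names.

```javascript
// 0..N push fill: no holes, so engines can keep it dense.
function makeDense(n) {
  var a = [];
  for (var i = 0; i < n; ++i) {
    a.push(0);
  }
  return a;
}

// Writing far past the end first leaves a huge gap of missing indexes.
function makeWithGap(n) {
  var a = [];
  a[10000] = 0;
  for (var i = 0; i < n; ++i) {
    a[i] = 0;
  }
  return a;
}

var dense = makeDense(100);
var gappy = makeWithGap(100);
```

Here `dense.length` is 100 and every index below it exists, while `gappy.length` is 10001 although only 101 indexes are actually present.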
  • 174. Iteration over Arrays
      fastest: index iteration
        // This runs entirely in JIT code,
        // so it’s really fast.
        for (var i = 0; i < a.length; ++i) {
          sum += a[i];
        }
      3-15x slower: functional style
        // This makes N function calls,
        // and most JITs don’t optimize
        // through C++ reduce().
        sum = a.reduce(function(a, b) {
          return a + b; });
      20-80x slower: for-in
        // This calls a C++ function to
        // navigate the property list.
        for (var i in a) {
          sum += a[i];
        }
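The three styles can be put side by side as functions that return the same sum. The function names are hypothetical, and the `reduce` version passes an initial value of 0 (not shown on the slide) so that it also works on an empty array.

```javascript
// Index iteration: the style the talk reports as fastest.
function sumIndex(a) {
  var sum = 0;
  for (var i = 0; i < a.length; ++i) {
    sum += a[i];
  }
  return sum;
}

// Functional style: one callback invocation per element.
function sumReduce(a) {
  return a.reduce(function (acc, value) {
    return acc + value;
  }, 0);
}

// for-in: enumerates property names (as strings), not just indexes.
function sumForIn(a) {
  var sum = 0;
  for (var key in a) {
    sum += a[key];
  }
  return sum;
}
```

Besides the speed difference reported in the talk, `for-in` also visits any extra enumerable properties on the array, so it is not a drop-in replacement for an index loop.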
  • 175. Functions
      • Function calls use ICs, so they are fast
      • Manual inlining can still help sometimes
      • Key performance faults:
          • f.call() - 1.3-35x slower than f()
          • f.apply() - 5-50x slower than f()
          • arguments - often very slow, but varies
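In hot code, the faults listed above can sometimes be sidestepped by calling the function directly and by passing an explicit array in place of reading `arguments`. All names below are hypothetical; each pair returns the same value.

```javascript
function add3(a, b, c) {
  return a + b + c;
}

// Indirect call through apply().
function viaApply(args) {
  return add3.apply(null, args);
}

// Direct call: possible when the number of arguments is known.
function viaDirect(args) {
  return add3(args[0], args[1], args[2]);
}

// Reads the arguments object.
function sumArguments() {
  var total = 0;
  for (var i = 0; i < arguments.length; ++i) {
    total += arguments[i];
  }
  return total;
}

// Takes an ordinary array parameter.
function sumArray(values) {
  var total = 0;
  for (var i = 0; i < values.length; ++i) {
    total += values[i];
  }
  return total;
}
```

The direct forms give up some generality, so they are a trade-off to reserve for code that profiling shows to be hot.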
  • 177. Creating Objects
      • Creating objects is slow
      • Doesn’t matter too much how you create or populate
      • Exception: Constructors on Chrome are fast
        function Cons(x, y, z) {
          this.x = x;
          this.y = y;
          this.z = z;
        }
        for (var i = 0; i < N; ++i)
          new Cons(i, i + 1, i * 2);
  • 183. OOP Styling
      Prototype
        function Point(x, y) {
          this.x = x;
          this.y = y;
        }
        Point.prototype = {
          distance: function(pt2) ...
        }
      Information-Hiding
        function Point(x, y) {
          return {
            distance: function(pt2) ...
          }
        }
      Instance Methods
        function Point(x, y) {
          this.x = x;
          this.y = y;
          this.distance = function(pt2) ...
        }
      • Prototype style is much faster to create (each closure creates a function object)
      • Using the objects is about the same
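The creation-cost difference comes from how many function objects each style allocates, which a short example can make visible. The constructor names `ProtoPoint` and `InstancePoint` are hypothetical, and the body of `distance` is an assumed Euclidean distance, since the slides elide it.

```javascript
// Prototype style: one shared distance function for all points.
function ProtoPoint(x, y) {
  this.x = x;
  this.y = y;
}
ProtoPoint.prototype.distance = function (pt2) {
  var dx = this.x - pt2.x;
  var dy = this.y - pt2.y;
  return Math.sqrt(dx * dx + dy * dy);   // assumed body, not from the talk
};

// Instance-method style: a new function object per constructed point.
function InstancePoint(x, y) {
  this.x = x;
  this.y = y;
  this.distance = function (pt2) {
    var dx = this.x - pt2.x;
    var dy = this.y - pt2.y;
    return Math.sqrt(dx * dx + dy * dy);
  };
}

var p1 = new ProtoPoint(0, 0);
var p2 = new ProtoPoint(3, 4);
var q1 = new InstancePoint(0, 0);
var q2 = new InstancePoint(3, 4);

var sharedMethod = p1.distance === p2.distance;     // same function object
var separateMethods = q1.distance !== q2.distance;  // one per instance
```

Calling `distance` gives the same result in both styles; only the per-object allocation differs.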
  • 184. Exceptions
      • Exceptions assumed to be rare in perf-sensitive code
          • running a try statement is free on most browsers
          • throw/catch is really slow
      • There are many performance faults around exceptions
          • just having a try statement deoptimizes on some browsers
          • try-finally is a perf fault on some
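Where an exception is only being used to skip bad input inside a loop, an explicit check avoids both the `throw`/`catch` cost and having a `try` statement in the hot function. The two hypothetical functions below count the numeric entries of an array and return the same result.

```javascript
// Uses throw/catch for control flow inside the loop.
function countNumbersThrowing(values) {
  var count = 0;
  for (var i = 0; i < values.length; ++i) {
    try {
      if (typeof values[i] !== "number") {
        throw new Error("not a number");
      }
      ++count;
    } catch (e) {
      // skip the bad entry
    }
  }
  return count;
}

// Uses a plain check; no try statement in the function at all.
function countNumbersChecking(values) {
  var count = 0;
  for (var i = 0; i < values.length; ++i) {
    if (typeof values[i] === "number") {
      ++count;
    }
  }
  return count;
}
```

Exceptions remain the right tool for genuinely exceptional failures; the point is only to keep them out of the common path of performance-sensitive loops.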
  • 185. eval and with
      Short version: Do not use anywhere near performance sensitive code!
      Mind-Bogglingly Awful (5-100x slower than using a function call)
        var sum = 0;
        for (var i = 0; i < N; ++i) {
          sum = eval("sum + i");
        }
      Still Terrible (2-10x slower than without eval)
        var sum = 0;
        eval("");
        for (var i = 0; i < N; ++i) {
          sum = eval("sum + i");
        }
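The "using a function call" baseline that the slide compares against can be sketched as follows. The function names are hypothetical; both versions compute the same sum, and only the first goes through `eval` on every iteration.

```javascript
// eval in the loop body: the pattern the slide warns against.
function sumWithEval(n) {
  var sum = 0;
  for (var i = 0; i < n; ++i) {
    sum = eval("sum + i");
  }
  return sum;
}

// Ordinary function call: no eval anywhere in the function.
function sumWithFunction(n) {
  function add(a, b) {
    return a + b;
  }
  var sum = 0;
  for (var i = 0; i < n; ++i) {
    sum = add(sum, i);
  }
  return sum;
}
```

When code really must be generated at run time, doing that once outside the hot loop and calling the resulting function inside it keeps `eval` away from the performance-sensitive part.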
  • 191. Top 5 Things to Know
      5. Avoid eval, with, exceptions near perf-sensitive code
      4. Avoid creating objects in hot loops
      3. Use dense arrays (know what causes sparseness)
      2. Write type-stable code
      1. ...
  • 192. Talk To Us
      JS engine developers want to help you. Tell us about:
      • Performance faults you run into
      • Exciting apps that require fast JS
      • Anything interesting you discover about JS performance

Editor's Notes

  2. JavaScript now runs 10-100x faster than 5 years ago, fast on all major browsers. Developers using it for new apps: interactive movies, games, photo editing, slides. I’m going to explain how it works to help you get the most out of these engines.