Emscripten: An LLVM-to-JavaScript Compiler


                                             Alon Zakai
                                               Mozilla
                                         azakai@mozilla.com

Abstract                                              emscripten.org. As an LLVM-to-JavaScript
                                                      compiler, the challenges in designing Emscripten
JavaScript is the standard language of the web,       are somewhat the reverse of the norm – one must
supported on essentially all web browsers. De-        go from a low-level assembly into a high-level
spite efforts to allow other languages to be run as   language, and recreate parts of the original high-
well, none have come close to being universally       level structure of the code that were lost in the
available on all browsers, which severely limits      compilation to low-level LLVM. We detail the al-
their usefulness on the web. However, there are       gorithms used in Emscripten to deal with those
good reasons why allowing other languages             challenges.
would be beneficial, including reusing existing
code and allowing developers to use their lan-          © 2011 Alon Zakai. License: Creative Com-
guages of choice.                                     mons Attribution-ShareAlike (CC BY-SA),
   We present Emscripten, an LLVM-to-                 http://creativecommons.org/licenses/
JavaScript compiler.         Emscripten compiles      by-sa/3.0/
LLVM assembly code into standard JavaScript,
which opens up two avenues for running code           1    Introduction
written in other languages on the web: (1) Com-
pile a language directly into LLVM, and then          Since the mid-1990s, JavaScript has been
compile that into JavaScript using Emscripten,        present in most web browsers (sometimes with
or (2) Compile a language’s entire runtime            minor variations and under slightly different
(typically written in C or C++) into JavaScript       names, e.g., JScript in Internet Explorer), and
using Emscripten, and use the compiled                today it is well-supported on essentially all web
runtime to run code written in that language.         browsers, from desktop browsers like Internet
Examples of languages that can use the first          Explorer, Firefox, Chrome and Safari, to mo-
approach are C and C++, as compilers exist            bile browsers on smartphones and tablets. To-
for them into LLVM. An example of a language          gether with HTML and CSS, JavaScript is the
that can use the second approach is Python, and       standards-based foundation of the web.
Emscripten has been used to compile CPython              Running other programming languages on the
(the standard C implementation of Python) into        web has been suggested many times, and browser
JavaScript, allowing standard Python code to          plugins have allowed doing so, e.g., via the Java
be run on the web, which was not previously           and Flash plugins. However, plugins must be
possible.                                             manually installed and do not integrate in a per-
   Emscripten itself is written in JavaScript (to     fect way with the outside HTML. Perhaps more
enable various dynamic compilation techniques),       problematic is that they cannot run at all on
and is available under the MIT license (a per-        some platforms, for example, Java and Flash can-
missive open source license), at http://www.          not run on iOS devices such as the iPhone and

iPad. For those reasons, JavaScript remains the        • Compile the runtime used to parse and
primary programming language of the web.                 execute code in a particular language into
   There are, however, justifiable motivations for       LLVM, then compile that into JavaScript
running code from other programming languages            using Emscripten. It is then possible to run
on the web, for example, if one has a large              code in that runtime on the web. This is
amount of existing code already written in an-           a useful approach if a language’s runtime is
other language, or if one simply has a strong            written in a language for which an LLVM
preference for another language (and perhaps is          frontend exists, but the language itself has no
more productive in it).                                  such frontend. For example, no currently-
   As a consequence, there have been some                supported frontend exists for Python, how-
efforts to compile languages into JavaScript.            ever it is possible to compile CPython – the
Since JavaScript is present in essentially all web       standard implementation of Python, written
browsers, by compiling one’s language of choice          in C – into JavaScript, and run Python code
into JavaScript, one can still generate content          on that (see Subsection X.Y).
that will run practically everywhere. Examples
of this approach include the Google Web Toolkit,        From a technical standpoint, the main chal-
which compiles Java into JavaScript; Pyjamas,        lenges in designing and implementing Em-
which compiles Python into JavaScript; and ru-       scripten are that it compiles a low-level lan-
mors have it that projects in Oracle and Mi-         guage – LLVM assembly – into a high-level one
crosoft, respectively, allow running JVM and         – JavaScript. This is somewhat the reverse of the
CLR bytecode in JavaScript (which would allow        usual situation one is in when building a com-
running the languages of the JVM and CLR, re-        piler, and leads to some unique difficulties. For
spectively).                                         example, to get good performance in JavaScript
   In this paper we present another project along    one must use natural JavaScript code flow struc-
those lines: Emscripten, which compiles LLVM         tures, like loops and ifs, but those structures do
assembly into JavaScript. LLVM (Low Level Vir-       not exist in LLVM assembly (instead, what is
tual Machine) is a compiler project primarily fo-    present there is essentially ‘flat’ code with goto
cused on C, C++ and Objective-C. It compiles         commands). Emscripten must therefore recon-
those languages through a frontend (the main         struct a high-level representation from the low-
ones of which are Clang and LLVM-GCC) into           level data it receives.
the LLVM intermediary representation (which             In theory that issue could have been avoided
can be machine-readable bitcode, or human-           by compiling a higher-level language into
readable assembly), and then passes it through       JavaScript. For example, if compiling Java into
a backend which generates actual machine code        JavaScript (as the Google Web Toolkit does),
for a particular architecture. Emscripten plays      then one can benefit from the fact that Java’s
the role of a backend which targets JavaScript.      loops, ifs and so forth generally have a very di-
   By using Emscripten, potentially many lan-        rect parallel in JavaScript (of course the down-
guages can be run on the web, using one of the       side is that this approach yields a compiler only
following methods:                                   for Java). Compiling LLVM into JavaScript is
                                                     less straightforward, but we will see later that
  • Compile code in a language recognized            it is possible to reconstruct a substantial part of
    by one of the existing LLVM frontends            the high-level structure of the original code.
    into LLVM, and then compile that into               We conclude this introduction with a list of
    JavaScript using Emscripten. Frontends for       this paper’s main contributions:
    various languages exist, including many of
    the most popular programming languages             • We describe Emscripten itself, during which
    such as C and C++, and also various new              we detail its approach in compiling LLVM
    and emerging languages (e.g., Rust).                 into JavaScript.

• We give details of Emscripten’s ‘Relooper’      @.str = private constant [14 x i8]
      algorithm, which generates high-level loop              c"1+...+100=%d\0A\00"
      structures from low-level branching data.
      We are unaware of related results in the lit-   define i32 @main() {
      erature.                                          %1 = alloca i32, align 4
                                                        %sum = alloca i32, align 4
In addition, the following are the main contribu-       %i = alloca i32, align 4
tions of Emscripten itself, that to our knowledge       store i32 0, i32* %1
were not previously possible:                           store i32 0, i32* %sum, align 4
                                                        store i32 1, i32* %i, align 4
    • It allows compiling a very large subset of C
                                                        br label %2
      and C++ code into JavaScript, which can
      then be run on the web.
                                                      ; <label>:2
    • By compiling their runtimes, it allows run-       %3 = load i32* %i, align 4
      ning languages such as Python on the web.         %4 = icmp slt i32 %3, 100
                                                        br i1 %4, label %5, label %12
   The remainder of this paper is structured as
follows. In Section 2 we describe, from a high        ; <label>:5
level, the approach taken to compiling LLVM as-         %6 = load i32* %i, align 4
sembly into JavaScript. In Section 3 we describe        %7 = load i32* %sum, align 4
the workings of Emscripten on a lower, more con-        %8 = add nsw i32 %7, %6
crete level. In Section 4 we give an overview of        store i32 %8, i32* %sum, align 4
some uses of Emscripten. In Section 5 we sum-           br label %9
marize and give directions for future work on
Emscripten and uses of it.                            ; <label>:9
                                                        %10 = load i32* %i, align 4
                                                        %11 = add nsw i32 %10, 1
2     Compilation Approach                              store i32 %11, i32* %i, align 4
                                                        br label %2
Let us begin by considering what the challenge
is, when we want to compile something into
                                                      ; <label>:12
JavaScript. Assume we are given the following
                                                        %13 = load i32* %sum, align 4
simple example of a C program, which we want
                                                        %14 = call i32 (i8*, ...)*
to compile into JavaScript:
                                                              @printf(i8* getelementptr inbounds
    #include <stdio.h>                                 ([14 x i8]* @.str, i32 0, i32 0),
    int main()                                                  i32 %13)
    {                                                   ret i32 0
      int sum = 0;                                    }
      for (int i = 1; i < 100; i++)
        sum += i;                                     At first glance, this may look more difficult
      printf("1+...+100=%d\n", sum);                  to translate into JavaScript than the original
      return 0;                                       C++. However, compiling C++ in general
    }                                                 would require writing code to handle prepro-
                                                      cessing, classes, templates, and all the idiosyn-
This program calculates the sum of the integers       crasies and complexities of C++. LLVM assem-
from 1 to 100. When compiled by Clang, the            bly, while more verbose in this example, is lower-
generated LLVM assembly code includes the fol-        level and simpler to work on. It also has the
lowing:                                               benefit we mentioned earlier, which is one of the

main goals of Emscripten, that many languages                   if ($4) { __label__ = 1; break; }
can be compiled into LLVM.                                      else    { __label__ = 2; break; }
   A detailed overview of LLVM assembly is be-                case 1:
yond our scope here. Briefly, though, the ex-                   var $6 = HEAP[$i];
ample assembly above can easily be seen to de-                  var $7 = HEAP[$sum];
fine a function main(), then allocate some values               var $8 = $7 + $6;
on the stack (alloca), then load and store var-                 HEAP[$sum] = $8;
ious values (load and store). We do not have                    __label__ = 3; break;
the high-level code structure as we had in C++,               case 3:
with a loop, instead we have code ‘fragments’,                  var $10 = HEAP[$i];
each with a label, and code flow moves from one                 var $11 = $10 + 1;
to another by branch (br) instructions. (Label                  HEAP[$i] = $11;
2 is the condition check in the loop; label 5 is                __label__ = 0; break;
the body, label 9 is the increment, and label 12              case 2:
is the final part of the function, outside of the               var $13 = HEAP[$sum];
loop). Conditional branches can depend on cal-                  var $14 = _printf(__str, $13);
culations, for example the results of comparing                 STACKTOP = __stackBase__;
two values (icmp). Other numerical operations                   return 0;
include addition (add). Finally, printf is called         }
(call). The challenge, then, is to convert this and   }
things like it into JavaScript.
   In general, Emscripten’s approach is to trans-     Some things to take notice of:
late each line of LLVM assembly into JavaScript,          • A switch-in-a-loop construction is used in
1 for 1, into ‘normal’ JavaScript as much as pos-           order to let the flow of execution move be-
sible. So, for example, an add operation becomes            tween fragments of code in an arbitrary
a normal JavaScript addition, a function call be-           manner: We set __label__ to the (numerical
comes a JavaScript function call, etc. This 1 to            representation of the) label of the fragment
1 translation generates JavaScript that resembles           we want to reach, and do a break, which
assembly code, for example, the LLVM assembly               leads to the proper fragment being reached.
shown before for main() would be compiled into              Inside each fragment, every line of code cor-
the following:                                              responds to a line of LLVM assembly, gen-
                                                            erally in a very straightforward manner.
function _main() {
  var __stackBase__ = STACKTOP;                           • Memory is implemented by HEAP, a
  STACKTOP += 12;                                           JavaScript array. Reading from memory is
  var __label__ = -1;                                       a read from that array, and writing to mem-
  while(1) switch(__label__) {                              ory is a write. STACKTOP is the current
    case -1:                                                position of the stack. (Note that we allo-
      var $1 = __stackBase__;                               cate 4 memory locations for 32-bit integers
      var $sum = __stackBase__+4;                           on the stack, but only write to 1 of them.
      var $i = __stackBase__+8;                             See the Load-Store Consistency subsection
      HEAP[$1] = 0;                                         below for why.)
      HEAP[$sum] = 0;
      HEAP[$i] = 1;                                       • LLVM      assembly      functions   become
      __label__ = 0; break;                                 JavaScript functions, and function calls
    case 0:                                                 are normal JavaScript function calls. In
      var $3 = HEAP[$i];                                    general, we attempt to generate as ‘normal’
      var $4 = $3 < 100;                                    JavaScript as possible.
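
The switch-in-a-loop pattern and the HEAP array described above can be tried out in isolation. The following is a minimal, self-contained sketch rather than actual Emscripten output: HEAP and STACKTOP are simple stand-ins for the runtime state, the function name sumWithLabels is hypothetical, and the result is returned instead of being passed to printf.

```javascript
// Hypothetical stand-ins for the runtime state described in the text.
var HEAP = [];      // memory: one JavaScript value per location
var STACKTOP = 0;   // current position of the stack

// Sums the integers 1..limit-1 using the same control-flow shape as
// the unoptimized _main(): each case is one code fragment, and
// __label__ selects the fragment to run next.
function sumWithLabels(limit) {
  var __stackBase__ = STACKTOP;
  STACKTOP += 8;                       // room for $sum and $i
  var $sum = __stackBase__;
  var $i = __stackBase__ + 4;
  var __label__ = -1;
  while (1) switch (__label__) {
    case -1:                           // entry fragment: initialize
      HEAP[$sum] = 0;
      HEAP[$i] = 1;
      __label__ = 0; break;
    case 0:                            // loop condition
      if (HEAP[$i] < limit) { __label__ = 1; break; }
      else                  { __label__ = 2; break; }
    case 1:                            // loop body and increment
      HEAP[$sum] = HEAP[$sum] + HEAP[$i];
      HEAP[$i] = HEAP[$i] + 1;
      __label__ = 0; break;
    case 2:                            // exit fragment: restore the stack
      var result = HEAP[$sum];
      STACKTOP = __stackBase__;
      return result;
  }
}
```

Calling sumWithLabels(100) adds the integers 1 through 99 (the bound mirrors the i < 100 test in the example) and yields 4950, leaving STACKTOP where it started. Each break only leaves the switch, so the enclosing while(1) immediately dispatches on the new value of __label__.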

2.1   Load-Store Consistency (LSC)                      var x_value = 12345;
                                                        var x_addr = stackAlloc(4);
We saw before that Emscripten’s memory usage
                                                        HEAP[x_addr]   = (x_value >> 0) & 255;
allocates the usual number of bytes on the stack
                                                        HEAP[x_addr+1] = (x_value >> 8) & 255;
for variables (4 bytes for a 32-bit integer, etc.).
                                                        HEAP[x_addr+2] = (x_value >> 16) & 255;
However, we only wrote values into the first lo-
                                                        HEAP[x_addr+3] = (x_value >> 24) & 255;
cation, which appeared odd. We will now see the
                                                        [...]
reason for that.
                                                        printf("first byte: %d\n", HEAP[x_addr]);
   To get there, we must first step back, and note
that Emscripten does not aim to achieve perfect       Here we allocate space for the value of x on the
compatibility with all possible LLVM assembly         stack, and store that address in x_addr. The
(and correspondingly, with all possible C or C++      stack itself is part of the ‘memory space’, which
code, etc.); instead, Emscripten targets a subset     is the array HEAP. In order for the read on the
of LLVM assembly code, which is portable and          final line to give the proper value, we must go to
does not make crucial assumptions about the un-       the effort of doing 4 store operations, each of the
derlying CPU architecture on which the code is        value of a particular byte. In other words, HEAP
meant to run. That subset is meant to encom-          is an array of bytes, and for each store into mem-
pass the vast majority of real-world code that        ory, we must deconstruct the value into bytes.1
would be compiled into LLVM, while also being            Alternatively, we can store the value in a single
compilable into very performant JavaScript.           operation, and deconstruct into bytes as we load.
   More specifically, Emscripten assumes that the     This will be faster in some cases and slower in
LLVM assembly code it is compiling has Load-          others, but is still more overhead than we would
Store Consistency (LSC), which is the require-        like, generally speaking – for if the code does
ment that loads from and stores to a specific         have LSC, then we can translate that code frag-
memory address will use the same type. Nor-           ment into the far more optimal
mal C and C++ code generally does so: If x is a
variable containing a 32-bit floating point num-        var x_value = 12345;
ber, then both loads and stores of x will be of         var x_addr = stackAlloc(4);
32-bit floating point values, and not 16-bit un-        HEAP[x_addr] = x_value;
signed integers or anything else. (Note that even       [...]
if we write something like                              printf("first byte: %d\n", HEAP[x_addr]);
float x = 5;                                          (Note that this can be optimized even further
then the compiler will assign a 32-bit float with     – we can store x in a normal JavaScript variable.
the value of 5 to x, and not an integer.)             For now though we are just clarifying why it is
  To see why this is important for performance,       useful to assume we are compiling code that has
consider the following C code fragment, which         LSC – doing so lets us generate shorter and more
does not have LSC:                                    natural JavaScript.)
                                                        In practice the vast majority of C and C++
  int x = 12345;
                                                      code does have LSC. Exceptions do exist, how-
  [...]
                                                      ever, for example:
  printf("first byte: %d\n", *((char*)&x));
                                                         1
                                                           Note that we can use JavaScript typed arrays with
Assuming an architecture with more than 8 bits,       a shared memory buffer, which would work as expected,
this code will read the first byte of x. (It might,   assuming (1) we are running in a JavaScript engine which
for example, be used to detect the endianness         supports typed arrays, and (2) we are running on a CPU
of the CPU.) To compile this into JavaScript in       with the same architecture as we expect. This is there-
                                                      fore dangerous as the generated code may run differently
a way that will run properly, we must do more         on different JavaScript engines and different CPUs. Em-
than a single operation for either the read or the    scripten currently has optional experimental support for
write, for example we could do this:                  typed arrays.

• Code that detects CPU features like endian-       generated code was fairly complicated and cum-
    ness, the behavior of floats, etc. In general     bersome, and unsurprisingly it performs quite
    such code can be disabled before running it       poorly. There are two main reasons for that:
    through Emscripten, as it is not needed.          First, that the code is simply unoptimized –
                                                      there are many variables declared when fewer
  • memset and related functions typically work       could suffice, for example, and second, that the
    on values of one kind, regardless of the un-      code does not use ‘normal’ JavaScript, which
    derlying values. For example, memset may          JavaScript engines are optimized for – it stores
    write 64-bit values on a 64-bit CPU since         all variables in an array (not normal JavaScript
    that is usually faster than writing individ-      variables), and it controls the flow of execution
    ual bytes. This tends to not be a problem,        using a switch-in-a-loop, not normal JavaScript
    as with memset the most common case is            loops and ifs.
    setting to 0, and with memcpy, the values            Emscripten’s approach to generating fast-
    end up copied properly anyhow.                    performing code is as follows.        Emscripten
                                                      doesn’t do any optimization that can otherwise
  • Even LSC-obeying C or C++ code may turn
                                                      be done by additional tools: LLVM can be used
    into LLVM assembly that does not, after be-
                                                      to perform optimizations before Emscripten, and
    ing optimized. For example, when storing
                                                      the Closure Compiler can perform optimizations
    two 32-bit integer constants into adjoining
                                                      on the generated JavaScript afterwards. Those
    locations in a structure, the optimizer may
                                                      tools will perform standard useful optimizations
    generate a single 64-bit store of an appro-
                                                      like removing unneeded variables, dead code, etc.
    priate constant. In other words, optimiza-
                                                      That leaves two major optimizations that are left
    tion can generate nonportable code, which
                                                      for Emscripten to perform:
    runs faster on the current CPU, but nowhere
    else. Emscripten currently assumes that op-         • Variable nativization: Convert variables
    timizations of this form are not being used.          that are on the stack into native JavaScript
                                                          variables. In general, a variable will be na-
In practice it may be hard to know if code has
                                                          tivized unless it is used outside that func-
LSC or not, and requiring a time-consuming code
                                                          tion, e.g., if its address is taken and stored
audit is obviously impractical. Emscripten there-
                                                          somewhere or passed to another function.
fore has automatic tools to detect violations of
                                                          When optimizing, Emscripten tries to na-
LSC, see SAFE_HEAP in Subsection X.Y.
                                                          tivize as many variables as possible.
  Note that it is somewhat wasteful to allocate
4 memory locations for a 32-bit integer, and use        • ‘Relooping’: Recreate high-level loop and
only one of them. It is possible to change that be-       if structures from the low-level labels and
havior with the QUANTUM_SIZE parameter to                 branches that appear in LLVM assembly.
Emscripten; however, the difficulty is that LLVM          We detail the algorithm Emscripten uses for
assembly has hardcoded values that depend on              this purpose in Section X.
the usual memory sizes being used. For example,
memcpy can be called with the normal size each          When run with Emscripten’s optimizations,
the value would have, and not a single memory         the above code looks like this:
address each as Emscripten would prefer. We
are looking into modifications to LLVM itself to      function _main() {
remedy that.                                            var __label__;
                                                        var $1;
                                                        var $sum;
2.2   Performance
                                                        var $i;
Looking back at the example program from be-            $1 = 0;
fore (which sums the integers from 1 to 100), the       $sum = 0;

$i = 1;                                        2.3 Dealing with Real-World Code
    $2$2: while(1) { // $2
      var $3 = $i;                                 As mentioned above, in practice it may be hard
      var $4 = $3 < 100;                           to know if code has LSC or not, and other
      if (!($4)) { __label__ = 2; break $2$2; }    difficulties may arise as well. Accordingly, Em-
      var $6 = $i;                                 scripten has a set of tools to help detect if
      var $7 = $sum;                               compiled code has certain problems. These are
      var $8 = $7 + $6;                            mainly compile-time options that lead to addi-
      $sum = $8;                                   tional runtime checks. One then runs the code
      var $10 = $i;                                and sees the warnings or errors generated. The
      var $11 = $10 + 1;                           debugging tools include:
      $i = $11;
      __label__ = 0; continue $2$2;
    }                                                 • SAFE_HEAP: This option adds runtime
    var $13 = $sum;                                     checks for LSC. It will raise an exception if
    var $14 = _printf(__str, $13);                      LSC does not hold, as well as related issues
    return 0;                                           like reading from memory before a value was
}                                                       written (somewhat similarly to tools like
                                                        Valgrind). If such a problem occurs, possible
If in addition the Closure Compiler is run on that      solutions are to ignore the issue (if it has no
output, we get                                          actual consequences), or to alter the source
                                                        code. In some cases, such nonportable code
                                                        can be avoided in a simple manner by chang-
function K() {                                          ing the configuration under which the code
    var a, b;                                           is built (see the examples in Section X).
    a = 0;
    b = 1;
    a:for(;;) {
      if(!(b < 100)) {                                • CHECK_OVERFLOWS: This option
        break a                                         adds runtime checks for overflows in inte-
      }                                                 ger operations, which are an issue since all
      a += b;                                           JavaScript numbers are doubles, hence sim-
      b += 1;                                           ple translation of LLVM numerical opera-
    }                                                   tions (addition, multiplication, etc.) into
    _printf(J, a);                                      JavaScript ones may lead to different results
    return 0;                                           than expected. CHECK_OVERFLOWS
}                                                       will add runtime checks for actual overflows
                                                        of this sort, that is, whether the result of an
which is fairly close to the original C++ (the          operation is a value too large for the type
differences, of having the loop’s condition inside      it is defined as. Code that has such overflows
the loop instead of inside the for() expression at      can enable CORRECT_OVERFLOWS,
the top of the original loop, are not important         which fixes overflows at runtime. The cost,
to performance). Thus, it is possible to recre-         however, is decreased speed due to the cor-
ate the original high-level structure of the code       rection operations (which perform a bitwise
that was compiled into LLVM assembly, despite           AND to force the value into the proper
that structure not being explicitly available to        range). Since such overflows are rare, it is
Emscripten.                                             possible to use this option to find where they
   We will see performance measurements in Sec-         occur, and modify the original code in a suit-
tion 4.                                                 able way (see the examples in Section).
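
The difference between detecting and correcting an overflow can be illustrated with a small sketch. The helper names checkedAdd32, correctedAdd32 and correctedAddU8 are hypothetical and are not Emscripten's generated code; they only mimic the idea of raising an error when a result does not fit its type versus forcing the result back into range with a bitwise AND.

```javascript
var INT32_MAX = 2147483647;
var INT32_MIN = -2147483648;

// Detection: perform the addition on JavaScript doubles and raise an
// error if the result does not fit in a signed 32-bit integer.
function checkedAdd32(a, b) {
  var result = a + b;
  if (result > INT32_MAX || result < INT32_MIN) {
    throw new Error('32-bit overflow in ' + a + ' + ' + b);
  }
  return result;
}

// Correction for a signed 32-bit type: JavaScript's & operator first
// converts its operands to 32-bit integers, so the AND wraps the sum
// the way two's-complement hardware would.
function correctedAdd32(a, b) {
  return (a + b) & 0xFFFFFFFF;
}

// Correction for an unsigned 8-bit type: keep only the low 8 bits.
function correctedAddU8(a, b) {
  return (a + b) & 255;
}
```

With these definitions, checkedAdd32(2147483647, 1) throws, correctedAdd32(2147483647, 1) evaluates to -2147483648, and correctedAddU8(250, 10) evaluates to 4. The sketch covers addition only; the sum of two 32-bit values is always exactly representable in a double, which is not true of every product of two 32-bit values.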

3     Emscripten’s Architecture

In the previous section we saw a general overview of Emscripten’s approach to
compiling LLVM assembly into JavaScript. We will now go into more detail on how
Emscripten implements that approach, and on its architecture.
   Emscripten is written in JavaScript. A main reason for that decision was to
simplify sharing code between the compiler and the runtime, and to enable
various dynamic compilation techniques. Two simple examples: (1) the compiler
can create JavaScript objects that represent constants in the assembly code,
and conveniently convert them to a string using JSON.stringify(), and (2) the
compiler can simplify numerical operations by simply eval()ing the code (so
“1+2” would become “3”, etc.). In both examples, writing Emscripten is made
simpler by having the exact same environment during compilation as the
executing code will have.
   Emscripten has three main phases:

    • The intertyper, which converts from LLVM assembly into Emscripten’s
      internal representation.

    • The analyzer, which inspects the internal representation and generates
      various useful information for the final phase, including type and
      variable information, stack usage analysis, optional data for
      optimizations (variable nativization and relooping), etc.

    • The jsifier, which does the final conversion of the internal
      representation plus additional analyzed data into JavaScript.

3.1     The Runtime Environment

Code generated by Emscripten is meant to run in a JavaScript engine, typically
in a web browser. This has implications for the kind of runtime environment we
can generate for it; for example, there is no direct access to the local
filesystem.
   Emscripten comes with a partial implementation of a C library, mostly
written from scratch in JavaScript, with parts compiled from the newlib C
library. Some aspects of the runtime environment, as implemented in that C
library, are:

  • Files to be read must be ‘preloaded’ in JavaScript. They can then be
    accessed using the usual C library methods (fopen, fread, etc.). Files
    that are written are cached, and can be read by JavaScript later. While
    limiting, this approach is sufficient for many purposes.

  • Emscripten allows writing pixel data to an HTML5 canvas element, using a
    subset of the SDL API. That is, one can write an application in C or C++
    using SDL, and that same application can be compiled normally and run
    locally, or compiled using Emscripten and run on the web. See the
    raytracing demo at http://syntensity.com/static/raytrace.html.

  • sbrk is implemented using the HEAP array which was mentioned previously.
    This allows a normal malloc implementation written in C to be compiled to
    JavaScript.

3.2   The Relooper: Recreating high-level loop structures

The Relooper is among the most complicated components in Emscripten. It
receives a ‘soup of labels’, which is a set of labeled fragments of code (for
brevity we call such fragments simply ‘labels’), each ending with a branch
operation (either a simple branch, a conditional branch, or a switch). The
goal is to generate normal high-level JavaScript code flow structures such as
loops and ifs.
   For example, the LLVM assembly on page X has the following label structure:

              /-----------\
              |           |
              V           |
    ENTRY --> 2 --> 5 --> 9
              |
              V
             12
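
Before any relooping, a label soup like the one above can be executed directly as a switch inside a loop, driven by a label variable. The sketch below is illustrative only: the label bodies are placeholders that just record a trace, and the exit condition on label 2 (a counter compared against `limit`) is an assumption, since the diagram does not say what the condition is.

```javascript
// Hypothetical sketch of the switch-in-a-loop form of the label soup
// ENTRY -> 2, 2 -> 5 or 12, 5 -> 9, 9 -> 2. Label bodies are placeholders.
function runLabelSoup(limit) {
  const trace = [];      // records the order in which labels execute
  let i = 0;             // stand-in for program state changed by label 5
  let label = 'ENTRY';   // which label to run next
  while (true) {
    trace.push(label);
    switch (label) {
      case 'ENTRY':
        label = '2';                        // simple branch
        break;
      case '2':
        label = (i >= limit) ? '12' : '5';  // conditional branch
        break;
      case '5':
        i++;
        label = '9';
        break;
      case '9':
        label = '2';                        // branch back: the loop's back edge
        break;
      case '12':
        return trace;                       // end of the function
    }
  }
}

console.log(runLabelSoup(1).join(' -> ')); // ENTRY -> 2 -> 5 -> 9 -> 2 -> 12
```

Every branch here costs a write to `label` plus a dispatch through the switch, which is the overhead that recreating real loops and ifs avoids.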

In this simple example, it is fairly straightforward to see that a natural way
to implement it using normal loop structures is

    ENTRY
    while (true) do
       2
       if (condition) break
       5
       9
    12

In general though, this is not always easy or even possible: there may not be
a reasonable high-level loop structure corresponding to the low-level one if,
for example, the original C code relied heavily on goto instructions. In
practice, however, almost all real-world C and C++ code tends to be amenable
to loop recreation.
   Emscripten’s Relooper takes as input a ‘soup of labels’ as described above,
and generates a structured set of code ‘blocks’, each of which is a set of
labels with some logical structure, of one of the following types:

  • Simple block: A block with one internal label and a Next block, which the
    internal label branches to. The block is later translated simply into the
    code for that label, and the Next block appears right after it.

  • Loop: A block that represents an infinite loop, comprised of two internal
    sub-blocks:

       – Inner: A block of labels that will appear inside the loop, i.e., when
         execution reaches the end of that block, flow will return to the
         beginning. Typically a loop will contain a conditional break defining
         where it is exited. When we exit, we reach the Next block, below.

       – Next: A block of labels that will appear just outside the loop, in
         other words, that will be reached when the loop is exited.

  • Multiple: A block that represents a divergence into several possible
    branches that eventually rejoin. A Multiple block can implement an ‘if’,
    an ‘if-else’, a ‘switch’, etc. It is comprised of

       – Handled blocks: A set of blocks which execution can enter. When we
         reach the Multiple block, we check which of them should execute, and
         go there. When execution of that block is complete, or if none of the
         Handled blocks was selected for execution, we proceed to the Next
         block, below.

       – Next: A block of labels that will appear just outside this one, in
         other words, that will be reached after code flow exits the Handled
         blocks, above.

Remember that we have a label variable that helps control the flow of
execution: whenever we enter a block with more than one entry, we set label
before we branch into it, and we check its value when we enter that block. So,
for example, when we create a Loop block, its Next block can have multiple
entries, namely any label to which we branch out from the loop. By creating a
Multiple block after the loop, we can enter the proper label when the loop is
exited. Having a label variable does add some overhead, but it greatly
simplifies the problem that the Relooper needs to solve. Of course, it is
possible to optimize away writes and reads to label in many cases.
   Emscripten uses the following recursive algorithm for generating blocks
from the soup of labels:

  • Receive a set of labels and which of them are entry points. We wish to
    create a block comprised of all those labels.

  • Calculate, for each label, which other labels it can reach, i.e., which
    labels we are able to reach if we start at the current label and follow
    one of the possible paths of branching.

  • If we have a single entry, and cannot return to it from any other label,
    then create a Simple block, with the entry as its internal label, and the
    Next block comprised of all the other labels. The entries for the Next
    block are the labels to which the internal label can branch.

  • If we can return to all of the entries, return a Loop block, whose Inner
    block is comprised of all labels that can reach one of the entries, and
    whose Next block is comprised of all the others. The entry labels for the
    current block become entry labels for the Inner block (note that they
    must be in the Inner block, as each one can reach itself). The Next
    block’s entry labels are all the labels in the Next block that can be
    reached by the Inner block.

  • If we have more than one entry, try to create a Multiple block: For each
    entry, find all the labels it reaches that cannot be reached by any other
    entry. If at least one entry has such labels, return a Multiple block,
    whose Handled blocks are blocks for those labels, and whose Next block is
    all the rest. Entry labels for those two blocks become entries of the new
    block they are now part of. We may have additional entry labels in the
    Next block, for each entry in the Next block that can be reached from the
    Handled ones.

  • We must be able to return to at least one of the entries (see proof
    below), so create a Loop block as described above.

Note that we first create a loop only if we must, then try to create a
multiple, then create a loop if we can. We could have slightly simplified this
in various ways. The algorithm as presented above has given overall better
results in practice, in terms of the ‘niceness’ of the shape of the generated
code, both subjectively and in some benchmarks (however, the benchmarks are
limited, and we cannot be sure the result will remain true for all possible
inputs to the Relooper).
   Additional details of the algorithm include ‘fixing’ branch instructions
accordingly. For example, when we create a Loop block, all branch instructions
to labels outside of the loop are converted into break commands, and all
branch instructions to the beginning of the loop are converted into continue
commands. Those commands are then ignored when the algorithm is called
recursively to generate the Inner block (that is, the break and continue
commands are guaranteed, by the semantics of JavaScript, to get us to where we
need to go, so they do not need any further consideration for them to work
properly).
   Emscripten also does an additional pass after running the Relooper
algorithm described above. The Relooper is guaranteed to produce valid output
(see below). The second pass takes that valid output and optimizes it, by
making minor changes such as removing continue commands that occur at the very
end of loops (where they are not needed), etc. In other words, the first pass
focuses on generating high-level code flow structures that are correct, while
the second pass simplifies and optimizes that structure.
   We now turn to an analysis of the Relooper algorithm. It is straightforward
to see that the output of the algorithm is correct, in the sense of code
execution being carried out as in the original data, assuming the algorithm
completes successfully; that is, that it finishes in finite time, and does not
run into an error in the last part (where it is claimed that if we reach it we
can return to at least one of the entry labels). We will now prove that the
algorithm must in fact complete successfully.
   First, note that if we successfully create a block, then we simplify the
remaining problem, where the ‘complexity’ of the problem for our purposes now
is the number of labels plus the number of branching operations:

  • This is trivial for Simple blocks (since we now have a Next block which is
    strictly smaller).

  • It is true for Loop blocks simply by removing branching operations (there
    must be a branching back to an entry, which becomes a continue).

  • For Multiple blocks, if the Next block is non-empty then we have split
    into strictly smaller blocks (in number of labels) than before. If the
    Next block is empty, then since we built the Multiple block from a set of
    labels with more than one entry, the Handled blocks are strictly smaller
    than the

    current one. The remaining case is when we have a single entry.

The remaining issue is whether we can reach a situation where we fail to
create a block due to an error, that is, that the claim in the last part does
not hold. For that to occur, we must not be able to return to any of the
entries (or else we would create a Loop block). But since that is so, we can,
at minimum, create a Multiple block with entries for all the current entries,
since the entry labels themselves cannot be reached, contradicting the
assumption that we cannot create a block, and concluding the proof.
   We have not, of course, proven that the shape of the blocks is optimal in
any sense. However, even if it is possible to optimize them, the Relooper
already gives a very substantial speedup due to the move from the
switch-in-a-loop construction to more natural JavaScript code flow structures.
TODO: A little data here.

4     Example Uses

Emscripten has been run successfully on several real-world codebases,
including the following:
   TODO: Performance data

4.1    CPython

Other ways to run Python on web: pyjamas/pyjs, old pypy-js backend, ?? also
IronPython on Silverlight and Jython in Java. Limitations etc.
   CPython is the standard implementation of the Python programming language,
written in C. Compiling it in Emscripten is straightforward, except for
needing to change some #defines, without which CPython creates
platform-specific assembly code.
   Compilation using llvm-gcc generates approximately 27.9MB of LLVM assembly.
After running Emscripten and the Closure Compiler, the generated JavaScript
code is approximately 2.76MB in size. For comparison, a native binary version
of CPython is approximately 2.28MB in size, so the two differ by only 21%.
   The CORRECT_OVERFLOWS option in Emscripten is necessary for proper
operation of the generated code, as Python hash code relies on integer
overflows to work normally.
   The demo can be seen live at http://www.syntensity.com/static/python.html.
Potential uses include ... etc.

4.2   Bullet Physics Engine

Mention other bullet->js manual port, via Java

4.3   Lua

Mention other lua->js compilers

5     Summary

We presented Emscripten, an LLVM-to-JavaScript compiler, which opens up
numerous opportunities for running code written in languages other than
JavaScript on the web, including some not previously possible. Emscripten can
be used to, among other things, compile real-world C and C++ code and run that
on the web. In turn, by compiling the runtimes of languages implemented in C
and C++, we can run them on the web as well. Examples were shown for Python
and Lua.
   One of the main tasks for future work in Emscripten is to broaden its
standard library. Emscripten currently includes portions of libc and other
standard C libraries, implemented in JavaScript. Portions of existing libc
implementations, themselves written in C, can also be compiled into JavaScript
using Emscripten, but in general the difficulty is in creating a suitable
runtime environment on the web. For example, there is no filesystem
accessible, nor normal system calls and so forth. Some of those features can
be implemented in JavaScript; in particular, new HTML features like the File
API should help.
   Another important task is to support multithreading. Emscripten currently
assumes the code being compiled is single-threaded, since JavaScript does not
have support for multithreading (Web Workers allow multiprocessing,

but they do not have shared state, so implementing threads with them is not
trivial). However, it would be possible to emulate multithreading in a single
thread. One approach could be to not generate native JavaScript control flow
structures, and instead to use a switch-in-a-loop for the entire program. Code
can then be added in the loop to switch every so often between ‘threads’,
while maintaining their state and so forth.
   A third important task is to improve the speed of generated code. An
optimal situation would be for code compiled by Emscripten to run at or near
the speed of native code. In that respect it is worth comparing Emscripten to
Portable Native Client (PNaCl), a project in development which aims to allow
an LLVM-like format to be distributed and run securely on the web, with speed
comparable to native code.
   Both Emscripten and PNaCl allow running compiled native code on the web,
Emscripten by compiling that code into JavaScript, and PNaCl by compiling it
into an LLVM-like format, which is then run in a special PNaCl runtime. One
major difference is that Emscripten’s generated code can run on all web
browsers, since it is standard JavaScript, while PNaCl’s generated code
requires the PNaCl runtime to be installed; another is that JavaScript engines
do not yet run code at near-native speeds, while PNaCl does. To summarize this
comparison, Emscripten’s approach allows the code to be run in more places,
while PNaCl’s allows the code to run faster.
   However, improvements in JavaScript engines may narrow the speed gap. In
particular, for purposes of Emscripten we do not need to care about all
JavaScript, but only the kind generated by Emscripten. Such code is implicitly
statically typed, that is, types are not mixed, despite JavaScript in general
allowing assigning, e.g., an integer to a variable and later a floating point
value or even an object to that same variable. Implicitly statically typed
code can be statically analyzed and converted into machine code that has no
runtime type checks at all. While such static analysis can be time-consuming,
there are practical ways of achieving similar results quickly, such as tracing
and local type inference, which would help on such code very significantly,
and are already in use or being worked on in mainstream JavaScript engines
(e.g., SpiderMonkey).
   The limit of such an approach is to perform static analysis on an entire
program compiled by Emscripten, generating highly-optimized machine code from
that. As evidence of the potential in such an approach, the PyPy project can
compile RPython (something very close to implicitly statically typed Python)
into C, which can then be compiled and run at native speed. We may in the
future see JavaScript engines perform such static compilation, when the code
they run is implicitly statically typed, which would allow Emscripten’s
generated code to run at native speeds as well. While not trivial, such an
approach is possible, and if accomplished, would mean that the combination of
Emscripten and suitable JavaScript engines will let people write code in their
languages of choice, and run it at native speeds on the web.
   Finally, we conclude with another avenue for optimization. Assume that we
are compiling a C or C++ runtime of a language into JavaScript, and that that
runtime uses JIT compilation to generate machine code. Typically, code
generators for JITs are written for the main CPU architectures, today x86,
x86_64 and ARM. However, it would be possible for a JIT to generate JavaScript
instead. Thus, the runtime would be compiled using Emscripten, and it would
pass the generated JavaScript to eval. In this scenario, JavaScript is used as
a low-level intermediate representation in the runtime, and the final
conversion to machine code is left to the underlying JavaScript engine. This
approach can potentially allow languages that greatly benefit from a JIT (such
as Java, Lua, etc.) to be run on the web efficiently.
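
The idea of a JIT that targets JavaScript can be sketched in a few lines. The example below is a toy under stated assumptions: the expression-tree format (`op`, `left`, `right`, `value`) and the function names are hypothetical, and the "code generator" handles only constants, one argument, addition and multiplication. It emits JavaScript source instead of machine code and passes it to eval, leaving the final compilation to the JavaScript engine.

```javascript
// Hypothetical mini "JIT": the tree format and function names are made up
// for this sketch and do not come from any real language runtime.

// Code generator: turn an expression tree into JavaScript source text.
function compileExpr(node) {
  switch (node.op) {
    case 'const':
      return String(node.value);
    case 'arg':
      return 'x';
    case 'add':
      return '(' + compileExpr(node.left) + ' + ' + compileExpr(node.right) + ')';
    case 'mul':
      return '(' + compileExpr(node.left) + ' * ' + compileExpr(node.right) + ')';
    default:
      throw new Error('unknown op: ' + node.op);
  }
}

// "JIT": wrap the generated expression in a function and hand it to eval.
function jit(node) {
  const source = '(function (x) { return ' + compileExpr(node) + '; })';
  return eval(source);
}

// The tree for x * 3 + 1.
const tree = {
  op: 'add',
  left: { op: 'mul', left: { op: 'arg' }, right: { op: 'const', value: 3 } },
  right: { op: 'const', value: 1 }
};

const compiled = jit(tree);
console.log(compileExpr(tree)); // ((x * 3) + 1)
console.log(compiled(2));       // 7
```

After the single call to eval, `compiled` is an ordinary JavaScript function, so whatever optimizations the engine applies to hand-written code apply to it as well.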
