Comparison of Erlang Runtime System and Java Virtual Machine

Tõnis Pool
University of Tartu
tonis.pool@gmail.com
Supervised by Oleg Batrashev
May 2015

Abstract

This report gives a high-level overview of the Erlang Runtime System (ERTS) and the Java Virtual Machine (JVM), comparing the two in terms of overall architecture, memory layout, parallelism/concurrency and runtime optimisations. More specifically, I’ll look at the HotSpot JVM provided by Oracle and the default BEAM implementation open sourced by Ericsson.

1. Introduction

There are many different virtual machines in existence nowadays, packing a growing number of clever tricks to make them run as fast as possible. The JVM is perhaps the best-known virtual machine and one that has received a couple of decades’ worth of improvements and optimisations.

Somewhat similarly to Java, Erlang is a general-purpose, concurrent, garbage-collected programming language that nowadays runs on a virtual machine and has been around even longer than Java. The whole runtime system, together with the language, interpreter and memory handling, is called the Erlang Runtime System, but the virtual machine is often also referred to as the BEAM.

1.1. JVM

The term Java Virtual Machine covers three different notions: a specification of an abstract stack machine, an implementation of that specification, and a running process. The specification makes sure that different implementations are interchangeable, but on the other hand isn’t very strict about areas such as memory layout, garbage collection algorithms and instruction optimisations, allowing for many different approaches. The HotSpot VM provided by Oracle is the most widely used implementation, which is why it was chosen for comparison.

The JVM works by executing bytecode statements from a class file, generated by compiling the Java source code. This indirection is what gives programmers the ability to compile their source code once and then execute it on different platforms that have JVM implementations.

1.2. ERTS

Unlike the JVM, Erlang doesn’t have a formal specification for its runtime nor for the programming language. This makes it difficult to have various implementations, as they all have to derive the semantics from the de facto BEAM implementation[1]. But similarly to Java, ERTS works by executing an intermediate representation of Erlang source code, also known as BEAM code. As mentioned, there’s no specification for the generated instructions, though a few unofficial documents are available[2].

Quite unlike the Java programming language, Erlang is a functional programming language based on the actor model, which dates back to 1973[3]. The actor model specifies the actor as the unit of concurrency. In Erlang actors are called processes. Actors can only communicate with each other by sending messages, but are otherwise independent, which allows them to run in parallel with each other[4].

Because of this distinction, namely that the Erlang language focuses specifically on the actor model, comparing the JVM with ERTS can be seen as comparing apples and carrots: the ERTS is heavily geared and optimised for this specific view of the world, whereas the JVM has no such prejudices.

2. Memory Layout

2.1. JVM

Figure 1 shows the general architecture of the HotSpot VM when looked upon as memory areas[5].

Figure 1: Simplified JVM memory layout

There’s a shared heap, where all objects and arrays live, and then there’s a non-heap area designated for metadata about the loaded classes and a Code Cache, which is used for compilation and storage of methods that have been compiled to native code by the JIT compiler.

Every thread in the JVM has a program counter, which holds the address of the current instruction (if it’s not native), a stack, which holds frames for each executing method, and a native stack. Every thread in the JVM is mapped one-to-one to an operating system (OS) level thread, which is then scheduled and managed by the OS.

The heap on the HotSpot VM is split into generations: a young generation, consisting of an area called Eden and two smaller Survivor spaces, and a Tenured (old) generation. This optimisation stems from the observation that most objects die young, thus it makes sense to put them into a separate area and garbage collect that smaller area more often than the whole heap[6].

The JVM has many different Garbage Collection (GC) algorithms embedded, which cater to different needs of the application. In general, to collect garbage from the shared heap a stop-the-world pause is needed, where all application threads are halted. Different GC strategies optimise either for throughput - total GC overhead should be low, but long pauses are allowed - or latency - each stop-the-world pause should be as short as possible. Based on different heuristics (like what kind of machine it’s running on) the JVM will choose a strategy, but users can also enforce some strategy and give it some latency goals, for example[7].
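The memory areas and collectors described above can be inspected from inside a running JVM. The following is a minimal sketch using the standard java.lang.management API. The class name MemoryAreas and its helper methods are made up for illustration, and the pool and collector names that get printed depend on the JVM version and on the GC strategy in use, so no particular output should be expected.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that lists the memory areas of the running JVM. */
final class MemoryAreas {

    /** Names of the memory pools of the given type (HEAP or NON_HEAP). */
    static List<String> poolNames(MemoryType type) {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == type) {
                names.add(pool.getName());
            }
        }
        return names;
    }

    /** Names of the garbage collectors the running JVM is using. */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // Heap pools correspond to the generations; non-heap pools hold
        // class metadata and JIT-compiled code.
        System.out.println("Heap pools:     " + poolNames(MemoryType.HEAP));
        System.out.println("Non-heap pools: " + poolNames(MemoryType.NON_HEAP));
        System.out.println("Collectors:     " + collectorNames());
    }
}
```

Running the same program with a different collector enforced (for example with the -XX:+UseSerialGC or -XX:+UseG1GC flags) should change the reported names, and a latency goal can be suggested to the collector with -XX:MaxGCPauseMillis.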

Figure 2: Simplified ERTS memory layout

2.2. ERTS

Figure 2 shows the general architecture of the Erlang Runtime System when looked upon as memory areas[8]. The major difference here is that instead of having a single shared heap, each process in Erlang has its own. The same memory area is also used for the stack of the process, with the two growing towards each other. Such a setup makes it very cheap to check whether a process is running out of heap or stack space.

Besides a heap and a stack, each process has a process control block, which is used to hold various metadata about the process, and a message queue. Binary data that is larger than 64 bytes is kept in a separate area, so that different processes can simply refer to it by a pointer.

Similarly to the JVM, ERTS uses generational garbage collection, meaning the heap is divided into a young and an old generation. The GC strategy used is a copying collector (except for the binary area, which is garbage collected by reference counting), meaning it will allocate another memory area and then copy over each element that is still referenced, starting with the root set[8].

The per-process heap architecture has many benefits, one of which is that because each process has its own heap and there are presumably numerous processes, each heap’s root set can be very small and the heap can be garbage collected quickly in isolation, without affecting other running processes. This means there aren’t any global stop-the-world pauses.

When a process dies, all its memory can be deallocated right away, which means processes that are short-lived or don’t generate much garbage may never go through a single GC cycle. The above (plus true preemptive scheduling, which is discussed later) gives Erlang the marketed soft real-time capabilities[9], where there aren’t global pauses in the application, which often happen on the JVM.

3. Parallelism and Concurrency

Now for the tricky bit. Erlang is marketed as a tool for building massively scalable soft real-time systems with requirements on high availability[9]. It’s often used in systems with 99.999% guaranteed uptime[10]. Java, on the other hand, is a more general-purpose programming language, used in fields ranging from high-frequency trading to embedded devices.

3.1. Shared state

In 1971 Edsger Dijkstra posed and solved the now famous Dining Philosophers Problem about concurrency. His solution involved a construct commonly known as a semaphore, which restricts access to a shared resource[11]. Dijkstra’s solution is often referred to as shared state concurrency, which is the programming model Java uses.

The JVM has a single heap, and sharing state between concurrent parties is done via locks that guarantee mutual exclusion to a specific memory area. This means that the burden of guaranteeing correct access to shared variables lies on the programmer, who must guard the critical sections by locks. However, this is notoriously difficult to do, leading to the widespread belief that concurrency is hard[12].

There are numerous reasons why shared state between threads brings trouble:

    1. Deadlocks - occur when two threads both need resources A and B. The first thread locks A and then tries to get B, whereas the second thread operates in the opposite order. This results in a situation where both threads have locked one resource and wait for the other thread to release the other resource.

    2. Race conditions - occur when the programmer hasn’t correctly guaranteed the order of execution between two threads. This means that the correctness of the program depends on the OS scheduler, which is not deterministic, resulting in hard-to-reproduce errors.

    3. Spending too much time in the critical section - occurs when a thread, after acquiring a shared resource, holds it for too long, effectively making all other interested parties wait and do nothing during that time.

As mentioned in the introduction, Erlang uses the actor model, where the only means to share state between concurrent and/or parallel actors is message passing. Furthermore, because each process has its own heap, there isn’t any shared state by design (except for the binary and a few other Erlang-specific memory areas). Removing shared state and relying only on message passing has the potential to make concurrency easy again[13].

    "Erlang is to locks what Java is to pointers[11]"

Arguably the message passing concurrency model is on a higher abstraction level than using locks. Because it’s a higher abstraction, it removes some of the hardships of using locks and is easier to use: we don’t have to guarantee the correctness of the program in terms of concurrency anymore, the runtime does that for us.

The message passing style of concurrency can also be successfully modelled with locks on the JVM. The Akka framework is essentially doing just this and brings the actor model to the JVM[14]. Using the Akka framework on the JVM makes the experience very similar to Erlang, with the added benefit of being able to use the much larger Java ecosystem.

3.2. Message Passing

Because of the heap-per-process architecture in ERTS, message passing is done by copying the message from one process heap to another. This can be somewhat surprising, as it would be much cheaper to simply pass a pointer to the message from one process to another, but that would require some shared memory area for the processes.

ERTS has actually experimented with both shared and hybrid heap architectures in the past[10], but after multithreading support was added to the BEAM the implementations were broken and later removed from ERTS[15]. A hybrid heap architecture is one where each process still has a separate heap, but messages that are to be sent to other processes are allocated in a shared memory area to allow fast message passing. Experimentation showed that such an approach has promise, but as stated, the implementation was left unfinished[15].

The original reason for going with the copying strategy was that the destination process might be on another machine. If we pass messages as pointers among processes on a single machine, but do something else for processes on different machines, the error handling code becomes more difficult and the two cases have to be handled separately. The underlying reason was to make the semantics of local and remote failures the same[16].

Furthermore, the "free lunch" for performance is long over, meaning processors aren’t getting faster; instead multicore processors are standard[17]. Providing a coherent picture of memory to all cores is increasingly expensive and one can think of a CPU as a distributed system[18]. Whether it’s hidden from the programmer or not, the CPU will have to do some copying, when a cache miss occurs for example. As data sharing times between cores continue to rise, it makes sense to do the expensive operation of copying immediately[16].

3.3. Lightweight threads

As mentioned before, the JVM maps its threads one-to-one to OS level threads, which are then scheduled by the OS scheduler, but in Erlang a process is simply a separate memory area, and processes are mapped n-to-m to OS level threads. Since 2006 ERTS has supported true multithreading through Symmetric Multi Processing (SMP). With SMP, ERTS starts with the same number of schedulers as there are cores[19].

Each scheduler has a run queue that contains runnable processes. A process runs until it tries to receive a message while its mailbox is empty, or until it runs out of reductions. The meaning of reductions in ERTS is not clearly defined, but they should represent "units of work" and are roughly equivalent to function calls. Each process that starts running initially has 2000 reductions, and when it runs out it is put at the end of the run queue[20].

The effect is that Erlang is one of the few languages that actually does preemptive multitasking. The reduction count of 2000, which corresponds to less than 1 ms, is quite low and forces many small context switches between the Erlang processes. But this also ensures that an Erlang system tends to degrade in a graceful manner when loaded with more work[21].

Schedulers will also balance work between each other. The strategy can be configured either to balance the load as evenly as possible or to use the smallest number of schedulers (putting the other ones to sleep, to conserve power for example)[20].

The important note here is that blocking a process doesn’t block a scheduler; it will simply start running the next process immediately. As Erlang processes are not OS level threads they are more lightweight, which is the main reason ERTS can run hundreds of thousands of processes. Erlang processes use a dynamically allocated stack that starts off much smaller than that of a Java thread. By default an Erlang process will use around 2.5KB[22] on a 64-bit machine, whereas a Java thread starts off with a 1024KB stack on a 64-bit machine.

There is a library for Java called Quasar that tries to bring the same lightweight threads to the JVM as Erlang processes. In order to really pull it off, Quasar needs to instrument your code to save the execution state and restore it when that lightweight thread starts running again[23]. Instrumentation, though, has many challenges that add complexity to the program instead of removing it.

The same applies to the JVM actor framework Akka mentioned earlier. Both can work very well if the programmers take care to follow some principles. As Akka actors run on OS threads, an actor that unexpectedly blocks will block the entire thread. Without fundamental changes in how the JVM works, one cannot guarantee that an arbitrary piece of code will not block[24].
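To make the message passing model of Section 3.1 more concrete on the JVM side, here is a rough sketch of an actor-like object written with nothing but the standard java.util.concurrent classes. It is not how Akka or Quasar are implemented; the CounterActor class, its message types and the demo method are all hypothetical names chosen for this illustration. The actor keeps its state private to one thread and is only reachable through its mailbox, and because the messages are immutable, passing a reference is as safe as copying them.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical actor-like counter: its state is touched only by its own thread. */
final class CounterActor implements Runnable {

    /** Marker type for everything that can be put into the mailbox. */
    interface Message {}

    /** Immutable request to add an amount to the counter. */
    static final class Add implements Message {
        final int amount;
        Add(int amount) { this.amount = amount; }
    }

    /** Request for the current total; the answer arrives through the future. */
    static final class Get implements Message {
        final CompletableFuture<Integer> reply = new CompletableFuture<>();
    }

    /** Request to shut the actor down. */
    static final class Stop implements Message {}

    private final BlockingQueue<Message> mailbox = new LinkedBlockingQueue<>();
    private int total = 0; // private state, never shared with other threads

    /** Asynchronous send: enqueue the message and return immediately. */
    void send(Message message) {
        mailbox.add(message);
    }

    @Override
    public void run() {
        try {
            while (true) {
                Message message = mailbox.take(); // blocks while the mailbox is empty
                if (message instanceof Add) {
                    total += ((Add) message).amount;
                } else if (message instanceof Get) {
                    ((Get) message).reply.complete(total);
                } else if (message instanceof Stop) {
                    return;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Starts one actor, lets several sender threads message it, returns the final total. */
    static int demo(int senders, int messagesPerSender) {
        CounterActor actor = new CounterActor();
        Thread actorThread = new Thread(actor, "counter-actor");
        actorThread.setDaemon(true); // do not keep the JVM alive if the demo is abandoned
        actorThread.start();

        Thread[] threads = new Thread[senders];
        for (int i = 0; i < senders; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < messagesPerSender; j++) {
                    actor.send(new Add(1));
                }
            });
            threads[i].start();
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
            // The mailbox is FIFO, so this request is handled after every Add above.
            Get get = new Get();
            actor.send(get);
            int result = get.reply.join();
            actor.send(new Stop());
            actorThread.join();
            return result;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo(4, 1000)); // prints 4000
    }
}
```

No locks appear in the user code, but the queue uses them internally, which is the sense in which message passing is modelled with locks on the JVM. Note also that this actor occupies a whole OS thread and blocks it while waiting for mail, which is exactly the cost that Erlang processes avoid.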

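Returning to the deadlock scenario from the list in Section 3.1, the usual remedy in shared state code is to make every thread acquire its locks in the same global order. The sketch below shows one way this could look; the Account class, its id field and the demo method are assumptions made for this example and are not taken from any library. Two threads transfer in opposite directions, which is the situation where locking "from" before "to" could deadlock, but ordering the locks by id avoids it.

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical account guarded by its own lock. */
final class Account {
    final int id; // unique id, used only to decide the lock acquisition order
    final Lock lock = new ReentrantLock();
    private long balance;

    Account(int id, long balance) {
        this.id = id;
        this.balance = balance;
    }

    /** Reads the balance under the account's lock. */
    long balance() {
        lock.lock();
        try {
            return balance;
        } finally {
            lock.unlock();
        }
    }

    /** Moves money between two accounts, always locking the lower id first. */
    static void transfer(Account from, Account to, long amount) {
        Account first = from.id < to.id ? from : to;
        Account second = first == from ? to : from;
        first.lock.lock();
        try {
            second.lock.lock();
            try {
                // Critical section: kept as short as possible.
                from.balance -= amount;
                to.balance += amount;
            } finally {
                second.lock.unlock();
            }
        } finally {
            first.lock.unlock();
        }
    }

    /** Runs transfers in opposite directions on two threads; returns the combined balance. */
    static long demo(int rounds) {
        Account a = new Account(1, 1_000);
        Account b = new Account(2, 1_000);
        Thread t1 = new Thread(() -> {
            for (int i = 0; i < rounds; i++) transfer(a, b, 1);
        });
        Thread t2 = new Thread(() -> {
            for (int i = 0; i < rounds; i++) transfer(b, a, 1);
        });
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return a.balance() + b.balance();
    }

    public static void main(String[] args) {
        System.out.println(demo(100_000)); // prints 2000
    }
}
```

The ordering rule is a convention that every piece of code touching the accounts has to follow, which illustrates the point that with shared state the burden of correctness lies on the programmer.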

4. Runtime Optimisations

4.1. JIT compiler

When looking at the JVM and comparing it with other virtual machines, we cannot forget to mention the just-in-time (JIT) compiler that HotSpot has, which has the largest effect on the performance of the JVM[25]. As mentioned in the introduction, neither Erlang nor Java source code is compiled directly to executable binaries or native code. Instead they rely on the runtime to execute the statements in the intermediate language (bytecode for Java and so-called BEAM code for Erlang).

Purely interpreting commands sequentially is slower simply because of having to translate commands over and over again into the machine’s native instructions, not to mention missing out on the various optimisations that good compilers do[25]. The JIT compiler in HotSpot works by keeping track of which methods are "hot" (called often) and then optimising and compiling them to native code. This has the benefit that effort is not spent on compiling/optimising methods that don’t execute or do so rarely.

4.2. HiPE and BEAMJIT

ERTS comes with an ahead-of-time compiler called HiPE (High Performance Erlang). It is sometimes also called a just-in-time compiler, but in reality it’s more of an ahead-of-time compiler, because the user has to choose which functions or modules are compiled into native code. It doesn’t do this automatically during runtime for the most used functions[26].

Compiling ahead of time also strips away many possibilities for optimisation that the HotSpot JIT compiler has. Essentially the HotSpot JIT compiler is able to gamble the system into better performance by taking shortcuts. For example, the HotSpot JIT compiler assumes that a method never throws an exception, or that most methods are not overridden, and links a call site directly to a specific method instead of traversing the class hierarchy each time to find the most specific method[25].

These optimisations are possible on the JVM because it can keep track of the shortcuts it has made, and if any of them becomes invalid, for example a method does throw an exception, the JIT compiler will deoptimise the method and compile it again without the optimisation or run it as interpreted. The HiPE compiler, however, cannot take such shortcuts, because it hasn’t got the ability to change gears during runtime: the once compiled code has to be correct and work in all circumstances.

However, a true just-in-time compiler for ERTS, called BEAMJIT, is in development. Similarly to the HotSpot JIT compiler, it observes the running program (using tracing) to decide which code should be compiled to native code and which not. It has shown increased performance in some benchmarks, but the current implementation doesn’t do any Erlang-specific optimisations, which leaves out the best part of having a true just-in-time compiler[27].

5. Summary

As we’ve seen, the JVM and ERTS are quite different underneath. The difference mainly stems from the fact that Erlang takes the actor model as its basis and has built its runtime around that. Whether that’s good or bad is up for debate and most likely depends on the problem at hand. Java and the JVM provide enough tools to retrofit any concurrency model out there, but retrofitting anything won’t be the same as taking it into the initial design. That said, there’s a wonderful quote from one of the creators of Erlang:

    "Erlang accidentally has the right properties to exploit multi-core architectures - not by design but by accident" - Joe Armstrong[16]

References

[1] E. Stenman, “Vm tuning, know your engine - part ii: the beam.” https://www.youtube.com/watch?v=RcyN2yS5PRU, July 2013.


[2] “Beam file format.” https://synrc.com/publications/cat/Functional%20Languages/Erlang/BEAM.pdf, May 2012.

[3] C. Hewitt, P. Bishop, and R. Steiger, “A universal modular actor formalism for artificial intelligence,” in Proceedings of the 3rd International Joint Conference on Artificial Intelligence, IJCAI’73, (San Francisco, CA, USA), pp. 235–245, Morgan Kaufmann Publishers Inc., 1973.

[4] F. Hebert, Learn You Some Erlang for Great Good!: A Beginner’s Guide. No Starch Press Series, No Starch Press, 2013.

[5] J. Bloom, “JVM Internals.” http://blog.jamesdbloom.com/JVMInternals.html, November 2013.

[6] B. Goetz, “Java theory and practice: Garbage collection in the HotSpot JVM.” http://www.ibm.com/developerworks/library/j-jtp11253/, November 2003.

[7] “Memory Management in the Java HotSpot™ Virtual Machine,” tech. rep., Sun Microsystems, April 2006.

[8] E. Stenman, “Erlang engine tuning: Part 1 - know your engine.” http://www.youtube.com/watch?v=QbzH0L_0pxI, April 2013.

[9] “http://www.erlang.org/.” http://www.erlang.org/.

[10] E. Johansson, K. Sagonas, and J. Wilhelmsson, “Heap architectures for concurrent languages using message passing,” in Proceedings of the 3rd International Symposium on Memory Management, ISMM ’02, (New York, NY, USA), pp. 88–99, ACM, 2002.

[11] S. Akhmechet, “Erlang style concurrency.” http://www.defmacro.org/ramblings/concurrency.html, August 2006.

[12] D. V. Subramaniam, “What makes concurrency hard,” Healthy Code, July 2014.

[13] J. Armstrong, “Concurrency is easy.” http://armstrongonsoftware.blogspot.com/2006/08/concurrency-is-easy.html, August 2006.

[14] “http://akka.io/.” http://akka.io/.

[15] R. Carlsson, “[erlang-questions] why are messages between processes copied?.” http://erlang.org/pipermail/erlang-questions/2012-February/064613.html, February 2012.

[16] J. Armstrong, “[erlang-questions] why are messages between processes copied?.” http://erlang.org/pipermail/erlang-questions/2012-February/064617.html, February 2012.

[17] H. Sutter, “The free lunch is over: A fundamental turn toward concurrency in software.” http://www.gotw.ca/publications/concurrency-ddj.htm, March 2005.

[18] A. Baumann, S. Peter, A. Schüpbach, A. Singhania, T. Roscoe, P. Barham, and R. Isaacs, “Your computer is already a distributed system. why isn’t your os?.” https://www.usenix.org/legacy/event/hotos09/tech/full_papers/baumann/baumann_html/index.html, March 2009.

[19] K. Lundin, “Inside the erlang vm.” http://www.erlang.org/euc/08/euc_smp.pdf, November 2008.

[20] L. Larsson, “Understanding the erlang scheduler.” https://vimeo.com/86106434, February 2014.

[21] J. L. Andersen, “How erlang does scheduling.” http://jlouisramblings.blogspot.com/2013/01/how-erlang-does-scheduling.html, January 2013.

[22] “Efficiency guide 8.1 creation of an erlang process.” http://www.erlang.org/doc/efficiency_guide/processes.html.


[23] R. Pressler, “Jvmls 2014: Lightweight threads.” http://medianetwork.oracle.com/video/player/3731008736001, August 2014.

[24] C. Moon, “Actors, green threads and csp on the jvm – no, you can’t have a pony.” http://www.boundary.com/blog/2014/09/no-you-cant-have-a-pony/, March 2014.

[25] S. Oaks, Java Performance: The Definitive Guide. O’Reilly Media, 2014.

[26] K. F. Sagonas, M. Pettersson, R. Carlsson, P. Gustafsson, and T. Lindahl, “All you wanted to know about the hipe compiler: (but might have been afraid to ask).,” in Erlang Workshop (B. Däcker and T. Arts, eds.), pp. 36–42, ACM, 2003.

[27] F. Drejhammar and L. Rasmusson, “Beamjit: A just-in-time compiling runtime for erlang,” in Proceedings of the Thirteenth ACM SIGPLAN Workshop on Erlang, Erlang ’14, (New York, NY, USA), pp. 61–72, ACM, 2014.
