Polymorphisation: Improving Rust compilation times through intelligent monomorphisation
David Wood (2198230W)
April 15, 2020
MSci Software Engineering with Work Placement (40 Credits)

ABSTRACT

Rust is a new systems programming language, designed to provide memory safety while maintaining high performance. One of the major priorities for the Rust compiler team is to improve compilation times, which is regularly requested by users of the language. Rust's high compilation times are a result of many expensive compile-time analyses, a compilation model with large compilation units (thus requiring incremental compilation techniques), use of LLVM to generate machine code, and the language's monomorphisation strategy. This paper presents a code size optimisation, implemented in rustc, to reduce compilation times by intelligently reducing monomorphisation through "polymorphisation" - identifying where functions could remain polymorphic and producing fewer monomorphisations. By reducing the quantity of LLVM IR generated, and thus reducing code generation time spent in LLVM by the compiler, this project achieves 3-5% compilation time improvements in benchmarks with heavy use of closures in highly-generic, frequently-instantiated functions.

1. INTRODUCTION

Rust is a multi-paradigm systems programming language, designed to be syntactically similar to C++ but with an explicit goal of providing memory safety and safe concurrency while maintaining high performance. The first stable release of the language was in 2015, and each year since 2016, Rust has been the "most loved programming language" in Stack Overflow's Developer Survey [11, 12, 13, 14]. In the years since, Rust has been used by many well-known companies, such as Atlassian, Braintree, Canonical, Chef, Cloudflare, Coursera, Deliveroo, Dropbox, Google and npm.

Each year, the Rust project runs a "State of Rust" survey, asking users of the language for their opinions to help establish development priorities for the project. Since 2017 [22, 19, 20], faster compilation times have consistently been an area where users of the language say Rust needs to improve.

There are a handful of reasons why rustc, the Rust compiler, is slow. Compared to other systems programming languages, Rust has a moderately-complex type system and has to enforce the constraints that make Rust safe at compilation time - analyses which take more time than those required for a language with a simpler type system. Lots of effort is being invested in optimising rustc's analyses to improve performance.

In addition, Rust's compilation model is different from that of other programming languages. In C++, the compilation unit is a single file, whereas Rust's compilation unit is a crate¹. While compilation times of entire C++ and entire Rust projects are generally comparable, modification of a single C++ file requires far less recompilation than modification of a single Rust file. In recent years, the implementation of incremental compilation in rustc has improved recompilation times after small modifications to code.

Rust generates machine code using LLVM, a collection of modular and reusable compiler and toolchain technologies. At LLVM's core is a language-independent optimiser, and code-generation support for multiple processors. Languages including Rust, Swift, Julia and C/C++ have compilers using LLVM as a "backend". By de-duplicating the effort of efficient code generation, LLVM enables compiler engineers to focus on implementing the parts of their compiler that are specific to the language (in the "frontend" of the compiler). Compiler frontends are expected to transform their internal representation of the user's source code into LLVM IR, an intermediate representation from LLVM which enables optimisation and code generation to be written without knowledge of the source language.

While LLVM enables Rust to have world-class runtime performance, LLVM is a large framework which is not focused on compile-time performance. This is exacerbated by technical debt in rustc which results in the generation of low-quality LLVM IR. Recent work on MIR² optimisations has improved the quality of rustc's generated LLVM IR, and reduced the work required by LLVM to optimise and produce machine code.

Finally, rustc monomorphises generic functions. Monomorphisation is a strategy for compiling generic code which duplicates the definition of generic functions for every instantiation, producing significantly more generated code than other translation strategies. As a result, LLVM has to perform significant work to eliminate or optimise redundant IR.

This project improves the compilation times of Rust projects by implementing a code-size optimisation in rustc to eliminate redundant monomorphisation, which will reduce the quantity of generated LLVM IR, eliminate some of the work previously required of LLVM and thus improve compilation times. In particular, the analysis implemented by this

¹ Crates are entire Rust projects, including every Rust file in the project.
² MIR is rustc's "mid-level intermediate representation" and is introduced in Section 3.1.3.

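To make the monomorphisation strategy concrete, the following minimal sketch (the function and call sites are invented for illustration, not taken from the paper) shows a generic function instantiated at two types, for which rustc emits two separate copies of the function.

```rust
// Under monomorphisation, rustc generates a separate copy of
// `largest` for every concrete `T` it is instantiated with - here,
// one copy for `i32` and one for `f64`.
fn largest<T: PartialOrd + Copy>(items: &[T]) -> T {
    let mut max = items[0];
    for &item in &items[1..] {
        if item > max {
            max = item;
        }
    }
    max
}

fn main() {
    // Each call with a new type parameter adds another monomorphised
    // copy, and therefore more LLVM IR for the backend to process.
    println!("{}", largest(&[1, 5, 3])); // instantiates largest::<i32>
    println!("{}", largest(&[1.2, 0.4])); // instantiates largest::<f64>
}
```

Every additional instantiating type adds another copy of the function to the generated LLVM IR; this duplication is the cost that polymorphisation aims to reduce where copies would be redundant.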
project will detect where generic parameters to functions, closures and generators are unused, and fewer monomorphisations could be generated as a result.

Detection of unused generic parameters, while relatively simple, was deliberately chosen, as the primary contribution of this work is the infrastructure within rustc to support polymorphic MIR during code generation, which will enable more complex extensions to the initial analysis (see further discussion in Section 5.1). Despite this, there is ample evidence in the Rust ecosystem, predating this project, that detection of unused generic parameters could yield significant compilation time improvements.

Within Rayon, a data-parallelism library, a pull request [17] was submitted to reduce the usage of closures by moving them into nested functions, which do not inherit generic parameters (and thus would have fewer unnecessary monomorphisations). Similarly, in rustc, the same author submitted a pull request [16] to apply the same optimisation manually to the language's standard library.

Likewise, in Serde, a popular serialisation library, it was observed [21] that a significant proportion of the LLVM IR generated was the result of a tiny closure within a generic function (which inherited and did not use generic parameters of the parent function, resulting in unnecessary monomorphisations). In this particular case, the closure contributed more LLVM IR than all but five significantly larger functions.

This project makes all of these changes to rustc, Rayon and Serde unnecessary, as the compiler will apply this optimisation automatically.

In Section 2, this paper reviews the existing research on optimisations which aim to reduce compilation times or code size through monomorphisation strategies. Next, Section 3 describes the polymorphisation analysis implemented by this project in more detail, and Section 4 presents a thorough analysis of the compilation time impact of polymorphisation implemented in rustc. Finally, Section 5 summarises the results of this research and describes future work that it enables.

2. BACKGROUND

2.0.1 Monomorphisation-related Optimisations

Optimisation of monomorphisation to improve compilation time is a subject that has received little attention. However, within the Java ecosystem, lots of literature exists on addressing runtime performance impacts that result from monomorphisation (or a lack thereof).

Ureche et al. [24] present a similar optimisation, "Miniboxing", designed to target the Java Virtual Machine and implemented in Scala. Homogeneous and heterogeneous approaches to generic code compilation are presented. Heterogeneous approaches duplicate and adapt code for each type individually, increasing code size (also known as monomorphisation, the approach taken in Rust and C++). Homogeneous translation generates a single function but requires data to have a common representation, irrespective of type, which is typically achieved through pointers to heap-allocated objects. Ureche et al. identify that larger value types (e.g. integers) can contain smaller value types (e.g. bytes) and that this can be exploited to reduce the duplication necessary in homogeneous translations. When implemented in the Scala compiler, performance matched heterogeneous translation and obtained speedups of over 22x compared to homogeneous translation, with only modest increases in code size.

Dragos et al. [3] contribute an approach for compilation of polymorphic code which allows the user to decide which code should be specialised (monomorphised). This approach allows for specialised code to be compiled separately and mixed with generic code. Their approach, again implemented in Scala, introduces an annotation which can be applied to type parameters. This annotation instructs the compiler to specialise code for that type parameter. Generic class definitions require that the approach presented must be able to specialise entire class definitions while maintaining subclass relationships, so specialised and generic code can interoperate. Evaluation is performed on the Scala standard library and a series of benchmarks, where runtime performance is improved by over 20x for code size increases of between 16% and 161%; however, this approach requires that programmers correctly identify functions to annotate.

Stucki et al. [18] identify that specialisation of generic code, which can improve performance by an order of magnitude, is only effective when called from monomorphic sites or other specialised code. Performance of specialised code within these "islands" regresses to that of generic code when invoked from generic code. Stucki et al.'s implementation provides a Scala macro which creates a "specialised body" containing the original generic code, where type parameters have Scala's specialisation annotation. Dispatch code is then added, which matches on the incoming type and invokes the correct specialisation. While this approach induces runtime overhead, this is made up for by the performance improvements achieved through specialisation. Their approach is implementable in a library without compiler modifications and achieves speedups of up to 30x, averaging 12x when benchmarked with ScalaMeter.

Ureche et al. [23] discuss efforts by the Java platform architects in Project Valhalla to update the Java Virtual Machine (JVM)'s bytecode format to allow load-time class specialisation, which will enable opt-in specialisation in Java. Java is contrasted with Scala, which has three generic compilation schemes (erasure, specialisation and Ureche et al.'s miniboxing). The interaction of these schemes can cause subtle performance issues which also affect Project Valhalla. In particular, miniboxed functions are still erased when invoked from erased functions, and boxing is required at generic compilation scheme boundaries. This paper presents three techniques to eliminate the slowdowns associated with miniboxing. The authors validate their techniques in the miniboxing plugin for Scala, producing performance improvements of 2-4x.

These papers focus on languages based on the JVM due to the performance impact that using primitives in generic code can have. In order to preserve bytecode backwards compatibility, the JVM requires that value types be converted to heap objects, typically through boxing, when interacting with generic code. Scala has been of particular focus in this research because of the inclusion of specialisation annotations as a language feature.

2.0.2 Code Size Optimisations

In order to improve Rust's compilation times, this project implements an optimisation to reduce the quantity of generated LLVM IR. It is therefore relevant to review research into optimisations which aim to reduce code size.

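Before reviewing that research, the closure-hoisting workaround used in the Rayon and rustc pull requests discussed in the introduction can be sketched as follows; the function names here are hypothetical, chosen only to illustrate the transformation.

```rust
use std::fmt::Debug;

// `describe_with_closure` is generic in `T`; its closure does not
// mention `T` but still inherits it, so rustc monomorphises a copy
// of the closure for every `T` this function is instantiated with.
fn describe_with_closure<T: Debug>(value: T) -> String {
    let prefix = || String::from("value: "); // inherits `T` unnecessarily
    format!("{}{:?}", prefix(), value)
}

// The manual workaround: hoist the closure's body into a nested
// function. Nested functions do not inherit the parent's generic
// parameters, so `prefix` is compiled exactly once, regardless of
// how many types instantiate `describe_with_nested_fn`.
fn describe_with_nested_fn<T: Debug>(value: T) -> String {
    fn prefix() -> String {
        String::from("value: ")
    }
    format!("{}{:?}", prefix(), value)
}

fn main() {
    // Both behave identically; they differ only in how much LLVM IR
    // is generated as more instantiations of `T` appear.
    println!("{}", describe_with_closure(1u32));
    println!("{}", describe_with_nested_fn("hi"));
}
```

Polymorphisation makes this source-level rewrite unnecessary by detecting, in the compiler, that the closure does not use the inherited parameter.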
Edler von Koch et al. [4] discuss a compiler optimisation technique for reducing code size by exploiting the structural similarity of functions. Structurally similar functions have equivalent signatures and equivalent control flow graphs. Edler von Koch et al. introduce a technique for merging two structurally similar functions by combining the bodies of both functions and inserting control flow constructs to select the correct instructions where the functions differ. The original functions are replaced with stubs which then call the merged function with an additional argument to select the correct code path. The authors implemented this optimisation in LLVM and achieved code size reductions of 4% on average in SPEC CPU2006 benchmarks.

Debray et al. [2] present a series of transformations which can be performed by the compiler to reduce code size. Transformations are split into two categories: classical analyses and optimisations, and techniques for code factoring. Most relevant to the techniques employed by this project is redundant-code elimination. Redundant computations exist where the same computation has been completed previously and that result is guaranteed to still be available. In their implementation, redundant-code elimination techniques were particularly effective at removing redundant computation of global pointer values (on the Alpha processor where the techniques were implemented) when entering functions. Debray et al. evaluate the optimisations by implementing them in a post-link-time optimiser, alto, and comparing the code size of a set of benchmarks before and after optimisation by alto, achieving over 30% reductions.

Kim et al. [7] present an iterative approach to the procedural abstraction techniques described by Debray et al. Procedural abstraction identifies sequences of instructions (instruction patterns) which are repeated and creates a procedure containing the pattern, replacing the original instances with a call to the procedure. In traditional procedural abstraction approaches, such as those discussed in the original paper [2], pattern identification is followed by a saving estimation before patterns are replaced. In contrast, the approach presented by Kim et al. iterates global saving estimation and pattern replacement following pattern identification. In eight benchmarks, an implementation targeting a CalmRISC8 embedded processor averaged a 14.9% reduction in code size compared to traditional procedural abstraction techniques. However, due to the small number of instructions and registers on the CalmRISC8 processor, generated code would present many opportunities to the optimisation, which could skew results higher than might be achieved on more common instruction sets.

Petrashko et al. [15] present context-sensitive call-graph construction algorithms which exploit the types provided to type parameters as contexts. They demonstrate through a motivational example that context-insensitive call graph analyses result in many calls being considered highly polymorphic and not able to be inlined. Through running analyses separately in the context of each possible type argument (made possible by their context-sensitive call graph), the authors are able to remove all boxing and unboxing, in addition to producing monomorphic calls which enable Java's JIT to inline and aggressively optimise functions. Petrashko et al. find that through use of the context-sensitive analysis, performance improves by 1.4x and 20% more monomorphic call sites are discovered. Generated bytecode size was reduced by up to 5x, while achieving the same performance on code as when hand-optimised with annotations.

2.0.3 Compilation Time Optimisations

Much like research which aims to reduce code size, research into optimisations to reduce compilation time in more varied scenarios can be valuable for identifying optimisation opportunities being missed.

Han et al. [5] discuss a technique for reducing compilation time in parallelising compilers. In this paper, the authors inspect the OSCAR compiler, which parallelises C or Fortran77 programs. OSCAR repeatedly applies multiple program analysis passes and restructuring passes. After OSCAR's second iteration, Han et al. identify that OSCAR will apply an analysis to a function irrespective of whether the restructuring passes changed the function. Han et al. propose an optimisation to reduce compilation time by removing redundant analyses. Compilation time reductions of 27%, 13% and 19% are achieved for three large-scale proprietary real applications.

Leopoldseder et al. [9] present a technique to determine when it is beneficial to duplicate instructions from merge blocks to their predecessors in order to attain further optimisation. The authors implement their approach, dominance-based duplication simulation (DBDS), in the GraalVM JIT compiler, which introduces a constraint that the optimisation must consider the impacts on compilation time. Leopoldseder et al. identify that constant folding, conditional elimination, partial escape analysis/scalar replacement and read elimination are achievable after code duplication has taken place. Other code duplication approaches use backtracking to tentatively perform duplication while maintaining the option to reverse the duplication if it is determined not profitable. Backtracking approaches are compile-time intensive and thus not suitable for JIT compilation. Leopoldseder et al. choose to simulate duplication opportunities and then fit those opportunities into a cost model which maximises peak performance while minimising compilation time and code size. The authors achieve performance improvements of up to 40%, mean performance improvements of 5.89%, mean code size increases of 9.93% and mean compilation time increases of 18.84% across their benchmarks.

Unfortunately, none of these projects address the problem of reducing code size when monomorphisation is used exclusively as a strategy for compiling generic code.

3. POLYMORPHISATION

Polymorphisation, a concept introduced by this paper, is an optimisation which determines when functions, closures and generators could remain polymorphic during code generation.

Rust supports parameterising functions by constants or types - these are known as generic functions, and this feature is similar to generics in other languages, like Java's generics or C++'s templates. Generic functions and types are a desirable feature for programmers as they enable greater code reuse. When generating machine code, there are two approaches to dealing with generic functions - monomorphisation and boxing.

C++ and Rust perform monomorphisation, where multiple copies of a function are generated, one for each of the types or constants that the function was instantiated with. In contrast, Java performs dynamic dispatch, where each object is heap-allocated and a single copy of a function is generated,

which takes an address.

In addition, Rust compiles to LLVM IR, the intermediate representation of LLVM. LLVM IR doesn't have any concept of generics, so Rust must perform either dynamic dispatch or monomorphisation.

The initial polymorphisation analysis implemented in this project determines when a type or constant parameter to a function, closure or generator is unused, and thus when this would result in multiple redundant copies of the function being generated. By generating fewer redundant monomorphisations of functions in the LLVM IR, there would be less work for LLVM to do, reducing compilation times and code size.

Types with unused generic parameters are disallowed by rustc, but there are no checks for unused generic parameters in functions. Despite this, it is assumed that it is rare for programmers to write functions which have unused generic parameters. However, closures inherit the generic parameters of their parent functions and often don't make use of these parameters.

For example, consider the code shown in Listing 1, which was taken from Serde, a popular Rust serialisation library.

fn parse_value<V>(
    &mut self,
    visitor: V,
) -> Result
where
    V: de::Visitor

tion of polymorphisation.

3.1.1 Query System

Traditionally, compilers are implemented in passes, where source code, an abstract syntax tree or an intermediate representation is processed multiple times. Each pass takes the result of the previous pass and performs an analysis or operation - such as lexical analysis, parsing, semantic analysis, optimisation and code generation. Multi-pass compilers are well-established in the literature, but the expectations on compilers have changed in recent years.

Modern programming languages are expected to have high-quality integration into development environments [10], and assist in powering code completion engines and convenience features such as jump-to-definition. Moreover, users expect this information quickly and while they are writing their code (when it might not parse correctly or type-check). In addition, many programming languages' designs make it hard or impossible to statically determine the correct processing order, as expected by traditional multi-pass compiler architectures.

rustc's query system enables incremental and "demand-driven" compilation, and is integral to satisfying these expectations. Polymorphisation, as implemented in this project, uses the query system, and limitations of the query system are reflected in its design (see Section 3.1.4). In the query system, the compiler can be thought of as having a "database" of knowledge about the current crate,
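The pattern targeted by the initial analysis can be illustrated with a small constructed sketch (not the Serde code from Listing 1): a closure that inherits, but never uses, its parent's type parameter.

```rust
// `count` is generic in `T`, and its closure inherits `T` without
// using it. Without polymorphisation, instantiating `count` at two
// types produces two identical monomorphised copies of the closure;
// detecting that `T` is unused by the closure means a single copy
// suffices.
fn count<T>(items: &[T]) -> usize {
    let add_one = |n: usize| n + 1; // never mentions `T`
    let mut total = 0;
    for _ in items {
        total = add_one(total);
    }
    total
}

fn main() {
    assert_eq!(count(&[1, 2, 3]), 3); // instantiates count::<i32>
    assert_eq!(count(&["a", "b"]), 2); // instantiates count::<&str>
}
```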
actual parsing from input data. As such, queries form a directed acyclic graph.

Queries could form a cyclic graph by invoking themselves. To prevent this, the query engine checks for cyclic invocations and aborts execution.

As all queries are invoked through the query context, accesses can be recorded and a dependency graph can be produced. To avoid unnecessary computation when inputs change, the query system utilises the dependency graph in the "red-green algorithm" used for query re-evaluation.

try_mark_green is the primary operation in the red-green algorithm. Before evaluating a query, the red-green algorithm checks each of the dependencies of the query. Each dependent query will have a colour: red, green or unknown. Green queries' inputs haven't changed, and the cached result can be re-used. Red queries' inputs have changed, and the result must be recomputed.

If the dependency's colour is unknown, then try_mark_green is invoked recursively. Should the dependency be successfully marked as green, then the algorithm continues with the next of the current query's dependencies. However, if the dependency was marked as red, then the dependent query must be recomputed. If the result of the dependent query is the same as before re-execution, then it does not invalidate the current node.

3.1.2 Types and Substitutions

rustc represents types with the ty::Ty type, which represents the semantics of a type (in contrast to hir::Ty from the HIR (high-level IR), rustc's desugared AST, which represents the syntax of a type).

ty::Ty is produced during type inference on the HIR, and then used for type-checking. Ty in the HIR assumes that any two types are distinct until proven otherwise: u32 (Rust's 32-bit unsigned integer type) at line 10, column 20 would be considered different to u32 at line 20, column 4.

SubstsRef is used to represent the "substitutions" for a type (conceptually, a list of types that are to be substituted for the generic parameters of the type). A SubstsRef is a list of GenericArg, where GenericArg is a space-efficient wrapper around GenericArgKind. GenericArgKind represents what kind of generic parameter is present - type, lifetime or const.

Ty contains TyKind::Param instances, each of which has a name and an index. Only the index is required; the name is for debugging and error reporting. The index of the generic parameter is an integer which determines its order in the list of generic parameters in scope. To perform a substitution, the subst function on Ty is invoked with a SubstsRef, which replaces each TyKind::Param with the type from the SubstsRef at the corresponding index.

The details of how generic parameters are implemented within rustc are directly leveraged by the polymorphisation implemented in this project, as is the TypeFoldable infrastructure used to implement subst.

TypeFoldable is a trait (Rust's equivalent of a Java interface) implemented by types that embed type information, allowing the recursive processing of the contents of the TypeFoldable. TypeFolder is another trait used in conjunction with TypeFoldable. TypeFolder is implemented on types (like SubstsFolder, which is used by subst) and is used to implement behaviour when a type, const, region or binder is encountered in a type. Used together, the TypeFolder and TypeFoldable traits allow for traversal and modification of complex recursive type structures.

3.1.3 MIR

MIR (mid-level IR) is an intermediate representation of functions, closures, generators and statics used in rustc, constructed from the HIR and type inference results. It is a simplified version of Rust that is better suited to flow-sensitive analyses. For example, all loops are simplified through use of a "goto" construct; match expressions are simplified by introducing a "discriminant" statement; method calls are replaced with explicit calls to trait functions; and drops/panics are made explicit.

MIR is based on a control-flow graph, composed of basic blocks. Each basic block contains zero or more statements, each with a single successor, and ends with a terminator, which can have multiple successors.

Memory locations on the stack are known as "locals" in the MIR. Locals include function arguments and temporaries.

Within rustc, the MIR is a set of data structures, but it can be dumped textually for debugging. Consider the MIR for the Rust code in Listing 2: it starts with the name and signature of the function in a pseudo-Rust syntax, shown in Listing 3. Following the function signature, the variable declarations

                                                                   5
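The MIR in the following listings corresponds to a program along these lines (a sketch consistent with Listings 3 to 7, assuming i32 elements; the exact source of Listing 2 may differ):

```rust
use std::collections::HashSet;

fn main() {
    // _1 in the MIR dumps is the set; each insert takes a &mut borrow
    // (_3, _5, _7) and returns a bool (_2, _4, _6).
    let mut set: HashSet<i32> = HashSet::new();
    set.insert(1);
    set.insert(2);
    set.insert(3);
    assert_eq!(set.len(), 3);
}
```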
1   // WARNING: This output format is intended for
2   // human consumers only and is subject to
3   // change without notice. Knock yourself out.
4   fn main() -> () {
5       ...
6   }

                  Listing 3: Function prelude in MIR

   Following the function signature, the variable declarations
for all locals are shown (Listing 4). Locals are printed with a
leading underscore, followed by their index. Temporaries are
intermingled with user-defined variables.

1   let mut _0: ();
2   let _2: bool;
3   let mut _3: &mut std::collections::HashSet<i32>;
4   let _4: bool;
5   let mut _5: &mut std::collections::HashSet<i32>;
6   let _6: bool;
7   let mut _7: &mut std::collections::HashSet<i32>;

                     Listing 4: Variables in MIR

1   bb0: {
2       StorageLive(_1);
3       _1 = const std::collections::HashSet::<i32>::new() -> bb2;
4   }

                        Listing 6: bb0 in MIR

1   bb2: {
2       StorageLive(_2);
3       StorageLive(_3);
4       _3 = &mut _1;
5       _2 = const std::collections::HashSet::<i32>::insert(
            move _3, const 1i32) -> [return: bb3, unwind: bb4];
6   }

                        Listing 7: bb2 in MIR
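The simplifications described in Section 3.1.3 can also be seen for match expressions, which are lowered to a discriminant read followed by a branch. A sketch (the MIR in the comments is approximate, not exact rustc output):

```rust
// A `match` on an enum reads the discriminant of the scrutinee and
// branches on it with a SwitchInt terminator; each arm becomes its
// own basic block.
fn describe(opt: Option<i32>) -> i32 {
    // Rough MIR sketch:
    //   _2 = discriminant(_1);
    //   switchInt(move _2) -> [0: bb1, 1: bb2, otherwise: bb3];
    // where bb1 returns 0 and bb2 moves the payload out of `opt`.
    match opt {
        None => 0,
        Some(v) => v,
    }
}

fn main() {
    assert_eq!(describe(None), 0);
    assert_eq!(describe(Some(7)), 7);
}
```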
                                                                         bb0 and bb2 are sufficient for understanding the key concepts
       To distinguish which locals are temporaries and which are         of the MIR.
    user-defined variables, the MIR output then displays debug-             In addition to StorageLive, Assign, StorageDead, there
    info for user-defined variables, such as set, shown in Listing       are a selection of other statements:
    5.
       Within each scope block, user-defined variables are listed           • Assign statements write an rvalue into a place.
    with the user’s name from the variable and the variable’s
                                                                            • FakeRead statements represent the reading that a pat-
    place. In this example, the mapping is direct, but the com-
                                                                              tern match might do, and exists to improve diagnos-
    piler can store multiple user variables in a single local, and
                                                                              tics.
    perform field accesses or dereferences.
       Scope blocks represent the lexical structure of the source           • SetDiscriminant statements write the discriminant of
    program (used in generating debuginfo). Each statement                    an enum variant (its index) to the representation of an
    and terminator in the MIR output (omitted in these snip-                  enum.
    pets) is followed by a comment which states which scope it
    exists in.                                                              • StorageLive and StorageDead starts and ends the live
                                                                              range for storage of a local.
1   scope 1 {
2       debug set => _1;                                                    • InlineAsm executes inline assembly.
3   }
                                                                            • Retag retags references in a place - this is part of the
                                                                              “Stacked Borrows” aliasing model by Jung et al. [6].
                       Listing 5: Scope in MIR
                                                                            • AscribeUserType encodes a user’s type ascription so
       All of the remaining MIR output is used by the basic                   that they can be respected by the borrow checker.
    blocks of the current MIR body, starting with bb0, shown
    in Listing 6. Basic blocks are numbered from zero upwards,              • Nop is a no-op, and is useful for deleting instructions
    and contain statements followed by terminator on the last                 without affecting statement indices.
    line.
       bb0 starts with a StorageLive statement, which indi-                Likewise, in addition to Call, there are various other ter-
    cates that the local _1 is live (until a matching StorageDead        minators:
    statement, not included in this listing). Code generation
    uses StorageLive statements to allocate stack space. At                 • Goto jumps to a single successor block.
    the end of bb0, there is a Call terminator, which invokes
                                                                            • SwitchInt evaluates an operand to an integer and jumps
    HashSet::new and resumes execution in bb2. While termi-
                                                                              to a target depending on the value.
    nators can have multiple successors, because HashSet::new
    does not require cleanup if it panicked, there is no panic              • Resume and Abort indicate that a landing pad is fin-
    edge.                                                                     ished and unwinding should continue or the process
       bb2 has two StorageLive statements followed by an Assign               should be aborted.
    statement on line 4 into a temporary, creating a mutable bor-
    row of _1, before ending the block with a Call terminator               • Return indicates a return from the function, assumes
    invoking HashSet::insert.                                                 that _0 has had an assignment.

                                                                     6
   • Drop and DropAndReplace drop a place (and optionally
     replace it).

   • Call invokes a function.

   • Assert panics if a condition doesn't hold.

   • Yield indicates a suspend point in a generator.

   • GeneratorDrop indicates the end of dropping a generator.

   • FalseEdges and FalseUnwind are used for control flow
     that is impossible, but required for borrow checking to be
     conservative.

   Closures, which are similar to anonymous functions or
lambdas in other languages and can capture variables from the
parent environment, are also represented in the MIR. Functions
and closures are represented almost identically in the MIR.
Closures inherit the generic parameters of the parent function,
and have additional generic parameters tracking the closure's
inferred kind, signature and upvars (captured variables).
   Generators are suspendable functions, which operate
similarly to closures, but can yield values in addition to
returning them. Like closures, generators are represented
almost identically to functions in the MIR, but have extra
generic parameters which track the generator's inferred resume
type, yield type, return type and witness (an inferred type
which is a tuple of all types that could end up in a generator
frame).
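The inheritance of generic parameters by closures can be seen in a small example (a sketch; names are illustrative):

```rust
// `add_one` never mentions T, yet its MIR carries T as a generic
// parameter inherited from `parent`, so a fresh copy of the closure
// would ordinarily be monomorphised for every T that `parent` is
// instantiated with. This is exactly the situation the
// polymorphisation analysis detects as an unused generic parameter.
fn parent<T>(x: u32) -> u32 {
    let add_one = |y: u32| y + 1;
    add_one(x)
}

fn main() {
    // Two instantiations of `parent` with different T; the closure's
    // behaviour is identical in both.
    assert_eq!(parent::<String>(41), 42);
    assert_eq!(parent::<u8>(41), 42);
}
```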
                                                                        starts with the roots - non-generic syntactic items in the
   rustc provides a pair of visitor traits (depending on whether
                                                                        source code. Roots are found by walking the HIR of the
mutability is required) which types can implement to sim-
                                                                        crate, and whenever a non-generic function, method or static
plify traversal of the MIR.
                                                                        item is found, a mono item is created with the DefId of
3.1.4    Shims                                                          the item and an empty SubstsRef (since the item is non-
                                                                        generic).
   There are some circumstances where rustc will generate                  From the roots, neighbours are discovered by inspecting
functions during compilation, these functions are known as              the MIR of each item. Whenever a reference to another
shims, and they are often specialised for specific types.               mono item is found, it is added to the set of all mono items.
   Drop glue is an example of a shim generated by the com-              This process is repeated until the entire mono item graph
piler. Rust doesn’t require that the user write a destruc-              has been discovered. By starting with non-generic roots,
tor for their type, any type will be dropped recursively in             the current mono item will always be monomorphic, so all
a specified order. To implement this, rustc transforms all              generic parameters of neighbours will always be known. Ref-
drops into a call to drop_in_place for the given type.               erences to other mono items can take the form of function
   drop_in_place’s implementation is generated by the com-              calls, taking references to functions, closures, drop glue, un-
piler for each T. There are a variety of shims generated by             sizing casts and boxes.
the compiler:

   • Drop shims implements the drop glue required to deal-          1   fn main() {
     locate a given type.                                           2       bar(2u64);
                                                                    3   }
   • Clone shims implement cloning for a builtin type, like         4
     arrays, tuples and closures.                                   5   fn foo() {
                                                                    6       bar(2u32);
   • Closure-once shims generate a call to an associated            7   }
     method of the FnMut trait.                                     8

   • Reify shims are for fn() pointers where the function           9   fn bar(t: T) {
     cannot be turned into a pointer.                              10       baz(t, 8u16);
                                                                   11   }
   • FnPtr shims generate a call after a dereference for a         12

     function pointer.                                             13   fn baz(f: F, g: G) {
                                                                   14   }
   • Vtable shims generate a call after a dereference and
     move for a vtable.                                                           Listing 8: Monomorphisation Example

                                                                    7
   For example, consider the code in Listing 8.
Monomorphisation will start by walking the HIR of the crate
and looking for non-generic syntactic items. After this step,
the set of mono items will contain the functions main and foo,
both of which have no generic parameters.
   Next, monomorphisation will traverse the MIR of each root,
looking for neighbours. In main, the terminator of bb0 (shown
in Listing 9) references the bar function, with the substitutions
u64.

1   bb0: {
2       StorageLive(_1);
3       _1 = const bar::<u64>(const 2u64) -> bb1;
4   }

                       Listing 9: bb0 in MIR

1   query used_generic_params(key: DefId) -> BitSet<u32> {
2       desc {
3           |tcx| "determining which generic parameters \
                are used by `{}`", tcx.def_path_str(key)
4       }
5   }

                    Listing 12: Query definition
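Queries like the one defined in Listing 12 are memoised by rustc's query system: the provider runs at most once per key and the result is cached (and tracked for incremental compilation). The caching behaviour can be sketched in a self-contained way (hypothetical simplified types standing in for DefId and the BitSet result; this is not rustc's actual API):

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for rustc's DefId and the query's result.
type DefId = u32;
type UsedParams = u64;

struct Queries {
    cache: HashMap<DefId, UsedParams>,
    computations: usize, // counts how often the provider actually runs
}

impl Queries {
    fn new() -> Self {
        Queries { cache: HashMap::new(), computations: 0 }
    }

    /// Memoised query: the provider runs at most once per DefId.
    fn used_generic_params(&mut self, def_id: DefId) -> UsedParams {
        if let Some(&bits) = self.cache.get(&def_id) {
            return bits;
        }
        self.computations += 1;
        let bits = Self::provider(def_id);
        self.cache.insert(def_id, bits);
        bits
    }

    /// Hypothetical provider: pretend every item uses only parameter 0.
    fn provider(_def_id: DefId) -> UsedParams {
        0b1
    }
}

fn main() {
    let mut queries = Queries::new();
    let first = queries.used_generic_params(7);
    let second = queries.used_generic_params(7); // served from the cache
    assert_eq!(first, second);
    assert_eq!(queries.computations, 1);
}
```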
 1   fn mark_used_by_default_parameters<'tcx>(
 2       tcx: TyCtxt<'tcx>,
 3       def_id: DefId,
 4       generics: &'tcx ty::Generics,
 5       used_parameters: &mut BitSet<u32>,
 6   ) {
 7       if !tcx.is_trait(def_id)
 8          && (tcx.is_closure(def_id)
 9              || tcx.type_of(def_id).is_generator())
10       {
11           for param in &generics.params {
12               used_parameters.insert(param.index);
13           }
14       } else {
15           for param in &generics.params {
16               if param.is_lifetime() {
17                   used_parameters.insert(
18                       param.index);
19               }
20           }
21       }
22
23       if let Some(parent) = generics.parent {
24           mark_used_by_default_parameters(
25               tcx, parent, tcx.generics_of(parent),
26               used_parameters);
27       }
28   }

               Listing 14: Used-by-default parameters

   The analysis takes the MIR body for the current item and
uses a MIR visitor to traverse it. Only three components of the
MIR need to be visited by the analysis: locals, types and
constants.
   visit_local_decl is invoked for each local variable in the
item's MIR before the MIR is traversed. This analysis'
implementation of visit_local_decl calls super_local_decl
(which proceeds with traversal as normal) for all but one case:
if the current item is a closure or generator and the local
currently being visited is the first local, then it returns early.
   In the MIR, the first local for a closure or generator is a
reference to the closure or generator itself. If the analysis
proceeded to visit this local as normal, then it would
eventually visit the substitutions of the closure or generator.
Unfortunately, this would result in all inherited generic
parameters for a closure always being considered used,
defeating the point of the analysis.
   visit_const and visit_ty are invoked for each constant and
type in the MIR, and are where the visitor "bottoms out". This
analysis' implementation of visit_const and visit_ty calls a
visit_foldable function (which isn't part of the MIR visitor).
   visit_foldable takes any type which implements
TypeFoldable and checks if the type has any type or constant
parameters before visiting it with this analysis' type visitor.
   In the type visitor, the analysis traverses the structure of a
type or constant and looks for ty::Param and
ConstKind::Param, setting the index from each within the
BitSet, marking the parameter as used. In addition, the type
visitor also special-cases closures and generators: the analysis
is invoked recursively on the closure or generator, and any
parameters from the parent which were used in the closure or
generator are marked as used in the parent.
   The analysis was significantly more complex during
implementation and had to special-case some casts, call and
drop terminators to avoid marking parameters as used
unnecessarily, often in order to avoid parts of the compiler
which were previously always monomorphic. However, during
integration of the analysis, as support for polymorphic MIR
during codegen improved, these special-cases were removed.
   Some generic parameters are only used in the bounds of
another generic parameter. After the analysis has detected
which generic parameters are used in the body of the item,
the predicates of the item are checked for uses of unused
generic parameters in the bounds on used generic parameters.

1   fn bar<T>() {}
2
3   fn foo<I, T>(_: I)
4   where
5       I: Iterator<Item = T>,
6   {
7       bar::<I>()
8   }

   Listing 15: Example of generic parameter use in predicate

   Consider the example shown in Listing 15. foo has two
generic parameters, I and T, and uses I in the substitutions of
a call to bar. If only the body of foo was analysed then T
would be considered unused, despite it being clear that T is
used in the iterator bound on I.

3.2.1    Testing
   To test the analysis, a debugging flag, -Z polymorphize-
errors, is added, which emits compilation errors on items with
unused generic parameters, an example of which is shown in
Listings 16 and 17.
   By emitting compilation "errors", the project leverages the
existing error diagnostics infrastructure to enable introspection
into the compiler internals from a test written at the Rust
source-code level.

1   struct Foo<F>(F);
2
3   impl<F> Foo<F> {
4       // Function has an unused generic
5       // parameter from impl.
6       pub fn unused_impl() {
7   //~^ ERROR item has unused generic parameters
8       }
9   }

               Listing 16: Example of an analysis test

   In addition, this approach integrates well into rustc's
existing test infrastructure, which writes all of the error output
for a test into a stderr file and checks that the output hasn't
changed by comparison with that file. Furthermore, anywhere
an error is expected is annotated in the test source.
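The bookkeeping at the heart of the analysis amounts to setting and testing bits indexed by generic parameter index. A minimal sketch (a hypothetical UsedParams type, not the compiler's BitSet), replaying the two-pass behaviour from Listing 15:

```rust
/// Hypothetical stand-in for the analysis' bit set, tracking which
/// generic parameter indices have been marked as used.
#[derive(Default)]
struct UsedParams(u64);

impl UsedParams {
    fn insert(&mut self, index: u32) {
        self.0 |= 1 << index;
    }
    fn contains(&self, index: u32) -> bool {
        self.0 & (1 << index) != 0
    }
}

fn main() {
    // For `fn foo<I, T>(_: I)` from Listing 15: visiting the body
    // marks only I (index 0)...
    let mut used = UsedParams::default();
    used.insert(0);
    assert!(used.contains(0) && !used.contains(1));

    // ...and the predicate pass, seeing `I: Iterator<Item = T>` on the
    // used parameter I, then marks T (index 1) as used too.
    used.insert(1);
    assert!(used.contains(1));
}
```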
1   error: item has unused generic parameters
2     --> $DIR/functions.rs:38:12
3      |
4   LL | impl<F> Foo<F> {
5      |      - generic parameter `F` is unused
6   LL |     // Function has an unused generic
7   LL |     // parameter from impl.
8   LL |     pub fn unused_impl() {
9      |            ^^^^^^^^^^^

          Listing 17: Example of an analysis test error

3.3    Implementing polymorphisation in rustc
   Integration of the analysis' results into the compiler
required significant debugging and many targeted modifications
where previously only monomorphic types and MIR were
expected.
   Throughout rustc's code generation infrastructure, the
ty::Instance type is used. Instance is constructed through
use of Instance::resolve, which takes a (DefId, SubstsRef)
pair. For each parameter in the substitutions of an Instance,
the matching index in the analysis' BitSet is checked to
determine whether the parameter was used. If the parameter
was used, then the current parameter is kept. If the parameter
was unused, then the parameter is replaced with an identity
substitution. During implementation, replacing the parameter
with a new ty::Unused type was considered, but it was
hypothesised that this would require more changes to the
compiler than an identity substitution.
   For example, given an unsubstituted type containing
ty::Param(0), if the parameter were considered used by the
analysis, then it would be substituted by the real type, but, if
the parameter were considered unused, then it would be
substituted with ty::Param(0) and remain the same.
   During monomorphisation collection, as described in Section
3.1.5, when a mono item is identified, it is added to a set.
This project concerns itself with one kind of mono item, a
MonoItem::Fn, which contains an Instance type. When a new
MonoItem::Fn is created, Instance::polymorphize is invoked
before it is added to the final mono item set. This has the
effect of reducing the number of mono items.

1   fn foo<A, B>(_: B) { }
2
3   fn main() {
4       foo::<u16, u32>(1);
5       foo::<u32, u32>(2);
6       foo::<u64, u32>(3);
7   }

    Listing 18: Example of polymorphisation deduplication

   Consider the example in Listing 18: monomorphisation
would normally produce three mono items (and three copies
of the function):

   • foo::<u16, u32>

   • foo::<u32, u32>

   • foo::<u64, u32>

   However, with the analysis, only a single mono item would
be produced: foo::<A, u32>.
   rustc's code generation iterates over the mono items
collected and generates LLVM IR for each, adding the result
to an LLVM module. By de-duplicating during
monomorphisation collection, polymorphisation "falls out" of
the way the compiler is structured (though some more changes
are required).
   In particular, various changes to code generation were
required wherever another Instance is referenced. For
example, when generating LLVM IR for a
TerminatorKind::Call (a call or invoke instruction in LLVM),
an Instance is generated to determine the exact item which is
being invoked and its mangled symbol name. Without applying
the Instance::polymorphize function, the instruction would
refer to the mangled symbol for the unpolymorphised function,
which no longer exists.
   Re-use of upstream monomorphisations is handled by a
should_codegen_locally function, which checked whether
upstream monomorphisations were available for a given
Instance. should_codegen_locally is invoked before a mono
item is created (and polymorphisation occurs), so it looks for
an unpolymorphised monomorphisation in the upstream crate.
However, if the item would be polymorphised then the
upstream crate would only contain a polymorphised copy,
which would result in unnecessary code generation when
compiling the current crate. should_codegen_locally was
changed to always check for a polymorphised monomorphisation
in the upstream crate.

3.3.2    Specialisation
   In Rust, data structures are defined separately from their
methods. Multiple impl blocks can exist which implement
methods on a type. For generic types, impl blocks can pick
concrete types for parameters or remain generic, but multiple
impls cannot overlap. Specialisation is an unstable feature in
Rust which enables impl blocks to overlap if one block is
clearly "more specific" than another.
   Instance::resolve can only resolve specialised functions of
traits under certain circumstances and detects whether further
specialisation can occur by whether the item needs substitution
(i.e. has unsubstituted type or constant parameters). After
integration of polymorphisation, this check had to be more
robust. In particular, polymorphisation meant that closure
types in Instance::resolve could need substitution (by
containing a ty::Param from an unused parent parameter) and
would thus "need substitution" for the purposes of the
specialisation check, despite this having no impact on whether
the item was further specialisable.
   Initially, this was resolved by implementing a fast type
visitor which checked for generic parameters, but would not
visit the parent substitutions of a closure or generator.
However, this check is in the "fast path" and a visitor would
have been too expensive, so this was replaced by an addition
to the TypeFlags infrastructure within the compiler (after
some refactoring [1] enabled this change).
   For normal functions, InstanceDef::Item(DefId) is the
variant of InstanceDef which is used. However, for drop glue,
an InstanceDef::DropGlue(DefId, Option<Ty>) is used, where
the Ty is the type that the shim is being generated to drop.
InstanceDef::DropGlue's DefId always referred to the
drop_in_place function. With polymorphisation, the Ty
contained in the InstanceDef could itself contain unused
generic parameters, so the same identity parameters would be
applied to the Ty as to the rest of the InstanceDef.

4.   EVALUATION
   In this section, the performance of a build of the compiler
with this project's changes applied on a set of benchmarks is
compared with a build of the
compiler without this project’s changes applied.
   For the purposes of evaluation, the compiler is built in release configuration from the Git commit with the hash 2f2ccd2 [26] (PR #69749 [28] at 9ef1d94 [27] merged with master, 150322f [8]). 150322f is also built in release configuration to act as a baseline for comparisons.
   Benchmarking has been performed twice previously, and performance improvements identified from those earlier runs are implemented in the versions being compared in this section, as discussed in Section 4.1.
   There are 40 benchmarks that are run in rustc’s standard performance benchmarking suite [29]. Benchmarks are either real Rust programs from the ecosystem which are important; real Rust programs from the ecosystem which are known to stress the compiler; or artificial stress tests.
   All benchmarks are compiled with both debug and release profiles; some benchmarks are also run with the check profile. Each benchmark is run at least four times:

   • clean: a non-incremental build.

   • baseline incremental: an incremental build starting with an empty cache.

   • clean incremental: an incremental build starting with a complete cache and a clean source directory – the “perfect” scenario for incremental compilation.

   • patched incremental: an incremental build starting with a complete cache and an altered source directory. The typical variant of this is “println”, which represents the addition of a println! macro somewhere in the source code.

   perf is used to collect information on the execution of each benchmark, notably: CPU clock; clock cycles; page faults; memory usage; instructions executed; task clock; and wall time. In addition, for each run, the execution count and total time spent on each query in rustc’s query system are also captured – this allows for detailed comparisons between runs to better determine why there has been a performance impact.
   Furthermore, for a subset of the benchmarks which had an appreciable performance impact, the number of mono items generated is compared.
   Benchmarks were performed on a Linux host with an AMD Ryzen 5 3600 6-Core Processor and 64GB of RAM, with ASLR disabled in both the kernel and userspace.

4.1   Discussion
   For the majority of the benchmarks, there was no appreciable difference in compilation time performance. This result is expected: the optimisation would have limited impact in programs which do not contain highly generic code containing closures.
   In those benchmarks that did not show any significant improvement or regression, any performance differences between the compiler versions are considered noise. The threshold used for noise is determined by comparing compiler versions that had no functional changes (“noise runs”) and observing the impact on the performance results of each benchmark.
   Three benchmarks were appreciably impacted by the optimisation implemented in this PR; results for these are shown in Table 1 (complete results available online [29]). The number of mono items generated for these benchmarks was also compared and the results are shown in Table 2.

   • script-servo is the script crate from Servo, the parallel browser engine used by Firefox.

   • regression-31157 is a small program which caused large performance regressions in earlier versions of the Rust compiler.

   • ctfe-stress-4 is a small program which is used to stress compile-time function evaluation.

   In applicable benchmarks, compilation time performance on clean runs generally improves by 3–5%. Incremental runs generally show smaller performance improvements, if any. script-servo in the release configuration shows an 11.4% slowdown in patched incremental runs. Query profiling shows that this slowdown comes from LLVM LTO’s optimisations – this could be because LTO is more effective and thus more optimisation takes place, or because the polymorphised functions make LTO more expensive to perform. However, LTO does not appear to impact performance in the debug configuration or on other benchmarks. script-servo in the debug configuration shows a 5.1% performance improvement, which corresponds to an 11 second decrease in compilation time.
   ctfe-stress-4’s mono item count shows no reduction. This is not surprising, as the test contains mostly compile-time evaluated functions (as expected for a test which aims to stress compile-time function evaluation). This suggests that polymorphisation may have reduced the number of functions which had to be evaluated at compile time.
   Memory usage across the benchmarks varied considerably, ranging from 19% increases to 15% decreases, depending on the benchmark. Like instructions executed, memory usage increases typically occurred when incremental compilation was enabled, while decreases occurred when incremental compilation was disabled.
   Earlier benchmarking revealed that re-execution of the used_generic_params query resulted in increased loads from crate metadata (intermediate data from compilation of dependencies), so in the version of the query benchmarked in this paper’s results, the query’s results were added to crate metadata and incremental compilation storage. In addition, the BitSet returned from the query was replaced with a u64 as a micro-optimisation.

5.   CONCLUSIONS AND FUTURE WORK
   This paper has presented a compilation time optimisation in the Rust compiler which reduces redundant monomorphisation. Currently, the implementation from this project is being reviewed as PR #69749 [28] in the upstream Rust project, and preliminary work, such as PR #69935 [25], has landed.
   There are no aspects of the Rust language itself which make implementation of this optimisation particularly challenging or complex; most of the difficulty arises from implementation details of rustc. With that said, rustc is well suited to having this optimisation implemented.
   Due to the structure of the compiler, in an ideal scenario, only modifications to monomorphisation and call generation are necessary to enable polymorphisation. Alternate approaches, such as changing the MIR of closures and
generators to only inherit the parameters that they use from their parents, would be significantly more invasive and challenging, as the current system has emergent properties which enable simplifications in other areas of the compiler.
   However, rustc will not always fail eagerly when an invalid case is encountered, which can significantly lengthen debugging sessions as the root cause of a failure is determined. This project has provided an excellent opportunity to learn more about the structure of the Rust compiler, and the techniques used in modern, production compilers to implement LLVM IR generation.
   Implementation of polymorphisation resulted in 3–5% compilation time improvements in some benchmarks. While there are further improvements that could be made to improve the integration and to improve performance when incremental compilation is enabled (see Section 5.1), this is a promising result.

5.1   Future Work
   Investigation into the interaction between polymorphisation and LTO should be performed to determine the source of the compile-time regression noticed in the script-servo benchmark. In addition, the circumstances under which the query’s results are cached (both incrementally and cross-crate) should be tweaked to improve the performance experienced when incremental compilation is enabled.
   With infrastructure in place to support polymorphic code generation, the polymorphisation analysis can be extended to detect more advanced cases. For example, when only the size of a type is used, monomorphisation can occur for the distinct sizes of instantiated type; or, when the memory representation of a type is independent of the representation of its generic parameters, such as Vec<T> (because the T is behind indirection), functions like Vec::len can be polymorphised.
   Furthermore, the integration currently does not fully work with Rust’s new symbol mangling scheme, which is not yet enabled by default – support for this could be added.

6.   ACKNOWLEDGEMENTS
   Thanks to Eduard-Mihai Burtescu, Niko Matsakis and everyone else in the Rust compiler team for their assistance and time during this project. In addition, thanks to Michel Steuwer for his comments and guidance throughout the year.

7.   REFERENCES
 [1] E.-M. Burtescu. rustc: keep upvars tupled in {Closure,Generator}Substs, 2020 (accessed March 25, 2020). https://github.com/rust-lang/rust/pull/69968.
 [2] S. K. Debray, W. Evans, R. Muth, and B. De Sutter. Compiler techniques for code compaction. ACM Trans. Program. Lang. Syst., 22(2):378–415, Mar. 2000.
 [3] I. Dragos and M. Odersky. Compiling generics through user-directed type specialization. In Proceedings of the 4th Workshop on the Implementation, Compilation, Optimization of Object-Oriented Languages and Programming Systems, ICOOOLPS ’09, page 42–47, New York, NY, USA, 2009. Association for Computing Machinery.
 [4] T. J. Edler von Koch, B. Franke, P. Bhandarkar, and A. Dasgupta. Exploiting function similarity for code size reduction. In Proceedings of the 2014 SIGPLAN/SIGBED Conference on Languages, Compilers and Tools for Embedded Systems, LCTES ’14, page 85–94, New York, NY, USA, 2014. Association for Computing Machinery.
 [5] J. Han, R. Fujino, R. Tamura, M. Shimaoka, H. Mikami, M. Takamura, S. Kamiya, K. Suzuki, T. Miyajima, K. Kimura, et al. Reducing parallelizing compilation time by removing redundant analysis. In Proceedings of the 3rd International Workshop on Software Engineering for Parallel Systems, SEPS 2016, page 1–9, New York, NY, USA, 2016. Association for Computing Machinery.
 [6] R. Jung, H.-H. Dang, J. Kang, and D. Dreyer. Stacked borrows: An aliasing model for Rust. Proc. ACM Program. Lang., 4(POPL), Dec. 2019.
 [7] D.-H. Kim and H. J. Lee. Iterative procedural abstraction for code size reduction. In Proceedings of the 2002 International Conference on Compilers, Architecture, and Synthesis for Embedded Systems, CASES ’02, page 277–279, New York, NY, USA, 2002. Association for Computing Machinery.
 [8] T. R. P. Language. commit 150322f, 2020 (accessed March 25, 2020). https://github.com/rust-lang/rust/commit/150322f86d441752874a8bed603d71119f190b8b.
 [9] D. Leopoldseder, L. Stadler, T. Würthinger, J. Eisl, D. Simon, and H. Mössenböck. Dominance-based duplication simulation (DBDS): Code duplication to enable compiler optimizations. In Proceedings of the 2018 International Symposium on Code Generation and Optimization, CGO 2018, page 126–137, New York, NY, USA, 2018. Association for Computing Machinery.
[10] Microsoft. Language server protocol, 2020 (accessed March 29, 2020). https://microsoft.github.io/language-server-protocol/.
[11] S. Overflow. Stack overflow developer survey 2016, 2016 (accessed March 25, 2020). https://insights.stackoverflow.com/survey/2016#technology-most-loved-dreaded-and-wanted.
[12] S. Overflow. Stack overflow developer survey 2017, 2017 (accessed March 25, 2020). https://insights.stackoverflow.com/survey/2017#most-loved-dreaded-and-wanted.
[13] S. Overflow. Stack overflow developer survey 2018, 2018 (accessed March 25, 2020). https://insights.stackoverflow.com/survey/2018/#most-loved-dreaded-and-wanted.
[14] S. Overflow. Stack overflow developer survey 2019, 2019 (accessed March 25, 2020). https://insights.stackoverflow.com/survey/2019#technology-_-most-loved-dreaded-and-wanted-languages.
[15] D. Petrashko, V. Ureche, O. Lhoták, and M. Odersky. Call graphs for languages with parametric polymorphism. In Proceedings of the 2016 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA 2016, page 394–409, New York, NY, USA, 2016. Association for Computing