Compiler Design Spring 2018 - Computer Science Department ETH Zurich, Switzerland

Page created by Paula Schultz
 
Compiler Design
Spring 2018

Thomas R. Gross

Computer Science Department
ETH Zurich, Switzerland
Logistics
       § Lecture
            § Tuesdays: 10:15 – 11:55
            § Thursdays: 10:15 – 11:55
            § In ETF E1
       § Recitation
            § Announced later
            § Watch lecture website and your ETH email
       § Lecture website: via www.lst.inf.ethz.ch
            §   Lecture slides
            §   Homework assignments
            §   Questions about assignments: contact the assistants (mailing list)
            §   Questions about the lecture: write to me

Rules

§ Rule #1: Peace in the lecture hall

What do you want to get out of this class?

§ Please fill out entry questionnaire
   § It’s anonymous

What I hope to teach you in this class

       1. Compiler design: Structure of a simple compiler
          § Simple: 2-3K lines of Java code (maybe a bit more)
          § Industry: C1 compiler in HotSpot VM is considered “simple”
              § 30K lines of C/C++/assembly code

       2. Software engineering: How to design a large(r) software
          system
          § Sometimes there is no “right” or “wrong”
          § Sometimes there is
       3. Programming
          § What the programming language design document should tell you
          § How to use that information

Course structure

§ You will not learn the material from lectures alone
§ Homework is essential!

Homework

§ Core element of the course
§ You will build a compiler
   § More on this topic (organization, constraints) later

Compiler design and implementation

§ What is your favorite compiler?
   Please talk to your neighbor and tell him/her which compiler(s) you used and if you have a
   “favorite” compiler.
§ Why?
   Justify your answer. Can you and your neighbor agree on what matters to you in a compiler?

Observations

§ Languages are important
   § Source language L1
   § Target language L2
   § Host language LH
§ Programs can be “executed”
   § Program is a sequence of expressions E1, E2, …
   § A processor contains state
   § Execution of expressions: Each expression Ei may read state, modify state, and determine next
     expression to execute Ej
   § A special expression Estop indicates that program execution stops
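
The execution model above can be sketched in a few lines of Java. This is only an illustration, not material from the lecture: the names `TinyMachine`, `State`, `Expr`, `run`, and `sumTo` are made up, and the state is assumed to consist of just two integers.

```java
// Hypothetical illustration: a program is a sequence of expressions E1, E2, ...
// Each expression may read state, modify state, and determine the next expression.
class TinyMachine {
    static final int STOP = -1;                  // plays the role of Estop

    // Assumed processor state: two integer cells.
    static final class State { int n; int sum; }

    // An expression gets the state and its own index and returns the index
    // of the next expression to execute (or STOP).
    interface Expr { int step(State s, int self); }

    static int run(Expr[] program, State state) {
        int pc = 0;
        while (pc != STOP) {
            pc = program[pc].step(state, pc);
        }
        return state.sum;
    }

    // Example program: sum = n + (n-1) + ... + 1, assuming n >= 0.
    static int sumTo(int n) {
        Expr[] program = {
            (s, self) -> s.n == 0 ? 3 : self + 1,            // E1: test, pick next
            (s, self) -> { s.sum += s.n; return self + 1; }, // E2: modify state
            (s, self) -> { s.n -= 1; return 0; },            // E3: jump back to E1
            (s, self) -> STOP                                // E4: Estop
        };
        State state = new State();
        state.n = n;
        return run(program, state);
    }
}
```

Calling `TinyMachine.sumTo(4)` steps through the four expressions until the last one returns `STOP`.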

Program execution

§ Execution (“elaboration”) of expressions E1, E2, … by some machine M
   § M realized by hardware – physical processor
   § M defined by software – “virtual machine”
   § Other possibilities
§ Expressions E1, E2, … also referred to as “statements” or “operations”
§ Elaboration sometimes referred to as interpretation
   § The word interpretation sometimes hints at “direct execution”

Issues

§ Languages: Choices for L1 and L2

Languages

Please talk to your neighbor and find at least three languages that could serve as
either source language L1 or target language L2 for a compiler.
Think about compilers you used (or would have liked to use).

Languages L1 and L2

L1                      L2
C                       Machine instruction
ASM                     ASM
LLVM                    C
                        LLVM
Java                    Java Byte Code
C#
Scala
JavaScript
Python                  JavaScript

(More) languages L1 and L2
       PHP
       HTML

       PDF
       DVI

       LaTeX
       TeX

       VHDL

       SQL
       Lisp
       Haskell
       Prolog
Issues (continued)

§ Languages: Choices for L1 and L2
   § Program written in L1 (PL1) translated into program written in L2 (PL2)
   § PL1 → PL2
§ Aspects of translation of programs PL1 → PL2
   § What does it mean that PL2 is a “translation” of PL1?
   § PL2 should produce the “same” result as PL1

Semantics

      § Describes the “meaning” of programs
         § Meaning of program defined by meaning of statements or operations
      § Formal specification
         1.   Operational semantics
              § Abstract machine A
              § Sequences of steps interpreted (“elaboration”)
              § Effect on A determines meaning
         2.   Denotational semantics
              § Mathematical construct describes effect
              § Can be manipulated (composition, projection, …)
         3.   Axiomatic semantics
              § Assertions on program state and rules that describe the effect of operations

      § Other ways: natural language, reference implementation
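
As a small illustration of the operational style, the sketch below defines the meaning of an instruction sequence by its effect on an abstract machine A, here assumed to be a stack machine. The class `StackSemantics`, the operations `PUSH`, `ADD`, `MUL`, and the helper names are hypothetical and chosen only for this example.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical abstract machine A: its only state is an operand stack.
class StackSemantics {
    enum Op { PUSH, ADD, MUL }

    static final class Instr {
        final Op op;
        final int arg;   // only used by PUSH
        Instr(Op op, int arg) { this.op = op; this.arg = arg; }
    }

    static Instr push(int value) { return new Instr(Op.PUSH, value); }
    static Instr add() { return new Instr(Op.ADD, 0); }
    static Instr mul() { return new Instr(Op.MUL, 0); }

    // Elaborate the steps one by one; the effect on the stack is the meaning.
    static int elaborate(Instr[] program) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (Instr i : program) {
            switch (i.op) {
                case PUSH: stack.push(i.arg); break;
                case ADD:  stack.push(stack.pop() + stack.pop()); break;
                case MUL:  stack.push(stack.pop() * stack.pop()); break;
            }
        }
        return stack.pop();
    }

    // (2 + 3) * 4 expressed as a sequence of steps.
    static int example() {
        return elaborate(new Instr[] { push(2), push(3), add(), push(4), mul() });
    }
}
```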
Semantics

§ Translated (target) program PL2 has the same meaning as the (source) program PL1
§ At least: computes the same result(s) for all legal inputs
§ Same: must be defined...

§ What about illegal inputs?
§ What about non-functional properties?
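
Why “same” must be defined can be seen in a tiny example. Suppose a translation rewrites `x * 2 / 2` to `x`. The sketch below (class and method names are made up for illustration) compares both versions under Java's `int` arithmetic, where overflow wraps around. The two agree on small inputs but not on inputs where `x * 2` overflows, so the rewrite does not preserve Java semantics. In C, signed overflow is undefined behavior, so such inputs are not “legal” there and the same rewrite is permitted.

```java
// Hypothetical check: does the "translated" version compute the same result?
class SameResult {
    // Source program PL1 (as a function of its input).
    static int source(int x) { return x * 2 / 2; }

    // Candidate translation PL2.
    static int translated(int x) { return x; }

    static boolean agree(int x) { return source(x) == translated(x); }
}
```

Here `SameResult.agree(21)` holds, while `SameResult.agree(Integer.MAX_VALUE)` does not, because `Integer.MAX_VALUE * 2` wraps around in Java.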

Reasons for translation

       § A compiler translates a program written in language L1 into
         language L2.
       § Reasons to translate PL1 → PL2
          §   Faster execution of PL2
          §   No real machine to run PL1
          §   No abstract machine (virtual machine) to run PL1
          §   PL2 can be realized (in hardware)
          §   (L1 == L2) PL2 is more readable/optimized/stable
               § Special case: L1 = asm, binary rewriting tool adds bounds checks
          §   PL1 cannot be edited (by humans)
               § Compiler from Java byte code to Java
          §   PL2 requires less energy
Complications

§ L1 and L2 have different resource models
§ L1: no limit on resources, flexible description
§ L2: finite resources, inflexible description, hardware-based

Complications

§ L1: no limit on resources
   §   ∞ number of variables
   §   ∞ lines of code
   §   ∞ number of methods
   §   ∞ data space
   §   ∞ nesting
   §   ∞ characters in var name
§ L2: finite resources
   §   Fixed number of registers
   §   Limited storage
   §   Finite representation
   §   Machine properties matter
        §   Caches
        §   TLBs
        §   NUMA
        §   …
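
One consequence of the mismatch between unlimited variables and a fixed number of registers can be sketched as follows. This is a deliberately naive assignment, not a real register allocator and not the scheme used in the course: the first few variables get registers, all others are placed in stack slots. The names `NaiveAllocator` and `assign`, and the location strings `r0`, `stack[0]`, are assumptions made for this example.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Naive mapping of an unbounded set of variables onto finite machine resources.
class NaiveAllocator {
    static Map<String, String> assign(List<String> variables, int numRegisters) {
        Map<String, String> location = new LinkedHashMap<>();
        int nextSlot = 0;
        for (String v : variables) {
            if (location.containsKey(v)) {
                continue;                                    // already placed
            }
            if (location.size() < numRegisters) {
                location.put(v, "r" + location.size());      // a register is still free
            } else {
                location.put(v, "stack[" + nextSlot++ + "]"); // out of registers: use memory
            }
        }
        return location;
    }
}
```

With two registers, `assign(List.of("a", "b", "c"), 2)` places `a` and `b` in registers and `c` in a stack slot.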

Compiler task: Translate PL1 → PL2

§ Management of resources
§ Preservation of semantics
   § Is meaning defined?
   § For all possible inputs?
§ Check constraints on PL1
   § Bailout: Not every program can be translated
§ Not every aspect can be checked by compiler
   § Escape: compiler inserts code into PL2 to check properties of program during execution (“at
     runtime”)
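
A runtime check inserted by a compiler might look like the sketch below. It models target memory as a flat `int[]` and assumes an array layout of `[length, element0, element1, …]`; this layout and the names `CheckedLoad`, `loadUnchecked`, and `loadChecked` are assumptions for illustration only.

```java
// Hypothetical model of the code a compiler could emit for the source access a[i].
class CheckedLoad {
    // Without a check: an out-of-range index silently reads neighboring memory.
    static int loadUnchecked(int[] memory, int base, int index) {
        return memory[base + 1 + index];
    }

    // With the inserted check: the property is verified during execution.
    static int loadChecked(int[] memory, int base, int index) {
        int length = memory[base];                 // assumed layout: length stored first
        if (index < 0 || index >= length) {
            throw new IllegalStateException("array index out of bounds: " + index);
        }
        return memory[base + 1 + index];
    }
}
```

The check costs a load and two comparisons per access, which is one reason compilers try to prove such properties at compile time where possible.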

1.1 Simple compiler model

1.1 Simple and realistic compiler model

       § Simple: Can be handled in one semester, 8 credits
          § Two persons to work on the same project (more about teams later)
       § Realistic: Experience problems encountered by real compilers
          § Mirrors structure of many compilers

Compiler model
            Source
           program

           Compiler

            ASM file

           Assembler

           Object file

Compiler model

§ Compilation prior to execution
   § AOT “Ahead of (Execution) Time” compilation
   § Commonly used for languages without language-specific execution environments (e.g., C, C++)
   § Available in Java as well (IBM J9, Oracle HotSpot)
§ Other model: Continuous compilation
   § JIT “Just in Time” compilation
   § Usually: optimization of methods that are frequently invoked (hot)
   § Commonly used with language virtual machines (e.g., Java VM)
       § E.g., HotSpot JVM has two JIT compilers (C1 and C2)
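
The idea of compiling only frequently invoked (hot) methods can be sketched with an invocation counter. This is a simplified illustration, not how HotSpot is implemented; the class `HotMethodCounter`, the method `recordInvocation`, and the threshold value are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified hot-method detection: count invocations, trigger compilation once.
class HotMethodCounter {
    private final int threshold;
    private final Map<String, Integer> invocations = new HashMap<>();

    HotMethodCounter(int threshold) {
        this.threshold = threshold;
    }

    // Returns true exactly on the invocation that reaches the threshold,
    // i.e., at the point where a JIT compiler would be asked to compile the method.
    boolean recordInvocation(String method) {
        int count = invocations.merge(method, 1, Integer::sum);
        return count == threshold;
    }
}
```

Until the threshold is reached, the method would keep running in the slower mode (e.g., interpreted); afterwards the compiled version is used.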

Compiler model
§ Source program
§ Compiler
   § “Front-end”: read input, transform
   § IR: intermediate representation
   § “Back-end”: manage machine resources, generate code
§ ASM file
§ Assembler
§ Object file
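
The front-end / IR / back-end split can be sketched for a toy source language that consists only of sums of integer literals such as `1 + 2 + 3`. Everything here is an assumption made for illustration: the class `TinyCompiler`, the IR (a plain list of operands), and the made-up assembly syntax (`li`, `addi`, register `r0`).

```java
import java.util.ArrayList;
import java.util.List;

// Toy compiler with the three parts of the model: front-end, IR, back-end.
class TinyCompiler {
    // Front-end: read input, transform it into the IR (here: the list of operands).
    static List<Integer> frontEnd(String source) {
        List<Integer> ir = new ArrayList<>();
        for (String token : source.split("\\+")) {
            ir.add(Integer.parseInt(token.trim()));
        }
        return ir;
    }

    // Back-end: generate code for a made-up machine with a single register r0.
    static String backEnd(List<Integer> ir) {
        StringBuilder asm = new StringBuilder();
        asm.append("li r0, ").append(ir.get(0)).append('\n');
        for (int i = 1; i < ir.size(); i++) {
            asm.append("addi r0, r0, ").append(ir.get(i)).append('\n');
        }
        return asm.toString();
    }

    static String compile(String source) {
        return backEnd(frontEnd(source));
    }
}
```

Because the back-end only sees the IR, a different front-end that produces the same IR could reuse it unchanged.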

Compiler model
§ Source program
§ Compiler
   § Several “front-ends” (three shown)
   § IR
   § “Back-end”
§ ASM file
§ Assembler
§ Object file

IR – Intermediate representation

       § Compiler-internal representation
          § E.g., compiler must distinguish between names in different scopes
          § E.g., many programs work with variables, computers work with
            locations
       § Must express all language constructs/concepts
       § Code generator maps IR to assembly code
          § Machine code another option
       § No “best” IR – all are compromises
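
The first two points (names in different scopes, variables versus locations) can be illustrated with a small scoped name table that maps each declaration to its own location number. This is a sketch under assumptions, not the data structure used in the course; `ScopedNames` and its methods are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Maps source-level names to distinct locations, respecting nested scopes.
class ScopedNames {
    private final Deque<Map<String, Integer>> scopes = new ArrayDeque<>();
    private int nextLocation = 0;

    void enterScope() { scopes.push(new HashMap<>()); }

    void exitScope() { scopes.pop(); }

    // Declares a name in the innermost scope and gives it a fresh location.
    // Assumes enterScope() has been called at least once.
    int declare(String name) {
        int location = nextLocation++;
        scopes.peek().put(name, location);
        return location;
    }

    // Looks a name up from the innermost scope outwards.
    int resolve(String name) {
        for (Map<String, Integer> scope : scopes) {   // iterates innermost scope first
            Integer location = scope.get(name);
            if (location != null) {
                return location;
            }
        }
        throw new IllegalArgumentException("undeclared name: " + name);
    }
}
```

Two variables both called `x` in nested scopes end up at different locations, so later compiler phases never need to look at the name again.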

Compiler model
§ Source program
§ Compiler
   § “Front-end”
   § IR, with Optimizer
   § “Back-end”
§ ASM file
§ Assembler
§ Native code
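
An optimizer works on the IR: it takes IR as input and produces (hopefully better) IR as output. The sketch below shows one classic transformation, constant folding, on a hypothetical expression-tree IR. The classes `Node`, `Const`, `Var`, `Add` and the methods `fold` and `show` are invented for this example and are not the IR used in the course.

```java
// Constant folding on a small, made-up expression-tree IR.
class ConstantFolding {
    static abstract class Node {}

    static final class Const extends Node {
        final int value;
        Const(int value) { this.value = value; }
    }

    static final class Var extends Node {
        final String name;
        Var(String name) { this.name = name; }
    }

    static final class Add extends Node {
        final Node left, right;
        Add(Node left, Node right) { this.left = left; this.right = right; }
    }

    // IR in, IR out: additions of two constants are replaced by one constant.
    static Node fold(Node n) {
        if (!(n instanceof Add)) {
            return n;
        }
        Add add = (Add) n;
        Node l = fold(add.left);
        Node r = fold(add.right);
        if (l instanceof Const && r instanceof Const) {
            return new Const(((Const) l).value + ((Const) r).value);
        }
        return new Add(l, r);
    }

    // Prints the IR back as text, for inspection.
    static String show(Node n) {
        if (n instanceof Const) return Integer.toString(((Const) n).value);
        if (n instanceof Var) return ((Var) n).name;
        Add a = (Add) n;
        return "(" + show(a.left) + " + " + show(a.right) + ")";
    }
}
```

Folding `x + (2 + 3)` yields `x + 5`; the back-end then has less code to generate.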
