Pointer and Escape Analysis for Multithreaded Programs∗

Alexandru Salcianu (salcianu@lcs.mit.edu) and Martin Rinard (rinard@lcs.mit.edu)
Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139

∗ The research was supported in part by DARPA/AFRL Contract F33615-00-C-1692, NSF Grant CCR00-86154, and NSF Grant CCR00-63513.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
PPoPP'01, June 18-20, 2001, Snowbird, Utah, USA.
Copyright 2001 ACM 1-58113-346-4/01/0006 ...$5.00.

ABSTRACT

This paper presents a new combined pointer and escape analysis for multithreaded programs. The algorithm uses a new abstraction called parallel interaction graphs to analyze the interactions between threads and extract precise points-to, escape, and action ordering information for objects accessed by multiple threads. The analysis is compositional, analyzing each method or thread once to extract a parameterized analysis result that can be specialized for use in any context. It is also capable of analyzing programs that use the unstructured form of multithreading present in languages such as Java and standard threads packages such as POSIX threads.

We have implemented the analysis in the MIT Flex compiler for Java and used the extracted information to 1) verify that programs correctly use region-based allocation constructs, 2) eliminate dynamic checks associated with the use of regions, and 3) eliminate unnecessary synchronization. Our experimental results show that analyzing the interactions between threads significantly increases the effectiveness of the region analysis and region check elimination, but has little effect for synchronization elimination.

1. INTRODUCTION

Multithreading is a key structuring technique for modern software. Programmers use multiple threads of control for many reasons: to build responsive servers that communicate with multiple parallel clients [15], to exploit the parallelism in shared-memory multiprocessors [5], to produce sophisticated user interfaces [16], and to enable a variety of other program structuring approaches [11].

Research in program analysis has traditionally focused on sequential programs [14]; extensions for multithreaded programs have usually assumed a block structured, parbegin/parend form of multithreading in which a parent thread starts several parallel threads, then immediately blocks waiting for them to finish [12, 19]. But the standard form of multithreading supported by languages such as Java and threads packages such as POSIX threads is unstructured — child threads execute independently of their parent threads. The software structuring techniques described above are designed to work with this form of multithreading, as are many recommended design patterns [13]. But because the lifetimes of child threads potentially exceed the lifetime of their starting procedure, unstructured multithreading significantly complicates the interprocedural analysis of multithreaded programs.

1.1 Analysis Algorithm

This paper presents a new combined pointer and escape analysis for multithreaded programs, including programs with unstructured forms of multithreading. The algorithm is based on a new abstraction, parallel interaction graphs, which maintain precise points-to, escape, and action ordering information for objects accessed by multiple threads. Unlike previous escape analysis abstractions, parallel interaction graphs enable the algorithm to analyze the interactions between parallel threads. The analysis can therefore capture objects that are accessed by multiple threads but do not escape a given multithreaded computation. It can also fully characterize the points-to relationships for objects accessed by multiple parallel threads.

Because parallel interaction graphs characterize all of the potential interactions of the analyzed method or thread with its callers and other parallel threads, the resulting analysis is compositional at both the method and thread levels — it analyzes each method or thread once to produce a single general analysis result that can be specialized for use in any context.¹ Finally, the combination of points-to and escape information in the same abstraction enables the algorithm to analyze only part of the program, with the analysis result becoming more precise as more of the program is analyzed.

¹ Recursive methods or recursively generated threads may require an iterative algorithm that may analyze methods or threads in the same strongly connected component multiple times to reach a fixed point.

1.2 Application to Region-Based Allocation

We have implemented our analysis in the MIT Flex compiler for Java. The information that it produces has many potential applications in compiler optimizations, software engineering, and as a foundation for further program analysis. This paper presents our experience using the analysis to optimize and check safety conditions for programs that use region-based allocation constructs instead of relying on garbage collection. Region-based allocation allows the program to run (a potentially multithreaded) computation in
the context of a specific allocation region. All objects created by the computation are allocated in the region and deallocated when the computation finishes. To avoid dangling references, the implementation must ensure that the objects in the region do not outlive the associated computation. One standard way to achieve this goal is to dynamically check that the program never attempts to create a reference from one object to another object allocated in a region with a shorter lifetime [4]. If the program does attempt to create such a reference, the implementation refuses to create the reference and throws an exception. Unfortunately, this approach imposes dynamic checking overhead and introduces a new failure mode for programs that use region-based allocation.

We have used our analysis to statically verify that our multithreaded benchmark programs use region-based allocation correctly. It therefore provides a safety guarantee to the programmer and enables the compiler to eliminate the dynamic region reference checks. We also found that intrathread analysis alone is not powerful enough — the algorithm must analyze the interactions between parallel threads to verify the correct use of region-based allocation.

We also used our analysis for the more traditional purpose of synchronization elimination. While our algorithm is quite effective at enabling this optimization, for our multithreaded benchmarks, the interthread analysis provides little additional benefit over the standard intrathread analysis.

1.3 Contributions

This paper makes the following contributions:

• Abstraction: It presents a new abstraction, parallel interaction graphs, for the combined pointer and escape analysis of programs with unstructured multithreading.

• Analysis: It presents a new algorithm for analyzing multithreaded programs. The algorithm is compositional ...

computation. Each Task stores an Integer object in its source field as input and produces a new Integer object in its target field as output.²

This program illustrates several common patterns for multithreaded programs. First, it uses threads to implement parallel computations. Second, when a thread starts its execution, it points to objects that hold the input data for its computation. Finally, when the computation finishes, it writes references to its result objects into its thread object for the parent computation to read.

class main {
  public static void main(String args[]) {
    int i = Integer.parseInt(args[0]);
    Fib f = new Fib(i);
    Region r = new Region();
    r.enter(f);
  }
}

class Fib implements Runnable {
  int source;

  Fib(int i) { source = i; }

  public void run() {
    Task t = new Task(new Integer(source));
    t.start();
    try {
      t.join();
    } catch (Exception e) { System.out.println(e); }
    System.out.println(t.target.toString());
  }
}

class Task extends Thread {
  public Integer source;
  public Integer target;

  Task(Integer s) { source = s; }

  public void run() {
    int v = source.intValue();
    if (v
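The example is cut off at this point in the excerpt. The following is a hypothetical completion of Task.run, a sketch consistent with the recursive Fibonacci computation and the two subtask threads t1 and t2 that appear in the analysis result of Figure 3; the compute helper is added here purely for illustration and is not part of the original example.

```java
// Hypothetical reconstruction of the truncated Task class. The recursive
// structure (two child Task threads t1 and t2, results combined through the
// target fields) follows Figure 3; the exact original code is not shown in
// this excerpt.
class Task extends Thread {
    public Integer source;
    public Integer target;

    Task(Integer s) { source = s; }

    public void run() {
        int v = source.intValue();
        if (v <= 1) {
            target = new Integer(v);   // base case: fib(0) = 0, fib(1) = 1
        } else {
            // start two parallel subtasks, as in Figure 3
            Task t1 = new Task(new Integer(v - 1));
            Task t2 = new Task(new Integer(v - 2));
            t1.start();
            t2.start();
            try {
                t1.join();
                t2.join();
            } catch (Exception e) { System.out.println(e); }
            // write the result into the thread object for the parent to read
            target = new Integer(t1.target.intValue() + t2.target.intValue());
        }
    }

    // illustration-only helper (not in the paper): run a Task, read its result
    static int compute(int n) {
        Task t = new Task(new Integer(n));
        t.start();
        try { t.join(); } catch (InterruptedException e) { }
        return t.target.intValue();
    }
}
```

Note how the pattern matches the text above: each child thread receives its input through source before it starts and publishes its result through target, which the parent reads only after join.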
these objects are contained in the lifetime of the Fibonacci computation, and die when this computation finishes. A standard memory management system would not exploit this property. The Task and Integer objects would be allocated out of the garbage-collected heap, increasing the memory consumption rate, the garbage collection frequency, and therefore the garbage collection overhead.

Region-based allocation provides an attractive alternative. Instead of allocating all objects out of a single garbage-collected heap, region-based approaches allow the program to create multiple memory regions, then allocate each object in a specific region. When the program no longer needs any of the objects in the region, it deallocates all of the objects in that region without garbage collection.

Researchers have proposed many different region-based allocation systems. Our example (and our implemented system) uses the approach standardized in the Real-Time Java specification [4]. Before the main program invokes the Fibonacci computation, it creates a new memory region r. The statement r.enter(f) executes the run method of the f object (and all of the methods or threads that it executes) in the context of the new region r. When one of the threads in this computation creates a new object, the object is allocated in the region r. When the entire multithreaded computation terminates, all of the objects in the region are deallocated without garbage collection. The Task and Integer objects are therefore managed independently of the garbage collected heap and do not increase the garbage collection frequency or overhead. Region-based allocation is an attractive alternative to garbage collection because it exploits the correspondence between the lifetimes of objects and the lifetimes of computations to deliver a more efficient memory management mechanism.

2.3 Regions and Dangling Reference Checks

One potential problem with region-based allocation is the possibility of dangling references. If an object whose lifetime exceeds the region's lifetime refers to an object allocated inside the region, any use of the reference after the region is deallocated will access potentially recycled garbage, violating the memory safety of the program. The Real-Time Java specification eliminates this possibility as follows. It allows the computation to create a hierarchy of nested regions and ensures that no parent region is deallocated before one of its child regions. Each region is associated with a (potentially multithreaded) computation; the objects in the region are deallocated when its computation terminates and the objects in all of its child regions have been deallocated. The implementation dynamically checks all assignments to object fields to ensure that the program never attempts to create a reference that goes down the hierarchy from an object in an ancestor region to an object in a child region. If the program does attempt to create such a reference, the check fails. The implementation prevents the assignment from taking place and throws an exception.

While these checks ensure the memory safety of the execution, they impose additional execution time overhead and introduce a new failure mode for the software. Our goal is to analyze the program and statically verify that the checks never fail. Such an analysis would enable the compiler to eliminate all of the dynamic region checks. It would also provide the programmer with a guarantee that the program would never throw an exception because a check failed.

2.4 Analysis in the Example

We use a generalized escape analysis to determine whether any object allocated in a given region escapes the computation associated with the region. If none of the objects escape, the program will never attempt to create a dangling reference and the compiler can eliminate all of the checks. The algorithm first performs an intrathread, interprocedural analysis to derive a parallel interaction graph at the end of each method. Figures 2 and 3 present the analysis results for the run methods in the Fib and Task classes, respectively.

2.4.1 Points-to Graphs

The first component of the parallel interaction graph is the points-to graph. The nodes in this graph represent objects; the edges represent references between objects. There are two kinds of edges: inside edges, which represent references created within the analyzed part of the program (for Figure 2, the sequential computation of the Fib.run method), and outside edges, which represent references read from objects potentially accessed outside the analyzed part of the program. In our figures, solid lines denote inside edges and dashed lines denote outside edges.

There are also several kinds of nodes. Inside nodes represent objects created within the analyzed part of the program. There is one inside node for each object creation site in the program; that node represents all objects created at that site. Parameter nodes represent objects passed as parameters to the currently analyzed method; load nodes represent objects accessed by reading a reference in an object potentially accessed outside the analyzed part of the program. Together, the parameter and load nodes make up the set of outside nodes. In our figures, solid circles denote inside nodes and dashed circles denote outside nodes.

In Figure 2, nodes 1 and 4 are outside nodes. Node 1 represents the this parameter of the method, while node 4 represents the object whose reference is loaded by the expression t.target at line 2 of the example at the end of the Fib.run method. Nodes 2 and 3 are inside nodes, and denote the Task and Integer objects created in the statement Task t = new Task(new Integer(source)) at line 1 of the example.

2.4.2 Started Thread Information

The parallel interaction graph contains information about which threads were started by the analyzed part of the program. In Figure 2, node 2 represents the started Task thread that implements the entire Fibonacci computation. In Figure 3, nodes 8 and 11 represent the two threads that implement the parallel subtasks in the computation. The interthread analysis uses the started thread information when it computes the interactions between the current thread and threads that execute in parallel with the current thread.

2.4.3 Escape Information

The parallel interaction graph contains information about how objects escape the analyzed part of the program to be accessed by the unanalyzed part. A node escapes if it is a parameter node or represents an unanalyzed thread started within the analyzed part of the program. It also escapes if it is reachable from an escaped node. In Figure 2, node 1 escapes because it is passed as a parameter, while nodes 3 and 4 escape because they are reachable from the unanalyzed thread node 2.
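The escape rule just described is a reachability computation over the points-to graph. The following sketch makes it concrete; it is an illustrative toy model whose class and method names are invented here (this is not the Flex implementation), and it conflates inside and outside edges, which the real abstraction distinguishes.

```java
import java.util.*;

// Toy points-to graph: nodes are inside, parameter, or load nodes; edges are
// labeled by field names. escaped() implements the rule of Section 2.4.3:
// a node escapes if it is a parameter node or an unanalyzed started thread
// node, or if it is reachable from an escaped node.
class PointsToGraph {
    enum Kind { INSIDE, PARAM, LOAD }

    static final class Node {
        final int id;
        final Kind kind;
        boolean startedUnanalyzedThread;   // unanalyzed started thread node
        Node(int id, Kind kind) { this.id = id; this.kind = kind; }
    }

    // adjacency: node -> (field label -> referenced nodes)
    private final Map<Node, Map<String, Set<Node>>> edges = new HashMap<>();

    void addEdge(Node from, String field, Node to) {
        edges.computeIfAbsent(from, k -> new HashMap<>())
             .computeIfAbsent(field, k -> new HashSet<>())
             .add(to);
    }

    Set<Node> escaped(Collection<Node> allNodes) {
        Deque<Node> worklist = new ArrayDeque<>();
        Set<Node> esc = new HashSet<>();
        // seed: parameter nodes and unanalyzed started thread nodes
        for (Node n : allNodes)
            if (n.kind == Kind.PARAM || n.startedUnanalyzedThread) {
                esc.add(n);
                worklist.add(n);
            }
        // propagate: everything reachable from an escaped node escapes
        while (!worklist.isEmpty()) {
            Node n = worklist.remove();
            for (Set<Node> targets : edges.getOrDefault(n, Collections.<String, Set<Node>>emptyMap()).values())
                for (Node t : targets)
                    if (esc.add(t)) worklist.add(t);
        }
        return esc;
    }
}
```

Rebuilding Figure 2's graph with this sketch (node 1 = the this parameter, node 2 = the started Task thread with source and target edges to nodes 3 and 4) makes all four nodes escape, matching the escape information listed in the figure.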
Figure 2: Analysis Result for Fib.run. (Points-to information: this → 1; t → 2; node 2 has a source edge to node 3 and a target edge to node 4. Escape information: 1 is a parameter node; 2 is an unanalyzed started thread node; 3 and 4 are reachable from 2. Solid lines and circles denote inside edges and nodes; dashed lines and circles denote outside edges and nodes.)

Figure 3: Analysis Result for Task.run. (Points-to information: this → 5, with source and target edges to nodes 6 and 7; t1 → 8, with source and target edges to nodes 9 and 10; t2 → 11, with source and target edges to nodes 12 and 13. Escape information: 5 is a parameter node; 6 and 7 are reachable from 5; 8 and 11 are unanalyzed started thread nodes; 9 and 10 are reachable from 8; 12 and 13 are reachable from 11.)

Figure 4: Mappings for Interthread Analysis of Fib.run and Task.run. (Points-to information from Fib.run — nodes 1-4 — and from Task.run — nodes 5-7 — with the mappings between their nodes.)

Figure 5: Analysis Result After First Interthread Analysis. (The combined graph: nodes 1, 2, 3, and 7 with the thread nodes 8 and 11 and their source/target edges to nodes 9, 10, 12, and 13. Escape information: 1 is a parameter node; 8 and 11 are unanalyzed started thread nodes; 9 and 10 are reachable from 8; 12 and 13 are reachable from 11.)

Figure 6: Final Analysis Result for Fib.run. (Only node 1, the parameter node, escapes; all inside nodes are recaptured.)
2.5 Interthread Analysis

Previously proposed escape analyses treat threads very conservatively — if an object is reachable from a thread object, the analyses assume that it has permanently escaped [2, 3, 6, 22]. Our algorithm, however, analyzes the interactions between threads to recapture objects accessed by multiple threads. The foundation of the interthread analysis is the construction of two mappings µ1 and µ2 between the nodes of the parallel interaction graphs of the parent and child threads. Each outside node is mapped to another node if the two nodes represent the same object during the analysis. The mappings are used to combine the parallel interaction graph from the child thread into the parallel interaction graph from the parent thread. The result is a new parallel interaction graph that summarizes the parallel execution of the two threads.

Figure 4 presents the mappings from the interthread analysis of Fib.run and the Task.run method for the thread that Fib.run starts. The algorithm computes these mappings as follows:

• Initialization: Inside the Fib.run method, node 2 represents the started Task thread. Inside the Task.run method, node 5 represents the same started thread. The algorithm therefore initializes µ2 to map node 5 to node 2.

• Matching target edges: The analysis of the Task.run method creates inside edges from node 5 to nodes 6 and 7. These edges have the label target, and represent references between the corresponding Task and Integer objects during the execution of the Task.run method. The Fib.run method reads these references to obtain the result of the Task.run method. The outside edge from node 2 to node 4 represents these references during the analysis of the Fib.run method. The analysis therefore matches the outside edge from the Fib.run method (from node 2 to node 4) against the inside edges from the Task.run method to compute that node 4 represents the same objects as nodes 6 and 7. The result is that µ1 maps node 4 to nodes 6 and 7.

• Matching source edges: The analysis of the Fib.run method creates an inside edge from node 2 to node 3. This edge has the label source, and represents a reference between the corresponding Task and Integer objects during the execution of the Fib.run method. The Task.run method reads this reference to obtain its input. The outside edge from node 5 to node 6 represents this reference during the analysis of the Task.run method. The interthread analysis therefore matches the outside edge from the Task.run method (from node 5 to node 6) against the inside edge from the Fib.run method (from node 2 to node 3) to compute that node 6 represents the same objects as node 3. The result is that µ2 maps node 6 to node 3.

• Transitive Mapping: Because µ1 maps node 4 to node 6 and µ2 maps node 6 to node 3, the analysis computes that node 4 represents the same object as node 3. The result is that µ1 maps node 4 to node 3.

Note that the matching process models interactions in which one thread reads references created by the other thread. Because the threads execute in parallel, the matching is symmetric.

The analysis uses µ1 and µ2 to combine the two parallel interaction graphs and obtain a new graph that represents the combined effect of the two threads. Figure 5 presents this graph, which the analysis computes as follows:

• Edge Projections: The analysis projects the edges through the mappings to augment nodes from one parallel interaction graph with edges from the other graph. In our example, the analysis projects the inside edge from node 5 to node 6 through µ2 to generate new inside edges from node 2 to nodes 3 and 7. It also generates other edges involving outside nodes, but removes these edges during the simplification step.

• Graph Combination: The analysis combines the two graphs, omitting the outside node that represents the this parameter of the started thread (node 5 in our example).

• Simplification: The analysis removes all outside edges from captured nodes, all outside nodes that are not reachable from a parameter node or unanalyzed started thread node, and all inside nodes that are not reachable from a live variable, parameter node, or unanalyzed started thread node.

In our example, the analysis recaptures the (now analyzed) thread node 2. Nodes 3 and 7 are also captured even though they are reachable from a thread node. The analysis removes nodes 4 and 6 in the new graph because they are not reachable from a parameter node or unanalyzed thread node. Note that because the interactions with the thread nodes 8 and 11 have not yet been analyzed, those nodes and all nodes reachable from them escape.

Because our example program uses recursively generated parallelism, the analysis must perform a fixed point computation during the interthread analysis. Figure 6 presents the final parallel interaction graph from the end of the Fib.run method, which is the result of this fixed point analysis. The analysis has recaptured all of the inside nodes, including the task nodes. Because none of the objects represented by these nodes escapes the computation of the Fib.run method, its execution in a new region will not violate the region referencing constraints.

3. ANALYSIS ABSTRACTION

We next formally present the abstraction (parallel interaction graphs) that the analysis uses. In addition to the points-to and escape information discussed in Section 2, parallel interaction graphs can also represent ordering information between actions (such as synchronization actions) from parent and child threads. This ordering information enables the analysis to determine when thread start events temporally separate actions of parent and child threads. This information may, for example, enable the analysis to determine that a parent thread performs all of its synchronizations on a given object before a child thread starts its execution and synchronizes on the object. To simplify the presentation, we assume that the program does not use static class variables,
all the methods are analyzable and none of the methods returns a result. Our implemented analysis correctly handles all of these aspects [20].

3.1   Object Representation

   The analysis represents the objects that the program manipulates using a set n ∈ N of nodes, which is the disjoint union of the set NI of inside nodes and the set NO of outside nodes. The set of thread nodes NT ⊆ NI represents thread objects. The set of outside nodes is the disjoint union of the set NL of load nodes and the set NP of parameter nodes. There is also a set f ∈ F of fields in objects, a set v ∈ V of local and parameter variables, and a set l ∈ L ⊆ V of local variables.

3.2   Points-To Escape Graphs

   A points-to escape graph is a triple ⟨O, I, e⟩, where

   • O ⊆ N × F × NL is a set of outside edges. We use the notation O(n1, f) = {n2 | ⟨n1, f, n2⟩ ∈ O}.

   • I ⊆ (N × F × N) ∪ (V × N) is a set of inside edges. We use the notation I(v) = {n | ⟨v, n⟩ ∈ I} and I(n1, f) = {n2 | ⟨n1, f, n2⟩ ∈ I}.

   • e : N → P(N) is an escape function that records the escape information for each node.³ A node escapes if it is reachable from a parameter node or from a node that represents an unanalyzed parallel thread.

   The escape function must satisfy the invariant that if n1 points to n2, then n2 escapes in at least all of the ways that n1 escapes. When the analysis adds an edge to the points-to escape graph, it updates the escape function so that it satisfies this invariant. We define the concepts of escaped and captured nodes as follows:

   • escaped(⟨O, I, e⟩, n) if e(n) ≠ ∅

   • captured(⟨O, I, e⟩, n) if e(n) = ∅

3.3   Parallel Interaction Graphs

   A parallel interaction graph is a tuple ⟨⟨O, I, e⟩, τ, α, π⟩:

   • The thread set τ ⊆ N represents the set of unanalyzed thread objects started by the analyzed computation.

   • The action set α records the set of actions executed by the analyzed computation. Each synchronization action ⟨sync, n1, n2⟩ ∈ α has a node n1 that represents the object on which the action was performed and a node n2 that represents the thread that performed the action. If the action was performed by the current thread, n2 is the dummy current thread node nCT ∈ NT. Our implementation can also record actions such as reading an object, writing an object, or invoking a given method on an object. It is straightforward to generalize the concept of actions to include actions performed on multiple objects.

   • The action order π records ordering information between the actions of the current thread and threads that execute in parallel with the current thread.

      – ⟨⟨sync, n1, n2⟩, n⟩ ∈ π if the synchronization action ⟨sync, n1, n2⟩ may have happened after one of the threads represented by n started executing. In this case, the actions of a thread represented by n may conflict with the action.

      – ⟨⟨n1, f, n2⟩, n⟩ ∈ π if a reference represented by the outside edge ⟨n1, f, n2⟩ may have been read after one of the threads represented by n started executing. In this case, the outside edge may represent a reference written by a thread represented by n.

   We use the notation π@n = {a | ⟨a, n⟩ ∈ π} to denote the set of actions and outside edges in π that may occur in parallel with a thread represented by n.

³ Here P(N) is the set of all subsets of N, so that e(n) is the set of nodes through which n escapes.

4.   ANALYSIS ALGORITHM

   For each program point, the algorithm computes a parallel interaction graph for the current analysis scope at that point. For the intraprocedural analysis, the analysis scope is the currently analyzed method up to that point. The interprocedural analysis extends the scope to include the (transitively) called methods; the interthread analysis further extends the scope to include the started threads.
   We next present the analysis, identifying the program representation, the different phases, and the key algorithms in the interprocedural and interthread phases.

4.1   Program Representation

   The algorithm represents the computation of each method using a control flow graph. We assume the program has been preprocessed so that all statements relevant to the analysis are either a copy statement l = v, a load statement l1 = l2.f, a store statement l1.f = l2, a synchronization statement l.acquire() or l.release(), an object creation statement l = new cl, a method invocation statement l0.op(l1, . . . , lk), or a thread start statement l.start().
   The control flow graph for each method op starts with an enter statement enterop and ends with an exit statement exitop.

4.2   Intraprocedural Analysis

   The intraprocedural analysis is a forward dataflow analysis that propagates parallel interaction graphs through the statements of the method's control flow graph. Each method is analyzed under the assumption that the parameters are maximally unaliased, i.e., point to different objects. For a method with formal parameters v0, . . . , vn, the initial parallel interaction graph at the entry point of the method is ⟨⟨∅, {⟨vi, nvi⟩}, λn. if n = nvi then {n} else ∅⟩, ∅, ∅, ∅⟩, where nvi is the parameter node for parameter vi. If the method is invoked in a context where some of the parameters may point to the same object, the interprocedural analysis described below in Section 4.4 merges parameter nodes to conservatively model the effect of the aliasing.
   The transfer function ⟨G′, τ′, α′, π′⟩ = [[st]](⟨G, τ, α, π⟩) models the effect of each statement st on the current parallel interaction graph. Figure 7 graphically presents the rules that determine the new points-to graphs for the different basic statements. Each row in this figure contains four items: a statement, a graphical representation of existing edges, a graphical representation of the existing edges plus the new edges that the statement generates, and a set of side conditions. The interpretation of each row is that whenever the points-to escape graph contains the existing edges and the side conditions are satisfied, the transfer function for the statement generates the new edges. Assignments to a variable kill existing edges from that variable; assignments to fields of objects leave existing edges in place.
   In addition to updating the outside and inside edge sets, the transfer function also updates the escape function e to ensure that if n1 points to n2, then n2 escapes in at least all of the ways that n1 escapes. Except for load statements, the transfer functions leave τ, α, and π unchanged. For a load statement l1 = l2.f the transfer function updates the action order π to record that any new outside edges may be created in parallel with the threads modeled by the nodes in τ (here nL is the load node for l1 = l2.f):

   π′ = π ∪ {⟨n1, f, nL⟩ | n1 ∈ I(l2) ∧ escaped(⟨O, I, e⟩, n1)} × τ

   Figure 8 presents the transfer function for an l.start() statement, which adds the started thread nodes to τ and updates the escape function. Figure 9 presents the transfer function for synchronization statements, which add the corresponding synchronization actions into α and record the actions as executing in parallel with all of the nodes in τ. At control-flow merges, the confluence operation takes the union of the inside and outside edges, thread sets, actions, and action orders.
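The abstractions of Sections 3.2 through 4.2 can be made concrete with a small sketch. The class and member names below (PIG, Action, Ordered, parallelWith) are our own illustrative choices rather than the paper's implementation, and the sketch covers only the thread set τ, the action set α, and the action order π, together with the transfer-function updates for l.start() and synchronization statements described above; the edge sets and escape function are omitted.

```java
import java.util.*;

// Illustrative sketch of the tau, alpha, and pi components of a
// parallel interaction graph. Nodes are modeled as integers.
final class PIG {
    record Action(String kind, int obj, int thread) {}  // e.g. <sync, n1, n2>
    record Ordered(Action a, int thread) {}             // <a, n> in pi

    static final int N_CT = -1;  // dummy current-thread node

    final Set<Integer> tau = new HashSet<>();   // started, unanalyzed threads
    final Set<Action> alpha = new HashSet<>();  // actions of this computation
    final Set<Ordered> pi = new HashSet<>();    // "may occur in parallel" pairs

    // Transfer function for l.start(): add the nodes that l may
    // reference to tau. (The escape-function update of Figure 8 is
    // omitted in this sketch.)
    void start(Set<Integer> nodesOfL) {
        tau.addAll(nodesOfL);
    }

    // Transfer function for l.acquire()/l.release(): record a sync
    // action by the current thread on every node l may reference,
    // and record it as parallel with every already-started thread.
    void sync(Set<Integer> nodesOfL) {
        for (int n1 : nodesOfL) {
            Action a = new Action("sync", n1, N_CT);
            alpha.add(a);
            for (int t : tau) pi.add(new Ordered(a, t));
        }
    }

    // pi@n: the actions that may occur in parallel with a thread
    // represented by node n.
    Set<Action> parallelWith(int n) {
        Set<Action> r = new HashSet<>();
        for (Ordered o : pi) if (o.thread() == n) r.add(o.a());
        return r;
    }
}
```

Note how a synchronization executed before any thread start appears in α but in no π pair, which is exactly the temporal-separation information that the synchronization elimination of Section 5.2 exploits.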
4.3   Mappings

   Mappings µ : N → P(N) implement the substitutions that take place when combining parallel interaction graphs. During the interprocedural analysis, for example, a parameter node from a callee is mapped to all of the nodes at the call site that may represent the corresponding actual parameter. Given an analysis component ξ, ξ[µ] denotes the component after replacing each node n in ξ with µ(n):⁴

   τ[µ] = ⋃_{n ∈ τ} µ(n)
   O[µ] = ⋃_{⟨n,f,nL⟩ ∈ O} µ(n) × {f} × {nL}
   I[µ] = ⋃_{⟨n1,f,n2⟩ ∈ I} µ(n1) × {f} × µ(n2)  ∪  ⋃_{⟨v,n⟩ ∈ I} {v} × µ(n)
   α[µ] = ⋃_{⟨sync,n1,n2⟩ ∈ α} {sync} × µ(n1) × µ(n2)
   π[µ] = ⋃_{⟨⟨sync,n1,n2⟩,n⟩ ∈ π} ({sync} × µ(n1) × µ(n2)) × µ(n)  ∪  ⋃_{⟨⟨n1,f,n2⟩,n⟩ ∈ π} (µ(n1) × {f} × µ(n2)) × µ(n)
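As a concrete reading of the substitution ξ[µ], the following sketch applies a mapping µ (represented as a map from nodes to node sets) to a thread set and to a set of node-to-node inside edges. The names Remap, Edge, mapTau, and mapEdges are ours, not the paper's, and the sketch assumes the caller passes a total mapping (such as the extended mappings µ′ defined later, which map unlisted nodes to themselves); nodes absent from the map are simply dropped here.

```java
import java.util.*;

// Sketch of the node substitution xi[mu] for two components:
// the thread set tau and the <n1, f, n2> inside edges.
final class Remap {
    record Edge(int src, String field, int dst) {}

    // tau[mu] = union over n in tau of mu(n)
    static Set<Integer> mapTau(Set<Integer> tau,
                               Map<Integer, Set<Integer>> mu) {
        Set<Integer> out = new HashSet<>();
        for (int n : tau) out.addAll(mu.getOrDefault(n, Set.of()));
        return out;
    }

    // I[mu] = union over <n1, f, n2> in I of mu(n1) x {f} x mu(n2)
    static Set<Edge> mapEdges(Set<Edge> inside,
                              Map<Integer, Set<Integer>> mu) {
        Set<Edge> out = new HashSet<>();
        for (Edge e : inside)
            for (int a : mu.getOrDefault(e.src(), Set.of()))
                for (int b : mu.getOrDefault(e.dst(), Set.of()))
                    out.add(new Edge(a, e.field(), b));
        return out;
    }
}
```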
⁴ The only exception is in the definition of O[µ], where we do not substitute the load node nL that constitutes the end point of an outside edge ⟨n, f, nL⟩.

   Figure 7: Generated Edges for Basic Statements

   τ′ = τ ∪ I(l)
   e′(n) = e(n) ∪ {n′}   if n′ ∈ I(l) and n is reachable in O ∪ I from n′
   e′(n) = e(n)          otherwise

   Figure 8: Transfer Function for l.start()

   α′ = α ∪ {sync} × I(l) × {nCT}
   π′ = π ∪ ({sync} × I(l) × {nCT}) × τ

   Figure 9: Transfer Function for l.acquire() and l.release()

4.4   Interprocedural Analysis

   The interprocedural analysis computes a transfer function for each method invocation statement. We assume a method invocation site of the form l0.op(l1, . . . , lk), a potentially invoked method op with formal parameters v0, . . . , vk with corresponding parameter nodes nv0, nv1, . . . , nvk, a parallel interaction graph ⟨⟨O1, I1, e1⟩, τ1, α1, π1⟩ at the program point before the method invocation site, and a graph ⟨⟨O2, I2, e2⟩, τ2, α2, π2⟩ from the exit statement of op. The interprocedural analysis has two steps. It first computes a mapping µ for the outside nodes from the callee. It then uses µ to combine the two parallel interaction graphs to obtain the parallel interaction graph at the program point immediately after the method invocation. The analysis computes µ as the least fixed point of the following constraints:
   (1) I1(li) ⊆ µ(nvi), for all i ∈ {0, 1, . . . , k}
   (2) If ⟨n1, f, n2⟩ ∈ O2, ⟨n3, f, n4⟩ ∈ I1, and n3 ∈ µ(n1), then n4 ∈ µ(n2).
   (3) If ⟨n1, f, n2⟩ ∈ O2, ⟨n3, f, n4⟩ ∈ I2, µ(n1) ∩ µ(n3) ≠ ∅, and n1 ≠ n3, then µ(n4) ∪ {n4} ⊆ µ(n2).

   The first constraint initializes µ; the next two constraints extend µ. Constraint 1 maps each parameter node from the callee to the nodes from the caller that represent the actual parameters at the call site. Constraint 2 matches outside edges read by the callee against corresponding inside edges from the caller. Constraint 3 matches outside edges from the callee against inside edges from the callee to model aliasing between callee nodes.
   The algorithm next extends µ to µ′ to ensure that all nodes from the callee (except the parameter nodes) appear in the new parallel interaction graph:

   µ′(n) = µ(n)          if n ∈ NP
   µ′(n) = µ(n) ∪ {n}    otherwise

The algorithm computes the new parallel interaction graph ⟨⟨O′, I′, e′⟩, τ′, α′, π′⟩ at the program point after the method invocation as follows:

   O′ = O1 ∪ O2[µ′]        I′ = I1 ∪ (I2 − V × N)[µ′]
   τ′ = τ1 ∪ τ2[µ′]        α′ = α1 ∪ α2[µ′]
   π′ = π1 ∪ π2[µ′] ∪ (O2[µ′] ∪ α2[µ′]) × τ1

It computes the new escape function e′ as the union of the escape function e1 before the method invocation and the expansion of the escape function e2 from the callee through µ′. More formally, the new escape function e′ is defined by the constraints

   e1(n) ⊆ e′(n)
   if n2 ∈ µ′(n1), then (e2(n1) − NP)[µ′] ⊆ e′(n2)

propagated over the edges from O′ ∪ I′. After the interprocedural analysis, reachability from the parameter nodes of the callee is no longer relevant for the escape function, hence the set difference in the second initialization constraint. We have a proof that this interprocedural analysis produces a parallel interaction graph that is at least as conservative as the one that would be obtained by inlining the callee and performing the intraprocedural analysis as in Section 4.2 [20].
   Finally, we simplify the resulting parallel interaction graph by removing superfluous nodes and edges. We remove all load nodes nL such that e′(nL) = ∅ from the graph; such load nodes do not represent any concrete object. We also remove all outside edges ⟨n1, f, n2⟩ that start from a captured node n1 (where e′(n1) = ∅); such outside edges do not represent any concrete reference. Finally, we remove all nodes that are not reachable from a live variable, parameter node, or unanalyzed started thread node from τ′.
   Because of dynamic dispatch, a single method invocation site may invoke several different methods. The transfer function therefore merges the parallel interaction graphs from all potentially invoked methods to derive the parallel interaction graph at the point after the method invocation site. The current implementation obtains this call graph information using a variant of a cartesian product type analysis [1], but it can use any conservative approximation to the dynamic call graph.
   The analysis uses a worklist algorithm to solve the combined intraprocedural and interprocedural dataflow equations. A bottom-up analysis of the program yields the full result with one analysis per strongly connected component of the call graph. Within strongly connected components, the algorithm iterates to a fixed point.

4.5   Thread Interaction

   Interactions between threads take place between a starter thread (a thread that starts a parallel thread) and a startee thread (the thread that is started). The interaction algorithm is given the parallel interaction graph ⟨⟨O, I, e⟩, τ, α, π⟩ from a program point in the starter thread, a node nT that represents the startee thread, and a run method that runs when the thread object represented by nT starts. The parallel interaction graph associated with the exit statement of the run method is ⟨⟨O2, I2, e2⟩, τ2, α2, π2⟩. The result of the thread interaction algorithm is a parallel interaction graph ⟨⟨O′, I′, e′⟩, τ′, α′, π′⟩ that models all the interactions between the execution of the starter thread (up to its corresponding program point) and the entire startee thread. This result conservatively models all possible interleavings of the two threads.
   The algorithm has two steps. It first computes two mappings µ1, µ2, where µ1 maps outside nodes from the starter and µ2 maps outside nodes from the startee. It then uses µ1 and µ2 to combine the two parallel interaction graphs into a single parallel interaction graph that reflects the interactions between the two threads. The algorithm computes µ1 and µ2 as the least fixed point of the following constraints:

   (4) nT ∈ µ2(nv0), nT ∈ µ2(nCT)
   (5) If ⟨n1, f, n2⟩ ∈ Oi, ⟨n3, f, n4⟩ ∈ Ij, and n3 ∈ µi(n1), then n4 ∈ µi(n2).
   (6) If ⟨n1, f, n2⟩ ∈ Oi, ⟨n3, f, n4⟩ ∈ Ii, µi(n1) ∩ µi(n3) ≠ ∅, and n1 ≠ n3, then µi(n4) ∪ {n4} ⊆ µi(n2).
   (7) If ⟨n1, f, n2⟩ ∈ Ii, ⟨n3, f, n4⟩ ∈ Oj, and n3 ∈ µi(n1), then n2 ∈ µj(n4).
   (8) If n2 ∈ µi(n1) and n3 ∈ µj(n2), then n3 ∈ µi(n1).

Here nv0 is the parameter node associated with the single parameter of the run method (the this pointer), and nCT is the dummy current thread node. Also, I1 = I and O1 = O ∩ (π@nT). Note that the algorithm computes interactions only for outside edges from the starter thread that represent references read after the startee thread starts.
   Unlike the caller/callee interaction, where the execution of the caller is suspended during the execution of the callee, in the starter/startee interaction both threads execute in parallel, producing a more complicated set of statement interleavings. The interthread analysis must therefore model a richer set of potential interactions in which each thread can read edges created by the other thread. The interthread
analysis therefore uses two mappings (one for each thread)                not escape via a parameter node, it is captured after the in-
instead of just one mapping. It also augments the con-                    terthread analysis. Finally the algorithm enhances the effi-
straints to reflect the potential interactions.                           ciency and precision of the analysis by removing superfluous
   In the same style as in the interprocedural analysis, the              nodes and edges using the same simplification method as in
algorithm first initializes the mappings µ01 , µ02 to extend µ1           the interprocedural analysis.
and µ2 , respectively. Each node from the two initial par-                   As presented, the algorithm assumes that each node n ∈ τ
allel interaction graphs (except nv0 ) will appear in the new             represents multiple instances of the corresponding thread.
parallel interaction graph:                                               Our implementation improves the precision of the analysis
                                                                          by tracking whether each node represents a single thread or
              µ01 (n) = ½
                        µ1 (n) ∪ {n}
                                                                          multiple threads. For nodes that represent a single thread,
                0          µ2 (n)       if n = nv0
              µ2 (n) =                                                    the algorithm computes the interactions just once, adjusting
                           µ2 (n) ∪ {n} otherwise                         the new action order π 0 to record that the outside edges and
The algorithm uses µ01 and µ02 to compute the resulting par-              actions from the startee thread do not occur in parallel with
allel interaction graph as follows:                                       the node n that represents the startee thread. For nodes
                                                                          that represent multiple threads, the algorithm repeatedly
  O0 = O[µ01 ] ∪ O2 [µ02 ]        I 0 = I[µ01 ] ∪ (I2 − V × N )[µ02 ]     computes the interactions until it reaches a fixed point.
   0      0           0
  τ = τ [µ1 ] ∪ τ2 [µ2 ]          α0 = α[µ01 ] ∪ α2 [µ02 ]
  π 0 = π[µ01 ] ∪ π2 [µ02 ] ∪                                             4.7   Resolving Outside Nodes
           (O2 [µ02 ] ∪ α2 [µ02 ]) × τ [µ01 ] ∪ π@nT [µ01 ] × τ2 [µ02 ]     It is possible to augment the algorithm so that it records,
   In addition to combining the action orderings from the                 for each outside node, all of the inside nodes that it rep-
starter and startee, the algorithm also updates the new ac-               resents during the analysis of the entire program. This in-
tion order π 0 to reflect the following ordering relationships:           formation allows the algorithm to go back to the analysis
                                                                          results generated at the various program points and resolve
      • All actions and outside edges from the startee occur in           each outside node to the set of inside nodes that it represents
        parallel with all of the starter’s threads, and                   during the analysis. In the absence of nodes that escape via
                                                                          unanalyzed threads or methods, this enables the algorithm
      • All actions and outside edges from the starter thread             to obtain complete, precise points-to information even for
        that occur in parallel with the startee thread also oc-           analysis results that contain outside nodes.
        cur in parallel with all of the threads that the startee
        starts.
                                                                          5.    ANALYSIS USES
The new escape function e0 is the union of the escape func-
tion e from the starter and the escape function e2 from the                 We next discuss how we use the analysis results to perform
startee, expanded through µ1 and µ2 , respectively. More                  two optimizations: region reference check elimination and
formally, the escape function e0 is initialized by the follow-            synchronization elimination.
ing two constraints
                                                                          5.1   Region Reference Check Elimination
     n2 ∈ µ1 (n1 )                            n2 ∈ µ2 (n1 )                  The analysis eliminates region reference checks by verify-
  e(n1 )[µ1 ] ⊆ e0 (n2 )            (e2 (n1 ) − NP )[µ2 ] ⊆ e0 (n2 )      ing that no object allocated in a given region escapes the
                                                                          computation that executes in the context of that region. In
and propagated over the edges from O0 ∪ I 0 .
                                                                          our system, all such computations are invoked via the execu-
4.6     Interthread Analysis                                              tion of a statement of the form r.enter(t). This statement
                                                                          causes the the run method of the thread t to execute in the
  The interthread analysis uses a fixed-point algorithm to
                                                                          context of the memory region r. The analysis first locates
obtain a single parallel interaction graph that reflects the in-
                                                                          all of these run methods. It then analyzes each run method,
teractions between all of the parallel threads. The algorithm
                                                                          performing both the intrathread and interthread analysis,
repeatedly chooses a node nT ∈ τ , retrieves the analysis re-
                                                                          and checks that none of the inside nodes in the analysis re-
sult from the exit node of the corresponding run method,5
then uses the thread interaction algorithm presented above in Section 4.5 to compute the interactions between the analyzed threads and the thread represented by nT and combine the two parallel interaction graphs into a new graph. Once the algorithm reaches a fixed point, it removes all nodes in NT from the escape function — the final graph already models all of the possible interactions that may affect nodes that escape only via unanalyzed thread nodes. The analysis may therefore recapture thread nodes that escaped before the interthread analysis. For example, if a thread node does

5 The algorithm uses the type information to determine which class contains this run method. For inside nodes, this approach is exact. For outside nodes, the algorithm uses class hierarchy analysis to find a set of classes that may contain the run method. The algorithm computes the interactions with each of the possible run methods, then merges the results. In practice, τ almost always contains inside nodes only — the common coding practice is to create and start threads in the same method.

sult escape. If none of these inside nodes escape, all of the objects allocated inside the region are inaccessible when the computation terminates. All of the region reference checks will therefore succeed and can be removed.

5.2   Synchronization Elimination

The synchronization elimination algorithm uses the results of the interthread analysis to find captured objects whose synchronization operations can be removed. Like previous synchronization elimination algorithms, our algorithm uses the intrathread analysis results to remove synchronizations on objects that do not escape the thread that created them. Unlike previous synchronization elimination algorithms, our algorithm also analyzes the interactions between parallel threads. It then uses the action set α and the action ordering relation π to eliminate synchronizations on objects with synchronizations from multiple threads.
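Both removal cases have a familiar shape in Java source. The following sketch is our own illustration, not code from the paper or its benchmarks: the first method synchronizes on a lock that never escapes its creating thread, so the intrathread analysis alone can capture it; the second method has two threads synchronize on the same object, but the parent's synchronization completes before the child thread is started.

```java
// Hypothetical illustration of the two synchronization-removal cases.
public class SyncPatterns {
    // Case 1: the lock object never escapes the creating thread, so the
    // intrathread analysis alone marks the synchronization removable.
    static int threadLocalSync() {
        Object lock = new Object();      // captured: never escapes
        int counter = 0;
        synchronized (lock) {            // removable synchronization
            counter++;
        }
        return counter;
    }

    // Case 2: parent and child both synchronize on the same object, but the
    // parent's critical section runs strictly before the child is started,
    // so the thread start event temporally separates the two synchronizations.
    static int startOrderedSync() {
        final int[] box = new int[1];
        final Object shared = new Object();
        synchronized (shared) {          // runs before the child exists
            box[0] = 1;
        }
        Thread child = new Thread(() -> {
            synchronized (shared) {      // ordered after the start event
                box[0]++;
            }
        });
        child.start();                   // the separating start event
        try {
            child.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return box[0];
    }
}
```

In the second method no interleaving of the two synchronized blocks is possible, which is exactly the condition the interthread analysis establishes with α and π.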
The analysis proceeds as follows. For each node n that is captured after the interthread analysis, it examines π to find all threads t that execute in parallel with a synchronization on n. It then examines the action set α to determine if t also synchronizes on n. If none of the parallel threads t synchronize on n, the compiler can remove all synchronizations on the objects that n represents. Even if multiple threads synchronize on these objects, the analysis has determined that the synchronizations are temporally separated by thread start events and therefore redundant.

                                  Analysis time [s]
              Bytecode        for removing        Backend
  Program     instructions    checks    syncs     time [s]
  Tree           10,970         0.5      15.9       41.1
  Array          10,896         0.6      16.9       42.2
  Water          17,675        11.3      56.1       66.0
  Barnes         15,945         6.9      94.2       54.8
  Http           14,313        17.1      38.3       73.8
  Quote          14,039        16.9      41.4       61.4

       Figure 10: Program Sizes and Analysis Times
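The removal test described in the paragraph above can be sketched as follows. The encoding is hypothetical and ours, not the paper's: nodes and threads are integers, the action set α maps each thread to the set of nodes it synchronizes on, and π is a set of (thread, node) pairs meaning the thread may execute in parallel with some synchronization on the node.

```java
import java.util.*;

// Schematic encoding of the synchronization-removal decision: the
// synchronizations on a captured node n are removable unless some thread
// that may run in parallel with a synchronization on n also synchronizes
// on n itself.
public class SyncRemoval {
    static boolean canRemoveSyncs(int node,
                                  Map<Integer, Set<Integer>> alpha,
                                  Set<List<Integer>> pi) {
        for (Map.Entry<Integer, Set<Integer>> e : alpha.entrySet()) {
            int thread = e.getKey();
            boolean parallel = pi.contains(Arrays.asList(thread, node));
            boolean syncs = e.getValue().contains(node);
            // A parallel thread that also synchronizes on the node makes
            // the synchronization potentially contended, hence required.
            if (parallel && syncs) return false;
        }
        // All synchronizations on the node are temporally separated by
        // thread start events, so they are redundant.
        return true;
    }
}
```

For example, two threads that both synchronize on a node, but never in parallel with a synchronization on it, still permit removal; adding one parallel pair to π blocks it.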

6.    EXPERIMENTAL RESULTS

   We have implemented our combined pointer and escape analysis algorithm in the MIT Flex compiler system, a static compiler for Java. We used the analysis information for synchronization elimination and elimination of dynamic region reference checks. We present experimental results for a set of multithreaded benchmark programs. In general, these programs fall into two categories: web servers and scientific computations. The web servers include Http, an http server, and Quote, a stock quote server. Both of these applications were written by others and posted on the Internet. Our scientific programs include Barnes and Water, two complete scientific applications that have appeared in other benchmark sets, including the SPLASH-2 parallel computing benchmark set [23]. We also present results for two synthetic benchmarks, Tree and Array, that use object field assignment heavily. These benchmarks are designed to obtain the maximum possible benefit from region reference check elimination.

              Original             Optimized version
  Program     version        Interprocedural    Interthread
  Tree              59                43              43
  Array             59                43              43
  Water      2,367,193           919,575         919,575
  Barnes     2,838,720           678,355         678,355
  Http          67,268             8,460           7,406
  Quote        268,913           200,650         198,610

     Figure 11: Number of Synchronization Operations

  Program     Standard      Checks    No Checks
  Tree           6.5         16.8        7.0
  Array          8.2         43.4        8.3
  Water          9.6          9.7        8.1
  Barnes         8.4          7.6        6.7
  Http           4.5          5.3        5.2
  Quote         11.7         11.3       11.3

       Figure 12: Execution Times for Benchmarks
6.1   Methodology

   We first modified the benchmark programs to use region-based allocation. The web servers create a new thread to service each new connection. The modified versions use a separate region for each connection. The scientific programs execute a sequence of interleaved serial and parallel phases. The modified versions use a separate region for each parallel phase. The result is that all of the modified benchmarks allocate long-lived shared objects in the garbage-collected heap and short-lived objects in regions. The modifications were relatively straightforward to perform, but it was difficult to evaluate the correctness of the modifications without the static analysis. The web servers were particularly problematic since they heavily use the Java libraries. Without the static analysis it was not clear to us that the libraries would work correctly with region-based allocation. For Http, Quote, Tree, and Array, the interprocedural analysis alone was able to verify the correct use of region-based allocation and enable the elimination of all dynamic region checks. Barnes and Water required the interthread analysis to eliminate the checks — interprocedural analysis alone was unable to verify the correct use of region-based allocation.

   We used the MIT Flex compiler to generate a C implementation of each benchmark, then used gcc to compile the program to an x86 executable. We ran the Http and Quote servers on a 400 MHz Pentium II running Linux, with the clients running on an 866 MHz Pentium III running Linux. The two machines were connected with their own private 100 Mbit/sec Ethernet. We ran Water, Barnes, Tree, and Array on an 866 MHz Pentium III running Linux.

               Number of            Number of
  Program    Objects in Heap    Objects in Regions
  Tree             184               65,534
  Array            183                    8
  Water         20,755            3,110,675
  Barnes        17,622            2,121,167
  Http          12,228               62,062
  Quote         21,785              121,350

    Figure 13: Allocation Statistics for Benchmarks

6.2    Results

   Figure 10 presents the program sizes and analysis times. The synchronization elimination algorithm analyzes the entire program, while the region check algorithm analyzes only the run methods and the methods that they (transitively) invoke. The synchronization elimination analysis therefore takes significantly more time than the region analysis. The backend time is the time required to produce an executable once the analysis has finished. All times are in seconds.

   Figure 11 presents the number of synchronizations for the Original version with no analysis, the Interprocedural version with interprocedural analysis only, and the Interthread version with both interprocedural and interthread analysis. For this optimization, the interthread analysis produces almost no additional benefit over the interprocedural analysis.

   Figure 12 presents the execution times of the benchmarks. The Standard version allocates all objects in the garbage-collected heap and does not use region-based allocation. The Checks version uses region-based allocation with all of the dynamic checks. The No Checks version uses region-based
allocation with the analysis eliminating all dynamic checks. None of the versions uses the synchronization elimination optimization. Check elimination produces substantial performance improvements for Tree and Array and modest performance improvements for Water and Barnes. The running times of Http and Quote are dominated by thread creation and operating system overheads, so check elimination provides basically no performance increase.

   Figure 13 presents the number of objects allocated in the garbage-collected heap and the number allocated in regions. The vast majority of the objects are allocated in regions.

6.3   Discussion

   Our applications use regions in one of two ways. The servers allocate a new region for each connection. The region holds the new objects required to service the connection. Examples of such objects include String objects that hold responses sent to clients and iterator objects used to find requested data. The scientific programs use regions for auxiliary objects that structure the parallel computation. These objects include the Thread objects required to generate the parallel computation and objects that hold values produced by intermediate calculations.

   In general, eliminating region checks provides modest performance improvements. We therefore view the primary value of the analysis in this context as helping the programmer to use regions correctly. We expect the analysis to be especially useful in situations (such as our web servers) when the programmer may not have complete confidence in his or her detailed knowledge of the program's object usage patterns.

7.    RELATED WORK

   We discuss several areas of related work: analysis of multithreaded programs, escape analysis for multithreaded programs, and region-based allocation.

7.1   Analysis of Multithreaded Programs

   The analysis of multithreaded programs is a relatively unexplored field [17]. There is an awareness that multithreading significantly complicates program analysis, but a full range of standard techniques has yet to emerge. Grunwald and Srinivasan present a dataflow analysis framework for reaching definitions for explicitly parallel programs [10], and Knoop, Steffen and Vollmer present an efficient dataflow analysis framework for bit-vector problems such as liveness, reachability and available expressions [12]. Both frameworks are designed for programs with structured, parbegin/parend concurrency and are intraprocedural. We view the main contributions of this paper as largely orthogonal to this previous research. In particular, the main contributions of this paper center on abstractions and algorithms for the interprocedural and compositional analysis of programs with unstructured multithreading. We also focus on problems, pointer and escape analysis, that do not fit within either framework.

   We are aware of two pointer analysis algorithms for multithreaded programs: an algorithm by Rugina and Rinard for multithreaded programs with structured parbegin/parend concurrency [19], and an intraprocedural algorithm by Corbett [7]. The algorithms are not compositional (they discover the interactions between threads by repeatedly reanalyzing each thread in each new analysis context to reach a fixed point), do not maintain escape information, and do not support the analysis of incomplete programs.

7.2   Escape Analysis for Multithreaded Programs

   Published escape analysis algorithms for Java programs do not analyze interactions between threads [3, 6, 22, 2]. If an object escapes via a thread object, it is never recaptured. These algorithms are therefore best viewed as sequential program analyses that have been extended to execute correctly but very conservatively in the presence of multithreading. Our analysis takes the next step of analyzing interactions between threads to recapture objects accessed by multiple threads.

   Ruf's analysis occupies a point between traditional escape analyses and our multithreaded analysis [18]. His analysis tracks the synchronizations that each thread performs on each object, enabling the compiler to remove synchronizations for objects accessed by multiple threads if only one thread synchronizes on the object. Our analysis goes a step further to remove synchronizations even if multiple threads synchronize on the object. The requirement is that thread start events must temporally separate synchronizations from different threads.

7.3   Region-Based Allocation

   Region-based allocation has been used in systems for many years. Our comparison focuses on safe versions, which ensure that there are no dangling references to deleted regions. Several researchers have developed type-based systems that support safe region-based allocation [21, 8]. These systems use a flow-insensitive, context-sensitive analysis to correlate the lifetimes of objects with the lifetimes of computations. Although these analyses were designed for sequential programs, it should be straightforward to generalize them to handle multithreaded programs.

   Gay and Aiken's system provides an interesting contrast to ours in its overall approach [9]. They provide a safe, flat region-based system that allows arbitrary references between regions. The implementation instruments each store instruction to count references that go between regions. A region can be deleted only when there are no references to its objects from objects in other regions. This dynamic, reference counted approach works equally well for both sequential and multithreaded programs. The system also supports the explicit assignment of objects to regions and allows the programmer to use type annotations to specify that a given reference must stay within the same region. Violations of this constraint generate a run-time error; a static analysis reduces but is not designed to eliminate the possibility of such an error occurring.

   Following the Real-Time Java specification, our implementation provides a less flexible system of hierarchically organized regions with an implicit assignment of objects to regions. Because region lifetimes are hierarchically nested, the implementation dynamically counts, for each region, the number of child regions rather than the number of external pointers into each region. Instead of performing counter manipulations at each store, the unoptimized version of our system checks each assignment to ensure that the program never generates a reference that goes down the hierarchy from an ancestor region to a descendant region. Our static analysis eliminates these checks, with the interthread analysis required to successfully optimize multithreaded programs.
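The per-assignment check in the last paragraph can be illustrated with the following sketch. The class and method names are ours, not those of the MIT Flex runtime: each region records its parent, and a store of a reference is legal only when the target object's region is the same as, or an ancestor of, the source object's region, so that the target outlives the source.

```java
// Hypothetical sketch of the dynamic downward-reference check for
// hierarchically nested regions.
public class RegionCheck {
    static final class Region {
        final Region parent;   // null for the root (heap-like) region
        Region(Region parent) { this.parent = parent; }
    }

    // True iff child == ancestor or ancestor lies on child's parent chain.
    static boolean sameOrDescendant(Region child, Region ancestor) {
        for (Region r = child; r != null; r = r.parent) {
            if (r == ancestor) return true;
        }
        return false;
    }

    // The check performed (in the unoptimized system) at each reference
    // store: the referred object's region must be the same as, or an
    // ancestor of, the referring object's region. A reference from an
    // ancestor region down to a descendant region would dangle when the
    // descendant region is deleted, so it is rejected.
    static boolean storeAllowed(Region from, Region to) {
        return sameOrDescendant(from, to);
    }
}
```

For example, an object in a per-connection child region may refer to a long-lived object in the enclosing region, but not the other way around; the static analysis proves such checks always succeed and removes them.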
8.    CONCLUSION

   Multithreading is a key program structuring technique, language and system designers have made threads a central part of widely used languages and systems, and multithreaded software is becoming pervasive. This paper presents an abstraction (parallel interaction graphs) and an algorithm that uses this abstraction to extract precise points-to, escape, and action ordering information for programs that use the standard unstructured form of multithreading provided by modern languages and systems. We have implemented the analysis in the MIT Flex compiler for Java, and used the extracted information to verify that programs correctly use region-based allocation constructs, eliminate dynamic checks associated with the use of regions, and eliminate unnecessary synchronization. Our experimental results show that analyzing the interactions between threads significantly increases the effectiveness of the optimizations for region-based programs, but has little effect for synchronization elimination.

9.    ACKNOWLEDGEMENTS

   We started this research in collaboration with John Whaley; we thank him for his contributions during this collaboration. We thank Wes Beebee for his invaluable assistance with the region-based allocation package and Darko Marinov and Viktor Kuncak for many interesting discussions regarding pointer and escape analysis.

10.   REFERENCES

 [1] O. Agesen, J. Palsberg, and M. Schwartzbach. Type inference of SELF: analysis of objects with dynamic and multiple inheritance. Software—Practice and Experience, 25(9):975–995, Sept. 1995.
 [2] B. Blanchet. Escape analysis for object oriented languages. Application to Java. In Proceedings of the 14th Annual Conference on Object-Oriented Programming Systems, Languages and Applications, Denver, CO, Nov. 1999.
 [3] J. Bogda and U. Hoelzle. Removing unnecessary synchronization in Java. In Proceedings of the 14th Annual Conference on Object-Oriented Programming Systems, Languages and Applications, Denver, CO, Nov. 1999.
 [4] G. Bollella, B. Brosgol, S. Furr, D. Hardin, P. Dibble, J. Gosling, and M. Turnbull. The Real-Time Specification for Java. Addison-Wesley, Reading, Mass., 2000.
 [5] R. Chandra, A. Gupta, and J. Hennessy. Data locality and load balancing in COOL. In Proceedings of the 4th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, San Diego, CA, May 1993.
 [6] J. Choi, M. Gupta, M. Serrano, V. Sreedhar, and S. Midkiff. Escape analysis for Java. In Proceedings of the 14th Annual Conference on Object-Oriented Programming Systems, Languages and Applications, Denver, CO, Nov. 1999.
 [7] J. Corbett. Using shape analysis to reduce finite-state models of concurrent Java programs. In Proceedings of the International Symposium on Software Testing and Analysis, Mar. 1998.
 [8] K. Crary, D. Walker, and G. Morrisett. Typed memory management in a calculus of capabilities. In Proceedings of the 26th Annual ACM Symposium on the Principles of Programming Languages, San Antonio, TX, Jan. 1999.
 [9] D. Gay and A. Aiken. Language support for regions. In Proceedings of the SIGPLAN '01 Conference on Program Language Design and Implementation, Snowbird, UT, June 2001.
[10] D. Grunwald and H. Srinivasan. Data flow equations for explicitly parallel programs. In Proceedings of the 4th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, San Diego, CA, May 1993.
[11] C. Hauser, C. Jacobi, M. Theimer, B. Welch, and M. Weiser. Using threads in interactive systems: A case study. In Proceedings of the Fourteenth Symposium on Operating Systems Principles, Asheville, NC, Dec. 1993.
[12] J. Knoop, B. Steffen, and J. Vollmer. Parallelism for free: Efficient and optimal bitvector analyses for parallel programs. ACM Transactions on Programming Languages and Systems, 18(3):268–299, May 1996.
[13] D. Lea. Concurrent Programming in Java: Design Principles and Patterns. Addison-Wesley, Reading, Mass., 2000.
[14] F. Nielson, H. Nielson, and C. Hankin. Principles of Program Analysis. Springer-Verlag, 1999.
[15] V. Pai, P. Druschel, and W. Zwaenepol. Flash: An efficient and portable web server. In Proceedings of the Usenix 1999 Annual Technical Conference, June 1999.
[16] J. Reppy. Higher-order Concurrency. PhD thesis, Dept. of Computer Science, Cornell Univ., Ithaca, N.Y., June 1992.
[17] M. Rinard. Analysis of multithreaded programs. In Proceedings of the 8th Static Analysis Symposium, Paris, France, July 2001.
[18] E. Ruf. Effective synchronization removal for Java. In Proceedings of the SIGPLAN '00 Conference on Program Language Design and Implementation, Vancouver, Canada, June 2000.
[19] R. Rugina and M. Rinard. Pointer analysis for multithreaded programs. In Proceedings of the SIGPLAN '99 Conference on Program Language Design and Implementation, Atlanta, GA, May 1999.
[20] A. Sălcianu. Pointer analysis and its applications for Java programs. Master's thesis, Dept. of Electrical Engineering and Computer Science, Massachusetts Institute of Technology. In preparation.
[21] M. Tofte and L. Birkedal. A region inference algorithm. ACM Transactions on Programming Languages and Systems, 20(4), July 1998.
[22] J. Whaley and M. Rinard. Compositional pointer and escape analysis for Java programs. In Proceedings of the 14th Annual Conference on Object-Oriented Programming Systems, Languages and Applications, Denver, CO, Nov. 1999.
[23] S. Woo, M. Ohara, E. Torrie, J. Singh, and A. Gupta. The SPLASH-2 programs: Characterization and methodological considerations. In Proceedings of the 22nd International Symposium on Computer Architecture, June 1995.