What Can the GC Compute Efficiently? A Language for Heap Assertions at GC Time

Page created by Holly Fernandez
 
CONTINUE READING
What Can the GC Compute Efficiently?
                            A Language for Heap Assertions at GC Time

        Christoph Reichenbach                                                  Neil Immerman                             Yannis Smaragdakis
       University of Massachusetts                                      University of Massachusetts                    University of Massachusetts
         creichen@gmail.com                                             neil.immerman@gmail.com                         yannis@cs.umass.edu

                            Edward E. Aftandilian                                                            Samuel Z. Guyer
                                   Tufts University                                                            Tufts University
                                 eaftan@cs.tufts.edu                                                         sguyer@cs.tufts.edu

Abstract                                                                                    1.   Introduction
We present the DeAL language for heap assertions that are                                   Garbage collection (GC) is a widely popular mechanism in
efficiently evaluated during garbage collection time. DeAL is                               modern programming languages, employed in representa-
a rich, declarative, logic-based language whose programs are                                tives of virtually all language classes, from mainstream man-
guaranteed to be executable with good whole-heap locality,                                  aged languages (e.g., Java, C#), to dynamic languages (e.g.,
i.e., within a single traversal over every live object on the                               Perl, Python), and to the language avant-garde (e.g., Haskell,
heap and a finite neighborhood around each object. As a                                     OCaml). The traditional view on GC considers it a necessary
result, evaluating DeAL programs incurs negligible cost:                                    cost. Our work is based on the observation that GC also rep-
for simple assertion checking at each garbage collection,                                   resents a unique opportunity. GC necessitates an occasional
the end-to-end execution slowdown is below 2%. DeAL is                                      traversal of all live program objects. If all objects are going
integrated into Java as a VM extension and we demonstrate                                   to be traversed anyway, it is natural to consider using this
its efficiency and expressiveness with several applications                                 traversal for more than computing liveness. With minimal
and properties from the past literature.                                                    machinery that piggybacks on the GC traversal, the system
    Compared to past systems for heap assertions, DeAL is                                   can perform computation over a program’s heap. Such com-
distinguished by its very attractive expressiveness/efficiency                              putation should be side-effect free, but also similar in struc-
tradeoff: it offers a significantly richer class of assertions                              ture to a garbage collection traversal, in order to exploit GC
than what past systems could check with a single traversal.                                 well without undue overhead. Furthermore, this computation
Conversely, past systems that can express the same (or more)                                should not compute results essential to the main program’s
complex assertions as DeAL do so only by suffering orders-                                  control flow, or it would cause much more frequent garbage
of-magnitude higher costs.                                                                  collection than would be otherwise necessary.
                                                                                                Assertion checking is a good fit for these requirements.
Categories and Subject Descriptors D.2.5 [Testing and                                       Ensuring program properties by way of programmer-defined
Debugging]; D.2.4 [Software/Program Verification]: Asser-                                   assertions is a crucial technique in software development
tion checkers                                                                               and forms the cornerstone of methodologies such as Design
                                                                                            by Contract [13]. Virtually every programmer with modern
General Terms             Algorithms, Reliability                                           training has been well-exposed to the idea that using asser-
                                                                                            tions to trigger program failures is an excellent way to reveal
Keywords heap assertions, garbage collector, reachability,                                  faults. Since assertion checking only queries program data
dominance                                                                                   but does not update them, it requires no computations with
                                                                                            side effects. Additionally assertions should not alter the pro-
                                                                                            gram control flow or compute results that the program will
Permission to make digital or hard copies of all or part of this work for personal or       later use.
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation       Performing assertion checking during GC time is not
on the first page. To copy otherwise, to republish, to post on servers or to redistribute   new. The QVM Java virtual machine [3] allows heap
to lists, requires prior specific permission and/or a fee.
OOPSLA/SPLASH’10, October 17–21, 2010, Reno/Tahoe, Nevada, USA.
                                                                                            probes, which check reachability and dominance properties
Copyright c 2010 ACM 978-1-4503-0203-6/10/10. . . $10.00                                    at GC time. The GC Assertions system of Aftandilian and
Guyer [1] describes a suite of assertions that can be effi-             DeAL is an expressive language, as we will demonstrate
ciently checked at GC time. The limitation of these past             with many examples. Its generality allows us to write com-
systems is that they either do not offer a general language          plex properties (which past systems needed to hard-code)
for writing assertions or they require multiple GC traver-           as straightforward logical expressions. For instance, we can
sals to evaluate a single assertion. In this way, they sacri-        write the property “all Node objects are dominated by object
fice either efficiency or expressiveness. For instance, the Af-      d”, i.e., no path to them exists that does not pass through d:
tandilian and Guyer system includes very specific kinds of
assertions, such as assert-unshared(obj). Recent versions                               ∀x ∈ Node : ¬Rr0 /d(x)
of the QVM system allow full JML assertions but cannot               The special constant r0 refers to the root set of the GC
evaluate those in a single GC traversal, often resulting in nu-      traversal. Therefore the property means that no Node object
merous heap traversals in the course of evaluating an asser-         is reachable without going through d.
tion. As a consequence, efficiency for QVM assertions relies             Using standard boolean connectives we can write inter-
on reducing the cost of every GC via parallelism. Still, per-        esting combinations of properties—e.g., the DeAL assertion:
forming multiple GCs, even in parallel, is much more costly
than piggybacking on a single GC (which would need to oc-              (∀x ∈ Node : (x.data > threshold) ⇒ Rhotcache(x))∧
cur anyway).                                                                    (∃y ∈ Node : ¬Rhotcache(y) ∧ y.next = null)
    In contrast, we present a Declarative Assertion Language
(DeAL), which fulfills the requirements of computing during          (In words, “every Node object with an above-threshold data
GC time: it only allows side-effect-free computations and it         value is in the hotcache data structure, and some Node object
can only express programs whose object access patterns are           that is not in hotcache has no next link.”) Although non-
a good fit for a GC traversal and incur constant overhead per        trivial, this assertion (like any other legal DeAL program, as
object. Our approach is informed by Descriptive Complex-             defined in Section 2) is evaluated in a single heap traversal,
ity, where the important complexity classes (LOGSPACE,               per DeAL’s locality guarantees.
PTIME, NP, PSPACE, etc.) have been captured by logic-                    Exploiting the GC allows DeAL assertions to be evalu-
based languages [10]. Although the performance property              ated with very low cost. We implemented DeAL by extend-
we are interested in does not correspond to a complexity             ing the Jikes RVM and Eclipse Java Compiler and evaluated
class, one can view it as a limited case of linear-time compu-       its efficiency extensively. The overhead of evaluating DeAL
tation, where we have the extra guarantee that the program is        assertions of course varies with the complexity of the asser-
evaluated in a single traversal of the heap. This is not a prop-     tion: if the assertion consists of comparing k field derefer-
erty that is well-expressed in complexity theory, but it is an       ence expressions that depend on the current object, it is in-
often-cited property of good locality for garbage collectors.        evitable to incur Ω(k) cost per object. For typical small prop-
For instance, Wilson’s GC survey [23] distinguishes algo-            erties, however, DeAL imposes at most a few percent over-
rithms based on their locality. A GC algorithm is considered         head over the cost of garbage collection, resulting in near-
to have good locality when it only needs to visit each ob-           zero (sub-2%) overhead over the end-to-end execution time
ject once. Algorithms that need to perform multiple passes           of applications.
over the heap have bad temporal locality, since they always              In summary, our paper makes the following contributions:
access the data that have not been accessed the longest. This        • We define a declarative language for linear time, single-
behavior is a bad fit for LRU-based cache mechanisms, espe-           traversal computations, which results in good locality for
cially for traversals that revisit structures slightly larger than    whole-heap traversals. To our knowledge, no past language
the cache.                                                            has ever offered this guarantee. The property that all legal
    DeAL is a logic-based extension of Java and guarantees            programs visit each object exactly once is a general pro-
that any assertion expressible in it runs by traversing each          gram property, not limited to the domain of computing dur-
live heap object (and a constant-size neighborhood around             ing GC. As such, our language design and overall approach
it) exactly once. From a logic standpoint, DeAL is a single-          may find even more applications in the future.
variable first order logic with predicates of the form Rc/d(x),      • We designed our language, DeAL, as a natural Java exten-
meaning that object x is reachable from object c without              sion for assertion checking and implemented it in a modern
going through object d. For instance, we can express the              high-performance JVM. We demonstrate the expressive-
well-formedness property of a doubly-linked list d via a              ness of our design with numerous examples, mostly from
logical assertion:                                                    the recent research literature.
          ∀x ∈ Node : Rd(x) ⇒ (x.next.prev = x)                      • We evaluate the practical effectiveness and efficiency of
                                                                      DeAL over a suite of assertions and applications. Our
The concrete DeAL syntax uses a textual representation of             results demonstrate that the idea of an expressive language
logical operators. For instance, the above property is written:       for assertion checking at GC time has excellent properties
forall Node x: reach[d](x) -> x.next.prev == x                        and promise for wide adoption.
primitive-valued expressions can be compared with the stan-
      DealAssertion ::= assertion(F) |                           dard primitive relational operators  o |
                                                                 name without changing the meaning of the formula.
                      o ≤ o | o ≥ o | o , o | Rt(x)                  The second important element is that the unary pred-
 t ∈ TraversalSrc ::= s/→
                        −s | /→
                              −s | t ∨ t                         icates of the DeAL logic have the form Rs1 /o1 , . . . , oi ∨
                                                                 s2 /oi+1 , . . . , o j ∨ . . . ∨ sn /ol+1 , . . . , om (x). Note that the ∨s
             o ∈ Obj ::= v | o. f | c | x
                                                                 in the above, although logically designating an “or”, are part
      s ∈ SourceObj ::= v | s. f                                 of the predicate name and should not be confused with the
                 x ∈ LVar is a set of logic variable names       top-level logical “or” connective. (For instance, there are
                                                                 no “and” and “not” counterparts in predicate names.) The
                 v ∈ JVar is a set of Java variable names
                                                                 meaning of the above relation is that x is reachable from s1
             T ∈ Typ ::= Java types                              (called a source) via a path that does not include any of the
           c ∈ Const ::= Java constants                          objects o1 , . . . , , oi or x is reachable from s2 via a path that
                                                                 does include any of the objects o1 , . . . , o j or . . . or x is reach-
            f ∈ Field ::= Java field names
                                                                 able from sn via a path that does not include any of the ob-
                                                                 jects o1 , . . . , om . (Note that the set of excluded objects keeps
Figure 1. DeAL syntax. Note the distinction between ex-          growing monotonically: the objects specified to be excluded
pressions that do and do not contain logic variables (sets       for every new source si are in addition to all previously ex-
Obj and SourceObj, respectively). Their only difference is       cluded objects.) An empty source si is shorthand for the root
that the former cannot appear as traversal sources in an R       set r0 of the garbage collection; thus, we may rewrite our
predicate or be a Java constant.                                 earlier example of ‘all Node objects are dominated by d’ as
                                                                 ∀x ∈ Node : ¬R/d(x).
2.     The DeAL Language                                             Together, these design decisions ensure that we can eval-
We next present DeAL’s syntax, semantics, and usage exam-        uate queries in a single pass through the heap. First, disal-
ples.                                                            lowing nesting ensures that we disallow queries that make
                                                                 us compare every heap object to every other heap object,
                                                                 which we cannot do in one pass in general.
2.1     Language Design
                                                                     Second, the construction of the R ensures that we can find
Syntax. The full abstract syntax of the DeAL language            a suitable heap traversal strategy. Intuitively, the key feature
is shown in Figure 1. Since DeAL’s expressions are first-        of using a single R predicate to link reachability properties
order logic formulas, we will freely alternate between the       only with an or connective is that the predicate value is
logical and the concrete DeAL syntax, as convenient for          determined regardless of which path in the object graph was
presentation purposes. The reader should treat the logical       used to reach an object. That is, if we reach an object via
notation as a shorthand for the text DeAL/Java syntax, i.e.,     one path, we do not need to later try to also reach it via other
∀x ∈ T , ∃x ∈ T , Rt, ∧, ∨, ¬, ∈, =, ⇒, ⇔, ,, ≤, ≥ are           paths.
synonyms of the actual DeAL operators forall T x, exists             We should point out precisely how the language of DeAL
T x, reach[t], &&, ||, !, instanceof, ==, ->, , !=, =,   formulas, F, formally fits the mathematical definition of a
respectively.                                                    first order logic—e.g., see [10]. From the logic perspective,
   The language has three primitives: assertion(F ),             Java object expressions are treated as logical constants—
assertDisjoint(F ), and unsafeAssert(F ), where F is a           e.g., a Java variable v is a constant from the perspective of the
sentence in first order logic. All three primitives attempt      logic, since its meaning does not change for different values
to check whether the formula is true, but they have differ-      of the logical variables. Java fields are treated as functions—
ent requirements and soundness guarantees, discussed later.      for instance, the expression x.next is equivalent to a logical
All the logical connectives have their usual meaning, with       function “next” applied to logical object x.
forall/exists quantification implicitly applying over all live
heap objects. Objects can be compared for equality and
Type Checking and Assertion Semantics. DeAL asser-                          specifics on the current DeAL integration in Java. Although
tions can appear as Java statements at any legal point                      these are largely engineering choices (which could change
in a Java+DeAL program. Each assertion (a member of                         without altering the essence of the language) they are useful
DealAssertion in the syntax of Figure 1) is a separate DeAL                 in order for the reader to establish a concrete model of usage.
program, evaluated independently. The DeAL guarantee is
                                                                            Integration. We have integrated DeAL to Java as a con-
that each assertion requires a single traversal over objects,               servative extension. The DeAL language is hidden behind a
and thus can be evaluated during a single garbage collec-                   library and the user invokes assertions by calling static meth-
tion. (In Section 2.2 we discuss in more detail the various                 ods assertion, assertDisjoint, and unsafeAssert in the
options on how often assertions are evaluated and how they                  main DeAL class. The method calls accept DeAL formulas
are integrated in our current implementation.)                              as regular strings. For instance, the doubly-linked list con-
    For a syntactically well-formed DeAL program to also                    sistency invariant is asserted as:
be valid, it first needs to satisfy Java type constraints on its            assertion(
expressions. The types of field reference expressions (either                "forall Node x: reach[d](x) -> x.next.prev == x");
 j or o in Figure 1) have to be compatible with their uses with
operators =, ≤, etc. That is, a relational expression (r in the                This allows unsuspecting Java compilers to process
syntax) should be either a legal Java expression with type                  DeAL programs by linking to a simple stub library. The
boolean or an instance of predicate R.                                      DeAL compiler, however, fully parses and statically type
    More interestingly, DeAL imposes restrictions on how                    checks the assertion string. (As indicated earlier, most of the
the R predicates are used in different kinds of asser-                      examples in this paper use the equivalent DeAL syntax of
tions. These restrictions are closely tied to the meaning of                Figure 1 instead of the concrete textual syntax. Accordingly,
assertion vs. assertDisjoint vs. unsafeAssert.
                                                                            we elide the quotes that delimit the assertion string.)
                                                                               In designing DeAL’s assertion evaluation model, we con-
• In instances of assertion(F ), at most one reacha-                        sidered two questions: when to evaluate and whether to eval-
 bility relation, Rs1 /o1 , . . . , oi ∨ s2 /oi+1 , . . . , o j ∨ . . . ∨   uate. Regarding when to evaluate, we chose to execute asser-
  sn /ol+1 , . . . , om (x), may occur in F, although the same re-          tion statements at the point they are encountered, rather than
 lation may occur multiple times. An assertion primitive                    delaying until the time when garbage collection would oc-
 straightforwardly checks the asserted condition in a single                cur anyway. Compared to past systems, this design decision
 traversal over all live objects.                                           resembles that of the QVM system [3], as opposed to the
                                                                            option followed by Aftandilian and Guyer [1], which puts
• In instances of assertDisjoint(F ), multiple reachability                 off assertion checking until GC time. The latter has the dis-
 relations may occur, but, if two such relations occur, then in             advantage that the shape of the heap may have changed be-
 addition to asserting F we are asserting that the domains of               tween the point of assertion statement and the point of its
 the two relations are disjoint. If there is a node that satisfies          evaluation, resulting in incorrect evaluation (both unsound
 both relations then we will detect this and report failure                 and incomplete).
 of the assertion. (If the domains are all disjoint, we can                    However, the issue with evaluating assertions at the point
 evaluate F in a single run through the heap during garbage                 they are encountered is that if every assertion causes a
 collection, and if the domains are not all disjoint then we                garbage collection then GC will occur too frequently, with
 will detect this during the single traversal of the heap.)                 high overhead. Therefore, DeAL decides whether to eval-
• In instances of unsafeAssert(F ), multiple R relations                    uate an assertion by measuring the amount of time since
 may occur without a claim of disjointness. In this mode,                   the last GC. The assertion is evaluated only if a sufficiently
 the DeAL system will behave as if the domains of the R                     long time has elapsed. In this way, assertions can be ignored,
 relations mentioned are all disjoint, without checking this                resulting in incompleteness.1 By default, we tie our deci-
 property. If the relations are not disjoint, then we are not               sion of whether an assertion is to be evaluated or not to the
 guaranteed to detect this, nor to actually check F. Specifi-               same internal metrics that the Jikes RVM GC uses to decide
 cally, DeAL will not correctly check formula F if the truth                whether to perform a collection. Thus, we evaluate an asser-
 value of F depends on objects that simultaneously satisfy                  tion roughly around the time a GC would be due to happen
 multiple R relations. The reason is that DeAL will only                    anyway. This is what we view as the most likely mode of
 traverse such objects once, while processing one of the                    use of DeAL: assertion checking happens at low frequency
 R relations and without knowing whether the object sat-                    but without imposing heavy overhead. We offer the user the
 isfies other Rs. Yet formula F may express different re-                   option of “throttling” the evaluation frequency of assertions,
 quirements for an object depending on which R it satisfies.                up to executing every single assertion encountered and suf-
                                                                            fering the resulting overhead. This allows us to enforce as-
2.2   DeAL Language Usage
                                                                            1 We  choose to view DeAL as a fault detector, hence the meanings of
We next discuss how the user interacts with the DeAL lan-                   “sound” and “complete” are exactly inverse from what they would be for
guage and offer specific usage scenarios. We begin with                     a correctness verifier.
sertion checking even in pathological cases that our default                   As mentioned briefly in the Introduction, we can express
heuristic would always skip, such as when specified right af-              dominance properties as non-reachability if the dominator is
ter a call to System.gc().                                                 excluded [22]. For instance, we can evaluate a property over
                                                                           all dominated nodes:
Examples. There are several possibilities for using DeAL
in the course of regular programming. These include local                      assertion(∀x ∈ Node : ¬R/c(x) ⇒ x.data > 0);
data structure invariants, ownership and dominance prop-
erties, disjointness properties, reachability properties, and                 Dominance by a single object is an object ownership
combinations of those. We show below several DeAL as-                      property. Combinations of dominance properties allow more
sertions, most of them taken from the recent literature [6–                complex conditions. For instance, we can specify that every
8, 11, 16, 17, 19].                                                        object of type Person has to be dominated by the combina-
    One common kind of condition concerns type constraints.                tion of data structures males and females. That is, no path
For instance, all objects in a data structure may need to be               in the object graph should be able to reach a Person object
Serializable, and no instance of a subtype of Thread may                   without going through the object called males or the object
be stored in the structure:                                                called females in the program:

  assertion(∀x ∈ Node :RHttpSession(x) ⇒                                       assertion(∀x ∈ Person : ¬R/males,females(x));
                              x instanceof Serializable ∧                      Note that dominance by a set of sources is not equivalent
                              ¬(x instanceof Thread));                     to any boolean combination of individual dominance prop-
                                                                           erties. In particular, “x is dominated by the combined objects
(Note that although class Thread is not Serializable, a                    males and females” is equivalent to neither “x is dominated
user-defined subclass of it may be.)                                       by males AND x is dominated by females” nor “x is dom-
   Local invariants for data structures are quite common and               inated by males OR x is dominated by females”. (Consider
useful. We already saw a doubly-linked list invariant, and a               objects reachable from both data structures, which are dom-
sortedness invariant is equally easy to express:                           inated by neither one.)
                                                                               If we want to state that the two structures are disjoint,
assertion(∀x ∈ Node : Rc(x) ⇒ x.data ≤ x.next.data);
                                                                           we can do so with an assertDisjoint statement, which will
                                                                           also allow us to combine other conditions on the objects. For
(Note that we can ignore the case of x.next being null. If
                                                                           instance, we can have:
a subexpression of a field dereference expression is null
DeAL ignores the corresponding value of the logical vari-                  assertDisjoint(∀x ∈ Person : Rmales(x)∨Rfemales(x));
able.2 )
   It is similarly easy to specify that a data structure rooted            This states that any object of type Person is reachable from
at c should have a cycle:                                                  one of the two data structures, though there may also be other
                                                                           paths to it. Since an assertDisjoint allows multiple disjoint
        assertion(∀x ∈ Node : Rc(x) ⇒ x.next , null);                      R predicates in the same assertion, we can easily strengthen
                                                                           the condition to express “every Person is dominated by
   In the same way, we can state data structure invariants for             either males or females but not both”, by combining the last
binary and n-ary trees, heaps, and so on.                                  two example formulas:
   More interesting properties are those that require the
reachability of all objects that satisfy a condition:                            assertDisjoint(∀x ∈ Person :
                                                                                                   (Rmales(x) ∨ Rfemales(x))∧
assertion(∀x ∈ Node : (x.next = null) ⇒ Rleaves(x));
                                                                                                   ¬R/males,females(x));
Such properties are not easily expressible in plain Java, be-
                                                                              Disjointness conditions have a variety of applications, in-
cause they refer to any possible live object (with a given
                                                                           cluding leak detection (data may be reachable by unexpected
type, in this case) and not just to objects in a specific struc-
                                                                           data structures) and detection of improper sharing among
ture that could be traversed with a plain for loop.
                                                                           threads. For instance, we can state that trees from distinct
   With an existential quantifier, we can easily assert that a
                                                                           compilation phases do not share structure:
value satisfying certain properties is found, independently of
which data structure contains it:                                          assertDisjoint(∀x : ¬(RparseTree(x)∧RsyntaxTree(x)));

assertion(∃x ∈ Node : x.data > 3.141∧ x.data < 3.142);                        The above examples are representative of simple but pow-
                                                                           erful DeAL assertions. The examples do not include uses of
2 Forexample, if in the quantified expression, “Qx : ϕ,” x.next.data
occurs in ϕ, then the quantifier is restricted to the case that x.next ,   unsafeAssert, which allows more complex assertions but
null, i.e, (Qx such that x.next , null) : ϕ.                               with few guarantees, as we describe next.
Unsafe Assertions. The unsafeAssert primitive gives the           3.1    Single Traversal Evaluation
user extra flexibility but at the cost of not guaranteeing cor-   DeAL assertions are guaranteed to be evaluated in a single
rectness. We do not consider unsafeAssert to be a core part       traversal of live heap objects. Although in the rest of the
of the DeAL language, and we have not used it in prac-            paper we informally describe this property as “each object is
tice. Nevertheless, it represents an escape clause for the user   traversed once”, this assumes an implicit understanding that
to specify richer properties than those that DeAL can stat-       an object may be re-examined briefly, only to discover that it
ically guarantee to check in a single traversal. It is the re-    was visited earlier. Below we state the desired property more
sponsibility of the user to ensure that such properties make      precisely.
sense under DeAL’s single-traversal execution. Therefore, it
is worth elucidating possible usage patterns, since correct-      Theorem 1. A valid, safe DeAL assertion (i.e., a sin-
ness depends on understanding them. There are four possi-         gle assertion or assertDisjoint statement but not an
bilities for the correctness of evaluating an unsafeAssert:       unsafeAssert statement) can be evaluated correctly in a
                                                                  single traversal that crosses each edge in the object graph
• The formula is evaluated soundly and completely, i.e.,          exactly once and performs a constant amount of work per
 DeAL assertion failures correctly indicate that the formula      object. (This constant depends on the size of the assertion.)
 is false, and if the DeAL assertion evaluation does not fail,
 the formula is true. This is the case when the shape of          Proof. Consider first the case of an assertion statement.
 the properties or of the heap ensures that the truth value       According to Figure 1, an asserted formula is a boolean com-
 of the formula does not depend on which objects satisfy
                                                                  bination of quantified formulas (each starting with ∀ or ∃),
 multiple Rs. For a simple example, consider a formula
                                                                  with each quantified formula being a boolean combination
 that treats all objects reachable under two different sets of
                                                                  of relational expressions, r. Since all quantification is over
 conditions the same way: ∀x : (Rs1 /o1 (x) ⇒ x.data >
                                                                  live objects, it is clear that we only need to evaluate every
 0) ∧ (Rs2 /o2 (x) ⇒ x.data > 0).                                 relational expression r once per object. All relational expres-
• The formula is evaluated soundly, i.e., DeAL assertion          sions of a form other than Rt(x) can be evaluated correctly
 failures correctly indicate that the formula is false. This      in constant time for any object. Thus, we can evaluate all
 means that some DeAL warnings may be missed, but this            such expressions (from the entire asserted formula) at what-
 is likely practically acceptable. For example, the formula       ever time an object is visited. The difficulty lies in proving
 may contain two distinct R predicates with one of them           that we can define a traversal (i.e., a loop over live objects)
 implying stronger conditions than the other (e.g., ∀x :          for which expressions of the form Rt(x) are also correctly
 (Rs1 (x) ⇒ x.data > 0) ∧ (Rs2 (x) ⇒ x.data ≥ 0)). DeAL           evaluated in constant time. If we also show this, then our
 offers no guarantee as to which condition it will check for      traversal needs to keep current truth values only for all quan-
 objects that satisfy both Rs.                                    tified formulas and at the end of the entire traversal perform
• The formula is evaluated completely: there may be false         their boolean combination for the final result. (An optimiza-
 assertion failures, but if no assertion fails then the formula   tion consists of short-circuiting the evaluation of quantified
 was true at the point of evaluation. An example would            formulas. That is, if an ∃ or ∀ formula has already had its
 be the complement of our previous “unsound” formula:             truth value determined before the end of the traversal—this
 ∃x : Rs1 (x) ∧ x.data ≤ 0 ∨ Rs2 (x) ∧ x.data < 0.                can only be true for ∃ and false for ∀—there is no reason to
                                                                  evaluate for further objects the relational expressions r in the
• The formula is evaluated neither soundly nor
                                                                  body of this quantified formula.)
 completely—e.g., it combines an incomplete and an un-                Recall that, per the restrictions of Section 2.1, a valid
 sound sub-formula. This scenario may still be valuable if        assertion statement can only contain instances of a sin-
 it turns out to detect failures often while issuing false re-
                                                                  gle Rt(x) predicate. The truth value of such a predicate
 ports rarely. Practical automatic bug detectors are often
                                                                  can be computed (in constant time) for a traversal visit-
 unsound and incomplete—the authors of ESC/Java have
                                                                  ing every live object once. The full form of the predicate
 argued forcefully that soundness and completeness are
                                                                  is Rs1 /o1 , . . . , oi ∨ s2 /oi+1 , . . . , o j ∨ . . . ∨ sn /ol+1 , . . . , om (x).
 overrated properties compared to practical usefulness in         We define the traversal to satisfy the required property as
 catching bugs [9].                                               follows:
3.   Single Traversal Property and Language                        • Mark objects o1 , . . . , oi ignored.
     Implementation                                                  For example, for query Ra/b ∨ c/d, e(x) in Figure 2, we
We next state and prove DeAL’s single traversal property.          mark b as ignored.
This discussion also serves to introduce the way the DeAL          • Perform a standard reachability traversal starting at ob-
compiler processes programs and the changes required to the        ject s1 . The traversal processes objects that are neither
runtime system so that assertions are evaluated as part of a       ignored nor visited and marks objects visited after pro-
GC traversal.                                                      cessing them. Predicate Rt(x) is true for each processed
root                                   visited flag and one for ignored).3 From the perspective
                                                                     of the visited flag, the above traversal is a full replacement
                        a
                               ❻        c                            of a GC reachability traversal with only the object visiting
                        ❶              ❷                             order modified.
                    d   b               e                                The handling of statement assertDisjoint is quite sim-
                        ❸              ❺                             ilar. The difference is that, in addition to the visited and
                                                                     ignored flags we maintain per object, we need dlog(u)e
                                                                     bits per object, where u is the number of distinct predi-
    Figure 2. Traversals for Ra/b ∨ c/d, e(x), in order.             cates Rt(x). These bits form a traversalNum counter. The
                                                                     traversal proceeds by repeating all but the last step of the
 object, since it is reachable from object s1 without going          assertion traversal for every Rt(x) predicate, with some
 through o1 , . . . , oi . Thus, when processing an object, the      small differences. First, while marking an object visited af-
 traversal can evaluate (for the current object) all relational      ter processing, the traversal sets traversalNum to the current
 expressions r in the entire formula.                                traversal number, which is initially 0 and incremented for ev-
    For our example in Figure 2, we execute traversal ¶,             ery Rt(x) predicate examined. Second, for each previously
 starting from a, ignoring b.                                        visited object encountered, the traversal checks whether the
 • Proceed in the same manner for other sources and ex-              visited object’s traversalNum is equal to the current traver-
 cluded objects. That is, mark objects oi+1 , . . . , o j ignored,   sal number (indicating a previously visited object during the
 perform a standard reachability traversal starting at object        current traversal) or lower (indicating a disjointness error
 s2 , etc. In total there are n such steps (one for each object      and causing an immediate assertion failure, unless the object
 s1 , . . . , sn ).                                                  is also ignored). The final traversal (from the root set and
    In our running example (Figure 2) we now ignore objects          from excluded objects) uses a traversal number of 0, thereby
 d and e and then perform traversal ·.                               avoiding disjointedness errors.
 • Remove the ignored flag from all objects o1 , . . . , om .            The above assertDisjoint traversal maintains the prop-
    In our running example we now remove this flag from b,           erties required for the proof. First, the truth value of every R
 d, and e.                                                           predicate is evident at the time of visit of each object, since
 • Perform m reachability traversals, each starting at object        the disjointness property guarantees that the only predicate
 ok for 1 ≤ k ≤ m, followed by a reachability traversal              R that is true for the object is the R predicate whose traversal
 starting at the root set (i.e., the default live objects of the     reaches the object. Second, the visited flag ensures no edge
 program’s execution, such as the stack and the global re-           is crossed more than once, by the same argument as for the
 gion). Every traversal processes all non-visited objects            assertion traversal.                                           
 and marks objects visited after processing them. During
 each of these m + 1 traversals, predicate Rt(x) is false for            Although the above proof is not hard, this is largely due
 each processed object, by nature of the traversal: the object       to the careful design of the DeAL language, which main-
 was not reachable by any si without going through the cor-          tains the properties required for a single traversal evaluation
 responding excluded objects o1 , . . . , om . When processing       of assertions. We note that it is quite easy to be misled with
 an object, the traversal can evaluate (for the current object)      respect to predicates that can be evaluated in a single traver-
 all relational expressions r in the entire formula.                 sal. For instance, a common mistake is to attempt to integrate
    In our running example, we first run traversals ¸ through        the inverse of predicate Rc(x) in a single traversal language:
 º, though ¹ would be a no-op (since we already visited d            It is not possible to add to DeAL a predicate Pc(x) with the
 in ¶ before ignoring it). Then we conclude with the root            meaning “x can reach c” while maintaining the same single
 traversal ».                                                        traversal property for arbitrary graphs.

    Note that no edge in the object graph (that is, a reference      3.2      DeAL Implementation
field of an object) is processed twice: the standard reachabil-      The proof of the single traversal theorem essentially lays out
ity traversal follows edges out of an object exactly once, after     the principles of the DeAL implementation. Each traversal
marking the object visited, and the only further operation           may need to touch the entire heap, so DeAL must interface
on a visited object is to check whether it is visited. We            with a whole-heap garbage collector, such as mark-sweep
invert the meaning (i.e., mapping to a 0 or 1 bit) of visited        or the major collection of a generational collector (which
after every collection. (This is a standard technique in mark-       performs whole-heap traversals).
sweep garbage collection, avoiding a second visit of nodes
                                                                     3 This requirement can be reduced to just one bit per object, by using the
just to reset them.) After collection no object is ignored
                                                                     visited bit to also mean ignored, plus a second bit only for each of the
(since no object is ignored initially.)                              m ignored objects. The reason for the second bit is that, during the final
    The above traversal satisfies all requirements of the proof.     traversal from the root set, it is necessary to remember whether an ignored
It also requires space of two bits per object (one for the           object has been visited or not.
DeAL instructs this garbage collector to perform the              Node objects directly referenced by array arr are dominated
traversals described in the proof for assertion and                  by arr”. This is a property that can be evaluated in a single
assertDisjoint. Statement unsafeAssert is handled identi-            traversal, yet it is not expressible in DeAL. One can attempt
cally to assertion with the only difference being that all but       to write the property straightforwardly as:
the last step of the assertion traversal are repeated for every
Rt(x) predicate in the formula. Therefore, unsafeAssert
visits every object exactly once but without guaranteeing              assertDisjoint(∀x ∈ Node : Rarr(x) ⇒ ¬R/arr(x));
correct evaluation of the assertion, since each object is vis-
ited only as part of a traversal corresponding to a single R             Yet this is not the desired property: the Node objects
predicate.                                                           directly referenced by arr are only a subset of all the objects
   DeAL is implemented on top of Jikes RVM [2], modify-              reachable by arr. For instance, the Node objects referenced
ing the included MarkSweep garbage collector. The bulk of            by arr may themselves refer to other objects that are not
the implementation consists of changes to the runtime sys-           dominated by arr, causing a spurious disjointness error.
tem, and especially interfacing with the collector. To pro-          Intuitively, the desired property should be expressible in
duce a clean and general interface, we introduced a small            DeAL but is not because of a technicality: Although the
“trace-and-test” language inside the VM. A trace-and-test            property is local (it requires traversing just one link to tell
program instructs the garbage collector what to mark recur-          that an object is directly referenced by arr), it is not local
sively, what to test for, and what individual objects to ex-         from the perspective of the referenced object: when visiting
pressly exclude or include in the traversal. The trace-and-          a random object it is not possible to know in constant time
test language has primitives such as exclude-node(o) (to             whether it is directly referenced by arr.
exclude an object from the traversal), include-node(o) (to               These (as well as many other) expressiveness limitations
unmark an object, so that it is processed when next en-              of DeAL can be overcome by combining Java computation
countered), mark-traversal-id(x) (to increase the traver-            with DeAL assertion checking. The general programming
sal number), set-expressions(k1 , . . . , kn ) (to set up the sets   pattern consists of having Java code compute a property
of relational expressions that need to be evaluated for ev-          of objects, summarize the results as a local property (for
ery node, together with information, per set, on whether the         instance, by marking a field for all objects satisfying the
expressions are in an existential or universal context), and         desired property) and invoke a DeAL assertion that checks
trace(o) (to start a reachability traversal at object o).            whether the marked objects have the expected structure.
   The DeAL compiler translates an assertion into trace-                 In our above example, if we add an otherwise unused
and-test language instructions, following closely the traver-        field mark in class Node, we can easily express the desired
sal structure described in the proof of Section 3.1. The rela-       assertion by combining Java and DeAL evaluation:
tional expressions that need to be evaluated per-object are
translated into plain Java methods that the runtime com-
                                                                     Object dummy = new Object();
piles with the maximum optimization level allowed by the
Jikes just-in-time compiler. We altered the garbage collec-          for (Node n : arr) n.mark = dummy;
tion logic in the virtual machine to execute the trace-and-set       assertion(∀x ∈ Node : (x.mark = dummy) ⇒ ¬R/arr(x));
instructions. For marking nodes, we employ 4 bits per object
that are currently unused by the runtime, resulting in a max-           This programming pattern is quite powerful. It allows
imum of 8 R predicates per assertDisjoint formula (since             nearly arbitrary extension of the properties that can be
one bit is occupied by the ignored flag); in addition we ob-         checked, at the expense of performing some of the work
tain the visited flag from a region in the object header re-         outside the DeAL traversal. Given the generality of the pat-
served for the existing MarkSweep garbage collector. In the          tern, we are considering adding language support for it in fu-
future, the implementation could allocate extra space per ob-        ture versions of DeAL. Namely, we can add a labelObject
ject on a side structure, allowing unlimited R predicates.           primitive to DeAL and allow the Java program to mark ob-
                                                                     jects with a finite set of labels, at any point in the execu-
                                                                     tion. Then, when a DeAL assertion is checked, it can refer to
4.   Discussion: Expressiveness Limitations,
                                                                     whether an object has a certain label or not (labels are unary
     Programming Patterns                                            predicates). The markings will be cleared after evaluation of
DeAL is an expressive language but has by design some ex-            the assertion. This simple addition to the language is just a
pressiveness limitations to support its single traversal guar-       convenience feature that obviates the need for extra fields,
antee. Clearly, DeAL cannot express computations that are            such as mark in our earlier example. The markings can be
inherently non-linear, or non-single-traversal. Furthermore,         kept on a side structure that the DeAL traversal consults. We
DeAL occasionally cannot support computations that hap-              have not added such convenience features in our first version
pen to be computable in a single traversal, if their structure       of DeAL, preferring to concentrate on the core language in-
obscures this fact. For instance, consider the property “all         stead of on elements that can be easily emulated.
5.    Experiments                                              5.2     Controlled Microbenchmarks
We evaluate the implementation of DeAL with several ex-        The overhead of DeAL evaluation depends strongly on the
periments:                                                     complexity of assertions. For instance, if the user asserts
                                                               a property that is a function of the values of k neighbors
• We evaluate the assertions of Section 2.2 in a full heap     of every live object, then the runtime cost will necessarily
 of objects with no other computation performed in the         be Ω(k). Therefore, the interesting question concerns the
 application. This is a controlled microbenchmark for the      overhead for typical assertions. We used as our evaluation
 given assertions, intended to check how much the overhead     set the example assertions of Section 2.2 (which mostly
 varies with the complexity of typical assertions and with     emulate assertions from the recent research literature [6–
 the percentage of relevant objects.                           8, 11, 16, 17, 19]). The assertions are applied to specially
                                                               designed skeletal programs, with appropriate data structures
• We run a wide spectrum of actual applications (includ-
                                                               for each assertion.
 ing the DaCapo benchmarks [4], SPEC JVM98 [20],
                                                                   The intention of these microbenchmarks is to show what
 and pseudojbb—a fixed-workload version of SPEC
                                                               can be expected as a worst-case execution time for repre-
 JBB2000 [21]). We measure the overhead of simple
                                                               sentative assertions. These programs do nothing aside from
 (generic) DeAL assertion checking, when assertions are
                                                               building the data structure and running an assertion on it.
 evaluated with the same frequency as that of normal
                                                               We used three different input sizes, determining the num-
 garbage collection in the course of the application. This
                                                               ber of objects on the heap (counting only objects with types
 is an end-to-end experiment establishing cost under condi-
                                                               relevant to the assertion): small (10,000 objects), medium
 tions where the user intends to achieve near-zero-overhead
                                                               (100,000 objects), and large (1 million objects).4 The com-
 execution.
                                                               parison is to an unmodified Jikes RVM that runs a regular
• We analyze in more depth the overhead of a custom as-        System.gc() every time DeAL would check an assertion.
 sertion that reveals a (known) bug in pseudojbb. This is a    Figure 3 shows the GC time overhead results for all three
 close approximation of a “real use” scenario.                 input sizes.
                                                                   The large input size expectedly incurs the most over-
   All experiments confirm that the overhead of DeAL is        head. The geometric mean of the GC time overhead is 1.25.
small and becomes negligible for assertion evaluation at       That is, assertion checking slows down garbage collection by
roughly normal GC frequencies.                                 just 25%. The absolute worst-case GC time slowdowns rise
                                                               to 60-70%. Generally, assertions that access fields perform
                                                               worse than those that just check disjointedness or reacha-
5.1   Methodology and Setup                                    bility properties. Note that the object type provided in the
                                                               assertion does not serve as an efficient filter, since the pro-
Our system is implemented on top of Jikes RVM 3.1.0, so we
                                                               gram is specially designed to create only objects matching
compare our results to those of the unmodified Jikes RVM
                                                               the type examined in the assertion. Thus, the fields of vir-
3.1.0, both using the full-heap MarkSweep collector.
                                                               tually every program-generated object need to be examined.
   We use the DaCapo benchmarks versions 2006-10-MR2
                                                               (This is unlikely to be the case in a realistic setting: object
and 9.12-bach. For SPEC JVM98, we use the large input size
                                                               types will be used to disqualify most objects without needing
(-s100); for DaCapo 2006, DaCapo 9.12, and pseudojbb,
                                                               to examine their fields.) The three worst performers are as-
we use the default input size. Many of the DaCapo 9.12
                                                               sertions “cyclic”, “doubly linked list”, and “sorted”, repro-
benchmarks do not run on the unmodified Jikes RVM 3.1.0,
                                                               duced here:
so we provide results from only those that do. We perform
our experiments on a Core 2 Duo E6750 machine with 2 GB
                                                                       assertion(∀x ∈ Node : Rc(x) ⇒ x.next , null);
of RAM, running Ubuntu Linux 9.04.
   We use the adaptive configuration of Jikes RVM, which
                                                                 assertion(∀x ∈ Node : Rd(x) ⇒ (x.next.prev = x));
dynamically identifies frequently executed methods and re-
compiles them at higher optimization levels. We iterate each   assertion(∀x ∈ Node : Rc(x) ⇒ x.data ≤ x.next.data);
benchmark k times and record the results from the last it-
eration. For microbenchmarks (Section 5.2) k = 10, and            As can be seen from Figure 3, even the worst case slow-
for larger programs k = 4. We repeat this twenty times for     down is quite low, considering that GC time is a small part
each benchmark. All numbers reported are means of the 20       of total program execution time. Indeed, even though these
runs, and our graphs also include 90% confidence intervals.
                                                               4 The choice of sizes follows the sizes of the DaCapo benchmarks. Across
We execute each benchmark with a heap size fixed at two
                                                               all DaCapo benchmarks, the geometric mean of the maximum number
times the minimum possible for that benchmark using the        of live objects is roughly 100,000. The minimum is about 3,000 and the
MarkSweep collector. We vary the UNIX environment size         maximum is 3.2 million [4]. Therefore, our three setups have roughly the
to avoid measurement bias [14].                                same spread of sizes as the DaCapo benchmarks.
Figure 3. GC time overhead for representative assertions and skeletal programs, for small (top), medium (middle), and large
                                                  n                                                                         n                                                                     n
                                          ea                                                                        ea                                                                        ea
      Assert

                                                                                Assert

                                                                                                                                                          Assert
                                    om                                                                        om                                                                      o  m
      Base

                                                                                Base

                                                                                                                                                          Base
                               ge                                                                        ge                                                                        ge
                                          d                                                                         d                                                                         d
                                    rte                                                                       rte                                                                    rte
                               so                                                                        so                                                                        so
                                                      le                                                                        le                                                                  le
                                               ab                                                                        ab                                                                       ab
                                         liz                                                                       liz                                                                      liz
                                    ria                                                                       ria                                                                       ria
                               se                                                                        se                                                                        se
                               pi                                                                        pi                                                                        pi
                                      es                                                                        es                                                                     es
                                 av                                   t                                    av                                   t                                    av                            t
                               le                                lis                                     le                                lis                                     le                          lis
                                                            d_                                                                        d_                                                                    d_
                                                       ke                                                                        ke                                                                ke
                                                   in                                                                        in                                                                 lin
                                               _l                                                                        _l                                                                   y_
                                          ly                                                                        ly                                                                     bl
                                    ub                                e                                       ub                                e                                     u                          e
                               do                                pl                                      do                                pl                                      do                          pl
                                                            eo                                                                        eo                                                                eo
                                                        _p                                                                        _p                                                                  _p
                                                  ce                                                                        ce                                                                     ce
                                              an                                                                        an                                                                   an
                                         in                                                                        in                                                                      in
                                    m                                                                         m                                                                       m
                               do                                                                        do                                                                        do
                                                  ce                                                                        ce                                                                     ce
                                              an                                                                        an                                                                   an
                                         in                                                                        in                                                                      in
                                    m                             ed                                          m                             ed                                        m                          ed
                               do                               at                                       do                               at                                       do                         at
                                                            in                                                                        in                                                                    in
                                                        m                                                                         m                                                                om
                                                   do                                                                        do                                                                 _d
                                              t_                                                                        t_                                                                   nt
                                          in                                                                        in                                                                    oi
                                    sjo                                                                       sjo                                                                     sj
                               di                            le
                                                                  r                                      di                            le
                                                                                                                                            r                                      di                       le
                                                                                                                                                                                                                 r
                                                           pi                                                                        pi                                                                  pi
                                                        m                                                                         m                                                                     m
                                                   co                                                                        co                                                                 _c
                                                                                                                                                                                                   o
                                              t_                                                                        t_                                                                   nt
                                          in                                                                        in                                                                    oi
                                    sjo                                                                       sjo                                                                     sj
                               di                                                                        di                                                                        di
                                              t                                                                         t                                                                    nt
                                          in                                                                        in                                                                    oi
                                    sjo                                                                       sjo                                                                     sj
                               di                                                                        di                                                                        di
                                      ic                                                                        ic                                                                         ic
                                    cl                                                                        cl                                                                     cl
                               cy                                                                        cy                                                                        cy

                                                                                                                                                                                                                       (bottom) input sizes.
200

       150

               100

                     50

                           0

                                                                          200

                                                                                 150

                                                                                         100

                                                                                               50

                                                                                                     0

                                                                                                                                                    200

                                                                                                                                                           150

                                                                                                                                                                   100

                                                                                                                                                                         50

                                                                                                                                                                               0
      Normalized GC time                                                        Normalized GC time                                                        Normalized GC time
skeletal programs do very little other than assertion check-     becomes just ∀x ∈ Order : x.field , specialValue. Since
ing, the total execution time overhead (not shown in Fig-        the ∀ iteration is over live objects and we expect our removed
ure 3) has a geometric mean of 1.07 for the small config-        objects to be unreachable, no live object should have the spe-
uration, 1.08 for the medium, and 1.13 for the large—i.e.,       cial value.
assertion checking causes slowdowns of just 7, 8, and 13%,          The bug is hard to detect because Order objects are
respectively.                                                    removed from the orderTable only at rare program in-
                                                                 stances. Specifically, the orderTable is occasionally com-
5.3   End-to-end Simple Assertion Cost
                                                                 pletely emptied and then destroyed. Since the programmer
We ran a large suite of end-to-end benchmarks to quantify        might not know how often this happens, we measured the
the overall overhead of our system. Our suite of applica-        cost of employing DeAL at high frequency: the assertion is
tions includes the DaCapo benchmarks, SPEC JVM98, and a          evaluated once every 5 times it is encountered. The high fre-
fixed-workload version of SPEC JBB2000 called pseudojbb.         quency of assertion checking causes much more frequent
Clearly, the challenge of evaluating overheads over a wide       GCs than normal program execution without DeAL: the
range of applications is that real assertions are application-   mean of the number of GCs (over 20 executions) rises from
specific. To overcome this difficulty, we added a “generic”      2.85 to 12.7, and the total GC cost is 3.37 times higher for the
assertion in our runtime that runs at the same frequency as      version employing DeAL. However, even this high GC over-
normally-triggered GC. This assertion checks that there are      head is hardly noticeable in terms of total program execution
no objects on the heap that are an instance of a certain un-     time. The slowdown is merely 6.1%: the assertion-checking
used class. Thus, the assertion should always return true, but   program runs in 3.87s vs. 3.65s for the original program run-
it will cause an instanceof predicate to be tested on every      ning on an unmodified VM.
live object. In addition, we allow this dummy assertion to be
combined with a traversal disjointness test, which incurs the
cost of also writing and checking a traversal ID in the header
of every live object.                                            6.   Related work
    Figure 4 shows the overhead of DeAL on GC time for           Our work is related to a wide range of techniques, both static
the benchmark applications. The figure shows three configu-      and dynamic, for checking properties of data structures.
rations: Base (unmodified Jikes RVM), Assert (instanceof             Research in static analysis has yielded a significant body
assertion), and AssertDisjoint (instanceof assertion + dis-      of sophisticated techniques for modeling the heap and an-
jointness check).                                                alyzing data structures at compile-time. The strength of
    Overall, the geometric mean of the slowdown in GC time       static analysis is that it explores all possible paths through
for Assert is 9.10% and for AssertDisjoint is 11.96%. These      the program: a sound analysis algorithm can prove the ab-
overheads are quite low and become vanishingly small when        sence of errors, or even verify the full correctness of a data
computed as a percentage of total execution time, instead        structure implementation. The weakness of static analysis is
of just GC time: the geometric mean of the total execution       that many properties, particularly those involving pointers
slowdown for both kinds of assertions is below 1.9%. (The        and heap-based structures, are undecidable in principle [15],
mean number of garbage collections for these benchmarks is       and extremely difficult to approximate precisely in prac-
20.3, with a standard deviation of 25.9.)                        tice [5, 12, 18, 24]. These problems are particular severe for
                                                                 languages such as Java, which include many hard-to-analyze
5.4   Representative Example                                     features such as reflection, bytecode rewriting, and dynamic
To evaluate the overhead of a real use of DeAL, we sim-          class loading.
ulated, with pessimistic assumptions, the process by which           Our work is most closely related to several recent tech-
DeAL would be used to detect a bug in practice. We picked        niques for dynamically analyzing heap data structures, in-
the most complex of the bugs previously identified in pseu-      cluding two that check properties at garbage collection time.
dojbb using heap assertions [1]. The bug consists of remov-      GC Assertions provides a set of heap assertions that, like
ing Order objects from the orderTable data structure. These      DeAL, are checked by the garbage collector at runtime [1].
orders should then become unreachable and be garbage             This work differs from DeAL in two critical ways. First,
collected. However, the objects are also erroneously refer-      it does not provide a full language for heap properties, but
enced from Customer objects, causing a memory leak. (The         rather a fixed set of assertions that cannot be composed. Sec-
bug fix consists of unlinking Order objects from Customers       ond, it does not check assertions at the time they are encoun-
when they are removed from the orderTable.) Our assertion        tered, but instead defers checking until the next regularly
checks whether the objects removed from orderTable can           scheduled garbage collection. While this choice improves
be reached. This can be effected with a simple variant of the    performance, it limits the kinds of heap properties to those
programming pattern of Section 4. Namely, once an object         that can be checked at any time, since they might, for exam-
is removed from the orderTable we mark it by assigning a         ple, be deferred to a point at which they are knowingly but
special value to one of its fields. The DeAL assertion then      temporarily invalidated.
200

                                                                                                                                                                           Base
                                                                                                                                                                           Assert
 Normalized GC time

                      150
                                                                                                                                                                           AssertDisjoint

                      100

                      50

                        0
                            da

                                  da

                                            da

                                                      da

                                                            da

                                                                      da

                                                                             da

                                                                                       da

                                                                                                  da

                                                                                                         da

                                                                                                                   da

                                                                                                                           da

                                                                                                                                da 200 lan

                                                                                                                                da 200 nde

                                                                                                                                pj

                                                                                                                                jv

                                                                                                                                jv

                                                                                                                                jv _20 mp

                                                                                                                                jv _20 ss

                                                                                                                                jv _20 ytra

                                                                                                                                jv _21

                                                                                                                                jv _22 vac

                                                                                                                                jv _22 peg

                                                                                                                                ge
                                                                                                                                   bb 09

                                                                                                                                   m

                                                                                                                                   m

                                                                                                                                   m

                                                                                                                                   m

                                                                                                                                   m

                                                                                                                                   m

                                                                                                                                   m

                                                                                                                                   m
                             ca

                                      ca

                                             ca

                                                       ca

                                                                ca

                                                                        ca

                                                                                 ca

                                                                                        ca

                                                                                                   ca

                                                                                                            ca

                                                                                                                      ca

                                                                                                                             ca

                                                                                                                                   ca

                                                                                                                                   ca

                                                                                                                                   om
                                                                                                                                     98 jbb

                                                                                                                                     98

                                                                                                                                     98

                                                                                                                                     98

                                                                                                                                     98 9_d ce

                                                                                                                                     98 3_ja

                                                                                                                                     98 2_m

                                                                                                                                     98 7_m aud
                                                                                                                                     20 .xa h
                                 po

                                       po

                                                 po

                                                           po

                                                                  po

                                                                           po

                                                                                  po

                                                                                            po

                                                                                                       po

                                                                                                              po

                                                                                                                        po

                                                                                                                                po

                                                                                                                                      po

                                                                                                                                      po

                                                                                                                                       ea
                                                                                                                                        ._

                                                                                                                                        .

                                                                                                                                        .

                                                                                                                                        .

                                                                                                                                        .

                                                                                                                                        .

                                                                                                                                        .

                                                                                                                                        ._
                                                                                                                                         0
                                  20

                                           20

                                                  20

                                                            20

                                                                     20

                                                                             20

                                                                                      20

                                                                                              20

                                                                                                        20

                                                                                                                 20

                                                                                                                           20

                                                                                                                                          2

                                                                                                                                           n
                                                                                                                                           0.

                                                                                                                                           20 000

                                                                                                                                           22 trt
                                                                                                                                            0
                                      06

                                             06

                                                      06

                                                                06

                                                                        06

                                                                                 06

                                                                                       06

                                                                                                  06

                                                                                                            06

                                                                                                                   06

                                                                                                                             06

                                                                                                                                              p

                                                                                                                                               1

                                                                                                                                               2

                                                                                                                                               5

                                                                                                                                               8_
                                                                                                                                               9.

                                                                                                                                               9.

                                                                                                                                                 _c

                                                                                                                                                 _j ress

                                                                                                                                                 _r
                                       .an

                                                 .b

                                                       .ch

                                                                  .ec

                                                                           .fo

                                                                                  .h

                                                                                            .jy

                                                                                                   .lu

                                                                                                             .lu

                                                                                                                      .p

                                                                                                                                .x

                                                                                                                                                  lu

                                                                                                                                                  l

                                                                                                                                                   ja
                                                                                                                                                    us
                                                 lo

                                                                                  sq

                                                                                                                        m

                                                                                                                                                    a

                                                                                                                                                    e

                                                                                                                                                     a
                                                                                                                                                     o
                                                                                                                                                     2

                                                                                                                                                     b
                                                                                             th

                                                                                                       in

                                                                                                                 se

                                                                                                                                                     i
                                                                             p
                                                                     lip
                                           tlr

                                                            ar

                                                                                                                                                      l

                                                                                                                                                      ck
                                                                                                                                                       e

                                                                                                                                                        an
                                                  at

                                                                                                                           d
                                                                                       ld

                                                                                              on

                                                                                                        de

                                                                                                                   ar

                                                                                                                                                         ar
                                                              t

                                                                        se

                                                                                       b

                                                                                                                     ch

                                                                                                                                                           c
                                                                                                          x

                                                                                                                                                            x

                                                                                                                                                              io
                                                      Figure 4. GC time overhead of our system for end-to-end benchmarks.

    The QVM system provides a richer set of heap checks,                                                                          pair broken data structures [6]. While the checking overhead
called heap probes, which QVM performs using garbage                                                                              is difficult to evaluate (since it is presented in milliseconds),
collector machinery at the point they are requested [3]. The                                                                      the paper describes it as “a rather large overhead.”
PHALANX system extends on QVM to add checks for
reachability and dominance [22] while parallelizing heap                                                                          7.   Conclusions and Future Work
probe execution. Both systems allow the checking of prop-                                                                         We have presented the DeAL language for assertions that are
erties that can require multiple heap traversals to check,                                                                        evaluated in a single traversal of live objects, implemented
significantly impacting performance. The authors explore                                                                          as a variation of a garbage collection traversal. DeAL is a
both sampling (at a user chosen rate) and parallelization to                                                                      rich language that can express many useful properties as
control the cost. We believe that DeAL represents a sweet                                                                         logical combinations of reachability conditions and local
spot between GC Assertions and QVM: unlike QVM, DeAL                                                                              comparisons. At the same time, the overhead of DeAL on a
controls overhead by guaranteeing that all properties can                                                                         program’s total execution time remains very low—typically
be checked in a single heap traversal, and by only check-                                                                         well below the 5% level. Our implementation of DeAL is
ing these properties at or near a regularly scheduled collec-                                                                     available for public download 5 .
tion; but unlike GC Assertions, DeAL offers a full asser-                                                                             Although the current language forms a nicely closed set
tion language and a clearer semantics regarding the timing                                                                        of features, there are clear directions of interesting future ex-
of checks.                                                                                                                        tensions. One possibility is to extend DeAL from an asser-
    The Ditto system reduces the cost of data structure check-                                                                    tion language to a general query language that can produce
ing by incrementalizing the checking code itself [19]. This                                                                       new data structures. The typical way to define a declarative
system provides a significantly different tradeoff than DeAL:                                                                     query language is by allowing unbound logical variables. For
it supports arbitrary checks written procedurally in Java it-                                                                     instance, one can imagine a query primitive, used with first-
self, but even with incrementalization incurs a significant                                                                       order logic formulas. The matching values for unbound vari-
overhead (in the range of 3X to 10X slowdown).                                                                                    ables become the tuples returned as the result of the query.
    A number of systems developed for data structure re-                                                                          For example, one might produce a new Java set containing
pair include a method for detecting data structure errors. Al-                                                                    all positive elements of a list by writing:
though these checking subsystems are not the focus of the
work, they highlight the difficulty of detecting such errors.
                                                                                                                                  Set posval =
The STARC system uses programmer-written repOk meth-
ods to detect errors, which incur a very substantial overhead                                                                            query( x.val, (x ∈ Node ∧ Rd(x)) ⇒ (x.val > 0));
(30X) when run frequently, and even a 20% overhead when                                                                              The first part, x.val, of the above query determines the
run only at the point of failure [8]. Demsky et al. present a                                                                     form of the result as a projection of the matched values,
technique based on Daikon to detect heap invariants and re-
                                                                                                                                  5 http://www.cs.umass.edu/
                                                                                                                                                              ˜creichen/gcassert/index.html
You can also read