A Generic Approach of Static Analysis for Detecting Runtime Errors in Java Programs

                                   Xiaoping Jia, Sotiris Skevoulis
                                  Division of Software Engineering
                          School of Computer Science, Telecommunication,
                                      and Information Systems
                                         DePaul University
                                      Chicago, Illinois, U.S.A.
                       E-mail: jia@cs.depaul.edu, sskevoul@cs.depaul.edu

Abstract

This paper presents a generic approach to statically analyzing Java programs in order to detect potential errors (bugs). We discuss a framework that supports our approach and carries out the static analysis of Java code automatically. Our approach can automatically detect potential bugs and report them before the program is executed. For a Java class, invariants related to the category of error under examination are automatically generated and used to assess the validity of variable usage in the implementation of this class. Our approach is distinctive in its emphasis on providing a practical generic mechanism for error detection that is capable of addressing a variety of error categories via a web of specialized components. A research prototype has been developed that demonstrates the feasibility and effectiveness of our approach.

1. Introduction

Detecting errors in programs has always been one of the most active areas of research [4, 1, 11, 7] in computer science. Numerous research efforts have resulted in techniques that allow us to reason about programs. Concerns about program correctness have led from Hoare triples [5], to Dijkstra's weakest precondition [1], and currently to complete verification environments such as Higher Order Logic (HOL) [2] and the Prototype Verification System (PVS) [11]. A large number of programming notations exist that use a variety of mechanisms to prevent potential errors from occurring or to notify the user in a safe manner if errors do occur. Examples of such mechanisms include strong typing, smart compilers, and runtime checking. Java [3] is an example of such a programming language, which is strongly typed with extensive and sophisticated runtime checking mechanisms.

However, even the most sophisticated compilers have a limited ability to detect potential errors. This is due to time, resource and performance constraints that limit the amount of checking that compilers can perform. More serious problems that can cause runtime exceptions, such as attempts to dereference null pointers or to use out-of-bounds array indices, cannot yet be caught by any compiler. Most of these kinds of errors are usually caught at runtime.

We approach the problem of potential error detection using formal methods. We provide a partial solution and a generic mechanism capable of detecting a variety of runtime errors. We apply static analysis and verification techniques to automatically analyse Java programs and detect potential bugs that cannot be detected by compilers and other data flow-based analysis techniques. Our approach is based on the use of the logical concepts of class invariant and weakest precondition [4]. We developed a generic and deterministic detection mechanism capable of addressing the problem of error detection for a variety of runtime errors. The detection mechanism comprises a set of generic and specialized algorithms to carry out analysis of Java code automatically. It consists of the following components:

    Generic analysis. It can be used to provide the framework for any kind of analysis that we need.
    Specialized analysis. It specializes the behavior of some of the steps defined in the generic mechanism.
    Prototype. A fully functional prototype implements the algorithms and provides feedback that is used to enhance and strengthen our approach.

Our approach uses a generic mechanism for carrying out the static analysis of Java programs and provides a number of specialized components which take into account the idiosyncratic details of each category of errors that we are trying to detect.

The detection mechanism of our approach has certain limitations, mainly due to the intractability of the theorem proving process. We cannot guarantee absolute success in finding all bugs, even for the restricted types of analysis discussed earlier. Our goal is not to find all errors but rather to find the majority of them in a way that is fully automatic and transparent to the developer. The key characteristics and contributions of our approach can be summarized as follows:

    Generic detection mechanism. Our approach is both specific enough to detect errors deterministically and generic enough to provide an efficient and effective framework for an extensive variety of error categories.
    Complete automation. The entire process of analyzing the source code and reporting the potential violations is fully automatic.
    No need for formal specifications. We do not require formal specifications. Indeed, we recover the relevant specifications from the source code.
    Flexibility. Our approach can be easily extended to incorporate any user-provided specifications, which will strengthen the capability of the analysis component and extend the set of potential errors that it can effectively check for.

In the following section we discuss the generic framework and present the algorithms. Section three presents an overview of the prototype tool and its components. Section four discusses the experiments that have been carried out using the prototype tool. Related work and comparisons with our approach are presented in section five. We discuss some future work in section six and conclude in section seven with a summary of the main findings and results.

2. The Analysis Framework

An important concept in our approach is the invariant of a class. A class invariant is a condition that is satisfied by all non-transient states of the instances of the class. A generic algorithm which can determine an invariant has been developed. It uses a number of generic and specialized components in order to determine invariants related to the specific analysis that is performed. The invariant determination process takes into consideration the different initialization semantics for static and instance variables, i.e. static variables are initialized during the first active use [6] of the type (class). The generic algorithms are listed below:

    DetermineInvariant. This algorithm is responsible for generating and verifying a class invariant.
    BreakDownCandInv. It splits the candidate invariant into predicates involving static variables and predicates involving instance variables.
    IsInvariant. It checks whether a predicate regarding class level variables is an invariant.
    CalculateWP (CWP). Given a statement and a predicate, it calculates the weakest precondition.
    Prove. It is a call to the theorem prover that attempts to discharge proof obligations.
    CheckViolation. It analyzes the implementation of the class and, based on the class invariant, checks for potential errors.

Each one of them uses a number of generic and specialized algorithms. Specialized analysis is achieved with a number of specific algorithms that carry out analysis for specific categories of bugs. Some of these algorithms are listed below:

    ConstructCandInv. It constructs a potential invariant based on the category of error that we check for.
    Mutate. The goal of the mutation algorithm is to provide an array of weaker predicates and store them in order of strength, starting from the strongest one (the predicate itself, unchanged) and going to weaker ones until the predicate cannot be mutated anymore and its weakest mutated form is the predicate true.
    CreateTargets. It creates the verification conditions that will ensure the safe use of variables in the Java code.

Determining a class invariant involves the construction of a candidate invariant, its mutation, and checking whether it satisfies the requirements to be an invariant.
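
The ordering produced by the mutation step can be illustrated with a minimal Java sketch. The paper does not specify how a predicate is weakened, so the sketch assumes, purely for illustration, that a candidate is a conjunction of atomic predicates and that each mutation drops the last conjunct. The class name MutateSketch, the string representation of predicates, and the weakening rule itself are hypothetical and are not taken from the prototype.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MutateSketch {

    /**
     * Returns the candidate and its weakenings, strongest first.
     * Assumption: a predicate is a conjunction of atoms, and a mutation
     * drops the last remaining conjunct. The empty conjunction is "true".
     */
    static List<String> mutate(List<String> conjuncts) {
        List<String> mutations = new ArrayList<>();
        for (int keep = conjuncts.size(); keep >= 0; keep--) {
            if (keep == 0) {
                mutations.add("true"); // weakest possible form
            } else {
                mutations.add(String.join(" && ", conjuncts.subList(0, keep)));
            }
        }
        return mutations;
    }

    public static void main(String[] args) {
        // Hypothetical candidate over two reference fields a and b.
        List<String> candidate = Arrays.asList("!isNull(a)", "!isNull(b)");
        for (String predicate : mutate(candidate)) {
            System.out.println(predicate);
        }
    }
}
```

Because the list is ordered from strongest to weakest, a caller that stops at the first provable entry keeps the strongest invariant available under this weakening rule.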
2.1. The DetermineInvariant Algorithm

We informally define the following concepts:

    Static Invariant is a condition regarding only static variables of the Java class that is ensured by all static initialization blocks and is preserved by all constructors and public methods.
    Instance Invariant is a condition regarding all class level variables of the Java class that is ensured by all constructors and is preserved by all public methods.

P Predicate DETERMINEINVARIANT(in JavaClass Class)

 1. /* Initialization */
    P Predicate Invariant := ∅
    P Predicate CandInstInv := ∅
    P Predicate CandStaticInv := ∅
    P Predicate InstanceInv := ∅
    P Predicate StaticInv := ∅
    P Predicate PermCandStaticInv := ∅
    P Predicate PermCandInstInv := ∅
    seq Predicate MutCandStaticInv := ⟨⟩
    seq Predicate MutCandInstInv := ⟨⟩
 2. /* Constructing candidate static and instance invariants */
    CandInstInv := CONSTRUCTCANDINV(Class, CandStaticInv)
    PermCandStaticInv := BREAKDOWNCANDINV(CandStaticInv)
    PermCandInstInv := BREAKDOWNCANDINV(CandInstInv)
 3. /* Determine invariant related to static variables */
    for each pred_st ∈ PermCandStaticInv do
      MutCandStaticInv := MUTATE(pred_st)
      for each mutpred in MutCandStaticInv do
        if ISSTATICINVARIANT(mutpred) then
          StaticInv := StaticInv ∪ {mutpred}
          break
        end if
      end for
    end for
 4. /* Determine invariant of instance variables */
    for each pred_inst ∈ PermCandInstInv do
      MutCandInstInv := MUTATE(pred_inst)
      for each mutpred in MutCandInstInv do
        if ISINVARIANT(mutpred) then
          InstanceInv := InstanceInv ∪ {mutpred}
          break
        end if
      end for
    end for
 5. /* The invariant found */
    Invariant := StaticInv ∪ InstanceInv
    return Invariant

The DetermineInvariant algorithm accepts as input a Java class and returns a predicate that is satisfied by all non-transient instances of this class. It reads in the class level variables, both static and non-static, and forms a candidate invariant. It breaks down the candidate invariant into two predicates: one that expresses a condition about static variables and one about all the variables. The invariant is broken down into two sets in order to examine the properties that do not depend on the instantiation of the Java class separately from the ones that do. Each predicate is examined separately and, if it fulfills the requirements of an invariant, is added to the invariant of the class. The output of the algorithm is the invariant for the Java class under examination.

2.1.1. Determining the Invariant

We need to establish a condition (i.e. an invariant) that is true for the relevant variables every time a class is instantiated. The algorithm described below is used to determine invariants related to both static and instance variables.

boolean ISINVARIANT(in P Predicate Precond, in P Predicate Candidate)

    boolean progress := false
    for each c_i ∈ InstanceConstructionBlock do
      progress := PROVE(Precond ⇒ CWP(c_i, Candidate))
      if ¬progress then
        break
      end if
    end for
    if progress then
      for each m_j ∈ Methods do
        progress := PROVE(Candidate ⇒ CWP(m_j, Candidate))
        if ¬progress then
          break
        end if
      end for
    end if
    return progress

Due to the different instantiation semantics of static and instance variables, we actually run the algorithm with different inputs, and each time we consider different instantiation blocks. Specifically, for the static invariant determination the static blocks must ensure the potential invariant, while for the instance invariant we consider init blocks and constructors. The precondition that holds before every instantiation of a class is the StaticInvariant. In the case that the class has no static variables, the precondition is reduced to the predicate true. We use this as the Precond input in our algorithm along with the candidate invariant, Candidate. An invariant regarding instance variables has to be ensured by constructors and preserved by each public method.
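
The two proof obligations used by IsInvariant (constructors establish the candidate, public methods preserve it) can be made concrete with a small, self-contained Java sketch. It is a toy model under loudly stated assumptions, not the prototype: there is a single reference field v, only three predicates over it (true, v is not null, false), three straight-line statement kinds, and the theorem prover is replaced by a comparison of predicate strength. The names IsInvariantSketch, Pred, Stmt, prove and wp are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class IsInvariantSketch {

    // Predicates over one reference field v, declared from weakest to strongest.
    enum Pred { TRUE, NON_NULL, FALSE }

    // Toy statements: v = new ...; v = null; or a statement that leaves v alone.
    enum Stmt { ASSIGN_NEW, ASSIGN_NULL, SKIP }

    // Stand-in for Prove: in this three-element model, P => Q is valid
    // exactly when P is at least as strong as Q.
    static boolean prove(Pred p, Pred q) {
        return p.ordinal() >= q.ordinal();
    }

    // Weakest precondition of a single statement.
    static Pred wp(Stmt s, Pred post) {
        if (post != Pred.NON_NULL) {
            return post; // TRUE and FALSE are unaffected by any statement
        }
        switch (s) {
            case ASSIGN_NEW:  return Pred.TRUE;   // v is non-null afterwards regardless
            case ASSIGN_NULL: return Pred.FALSE;  // v can never be non-null afterwards
            default:          return Pred.NON_NULL;
        }
    }

    // Weakest precondition of a straight-line block, computed back to front.
    static Pred wp(List<Stmt> block, Pred post) {
        Pred p = post;
        for (int i = block.size() - 1; i >= 0; i--) {
            p = wp(block.get(i), p);
        }
        return p;
    }

    // Mirrors the structure of the ISINVARIANT pseudocode.
    static boolean isInvariant(Pred precond, Pred candidate,
                               List<List<Stmt>> constructors,
                               List<List<Stmt>> methods) {
        boolean progress = false;
        for (List<Stmt> c : constructors) {
            progress = prove(precond, wp(c, candidate));
            if (!progress) break;
        }
        if (progress) {
            for (List<Stmt> m : methods) {
                progress = prove(candidate, wp(m, candidate));
                if (!progress) break;
            }
        }
        return progress;
    }

    public static void main(String[] args) {
        List<List<Stmt>> ctors = Arrays.asList(Arrays.asList(Stmt.ASSIGN_NEW));
        List<List<Stmt>> safe = Arrays.asList(Arrays.asList(Stmt.SKIP));
        List<List<Stmt>> unsafe = Arrays.asList(Arrays.asList(Stmt.ASSIGN_NULL));
        System.out.println(isInvariant(Pred.TRUE, Pred.NON_NULL, ctors, safe));
        System.out.println(isInvariant(Pred.TRUE, Pred.NON_NULL, ctors, unsafe));
    }
}
```

In this model the candidate "v is not null" is accepted when the only constructor assigns a fresh object and no method resets the field, and rejected as soon as one method assigns null, since that method does not preserve the candidate.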
2.2. The Check Violation Algorithms

The algorithms presented in the previous section determine an invariant with regard to a specific property that we try to ensure. We can now check the usage of each relevant variable to detect any possible violations.

void aMethod(...) {
    {Invariant}        // the pre-condition
    // ...             // other statements
    {¬isNull(v)}       // the assertion we attempt to prove
    v.m(...)           // the dereference
    // ...             // other statements
}

In order to do that, we provide an algorithm that scans through the implementation of the class, locates all the relevant variables and forms the appropriate verification conditions.

P StatementBugs CHECKCLASSVIOLATION(in JavaClass Class, in P Predicate Invariant)

 1. /* Initialization */
    P StatementBugs StatementsWithBugs := ∅
 2. /* Check Static Block for potential bugs. Precondition is: true */
    StatementsWithBugs := StatementsWithBugs ∪ CHECKVIOLATION(StaticBlock, true)
 3. /* Check Constructors for potential bugs. Precondition is: StaticInvariant */
    for each c_j ∈ InstanceConstructionBlock do
      StatementsWithBugs := StatementsWithBugs ∪ CHECKVIOLATION(c_j, StaticInvariant)
    end for
 4. /* Check public Methods for potential bugs. Precondition is: Invariant */
    for each m_j ∈ Methods do
      StatementsWithBugs := StatementsWithBugs ∪ CHECKVIOLATION(m_j, Invariant)
    end for
 5. /* Return the statements with potential violations */
    return StatementsWithBugs

In order to identify potential errors, we need to carefully check every statement in the implementation of the class, locate the points where these variables are used, and form the appropriate predicate that needs to be true in order for the variables to be safely used by the program at that point of execution. This is shown in a general form below:

P StatementBugs CHECKVIOLATION(in JavaBlockCode Block, in P Predicate Precondition)

 1. /* Initialization */
    seq Statement ProcessedStatements := ⟨⟩
    boolean condition := true
    boolean correct := false
    P Predicate Targets := ∅
    P Variable ProblematicVars := ∅
 2. /* Check Java block of code for bugs with a given precondition */
    for each statement s_i ∈ Block do
      ProcessedStatements := ProcessedStatements ⌢ ⟨s_i⟩
      Targets := CREATETARGETS(s_i)
      if Targets ≠ ∅ then
        for each pred ∈ Targets do
          correct := PROVE(Precondition ⇒ CWP(ProcessedStatements, pred))
          if ¬correct then
            ProblematicVars := ProblematicVars ∪ {var}
          end if
        end for
        StatementsWithBugs := StatementsWithBugs ∪ {s_i → ProblematicVars}
      end if
    end for
 3. /* Return potential bugs found in the block of code */
    return StatementsWithBugs

3. Prototype Development

The main components of the prototype are listed below:

    Model constructor. The modeling activity depends on an extensive set of classes that provide support for every Java programming construct, i.e. class, method, block of code, statements, types, etc. An abstract syntax tree is created and the control flow graph of the program is constructed. Based on the control flow graph model, all the paths of execution are identified.
    Invariant generator. It uses the generic and customizable algorithms to derive an invariant regarding the specific property that is under investigation.
    Violation detector. Given the invariant from the invariant generator, it scans through the class implementation to form the appropriate verification conditions, and passes them to the prover.

In general, theorem proving that involves unrestricted predicates is intractable. Our approach has therefore focused on the use of restricted predicates. Examples of such restricted predicates are the following:

    Checking for illegal dereference of a variable of reference type myvar. We can formulate the predicate ¬isNull(myvar).
    Checking for an array index falling out of bounds while accessing the ith element myarray[i]. We formulate the predicate i < myarray.length.
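
As a rough illustration of how a restricted predicate such as ¬isNull(v) is checked at each use, the following self-contained Java sketch applies the CheckViolation idea to a toy block. Everything in it is an assumption made for the example: one reference variable v, three statement kinds, three predicates (true, v is not null, false), and a strength comparison standing in for the theorem prover. The names CheckViolationSketch, Pred, Stmt, prove and wp are hypothetical, and the result is a list of statement positions rather than the statement-to-variables mapping used in the paper.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CheckViolationSketch {

    // Predicates over one reference variable v, declared from weakest to strongest.
    enum Pred { TRUE, NON_NULL, FALSE }

    // Toy statements: v = new ...; v = null; v.m(...).
    enum Stmt { ASSIGN_NEW, ASSIGN_NULL, DEREF }

    // Stand-in for Prove: P => Q holds when P is at least as strong as Q.
    static boolean prove(Pred p, Pred q) {
        return p.ordinal() >= q.ordinal();
    }

    // Weakest precondition of a straight-line statement sequence.
    static Pred wp(List<Stmt> stmts, Pred post) {
        Pred p = post;
        for (int i = stmts.size() - 1; i >= 0 && p == Pred.NON_NULL; i--) {
            Stmt s = stmts.get(i);
            if (s == Stmt.ASSIGN_NEW) {
                p = Pred.TRUE;
            } else if (s == Stmt.ASSIGN_NULL) {
                p = Pred.FALSE;
            }
            // DEREF does not change v, so the predicate is carried over as is.
        }
        return p;
    }

    // Returns the positions of dereferences that cannot be proven safe.
    static List<Integer> checkViolation(List<Stmt> block, Pred precondition) {
        List<Integer> statementsWithBugs = new ArrayList<>();
        for (int i = 0; i < block.size(); i++) {
            if (block.get(i) != Stmt.DEREF) {
                continue; // no verification target for this statement
            }
            Pred target = Pred.NON_NULL; // the target predicate for v.m(...)
            List<Stmt> processed = block.subList(0, i + 1);
            if (!prove(precondition, wp(processed, target))) {
                statementsWithBugs.add(i);
            }
        }
        return statementsWithBugs;
    }

    public static void main(String[] args) {
        List<Stmt> block = Arrays.asList(
                Stmt.DEREF, Stmt.ASSIGN_NULL, Stmt.DEREF, Stmt.ASSIGN_NEW, Stmt.DEREF);
        System.out.println(checkViolation(block, Pred.NON_NULL));
        System.out.println(checkViolation(block, Pred.TRUE));
    }
}
```

With the invariant "v is not null" as precondition, only the dereference that follows the null assignment is reported. With the weaker precondition true, the first dereference is reported as well, which shows why a useful class invariant reduces the number of flagged statements.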
4. Experiments

We assess the capabilities of our approach by conducting a number of small experiments. The results are encouraging and show that the prototype can serve as a vehicle for further research in the area. Our first goal was to demonstrate the feasibility of the approach. Small scale projects have been conducted and the results have been evaluated. Our research effort is experimental in nature. Its success can be judged through experiments rather than theoretical proofs and analyses. Experimentation is used as a feedback mechanism for our theoretical studies and solutions. The goal is to gain and demonstrate effectiveness through experimentation. The experimental prototype allows us to gain insight and to discover obstacles and limitations of our techniques. We use the feedback given by the tool to refine and enhance our approach. We conducted a large number of tests and the prototype tool exhibited the following key characteristics:

    Complete Java coverage. We were able to cover the entire Java language. No syntactic or semantic restriction on the original language has been imposed.
    No guarantee of program correctness. Our approach does not promise absolute correctness or absence of errors. Rather, it strives to identify certain types of potential errors.
    Application of formal methods. Our approach promotes the use of formal methods in a way completely transparent to the user. We can actually apply formal methods with little cost but concrete gains.
    Concrete gains. The prototype tool under development provides the means for measuring the effectiveness and capability of our approach. Early experiments indicate that:
         Our approach does not introduce any radical changes in the way practitioners develop their code.
         It removes the burden of proof from the software practitioners.
         The analysis is fully automatic.

We have developed a classification scheme for the different causes of errors in both kinds of specialized analysis discussed in this paper: null pointer and array bounds. For each kind of anomaly we analyzed and enumerated the causes and the scenarios under which a violation will occur in the program. Some of those scenarios are shown in Fig. 1. Prior to any testing, all test cases were successfully compiled to ensure syntactic correctness of the code. For each of the classified cases we performed a complete set of permutations under different scenarios to measure the effectiveness of the approach. Our prototype was capable of finding over 80% of the errors in each scenario. In summary, the early experiments are encouraging. The prototype is still under development and more extensive case studies are planned.

    Error          Scenario
    null pointer   class level var not initialized
    array bounds   array initialized with size, where size ≤ 0
    null pointer   class level var becomes null at arbitrary
    array bounds   array created and accessed in local scope
    null pointer   var is dereferenced under two condition blocks
    array bounds   class level array initialized with int constant:
                     a) remains constant
                     b) may change in various constructors
    null pointer   null var dereferenced in an unexecutable path
    array bounds   array access guarded by arr.length
    null pointer   var initialized but randomly becomes null
    null pointer   var becomes null before end of loop iteration
    null pointer   var not initialized but dereference is guarded

    Figure 1: Experimental Scenarios

5. Related Work

Flow control based techniques have been used successfully as part of modern compilers, such as those for Java, to provide early detection of various anomalies and to minimize the number of inconsistencies in programs. Unfortunately, these techniques are not successful in dealing with invariant properties and pre/post conditions, and cannot prevent a number of tedious bugs from occurring. Such bugs include dereferencing a null pointer or an array index falling out of bounds. These kinds of problems are beyond the capabilities of compilers and are consequently transferred to the runtime environment, where some languages offer exception handling mechanisms.

More recently, research work at DEC laboratories has resulted in Extended Static Checking (ESC) [8, 10] and checking object invariants [9]. They attempt to use formal methods to identify particular kinds of bugs in a programming language and also provide some kind of feedback to the programmer about those potential bugs. ESC translates programs into an abstract form based on Dijkstra's guarded commands. Their logical framework is untyped first order predicate calculus, and the lack of type information for the proofs is covered by the background predicate [9], which includes type axioms. An unsuccessful proof returns information about the reason for its failure.
The work presented in this paper is a generalisation and extension of the work [12] on null pointer detection, which was carried out by members of the Formal Methods group at the Software Engineering Division at DePaul University.

6. Future Work

The work presented in this paper lays out the foundation of a comprehensive analysis of Java classes. Our work will evolve in three main directions in order to provide:

    Stronger analysis techniques which will cover more complicated cases within the already defined categories of potential bugs.
    An extensive array of algorithms capable of handling different cases.
    Broader coverage of the kinds of errors that can be automatically checked. These include but are not limited to:
       - illegal downcasting
       - string index out of bounds

7. Conclusion

Usually, checking for errors is not one of the first priorities of a software developer. Sometimes errors even escape detection during testing, and products are released to customers with a number of bugs hidden in the code. Our approach provides a solution to this problem by focusing on the detection of certain kinds of bugs. The approach and algorithms described in this paper deal with the concept of class invariant and, based on that, check the implementation of the class. The implication is that for an instance created by any public constructor of the class, any public (thread-safe) method can be invoked in any order. In our work, we address some of the obstacles that prevent formal methods from being widely used in the software industry. We make the use of formal methods and theorem proving completely transparent to the software practitioner. By sacrificing the guarantee of absolute correctness and absence of errors, we provide a mechanism to detect an arguably significant number of common errors that are part of the daily debugging routine of every software practitioner.

Our approach focused on certain types of anomalies in the source code that can be detected automatically. The key feature which makes our approach practical is its complete automation.

References

[1] E. Dijkstra. Guarded commands, nondeterminacy and formal derivation of program. Communications of ACM, 18(8):453–458, 1975.
[2] M. Gordon and T. Melham. An Introduction to HOL: A theorem proving environment for higher order logic, 1993.
[3] J. Gosling, B. Joy, and G. Steele. The Java™ Language Specification. Addison-Wesley, 1996.
[4] D. Gries. The Science of Programming. Springer-Verlag, 1981.
[5] A. Hoare. An axiomatic basis for computer programming. Communications of ACM, 12(10):576–580, 1969.
[6] B. J. Gosling and G. Steele. Java specification language. Technical report, 1996. Available from http://www.javasoft.com.
[7] D. Jackson. A Formal Specification Language for Detecting Bugs, 1992. PhD Thesis, Massachusetts Institute of Technology.
[8] D. L. Detlefs. An overview of the extended static checking. In Proc. The First Workshop on Formal Methods in Software Practice, pages 1–9, 1996. ACM-SIGSOFT.
[9] K. R. M. Leino. Ecstatic: An object-oriented programming language with axiomatic semantics. In Proc. FOOL4, 1997. Fourth International Workshop on Foundations of Object-Oriented Languages.
[10] K. R. M. Leino and R. Stata. Checking object invariants. Technical report, Digital Equipment Corporation Research Center, 1997. Palo Alto, CA.
[11] N. S. S. Owre, J. Rushby. PVS: A prototype verification system. Lecture Notes in Artificial Intelligence, 607:748–752, 1992.
[12] S. Sawant. Applying static analysis for detecting null pointers in Java programs. Technical report, Oct. 1998. MS Thesis.