

              Lock-and-key Strategies for Handling
                      Undefined Variables

                         richard b. borie and allen s. parrish
       Department of Computer Science, University of Alabama, Box 870290, Tuscaloosa, AL
                                      35487-0290, U.S.A.
                                       srinivas mandyam
                 BusLogic, Inc., 4151 Burton Drive, Santa Clara, CA 95054, U.S.A.

Most programming languages ignore the problem of undefined variables and permit compilers to
allow the leftover contents of the memory cells belonging to such variables to be referenced.
Although efficient, this type of semantics does not support software engineering, because defects
might arise in subtle ways and might be difficult to locate. Two other types of semantics that
better support software engineering are to either initialize all variables to some default, e.g. zero,
or to require that all references to undefined variables be treated as errors. However, these types
of semantics are obviously more expensive than simply ignoring the undefined variables entirely.
   In this paper, we propose a simple technique that works equally well for both of these latter
two types of semantics, and whose efficiency compares favorably for certain realistic programs
with more traditional implementations of these semantics. Furthermore, we provide a mechanism
for using this technique through Ada implementations of two abstract data types where undefined
variables respectively exhibit these two types of semantics, and whose implementations of these
semantics use our technique. These abstract data types allow our technique to be selectively used
in strictly those situations where the cost of the technique is justified. We provide practical
examples illustrating these situations.
key words: Undefined variables   Programming languages   Software engineering   Algorithms
Language design   Sparse arrays

The method of handling variables that are referenced before being defined represents
one of many decisions that have to be made by programming language designers.
At the semantic level, there are three approaches:1
  I1. Allow values of undefined variables to be referenced. This means that bit
      patterns that happen to be ‘leftover’ in memory from previous executions
      could be used by the program.
  I2. Initialize all variables to default values, e.g. zero for integers, blank for
      characters, etc.
  I3. Define rules prohibiting references to undefined variables.

0038–0644/93/070693–18$14.00                                            Received 10 August 1992
© 1993 by John Wiley & Sons, Ltd.                                       Revised 5 January 1993
   I1 effectively means that the whole issue is ignored, and undefined variables are
treated just like any other variable. Among these different semantics, I1 can be
‘implemented’ most efficiently, since no implementation is actually required. Conse-
quently, I1 is the most frequently adopted semantics by language designers.1,2
However, I1 is generally acknowledged to be the most problematic semantics in
terms of software engineering.1–5 For example, both Reference 2 and Reference 3
observe that I1 can hide defects during testing. Consider a program which contains
a counter that was not properly initialized to zero at the beginning of the program.
If the counter happens to initially contain a zero coincidentally during the testing
process, testing might not reveal any defects. But if the program then later executes
in an environment where the counter is coincidentally initialized to something other
than zero, a defect might then appear.
   This example seems to offer some support for I2 or I3, despite additional
implementation costs. Between these two approaches, there is some disagreement as
to which one better supports software engineering. Some authors, e.g. Meyer1 and
Weide, Ogden and Zweben4, argue in favor of I2, because it simplifies the program-
mer’s tasks if the initialization values are well-chosen. On the other hand, Reference
2 argues in favor of I3, because despite the use of well-chosen initialization values,
there might be some cases where a default initialization value was not really intended
to be used. In such cases, the programmer might have made a mistake in either
failing to properly assign the variable, or in referencing a variable that should not
have been referenced. As an example from Reference 2, consider a program that is
supposed to find the minimum value in an array (1..N) of natural numbers that has
been filled by the programmer in positions (1..i), where i is allowed to be strictly
less than N. If the program is ‘off-by-one’ and accidentally looks at position i+1 in
searching for the minimum, and I2 semantics have been used to initialize the array
to zeros, then the program could conceivably indicate that the 0 in the (i+1)th
position is the minimum. Yet the (i+1)th position is irrelevant, because it is not part
of the data set. Depending on the context, this error might be difficult to find during
testing and debugging.
   Regardless of whether I2 or I3 better supports software engineering, it is clear
that both offer more support than I1, yet both are more expensive than I1. Conse-
quently, reducing the cost of implementations of these semantics seems to be a
desirable goal. In this paper, we propose a simple technique that works equally well
for both of these latter two types of semantics, and whose efficiency compares
favorably for certain realistic programs with more traditional implementations of
these semantics. We offer an analysis of our approach that concretely identifies the
types of programs for which the approach outperforms existing implementations.
Furthermore, we provide a mechanism for using this technique through Ada
implementations of two abstract data types where undefined variables respectively
exhibit these two types of semantics, and whose implementations of these semantics
use our technique. These abstract data types allow the programmer to selectively
use our technique in strictly those situations where the cost of the technique is
justified. We provide practical examples illustrating these situations. Although there
are advantages in building our technique directly into compilers, our use of abstract
data types has the advantage of permitting our technique to be used with existing
compilers.
                           EXISTING IMPLEMENTATIONS
In this section, we consider existing, typical implementations of I2 and I3. Our
interest is in informally analyzing the run-time requirements of these implementations
beyond what is already required by normal program execution, i.e. with I1 semantics.
To make our analysis as widely applicable as possible, we assume a typical machine
architecture where no special provisions have been made for efficiently implementing
these semantics. We also conduct our analysis in the context of local procedure
variables in a stack-based language such as Pascal. Thus, although the load image
for a program may include a static memory map allowing global variables to be
initialized without run-time overhead, our discussion is in the context of local
procedure variables, for which run-time initialization overhead is necessary.
   For simplicity, we define a variable to be any scalar variable of simple type, such
as integer, character, real or boolean, or any elementary component of a structure,
such as a single integer element from an array of integers. Also, we use the term
program to mean any program unit, such as a procedure or block, for which variables
can be separately allocated at initiation and destroyed at termination. We then refer
to the collection of all variables in a program as the program’s variable space. A
variable definition is a statement that assigns a value to the variable, i.e. through
assignment or user input. A variable reference is a statement that refers to an
existing value of the variable, i.e. through the right side of an assignment statement
or through a predicate.
   Throughout this work, it is important to classify variable references as follows:
  1. Variable references for which the compiler can determine whether or not the
     variable has been defined. These variable references fall into two subcategories:
        (a) Variable references for which there is no path from the beginning of the
            program to the reference that contains a definition of the variable.
        (b) Variable references for which every path from the beginning of the
            program to the reference contains a definition of the variable.
  2. Variable references for which the compiler cannot determine whether or not the
     variable has been defined. These variable references fall into two subcategories:
        (a) Variable references for which some, but not all paths from the beginning
            of the program to the reference contain definitions of the variable.
        (b) References where the memory location is based entirely on run time
            events. This includes most references to array elements that have variable
            subscripts, and to dereferenced pointers.
   The compiler is unable to determine whether references in 2(a) are guaranteed to
be defined, because it cannot determine whether the paths containing definitions are
executable. A similar determination is impossible for 2(b) because the compiler
cannot determine the value of the subscript variable. We note that because of 2(b),
variable references falling into categories 1(a) and 1(b) are usually to locations where
the relative address can be computed at compile time, such as normal scalar variables.
   We first consider I3 semantics, which makes references to undefined variables
illegal. Because static analysis can detect references falling into category 1, such
references are either guaranteed errors (category 1(a)), and can be flagged at compile time with
a fatal error, or are guaranteed not to be errors (category 1(b)), and can safely be ignored
at run-time. However, since the compiler cannot determine whether variable references
in category 2 have been defined, run-time analysis is necessary for variables in this
category. Also, in the absence of static analysis to identify variables falling into the
first category, such run-time analysis must be performed with respect to all variables.
   There are several implementations of run-time analysis for I3.2 Two representative
implementations are as follows:

  1. Undefined values method. There is an ‘undefined’ value for every type, that is
     outside the set of legal values for the type. The compiler must generate code
     to initialize every variable to the undefined value for its type. This code is
     executed at program start-up time. Then, every time a variable is referenced,
     if the variable contains the undefined value for its type, an error message is
     generated. This requires time at start-up to perform the initialization, proportional
     to the number of variables with references in category 2, and constant time
     for every variable reference in category 2.
  2. Extra bit method. For every variable, an extra ‘defined’ bit denotes whether or
     not the variable has been assigned a value. All of the bits must be set to
     ‘undefined’ initially, requiring time at start-up to perform the initialization
     proportional to the number of variables with references in category 2. Then
     constant time is required for every variable reference in category 2 to check
     the assigned bit, and also for every variable definition with a subsequent
     reference in category 2 to check and/or update this bit. If the assigned bit still
     has its ‘undefined’ value when the corresponding variable is referenced, an
     error message is generated.

   Other existing implementations of the run-time analysis component of I3 possess
the same general form. Essentially, for any of these implementations, an initialization
must first be performed, requiring time proportional to the number of variables with
references in category 2, to set every variable to an undefined state. Then when the
user code is executed, every variable reference must be checked to see whether or
not the variable is in its undefined state. If a variable is referenced while in its
undefined state, an error message is generated.
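As a concrete illustration, the extra-bit method can be sketched as follows. The sketch is in Python for brevity (the paper's setting is compiler-generated code operating on raw memory), and the class name and interface are our own:

```python
class ExtraBitStore:
    """Extra-bit method for I3 semantics: one 'defined' bit per variable.

    All bits are cleared at start-up, which is the O(V) initialization
    cost that the lock-and-key technique later eliminates."""

    def __init__(self, size):
        self.value = [0] * size          # the underlying memory cells
        self.defined = [False] * size    # O(V) start-up work

    def define(self, v, x):
        self.value[v] = x
        self.defined[v] = True           # constant time per definition

    def reference(self, v):
        if not self.defined[v]:          # constant time per reference
            raise RuntimeError("variable %d referenced before definition" % v)
        return self.value[v]
```

Under I2 semantics the same structure applies, except that reference would return the type's default value instead of raising an error.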
   To implement I2 semantics, the compiler must generate additional code to initialize
some or all program variables. In the absence of static analysis, all program variables
must be initialized. If static analysis is performed to categorize variable references
as above, then an optimization is possible: it is not necessary to generate initialization
code for variables where every reference falls into category 1(b), since such variables
are always guaranteed to be initialized. Of course, both of these approaches require
time proportional to the number of variables being initialized.
   An alternative approach to implementing I2 involves generating code to initialize
only those variables having references in category 1(a). As pointed out already, it
is unnecessary to generate initialization code for variables whose only references are
in category 1(b). Rather than initializing variables having references in category 2,
one of the strategies for detecting undefined variables at run-time could be employed.
However, instead of generating an error, a default value for the type could be
returned. This approach takes initialization time proportional to the number of
variables having references in category 1(a), plus the time required to detect undefined
variables in category 2.
   All of these implementations of I2 and I3 share a potentially non-trivial initializ-
ation cost at program start-up time. Our proposed implementations of I2 and I3
discussed in the next section eliminate this start-up cost. Program start-up can be
done in constant time with either semantics. However, with either semantics, our
algorithm requires constant time for most variable definitions and references. We
show that despite these additional time requirements, the start-up savings are still
worth the additional cost for certain programs.

                   NEW IMPLEMENTATIONS FOR I2 AND I3

Our proposed implementations of I2 and I3 both use the same algorithm to detect
undefined variables. For I2, a reference to an undefined variable simply returns the
default value for variables of that type; for I3, an error is generated whenever an
undefined variable is referenced. Our algorithm is very similar to algorithms that have
been developed to detect dangling pointers,6 to support constant time initialization and
finalization of arrays,7 and to support a version of dynamic type checking in C.8
   The basic idea behind our algorithm involves associating a ‘lock’ and a ‘key’
with every variable that possibly could be undefined when referenced, i.e. every
variable not eliminated by static analysis. If the lock and key do not match, then
the variable is assumed to be undefined, and the appropriate action is performed,
i.e. returning a default value for I2 or generating an error for I3. The lock associated
with a variable is set to match the key when the variable is defined, thus allowing
subsequent access to the variable’s actual value.
   Although our final implementation eliminates this possibility, we first consider a
simplified version of the algorithm that allows undefined variables to possibly go
undetected because of a coincidental match between a key and an undefined lock.
Specifically, the locks and keys are implemented as follows. The key for a variable
is simply its memory address; thus, the key can be computed from the variable
without requiring additional space. The lock is an extra memory cell associated with
the variable that is large enough to contain its address, i.e. in order to match its
key. The idea is that at the first time a variable is defined, the address of the
variable is assigned to its lock, making the lock and key match at this point. Any
subsequent references to the variable therefore return the actual value of the variable.
By the same token, if the lock and key do not match on a particular variable
reference, it can be assumed that the variable is undefined, and either an error
message is generated or the default value is returned.
   More specifically, we have the following steps.

  1. Whenever a variable is defined. First, check to see whether the lock matches
     the key. If so, then do nothing, as the variable has been defined previously. If
     not, then assign the lock the address of the variable, thus allowing the lock
     and key to match for subsequent references.
  2. Whenever a variable is referenced. First, check to see whether the lock matches
     the key. If so, permit the actual value of the variable to be referenced, as the
       variable has been defined previously. If not, then the variable is undefined. For
       I2, return the default value for variables of this type; for I3, generate an error.

                             Figure 1. Lock-and-key configurations
   Figure 1 demonstrates the lock-and-key configuration for both undefined and
defined variables using this simplified algorithm.
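This simplified scheme can be sketched in Python as follows. Here id() stands in for a variable's memory address and random bits simulate leftover memory contents; both are modelling assumptions of ours, not part of the paper's compiler-level presentation:

```python
import random

class Cell:
    """One variable under the simplified lock-and-key scheme.

    Key = the cell's address (modelled with id()); lock = an extra memory
    word next to the value. Neither is initialized at start-up: random
    bits simulate leftover memory contents."""

    def __init__(self):
        self.value = random.getrandbits(32)   # leftover 'garbage'
        self.lock = random.getrandbits(64)    # leftover 'garbage'

    @property
    def key(self):
        return id(self)                       # the address serves as the key

    def define(self, x):
        if self.lock != self.key:             # first definition of this cell
            self.lock = self.key              # make lock and key match
        self.value = x

    def reference(self):
        if self.lock != self.key:             # undefined, barring a fluke match
            raise RuntimeError("reference to undefined variable")  # I3 action
        return self.value
```

As the text goes on to note, a leftover lock can coincidentally equal the key, which is exactly the flaw the fence-based refinement removes.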
   As noted above, the problem with this algorithm is that there is nothing to
guarantee that a lock and key cannot initially match coincidentally, since there is
no a priori initialization. Since our objective is to achieve constant running time at
start-up, we cannot simply take the obvious approach of initializing all of the locks
to some type of undefined state; doing so would merely reduce the
implementation to a more complicated version of the extra-bit method. Instead, rather
than assigning a value directly to the lock as above, we make the lock for variable
v (v.Lock) a pointer into an array (LockValues) which contains the actual lock values.
The cells from LockValues are then uniquely allocated to various locks dynamically,
from contiguous locations in the array. A variable, fence, partitions LockValues into
its allocated and unallocated portions; fence is initialized to 0 to denote that no
locks have been allocated. Then, in order to conclude that variable v has been
defined, v.Lock must point at a value in the allocated portion of LockValues, and
LockValues[v.Lock] must match the key for v, i.e. its address, v.Key.
   Figure 2 shows the lock-and-key configuration given this approach, for an undefined

                               Figure 2. Undefined variable v

                             Figure 3. After the statement v:=25

variable v, and Figure 3 shows the lock-and-key configuration after v has been
defined. Unlike our simpler, but incorrect, version of the algorithm given previously,
this implementation requires non-zero start-up overhead, because fence must be
initialized to 0. However, obviously this overhead is a very small constant. Figure 4
illustrates the details of the algorithm as an implementation of I3. Figure 5 illustrates
the analogous implementation of I2. Note that, in either case, all three of the
operations (Initialize, Define and Reference) should be guaranteed to be performed
automatically by the language. The operation Initialize is performed automatically at
program start-up, whereas Define and Reference are performed at every variable
definition and reference, respectively.

                      Figure 4. Lock-and-key algorithm pseudocode for I3

                      Figure 5. Lock-and-key algorithm pseudocode for I2
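Since the pseudocode of Figures 4 and 5 is not reproduced here, the following Python sketch conveys the I3 version; the I2 version differs only in returning a default value instead of raising an error. Leftover memory is simulated with random bits, and only fence is initialized, so start-up is constant time. The class name and interface are our own:

```python
import random

class LockAndKeyStore:
    """Fence-based lock-and-key detection of undefined variables (I3).

    Variable v is considered defined iff lock[v] points into the allocated
    prefix of lock_values (below the fence) AND lock_values[lock[v]] == v,
    i.e. the lock, followed through the table, matches v's key."""

    def __init__(self, size):
        # Simulated leftover memory: none of these arrays is initialized.
        self.value = [random.getrandbits(16) for _ in range(size)]
        self.lock = [random.randrange(size) for _ in range(size)]
        self.lock_values = [random.randrange(size) for _ in range(size)]
        self.fence = 0                 # the only start-up work: O(1)

    def _is_defined(self, v):
        j = self.lock[v]
        return 0 <= j < self.fence and self.lock_values[j] == v

    def define(self, v, x):
        if not self._is_defined(v):
            # First definition: allocate the next lock_values cell to v.
            self.lock[v] = self.fence
            self.lock_values[self.fence] = v
            self.fence += 1
        self.value[v] = x

    def reference(self, v):
        if not self._is_defined(v):
            raise RuntimeError("variable %d referenced before definition" % v)
        return self.value[v]
```

No coincidental match is possible: a leftover lock[v] below the fence points at a lock_values cell that was allocated to some defined variable w, and that cell holds w, which equals v only if v itself was defined.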


We now compare the execution time of the lock-and-key algorithm with those of
existing implementations, for both I2 and I3. As discussed previously, it is possible
to use varying degrees of static analysis with either semantics. The degree to which
static analysis is performed affects the details of our execution time analysis. For
simplicity, we first conduct the execution time analysis as if there were no static
analysis. We later consider the impact of static analysis on our execution time analysis.
   We let V refer to the size of the variable space for a given program. For a given
program execution, we let D and R be the total numbers of variable definitions and
references, respectively. The comparison is summarized in Table I. Note that the
values provided in Table I refer to the additional time requirements of the I2 and
I3 strategies, beyond what is required for the normal execution of the program with
I1 semantics.
   It is fair to point out that although the O(V) term is eliminated from both I2 and
I3 by the lock-and-key approach, the O(D+R) term hides a relatively large constant,
e.g. larger than the analogous term in the extra-bit implementation of I3. If we call
this larger constant k, and the constant associated with the O(V) term in traditional
methods c, then the lock-and-key approach is superior to traditional implementations
of I2 and I3 whenever k(D+R) < cV. Regardless of the exact values of k and c,
however, we maintain that there are realistic situations where the average for D+R

                            Table I. Comparison of execution times

               Strategy       Traditional method      Traditional time     Lock-and-key time

               I2             Obvious                 O(V)                 O(D+R)
               I3             Undefined values        O(V+R)               O(D+R)
               I3             Extra bit               O(V+D+R)             O(D+R)
over all executions is small enough relative to V to satisfy this relationship. In
particular, a static array to hold variable-size input data, where the worst case number
of data elements is much larger than the average case, constitutes such a situation.
We elaborate on some specific applications involving this situation later in the paper.
   Adding static analysis does not substantially affect our execution time analysis.
In particular, suppose we interpret V, D and R in Table I to refer to their respective
quantities after static analysis has eliminated certain variables, definitions, and refer-
ences from consideration at run-time. Provided that I3 produces only warnings at
compile time for references in category 1(a), where no path preceding the reference
contains a definition, both I2 and I3 must deal with references in both categories
1(a) and 2 at run-time. So the analysis in Table I still holds.
   Alternatively, if I3 flags the references in category 1(a) as errors, which is certainly
possible and perhaps desirable, this would allow I3 to ignore such references at run-
time. On the other hand, I2 remains forced to deal with such references, since they
do not constitute errors under I2. This latter interpretation would thus make the V,
D and R quantities smaller for some programs under I3 than under I2. However,
for either I2 or I3, this still does not change our basic conclusion concerning the
circumstances under which the lock-and-key approach is more appropriate than
conventional methods.

                            IMPLEMENTATION IN ADA
A combination of software engineering and performance concerns might motivate
choosing different initialization semantics and different implementations of those
semantics for different programs. An ideal approach would provide a compiler switch
that permits the programmer to choose the appropriate option from Table I for the
program at hand. This would permit static analysis to be integrated with run-time
checking, possibly exempting certain variables from run-time checking. A second
approach would provide user-defined types that exhibit
the desired initialization semantics and performance characteristics. This would allow
various implementations of I2 and I3 to be introduced into the context of existing
compilers that only support I1 semantics, and would still allow the programmer to
select the desired semantics by choosing the appropriate type. Although perhaps not
as ideal as building our technique into the compiler, this approach of providing user-
defined types would at least permit the lock-and-key technique to be used in situations
where modifying the compiler is not an option.
   In this section, we illustrate the latter approach by providing user-defined
types whose initialization semantics are supported by our lock-and-key
algorithm. Because of its data abstraction capability, we choose Ada as the implemen-
tation language. Since the lock-and-key approach is most appropriate when the
number of variable references is small relative to the size of the variable space, we
feel that this approach has the most potential in the context of large structures that
are, on the average, only partially used. We have therefore chosen to demonstrate
this approach in the context of an array type, which we say is protected from
undefined references by the lock-and-key mechanism. In this context, a ‘key’ for a
variable, an array element, is just its index. Note that we could have chosen to
‘protect’ other types of structures in a similar fashion.
   We actually provide two types: an array type with I2 initialization semantics,
exported by the package in Figure 6, and an array type with I3 initialization
semantics, exported by the package in Figure 7. The package I2 requires generic
parameters for the type of array elements and for the default value for the type; the
array elements are then initialized to the default in constant time before variables
of the protected array type are used. The package I3 only requires the array element
type as a formal parameter; in this case, each reference to an array element that has

                          Figure 6. Implementation of I2 in Ada

                          Figure 7. Implementation of I3 in Ada

not been defined causes an exception to be raised. Each of these two packages
exports Define and Reference operations that must be invoked in the obvious fashion.
For example, Define(a,3,10) is invoked to assign the value 10 to element 3 in
protected array a. Reference(a,i) is invoked whenever it is desired to reference
element i in protected array a. Examples using these operations are given in the
next section.
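The packages themselves appear in Figures 6 and 7, which are not reproduced here. As a hedged illustration, a Python analog of the I2 package (element type and default supplied as parameters, constant-time initialization via the fence technique) might look like the following; the names and interface are ours, though the paper's packages export Define and Reference similarly:

```python
import random

class ProtectedArrayI2:
    """Python analog of the generic Ada I2 package: a protected array whose
    undefined elements read as a caller-supplied default. An element's index
    is its key; the fence keeps start-up cost constant."""

    def __init__(self, size, default):
        self.default = default
        self.value = [random.getrandbits(16) for _ in range(size)]  # leftover bits
        self.lock = [random.randrange(size) for _ in range(size)]
        self.lock_values = [random.randrange(size) for _ in range(size)]
        self.fence = 0                       # O(1) start-up

    def _is_defined(self, i):
        j = self.lock[i]
        return 0 <= j < self.fence and self.lock_values[j] == i

    def define(self, i, x):                  # analog of Define(a, i, x)
        if not self._is_defined(i):
            self.lock[i] = self.fence
            self.lock_values[self.fence] = i
            self.fence += 1
        self.value[i] = x

    def reference(self, i):                  # analog of Reference(a, i)
        return self.value[i] if self._is_defined(i) else self.default
```

For example, after a = ProtectedArrayI2(1000, 0) and a.define(3, 10), a.reference(3) yields 10 while a.reference(7) yields the default 0 without any element ever having been explicitly initialized.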
   Either of the packages I2 and I3 can easily be generalized to support protected
multidimensional arrays. Figure 8 represents a two-dimensional generalization of the
I3 package. The only feature that needs explanation is the allocation of storage for
the field LockValues in the record Protected2dimArray. Ada prohibits declaring the
size of a field to be an expression involving discriminants, such as Max1*Max2.9
However, using an access type as shown is permitted.

We first consider an example of a procedure that locates and prints the smallest
integer from the collection of integers stored in an array. The number of integers
that are actually input into the array is determined by the actual value of a parameter
called Count. We assume that Count is normally small relative to the size of the
(static) array, which has been defined to handle a worst-case scenario in terms of
data size. We note that even in a language like Ada that supports dynamic arrays,
defining a static array with substantial wasted space, which costs little, may still be
desirable to reduce execution time.
   As observed earlier, this example was used in Reference 2 to argue in favor of
I3. In particular, given the error that Count is one greater than the actual size of
the data n, I1 will lead to totally unpredictable results due to the existence of
random data in position n+1, whereas I2 may lead to the incorrect conclusion that
the 0 in the (n+1)th position is the minimum.
   Although our Ada compiler only supports I1 semantics, our Ada package I3 solves
this problem. Figure 9 contains an example of a minimum procedure where the I3
package is used to detect the error. By defining the array to be protected, an error
will be generated whenever the (n+1)th position is referenced. We note that whenever
the size of the array is substantially larger than Count, then the relationship
k(D+R) < cV is satisfied. The values chosen for the size of the array and for Count
in our example at least plausibly satisfy this relationship, making this appear to be
a case where the lock-and-key technique is more efficient than the conventional
implementation of I3. Even if this were not the case, however, the package I3 at
least allows us to replace the default I1 semantics of the Ada compiler with I3
semantics, in a situation where I3 semantics is most useful. Future work will involve
providing both conventional and lock-and-key Ada package implementations of both
I3 and I2, as well as more precise guidelines for choosing the most efficient
implementation for a particular situation.
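Since Figure 9 is not reproduced here, the following Python sketch conveys the idea. The ProtectedArray class is a minimal I3 stand-in (a dict replaces the fence machinery, which is beside the point for this client), and the minimum procedure with its erroneous Count is modelled on the paper's example:

```python
class ProtectedArray:
    """Minimal I3 stand-in: referencing an undefined element raises an error.
    (A dict replaces the constant-time fence machinery for brevity.)"""

    def __init__(self, size):
        self.size = size
        self._cells = {}

    def define(self, i, x):
        self._cells[i] = x

    def reference(self, i):
        if i not in self._cells:
            raise LookupError("element %d referenced before definition" % i)
        return self._cells[i]


def minimum(a, count):
    """Find the minimum of elements 1..count of protected array a."""
    m = a.reference(1)
    for i in range(2, count + 1):
        m = min(m, a.reference(i))
    return m
```

If positions 1..n are filled but Count is mistakenly n+1, the reference to position n+1 raises an error rather than silently returning leftover bits (I1) or a default zero (I2).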
   As an aside, I3 is actually no more expensive than I2 with our lock-and-key
algorithm. This is unlike the case with conventional approaches where I3 is more
expensive; whereas I2 involves initializing all values, I3 involves initializing all
values, or their extra bits, and checking them at run-time to determine whether they
are defined. Thus, in situations like the above when I3 is more desirable than I2 in
terms of software engineering concerns, no additional expense over I2 is required
to implement I3 using locks and keys. Therefore, in cases where k(D+R) < cV holds,
and the lock-and-key approaches are superior to conventional approaches, the decision
of whether to use I2 or I3 can be motivated entirely by software engineering
concerns, and not by efficiency.
   Aside from its ability to perform more efficient error detection associated with
unintended uses of undefined variables, the lock-and-key technique provides a means

Figure 8. I3 generalization of a two-dimensional protected array

                           Figure 9. Ada client: off-by-one error

in general for using an array of size n without requiring O(n) initialization time.
For certain problems, it is therefore possible to develop a more efficient algorithm
with our protected array abstraction than with either an unprotected array or linked
list data structure. In the remainder of this section, we briefly consider two such
problems: maintaining a sparse array and evaluating a recurrence formula.
   A sparse array is an array in which most of the values are zero. Figure 10 gives
an Ada implementation in which at most M of the values in an N-element array are
accessed. This Ada implementation using the lock-and-key technique to initialize all
array elements to a default of 0 requires only O(M) execution time. Traditional
implementations, such as explicitly initializing all N elements, or using the extra bit
or undefined values methods to trap undefined accesses, require O(N+M) time.
Another implementation, using a linked list data structure that contains all (index,
value) pairs whose value co-ordinate is non-zero, yields an O(M^2) time algorithm.

                       Figure 10. Ada client: maintaining a sparse array

Therefore when M is substantially smaller than N, as can be the case for a sparse
array, the lock-and-key technique gives the most efficient implementation.
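   The scheme of Figure 10 can be sketched as follows (again in Python rather than
Ada, with illustrative names): undefined elements silently read as the default 0,
giving I2 semantics. As before, only the counter is initialized; in a language with
raw allocation the three arrays would simply be left uninitialized, so M accesses
cost O(M) regardless of N.

```python
# Illustrative I2-semantics sparse array: reading an element that was
# never stored yields the default value 0.  Names are illustrative,
# not taken from the paper's Ada packages.

class SparseArray:
    def __init__(self, n):
        self.data = [0] * n   # contents need not be trusted until defined
        self.key = [0] * n
        self.lock = [0] * n
        self.top = 0          # only this counter requires initialization

    def _defined(self, i):
        k = self.key[i]
        return 0 <= k < self.top and self.lock[k] == i

    def __getitem__(self, i):
        # I2 semantics: an undefined element reads as the default 0
        return self.data[i] if self._defined(i) else 0

    def __setitem__(self, i, v):
        if not self._defined(i):   # first definition: claim a lock slot
            self.key[i] = self.top
            self.lock[self.top] = i
            self.top += 1
        self.data[i] = v
```

For example, after s = SparseArray(1000000) and s[123456] = 5, the element s[123456]
reads as 5 while every untouched index reads as 0, at a total cost of two O(1)
operations rather than a million-element initialization.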
   Finally, consider the problem of computing an exact value F(n) given a recurrence
such as

           F(k) = a1 F(k/b1) + a2 F(k/b2),   if k > 1,

with a constant base-case value for k ≤ 1.

   A straightforward dynamic programming solution for computing a value F(n)
requires O(n) time. Furthermore, assuming without loss of generality that b1 ≥ b2, the
straightforward recursive solution has an execution time that satisfies the recurrence
T(n) ≥ 2T(n/b1), and thus requires Ω(n^(log_b1 2)) time. These standard techniques
therefore each require time that is polynomial, and hence worse than polylogarithmic,
in the magnitude of n.
   However, by combining the lock-and-key technique with other algorithmic tech-
niques known as top-down dynamic programming and memoization,10 a better than
polynomial execution time can be obtained. This execution time is proportional to
the number of distinct values k such that F(k) is present in the recursion tree for
computing F(n). Again assuming b1 ≥ b2, this number is O((log_b2 n)²), so this method
yields an O(log² n) execution time, which is polylogarithmic, and hence sublinear, in
the magnitude of n. An Ada implementation of this algorithm is given in Figure 11.
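   The algorithm of Figure 11 can be sketched as follows (Python rather than Ada;
the coefficients a1 = a2 = 1, b1 = 2, b2 = 3 and the base value F(k) = 1 for k ≤ 1
are assumed purely for illustration, since the general recurrence leaves them open).
The lock-and-key defined-test on the memo table is what allows the table to be
"created" in O(1) time; finding an entry undefined is the signal to recurse.

```python
# Illustrative top-down dynamic programming (memoization) over a
# lock-and-key memo table.  The coefficients and base value below are
# assumed for the example, not fixed by the paper's general recurrence.
A1, A2, B1, B2 = 1, 1, 2, 3

def evaluate(n):
    size = n + 1
    memo = [0] * size   # may hold garbage until an entry is defined
    key = [0] * size
    lock = [0] * size
    top = 0             # only the counter needs initialization

    def defined(i):
        k = key[i]
        return 0 <= k < top and lock[k] == i

    def define(i, v):
        nonlocal top
        if not defined(i):
            key[i] = top
            lock[top] = i
            top += 1
        memo[i] = v

    def f(k):
        if defined(k):          # already computed: reuse the stored value
            return memo[k]
        # an undefined entry is not an error here; it signals that the
        # recurrence must be evaluated for this k
        v = 1 if k <= 1 else A1 * f(k // B1) + A2 * f(k // B2)
        define(k, v)
        return v

    return f(n)
```

Only the O(log² n) distinct values of the form n/(b1^i b2^j) are ever touched, so
both the running time and the number of lock-stack slots used are polylogarithmic,
even though the memo table has n + 1 cells.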
   Note that this example represents an unusual use of I3 semantics. In evaluating
the recurrence formula, we wish to test particular array elements to determine
whether or not they are defined. However, in this example an undefined array element
does not signify an error, as it did in our previous discussion. Rather, an undefined
array element is a signal to perform the recurrence computation. This is
similar to ideas from Reference 5, which suggest ‘undefined’ as a legitimate value
for a given type.

                   Figure 11. Ada client: evaluating a recurrence formula
CONCLUSIONS
In this paper, we have proposed an algorithm for detecting undefined variables in
programs. We have shown that this algorithm can be used to implement two different
treatments of undefined variables: either assigning a default value (I2) or returning
an error (I3) whenever an undefined variable is detected. For programs where the
average number of variable definitions and references is small relative to the size
of the overall variable space, this algorithm is more efficient than classical approaches
to implementing I2 and I3. As discussed previously, our technique is remarkably
general, and has been used in several disparate contexts.6–8
   Although the universe of programs for which this technique is efficient appears
to be relatively small, we have provided a flexible implementation in Ada whereby
the technique is built into an abstract data type to handle undefined variables of that
type. We provided separate ADTs corresponding to I2 and I3 semantics. Thus, the
programmer can make use of this technique only for array variables where it appears
to be useful. In addition, the programmer can use different types for different
variables according to the desired I2 or I3 semantics, even to the degree of treating
different undefined variables in two different ways within the same program. Finally,
we showed how this technique could be used to obtain speed-up of classical
applications involving large, sparsely populated structures.
   There are several areas where future work appears to be needed. Certainly
additional work is needed to validate this technique experimentally, in order to
determine the extent of its usefulness in practical situations. In addition, appropriate
ways to integrate these approaches into language design and implementation should
be considered, e.g. through compiler switches to choose an appropriate semantics
and implementation of that semantics. Also, issues surrounding the development of
the protected array type should be further investigated. In particular, implementation
alternatives should be considered that allow the lock-and-key algorithm to be reused
independently of the type being protected, perhaps through inheritance of the basic
algorithm across a variety of types. In addition, an implementation that supports a
more natural client programming style should be sought, in which assignment statements
and variable references can be written directly. Perhaps reimplementing
the protected array types in a language supporting assignment overloading and
inheritance would offer some insight into a more appropriate implementation.
REFERENCES
1. B. Meyer, Introduction to the Theory of Programming Languages, C.A.R. Hoare Programming Series,
   Prentice-Hall, 1990.
2. W. Kempton and B. Wichmann, ‘Run-time detection of undefined variables considered essential’,
   Software—Practice and Experience, 20, 391–402 (1990).
3. A. Parrish, ‘Obtaining maximum confidence from unrevealing tests’, Proc. IEEE Southeastcon 1992,
   April 1992.
4. B. Weide, W. Ogden and S. Zweben, ‘Reusable software components’, in M. C. Yovits (ed.), Advances
   in Computers, Academic Press, 1991, Vol. 33, pp. 1–65.
5. R. Winner, ‘Unassigned objects’, ACM Trans. Programming Languages and Systems, 6, (4), 449–
   467 (1984).
6. C. Fischer and R. LeBlanc, ‘Implementation of runtime diagnostics in Pascal’, IEEE Trans. Software
   Engineering, SE-6, (4), 313–319 (1980).
7. D. Harms and B. Weide, ‘Efficient initialization and finalization of data structures: why and how’,
   Technical Report OSU-CISRC-3/89-TR11, The Ohio State University Computer and Information Science
   Research Center, February 1989.
 8. B. Weide, ‘OWL programming in C’, Technical Report OSU-CISRC-TR-86-9, The Ohio State University
    Computer and Information Science Research Center, March 1986.
 9. Ada Programming Language Reference Manual, ANSI/MIL-STD-1815A-1983.
10. T. Cormen, C. Leiserson and R. Rivest, Introduction to Algorithms, MIT Press, 1990.