Dynamic Optimality for Skip Lists and B-Trees

Prosenjit Bose∗        Karim Douïeb†§        Stefan Langerman‡§

  ∗ School of Computer Science, Carleton University, Ottawa, Ontario, K1S 5B6. jit@scs.carleton.ca. Research supported in part by NSERC.
  † Boursier FRIA, kdouieb@ulb.ac.be
  ‡ Chercheur qualifié du FNRS, stefan.langerman@ulb.ac.be
  § Département d’Informatique, Université Libre de Bruxelles, CP212, Boulevard du Triomphe, 1050 Bruxelles, Belgium.

Abstract
Sleator and Tarjan [39] conjectured that splay trees are dynamically optimal binary search trees (BST). In this context, we study the skip list data structure introduced by Pugh [35]. We prove that for a class of skip lists that satisfy a weak balancing property, the working-set bound is a lower bound on the time to access any sequence. Furthermore, we develop a deterministic self-adjusting skip list whose running time matches the working-set bound, thereby achieving dynamic optimality in this class. Finally, we highlight the implications our bounds for skip lists have on multi-way branching search trees such as B-trees, (a-b)-trees, and other variants as well as their binary tree representations. In particular, we show a self-adjusting B-tree that is dynamically optimal both in internal and external memory.

1    Introduction
The comparison-based dictionary problem is one of the most fundamental in information processing: How do we store and retrieve data efficiently? Consider a totally ordered set $S$ of $n$ elements, each associated with a key. A dictionary is a data structure that allows the following operations:

    • Search(x): Find the element x in the set S.

    • Insertion(x): Add the element x to S.

    • Deletion(x): Remove the element x from S.

It is well known that in the comparison-based model, the cost of these operations has a lower bound of $\lceil \log(n + 1) \rceil$ in the worst case. Since the AVL-tree [1], numerous alternative solutions to the dictionary problem have been developed that guarantee worst case or expected $\Theta(\lg n)$¹ cost per operation.

  ¹ In this paper, $\lg x$ is defined as $\log_2(x + 2)$.

The dictionary problem becomes different if one is interested in the performance of a structure over a sequence of operations as opposed to just the worst case time of a single operation. In this paper, we are interested in understanding the optimal time needed for a data structure to perform an access sequence $X = \{x_1, x_2, \ldots, x_m\}$ of search operations in the comparison model of computation on a pointer machine. The operations consist of several basic unit-cost operations such as rotations, promotions/demotions or splits/joins, depending on the structure considered.

We define a data structure by specifying a model of computation that describes the structure of the information itself and the unit-cost operations allowed. We define the minimum number of unit-cost operations needed for a data structure in model $M$ to access sequence $X$ as $OPT_M(X)$. Thus, the optimal offline access algorithm, i.e., the algorithm which knows every access of the sequence in advance, executes $X$ in $OPT_M(X)$ unit-cost operations. An on-line algorithm, which serves each access without any knowledge of future accesses, is said to be dynamically optimal if it executes $X$ in time $O(OPT_M(X))$. More generally, we say that an algorithm $A$ is $f(n)$-competitive if its access cost $A_M(X)$ does not exceed $f(n)\,OPT_M(X)$.

Without any prior knowledge of the access sequence, data structures must be self-adjusting to be competitive, i.e., they adapt not only during insertions or deletions of elements but also during searches. In the next section we review some of the relevant literature related to our results.

Before presenting our results, we define the Working-set Property: In an access sequence, the working-set number $t_i(z)$ of an element $z$ at time $i$ is the number of distinct elements accessed since the last access of $z$ prior to time $i$, including $z$. A data structure has the working-set property if the amortized cost of access $x_i$ is $O(\lg t_i(x_i))$. Over an access sequence $X$, we define the working-set bound as $WS(X) = \sum_{i=1}^{m} \lg t_i(x_i)$.

In this paper, we analyze the relationship between several models of skip lists and multi-way search tree data structures. We give lower bounds on the optimal time needed to perform any access sequence $X$ in these models. We show that for some of these models, the
working-set bound is equivalent to dynamic optimality. We also develop a deterministic self-adjusting skip list in this specific model that guarantees an upper bound on the access cost matching the working-set bound, i.e., it is dynamically optimal. We extend our results to show dynamic optimality of B-trees [8] both in internal and external memory under split and join operations.

In Section 3, we formally define several models of self-adjusting data structures. Lower bounds on optimal access cost for some skip list models are provided in Section 4. In Section 5, a dynamically optimal deterministic self-adjusting skip list is developed. In Section 6, we highlight the implications our bounds for skip lists have on B-trees. In particular, we show a self-adjusting B-tree that is dynamically optimal for constant $B$. Finally, in Section 7, we show how our results on skip lists lead to dynamically optimal B-trees in external memory.

2   Prior Work
The linked list (LL model in Section 3) is one of the first self-adjusting data structures to be analyzed. Deterministic on-line algorithms were studied for this model, such as Move-To-Front (MTF), which simply moves the accessed element to the front of the list. It was shown by Sleator and Tarjan [38] that the MTF strategy is 2-competitive. An unpublished result of Karp and Raghavan [27] states that no deterministic algorithm is $c$-competitive, where $c < 2$. The Bit algorithm [36], which moves to front the accessed element with probability 1/2, is expected to be 1.75-competitive. The best randomized algorithm so far, due to Albers [3], is 1.6-competitive. No randomized list update algorithm can be better than 1.5-competitive [43]. It is NP-hard to compute an optimum offline strategy [4].

The binary search tree (BST model in Section 3) is the first efficient data structure to have been formally studied, more specifically in the case where only rotations are allowed. Two lower bounds on $OPT_{BST}(X)$ were given by Wilber [45]. Both are complex functions of $X$. A slight variation of the first one, the interleaves lower bound, is given by Demaine et al. [15]. Recently Derryberry et al. [16] presented a graphical interpretation of $OPT_{BST}(X)$, the rectangle cover lower bound, which generalizes both the interleaves lower bound and Wilber’s second lower bound. Determining whether those lower bounds are tight is still open.

So far the splay tree [39], introduced by Sleator and Tarjan, is the only known BST algorithm which may be $O(1)$-competitive. This constitutes the dynamic optimality conjecture. The splay tree has numerous properties described by the Balanced Theorem [39], the Static Optimality Theorem [39], the Static Finger Theorem [39], the Working-Set Theorem [39], the Sequential Access Theorem [18, 40, 41, 42] and the Dynamic Finger Theorem [12, 13]. In [23], splay trees have been shown to be $O(1)$-competitive against a class of dynamic balanced binary search trees. Alternatives were introduced by Demaine et al. [15], followed by Derryberry et al. [17] and Georgakopoulos [24], to achieve $O(\lg \lg n)$-competitive BST algorithms.

The multi-way branching search tree (MWBST model in Section 3) is an abstract data structure model that generalizes B-trees [8], (a-b)-trees, 2-3-trees and many of their variants. This model allows a node to contain an arbitrary number of elements and thus have an arbitrary number of children. The update operations allowed are the same as in B-trees, i.e., split and join of nodes. To the best of our knowledge, lower bounds on the optimal access sequence cost in this tree model have not been studied before.

Other structures inspired by but differing from the MWBST model have been adapted in a self-adjusting fashion. The splay tree of Sleator and Tarjan [39] has been generalized by Sherk [37] to k-ary trees. Martel [29] presented another self-adjusting multi-way search tree, the k-forest, which is a collection of different B-trees of branching factor $k$ and of doubly exponentially increasing size. Assuming a node is processed in $O(\lg k)$ unit-cost operations, a k-forest performs an access sequence $X$ in $O(H(p))$ expected time, where $H(p)$ is the entropy of the probability distribution $p = \langle p_{s_1}, p_{s_2}, \ldots, p_{s_n} \rangle$ with $p_{s_j}$ the probability of accessing element $s_j$ in $X$. Martel shows how a k-forest can be adapted to achieve worst case performance by using additional structures.

With a similar idea of using a forest of trees, Iacono [26] and later Bădoiu et al. [10] developed an alternative to the splay tree, not included in the BST model, with $O(\lg n)$ worst-case access times that satisfies all of the original splay tree properties proved in [39] and the Unified property.

The skip list of Pugh [35] is a probabilistic alternative to balanced trees. It is a dictionary data structure storing a dynamic set $S$ of ordered elements and which has an expected $O(\lg n)$ running time per operation. A skip list is built in levels: the bottom level is an ordinary sorted linked list and each higher level acts as an "express lane" for the lists below (see Fig. 1). The height $h(s)$ of an element $s$ is defined as the highest level where $s$ appears. The height $H(L)$ of a skip list $L$ is defined as $\max_{s \in L} h(s)$ and the depth $d(s)$ of $s$ is $H(L) - h(s)$. The pointers connecting two items on the same level and the pointers leading to the lower level of an item are called forward and down pointers
respectively.

[Figure 1: A skip list, with h its header. Keys shown in the figure: 3, 5, 6, 9, 12, 15, 22, 24, 26, 30, 32, 47, 59, 64, 68.]

Several variants of the skip list have been considered: Munro et al. [34] developed a deterministic version of the skip list, based on B-trees [8], that performs insert, delete and search operations in worst case $O(\lg n)$ time through the increment and decrement of the height of various elements (promotion/demotion operations).

Under the assumption that the distribution of access probabilities is given, Martínez and Roura [30] developed an algorithm that minimizes the expected access time by either building an optimal static skip list in $O(n^2 \lg n)$ time or a nearly optimal one in $O(n)$ time. Later, Bagchi et al. [6] developed the biased skip list as suggested by Mehlhorn and Näher [31]. It manages a biased dictionary, i.e., an ordered set $S$ of elements $v$ associated with weight $w(v)$, and performs search, insert, delete, join, split, finger search and reweight operations in worst case running times similar to those of biased search trees [9, 22].

The skip lists cited above are not self-adjusting as they do not update themselves during a search operation. Over a sequence of accesses these structures have the following problem: frequently accessed elements might have smaller height in the list than less frequently accessed elements, implying that it takes longer to reach frequently accessed elements. A self-adjusting skip list must somehow raise frequently accessed elements to higher levels in the list to speed up future accesses. The self-adjusting skip lists due to Ergun et al. [19, 20, 21] perform an operation $x_i$ in $O(\lg t_i(x_i))$ expected time. With a structural modification of the original skip list structure (not a real skip list anymore as defined by Pugh), they show in [20] that it can support insertion/deletion of $x_i$ in $O(\lg t_i^{\max}(x_i))$ expected amortized time, where $t_i^{\max}(x_i)$ is the maximum working-set number $x_i$ attains during its life-time prior to time $i$. The second self-adjusting structure, due to Ciriani et al. [11], performs an access sequence $X$ in $O(H(p))$ expected amortized time, where $H(p)$ is the entropy of the probability distribution $p = \langle p_{s_1}, p_{s_2}, \ldots, p_{s_n} \rangle$ with $p_{s_j}$ the probability of accessing element $s_j$ in $X$. This self-adjusting skip list has been adapted for managing arbitrarily long strings residing in external memory, where the cost is measured in terms of the number of disk accesses (I/Os) fetching pages of a fixed size $B$.

Note that another self-adjusting skip list, presented as an exercise in the book of Mulmuley [33], maintains an access frequency count for each element and considers them as weights such that the structure performs standard operations in expected running times similar to those of biased search trees [9, 22].

Relation between skip list and tree models: Skip lists were introduced as a probabilistic alternative to balanced trees. Several studies have shown that in fact both models are tightly related. Several authors [33, 32, 5, 34, 31, 6, 14] have noticed that a skip list can be interpreted for the same cost as a specific kind of tree and vice versa. In [34, 31, 6] a skip list simulates MWBSTs such as B-trees [8], (a, b)-trees [25] or other variants in order to obtain deterministic skip lists. Messeguer [32] took the opposite approach by developing the skip tree, belonging to the MWBST model, consisting of a one-to-one mapping from standard skip list operations to new randomized tree operations. Thus skip list and MWBST models have been shown to be equivalent. As suggested by Munro et al. [34], skip lists can be interpreted as BSTs by using a standard symmetric binary B-tree transformation scheme [7] which includes red-black trees [28]. Recently Dean and Jones [14] provided a detailed description of how to represent a skip list using a BST. They also showed that a modification of the skip list structure (see model SL∆ in Section 3) can simulate any BST based on rotation-balancing mechanisms.

3   Models
LL: In the linked list model defined by Sleator and Tarjan [38], each element contains pointers to the next and to the previous element respectively. A finger is initialized to the front of the list at each access; the search operation consists of moving the finger forward and backward by one position at unit cost. Two other operations are allowed: the unit-cost swap of two adjacent elements and the free swap which moves the accessed element anywhere closer to the front.
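As an illustration of the LL model, the following Python sketch replays an access sequence under the Move-To-Front strategy discussed in Section 2. It is a minimal sketch and not taken from the paper: the function name `mtf_access_cost` is hypothetical, and it assumes that reaching the element at 1-based position $p$ costs $p$ and that moving the accessed element to the front uses the free swap.

```python
def mtf_access_cost(initial, accesses):
    """Replay `accesses` on a list under Move-To-Front.

    Assumptions (not from the paper): reaching the element at
    1-based position p costs p, and the move to the front is the
    free swap of the LL model, so it adds nothing to the cost.
    Returns the total cost and the final list order.
    """
    lst = list(initial)
    total = 0
    for x in accesses:
        pos = lst.index(x)           # 0-based position of x
        total += pos + 1             # assumed cost of reaching x
        lst.insert(0, lst.pop(pos))  # free swap: move x to the front
    return total, lst


if __name__ == "__main__":
    cost, order = mtf_access_cost([1, 2, 3], [3, 3, 1])
    print(cost, order)
```

Repeated accesses to the same element become cheap because it stays near the front, which is the kind of self-adjusting behaviour the working-set property captures.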
BST: We refer here to the formal binary search tree model described by Wilber [45]. In this model, the accessed element must be moved to the root of the tree by doing a sequence of unit-cost rotations. Other unit-cost rotations can be done as well in order to speed up future accesses. The access cost is the total number of rotations performed plus 1.

MWBST: This model of multi-way branching search tree defines ordered trees where nodes are composed of several elements. Nodes have a number of children (branching factor) which is the number of elements inside the node plus 1. To perform a standard search [8] in the MWBST model we use a finger initialized at the root of the tree after each access. Every movement of the finger from an element to its child node or to its next sibling in the node costs 1. Other unit-cost operations are allowed, such as split and join of nodes touched by the finger during a search. The split operation consists of moving an element $x$ of a node $v$ into its parent node $u$; elements storing a key smaller than $x$ remain in the original node $v$ and a new node storing the other elements of $v$ is created as another child of $u$. The join operation performs the inverse of a split operation.

SL: In the standard skip list model we use a single finger. To access $x_i$, the finger is initialized to the highest level of the header element. The search algorithm must reach $x_i$ by performing several unit-cost operations, namely moving the finger to the forward element on the current level or to the upper/lower level of the current element. Additionally, the height of any element touched by the finger can be incremented or decremented at unit cost (possibly many times).

SL∆: This model was defined in [14]. Although it is not used in the remainder of this paper, we present it here to highlight the differences with our models. This model is similar to the previous one except that every element $x$ of the list stores an offset value $\Delta(x)$. During a search, when the finger is positioned on the highest level of an element $x$, the model allows the finger to move down by $\Delta(x)$ levels at unit cost. The offset value of an element can be updated for free when its height is incremented or decremented.

As shown in [14], this model can be implemented by replacing the levels column of every element of the SL∆ with an array.

MTT: Here we consider an alternative model inspired by Wilber’s [45] lower bounds on binary search trees. In this skip list model, the element $x_i$ being accessed must be moved to the top. Namely, the height of $x_i$ and other elements in the list must change so that $x_i$ is directly pointed to by the header of the list. Note that during an access, any other element can be promoted or demoted, which may affect the cost of future accesses (this contrasts with the SL model, where only the touched elements are involved in those operations). The access cost is precisely the total number of promotion or demotion operations performed.

We can further specify that each skip list model must be either bounded, in the sense that we never go forward more than $B$ times without going down in a search, or weakly bounded, in the sense that the first $i$ highest levels contain no more than $\sum_{j=0}^{i} B^j$ elements. When we impose the bounded or weakly bounded constraints on a model M, we refer to it as M-B or M-WB respectively. Note that trivially $OPT_{M}(X) \le OPT_{M\text{-}WB}(X) \le OPT_{M\text{-}B}(X)$.

The bounded and the weakly bounded constraints can also be generalized to the MWBST model, by bounding the branching factor from above with $B$ or enforcing that the sum of elements in every node at depth lower than $i$ is at most $\sum_{j=0}^{i} B^j$. Note that MWBST-B corresponds exactly to the B-tree [8] data structure.

4 Lower bounds for skip list models
Lemma 4.1. For any access sequence $X$, $\frac{1}{2} OPT_{MTT\text{-}WB}(X) + m \le OPT_{SL\text{-}WB}(X)$ and $\frac{1}{2} OPT_{MTT\text{-}B}(X) + m \le OPT_{SL\text{-}B}(X)$.

Proof. We prove the first statement of this lemma by showing that we can simulate a sequence of $k$ unit-cost operations in a skip list in the model SL-WB by a sequence of at most $2(k - 1)$ unit-cost operations in a skip list in the model MTT-WB. The unit-cost operations of an access in a skip list in the SL-WB model can be decomposed in two parts, the number of finger movements and the number of height changes. The number of finger movements is at least the depth $d(x_i)$ of the accessed element plus 1 if $x_i$ is found directly by moving the finger from a level of the header, or plus 2 otherwise. Now in the MTT-WB model we must move $x_i$ to the top with a cost of either $d(x_i)$, if the position of $x_i$ is before the element that is directly reachable from the highest level of the header, or $d(x_i) + 1$ otherwise. In order to be consistent with the skip list we want to simulate in the SL-WB model, we must move the element back to its original position before the next access, which doubles the cost. Every other height modification is done in both models for the same cost. Thus the simulation of an access in the MTT-WB model requires at most twice the cost of this access minus 1 in the SL-WB model, i.e., at most $2(k - 1)$ operations for an access of cost $k$.

The proof of the second statement is identical. □
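The remaining bounds in this section are stated against the working-set bound $WS(X)$ defined in Section 1. As a concrete reference, the Python sketch below computes the working-set numbers $t_i(x_i)$ and $WS(X)$ for a small access sequence. It is only an illustrative sketch: the function names are hypothetical, and the treatment of a first access (counting every distinct element seen so far, plus the element itself) is an assumption, since the definition only covers elements that have been accessed before.

```python
import math


def lg(x):
    # Convention of footnote 1: lg x = log2(x + 2).
    return math.log2(x + 2)


def working_set_numbers(seq):
    """Return the list of t_i(x_i) for each access x_i in seq."""
    last = {}  # element -> index of its most recent access
    numbers = []
    for i, x in enumerate(seq):
        if x in last:
            # Distinct elements accessed since the last access of x,
            # including x itself.
            t = sum(1 for when in last.values() if when >= last[x])
        else:
            # Assumption (not from the paper): a first access counts
            # all distinct elements seen so far, plus x.
            t = len(last) + 1
        numbers.append(t)
        last[x] = i
    return numbers


def working_set_bound(seq):
    """WS(X) = sum over i of lg t_i(x_i)."""
    return sum(lg(t) for t in working_set_numbers(seq))


if __name__ == "__main__":
    X = ["a", "b", "c", "a"]
    print(working_set_numbers(X), working_set_bound(X))
```

The quadratic running time is acceptable for a sketch; it is meant to make the definition concrete, not to be an efficient implementation.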
Lemma 4.2. For any access sequence $X$, $OPT_{SL\text{-}B}(X) \le (B + 2)\,OPT_{MTT\text{-}B}(X)$.

Proof. We want to simulate a skip list in the model MTT-B by a skip list in the model SL-B. By the definition of a skip list in the MTT-B model, the length of a search path from the header to element $x_i$ is at most $B$ times $d(x_i)$. The skip list in SL-B can simulate the height change of every element on the search path for the same cost as in the model MTT-B. But in MTT-B, height changes do not necessarily involve elements on the search path. Thus, to simulate those height changes in SL-B, we perform the following procedure: Consider the search path from the header to $x_i$ in the skip list in SL-B. We say that an element hits this path if an update operation in MTT-B, performed during previous accesses, has increased its height so that the element would be touched by a forward pointer of the path. The procedure consists of walking on this search path; each time we identify an element $y$ which hits it, we move the finger to touch $y$ and we change its height as in the MTT model. The cost of this operation in SL-B is at most $(B + 2)$ times the cost of changing the height of $y$ in MTT-B. Then we continue the walk. Elements which have not been treated yet will wait for future accesses to be modified. The amortized simulation cost of a skip list in the MTT-B model by one in the SL-B model is at most $(B + 2)$ times its own cost, i.e., $OPT_{SL\text{-}B}(X) \le (B + 2)\,OPT_{MTT\text{-}B}(X)$. □

Lemma 4.3. For any access sequence $X$, $\frac{WS(X)}{\lg B} \le OPT_{MTT\text{-}WB}(X) + m$.

Proof. Define $h'_i(z)$ as the lowest height an element $z$ reached since its last access or since the initial time and prior to time $i$. Let $a = h_i(x_i) - h'_i(x_i)$ be the difference between the lowest height of an access $x_i$ and its height at time $i$. The amortized cost of an access $x_i$ corresponds at least to the number of unit-cost operations needed to set it at height $h'_i(x_i)$ and then to raise it to $h_i(x_i)$. As we consider amortized cost, we can assume that the promotion operation on element $x_i$ is fully paid by itself whereas the cost of the demotion operation is equally shared among $x_i$ and each of the elements on its level. The amortized cost of an access is necessarily smaller or equal if we assume that an element $x_i$ reaches its lowest height $h'_i(x_i)$ and then only moves up by $a$ levels exactly at time $i$. At this precise time $x_i$ rises above $\beta$ elements of height greater than or equal to $h'_i(x_i)$; by the definition of the weakly bounded constraint, we have $\beta \le \frac{B^{a+1} - 1}{B - 1}$. Indeed, in a weakly bounded skip list of height $h'_i(x_i) + a$, at most $\sum_{j=0}^{a} B^j$ elements can be present in the first $a$ levels.

This means that among the $\alpha \ge t_i(x_i)$ distinct elements which went above $x_i$ since its last access, at least $\alpha - \beta$ of them have decreased their height under $h'_i(x_i)$. Thus they have charged $x_i$ by a cost of at least $\frac{1}{\alpha} + \frac{1}{\alpha - 1} + \ldots + \frac{1}{\beta + 1} = H_\alpha - H_\beta$, where $H_k$ is the $k$th harmonic number. Indeed, if we consider that $\delta$ elements among $\alpha$ have already been demoted to a height which is smaller than the height of $x_i$, then the lowest cost that $x_i$ must pay for an element decrementing its height corresponds to $1/(\alpha - \delta)$ and occurs in a situation where $\alpha - \delta$ elements are on the same level as $x_i$.

Thus the total amortized access cost of $x_i$ is at least
$a + H_\alpha - H_\beta \ge \log_B \beta - 1 + H_\alpha - H_\beta \ge \ln \beta \left( \frac{1}{\ln B} - 1 \right) + \ln \alpha - 1 \ge \frac{\ln \alpha}{\ln B} - 1 \ge \log_B t_i(x_i) - 1$. □

Surprisingly, a skip list algorithm in the SL model without the bounded constraint can perform some access sequences significantly faster than in the SL-B model. Take the sequential access sequence $X = \{1, 2, \ldots, n\}$: according to the lower bounds proved above, this access sequence takes $\Omega(n \lg n)$ time in the SL-B or SL-WB models with $B$ constant, whereas the following offline algorithm performs the sequence in linear time in the SL model. Beginning with every element at height 1, we increment the height of the accessed element and decrement the height of the previously accessed one, so that each access takes constant time. Determining whether this skip list structure can perform as well in an online setting is an interesting problem which remains to be analyzed.

5    Deterministic self-adjusting skip list
We now describe a skip list data structure that conforms to the SL-B model and whose running time matches the lower bound from the previous section. Skip lists can be adapted from the k-forest structure of Martel [29] or from the working-set structure of Iacono [26] to guarantee the same performance. A similar but randomized approach has been taken by Ferragina et al. [44] for constructing a self-adjusting data structure for string dictionary operations in the external memory model. We present a variation of this structure which guarantees better performance and does not require any extra information:

The skip list stores a dynamic set $S$ of $n$ keys. The structure is a singly linked skip list of height $2^{k+2} - 2$, composed of $2k = 2\lceil \lg \lg n \rceil$ layers $l_1, l'_1, l_2, l'_2, \ldots, l_k, l'_k$. Layers $l_a$ and $l'_a$ each cover $2^a$ levels of the list. The layers are structured in increasing order, $l_a$ first then $l'_a$, such that $l_1$ covers the highest level of the skip list (see Figure 2). An element whose height
is within the range of levels for a layer is associated with it.

Search:

 1. The first phase of a search operation $i$ consists of performing a traditional skip list search [35] for the element $x_i$ in the self-adjusting skip list, just like in a standard skip list.

 2. Assume $x_i$ is associated with layer $l_b$. If $b = 1$ the search operation is finished. Otherwise we must move $x_i$ to $l_1$ and perform an overflow procedure if necessary.

Note that moving an element from one layer to another consists of changing its height. The ordered set of elements associated with a layer is divided into contiguous subsets by the elements belonging to higher layers. Our deterministic self-adjusting skip list manages each subset of each layer as a sublist performing insertion, deletion, split and join in $O(Bh)$, where $h$ is the maximum height of the skip lists involved in one of those operations. The (a-b)-skip list of Bagchi et al. [6] achieves this performance in the worst case. It is in fact a skip list representation of (a-b)-trees [25] inheriting their properties, namely, the number of consecutive elements at height $i$ between any consecutive elements of height strictly greater than $i$ is at least $a$ and at most $b$ with $1 < a \le \lfloor b/2 \rfloor$. By setting $b = B$, the (a-b)-skip list of height $h$ guarantees operations in $O(Bh)$ worst case time and contains $O(B^h)$ elements.

We use this (a-b)-skip list as a black box in our structure. Consider now the movement of an element

Proof. A simple search from the highest level of the header of the list to an element $x$, assuming $x \in l_a$, takes $O(B 2^a)$ time in the worst case. Indeed, the search can be decomposed into $2a$ subsearches, one in each of the first $2a$ layers. A subsearch in layer $l_b$ or $l'_b$, with $b = 1, 2, \ldots, a$, only involves elements associated with the same layer. Thus it corresponds to a traditional search in a skip list of height $2^b$. The total search cost is $O(B \sum_{b=1}^{a} 2^b) = O(B 2^a)$. This also holds when searching for an element in layer $l'_a$. Thus the whole search procedure for an element $x$ associated with layer $l_a$ or $l'_a$ has a cost of $O(B 2^a)$.

Now we show that the search cost of an element $z$ with a working-set number $B^{2^{a-1}} < t_i(z) \le B^{2^a}$ is $O(B 2^a)$: The number of operations which are needed to overflow an empty layer $l_b$ is at least $B^{2^b}$. Thus the greatest layer with which an element $z$, originally in $l_1$, may be associated after $B^{2^a}$ accesses is $l_a$. Thus the search cost of $z$ in the SL-B model is given by $O(B 2^a)$. The cost of moving $z$ to layer $l_1$ is also $O(B 2^a)$.

A cascade of overflows can occur after this move, and the cost of an access must account for that. We will see that the cost of those operations corresponds, in an amortized sense, to the search cost of an element. We define the non-negative potential of an element $z$ to be $B$ times its height in the skip list, i.e., $\Phi(z) = B\,h(z)$. Thus an element involved in an overflow procedure has enough potential to pay for moving to a lower layer. It is easy to see that by increasing their cost by only a constant factor, the access operations pay for the increase in potential of the elements. □

Theorem 5.2. For any access sequence X, W2 S(X)
                                                                                                            lg B ≤
from a layer b to a with b > a, this consists of inserting       1
                                                                 2 (OP TMT T −W B (X) + m)   ≤ OP TSL−W B (X) ≤
the element in the corresponding sublist in layer a and          OP TSL−B (X) ≤ cB WlgS(X) with c constant.
                                                                                       B
splitting every sublist traversed by the element during
its movement from layers b to a − 1. The inverse                 Proof. Trivially deduced from Lemma 4.1, 4.2, 4.3 and
movement from a to b consists of deleting the element            Theorem 5.1.                                        
from the corresponding sublist in layer a and joining the
sublists previously divided by the element in every layer        Corollary 5.1. Our self-adjusting skip list is dynam-
from a − 1 to b.                                                 ically optimal in the SL-B model with B constant.
     An overflow in layer la occurs when the number of
levels covered by la is no longer sufficient to maintain the            The structure can be maintained dynamically. In-
elements associated with it, i.e., when a sublist reaches a      sertion and deletion operations are straightforwardly de-
height larger than 2a . An overflow procedure performed           duced from the access approach, and they have a O(lg n)
in a layer la consists first in moving every element              cost.
associated with la′ to la+1 and second in decreasing every
height of elements associated to la by 2a in order to be         6   B-tree
associated to la′ . Then the procedure is iterated in la+1
                                                      As discussed in the introduction, skip lists in the SL-B
if an overflow occurs in it, and so on.                model can be represented in the B-trees (or MWBST-
                                                      B) model using the detailed reduction of Dean and
Theorem 5.1. The self-adjusting skip list data struc-
                                                      Jones [14]. Thus identical upper and lower bounds
ture described above conforms to the SL-B model and
                                                      follow immediately such that we readily obtain the
performs any access sequence X in O(B WlgS(X)
                                          B ).        following:
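To illustrate why the two models line up, here is a minimal Python sketch of the usual level-by-level correspondence between a skip list and a multiway search tree. It is not the construction of Dean and Jones [14] itself. The names `Node`, `build_node`, `skip_list_to_tree` and `search_depth` are hypothetical, and the skip list is assumed to be given simply as (key, height) pairs with heights starting at 1.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    keys: List[int] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)  # empty for a leaf


def build_node(items: List[Tuple[int, int]], level: int) -> Node:
    """Build the tree node for a key-sorted run of (key, height) pairs.

    Every height in `items` is assumed to be <= level. Keys of height
    exactly `level` become the keys of this node; the runs of shorter
    keys between them become its children, one tree level further down.
    """
    node = Node()
    gaps: List[List[Tuple[int, int]]] = [[]]
    for key, height in items:
        if height == level:
            node.keys.append(key)
            gaps.append([])  # start the run to the right of this key
        else:
            gaps[-1].append((key, height))
    if level > 1:
        # An empty gap yields an empty child; with the (a-b) balance
        # condition (at least a elements per gap) this never happens.
        node.children = [build_node(gap, level - 1) for gap in gaps]
    return node


def skip_list_to_tree(items: List[Tuple[int, int]]) -> Node:
    """Convert (key, height) pairs into a multiway search tree."""
    top = max(height for _, height in items)
    return build_node(sorted(items), top)


def search_depth(root: Node, key: int) -> Optional[int]:
    """Return the number of nodes visited to find `key`, or None."""
    node: Optional[Node] = root
    visited = 0
    while node is not None:
        visited += 1
        i = bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return visited
        node = node.children[i] if node.children else None
    return None
```

In this sketch an element of height h ends up at depth top − h + 1, so promoting or demoting an element in the skip list moves its key up or down one tree level, which is the effect of a split or a join in the tree view.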
[Figure 2: Deterministic self-adjusting skip list. Layers labeled in the figure: l1, l1′, l2, l2′, l3, l3′, ...]
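For the layered structure of Figure 2, the cost bound used in the analysis of Theorem 5.1 can be unpacked as a short worked derivation. It assumes the reading that layers $l_b$ and $l_b'$ each span $2^b$ levels and that $t_i(z)$ is the working-set number of $z$ at access $i$; the constants are illustrative only.

```latex
\begin{align*}
\sum_{b=1}^{a} 2^{b} &= 2^{a+1} - 2 \;<\; 2 \cdot 2^{a}
  && \text{so the search cost is } O(B\,2^{a}),\\
B^{2^{a-1}} < t_i(z)
  &\;\Longrightarrow\; 2^{a-1} < \log_B t_i(z)
  \;\Longrightarrow\; 2^{a} < \frac{2 \lg t_i(z)}{\lg B},\\
O(B\,2^{a}) &= O\!\left(B\,\frac{\lg t_i(z)}{\lg B}\right).
\end{align*}
```

If $WS(X)$ is taken to be the sum of $\lg t_i(x_i)$ over all accesses (up to additive constants, which is an assumption about the definition given earlier in the paper), summing the last line over the sequence gives the $O(B \cdot WS(X)/\lg B)$ bound of Theorem 5.1.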

Corollary 6.1. There exists a self-adjusting B-tree that is dynamically optimal with B constant.

7   B-tree in external-memory model

The external-memory model [2] (also known as the I/O model) considers a two-level memory hierarchy, the cache and the disk, each divided into blocks of B elements. The cache is limited to M blocks and the disk has unbounded size. Only the memory blocks loaded in the cache can be accessed, at no cost; if the desired information is not in the cache, then the disk block containing it is transferred into the cache. When the cache becomes full, some blocks are transferred back to the disk to make space for new blocks. Those transfers from disk to cache, or inversely, are defined as unit-cost operations.

B-trees are well adapted to the external-memory model: assuming the block size B is known, the maximum number of elements inside a node is fixed to B. In this case, an entire node can be transferred in one transfer operation. If we search for an element in a B-tree, we achieve an information-theoretically optimal number of memory transfers, $O(\log_B n)$ in the worst case. When studying the running time of access sequences, it is assumed that the cache is empty at the beginning of every access. The external version of the B-tree model, I/O-B-tree, thus allows moving the finger to a child, split and join operations at unit cost. But unlike in the B-tree (or MWBST-B) model, moving the finger to a sibling is free.

We recall that in the internal-memory model (studied in the previous section), the unit-cost operations of MTT-B are the promotion and demotion of elements, horizontal movements of the finger are free, and the number of consecutive such movements is bounded by B. Thus the correspondence between MTT-B in internal memory and B-trees in external memory is easily established. Indeed, the split and join operations in a B-tree correspond to the promotion/demotion operations in MTT-B, and consulting elements inside a node is free. We can show $OPT_{MTT\text{-}B}(X) + m \le 2\,OPT_{I/O\text{-}B\text{-}tree}(X)$ for any access sequence X using an argument identical to that of Lemma 4.2. Then by Lemma 4.3, we have $\frac{WS(X)}{\lg B} \le 2\,OPT_{I/O\text{-}B\text{-}tree}(X)$.

In the self-adjusting skip list developed in Section 5, we have subsets of elements associated with a layer, each surrounded by two elements belonging to higher layers. Those subsets are managed with the structure of Bagchi et al. [6], that is, a skip list simulating an (a-b)-tree. Thus if we fix the value b to B (with $a \le b/2$), we obtain a self-adjusting skip list with $\Theta(B)$ consecutive elements at height $i$ between two elements of greater height. Consider that, in external memory, those consecutive elements are in the same memory block. Then any operation in a subset of elements of size $O(B^h)$ and height $O(h)$ is performed in $O(h)$ time. By using this parameter in an analysis similar to that of Theorem 5.1 (with the potential of a node $\Phi(z) = h(z)$), we can show that our self-adjusting skip list satisfies the bound $O\!\left(\frac{WS(X)}{\lg B}\right)$.

If we represent skip lists as B-trees using the detailed reduction of Dean and Jones [14], we obtain a self-adjusting B-tree which is dynamically optimal in the external-memory model.

References

 [1] G. Adel'son-Vel'skii and E. Landis. An algorithm for the organisation of information. Dokl. Akad. Nauk SSSR 146 (1962), 263-266 (in Russian); English Translation in Soviet. Math. 3, 1259-1262.
 [2] A. Aggarwal and J. Vitter. The input/output complexity of sorting and related problems. CACM, 31(9):1116–1127, 1988.
 [3] S. Albers. Improved randomized on-line algorithms for the list update problem. In Siam Journal on Computing, volume 11(3), pages 682–693, 1998.
 [4] C. Ambühl. Offline list update is np-hard. In Proceedings of the 8th Annual European Symposium (ESA 2000), volume 1879 of LNCS, pages 42–51, 2000.
 [5] C. R. Aragon and R. Seidel. Randomized search trees. In Algorithmica, volume 16, pages 464–497, 1996.
 [6] A. Bagchi, A. L. Buchsbaum, and M. T. Goodrich. Biased skip lists. In Algorithmica, volume 42(1), pages 31–48, 2005.
 [7] R. Bayer. Symmetric binary b-trees: Data structure and maintenance algorithms. In Acta Informatica, volume 1, pages 290–306, 1972.
 [8] R. Bayer and E. McCreight. Organization and maintenance of large ordered indexes. In Acta Informatica, volume 1, pages 173–189, 1972.
 [9] S. W. Bent, D. Sleator, and R. Tarjan. Biased search trees. In SIAM Journal on Computing, volume 14(3), pages 545–568, 1985.
[10] M. Bădoiu, R. Cole, E. D. Demaine, and J. Iacono. A unified access bound on comparison-based dynamic dictionaries. Theoretical Computer Science, 382(2):86–96, 2007.
[11] V. Ciriani, P. Ferragina, F. Luccio, and S. Muthukrishnan. A data structure for a sequence of string accesses in external memory. In ACM Transactions on Algorithms, volume 3(1), 2007.
[12] R. Cole. Part II: The proof. In Siam J. Comput., volume 30, pages 44–85, 2000.
[13] R. Cole, B. Mishra, J. Schmidt, and A. Siegel. On the dynamic finger conjecture for splay trees. Part I: Splay sorting log n-block sequences. In Siam J. Comput., volume 30, pages 1–43, 2000.
[14] B. C. Dean and Z. H. Jones. Exploring the duality between skip lists and binary search trees. In Proceedings of the 45th annual southeast regional conference, pages 395–399, 2007.
[15] E. Demaine, D. Harmon, J. Iacono, and M. Pătraşcu. Dynamic optimality–almost. In SIAM journal on Computing, to appear. Special issue of selected papers from the 45th Annual IEEE Symposium on Foundations of Computer Science.
[16] J. Derryberry, D. D. Sleator, and C. C. Wang. A lower bound framework for binary search trees with rotations. Technical report, Computer Science Department, School of Computer Science, Carnegie Mellon University, 2005.
[17] J. Derryberry, D. D. Sleator, and C. C. Wang. O(log log n)-competitive dynamic binary search trees. In Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithm (SODA ’06), pages 374–383, 2006.
[18] A. Elmasry. On the sequential access theorem and deque conjecture for splay trees. In Theoretical Computer Science, volume 314, pages 459–466, 2004.
[19] F. Ergun, S. Mittra, S. C. Sahinalp, J. Sharp, and R. Sinha. A dynamic lookup scheme for bursty access patterns. In Proceedings of IEEE INFOCOM, 2001.
[20] F. Ergun, S. C. Sahinalp, J. Sharp, and R. Sinha. Biased dictionaries with fast insert/deletes. In Proceedings of the 33rd annual ACM Symposium on Theory of Computing (STOC), pages 483–491, 2001.
[21] F. Ergun, S. C. Sahinalp, J. Sharp, and R. Sinha. Biased skip lists for highly skewed access patterns. In Proceedings of ALENEX, 2001.
[22] J. Feigenbaum and R. Tarjan. Two new kinds of biased search trees. In Bell System Technical Journal, volume 62(10), pages 3139–3158, 1983.
[23] G. F. Georgakopoulos. Splay trees: a reweighing lemma and a proof of competitiveness vs. dynamic balanced trees. In Journal of Algorithms, volume 51(1), pages 64–76, 2004.
[24] G. F. Georgakopoulos. How to splay for loglog n-competitiveness. In Workshop on Experimental and Efficient Algorithms, pages 570–579, 2005.
[25] S. Huddleston and K. Mehlhorn. A new data structure for representing sorted lists. In Acta Informatica, volume 17, pages 157–184. Springer, 1982.
[26] J. Iacono. Alternatives to splay trees with O(log n) worst-case access times. In Proceedings of the twelfth annual ACM-SIAM Symposium on Discrete Algorithms, pages 516–522, 2001.
[27] R. Karp and P. Raghavan. From a personal communication cited in [36].
[28] L. J. Guibas and R. Sedgewick. A dichromatic framework for balanced trees. In Proc. 19th IEEE Symp. on Foundations of Computer Science, pages 8–21, 1978.
[29] C. U. Martel. Self-adjusting multi-way search trees. Inf. Process. Lett., 38(3):135–141, 1991.
[30] C. Martinez and S. Roura. Optimal and nearly optimal static weighted skip lists. Technical report, LSI-95-34-R, Dept. Llenguatges i Sistemes Informàtics (Universitat Politèchnica de Catalunya), 1995.
[31] K. Mehlhorn and S. Näher. Algorithm design and software libraries: Recent developments in the leda project. In IFIP 12th World Computer Congress, volume 1, pages 493–505, 1992.
[32] X. Messeguer. Skip trees, an alternative data structure to skip lists in a concurrent approach. Informatique Theorique et Applications, 31(3):251–269, 1997.
[33] K. Mulmuley. Computational Geometry: An Introduction Through Randomized Algorithm, exercise 1.4.8. Prentice-Hall, 1994.
[34] I. Munro, T. Papadakis, and R. Sedgewick. Deterministic skip lists. In Proceedings of the third annual ACM-SIAM symposium on Discrete algorithms, pages 367–375, 1992.
[35] W. Pugh. Skip lists: a probabilistic alternative to balanced trees. In Communications of the ACM, volume 33(6), pages 668–676, 1990.
[36] N. Reingold, J. Westbrook, and D. D. Sleator. Randomized competitive algorithms for the list update problem. In Algorithmica, volume 11(1), pages 15–32, 1994.
[37]   M. Sherk. Self-adjusting k-ary search trees. In Proceed-
       ings of the Workshop on Algorithms and Data Struc-
       tures, volume 382, pages 381–392, 1989.
[38]   D. D. Sleator and R. E. Tarjan. Amortized efficiency
       of list update and paging rules. In Communications of
       the ACM, volume 28(2), pages 202–208, 1985.
[39]   D. D. Sleator and R. E. Tarjan. Self-adjusting binary
       search trees. J. ACM, 32(3):652–686, 1985.
[40]   R. Sundar. Twists, turns, cascades, deque
       conjecture, and scanning theorem. In Proceedings
       of the 13th Symposium on Foundations of Computer
       Science, pages 555–559, 1989.
[41]   R. Sundar. On the deque conjecture for the splay
       algorithm. In Combinatorica, volume 12, pages 95–
       124, 1992.
[42]   R. Tarjan. Sequential access in splay trees takes linear
       time. In Combinatorica, volume 5, pages 367–378,
       1985.
[43]   B. Teia. A lower bound for randomized list update
       algorithms. In Information Processing Letters, volume
       47(1), pages 5–9, 1993.
[44]   V. Ciriani, P. Ferragina, F. Luccio, and S. Muthukrishnan.
       Self-adjusting data structures for external memory
       string access. Technical report, TR-01-26 (Università
       Di Pisa), 2001.
[45]   R. Wilber. Lower bounds for accessing binary search
       trees with rotations. In SIAM journal on Computing,
       volume 18(1), pages 56–67, 1989.