Contention-Aware Lock Scheduling for Transactional Databases - VLDB Endowment

Page created by Brent Lloyd

Government & Politics

English

Like
Share
Embed
Fullscreen
Slides
Download HTML
Download PDF
Abuse

←

→

Page content transcription

If your browser does not render page correctly, please read the page content below

Contention-Aware Lock Scheduling for Transactional
                           Databases

            Boyu Tian                      Jiamin Huang                          Barzan Mozafari                    Grant Schoenebeck
                                                               University of Michigan
                                                                 Ann Arbor, MI, USA
                                                 {bytian, jiamin, mozafari, schoeneb}@umich.edu

ABSTRACT                                                                                   by solving the data contention problem. For example, be-
Lock managers are among the most studied components                                        fore a transaction accesses a database object, it has to ac-
in concurrency control and transactional systems. How-                                     quire the corresponding lock; if the transaction fails to get
ever, one question seems to have been generally overlooked:                                a lock immediately, it is blocked until the system grants it
“When there are multiple lock requests on the same object,                                 the lock. This poses a fundamental question: when multiple
which one(s) should be granted first?”                                                     transactions are waiting for a lock on the same object, which
   Nearly all existing systems rely on a FIFO (first in, first                             should be granted first when the object becomes available?
out) strategy to decide which transaction(s) to grant the                                  This question, which we call lock scheduling, has received
lock to. In this paper, however, we show that the lock                                     surprisingly little attention, despite the large body of work
scheduling choices have significant ramifications on the over-                             on concurrency control and locking protocols [15, 46, 7, 68,
all performance of a transactional system. Despite the large                               18, 40, 50, 54, 23]. In fact, almost all existing DBMSs1 rely
body of research on job scheduling outside the database con-                               on variants of the first-in, first-out (FIFO) strategy, which
text, lock scheduling presents subtle but challenging require-                             grants (all) compatible lock requests based on their arrival
ments that render existing results on scheduling inapt for                                 time in the queue [2, 3, 4, 5, 6]. In this paper, we carefully
a transactional database. By carefully studying this prob-                                 study the problem of lock scheduling and show that it has
lem, we present the concept of contention-aware scheduling,                                significant ramifications on overall performance of a DBMS.
show the hardness of the problem, and propose novel lock                                   Related Work — There is a long history of research on
scheduling algorithms (LDSF and bLDSF), which guarantee                                    scheduling in a general context [25, 42, 69, 70, 64, 41, 35,
a constant factor approximation of the best scheduling. We                                 62], whereby a set of jobs is to be scheduled on a set of pro-
conduct extensive experiments using a popular database on                                  cessors such that a goal function is minimized, e.g., the sum
both TPC-C and a microbenchmark. Compared to FIFO—                                         of (weighted) completion times [64, 41, 39] or the variance
the default scheduler in most database systems—our bLDSF                                   of the completion or wait times [14, 17, 76, 49, 28]. There is
algorithm yields up to 300x speedup in overall transaction                                 also work on scheduling in a real-time database context [75,
latency. Alternatively, our LDSF algorithm, which is sim-                                  40, 7, 36, 74], where the goal is to minimize the total tardi-
pler and achieves comparable performance to bLDSF, has                                     ness or the number of transactions missing their deadlines.
already been adopted by open-source community, and was                                        In this paper, we address the problem of lock scheduling in
chosen as the default scheduling strategy in MySQL 8.0.3+.                                 a transactional context, where jobs are transactions and pro-
PVLDB Reference Format:
                                                                                           cessors are locks, and the scheduling decision is about which
Boyu Tian, Jiamin Huang, Barzan Mozafari, Grant Schoenebeck.                               locks to grant to which transactions. However, our transac-
Contention-Aware Lock Scheduling for Transactional Databases.                              tional context makes this problem quite di↵erent than the
PVLDB, 11 (5): 648
                x-yyyy,
                   - 662, 2018.                                                            well-studied variants of the scheduling problem. First, un-
DOI: https://doi.org/10.1145/3177732.3177740                                               like generic scheduling problems, where at most one job can
                                                                                           be scheduled on each processor, a lock may be held in either
1.     INTRODUCTION                                                                        exclusive or shared modes. The fact that transactions can
                                                                                           sometimes share the same resources (i.e., shared locks) sig-
  Lock management forms the backbone of concurrency con-                                   nificantly complicates the problem (see Section 2.4). More-
trol in modern software, including many distributed systems                                over, once a lock is granted to a transaction, the same trans-
and transactional databases. A lock manager guarantees                                     action may later request another lock (as opposed to jobs
both correctness and efficiency of a concurrent application                                requesting all of their needed resources upfront). Finally,
Permission to make digital or hard copies of all or part of this work for                  in the scheduling literature, the execution time of each job
personal or classroom use is granted without fee provided that copies are                  is assumed to be known upon its arrival [52, 70, 14, 76],
not made or distributed for profit or commercial advantage and that copies                 whereas the execution time of a transaction is often unknown
bear this notice and the full citation on the first page. To copy otherwise, to
                                                                                           a priori.
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee. Articles from this volume were invited to present                    Although there are scheduling algorithms designed for
their results at The 44th International Conference on Very Large Data Bases,               real-time databases [55, 71, 77, 13], they are not applica-
August 2018, Rio de Janeiro, Brazil.                                                       1
Proceedings of the VLDB Endowment, Vol. 11, No. 5                                           The only exceptions are MySQL and MariaDB, which have
Copyright 2018 VLDB Endowment 2150-8097/18/1.                                              recently adopted our Variance-Aware Transaction Schedul-
DOI: https://doi.org/10.1145/3177732.3177740                                               ing (VATS) [44] (see Section 7).

                                                                                     648

ble in a general DBMS context. For example, real-time set- as a microbenchmark. Our results confirm that, com-
tings assume that each transaction comes with a deadline, pared to the commonly-used FIFO strategy, LDSF and
whereas most database workloads do not have explicit dead- bLDSF reduce mean transaction latencies by up to 300x
lines. Instead, most workloads wish to minimize latency or and 290x, respectively. They also increase throughput by
maximize throughput. up to 6.5x and 5.5x. As a result, LDSF (which is simpler
than bLDSF) has already been adopted as the default
Challenges — Several aspects of lock scheduling make it
scheduling algorithm in MySQL [1] as of 8.0.3+.
a uniquely challenging problem, particularly under the per-
formance considerations of a real-world DBMS.
1. An online problem. At the time of granting a lock 2. PROBLEM STATEMENT
to a transaction, we do not know when the lock will be In this section, we first describe our problem setting and
released, since the transaction’s execution time will only define dependency graphs. We then formally state the lock
be known once it is finished. scheduling problem.
2. Dependencies. In a DBMS, there are dependencies
among concurrent transactions when one is waiting for a
2.1 Background: Locking Protocols
lock held by another. In practice, these dependencies can Locks are the most commonly used mechanism for ensur-
be quite complex, as each transaction can hold locks on ing consistency when a set of shared objects are concurrently
several objects and several transactions can hold shared accessed by multiple transactions (or applications). In a
locks on the same object. locking system, there are two main types of locks: shared
locks and exclusive locks. Before a transaction can read
3. Non-uniform access patterns. Not all objects in the
an object (e.g., a row), it must first acquire a shared lock
database are equally popular. Also, di↵erent transaction
(a.k.a. read lock) on that object. Likewise, before a trans-
types might each have a di↵erent access pattern.
action can write to or update an object, it must acquire an
4. Multiple locking modes. The possibility of granting exclusive lock (a.k.a. write lock) on that object. A shared
a lock to one writer exclusively or to multiple readers is lock can be granted on an object as long as no exclusive
a source of great complexity (see Section 2.4). locks are currently held on that object. However, an exclu-
Contributions — In this paper, to the best of our knowl- sive lock on an object can be granted only if there are no
edge, we present the first formal study of lock scheduling other locks currently held on that object. We focus on the
problem with a goal of minimizing transaction latencies in strict 2-phase locking (strict 2PL) protocol: once a lock is
a DBMS context. Furthermore, we propose a contention- granted to a transaction, it is held until that transaction
aware transaction scheduling algorithm, which captures the ends. Once a transaction finishes execution (i.e., it commits
contention and the dependencies among concurrent transac- or gets aborted), it releases all of its locks.
tions. The key insight is that a transaction blocking many 2.2 Dependency Graph
others should be scheduled earlier. We carefully study the
difficulty of lock scheduling and the optimality of our algo- Given the set T of transactions currently in the system,
rithm. Most importantly, we show that our results are not and the set O of objects in the database, we define the
merely theoretical, but lead to dramatic speedups in a real- dependency graph of the system as an edge-labeled graph
world DBMS. Despite decades of research on all aspects of G = (V, E, L). The vertices of the graph V = T [ O con-
transaction processing, lock scheduling seems to have gone sist of the current transactions and objects. The edges of
unnoticed, to the extent that nearly all DBMSs still use the graph E ✓ T ⇥ O [ O ⇥ T describe the locking relation-
FIFO. Our ultimate hope is that our results draw attention ships among the objects and transactions. Specifically, for
to the importance of lock scheduling on the overall perfor- transaction t 2 T and object o 2 O,
mance of a transactional system. • (t, o) 2 E if t is waiting for a lock on o;
In summary, we make the following contributions: • (o, t) 2 E if t already holds a lock on o.
1. We propose a contention-aware lock scheduling algorithm, The label L : E ! {S, X} indicates the lock type:
called Largest-Dependency-Set-First (LDSF). We prove • L(t, o) = X if t is waiting for an exclusive lock on o;
that, in the absence of shared locks, LDSF is optimal in • L(t, o) = S if t is waiting for a shared lock on o;
terms of the expected mean latency (Theorem 2). With
• L(o, t) = X if t already holds an exclusive lock on o;
shared locks, we prove that LDSF is a constant factor
approximation of the optimal scheduling under certain • L(o, t) = S if t already holds a shared lock on o.
regularity constraints (Theorem 3). We assume that deadlocks are rare and are handled by an
2. We propose the idea of granting only some of the shared external process (e.g., a deadlock detection and resolution
lock requests on an object (as opposed to granting them module). Thus, for simplicity, we assume that the depen-
all ). We study the difficulty of the scheduling prob- dency graph G is always a directed acyclic graph (DAG).
lem under this setting (Theorem 5), and propose another
algorithm, called bLDSF (batched Largest-Dependency-
2.3 Lock Scheduling
Set-First), which improves upon LDSF in this setting. A lock scheduler makes decisions about which transactions
We prove that bLDSF is also a constant factor approxi- are granted locks upon one or both of the following events:
mation of the optimal scheduling (Theorem 6). (i) when a transaction requests a lock, and (ii) when a lock
3. In addition to our theoretical analysis, we use a real-world is released by a transaction.2 Let G be the set of all possible
DBMS and extensive experiments to empirically evalu- 2
These are the only situations in which the dependency
ate our algorithms on the TPC-C benchmark, as well graph changes. If a scheduler grants locks at other times,
the same decision could have been made upon the previous

649

Notation Description Theorem 1. Given a dependency graph G, when a trans-
T the set of transactions in the system action t releases a lock (S or X) on object o, it is NP-hard to
O the set of objects in the database determine which pending lock requests to grant, in order to
minimize the expected transaction latency. The result holds
G the dependency graph of the system even if all transactions have the same execution time, and
V vertices in the dependency graph no transaction requests additional locks in the future.3
E edges in the dependency graph
L labels of the edges indicating the lock type Given the NP-hardness of the problem in general, in the
rest of this paper, we propose algorithms that guarantee a
A a scheduling algorithm constant-factor approximation of the optimal scheduling in
lA (t) the latency of transaction t under A terms of the expected transaction latency.
¯
lA (t) the expectation of lA (t)
¯
l(A) the expected transaction latency under A 3. CONTENTION-AWARE SCHEDULING
Table 1: Table of Notations. We define contention-aware scheduling as any algorithm
that prioritizes transactions based on their impact on the
dependency graphs of the system. A scheduling algorithm overall contention in the system. First, we study several
A = (Areq , Arel ) is a pair of functions Areq , Arel : G ⇥ O ⇥ heuristics for comparing the contribution of di↵erent trans-
T ⇥ {S, X} ! 2T . For example, when transaction t requests actions to the overall contention, and illustrate their short-
an exclusive lock on object o, Areq (G, o, t, X) determines comings through intuitive examples. We then propose a par-
which of the transactions currently waiting for a lock on o ticular contention-aware scheduling that formally quantifies
(including t itself) should be granted their requested lock this contribution, and guarantees a constant-factor approxi-
on o, given the dependency graph G of the system. (Note mation of the optimal scheduling when shared locks are not
that the types of locks requested by transactions other than held by too many transactions. (Later, in Section 4, we gen-
t are captured in G.) Likewise, when transaction t releases a eralize this algorithm for situations where this assumption
shared lock on object o, Arel (G, o, t, S) determines which of does not hold.)
the transactions currently waiting for a lock on o should be
granted their requested lock, given the dependency graph G. 3.1 Capturing Contention
When all transactions holding a lock on an object o release The degree of contention in a database system is directly
the lock, we say that o has become available. When the related to the number of transactions concurrently request-
lock request of a transaction t is granted, we say that t is ing conflicting locks on the same objects.
scheduled. For example, a transaction holding an exclusive lock on a
Since the execution time of each transaction is typically popular object will naturally block many other transactions
unknown in advance, we model their execution time using requesting a lock on that same object. If such a transaction
a random variable with expectation R. Given a particular is itself blocked (e.g., waiting for a lock on a di↵erent object),
scheduling algorithm A, we define the latency of a transac- it will negatively a↵ect the latency of many transactions, in-
tion t, denoted by lA (t), as its execution time plus the total creasing overall contention in the system. Thus, our goal in
time it has been blocked waiting for various locks. Since contention-aware scheduling is to determine which transac-
lA (t) is a random variable, we denote its expectation as tions have a more important role in reducing the overall
¯
lA (t). We use ¯l(A) to denote the expected transaction la- contention in the system, so that they can be given higher
tency under algorithm A, which is defined as the average of priority when granting a lock. Next, we discuss heuristics
the expected for measuring the priority of a transaction in reducing the
¯ P latencies of all transactions in the system, i.e.,
l(A) = |T1 | t2T ¯lA (t). overall contention.
Our goal is to find a lock scheduling algorithm under Number of locks held — The simplest criterion for pri-
which the expected transaction latency is minimized. To oritizing transactions is the number of locks they currently
ensure consistency and isolation, in most database systems hold. We refer to this heuristic as Most Locks First (MLF).
Areq simply grants a lock to the requesting transaction only The intuition is that a transaction with more locks is more
when (i) no lock is held on the object, or (ii) the currently likely to block other transactions in the system. However,
held lock and the requested lock are compatible and no this approach does not account for the popularity of objects
transaction in the queue has an incompatible lock request. in the system. In other words, a transaction might be hold-
This choice of Areq also ensures that transactions requesting ing many locks but on unpopular objects, which are unlikely
exclusive locks are not starved. The key challenge in lock to be requested by other transactions. Prioritizing such a
scheduling, then, is choosing an Arel such that the expected transaction will not necessarily reduce contention in the sys-
transaction latency is minimized. tem. Figure 1 demonstrates an example where transaction
t1 holds the most number of locks, but on unpopular ob-
2.4 NP-Hardness jects. It is therefore better to keep t1 waiting and instead
Minimizing the expected transaction latency under the schedule t2 first, which holds fewer but more popular locks.
scheduling algorithm is, in general, an NP-hard problem.
Intuitively, the hardness is due to the presence of shared Number of locks that block other transactions — An
locks, which cause the system’s dependency graph to be a improvement over the previous criterion is to only count
DAG, but not necessarily a tree. those locks that have at least one transaction waiting on
them. This approach disregards transactions that hold many
event, i.e., a transaction was unnecessarily blocked. A lock
3
scheduler is thus an event-driven scheduler. All missing proofs can be found in our technical report [72].

650

o1                  Object                                                   o1
                                                                                            X             X                   Object
                X             X             Transaction                               t1                        t2            Transaction
      t1                      t2                Transaction holds               1
                                                                                                                                  Transaction holds
                                                a lock on object                                                                  a lock on object
                                                Transaction waits                                                                 Transaction waits
                                                for a lock on object                                                              for a lock on object
                                                                                2

Figure 1: Transaction t1 holds the greatest number of
                                                                                3
locks, but many of them on unpopular objects.
                                                                               Figure 3: Transaction t1 has a deeper dependency sub-
                             o1                                                graph, but granting the lock to t2 will unblock more trans-
                    X              X          Object                           actions which can run concurrently.
                                              Transaction
           t1                          t2         Transaction holds
                                                  a lock on object                Later, in Section 6.4, we empirically evaluate these heuris-
                                                  Transaction waits
                                                  for a lock on object
                                                                               tics. While none of these heuristics alone are able to guaran-
           t3                                                                  tee an optimal lock scheduling strategy, they o↵er valuable
                                                                               insight in understanding the relationship between scheduling
                                                                               and overall contention. In particular, the first two heuris-
                                                                               tics focus on what we call horizontal contention, whereby a
Figure 2: Transaction t2 holds two locks that are waited                       transaction holds locks on many objects directly needed by
on by other transactions. Although only one of t1 ’s locks                     other transactions. In contrast, the third heuristic focuses
is blocking other transactions, the blocked transaction (i.e.,                 on reducing vertical contention, whereby a chain of depen-
t3 ) is itself blocking three others.                                          dencies causes a series of transactions to block each other.
                                                                               Next, we present an algorithm which is capable of resolving
                                                                               both horizontal and vertical aspects of contention.
locks, but on these locks no other transactions are waiting.
We call this heuristic Most Blocking Locks First (MBLF).                       3.2     Largest-Dependency-Set-First
The issue with this criterion is that it treats all blocked                        In this section, we propose an algorithm, called Largest-
transactions as the same, even if they contribute unequally                    Dependency-Set-First (LDSF), which provides formal guar-
to the overall contention. Figure 2 shows an example in                        antees on the expected mean latency.
which the scheduler must decide between transactions t1                            Consider two transactions t1 and t2 in the system. If
and t2 when the object o1 becomes available. Here, this cri-                   there is a path from t1 to t2 in the dependency graph, we
terion would choose t2 , which currently holds two locks, each                 say that t1 is dependent on t2 (i.e., t1 depends on t2 ’s com-
at least blocking one other transaction. However, although                     pletion/abortion for at least one of its required locks). We
t1 holds only one blocking lock, it is blocking t3 which itself                define the dependency set of t, denoted by g(t), as the set of
is blocking three other transactions. Thus, by scheduling t2                   all transactions that are dependent on t (i.e., the set of trans-
first, t3 and its three subsequent transactions will remain                    actions in t’s dependency subgraph). Our LDSF algorithm
blocked in the system for a longer period of time than if t1                   uses the size of the dependency sets of di↵erent transactions
had been scheduled first.                                                      to decide which one(s) to schedule first. For example, in
Depth of the dependency subgraph — A more sophis-                              Figure 4, there are five transactions in the dependency set
ticated criterion is the depth of a transaction’s dependency                   of transaction t1 (including t1 itself) while there are four
subgraph. For a transaction t, this is defined as the subgraph                 transactions in t2 ’s dependency set. Thus, in a situation
of the dependency graph comprised of all vertices that can                     where both t1 and t2 have requested an exclusive lock on
reach t (and all edges between such vertices). The depth                       object o1 , LDSF grants the lock to t1 (instead of t2 ) as soon
of t’s dependency subgraph is characterized by the number                      as o1 becomes available.
of transactions on the longest path in the subgraph that                           Now, we can formally present our LDSF algorithm. Sup-
ends in t. We refer to this heuristic as Deepest Dependency                    pose an object o becomes available (i.e., all previous locks on
First (DDF). Figure 3 shows an example, where the depth                        o are released), and there are m + n transactions currently
of the dependency subgraph of transaction t1 is 3, while                       waiting for a lock on o: m transactions ti1 , ti2 , · · · , tim are
that of transaction t2 is only 2. Thus, based on this crite-                   requesting a shared lock o, and n transactions tx1 , tx2 , · · · , txn
rion, the exclusive lock on object o1 should be granted to                     are requesting an exclusive lock on object o. Our LDSF al-
t1 . The idea behind this heuristic is that a longer path indi-                gorithm defines the priority of each transaction txi requesting
cates a larger number of transactions sequentially blocked.                    an exclusive lock as the size of its dependency set, |g(txi )|.
Thus, to unblock such transactions sooner, the scheduling                      However, LDSF treats all transactions requesting a shared
algorithm must start with a transaction with deeper depen-                     lock on o, namely ti1 , ti2 , · · · , tim , as a single transaction—
dency graph. However, considering only the depth of this                       if LDSF decides to grant a shared lock, it will be granted
subgraph can limit the overall degree of concurrency in the                    to all of them. The priority of the shared lock requests is
system. For example, in Figure 3, if the exclusive lock on                     thus defined
                                                                                       Sm      as the size of the union of their dependency
o1 is granted to t1 , upon its completion only one transac-                    sets,     i=1 g(t i
                                                                                                 i ) . LDSF then finds the transaction t with
                                                                                                                                             ˆx
tion in its dependency subgraph will be unblocked. On the                      the highest priority among t1 , t2 , · · · , tn . If tˆ ’s priority is
                                                                                                                   x x          x      x

other hand, if the lock is granted to t2 , upon its completion                 higher than the collective priority of the transactions re-
two other transactions in its dependency subgraph will be                      questing a shared lock, LDSF grants the exclusive lock to
unblocked, which can run concurrently.                                         tˆx . Otherwise, a shared lock is granted to all transactions

                                                                         651

o1                                                                                            t2      t3        t4
                   X            X                   Object                                    Object
                                                    Transaction                               Transaction
              t1                       t2               Transaction holds                         Transaction holds
                                                                                                                                o1              o2
                                                        a lock on object
                                                                                                  a lock on object
                                                        Transaction waits
                                                                                                  Transaction waits              t5             t6
                                                        for a lock on object
                                                                                                  for a lock on object
                                                                                                                                           o3
                                                                                                                                           t1
Figure 4: Lock scheduling based on the size of the depen-                             Figure 5: The critical objects of t1 are o1 and o2 , as they
dency sets.                                                                           are locked by transactions t2 and t3 . Note that, although
                                                                                      o3 is reachable from t1 , it is not a critical object of t1 since
     Input : The dependency graph of the system G = (V, E, L),
             transaction t, object o, label L 2 {X, S}                                it is locked by transactions that are not currently running,
             // meaning t has just released a lock of type L on o                     i.e., t5 and t6 which themselves are waiting for other locks.
     Output: The set of transactions whose requested lock on o
             should be granted
                                                                                         The intuition here is that if a transaction t1 is dependent
 1   if there are other transactions still holding a lock on o then
                                                                                      on t2 , any progress in the execution of t2 can also be consid-
 2         return ;;
 3   Obtain the set of transactions waiting for a shared lock on o,                   ered as t1 ’s progress since t1 cannot receive its lock unless t2
       Ti         {ti 2 V : (ti , o) 2 E and L(ti , o) = S} =                         finishes execution. Thus, by granting the lock to the trans-
       {ti1 , ti2 , · · · , tim };                                                    action with the largest dependency set, LDSF allows the
 4   Obtain the set of transactions waiting for an exclusive lock                     most transactions to make progress toward completion.
       on o, T x             {tx 2 V : (tx , o) 2 E and L(tx , o) = X} =                 However, this does not necessarily hold true with the ex-
       {tx  ,
          1 2 t x , · · · , tx };
                              n                                                       istence of shared locks. Even if transaction t1 is dependent
                             S  n
 5   Let ⌧ (T i ) =                    i
                                i=1 g(ti ) ;                                          on t2 , the execution of t2 does not necessarily contribute
 6   Find a transaction t 2 W s.t. |g(tˆx )| = maxtxi 2T x g(tx
                                     ˆ
                                     x
                                                                         i) ;         to t1 ’s progress. Specifically, consider the set of all objects
 7   if ⌧ (T i ) < g(tˆx ) then                                                       that are reachable from t1 in the dependency graph, but are
 8         return T i ;                                                               locked (shared or exclusively) by currently running transac-
 9   else                                                                             tions. We call these objects the critical objects of t1 , and
10         return {tˆx };                                                             denote them as C(t1 ).5 For example, in Figure 5, we have
     Algorithm 1: Largest-Dependency-Set-First Algorithm                              C(t1 ) = {o1 , o2 }. Note that not all transactions that hold
                                                                                      a lock on a critical object of t1 contribute to t1 ’s progress.
                                                                                      Rather, only the transaction that releases the last lock on
ti1 , ti2 , · · · , tim . The pseudo-code of the LDSF algorithm is                    that critical object allows for the progress of t1 . In the ex-
provided in Algorithm 1.                                                              ample of Figure 5, t2 ’s execution does not contribute to t1 ’s
                                                                                      progress, unless t3 releases the lock before t2 .
Analysis — We do not make any assumptions about the fu-                                  Nonetheless, when the number of transactions waiting for
ture behavior of a transaction, as they may request various                           each shared lock is bounded, LDSF is a constant-factor ap-
locks throughout their lifetime. Furthermore, since we can-                           proximation of the optimal scheduler.
not predict new transactions arriving in the future, in our
analysis, we only consider the transactions that are already                             Theorem 3. Let the maximum number of critical objects
in the system. Since the system does not know the execution                           for any transaction in the system be c. Assume that the
time of a transaction a priori, we model the execution time                           number of transactions waiting for a shared lock on the same
of each transaction as a memoryless random variable. That                             object is bounded by u. The LDSF algorithm is a (c · u)-
is, the time a transaction has already spent in execution                             approximation of the optimal scheduling (among strategies
does not necessarily reveal any information about the trans-                          that grant all shared locks simultaneously) in terms of the
action’s remaining execution time. We denote the remaining                            expected latency.
execution time as a random variable R with expectation R̄.
We also assume that the execution time of a transaction is                            4.    SPLITTING SHARED LOCKS
not a↵ected by the scheduling.4 Transactions whose behav-
ior depends on the actual wall-clock time (e.g., stop if run                             In the LDSF algorithm, when a shared lock is granted, it is
before 2pm, otherwise run for a long time) are also excluded                          granted to all transactions waiting for it. In Section 4.1, we
from our discussion.                                                                  show why this may not be the best strategy. Then, in Sec-
   We first study a simplified scenario in which there are                            tion 4.2, we propose a modification to our LDSF algorithm,
only exclusive locks in the system (we relax this assumption                          called bLDSF, which improves upon LDSF by exploiting the
in Theorem 3). The following theorem states that LDSF                                 idea of not granting all shared locks simultaneously.
minimizes the expected latency in this scenario.
                                                                                      4.1    The Benefits and Challenges
                                                                                        As noted earlier, when the LDSF algorithm grants a shared
   Theorem 2. When there are only exclusive locks in the                              lock, it grants the lock to all transactions waiting for it.
system, the LDSF algorithm is the optimal scheduling algo-                            However, this may not be the optimal strategy. In general,
rithm in terms of the expected latency.                                               granting a larger number of shared locks on the same object
4
                                                                                      increases the probability that at least one of them will take
 For example, scheduling causes context switches, which
                                                                                      5
may a↵ect performance. For simplicity, in our formal anal-                              Note that the critical objects of a transaction may change
ysis, we assume that their overall e↵ect is not significant.                          throughout its lifetime.

                                                                                652

a long time before releasing the lock. Until the last transac- Object
tion completes and releases its lock, no exclusive locks can S S S X Transaction
be granted on that object. In other words, the expected du- Transaction waits
t1 t2 t3 t4 for a lock on obect
ration that the slowest transaction holds a shared lock grows
x Dependency set of size x
with the number of transactions sharing the lock. This is the 6 1 1 5
well-known problem of stragglers [21, 27, 31, 63, 79], which is
exacerbated as the number of independent processes grows. Figure 6: Assume that f (2) = 1.5 and f (3) = 2. If we first
To illustrate this more formally, consider the following grant a shared lock to all of t1 , t2 , and t3 , all transactions in
example. Suppose that a set of m transactions, t1 , · · · , tm , t4 ’s dependency set will wait for at least 2R̄. The total wait
are sharing a shared lock. Let R1rem , R2rem , · · · , Rmrem
be a time will be 10R̄. However, if we only grant t1 ’s lock, then
set of random variables representing the remaining times t4 ’s lock, and then grant t2 ’s and t3 ’s locks together, the
of these transactions. Then, the time needed before an transactions in t4 ’s dependency set will only wait R̄, while
exclusive lock can be granted on the same object is the those in t2 ’s and t3 ’s dependency sets will wait 2R̄. Thus,
remaining time of the the slowest transaction, denoted as the total wait time in this case will be only 9R̄.
rem
Rmax,m = max{R1rem , · · · , Rm
rem
}, which itself is a random
rem rem
variable. Let R̄max,m be the expectation of Rmax,m . As long According to this theorem, any algorithm that does not
as the Rirem ’s have non-zero variance6 (i.e., i2 > 0), R̄max,m
rem
rely on knowing the delay factor is not competitive: it per-
strictly increases with m, as stated next. forms arbitrarily poor, compared to the optimal scheduling.
Lemma 4. Suppose that R1rem , R2rem , · · · are random vari- Thus, in the next section, we take the delay factor f (k) as
ables with the same range of values. If k+1 2
> 0, then an input, and propose an algorithm that adopts the idea of
rem
R̄max,k rem
p, the system will make faster progress if the batch
wait for an exclusive lock, as illustrated in Figure 6. How-
of tî1 , tî2 , · · · , tîk is scheduled first, in which case bLDSF will
ever, lock scheduling in this situation becomes extremely
difficult. We have the following negative result. grant shared locks to tî1 , tî2 , · · · , tîk simultaneously. When
q = p, the speed of progress in the system will be the same
Theorem 5. Let A¬f be the set of scheduling algorithms under both scheduling decisions. In this case, bLDSF grants
that do not use the knowledge of the delay factor f (k) in shared locks to the batch, in order to increase the overall
their decisions. For any algorithm A¬f 2 A¬f , there exists degree of concurrency in the system. The pseudocode for
w̄(A¬f ) bLDSF is provided in Algorithm 2.
an algorithm A, such that = !(1) for some delay
w̄(A) We show that, when the number of transactions waiting
factor f (k). for shared locks on the same object is bounded, the bLDSF
6 algorithm is a constant factor approximation of the optimal
This assumption holds unless all instances of a transaction
type take exactly the same time, which is unlikely. scheduling algorithm in terms of the expected wait time.

653

Input : The dependency graph of the system G = (V, E, L),
                                                                                                       5   t1               Object
             transaction t, object o, label L 2 {X, S}
             // meaning t has just released a lock of type L on o                                                       x   Transaction with approximate
     Output: The set of transactions whose requested lock on o                                                              dependency set size x
             should be granted                                                                                                  Transaction holds
                                                                                            t2   2              2 t3
 1   if there are other transactions still holding a lock on o then                                                             a lock on object
 2         return ;;                                                                                                            Transaction waits
 3   Obtain the set of transactions waiting for a shared lock on o,                                                             for a lock on object
       Ti         {ti 2 V : (ti , o) 2 E and L(ti , o) = S} =                                          1   t4
       {ti1 , ti2 , · · · , tim };
 4   Obtain the set of transactions waiting for an exclusive lock                       Figure 7: The e↵ective size of t1 ’s dependency set is 5. But
       on o, T x             {tx 2 V : (tx , o) 2 E and L(tx , o) = X} =                its exact size is only 4.
       {tx  ,
          1 2 t x , · · · , tx };
                              n
 5   Let tˆi1 , tˆi2 , · · · , tˆik be the set of transactions in T i such that         a transaction performing a table scan will request a large
        S                                                                               number of locks, and will not make any progress until all of
       | ki=1 g(tˆii )|
                               is maximized ;                                           its locks can be granted. Thus, knowing a transaction’s lock
              f (k)
            ˆ                                                                           pattern in advance would also be beneficial. We leave such
 6   Let t be the transaction in T x with the largest dependency
             x
       set;                                                                             extensions of our algorithms (e.g., to hybrid workloads [60])
                                     Sk       ˆi                                        to future work.
 7   if g(tˆx ) · f (k)               i=1 g(t ) then
                                                 i
         return {tˆi1 , tˆi2 , · · · , tˆik };
 8

 9   else
                                                                                        5.       IMPLEMENTATION
10       return tˆx ;                                                                      We have implemented our scheduling algorithm in MySQL.
                 Algorithm 2: The bLDSF Algorithm                                       Similar to all major DBMSs, the default lock scheduling pol-
                                                                                        icy in MySQL was FIFO.7 Specifically, all pending lock re-
                                                                                        quests on an object are placed in a queue. A lock request
   Theorem 6. Let the maximum number of critical objects                                is granted immediately upon its arrival only if one of these
for any transaction in the system be c. Assume that the                                 two conditions holds: (i) there are no other locks currently
number of transactions waiting for shared locks on the same                             held on the object, or (ii) the requested lock type is com-
object is bounded by v. Then, given a delay factor of f (k),                            patible with all of the locks currently held on the object and
the bLDSF algorithm is an h-approximation of the opti-                                  there are no incompatible requests ahead of it waiting in the
mal scheduling algorithm in terms of the expected wait time,                            queue. Similarly, whenever a lock is released on an object,
where h = cv 2 · f (v).                                                                 MySQL’s scheduler scans the entire queue from beginning
                                                                                        to the end. It grants any waiting requests as long as one
   Unlike the LDSF algorithm, bLDSF requires a delay fac-                               of these conditions holds. As soon as the scheduler encoun-
tor for its analysis. However, since the remaining times of                             ters the first lock request that cannot be granted, it stops
transactions can be modeled as random variables, the exact                              scanning the rest of the queue.
form of the delay factor f (k) will also depend on the distribu-                           One challenge in implementing LDSF and bLDSF is keep-
tion of these random variables. For example, the delay fac-                             ing track of the sizes of the dependency sets. Exact calcu-
tor for exponential random variables is f (k) = O(log k) [22],                          lation would require either (i) searching down the reverse
for geometric random variables is f (k) p   = O(log k) [29], for                        edges in the dependency graph in real-time, whenever a
Gaussian random variables is f (k) = O( log k) [47], andpfor                            scheduling decision is to be made, or (ii) storing the depen-
power law random variables with exponent 3 is f (k) = k.                                dency sets for all transactions and maintaining them each
  In Section 6.7, we empirically show that bLDSF’s per-                                 time a transaction is blocked or a lock is granted. Both op-
formance is not sensitive to the specific choice of the de-                             tions are relatively costly. Therefore, in our implementation,
lay factor, as long as it is a sub-linear function that grows                           we rely on an approximation of the sizes of the dependency
monotonically with k (conditions C1, C2, and C3 from Sec-                               sets, rather than computing their exact values. When a
tion 4.1). This is because, when the batch size is small,                               transaction t holds no locks that block other transactions,
the di↵erence between all sub-linear
                                p        functions is also small.                       |g(t)| = 1. Otherwise, let Tt be the set of transactions wait-
For example, when b = 10, b ⇡ 3.16 and log2 (1 + b) ⇡                                   ing for an
3.46, leading to similar scheduling decisions. Even though                                           P object currently held by transaction t. Then,
p                                                                                       |g(t)| ⇡ t0 2Tt |g(t0 )| + 1. The reason this method is only
   log2 (1 + b) ⇡ 1.86 is smaller than the other two, it can                            an approximation of |g(t)| is that the dependency graph is a
still capture condition C2 quite well.                                                  DAG (but not necessarily a tree), which means the depen-
4.3         Discussion                                                                  dency sets of di↵erent transactions may overlap. Figure 7
                                                                                        illustrates an example, where the dependency set of t1 is
   In our analysis, we have assumed no additional informa-                              {t1 , t2 , t3 , t4 } and is therefore of size 4. However, its e↵ec-
tion regarding a transaction’s remaining execution time, or                             tive size is calculated as one plus the sum of the e↵ective
its lock access pattern. However, with the recent progress                              sizes of t2 and t3 ’s dependency sets, resulting in 5. To en-
on incorporating machine learning models into DBMS tech-                                sure that transactions appearing on multiple paths will not
nology [58, 11], one might be able to predict transaction                               be updated multiple times, we also keep track of those that
latencies [78, 59] in the near future. When such information                            have already been updated.
is available, a lock scheduling algorithm could take that into                             Another implementation challenge lies in the difficulty of
account when maximizing the speed of progress: a transac-                               finding the desired batch of transactions in bLDSF. Cal-
tion that will take longer should be given less priority. The                           culating the size of the union of several dependency sets
priority of a transaction would then be the size of its depen-
                                                                                        7
dency set divided by its estimated execution time. Likewise,                                Now, our LDSF algorithm is the default (MySQL 8.0.3+).

                                                                                  654

requires detailed information about the elements in each de- 6. EXPERIMENTS
pendency set (since the dependency sets may overlap due to Our experiments aim to answer several key questions:
shared locks). Therefore, we rely on the following approxi-
• How do our scheduling algorithms (LDSF and bLDSF)
mation in our implementation. We first sort all transactions
a↵ect the overall throughput of the system?
waiting for a shared lock in the decreasing order of their de-
pendency set sizes. Then, for k = 1, 2, · · · , we calculate the • How do our algorithms compare against FIFO (the
q value (see Section 4.2) for the first k transactions. Here, default policy in nearly all databases) and VATS (re-
we approximate the size of the union of the dependency sets cently adopted by MySQL), in terms of reducing av-
as the sum of their individual sizes. Let k⇤ be the k value erage and tail transaction latencies?
that maximizes q. We then take the first k⇤ transactions as • How do our scheduling algorithms compare against
our batch, which we consider for granting a shared lock to. various heuristics?
In Section 6, we show that, despite using these approxi- • How much overhead do our algorithms incur, com-
mations in our implementation, our algorithms remain quite pared to the latency of a transaction?
e↵ective in practice. • How does the e↵ectiveness of our algorithms vary with
Starvation Avoidance — In MySQL’s implementation of di↵erent levels of contention?
FIFO, when there is an exclusive lock request in the queue, • What is the impact of the choice of delay factor on the
it serves as a conceptual barrier: later requests for shared e↵ectiveness of bLDSF?
locks cannot be granted, even if they are compatible with the • What is the impact of approximating the dependency
currently held locks on the object. This mechanism prevents sets (Section 5) on reducing the overhead?
starvation when using FIFO. In our algorithms, starvation In summary, our experiments show the following:
is prevented using a similar mechanism. We place a barrier 1. By resolving contention much more e↵ectively than FIFO
at the end of the current wait queue. Lock requests that and VATS, bLDSF improves throughput by up to 6.5x
arrive later are placed behind this barrier and are not con- (by 4.5x on average) over FIFO, and by up to 2x (1.5x
sidered for scheduling. In other words, the only requests on average) over VATS. (Section 6.2)
that are considered are those that are ahead of the barrier.
Once all such requests are granted, this barrier is lifted, and 2. bLDSF can reduce mean transaction latencies by up to
a new barrier is added to the end of the current queue, i.e., 300x and 80x (30x and 3.5x, on average) compared to
those requests that were previously behind a barrier are now FIFO and VATS, respectively. It also reduces the 99th
ahead of one. This mechanism prevents a transaction with percentile latency by up to 190x and 16x, compared to
a small dependency set from waiting indefinitely behind an FIFO and VATS, respectively. (Section 6.3)
infinite stream of newly arrived transactions with larger de- 3. Both bLDSF and LDSF outperform various heuristics by
pendency sets. An alternative strategy to avoid starvation 2.5x in terms of throughput, and by up to 100x (8x on
is to simply add a fraction of the transaction’s age to its avg.) in terms of transaction latency. (Section 6.4)
dependency set size when making scheduling decisions. A 4. Our algorithms reduce queue length by reducing con-
third strategy is to replace a transaction’s dependency set tention, and thus incur much less overhead than FIFO.
size with a sufficiently large number once its wait time has However, their overhead is larger than VATS. (Section 6.5)
exceeded a certain timeout threshold. 5. As the degree of contention rises in the system, bLDSF’s
Space Complexity — Given the approximation methods improvement over both FIFO and VATS increases. (Sec-
mentioned above, both LDSF and bLDSF only require main- tion 6.6)
taining the approximate size of the dependency set of each 6. bLDSF is not sensitive to the specific choice of delay fac-
transaction. Therefore, the overall space overhead of our tor, as long as it is chosen to be an increasing and sub-
algorithms is only O(|T |). linear function. (Section 6.7)
7. Our approximation technique reduces scheduling over-
Time Complexity — In MySQL, all lock requests on an
head by up to 80x.
object (either granted or not) are stored in a linked list.
Whenever a transaction releases a lock on the object, the
scheduler scans this list for requests that are not granted
6.1 Experimental Setup
yet. For each of these requests, the scheduler scans the Hardware & Software — All experiments were performed
list again to check compatibility with granted requests. If using a 5 GB bu↵er pool on a Linux server with 16 Intel(R)
the request is found compatible with all existing locks, it is Xeon(R) CPU E5-2450 processors and 2.10GHz cores. The
granted, and the scheduler checks the compatibility of the clients were run on a separate machine, submitting transac-
next request. Otherwise, the request is not granted, and the tions to MySQL 5.7 running on the server.
scheduler stops granting further locks. Let N be the num-
ber of lock requests on an object (either granted or not). Methodology — We used the OLTP-Bench tool [24] to run
Then, FIFO takes O(N 2 ) time in the worst case. LDSF the TPC-C workload. We also modified this tool to run a mi-
and bLDSF both use the same procedure as FIFO to find crobenchmark (explained below). OLTP-Bench generated
compatible requests that are not granted yet, which takes transactions at a specified rate, and client threads issued
O(N 2 ) time. For bLDSF, we also sort all transactions wait- these transactions to MySQL. The latency of each transac-
ing for a shared lock by the size of their dependency sets, tion was calculated as the time from when it was issued until
which takes O(N log N ) time. Thus, the time complexity of it finished. In all experiments, we controlled the number of
LDSF and bLDSF is still O(N 2 ). transactions issued per second within a safe range to prevent
MySQL from falling into a thrashing regime. We also no-
ticed that the number of deadlocks was negligible compared

655

Figure 8: Throughput improvement Figure 9: Avg. latency improvement Figure 10: Tail latency improvement
with bLDSF (TPC-C). with bLDSF (under the same TPC-C w/ bLDSF (under the same number of
transactions per second). TPC-C transactions per second).

to the total number of transactions, across all experiments tion that holds the most locks which block at least one
and algorithms. other transaction (introduced in Section 3.1).
TPC-C Workload — We used a 32-warehouse configura- 6. Deepest Dependency First (DDF). When an object
tion for the TPC-C benchmark. To simulate a system with becomes available, grant a lock on it to the transaction
di↵erent levels of contention, we relied on changing the fol- with the deepest dependency subgraph (Section 3.1).
lowing two parameters: (i) number of clients, and (ii) num- For MLF, MBLF, and DDF, if a shared lock is granted,
ber of submitted transactions per second (a.k.a. through- all shared locks on that object are granted. For LDSF and
put). Each of our client threads issued a new transaction as bLDSF, we use the barriers explained in Section 5 to prevent
soon as its previous transaction finished. Thus, by creating starvation. For FIFO and VATS, if a shared lock is granted,
a specified number of client threads, we e↵ectively controlled they continue to grant shared locks to other transactions
the number of in-flight transactions. To control the system waiting in the queue until they encounter an exclusive lock,
throughput, we created client threads that issued transac- at which point they stop granting more locks.
tions at a specific rate.
Microbenchmark — We created a microbenchmark for a 6.2 Throughput
more thorough evaluation of our algorithm under di↵erent We compared the system throughput when using FIFO
degrees of contention. Specifically, we created a database and VATS versus bLDSF, given an equal number of clients
with only one table that had 20,000 records in it. The clients (i.e., in-flight transactions). We varied the number of clients
would send transactions to the server, each comprised of 5 from 100 to 900. The results of this experiment for TPC-C
queries. Each query was randomly chosen to be either a “SE- are presented in Figure 8.
LECT” query (acquiring a shared lock) or an “UPDATE” In both cases, the throughput dropped as the number of
query (acquiring an exclusive lock). The records in the table clients increased. This is expected, as more transactions in
were accessed by the queries according to a Zipfian distri- the system lead to more objects being locked. Thus, when
bution. To generate di↵erent levels of contention, we varied a transaction requests a lock, it is more likely to be blocked.
the following two parameters in our microbenchmark: In other words, the number of transactions that can make
1. skew of the access pattern (the parameter ✓ of the Zipfian progress decreases, which leads to a decrease in throughput.
distribution) However, the throughput decreased more rapidly when
2. fraction of exclusive locks (probability of “UPDATE” queries). using FIFO or VATS than bLDSF. For example, when there
were only 100 clients, bLDSF outperformed FIFO by only
Baselines — We compared the performance of our bLDSF 1.4x and VATS by 1.1x. However, with 900 clients, bLDSF
algorithm (with f (k)=log2 (1 + k) as default) against the achieved 6.5x higher throughput than FIFO and 2x higher
following baselines: throughput than VATS. As discussed in Section 4.2, bLDSF
1. First In First Out (FIFO). FIFO is the default sched- always schedules transactions that maximize the speed of
uler in MySQL and nearly all other DBMSs. When an progress in the system. This is why it allows for more trans-
object becomes available, FIFO grants the lock to the actions to be processed in a certain amount of time.
transaction that has waited the longest.
2. Variance-Aware Transaction Scheduling (VATS). 6.3 Average and Tail Transaction Latency
This is the strategy proposed by Huang et al. [44]. When We compared transaction latencies of FIFO, VATS, and
an object becomes available, VATS grants the lock to the bLDSF under an equal number of transactions per second
eldest transaction in the queue. (i.e, throughput). We varied the number of clients (and
3. Largest Dependency Set First (LDSF). This is the hence, the number of in-flight transactions) from 100 to
strategy described in Algorithm 1, which is equivalent to 900 for FIFO and VATS, and then ran bLDSF at the same
bLDSF with b = inf, and f (k) = 1. throughput as VATS, which is higher than the throughput
4. Most Locks First (MLF). When an object becomes of FIFO. This means that we compare bLDSF with FIFO
available, grant a lock on it to the transaction that holds at a higher throughput. The result is shown in Figure 9.
the most locks (introduced in Section 3.1). Our bLDSF algorithm dramatically outperformed FIFO by
5. Most Blocking Locks First (MBLF). When an ob- a factor of up to 300x and VATS by 80x. This outstanding
ject becomes available, grant a lock on it to the transac- improvement confirms our Theorems 3 and 6, as our algo-
rithm is designed to minimize average transaction latencies.

656

Figure 11: Maximum throughput un- Figure 12: Transaction latency under Figure 13: Scheduling overhead of var-
der various algorithms (TPC-C). various algorithms (TPC-C). ious algorithms (TPC-C).

Figure 14: Average number of transac- Figure 15: Average transaction la- Figure 16: Average latency for dif-
tions waiting in the queue under various tency for di↵erent degrees of skewness ferent numbers of exclusive locks (mi-
algorithms (TPC-C). (microbenchmark). crobenchmark).

We also report the 99th percentile latencies in Figure 10. a scheduling algorithm is the time needed by the algorithm
Here, bLDSF outperformed FIFO by up to 190x. Interest- to decide which lock(s) to grant.
ingly, bLDSF outperformed VATS too (by up to 16x), even In this experiment, we fixed the number of clients to 100
though the latter is specifically designed to reduce tail la- while varying throughput from 200 to 1000. The result is
tencies. This is because bLDSF causes all transactions to shown in Figure 13. We can see that, although all three
finish faster on average, and thus, those transactions waiting algorithms have the same time complexity in terms of the
at the end of the queue will also wait less, leading to lower queue length (Section 5), ours resulted in much less over-
tail latencies. head than FIFO because they led to much shorter queues
for the same throughput. This is because our algorithms ef-
6.4 Comparison with Other Heuristics fectively resolve contention, and thus, reduce the number of
In this section, we report our comparison of both bLDSF waiting transactions in the queue. To illustrate this, we also
and LDSF algorithms against the heuristic methods intro- measured the average number of waiting transactions when-
duced in Section 3, i.e., MLF, MBLF, and DDF. Moreover, ever an object becomes available. As shown in Figure 14,
we compare our algorithms with VATS too. this number was much smaller for LDSF and bLDSF. How-
First, we compared their throughput given an equal num- ever, VATS incurred less overhead than LDSF and bLDSF,
ber of clients. We varied the number of clients from 100 despite having longer queues. This is because VATS does
to 900. The results are shown in Figure 11. LDSF and not compute the sizes of the dependency sets.
bLDSF achieve up to 2x and 2.5x improvement over the
other heuristics in terms of throughput, respectively.
We also measured transaction latencies under an equal
6.6 Studying Different Levels of Contention
number of transactions per second (i.e, throughput). We In this section, we study the impact of di↵erent levels
varied the number of clients from 100 to 900 for the heuris- of contention on the e↵ectiveness of our bLDSF algorithm.
tics, and then ran bLDSF and LDSF at the maximum through- Contention in a workload is a result of two factors: (i) skew
put achieved by any of the heuristics. For those heuris- in the data access pattern (e.g., popular tuples), and (ii) a
tics which were not able to achieve this throughput, we large number of exclusive locks. There is more contention
compared our algorithms at a higher throughput than they when the pattern is more skewed, as transactions will re-
achieved. The results are shown in Figure 12, indicating that quest a lock on the same records more often. Likewise, ex-
MLF, MBLF, and DDF outperformed FIFO by almost 2.5x clusive lock requests cause more contention, as they cannot
in terms of average latency, while our algorithms achieved up be granted together and result in blocking more transac-
to 100x improvement over the best heuristics (MBLF with tions. We studied the e↵ectiveness of our algorithm under
900 transactions). Furthermore, bLDSF was better than di↵erent degrees of contention by varying these two factors
LDSF by a small margin. using our microbenchmark:
1. We fixed the fraction of exclusive locks to be 60% of all
6.5 Scheduling Overhead lock requests, and varied the ✓ parameter of the Zipfian
We also compared the overhead of our algorithms (LDSF distribution of our access distribution between 0.5 and
and bLDSF) against both FIFO and VATS: the overhead of 0.9 (larger ✓, more skew).

657

Figure 17: The impact of delay factor Figure 18: Scheduling overhead with Figure 19: CCDF of the relative error
on average latency. and without our approximation heuristic of the approximation of the sizes of the
for choosing a batch. dependency sets.

2. We fixed the ✓ parameter to be 0.8 and varied the prob- Understandably, f1 did not perform well, as it did not satisfy
ability of an “UPDATE” query in our microbenchmark condition C2 from Section 4.1. Functions f5 and f6 did not
between 20% and 100%. The larger this probability, the perform well either, since linear functions overestimate the
larger the fraction of exclusive locks. delay. For example, two transactions running concurrently
First, we ran FIFO using 300 clients, and then ran both take less time than if they ran one after another.
VATS and bLDSF at the same throughput as FIFO. The
results of these experiments are shown in Figures 15 and 16. 6.8 Approximating Sizes of Dependency Sets
Figure 15 shows that when there is no skew, there is no We studied the e↵ectiveness of our approximation heuris-
contention, and thus most queues are either empty or only tic from Section 5 for choosing a batch of shared requests.
have a single transaction waiting. Since there is no schedul- Computing the optimal batch accurately was costly, and sig-
ing decision to be made in this situation, FIFO, VATS and nificantly lowered the throughput. However, we measured
bLDSF become equivalent and exhibit a similar performance. the scheduling overhead, and compared it to when we used
However, the gap between bLDSF and the other two algo- an approximation. We ran TPC-C, and varied the number
rithms widens as skew (and thereby contention) increases. of clients from 100 to 900. As shown in Figure 18, our ap-
For example, when the data access is highly skewed (✓ = proximation reduced the scheduling overhead by up to 80x.
0.9), bLDSF outperforms FIFO by more than 50x and VATS We also measured the error of our approximation tech-
by 38x. Figure 16 reveals a similar trend: as more exclusive nique for estimating the dependency set sizes—the deviation
locks are requested, bLDSF achieves greater improvement. from the actual sizes of the dependency sets—for varying ra-
Specifically, when 20% of the lock requests are exclusive, tios of shared locks in the workload. Figure 19 shows the
bLDSF outperforms FIFO by 20x and VATS by 9x. How- complementary cumulative distribution function (CCDF) of
ever, when all the locks are exclusive, the improvement is the relative error of approximating the dependency set sizes.
even more dramatic, i.e., 70x over FIFO and 25x over VATS. The error grew with the ratio of shared locks; this was ex-
Note that, although VATS guarantees optimality when there pected, as shared locks are the cause of error in our approx-
are only exclusive locks [44], it fails to account for transac- imation. However, the errors remained within a reasonable
tion dependencies in its analysis (see Section 7 for a discus- range, e.g., even with 80% shared locks, we observed a 2-
sion of the assumptions made in VATS versus bLDSF). In approximation of the exact sizes in 99% of the cases.
summary, when there is no contention in the system, there
are no scheduling decisions to be made, and all scheduling 7. RELATED WORK
algorithms are equivalent. However, as contention rises, so
In short, the large body of work on traditional job schedul-
does the need for better scheduling decisions, and so does
ing is unsuitable in a database context due to the unique
the gap between bLDSF and other algorithms.
requirements of locking protocols deployed in databases. Al-
though there is some work on lock scheduling for real-time
6.7 Choice of Delay Factor databases, they aim at supporting explicit deadlines rather
To better understand the impact of delay factors on bLDSF, than minimizing the mean latency of transactions.
we experimented with several functions of di↵erent growth
Job Scheduling — Outside the database community, there
rates, ranging from the lower bound of all functions that
has been extensive research on scheduling problems in gen-
satisfy conditions C1, C2, and C3 (i.e., f (k) = 1) to their
eral. Here, the duration (and sometimes the weight and
upper bound (i.e., f (k) = k). Specifically, we used each
arrival time) of each task is known a priori, and a typical
of the following delay factors in our bLDSF algorithm, and
goal is to minimize (i) the sum of (weighted) completion
measured the average transaction latency:
times (SCT) [64, 41, 39], (ii) the latest completion time [20,
• f1 (k) = 1; 34, 67], (iii) the completion time variance (CTV) [14, 17,
p
• f2 (k) = log2 (1 + k); 76, 49], or even (iv) the waiting time variance (WTV) [28].
• f3 (k) = log2 (1 + k); The o✏ine SCT problem can be optimally solved using a
p Shortest-Weighted-Execution-Time approach, whereby jobs
• f4 (k) = k;
are scheduled in the non-decreasing order of their ratio of
• f5 (k) = 0.5(1 + k); execution time to weight [70], if they all arrive at the same
• f6 (k) = k. time. However, when the jobs arrive at di↵erent times, the
The results are shown in Figure 17. We can see that all sub- scheduling becomes NP-hard [52].
linear functions (i.e., f2 , f3 , and f4 ) performed comparably, None of these results are applicable to our setting, mainly
and that they performed better than the other functions. because of their assumption that each processor/worker can

658

You can also read