The Availability-Accountability Dilemma and its Resolution via Accountability Gadgets

 Joachim Neu Ertem Nusret Tas David Tse
 jneu@stanford.edu nusret@stanford.edu dntse@stanford.edu
The authors contributed equally and are listed alphabetically.

ABSTRACT
Byzantine fault tolerant (BFT) consensus protocols are traditionally developed to support reliable distributed computing. For applications where the protocol participants are economic agents, recent works highlighted the importance of accountability: the ability to identify participants who provably violate the protocol. We propose to evaluate the security of an accountable protocol in terms of its liveness resilience, the minimum number of Byzantine nodes when liveness is violated, and its accountable safety resilience, the minimum number of accountable Byzantine nodes when safety is violated. We characterize the optimal tradeoffs between these two resiliences in different network environments, and identify an availability-accountability dilemma: in an environment with dynamic participation, no protocol can simultaneously be accountably-safe and live. We provide a resolution to this dilemma by constructing an optimally-resilient accountability gadget to checkpoint a longest chain protocol, such that the full ledger is live under dynamic participation and the checkpointed prefix ledger is accountable. Our accountability gadget construction is black-box and can use any BFT protocol which is accountable under static participation. Using HotStuff as the black box, we implemented our construction as a protocol for the Ethereum 2.0 beacon chain, and our Internet-scale experiments with more than 4000 nodes show that the protocol can achieve the required scalability and has better latency than the current solution Gasper, while having the advantage of being provably secure. To contrast, we demonstrate a new attack on Gasper.

1 INTRODUCTION
1.1 Accountability
Safety and liveness are the two fundamental security properties of consensus protocols. A protocol run by a distributed set of nodes is safe if the ledgers generated by the protocol are consistent across nodes and across time. It is live if all honest transactions eventually enter into the ledger.

Traditionally, consensus protocols are developed for fault-tolerant distributed computing, where a set of distributed computing devices aims to emulate a reliable centralized computer. In such a context, the security of consensus protocols is naturally measured by their resilience: the minimum number of Byzantine protocol-violating nodes needed to cause a loss of safety or liveness. In modern applications such as cryptocurrencies and decentralized application platforms, consensus nodes are no longer just disinterested computing devices but are agents acting based on economic and other incentives. To provide the proper incentives to encourage nodes to follow the protocol, it is important that they can be held accountable for their protocol-violating behavior in a provable way. This point of view is advocated by Buterin and Griffith [6] in the context of their effort to add accountability (among other things) to Ethereum's Proof-of-Work (PoW) longest chain protocol, and is also central to the design of Gasper [7], the protocol running Ethereum 2.0's Proof-of-Stake (PoS) beacon chain. In these protocols, accountability is used to incentivize proper behavior by slashing the stake of protocol-violating agents. Other protocols that are designed to provide accountability include Polygraph [13] and GRANDPA [40]. A recent comprehensive work [25] shows that accountability can also be added on top of some but not all existing Byzantine fault tolerant (BFT) protocols.

Given the importance of accountability, a key question is how the traditional definition of consensus protocol security needs to be modified to reflect accountability.

1.2 Accountable Security
Defining an appropriate notion of accountable security of a protocol is the first contribution of this work.

As a starting point, we observe there is an inherent asymmetry between safety and liveness as far as accountability is concerned. While safety violations such as those due to double-voting can be caught in a provable way, liveness violations such as transaction censoring cannot be. Indeed, the goal in [6] is to provide accountable safety. Hence, to set the stage to incorporate accountability, it is helpful to split the single metric of resilience into two individual metrics: safety resilience and liveness resilience. Safety resilience is the minimum number of Byzantine faults to cause a safety violation. Liveness resilience is the minimum number of Byzantine faults to cause a liveness violation. Classic resilience is simply the minimum of the two.

To capture accountable security, the notion of liveness resilience remains the same, but safety resilience should be strengthened to a notion of accountable safety resilience: the minimum number of faults that can be found accountable when there is a safety violation. More precisely, a protocol has a given accountable safety resilience if, in the case of a safety violation, at least that many nodes can be held accountable in a provable manner. By definition, the accountable safety resilience of a protocol cannot be larger than its safety resilience.

Splitting the traditional notion of resilience into safety and liveness resiliences, although usually not explicitly done, is implicit in many previous works in the BFT literature. Indeed, one can think of the design objective of a consensus protocol as achieving a good tradeoff between safety and liveness resiliences. For example, increasing the threshold of the quorum, a central concept in many BFT protocols, increases the protocol's safety resilience while decreasing its liveness resilience. This separate treatment of safety and liveness resiliences was recently formalized in [29] through the notion of alive-but-corrupt faults. Indeed, results in the literature on

optimal resilience achievable in a given network environment (synchronous, partially synchronous, etc.) can be refined into results on the optimal tradeoffs achievable between safety and liveness resiliences.

In applications with an economic context, treating safety and liveness resiliences separately makes sense, because the mechanisms of attacking safety are different from the mechanisms of attacking liveness and therefore the costs to the attacker are also different. Trading off the two resiliences allows the protocol designer to maximize the cost to the attacker. In this light, shifting the attention from safety resilience to accountable safety resilience makes sense under the assumption that the dominant cost to the attacker is the cost from being punished, from slashing of the stake for example.

A natural question then is: given a network environment, what is the optimal tradeoff achievable between accountable safety resilience and liveness resilience by any protocol?

[Figure 1 has six panels. Top row, panels (i) Synchronous, (ii) Partially Synchronous, (iii) Dynamic Participation: safety resilience plotted against liveness resilience. Bottom row, panels (iv), (v), (vi): accountable safety resilience plotted against liveness resilience for the same three environments.]

Figure 1: Above: optimal tradeoffs between (traditional) safety and liveness resilience in three environments. Below: optimal tradeoffs between accountable safety resilience and liveness resilience. In each environment, the optimal tradeoff is described by a region of feasible resilience points, each point achievable by some protocol; points outside the region cannot be achieved by any protocol. The classic single resilience metric is maximized at the point on the region boundary where the two resiliences are the same. In the synchronous and partially synchronous environments, resiliences are expressed in terms of the number of Byzantine nodes; in the dynamic participation environment, resiliences are expressed in terms of the number of Byzantine nodes as a fraction of the total online nodes.

1.3 Optimal Accountable-Safety vs Liveness Tradeoffs
Our second contribution is the characterization of the optimal tradeoffs between accountable safety resilience and liveness resilience in three network environments: a) synchronous, where all nodes are online and all messages are delivered between honest nodes within a known delay bound; b) partially synchronous, where all nodes are online but messages suffer arbitrary delays before a Global Stabilization Time (to model network partition), after which the network becomes synchronous; c) dynamic participation, where the number of nodes online is varying but messages delivered between online honest nodes have a known delay bound. The results are shown in Figure 1 (the lower part) and compared to the optimal tradeoff between (traditional) safety and liveness resiliences (the upper part).
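The tradeoff between the two resiliences can be made concrete with the quorum-threshold example from Section 1.2. The sketch below is a simplified illustration, not a construction from this paper. It assumes a single-round voting model with n nodes in which a value is decided once q distinct nodes vote for it, so that n − q + 1 silent nodes can block every quorum and any two quorums share at least 2q − n nodes. The function names `quorum_resiliences` and `double_voters` are hypothetical.

```python
from __future__ import annotations


def quorum_resiliences(n: int, q: int) -> tuple[int, int]:
    """Return (liveness_resilience, safety_resilience) for quorum size q.

    Assumed model (not from the paper): a value is decided once q of the
    n nodes vote for it, and more than half of the nodes are required.
    """
    if not (n // 2 < q <= n):
        raise ValueError("quorum must satisfy n/2 < q <= n")
    liveness = n - q + 1  # this many silent nodes leave fewer than q voters
    safety = 2 * q - n    # minimum overlap of any two quorums of size q
    return liveness, safety


def double_voters(votes_a: set[str], votes_b: set[str]) -> set[str]:
    """Nodes that voted for both of two conflicting decisions."""
    return votes_a & votes_b


n = 100
for q in (51, 67, 90):
    # Raising q trades liveness resilience for safety resilience.
    print(q, quorum_resiliences(n, q))

# Two conflicting quorums of size 67 out of 100 nodes must overlap.
quorum_a = {f"node{i}" for i in range(0, 67)}
quorum_b = {f"node{i}" for i in range(33, 100)}
print(len(double_voters(quorum_a, quorum_b)))
```

In this simplified model, eliminating q gives a safety resilience of n − 2l + 2 for liveness resilience l, and every node in the overlap of two conflicting quorums has double-voted, which is what makes the violation attributable.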
The point in each of the optimal tradeoff regions which maximizes the traditional single resilience metric is the point on the boundary where the safety and liveness resiliences are the same. In general, one may want to operate at a different point on the boundary, reflecting the different costs of attacking safety versus liveness.

Several interesting conclusions regarding Figure 1:
 • As is well known, stronger security guarantees can be provided in the synchronous environment than in the partially synchronous environment. This is reflected in an optimal tradeoff between (traditional) safety and liveness resiliences which is better in the synchronous environment than in the partially synchronous environment. In contrast, the optimal tradeoff between accountable safety resilience and liveness resilience is the same in the two environments. Thus the synchrony assumption does not improve accountable security. This is related to an impossibility result in [25], which we discuss in Section 1.6.
 • Many protocols achieve the optimal tradeoffs in synchronous and partially synchronous environments. For example, Polygraph [13], HotStuff [41] and Streamlet [10] achieve the optimal tradeoff in the partially synchronous environment. Sync HotStuff [2] and Sync Streamlet [10] achieve the optimal tradeoff in the synchronous environment (see Appendix E). In fact, in all these cases, the same protocol simultaneously maximizes the safety resilience and the accountable safety resilience for a given liveness resilience. That means one can simultaneously have optimal accountable security without sacrificing traditional security.
 • In contrast to the synchronous and partially synchronous environments, no accountability can be supported under dynamic participation: the accountable safety resilience is zero for any positive liveness resilience (Figure 1 (vi)). Protocols like Ouroboros [16] and SnowWhite [15] can tolerate dynamic participation and are safe and live under a resilience of 50%, thus achieving the optimal point in Figure 1 (iii). But they cannot be made accountable.

1.4 Availability-Accountability Dilemma
In public permissionless blockchains like Bitcoin and Ethereum, dynamic participation is a central feature. In Bitcoin, for example, the total hash rate varies over many orders of magnitude over the years. Yet, the blockchains remain continuously available, i.e., live. Our results say that it is impossible to support accountability for such dynamically available protocols, i.e., protocols that are live under dynamic participation. We call this the availability-accountability dilemma.

The works Casper [6] and Gasper [7] provide some hints on how to get around this dilemma. One way to interpret these works is that they aim to design an accountability gadget to provide a checkpointing mechanism on top of a dynamically available chain, so that the available chain can continue to grow while the ledger up to the latest checkpoint is accountable. While the availability-accountability dilemma says that a single ledger cannot be made dynamically available and accountable at all times, these works can be interpreted as aiming to generate two ledgers: 1) a full ledger LOGda, which is dynamically available, 2) an accountable ledger LOGacc, which is the checkpointed prefix of the full ledger. Unfortunately these works have several significant limitations:
 • They lack a formulation to specify what security properties they want to achieve for these two ledgers. In particular, it is not clear how closely the checkpointed ledger can track the available ledger.
 • Liveness attacks have been discovered for these protocols [31, 33, 33, 35]. (To reinforce this point, in Appendix I we present a new practical attack which dispenses with the adversarial network delay employed by earlier attacks.)

[Figure 2 is a block diagram. Blocks: checkpoint-respecting longest chain (Πlc), vote generator, accountable consensus (Πbft), vote interpreter; the accountability gadget is labeled Πacc. Signals: txs, confirmed blocks, votes, LOGbft, checkpoint decisions, LOGda, LOGacc.]

Figure 2: We construct an accountability gadget Πacc from any accountable BFT protocol Πbft and apply it to a longest-chain-type protocol Πlc as follows: The fork choice rule of Πlc is modified to respect the latest checkpoint decision. Blocks confirmed by Πlc are output as available ledger LOGda. They are also the basis on which nodes generate a proposal and vote for the next checkpoint. To ensure that all nodes reach the same checkpoint decision, consensus is reached on which votes to count using Πbft. Checkpoint decisions are output as accountable ledger LOGacc and fed back into the protocol to ensure consistency of future block production in Πlc and future checkpoints with previous checkpoints.
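The checkpoint-respecting fork choice rule described in the caption of Figure 2 can be sketched as follows. This is a minimal illustration under simplifying assumptions that are not part of the protocol specification: the block tree is a dictionary mapping each block ID to its parent ID, the genesis block has parent `None`, and all function names are hypothetical. Among all blocks, only those whose chain contains the latest checkpoint are eligible tips, and the longest eligible chain wins.

```python
from __future__ import annotations


def chain_to(block: str, parent: dict[str, str | None]) -> list[str]:
    """Return the chain from genesis to `block`, following parent pointers."""
    chain = []
    cur: str | None = block
    while cur is not None:
        chain.append(cur)
        cur = parent[cur]
    return chain[::-1]


def fork_choice(parent: dict[str, str | None], checkpoint: str) -> list[str]:
    """Longest chain among all chains that contain the latest checkpoint.

    Raises ValueError if no known chain contains the checkpoint.
    """
    candidates = [chain_to(block, parent) for block in parent]
    eligible = [chain for chain in candidates if checkpoint in chain]
    # Ties are broken by block ID here; this tie-break is an assumption.
    return max(eligible, key=lambda chain: (len(chain), chain[-1]))


def accountable_prefix(chain: list[str], checkpoint: str) -> list[str]:
    """The chain up to and including the latest checkpoint."""
    return chain[: chain.index(checkpoint) + 1]


# Block tree with two forks: g <- a1 <- a2 and g <- b1 <- b2 <- b3.
parent = {"g": None, "a1": "g", "a2": "a1", "b1": "g", "b2": "b1", "b3": "b2"}

print(fork_choice(parent, "g"))   # no constraint beyond genesis: longest fork
print(fork_choice(parent, "a1"))  # checkpoint a1 excludes the longer b fork
print(accountable_prefix(fork_choice(parent, "a1"), "a1"))
```

Once a1 is checkpointed, the longer b fork is no longer eligible, so later blocks extend the fork containing a1 and the checkpointed prefix is never reverted by the fork choice.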
1.5 Resolution via Accountability Gadgets
The third contribution of this work is the design and implementation of an accountability gadget which, when applied to a longest chain protocol, generates a dynamically available ledger LOGda and a checkpointed prefix ledger LOGacc with provably optimal security properties.

Consider a network with a total of n permissioned nodes, and an environment where the network may partition and the nodes may go online and offline.
(1) (P1: Accountability) For any f < n/2, the accountable ledger LOGacc can provide an accountable safety resilience of n − 2f + 2 at all times, and it is live after the partition heals and greater than n − f honest nodes come online.
(2) (P2: Dynamic Availability) The available ledger LOGda is guaranteed to be safe after network partition and live at all times, provided that fewer than 1/2 of the online nodes are adversarial.

Note that while the checkpointed ledger is by definition always a prefix of the full available ledger, the above result says that the checkpointed ledger will catch up with the available ledger when the network heals and a sufficient number of honest nodes come online.

The achieved resiliences are optimal. This can be seen by comparing this result with Figure 1 (iii) and (v). The checkpointed ledger LOGacc cannot achieve a better tradeoff between accountable safety resilience and liveness resilience than the tradeoff in (v); it in fact achieves exactly the same tradeoff. The dynamically available ledger LOGda cannot achieve a better resilience than the (1/2, 1/2) point in (iii); the ledger in fact achieves it. Moreover, even if the network were synchronous at all times, no protocol could have generated an accountable ledger with better resilience, in light of Figure 1 (iv). So we are getting partition-tolerance for free, even though accountability is the goal.

The accountability gadget construction is shown in Figure 2. It is built on top of any existing longest chain protocol modified to respect the checkpoints. That is, new blocks are proposed and the ledger of confirmed transactions is determined based on the longest chain among all the chains containing the latest checkpointed block. This gives the available full ledger LOGda. Periodically, nodes vote on the next checkpoint (following a randomly selected leader's proposal). To ensure that when tallying votes all nodes base their decision for the next checkpoint on the same set of votes, any accountable BFT protocol designed for a fixed level of participation can be used (entirely as a black box) to reach consensus on the votes. The chain up to the latest checkpoint constitutes the accountable prefix ledger LOGacc. Consistency of blocks confirmed by Πlc and future checkpoint proposals with established checkpoints is ensured throughout.

Since there are many accountable BFT protocols [25], we have a lot of implementation choices. Due to its maturity and the availability of a high-quality open-source implementation which we could employ practically as a black box, we decided to implement a prototype of our accountability gadget using the HotStuff protocol [41]. Taking Ethereum 2.0's beacon chain as a target application and matching its key performance characteristics such as latency and block size, we performed Internet-scale experiments to demonstrate that our solution can meet the target specification with over 4000 participants (see Figure 3). In particular, for the chosen parameterization and even before taking reduction measures, the peak bandwidth required for a node to participate does not exceed 1.5 MB/s (with a long-term average of 78 KB/s) and hence is feasible even for many consumer-grade Internet connections. At the same time, our prototype provides 5× better average latency of LOGacc compared to the instantiation of Gasper currently used for Ethereum 2.0's beacon chain.

[Figure 3 is a line plot of ledger length in blocks (0 to 200) against time in seconds (0 to 2,500), with two curves: accountable prefix ledger and available full ledger.]

Figure 3: Ledger dynamics of a longest chain protocol outfitted with our accountability gadget based on HotStuff, measured with 4,100 nodes distributed around the world. The available full ledger grows steadily. The accountable prefix periodically catches up whenever a new block is checkpointed. (Here, no attack; for attack, cf. Figure 6.)

1.6 Related Works
1.6.1 Accountability. Accountability in distributed protocols has been studied in earlier works [23, 24]. [23] designed a system, PeerReview, which detects faults. [24] classifies faults into different types and studies their detectability. Casper [6] focuses on accountability and fault detection when there is a violation of safety, and led to the notion of accountable safety resilience we use in this work. Polygraph [13] is a partially synchronous BFT protocol which is secure when there are fewer than n/3 adversarial nodes, and when there is a safety violation, at least n/3 nodes can be held accountable. In our formulation, this corresponds to achieving the point (n/3, n/3) on Figure 1 (v). [38] builds upon [13] to create a blockchain which can exclude Byzantine nodes that were found to have provably violated the protocol.

Many of these previous works focus on studying the accountability of specific protocols and think of accountability as an add-on feature in addition to the basic security properties of the protocol. [25] follows this spirit but broadens the investigation to formulate a framework to study the accountability of many existing BFT protocols. More specifically, their framework augments the traditional resilience metric with accountable safety resilience (which they call forensic support). The present work is more in the spirit of [6] where accountability is a central design goal, not just an add-on feature. To formalize this spirit, we split traditional resilience into safety and liveness resiliences, upgrade safety resilience to accountable safety resilience, and formulate accountable security as a tradeoff between liveness resilience and accountable safety resilience. Further, we broaden the study to the important dynamic participation environment, where we discovered the availability-accountability dilemma. Despite these differences in formulation and in scope, we are able to adopt the proof of the impossibility result Theorem B.1 in [25], because at the heart of it, that theorem is really about the tradeoff between liveness and accountable safety resiliences, although not stated as such.

1.6.2 Availability-Finality Dilemma and Finality Gadgets. The availability-finality dilemma [22, 27, 35] states that no protocol can provide both finality, i.e., safety under network partitions, and availability, i.e., liveness under dynamic participation. The availability-accountability dilemma states that no protocol can provide both accountable safety and liveness under dynamic participation. Although they are different, it turns out that some, but not all, protocols that resolve the availability-finality dilemma can be used to resolve the availability-accountability dilemma. The first resolution of the availability-finality dilemma is the class of snap-and-chat protocols [35], which combine a longest chain protocol with a partially synchronous BFT protocol in a black-box manner to provide finality. If the partially synchronous BFT protocol is accountable, it is not too difficult to show [34] that the resulting snap-and-chat protocol would also provide a resolution to the availability-accountability dilemma. On the other hand, checkpointed longest chain [39], another resolution of the availability-finality dilemma, is not accountable, as shown in Appendix F.

The accountability gadget we designed combines elements from snap-and-chat protocols and from the checkpointed longest chain. A strength of snap-and-chat protocols is their black-box nature, which gives them the flexibility to provide additional features. A drawback is that the protocol may reorder the blocks in the longest chain protocol to form the final ledger [34]. This means that when a proposer proposes a block on the longest chain, it cannot predict the ledger state and check the validity of the transactions by just looking at the earlier blocks in the longest chain. This lack of predictive validity opens the protocol to spamming and limits the use of standard techniques to support light clients and sharding. Checkpointed longest chain builds upon a line of work called finality gadgets [6, 7, 18, 40] and overcomes this limitation of snap-and-chat protocols because the longest chain protocol is modified to respect the checkpoints. However, checkpointed longest chain's finality gadget is not black box but specifically uses Algorand BA [11], which is not accountable [25]. Our accountability gadget solution builds on the checkpointed longest chain but, like snap-and-chat protocols, allows the use of any BFT protocol as a black box. When an accountable BFT protocol like HotStuff is used, the checkpointed ledger is guaranteed to be accountable.

1.7 Outline
The remainder of this paper is structured as follows: First, we introduce the notation and model for a formal treatment of the tradeoffs among safety, liveness and accountable safety resiliences in Section 2. Then, Section 3 presents the proof of the availability-accountability dilemma and formalizes the tradeoffs visualized in Figure 1. Section 4 elaborates on the accountability gadgets introduced in Section 1.5 and argues for their security. We discuss details of a prototype implementation and experimental performance results in Section 5. Finally, we conclude with a generalization of accountability gadgets to Proof-of-Work and Proof-of-Space longest chain protocols in Section 6.

2 MODEL
We first give an overview of the client-server model for state machine replication (SMR) protocols and introduce the notation that will be used in subsequent proofs. In the classical SMR formulation, nodes take inputs called transactions and enable clients to agree on a single sequence of transactions, called the ledger and denoted by LOG, that produced the state evolution. For this purpose, nodes exchange messages, e.g., blocks or votes, and each node i records its view of the protocol by time t in an execution transcript T_i^t.

 To obtain the ledger at time t, clients query the nodes running the protocol. When a node i is queried at time t, it produces evidence w_i^t by applying an evidence generation function W to its current transcript: w_i^t ≜ W(T_i^t). Upon collecting evidences from some subset S of the nodes, each client applies the confirmation rule C to this set of evidences to obtain the ledger: LOG_S^t ≜ C({w_i^t}_{i∈S}). Protocols typically require querying a subset S containing at least one honest node.

 Environment: We assume that the transactions are input to the nodes by the environment Z. There are in total n nodes numbered from 1 through n. There exists a public-key infrastructure and each node is equipped with a unique cryptographic identity. There is a random oracle, serving as a common source of randomness for the protocols. Time is slotted and the nodes have synchronized clocks.

 Corruption: Adversary A is a probabilistic poly-time algorithm. Before the protocol execution starts, A gets to corrupt (up to) f nodes, then called adversarial nodes. Adversarial nodes surrender their internal state to the adversary and can deviate from the protocol arbitrarily (Byzantine faults) under the adversary's control. The remaining (n − f) nodes are called honest and follow the protocol as specified.

 Sleeping: To model dynamic participation, we adopt the concept of sleepiness from [37]. In the setting with dynamic participation, A chooses, for every time slot and node, whether an honest node is awake (i.e., online) or asleep (i.e., offline) in that slot. Z then wakes up or puts nodes to sleep following the schedule determined by A. An honest node that is awake in a slot executes the protocol faithfully in that slot. An honest node that is asleep in a slot does not execute the protocol in that slot, and messages that would have arrived in that slot are queued and delivered in the first slot in which the node is awake again. Adversarial nodes are always awake. We define β as the maximum value of the fraction of adversarial nodes over the total number of awake nodes throughout the execution of the protocol.

 Networking: Nodes can send each other messages, which arrive with a certain delay controlled by the adversary, subject to constraints elaborated below.

 Network Environments: Given the definitions above, we provide three sets of assumptions on the environment Z and the adversary A to model a synchronous network, a partially synchronous network and a synchronous network with dynamic participation, respectively. These assumptions are expressed as the (A, Z) tuples:

 (As, Zs) formalizes the model of a synchronous network, where the adversary corrupts f nodes, and all of the n nodes are awake throughout the execution. At all times, the adversary is required to deliver all messages sent between honest nodes in at most Δ slots.

 (Ap, Zp) formalizes a partially synchronous network, where the adversary corrupts f nodes, and all of the n nodes are awake throughout the execution. Before a global stabilization time GST, A can delay network messages arbitrarily. After GST, A is required to deliver all messages sent between honest nodes in at most Δ slots. GST is chosen by A, unknown to the honest nodes, and can be a causal function of the randomness in the protocol.

 (Ada, Zda) formalizes the model of a synchronous network with dynamic participation: For any given time slot, A determines which honest nodes are awake/asleep at that slot, subject to the constraint that at all time slots, at most β fraction of awake nodes are adversarial and at least one honest node is awake. At all times, the adversary is required to deliver messages sent between honest nodes in at most Δ slots. Adversarial nodes are assumed to be always awake.

 Examples: To illustrate the model above, consider a client that queries nodes running a Nakamoto-style longest chain protocol under (Ada, Zda) at some time t. Suppose β < 1/2. The transcript T_i^t held by node i at time t consists of the blocks received by node i by time t. Given the transcript T_i^t, W outputs as evidence the longest chain implied by T_i^t. Upon collecting evidences from a subset S of awake nodes with at least one honest node, a client calls C, which selects the longest chain in the set {w_i^t}_{i∈S} and outputs the k-deep prefix of that longest chain as the ledger.

 We can also consider propose-and-vote-style BFT protocols such as HotStuff, LibraBFT, Streamlet and PBFT [9, 10, 28, 41] with n = 3f + 1 nodes under (Ap, Zp). In this case, the transcript T_i^t held by a node i at time t consists of all received messages such as proposals and votes. Given T_i^t, W outputs as evidence a sequence of proposals with votes attesting to them. Upon collecting evidences from a subset S of nodes containing at least one honest node, a client calls C, which outputs the largest possible sequence of proposals that can be confirmed given the votes attesting to them. The confirmation rule typically requires votes from n − f + 1 nodes on consecutive proposals to guarantee safety, which follows from a quorum intersection argument. Liveness ensues from the fact that the honest evidence within the collected set includes all of the confirmed proposals submitted by honest nodes. Existence of an honest evidence in S is typically enforced by collecting evidences from at least f + 1 nodes.

 Safety and Liveness Resiliences: Safety and liveness are defined as the traditional security properties of SMR protocols:

Definition 1. Let T_confirm be a polynomial function of the security parameter of an SMR protocol Π. We say that Π with a confirmation rule C is secure and has transaction confirmation time T_confirm if ledgers output by C satisfy:
 • Safety: For any time slots t, t′ and sets of nodes S, S′ satisfying the requirements stipulated by the protocol, either LOG_S^t ≜ C({w_i^t}_{i∈S}) is a prefix of LOG_S′^t′ ≜ C({w_i^t′}_{i∈S′}) or vice versa.
 • Liveness: If Z inputs a transaction to an awake honest node at some time t, then, for any time slot t′ ≥ t + T_confirm and any set of nodes S satisfying the requirements stipulated by the protocol, the transaction is included in LOG_S^t′ ≜ C({w_i^t′}_{i∈S}).

Definition 2. For static (dynamic) participation, safety resilience of a protocol is the minimum number f of adversarial nodes (minimum fraction β of adversarial nodes among awake nodes) to cause a safety violation. Such a protocol provides f-safety (β-safety).

Definition 3. For static (dynamic) participation, liveness resilience of a protocol is the minimum number f of adversarial nodes (minimum fraction β of adversarial nodes among awake nodes) to cause a liveness violation. Such a protocol provides f-liveness (β-liveness).

 Accountable Safety Resilience: To formalize the concept of accountable safety resilience, we define an adjudication function J, similar to the forensic protocol defined in [25], as follows:

Definition 4. An adjudication function J takes as input two sets of evidences W and W′ with conflicting ledgers LOG ≜ C(W) and LOG′ ≜ C(W′), and outputs a set of nodes that have provably violated the protocol rules. So, J never outputs an honest node.

When the clients observe a safety violation, i.e., at least two sets of evidences W and W′ such that LOG ≜ C(W) and LOG′ ≜ C(W′) conflict with each other, they call J on these evidences to identify nodes that have violated the protocol.

Accountable safety resilience builds on the concept of accountable-safety first introduced in [6]:

Definition 5. For static (dynamic) participation, accountable safety resilience of a protocol is the minimum number f of nodes (minimum fraction β of nodes among awake nodes) output by J in the event of a safety violation. Such a protocol provides f-accountable-safety (β-accountable-safety).

Note that f-accountable-safety implies f-safety of the protocol (and the same for β) since J outputs only adversarial nodes.

3 THE AVAILABILITY-ACCOUNTABILITY DILEMMA

In this section, we investigate the fundamental tradeoffs between liveness, safety and accountable safety resiliences shown in Figure 1 under three different network environments: synchrony (A_s, Z_s), partial synchrony (A_p, Z_p) and dynamic participation (A_da, Z_da).

3.1 Accountability and Liveness are Incompatible Under Dynamic Participation

We observe that the strictest tradeoff between the liveness and accountable safety resilience occurs for dynamically available protocols under (A_da, Z_da) (Figure 1 (vi)), a result which was named the availability-accountability dilemma in Section 1.4:

Theorem 1. No SMR protocol provides both β_a-accountable-safety and β_l-liveness for any β_a, β_l > 0 under (A_da, Z_da).

Theorem 1 states that under dynamic participation it is impossible for an SMR protocol to provide both positive accountable safety resilience and positive liveness resilience. In light of this result, protocol designers are compelled to choose between protocols that maintain liveness under fluctuating participation, and protocols that can enforce the desired incentive mechanisms highlighted in Section 1.1 via accountability. Since both of the above features are desirable properties for Internet-scale consensus protocols, the availability-accountability dilemma presents a serious obstacle in the effort to obtain an incentive-compatible and robustly live protocol for applications such as cryptocurrencies.

To build some intuition for the proof of Theorem 1, let us consider a permissioned longest chain protocol under (A_da, Z_da) where half of the nodes are adversarial. Adversarial nodes avoid all communication with honest nodes and build a private chain that conflicts with the chain built collectively by the honest nodes. Such diverging chains mean the possibility of an (ostensible) safety violation. Think of an honest client towards whom adversarial nodes pretend to be asleep and who confirms a ledger based solely on the longest chain provided by the honest evidences; and a co-conspirator of the adversary who pretends to not have received any evidences from honest nodes and to have confirmed a ledger based solely on the longest chain provided by the adversarial evidences. Indeed, both would obtain non-empty ledgers, because the longest chain is dynamically available, but these two ledgers would conflict. Yet, based on the two sets of evidences, the judge J can neither distinguish who is the honest client and who is the co-conspirator, nor tell which nodes are honest or adversarial. So none of the adversarial nodes can be held accountable (without risking falsely convicting an honest node).

A formal proof building on this observation is as follows:

Proof. For the sake of contradiction, suppose there exists an SMR protocol Π that provides β_l-liveness and β_a-accountable-safety for some β_l, β_a > 0 under (A_da, Z_da). Then, there exists an adjudication function J, which given two sets of evidences attesting to conflicting ledgers, outputs a non-empty set of adversarial nodes.

Suppose there are n nodes in Z. Without loss of generality, we may assume that n is even; otherwise, Z puts one node to sleep throughout the execution. Let P and Q partition the n nodes into two disjoint equal groups with |P| = |Q| = n/2. We denote by [tx] a ledger consisting of a single transaction tx at its first index. Next consider the following worlds:

World 1: Nodes in P are honest and awake throughout the execution. Z inputs tx1 to them. Nodes in Q are asleep. Since Π satisfies liveness for some β_l > 0 under (A_da, Z_da), nodes in P eventually generate a set of evidences W_1 such that C(W_1) = [tx1].

World 2: Nodes in Q are honest and awake throughout the execution. Z inputs tx2 to them. Nodes in P are asleep. Since Π satisfies liveness for some β_l > 0 under (A_da, Z_da), nodes in Q eventually generate a set of evidences W_2 such that C(W_2) = [tx2].

World 3: Z wakes up all nodes, and inputs tx1 to the nodes in P and tx2 to the nodes in Q. Nodes in P are honest. Nodes in Q are adversarial and do not communicate with the nodes in P. All nodes stay awake throughout the execution. Since the worlds 1 and 3 are indistinguishable for the nodes in P, they eventually generate a set of evidences W_1 such that C(W_1) = [tx1]. Nodes in Q simulate the execution in world 2 without any communication with the nodes in P. Hence, they eventually generate a set of evidences W_2 such that C(W_2) = [tx2]. Thus, there is a safety violation. So, J takes W_1 and W_2, and outputs a non-empty set S_3 ⊆ Q of adversarial nodes.

World 4: Z wakes up all nodes, and inputs tx1 to the nodes in P and tx2 to the nodes in Q. Nodes in Q are honest. Nodes in P are adversarial and do not communicate with the nodes in Q. All nodes stay awake throughout the execution. Since the worlds 2 and 4 are indistinguishable for the nodes in Q, they eventually generate a set of evidences W_2 such that C(W_2) = [tx2]. Nodes in P simulate the execution in world 1 without any communication with the nodes in Q. Hence, they eventually generate a set of evidences W_1 such that C(W_1) = [tx1]. Thus, there is a safety violation. So, J takes W_1 and W_2, and outputs a non-empty set S_4 ⊆ P of adversarial nodes.

Note however that worlds 3 and 4 are indistinguishable from the perspective of the adjudication function J. Thus, it is not possible that J reliably outputs a non-empty set which in the case of world 3 contains only elements of Q and in the case of world 4 contains only elements of P, as would be required by Definition 4. □

3.2 Tradeoff Between Accountable Safety and Liveness Resiliences

The proof of Theorem 1 relies on the fact that in a dynamically available protocol, adversarial nodes, by private execution, can always create a set of evidences that yields a conflicting ledger through the confirmation rule C. This is because dynamically available protocols cannot set a lower bound on the number of evidences eligible to generate a non-empty ledger through C, and thus are forced to output ledgers for evidences from any number of nodes. However, in the case of a BFT protocol with a fixed level of participating nodes, such an attack on accountability will not be possible as the protocol could require a valid input to C to contain evidences from at least a certain fraction of the nodes. In this context, instead of an availability-accountability dilemma, we can talk about a softer tradeoff between the accountable safety and liveness resiliences (Figure 1 (iv) and (v)), which is formalized below:

Theorem 2. For any SMR protocol that satisfies f_a-accountable-safety and f_l-liveness, f_a ≤ max(0, n − 2f_l + 2).

The proof of Theorem 2 follows from the proof of [25, Theorem B.1] and is given in Appendix A. Both proofs rely on the observation that for an SMR protocol (or BA protocol in the case of [25, Theorem B.1]) that satisfies f_l-liveness (f_l-validity for [25, Theorem B.1]), no adjudication function is able to output more than max(0, n − 2f_l + 2) nodes in the event of a safety (agreement for [25, Theorem B.1]) violation without incorrectly accusing an honest node.

Finally, note that no SMR protocol provides f_l-liveness for f_l > ⌈n/2⌉ under any setting, as will be explained in Section 3.3. This, together with Theorems 1 and 2, completes the characterization of the curves of Figure 1 (iv)–(vi). We proceed to show that these curves are tight. Tightness of (vi) under (A_da, Z_da) follows directly from the nature of the dilemma: any dynamically available protocol ‘achieves’ it. On the other hand, Sync Streamlet [10] and Sync HotStuff [2] achieve all liveness and accountable safety resilience points (f_l, f_a) shaded in blue in Figure 1 (iv) for (A_s, Z_s), and Streamlet and HotStuff [41] achieve all (f_l, f_a) shaded in blue in Figure 1 (v) for (A_p, Z_p). A more detailed discussion on the protocols, in particular on how the synchronous protocols can be made to provide accountability, is given in Appendix E.

3.3 Tradeoffs Between Safety and Liveness Resiliences

In this section, we formalize the tradeoffs between safety and liveness resiliences shown by Figure 1 (i)–(iii) under the three different network environments (A_s, Z_s), (A_p, Z_p) and (A_da, Z_da):

Theorem 3. For any SMR protocol that satisfies f_s-safety and f_l-liveness, f_l ≤ ⌈n/2⌉ and f_s ≤ n − f_l + 1.

The proof of Theorem 3 is given in Appendix B. The theorem applies to all SMR protocols under any network environment. It formalizes the common intuition that in the presence of clients, no consensus protocol can maintain security when the adversary controls half or more of the nodes. In particular, it shows that no safe SMR protocol can provide liveness if the adversary controls the majority of the nodes, as the adversary can always use its majority to either commit safety violations or to censor transactions.

Theorem 4. For any SMR protocol that satisfies f_s-safety and f_l-liveness under (A_p, Z_p), f_l ≤ ⌈n/2⌉ and f_s ≤ n − 2f_l + 2.

The proof of Theorem 4 is given in Appendix C. It states the fundamental safety-liveness resilience tradeoff for partially synchronous protocols. It is a generalization of the celebrated n/3 resilience bound [19] for the security of partially synchronous protocols.

Theorem 5. For any SMR protocol that satisfies β_s-safety and β_l-liveness under (A_da, Z_da), β_l, β_s < 1/2.

The proof of Theorem 5 is given in Appendix D. It states the fundamental safety-liveness resilience tradeoff for dynamically available protocols, and generalizes the intuition behind a similar result for Proof-of-Work protocols given by [36, Theorem 3]. Dynamically available protocols are designed to output ledgers for sets of evidences containing responses from any number of nodes, as they do not know a priori the number of awake nodes. In this case, given two sets of evidences for conflicting ledgers, their relative sizes become the only selection criterion for such protocols, unlike in a static environment where the protocol can require evidences from a fixed number of nodes. Thus, the adversary is able to violate safety and liveness whenever it controls the majority among awake nodes.

Finally, we show that the curves of Figure 1 (i)–(iii) implied by Theorems 3, 4, and 5 are tight. Sync Streamlet [10] and Sync HotStuff [2] achieve all liveness and safety resilience points (f_l, f_s) shaded in blue in Figure 1 (i) for (A_s, Z_s). Streamlet and HotStuff [41] achieve all (f_l, f_s) shaded in blue in Figure 1 (ii) for (A_p, Z_p). Sleepy and Ouroboros [3, 15, 16, 26, 37] achieve all (β_l, β_s) shaded in blue in Figure 1 (iii) for (A_da, Z_da). A more detailed discussion on these protocols is given in Appendix E.

4 ACCOUNTABILITY GADGETS

In this section, we give a detailed description of the accountability gadget introduced in Section 1.5. For ease of exposition, we construct an accountability gadget from an accountable BFT protocol Π_bft with accountable safety and liveness resilience of ⌊n/3⌋.

4.1 Protocol Description

Accountability gadgets, denoted by Π_acc, can be used in conjunction with any dynamically available longest chain (LC) protocol Π_lc such as Nakamoto’s PoW LC protocol [30], Sleepy [37], Ouroboros [3, 16, 26] and Chia [14] (Figure 2). The protocol Π_lc then follows a modified chain selection rule where honest nodes build on the tip of the LC that contains all of the checkpoints they have observed. We call such chains checkpoint-respecting LCs. At each time slot t, each honest node i outputs the k-deep prefix of the checkpoint-respecting LC (or the prefix of the latest checkpoint, whichever is longer) in its view as LOG_{da,i}^t.

The accountability gadget Π_acc has three main components as shown in Figure 2: a checkpoint vote generator (Algorithm 1) that issues checkpoint proposals and votes, an accountable SMR protocol Π_bft that is used to reach consensus on which votes to count for the checkpoint decision, and a checkpoint vote interpreter (Algorithm 2) that outputs checkpoint decisions computed deterministically from the ordered sequence of checkpoint votes output by Π_bft. The protocol Π_bft can be instantiated with any accountable BFT protocol such as Streamlet [10], LibraBFT [28] or HotStuff
[41]. It is used as a black box ordering service within Π_acc and is assumed to have confirmation time T_confirm. We denote the ledger output by Π_bft as LOG_bft, and emphasize that it remains internal to the protocol Π_acc. The checkpoint vote generator and interpreter are run locally by each node and interact with Π_bft and LOG_bft. Hence, whenever we refer to LOG_bft in the following paragraphs, we mean the ledger in the view of a specific node.

Algorithm 1 Pseudocode for Checkpoint Vote Generator
 1: lastCp, props ← ⊥, {c : ⊥ | c = 0, 1, ...}    ⊲ Last checkpoint, proposals
 2: for currIter ← 0, 1, ...
 3:     if lastCp ≠ ⊥
 4:         while waiting T_cp time    ⊲ Wait T_cp time after new checkpoint
 5:             on Checkpoint(c, b) ← GetNextCp() with c = currIter
 6:                 goto 23    ⊲ Jump to conclusion of current iteration
 7:             on Proposal(c, b) ← RecvVerifiedProp() with props[c] = ⊥
 8:                 props[c] ← b    ⊲ Keep track of proposals from authorized leader
 9:     if CpLeaderOfIter(currIter) = myself
10:         Broadcast(⟨propose, currIter, GetCurrProposalTip()⟩_myself)
11:     while waiting T_to time
12:         on props[currIter] ≠ ⊥, but at most once    ⊲ Act on the first proposal received from authorized leader before end of T_cp-wait and T_to-timeout
13:             if IsValidProposal(props[currIter])    ⊲ Valid proposal extends latest checkpoint and is consistent with current checkpoint-respecting LC
14:                 SubmitVote(⟨accept, currIter, props[currIter]⟩_myself)
15:             else
16:                 SubmitVote(⟨reject, currIter⟩_myself)    ⊲ Reject invalid proposal
17:         on Checkpoint(c, b) ← GetNextCp() with c = currIter
18:             goto 23    ⊲ Jump to conclusion of current iteration
19:         on Proposal(c, b) ← RecvVerifiedProp() with props[c] = ⊥
20:             props[c] ← b    ⊲ Keep track of proposals from authorized leader
21:     SubmitVote(⟨reject, currIter⟩_myself)    ⊲ Reject due to timeout
22:     wait on Checkpoint(c, b) ← GetNextCp() with c = currIter
23:     lastCp ← b    ⊲ Keep track of checkpoint decision

Algorithm 2 Pseudocode for Checkpoint Vote Interpreter
 1: for currIter ← 0, 1, ...
 2:     currVotes ← {(pk, ⊥) | pk ∈ committee}    ⊲ Latest vote of each node
 3:     while true    ⊲ Go through votes as ordered by Π_bft
 4:         vote ← GetNextVerifiedVoteFromBft()    ⊲ Verify signature
 5:         if vote = ⟨accept, c, b⟩_pk with c = currIter
 6:             currVotes[pk] ← Accept(b)    ⊲ Count accept vote for block b
 7:         else if vote = ⟨reject, c⟩_pk with c = currIter
 8:             currVotes[pk] ← Reject    ⊲ Count reject vote
 9:         if ∃b : |{pk | currVotes[pk] = Accept(b)}| > 2n/3
10:             OutputCp(Checkpoint(currIter, b))    ⊲ New checkpoint decision
11:             break
12:         else if |{pk | currVotes[pk] = Reject}| ≥ n/3
13:             OutputCp(Checkpoint(currIter, ⊥))    ⊲ Abort current iteration
14:             break

The accountability gadget Π_acc proceeds in checkpoint iterations denoted by c, each of which attempts to checkpoint a new block in Π_lc. The checkpoint vote generator produces requests which can be of three forms: a proposal ⟨propose, c, b⟩_i proposing block b for checkpointing in iteration c, an accept vote ⟨accept, c, b⟩_i in favor of checkpointing a block b in iteration c, or a reject vote ⟨reject, c⟩_i for iteration c. Here, ⟨...⟩_i denotes a message signed by node i. Each iteration c has a publicly verifiable and unique leader L^(c) sampled using a random oracle. The leader obtains the k_cp-deep block on its checkpoint-respecting LC via the procedure GetCurrProposalTip() and broadcasts it to all other nodes as the checkpoint proposal for iteration c (line 10 of Algorithm 1). Nodes receive checkpoint proposals from the network via the procedure RecvVerifiedProp(), and order them with respect to their checkpoint iterations (lines 7 and 19 of Algorithm 1). For any given iteration c, RecvVerifiedProp() returns only proposals that were signed by the legitimate leader L^(c) of that iteration. A proposal is said to be valid in the view of a node i if the proposed block b is within i’s checkpoint-respecting LC and extends all previous checkpoints observed by i. During an iteration c, each node i checks if the proposal received for that iteration is valid using the procedure IsValidProposal() (line 13 of Algorithm 1). If it has indeed received a valid proposal with the proposed block b, it votes ⟨accept, c, b⟩_i (line 14 of Algorithm 1). Otherwise, if the proposal received for iteration c is invalid or if i does not receive any proposal for a timeout period T_to, it votes ⟨reject, c⟩_i (lines 16 and 21 of Algorithm 1). Votes are input as payload to Π_bft, which outputs an ordered sequence of votes in the form of the ledger LOG_bft. Thus, it enables nodes to reach consensus on which votes to count for an upcoming checkpoint decision.

The checkpoint vote interpreter (Algorithm 2) processes the sequence of votes in LOG_bft to produce checkpoint decisions. Each node receives verified votes (i.e., with valid signature) in the order they appear on LOG_bft via the procedure GetNextVerifiedVoteFromBft() (line 4 of Algorithm 2). Upon observing unique accept votes ⟨accept, c, b⟩ from more than 2n/3 nodes for a block b and the current iteration c, each node outputs b as the checkpoint for iteration c via the procedure OutputCp() (line 10 of Algorithm 2). The checkpointed blocks output by OutputCp() over time, together with their respective prefixes, constitute the ledger LOG_{acc,i}^t. Furthermore, the checkpoint decisions are fed back to Π_lc and the checkpoint vote generator so that they can ensure consistency of future block production in Π_lc and checkpoint proposals with prior checkpoints.

On the other hand, upon observing reject votes ⟨reject, c⟩ from n/3 nodes, each node outputs ⊥ as the checkpoint decision for the current iteration c (line 13 of Algorithm 2). Here, ⊥ signals that an iteration is aborted with no new checkpointed block, which happens if honest nodes suspect a faulty checkpoint leader and vote ‘reject’ because they have not seen progress for too long. Note that once a node outputs a checkpoint decision for its current iteration c, the checkpoint vote interpreter jumps to iteration c + 1; thus, only a single decision is output per iteration.

Upon receiving a new checkpoint for the current iteration c via the procedure GetNextCp(), nodes terminate iteration c of the checkpoint vote generator and enter iteration c + 1 (lines 6 and 18 of Algorithm 1). If the checkpoint decision was for a non-empty block b, nodes wait for T_cp time, denoted as the checkpoint interval, before they consider checkpoint proposals for iteration c + 1. Similarly, an honest leader for iteration c + 1 waits for T_cp time before it broadcasts the new checkpoint proposal. As will become clear in the analysis, the checkpoint interval is crucial to ensure that Π_lc’s chain dynamics are ‘not disturbed too much’ by accommodating and respecting checkpoints.

4.2 Security Properties

In this section, we formalize and prove the security properties P1 and P2 from Section 1.5 for accountability gadgets based on permissioned LC protocols [3, 16, 26, 37]. (For an extension of the security analysis to Proof-of-Work and Proof-of-Space LC protocols, see Sec-
tion 6.) For this purpose, we first fix f ≤ ⌈n/2⌉ and consider an accountability gadget Π_acc instantiated with a partially synchronous BFT protocol Π_bft that provides (n − 2f + 2)-accountable-safety at all times, and f-liveness under partial synchrony after the network partitions heal and sufficiently many honest nodes come online. (An example of such a protocol is HotStuff [41] with a quorum size of n − f + 1. See Appendix E for further discussion.) To provide the same accountable safety resilience as Π_bft for LOG_acc, we select the thresholds for the number of accept and reject votes required to output a new checkpoint as n − f and f in lines 9 and 12 of Algorithm 2, respectively.

[Figure 4 (diagram not reproduced). Boxes: (1) Consistency of checkpointed blocks in Π_lc; (2) Accountability of LOG_bft; (3) Accountability of LOG_acc (Appendix G.2); (4) Gap and recency properties of Π_acc (Appendix G.4); (5) Recurrence of checkpoint-strong pivots (Appendix H.3); (6) Security of Π_lc after max(GST, GAT) (Appendix H); (7) Liveness of Π_bft after max(GST, GAT); (8) Liveness of LOG_acc after max(GST, GAT) (Appendix G.3); (9) Checkpointing k_cp-deep blocks; (10) Security of Π_lc under synchrony; (11) Security of LOG_da.]

Figure 4: Dependency of the security properties of LOG_acc and LOG_da on the properties of Π_acc, Π_lc and Π_bft.

Let (A_pda, Z_pda) denote a partially synchronous network with dynamic participation. It extends the partially synchronous network defined in Section 2 (with a global stabilization time GST) to include asleep/awake nodes and a global awake time GAT. Before GAT, the adversary determines which honest nodes are awake or asleep at each slot. After GAT, all honest nodes are awake. Here, GAT, just like GST, is chosen by the adversary, unknown to the honest nodes and can be a causal function of the randomness in the protocol. But, while GST needs to happen eventually (GST < ∞), GAT may be infinite. So, whereas GST represents the time when the partition heals, GAT represents the time when sufficiently many honest nodes wake up. With GST and GAT, (A_pda, Z_pda) enables us to express security properties for accountability gadgets under environments with both network partitions and dynamic participation.

Let λ denote the security parameter associated with the employed cryptographic primitives. Similarly, let σ denote the security parameter associated with the LC protocol Π_lc. Then, we can state the following theorem for the security properties of the ledgers LOG_acc and LOG_da output by the accountability gadget Π_acc and the permissioned LC protocol Π_lc (modified to be checkpoint-respecting):

Theorem 6. For any λ, σ, and T_confirm, k, k_cp linear in σ:
(1) (P1: Accountability) Under (A_pda, Z_pda), the accountable ledger LOG_acc provides (n − 2f + 2)-accountable-safety at all times (except with probability negl(λ)), and there exists a constant C such that LOG_acc provides f-liveness with confirmation time T_confirm after C max(GST, GAT) (except with probability negl(σ)).
(2) (P2: Dynamic Availability) Under (A_da, Z_da), the available ledger LOG_da provides 1/2-safety and 1/2-liveness at all times (except with probability negl(σ) + negl(λ)).
(3) (Prefix) LOG_acc is always a prefix of LOG_da.

Here, negl(.) denotes a negligible function, i.e., a function that decays faster than all polynomials.

The resiliences achieved by LOG_acc and LOG_da are optimal, as can be seen from Theorem 2 which states that for any protocol satisfying f_a-accountable-safety and f_l-liveness, it must be the case that f_a ≤ n − 2f_l + 2, and from Theorem 5 which states that for any protocol satisfying β_s-safety and β_l-liveness under (A_da, Z_da), it must be the case that β_l, β_s ≤ 1/2.

To prove Theorem 6, we will first focus on the security of LOG_da under (A_da, Z_da) (box 11 of Figure 4). We know from [3, 16, 26, 37] that Π_lc is safe and live with some security parameter σ under the original LC rule when β < 1/2 (box 10 of Figure 4). Hence, if k_cp is selected as an appropriate linear function of σ, once a block becomes k_cp-deep in the LC held by a node, it stays on the LCs held by all of the honest nodes at all later times. Since any block checkpointed by an honest node has to be at least k_cp-deep in the LC held by the proposer of that block (box 9 of Figure 4), checkpoints are and will stay part of the original LCs held by every other honest node at all later times. Then, the evolution of the checkpoint-respecting LCs will be the same as the evolution of the LCs in the absence of checkpoints. This implies that LOG_da inherits the security properties of the original LC protocol Π_lc.

We next focus on the accountability and liveness of LOG_acc under (A_pda, Z_pda) with global stabilization and awake times GST and GAT (boxes 3 and 8 of Figure 4). The pseudocode of Π_acc stipulates that honest nodes vote accept only for checkpoint proposals that are consistent with previous checkpoints they have observed (box 1 of Figure 4). Moreover, each new checkpoint requires at least n − f + 1 accept votes (line 9 of Algorithm 2). Thus, in the event of a safety violation, either there should be two inconsistent ledgers LOG_bft held by honest nodes, or at least n − 2f + 1 nodes (which cannot be honest) must have voted on inconsistent checkpoints with respect to a single ledger LOG_bft. In both cases, at least n − 2f + 2 adversarial nodes can be identified by either invoking (n − 2f + 2)-accountable-safety of LOG_bft (box 2 of Figure 4) or the consistency requirement for checkpoints (box 1 of Figure 4). This implies (n − 2f + 2)-accountable-safety of LOG_acc, a detailed proof of which can be found in Appendix G.2.

Liveness of LOG_acc (box 8 of Figure 4) is the most involved part of the proof and requires the existence of iterations after max(GST, GAT) where all honest nodes vote accept for proposals by honest leaders. This, in turn, depends on whether the proposals by honest leaders are consistent with the checkpoint-respecting LCs held by the honest nodes after max(GST, GAT). To show that this is indeed the case, we prove that Π_lc recovers its security af-
ter max(GST, GAT) (box 6 of Figure 4). For this purpose, we first observe that in the presence of checkpoints, honest nodes abandon their LC if a new checkpoint is revealed on another (possibly shorter) chain. Then, there can be honest blocks that do not contribute to chain growth due to checkpoints arising at competing chains. This feature of checkpoint-respecting LCs violates a core assumption of the standard proof techniques [21, 26, 37] for LC protocols. Hence, to bound the number of such abandoned honest blocks and demonstrate the self-healing property of checkpoint-respecting LCs, we follow a different approach introduced in [39]. In this context, we first observe the gap and recency properties for Π_acc (Appendix G.4) which were highlighted in [39] as necessary conditions for any checkpointing mechanism to ensure the self-healing of Π_lc (box 4 of Figure 4). The gap property states that the checkpoint interval T_cp has to be sufficiently longer than the time it takes for a checkpoint proposal to be checkpointed by Π_acc. The recency property requires that newly checkpointed blocks were held in the checkpoint-respecting LC of at least one honest node within a short time interval preceding the checkpoint decision.

Using the gap and recency properties, we next extend the analysis of [39] to Proof-of-Stake protocols by introducing the concept of checkpoint-strong pivots, a generalization of strong pivots from [37]. Whereas strong pivots count honest and adversarial blocks to claim the convergence of the LC in the view of different honest nodes, checkpoint-strong pivots consider only the honest blocks that are guaranteed to extend the checkpoint-respecting LC, thus resolving the issue of non-monotonicity for these chains. Recurrence of checkpoint-strong pivots after max(GST, GAT) (box 5 of Figure 4) along with the gap and recency properties of Π_acc enable us to assert the security of Π_lc after max(GST, GAT). Details of the security analysis for Π_lc can be found in Appendix H. Given the self-healing property of Π_lc, liveness of LOG_acc follows from the liveness of Π_bft after max(GST, GAT) (box 7 of Figure 4), the full proof of which is given in Appendix G.3.

Finally, the prefix property follows from the fact that both LOG_da and LOG_acc are derived from the checkpoint-respecting LC. In particular, while LOG_acc corresponds to the prefix of the latest checkpoint, LOG_da corresponds to either the prefix of the latest checkpoint or that of the k-deep block on the checkpoint-respecting LC, selecting whichever one is longer. Hence, by construction, LOG_acc is always a prefix of LOG_da.

5 EXPERIMENTAL EVALUATION

To evaluate whether the protocol of Section 4.1 can serve as a drop-in replacement for Gasper as the Ethereum 2 beacon chain protocol, we have implemented a prototype¹. Having proved security and accountability in Section 4.2, we are interested in its real-world behavior and performance characteristics at operating points chosen to match those of Ethereum 2. We then compare Gasper and our protocol in terms of the average required bandwidth and the latency of the accountable ledger, and conclude that our protocol can exceed the performance of Gasper. In particular, our protocol incurs comparable bandwidth at reduced latency. Finally, we observe that Gasper’s resilience decreases as the number of nodes increases, for fixed latency of the accountable ledger, due to a new attack which we have discovered (detailed in Appendix I).

[Figure 5 (diagram not reproduced). Components: LC block production lottery, LC block tree manager, checkpoint vote generator, checkpoint vote interpreter, HotStuff, libp2p Gossipsub network. Edge labels: Tip to grow, Get/validate proposals, Votes, Checkpoints, New blocks, Proposals, HotStuff messages. Outputs: LOG_acc, LOG_da.]

Figure 5: Components and their interactions in our implementation of Figure 2. Gray: off the shelf components used as black boxes. Blue: taken from Π_lc without modification. Green: taken from Π_lc, modified to respect checkpoints.

Implementation: Our prototype is implemented in the programming language Rust. A diagram of the different components and their interactions is provided in Figure 5. We use a longest chain protocol modified to respect latest checkpoints as Π_lc, with a permissioned block production lottery with winning probability p per node and per time slot of duration T_slot; and HotStuff² as Π_bft. Honest nodes pause HotStuff (including its timeouts) while waiting for the next checkpoint proposal. All communication (including HotStuff’s) takes place in a broadcast fashion via libp2p’s Gossipsub protocol³, mimicking Ethereum 2’s network layer [1], to be able to scale to thousands of nodes. Thus, we assume that under normal conditions every message received by one honest node will be received by all honest nodes within some bounded delay. Since responsiveness is not so important for our checkpointing application and to avoid broadcasting quorum certificates, we use a variant of HotStuff where to ensure liveness the leader waits for the network delay bound before proposing a block. Our prototype does not implement the application logic of the beacon chain (such as validators joining and leaving, integration with shard chains and Ethereum 1, etc.) which can be realized on top of consensus in the same way as currently done in Ethereum 2, and our prototype does not use any orthogonal techniques to reduce bandwidth by constant factors (such as signature aggregation, short signature schemes, compression of network communication, etc.) which are not fundamental to the consensus problem.

¹ Source code: https://github.com/tse-group/accountability-gadget-prototype
² We used this Rust implementation: https://github.com/asonnino/hotstuff
³ We used this Rust implementation: https://github.com/libp2p/rust-libp2p

Choice of parameters: We chose the parameters of our protocol in the experiments to match the parameters of Ethereum 2’s beacon chain. The beacon chain has 32 slots per epoch and 128 validators per slot, for a total of n = 4096 validators (per epoch), which is the approximate number of nodes that we run our experiments with. To match the block inter-arrival time (i.e., the duration of one slot) of 12 s in the beacon chain, we set p = 1/n and account for the probability of no node winning the block production lottery and choose T_slot = 7.5 s. We also match the block payload size of