Noise-Induced Barren Plateaus in Variational Quantum Algorithms

Samson Wang (1, 2), Enrico Fontana (1, 3, 4), M. Cerezo (1, 5), Kunal Sharma (1, 6), Akira Sone (1, 5), Lukasz Cincio (1), and Patrick J. Coles (1)

(1) Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
(2) Imperial College London, London, UK
(3) University of Strathclyde, Glasgow, UK
(4) National Physical Laboratory, Teddington, UK
(5) Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, NM, USA
(6) Hearne Institute for Theoretical Physics and Department of Physics and Astronomy, Louisiana State University, Baton Rouge, LA, USA

arXiv:2007.14384v3 [quant-ph] 8 Feb 2021

Variational Quantum Algorithms (VQAs) may be a path to quantum advantage on Noisy Intermediate-Scale Quantum (NISQ) computers. A natural question is whether noise on NISQ devices places fundamental limitations on VQA performance. We rigorously prove a serious limitation for noisy VQAs, in that the noise causes the training landscape to have a barren plateau (i.e., vanishing gradient). Specifically, for the local Pauli noise considered, we prove that the gradient vanishes exponentially in the number of qubits n if the depth of the ansatz grows linearly with n. These noise-induced barren plateaus (NIBPs) are conceptually different from noise-free barren plateaus, which are linked to random parameter initialization. Our result is formulated for a generic ansatz that includes as special cases the Quantum Alternating Operator Ansatz and the Unitary Coupled Cluster Ansatz, among others. For the former, our numerical heuristics demonstrate the NIBP phenomenon for a realistic hardware noise model.

I. Introduction

One of the great unanswered technological questions is whether Noisy Intermediate-Scale Quantum (NISQ) computers will yield a quantum advantage for tasks of practical interest [1]. At the heart of this discussion are Variational Quantum Algorithms (VQAs), which are believed to be the best hope for near-term quantum advantage [2–4]. Such algorithms leverage classical optimizers to train the parameters in a quantum circuit, while employing a quantum device to efficiently estimate an application-specific cost function or its gradient. By keeping the quantum circuit depth relatively short, VQAs mitigate hardware noise and may enable near-term applications including electronic structure [5–8], dynamics [9–12], optimization [13–16], linear systems [17, 18], metrology [19, 20], factoring [21], compiling [22–24], and others [25–30].

The main open question for VQAs is their scalability to large problem sizes. While performing numerical heuristics for small or intermediate problem sizes is the norm for VQAs, deriving analytical scaling results is rare for this field. Noteworthy exceptions are some recent studies of the scaling of the gradient in VQAs with the number of qubits n [31–39]. For example, it was proven that the gradient vanishes exponentially in n for randomly initialized, deep Hardware Efficient ansatzes [31, 32] and dissipative quantum neural networks [33], and also for shallow depth with global cost functions [34]. This vanishing gradient phenomenon is now referred to as barren plateaus in the training landscape. Fortunately, investigations into barren plateaus have spawned several promising strategies to avoid them, including local cost functions [34, 40], parameter correlation [37], pre-training [41], and layer-by-layer training [42, 43]. Such strategies give hope that VQAs may avoid the exponential scaling that would otherwise result from the exponential precision requirements of navigating through a barren plateau.

However, these works do not consider quantum hardware noise, and very little is known about the scalability of VQAs in the presence of noise. One of the main selling points of VQAs is noise mitigation, and indeed VQAs have shown evidence of noise resilience in the sense that the global minimum of the landscape may be unaffected by noise [6, 23]. While some analysis has been done [44–46], an important open question, which has not yet been addressed, is how noise affects the asymptotic scaling of VQAs. More specifically, one can ask how noise affects the training process. Intuitively, incoherent noise is expected to reduce the magnitude of the gradient and hence hinder trainability, and preliminary numerical evidence of this has been seen [47, 48], although the scaling of this effect has not been studied.

In this work, we analytically study the scaling of the gradient for VQAs as a function of n, the circuit depth L, and a noise parameter q. We consider a general class of local noise models that includes depolarizing noise and certain kinds of Pauli noise. Furthermore, we investigate a general, abstract ansatz that encompasses many of the important ansatzes in the literature, allowing us to make a general statement about VQAs. This includes the Quantum Alternating Operator Ansatz (QAOA), which is used for solving combinatorial optimization problems [13–16], and the Unitary Coupled Cluster (UCC) Ansatz, which is used in the Variational Quantum Eigensolver (VQE) to solve chemistry problems [49–51]. It also applies to the Hardware Efficient Ansatz and the Hamiltonian Variational Ansatz (HVA), which are employed for various applications [52–56]. Our results also generalize to settings that allow for multiple
input states or training data, as in machine learning applications, often called quantum neural networks [57–61].

FIG. 1. Schematic diagram of the Noise-Induced Barren Plateau (NIBP) phenomenon. For various applications such as chemistry and optimization, increasing the problem size often requires one to increase the depth L of the variational ansatz. We show that, in the presence of local noise, the gradient vanishes exponentially in L and hence exponentially in the number of qubits n when L scales linearly in n. This can be seen in the plots on the right, which show the cost function landscapes for a simple variational problem with local noise.

Our main result (Theorem 1) is an upper bound on the magnitude of the gradient that decays exponentially with L, namely as $2^{-\kappa}$ with $\kappa = -L\log_2(q)$ (positive, since q < 1). Hence we find that the gradient vanishes exponentially in the circuit depth. Moreover, it is typical to consider L scaling as poly(n) (e.g., in the UCC Ansatz [51]), for which our main result implies an exponential decay of the gradient in n. We refer to this as a Noise-Induced Barren Plateau (NIBP). We remark that NIBPs can be viewed as concomitant to the cost landscape concentrating around the value of the cost for the maximally mixed state, and we make this precise in Lemma 1. See Fig. 1 for a schematic diagram of the NIBP phenomenon.

To be clear, any variational algorithm with an NIBP will have exponential scaling. In this sense, NIBPs destroy quantum speedup, as the standard goal of quantum algorithms is to avoid the typical exponential scaling of classical algorithms. NIBPs are conceptually distinct from the noise-free barren plateaus of Refs. [31–36]. Indeed, strategies to avoid noise-free barren plateaus [34, 37, 40–43] do not appear to solve the NIBP issue.

The obvious strategy to address NIBPs is to reduce circuit complexity, or more precisely, to reduce the circuit depth. Hence, our work provides quantitative guidance for how small L needs to be to potentially avoid NIBPs.

In what follows, we present our general framework followed by our main result. We also present two extensions of our main result, one involving correlated ansatz parameters and one allowing for measurement noise. The latter indicates that global cost functions exacerbate the NIBP issue. In addition, we provide numerical heuristics that illustrate our main result for MaxCut optimization with the QAOA, showing that NIBPs significantly impact this application.

II. Results

A. General Framework

In this work we analyze a general class of parameterized ansatzes $U(\boldsymbol{\theta})$ that can be expressed as a product of L unitaries sequentially applied by layers

$$U(\boldsymbol{\theta}) = U_L(\boldsymbol{\theta}_L)\cdots U_2(\boldsymbol{\theta}_2)\, U_1(\boldsymbol{\theta}_1). \tag{1}$$

Here $\boldsymbol{\theta} = \{\boldsymbol{\theta}_l\}_{l=1}^{L}$ is a set of vectors of continuous parameters that are optimized to minimize a cost function C that can be expressed as the expectation value of an operator O:

$$C = \mathrm{Tr}\left[O\, U(\boldsymbol{\theta})\, \rho\, U^\dagger(\boldsymbol{\theta})\right]. \tag{2}$$

As shown in Fig. 2, ρ is an n-qubit input state. Without loss of generality we assume that each $U_l(\boldsymbol{\theta}_l)$ is given by

$$U_l(\boldsymbol{\theta}_l) = \prod_m e^{-i\theta_{lm} H_{lm}} W_{lm}, \tag{3}$$

where $H_{lm}$ are Hermitian operators, $\boldsymbol{\theta}_l = \{\theta_{lm}\}$ are continuous parameters, and $W_{lm}$ denote unparametrized gates. We expand $H_{lm}$ and O in the Pauli basis as

$$H_{lm} = \boldsymbol{\eta}_{lm}\cdot\boldsymbol{\sigma}_n = \sum_i \eta_{lm}^i \sigma_n^i, \qquad O = \boldsymbol{\omega}\cdot\boldsymbol{\sigma}_n = \sum_i \omega^i \sigma_n^i, \tag{4}$$

where $\sigma_n^i \in \{\mathbb{1}, X, Y, Z\}^{\otimes n}$ are Pauli strings, and $\boldsymbol{\eta}_{lm}$ and $\boldsymbol{\omega}$ are real-valued vectors that specify the terms present in the expansion. Defining $N_{lm} = |\boldsymbol{\eta}_{lm}|$ and $N_O = |\boldsymbol{\omega}|$ as the number of non-zero elements, i.e., the number of terms in the summations in Eq. (4), we say that $H_{lm}$ and O admit an efficient Pauli decomposition if $N_{lm}, N_O \in \mathcal{O}(\mathrm{poly}(n))$, respectively.

We now briefly discuss how the QAOA, UCC, and Hardware Efficient ansatzes fit into this general framework. We refer the reader to the Methods section for additional details. In QAOA one sequentially alternates the action of two unitaries as

$$U(\boldsymbol{\gamma}, \boldsymbol{\beta}) = e^{-i\beta_p H_M} e^{-i\gamma_p H_P} \cdots e^{-i\beta_1 H_M} e^{-i\gamma_1 H_P}, \tag{5}$$

where $H_P$ and $H_M$ are the so-called problem and mixer Hamiltonians, respectively. We define $N_P$ ($N_M$) as the number of terms in the Pauli decomposition of $H_P$ ($H_M$). On the other hand, Hardware Efficient ansatzes naturally fit into Eqs. (1)–(3), as they are usually composed of fixed gates (e.g., controlled NOTs) and parametrized gates (e.g., single-qubit rotations). Finally, as detailed in the Methods, the UCC ansatz can be expressed as

$$U(\boldsymbol{\theta}) = \prod_{lm} U_{lm}(\theta_{lm}) = \prod_{lm} e^{i\theta_{lm}\sum_i \mu_{lm}^i \sigma_n^i}, \tag{6}$$
where $\mu_{lm}^i \in \{0, \pm 1\}$, and where $\theta_{lm}$ are the coupled cluster amplitudes. Moreover, we denote by $\widehat{N}_{lm} = |\boldsymbol{\mu}_{lm}|$ the number of non-zero elements in $\sum_i \mu_{lm}^i \sigma_n^i$.

As shown in Fig. 2, we consider a noise model where local Pauli noise channels $\mathcal{N}_j$ act on each qubit j before and after each unitary $U_l(\boldsymbol{\theta}_l)$. The action of $\mathcal{N}_j$ on a local Pauli operator $\sigma \in \{X, Y, Z\}$ can be expressed as

$$\mathcal{N}_j(\sigma) = q_\sigma \sigma, \tag{7}$$

where $-1 < q_X, q_Y, q_Z < 1$. Here, we characterize the noise strength with a single parameter $q = \max\{|q_X|, |q_Y|, |q_Z|\}$. Let $\mathcal{U}_l$ denote the channel that implements the unitary $U_l(\boldsymbol{\theta}_l)$ and let $\mathcal{N} = \mathcal{N}_1 \otimes \cdots \otimes \mathcal{N}_n$ denote the n-qubit noise channel. Then, the noisy cost function is given by

$$\widetilde{C} = \mathrm{Tr}\left[O\, \mathcal{N}\circ\mathcal{U}_L\circ\cdots\circ\mathcal{N}\circ\mathcal{U}_1\circ\mathcal{N}(\rho)\right]. \tag{8}$$

FIG. 2. Setting for our analysis. An n-qubit input state ρ is sent through a variational ansatz $U(\boldsymbol{\theta})$ composed of L unitary layers $U_l(\boldsymbol{\theta}_l)$ sequentially acting according to Eq. (1). Here, $\mathcal{U}_l$ denotes the quantum channel that implements the unitary $U_l(\boldsymbol{\theta}_l)$. The parameters in the ansatz $\boldsymbol{\theta} = \{\boldsymbol{\theta}_l\}_{l=1}^{L}$ are trained to minimize a cost function that is expressed as the expectation value of an operator O as in Eq. (2). We consider a noise model where local Pauli noise channels $\mathcal{N}_j$ act on each qubit j before and after each unitary.

B. General Analytical Results

There are some VQAs, such as the VQE [5] for chemistry and other physical systems, where it is important to accurately characterize the value of the cost function itself. Lemma 1 below quantitatively bounds the cost function itself, and we envision that this bound will be especially useful in the context of VQE. On the other hand, there are other VQAs, such as those for optimization [13–16], compiling [22–24], and linear systems [17, 18], where the key goal is to learn the optimal parameters and the precise value of the cost function is either not important or can be computed classically after learning the parameters. In this case, one is primarily concerned with trainability, and hence the gradient is a key quantity of interest. These applications motivate our main result in Theorem 1, which bounds the magnitude of the gradient. We remark that trainability is of course also important for VQE, and hence Theorem 1 is also of interest for this application.

With this motivation in mind, we now present our main results. We first present our bound on the cost function, since it describes a phenomenon that naturally accompanies our main theorem. Namely, in the following lemma, we show that the noisy cost function concentrates around the corresponding value for the maximally mixed state.

Lemma 1 (Concentration of the cost function). Consider an L-layered ansatz of the form in Eq. (1). Suppose that local Pauli noise of the form of Eq. (7) with noise strength q acts before and after each layer as in Fig. 2. Then, for a cost function $\widetilde{C}$ of the form in Eq. (8), the following bound holds

$$\left|\widetilde{C} - \frac{1}{2^n}\mathrm{Tr}[O]\right| \leq G(n)\left\|\rho - \frac{\mathbb{1}}{2^n}\right\|_1, \tag{9}$$

where

$$G(n) = N_O \|\boldsymbol{\omega}\|_\infty q^{L+1}. \tag{10}$$

Here $\|\cdot\|_\infty$ is the infinity norm, $\|\cdot\|_1$ is the trace norm, $\boldsymbol{\omega}$ is defined in Eq. (4), and $N_O = |\boldsymbol{\omega}|$ is the number of non-zero elements in the Pauli decomposition of O.

This lemma implies that the cost landscape concentrates exponentially on the value $\mathrm{Tr}[O]/2^n$ for large n whenever the number of layers L scales linearly with the number of qubits. While this lemma has important applications on its own, particularly for VQE, it also provides intuition for the NIBP phenomenon, which we now state.

Let $\partial_{lm}\widetilde{C} = \partial\widetilde{C}/\partial\theta_{lm}$ denote the partial derivative of the noisy cost function with respect to the m-th parameter that appears in the l-th layer of the ansatz, as in Eq. (3). For our main technical result, we upper bound $|\partial_{lm}\widetilde{C}|$ as a function of L and n.

Theorem 1 (Upper bound on the partial derivative). Consider an L-layered ansatz as defined in Eq. (1). Let $\theta_{lm}$ denote the trainable parameter corresponding to the Hamiltonian $H_{lm}$ in the unitary $U_l(\boldsymbol{\theta}_l)$ appearing in the ansatz. Suppose that local Pauli noise of the form in Eq. (7) with noise parameter q acts before and after each layer as in Fig. 2. Then the following bound holds for the partial derivative of the noisy cost function

$$|\partial_{lm}\widetilde{C}| \leq F(n), \tag{11}$$

where

$$F(n) = \sqrt{8\ln 2}\, N_O \|H_{lm}\|_\infty \|\boldsymbol{\omega}\|_\infty n^{1/2} q^{L+1}, \tag{12}$$

and $\boldsymbol{\omega}$ is defined in Eq. (4), with number of non-zero elements $N_O$.
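
As a rough numerical illustration, the Python sketch below evaluates the bounds G(n) and F(n) of Eqs. (10) and (12). It also checks Lemma 1 on a toy example of our own choosing, which is not taken from the analysis above:

- a single qubit (n = 1) with O = Z, so $N_O = 1$ and $\|\boldsymbol{\omega}\|_\infty = 1$;
- a Pauli channel with $q_X = q_Y = q_Z = q$, which shrinks the Bloch vector by q;
- layers made of Y and Z rotations.

In this toy case Lemma 1 reads $|\widetilde{C}| \leq q^{L+1}|\mathbf{r}|$. Here $\mathbf{r}$ is the input Bloch vector, and its length equals $\|\rho - \mathbb{1}/2\|_1$. The helper names `G_bound`, `F_bound` and `noisy_cost_one_qubit`, and the parameter values in the demo, are hypothetical.

```python
import math
import random

SQRT_8_LN2 = math.sqrt(8 * math.log(2))


def G_bound(N_O, omega_inf, q, L):
    """Eq. (10): G(n) = N_O * ||omega||_inf * q**(L + 1)."""
    return N_O * omega_inf * q ** (L + 1)


def F_bound(n, N_O, H_inf, omega_inf, q, L):
    """Eq. (12): F(n) = sqrt(8 ln 2) N_O ||H_lm||_inf ||omega||_inf n**(1/2) q**(L + 1)."""
    return SQRT_8_LN2 * N_O * H_inf * omega_inf * math.sqrt(n) * q ** (L + 1)


def rotate_y(v, theta):
    """Bloch-vector action of exp(-i theta Y / 2): rotation about the y axis."""
    x, y, z = v
    return (x * math.cos(theta) + z * math.sin(theta), y,
            z * math.cos(theta) - x * math.sin(theta))


def rotate_z(v, theta):
    """Bloch-vector action of exp(-i theta Z / 2): rotation about the z axis."""
    x, y, z = v
    return (x * math.cos(theta) - y * math.sin(theta),
            x * math.sin(theta) + y * math.cos(theta), z)


def depolarize(v, q):
    """Pauli channel with q_X = q_Y = q_Z = q: every Bloch component is scaled by q."""
    return tuple(q * c for c in v)


def noisy_cost_one_qubit(bloch, angles, q):
    """Eq. (8) for n = 1, O = Z: apply N, then (U_l followed by N) per layer; return <Z>."""
    v = depolarize(bloch, q)
    for theta_y, theta_z in angles:
        v = depolarize(rotate_z(rotate_y(v, theta_y), theta_z), q)
    return v[2]


if __name__ == "__main__":
    random.seed(1)
    q = 0.9
    bloch = (0.0, 0.0, 1.0)  # rho = |0><0|: ||rho - 1/2||_1 = 1 and Tr[Z]/2 = 0
    for L in (1, 5, 20):
        angles = [(random.uniform(0, 2 * math.pi), random.uniform(0, 2 * math.pi))
                  for _ in range(L)]
        cost = noisy_cost_one_qubit(bloch, angles, q)
        print(f"L={L:2d}  |C~|={abs(cost):.3e}  G={G_bound(1, 1.0, q, L):.3e}")

    # Corollary 1 regime L = n, assuming N_O = n and unit norms (hypothetical values).
    for n in (20, 50, 100, 200):
        print(f"n={n:3d}  F(n)={F_bound(n, n, 1.0, 1.0, q, L=n):.3e}")
```

With the assumed $N_O = n$, F(n) scales as $n^{3/2} q^{n+1}$. For q close to 1 this polynomial prefactor can dominate at moderate n: the product peaks near $n = 1.5/\ln(1/q)$, roughly 150 for q = 0.99. The decay stated in Corollary 1 is therefore an asymptotic statement.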

Let us now consider the asymptotic scaling of the function F(n) in Eq. (12). We state the result under standard assumptions: O in Eq. (4) admits an efficient Pauli decomposition, and $H_{lm}$ has bounded eigenvalues. In this setting, F(n) decays exponentially in n if L grows linearly in n.

Corollary 1 (Noise-induced barren plateaus). Let $N_{lm}, N_O \in \mathcal{O}(\mathrm{poly}(n))$ and let $\eta_{lm}^i, \omega^j \in \mathcal{O}(\mathrm{poly}(n))$ for all i, j. Then the upper bound F(n) in Eq. (12) vanishes exponentially in n as

$$F(n) \in \mathcal{O}(2^{-\alpha n}), \tag{13}$$

for some positive constant α if we have

$$L \in \Omega(n). \tag{14}$$

The asymptotic scaling in Eq. (13) is independent of l and m, i.e., the scaling is blind to the layer, or the parameter within the layer, for which the derivative is taken. This corollary implies that when Eq. (14) holds, i.e., L grows at least linearly in n, the partial derivative $|\partial_{lm}\widetilde{C}|$ vanishes exponentially in n across the entire cost landscape. In other words, one observes a Noise-Induced Barren Plateau (NIBP).

We note that Eq. (13) holds for all q < 1 whenever Eq. (14) is satisfied. That is, NIBPs occur regardless of the noise strength; the noise strength only changes the severity of the exponential scaling.

In addition, Corollary 1 implies that NIBPs are conceptually different from noise-free barren plateaus. First, NIBPs are independent of the parameter initialization strategy and of the locality of the cost function. Second, NIBPs exhibit exponential decay of the gradient itself, not just of the variance of the gradient, which is the hallmark of noise-free barren plateaus. Noise-free barren plateaus allow the global minimum to sit inside a deep, narrow valley in the landscape [34], whereas NIBPs flatten the entire landscape.

One of the strategies to avoid noise-free barren plateaus is to correlate parameters, i.e., to make a subset of the parameters equal to each other [37]. The following remark generalizes Theorem 1 to accommodate such a setting, consequently showing that such correlated or degenerate parameters do not help in avoiding NIBPs. In this setting, the result we obtain in Eq. (16) below is essentially identical to that in Eq. (12), except for an additional factor quantifying the amount of degeneracy.

Remark 1 (Degenerate parameters). Consider the ansatz defined in Eqs. (1) and (3). Suppose there is a subset $G_{st}$ of the set $\{\theta_{lm}\}$ in this ansatz such that $G_{st}$ consists of g parameters that are degenerate:

$$G_{st} = \{\theta_{lm} \mid \theta_{lm} = \theta_{st}\}. \tag{15}$$

Here, $\theta_{st}$ denotes the parameter in $G_{st}$ for which $N_{lm}\|\boldsymbol{\eta}_{lm}\|_\infty$ takes the largest value in the set. ($\theta_{st}$ can also be thought of as a reference parameter to which all other parameters are set equal in value.) Then the partial derivative of the noisy cost with respect to $\theta_{st}$ is bounded as

$$|\partial_{st}\widetilde{C}| \leq \sqrt{8\ln 2}\, g N_O \|H_{lm}\|_\infty \|\boldsymbol{\omega}\|_\infty n^{1/2} q^{L+1}, \tag{16}$$

at all points in the cost landscape.

Remark 1 is especially important in the context of the QAOA and the UCC ansatz, as discussed below. We note that, in the general case, a unitary of the form of Eq. (3) cannot be implemented as a single gate on a physical device. In practice one needs to compile the unitary into a sequence of native gates. Moreover, Hamiltonians with non-commuting terms are usually approximated with techniques such as Trotterization. This compilation overhead potentially leads to a sequence of gates that grows with n. Remark 1 enables us to account for such scenarios, and we elaborate on its relevance to specific applications in the next subsection.

Finally, we present an extension of our main result to the case of measurement noise. Consider a model of measurement noise where each local measurement independently has some bit-flip probability given by $(1 - q_M)/2$, which we assume to be symmetric with respect to the 0 and 1 outcomes. This leads to an additional reduction of our bounds on the cost function and its gradient that depends on the locality of the observable O.

Proposition 1 (Measurement noise). Consider expanding the observable O as a sum of Pauli strings, as in Eq. (4). Let w denote the minimum weight of these strings, where the weight is defined as the number of non-identity elements for a given string. In addition to the noise process considered in Fig. 2, suppose there is also measurement noise consisting of a tensor product of local bit-flip channels with bit-flip probability $(1 - q_M)/2$. Then we have

$$\left|\widetilde{C} - \frac{1}{2^n}\mathrm{Tr}[O]\right| \leq q_M^w\, G(n)\left\|\rho - \frac{\mathbb{1}}{2^n}\right\|_1 \tag{17}$$

and

$$|\partial_{lm}\widetilde{C}| \leq q_M^w\, F(n), \tag{18}$$

where G(n) and F(n) are defined in Lemma 1 and Theorem 1, respectively.

Proposition 1 goes beyond the noise model considered in Theorem 1. It shows that in the presence of measurement noise there is an additional contribution from the locality of the measurement operator. It is interesting to draw a parallel between Proposition 1 and noise-free barren plateaus, which have been shown to be cost-function dependent and in particular depend on the locality of the observable O [34]. The bounds in Proposition 1 similarly depend on the locality of O. For example, when w = n, i.e., for global observables, the factor $q_M^w$ hastens the exponential decay. On the other hand, when w = 1, i.e., for local observables, the factor is the constant $q_M$ and the scaling in n is unaltered by measurement noise. In this sense, a global observable exacerbates the NIBP issue by making the decay more rapid with n.
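
The Python sketch below shows where the factor $q_M^w$ in Eqs. (17) and (18) comes from. It applies the symmetric bit-flip readout model of Proposition 1 to a small, made-up outcome distribution and compares Z-type Pauli strings of weight w = 1 and w = n.

It assumes the Pauli strings are measured in the computational basis as Z-type strings. The distribution and the helper names `z_string_expectation` and `apply_bit_flips` are hypothetical.

Each independent flip multiplies the sign $(-1)^{b_j}$ of a measured bit by ±1, with mean $1 - 2\cdot(1 - q_M)/2 = q_M$. A weight-w string is therefore damped by exactly $q_M^w$.

```python
import itertools


def z_string_expectation(probs, support):
    """<Z_S> for a Z-type Pauli string on the qubits in `support`, from outcome probabilities."""
    return sum(p * (-1) ** sum(bits[j] for j in support) for bits, p in probs.items())


def apply_bit_flips(probs, n, q_M):
    """Flip each of the n measured bits independently with probability (1 - q_M) / 2."""
    p_flip = (1 - q_M) / 2
    noisy = {}
    for bits, p in probs.items():
        for flips in itertools.product((0, 1), repeat=n):
            weight = 1.0
            for f in flips:
                weight *= p_flip if f else 1 - p_flip
            out = tuple(b ^ f for b, f in zip(bits, flips))
            noisy[out] = noisy.get(out, 0.0) + p * weight
    return noisy


if __name__ == "__main__":
    n, q_M = 3, 0.9
    # Hypothetical noise-free-readout outcome distribution on 3 qubits.
    ideal = {(0, 0, 0): 0.6, (1, 1, 1): 0.1, (0, 1, 0): 0.3}
    noisy = apply_bit_flips(ideal, n, q_M)
    for support in [(0,), (0, 1, 2)]:  # w = 1 (local) and w = n (global)
        w = len(support)
        before = z_string_expectation(ideal, support)
        after = z_string_expectation(noisy, support)
        print(f"w={w}  ideal={before:.4f}  noisy={after:.4f}  q_M**w*ideal={q_M ** w * before:.4f}")

    # Extra factor in Eqs. (17)-(18): local (w = 1) versus global (w = n) observables.
    for n_big in (10, 50, 100):
        print(f"n={n_big:3d}  local={q_M:.3f}  global={q_M ** n_big:.3e}")
```

The exhaustive enumeration over flip patterns only scales to a handful of qubits. That is why the last loop evaluates $q_M^n$ directly rather than simulating readout on larger registers.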
C. Application-Specific Analytical Results

Here we investigate the implications of our results from Section II B for two applications: optimization and chemistry. In particular, we derive explicit conditions for NIBPs for these applications. These conditions are derived in the setting where Trotterization is used, but other compilation strategies incur similar asymptotic behavior. We begin with the QAOA for optimization and then discuss the UCC ansatz for chemistry. Finally, we make a remark about the Hamiltonian Variational Ansatz (HVA), and we remark that our results also apply to a generalized cost function that can employ training data.

Corollary 2 (Example: QAOA). Consider the QAOA with $2p$ trainable parameters, as defined in Eq. (5). Suppose that the implementation of unitaries corresponding to the problem Hamiltonian $H_P$ and the mixer Hamiltonian $H_M$ require $k_P$- and $k_M$-depth circuits, respectively. If local Pauli noise of the form in Eq. (7) with noise parameter $q$ acts before and after each layer of native gates, then we have

$$ |\partial_{\beta_l} \widetilde{C}| \leq \sqrt{8\ln 2}\; g_{l,P}\, N_P \|H_P\|_\infty \|\omega\|_\infty\, n^{1/2} q^{(k_P+k_M)p+1} , \tag{19} $$

$$ |\partial_{\gamma_l} \widetilde{C}| \leq \sqrt{8\ln 2}\; g_{l,M}\, N_P \|H_M\|_\infty \|\omega\|_\infty\, n^{1/2} q^{(k_P+k_M)p+1} , \tag{20} $$

for any choice of parameters $\beta_l$, $\gamma_l$, and where $O = H_P$ in Eq. (2). Here $g_{l,P}$ and $g_{l,M}$ are the respective numbers of native gates parameterized by $\beta_l$ and $\gamma_l$ according to the compilation.

Corollary 2 follows from Remark 1, and it has interesting implications for the trainability of the QAOA. From Eqs. (19) and (20), NIBPs are guaranteed if $p k_P$ scales linearly in $n$. This can manifest itself in a number of ways, which we explain below.

First, we look at the depth $k_P$ required to implement one application of the problem unitary. Graph problems containing vertices of extensive degree, such as the Sherrington-Kirkpatrick model, inherently require $\Omega(n)$-depth circuits to implement [54]. On the other hand, generic problems mapped to hardware topologies also have the potential to incur $\Omega(n)$ depth or greater in compilation cost. For instance, implementation of MaxCut and $k$-SAT using SWAP networks on circuits with 1-D connectivity requires depth $\Omega(n)$ and $\Omega(n^{k-1})$, respectively [15, 62]. Such mappings with the aforementioned compiling overhead for $k > 2$ are guaranteed to encounter NIBPs even for a fixed number of rounds $p$.

Second, it appears that values of $p$ that grow at least mildly with $n$ may be needed for quantum advantage in certain optimization problems (for example, [63–66]). In addition, there are problems employing the QAOA that explicitly require $p$ scaling as $\mathrm{poly}(n)$ [21, 67]. Thus, even without considering the compilation overhead for the problem unitary, these QAOA problems may run into NIBPs, particularly when aiming for quantum advantage. Moreover, weak growth of $p$ with $n$ combined with compilation overhead could still result in an NIBP.

Finally, we note that above we have assumed that the contribution of $k_P$ dominates that of $k_M$. However, for more exotic choices of mixers [16], $k_M$ may also need to be carefully considered to avoid NIBPs.

Corollary 3 (Example: UCC). Let $H$ denote a molecular Hamiltonian of a system of $M_e$ electrons. Consider the UCC ansatz as defined in Eq. (6). If local Pauli noise of the form in Eq. (7) with noise parameter $q$ acts before and after every $U_{lm}(\theta_{lm})$ in Eq. (6), then we have

$$ |\partial_{\theta_{lm}} \widetilde{C}| \leq \sqrt{8\ln 2}\; \widehat{N}_{lm} N_H \|\omega\|_\infty\, n^{1/2} q^{L+1} , \tag{21} $$

for any coupled cluster amplitude $\theta_{lm}$, and where $O = H$ in Eq. (2).

Corollary 3 allows us to make general statements about the trainability of the UCC ansatz. We present the details for the standard UCC ansatz with single and double excitations from occupied to virtual orbitals [49, 68] (see Methods for more details). Let $M_o$ denote the total number of spin orbitals. Then at least $n = M_o$ qubits are required to simulate such a system, and the number of variational parameters grows as $\Omega(n^2 M_e^2)$ [62, 69]. To implement the UCC ansatz on a quantum computer, the excitation operators are first mapped to Pauli operators using the Jordan-Wigner or Bravyi-Kitaev mappings [70, 71]. Then, using first-order Trotterization and employing SWAP networks [62], the UCC ansatz can be implemented in $\Omega(n^2 M_e)$ depth, assuming 1-D connectivity of qubits [62]. Hence, for the UCC ansatz approximated by single- and double-excitation operators, the upper bound in Eq. (21) (asymptotically) vanishes exponentially in $n$.

To target strongly correlated states for molecular Hamiltonians, one can employ a UCC ansatz that includes additional, generalized excitations [55, 72]. An $\Omega(n^3)$-depth circuit is required to implement the first-order Trotterized form of this ansatz [62]. Hence NIBPs become more prominent for generalized UCC ansatzes. Finally, we remark that a sparse version of the UCC ansatz can be implemented in $\Omega(n)$ depth [62]. NIBPs would still occur for such ansatzes.

Additionally, we can make the following remark about the Hamiltonian Variational Ansatz (HVA). As argued in [55, 73, 74], the HVA has the potential to be an effective ansatz for quantum many-body problems.

Remark 2 (Example: HVA). The HVA can be thought of as a generalization of the QAOA to more than two non-commuting Hamiltonians. It is remarked in Ref. [56] that for problems of interest the number of rounds $p$ scales linearly in $n$. Thus, considering this growth of $p$ and also the potential growth of the compiled unitaries with $n$, the HVA has the potential to encounter NIBPs, by the same arguments made above for the QAOA (e.g., Corollary 2).

Remark 3 (Quantum Machine Learning). Our results can be extended to generalized cost functions of the form
$$ C_{\mathrm{train}} = \sum_i \mathrm{Tr}\big[O_i\, U(\theta) \rho_i U^\dagger(\theta)\big] , $$

where $\{O_i\}$ is a set of operators each of the form (4) and $\{\rho_i\}$ is a set of states. This can encapsulate certain quantum machine learning settings [57–61] that employ training data $\{\rho_i\}$. As an example of an instance where NIBPs can occur, the ansatz model proposed in one study [61] requires at least linear circuit depth in $n$.
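As a rough numerical illustration of Corollary 2, the Python sketch below evaluates the right-hand side of Eq. (19) under assumptions chosen only for illustration, not taken from the paper's experiments. It sets the compiled problem-unitary depth to $k_P = n$, matching the linear SWAP-network scaling quoted above for MaxCut. It also sets $k_M = 1$, $p = 4$, $q = 0.99$ and $g_{l,P} = N_P = \|\omega\|_\infty = 1$. Finally, it replaces $\|H_P\|_\infty$ by the upper bound $n(n-1)/2$, which holds for the MaxCut Hamiltonian of Eq. (22) because $\|H_P\|_\infty \leq |E|$. The function and parameter names are hypothetical.

```python
import math


def qaoa_gradient_bound(n, p, k_p, k_m, q, g=1.0, n_terms=1.0, h_norm=1.0, omega_norm=1.0):
    """Right-hand side of Eq. (19):

    sqrt(8 ln 2) * g_{l,P} * N_P * ||H_P||_inf * ||omega||_inf * n^(1/2) * q^((k_P + k_M) p + 1)
    """
    exponent = (k_p + k_m) * p + 1
    return (math.sqrt(8 * math.log(2)) * g * n_terms * h_norm * omega_norm
            * math.sqrt(n) * q ** exponent)


# Illustrative scan (assumed values): k_P = n, k_M = 1, p = 4, q = 0.99.
for n in (10, 100, 200, 400, 800):
    bound = qaoa_gradient_bound(
        n, p=4, k_p=n, k_m=1, q=0.99,
        h_norm=n * (n - 1) / 2,  # ||H_P|| <= |E| <= n(n-1)/2 for H_P in Eq. (22)
    )
    print(f"n = {n:4d}   bound on |dC/d(beta_l)| = {bound:.3e}")
```

With these numbers the bound first grows, because the polynomial prefactor dominates at small $n$. It then falls exponentially once the factor $q^{(k_P+k_M)p+1}$ takes over. This is the sense in which linear growth of $p\,k_P$ with $n$ forces an NIBP, whatever the polynomial prefactors.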

D. QAOA Heuristics

To illustrate the NIBP phenomenon beyond the conditions assumed in our analytical results, we numerically implement the QAOA to solve MaxCut combinatorial optimization problems. We employ a realistic noise model obtained from gate-set tomography on the IBM Ourense superconducting qubit device. In the Methods section we provide additional details on the noise model and the optimization method employed.

Let us first recall that a MaxCut problem is specified by a graph $G = (V, E)$ of nodes $V$ and edges $E$. The goal is to partition the nodes of $G$ into two sets which maximize the number of edges connecting nodes between sets. Here, the QAOA problem Hamiltonian is given by

$$ H_P = -\frac{1}{2} \sum_{ij \in E} C_{ij} \left(\mathbb{1} - Z_i Z_j\right) , \tag{22} $$

where $Z_i$ are local Pauli operators on qubit (node) $i$, and $C_{ij} = 1$ if the nodes are connected and $C_{ij} = 0$ otherwise.

We analyze performance in two settings. First, we fix the problem size at $n = 5$ nodes (qubits) and vary the number of rounds $p$ (Fig. 3). Second, we fix the number of rounds of QAOA at $p = 4$ and vary the problem size by increasing the number of nodes (Fig. 4).

In order to quantify performance for a given $n$ and $p$, we randomly generate 100 graphs according to the Erdős–Rényi model [75], such that each graph $G$ is chosen uniformly at random from the set of all graphs of $n$ nodes. For each graph we run 10 instances of the parameter optimization, and we select the run that achieves the smallest energy. At each optimization step the cost is estimated with 1000 shots. Performance is quantified by the average approximation ratio when training the QAOA in the presence and absence of noise. The approximation ratio is defined as the lowest energy obtained via optimizing divided by the exact ground state energy of $H_P$.

In our first setting we observe in Fig. 3(a) that when training in the absence of noise, the approximation ratio increases with $p$. However, when training in the presence of noise the performance decreases for $p > 2$. This result is in accordance with Lemma 1, as the cost function value concentrates around $\mathrm{Tr}[H_P]/2^n$ as $p$ increases. This concentration phenomenon can also be seen clearly in Fig. 3(b), where we see evidence that the deviation of the cost from $\mathrm{Tr}[H_P]/2^n$ decays exponentially with $p$.

FIG. 3. QAOA heuristics in the presence of realistic hardware noise: increasing number of rounds for fixed problem size. (a) The approximation ratio averaged over 100 random graphs of 5 nodes is plotted versus number of rounds $p$. The black, green, and red curves respectively correspond to noise-free training, noisy training with noise-free final cost evaluation, and noisy training with noisy final cost evaluation. The performance of noise-free training increases with $p$, similar to the results in Ref. [15]. The green curve shows that the training process itself is hindered by noise, with the performance decreasing steadily with $p$ for $p > 4$. The dotted blue lines correspond to known lower and upper bounds on classical performance in polynomial time: respectively the performance guarantee of the Goemans-Williamson algorithm [76] and the boundary of known NP-hardness [77, 78]. (b) The deviation of the cost from $\mathrm{Tr}[H_P]/2^n$ (averaged over graphs and parameter values) is plotted versus $p$. As $p$ increases, this deviation decays approximately exponentially with $p$ (linear on the log scale). (c) The absolute value of the largest partial derivative, averaged over graphs and parameter values, is plotted versus $p$. The partial derivatives decay approximately exponentially with $p$, showing evidence of Noise-Induced Barren Plateaus (NIBPs).

In addition, we can see the effect of NIBPs, as Fig. 3(a) also depicts the value of the approximation ratio computed without noise by utilizing the parameters obtained via noisy training. Note that evaluating the cost in a noise-free setting has practical meaning, since the classicality of the Hamiltonian allows one to compute the cost on a (noise-free) classical computer after training the parameters. For $p > 4$ this approximation ratio decreases, meaning that as $p$ becomes larger it becomes increasingly hard to find a minimizing direction to navigate through the cost function landscape. Moreover, the effect of NIBPs is evident in Fig. 3(c), where we depict the average absolute value of the largest cost function partial derivative (i.e., $\max_{lm} |\partial_{lm} \widetilde{C}|$). This plot shows an exponential decay of the partial derivative with $p$, in accordance with Theorem 1.

Finally, in Fig. 3(a) we contextualize our results with previously known two-sided bounds on classical polynomial-time performance. The lower bound corresponds to the performance guarantee of the classical Goemans-Williamson algorithm [76], whilst the upper bound is at the value 16/17, which is the approximation ratio beyond which MaxCut is known to be NP-hard
[77, 78].

In our second setting we find complementary results. In Fig. 4(a) we observe that at a problem size of 8 qubits or larger, 4 rounds of QAOA trained on the noisy circuit fall short of the performance guarantee of the classical Goemans-Williamson algorithm. As we increase the number of qubits, we also observe that the circuit depth increases linearly (Fig. 4(b)), thus confirming that we are in a regime of NIBPs.

Our numerical results show that training the QAOA in the presence of a realistic noise model significantly affects the performance. The concentration of the cost and the NIBP phenomenon are also both clearly visible in our data. Even though we observe performance for $n = 5$ and $p = 4$ that is NP-hard to achieve classically, any possible advantage would be lost for large problem sizes or circuit depths due to unfavorable scaling. Hence, noise appears to be a crucial factor to account for when attempting to understand the performance of the QAOA.

FIG. 4. QAOA heuristics in the presence of realistic hardware noise: increasing problem size for a fixed number of rounds. The approximation ratio averaged over 60 random graphs of increasing number of nodes $n$ and fixed number of rounds $p = 4$ is plotted. The black, green, and red curves respectively correspond to noise-free training, noisy training with noise-free final cost evaluation, and noisy training with noisy final cost evaluation. (a) For a problem size of 8 nodes or larger, the noisily-trained approximation ratio falls below the performance guarantee of the classical Goemans-Williamson algorithm. (b) The depth of the circuit (red curve) scales linearly with the number of qubits, confirming that we are in a regime where we would expect to observe Noise-Induced Barren Plateaus.

III. Discussion

The success of NISQ computing largely depends on the scalability of Variational Quantum Algorithms (VQAs), which are widely viewed as the best hope for near-term quantum advantage for various applications. Only a small handful of works have analytically studied VQA scalability, and even less is known about the impact of noise on their scaling. Our work represents a breakthrough in understanding the effect of local noise on VQA scalability. We rigorously prove two important and closely related phenomena: the exponential concentration of the cost function in Lemma 1 and the exponential vanishing of the gradient in Theorem 1. We refer to the latter as a Noise-Induced Barren Plateau (NIBP). Like noise-free barren plateaus, NIBPs require the precision, and hence the algorithmic complexity, to scale exponentially with the problem size. Thus, avoiding NIBPs is necessary for a VQA to have any hope of exponential quantum speedup.

On the other hand, NIBPs are conceptually different from noise-free barren plateaus [31–36]. The latter are due to random parameter initialization and hence can be addressed by pre-training, correlating parameters, and other strategies [34, 37, 40–43]. In contrast, NIBPs hold for every point on the cost function landscape. Hence, pre-training and other similar strategies do not avoid NIBPs, and we explicitly demonstrate this for the parameter correlation strategy in Remark 1. At the moment, the only strategies we are aware of for avoiding NIBPs are: (1) reducing the hardware noise level, or (2) improving the design of variational ansatzes such that their circuit depth scales more weakly with $n$. Our work provides quantitative guidance for how to develop these strategies.

An elegant feature of our work is its generality, as our results apply to a wide range of VQAs and ansatzes. This includes the two most popular ansatzes, QAOA for optimization and UCC for chemistry, which Corollaries 2 and 3 treat, respectively. In recent times QAOA, UCC, and other physically motivated ansatzes have been touted as the potential solution to trainability issues due to (noise-free) barren plateaus, while Hardware Efficient ansatzes, which minimize circuit depth, have been regarded as problematic. Our work swings the pendulum in the other direction: any additional circuit depth that an ansatz incorporates (regardless of whether it is physically motivated) will hurt trainability and potentially lead to an NIBP. This suggests that Hardware Efficient ansatzes are in fact worth exploring further, provided one has an appropriate strategy to avoid noise-free barren plateaus. This claim is supported by recent state-of-the-art implementations for optimization [54] and chemistry [53] using such ansatzes.

We believe our work has particular relevance to optimization. For combinatorial optimization problems such as MaxCut on 3-regular graphs, the compilation of a single instance of the problem unitary $e^{-i\gamma H_P}$ can require an $\Omega(n)$-depth circuit [54]. Therefore, for a constant number of rounds $p$ of the QAOA, the circuit depth grows at least linearly with $n$. From Theorem 1, it follows that NIBPs can occur for practical QAOA problems, even for a constant number of rounds. Furthermore, even neglecting the aforementioned linear compilation overhead, NIBPs are guaranteed (asymptotically) if $p$ grows with $n$. Such
growth has been shown to be necessary in certain instances of MaxCut [63] as well as for other optimization problems [21, 67], and hence NIBPs are especially relevant in these cases.

While it is well known that decoherence ultimately limits the depth of quantum circuits in the NISQ era, there was an interesting open question (prior to our work) as to whether one could still train the parameters of a variational ansatz in the high decoherence limit. This question was especially important for VQAs for optimization, compiling, and linear systems, which are applications that do not require accurate estimation of cost functions on the quantum computer. Our work essentially provides a negative answer to this question. Naturally, important future work will involve extending our results to more general (e.g., non-unital) noise models, and numerically testing the tightness of our bounds. Moreover, our work emphasizes the importance of short-depth variational ansatzes. Hence a crucial research direction for the success of VQAs will be the development of methods to reduce ansatz depth.

FIG. 5. Special cases of our general ansatz. (a) QAOA problem unitary $e^{-i\gamma H_P}$ for the ring-of-disagrees MaxCut problem, with Hamiltonian $H_P = \frac{1}{2}\sum_j Z_j Z_{j+1}$. (b) Hardware Efficient ansatz composed of CNOTs and single-qubit rotations around the $y$-axis, $R_y(\theta)$. (c) Unitary for the exponential $e^{-i\theta Y_1 Z_2 Z_3 X_4}$. This type of circuit is a representative component of the UCC ansatz.
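Fig. 5(c) depicts the exponential $e^{-i\theta Y_1 Z_2 Z_3 X_4}$. Because every Pauli string $P$ squares to the identity, $e^{-i\theta P} = \cos\theta\,\mathbb{1} - i\sin\theta\,P$. Such an exponential can therefore be built from single-qubit basis changes, a CNOT parity ladder and one $R_z(2\theta)$ rotation. The NumPy sketch below checks this standard construction against the closed form. It is not necessarily the exact gate sequence drawn in the figure, and the helper names are hypothetical.

```python
from functools import reduce

import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
S_DAG = np.diag([1, -1j])
CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]], dtype=complex)


def kron_all(ops):
    """Tensor product, with qubit 1 as the leftmost factor."""
    return reduce(np.kron, ops)


def cnot_adjacent(k, n):
    """CNOT with control qubit k and target qubit k+1 (0-based) on n qubits."""
    return kron_all([np.eye(2**k), CNOT, np.eye(2**(n - k - 2))])


def pauli_exp_exact(theta, P):
    """exp(-i theta P) via the eigendecomposition of the Hermitian matrix P."""
    evals, evecs = np.linalg.eigh(P)
    return evecs @ np.diag(np.exp(-1j * theta * evals)) @ evecs.conj().T


def pauli_exp_circuit(theta, labels):
    """Basis change, CNOT parity ladder, Rz(2 theta) on the last qubit, then uncompute.

    Assumes every factor of the string is X, Y or Z (no identities).
    """
    n = len(labels)
    basis_change = {"X": H, "Y": H @ S_DAG, "Z": I2}  # V sigma V^dagger = Z for each factor
    V = kron_all([basis_change[s] for s in labels])
    # Ladder C_{n-2} ... C_0: the last qubit ends up holding the parity of all qubits.
    ladder = reduce(lambda acc, k: cnot_adjacent(k, n) @ acc, range(n - 1), np.eye(2**n))
    rz_last = kron_all([np.eye(2**(n - 1)), np.diag([np.exp(-1j * theta), np.exp(1j * theta)])])
    return V.conj().T @ ladder.conj().T @ rz_last @ ladder @ V


labels = "YZZX"  # the string Y1 Z2 Z3 X4 of Fig. 5(c)
P = kron_all([{"X": X, "Y": Y, "Z": Z}[s] for s in labels])
theta = 0.37
closed_form = np.cos(theta) * np.eye(16) - 1j * np.sin(theta) * P
print(np.allclose(pauli_exp_circuit(theta, labels), closed_form))
```

Conjugating by the basis change turns $P$ into $Z^{\otimes 4}$. The CNOT ladder writes the parity of all four qubits onto the last qubit, so a single $R_z$ there applies the phase $e^{-i\theta (-1)^{\text{parity}}}$.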

IV. Methods

A. Special Cases of Our Ansatz

Here we discuss how the QAOA, the Hardware Efficient ansatz, and the UCC ansatz fit into the general framework of Section II A.

1. Quantum Alternating Operator Ansatz

The QAOA can be understood as a discretized adiabatic transformation where the goal is to prepare the ground state of a given Hamiltonian $H_P$. The order $p$ of the Trotterization determines the solution precision and the circuit depth. Given an initial state $|s\rangle$, usually the uniform superposition of all elements of the computational basis, $|s\rangle = |+\rangle^{\otimes n}$, the ansatz corresponds to the sequential application of two unitaries $U_P(\gamma_l) = e^{-i\gamma_l H_P}$ and $U_M(\beta_l) = e^{-i\beta_l H_M}$. These alternating unitaries are usually known as the problem and mixer unitaries, respectively. Here $\gamma = \{\gamma_l\}_{l=1}^{L}$ and $\beta = \{\beta_l\}_{l=1}^{L}$ are vectors of variational parameters which determine how long each unitary is applied and which must be optimized to minimize the cost function $C$, defined as the expectation value

$$ C = \langle \gamma, \beta | H_P | \gamma, \beta \rangle = \mathrm{Tr}\big[H_P |\gamma, \beta\rangle\langle \gamma, \beta|\big] , \tag{23} $$

where $|\gamma, \beta\rangle = U(\gamma, \beta)|s\rangle$ is the QAOA variational state, and where $U(\gamma, \beta)$ is given by (5). In Fig. 5(a) we depict the circuit description of a QAOA ansatz for a specific Hamiltonian where $k_P = 6$.

2. Hardware Efficient Ansatz

The goal of the Hardware Efficient ansatz is to reduce the gate overhead (and hence the circuit depth) which arises when implementing a general unitary as in (3). Hence, when employing a specific quantum hardware, the parametrized gates $e^{-i\theta_{lm} H_{lm}}$ and the unparametrized gates $W_{lm}$ are taken from a gate alphabet composed of gates native to that hardware. Figure 5(b) shows an example of a Hardware Efficient ansatz where the gate alphabet is composed of rotations around the $y$ axis and of CNOTs.

3. Unitary Coupled Cluster Ansatz

This ansatz is employed to estimate the ground state energy of the molecular Hamiltonian. In second quantization, and within the Born-Oppenheimer approximation, the molecular Hamiltonian of a system of $M_e$ electrons can be expressed as

$$ H = \sum_{pq} h_{pq}\, a_p^\dagger a_q + \frac{1}{2} \sum_{pqrs} h_{pqrs}\, a_p^\dagger a_q^\dagger a_r a_s , $$

where $\{a_p^\dagger\}$ ($\{a_q\}$) are fermionic creation (annihilation) operators. Here, $h_{pq}$ and $h_{pqrs}$ respectively correspond to the so-called one- and two-electron integrals [49, 68]. The ground state energy of $H$ can be estimated with the VQE algorithm by preparing a reference state, normally taken to be the Hartree-Fock (HF) mean-field state $|\psi_0\rangle$, and acting on it with a parametrized UCC ansatz.

The action of a UCC ansatz with single ($T_1$) and double ($T_2$) excitations is given by $|\psi\rangle = \exp(T - T^\dagger)|\psi_0\rangle$, where
9

T = T1 + T2 , and where                                                Let us now present two lemmas that reflect these
                                                                    two parts of the proof. The action of the noise in
                                  ti,j a†a a†b aj ai .
         X                      X a,b
   T1 =       tai a†a ai , T2 =                             (24)    (7) on the operator Λ is to map the elements of λ as
          i∈occ                    i,j∈occ                               N          x(i) y(i) z(i)
          a∈vir                    a,b∈vir                          λi −−→ λ0i = qX qY qZ λi where x(i), y(i), and z(i)
                                                                    respectively denote the number of X, Y , and Z oper-
Here the i and j indices range over “occupied” orbitals             ators in the i-th Pauli string. Recall the definition
whereas the a and b indices range over “virtual” or-                q = max{|qX |, |qY |, |qZ |}. Since x(i) + y(i) + z(i) > 1,
bitals [49, 68]. The coefficients tai and ta,b   i,j are called     the inequality |λ0 | 6 q|λ| always holds. We use this re-
coupled cluster amplitudes. For simplicity, we denote               lationship, along with Weyl’s inequality and the unitary
these amplitudes {tai , ta,b
                         i,j } as {θlm }. Similarly, by denot-
                                                                    invariance of Schatten norms to show that for an operator
ing the excitation operators {a†a ai , a†a a†b aj ai } as {τlm },   of the form (25) we have
the UCC P  ansatz can be written in a compact form as                               W k (Λ)    ∞
                                                                                                   6 λ0 + q k λ      1
                                                                                                                           (26)
                          †
U (θ) = e lm θlm (τlm −τlm ) . In order to implement U (θ)
one maps the fermionic operators to spin operators by               where W k is a channel composed of k unitaries inter-
means of the Jordan-Wigner or the Bravyi-Kitaev trans-              leaved with noise channels of the form (7). The second
                                                           †        lemma we present is a bound on the relative entropy by
 P i i [70, 71], which allows us to write (τlm −τlm ) =
formations
                                                                    Müller-Hermes et al. [79] which states that
i i µlm σn . Then, from a first-order Trotterization we
obtain (6). Here, µilm ∈ {0, ±1}. In Fig. 5(c) we depict                     D W(ρ) 11⊗n /2n 6 q 2k D ρ 11⊗n /2n
                                                                                                                    
                                                                                                                           (27)
the circuit description of a representative component of
                                                                    where we recall that D ρ 11⊗n /2n itself is always upper
                                                                                                       
the UCC ansatz.
                                                                    bounded by n for any n-qubit quantum state ρ.
                                                                       Now that we have the main tools we present a sketch
                  B.     Proof of Theorem 1                         of the proof. In order to analyze the partial derivative of
                                                                    the cost function ∂lm C
                                                                                          e = Tr [O ∂lm ρL ] we first note that
  Here we outline the proof for our main result on Noise-           the output state ρL can be expressed as
Induced Barren Plateaus. We refer the reader to the Sup-                         ρL = (Wa ◦ Wb ) (ρ0 ) = Wa (ρ̄l ) ,       (28)
plementary Information for additional details. We note
that Lemma 1 and Remark 1 follow from similar steps                 where ρ0 is the input state and
and their proofs are also detailed in the Supplementary                                                     +
                                                                          Wa = N ◦ UL ◦ · · · ◦ Ul+1 ◦ N ◦ Ulm ,           (29)
Information. Moreover, we remark that Corollaries 1, 2,                           −
                                                                          Wb =   Ulm   ◦ N ◦ Ul−1 ◦ · · · ◦ N ◦ U1 ◦ N ,   (30)
and 3 follow in a straightforward manner from a direct
application of Theorem 1 and Remark 1.                                      ±
                                                                    where  Ulm   are channels that implement the unitaries
  Throughout our calculations we find it useful to use                −
the expansion of operators in the Pauli tensor product basis. Given an $n$-qubit Hermitian operator $\Lambda$, one can always consider the decomposition

$$\Lambda = \lambda_0 \mathbb{1}^{\otimes n} + \boldsymbol{\lambda} \cdot \boldsymbol{\sigma}_n, \tag{25}$$

where $\lambda_0 \in \mathbb{R}$ and $\boldsymbol{\lambda} \in \mathbb{R}^{4^n - 1}$. Note that here we redefine the vector of Pauli strings $\boldsymbol{\sigma}_n$ as a vector of length $4^n - 1$ which excludes $\mathbb{1}^{\otimes n}$.

Central to our proof is understanding how operators are mapped by concatenations of unitary transformations and noise channels. We do this through two lenses. First, given an operator $\Lambda$, we investigate how various $\ell_p$-norms of $\boldsymbol{\lambda}$ are related at different points in the evolution. Such quantities are well suited to study in our setting because we can use the transfer matrix formalism in the Pauli basis, that is, represent a channel $\mathcal{N}$ by the matrix $(T_{\mathcal{N}})_{ij} = \frac{1}{2^n}\operatorname{Tr}[\sigma_n^i\, \mathcal{N}(\sigma_n^j)]$. Indeed, the noise model in (7) has a diagonal Pauli transfer matrix, which motivates this choice of attack. The second quantity we use is the relative entropy $D(\rho \,\|\, \mathbb{1}^{\otimes n}/2^n)$ between a state $\rho$ and the maximally mixed state. This is also useful to study due to the strong data processing inequality in Ref. [79], which quantifies how noise maps $\rho$ closer to the maximally mixed state.

$U_{lm}^- = \prod_{s \leqslant m} e^{-i\theta_{ls} H_{ls}}$ and $U_{lm}^+ = \prod_{s > m} e^{-i\theta_{ls} H_{ls}}$, such that $U_l = U_{lm}^+ \cdot U_{lm}^-$. For simplicity of notation, here we have omitted the parameter dependence of the concatenated channels. Additionally, we have introduced the notation $\bar{\rho}_l = \mathcal{W}_b(\rho_0)$, and it is straightforward to show that

$$\partial_{lm} \bar{\rho}_l = -i[H_{lm}, \bar{\rho}_l]. \tag{31}$$

Using the tracial matrix Hölder's inequality [80], we can write

$$|\partial_{lm} \widetilde{C}| = \big|\operatorname{Tr}\big[\mathcal{W}_a^\dagger(O)\, \partial_{lm} \bar{\rho}_l\big]\big| \tag{32}$$

$$\leqslant \big\|\mathcal{W}_a^\dagger(O)\big\|_\infty \big\|\partial_{lm} \bar{\rho}_l\big\|_1, \tag{33}$$

where $\mathcal{W}_a^\dagger$ is the adjoint map of $\mathcal{W}_a$. The two factors in the product can then be bounded with the two techniques above. Using (26) we find $\|\mathcal{W}_a^\dagger(O)\|_\infty \leqslant q^{L-l+1} N_O \|\boldsymbol{\omega}\|_\infty$ for the first factor. We bound the second factor by using (31), a bound on Schatten norms of commutators [81], quantum Pinsker's inequality [82], and (27) to obtain $\|\partial_{lm} \bar{\rho}_l\|_1 \leqslant \sqrt{8 \ln 2}\, \|H_{lm}\|_\infty n^{1/2} q^{l}$.

Putting the two parts together, we obtain

$$|\partial_{lm} \widetilde{C}| \leqslant \sqrt{8 \ln 2}\, N_O \|H_{lm}\|_\infty \|\boldsymbol{\omega}\|_\infty n^{1/2} q^{L+1}, \tag{34}$$

completing the proof.
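The two tools used above, the Pauli decomposition (25) and the Pauli transfer matrix $(T_{\mathcal{N}})_{ij}$, can be illustrated numerically. The Python/NumPy sketch below is our own illustration, not the authors' code. It assumes, as a stand-in for the noise model in (7), the same single-qubit Pauli channel $\rho \mapsto \sum_P p_P P\rho P$ acting on every qubit, with made-up probabilities $(p_I, p_X, p_Y, p_Z)$; the helper names (`pauli_coefficients`, `product_pauli_channel`, `pauli_transfer_matrix`) are hypothetical. It builds the transfer matrix on $n = 2$ qubits, checks that it is diagonal, and checks that one noise layer leaves $\lambda_0$ unchanged while shrinking $\|\boldsymbol{\lambda}\|_2$.

```python
import itertools
from functools import reduce

import numpy as np

# Single-qubit Pauli matrices, keyed by label.
PAULIS = {
    "I": np.eye(2, dtype=complex),
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}


def pauli_strings(n):
    """All 4**n n-qubit Pauli strings as (label, matrix); 'I'*n comes first."""
    labels = ["".join(t) for t in itertools.product("IXYZ", repeat=n)]
    return [(lab, reduce(np.kron, [PAULIS[c] for c in lab])) for lab in labels]


def pauli_coefficients(op, n):
    """Split op = lambda_0 * identity + lambda . sigma_n, as in Eq. (25)."""
    coeffs = np.array([np.trace(s @ op).real / 2**n for _, s in pauli_strings(n)])
    return coeffs[0], coeffs[1:]  # lambda_0 and a vector of length 4**n - 1


def product_pauli_channel(probs, n):
    """Hypothetical local noise: rho -> sum_P p_P P rho P on every qubit.

    probs = (p_I, p_X, p_Y, p_Z) is an illustrative choice, not device data.
    """
    kraus = [
        reduce(np.kron, [np.sqrt(probs[k]) * PAULIS["IXYZ"[k]] for k in combo])
        for combo in itertools.product(range(4), repeat=n)
    ]
    return lambda rho: sum(k @ rho @ k.conj().T for k in kraus)


def pauli_transfer_matrix(channel, n):
    """(T_N)_ij = 2**-n Tr[sigma_i N(sigma_j)], identity string included."""
    strings = [s for _, s in pauli_strings(n)]
    return np.array(
        [[np.trace(si @ channel(sj)).real / 2**n for sj in strings] for si in strings]
    )


n = 2
probs = (0.90, 0.05, 0.03, 0.02)  # gives q_X = 0.90, q_Y = 0.86, q_Z = 0.84
channel = product_pauli_channel(probs, n)
T = pauli_transfer_matrix(channel, n)
labels = [lab for lab, _ in pauli_strings(n)]

# Random Hermitian operator, its Eq. (25) coefficients, and a round-trip check.
rng = np.random.default_rng(7)
A = rng.normal(size=(2**n, 2**n)) + 1j * rng.normal(size=(2**n, 2**n))
H = (A + A.conj().T) / 2
lam0, lam = pauli_coefficients(H, n)
rebuilt = lam0 * np.eye(2**n) + sum(c * s for c, (_, s) in zip(lam, pauli_strings(n)[1:]))

# Coefficients after one noise layer: lambda_0 is unchanged, lambda shrinks.
lam0_noisy, lam_noisy = pauli_coefficients(channel(H), n)

print("max off-diagonal |T_ij|:", np.abs(T - np.diag(np.diag(T))).max())
print("||lambda||_2 before, after:", np.linalg.norm(lam), np.linalg.norm(lam_noisy))
```

Every non-identity Pauli string has at least one non-identity factor, so each diagonal entry other than $T_{00} = 1$ has magnitude at most $\max(|q_X|, |q_Y|, |q_Z|)$, which is 0.90 with these assumed probabilities. This is the mechanism by which the $\ell_2$-norm of $\boldsymbol{\lambda}$ contracts layer by layer in this toy model.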
C. Proof of Proposition 1

Here we sketch the proof of Proposition 1, with additional details presented in the Supplementary Information.

We model measurement noise as a tensor product of independent local classical bit-flip channels, which mathematically corresponds to modifying the local POVM elements $P_0 = |0\rangle\langle 0|$ and $P_1 = |1\rangle\langle 1|$ as follows:

$$P_0 = |0\rangle\langle 0| \to \widetilde{P}_0 = \frac{1+q_M}{2}|0\rangle\langle 0| + \frac{1-q_M}{2}|1\rangle\langle 1|, \tag{35}$$

$$P_1 = |1\rangle\langle 1| \to \widetilde{P}_1 = \frac{1-q_M}{2}|0\rangle\langle 0| + \frac{1+q_M}{2}|1\rangle\langle 1|. \tag{36}$$

In turn, it follows that one can also model this measurement noise as a tensor product of local depolarizing channels with depolarizing probability $1 > (1-q_M)/2 > 0$, which we denote by $\mathcal{N}_M$. The channel is applied directly to the measurement operator, such that $\mathcal{N}_M(O) = \sum_i \omega_i \mathcal{N}_M(\sigma_n^i) = \widetilde{\boldsymbol{\omega}} \cdot \boldsymbol{\sigma}_n$. Here $\widetilde{\boldsymbol{\omega}}$ is a vector of coefficients $\widetilde{\omega}_i = q_M^{w(i)} \omega_i$, where $w(i) = x(i) + y(i) + z(i)$ is the weight of the Pauli string. Recall that we have defined $x(i)$, $y(i)$, and $z(i)$ as the number of Pauli operators $X$, $Y$, and $Z$, respectively, in the $i$-th Pauli string.

Let us first focus on the partial derivative of the cost. In the presence of measurement noise we have

$$\partial_{lm} \widetilde{C} = \frac{1}{2^n} \operatorname{Tr}\big[(\widetilde{\boldsymbol{\omega}} \cdot \boldsymbol{\sigma}_n)(\boldsymbol{g}^{(L)} \cdot \boldsymbol{\sigma}_n)\big] \tag{37}$$

$$= \widetilde{\boldsymbol{\omega}} \cdot \boldsymbol{g}^{(L)}. \tag{38}$$

This means that $|\partial_{lm} \widetilde{C}| = |\widetilde{\boldsymbol{\omega}} \cdot \boldsymbol{g}^{(L)}|$. We then examine the inner product element-wise:

$$|\widetilde{\boldsymbol{\omega}} \cdot \boldsymbol{g}^{(L)}| \leqslant \sum_i |\widetilde{\omega}_i|\, |g_i^{(L)}| \leqslant \sum_i q_M^{w(i)} |\omega_i|\, |g_i^{(L)}|. \tag{39}$$

Therefore, defining $w = \min_i w(i)$ as the minimum weight of the Pauli strings in the decomposition of $O$, we have $q_M^{w(i)} \leqslant q_M^{w}$, and hence we can replace $q_M^{w(i)}$ with $q_M^{w}$ for each term in the sum. This gives an extra locality-dependent factor in the bound on the partial derivative:

$$|\partial_{lm} \widetilde{C}| \leqslant q_M^{w} F(n). \tag{40}$$

An analogous argument leads to the following result for the concentration of the cost function:

$$\Big|\widetilde{C} - \frac{1}{2^n} \operatorname{Tr} O\Big| \leqslant q_M^{w} G(n). \tag{41}$$

D. Details of Numerical Implementations

The noise model employed in our numerical simulations was obtained by performing one- and two-qubit gate-set tomography [83, 84] on the five-qubit IBM Q Ourense superconducting qubit device. The process matrices for each gate native to the device's alphabet, as well as the state preparation and measurement noise, are described in Ref. [85, Appendix B]. In addition, the optimization for the MaxCut problems was performed using an optimizer based on the Nelder-Mead simplex method.

V. ACKNOWLEDGEMENTS

We thank Daniel Stilck França for helpful discussions and for pointing us to Ref. [79]. Research presented in this article was supported by the Laboratory Directed Research and Development program of Los Alamos National Laboratory under project number 20190065DR. SW and EF acknowledge support from the U.S. Department of Energy (DOE) through a quantum computing program sponsored by the LANL Information Science & Technology Institute. MC and AS were also supported by the Center for Nonlinear Studies at LANL. PJC also acknowledges support from the LANL ASC Beyond Moore's Law project. LC and PJC were also supported by the U.S. Department of Energy (DOE), Office of Science, Office of Advanced Scientific Computing Research, under the Quantum Computing Applications Team (QCAT) program.
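As a numerical sanity check on the measurement-noise argument in the proof of Proposition 1 (Eqs. (35) to (39)), the Python/NumPy sketch below, our own illustration rather than the authors' code, applies the bit-flip POVM elements of Eqs. (35) and (36) to a random 3-qubit state and confirms that each Pauli-string expectation value is damped by $q_M^{w(i)}$. By assumption it is restricted to observables that are diagonal in the computational basis (Z-strings only; $X$ or $Y$ terms would require basis rotations before readout, which are not modeled). The coefficients $\omega_i$, the value $q_M = 0.8$, and the helper names are hypothetical. The final quantity applies the same chain of inequalities as (39), with the ideal expectation values standing in for $\boldsymbol{g}^{(L)}$.

```python
import itertools
from functools import reduce

import numpy as np

I2 = np.eye(2)
Z = np.diag([1.0, -1.0])


def noisy_projector(bit, q_m):
    """Bit-flip-modified POVM element of Eqs. (35)-(36) for a single qubit."""
    keep, flip = (1 + q_m) / 2, (1 - q_m) / 2
    return np.diag([keep, flip]) if bit == 0 else np.diag([flip, keep])


def z_string(z_mask):
    """Pauli string with Z where z_mask[k] == 1 and identity elsewhere."""
    return reduce(np.kron, [Z if m else I2 for m in z_mask])


def noisy_readout_expectation(rho, z_mask, q_m):
    """Estimate <z_string(z_mask)> from computational-basis readout with bit flips."""
    total = 0.0
    for bits in itertools.product([0, 1], repeat=len(z_mask)):
        # Eigenvalue of the Z-string on the recorded bitstring.
        sign = (-1) ** sum(b for b, m in zip(bits, z_mask) if m)
        povm = reduce(np.kron, [noisy_projector(b, q_m) for b in bits])
        total += sign * np.trace(povm @ rho).real
    return total


n, q_m = 3, 0.8
rng = np.random.default_rng(1)
A = rng.normal(size=(2**n, 2**n)) + 1j * rng.normal(size=(2**n, 2**n))
rho = A @ A.conj().T
rho /= np.trace(rho).real  # random full-rank density matrix

# Hypothetical diagonal observable O = sum_i omega_i sigma_i (Z-strings only).
omega = {(1, 1, 0): 0.5, (1, 1, 1): -0.3, (0, 1, 1): 0.2}
w_min = min(sum(mask) for mask in omega)  # w = min_i w(i), equal to 2 here

ideal = {mask: np.trace(z_string(mask) @ rho).real for mask in omega}
noisy = {mask: noisy_readout_expectation(rho, mask, q_m) for mask in omega}
noisy_cost = sum(c * noisy[mask] for mask, c in omega.items())
bound = q_m**w_min * sum(abs(c) * abs(ideal[mask]) for mask, c in omega.items())

for mask in omega:
    print(mask, noisy[mask], q_m ** sum(mask) * ideal[mask])
print("|noisy cost| =", abs(noisy_cost), "<= bound =", bound)
```

The damping follows because, on a qubit carrying a $Z$, the readout combines the POVM elements as $\widetilde{P}_0 - \widetilde{P}_1 = q_M Z$, while on the other qubits $\widetilde{P}_0 + \widetilde{P}_1 = \mathbb{1}$. For $0 \leqslant q_M < 1$ the bound therefore tightens as the minimum weight $w$ of $O$ grows, which is the locality dependence appearing in (40) and (41).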

[1] J. Preskill, “Quantum computing in the NISQ era and beyond,” Quantum 2, 79 (2018).
[2] M. Cerezo, Andrew Arrasmith, Ryan Babbush, Simon C Benjamin, Suguru Endo, Keisuke Fujii, Jarrod R McClean, Kosuke Mitarai, Xiao Yuan, Lukasz Cincio, and Patrick J Coles, “Variational quantum algorithms,” arXiv preprint arXiv:2012.09265 (2020).
[3] Suguru Endo, Zhenyu Cai, Simon C Benjamin, and Xiao Yuan, “Hybrid quantum-classical algorithms and quantum error mitigation,” arXiv preprint arXiv:2011.01382 (2020).
[4] Kishor Bharti, Alba Cervera-Lierta, Thi Ha Kyaw, Tobias Haug, Sumner Alperin-Lea, Abhinav Anand, Matthias Degroote, Hermanni Heimonen, Jakob S. Kottmann, Tim Menke, Wai-Keong Mok, Sukin Sim, Leong-Chuan Kwek, and Alán Aspuru-Guzik, “Noisy intermediate-scale quantum (NISQ) algorithms,” arXiv preprint arXiv:2101.08448 (2021).
[5] A. Peruzzo, J. McClean, P. Shadbolt, M.-H. Yung, X.-Q. Zhou, P. J. Love, A. Aspuru-Guzik, and J. L. O’Brien, “A variational eigenvalue solver on a photonic quantum processor,” Nature Communications 5, 4213 (2014).
[6] Jarrod R McClean, Jonathan Romero, Ryan Babbush, and Alán Aspuru-Guzik, “The theory of variational hybrid quantum-classical algorithms,” New Journal of Physics 18, 023023 (2016).
[7] Bela Bauer, Dave Wecker, Andrew J Millis, Matthew B Hastings, and Matthias Troyer, “Hybrid quantum-classical approach to correlated materials,” Physical Review X 6, 031045 (2016).
[8] Tyson Jones, Suguru Endo, Sam McArdle, Xiao Yuan, and Simon C Benjamin, “Variational quantum algorithms for discovering Hamiltonian spectra,” Physical Review A 99, 062304 (2019).
[9] Ying Li and Simon C Benjamin, “Efficient variational quantum simulator incorporating active error minimization,” Physical Review X 7, 021050 (2017).
[10] Cristina Cirstoiu, Zoe Holmes, Joseph Iosue, Lukasz Cincio, Patrick J Coles, and Andrew Sornborger, “Variational fast forwarding for quantum simulation beyond the coherence time,” npj Quantum Information 6, 1–10 (2020).
[11] K. Heya, K. M. Nakanishi, K. Mitarai, and K. Fujii, “Subspace variational quantum simulator,” arXiv:1904.08566 [quant-ph].
[12] Xiao Yuan, Suguru Endo, Qi Zhao, Ying Li, and Simon C Benjamin, “Theory of variational quantum simulation,” Quantum 3, 191 (2019).
[13] E. Farhi, J. Goldstone, and S. Gutmann, “A quantum approximate optimization algorithm,” arXiv:1411.4028 [quant-ph].
[14] Z. Wang, S. Hadfield, Z. Jiang, and E. G. Rieffel, “Quantum approximate optimization algorithm for MaxCut: A fermionic view,” Phys. Rev. A 97, 022304 (2018).
[15] G. E. Crooks, “Performance of the quantum approximate optimization algorithm on the maximum cut problem,” arXiv:1811.08419 [quant-ph].
[16] Stuart Hadfield, Zhihui Wang, Bryan O’Gorman, Eleanor G Rieffel, Davide Venturelli, and Rupak Biswas, “From the quantum approximate optimization algorithm to a quantum alternating operator ansatz,” Algorithms 12, 34 (2019).
[17] Carlos Bravo-Prieto, Ryan LaRose, M. Cerezo, Yigit Subasi, Lukasz Cincio, and Patrick J. Coles, “Variational quantum linear solver: A hybrid algorithm for linear systems,” arXiv:1909.05820 (2019).
[18] X. Xu, J. Sun, S. Endo, Y. Li, S. C. Benjamin, and X. Yuan, “Variational algorithms for linear algebra,” arXiv:1909.03898 [quant-ph].
[19] Bálint Koczor, Suguru Endo, Tyson Jones, Yuichiro Matsuzaki, and Simon C Benjamin, “Variational-state quantum metrology,” New Journal of Physics (2020).
[20] Johannes Jakob Meyer, Johannes Borregaard, and Jens Eisert, “A variational toolbox for quantum multi-parameter estimation,” arXiv preprint arXiv:2006.06303 (2020).
[21] Eric Anschuetz, Jonathan Olson, Alán Aspuru-Guzik, and Yudong Cao, “Variational quantum factoring,” in Quantum Technology and Optimization Problems (Springer International Publishing, Cham, 2019) pp. 74–85.
[22] S. Khatri, R. LaRose, A. Poremba, L. Cincio, A. T. Sornborger, and P. J. Coles, “Quantum-assisted quantum compiling,” Quantum 3, 140 (2019).
[23] Kunal Sharma, Sumeet Khatri, Marco Cerezo, and Patrick Coles, “Noise resilience of variational quantum compiling,” New Journal of Physics (2020).
[24] T. Jones and S. C Benjamin, “Quantum compilation and circuit optimisation via energy dissipation,” arXiv:1811.03147 [quant-ph].
[25] A. Arrasmith, L. Cincio, A. T. Sornborger, W. H. Zurek, and P. J. Coles, “Variational consistent histories as a hybrid algorithm for quantum foundations,” Nature Communications 10, 3438 (2019).
[26] Marco Cerezo, Alexander Poremba, Lukasz Cincio, and Patrick J Coles, “Variational quantum fidelity estimation,” Quantum 4, 248 (2020).
[27] M Cerezo, Kunal Sharma, Andrew Arrasmith, and Patrick J Coles, “Variational quantum state eigensolver,” arXiv preprint arXiv:2004.01372 (2020).
[28] Ryan LaRose, Arkin Tikku, Étude O’Neel-Judy, Lukasz Cincio, and Patrick J Coles, “Variational quantum state diagonalization,” npj Quantum Information 5, 1–10 (2019).
[29] Guillaume Verdon, Jacob Marks, Sasha Nanda, Stefan Leichenauer, and Jack Hidary, “Quantum Hamiltonian-based models and the variational quantum thermalizer algorithm,” arXiv preprint arXiv:1910.02071 (2019).
[30] Peter D Johnson, Jonathan Romero, Jonathan Olson, Yudong Cao, and Alán Aspuru-Guzik, “QVECTOR: an algorithm for device-tailored quantum error correction,” arXiv:1711.02249 (2017).
[31] Jarrod R McClean, Sergio Boixo, Vadim N Smelyanskiy, Ryan Babbush, and Hartmut Neven, “Barren plateaus in quantum neural network training landscapes,” Nature Communications 9, 4812 (2018).
[32] Zoe Holmes, Kunal Sharma, M. Cerezo, and Patrick J Coles, “Connecting ansatz expressibility to gradient magnitudes and barren plateaus,” arXiv preprint arXiv:2101.02138 (2021).
[33] Kunal Sharma, M Cerezo, Lukasz Cincio, and Patrick J Coles, “Trainability of dissipative perceptron-based quantum neural networks,” arXiv preprint arXiv:2005.12458 (2020).
[34] M Cerezo, Akira Sone, Tyler Volkoff, Lukasz Cincio, and Patrick J Coles, “Cost-function-dependent barren plateaus in shallow quantum neural networks,” arXiv preprint arXiv:2001.00550 (2020).
[35] Carlos Ortiz Marrero, Mária Kieferová, and Nathan Wiebe, “Entanglement induced barren plateaus,” arXiv preprint arXiv:2010.15968 (2020).
[36] Taylor L Patti, Khadijeh Najafi, Xun Gao, and Susanne F Yelin, “Entanglement devised barren plateau mitigation,” arXiv preprint arXiv:2012.12658 (2020).
[37] Tyler Volkoff and Patrick J Coles, “Large gradients via correlation in random parameterized quantum circuits,” Quantum Science and Technology (2021).
[38] M. Cerezo and Patrick J Coles, “Impact of barren plateaus on the Hessian and higher order derivatives,” arXiv preprint arXiv:2008.07454 (2020).
[39] Andrew Arrasmith, M. Cerezo, Piotr Czarnik, Lukasz Cincio, and Patrick J Coles, “Effect of barren plateaus on gradient-free optimization,” arXiv preprint arXiv:2011.12245 (2020).
[40] Alexey Uvarov and Jacob Biamonte, “On barren plateaus and cost function locality in variational quantum algorithms,” arXiv preprint arXiv:2011.10530 (2020).
[41] Guillaume Verdon, Michael Broughton, Jarrod R McClean, Kevin J Sung, Ryan Babbush, Zhang Jiang, Hartmut Neven, and Masoud Mohseni, “Learning to learn with quantum neural networks via classical neural networks,” arXiv preprint arXiv:1907.05415 (2019).
[42] Edward Grant, Leonard Wossnig, Mateusz Ostaszewski, and Marcello Benedetti, “An initialization strategy for addressing barren plateaus in parametrized quantum circuits,” Quantum 3, 214 (2019).
[43] Andrea Skolik, Jarrod R McClean, Masoud Mohseni, Patrick van der Smagt, and Martin Leib, “Layerwise learning for quantum neural networks,” arXiv preprint arXiv:2006.14904 (2020).
[44] Cheng Xue, Zhao-Yun Chen, Yu-Chun Wu, and Guo-Ping Guo, “Effects of quantum noise on quantum approximate optimization algorithm,” arXiv preprint arXiv:1909.02196 (2019).
[45] Jeffrey Marshall, Filip Wudarski, Stuart Hadfield, and Tad Hogg, “Characterizing local noise in QAOA circuits,” IOP SciNotes 1, 025208 (2020).
[46] Laura Gentini, Alessandro Cuccoli, Stefano Pirandola, Paola Verrucchi, and Leonardo Banchi, “Noise-resilient variational hybrid quantum-classical optimization,” Physical Review A 102, 052414 (2020).
[47] Jonas M Kübler, Andrew Arrasmith, Lukasz Cincio, and Patrick J Coles, “An adaptive optimizer for measurement-frugal variational algorithms,” Quantum 4, 263 (2020).
[48] Andrew Arrasmith, Lukasz Cincio, Rolando D Somma, and Patrick J Coles, “Operator sampling for shot-frugal optimization in variational algorithms,” arXiv preprint arXiv:2004.06252 (2020).
[49] Yudong Cao, Jonathan Romero, Jonathan P Olson, Matthias Degroote, Peter D Johnson, Mária Kieferová, Ian D Kivlichan, Tim Menke, Borja Peropadre, Nicolas PD Sawaya, et al., “Quantum chemistry in the age of quantum computing,” Chemical Reviews 119, 10856–10915 (2019).
[50] Rodney J Bartlett and Monika Musiał, “Coupled-cluster theory in quantum chemistry,” Reviews of Modern Physics 79, 291 (2007).
[51] Joonho Lee, William J Huggins, Martin Head-Gordon, and K Birgitta Whaley, “Generalized unitary coupled cluster wave functions for quantum computation,” Journal of Chemical Theory and Computation 15, 311–324 (2018).
[52] A. Kandala, A. Mezzacapo, K. Temme, M. Takita, M. Brink, J. M. Chow, and J. M. Gambetta, “Hardware-efficient variational quantum eigensolver for small molecules and quantum magnets,” Nature 549, 242 (2017).
[53] Frank Arute et al., “Hartree-Fock on a superconducting qubit quantum computer,” Science 369, 1084–1089 (2020).
[54] Frank Arute et al., “Quantum approximate optimization of non-planar graph problems on a planar superconducting processor,” arXiv preprint arXiv:2004.04197 (2020).
[55] Dave Wecker, Matthew B Hastings, and Matthias Troyer, “Progress towards practical quantum variational algorithms,” Physical Review A 92, 042303 (2015).
[56] Roeland Wiersema, Cunlu Zhou, Yvette de Sereville, Juan Felipe Carrasquilla, Yong Baek Kim, and Henry Yuen, “Exploring entanglement and optimization within the Hamiltonian variational ansatz,” PRX Quantum 1, 020319 (2020).
[57] Maria Schuld, Ilya Sinayskiy, and Francesco Petruccione, “The quest for a quantum neural network,” Quantum Information Processing 13, 2567–2586 (2014).
[58] Maria Schuld, Ilya Sinayskiy, and Francesco Petruccione, “An introduction to quantum machine learning,” Contemporary Physics 56, 172–185 (2015).
[59] Jacob Biamonte, Peter Wittek, Nicola Pancotti, Patrick Rebentrost, Nathan Wiebe, and Seth Lloyd, “Quantum machine learning,” Nature 549, 195–202 (2017).
[60] Kerstin Beer, Dmytro Bondarenko, Terry Farrelly, Tobias J Osborne, Robert Salzmann, Daniel Scheiermann, and Ramona Wolf, “Training deep quantum neural networks,” Nature Communications 11, 1–6 (2020).
[61] Amira Abbas, David Sutter, Christa Zoufal, Aurélien Lucchi, Alessio Figalli, and Stefan Woerner, “The power of quantum neural networks,” arXiv preprint arXiv:2011.00027 (2020).
[62] Bryan O’Gorman, William J Huggins, Eleanor G Rieffel, and K Birgitta Whaley, “Generalized swap networks for near-term quantum computing,” arXiv preprint arXiv:1905.05118 (2019).
[63] Sergey Bravyi, Alexander Kliesch, Robert Koenig, and Eugene Tang, “Obstacles to state preparation and variational optimization from symmetry protection,” arXiv preprint arXiv:1910.08980 (2019).
[64] Zhihui Wang, Stuart Hadfield, Zhang Jiang, and Eleanor G. Rieffel, “Quantum approximate optimization algorithm for MaxCut: A fermionic view,” Phys. Rev. A 97, 022304 (2018).
[65] Matthew B Hastings, “Classical and quantum bounded depth approximation algorithms,” arXiv preprint arXiv:1905.07047 (2019).
[66] Zhang Jiang, Eleanor G Rieffel, and Zhihui Wang, “Near-optimal quantum circuit for Grover’s unstructured search using a transverse field,” Physical Review A 95, 062317 (2017).
[67] V. Akshay, H. Philathong, M. E. S. Morales, and J. D. Biamonte, “Reachability deficits in quantum approximate optimization,” Phys. Rev. Lett. 124, 090504 (2020).
[68] Sam McArdle, Suguru Endo, Alan Aspuru-Guzik, Simon C Benjamin, and Xiao Yuan, “Quantum computational chemistry,” Reviews of Modern Physics 92, 015003 (2020).
[69] Jonathan Romero, Ryan Babbush, Jarrod R McClean, Cornelius Hempel, Peter J Love, and Alán Aspuru-Guzik, “Strategies for quantum computing molecular energies using the unitary coupled cluster ansatz,” Quantum Science and Technology 4, 014008 (2018).
[70] Gerardo Ortiz, James E Gubernatis, Emanuel Knill, and Raymond Laflamme, “Quantum algorithms for fermionic simulations,” Physical Review A 64, 022319 (2001).
[71] Sergey B Bravyi and Alexei Yu Kitaev, “Fermionic quantum computation,” Annals of Physics 298, 210–226 (2002).
[72] Marcel Nooijen, “Can the eigenstates of a many-body Hamiltonian be represented exactly using a general two-body cluster expansion?” Physical Review Letters 84, 2108 (2000).
[73] Wen Wei Ho and Timothy H Hsieh, “Efficient variational simulation of non-trivial quantum states,” SciPost Phys 6, 029 (2019).
[74] Chris Cade, Lana Mineh, Ashley Montanaro, and Stasja Stanisic, “Strategies for solving the Fermi-Hubbard model on near-term quantum computers,” Physical Review B 102, 235122 (2020).