Ensemble Riemannian Data Assimilation over the Wasserstein Space

Sagar K. Tamang¹,², Ardeshir Ebtehaj¹,², Peter J. Van Leeuwen³, Dongmian Zou⁴, and Gilad Lerman⁵

¹ Saint Anthony Falls Laboratory, University of Minnesota-Twin Cities
² Department of Civil, Environmental and Geo-Engineering, University of Minnesota-Twin Cities
³ Department of Atmospheric Science, Colorado State University
⁴ Duke Kunshan University
⁵ School of Mathematics, University of Minnesota-Twin Cities

arXiv:2009.03443v1 [stat.ME] 7 Sep 2020

Abstract

In this paper, we present a new ensemble data assimilation paradigm over a Riemannian manifold equipped with the Wasserstein metric. Unlike the Eulerian penalization of error in the Euclidean space, the Wasserstein metric can capture the translation and shape differences between square-integrable probability distributions of the background state and observations, enabling formal penalization of geophysical biases in a non-Gaussian state-space. The new approach is applied to dissipative and chaotic evolutionary dynamics with a wide range of applications in Earth system models. Its advantages over classic variational and particle filter techniques are documented under systematic errors and non-Gaussian state-spaces.

1    Introduction

Extending the forecast skill of Earth System Models (ESM) relies on ad-
vancing the science of data assimilation (DA). A large body of current DA
methodologies, either filtering (Evensen, 1994, 2003; Kalman, 1960; Reichle
et al., 2002a,b) or variational approaches (Andersson et al., 1994; Ebtehaj and
Foufoula-Georgiou, 2013; Le Dimet and Talagrand, 1986; Lorenc, 1986; Park
and Županski, 2003; Reichle et al., 2001), are derived from basic principles
of Bayesian inference under the assumption that the model and observation
errors are drawn from zero-mean Gaussian distributions. It is well docu-
mented that these assumptions often limit the forecast skills of DA systems
(Chen et al., 2019a; Dee, 2005; Ebtehaj et al., 2014; Walker et al., 2001) as
natural processes are not necessarily Gaussian (Bocquet et al., 2010; Pires
et al., 2010) and systematic errors or biases exist in ESMs and observations
– chiefly due to under-representation of the underlying physics and errors in
instruments as well as retrieval algorithms (Dee, 2003).
    Apart from particle filters (Van Leeuwen, 2010), which are intrinsically de-
signed to deal with non-Gaussian state-space, numerous modifications to the
variational DA (VDA) and ensemble filtering techniques have been made to
tackle non-Gaussianity (Anderson, 2010; Han and Li , 2008; Mandel and Bee-
zley, 2009; Pires et al., 1996). In four-dimensional VDA, where non-Gaussian
state-space may lead to local minima, a quasi-static VDA is proposed to en-
sure convergence to a global minimum by gradually increasing the assimila-
tion intervals (Pires et al., 1996). For ensemble-based filters, Anderson (2010)
proposes a new approach to account for non-Gaussian priors and posteriors

by utilizing rank histograms (Anderson, 1996; Hamill , 2001). A hybrid en-
semble Kalman-particle filter is also developed (Mandel and Beezley, 2009) to
utilize both the ensemble Kalman filters (EnKF; Evensen, 1994) and particle
filters to cope with non-Gaussian priors in high-dimensional geophysical DA
systems.
  In recent years, significant progress has been made to treat systematic
errors through numerous ad-hoc methods such as field alignment techniques
(Ravela et al., 2007) and morphing EnKF (Beezley and Mandel, 2008) that
can tackle position errors between observations and forecast. Dual state-
parameter EnKF (Moradkhani et al., 2005) is also developed to resolve sys-
tematic errors originating from parameter uncertainties. Bias aware variants
of the Kalman filter are designed (De Lannoy et al., 2007a,b; Drécourt et al.,
2006; Kollat et al., 2008) to simultaneously update the state-space and an a
priori estimate of the additive biases. In parallel, the cumulative distribu-
tion function matching (Reichle and Koster, 2004) has garnered widespread
attention in land DA to alleviate systematic biases in remote sensing obser-
vations.
  Despite significant progress in developing versatile DA systems, many
existing techniques are ad-hoc and cannot formalize non-Gaussianity and
systematic errors rigorously. Even though particle filters can handle non-
Gaussian likelihood functions, when observations lie away from the support
set of the particles, their ensemble variance collapses over time and this de-
generates the filters (Poterjoy and Anderson, 2016). In this paper, we aim
to address the shortcomings of current Bayesian DA methodologies by devel-

oping a new ensemble methodology, which borrows ideas from Riemannian
geometry and optimal mass transport, that can effectively handle a non-
Gaussian state-space and geophysical biases.
  In order to emphasize the advantage of this Riemannian perspective over
common Bayesian methods for DA, we refer to the following technical prop-
erty: The set of equal-mean Gaussian distributions, which are common in
Bayesian modeling, spans an affine Euclidean subspace (Amari, 2012). How-
ever, sufficiently smooth non-Gaussian distributions with finite second-order
moments constitute a Riemannian statistical manifold (Lauritzen, 1987). It
is equipped with a geodesic distance, which assigns to any two non-Gaussian
probability distributions the shortest curve on the Riemannian manifold they
belong to. This distance thus encapsulates the dissimilarity between the mean
values as well as all higher-order moments of the two distributions (Pennec,
2006).
  More specifically, the geodesic distance on the Riemannian manifold of
square-integrable probability distributions is the Wasserstein distance (Villani,
2003). It stems from the theory of optimal mass transport (Kantorovich,
1942; Monge, 1781). Unlike the Euclidean distance, which is “Eulerian” in the
sense that it becomes insensitive to the magnitude of translation between two
disjoint densities, the Wasserstein distance acts as a “Lagrangian” metric and
penalizes the mismatch based on the difference between their central positions
as well as their shapes (Chen et al., 2019b; Ning et al., 2014). Wasserstein
space representations have been established for the following physical mod-
els: the Fokker-Planck equation (Jordan et al., 1998), the dissipative porous

media equation (Otto, 2001) and the semi-geostrophic approximation of the
compressible Navier-Stokes equations (Cullen et al., 1991).
  Several attempts have been made to utilize the Wasserstein metric in geo-
physical DA. Ning et al. (2014) used the Wasserstein distance to reduce fore-
cast uncertainty due to errors in parameter estimation in dissipative evolu-
tionary equations. Feyeux et al. (2018) suggested replacing the Euclidean
distance in VDA with the Wasserstein distance to penalize the position error
between state and observations. Tamang et al. (2020) introduced a Wasser-
stein regularization of the VDA cost function with particular emphasis on
bias correction under chaotic dynamics.
All of the aforementioned techniques stem from the variational problem formulation and therefore cannot rigorously tackle a non-Gaussian state-space. Furthermore, the high computational cost associated with the Wasserstein distance inhibits the application of these methods in high-dimensional geophysical systems. In this work, we attempt to answer some of the key questions that remain to be explored: How can a Riemannian interpretation of DA using the Wasserstein distance formally account for a non-Gaussian state-space? Why and how can Riemannian DA correct for systematic errors? Is there a way for ensemble implementation in high-dimensional problems, given that the computation of the Wasserstein distance is costly?
We present a novel ensemble DA methodology, namely Ensemble Riemannian DA (EnRDA). As explained, the presented methodology extends the DA
formalism over a Riemannian manifold to account for non-Gaussian state-

space under systematic geophysical biases. It is important to note that since EnRDA operates in the probability domain, it can lead to an estimate of the entire forecast probability distribution. The initial results of the EnRDA are demonstrated for the dissipative advection-diffusion equation and the chaotic Lorenz-63 system, with a wide range of prospective applications in the forecast of mass, momentum, and heat transport in Earth system models.
    The organization of the paper is as follows: Section 2 provides a brief
background on Bayesian DA formulations and optimal mass transport. The
mathematical formalism of the EnRDA is described in Section 3. Section 4
presents the results and compares them with their Euclidean counterparts.
Section 5 discusses the findings and ideas for future research.

2     Background

2.1    Notations

Throughout, small and capital boldface letters represent $m$-element column vectors $\mathbf{x} \in \mathbb{R}^m$ and $m$-by-$n$ matrices $\mathbf{X} \in \mathbb{R}^{m \times n}$, respectively, whereas $\mathbb{R}^m_+$ ($\mathbb{R}^{m \times n}_+$) denotes that vectors (matrices) only contain non-negative real numbers. The $\mathbf{1}_m$ denotes an $m$-element vector of ones, $\mathbf{I}_m$ is an $m \times m$ identity matrix, $\delta(x - y)$ represents the Kronecker delta function with value 1 if $x = y$ and 0 otherwise, $\mathrm{diag}(\mathbf{x}) \in \mathbb{R}^{m \times m}$ represents a diagonal matrix with entries given by $\mathbf{x} \in \mathbb{R}^m$, and $\mathbb{E}_X(\mathbf{x})$ is the expectation of $\mathbf{x}$ under the probability density function $p(\mathbf{x})$. The notation $\mathbf{x} \sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ denotes that the random vector $\mathbf{x}$ is drawn from a Gaussian distribution with mean $\boldsymbol{\mu}$ and covariance $\boldsymbol{\Sigma}$. The square of the weighted $\ell_2$-norm of $\mathbf{x}$ is represented as $\|\mathbf{x}\|^2_{\mathbf{B}^{-1}} = \mathbf{x}^T\mathbf{B}^{-1}\mathbf{x}$, where $\mathbf{B}$ is a positive definite matrix and $(\cdot)^T$ is the transposition operator. The notations $\mathbf{x} \odot \mathbf{y}$ and $\mathbf{x} \oslash \mathbf{y}$ represent the element-wise Hadamard product and division between equal-length vectors $\mathbf{x}$ and $\mathbf{y}$, respectively; $\langle \mathbf{A}, \mathbf{B} \rangle = \mathrm{tr}(\mathbf{A}^T\mathbf{B})$ denotes the Frobenius inner product between matrices $\mathbf{A}$ and $\mathbf{B}$; and $\mathrm{tr}(\cdot)$ and $\det(\cdot)$ represent the trace and determinant of a square matrix, respectively.

2.2     Data Assimilation on Euclidean Space

In this section, we provide a brief review of the derivation of classic variational DA and particle filters based on Bayes' theorem, to set the stage for the presented Ensemble Riemannian DA formalism.

2.2.1   Variational Formulation

Let us consider a discrete-time Markovian state-space and its observations as follows:
$$\begin{aligned} \mathbf{x}_t &= \mathcal{M}(\mathbf{x}_{t-1}) + \boldsymbol{\omega}_t, \qquad & \boldsymbol{\omega}_t &\sim \mathcal{N}(\mathbf{0}, \mathbf{B}) \\ \mathbf{y}_t &= \mathcal{H}(\mathbf{x}_t) + \mathbf{v}_t, \qquad & \mathbf{v}_t &\sim \mathcal{N}(\mathbf{0}, \mathbf{R}), \end{aligned} \quad (1)$$
where $\mathbf{x}_t \in \mathbb{R}^m$ and $\mathbf{y}_t \in \mathbb{R}^n$ represent the state variables and the observations at time $t$; $\mathcal{M}: \mathbb{R}^m \rightarrow \mathbb{R}^m$ and $\mathcal{H}: \mathbb{R}^m \rightarrow \mathbb{R}^n$ are the deterministic forward model and observation operators; and $\boldsymbol{\omega}_t \in \mathbb{R}^m$ and $\mathbf{v}_t \in \mathbb{R}^n$ are the independent and identically distributed model and observation errors, respectively.
Recalling Bayes' theorem and dropping the time subscript without loss of generality, the posterior probability density function (pdf) of the state given the observation can be obtained as $p(\mathbf{x}|\mathbf{y}) = p(\mathbf{y}|\mathbf{x})\, p(\mathbf{x}) / p(\mathbf{y})$, where $p(\mathbf{y}|\mathbf{x})$ is proportional to the likelihood function, $p(\mathbf{x})$ is the prior density, and $p(\mathbf{y})$ denotes the distribution of observations. As is evident, in the log-space, the theorem can be expressed as follows:
$$-\log p(\mathbf{x}|\mathbf{y}) = -\log p(\mathbf{x}) - \log p(\mathbf{y}|\mathbf{x}) + \log p(\mathbf{y}). \quad (2)$$

Letting $\mathbf{x}^b = \mathbb{E}_X(\mathbf{x}) \in \mathbb{R}^m$ represent the background state, ignoring the additive constant terms including $\log p(\mathbf{y})$, and assuming Gaussian errors, Equation 2 can be expanded into the well-known three-dimensional VDA (3D-Var) cost function (Lorenc, 1986) as follows:
$$-\log p(\mathbf{x}|\mathbf{y}) \propto \frac{1}{2}(\mathbf{x} - \mathbf{x}^b)^T\mathbf{B}^{-1}(\mathbf{x} - \mathbf{x}^b) + \frac{1}{2}(\mathbf{y} - \mathcal{H}(\mathbf{x}))^T\mathbf{R}^{-1}(\mathbf{y} - \mathcal{H}(\mathbf{x})) \propto \|\mathbf{x} - \mathbf{x}^b\|^2_{\mathbf{B}^{-1}} + \|\mathbf{y} - \mathcal{H}(\mathbf{x})\|^2_{\mathbf{R}^{-1}}. \quad (3)$$
As a result, the analysis state obtained by minimization of the 3D-Var cost
function in Equation 3 is the mode of the posterior distribution that coincides
with the posterior mean when errors are drawn from unbiased Gaussian densi-
ties and H is a linear operator. Using the Woodbury matrix inversion lemma
(Woodbury and Woodbury, 1950), it can be easily demonstrated that for a
linear observation operator, the analysis states in the 3D-Var and Kalman
filter are equivalent. As is evident, the zero-mean Gaussian assumption leads to penalization of the error through the weighted Euclidean norm.
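As an illustration of this Euclidean penalization, the following minimal sketch (ours, not part of the original formulation) minimizes the 3D-Var cost function in Equation 3 for a hypothetical linear-Gaussian toy problem; all numerical values are illustrative assumptions:

    import numpy as np
    from scipy.optimize import minimize

    # Hypothetical toy setup: a 3-element state observed directly (H = I).
    m = 3
    x_b = np.array([1.0, 2.0, 0.5])          # background state
    H = np.eye(m)                            # linear observation operator
    y = np.array([1.4, 1.7, 0.9])            # observations
    B_inv = np.linalg.inv(0.5 * np.eye(m))   # inverse background error covariance
    R_inv = np.linalg.inv(0.2 * np.eye(m))   # inverse observation error covariance

    def j_3dvar(x):
        # Weighted Euclidean penalties of Equation 3
        dx, dy = x - x_b, y - H @ x
        return dx @ B_inv @ dx + dy @ R_inv @ dy

    x_a = minimize(j_3dvar, x_b).x           # analysis state (posterior mode)

For this linear case, the minimizer coincides with the Kalman filter analysis, consistent with the equivalence noted above.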

2.2.2   Particle Filters

Particle filters (Gordon et al., 1993; Van Leeuwen, 2010; Van Leeuwen et al.,
2019) in DA were introduced to address the issue of non-Gaussian state-
space by representing the prior and posterior distributions through a weighted
ensemble of model outputs referred to as the “particles”. In its standard
discrete setting, using Monte Carlo simulations, the state-space prior density $p(\mathbf{x})$ is represented by a sum of equal-weight Kronecker delta functions as:
$$p(\mathbf{x}) = \frac{1}{N}\sum_{i=1}^{N} \delta(\mathbf{x} - \mathbf{x}^i), \quad (4)$$

where xi ∈ Rm is the state variable represented by the ith particle and N is
the total number of particles.
Each of these $N$ particles is then evolved through the nonlinear model in Equation 1. Let us assume that at every assimilation cycle, when observations become available, the probability of the observations $\mathbf{y} \in \mathbb{R}^n$ given the state variable $\mathbf{x}^i$ follows a Gaussian distribution as:
$$p(\mathbf{y}|\mathbf{x}^i) = \frac{1}{(2\pi)^{n/2}\det(\mathbf{R})^{1/2}}\exp\left(-\frac{1}{2}\left(\mathbf{y} - \mathcal{H}(\mathbf{x}^i)\right)^T\mathbf{R}^{-1}\left(\mathbf{y} - \mathcal{H}(\mathbf{x}^i)\right)\right) \quad \forall i. \quad (5)$$
Using Bayes' theorem, under the Gaussian error assumption, it can be easily shown that the posterior distribution $p(\mathbf{x}|\mathbf{y})$ can be approximated using a set of weighted particles as:
$$p(\mathbf{x}|\mathbf{y}) = \sum_{i=1}^{N} w_i\, \delta(\mathbf{x} - \mathbf{x}^i), \quad (6)$$

where $\{w_i\}_{i=1}^{N} \in \mathbb{R}$ represents the weight of the $i$th particle:
$$w_i = \frac{p(\mathbf{y}|\mathbf{x}^i)}{\sum_{j=1}^{N} p(\mathbf{y}|\mathbf{x}^j)}. \quad (7)$$

The particles are then resampled from the posterior distribution in Equation 6 based on their relative weights and are propagated forward in time according to the model dynamics. As is evident, in particle filters, the weight of each particle is updated using the Gaussian likelihood function under a zero-mean error assumption. However, in the presence of systematic biases, when the support sets of the particles and the observation are disjoint, only the weights of a few particles become significantly large and the weights of all other particles tend to zero. As the underlying dynamical system progresses in time, only those few particles, with relatively larger weights, are resampled, which gradually makes the filter degenerate in time (Poterjoy and Anderson, 2016). It is to be noted that the problem of filter degeneracy exists even when the ensemble size is low, and an exponential growth in the number of particles is required to maintain a constant effective particle size (Doucet and Johansen, 2009).
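The weight update and resampling steps in Equations 5–7 translate into a few lines of code. The sketch below is our own illustration with synthetic numbers; it also prints the effective particle size $1/\sum_i w_i^2$, which collapses when the observation falls outside the particle support:

    import numpy as np

    def pf_update(particles, y, H, R_inv, rng):
        # Gaussian log-likelihood of each particle (Equation 5, up to a constant)
        innov = y - particles @ H.T
        logw = -0.5 * np.einsum('ij,jk,ik->i', innov, R_inv, innov)
        w = np.exp(logw - logw.max())
        w /= w.sum()                          # normalized weights (Equation 7)
        # Resample from the weighted posterior (Equation 6)
        idx = rng.choice(len(particles), size=len(particles), p=w)
        return particles[idx], w

    rng = np.random.default_rng(0)
    particles = rng.normal(size=(100, 2))     # hypothetical 2-D prior ensemble
    y = np.array([5.0, 5.0])                  # observation far from the particles
    H, R_inv = np.eye(2), np.linalg.inv(0.5 * np.eye(2))
    resampled, w = pf_update(particles, y, H, R_inv, rng)
    print(1.0 / np.sum(w**2))                 # effective particle size collapses

Running this toy example with an observation far outside the particle cloud drives the effective particle size toward one, illustrating the degeneracy discussed above.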

2.3   Optimal Mass Transport

The theory of optimal mass transport (OMT), first posed by Gaspard Monge (Monge, 1781) and later extended by Kantorovich (Kantorovich, 1942), was developed to minimize transportation cost in resource allocation problems with purely practical motivations. Later on, mathematicians discovered that OMT provides a rich ground to compare and morph probability distributions, and uncovered new connections to partial differential equations (Jordan et al., 1998; Otto, 2001) as well as functional analysis (Benamou and Brenier, 2000; Brenier, 1987; Villani, 2003).
In a discrete setting, for two probability histograms $\{\mathbf{p}_{x^b} \in \mathbb{R}^m_+ : \sum_i p_{x^b_i} = 1\}$ and $\{\mathbf{p}_y \in \mathbb{R}^m_+ : \sum_j p_{y_j} = 1\}$ with their respective measures supported on $x^b_i$ and $y_j$, a "ground" transportation cost matrix $\mathbf{C} \in \mathbb{R}^{m \times m}_+$ is defined such that its elements $c_{ij} = |x^b_i - y_j|^q$ represent the cost of transporting unit probability masses from location $x^b_i$ to $y_j$, where $q$ is a positive exponent. Without loss of generality, assuming that $\mathbf{p}_{x^b}$ and $\mathbf{p}_y$ have the same number of support points, OMT determines an optimal transportation plan $\mathbf{U}^a \in \mathbb{R}^{m \times m}_+$ that can linearly map the two histograms onto each other with the minimum amount of total transportation cost or "work" as follows:
$$\mathbf{U}^a = \underset{\mathbf{U}}{\mathrm{argmin}}\ \langle \mathbf{C}, \mathbf{U} \rangle \quad \text{s.t.} \quad \mathbf{U} \geq 0,\ \ \mathbf{U}\mathbf{1}_m = \mathbf{p}_{x^b},\ \ \mathbf{U}^T\mathbf{1}_m = \mathbf{p}_y. \quad (8)$$

It is worth noting that the transportation plan can be interpreted as a joint distribution that couples the marginal densities $\mathbf{p}_{x^b}$ and $\mathbf{p}_y$. For the ground transportation cost with $q = 2$, the OMT problem in Equation 8 is convex and defines the square of the 2-Wasserstein distance $d^2_W(\mathbf{p}_{x^b}, \mathbf{p}_y)$ between the probability histograms.
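For small histograms, the plan in Equation 8 can be computed directly with a general-purpose linear-programming solver. The following sketch (a toy 1-D example with our own variable names) assembles the quadratic ground cost and the marginal constraints, and returns the squared 2-Wasserstein distance:

    import numpy as np
    from scipy.optimize import linprog

    # Two hypothetical histograms supported on the same grid of m points.
    s = np.linspace(0.0, 1.0, 20)                 # support points
    p_xb = np.exp(-0.5 * ((s - 0.2) / 0.05)**2); p_xb /= p_xb.sum()
    p_y  = np.exp(-0.5 * ((s - 0.8) / 0.10)**2); p_y  /= p_y.sum()

    m = len(s)
    C = (s[:, None] - s[None, :])**2              # ground cost with q = 2

    # Equality constraints: row sums equal p_xb, column sums equal p_y.
    A_eq = np.zeros((2 * m, m * m))
    for i in range(m):
        A_eq[i, i * m:(i + 1) * m] = 1.0          # U 1_m = p_xb
        A_eq[m + i, i::m] = 1.0                   # U^T 1_m = p_y
    b_eq = np.concatenate([p_xb, p_y])

    res = linprog(C.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
    W2_squared = res.fun                          # squared 2-Wasserstein distance

As noted below in Section 3.1, the cost of this exact linear program grows rapidly with the number of support points, which motivates the entropic regularization of Section 3.2.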
One key question that might arise here is: what is the advantage of the Wasserstein distance for interpolating probability distributions compared to other measures of proximity, such as the Hellinger distance (Hellinger, 1909) or the Kullback-Leibler (KL) divergence (Kullback and Leibler, 1951)? To elaborate on this question, we confine our consideration to Gaussian densities, over which the Wasserstein distance can be represented in a closed form. In particular, let us interpolate between two Gaussian distributions $\mathcal{N}(\mu_x, \Sigma_x)$ and $\mathcal{N}(\mu_y, \Sigma_y)$ with an interpolation or displacement parameter $\eta$. It can be shown that the interpolated density, with the minimum sum of squares of the Wasserstein distance, is a Gaussian distribution $\mathcal{N}(\mu_\eta, \Sigma_\eta)$, where $\mu_\eta = (1 - \eta)\,\mu_x + \eta\,\mu_y$ and $\Sigma_\eta = \Sigma_x^{-1/2}\left((1 - \eta)\,\Sigma_x + \eta\,(\Sigma_x^{1/2}\Sigma_y\Sigma_x^{1/2})^{1/2}\right)^2 \Sigma_x^{-1/2}$ (Chen et al., 2019b).

Figure 1: Barycenter of two Gaussian distributions $\mathcal{N}(1.4, 0.01)$ and $\mathcal{N}(-1.1, 0.35)$ as a function of the displacement parameter $\eta \in [0, 1]$ for the (a) Hellinger distance, (b) Kullback-Leibler divergence, and (c) Wasserstein distance.
   Figure 1 shows the spectrum of barycenters between two Gaussian pdfs for
a range of the displacement parameter η ∈ [0, 1]. As shown, the interpolated
densities using the Hellinger distance, which is Euclidean in the space of
probability distributions, are bimodal. Although the Gaussian shape of the
interpolated densities using the KL divergence is preserved, the variance of
the interpolants is not necessarily bounded by the variances of the input
Gaussian densities. Unlike these metrics, as shown, the Wasserstein distance
moves the mean and preserves the shape of the interpolants through a natural
morphing process.
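The closed-form Wasserstein interpolant between Gaussians used above is straightforward to evaluate numerically. A minimal sketch (ours), applied to the two Gaussians of Figure 1 written as 1-by-1 covariance matrices:

    import numpy as np
    from scipy.linalg import sqrtm

    def w2_gaussian_interp(mu_x, S_x, mu_y, S_y, eta):
        # Displacement interpolation between N(mu_x, S_x) and N(mu_y, S_y):
        # mu_eta = (1 - eta) mu_x + eta mu_y, and the covariance formula above.
        mu = (1 - eta) * mu_x + eta * mu_y
        S_x_half = sqrtm(S_x)
        S_x_half_inv = np.linalg.inv(S_x_half)
        M = (1 - eta) * S_x + eta * sqrtm(S_x_half @ S_y @ S_x_half)
        return mu, S_x_half_inv @ (M @ M) @ S_x_half_inv

    # Midpoint (eta = 0.5) between N(1.4, 0.01) and N(-1.1, 0.35) of Figure 1
    mu, S = w2_gaussian_interp(np.array([1.4]), np.array([[0.01]]),
                               np.array([-1.1]), np.array([[0.35]]), 0.5)

As expected from the natural morphing behavior of the metric, the interpolated mean moves linearly between the two input means, and the interpolated variance stays between the two input variances.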

Figure 2:   Wasserstein barycenters pη between two Gaussian mixture models pxb :
0.5 N (−12, 0.4) + 0.5 N (−8, 0.8) and py : 0.55 N (5, 4) + 0.45 N (9.5, 4) as a function of the
displacement parameter η ∈ [0, 1]. Also shown is the Wasserstein barycenter for η = 0.5.

As previously noted, this metric is not limited to any Gaussian assumption. Figure 2 shows the Wasserstein barycenter between two bimodal mixtures of Gaussians as a function of the displacement parameter $\eta \in [0, 1]$. The results clearly show the Lagrangian nature of the Wasserstein metric, which penalizes the mismatch not only based on the shape but also on the shift between the two probability density functions, making it suitable to tackle geophysical biases. In a general setting, it can be shown that $d^2_W(\mathbf{p}_{x^b}, \mathbf{p}_y) = d^2_W(\bar{\mathbf{p}}_{x^b}, \bar{\mathbf{p}}_y) + \|\boldsymbol{\mu}_{x^b} - \boldsymbol{\mu}_y\|^2$, where $\bar{\mathbf{p}}_{x^b}$ and $\bar{\mathbf{p}}_y$ are the centered zero-mean probability masses and $\boldsymbol{\mu}_{x^b}$ and $\boldsymbol{\mu}_y$ are the respective mean values.

3     Ensemble Riemannian Data Assimilation

3.1    Problem Formulation

First, let us recall that the weighted mean of a cloud of points $\{\mathbf{x}^i\}_{i=1}^N \in \mathbb{R}^m$ in the Euclidean space is $\boldsymbol{\mu}_x = \sum_{i=1}^N \eta_i\, \mathbf{x}^i$ for a given family of non-negative weights with $\sum_i \eta_i = 1$. This expected value is equivalent to solving the following variational problem:
$$\boldsymbol{\mu}_x = \underset{\mathbf{x}}{\mathrm{argmin}} \sum_{i=1}^N \eta_i\, \|\mathbf{x}^i - \mathbf{x}\|^2. \quad (9)$$

Thus, the 3D-Var problem in Equation 3 after Cholesky decomposition (Nash,
1990) of the error covariance matrices and rearrangement of terms can be
interpreted as a “barycenter problem” in the Euclidean space, where the
analysis state is the weighted mean of the background state and observation.
By changing the distance metric from the Euclidean to the Wasserstein (Agueh and Carlier, 2011), a Riemannian barycenter can be defined as the Fréchet mean (Fréchet, 1948) of $N$ probability histograms with finite second-order moments:
$$\mathbf{p}_\eta = \underset{\mathbf{p}}{\mathrm{argmin}} \sum_{k=1}^N \eta_k\, d^2_W(\mathbf{p}, \mathbf{p}_k). \quad (10)$$
Let us assume that the probability histogram of the background state $\mathbf{p}_{x^b} = \frac{1}{N}\sum_{i=1}^N \delta(\mathbf{x} - \mathbf{x}^i)$ can be represented through a sum of equal-weight Kronecker delta functions located at $N$ ensemble members of the state variable $\mathbf{x}^i \in \mathbb{R}^m$. We similarly approximate the distribution of the observations $\mathbf{p}_y = \frac{1}{M}\sum_{j=1}^M \delta(\mathbf{y} - \mathbf{y}^{j*})$ using the replicated "pseudo-observations" $\{\mathbf{y}^{j*}\}_{j=1}^M \in \mathbb{R}^n$, obtained by perturbing the observation vector with zero-mean Gaussian noise $\mathbf{v}^* \sim \mathcal{N}(\mathbf{0}, \mathbf{R})$.

The EnRDA defines the probability histogram of the analysis state $\mathbf{p}_{x^a} \in \mathbb{R}^m$ as the Fréchet barycenter over the Wasserstein space as follows:
$$\mathbf{p}_{x^a} = \underset{\mathbf{p}_x}{\mathrm{argmin}}\ \left[\eta\, d^2_W(\mathbf{p}_x, \mathbf{p}_{x^b}) + (1 - \eta)\, d^2_W\!\left(|\det(\mathcal{H}'(\mathbf{x}))|^{-1}\mathbf{p}_x,\ \mathbf{p}_y\right)\right], \quad (11)$$

where the displacement parameter $\eta > 0$ assigns the relative weights to the observation and background terms to capture their respective geodesic distances from the true state, and $\mathcal{H}'(\cdot)$ is the Jacobian of the observation operator, assuming that the support sets of the probability histograms of the observation and state variables are the same. The optimal value of $\eta$ shall
be determined empirically using some reference data. It is important to
note that the solution of the above DA formalism involves finding the opti-
mal transportation plan (Equation 8) or the joint density, which couples the
background and observation marginal distributions.
From a Bayesian point of view, the above problem can be derived assuming that the model and observation errors have finite second-order moments and are explained by generalized Riemannian Gaussian distributions (Said et al., 2017). Specifically, following the Bayesian principle explained in Equation 2, the EnRDA can be derived assuming that the prior $p(\mathbf{x})$ and likelihood function $p(\mathbf{y}|\mathbf{x})$ are $p(\mathbf{x}) \propto \exp\left(-\eta\, d^2_W(\mathbf{p}_x, \mathbf{p}_{x^b})\right)$ and $p(\mathbf{y}|\mathbf{x}) \propto \exp\left(-(1 - \eta)\, d^2_W\!\left(|\det \mathcal{H}'(\mathbf{x})|^{-1}\mathbf{p}_x,\ \mathbf{p}_y\right)\right)$.

From the joint histogram $\mathbf{U}^a$ coupling $\mathbf{p}_{x^b}$ and $\mathbf{p}_y$ (Equation 8), the histogram of the analysis state can be obtained based on McCann's interpolation (McCann, 1997; Peyré et al., 2019):
$$\mathbf{p}_{x^a} = \sum_{i=1}^N \sum_{j=1}^M u^a_{ij}\, \delta\!\left((1 - \eta)\, \mathbf{x}^i + \eta\, \mathbf{y}^{j*}\right). \quad (12)$$

Samples from the analysis state histogram in Equation 12 can be drawn to produce the model ensemble forecast. It should be noted that computation of the joint histogram is an expensive linear programming problem. The widely used interior-point methods (Altman and Gondzio, 1999) and Orlin's algorithm (Orlin, 1993) have super-cubic run time with a computational complexity of $\mathcal{O}(d^3 \log d)$, where $d$ represents the total number of support points over which the input probability histograms are defined. This is a serious limitation in high-dimensional geophysical DA problems.
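In practice, given a coupling $\mathbf{U}^a$ and a displacement parameter $\eta$, drawing samples per Equation 12 amounts to sampling index pairs $(i, j)$ with probability $u^a_{ij}$ and placing analysis members at the convex combinations $(1-\eta)\mathbf{x}^i + \eta\mathbf{y}^{j*}$. A minimal sketch with a toy coupling (all names and values are our assumptions):

    import numpy as np

    def enrda_analysis_ensemble(U_a, X_b, Y_obs, eta, n_draws, rng):
        # Sample index pairs (i, j) with probability u^a_ij (Equation 12),
        # then place analysis members at (1 - eta) x^i + eta y^j*.
        N, M = U_a.shape
        flat = rng.choice(N * M, size=n_draws, p=U_a.ravel() / U_a.sum())
        i, j = np.unravel_index(flat, (N, M))
        return (1 - eta) * X_b[i] + eta * Y_obs[j]

    rng = np.random.default_rng(1)
    X_b = rng.normal(0.0, 1.0, size=(50, 2))       # background ensemble members
    Y_obs = rng.normal(4.0, 1.0, size=(50, 2))     # pseudo-observations
    U_a = np.eye(50) / 50                          # toy coupling for illustration
    x_a = enrda_analysis_ensemble(U_a, X_b, Y_obs, 0.2, 100, rng)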

3.2   Entropic Regularization of EnRDA

In order to speed up the computation of the coupling between $\mathbf{p}_{x^b}$ and $\mathbf{p}_y$, the problem in Equation 8 can be regularized (Cuturi, 2013) as follows:
$$\mathbf{U}^a = \underset{\mathbf{U}}{\mathrm{argmin}}\ \langle \mathbf{C}, \mathbf{U} \rangle - \gamma H(\mathbf{U}) \quad \text{s.t.} \quad \mathbf{U} \geq 0,\ \ \mathbf{U}\mathbf{1}_m = \mathbf{p}_{x^b},\ \ \mathbf{U}^T\mathbf{1}_m = \mathbf{p}_y, \quad (13)$$
where $\gamma > 0$ is the regularization parameter and $H(\mathbf{U}) := -\langle \mathbf{U}, \log \mathbf{U} - \mathbf{1}_m\mathbf{1}_m^T \rangle$ represents the Gibbs-Boltzmann relative entropy function. Note that the relative entropy is a concave function and thus its negative value is convex.
The Lagrangian function $\mathcal{L}$ of Equation 13 can be obtained by adding two dual variables or Lagrange multipliers $\mathbf{q}, \mathbf{r} \in \mathbb{R}^m$ as follows:
$$\mathcal{L}(\mathbf{U}, \mathbf{q}, \mathbf{r}) = \langle \mathbf{C}, \mathbf{U} \rangle - \gamma H(\mathbf{U}) - \langle \mathbf{q}, \mathbf{U}\mathbf{1}_m - \mathbf{p}_{x^b} \rangle - \langle \mathbf{r}, \mathbf{U}^T\mathbf{1}_m - \mathbf{p}_y \rangle. \quad (14)$$
Setting the derivative of the Lagrangian function to zero, we have
$$\frac{\partial \mathcal{L}(\mathbf{U}, \mathbf{q}, \mathbf{r})}{\partial u_{ij}} = c_{ij} + \gamma \log(u_{ij}) - q_i - r_j = 0, \quad \forall i, j. \quad (15)$$

The entropic regularization keeps the problem in Equation 13 strongly convex, and it can be shown (Peyré et al., 2019) that Equation 15 leads to a unique optimal joint density of the following form:
$$\mathbf{U}^a = \mathrm{diag}(\mathbf{v})\, \mathbf{K}\, \mathrm{diag}(\mathbf{w}), \quad (16)$$
where $\mathbf{v} = \exp\left(\mathbf{q} \oslash (\gamma\mathbf{1}_m)\right) \in \mathbb{R}^m$ and $\mathbf{w} = \exp\left(\mathbf{r} \oslash (\gamma\mathbf{1}_m)\right) \in \mathbb{R}^m$ are the unknown scaling variables and $\mathbf{K} \in \mathbb{R}^{m \times m}_+$ is the Gibbs kernel associated with the cost matrix $\mathbf{C}$, with elements $k_{ij} = \exp(-c_{ij}/\gamma)$.
From the mass conservation constraints in Equation 13 and the scaling form of the optimal joint density in Equation 16, we can derive
$$\mathrm{diag}(\mathbf{v})\, \mathbf{K}\, \mathrm{diag}(\mathbf{w})\, \mathbf{1}_m = \mathbf{p}_{x^b} \quad \text{and} \quad \mathrm{diag}(\mathbf{w})\, \mathbf{K}^T\, \mathrm{diag}(\mathbf{v})\, \mathbf{1}_m = \mathbf{p}_y. \quad (17)$$
Using Equation 17, the two unknown scaling variables $\mathbf{v}$ and $\mathbf{w}$ can be iteratively solved using Sinkhorn's algorithm (Cuturi, 2013) as follows:
$$\mathbf{v}^{l+1} = \mathbf{p}_{x^b} \oslash (\mathbf{K}\mathbf{w}^l) \quad \text{and} \quad \mathbf{w}^{l+1} = \mathbf{p}_y \oslash (\mathbf{K}^T\mathbf{v}^l). \quad (18)$$
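The scaling iterations in Equation 18 translate directly into code. A minimal sketch of Sinkhorn's algorithm (our implementation, not the authors' code), which returns the regularized coupling of Equation 16:

    import numpy as np

    def sinkhorn(C, p_xb, p_y, gamma, n_iter=1000, tol=1e-9):
        # Gibbs kernel associated with the cost matrix: k_ij = exp(-c_ij / gamma).
        # For very small gamma, a log-domain implementation is advisable to
        # avoid numerical underflow.
        K = np.exp(-C / gamma)
        v = np.ones_like(p_xb)
        w = np.ones_like(p_y)
        for _ in range(n_iter):
            v_new = p_xb / (K @ w)          # first update of Equation 18
            w = p_y / (K.T @ v_new)         # second update, with the latest v
            if np.max(np.abs(v_new - v)) < tol:
                v = v_new
                break
            v = v_new
        return np.diag(v) @ K @ np.diag(w)  # coupling U^a (Equation 16)

For instance, calling sinkhorn(C, p_xb, p_y, gamma=0.01) with the cost matrix and histograms of the earlier linear-programming sketch returns a coupling whose row and column sums match the two marginals to numerical precision, at a fraction of the cost of the exact linear program.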

3.3   Selection of the Entropic Regularization Parameter

The entropic regularization parameter $\gamma$ plays a very important role in the characterization of the joint density; however, there exists no closed-form solution for its optimal selection. Generally speaking, increasing the value of $\gamma$ increases the convexity of the cost function and thus the computational efficiency, at the expense of reduced coupling between the marginal histograms of the background state and observations. The reduced coupling might be interpreted physically using the first and second laws of thermodynamics, which state that the drive of thermodynamic systems toward equilibrium results from a tendency toward maximum entropy or disorder. Therefore, we expect that increasing the entropic regularization reduces the degree of coupling and spreads out the probability masses of the joint distribution.
    As an example, the effects of γ on the coupling between two Gaussian
mixture models pxb and py , shown in Figure 2, are demonstrated in Figure 3.
It can be seen that at a smaller value of γ = 0.001, the probability masses of
the joint distribution are sparse and lie compactly along the main diagonal –
capturing a strong coupling between the background state and observations.
However, as the value of γ increases, the probability masses of the joint
distribution become denser and more spread out – reflecting a lesser degree of dependency between the marginals. It is important to note that in the limiting cases, as $\gamma \to 0$, the solution of Equation 13 converges to the true optimal joint histogram, while as $\gamma \to \infty$ the entropy increases and the joint histogram tends to the independent coupling $\mathbf{p}_{x^b}\mathbf{p}_y^T$.

Figure 3: The effect of the entropic regularization parameter $\gamma$ on the optimal joint histogram coupling two Gaussian mixture models $\mathbf{p}_{x^b}: 0.5\,\mathcal{N}(-12, 0.4) + 0.5\,\mathcal{N}(-8, 0.8)$ and $\mathbf{p}_y: 0.55\,\mathcal{N}(5, 4) + 0.45\,\mathcal{N}(9.5, 4)$ – shown in Figure 2.

4    Numerical Experiments and Results

In order to demonstrate the performance of the EnRDA and assess its effec-
tiveness, we focus on the linear advection-diffusion equation and the chaotic
Lorenz-63 model (Lorenz, 1963). The advection-diffusion model explains a
wide range of heat, mass, and momentum transport across the land, vege-
tation, and atmospheric continuum, and has been utilized to evaluate the
performance of DA methodologies (Berardi et al., 2016; Ebtehaj et al., 2014; Hurkmans et al., 2006; Ning et al., 2014; Zhang et al., 1997). Similarly, the
Lorenz-63 model, as a chaotic model of atmospheric convection, has been
widely used in testing the performance of DA methodologies (Goodliff et al.,
2015; Miller et al., 1994; Nakano et al., 2007; Tamang et al., 2020; Tandeo
et al., 2015; Van Leeuwen, 2010). Throughout, under controlled experimental
settings with foreknown model and observation errors, we run the forward
models under systematic errors and compare the results of the EnRDA with
3D-Var for advection-diffusion dynamics and with a particle filter for the
Lorenz-63 system.

4.1     Advection-Diffusion Equation

4.1.1   State-space Characterization

The advection-diffusion equation is a special case of the Navier-Stokes partial differential equation, and its linearized form, with constant diffusivity in an incom-

pressible fluid flow, is expressed for a conserved physical quantity $x(\mathbf{s}, t)$ as follows:
$$\frac{\partial x(\mathbf{s}, t)}{\partial t} + \mathbf{a} \cdot \nabla x(\mathbf{s}, t) = \mathbf{D}\, \nabla^2 x(\mathbf{s}, t), \quad (19)$$
where $\mathbf{s} \in \mathbb{R}^k$ represents a $k$-dimensional spatial domain and $t$ is time. In the above expression, $\mathbf{a} = (a_1, \ldots, a_k) \in \mathbb{R}^k$ is the constant velocity vector and $\mathbf{D} = \mathrm{diag}(D_1, \ldots, D_k) \in \mathbb{R}^{k \times k}$ represents the diffusivity matrix.
Given the initial condition $x(\mathbf{s}, t = 0)$, owing to its linearity, the solution at time $t$ can be obtained by convolving the initial condition with a Kronecker delta function $\delta(\mathbf{s} - \mathbf{a}t)$, followed by convolution with a zero-mean Gaussian kernel $G(\mathbf{s}, t)$:
$$G(\mathbf{s}, t) = \frac{1}{(2\pi)^{k/2} \det(\boldsymbol{\Sigma})^{1/2}} \exp\left(-\frac{1}{2}\mathbf{s}^T \boldsymbol{\Sigma}^{-1} \mathbf{s}\right), \quad (20)$$
where the covariance matrix of the kernel is $\boldsymbol{\Sigma} = 2\mathbf{D}t$.
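For the 1-D case, this translation-plus-smoothing solution can be sketched in a few lines. The grid below mirrors the experimental setup of the next subsection; note that np.roll imposes periodic boundaries, an assumption acceptable only away from the domain edges:

    import numpy as np
    from scipy.ndimage import gaussian_filter1d

    def advect_diffuse_1d(x0, a, D, t, ds):
        # Translate the initial condition by a*t (convolution with delta(s - a t))...
        shift = int(round(a * t / ds))
        x_adv = np.roll(x0, shift)
        # ...then convolve with a zero-mean Gaussian kernel of variance 2 D t,
        # expressed in grid units for gaussian_filter1d.
        sigma_grid = np.sqrt(2.0 * D * t) / ds
        return gaussian_filter1d(x_adv, sigma_grid)

    ds = 0.1
    s = np.arange(ds, 60.0 + ds, ds)               # spatial domain (0, 60]
    x0 = np.zeros_like(s); x0[0] = 300.0 / ds      # Kronecker delta near the origin
    x15 = advect_diffuse_1d(x0, a=0.8, D=0.25, t=15.0, ds=ds)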

4.1.2   Experimental Setup and Results

In this subsection, we present the results of DA experiments on 1-D and 2-D
advection-diffusion equations. For the 1-D case, the state-space is character-
ized over a spatial domain s ∈ (0, 60] with a discretization of ∆s = 0.1.
The model parameters are chosen to be a = 0.8 [L/T] and D = 0.25 [L²/T]. The initial state is considered to be a bimodal mixture of Gaussian distributions obtained by superposing two Kronecker delta functions x(s, t = 0) = 300 δ(s), evolved for 15 and 25 [t], respectively.
The ground truth of the state-space is then obtained by evolving the initial
state with a time step of ∆t = 0.5 over a time period of T = 0–30 [t] in the
absence of any model error.

The observations are obtained at assimilation intervals of $10\Delta t$ by corrupting the ground truth with a heteroscedastic Gaussian noise $\mathbf{v}_t \sim \mathcal{N}(\mathbf{0}, \mathrm{diag}(\varepsilon_y\, \mathbf{x} \odot \mathbf{x}))$, where $\varepsilon_y = 0.05$. We introduce both systematic and random errors in model simulations to evolve the state-space under model errors. For the systematic error, the model velocity and diffusivity coefficient are set to $a' = 0.12$ [L/T] and $D' = 0.4$ [L²/T], respectively, and for the random error, a Gaussian noise $\boldsymbol{\omega}_t \sim \mathcal{N}(\mathbf{0}, \mathrm{diag}(\varepsilon_b\, \mathbf{x} \odot \mathbf{x}))$ is added at every $\Delta t$, where $\varepsilon_b = 0.02$. For the implementation of EnRDA, to construct the distribution of the background state, 100 model ensemble members are generated by corrupting the ground truth at $T = 0$ with a heteroscedastic Gaussian noise $\boldsymbol{\omega}_0 \sim \mathcal{N}(\mathbf{0}, \mathrm{diag}(\varepsilon_0\, \mathbf{x} \odot \mathbf{x}))$, where $\varepsilon_0 = 0.05$. The regularization and displacement parameters are set to $\gamma = 3$ and $\eta = 0.2$ by trial and error. To obtain a robust conclusion about the comparison of the proposed EnRDA methodology with 3D-Var, the experiments are repeated for 50 independent and identical simulation scenarios.
The evolution of the initial state over the time period 0 to 30 [t] and the results comparing the EnRDA and 3D-Var at 5, 15, and 25 [t] are shown in Figure 4. As demonstrated, during all time steps, EnRDA reduces the analysis uncertainty, in terms of both bias and unbiased root-mean-squared error (ubrmse), and the shape of the entire state-space is properly preserved and remains closer to the ground truth than in 3D-Var. As seen, although the 3D-Var analysis state follows the true state reasonably well for the initial time steps, as the system propagates over time under systematic errors, the analysis state deviates further away from the ground truth and its shape is no longer preserved properly. For 50 independent simulations, on average, the bias and ubrmse are decreased by 97.9% and 73.1%, respectively, compared to the 3D-Var.

Figure 4: (a) Temporal evolution of a bimodal initial state under a linear advection-diffusion equation and (b–d) the true state $x_{tr}$, observations $y$, and analysis states $x_a$ by 3D-Var and the EnRDA for three snapshots of time at 5, 15, and 25 [t], respectively. Also provided are the bias and ubrmse of the analysis state by EnRDA (3D-Var) at the respective times.
   Figure 5 shows the results of a 2-D experiment, elaborating more on the
ways that the Riemannian DA can tackle systematic errors in comparison
to the classic variational approach over the Euclidean space. In particular,
the results demonstrate the role of the displacement parameter η in EnRDA
versus the ratio of the error covariance matrices in 3D-Var. As is well un-
derstood, error covariance matrices in variational approaches act as weights
enabling interpolation between the background state and observations across
different dimensions. Recall that in the present form, EnRDA interpolates

uniformly across multiple dimensions of the problem with a single displace-
ment parameter $\eta$. To make a fair comparison, a parameter $\alpha \in [0, 1]$ over the Euclidean space is defined based on the model and observation error covariances as $\alpha = \mathrm{tr}\left(\mathbf{B}^{-1}(\mathbf{H}^T\mathbf{R}^{-1}\mathbf{H} + \mathbf{B}^{-1})^{-1}\right)$, assuming that observation and model errors are uncorrelated and homoscedastic. Thus, when assimilating over the Euclidean space, larger values of $\alpha$ assign larger weights to the background state and vice versa.
   The state-space is characterized over a spatial domain s1 = (0, 10] and
s2 = (0, 10] with a discretization of ∆s1 = ∆s2 = 0.1. The advection-
diffusion is considered to be an isotropic process with the true model param-
eter values set as a1 = a2 = 0.08 [L/T], and D1 = D2 = 0.02 [L2 /T]. Similar
to the 1-D setting, the ground truth is considered to be a bi-variate Gaus-
sian mixture, which is obtained after evolving two Kronecker delta functions
$x(\mathbf{s}, t = 0) = 1000\,\delta(s_1, s_2)$ and $x(\mathbf{s}, t = 0) = 4000\,\delta(s_1, s_2)$ for 25 and 35 [t], respectively.
To resemble a model with systematic errors, a background state is obtained by isotropically increasing the advective velocity to 0.12 [L/T] while reducing the diffusivity to 0.01 [L²/T]. Thus the previous-time forecast is corrupted with systematic biases not only in its position but also in terms of its spread and the size of its support set (Figure 5b). Observations are not considered to have position biases. However, systematic representative errors are imposed assuming that the sensing system has a lower resolution than the model. To
that end, we evolve two Kronecker delta functions $x(\mathbf{s}, t = 0) = 800\,\delta(s_1, s_2)$ and $x(\mathbf{s}, t = 0) = 2400\,\delta(s_1, s_2)$, which have less mass than the true state, for the same time period of 25 and 35 [t], respectively. Then, in order to resemble the potential representative error of the sensing system, the observations are up-scaled by a factor of two through box averaging.

Figure 5: The true state $x_{tr}$ versus the background states $x_b$ and observations $y$ (a–c) with systematic errors under 2-D advection-diffusion dynamics, as well as the analysis state $x_a$ by 3D-Var (d–f) and the EnRDA (g–i) for different displacement parameters in the Euclidean ($\alpha$) and Riemannian ($\eta$) space, where the entropic regularization parameter is set to $\gamma = 0.003$. The black plus signs show the location of the modes of the true state.
As shown in Figure 5, the EnRDA preserves the shape of the state-space very well and gradually moves the mass towards the background state as the value of $\eta$ increases. However, in 3D-Var, the expected shape of the state is not recovered properly, especially for the smaller values of $\alpha = 0.25$ and $0.5$, which tend to nudge the analysis state towards the background state. Although the higher value of $\alpha = 0.75$ improves the preservation of the shape of the state-space in 3D-Var, it does so by nudging the modes of the analysis state further away from their true position towards the observations, ultimately increasing the error metrics.
Specifically, the bias is reduced by more than 30%, from 0.15 to 0.05, as $\alpha$ increases from 0.25 to 0.75 for the 3D-Var; however, this occurs at the expense of an almost threefold increase in ubrmse, from 0.3 to 1.1. This is primarily because the background state has a position error due to erroneous velocity and diffusivity parameters. Since bias measures the average of the errors, the positive differences between the analysis state and the true state due to the position error are compensated by the negative differences in their magnitude, leading to a reduction in bias as $\alpha$ increases. The ubrmse, however, is quadratic and thus measures the average magnitude of the error irrespective of its sign. In contrast, the bias (0.002–0.002) and ubrmse (0.12–0.95) in EnRDA are consistently lower than in the 3D-Var, by almost 99% and 60%, respectively, over the reported range of $\eta$.

4.2     Lorenz-63

4.2.1   State-space Characterization

The Lorenz system (Lorenz-63; Lorenz, 1963) is derived through a truncation of the Fourier series of the Rayleigh-Bénard convection model. This model can be interpreted as a simplistic local weather system only involving the effects of local shear stress and buoyancy forces. The system is expressed using coupled ordinary differential equations that describe the temporal evolution of three coordinates $x$, $y$, and $z$, representing the rate of convective overturn and the horizontal and vertical temperature variations, as:
$$\begin{aligned} \frac{dx}{dt} &= -\sigma(x - y) \\ \frac{dy}{dt} &= \rho x - y - xz \\ \frac{dz}{dt} &= xy - \beta z, \end{aligned} \quad (21)$$
where σ represents the Prandtl number, ρ is a normalized Rayleigh num-
ber proportional to the difference in temperature gradient through the depth
of the fluid and β denotes a horizontal wave number of the convective mo-
tion. It is well established that for parameter values of $\sigma = 10$, $\rho = 28$, and $\beta = 8/3$, the system exhibits chaotic behavior with the phase space revolving around two attractors located at $(\sqrt{\beta(\rho - 1)}, \sqrt{\beta(\rho - 1)}, \rho - 1)$ and $(-\sqrt{\beta(\rho - 1)}, -\sqrt{\beta(\rho - 1)}, \rho - 1)$.
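A minimal sketch of integrating Equation 21 with the fourth-order Runge-Kutta scheme, using the initial condition and step size of Section 4.2.2 (our code, for illustration only):

    import numpy as np

    def lorenz63(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
        # Right-hand side of Equation 21
        x, y, z = state
        return np.array([-sigma * (x - y),
                         rho * x - y - x * z,
                         x * y - beta * z])

    def rk4_step(f, state, dt):
        # One fourth-order Runge-Kutta step
        k1 = f(state)
        k2 = f(state + 0.5 * dt * k1)
        k3 = f(state + 0.5 * dt * k2)
        k4 = f(state + dt * k3)
        return state + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0

    dt, n_steps = 0.01, 2000                       # T = 0-20 [t]
    x = np.array([1.508870, -1.531271, 25.46091])  # initial condition (Section 4.2.2)
    trajectory = np.empty((n_steps + 1, 3)); trajectory[0] = x
    for k in range(n_steps):
        x = rk4_step(lorenz63, x, dt)
        trajectory[k + 1] = x

Biased model parameters, such as those introduced below, can be passed through the sigma, rho, and beta arguments to generate the erroneous forecast trajectories.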

4.2.2   Experimental Setup and Results

In this subsection, we present the results from DA experiments conducted on the Lorenz system under systematic errors and compare the results of the EnRDA with the particle filter. Apart from the systematic error component, we utilize the standard experimental setting used in numerous DA studies (Furtado et al., 2008; Miller et al., 1994; Van Leeuwen, 2010). In order to
obtain the ground truth of the model trajectory, the system, initialized at x0 = (1.508870, −1.531271, 25.46091), is integrated with a time step of ∆t =
0.01 over a time period of T = 0–20 [t] using the fourth-order Runge-Kutta
approximation (Kutta, 1901; Runge, 1895). The observations are obtained at every assimilation time step of $40\Delta t$ by perturbing the ground truth with Gaussian noise $\mathbf{v}_t \sim \mathcal{N}(\mathbf{0}, \sigma^2_{\mathrm{obs}}\boldsymbol{\Sigma}_\rho)$, where $\sigma^2_{\mathrm{obs}} = 2$ is the observation error variance and $\boldsymbol{\Sigma}_\rho \in \mathbb{R}^{3 \times 3}$ denotes the correlation matrix with 1 on the diagonal entries, 0.5 on the first sub- and super-diagonals, and 0.25 on the second sub- and super-diagonals.
  In order to characterize the distribution of the background state, 10 en-
semble members of the EnRDA and 100 particles of the filter are generated
by corrupting the ground truth at $T = 0$ [t] with a zero-mean Gaussian noise $\boldsymbol{\omega}_0 \sim \mathcal{N}(\mathbf{0}, \sigma^2_0\,\mathbf{I}_3)$, where $\sigma^2_0 = 2$. For introducing systematic errors, the model parameter values are set to $\sigma' = 10.5$, $\rho' = 27$, and $\beta' = 10/3$. The random errors are also introduced as the system evolves in time by adding a Gaussian noise $\boldsymbol{\omega}_t \sim \mathcal{N}(\mathbf{0}, \sigma^2_b\,\mathbf{I}_3)$ at every $\Delta t$, where $\sigma^2_b = 0.02$. For En-
η = 0.05. Throughout, to draw a robust statistical conclusion about the error
statistics, the DA experiments are repeated for 50 independent and identical
simulations.
Figure 6: Temporal evolution of the true state $x_{tr}$ of the Lorenz-63, observations $y$, and the mean of 100 and 10 ensemble members in the particle filter $x_a(\mathrm{PF})$ (first column) and EnRDA $x_a(\mathrm{EnRDA})$ (second column), respectively. The temporal evolution of the ensemble members is shown in gray. Also shown within dashed rectangles is the window of time over which the support sets of the observations and the ensemble spread are disjoint.

Figure 6 shows the temporal evolution of the ground truth and the analysis state by the particle filter (left column) and EnRDA (right column) over
a time period of 0 to 15 [t] for one simulation. As is evident, the particle
filter is well capable of capturing the ground truth when the observations
lie within the particle spread. However, when the observations lie far apart
from the support set of particles (Figure 6, dashed box), i.e. when the pdfs
of background state and observations are disjoint, the particle filter becomes
degenerate and the analysis state (particle mean) deviates away from the
ground truth. In contrast, the EnRDA is capable of capturing the true state remarkably well even when the ensemble spread and observations are far apart from each other.

Figure 7: Temporal evolution of the bias and ubrmse along the three dimensions of the Lorenz-63 for the particle filter with 100 particles (a, c) and EnRDA with 10 ensemble members (b, d). The mean values are computed over 50 independent and identical simulations.
The time evolution of the average bias and ubrmse over 50 independent and identical simulations is color coded over the phase space in Figure 7. As is evident, the average bias and ubrmse for the EnRDA are much lower than for the particle filter throughout the entire simulation period. The error statistics show a transient behavior; however, for the EnRDA (Figure 7, right column) the error statistics remain fairly stable and much lower than those of the particle filter (Figure 7, left column). In particular, the mean bias and ubrmse are decreased by 36–88% and 47–55% along the three dimensions, respectively. The expected values of the bias and ubrmse are reported in Table 1.

Table 1: Expected values of the bias and ubrmse for the particle filter and EnRDA from 50 independent and identical simulations of Lorenz-63. Values shown inside the parentheses are the percent reduction as a result of EnRDA.

                              bias                                 ubrmse
  Methods             x            y            z            x            y            z
  Particle Filter     2.24         2.41         0.59         6.25         7.95         7.88
  EnRDA               0.26 (88.4)  0.32 (86.7)  0.38 (35.6)  2.80 (55.2)  4.22 (46.9)  4.17 (47.1)

5      Discussion and Concluding Remarks

In this study, we introduced an ensemble data assimilation (DA) method-
ology over a Riemannian manifold, namely Ensemble Riemannian DA (En-
RDA), and illustrated its advantages over classic Bayesian DA techniques
for dissipative and chaotic dynamics. We demonstrated that the presented
methodology is capable of assimilating information in the probability domain
– characterized by the families of probability distributions with finite second-
order moments. The key message is that when the forecast and observations exhibit non-Gaussian structure and their support sets are disjoint due to the presence of systematic errors, the translation variance of the Wasserstein metric can be leveraged to extend geophysical forecast skills beyond what is currently viable through classic Euclidean DA techniques. Nevertheless, future research comprehensively comparing EnRDA with existing bias correction methodologies is needed to completely characterize its relative pros and cons.
We have explained the roles of the regularization and displacement parameters in the EnRDA and empirically examined their effects on the optimal joint histogram and consequently on the analysis state. Nevertheless, future studies are required to characterize closed-form expressions to better understand their implications for the forecast uncertainty. As was explained earlier, unlike the classic Bayesian DA techniques, which assimilate available information using different relative weights across multiple dimensions dictated by the error covariance matrices, a scalar displacement parameter is utilized in the EnRDA that balances the entire analysis state across all dimensions. Fu-
ture research can be devoted to developing a framework that utilizes a vector
representation of the displacement parameters to effectively tackle possible
heterogeneity of uncertainty across multiple dimensions.
Although the proposed EnRDA methodology with entropic regularization works properly for the presented low-dimensional problems, future research is needed to test its efficiency in operational applications in high-dimensional geophysical DA systems with a state-space easily exceeding a million dimensions (Van Leeuwen, 2009). In particular, additional research can
be conducted to characterize the effective number of ensemble members re-
quired to sufficiently represent such a large state-space. In high-dimensional
systems, further constraining the solution of the optimal joint histogram on
a submanifold of probability distributions with a Gaussian mixture structure
can provide a way forward by significantly lowering the overall computational
cost (Chen et al., 2019b).
  Nevertheless, the effectiveness of the presented methodology in treating
systematic errors shows a great potential to improve precipitation and soil
moisture prediction in weather and land surface models, where the position of convective cells (Jones and Macpherson, 1997; Le Coz et al., 2019; Lin et al., 2017) and the spatial dynamics of soil moisture (Dee and Da Silva, 1998; Reichle
and Koster, 2004) are often corrupted with significant biases. The intrinsic ability to assimilate probabilistic information regardless of the shape of the distribution further suggests that the methodology can be effective in sea ice DA frameworks, where the subgrid thickness is represented through a probability distribution of sea ice fractions (Asadi et al., 2019).
   Furthermore, a promising area of future research is that of extending the
presented formalism in Earth system models to assimilate a priori climatolog-
ical information obtained from in-situ gauges, ships, buoys, and radiosondes, where currently, due to the existing scale gaps, assimilation of such observations into the model grids is not straightforward. A crude framework may be cast to find the analysis state distribution as the barycenter of three marginals, namely the background state, observations, and a priori climatological
data. For such a framework, the Sinkhorn algorithm (Cuturi, 2013) can be easily generalized to solve this problem through iterative projections (Peyré et al., 2019).

Acknowledgements

The first and second authors acknowledge the grants from the National Aeronautics and Space Administration (NASA) Terrestrial Hydrology Program (THP, 80NSSC18K1528) and the New (Early Career) Investigator Program (NIP, 80NSSC18K0742). The fifth author also acknowledges the support from the National Science Foundation (NSF, DMS1830418).

References

Agueh, M., and G. Carlier (2011), Barycenters in the Wasserstein space, SIAM Journal on Mathematical Analysis, 10.1137/100805741.

Altman, A., and J. Gondzio (1999), Regularized symmetric indefinite systems in interior point methods for linear and quadratic optimization, Optimization Methods and Software, 11(1-4), 275–302.

Amari, S.-i. (2012), Differential-Geometrical Methods in Statistics, vol. 28, Springer Science & Business Media.

Anderson, J. L. (1996), A method for producing and evaluating probabilistic forecasts from ensemble model integrations, Journal of Climate, 9(7), 1518–1530.

Anderson, J. L. (2010), A non-Gaussian ensemble filter update for data assimilation, Monthly Weather Review, 138(11), 4186–4198.

Andersson, E., J. Pailleux, J.-N. Thépaut, J. Eyre, A. McNally, G. Kelly, and P. Courtier (1994), Use of cloud-cleared radiances in three/four-dimensional variational data assimilation, Quarterly Journal of the Royal Meteorological Society, 120(517), 627–653.

Asadi, N., K. A. Scott, and D. A. Clausi (2019), Data fusion and data assimilation of ice thickness observations using a regularisation framework, Tellus A: Dynamic Meteorology and Oceanography, pp. 1–20.

Beezley, J. D., and J. Mandel (2008), Morphing ensemble Kalman filters, Tellus A: Dynamic Meteorology and Oceanography, 60(1), 131–140.

Benamou, J.-D., and Y. Brenier (2000), A computational fluid mechanics solution to the Monge-Kantorovich mass transfer problem, Numerische Mathematik, 84(3), 375–393.

Berardi, M., A. Andrisani, L. Lopez, and M. Vurro (2016), A new data assimilation technique based on ensemble Kalman filter and Brownian bridges: An application to Richards equation, Computer Physics Communications, 208, 43–53.
Bocquet, M., C. A. Pires, and L. Wu (2010), Beyond gaussian statistical modeling in geo-
  physical data assimilation, Monthly Weather Review, 138 (8), 2997–3023.

Brenier, Y. (1987), Décomposition polaire et réarrangement monotone des champs de
  vecteurs, CR Acad. Sci. Paris Sér. I Math., 305, 805–808.

Chen, B., L. Dang, Y. Gu, N. Zheng, and J. C. Principe (2019a), Minimum Error Entropy
  Kalman Filter, IEEE Transactions on Systems, Man, and Cybernetics: Systems, 10.1109/
  tsmc.2019.2957269.

Chen, Y., T. T. Georgiou, and A. Tannenbaum (2019b), Optimal transport for gaussian
  mixture models, IEEE Access, 7, 6269–6278.

Cullen, M., J. Norbury, and R. J. Purser (1991), Generalised lagrangian solutions for
  atmospheric and oceanic flows, SIAM Journal on Applied Mathematics, 51 (1), 20–31,
  10.1137/0151002.

Cuturi, M. (2013), Sinkhorn distances: Lightspeed computation of optimal transport, in
  Advances in neural information processing systems, pp. 2292–2300.

De Lannoy, G. J., P. R. Houser, V. R. Pauwels, and N. E. Verhoest (2007a), State and bias
  estimation for soil moisture profiles by an ensemble kalman filter: Effect of assimilation
  depth and frequency, Water resources research, 43 (6).

De Lannoy, G. J., R. H. Reichle, P. R. Houser, V. Pauwels, and N. E. Verhoest (2007b),
  Correcting for forecast bias in soil moisture assimilation with the ensemble kalman filter,
  Water Resources Research, 43 (9).

Dee, D. P. (2003), Detection and correction of model bias during data assimilation, Meteo-
  rological Training Course Lecture Series (ECMWF).

Dee, D. P. (2005), Bias and data assimilation, Quarterly Journal of the Royal Meteorological
  Society, 131 (613), 3323–3343.

Dee, D. P., and A. M. Da Silva (1998), Data assimilation in the presence of forecast bias,
  Quarterly Journal of the Royal Meteorological Society, 124 (545), 269–295.

Doucet, A., and A. M. Johansen (2009), A tutorial on particle filtering and smoothing:
  Fifteen years later, Handbook of Nonlinear Filtering, 12 (656-704), 3.

Drécourt, J.-P., H. Madsen, and D. Rosbjerg (2006), Bias aware Kalman filters: Comparison
  and improvements, Advances in Water Resources, 29 (5), 707–718.

Ebtehaj, A. M., and E. Foufoula-Georgiou (2013), On variational downscaling, fusion, and
  assimilation of hydrometeorological states: A unified framework via regularization, Water
  Resources Research, 49 (9), 5944–5963, 10.1002/wrcr.20424.

Ebtehaj, A. M., M. Zupanski, G. Lerman, and E. Foufoula-Georgiou (2014), Variational data
  assimilation via sparse regularisation, Tellus A: Dynamic Meteorology and Oceanography,
  66 (1), 21,789.

Evensen, G. (1994), Sequential data assimilation with a nonlinear quasi-geostrophic model
  using Monte Carlo methods to forecast error statistics, Journal of Geophysical Research:
  Oceans, 99 (C5), 10,143–10,162.

Evensen, G. (2003), The ensemble Kalman filter: Theoretical formulation and practical
  implementation, Ocean Dynamics, 53 (4), 343–367.

Feyeux, N., A. Vidard, and M. Nodet (2018), Optimal transport for variational data assim-
  ilation, Nonlinear Processes in Geophysics, 25 (1), 55–66.

Fréchet, M. (1948), Les éléments aléatoires de nature quelconque dans un espace distancié,
  in Annales de l’institut Henri Poincaré, vol. 10, pp. 215–310.

Furtado, H. C. M., H. F. de Campos Velho, and E. E. N. Macau (2008), Data assimilation:
  Particle filter and artificial neural networks, in Journal of Physics: Conference Series, vol.
  135, p. 012073, IOP Publishing.

Goodliff, M., J. Amezcua, and P. J. Van Leeuwen (2015), Comparing hybrid data assimilation
  methods on the Lorenz 1963 model with increasing non-linearity, Tellus A: Dynamic
  Meteorology and Oceanography, 67 (1), 26,928.

Gordon, N. J., D. J. Salmond, and A. F. Smith (1993), Novel approach to
  nonlinear/non-Gaussian Bayesian state estimation, in IEE Proceedings F (Radar and Signal
  Processing), vol. 140, pp. 107–113, IET.

Hamill, T. M. (2001), Interpretation of rank histograms for verifying ensemble forecasts,
  Monthly Weather Review, 129 (3), 550–560.

Han, X., and X. Li (2008), An evaluation of the nonlinear/non-Gaussian filters for the
  sequential data assimilation, Remote Sensing of Environment, 112 (4), 1434–1449.

Hellinger, E. (1909), Neue Begründung der Theorie quadratischer Formen von unendlichvielen
  Veränderlichen, Journal für die reine und angewandte Mathematik (Crelles Journal),
  1909 (136), 210–271.

Hurkmans, R., C. Paniconi, and P. A. Troch (2006), Numerical assessment of a dynami-
  cal relaxation data assimilation scheme for a catchment hydrological model, Hydrological
  Processes: An International Journal, 20 (3), 549–563.

Jones, C., and B. Macpherson (1997), A latent heat nudging scheme for the assimilation
  of precipitation data into an operational mesoscale model, Meteorological Applications:
  A Journal of Forecasting, Practical Applications, Training Techniques and Modelling, 4 (3),
  269–277.

Jordan, R., D. Kinderlehrer, and F. Otto (1998), The variational formulation of the Fokker-
  Planck equation, SIAM Journal on Mathematical Analysis, 10.1137/S0036141096303359.

Kalman, R. E. (1960), A new approach to linear filtering and prediction problems, Journal
  of Basic Engineering, 82 (1), 35–45.

Kantorovich, L. V. (1942), On the translocation of masses, in Dokl. Akad. Nauk. USSR (NS),
  vol. 37, pp. 199–201.

Kollat, J., P. Reed, and D. Rizzo (2008), Addressing model bias and uncertainty in three
  dimensional groundwater transport forecasts for a physical aquifer experiment, Geophysical
  Research Letters, 35 (17).

Kullback, S., and R. A. Leibler (1951), On information and sufficiency, The Annals of
  Mathematical Statistics, 22 (1), 79–86.

Kutta, W. (1901), Beitrag zur näherungsweisen Integration totaler Differentialgleichungen,
  Z. Math. Phys., 46, 435–453.

Lauritzen, S. L. (1987), Statistical manifolds, Differential geometry in statistical inference,
  10, 163–216.

Le Coz, C., A. Heemink, M. Verlaan, M.-C. ten Veldhuis, and N. van de Giesen (2019),
  Correcting position error in precipitation data using image morphing, Remote Sensing,
  11 (21), 2557.

Le Dimet, F.-X., and O. Talagrand (1986), Variational algorithms for analysis and
  assimilation of meteorological observations: Theoretical aspects, Tellus A: Dynamic
  Meteorology and Oceanography, 38 (2), 97–110.

Lin, L.-F., A. M. Ebtehaj, A. N. Flores, S. Bastola, and R. L. Bras (2017), Combined
  assimilation of satellite precipitation and soil moisture: A case study using TRMM and
  SMOS data, Monthly Weather Review, 145 (12), 4997–5014, 10.1175/MWR-D-17-0125.1.

Lorenc, A. C. (1986), Analysis methods for numerical weather prediction, Quarterly Journal
  of the Royal Meteorological Society, 112 (474), 1177–1194.

Lorenz, E. N. (1963), Deterministic nonperiodic flow, Journal of the Atmospheric Sciences,
  20 (2), 130–141.

Mandel, J., and J. D. Beezley (2009), An ensemble Kalman-particle predictor-corrector filter
  for non-Gaussian data assimilation, in International Conference on Computational Science,
  pp. 470–478, Springer.

McCann, R. J. (1997), A convexity principle for interacting gases, Advances in Mathematics,
  128 (1), 153–179.

Miller, R. N., M. Ghil, and F. Gauthiez (1994), Advanced data assimilation in strongly
  nonlinear dynamical systems, Journal of the Atmospheric Sciences, 51 (8), 1037–1056.

Monge, G. (1781), Mémoire sur la théorie des déblais et des remblais, Histoire de l’Académie
  Royale des Sciences de Paris.

Moradkhani, H., S. Sorooshian, H. V. Gupta, and P. R. Houser (2005), Dual state–parameter
  estimation of hydrological models using ensemble Kalman filter, Advances in Water
  Resources, 28 (2), 135–147.

Nakano, S., G. Ueno, and T. Higuchi (2007), Merging particle filter for sequential data
  assimilation, Nonlinear Processes in Geophysics, 10.5194/npg-14-395-2007.

Nash, J. C. (1990), Compact numerical methods for computers: Linear algebra and function
  minimisation, CRC Press.

Ning, L., F. P. Carli, A. M. Ebtehaj, E. Foufoula-Georgiou, and T. T. Georgiou (2014),
  Coping with model error in variational data assimilation using optimal mass transport,
  Water Resources Research, 50 (7), 5817–5830.

Orlin, J. B. (1993), A faster strongly polynomial minimum cost flow algorithm, Operations
  Research, 41 (2), 338–350.

Otto, F. (2001), The geometry of dissipative evolution equations: The porous medium equa-
  tion, Communications in Partial Differential Equations, 10.1081/PDE-100002243.

Park, S. K., and D. Županski (2003), Four-dimensional variational data assimilation for
  mesoscale and storm-scale applications, Meteorology and Atmospheric Physics, 82 (1-4),
  173–208.

Pennec, X. (2006), Intrinsic statistics on Riemannian manifolds: Basic tools for geometric
  measurements, Journal of Mathematical Imaging and Vision, 10.1007/s10851-006-6228-4.

Peyré, G., and M. Cuturi (2019), Computational optimal transport, Foundations and
  Trends in Machine Learning, 11 (5-6), 355–607.

Pires, C., R. Vautard, and O. Talagrand (1996), On extending the limits of variational
  assimilation in nonlinear chaotic systems, Tellus A, 48 (1), 96–121.
