Some Statistic and Information-theoretic Results On Arithmetic Average Fusion

Tiancheng Li

arXiv:2110.01440v1 [math.ST] 1 Oct 2021

Manuscript first submitted on 1st Oct. 2021. This work was partially supported by the National Natural Science Foundation of China under grant 62071389. T. Li is with the Key Laboratory of Information Fusion Technology (Ministry of Education), School of Automation, Northwestern Polytechnical University, Xi'an 710129, China, e-mail: t.c.li@nwpu.edu.cn

Abstract—Finite mixture such as the Gaussian mixture is a flexible and powerful probabilistic modeling tool for representing the multimodal distributions widely involved in many estimation and learning problems. At its core is the representation of the target distribution by the arithmetic average (AA) of a finite number of sub-distributions which constitute the mixture. The AA fusion has demonstrated compelling performance for both single-sensor and multi-sensor estimator design. In this paper, some statistic and information-theoretic results are given on the AA fusion approach, including its covariance consistency, mean square error, mode-preservation capacity, mixture information divergence and principles for fusing/mixing weight design. In particular, based on the concept of conservative fusion, the relationship of the AA fusion with the existing covariance union, covariance intersection and diffusion combination approaches is exposed. Linear Gaussian models are considered for illustration.

Index Terms—Finite mixture, multi-sensor fusion, distributed filter, conservative estimation, average consensus, covariance intersection, diffusion, covariance union

I. INTRODUCTION

The last two decades have witnessed a steady uptick in the application of information fusion technologies to the state estimation problem, which has burgeoned with the vitalization of networked sensors/agents [1]–[4] and has seen substantial interest in both military and commercial realms. One of the most fundamental fusion approaches is the linear opinion pool, which simply mixes all information into one entity in a linear manner. This results in a finite mixture representation of the underlying state probability distribution [5], such as the popular Gaussian mixture (GM). Recently, it has been further shown that the linear opinion pool provides a compelling approach, referred to as arithmetic average (AA) fusion, to multi-sensor random set information fusion [6]–[17], which enjoys high efficiency in computation, resilience to sensor fault (such as misdetection), tolerance to internode correlation of any degree, and closure for mixture fusion.

Two common types of data to be fused are variables and probability distributions/functions, typically the estimated number of targets and the posterior distribution of the target states, hereafter referred to as v-fusion and f-fusion [18], respectively. The averaging operation is significantly different in the two types of fusion: the average of multiple variables is simply a single variable, while the average of distributions of the same family is usually a mixture of distributions of that family. That is, the AA-based f-fusion always leads to a mixture. In the mixture distribution, components/mixands are properly weighted and correspond to the information gained from different fusing sources. They jointly approximate the target distribution p(X) by their average/AA:

    f_AA(X) = Σ_{i∈I} w_i f_i(X),    (1)

where X stands for the state of a single or multiple target(s), w = [w_1, w_2, ...]^T are positive mixing/fusing weights which are typically normalized, namely w^T 1 = 1, and f_i(X), i ∈ I = {1, 2, ...} are the probability distributions, e.g., probability density functions (PDFs) and probability hypothesis density functions [6]–[8], [19], [20], regarding the same target(s) yielded by a set of estimators i ∈ I conditioned on different data, models and/or hypotheses.
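As a minimal numerical sketch of the AA f-fusion (1) (an illustrative Python snippet, not part of the paper; the two Gaussian components and the weights are arbitrary assumptions), the fused density is simply the pointwise weighted sum of the fusing densities and in general remains multimodal:

```python
import numpy as np

def gauss_pdf(x, mu, var):
    """Evaluate the univariate Gaussian density N(x; mu, var) on a grid."""
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

# Two fusing densities and normalized weights (arbitrary illustrative values)
x = np.linspace(-10.0, 10.0, 2001)
w = np.array([0.4, 0.6])                  # positive, w^T 1 = 1
f1 = gauss_pdf(x, -2.0, 1.0)
f2 = gauss_pdf(x, 3.0, 2.0)

# AA f-fusion (1): the pointwise weighted sum, i.e., a Gaussian mixture
f_aa = w[0] * f1 + w[1] * f2

print("integral of f_AA ~", np.trapz(f_aa, x))  # ~1.0: the average is itself a density
```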
The mixture distribution greatly facilitates the closed-form Markov–Bayesian recursion in two respects: first, a mixture of conjugate priors is also conjugate and can approximate any kind of prior [21], [22]; second, the linear fusion of a finite number of mixtures of the same parametric family remains a mixture of that family. These properties play a key role in mixture filters such as the GM filter [23], [24], the Student's-t mixture filter [25] and multi-Bernoulli mixture filters of various forms [19], [26], [27]. The AA fusion has demonstrated outstanding performance in many challenging scenarios [6]–[8], [11]–[17]. Nevertheless, a statistical and information-theoretic study of the mixture/AA fusion of probability distributions (of the same family or not) is still missing in two aspects, which motivates this paper.

1) First, while the concept of conservative estimation and fusion has been well accepted, the AA fusion results in an inflated covariance, which seems at variance with the minimum-variance estimator/fusion. This unavoidably raises a concern about decreased accuracy if one simply equates the inflated variance with an increased mean square error (MSE) in the context of fusion. In this paper we clarify their major difference and provide an in-depth analysis of the covariance consistency and the "mode-preservation" feature of the AA fusion.

2) Second, we analyze how the AA fusion compares with the fusing estimators and how the fusing weights should be designed in order to maximize the fusion gain. By these theoretical studies, the use of the AA fusion approach for multi-sensor estimator design is better motivated.
The remainder of this paper is organized as follows. In Sec. II, we revisit and compare several classic conservative estimation approaches and clarify the difference between the covariance and the MSE of an estimator. Moreover, as a unique feature of the AA fusion, it adaptively switches between mixing and merging the fusing components according to need and can thereby preserve the mode information. This leads to a fault-tolerant capacity to deal with inconsistent estimators. In Sec. III, we study the exact divergence of the mixture from the true/target distribution, providing an information-theoretic justification for the AA fusion. Then, we discuss principles for fusing weight design. Some of the results are not limited to linear Gaussian models, but we use linear Gaussian models for illustration. We briefly conclude in Sec. IV.

II. CONSERVATIVE FUSION AND STATISTICS OF AA

In the context of time-series state estimation, optimality is usually sought in the sense of minimum mean square error (MMSE), maximum a posteriori (MAP) or minimized Bayes risk [28], resulting in different classes of estimators: the MMSE point estimator and the Bayes-optimal density estimator. There is a key difference between the two optimality criteria: the former relies on the statistics of the estimator such as the mean and square error [29], and the latter on the overall quality of the posterior, for which a proper distribution-based metric such as the Kullback–Leibler (KL) divergence is useful.

The very nature of optimal fusion, whether in the MMSE or the Bayes sense, needs knowledge of the cross-correlation or dependence among the information sources [29]–[32]. However, this often turns out to be impractical due to the complicated correlation and unknown common information among sensors/agents. This leads to suboptimal, covariance-consistent solutions including the AA fusion.
A. Notations, Concepts and Definitions

Note that the mean of a multitarget distribution corresponds to the state of no target, while its variance relies not only on the estimation uncertainty but also on the distance among targets. Therefore, statistics such as the mean and variance do not apply to the multi-target estimator. In this section, we limit the statistical analysis to a single target only.

In the following, we use x ∈ R^{n_x} to denote the n_x-dimensional state of the concerned single target, which is the quantity to be estimated, namely the true state. It can be a non-random/deterministic or a random vector. When it is a random vector, we use p(x) to denote the corresponding PDF, namely the true distribution of x. For a given Bayesian posterior f(x), which is an estimate of the real distribution p(x), the two most common state estimate extraction approaches, namely the expected a posteriori (EAP) and MAP estimators, are given as follows:

    x̂_EAP = ∫ x̃ f(x̃) dx̃,    (2)
    x̂_MAP = arg max_{x̃} f(x̃).    (3)

That is, in the EAP estimator the state estimate is given as the mean of the posterior, while in the MAP estimator the state estimate is given as the peak of the posterior distribution.
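The following snippet (illustrative Python, not from the paper; the bimodal posterior is an assumed example) extracts both estimates (2) and (3) from a grid-evaluated posterior:

```python
import numpy as np

def gauss_pdf(x, mu, var):
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

# An assumed bimodal posterior f(x), evaluated on a grid
x = np.linspace(-10.0, 10.0, 4001)
f = 0.4 * gauss_pdf(x, -2.0, 1.0) + 0.6 * gauss_pdf(x, 3.0, 2.0)

x_eap = np.trapz(x * f, x)      # (2): mean of the posterior
x_map = x[np.argmax(f)]         # (3): peak of the posterior (grid-level MAP)

print(f"EAP = {x_eap:.3f}, MAP = {x_map:.3f}")  # the two differ for a multimodal posterior
```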
We consider a set of estimate pairs composed of a state estimate x̂_i and the associated positive definite error covariance matrix P_i, i ∈ I = {1, 2, 3, ...}, which are to be fused using positive, normalized weights w ≜ {w_1, w_2, ...}, where w_i > 0, Σ_{i∈I} w_i = 1. Hereafter, x̂_i is given by the EAP estimator unless otherwise stated. Then, each estimate pair corresponds to the first and second moments of a posterior PDF f_i(x) that is an estimate of the true distribution p(x), i.e., x̂_i = ∫ x̃ f_i(x̃) dx̃, P_i = ∫ (x̃ − x̂_i)(·)^T f_i(x̃) dx̃. In particular, a Gaussian PDF can be uniquely determined by an estimate pair. Fusing in terms of only the mean and variance implicitly imposes a linear Gaussian model assumption. We use the shorthand (x − y)(·)^T := (x − y)(x − y)^T. The MSE of x̂_i is denoted as MSE_{x̂_i} = E[(x − x̂_i)(·)^T]. The KL divergence of the probability distribution p(x) relative to f(x) is given as D_KL(f‖p) = ∫ f(x) log (f(x)/p(x)) dx.

Remark 1. P_i indicates the variance of the state estimate x̂_i, which can be interpreted as the level of uncertainty of estimator i about its state estimate x̂_i. It is, however, not an estimate/approximation of MSE_{x̂_i}.

Definition 1 (conservative). An estimate pair (x̂, P_x̂) regarding the real state x is conservative [33]–[36] when

    P_x̂ ≥ MSE_x̂,

i.e., P_x̂ − E[(x − x̂)(·)^T] is positive semi-definite.

The notion is also referred to as pessimistic definite [37] or covariance consistent. Extended definitions of the conservativeness of PDFs can be found in [38], [39]. A relevant notion of "informative" is given as follows.

Definition 2 (informative). An estimate pair (x̂_1, P_1) is said to be more informative than (x̂_2, P_2) regarding the same state x when P_1 < P_2.
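Definition 1 is directly checkable given the two matrices. A minimal Python sketch (with hypothetical example matrices, not taken from the paper) tests positive semi-definiteness via the eigenvalues of the difference:

```python
import numpy as np

def is_conservative(P, mse, tol=1e-9):
    """Definition 1: (x_hat, P) is conservative if P - MSE is positive semi-definite."""
    return bool(np.all(np.linalg.eigvalsh(P - mse) >= -tol))

# Hypothetical 2-D example: reported covariance vs. actual MSE matrix
P   = np.array([[2.0, 0.3], [0.3, 1.5]])
mse = np.array([[1.2, 0.1], [0.1, 1.0]])
print(is_conservative(P, mse))  # True: the estimator does not over-claim confidence
```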
With respect to the type of data, there are two forms of AA fusion:

• In a point estimation problem, the AA v-fusion is carried out with regard to the state estimate variables x̂_i, i ∈ I, which yields a new variable:

    x̂_AA = Σ_{i∈I} w_i x̂_i.    (4)

• In the Bayesian formulation, the estimation problem is to find a distribution that best fits p(x). The corresponding AA f-fusion is carried out with regard to f_i(x), i ∈ I, which yields a mixture distribution of these fusing distributions:

    f_AA(x) = Σ_{i∈I} w_i f_i(x).    (5)

In either way, as long as the fused data can be split into a non-common part and a common part, depending on whether they are shared by all the fusing parties, it is obvious that the AA fusion avoids double-counting the common data as long as the fusing weights are normalized. This is the key to dealing with common a priori information [30], [31] and data incest [40] when a sensor network is employed, where some information can easily be replicated and repeatedly transmitted to the local sensors. Meanwhile, non-common information will be diluted in the result: counted less than a unit [16], which
implies an over-conservativeness. In other words, in order to avoid information double-counting [41], the AA fusion treats the unknown correlation as the worst case: all fusing data are identical. In this paper, we refer to this worst-case handling ability loosely as robustness, and so the AA fusion can be deemed a robust fusion approach.

B. Conservative Fusion: CU versus AA

Based on the concept of conservativeness, there are a number of results as given in the following lemmas.

Lemma 1. A sufficient condition for the fused estimate pair (x̂_AA, P_CU), in which at least one fusing pair is unbiased and conservative, to be conservative is that

    P_CU ≥ P_i + (x̂_AA − x̂_i)(·)^T, ∀i ∈ I,    (6)

which is upper bounded by

    P_uCU = max_{i∈I} [P_i + (x̂_AA − x̂_i)(·)^T].    (7)
The proof of Lemma 1 can be found in [7]. It actually provides a conservative fusion method known as covariance union (CU) [34], [42], [43]. It is fault tolerant as it preserves covariance consistency as long as at least one fusing estimator is consistent. When all fusing estimators are conservative, i.e., P_i ≥ MSE_{x̂_i}, ∀i ∈ I, a more informative estimator can be obtained by the following lemma.

Lemma 2. For a set of conservative estimate pairs (x̂_i, P_i), i ∈ I = {1, 2, ...}, a sufficient condition for the fused estimate pair (x̂_AA, P_lCU) to be conservative is given by

    P_lCU ≥ min_{i∈I} [P_i + (x̂_AA − x̂_i)(·)^T].    (8)

When the local estimation is given in terms of a Bayesian posterior f_i(x) with the corresponding state estimate x̂_i and associated positive definite error covariance matrix P_i, i ∈ I, the AA fusion has the following result.

Lemma 3. The AA f-fusion (5) results in a mixture distribution for which the mean and covariance are respectively given by (4) and

    P_AA = Σ_{i∈I} w_i P̃_i,    (9)

where the adjusted covariance matrix is given by

    P̃_i = P_i + (x̂_AA − x̂_i)(·)^T.    (10)

Proof. First, x̂_AA = ∫ x̃ f_AA(x̃) dx̃ = ∫ x̃ Σ_{i∈I} w_i f_i(x̃) dx̃ = Σ_{i∈I} w_i x̂_i. The proof of (9) for fusing two Gaussian distributions can be found in Appendix B of [16], and it can easily be extended to any finite number of fusing distributions. Appendix B of [16] further showed that the above results x̂_AA and P_AA correspond to the moments of the merged single Gaussian distribution obtained by applying the merging approach [44] to all Gaussian distributions in the mixture.

In fact, the Gaussian PDF that best fits the AA mixture has the same first and second moments [45, Theorem 2], i.e.,

    (x̂_AA, P_AA) = arg min_{(µ,P)} D_KL(f_AA ‖ N(µ, P)).    (11)

According to the expressions (7), (8) and (9), we have a conservative fusion chain:

    P_Naive < min_{i∈I}(P_i) ≤ P_lCU ≤ P_AA ≤ P_uCU,    (12)

where min_{i∈I}(P_i) = P_lCU holds if and only if (iff) x̂_AA = x̂_i in (8), P_lCU = P_AA = P_uCU holds iff P̃_i = P̃_j, ∀i ≠ j, and

    P_Naive = (Σ_{i∈I} P_i^{−1})^{−1},    (13)

which corresponds to the covariance of the multiplied Gaussian distributions, i.e., N(µ_Naive, P_Naive) ∝ Π_{i∈I} N(µ_i, P_i).

The above chain implies that the AA fusion trades off between being conservative and being informative, lying in the middle of the two conditions: (i) all fusing estimators are conservative and (ii) at least one fusing estimator is conservative.
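To make the chain (12) concrete, here is a small self-contained Python check (not from the paper; scalar covariances and example values are assumed so that the positive semi-definite ordering reduces to ordinary inequalities):

```python
import numpy as np

# Two scalar estimate pairs (illustrative values) and normalized weights
x_hat = np.array([0.0, 2.0])
P     = np.array([1.0, 2.0])
w     = np.array([0.5, 0.5])

x_aa  = w @ x_hat                         # (4): AA v-fusion
P_tld = P + (x_aa - x_hat) ** 2           # (10): adjusted covariances
P_aa  = w @ P_tld                         # (9): AA mixture covariance
P_ucu = np.max(P_tld)                     # (7): CU upper bound
P_lcu = np.min(P_tld)                     # (8): lower CU
P_nai = 1.0 / np.sum(1.0 / P)             # (13): naive (product) fusion

# Conservative fusion chain (12), scalar case
assert P_nai < min(P) <= P_lcu <= P_aa <= P_ucu
print(P_nai, min(P), P_lcu, P_aa, P_ucu)
```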
C. Conservative Fusion: More or Less Conservative

In contrast to the AA fusion (5), the geometric average (GA) of the fusing sub-PDFs f_i(x) is given as

    f_GA(x) = C^{−1} Π_{i∈I} f_i^{w_i}(x),    (14)

where C := ∫ Π_{i∈I} f_i^{w_i}(x) dx is the normalization constant. In the linear Gaussian case, with respect to the estimate pairs (x̂_i, P_i), i ∈ I, it is given as follows:

    x̂_GA(w) = P_GA Σ_{i∈I} w_i P_i^{−1} x̂_i,    (15)
    P_GA(w) = (Σ_{i∈I} w_i P_i^{−1})^{−1}.    (16)

As a special case of the GA fusion, the covariance intersection (CI) fusion optimizes the fusing weights as follows:

    w_CI = arg min_w Tr(P_GA),    (17)

where Tr(P) calculates the trace (or the determinant) of the matrix P. This indicates x̂_CI = x̂_GA(w_CI), P_CI = P_GA(w_CI). A variety of strategies and approaches have been reported for further reducing the error covariance metric, leading to various CI-like fusion approaches such as the so-called split-CI [46] / bounded covariance inflation [47], ellipsoidal intersection [48], and inverse CI (ICI) [49], [50].
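The following Python sketch (illustrative, with assumed two-sensor values; a simple grid search stands in for a dedicated optimizer) implements the GA fusion (15)-(16) and the CI weight choice (17):

```python
import numpy as np

def ga_fuse(x_hats, Ps, w):
    """(15)-(16): information-weighted (GA) fusion of Gaussian estimate pairs."""
    info = sum(wi * np.linalg.inv(Pi) for wi, Pi in zip(w, Ps))
    P_ga = np.linalg.inv(info)
    x_ga = P_ga @ sum(wi * np.linalg.inv(Pi) @ xi
                      for wi, xi, Pi in zip(w, x_hats, Ps))
    return x_ga, P_ga

x_hats = [np.array([0.0, 0.0]), np.array([1.0, 1.0])]
Ps     = [np.diag([1.0, 4.0]), np.diag([3.0, 2.0])]

# (17): CI picks the weight minimizing the trace of P_GA (grid search over w1)
grid = np.linspace(1e-3, 1 - 1e-3, 999)
w1 = min(grid, key=lambda a: np.trace(ga_fuse(x_hats, Ps, [a, 1 - a])[1]))
x_ci, P_ci = ga_fuse(x_hats, Ps, [w1, 1 - w1])
print("w_CI =", round(w1, 3), " trace(P_CI) =", round(np.trace(P_ci), 3))
```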
It is worth noting that convex combinations were widely and early considered in the context of adaptive filtering [51]–[54] and of Kalman filters [55], referred to as diffusion. That is, the diffusion combination operation is actually the AA fusion of merely the point state estimates as in (4), without any adjustment of the local error covariance matrices. Instead of using any of the above adjusted/fused covariances, the locally estimated error covariance matrix does not change in the diffusion combination operation at each local node i, i.e., P_diff = P_i ≥ P_CI.

In contrast to this trend, it is our observation that the GA fusion is often not overly conservative but rather insufficiently so in cluttered scenarios, which may suffer from out-of-sequence measurements [42] or spurious data [43] and model mismatching. The
reason is simply that the fused covariance P_CI does not take into account x̂_CI or any fusing state estimate x̂_i, i ∈ I, as both the AA fusion and the CU fusion do, not to mention any higher-order moments of the posterior distribution. If any fusing estimator is covariance-inconsistent, the GA fusion will very likely be inconsistent. In this case, a more conservative fusion approach is preferable, and we have another conservative fusion chain:

    P_Naive < P_ICI < P_CI ≤ P_diff ≤ P_AA, P_FFCC,    (18)

where P_diff = P_CI = P_AA holds iff all fusing estimators are identical, and P_CI = P_FFCC holds iff δ = 1.
D. State Estimate MSE of AA f-Fusion

There is a significant difference between EAP and MAP: only when the EAP estimator is implemented will the AA f-fusion (5) lead to the AA v-fusion (4), i.e.,

    x̂_AA^EAP = ∫ x̃ f_AA(x̃) dx̃ = Σ_{i∈I} w_i ∫ x̃ f_i(x̃) dx̃ = Σ_{i∈I} w_i x̂_i^EAP.    (23)

Proof. The proof is straightforward as follows:

    MSE_{x̂_AA} = E[(x − Σ_{i∈I} w_i x̂_i)(·)^T]
               = Σ_{i∈I} w_i² MSE_{x̂_i} + Σ_{i<j} 2 w_i w_j Cov(x̂_i, x̂_j),    (24)

where Cov(x̂_i, x̂_j) denotes the cross-correlation between the errors of x̂_i and x̂_j, and w_i > 0, ∀i ∈ I, Σ_{i∈I} w_i = 1.
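As a quick Monte Carlo illustration of (24) (an illustrative Python sketch under assumed conditions: scalar unbiased estimators, mutually independent, identical MSE, uniform weights), the fused MSE falls to roughly 1/|I| of the individual MSE:

```python
import numpy as np

rng = np.random.default_rng(0)
x_true, n_est, n_mc = 1.5, 4, 200_000

# |I| independent unbiased estimators with identical MSE (variance 1)
errs  = rng.normal(0.0, 1.0, size=(n_mc, n_est))  # estimation errors x - x_hat_i
x_hat = x_true - errs
x_aa  = x_hat.mean(axis=1)                        # (4) with uniform weights w_i = 1/|I|

mse_i  = np.mean(errs[:, 0] ** 2)                 # ~1
mse_aa = np.mean((x_true - x_aa) ** 2)            # ~1/|I|, cf. (24) with zero Cov terms
print(f"MSE_i ~ {mse_i:.3f}, MSE_AA ~ {mse_aa:.3f}, ratio ~ {mse_aa / mse_i:.3f}")
```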
Remark 2. For independent estimators of equivalent covariance P_i = P_j, ∀i ≠ j ∈ I, the optimal fusion is the aforementioned naive fusion, which yields the same fused state estimate x̂_Naive = x̂_AA but a smaller covariance than the AA fusion, except when all fusing estimators are identical. That is, the state estimate of the AA fusion in this case is optimal, but it is not confident and conservatively uses an inflated covariance.

When MSE_{x̂_i} = MSE_{x̂_j}, w_i = w_j, ∀i ≠ j ∈ I, (25) reduces to MSE_{x̂_AA} = (1/|I|) MSE_{x̂_i}, which indicates that the AA fusion can significantly benefit in gaining lower MSE in the case that all fusing estimators have equivalent MSEs and are conditionally independent of each other, no matter what their respective associated error covariances are. In fact, if the fusing estimators are overall negatively correlated, namely Σ_{i<j} w_i w_j Cov(x̂_i, x̂_j) < 0, the fused MSE will be even lower.
Fig. 1. Fusing two Gaussian densities having four different levels of divergence, using naive fusion, GA fusion, AA fusion or CU fusion, all with the same fusing weights. (a): the two densities overlap largely and both estimators are likely to be conservative. (b): the two densities are offset from each other but still overlap somewhat. (c) and (d): the two densities are greatly offset from each other and at most one estimator is conservative. Note: when the fusing weights are optimized as in (17), the GA fusion reduces to the CI fusion, which has a smaller covariance. The naive AA fusion is given as the union of the two re-weighted fusing densities, which may not be merged into one in practice (especially when they are dissimilar/divergent, as in (c)); this is the unique feature of the AA fusion, which does not always merge pieces of information into one. Shown in blue is the merged result, which is only reasonable when the components are close/similar, as in (a), but not in (d).
Fig. 1 compares the fusion of two Gaussian densities using seven schemes: naive fusion (i.e., (13), multiplying the two densities), un-weighted GA fusion (i.e., (15) and (16) using w_1 = w_2 = 0.5), CI fusion (i.e., (15) and (16) using the weight optimized as in (17), which results in w_1 = 0.3764, w_2 = 0.6236 in this case), un-weighted AA fusion (i.e., (4) and (9) using w_1 = w_2 = 0.5) with and without component merging, and two versions of the CU fusion with the fused covariance given as in (7) (referred to as CU max) and in (8) (referred to as CU min), respectively. These results confirm the two conservative fusion chains (12) and (18): first, all these conservative fusion approaches lead to obvious covariance inflation as compared with the naive fusion; second, the CU and AA fusion show more or less greater inflation than the GA/CI fusion.
In all four scenarios, without knowledge of the true target position, we cannot tell whether any of the seven fusion schemes is better than the others, no matter whether the state estimate is given by the EAP or the MAP. Even in scenario (a), there is no guarantee that the target is localized in the intersection of P1 and P2; if it is not, then both the naive fusion and the GA/CI fusion will likely produce incorrect results. In scenarios (c) and (d), at least one of the local densities to be fused (P1 or P2) is inconsistent in the sense that there is a large offset from the target position, whatever it may be.¹ In these cases, the AA fusion opts not to merge the two components into one, to avoid producing incorrect results. This demonstrates a unique feature of the AA fusion: when the fusing distributions diverge significantly from each other (such as in (d)), it will not merge them but only re-weight them and preserve the original modes; so it is done in existing AA fusion approaches, which outperform their competitors in various scenarios [6]–[17]. We refer to this as the mode-preservation capacity of the AA fusion.

¹This is common in mixture-type filters, since at any specific time instant the centers of most components do not necessarily conform to the state of any target, even though the mixture filter might be statistically unbiased. This can be easily illustrated with the GM-based random finite set filters, for which Matlab codes are available at http://ba-tuong.vo-au.com/codes.html

There is a theoretical explanation. Recall that the AA and GA fusion rules symmetrically minimize the weighted sum of the directional KL divergences between the fusing probability distributions and the fused result as follows [16], [58], [59]:

    f_AA(x) = arg min_g Σ_{i∈I} w_i D_KL(f_i ‖ g),    (27)
    f_GA(x) = arg min_{g: ∫_χ g(x)δx=1} Σ_{i∈I} w_i D_KL(g ‖ f_i).    (28)

It has been pointed out in [60] that the forward KL divergence as in (28), but not (27), has a tendency towards the merging operation no matter how separated the components are; see the illustrative examples studied therein. Such a tendency may be preferable in applications such as MMSE-based estimation, namely (2), but may lead to a loss of important details of the mixture, e.g., the mode, which is less desirable in applications such as MAP estimation, namely (3). This explains the advantage of the AA fusion in dealing with inconsistent fusing estimators, namely being "fault-tolerant". Arguably, when component merging is adaptively applied with the AA fusion and the state estimates are extracted in the MAP mode, they jointly perform equivalently to a scheme that adaptively switches between EAP and MAP for the state estimate output. This can be illustrated as in Fig. 2.

Fig. 2. AA fusion with adaptive mixture merging and MAP performs equivalently to an estimator that adaptively switches between EAP and MAP.

There are other reasons why a local fusing estimator may be grossly incorrect, including supernormal noise, missed detection and false alarms at the respective sensor, while the other fusing estimators may still provide an accurate representation of the target state. At any time instant, it is also possible that different local estimators provide accurate representations of different targets. In these cases, the AA fusion can be expected to compensate for the effects of local misdetection, false alarms and noise through averaging, and it tends to be more robust and accurate, in the sense that its mode-preservation feature makes better use of the correct parts of the information provided by the fusing estimators.
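The mode-preservation effect is easy to reproduce numerically. In this illustrative Python sketch (not from the paper; two well-separated components are assumed, as in scenario (d)), the un-weighted GA collapses to a single middle mode while the AA keeps both:

```python
import numpy as np

def gauss_pdf(x, mu, var):
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

x = np.linspace(-10.0, 10.0, 4001)
f1, f2 = gauss_pdf(x, -4.0, 1.0), gauss_pdf(x, 4.0, 1.0)  # greatly offset components

f_aa = 0.5 * f1 + 0.5 * f2          # AA (5): re-weights, keeps both modes
f_ga = np.sqrt(f1 * f2)             # un-weighted GA (14) before normalization
f_ga /= np.trapz(f_ga, x)           # normalize

def n_modes(f):
    """Count strict local maxima of a grid-evaluated density."""
    return int(np.sum((f[1:-1] > f[:-2]) & (f[1:-1] > f[2:])))

print("modes: AA =", n_modes(f_aa), ", GA =", n_modes(f_ga))  # AA = 2, GA = 1
```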
Remark 3. In the sequential filtering problem, new data will help identify which component is correct and which is false, the latter of which can then be pruned. That is, the AA fusion does not forcefully merge/prune conflicting information but leaves the decision to the new data.
III. INFORMATION DIVERGENCE IN THE BAYESIAN VIEWPOINT

In the Bayesian formulation, the real state is considered random and the Bayesian posterior is given as an estimate of the true distribution p(x) of x (or p(X) of a multi-target set X). In this section we use the KL divergence to measure the quality of a distribution f(x) with regard to the real state distribution p(x).

A. Mixture Divergence

Lemma 5. For a number of probability distributions f_i(x), i ∈ I, the KL divergence of the target distribution p(x) relative to their average f_AA(x) is given as

    D_KL(f_AA ‖ p) = Σ_{i∈I} w_i [D_KL(f_i ‖ p) − D_KL(f_i ‖ f_AA)]    (29)
                   ≤ Σ_{i∈I} w_i D_KL(f_i ‖ p),    (30)

where the equality in (30) holds iff all fusing sub-distributions f_i, i ∈ I, are identical.

Proof. The proof is straightforward and was independently found in [61]. Similar results have been given earlier in the textbook [62, Theorem 4.3.2]. Noticing that the KL divergence function is convex, this can also be proved by using Jensen's inequality [63, Ch. 2.6].

Lemma 5 indicates that the average of the mixture fits the target distribution better than the component sub-distributions do on average. This therefore provides an information-theoretic justification for the AA fusion. Optimized mixing weights will accentuate the benefit of fusion.
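The identity (29) can be checked numerically by grid integration; the following Python sketch uses assumed 1-D Gaussian examples for the fusing densities and the target density:

```python
import numpy as np

def gauss_pdf(x, mu, var):
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

def kl(f, g, x):
    """Grid approximation of D_KL(f || g) = int f log(f/g)."""
    return np.trapz(f * np.log(f / g), x)

x = np.linspace(-25.0, 25.0, 20001)
w = np.array([0.3, 0.7])
f1, f2 = gauss_pdf(x, -1.0, 1.0), gauss_pdf(x, 2.0, 2.0)  # fusing densities
p      = gauss_pdf(x, 0.5, 1.5)                           # assumed target density
f_aa   = w[0] * f1 + w[1] * f2

lhs = kl(f_aa, p, x)
rhs = sum(wi * (kl(fi, p, x) - kl(fi, f_aa, x)) for wi, fi in zip(w, (f1, f2)))
print(f"D(f_AA||p) = {lhs:.6f},  RHS of (29) = {rhs:.6f}")  # agree to grid accuracy
```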
B. Fusing Weight

The naive weighting solution is the normalized uniform weights [64], [65], namely w = 1/S, with S the number of fusing sources. That is, all fusing estimators are treated equally, which makes sense for fusing information from homogeneous sources. This is simple but does not distinguish online the information of high quality from that of low quality at any particular time. That being said, it does not necessarily mean a worse performance as compared with online-tuned fusing weights [66].

More convincingly, the optimal solution should minimize D_KL(f_AA ‖ p) in order to best fit the target distribution, i.e.,

    w_opt = arg min_w Σ_{i∈I} w_i [D_KL(f_i ‖ p) − D_KL(f_i ‖ f_AA)].    (31)

As shown above, a component that fits the target distribution better (corresponding to a smaller D_KL(f_i ‖ p)) and diverges more from the average (corresponding to a greater D_KL(f_i ‖ f_AA)) will be assigned a greater fusing weight. However, the true/target distribution p(x) is always unknown, and so is D_KL(f_i ‖ p). It is also obvious that even if the true distribution p(x) were available, the knowledge D_KL(f_i ‖ p) < D_KL(f_j ‖ p), ∀j ≠ i, does not necessarily result in w_i = 1, w_j = 0, ∀j ≠ i; it also depends on D_KL(f_i ‖ f_j), ∀j ≠ i.

Remark 4. As long as at least two fusing estimators are consistent, the optimal fusion will not be fully dominated by any one component. In other words, when the fusing weights are properly designed, the average of the mixture may fit the target distribution better than the best component.

A simplified alternative ignores the former part of (31), which then reduces approximately to the following suboptimal maximization problem:

    w_subopt = arg max_w Σ_{i∈I} w_i D_KL(f_i ‖ f_AA)    (32)
             = arg max_w Σ_{i∈I} w_i [H(f_i, f_AA) − H(f_i)],    (33)

where H(f, g) := −∫ f(x) log g(x) δx is the cross-entropy of the distributions f and g, and H(f) := H(f, f) is the differential (Shannon) entropy of the distribution f(x).

The suboptimal, practically operable optimization given by (32)/(33) assigns a greater fusing weight to the distribution that diverges more from the others. This can be referred to as a diversity-preference solution. An alternative solution is to resort to some functionally similar divergences or metrics to assign higher weights to the components that fit the data better, namely those having a higher likelihood. This likelihood-driven solution is the key idea for weight updating in most mixture models/filters.

Nevertheless, one may design the fusing weights for other purposes; e.g., in the context of seeking consensus over a peer-to-peer network [1], [67], they are typically designed to ensure fast convergence [7], [8], [13], [16].

C. Max-Min Optimization

Recall the divergence minimization (27) that the AA fusion admits [13], [59]. Now, combining (27) with (32) yields a joint optimization of the fusing form and the fusing weights as follows:

    f_AA(w_subopt) = arg max_w min_g Σ_{i∈I} w_i D_KL(f_i ‖ g).    (34)

This variational fusion problem (34) resembles that for the geometric average (GA) fusion [68], [69], i.e.,

    f_GA(w_subopt) = arg max_w min_g Σ_{i∈I} w_i D_KL(g ‖ f_i).    (35)

It has actually been pointed out that the suboptimal fusion results for both variational fusion problems have equal KL divergence from/to the fusing sub-distributions [68]. That is, ∀i ≠ j ∈ I,

    D_KL(f_i ‖ f_AA(w_subopt)) = D_KL(f_j ‖ f_AA(w_subopt)),    (36)
    D_KL(f_GA(w_subopt) ‖ f_i) = D_KL(f_GA(w_subopt) ‖ f_j),    (37)
which implies that the suboptimal, diversity-preference fusion tends to revise all fusing estimators equally, resulting in a middle distribution, where the AA and GA differ from each other due to the asymmetry of the KL divergence.
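Numerically, the diversity-preference weight (32) can be found by a grid search, and the resulting component divergences indeed roughly equalize as in (36). An illustrative Python sketch with assumed 1-D Gaussian components:

```python
import numpy as np

def gauss_pdf(x, mu, var):
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

def kl(f, g, x):
    return np.trapz(f * np.log(f / g), x)

x = np.linspace(-20.0, 20.0, 8001)
f1, f2 = gauss_pdf(x, -2.0, 1.0), gauss_pdf(x, 3.0, 2.0)

def objective(w1):
    f_aa = w1 * f1 + (1 - w1) * f2
    return w1 * kl(f1, f_aa, x) + (1 - w1) * kl(f2, f_aa, x)  # (32)

grid = np.linspace(0.01, 0.99, 99)
w1 = max(grid, key=objective)
f_aa = w1 * f1 + (1 - w1) * f2
# (36): at the diversity-preference optimum the component divergences roughly equalize
print("w1 =", round(w1, 2),
      " D(f1||fAA) =", round(kl(f1, f_aa, x), 3),
      " D(f2||fAA) =", round(kl(f2, f_aa, x), 3))
```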
Remark 5. The above max-min solution is suboptimal, in that it has ignored the minimization over Σ_{i∈I} w_i D_KL(f_i ‖ p) and prefers diversity. The derivation of (37) has been given earlier in [35], [70], which is related to the Chernoff information [63], [71]. A weighted middle is suggested in [72], which assigns different weights to the two sides of (37).
D. Case Study: Gaussian Fusion

We now consider the case of the Gaussian distribution, denoted by N(x; µ, P) with the n_x-dimensional mean vector µ and error covariance matrix P. The probability density function of the n_x-dimensional Gaussian distribution N(x; µ, P) is given by

    N(x; µ, P) = (2π)^{−n_x/2} |P|^{−1/2} exp(−(1/2)(x − µ)^T P^{−1}(x − µ)).

The KL divergence of f_1(x) := N(x; µ_1, P_1) relative to f_2(x) := N(x; µ_2, P_2) is given as

    D_KL(N(x; µ_1, P_1) ‖ N(x; µ_2, P_2))
      = (1/2)[tr(P_2^{−1} P_1) − n_x + log(det(P_2)/det(P_1)) + (µ_1 − µ_2)^T P_2^{−1}(µ_1 − µ_2)].    (38)

The KL divergence between Gaussians follows a relaxed triangle inequality, and a small KL divergence further shows approximate symmetry [73]. However, for the KL divergence between two GMs, there is no such closed-form expression. Beyond the Monte Carlo method [74], a number of approximate, exactly-expressed approaches have been investigated [75]. In the following we consider two alternatives.
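A direct implementation of (38) is straightforward; the snippet below is an illustrative Python sketch with assumed example moments:

```python
import numpy as np

def kl_gauss(mu1, P1, mu2, P2):
    """(38): D_KL( N(mu1, P1) || N(mu2, P2) ) for nx-dimensional Gaussians."""
    nx = mu1.size
    P2_inv = np.linalg.inv(P2)
    d = mu1 - mu2
    return 0.5 * (np.trace(P2_inv @ P1) - nx
                  + np.log(np.linalg.det(P2) / np.linalg.det(P1))
                  + d @ P2_inv @ d)

mu1, P1 = np.array([0.0, 0.0]), np.diag([1.0, 2.0])
mu2, P2 = np.array([1.0, -1.0]), np.diag([2.0, 2.0])
print(kl_gauss(mu1, P1, mu2, P2), kl_gauss(mu2, P2, mu1, P1))  # asymmetric in general
```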
                                                                                                                     
   1) Moment Matching-Based Approximation: The first is                                                                         (43)
                                                                                                 2
merging the mixture to a single Gaussian, or to say, fitting           Combing (41), (42) and (43) leads to
the GM by a single Gaussian. Then, the divergence of two                                                        
GMs or between a Gaussian distribution and a GM can                   DKL (fi kfAA ) = H(fi , fAA ) − H(fi )
be approximated by that between their best-fitting single                                       X             log (2πe)nx |P |
                                                                                                                              i
Gaussian distributions as shown in (38). As given in Lemma                              ≥ − log        wj zi,j −                  .
                                                                                                                          2
3, the moment fitting Gaussian for a GM consisting of a                                          j∈I

number of Gaussian distributions fi (x) = N (x; µi , Pi ), i ∈                                                                         (44)

P is fAA,merged (x) P= N (x; µAA , PAA ), where
I
                                                    T
                                                       µAA =          This is useful for solving the suboptimal optimization prob-
       w
   i∈I i iµ , PAA =        w
                       i∈I i   Pi + (µ AA − µ i )(·)   .             lem (32) as maximizing the lower bounds implies maximizing
   Then, one uses fAA,merged (x) for approximately fitting the       the content. That is,
target Gaussian distribution p(x) = N (x; µ, P), yielding
                                                                                             X
                                                                         wsubopt = arg max       wi DKL (fi kfAA ),
                                                                                         w
                                                                                              i∈I
    wopt ≈ arg min DKL       N (x; µAA , PAA )kN (x; µ, P)                                             "      X              
                w                                                                              X
                   
                                          det(P)                                 ≈ arg min           wi log         wj zi,j
                                                                                         w
         = arg min tr P−1 PAA − nx + log
                              
                                                                                               i∈I            j∈I
                w                        det(PAA )                                                                                #
                                                                                                        log (2πe)nx |Pi |
           + (µAA − µ)T P−1 (µAA − µ) .            (39)                                                +                           .   (45)
                                                                                                                2

   Bounds on the entropy of, and on the KL divergence between, GMs can be found in, e.g., [75], [78]–[82], and can therefore be used in (33) or in (32) for approximate sub-optimization. Of course, other divergences may be considered in lieu of the KL divergence, such as the Jensen-Renyi divergence [83], the integral square error [84], [85], and so on; see [82].
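   Since the KL divergence between two GMs admits no closed form, such bounds and approximations are typically judged against a plain Monte Carlo baseline (cf. [74], [75]). The following is a minimal sketch of that baseline; the two example mixtures are made-up placeholders:

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import multivariate_normal

def gm_logpdf(x, weights, means, covs):
    """Log-density of a Gaussian mixture at sample points x of shape (n, d)."""
    comps = np.stack([multivariate_normal.logpdf(x, m, P)
                      for m, P in zip(means, covs)], axis=-1)
    return logsumexp(comps + np.log(weights), axis=-1)

def gm_sample(rng, n, weights, means, covs):
    """Draw n samples from a Gaussian mixture."""
    idx = rng.choice(len(weights), size=n, p=weights)
    return np.stack([rng.multivariate_normal(means[i], covs[i])
                     for i in idx])

def kl_gm_mc(rng, n, f, g):
    """Monte Carlo estimate of D_KL(f || g): average of log f - log g
    under samples drawn from f; f and g are (weights, means, covs)."""
    x = gm_sample(rng, n, *f)
    return np.mean(gm_logpdf(x, *f) - gm_logpdf(x, *g))

rng = np.random.default_rng(0)
f = ([0.5, 0.5], [np.zeros(2), np.array([2.0, 0.0])], [np.eye(2)] * 2)
g = ([1.0], [np.array([1.0, 0.0])], [2.0 * np.eye(2)])
print(kl_gm_mc(rng, 10_000, f, g))
```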
                            IV. CONCLUSION

   This paper explains how the AA fusion maintains covariance consistency, improves accuracy, and preserves important mode information, how it is information-theoretically sound, and how the fusing weights should be properly designed. Its connection with the classic conservative fusion approaches is exposed. These findings are expected to fill the gap in the theoretical study of this fundamental information fusion approach and to promote its further development.
   It is interesting, albeit challenging, to extend the AA fusion to information fusion regarding state trajectories of different spatial-temporal dimensions, whether based on discrete labels [26], [86] or on continuous-time curves [87], [88].

                              REFERENCES

 [1] R. Olfati-Saber, J. A. Fax, and R. M. Murray, "Consensus and cooperation in networked multi-agent systems," Proc. IEEE, vol. 95, no. 1, pp. 215–233, Jan. 2007.
 [2] A. H. Sayed, "Adaptation, learning, and optimization over networks," Found. Trends Mach. Learn., vol. 7, no. 4-5, pp. 311–801, 2014.
 [3] S. H. Javadi and A. Farina, "Radar networks: A review of features and challenges," Information Fusion, vol. 61, pp. 48–55, 2020.
 [4] K. Da, T. Li, Y. Zhu, H. Fan, and Q. Fu, "Recent advances in multisensor multitarget tracking using random finite set," Front. Inform. Technol. Electron. Eng., vol. 22, no. 1, pp. 5–24, 2021.
 [5] G. McLachlan and D. Peel, Finite Mixture Models, ser. Wiley Series in Probability and Statistics. New York, USA: Wiley, 2000.
 [6] T. Li, J. Corchado, and S. Sun, "On generalized covariance intersection for distributed PHD filtering and a simple but better alternative," in Proc. FUSION 2017, Xi'an, China, Jul. 2017, pp. 808–815.
 [7] ——, "Partial consensus and conservative fusion of Gaussian mixtures for distributed PHD fusion," IEEE Trans. Aerosp. Electron. Syst., vol. 55, no. 5, pp. 2150–2163, Oct. 2019.
 [8] T. Li and F. Hlawatsch, "A distributed particle-PHD filter using arithmetic-average fusion of Gaussian mixture parameters," Information Fusion, vol. 73, pp. 111–124, 2021.
 [9] T. Li, V. Elvira, H. Fan, and J. M. Corchado, "Local-diffusion-based distributed SMC-PHD filtering using sensors with limited sensing range," IEEE Sensors J., vol. 19, no. 4, pp. 1580–1589, Feb. 2019.
[10] A. K. Gostar, R. Hoseinnezhad, and A. Bab-Hadiashar, "Cauchy-Schwarz divergence-based distributed fusion with Poisson random finite sets," in Proc. ICCAIS 2017, Chiang Mai, Thailand, Oct. 2017, pp. 112–116.
[11] H. Kim, K. Granström, L. Gao, G. Battistelli, S. Kim, and H. Wymeersch, "5G mmWave cooperative positioning and mapping using multi-model PHD filter and map fusion," IEEE Trans. Wireless Commun., vol. 19, no. 6, pp. 3782–3795, 2020.
[12] R. K. Ramachandran, N. Fronda, and G. Sukhatme, "Resilience in multi-robot multi-target tracking with unknown number of targets through reconfiguration," IEEE Trans. Contr. Netw. Syst., 2021, in press.
[13] T. Li, Z. Liu, and Q. Pan, "Distributed Bernoulli filtering for target detection and tracking based on arithmetic average fusion," IEEE Signal Process. Lett., vol. 26, no. 12, pp. 1812–1816, Dec. 2019.
[14] K. Da, T. Li, Y. Zhu, and Q. Fu, "Gaussian mixture particle jump-Markov-CPHD fusion for multitarget tracking using sensors with limited views," IEEE Trans. Signal Inform. Process. Netw., vol. 6, pp. 605–616, Aug. 2020.
[15] L. Gao, G. Battistelli, and L. Chisci, "Multiobject fusion with minimum information loss," IEEE Signal Process. Lett., vol. 27, pp. 201–205, Jan. 2020.
[16] T. Li, X. Wang, Y. Liang, and Q. Pan, "On arithmetic average fusion and its application for distributed multi-Bernoulli multitarget tracking," IEEE Trans. Signal Process., vol. 68, pp. 2883–2896, 2020.
[17] H. V. Nguyen, H. Rezatofighi, B.-N. Vo, and D. C. Ranasinghe, "Distributed multi-object tracking under limited field of view sensors," IEEE Trans. Signal Process., 2021, DOI: 10.1109/TSP.2021.3103125.
[18] T. Li, H. Fan, J. García, and J. M. Corchado, "Second-order statistics analysis and comparison between arithmetic and geometric average fusion: Application to multi-sensor target tracking," Information Fusion, vol. 51, pp. 233–243, 2019.
[19] R. P. S. Mahler, Advances in Statistical Multisource-Multitarget Information Fusion, ser. Electronic Warfare. Artech House, 2014.
[20] B.-N. Vo, M. Mallick, Y. Bar-Shalom, S. Coraluppi, R. Osborne, R. Mahler, and B.-T. Vo, "Multitarget tracking," in Wiley Encyclopedia of Electrical and Electronics Engineering. John Wiley & Sons, 2015.
[21] S. R. Dalal and W. J. Hall, "Approximating priors by mixtures of natural conjugate priors," Journal of the Royal Statistical Society: Series B (Methodological), vol. 45, no. 2, pp. 278–286, 1983.
[22] P. Diaconis and D. Ylvisaker, "Quantifying prior opinion," Report Number: EFS NSF207, Oct. 1983.
[23] H. Sorenson and D. Alspach, "Recursive Bayesian estimation using Gaussian sums," Automatica, vol. 7, no. 4, pp. 465–479, 1971.
[24] L. Pishdad and F. Labeau, "Analytic MMSE bounds in linear dynamic systems with Gaussian mixture noise statistics," arXiv:1506.07603, 2015.
[25] D. Peel and G. McLachlan, "Robust mixture modelling using the t distribution," Stat. Comput., vol. 10, no. 4, pp. 339–348, 2000.
[26] B.-T. Vo and B.-N. Vo, "Labeled random finite sets and multi-object conjugate priors," IEEE Trans. Signal Process., vol. 61, no. 13, pp. 3460–3475, Jul. 2013.
[27] A. F. García-Fernández, J. L. Williams, K. Granström, and L. Svensson, "Poisson multi-Bernoulli mixture filter: Direct derivation and implementation," IEEE Trans. Aerosp. Electron. Syst., vol. 54, no. 4, pp. 1883–1901, 2018.
[28] S. M. Kay, Fundamentals of Statistical Signal Processing: Estimation Theory. Upper Saddle River, NJ, USA: Prentice-Hall, 1993.
[29] X. Li, Y. Zhu, J. Wang, and C. Han, "Optimal linear estimation fusion, Part I: Unified fusion rules," IEEE Trans. Inf. Theory, vol. 49, no. 9, pp. 2192–2208, 2003.
[30] Y. Bar-Shalom and L. Campo, "The effect of the common process noise on the two-sensor fused-track covariance," IEEE Trans. Aerosp. Electron. Syst., vol. AES-22, no. 6, pp. 803–805, Nov. 1986.
[31] C. Chong, S. Mori, and K. Chang, "Distributed multitarget multisensor tracking," in Multitarget Multisensor Tracking: Advanced Applications, Y. Bar-Shalom, Ed. Artech House, 1990.
[32] S.-L. Sun, "Distributed optimal linear fusion estimators," Information Fusion, vol. 63, pp. 56–73, 2020.
[33] J. K. Uhlmann, "Dynamic map building and localization: New theoretical foundations," Ph.D. dissertation, University of Oxford, UK, 1995.
[34] ——, "Covariance consistency methods for fault-tolerant distributed data fusion," Inf. Fusion, vol. 4, no. 3, pp. 201–215, 2003.
[35] S. J. Julier, "An empirical study into the use of Chernoff information for robust, distributed fusion of Gaussian mixture models," in Proc. FUSION 2006, Florence, Italy, Jul. 2006.
[36] O. Bochardt, R. Calhoun, J. K. Uhlmann, and S. J. Julier, "Generalized information representation and compression using covariance union," in Proc. FUSION 2006, Florence, Italy, Jul. 2006, pp. 1–7.
[37] X. R. Li and Z. Zhao, "Measuring estimator's credibility: Noncredibility index," in Proc. FUSION 2006, 2006, pp. 1–8.
[38] J. Ajgl and M. Šimandl, "Conservativeness of estimates given by probability density functions: Formulation and aspects," Inf. Fusion, vol. 20, pp. 117–128, 2014.
[39] S. Lubold and C. N. Taylor, "Formal definitions of conservative PDFs," arXiv:1912.06780v2, 2019.
[40] B. Khaleghi, A. Khamis, F. O. Karray, and S. N. Razavi, "Multisensor data fusion: A review of the state-of-the-art," Information Fusion, vol. 14, no. 1, pp. 28–44, 2013.
[41] T. Bailey, S. Julier, and G. Agamennoni, "On conservative fusion of information with unknown non-Gaussian dependence," in Proc. FUSION 2012, Singapore, Jul. 2012, pp. 1876–1883.
[42] S. Julier and J. Uhlmann, "Fusion of time delayed measurements with uncertain time delays," in Proc. ACC 2005, 2005, vol. 6, pp. 4028–4033.
[43] X. Wang, S. Sun, T. Li, and Y. Liu, "Fault tolerant multi-robot cooperative localization based on covariance union," IEEE Robot. Autom. Lett., vol. 6, no. 4, pp. 7799–7806, 2021.

[44] D. J. Salmond, "Mixture reduction algorithms for point and extended object tracking in clutter," IEEE Trans. Aerosp. Electron. Syst., vol. 45, no. 2, pp. 667–686, 2009.
[45] A. R. Runnalls, "Kullback-Leibler approach to Gaussian mixture reduction," IEEE Trans. Aerosp. Electron. Syst., vol. 43, no. 3, pp. 989–999, Jul. 2007.
[46] S. Julier and J. Uhlmann, "General decentralized data fusion with covariance intersection (CI)," in Handbook of Data Fusion, D. Hall and J. Llinas, Eds. Boca Raton, FL, USA: CRC Press, 2001, ch. 12, pp. 1–25.
[47] S. Reece and S. Roberts, "Robust, low-bandwidth, multi-vehicle mapping," in Proc. FUSION 2005, vol. 2, 2005.
[48] J. Sijs and M. Lazar, "State fusion with unknown correlation: Ellipsoidal intersection," Automatica, vol. 48, no. 8, pp. 1874–1878, 2012.
[49] B. Noack, J. Sijs, M. Reinhardt, and U. D. Hanebeck, "Decentralized data fusion with inverse covariance intersection," Automatica, vol. 79, pp. 35–41, 2017.
[50] B. Noack, U. Orguner, and U. D. Hanebeck, "Nonlinear decentralized data fusion with generalized inverse covariance intersection," in Proc. FUSION 2019, 2019, pp. 1–7.
[51] J. Arenas-Garcia, A. Figueiras-Vidal, and A. Sayed, "Mean-square performance of a convex combination of two adaptive filters," IEEE Trans. Signal Process., vol. 54, no. 3, pp. 1078–1090, 2006.
[52] Y. Zhang and J. Chambers, "Convex combination of adaptive filters for a variable tap-length LMS algorithm," IEEE Signal Process. Lett., vol. 13, no. 10, pp. 628–631, 2006.
[53] N. J. Bershad, J. C. M. Bermudez, and J.-Y. Tourneret, "An affine combination of two LMS adaptive filters: Transient mean-square analysis," IEEE Trans. Signal Process., vol. 56, no. 5, pp. 1853–1864, 2008.
[54] N. Takahashi, I. Yamada, and A. H. Sayed, "Diffusion least-mean squares with adaptive combiners: Formulation and performance analysis," IEEE Trans. Signal Process., vol. 58, no. 9, pp. 4795–4810, 2010.
[55] F. S. Cattivelli and A. H. Sayed, "Diffusion strategies for distributed Kalman filtering and smoothing," IEEE Trans. Autom. Control, vol. 55, no. 9, pp. 2069–2084, 2010.
[56] Y. Wang and X. R. Li, "A fast and fault-tolerant convex combination fusion algorithm under unknown cross-correlation," in Proc. FUSION 2009, Seattle, WA, USA, Jul. 2009, pp. 571–578.
[57] R. P. S. Mahler, "The multisensor PHD filter: II. Erroneous solution via Poisson magic," in Proc. SPIE, vol. 7336, 2009, pp. 7336–12.
[58] A. E. Abbas, "A Kullback-Leibler view of linear and log-linear pools," Decision Analysis, vol. 6, no. 1, pp. 25–37, 2009.
[59] K. Da, T. Li, Y. Zhu, H. Fan, and Q. Fu, "Kullback-Leibler averaging for multitarget density fusion," in Proc. DCAI 2019, Avila, Spain, Jun. 2019, pp. 253–261.
[60] T. Ardeshiri, U. Orguner, and E. Özkan, "Gaussian mixture reduction using reverse Kullback-Leibler divergence," arXiv:1508.05514, 2015.
[61] T. Li, "Why mixture?" TechRxiv:15082113, 2021.
[62] R. E. Blahut, Principles and Practice of Information Theory. USA: Addison-Wesley Longman Publishing Co., Inc., 1987.
[63] T. M. Cover and J. A. Thomas, Elements of Information Theory. John Wiley & Sons, Ltd, 2001.
[64] T. Li, V. Elvira, H. Fan, and J. M. Corchado, "Local-diffusion-based distributed SMC-PHD filtering using sensors with limited sensing range," IEEE Sensors J., vol. 19, no. 4, pp. 1580–1589, Feb. 2019.
[65] T. Li, K. Da, Z. Liu, X. Wang, and Y. Liang, "Best fit of mixture for computationally efficient Poisson multi-Bernoulli mixture filtering," DOI: 10.36227/techrxiv.12351710, 2021, under revision in IEEE TSP.
[66] M. Üney, D. E. Clark, and S. J. Julier, "Information measures in distributed multitarget tracking," in Proc. FUSION 2011, Chicago, IL, USA, Jul. 2011.
[67] L. Xiao and S. Boyd, "Fast linear iterations for distributed averaging," Syst. Control Lett., vol. 53, no. 1, pp. 65–78, 2004.
[68] F. Nielsen, "An information-geometric characterization of Chernoff information," IEEE Signal Process. Lett., vol. 20, no. 3, pp. 269–272, 2013.
[69] M. Üney, J. Houssineau, E. Delande, S. J. Julier, and D. E. Clark, "Fusion of finite set distributions: Pointwise consistency and global cardinality," IEEE Trans. Aerosp. Electron. Syst., vol. 55, no. 6, pp. 2759–2773, 2019.
[70] M. B. Hurley, "An information theoretic justification for covariance intersection and its generalization," in Proc. FUSION 2002, Annapolis, MD, USA, Jul. 2002, pp. 505–511.
[71] N. R. Ahmed and M. Campbell, "Fast consistent Chernoff fusion of Gaussian mixtures for ad hoc sensor networks," IEEE Trans. Signal Process., vol. 60, no. 12, pp. 6739–6745, 2012.
[72] N. Ahmed, J. R. Schoenberg, and M. E. Campbell, Fast Weighted Exponential Product Rules for Robust General Multi-Robot Data Fusion. MIT Press, 2013, pp. 9–16.
[73] Y. Zhang, W. Liu, Z. Chen, J. Wang, and K. Li, "On the properties of Kullback-Leibler divergence between Gaussians," arXiv:2102.05485, 2021.
[74] J.-Y. Chen, J. R. Hershey, P. A. Olsen, and E. Yashchin, "Accelerated Monte Carlo for Kullback-Leibler divergence between Gaussian mixture models," in Proc. ICASSP 2008, 2008, pp. 4553–4556.
[75] J. R. Hershey and P. A. Olsen, "Approximating the Kullback Leibler divergence between Gaussian mixture models," in Proc. ICASSP 2007, vol. 4, 2007, pp. IV-317–IV-320.
[76] N. Bouhlel and A. Dziri, "Kullback-Leibler divergence between multivariate generalized Gaussian distributions," IEEE Signal Process. Lett., vol. 26, no. 7, pp. 1021–1025, 2019.
[77] M. Fontana, A. F. García-Fernández, and S. Maskell, "Bernoulli merging for the Poisson multi-Bernoulli mixture filter," in Proc. FUSION 2020, Rustenburg, South Africa, 2020, pp. 1–8.
[78] M. F. Huber, T. Bailey, H. Durrant-Whyte, and U. D. Hanebeck, "On entropy approximation for Gaussian mixture random vectors," in Proc. IEEE MFI 2008, 2008, pp. 181–188.
[79] J. V. Michalowicz, J. M. Nichols, and F. Bucholtz, "Calculation of differential entropy for a mixed Gaussian distribution," Entropy, vol. 10, no. 3, pp. 200–206, 2008.
[80] J.-L. Durrieu, J.-P. Thiran, and F. Kelly, "Lower and upper bounds for approximation of the Kullback-Leibler divergence between Gaussian mixture models," in Proc. ICASSP 2012, 2012, pp. 4833–4836.
[81] S. Cui and M. Datcu, "Comparison of Kullback-Leibler divergence approximation methods between Gaussian mixture models for satellite image retrieval," in Proc. IGARSS 2015, 2015, pp. 3719–3722.
[82] F. Nielsen and K. Sun, "Guaranteed bounds on the Kullback-Leibler divergence of univariate mixtures," IEEE Signal Process. Lett., vol. 23, no. 11, pp. 1543–1546, 2016.
[83] F. Wang, T. Syeda-Mahmood, B. Vemuri, D. Beymer, and A. Rangarajan, "Closed-form Jensen-Renyi divergence for mixture of Gaussians and applications to group-wise shape registration," Med. Image Comput. Comput. Assist. Interv., vol. 12, no. Pt 1, pp. 648–655, 2009.
[84] D. W. Scott and W. F. Szewczyk, "From kernels to mixtures," Technometrics, vol. 43, no. 3, pp. 323–335, 2001.
[85] J. L. Williams and P. S. Maybeck, "Cost-function-based hypothesis control techniques for multiple hypothesis tracking," Math. Comput. Model., vol. 43, no. 9-10, pp. 976–989, 2006.
[86] T. Kropfreiter, F. Meyer, and F. Hlawatsch, "A fast labeled multi-Bernoulli filter using belief propagation," IEEE Trans. Aerosp. Electron. Syst., vol. 56, no. 3, pp. 2478–2488, Jun. 2020.
[87] T. Li, H. Chen, S. Sun, and J. M. Corchado, "Joint smoothing and tracking based on continuous-time target trajectory function fitting," IEEE Trans. Autom. Sci. Eng., vol. 16, no. 3, pp. 1476–1483, 2019.
[88] T. Li and H. Fan, "A computationally efficient approach to non-cooperative target detection and tracking with almost no a-priori information," arXiv:2104.11122, 2021.