Estimating entropy rate from censored symbolic time series: a test for time-irreversibility


R. Salgado-García1, a) and Cesar Maldonado2
1) Centro de Investigación en Ciencias-IICBA, Physics Department, Universidad Autónoma del Estado de Morelos, Avenida Universidad 1001, colonia Chamilpa, CP 62209, Cuernavaca, Morelos, Mexico.
2) IPICYT/División de Control y Sistemas Dinámicos, Camino a la Presa San José 2055, Lomas 4a. sección, CP 78216, San Luis Potosí, S.L.P., Mexico.
a) Electronic mail: raulsg@uaem.mx
arXiv:2009.11351v2 [cond-mat.stat-mech] 10 Jan 2021

                                                                  (Dated: 12 January 2021)
In this work we introduce a method for estimating the entropy rate and the entropy production rate from finite symbolic time series. From the point of view of statistics, estimating entropy from a finite series can be interpreted as a problem of estimating parameters of a distribution with a censored or truncated sample. We use this point of view to give estimations of the entropy rate and the entropy production rate, assuming that they are parameters of a (limit) distribution. The last statement is actually a consequence of the fact that the distribution of estimations obtained from recurrence-time statistics satisfies the central limit theorem. We test our method using time series coming from Markov chain models, discrete-time chaotic maps and a real DNA sequence from the human genome.

Entropy rate as well as entropy production rate are fundamental properties of stochastic processes and deterministic dynamical systems. For instance, in dynamical systems the entropy rate is closely related to the largest Lyapunov exponent: the positivity of the entropy rate is a signature of the presence of chaos. Similarly, the entropy production rate is a measure of the degree of irreversibility of a given system. Thus, in some sense, a nonzero entropy production rate states how far a system is from equilibrium. However, estimating either the entropy rate or the entropy production rate is not a trivial task. One of the main limitations to giving precise estimations of these quantities is the fact that observed data (time series) are always finite, while the entropy rate and the entropy production rate are asymptotic quantities, defined as limits that require infinitely long time series. We use recurrence-time statistics combined with the theory of censored samples from statistics to propose sampling schemes and define censored estimators for the entropy rate and the entropy production rate, taking advantage of the finiteness of the observed data.

I.   INTRODUCTION

   Entropy rate and entropy production rate are two quantities playing a central role in equilibrium and nonequilibrium statistical mechanics. On the one hand, the entropy rate (also called Kolmogorov-Sinai entropy) is closely related to the thermodynamical entropy1,2, which is a fundamental quantity in the context of equilibrium statistical mechanics. On the other hand, entropy production has a prominent role in the development of nonequilibrium statistical mechanics3–5. Both the entropy rate and the entropy production rate have a rigorous definition in dynamical systems and stochastic processes (see Ref. 6 for complete details). The entropy production rate quantifies, in some way, the degree of time-irreversibility of a given system from a microscopic point of view, which in turn tells us how far such a system is from thermodynamic equilibrium4,5,7. Moreover, the time-irreversibility of certain dynamical processes in nature might be an important feature because it would imply the influence of nonlinear dynamics or non-Gaussian noise on the dynamics of the system8. All these features of time-irreversibility have encouraged the study of this property in several systems. For instance, in Ref. 9 it was found that real DNA sequences would be spatially irreversible, a property that has been explored with the aim of understanding the intriguing statistical features of the actual structure of the genome. The fact that DNA might be spatially irreversible has been used to propose a mechanism of noise-induced rectification of particle motion10 that would be important in the study of biological processes involving DNA transport. Testing the irreversibility of time series has also been the subject of intense research. For example, in Ref. 8 a symbolic-dynamics approach was proposed to determine whether a time series is time-irreversible or not. Another important study was reported in Ref. 11, where the authors introduced a method for determining the time-irreversibility of time series by using a visibility-graph approach. That approach has also been used to understand the time-reversibility of non-stationary processes12. The possibility of determining this temporal asymmetry has also led to attempts to understand the dynamics of several processes beyond physical systems. In Ref. 13 the time-irreversibility of financial time series was explored as a feature that could be used for ranking companies for optimal portfolio designs. In Ref. 14 the time-irreversibility of human heartbeat time series was studied, relating this property to aging and disease of individuals. Moreover, time-irreversibility has also been used to understand several properties of classical music15.

   In the literature one can find many estimators of the entropy rate in symbol sequences produced by natural phenomena as well as in dynamical systems, random sequences or even in
natural languages taken from written texts. Perhaps the most widely used method for entropy estimation is the empirical approach, in which one estimates the probability of the symbols using their empirical frequency in the sample, and then uses it to estimate the entropy rate directly from its definition16,17. One can find many works in this direction trying to find better, unbiased and well-balanced estimators (see Ref. 18 and references therein). One can go further by asking for the consistency and the fluctuation properties of these estimators. For instance, in Refs. 19 and 20 there are explicit and rigorous fluctuation bounds, under some mild additional assumptions, for these so-called “plug-in” estimators. On the other hand, within the same empirical approach, there are also estimators for the relative empirical entropy as a quantification of the entropy production21,22.

   From another point of view, the problem of estimating the entropy rate of stationary processes has also been studied using the recurrence properties of the source. This is another major technique used in the context of stationary ergodic processes on the space of infinite sequences, in areas such as information theory, probability theory and the ergodic theory of dynamical systems (we refer the interested reader to Ref. 23 and the references therein). The basis of this approach is the Wyner-Ziv-Ornstein-Weiss theorem, which establishes an almost sure asymptotic convergence of the logarithm of the recurrence time of a finite sample (scaled by its length) to the entropy rate23. This result uses the Shannon-McMillan-Breiman theorem, which in turn can be thought of as an ergodic theorem for the entropy23. Under this approach it is possible to define estimators using quantities such as the return time, the hitting time and the waiting time, among others24. Here we will use the term “recurrence time” as a comprehensive term for those mentioned before. Moreover, it is possible to obtain very precise results on the consistency and the fluctuations of these estimators by applying the available results on the distribution of these quantities25–27.

   In the setting of Gibbs measures in the thermodynamic formalism, one can also find consistent estimators of the entropy rate defined from the return, hitting and waiting times, and one also has precise statements on their fluctuations, such as the central limit theorem28, large deviation bounds and fluctuation bounds20,28. A similar situation occurs in the study of the estimation of the entropy production rate: for Markov chains applied to the quantification of irreversibility or time-reversal asymmetry see Refs. 7 and 29, and for Gibbsian sources, as well as their fluctuation properties, see Refs. 30 and 31.

   Nonetheless, for real systems, determining the value of the entropy rate and the entropy production rate is not a trivial task. This is because these quantities are obtained as limit values of the logarithm of recurrence times as the sample length goes to infinity. This is a fundamental limitation, since observations are always finite. So, instead of having the true value of the entropy rate or the entropy production rate, one always obtains a finite-time approximation. This motivates the need to define estimators for finite samples from the point of view of recurrence times.

   The article is organized as follows. In Section II we give a summary of the asymptotic properties of the estimators based on the recurrence-time statistics. We also describe the method used for estimating the parameters of a normal distribution from a given censored sample. In Section III we propose our sampling schemes for estimating the entropy rate and the reversed entropy rate using the recurrence-time statistics. There, we also describe the method that will be used for implementing the estimations in real data. In Section IV we test the methodology established in Section III for estimating the entropy rate and the reversed entropy rate in an irreversible three-state Markov chain. We compare our estimations with the exact values, which can actually be computed. In Section V we implement the proposed estimation method in deterministic chaotic systems, an n-step Markov chain and a real DNA sequence. Finally, in Section VI we give the main conclusions of our work.

II.   ENTROPY RATE AND ENTROPY PRODUCTION RATE

A.   Recurrence time statistics

   Consider a finite set A which we will refer to as the alphabet. Let X := {Xn : n ∈ N} be a discrete-valued stationary ergodic process generated by the law P, whose realizations are infinite sequences of symbols taken from A; that is, the set of all possible realizations is a subset of A^N. Here we denote by x = x1 x2 x3 . . . an infinite realization of the process X. Let ℓ be a positive integer; we denote by x_1^ℓ the string of the first ℓ symbols of the realization x. A finite string a := a1 a2 a3 . . . aℓ comprised of ℓ symbols will be called either an ℓ-word or an ℓ-block; we may use one or the other without making any distinction. We will say that the ℓ-word a “occurs” at the kth site of the trajectory x if x_k^{k+ℓ−1} = a. An alternative notation for indicating the ℓ-block at the kth site of x will be x(k, k + ℓ − 1).

   Next, we introduce the return time, the waiting time and the hitting time. Let us consider a finite string a_1^ℓ made out of symbols of the alphabet A. Given two independent realizations x and y, let x_1^ℓ and y_1^ℓ be their first ℓ symbols; then the return, waiting and hitting times are defined as follows,

   ρ_ℓ := ρ_ℓ(x) := inf{k > 1 : x_k^{k+ℓ−1} = x_1^ℓ},            (1)

   ω_ℓ := ω_ℓ(x, y) := inf{k ≥ 1 : y_k^{k+ℓ−1} = x_1^ℓ},         (2)

   τ_ℓ := τ_ℓ(a_1^ℓ, x) := inf{k ≥ 1 : x_k^{k+ℓ−1} = a_1^ℓ},     (3)

respectively.

   Wyner and Ziv (see for instance Ref. 32) proved that for a stationary ergodic process the quantity (1/ℓ) log ρ_ℓ converges to the entropy rate in probability, and that for stationary ergodic Markov chains (1/ℓ) log ω_ℓ also converges to the entropy rate h in probability. That is, these quantities grow exponentially fast with ℓ and their limit rate equals the entropy rate in probability. Later, Ornstein and Weiss33 showed that for stationary ergodic processes

   lim_{ℓ→∞} (1/ℓ) log ρ_ℓ = h        P-a.s.                     (4)
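Definitions (1)–(3) amount to word searches along a symbolic sequence, so they translate directly into code. The following minimal Python sketch (ours, not part of the paper; the function names are our own) computes the three recurrence times for sequences stored as strings or lists:

```python
def return_time(x, ell):
    """Return time, eq. (1): first k > 1 such that the block of length
    ell starting at site k equals x_1^ell (1-based k, 0-based Python)."""
    word = x[:ell]
    for k in range(2, len(x) - ell + 2):
        if x[k - 1:k - 1 + ell] == word:
            return k
    return None  # no recurrence observed in the finite sample

def hitting_time(a, x):
    """Hitting time, eq. (3): first k >= 1 at which the word a occurs in x."""
    ell = len(a)
    for k in range(1, len(x) - ell + 2):
        if x[k - 1:k - 1 + ell] == a:
            return k
    return None

def waiting_time(x, y, ell):
    """Waiting time, eq. (2): first occurrence of x_1^ell in the
    independent realization y."""
    return hitting_time(x[:ell], y)
```

On a finite sequence any of these searches may fail, returning None; that failure is precisely the censoring phenomenon treated in Section II C.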
For the waiting time, it was proved by Shields23 that for stationary ergodic Markov chains one has

   lim_{ℓ→∞} (1/ℓ) log ω_ℓ = h        P×P-a.s.                   (5)

These theorems are based on the Shannon-McMillan-Breiman theorem, which claims that −(1/ℓ) log P([x_1^ℓ]) converges almost surely to the entropy rate h, where [x_1^ℓ] stands for the cylinder set [x_1^ℓ] := {z ∈ A^N : z_1^ℓ = x_1^ℓ}. Furthermore, in Ref. 27, Kontoyiannis obtained strong approximations of the recurrence and waiting times by the probability of a finite vector, which in turn let him obtain an almost sure convergence for the waiting time in ψ-mixing processes, extending previous results for Markov chains. He also obtained an almost sure invariance principle for log ρ_ℓ and log ω_ℓ, which implies that these quantities satisfy a central limit theorem and a law of the iterated logarithm27.

   In the same spirit, the works of Abadi and collaborators24–26 provide very precise results for the approximation of the distribution of the hitting times (properly rescaled) by an exponential distribution, under mild mixing conditions on the process. They also give sharp bounds on the error term of this exponential approximation. This makes it possible to obtain bounds on the fluctuations of the entropy estimators based on hitting times20,28,31.

B.   Asymptotic behavior of the estimators

   We are interested in estimating the entropy and entropy production rates; moreover, we need to ensure that their estimators have good convergence and fluctuation properties, since this is what enables our method.

   Here, we are interested in estimators defined by recurrence times, for which one can find very precise asymptotic results regarding their fluctuations. It is known33 that

   lim_{ℓ→∞} (1/ℓ) log ρ_ℓ = h,                                  (6)

almost surely for ergodic processes; thus one can use the return time as an estimator of the entropy rate. Furthermore, under the Gibbsian assumption, it has been proved that the random variable (log ρ_ℓ − ℓh)/√ℓ converges in law to a normal distribution when ℓ tends to infinity34.

   The waiting and hitting times are also used as estimators. For instance, it has been proved that

   lim_{ℓ→∞} (1/ℓ) log ω_ℓ(x, y) = h,                            (7)

for P×P almost every pair (x, y), where the distribution P is a Gibbs measure28. This is obtained from an approximation of (1/ℓ) log ω_ℓ by −(1/ℓ) log P([x_1^ℓ]) which, by the Shannon-McMillan-Breiman theorem, goes almost surely to the entropy rate. The same work also establishes log-normal fluctuations for the waiting times, i.e.,

   lim_{ℓ→∞} P×P{ (log ω_ℓ − ℓh)/(σ√ℓ) < t } = N(0, 1)(−∞, t],   (8)

where in this case σ² = lim_{ℓ→∞} (1/ℓ) ∫ (log ω_ℓ − ℓh)² d(P×P).

   So, in the context of Gibbs measures, asymptotic normality is fulfilled for both the return times and the waiting times. This also holds for exponentially φ-mixing processes. Moreover, a large deviations principle is satisfied for both quantities as well28 (with some additional restrictions in the case of the return time). For the case of the hitting times, one has to overcome the bad statistics produced by very short returns, for which the approximation changes (see Ref. 35).

   In the same context, one can find fluctuation bounds for the plug-in estimators as well as for the waiting- and hitting-time estimators20. One of the main tools used is concentration inequalities, which are valid for very general mixing processes. Using the concentration phenomenon, one can obtain non-asymptotic results, that is, upper bounds for the fluctuations of the entropy estimator which are valid for every n, where n denotes the length of the sample.

   Next, for the estimation of the entropy production rate, two estimators of the entropy production were introduced in Ref. 30. The entropy production was defined as a trajectory-valued function quantifying the degree of irreversibility of the process producing the samples, in the following way: let P be the law of the process and let us denote by P^r the law of the time-reversed process; then the entropy production rate is the relative entropy rate of the process with respect to the time-reversed one,

   e_p = h(P|P^r) := lim_{ℓ→∞} H_ℓ(P|P^r)/ℓ,                     (9)

where

   H_ℓ(P|P^r) := Σ_{x_1^ℓ ∈ A^ℓ} P([x_1^ℓ]) log( P([x_1^ℓ]) / P([x_ℓ^1]) ).   (10)

Here x_ℓ^1 stands for the word x_1^ℓ reversed in order. The estimators defined in Ref. 30 using the hitting and waiting times are given as follows:

   S_ℓ^τ(x) := log( τ_{x_ℓ^1}(x) / τ_{x_1^ℓ}(x) ),               (11)

where τ_{x_1^ℓ}(x) := inf{k ≥ 1 : x_k^{k+ℓ−1} = x_1^ℓ}. Notice that the estimator actually quantifies the logarithm of the first time the word x_1^ℓ appears in the reversed sequence divided by the first return time of the first ℓ symbols in x. For the case of the estimator using the waiting time, one has in an analogous way that

   S_ℓ^ω(x, y) := log( ω_ℓ^r(x, y) / ω_ℓ(x, y) ),                (12)

where ω_ℓ(x, y) := τ_{x_1^ℓ}(y) and ω_ℓ^r(x, y) := τ_{x_ℓ^1}(y). In the context of Gibbs measures or exponential ψ-mixing30, the fluctuation properties of these estimators have been studied and their consistency has been proved; that is, P×P-almost surely we have that

   lim_{ℓ→∞} S_ℓ^ω/ℓ = e_p,                                      (13)
as well as, P-almost surely,

   lim_{ℓ→∞} S_ℓ^τ/ℓ = e_p.                                      (14)

The asymptotic normality also holds; in that case, the asymptotic variance of the estimator coincides with that of the entropy production. In the same reference the authors also obtain a large deviation principle for the waiting-time estimator. Later, in Ref. 31, fluctuation bounds were obtained for the same estimators introduced in Ref. 30 under the same setting. This result is interesting from the practical point of view since it provides bounds that are valid for finite time and not only in the asymptotic sense.
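To make estimators (11) and (12) concrete, here is a minimal Python sketch (our illustration, not the implementation of Ref. 30; it reuses the return_time and hitting_time helpers sketched in Section II A):

```python
import math

def S_tau(x, ell):
    """Hitting-time irreversibility estimator, eq. (11): log of the first
    occurrence of the reversed ell-word in x over the first return of the
    ell-word itself (the forward search is a return time, k > 1)."""
    word = x[:ell]
    t_fwd = return_time(x, ell)          # tau_{x_1^ell}(x), first return
    t_rev = hitting_time(word[::-1], x)  # tau of the reversed word
    if t_fwd is None or t_rev is None:
        return None                      # censored: search failed
    return math.log(t_rev / t_fwd)

def S_omega(x, y, ell):
    """Waiting-time estimator, eq. (12): same ratio, but both the word and
    its reversal are searched for in an independent realization y."""
    word = x[:ell]
    t_fwd = hitting_time(word, y)        # omega_ell(x, y)
    t_rev = hitting_time(word[::-1], y)  # omega_ell^r(x, y)
    if t_fwd is None or t_rev is None:
        return None
    return math.log(t_rev / t_fwd)
```

By (13) and (14), S_ℓ^ω/ℓ and S_ℓ^τ/ℓ approach e_p for large ℓ; on a finite sequence either search may fail, which again produces censored samples.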
   Here, we will use the approach defined in Ref. 7 for the estimation of the entropy production rate, since we want to compare it with the exact results one is able to obtain for Markov chains. In Ref. 7 it is shown that for Markov processes the entropy production rate can be obtained as the difference between the entropy rate and the reversed entropy rate. For more general systems, the entropy production is defined in an analogous way5. The reversed entropy rate is defined as the entropy rate of the process reversed in time, i.e., as if we were estimating the entropy rate of the process evolving backwards in time. From the practical point of view, in a time series the entropy production rate may be estimated as the difference between the entropy rate estimated from the reversed time series and the entropy rate itself. To implement the latter methodology using recurrence-time statistics, in Section III we will define the reversed recurrence times, which will allow us to give estimations of the reversed entropy rate and, eventually, the corresponding estimations of the entropy production rate as a measure of time-irreversibility of the process. It is important to mention that our methodology can still be applied beyond Markov chains; nevertheless, in those cases, one expects to obtain results displaying the irreversibility as a consequence of the positivity of the entropy production, and not the exact results.

C.   Parameter estimation of a normal distribution from censored data

   Let us denote by Θ_ℓ the random variable whose realizations are estimations of the ℓ-block entropy rate obtained by the recurrence-time statistics. To be precise, Θ_ℓ can be defined as

   Θ_ℓ = (1/ℓ) log(T_ℓ),                                         (15)

where T_ℓ can be the return-, hitting- or waiting-time random variable. As pointed out above, Θ_ℓ satisfies the central limit theorem regardless of the choice of the recurrence-time statistics. This fact enables us to assume that Θ_ℓ has a normal distribution with mean h_ℓ and variance σ_ℓ². As mentioned before, one of the problems arising in implementing this estimator for real time series is that the return time T_ℓ is censored from above by a prescribed finite value T_c. From eq. (15), it is clear that the random variable Θ_ℓ becomes censored from above by a finite value h_c := log(T_c)/ℓ, which will be referred to as the censoring entropy. Taking this observation into account, we can state our problem as follows: given a sample set {h_i : 1 ≤ i ≤ m} of independent realizations of Θ_ℓ, we wish to estimate h_ℓ and σ_ℓ knowing that such a sample is censored from above by h_c.

   It is important to remark that, since the realizations of Θ_ℓ are censored from above by h_c, any sample set H := {h_i : 1 ≤ i ≤ m} of (independent) realizations of Θ_ℓ will contain numerically undefined realizations, i.e., h_i such that h_i > h_c. We will refer to these numerically undefined values as censored realizations or censored samples. Those samples with a well-defined numerical value will be called uncensored samples or realizations. We will see below that the censored sample data will be used for the estimation of h_ℓ and σ_ℓ.

   Let m := |H| be the size of the sample and let us assume that the total number of uncensored realizations in the sample set H is exactly k, with k < m. Then the total number of censored realizations in H is m − k. Since the realizations are assumed to be independent (a usual hypothesis in statistics), k can be seen as a realization of a random variable with a binomial distribution. Thus, the fraction p̂ := k/m of uncensored samples with respect to the total number of realizations in H is an estimation of the parameter p of the above-mentioned binomial distribution. As we said above, Θ_ℓ has a normal distribution, implying that the parameter p is given by

   p = Φ( (h_c − h_ℓ)/σ_ℓ ),                                     (16)

where Φ is the distribution function of a standard normal random variable, i.e.,

   Φ(x) := (1/√(2π)) ∫_{−∞}^{x} e^{−y²/2} dy.                    (17)

   In Appendix A, following calculations from Ref. 36, we show that the parameters h_ℓ and σ_ℓ² can be estimated by using the censored sample as follows:

   ĥ = h̄ + ζ̂ (h_c − h̄),                                        (18)
   σ̂² = s² + ζ̂ (h_c − h̄)²,                                      (19)

where h̄ is the sample mean of the uncensored samples and s² the corresponding sample variance, i.e.,

   h̄ := (1/k) Σ_{i=1}^{k} h_i,                                  (20)
   s² := (1/k) Σ_{i=1}^{k} (h_i − h̄)².                           (21)

Additionally, ζ̂ is defined as

   ζ̂ := φ(ξ̂) / ( p̂ ξ̂ + φ(ξ̂) ),                                (22)

where ξ̂ is obtained by means of the normal distribution function as

   ξ̂ := Φ⁻¹( p̂ ).                                               (23)
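Equations (18)–(23) need only the uncensored values, the fraction p̂ and the censoring entropy h_c, so they are straightforward to implement. A minimal sketch (ours; SciPy's norm supplies φ, Φ and Φ⁻¹):

```python
import numpy as np
from scipy.stats import norm

def censored_normal_estimate(h_uncensored, m, h_c):
    """Estimate (h_ell, sigma_ell^2) for a normal sample of total size m
    censored from above at h_c, given only the k uncensored values.
    Implements eqs. (18)-(23)."""
    h = np.asarray(h_uncensored, dtype=float)
    k = h.size
    p_hat = k / m                          # fraction of uncensored samples
    h_bar = h.mean()                       # eq. (20)
    s2 = h.var()                           # eq. (21), 1/k normalization
    xi_hat = norm.ppf(p_hat)               # eq. (23), inverse normal CDF
    zeta_hat = norm.pdf(xi_hat) / (p_hat * xi_hat + norm.pdf(xi_hat))  # eq. (22)
    h_est = h_bar + zeta_hat * (h_c - h_bar)        # eq. (18)
    var_est = s2 + zeta_hat * (h_c - h_bar) ** 2    # eq. (19)
    return h_est, var_est
```

As a sanity check, when no sample is censored (p̂ = 1) one gets ξ̂ → ∞ and ζ̂ → 0, so the estimator reduces to the ordinary sample mean and variance.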
III.   SAMPLING SCHEMES FOR ESTIMATING ENTROPY RATE FROM RECURRENCE-TIME STATISTICS

   As we said above, we are interested in estimating the entropy rate and the entropy production rate from an observed trajectory. The trajectory, in this context, stands for a finite-length symbolic sequence x = x1 x2 x3 . . . xn which is assumed to be generated by some process with an unknown law P. As we saw in Section II, we have to assume that the process complies with the appropriate mixing properties, such as exponential φ-mixing or Gibbsianness, in order for the central limit theorem to be valid. The next step is to obtain samples of the recurrence-time statistics, i.e., we need to establish a protocol for extracting samples of return, waiting or hitting times from the sequence x. The method we use for extracting samples is similar to the one introduced in Ref. 37, which is used for estimating the symbolic complexity and particularly the topological entropy of a process. After that, we will define the estimators of the entropy rate and the entropy production rate, using the fact that the observed samples might be censored.

A.   Return time

   First, we establish the method for obtaining samples of the return time. Given a sequence x of size 2n, take two non-negative integers ℓ and ∆ such that ℓ < ∆ ≪ n. Then define the set M_ℓ^ρ = {a_i : a_i = x(i∆ + 1, i∆ + ℓ), 0 ≤ i < m}, where m := ⌊n/∆⌋, of words of length ℓ evenly ∆-spaced along the first half of the trajectory x. In Fig. 1 we show a schematic representation of how the sample words in M_ℓ^ρ are collected from the trajectory x.

[FIG. 1. Selection of sample words for return-time statistics.]

   Next, we define the sample sets of return times R_ℓ and reversed return times R̄_ℓ as follows. First, we associate to each word a ∈ M_ℓ^ρ the censored return time, ρ_ℓ^{(n)}(a, x), and the reversed return time, ρ̄_ℓ^{(n)}(a, x), as follows:

   ρ_ℓ^{(n)}(a, x) := inf{t > 1 : x_{k+t}^{k+t+ℓ−1} = a, t ≤ n, a := x_k^{k+ℓ−1}},   (24)

   ρ̄_ℓ^{(n)}(a, x) := inf{t > 1 : x_{k+t}^{k+t+ℓ−1} = ā, t ≤ n, a := x_k^{k+ℓ−1}}.   (25)

Observe that ā stands for the block a with its symbols in reversed order. Next, R_ℓ and R̄_ℓ are defined by

   R_ℓ := {t ∈ N : ρ_ℓ^{(n)}(a) = t, a ∈ M_ℓ^ρ},                 (26)

   R̄_ℓ := {t ∈ N : ρ̄_ℓ^{(n)}(a) = t, a ∈ M_ℓ^ρ}.                (27)

   It is necessary to stress that the values in the above-defined sample sets are not necessarily all numerically well-defined (or uncensored). This is because the return time defined in eq. (24) is actually censored from above. Notice that we impose the condition that ρ_ℓ^{(n)} take a value no larger than n. This is imposed for two reasons: on the one hand, the return time cannot be arbitrarily large due to the finiteness of the trajectory x. On the other hand, although it is possible that the return time for some sample words might be larger than n and still well-defined, it is not convenient for the statistics. Let us explain this point in more detail. If we take a sample word a located at the kth site, its corresponding return time can in principle be at most as large as n + k − ℓ. This happens when the word a occurs (by chance) at the (n + k − ℓ)th site. Since all the sample words in M_ℓ^ρ are located at different sites along x, it is clear that their corresponding return-time values have different upper bounds. Therefore, if we do not impose a homogeneous upper bound, the collection of return-time samples results in inhomogeneously censored data. As we have seen in Section II C, having a homogeneous bound (homogeneously censored data) is crucial for implementing our estimators.

[FIG. 2. Uncensored and censored return-time values. First we suppose that a sample word a occurs at the kth site along a finite trajectory x of length 2n. In order to get ρ_ℓ^{(n)}(a) we should look for the occurrence of a along x, from the (k+1)th symbol to the (k+n)th symbol of x. This section of the trajectory is written x(k+1, k+n). (A) If a is found in x(k+1, k+n), then ρ_ℓ^{(n)}(a) is numerically well-defined, thus called uncensored. (B) If a is found in x but not in the section x(k+1, k+n), we consider that ρ_ℓ^{(n)}(a) is censored (numerically undefined). (C) Finally, if we do not observe any other occurrence of a in x beyond the (k+1)th symbol, it is clear that ρ_ℓ^{(n)}(a) is numerically undefined, henceforth censored.]

   In the following, we will refer to this homogeneous upper bound for return times as the censoring time and, whenever convenient, it will alternatively be denoted by T_c. In Fig. 2 we give an illustrative description of the censoring of the samples.

   Once we have the return-time sample set R_ℓ, we introduce the estimator of the entropy rate and the entropy production rate. As we saw in Section II, if we take a return-time value t from the sample set R_ℓ, then the quantity log(t)/ℓ can be interpreted as a realization of the block entropy rate h_ℓ, which in the limit ℓ → ∞ obeys the central limit theorem. This fact enables us to implement the following hypothesis: for finite ℓ, the value log(t)/ℓ is a realization of a normal random
variable with (unknown) mean h_ℓ and variance σ_ℓ². Then the sample sets

   H_ℓ^ρ := {h = log(t)/ℓ : t ∈ R_ℓ},                            (28)

   H̄_ℓ^ρ := {h = log(t)/ℓ : t ∈ R̄_ℓ},                           (29)

can be considered as sets of realizations of normal random variables censored from above by the quantity h_c := log(T_c)/ℓ = log(n)/ℓ, which we call the censoring entropy. Then the estimation procedure for the block entropy is essentially the one described in Section II C. Here we summarize the steps for performing the estimation of h_ℓ for return-time statistics (a sketch in code is given after the list).

   1. Given a finite sample trajectory or symbolic sequence x of size 2n, define the censoring time as half of the size of the sample trajectory, i.e., T_c = n. Fix the number m of sample words or blocks to be collected and the size ℓ of the block to be analyzed. Next, define the spacing ∆ := ⌊n/m⌋ and the sample set M_ℓ^ρ of evenly ∆-spaced words that lie along the first half of the trajectory x, i.e.,

      M_ℓ^ρ = {a_i : a_i = x(i∆ + 1, i∆ + ℓ), 0 ≤ i < m}.

   2. Define the sets of return-time samples and reversed return-time samples as

      R_ℓ := {t ∈ N : ρ_ℓ^{(n)}(a) = t, a ∈ M_ℓ^ρ},              (30)

      R̄_ℓ := {t ∈ N : ρ̄_ℓ^{(n)}(a) = t, a ∈ M_ℓ^ρ}.             (31)

   3. Using the previous sets of return-time samples, define the sets of block entropy and reversed block entropy:

      H_ℓ^ρ := {h = log(t)/ℓ : t ∈ R_ℓ},                         (32)

      H̄_ℓ^ρ := {h = log(t)/ℓ : t ∈ R̄_ℓ}.                        (33)

   4. Next, define the rate of uncensored sample values p̂ := k/m, where m is the total number of samples in H_ℓ^ρ and k is the number of uncensored samples in H_ℓ^ρ (hence there are m − k censored samples in H_ℓ^ρ).

   5. For 1 ≤ i ≤ k, denote by h_i each of the uncensored samples in H_ℓ^ρ. Their mean and variance are given as follows:

      h̄ := (1/k) Σ_{i=1}^{k} h_i,                               (34)

      s² := (1/k) Σ_{i=1}^{k} (h_i − h̄)².                        (35)

   6. Define the sample functions (see Section II C and Appendix A for details)

      ζ̂ := φ(ξ̂) / ( p̂ ξ̂ + φ(ξ̂) ),                             (36)

      ξ̂ := Φ⁻¹( p̂ ),                                            (37)

      where φ(x) = e^{−x²/2}/√(2π) is the probability density function of the standard normal distribution and Φ its (cumulative) distribution function.

   7. Finally, the estimations for the mean of the block entropy and its variance using the return-time estimator are given by

      ĥ_ℓ = h̄ + ζ̂ (h_c − h̄),                                   (38)

      σ̂_ℓ² = s² + ζ̂ (h_c − h̄)²,                                 (39)

      where h_c is the censoring entropy, defined as h_c := log(T_c)/ℓ.

   8. Repeat steps 4–7 for the set R̄_ℓ in order to have an estimation of the reversed block entropy rate, which allows one to estimate the block entropy production rate just by taking the difference between the reversed block entropy and the block entropy7 as follows:

      ê_p := ĥ_ℓ^R − ĥ_ℓ.                                        (40)
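As forecast above, the following Python sketch strings the steps together (our illustration, reusing the censored_normal_estimate helper sketched in Section II C; all names are ours):

```python
import math

def return_time_entropy(x, ell, m):
    """Steps 1-7 of Sec. III A: censored estimate (h_ell, sigma_ell^2)
    of the block entropy rate from a single sequence x of length 2n."""
    n = len(x) // 2
    delta = n // m                                  # word spacing, step 1
    h_c = math.log(n) / ell                         # censoring entropy
    samples = []                                    # uncensored log(t)/ell
    for i in range(m):                              # words along first half
        k = i * delta
        word = x[k:k + ell]
        for t in range(2, n + 1):                   # censored search, eq. (24)
            if x[k + t:k + t + ell] == word:
                samples.append(math.log(t) / ell)   # steps 2-3
                break                               # uncensored sample found
    return censored_normal_estimate(samples, m, h_c)  # steps 4-7

def return_time_entropy_production(x, ell, m):
    """Step 8, eq. (40): reversed block entropy minus block entropy.
    Same search, but for the reversed word (eq. (25))."""
    n = len(x) // 2
    delta, h_c = n // m, math.log(n) / ell
    rev = []
    for i in range(m):
        k = i * delta
        target = x[k:k + ell][::-1]                 # block with symbols reversed
        for t in range(2, n + 1):
            if x[k + t:k + t + ell] == target:
                rev.append(math.log(t) / ell)
                break
    h_rev, _ = censored_normal_estimate(rev, m, h_c)
    h_est, _ = return_time_entropy(x, ell, m)
    return h_rev - h_est                            # eq. (40)
```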

B.   Waiting time

   The waiting-time estimator for the block entropy requires two distinct trajectories. In practical situations we normally have one single trajectory. In order to overcome this problem, we split the original sequence into two equal-sized parts. Since we assume sufficiently rapid mixing, it is possible to regard the second half of the sample as independent of the first half, provided that the size of the sample is large enough. Thus, one may consider the two parts of the sample as two independent trajectories. After that, we collect m different ℓ-words at random along one of those trajectories. This collection is denoted by M_ℓ^ω and plays the role of the set of sample words, in the same way as the set M_ℓ^ρ did in Section III A. A schematic representation of this sampling procedure is shown in Fig. 3.

[FIG. 3. Selection of sample words for the waiting-time statistics.]

   The next step consists in defining the censored waiting time corresponding to each word in the sample M_ℓ^ω. Let x be the trajectory consisting of 2n symbols. Assume that the samples are randomly collected from the segment x(n + 1, 2n − ℓ). Then we define the censored waiting time and the censored reversed waiting time for a ∈ M_ℓ^ω as follows:

   ω_ℓ^{(n)}(a, x) := inf{t ≥ 1 : x_t^{t+ℓ−1} = a},              (41)

   ω̄_ℓ^{(n)}(a, x) := inf{t ≥ 1 : x_t^{t+ℓ−1} = ā}.             (42)
It is important to notice that both the waiting time and the reversed waiting time are bounded from above by n, i.e., the sample waiting times are homogeneously censored by n.

   The rest of the method follows the lines of the one described in Section III A. Here we summarize the main steps (a code sketch follows the list):

   1. Given a finite sample trajectory x of size 2n, set the censoring time T_c = n equal to half of the size of the sample trajectory. Fix the number m of sample words to be collected and the size ℓ of the block. Next, collect m different words at random along the symbolic sequence x(n + 1, 2n). We denote by M_ℓ^ω this collection of ℓ-words.

   2. Define the sets of waiting-time samples and reversed waiting-time samples as

      W_ℓ := {t ∈ N : ω_ℓ^{(n)}(a, x) = t, a ∈ M_ℓ^ω},           (43)

      W̄_ℓ := {t ∈ N : ω̄_ℓ^{(n)}(a, x) = t, a ∈ M_ℓ^ω}.          (44)

   3. From the sets of waiting-time samples, define the sets of block entropy and reversed block entropy:

      H_ℓ^ω := {h = log(t)/ℓ : t ∈ W_ℓ},                         (45)

      H̄_ℓ^ω := {h = log(t)/ℓ : t ∈ W̄_ℓ}.                        (46)

   4. Define the rate of uncensored sample values as p̂ := k/m, where m is the total number of samples in H_ℓ^ω and k is the number of uncensored samples in H_ℓ^ω (thus the remaining m − k samples are censored).

   5. For 1 ≤ i ≤ k, denote by h_i each of the uncensored samples in H_ℓ^ω. Their mean and variance are given as follows:

      h̄ := (1/k) Σ_{i=1}^{k} h_i,                               (47)

      s² := (1/k) Σ_{i=1}^{k} (h_i − h̄)².                        (48)

   6. Define the sample functions (see Section II C and Appendix A for details)

      ζ̂ := φ(ξ̂) / ( p̂ ξ̂ + φ(ξ̂) ),                             (49)

      ξ̂ := Φ⁻¹( p̂ ),                                            (50)

      where φ(x) = e^{−x²/2}/√(2π) is the probability density function of the standard normal distribution and Φ its (cumulative) distribution function.

   7. Finally, the estimations for the mean of the block entropy and its variance using the waiting-time estimator are given by

      ĥ_ℓ = h̄ + ζ̂ (h_c − h̄),                                   (51)

      σ̂_ℓ² = s² + ζ̂ (h_c − h̄)²,                                 (52)

      where again h_c is the censoring entropy defined as above.

   8. Repeat steps 4–7 for the set H̄_ℓ^ω in order to have an estimation of the reversed block entropy rate, which allows one to estimate the block entropy production rate by taking the difference between the reversed block entropy and the block entropy7 as follows:

      ê_p := ĥ_ℓ^R − ĥ_ℓ.                                        (53)
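A minimal sketch of this variant (ours), differing from the return-time pipeline only in how the words are drawn and where they are searched:

```python
import math
import random

def waiting_time_entropy(x, ell, m, seed=0):
    """Sec. III B: draw m random ell-words from the second half of x and
    search for each word (and its reversal) from the start of the first
    half; censored estimates via censored_normal_estimate (Sec. II C)."""
    n = len(x) // 2
    h_c = math.log(n) / ell
    rng = random.Random(seed)
    fwd, rev = [], []
    for _ in range(m):
        k = rng.randrange(n, 2 * n - ell)        # word from x(n+1, 2n-ell)
        word = x[k:k + ell]
        for samples, target in ((fwd, word), (rev, word[::-1])):
            for t in range(1, n + 1):            # censored search, eqs. (41)-(42)
                if x[t - 1:t - 1 + ell] == target:
                    samples.append(math.log(t) / ell)
                    break
    h_est, _ = censored_normal_estimate(fwd, m, h_c)
    h_rev, _ = censored_normal_estimate(rev, m, h_c)
    return h_est, h_rev - h_est                  # block entropy and eq. (53)
```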
C.   Hitting time

   The hitting-time estimator requires a set of sample words that should be drawn at random from the process generating the observed trajectory x. Although we do not know the law of the process, we can still avoid this problem if the set of sample words is obtained by choosing ℓ-words at random from another observed trajectory. However, this is the very same method we used for collecting the sample words for the waiting-time estimator. Thus, from the statistical point of view, the hitting-time and waiting-time methods can be regarded as the same method.

IV.   ESTIMATION TESTS

   Now we will implement the methods defined above for estimating the block entropy and entropy production rates. First of all, we will perform numerical simulations in order to implement control test statistics, which will be compared with the numerical experiments using our methods.

   In Section III we established two methods for estimating block entropies by using either the return-time statistics or the waiting-time statistics. These methods assume that we only have a single “trajectory” or, better said, symbolic sequence, obtained by making an observation of real life. Our purpose here is to test the estimators themselves, and not the sampling methods. The latter means that we will implement the estimators (20) and (21) for both the return-time and the waiting-time statistics, without referring to the sampling schemes mentioned in Section III. This is possible because we have access to an unlimited number of sequences, which are produced numerically with a three-state Markov chain. In this sense we have control of all of the parameters involved in the estimators, namely the length ℓ of the block, the entropy threshold h_c (by which the recurrence-time samples are censored) and the sampling size |H_ℓ|. After that, we will implement the estimation method described in Section III using a single sequence obtained from the Markov chain defined below. The latter is a numerical experiment done to emulate an observation of real life, where the accessible sample symbolic sequences are rather limited.

A.   Finite-state Markov chain

   For numerical purposes we consider a Markov chain whose set of states is defined as A = {0, 1, 2}. The corresponding
A.   Finite-state Markov chain

   For numerical purposes we consider a Markov chain whose set of states is defined as A = {0, 1, 2}. The corresponding stochastic matrix P : A × A → [0, 1] is given by

                  (  0      q     1−q )
            P  =  ( 1−q     0      q  ),                              (54)
                  (  q     1−q     0  )

where q is a parameter such that q ∈ [0, 1]. It is easy to see that this matrix is doubly stochastic and that the unique invariant probability vector π = πP is given by π = (1/3, 1/3, 1/3). Moreover, it is easy to compute the entropy rate and the time-reversed entropy rate; indeed, they are given by

            h(q) = −q log(q) − (1 − q) log(1 − q),                    (55)
            hR(q) = −(1 − q) log(q) − q log(1 − q).                   (56)

Additionally, the corresponding entropy production rate is given by

            ep(q) = (2q − 1) log( q/(1 − q) ).                        (57)

The behavior of the entropy rate and the entropy production rate can be observed in Figure 4. We will use this model to generate symbolic sequences in order to test the estimators.

[Figure 4; panel (a) plots h and hR (direct and reversed) versus q, panel (b) plots ep versus q.]
FIG. 4. Entropy rate and entropy production rate. (a) We show the behavior of the entropy rate h and the reversed entropy rate hR as functions of the parameter q, using the exact formulas given in eqs. (55) and (56). (b) We display the behavior of the entropy production rate as a function of q, using the exact formula (57).
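For reference, the exact rates (55)–(57) are immediate to evaluate; a minimal sketch (ours), which also makes explicit that ep(q) = hR(q) − h(q):

```python
import numpy as np

def exact_rates(q):
    """Exact entropy rate, time-reversed entropy rate and entropy
    production rate of the three-state chain, eqs. (55)-(57)."""
    h = -q * np.log(q) - (1 - q) * np.log(1 - q)        # eq. (55)
    h_rev = -(1 - q) * np.log(q) - q * np.log(1 - q)    # eq. (56)
    ep = (2 * q - 1) * np.log(q / (1 - q))              # eq. (57), = h_rev - h
    return h, h_rev, ep
```

For q = 0.6 this gives h ≈ 0.6730 and ep ≈ 0.0811, consistent with the estimates reported in Tables I–V below.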
B.   Statistical features of estimators for censored data

   The first numerical experiment we perform is intended to show the statistical properties of the estimators without implementing the sampling schemes introduced above. To this end, we produce a censored sample set of 5 × 10⁴ return times obtained from several realizations of the three-state Markov chain. We obtain each of those return times as follows. First we initialize the Markov chain at the stationary state (i.e., we choose the first symbol at random using the stationary vector of the chain) and we let the chain evolve. This procedure generates a sequence that grows in time, say a1, a2, ..., at. The evolution of the Markov chain is stopped at time t either when the first ℓ-word a1, a2, ..., aℓ appears again, that is, when at−ℓ+1, at−ℓ+2, ..., at = a1, a2, ..., aℓ, or when the time t − ℓ + 1 exceeds a given bound Tc. Then the corresponding return time (for the ith realization) is either ρi := t − ℓ + 1 or an undefined value ρi > Tc.
   Once we have collected the sample set of return times {ρi}, we obtain a set of block entropy rates by means of the equation

            hi = log(ρi)/ℓ,                                           (58)

whenever ρi is numerically defined. Of course, we might obtain some numerically undefined sample block entropies hi > hc due to the censored return times.
   Analogously, we obtain a sample set of reversed entropy rates. That is, we let the Markov chain evolve and stop its evolution at the time t at which the first ℓ-word a1, a2, ..., aℓ appears reversed in the realization, i.e., when at−ℓ+1, at−ℓ+2, ..., at = aℓ, aℓ−1, ..., a1, or at which the time t − ℓ + 1 exceeds the given upper bound Tc. The reversed return time for realization i is ρi = t − ℓ + 1, or it is numerically undefined if t − ℓ + 1 > Tc. Then we obtain the sample set {hi} by means of the equation hi = log(ρi)/ℓ.
   Notice that this procedure involves two parameters that can vary freely: the block length ℓ and hc (or, equivalently, Tc), where hc is an upper bound for the possibly observed block entropy rates and thus censors the corresponding sample set.
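The sketch below (our own construction, not the authors' code) draws one censored return-time sample per call; the censoring time is expressed through hc as Tc = e^(hc·ℓ), so that hi ≤ hc exactly when ρi ≤ Tc. The reversed-return-time sampler is identical except that it matches the reversed word.

```python
import numpy as np

rng = np.random.default_rng(1)

def P_matrix(q):
    """Transition matrix of eq. (54)."""
    return np.array([[0.0,     q, 1.0 - q],
                     [1.0 - q, 0.0,     q],
                     [q, 1.0 - q,     0.0]])

def censored_return_time(q, ell, h_c):
    """One (possibly censored) return-time sample of the initial ell-word.

    Returns rho = t - ell + 1, or None when rho would exceed the censoring
    time T_c = exp(h_c * ell), i.e. when h_i would exceed h_c.
    """
    P = P_matrix(q)
    T_c = np.exp(h_c * ell)
    seq = [int(rng.integers(3))]           # stationary start: pi = (1/3,1/3,1/3)
    while len(seq) < ell:
        seq.append(int(rng.choice(3, p=P[seq[-1]])))
    word, t = tuple(seq), ell
    while True:
        seq.append(int(rng.choice(3, p=P[seq[-1]])))
        t += 1
        if t - ell + 1 > T_c:              # censored sample
            return None
        if tuple(seq[-ell:]) == word:      # initial word recurred at time t
            return t - ell + 1

# block-entropy samples h_i = log(rho_i)/ell, eq. (58)
samples = [censored_return_time(0.6, 10, 0.84) for _ in range(1000)]
h = [np.log(r) / 10 for r in samples if r is not None]
```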
   Then we analyze statistically the sample sets of block entropy rates and reversed block entropy rates for several values of the free parameters. In Figure 5 we show the histogram of the relative frequencies of the block entropy rate for ℓ = 10, q = 0.60 and several values of hc. Correspondingly, in Figure 6 we show the histogram of the relative frequencies of the reversed block entropy rate for ℓ = 10, q = 0.60 and several values of hc.

[Figure 5; six panels of log(f) versus h.]
FIG. 5. Return-time entropy density for q = 0.60 and ℓ = 10. (a) hc = 0.48, (b) hc = 0.57, (c) hc = 0.66, (d) hc = 0.75, (e) hc = 0.84, (f) hc = 1.02.
   We can appreciate how the density of the block entropy rate is censored while ℓ is kept fixed. If the value of hc is small, the return time is numerically undefined for most of the samples, because the samples are censored from above. This is seen, for instance, in Figure 5a, in which hc takes the smallest value among the displayed graphs. In this case only approximately 25% of the samples are numerically well-defined, resulting in the “partial” histogram displayed in Figure 5a. In Figure 5b the value of hc is increased, causing the histogram to “grow”. In the remaining graphs, from Figure 5c to Figure 5f, this tendency is clear: as we increase the value of hc, the number of numerically defined samples grows, thus gradually completing the corresponding histogram. Something similar occurs for the reversed block entropy shown in Figure 6.

[Figure 6; six panels of log(f) versus h.]
FIG. 6. Reversed return-time entropy density for q = 0.60 and ℓ = 10. (a) hc = 0.48, (b) hc = 0.57, (c) hc = 0.66, (d) hc = 0.75, (e) hc = 0.84, (f) hc = 1.02.

   On the other hand, if we keep hc constant and vary the block length ℓ, we can appreciate the evolution of the histogram towards a normal-like distribution. We show this effect in Figure 7 for fixed q = 0.60 and hc = 1.155. This is in agreement with the central limit theorem, as we have mentioned in previous sections. In Figure 7 we show the histograms for ℓ = 6, 9, 12, 15, 18 and 19 (panels (a)–(f), respectively). We observe that for the lowest value of ℓ the histogram is rather irregular, which means that the central limit theorem is still not well manifested for the block entropy rate. We can also observe that, as the block length increases, the histograms progressively evolve towards a bell-shaped distribution, reminiscent of the normal one. This shows that an estimation using our approach could be more accurate for large values of the block length, due to the central limit theorem.

[Figure 7; six panels of f versus h.]
FIG. 7. Entropy estimated by means of the return-time statistics for the three-state Markov chain. We show the histograms of the estimated entropy density for q = 0.60, hc = 1.155 and (a) ℓ = 6, (b) ℓ = 9, (c) ℓ = 12, (d) ℓ = 15, (e) ℓ = 18, (f) ℓ = 19. We obtained the corresponding histograms using 5 × 10⁴ sample words in each case.

   Once we have the sample set of block entropy rates, we use the estimation procedure for censored data as described in Section III. We perform this procedure for the entropy rates and reversed entropy rates obtained from the return-time and the waiting-time statistics.
   In Figure 8 we show the estimation of the block entropy rate and the reversed block entropy rate using the return-time statistics. In Figure 8a, the displayed curves (solid black lines) show the behavior of the estimated block entropy rate as a function of the censoring bound hc for several values of ℓ. This figure exhibits two important features of our estimation technique. First, the estimation of the entropy rate has large fluctuations for small hc: the smaller hc, the larger the observed statistical errors, as expected. Second, the larger ℓ, the better the estimation. The latter can be inferred from the fact that the curve with the largest value of ℓ in Figure 8a is closest to the exact entropy rate (solid red line). A similar behavior occurs for the reversed block entropy rate estimations shown in Figure 8b.

[Figure 8; panels (a) ĥ and (b) ĥR versus hc.]
FIG. 8. Return-time entropy estimations as a function of hc for several values of ℓ. Panel (a): behavior of ĥ as we increase the entropy threshold hc, for ℓ = 6 (filled squares), ℓ = 9 (filled triangles), ℓ = 12 (filled circles), ℓ = 15 (X's), ℓ = 18 (stars) and ℓ = 19 (plus signs). Panel (b): behavior of the estimated reversed entropy for the same parameter values used in panel (a).

   For the waiting-time statistics an analogous behavior occurs. In Figure 9 we show the curves for the estimations of the block entropy rate, in panel (a), and the reversed block entropy rate, in panel (b). As expected, the estimations for small values of the censoring bound hc have large fluctuations, which gradually decrease as hc is increased. This is clearly observed in Figure 9 because the black solid lines deviate largely from the exact value (solid red line) for small values of hc. Concerning the value of ℓ, it is clear that for the largest value of ℓ the estimation is closest to the exact entropy rate for hc large enough (see the insets in Figure 9).
   All these observations allow us to state that, in order to obtain the best estimations (as far as possible within the present
scheme), we should keep hc as large as possible. Similarly, in order to ensure the validity of the central limit theorem, we should take the block length ℓ as large as possible.

[Figure 9; panels (a) ĥ and (b) ĥR versus hc.]
FIG. 9. Waiting-time entropy estimations as a function of hc for several values of ℓ. Panel (a): behavior of ĥ as we increase the entropy threshold hc, for ℓ = 6 (filled squares), ℓ = 9 (filled triangles), ℓ = 12 (filled circles), ℓ = 15 (X's), ℓ = 18 (stars) and ℓ = 19 (plus signs). Panel (b): behavior of the estimated reversed entropy for the same parameter values used in panel (a).

   Now we turn our attention to the implementation of the estimations of the block entropy rate using the sampling schemes described in Section III. For this purpose, we first generate a single sequence of N = 12 × 10⁶ symbols by means of the three-state Markov chain. Then we implement the sampling schemes for the return-time and the waiting-time statistics. In each case we collect m = 5 × 10⁴ sample words, which correspond to m = 5 × 10⁴ samples of block entropy rates and reversed block entropy rates. These sample sets contain both numerically defined and undefined samples, the latter being due to the censoring. In this case, the censoring bound for the entropy rate hc is determined by

            hc = log(N/2)/ℓ.                                          (59)

   We should emphasize that in the present case we have control over only a single parameter, which we take to be the block length ℓ. Contrary to the numerical experiments exposed above, in this case hc is no longer a free parameter; it is actually determined by the length N of the symbolic sequence and the block length ℓ. Consequently, changes in the value of ℓ imply changes in the value of hc. The latter is important for two reasons. On the one hand, in order to ensure the validity of the central limit theorem, we should take ℓ as large as possible (actually, the true entropy rate is obtained in the limit ℓ → ∞). On the other hand, it is desirable to have as many non-censored samples as possible, i.e., it is convenient for hc to be as large as possible. However, in practice we cannot comply with both requirements at once, because of expression (59): the larger ℓ, the smaller hc, whenever the length N of the symbolic sequence is kept constant (which is commonly the case for real-world observed data).
   An important consequence of the latter is that we cannot make ℓ as large as we want. Actually, the maximal block length that can be used for entropy estimations is determined by the accuracy we would like to obtain. That is, for a large ℓ we have a small censoring upper bound, implying that only a few samples of block entropy rates are numerically well-defined. This entails a loss of accuracy since, the fewer the numerically well-defined samples, the larger the variance of the estimators becomes. This phenomenon can be observed in Figure 10 for several values of the parameter q of the three-state Markov chain defined in Section IV A.

[Figure 10; four panels of ĥ and ĥR versus the block length.]
FIG. 10. Estimation of block entropy rate as a function of ℓ. Black lines stand for the estimated block entropy rate and red lines are the exact entropy rate. We show the curves corresponding to the Markov chain parameter q = 0.50 (solid lines), q = 0.60 (dotted lines), q = 0.70 (dashed lines), q = 0.80 (dotted–dashed lines) and q = 0.90 (double-dotted–dashed lines). (a) Block entropy rate estimations using the return-time statistics. (b) Same as in (a) using waiting-time statistics. (c) Reversed block entropy rate estimations using the return-time statistics. (d) Same as in (c) using waiting-time statistics.

   In Figure 10a we show the estimation of the block entropy rate as a function of ℓ using the return-time statistics. The red lines show the exact value of the entropy rate obtained with eq. (55), while the black lines correspond to the estimations of the block entropy rate using the return-time statistics under the sampling scheme described in Section III A. Figure 10b shows the same as Figure 10a but using the waiting-time statistics. Figures 10c and 10d show the corresponding curves for the reversed block entropy rate for the return-time and the waiting-time statistics, respectively.

TABLE I. Block entropy estimations using the return-time statistics.
  q     ℓ∗      p̂          ĥ           ∆ĥ          ∆ĥ/h
 0.5    22    0.6066    0.692325    0.000822    0.001187
 0.6    23    0.5488    0.669906    0.003106    0.004636
 0.7    25    0.5718    0.607702    0.003162    0.005203
 0.8    30    0.5880    0.496892    0.003510    0.007064
 0.9    30    0.9250    0.328234    0.003151    0.009600
TABLE II. Block entropy estimations using the waiting-time statistics.
  q     ℓ∗      p̂          ĥ           ∆ĥ          ∆ĥ/h
 0.5    22    0.6096    0.691518    0.001630    0.002357
 0.6    23    0.5460    0.670477    0.002535    0.003781
 0.7    26    0.4590    0.609711    0.001153    0.001891
 0.8    30    0.5926    0.496028    0.004374    0.008818
 0.9    30    0.9242    0.333756    0.008673    0.025986

TABLE III. Time-reversed block entropy estimations using the return-time statistics.
  q     ℓ∗      p̂          ĥR          ∆ĥR         ∆ĥR/hR
 0.5    22    0.6104    0.691449    0.001698    0.002456
 0.6    21    0.4620    0.751255    0.002850    0.003794
 0.7    17    0.4504    0.933973    0.015810    0.016928
 0.8    12    0.5586    1.272741    0.059438    0.046701
 0.9     8    0.4642    1.981793    0.101070    0.050999

   Observe that all the curves of the estimated block entropy rate share a common behavior that we anticipated above: there is a special value of ℓ for which the estimation seems to be optimal, while for smaller and larger values of ℓ the estimated entropy deviates visibly from the exact value. This phenomenon is produced because the estimators become better as ℓ increases, but at the same time the number of numerically well-defined samples decreases due to the censoring. A criterion for obtaining an optimal ℓ∗ might not be unique, so here we use a simple one. First of all, once the value of ℓ is chosen, the censoring entropy rate is fixed according to eq. (59). This bound, in turn, determines the number of numerically well-defined samples: the smaller hc, the lower the number k of numerically well-defined samples we have. Due to relationship (59) we can also say that the larger ℓ, the lower the number k of numerically well-defined samples. A simple way to optimize this interplay between ℓ and k = k(ℓ) is to take the block length ℓ∗ for which k(ℓ∗) is as close as possible to half of the sample size m, as sketched below.
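A compact rendering of this criterion (our own; `count_uncensored` is a hypothetical helper that runs the sampling scheme of Section III for a given block length and returns k(ℓ)):

```python
def optimal_block_length(count_uncensored, lengths, m):
    """Pick l* whose number of uncensored samples k(l) is closest to m/2."""
    return min(lengths, key=lambda ell: abs(count_uncensored(ell) - m / 2))

# Once l* is chosen, the censoring bound follows from eq. (59):
#   h_c = log(N / 2) / l_star
```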
                                                                     one function). Thus, it becomes clear that the larger entropy
   Using this criterion we compute the optimal block length ℓ∗, and the corresponding estimated value of the entropy rate, for several values of the parameter q of the Markov chain. In Tables I and II we show the estimated entropy rate ĥ and the optimal value ℓ∗ for q = 0.50, q = 0.60, q = 0.70, q = 0.80 and q = 0.90, for the return-time and the waiting-time statistics respectively. We also show a comparison of the estimated block entropy rate with the corresponding exact values. We can appreciate from these tables that the relative error ∆ĥ/h (the relative difference between the estimation and the exact value) is lower than 0.06. Moreover, for q = 0.50 and q = 0.60 the relative errors are even less than 1%. In Tables III and IV we show the estimations of the reversed entropy rate, and the corresponding optimal ℓ∗, for the return-time and the waiting-time statistics respectively.

TABLE IV. Time-reversed block entropy estimations using the waiting-time statistics.
  q     ℓ∗      p̂          ĥR          ∆ĥR         ∆ĥR/hR
 0.5    22    0.6094    0.692784    0.000363    0.000524
 0.6    21    0.4612    0.751243    0.002861    0.003808
 0.7    16    0.6494    0.938406    0.011377    0.012124
 0.8    12    0.5286    1.285067    0.047112    0.036661
 0.9     8    0.4472    1.999906    0.082957    0.041480

   In Figure 11 we show both the block entropy rate (panel a) and the reversed block entropy rate (panel b) as functions of the parameter q. In that figure, the estimations corresponding to the return-time and the waiting-time statistics are compared with the exact values. We observe that the return-time and waiting-time statistics have approximately the same accuracy. From Figure 11 we can also see an interesting behavior of the estimation: the larger the entropy rate, the larger the deviation from the exact result. This effect can actually be explained as follows. First, we should bear in mind that the return time and the waiting time can be interpreted as measures of the recurrence properties of the system. Specifically, the entropy rate itself can, in some way, be interpreted as a measure of the recurrence time per unit length of the word (this is a consequence of the fact that the logarithm is a one-to-one function). Thus, it becomes clear that the larger the entropy rate, the larger the recurrence times in the system. Since all the samples are censored from above, it should be clear that a system having larger recurrence times will have larger errors in the estimations. Therefore we may say that a system with a large entropy rate will exhibit large statistical errors in its estimations. Despite this effect, we observe in Figure 11 that the errors in the estimations are sufficiently small for practical applications.

TABLE V. Entropy production estimations from return- and waiting-time statistics.
  q       êp (return)   ∆ep (return)   êp (waiting)   ∆ep (waiting)
 0.50     −0.000876      0.000876       0.001266       0.001266
 0.60      0.081349      0.000256       0.080766       0.000327
 0.70      0.326271      0.012648       0.328695       0.010224
 0.80      0.775849      0.055928       0.789039       0.042738
 0.90      1.653559      0.104221       1.666150       0.091630

   Finally, we show in Table V the entropy production rate of the system, obtained by taking the difference between the block entropy rate and the reversed block entropy rate, for both the return-time and the waiting-time statistics. It is important to remark that these recurrence statistics are consistent with each other, having moderate deviations (statistical errors) when compared with the exact values.
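The entries of Table V follow directly from the preceding tables; as a one-line sketch (ours):

```python
def entropy_production_estimate(h_est, h_rev_est):
    """Entropy production rate as the difference between the reversed and
    the direct block entropy-rate estimates (cf. Table V)."""
    return h_rev_est - h_est

# e.g. q = 0.5, return-time statistics (Tables I and III):
# entropy_production_estimate(0.692325, 0.691449)  ->  -0.000876
```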
[Figure 11; panels (a) h and (b) hR versus q.]
FIG. 11. Estimation of the block entropy rate and the reversed block entropy rate as functions of q. (a) We show the block entropy rate estimated from the return-time statistics (black filled circles) and the waiting-time statistics (red filled squares). We also show the corresponding exact values of the entropy rate (black solid line) of the system for comparison. (b) Same as in panel (a), but for the reversed entropy rate.

[Figure 12; panels (a) Lorenz-like, (b) logistic and (c) Manneville-Pomeau; entropy rate versus Tc.]
FIG. 12. Estimation of the block entropy rate as a function of the censoring time Tc for several chaotic maps. In all these numerical experiments we obtained a symbolic sequence 8 × 10⁶ symbols long. We obtained a sample set of 2 × 10⁴ words following the waiting-time sampling scheme. We fix the censoring time Tc and compute the corresponding estimation of the entropy rate, repeating the estimation for several values of Tc. Solid lines represent the waiting-time estimations and dashed lines represent the estimation, reported in Ref. 39, of the Lyapunov exponent for (a) the Lorenz-like map, (b) the logistic map (a = 3.8) and (c) the Manneville-Pomeau map (z = 31/16).
V.   EXAMPLES

   In this section we apply our methodology for estimating the entropy rate to some well-studied systems for which either the entropy rate or the (positive) Lyapunov exponents are known. We also include a couple of examples of estimating the entropy production rate, in order to show the performance of our estimator in the analysis of the time-reversibility (or time-irreversibility) of a process from a finite time series.

A.   Entropy rate for chaotic maps

   For one-dimensional chaotic maps, a theorem of Hofbauer and Raith38 allows one to compute the entropy rate by means of the Lyapunov exponent and the fractal dimension of the corresponding invariant measure. We use the results reported in Ref. 39 for the entropy rate estimated from the Lyapunov exponents as reference values. We test our methodology on three chaotic maps: a Lorenz-like transformation, the logistic map and the Manneville-Pomeau map.
   The first chaotic map we use to exemplify our estimator for the entropy rate is a Lorenz-like map L : [0, 1] → [0, 1] defined as

            L(x) := 1 − ((3 − 6x)/4)^(3/4)   if 0 ≤ x < 1/2,
                    ((6x − 3)/4)^(3/4)       if 1/2 ≤ x ≤ 1.          (60)

It is clear that this map has a generating partition defined by {[0, 1/2), [1/2, 1]}, allowing a direct symbolization of the time series.
   The second chaotic map we use is the well-known family of logistic maps, defined as

            Ka(x) := ax(1 − x).                                       (61)

We take a = 3.6, a = 3.8 and a = 4, corresponding to the entropy rates (estimated from Lyapunov exponents) reported in Ref. 39. As in the case of the Lorenz-like map, the generating partition is given by {[0, 1/2), [1/2, 1]}, which is the one we use for the symbolization of the time series.
   Finally, we test our method on the Manneville-Pomeau maps, defined as

            Mz(x) := x + x^z (mod 1),                                 (62)

which is a family of chaotic maps exhibiting dynamics with long-range correlations. This family is parametrized by z ∈ ℝ⁺. We concentrate on parameter values within the interval 1 < z < 2, for which the map admits a unique absolutely continuous invariant measure39. Additionally, for such parameter values the dynamics has a power-law decay of correlations. We use the parameter values z1 = 3/2, z2 = 7/4, z3 = 15/8, z4 = 31/16, z5 = 63/32, and z6 = 127/64. In
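A short sketch (ours) of the three maps and of the binary symbolization through the generating partition {[0, 1/2), [1/2, 1]}:

```python
import numpy as np

def lorenz_like(x):
    """Lorenz-like map, eq. (60)."""
    if x < 0.5:
        return 1.0 - ((3.0 - 6.0 * x) / 4.0) ** 0.75
    return ((6.0 * x - 3.0) / 4.0) ** 0.75

def logistic(x, a):
    """Logistic map, eq. (61)."""
    return a * x * (1.0 - x)

def manneville_pomeau(x, z):
    """Manneville-Pomeau map, eq. (62)."""
    return (x + x ** z) % 1.0

def symbolic_series(f, x0, n):
    """Iterate the map f and emit 0 for x in [0, 1/2), 1 for x in [1/2, 1]."""
    x, symbols = x0, np.empty(n, dtype=np.int8)
    for i in range(n):
        symbols[i] = 0 if x < 0.5 else 1
        x = f(x)
    return symbols

# e.g. a binary sequence of 8 x 10^6 symbols from the logistic map at a = 4:
# s = symbolic_series(lambda x: logistic(x, 4.0), 0.1234, 8_000_000)
```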