1. Introduction
Complexity is an important property of complex systems such as living organisms, the Internet, and traffic networks. Measuring system complexity has long been of great interest in many research fields. Since complexity remains elusive to define, several approximate metrics have been used to quantify it. One widely used measure is entropy, which quantifies irregularity or randomness. Complexity and entropy, however, diverge once complexity reaches its peak: before the peak, complexity increases with entropy, but after the peak, complexity decreases as entropy keeps increasing. To provide an approximate solution to this dilemma, many empirical measures have been proposed. A popular one is the multi-scale entropy (MSE) proposed by Costa et al. [1]. MSE is based on Sample entropy (SampEn) [2,3], which is an extension of the well-known Approximate entropy (ApEn) [3,4] after removing the bias induced by self-matching. SampEn has gained popularity in many applications such as neurophysiological data analysis [5] and functional MRI data analysis [6,7] because of its relative insensitivity to data length [2,8]. Because complex signals often present self-similarity when observed at different time scales, Costa et al. first applied SampEn to the same signal at different time scales after coarse-graining. When applied to Gaussian noise and 1/f noise, the SampEn of Gaussian noise was observed to decrease with the subsampling scale, whereas it stayed at roughly the same level across most scales for a 1/f process. Since a 1/f process is known to have higher complexity (in the sense of stronger self-similarity) than Gaussian noise, the diverging MSE curves of 1/f noise and Gaussian noise appear to support MSE as an approximate measure of system complexity. Since its introduction, MSE has been widely used in many different applications, as reflected by thousands of paper citations [1,9]. While MSE and its variants have been shown to be effective for differentiating system states in simulations and real data, the algorithm introduces a bias by using the same threshold for identifying repeated patterns at all time scales. Nikulin and Brismar [10] first observed that MSE does not purely measure entropy but rather a mixture of entropy and variance across scales. We claim here that the changing variance captured by MSE is mainly caused by incomplete scaling during the coarse-graining process, and that the resulting variance-induced entropy change should be treated as a systematic bias to be removed.
The rest of this report is organized as follows.
Section 2 provides the background. To trace the evolution of the family of entropy measures, we introduce Shannon entropy, ApEn, SampEn, and MSE.
Section 3 describes the bias caused by the coarse-graining process and the one-threshold-for-all-scales MSE algorithm, and provides both a mathematical solution and a practical solution for correcting the bias.
Section 5 concludes the paper.
2. Entropy and MSE
This section provides a brief history of the evolution of entropy and approximate-entropy measures.
Hartley and Nyquist first used the logarithm to quantify information [11,12]. Shannon then proposed the concept of Shannon entropy as a measure of information through the sum of logarithmically weighted probabilities [13]. Denoting a discrete random variable by $X$ and the probability of its $i$-th outcome by $p(x_i)$, the Shannon entropy of $X$ is formulated as:

$$H(X) = -\sum_i p(x_i)\,\ln p(x_i).$$

In an analogous manner, Shannon defined the entropy of a continuous distribution with probability density function (pdf) $f(x)$ by:

$$H(X) = -\int_{-\infty}^{\infty} f(x)\,\ln f(x)\,dx = -E\left[\ln f(X)\right],$$

where $E$ represents the expectation operator. Without loss of generality, in this paper we use natural logarithms to calculate entropy. When entropy is calculated with a logarithm to base $b$, it can be obtained as $H_b(X) = H(X)/\ln b$.
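To make the discrete form concrete, the following sketch (our own illustrative code; function and variable names are not from any cited work) evaluates Shannon entropy from a given probability vector and converts between logarithm bases:

```python
import numpy as np

def shannon_entropy(probs, base=np.e):
    """Shannon entropy of a discrete distribution given as a probability vector."""
    p = np.asarray(probs, dtype=float)
    p = p[p > 0]                       # terms with zero probability contribute nothing
    h = -np.sum(p * np.log(p))         # entropy in nats (natural logarithm)
    return h / np.log(base)            # convert to the requested base

# A fair coin: 1 bit of entropy, or ln(2) ~= 0.693 nats.
print(shannon_entropy([0.5, 0.5], base=2))   # -> 1.0
print(shannon_entropy([0.5, 0.5]))           # -> 0.693...
```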
Shannon entropy was then extended into the Kolmogorov–Sinai (K-S) entropy [14] for characterizing a dynamical system. Assume that the $F$-dimensional phase space is partitioned into a collection of cells of size $\varepsilon^F$ and that the state of the system is measured at constant time intervals $\delta$. Let $p(k_1, k_2, \ldots, k_n)$ be the joint probability that the state of the system is in cell $k_1$ at $t = \delta$, in cell $k_2$ at $t = 2\delta$, ..., and in cell $k_n$ at $t = n\delta$. The K-S entropy is defined as

$$H_{KS} = -\lim_{\delta \to 0}\,\lim_{\varepsilon \to 0}\,\lim_{n \to \infty} \frac{1}{n\delta} \sum_{k_1,\ldots,k_n} p(k_1,\ldots,k_n)\,\ln p(k_1,\ldots,k_n).$$
K-S entropy depends on several parameters and is not easy to estimate. To address this problem, Grassberger and Procaccia [15] proposed the $K_2$ entropy as a lower bound of the K-S entropy. Given a time series $\{x(1), x(2), \ldots, x(N)\}$ with length $N$, define a sequence of $m$-dimensional vectors $X_i^m = (x(i), x(i+1), \ldots, x(i+m-1))$, $1 \le i \le N-m+1$. The $m$-dependent correlation functions are

$$C_i^m(r) = \frac{1}{N-m+1} \sum_{j=1}^{N-m+1} \Theta\big(r - \|X_i^m - X_j^m\|\big)$$

and

$$C^m(r) = \frac{1}{N-m+1} \sum_{i=1}^{N-m+1} C_i^m(r),$$

where $\|\cdot\|$ is the Euclidean metric and $\Theta$ is the Heaviside step function. The $K_2$ entropy is defined as

$$K_2 = \lim_{r \to 0}\,\lim_{m \to \infty}\,\lim_{N \to \infty} \ln \frac{C^m(r)}{C^{m+1}(r)}.$$
By incorporating the embedding-vector-based phase-space reconstruction idea proposed by Takens [16] and replacing the Euclidean metric with the Chebyshev metric $d[X_i^m, X_j^m] = \max_{0 \le k \le m-1} |x(i+k) - x(j+k)|$, Eckmann and Ruelle [17] proposed an estimate of the K-S entropy through the so-called E-R entropy:

$$H_{ER} = \lim_{r \to 0}\,\lim_{m \to \infty}\,\lim_{N \to \infty} \big[\Phi^m(r) - \Phi^{m+1}(r)\big], \qquad \Phi^m(r) = \frac{1}{N-m+1} \sum_{i=1}^{N-m+1} \ln C_i^m(r),$$

where the delay between successive samples in the embedding vectors is often set to 1.
The E-R entropy has been useful for classifying low-dimensional chaotic systems, but it becomes infinite for a process with superimposed noise of any magnitude [18]. Pincus [4] then extended the E-R entropy into the now well-known ApEn, which depends on a given embedding window length $m$ and a distance cutoff $r$ for the Heaviside function:

$$C_i^m(r) = \frac{1}{N-m+1} \sum_{j=1}^{N-m+1} \Theta\big(r - d[X_i^m, X_j^m]\big)$$

and

$$ApEn(m, r, N) = \Phi^m(r) - \Phi^{m+1}(r).$$

SampEn was proposed by Richman and Moorman [19] as an extension of ApEn that avoids the bias induced by counting the self-match of each embedding vector. Specifically, SampEn is formulated as:

$$SampEn(m, r, N) = -\ln \frac{A^m(r)}{B^m(r)},$$

where $B^m(r)$ is the probability that two embedding vectors of length $m$ match within tolerance $r$ (excluding self-matches) and $A^m(r)$ is the corresponding probability for vectors of length $m+1$.
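As an illustration of the counting behind SampEn, the following is a minimal, unoptimized sketch (our own; it uses the Chebyshev distance, excludes self-matches, and, for brevity, counts matches over all available templates rather than restricting the length-$m$ count to the first $N-m$ vectors as in the original formulation):

```python
import numpy as np

def sample_entropy(x, m=2, r=0.2):
    """SampEn(m, r) of a 1-D series x; r is an absolute tolerance."""
    x = np.asarray(x, dtype=float)
    n = len(x)

    def count_matches(dim):
        # All embedding vectors of length `dim`.
        vecs = np.array([x[i:i + dim] for i in range(n - dim + 1)])
        count = 0
        for i in range(len(vecs) - 1):
            # Chebyshev distance to every later vector (self-matches excluded).
            d = np.max(np.abs(vecs[i + 1:] - vecs[i]), axis=1)
            count += np.sum(d <= r)
        return count

    b = count_matches(m)        # template pairs matching for m points
    a = count_matches(m + 1)    # template pairs also matching for m + 1 points
    if a == 0 or b == 0:
        return np.inf           # SampEn is undefined when no matches occur
    return -np.log(a / b)
```

In practice SampEn is commonly computed with $m = 2$ and $r$ chosen as a fraction (typically 0.15 to 0.25) of the SD of the series.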
The coarse-graining multi-scale entropy-based complexity measurement can be traced back to the work of Zhang [20] and Fogedby [21]. In [1,22], Costa et al. calculated entropy at each coarse-grained scale using SampEn and named this process MSE. As noted by Nikulin and Brismar [10], a problem of the MSE algorithm is the use of the same matching criterion $r$ for all scales, which introduces a systematic bias into SampEn.
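The standard MSE recipe combines non-overlapping moving-average coarse-graining with a per-scale SampEn evaluation, reusing one threshold derived from the scale-1 SD. A minimal sketch, building on the `sample_entropy` helper above (function names are ours):

```python
import numpy as np

def coarse_grain(x, scale):
    """Non-overlapping moving average with window length `scale`."""
    x = np.asarray(x, dtype=float)
    n = (len(x) // scale) * scale      # drop the tail that does not fill a window
    return x[:n].reshape(-1, scale).mean(axis=1)

def mse(x, max_scale=20, m=2, r_factor=0.15):
    """Standard MSE: a single threshold r = r_factor * SD(x) reused at every scale."""
    r = r_factor * np.std(x)           # fixed for all scales (the source of the bias)
    return [sample_entropy(coarse_grain(x, s), m=m, r=r)
            for s in range(1, max_scale + 1)]
```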
3. The Systematic Bias of Entropy Calculation in MSE
In MSE [1,22], the embedding-vector matching threshold $r$ is defined from the standard deviation of the original signal. Using this single threshold, the entropy of a Gaussian signal decreases with the scale used to downsample the original signal, whereas the entropy of a 1/f signal remains nearly unchanged as the scale increases. As a 1/f signal is known to have high complexity while Gaussian noise has very low complexity, the monotonically decaying MSE trend, or the sum of MSE values across scales, was proposed as a metric for quantifying signal complexity.
However, the moving-average-based coarse-graining process automatically scales down the amplitude of the subsampled signal at each time scale. Without correction, this additional multiplicative scaling propagates into the standard deviation of the signal assessed at each time scale and artificially changes the sample entropy. This bias can be easily seen from the coarse-graining of Gaussian noise.
Denote a Gaussian variable and its observations by $X = \{x_1, x_2, \ldots, x_N\}$, where $N$ indicates the length of the time series. The coarse-graining or moving-averaging process can be described by $y_j^{(s)} = \frac{1}{s} \sum_{i=(j-1)s+1}^{js} x_i$, $1 \le j \le N/s$, where $s$ is the coarse-graining level or the so-called "scale". Given the mutual independence of the individual samples of $X$, the moving average of these samples can be considered an average of independent random variables rather than of observations of a particular random variable. In other words, we can rewrite $y_j^{(s)}$ as $Y^{(s)} = \frac{1}{s} \sum_{i=1}^{s} X_i$, where each $X_i$ is an independent copy of $X$. For Gaussian noise $X$ with mean $\mu$ and standard deviation (SD) $\sigma$, $Y^{(s)}$ will be Gaussian noise too and can be fully characterized by its mean and SD. Through a simple calculation, we get $\mathrm{SD}(Y^{(s)}) = \sigma/\sqrt{s}$, while the mean stays the same. Because $\mathrm{SD}(Y^{(s)})$ monotonically decreases with $s$, if we do not adjust the matching threshold, the number of matched embedding vectors will increase with $s$, resulting in a decreasing SampEn.
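A quick numerical check of this $1/\sqrt{s}$ relationship (an illustrative snippet of our own):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)            # unit-SD Gaussian noise

for s in (1, 2, 5, 10, 20):
    n = (len(x) // s) * s
    y = x[:n].reshape(-1, s).mean(axis=1)   # coarse-graining at scale s
    print(s, round(np.std(y), 3), round(1 / np.sqrt(s), 3))
# The empirical SD of the coarse-grained series closely tracks 1/sqrt(s).
```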
The entropy of a Gaussian-distributed variable can be calculated through Shannon entropy:

$$H(X) = \frac{1}{2} \ln\!\big(2\pi e \sigma^2\big).$$

For simplicity of description, we often normalize the random variable to have a mean of 0 and an SD of 1. Considering the scale-dependent SD derived above, we can then obtain the Shannon entropy of the Gaussian variable at scale $s$ as

$$H(s) = \frac{1}{2} \ln\!\left(\frac{2\pi e}{s}\right) = \frac{1}{2}\ln(2\pi e) - \frac{1}{2}\ln s.$$

This equation clearly demonstrates the nonlinear but monotonically decreasing relationship of entropy with respect to scale $s$.
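As a worked instance of this formula, moving from scale 1 to scale 4 lowers the differential entropy by

$$H(1) - H(4) = \frac{1}{2}\ln(2\pi e) - \left[\frac{1}{2}\ln(2\pi e) - \frac{1}{2}\ln 4\right] = \frac{1}{2}\ln 4 \approx 0.693 \text{ nats},$$

purely as a consequence of the shrinking SD, before any genuine change in signal structure is considered.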
Below, we provide a mathematical derivation of the dependence of MSE on the signal subsampling scale. Given the $m$-dimensional embedding vectors $X_i^m = (y_i^{(s)}, y_{i+1}^{(s)}, \ldots, y_{i+m-1}^{(s)})$, sample entropy can be expressed as [22]

$$SampEn(m, r) = -\ln \frac{\Pr\big(d[X_i^{m+1}, X_j^{m+1}] \le r\big)}{\Pr\big(d[X_i^m, X_j^m] \le r\big)},$$

where $d(\cdot,\cdot)$ is the Chebyshev distance.

For $m = 1$, we have

$$\Pr\big(d[X_i^1, X_j^1] \le r\big) = \Pr\big(|y_i^{(s)} - y_j^{(s)}| \le r\big)$$

and

$$\Pr\big(d[X_i^2, X_j^2] \le r\big) = \Pr\big(|y_i^{(s)} - y_j^{(s)}| \le r,\; |y_{i+1}^{(s)} - y_{j+1}^{(s)}| \le r\big).$$

Thus,

$$SampEn(1, r) = -\ln \frac{\Pr\big(|y_i^{(s)} - y_j^{(s)}| \le r,\; |y_{i+1}^{(s)} - y_{j+1}^{(s)}| \le r\big)}{\Pr\big(|y_i^{(s)} - y_j^{(s)}| \le r\big)}.$$

Based on the iid condition of the coarse-grained samples $y_j^{(s)}$, we can draw the conclusion that

$$SampEn(1, r) = -\ln \Pr\big(|y_i^{(s)} - y_j^{(s)}| \le r\big).$$

If $m = 2$, we get

$$\Pr\big(d[X_i^2, X_j^2] \le r\big) = \Pr\big(|y_i^{(s)} - y_j^{(s)}| \le r\big)\,\Pr\big(|y_{i+1}^{(s)} - y_{j+1}^{(s)}| \le r\big)$$

and

$$\Pr\big(d[X_i^3, X_j^3] \le r\big) = \Pr\big(|y_i^{(s)} - y_j^{(s)}| \le r\big)\,\Pr\big(|y_{i+1}^{(s)} - y_{j+1}^{(s)}| \le r\big)\,\Pr\big(|y_{i+2}^{(s)} - y_{j+2}^{(s)}| \le r\big).$$

Therefore,

$$SampEn(2, r) = -\ln \Pr\big(|y_i^{(s)} - y_j^{(s)}| \le r\big)$$

and, more generally,

$$SampEn(m, r) = -\ln \Pr\big(|y_i^{(s)} - y_j^{(s)}| \le r\big)$$

given the mutual independence of the coarse-grained samples. It should be noted that this conclusion does not require identical distributions; the condition of independence alone is sufficient.
For simplicity of description, we re-denote $y_i^{(s)}$ and $y_j^{(s)}$ by two generic normally distributed but independent random variables $U$ and $V$, whose means are 0 and SDs are 1. Their joint probability density function (PDF) is

$$p(u, v) = \frac{1}{2\pi} \exp\!\left(-\frac{u^2 + v^2}{2}\right)$$

and the matching probability is

$$\Pr\big(|U - V| \le r\big) = \iint_{|u - v| \le r} p(u, v)\, du\, dv.$$

We can then get

$$SampEn(m, r) = -\ln \iint_{|u - v| \le r} p(u, v)\, du\, dv.$$

Similar to the Shannon-entropy calculation, after normalizing the original random variable to have a mean of 0 and an SD of 1, the scale-dependent SD of the coarse-grained signal is $1/\sqrt{s}$. Since $y_i^{(s)}$ and $y_j^{(s)}$ are distributed as $U/\sqrt{s}$ and $V/\sqrt{s}$, the matching condition $|y_i^{(s)} - y_j^{(s)}| \le r$ is equivalent to $|U - V| \le r\sqrt{s}$, and we get

$$SampEn(m, r, s) = -\ln \iint_{|u - v| \le r\sqrt{s}} p(u, v)\, du\, dv.$$

Since the width of the integration band $|u - v| \le r\sqrt{s}$ increases with $s$, the above integral monotonically increases with $s$. Accordingly, the negative-logarithm-based sample entropy will monotonically decrease with $s$. This is consistent with the aforementioned Shannon-entropy-based description of the MSE bias.
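Because $U - V$ is itself Gaussian with variance 2, the double integral has the closed form $\Pr(|U - V| \le r\sqrt{s}) = \operatorname{erf}(r\sqrt{s}/2)$, so the idealized SampEn of coarse-grained Gaussian noise can be evaluated directly. The short check below (our own, assuming this iid Gaussian setting) prints the predicted values for a few scales:

```python
import numpy as np
from scipy.special import erf

r = 0.15  # matching threshold, expressed relative to the scale-1 (unit) SD
for s in (1, 2, 5, 10, 20):
    p_match = erf(r * np.sqrt(s) / 2)   # Pr(|U - V| <= r*sqrt(s)) for U, V iid N(0, 1)
    print(s, round(-np.log(p_match), 3))
# The predicted SampEn decreases monotonically with the scale s, as derived above.
```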
The systematic bias in MSE can be corrected by using a scale-adaptive matching threshold. One approach is to use the threshold $r/\sqrt{s}$ for scale $s$ during the SampEn calculation. This works well for a Gaussian signal but may not be effective for other signals that exhibit extra scale-dependent SD behavior beyond that induced by the subsampling scale. Finding the theoretical scale-dependent SD equation may not be trivial either. Instead, the SD can be directly calculated from the data after each coarse-graining step. This approach was proposed in [10].
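A minimal sketch of the practical correction, assuming the variant in which $r$ is re-derived from the SD of each coarse-grained series (it reuses the `coarse_grain` and `sample_entropy` helpers sketched earlier):

```python
import numpy as np

def mse_corrected(x, max_scale=20, m=2, r_factor=0.15):
    """MSE with a scale-adaptive threshold: r is recomputed from the SD of the
    coarse-grained series at every scale, removing the variance-driven bias."""
    entropies = []
    for s in range(1, max_scale + 1):
        y = coarse_grain(x, s)
        r = r_factor * np.std(y)        # threshold follows the per-scale SD
        entropies.append(sample_entropy(y, m=m, r=r))
    return entropies
```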
To demonstrate the systematic bias of MSE and the effectiveness of the correction method, we used three synthetic time series with known entropy differences: Gaussian noise, 1/f noise, and a random walk. The length of each time series was . MSE with and without the bias correction was computed for each series.
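The three test signals can be generated along the following lines (an illustrative sketch; the series length used here is a placeholder, and the 1/f noise is produced by spectral shaping of white noise, one of several common constructions):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2 ** 14                                        # placeholder length, not the value used in the study

gaussian = rng.standard_normal(n)                  # white Gaussian noise
random_walk = np.cumsum(rng.standard_normal(n))    # integrated white noise

# 1/f (pink) noise by shaping the spectrum of white noise: amplitude ~ 1/sqrt(f).
spectrum = np.fft.rfft(rng.standard_normal(n))
freqs = np.fft.rfftfreq(n)
freqs[0] = freqs[1]                                # avoid division by zero at DC
pink = np.fft.irfft(spectrum / np.sqrt(freqs), n)
pink = (pink - pink.mean()) / pink.std()           # zero mean, unit SD
```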