
Dyncomp Preprint


DYNCOMP 1

dyncomp: an R package for Estimating


the Complexity of Short Time Series
by Tim Kaiser

Abstract As non-linear time series analysis becomes increasingly widespread, measures that can
be applied to short time series with relatively low temporal resolution are in demand. The author
introduces a complexity parameter for time series based on the fluctuation and distribution of values,
together with its R implementation. This parameter is validated on a well-known chaotic dynamic
system, where its validity approaches or even surpasses that of most similar measures. In a further
validation step, time series of daily ratings of anxiety and depression symptoms are used to
demonstrate the utility of the proposed measure.

Introduction
The study of complex systems is a relatively new scientific field that seeks to understand the interaction
of a system’s components and the interaction of a system with its environment. According to Foote
(2007), the term "complex system" is used to describe

[...] phenomena, structures, aggregates, organisms, or problems that share some
common themes: (i) They are inherently complicated or intricate, in that they have
factors such as the number of parameters affecting the system or the rules governing
interactions of components of the system; (ii) they are rarely completely deterministic,
and state parameters or measurement data may only be known in terms of probabilities;
(iii) mathematical models of the system are usually complex and involve nonlinear, ill-
posed, or chaotic behavior; and (iv) the systems are predisposed to unexpected outcomes
(so-called “emergent behavior”).

Because it offers a theoretical framework that cuts across all scientific disciplines, and because
many phenomena approach the behavior of a complex dynamic system, the concept has been applied
to various fields of study. It has proved useful e.g. in understanding economic processes (Misra
et al., 2011) or social networks (Butts, 2001). The application of dynamic systems theory to the human
brain and human behavior also created a promising body of research with strong implications for
neuroscience (Orsucci, 2006) or psychotherapy (Hayes et al., 2007). As implied by the definition,
complex systems are subject to change over time, but the changes occur in a non-stationary, non-linear
fashion. However, non-linear change processes are notoriously hard to predict, so various methods
have been developed to capture indicators of the change of a complex system (Scheffer et al., 2009).
Their main use lies in detecting early signs of so-called "critical transitions", at which a system will
rapidly reorganize into a different state. Every phenomenon that is studied as a complex system is
prone to this behavior (see Scheffer et al., 2012, for an overview). Detecting critical instabilities in
complex systems is a worthwhile endeavor, as it allows timely action to be taken in the face of
catastrophic change. Depending on the desired outcome, actions can be taken to promote change
processes, to reduce their probability, or to prepare for the anticipated change when it
occurs. This is useful on many levels of analysis. Examples include earthquake warnings (Ramos,
2010) or interactions between ecological conditions and societal changes (Caticha et al., 2016). In
medicine, early signs of epileptic seizures (demonstrated by Martinerie et al., 1998) can enable medical
practitioners or patients to take precautions. Early warning signals have also been studied in bipolar
disorder (Glenn et al., 2006), preceding an outbreak of mania or depression by 60 days.
While the diversity of fields that profit from this theoretical approach is high, a major weakness
of most of the proposed methods is their need for relatively large amounts of data and their lack of
validity, especially when only short time series are available. For example, the study of human change
processes in psychology mainly relies on measuring constructs with methods at a low sample rate,
like questionnaires. Additionally, the widespread use of Likert scales that have a limited range of
values (like 1 to 10 or even 1 to 5) produces time series with a very limited distribution. This would
not be a problem if only post-hoc analyses of complexity were conducted, but the monitoring of states
and transitions of complex systems depends on valid methods that can be used in real time. Methods
that allow for the real-time analysis of short time series with low sample rate are scarce. Bandt and
Pompe (2002) were the first to propose a measure that reached acceptable validity for these scenarios:
Permutation Entropy (PE). An R function based on their approach was provided by Sippel et al. (2016).
The proposed package aims to provide researchers with measures that accomplish this
in a reliable and valid manner.

Figure 1: Moving window of analysis. The window, as indicated by solid blue lines, moves to the
right by one value.

Description and basic usage


After reviewing the principles underlying the estimation of complexity in time series, this section
introduces the dyncomp package. dyncomp provides the user with an easy-to-use function that
estimates the complexity of the provided time series by analyzing the values in windows that move
along the vector, as visualized in Figure 1. For each window, an estimate of complexity is calculated. This is
repeated until the window reaches the end of the time series and a vector containing the complexity
estimates in temporal order is returned.
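The moving-window scheme described above can be sketched in a few lines of R; the names moving_window and estimator are illustrative and not part of the package.

```r
# Sketch of a moving-window analysis (illustrative, not the package source).
# 'estimator' is any function that maps a window of values to one number.
moving_window <- function(x, width = 7, estimator) {
  n.windows <- length(x) - width + 1
  out <- numeric(n.windows)
  for (i in seq_len(n.windows)) {
    out[i] <- estimator(x[i:(i + width - 1)])  # window moves right by one value
  }
  out
}

# Example: window-wise standard deviation of a random series
moving_window(rnorm(30), width = 7, estimator = sd)
```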
Usage of the complexity function is as follows:

complexity(x, scaleMin = min(x), scaleMax = max(x), width = 7,
           measure = "complexity", rescale = FALSE)

The x argument can be any numeric vector, but the function will only provide meaningful results
when values are ordered chronologically from oldest to newest. The function will fail if x is not
numeric.
The scaleMax and scaleMin arguments determine the theoretical maximum and minimum of the
vector. If they are not provided, the function will take the observed maximum and minimum.
The width argument determines the size of the moving window, as illustrated in Figure 1.
The measure argument determines if the compound measure of complexity should be returned (the
default), or if one of its components (either "fluctuation" or "distribution") should be returned.
All measures are described in greater detail in the following section.
The rescale argument determines if the returned vector of complexity estimates should be rescaled
using the scale minimum and maximum. This is especially useful if the complexity estimates are to be
plotted in the same graph as the original time series.
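A hypothetical call, assuming the dyncomp package is installed; the rating values and the Likert-style bounds 1 and 10 are illustrative, not taken from the paper.

```r
# Hedged usage sketch: requires the dyncomp package; the data are made up.
library(dyncomp)

ratings <- c(3, 3, 5, 10, 6, 1, 4, 3, 2, 7, 8, 9, 2, 1, 5)

# Compound complexity in moving windows of 7, on a 1-10 scale
comp <- complexity(ratings, scaleMin = 1, scaleMax = 10, width = 7)

# Only the fluctuation component, rescaled for plotting alongside the raw data
fluct <- complexity(ratings, scaleMin = 1, scaleMax = 10, width = 7,
                    measure = "fluctuation", rescale = TRUE)
```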

The dyncomp Measures of Complexity


This measure follows the concept of Schiepek and Strunk (2010), combining a
measure of relative fluctuation intensity and relative deviation of observed values from a hypothetical
uniform distribution in the moving window of analysis. The window size can be chosen freely and
depends on the temporal resolution and length of the time series. It is hypothesized that a measure
that is based on this approach is applicable to short time series that only allow for window sizes of
five to seven points.

Fluctuation

The fluctuation measure used in this function is based upon a well-known measure for fluctuation in
time series: the mean square successive difference (MSSD; Von Neumann et al., 1941). It is defined

as follows:

    MSSD = ( Σ_{i=1}^{n-1} (x_{i+1} − x_i)² ) / (n − 1)
For each time window, the MSSD is calculated and divided by the MSSD for a hypothetical
distribution with "perfect" fluctuation between the scale minimum and maximum value. As a simple
example, let us assume the following vector:

> test <- round(runif(n = 10, 1, 10))
> test
 [1]  3  3  5 10  6  1  4  3  2  7

The maximum possible MSSD for a vector of length 10, a minimum value of 1 and a
maximum value of 10 is calculated (here using the mssd function from the psych package). The
MSSD of the observed values is then divided by this theoretical maximum.

> fmax <- mssd(rep_len(c(1, 10), length.out = 10))
> fmax
[1] 91.125
> fobs <- mssd(test)
> fobs
[1] 13.25
> fobs/fmax
[1] 0.1454047

The fluctuation intensity of this vector is .1454047.
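The steps above can be wrapped into a small self-contained function. The mssd helper below reproduces the denominator implied by the printed values (sum of squared successive differences divided by n − 2 for a vector of length n); this is an assumption inferred from the worked example, not the package source.

```r
# Self-contained sketch of the fluctuation coefficient (names illustrative).
# This mssd reproduces the console outputs shown above; it divides by n - 2.
mssd <- function(x) sum(diff(x)^2) / (length(x) - 2)

fluctuation <- function(x, scaleMin = min(x), scaleMax = max(x)) {
  # "perfect" fluctuation: alternating between scale minimum and maximum
  fmax <- mssd(rep_len(c(scaleMin, scaleMax), length.out = length(x)))
  mssd(x) / fmax
}

test <- c(3, 3, 5, 10, 6, 1, 4, 3, 2, 7)
fluctuation(test, scaleMin = 1, scaleMax = 10)  # 0.1454047
```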

Distribution

While a time series that fluctuates between its extremes would result in a high fluctuation coefficient,
one could hardly argue that chaos can be identified by fluctuation alone, because a perfect fluctuation
between two values is predictable and orderly. If a system destabilizes, it should become open to a
wide variety of possible system states. These states are represented in time series by different values.
Thus, a coefficient for measuring the degree of dispersion is proposed. Its intent is to capture the
irregularity of a time series by comparing the distribution of values in the moving windows with a
distribution that would be observed if all values in the window were uniformly distributed. The
calculation of the distribution parameter will be demonstrated using the same test vector. First, we
generate a hypothetical uniform distribution by building a sequence from the smallest to the highest
value, with a length equal to the width of the analysis window. Then, we calculate the differences
between successive values.

> uniform <- seq(from = 1, to = 10, length.out = 10)
> uniform
 [1]  1  2  3  4  5  6  7  8  9 10
> uni.diff <- diff(uniform, lag = 1)
> uni.diff
[1] 1 1 1 1 1 1 1 1 1

Next, we order the observed values from the smallest to the highest value and calculate the differences
between successive values as well.

> empirical <- sort(test)
> empirical
 [1]  1  2  3  3  3  4  5  6  7 10
> emp.diff <- diff(empirical, lag = 1)
> emp.diff
[1] 1 1 0 0 1 1 1 1 3
Now, the difference between both difference vectors is calculated. A Heaviside step function is applied
to eliminate negative differences. The result is then divided by the difference vector of the uniform
distribution.

> deviation <- uni.diff - emp.diff
> dev.h <- deviation * ((sign(deviation) + 1)/2)
> div.diff <- dev.h / uni.diff
> div.diff
[1] 0 0 1 1 0 0 0 0 0
The distribution coefficient is the mean of this vector.

> mean(div.diff)
[1] 0.2222222
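The same steps can be collected into a single self-contained function; the name distribution is illustrative.

```r
# Sketch of the distribution coefficient, following the worked steps above.
distribution <- function(x, scaleMin = min(x), scaleMax = max(x)) {
  uniform   <- seq(from = scaleMin, to = scaleMax, length.out = length(x))
  uni.diff  <- diff(uniform)                 # steps of the uniform reference
  emp.diff  <- diff(sort(x))                 # steps of the ordered data
  deviation <- uni.diff - emp.diff
  dev.h <- deviation * ((sign(deviation) + 1) / 2)  # Heaviside: drop negatives
  mean(dev.h / uni.diff)
}

test <- c(3, 3, 5, 10, 6, 1, 4, 3, 2, 7)
distribution(test, 1, 10)  # 0.2222222
```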

Table 1: Results of the validation of dyncomp f and d. Correlations of both coefficients with PE
with a word length of 6 and a window size of 100 (h6,100) and the positive Lyapunov exponent (λ) for
different window widths are shown. For comparison, correlations of PE with the Lyapunov exponent
were included in the rightmost column. All correlations are statistically significant, p < 10^-5.

                h6,100              λ
  Window      d       f       d       f      h6
  100       .931   -.955    .932   -.869   .932
  80        .932   -.955    .930   -.868   .928
  60        .935   -.954    .928   -.867   .920
  40        .936   -.952    .915   -.864   .901
  20        .940   -.943    .886   -.853   .842
  10        .925   -.910    .817   -.811   .661
  7         .898   -.879    .769   -.768    n/a
  5         .840   -.857    .688   -.748    n/a

Compound Measure

Fluctuation and distribution values can be combined into a compound measure of overall complexity
by multiplying them. This way, information on both aspects of chaos is contained in one measure.
For our test vector, this would be

0.1454047 × 0.2222222 = 0.03231215
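In R, using the coefficients computed in the previous sections:

```r
# Compound complexity of the test vector (values from the worked examples)
f <- 0.1454047  # fluctuation coefficient
d <- 0.2222222  # distribution coefficient
f * d           # approximately 0.03231215
```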

General Validity and Scalability

Method

In order to generate data comparable with other measures of complexity that were already published,
the validation approach follows the one used by Bandt and Pompe (2002) as well as Schiepek and
Strunk (2010). First, a data set was simulated by iterating the logistic map x[n+1] = r · x[n] · (1 − x[n])
and increasing r continuously in steps of .001 from 3.1 to 4.0 [1]. This resulted in 901 sequences of
100 data points. For each of these sequences, different, well-established, complexity measures were
calculated along with the newly proposed measures:

• The Lyapunov exponent (λ) (Osedelec, 1968; Wolf et al., 1985) serves as a "gold standard" in
this validation approach because it was calculated directly from the logistic map equations.
Correlations with this measure will be interpreted as validity coefficients.
• Fluctuation (f ) and distribution (d) coefficients proposed in this paper were calculated for the
window sizes 5, 7, 10, 20, 40, 60, 80 and 100 [2]. While smaller window sizes are important in
scenarios with short time series, larger window sizes show the potential accuracy and scalability
of the algorithm.
• Permutation Entropy (PE) (Bandt and Pompe, 2002) (h) was calculated using the statcomp R
package (Sippel et al., 2016). This is a well validated and robust measure. PE combines the
available data to so-called words of a given maximum length. PE is then derived from the
frequency distribution of these words. PE values with a word length of 6 were calculated for the
same window sizes as f and d. However, window sizes of 5 or 7 are too small for calculating
PE with a word length of 6, so they were omitted.
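The simulation design above can be sketched as follows; the starting value x0 = 0.5 and the burn-in of 100 iterations are assumptions, since the text does not state them.

```r
# Sketch of the logistic-map simulation (x0 and burn-in are assumed values).
logistic_series <- function(r, n = 100, x0 = 0.5, burn = 100) {
  x <- x0
  for (i in seq_len(burn)) x <- r * x * (1 - x)  # discard transient behavior
  out <- numeric(n)
  for (i in seq_len(n)) {
    x <- r * x * (1 - x)
    out[i] <- x
  }
  out
}

r.values <- seq(from = 3.1, to = 4.0, by = 0.001)  # 901 values of r
series   <- lapply(r.values, logistic_series)      # 901 sequences of 100 points
```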

Results

Correlations between the Lyapunov exponent, PE and fluctuation and distribution generated by
dyncomp were calculated and interpreted as validity coefficients. They are shown in Table 1.
Fluctuation correlated negatively with the other two measures of complexity. This behavior is
expected due to the relatively large fluctuation intensity for small r values in the logistic map, while
[1] R code is provided as an online supplement that includes all calculations presented here, including
simulation of the logistic map.
[2] For windows smaller than 100, only the first values of the sequences were included in the calculation,
e.g. the first 10 values for a window size of 10.



Figure 2: Comparing different complexity estimates. a: simulated logistic map. b: positive Lyapunov
exponents, calculated from the logistic map equation. c to f: dyncomp distribution and fluctuation
coefficients with window sizes 20 and 7. g: PE with a word length of 6 and a window size of 100.

Figure 3: Daily means of affect scores. The gray area marks the "transition phase" identified by Wichers
et al. (2016). Different colours indicate different study phases.

extreme fluctuation diminishes for large r values. The opposite was true for the distribution of possible
values, indicating that the d coefficient indeed validly measures the available range of values that
increases with r. In Figure 2, this phenomenon can be visually identified by noticing the increased
density of the plot for larger r values. All validity coefficients were high (i.e. larger than .60) even for
window sizes as small as 5. For smaller window sizes, validity coefficients were substantially higher
than those of PE.

Case Study: Complexity as an Early Warning Sign in Depression


The general validation of the dyncomp measures relied on an already established method of
validating complexity measures. The measure’s main use, however, lies in its application to shorter
time series from real-world settings. The utility of the proposed measures of complexity will hence be
demonstrated using a data set from the field of psychiatry, used in a study by Wichers et al. (2016)
and made publicly available later (Kossakowski et al., 2017). The data set contains ratings of mood,
physical sensations and social contact of a depressive patient over the course of 239 consecutive days
for up to ten times a day. A case study was conducted during this period, beginning with a pre-trial
baseline assessment of 28 days, followed by a double-blinded phase of antidepressant dose reduction
that lasted 14 weeks. Actual reduction of antidepressant dose started at week 6. At the end of this
period, a significant increase of depressive symptoms was observed, preceded by a brief phase of
transition. This dataset is well suited to demonstrating the easy visualization of human change
processes because it offers psychological data with high temporal resolution and was recorded under
highly controlled conditions. In this example, we will focus on three affect scales: positive affect,
negative affect, and mental relaxation [3]. Figure 3 shows daily mean scores of these scales over the full
trial period.
Complexity scores were calculated in moving windows of seven days. Figure 4 shows their
course over the trial period. A critical threshold, defined as 1.645 standard deviations above the mean
complexity (the 95th percentile), was calculated from the first 127 days. Spikes in complexity that
surpass this threshold can be interpreted as indicative for critical instabilities. Figure 4 shows that
several potential critical instabilities can be identified visually with ease.
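The threshold computation described above can be sketched as follows; comp stands in for a vector of daily complexity values and is a hypothetical placeholder, not the study data.

```r
# Sketch of the critical threshold: mean + 1.645 SD over a baseline period.
set.seed(1)
comp <- runif(239)                  # placeholder for 239 daily complexity values
baseline  <- comp[1:127]            # first 127 days serve as the baseline
threshold <- mean(baseline) + 1.645 * sd(baseline)  # ~ 95th percentile
spikes <- which(comp > threshold)   # days whose complexity exceeds the threshold
```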

[3] Item text: "I feel . . . " plus "satisfied", "enthusiastic", "cheerful", "strong" for positive affect;
"down", "lonely", "anxious", "guilty" for negative affect; "relaxed" and "irritated" (reverse-scored)
for mental relaxation.

• The transition between the second and third study phase is marked by spikes in complexity on
all scales. Complexity peaks shortly before day 50.


• The "transition period" identified by Wichers et al. (2016), that occurred after reducing antide-
pressant dose, is marked by spikes in complexity on all scales.
• In the post-trial assessment phase, several spikes in complexity can be observed on the "mental
relaxation" scale. On the "positive" and "negative affect" scales, a strong peak occurs only
around trial day 175, indicating another critical instability. Another strong increase in depressive
symptoms follows this peak.
In this example, depressive symptoms increased after critical instabilities. However, it is worth
noting that these instabilities are not always followed by negative symptom change. For a decrease in
symptoms, "stable boundary conditions" have to be present (Schiepek et al., 2014). In psychotherapy
and psychiatry, these are created, for example, by positive relationship experiences with therapists or
a positive atmosphere in the psychiatry ward.

Summary
A new measure for the complexity of time series was introduced and validated. It was demonstrated
that this measure has a high validity even when analyzing relatively short time series. The proposed
measures performed well compared to PE, especially in small window sizes. For larger window
sizes, validity coefficients were comparable. The fluctuation coefficient proposed by Schiepek and
Strunk (2010) was more robust against the reduction of window size; its validity surpassed that of the
fluctuation measure used for dyncomp in most cases. However, the proposed distribution coefficient
reached substantially higher validity for small windows, and all measures maintained satisfactory
reliability even for analysis window sizes as small as 5. Using a publicly available data set from a
well-documented case study, it could be shown that the compound measure of dynamic complexity
was able to detect critical transitions that led to subsequent symptom change. All in all, the measures
introduced in this publication can be considered suitable for the study of complex dynamical systems,
especially when observations have to be made with low temporal resolution and only a small amount
of data is available. Possible fields of application for this newly introduced measure are manifold,
because the analysis of non-linear time series has spread to various disciplines. Thus, future studies
should focus on studying the utility of the proposed measures in every field that relies on complexity
estimates. For example, the proposed measure of complexity is used in an open-source software
platform for real-time monitoring of psychotherapeutic processes that enables both researchers and
clinical practitioners to predict and study critical transitions in human change processes (Kaiser and
Laireiter, 2017). The author hopes that the tools provided will advance the field of complex systems
research.

Bibliography
C. Bandt and B. Pompe. Permutation entropy: A natural complexity measure for time series. Physical
Review Letters, 88(17):174102, Apr 2002. doi: 10.1103/PhysRevLett.88.174102. [p1, 4]

C. T. Butts. The complexity of social networks: theoretical and empirical findings. Social Networks, 23
(1):31–72, 2001. [p1]

N. Caticha, R. Calsaverini, and R. Vicente. Phase transition from egalitarian to hierarchical societies
driven by competition between cognitive and social constraints. arXiv:1608.03637 [physics], Aug
2016. URL http://arxiv.org/abs/1608.03637. arXiv: 1608.03637. [p1]

R. Foote. Mathematics and complex systems. Science, 318(5849):410–412, Oct 2007. ISSN 0036-8075,
1095-9203. doi: 10.1126/science.1141754. [p1]

T. Glenn, P. C. Whybrow, N. Rasgon, P. Grof, M. Alda, C. Baethge, and M. Bauer. Approximate entropy
of self-reported mood prior to episodes in bipolar disorder. Bipolar Disorders, 8(5p1):424–429, Oct
2006. ISSN 1399-5618. doi: 10.1111/j.1399-5618.2006.00373.x. [p1]

A. M. Hayes, J.-P. Laurenceau, G. Feldman, J. L. Strauss, and L. Cardaciotto. Change is not always
linear: The study of nonlinear and discontinuous patterns of change in psychotherapy. Clinical
Psychology Review, 27(6):715–723, Jul 2007. ISSN 02727358. doi: 10.1016/j.cpr.2007.01.008. [p1]

T. Kaiser and A. R. Laireiter. Dynamo: A modular platform for monitoring process, outcome, and
algorithm-based treatment planning in psychotherapy. JMIR Medical Informatics, 5(3):e20, Jul 2017.
ISSN 2291-9694. doi: 10.2196/medinform.6808. [p7]

Figure 4: Complexity measure for daily means of affect scores and weekly depression symptom
ratings. The gray area marks the "transition phase" identified by Wichers et al. (2016). The black
horizontal line marks a critical threshold of complexity.

J. Kossakowski, P. Groot, J. Haslbeck, D. Borsboom, and M. Wichers. Data from “critical slowing down
as a personalized early warning signal for depression”. Journal of Open Psychology Data, 5(1), 2017.
ISSN 2050-9863. doi: 10.5334/jopd.29. URL http://openpsychologydata.metajnl.com/articles/
10.5334/jopd.29/. [p6]

J. Martinerie, C. Adam, M. L. V. Quyen, M. Baulac, S. Clemenceau, B. Renault, and F. J. Varela. Epileptic
seizures can be anticipated by non-linear analysis. Nature Medicine, 4(10):1173–1176, Oct 1998. ISSN
1078-8956. doi: 10.1038/2667. [p1]

V. Misra, M. Lagi, and Y. Bar-Yam. Evidence of market manipulation in the financial crisis. arXiv
preprint arXiv:1112.3095, 2011. [p1]

F. F. Orsucci. The paradigm of complexity in clinical neurocognitive science. The Neuroscientist, 12(5):
390–397, Oct 2006. ISSN 1073-8584. doi: 10.1177/1073858406290266. [p1]

V. Osedelec. Multiplicative ergodic theorem: Lyapunov characteristic exponent for dynamical systems.
In Moscow Math. Soc., volume 19, pages 539–575, 1968. [p4]

O. Ramos. Criticality in earthquakes. Good or bad for prediction? Tectonophysics, 485(1–4):321–326,
Apr 2010. ISSN 0040-1951. doi: 10.1016/j.tecto.2009.11.007. [p1]

M. Scheffer, J. Bascompte, W. A. Brock, V. Brovkin, S. R. Carpenter, V. Dakos, H. Held, E. H. Van Nes,
M. Rietkerk, and G. Sugihara. Early-warning signals for critical transitions. Nature, 461(7260):53–59,
2009. [p1]

M. Scheffer, S. R. Carpenter, T. M. Lenton, J. Bascompte, W. Brock, V. Dakos, J. van de Koppel, I. A.
van de Leemput, S. A. Levin, E. H. van Nes, and et al. Anticipating critical transitions. Science, 338
(6105):344–348, Oct 2012. ISSN 0036-8075, 1095-9203. doi: 10.1126/science.1225244. [p1]

G. Schiepek and G. Strunk. The identification of critical fluctuations and phase transitions in short term
and coarse-grained time series—a method for the real-time monitoring of human change processes.
Biological Cybernetics, 102(3):197–207, Mar 2010. ISSN 0340-1200, 1432-0770. doi: 10.1007/s00422-
009-0362-1. [p2, 4, 7]

G. K. Schiepek, I. Tominschek, and S. Heinzel. Self-organization in psychotherapy: testing the
synergetic model of change processes. Frontiers in Psychology, 5, 2014. URL http://www.ncbi.nlm.
nih.gov/pmc/articles/PMC4183104/. [p7]

S. Sippel, H. Lange, and F. Gans. statcomp: Statistical Complexity and Information Measures for Time
Series Analysis, 2016. URL https://CRAN.R-project.org/package=statcomp. R package version
0.0.1.1000. [p1, 4]

J. Von Neumann, R. Kent, H. Bellinson, and B. t. Hart. The mean square successive difference. The
Annals of Mathematical Statistics, 12(2):153–162, 1941. [p2]

M. Wichers, P. C. Groot, and Psychosystems, ESM Group. Critical slowing down as a personalized
early warning signal for depression. Psychotherapy and Psychosomatics, 85(2):114–116, 2016. ISSN
0033-3190, 1423-0348. doi: 10.1159/000441458. [p6, 7, 8]

A. Wolf, J. B. Swift, H. L. Swinney, and J. A. Vastano. Determining lyapunov exponents from a time
series. Physica D: Nonlinear Phenomena, 16(3):285–317, Jul 1985. ISSN 0167-2789. doi: 10.1016/0167-
2789(85)90011-9. [p4]

Tim Kaiser
University of Salzburg
Department of Psychology
Hellbrunnerstrasse 34
5020 Salzburg
Austria
Tim.Kaiser@sbg.ac.at
