Detecting thresholds in time series data using changepoint detection methods

Robert Maidstone¹, Paul Fearnhead², Adam Letchford²
¹ 1st Year PhD Student, STOR-i DTC, Lancaster University
² STOR-i DTC, Lancaster University
http://www.lancs.ac.uk/~maidston

Introduction

A data set collected in time order is called a time series. To analyse a time series and forecast future data points, statisticians fit models to the data. In some cases, however, the appropriate model depends on previous values of the series. One example of this is threshold modelling. Threshold models occur in many real-world data sets, including in economics and medicine.

Figure 1: One place where threshold models are used is in determining dosage levels for pharmaceuticals. (Image from http://www.khamtran.com/)

Threshold Models

In many time series the underlying model may change depending on the value the series is currently at. For example, consider

$$Y_t \sim N\big(0,\ \sigma^2 f(y_{t-1})\big),$$

where $f$ is a piecewise function taking one of two values:

$$f(y_s) = \begin{cases} z_1 & \text{for } |y_s| \le H, \\ z_2 & \text{for } |y_s| > H. \end{cases}$$

When the threshold $H$ is exceeded, the variance changes to a new value until $|y_t|$ falls back below $H$. An example of this is given in Figure 2(b). Likewise, Figure 2(a) shows a threshold model in which the mean changes when a threshold is exceeded. Other threshold models make use of autoregressive processes. These are known as threshold autoregressive (TAR) models and are heavily used in economics; examples can be found in (Hansen, 2011).

Figure 2: Examples of time series with thresholds. (a) Thresholded time series with change in mean. (b) Thresholded time series with change in variance.

Basic Changepoint Methods

Many methods for finding changepoints exist, both heuristic and exact; for a good summary, see (Eckley et al., 2011). Here we consider dynamic programming algorithms, because they are exact and can be sped up using pruning. There are two main types of algorithm that can be used:

Optimal Partitioning: uses dynamic programming to find the best partition of the data up to each time step by optimising over all possible positions of the previous changepoint.
Segment Neighbourhood Search: uses dynamic programming to find the best way to partition the data set into $K \in \mathbb{N}$ segments. For each $k = 1, \ldots, K$, it finds the best way to partition the data up to each time $t$ into $k$ segments by optimising over the best partitions into $k - 1$ segments.

Both methods optimise over the set of all possible positions of the last changepoint, so they are slow to run ($O(n^2)$ and $O(Kn^2)$ respectively). Pruning this set reduces the computational time.

Pruned Changepoint Algorithms

Two main approaches to pruning have appeared in recent papers: PELT and PDPA. These are discussed below, alongside a new method which aims to combine the two.

PELT: Pruned Exact Linear Time (Killick et al., 2011). Based on Optimal Partitioning, PELT prunes the state space by comparing the cost of placing the last changepoint at the current step with the cost of placing it at previous steps, and discards candidates that can no longer be optimal.
PDPA: Pruned Dynamic Programming Algorithm (Rigaill, 2010). Based on Segment Neighbourhood Search, PDPA prunes the state space by letting the segment parameter vary in the cost function. It then discards a potential changepoint once it is no longer optimal for any region of the parameter space.
OPR: Optimal Partitioning using Rigaill's methods. Based on Optimal Partitioning, but using the pruning techniques of PDPA.

Reordering the Data

One way to find threshold levels in a data set is to use changepoint detection methods. The problem first has to be reformulated so that these methods apply. This is done by reordering the data according to the value of the previous data point. That is, for data $y$, let $x = \mathrm{ordered}(y)$ and

$$z_i = \{\, y_j \ \text{s.t.}\ x_i = y_{j-1} \,\}.$$

Each observation is thus placed according to the value of the observation before it. The regime switches of the threshold model therefore appear as changepoints in $z$, at the positions where $x$ crosses the threshold. Plotting $z$ gives graphs such as the two in Figure 3.

Figure 3: Reordered time series from Figure 2.

The changes (in mean or in variance, depending on the model) can then be found using changepoint algorithms. Looking at the values of $x$ at which the changes occur gives an estimate of the threshold in each case.

Comparisons

Applying these methods to our reordered threshold data lets us compare them against each other as the length of the data set increases.

Figure 4: Comparison of the computational time taken by the three algorithms (PDPA, PELT and OPR, shown as "Opt Par Rigaill" in the legend); time (seconds) against length of data set (n).

As can be seen in Figure 4, OPR runs in a similar time to PELT and slightly faster than PDPA. This is to be expected: both PELT and OPR are $O(n)$, whereas PDPA is $O(Kn)$, where $K$ is the maximum number of segments being searched over.
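A minimal Python/NumPy sketch can make the reordering idea concrete. It simulates a change-in-mean threshold model in the spirit of Figure 2(a) and reorders it as described in Reordering the Data. It then searches the reordered series $z$ for a change in mean using Optimal Partitioning, with optional PELT-style pruning. This is not the authors' code, and several parts are assumptions chosen for illustration:

- the parameters ($H = 4$, means 0 and 8, $\sigma = 3$);
- the Gaussian cost with known $\sigma$;
- the penalty $\beta = 3 \log n$;
- the helper names `simulate_threshold_mean`, `reorder`, `changepoints` and `estimate_threshold`.

It uses the mean model rather than the variance model in the equation above because the mean cost is simpler. PDPA and OPR are not implemented here.

```python
import numpy as np


def simulate_threshold_mean(n, h=4.0, mu_low=0.0, mu_high=8.0, sigma=3.0, seed=1):
    """Hypothetical change-in-mean threshold model (assumed parameters):
    y_t ~ N(mu_low, sigma^2) if y_{t-1} <= h, else y_t ~ N(mu_high, sigma^2)."""
    rng = np.random.default_rng(seed)
    y = np.zeros(n + 1)  # y[0] = 0 is an assumed starting value
    for t in range(1, n + 1):
        mean = mu_low if y[t - 1] <= h else mu_high
        y[t] = rng.normal(mean, sigma)
    return y


def reorder(y):
    """x = sorted previous values; z_i = y_j such that x_i = y_{j-1}."""
    prev, curr = y[:-1], y[1:]
    order = np.argsort(prev, kind="stable")
    return prev[order], curr[order]


def changepoints(z, beta, sigma=1.0, prune=True):
    """Penalised change-in-mean search with a Gaussian cost (sigma assumed known).
    prune=False is plain Optimal Partitioning; prune=True adds PELT pruning."""
    n = len(z)
    s1 = np.concatenate(([0.0], np.cumsum(z)))
    s2 = np.concatenate(([0.0], np.cumsum(z ** 2)))

    def cost(s, t):
        # Residual sum of squares of z[s:t] about its mean, divided by sigma^2
        return (s2[t] - s2[s] - (s1[t] - s1[s]) ** 2 / (t - s)) / sigma ** 2

    f = np.empty(n + 1)                # f[t]: optimal penalised cost of z[:t]
    f[0] = -beta
    last = np.zeros(n + 1, dtype=int)  # start of the final segment of z[:t]
    cands = np.array([0])              # candidate positions of the last change
    for t in range(1, n + 1):
        seg = f[cands] + cost(cands, t)
        best = int(np.argmin(seg))
        f[t] = seg[best] + beta
        last[t] = cands[best]
        if prune:
            # PELT step (K = 0 for this cost): s with f[s] + cost(s, t) > f[t]
            # can never be the optimal last changepoint at any later time
            cands = cands[seg <= f[t]]
        cands = np.append(cands, t)
    cps, t = [], n
    while t > 0:                       # backtrack through the segment starts
        t = int(last[t])
        if t > 0:
            cps.append(t)
    return sorted(cps)


def estimate_threshold(x, z, cps):
    """Heuristic: take the change with the largest jump in segment means and
    return the midpoint of the x-values either side of it."""
    bounds = [0] + cps + [len(z)]
    means = [z[a:b].mean() for a, b in zip(bounds[:-1], bounds[1:])]
    tau = cps[int(np.argmax(np.abs(np.diff(means))))]
    return 0.5 * (x[tau - 1] + x[tau])


y = simulate_threshold_mean(1000)
x, z = reorder(y)
beta = 3 * np.log(len(z))
cps = changepoints(z, beta, sigma=3.0)
print(cps, estimate_threshold(x, z, cps))
```

With `prune=False`, the search is plain Optimal Partitioning: it checks every earlier candidate at each step, which is $O(n^2)$ work. With `prune=True`, the candidate set is cut down as in PELT, and both settings should return the same changepoints. Picking the change with the largest jump in segment means is only a guard in case the penalty lets through extra changes.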
The main advantage of OPR over PELT is that it works with functions of the segment parameters. For that reason we expect it to be more versatile, especially for data that are dependent across segments. A natural extension is to bring the PELT pruning steps into the OPR algorithm to reduce the computational time further.

References

Eckley, I. A., Fearnhead, P., and Killick, R. (2011). Analysis of changepoint models. In Bayesian Time Series Models. Cambridge University Press.
Hansen, B. (2011). Threshold autoregression in economics. Statistics and its Interface, pages 123–127.
Killick, R., Fearnhead, P., and Eckley, I. A. (2011). Optimal detection of changepoints with a linear computational cost. ArXiv e-prints.
Rigaill, G. (2010). Pruned dynamic programming for optimal multiple change-point detection. ArXiv e-prints, (May):9.

Figure 5: Scan here to visit my web page.

r.maidstone@lancaster.ac.uk