Abstract
Time series indexing plays an important role in querying and pattern mining of big data. This paper proposes a novel structure for tightly covering a given set of time series under the dynamic time warping (DTW) similarity measure. The structure, referred to as dynamic covering with cross-range constraints (DCRC), enables more efficient and scalable indexing than current hypercube-based partitioning approaches. In particular, a lower bound of the DTW distance from a given query time series to a DCRC-based cover set is introduced. By virtue of its tightness, which is proven theoretically, the lower bound can be used for pruning when querying an index tree. If the DCRC-based lower bound (LB_DCRC) of an upper node in the index tree is larger than a given threshold, all of its child nodes can be pruned, yielding a significant reduction in computational time. A hierarchical DCRC (HDCRC) structure is proposed to generate the DCRC-tree-based index and is used to develop time series indexing and insertion algorithms. Experimental results on a selection of benchmark time series datasets illustrate the tightness of LB_DCRC and the pruning efficiency of the DCRC-tree, especially when the time series have large deformations.
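The pruning rule described in the abstract can be illustrated with a minimal sketch. It is not the DCRC construction and does not compute LB_DCRC, neither of which is defined in this abstract. As a stand-in, each node here is covered by a single global value range, and the bound used is the sum of each query point's distance to that range, which is a valid but loose lower bound for unconstrained DTW with absolute-difference cost. All names (`Node`, `make_leaf`, `make_inner`, `lower_bound`, `range_query`) are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence


def dtw(a: Sequence[float], b: Sequence[float]) -> float:
    """Unconstrained DTW distance with absolute-difference point cost."""
    inf = float("inf")
    prev = [0.0] + [inf] * len(b)          # row 0 of the DP table
    for x in a:
        cur = [inf] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            cur[j] = abs(x - y) + min(prev[j], prev[j - 1], cur[j - 1])
        prev = cur
    return prev[len(b)]


@dataclass
class Node:
    lo: float                               # smallest value under this node
    hi: float                               # largest value under this node
    series: List[List[float]] = field(default_factory=list)  # leaf payload
    children: List["Node"] = field(default_factory=list)


def make_leaf(series: List[List[float]]) -> Node:
    values = [v for s in series for v in s]
    return Node(lo=min(values), hi=max(values), series=series)


def make_inner(children: List[Node]) -> Node:
    return Node(lo=min(c.lo for c in children),
                hi=max(c.hi for c in children),
                children=children)


def lower_bound(query: Sequence[float], node: Node) -> float:
    """Stand-in bound (assumption, not LB_DCRC): every query point is aligned
    with at least one point of any covered series, and that point lies in
    [lo, hi], so the summed distance to the range cannot exceed the DTW."""
    return sum(max(node.lo - q, q - node.hi, 0.0) for q in query)


def range_query(query: Sequence[float], node: Node, threshold: float,
                stats: Dict[str, int]) -> List[List[float]]:
    """Return all series within `threshold` of `query`, pruning subtrees."""
    if lower_bound(query, node) > threshold:
        stats["pruned"] += 1                # whole subtree skipped, no DTW run
        return []
    hits = []
    for s in node.series:
        stats["dtw_calls"] += 1
        if dtw(query, s) <= threshold:
            hits.append(s)
    for child in node.children:
        hits.extend(range_query(query, child, threshold, stats))
    return hits


# Toy index: two leaves with well-separated value ranges.
near_leaf = make_leaf([[0.0, 1.0, 2.0], [1.0, 2.0, 3.0]])
far_leaf = make_leaf([[10.0, 11.0, 12.0], [11.0, 12.0, 13.0]])
root = make_inner([near_leaf, far_leaf])

query = [0.0, 1.0, 3.0]
stats = {"pruned": 0, "dtw_calls": 0}
hits = range_query(query, root, threshold=1.5, stats=stats)
print(hits, stats)
```

In this toy index the bound for the far leaf exceeds the threshold, so its series are discarded without any DTW computation. A tighter cover, which is what the abstract claims for DCRC, would prune more nodes at the same threshold.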
Acknowledgements
The authors sincerely thank the editors and the anonymous reviewers for their helpful and kind comments, which have enhanced the presentation of our paper. The authors would also like to thank the UCR time series classification archive and Prof. Keogh for providing the datasets used in the study. This work is supported in part by the National Natural Science Foundation of China (Grant Nos. 61751205, 91746209, 61772102).
Cite this article
Sun, T., Liu, H., McLoone, S. et al. Time series indexing by dynamic covering with cross-range constraints. The VLDB Journal 29, 1365–1384 (2020). https://doi.org/10.1007/s00778-020-00614-9
DOI: https://doi.org/10.1007/s00778-020-00614-9