Abstract
Time-series classification (TSC) problems present a specific challenge for classification algorithms: how to measure similarity between series. A shapelet is a time-series subsequence that allows for TSC based on local, phase-independent similarity in shape. Shapelet-based classification uses the similarity between a shapelet and a series as a discriminatory feature. One benefit of the shapelet approach is that shapelets are comprehensible, and can offer insight into the problem domain. The original shapelet-based classifier embeds the shapelet-discovery algorithm in a decision tree, and uses information gain to assess the quality of candidates, finding a new shapelet at each node of the tree through an enumerative search. Subsequent research has focused mainly on techniques to speed up the search. We examine how best to use the shapelet primitive to construct classifiers. We propose a single-scan shapelet algorithm that finds the best \(k\) shapelets, which are used to produce a transformed dataset, where each of the \(k\) features represent the distance between a time series and a shapelet. The primary advantages over the embedded approach are that the transformed data can be used in conjunction with any classifier, and that there is no recursive search for shapelets. We demonstrate that the transformed data, in conjunction with more complex classifiers, gives greater accuracy than the embedded shapelet tree. We also evaluate three similarity measures that produce equivalent results to information gain in less time. Finally, we show that by conducting post-transform clustering of shapelets, we can enhance the interpretability of the transformed data. We conduct our experiments on 29 datasets: 17 from the UCR repository, and 12 we provide ourselves.
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.References
Bagnall A, Hills J, Lines J (2012) Shapelet based time series classification. http://www.uea.ac.uk/computing/machine-learning/Shapelets. Accessed 14 May 2013
Bagnall A, Davis L, Hills J, Lines J (2012) Transformation based ensembles for time series classification. Proceedings of the twelfth SIAM conference on data mining (SDM)
Batista G, Wang X, Keogh E (2011) A complexity-invariant distance measure for time series. Proceedings of the eleventh SIAM conference on data mining (SDM)
Bober M (2001) Mpeg-7 visual shape descriptors. IEEE Trans Circ Syst Video Technol 11(6):716–719
Breiman L (2001) Random forests. Mach Learn 45(1):5–32
Buza K (2011) Fusion methods for time-series classification. Ph.D. thesis, University of Hildesheim, Germany
Campana S, Casselman J (1993) Stock discrimination using otolith shape analysis. Can J Fish Aquat Sci 50(5):1062–1083
Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273–297
Davis LM, Theobald B-J, Lines J, Toms A, Bagnall A (2012) On the segmentation and classification of hand radiographs. Int J Neural Syst 22(5):1250020
Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30
Deng H, Runger G, Tuv E, Vladimir M (2011) A time series forest for classification and feature extraction. Tech. rep., Arizona State University
De Vries D, Grimes C, Prager M (2002) Using otolith shape analysis to distinguish eastern gulf of mexico and atlantic ocean stocks of king mackerel. Fish Res 57(1):51–62
Ding H, Trajcevski G, Scheuermann P, Wang X, Keogh E (2008) Querying and mining of time series data: experimental comparison of representations and distance measures. Proc VLDB Endow 1(2):1542–1552
Duarte-Neto P, Lessa R, Stosic B, Morize E (2008) The use of sagittal otoliths in discriminating stocks of common dolphinfish (coryphaena hippurus) off northeastern brazil using multishape descriptors. ICES J Mar Sci J du Conseil 65(7):1144–1152
Friedman N, Geiger D, Goldszmidt M (1997) Bayesian network classifiers. Mach Learn 29(2–3):131–163
Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH (2009) The WEKA data mining software: an update. ACM SIGKDD Explor Newsl 11(1):10–18
Hartmann B, Link N (2010) Gesture recognition with inertial sensors and optimized dtw prototypes. Systems man and cybernetics (SMC), 2010 IEEE international conference on. IEEE pp 2102–2109
He Q, Dong Z, Zhuang F, Shi Z (2012) Fast Time Series Classification based on infrequent shapelets. Machine learning and applications (ICMLA), 2012 11th international conference on. IEEE, pp 215–219
Hoare C (1962) Quicksort. Comput J 5(1):10–16
Image Processing and Informatics Lab, University of Southern California, The Digital Hand Atlas Database System. http://www.ipilab.org/BAAweb
Janacek G, Bagnall A, Powell M (2005) A likelihood ratio distance measure for the similarity between the fourier transform of time series. Proceedings of the Ninth Pacific-Asia Conference on knowledge discovery and data mining (PAKDD)
Jeong Y, Jeong M, Omitaomu O (2010) Weighted dynamic time warping for time series classification. Pattern Recognit 44:2231–2240
Hu B, Chen Y, Keogh E (2013) Time series classification under more realistic assumptions. Proceedings of the thirteenth SIAM conference on data mining (SDM)
Kruskal W (1952) A nonparametric test for the several sample problem. Ann Math Stat 23(4):525–540
Latecki L, Lakamper R, Eckhardt T (2000) Shape descriptors for non-rigid shapes with a single closed contour. Computer vision and pattern recognition, 2000. Proceedings. IEEE conference on vol. 1. IEEE, pp 424–429
Lines J, Bagnall A (2012) Alternative quality measures for time series shapelets. Intelligent data engineering and automated learning (IDEAL). Lect Notes Comput Sci 7435:475–483
Lines J, Davis L, Hills J, Bagnall A (2012) A shapelet transform for time series classification. Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, pp 289–297
Mood AM, Graybill FA, Boes DC (1974) Introduction to the theory of statistics, 3rd edn. McGraw-Hill, New York
Mueen A, Keogh E, Young N (2011) Logical-shapelets: an expressive primitive for time series classification. Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, pp 1154–1162
Rakthanmanon T, Campana B, Mueen A, Batista G, Westover B, Zhu Q, Zakaria J, Keogh E (2012) Searching and mining trillions of time series subsequences under dynamic time warping. Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining, pp 262–270
Rakthanmanon T, Keogh E (2013) Fast Shapelets: A Scalable Algorithm for Discovering Time Series Shapelets. Proceedings of the thirteenth SIAM conference on data mining (SDM)
Rodriguez J, Alonso C (2005) Support vector machines of interval-based features for time series classification. Knowl-Based Syst 18:171–178
Rodriguez JJ, Kuncheva LI, Alonso CJ (2006) Rotation forest: a new classifier ensemble method. Pattern Anal Mach Intell IEEE Trans 28(10):1619–1630
Shannon C, Weaver W, Blahut R, Hajek B (1949) The mathematical theory of communication, vol 117. University of Illinois press, Urbana
Sivakumar P et al. (2012) Human gait recognition and classification using time series shapelets. International conference on advances in computing and communications (ICACC)
Stransky C (2005) Geographic variation of golden redfish (sebastes marinus) and deep-sea redfish (s. mentella) in the north atlantic based on otolith shape analysis. ICES J Mar Sci J du Conseil 62(8):1691–1698
Wu Y, Agrawal D, Abbadi AE (2000) A comparison of dft and dwt based similarity search in time-series databases. Proceedings of the ninth international conference on information and knowledge management (ACM CIKM)
Xing Z, Pei J, Yu P (2012) Early classification on time series. Knowl Inform syst 31(1):105–127
Xing Z, Pei J, Yu P, Wang K (2011) Extracting interpretable features for early classification on time series. Proceedings of the eleventh SIAM conference on data mining (SDM)
Ye L, Keogh E (2009) Time series shapelets: a new primitive for data mining. Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, pp 947–956
Ye L, Keogh E (2011) Time series shapelets: a novel technique that allows accurate, interpretable and fast classification. Data Min Knowl Discov 22(1):149–182
Zakaria J, Mueen A, Keogh E (2012) Clustering time series using unsupervised-shapelets. Data mining (ICDM), 2012 IEEE 12th international conference. pp. 785–794
Author information
Authors and Affiliations
Corresponding author
Additional information
Responsible editor: Eamonn Keogh.
Rights and permissions
About this article
Cite this article
Hills, J., Lines, J., Baranauskas, E. et al. Classification of time series by shapelet transformation. Data Min Knowl Disc 28, 851–881 (2014). https://doi.org/10.1007/s10618-013-0322-1
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10618-013-0322-1