Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to main content

Classification of time series by shapelet transformation

  • Published:
Data Mining and Knowledge Discovery Aims and scope Submit manuscript

Abstract

Time-series classification (TSC) problems present a specific challenge for classification algorithms: how to measure similarity between series. A shapelet is a time-series subsequence that allows for TSC based on local, phase-independent similarity in shape. Shapelet-based classification uses the similarity between a shapelet and a series as a discriminatory feature. One benefit of the shapelet approach is that shapelets are comprehensible, and can offer insight into the problem domain. The original shapelet-based classifier embeds the shapelet-discovery algorithm in a decision tree, and uses information gain to assess the quality of candidates, finding a new shapelet at each node of the tree through an enumerative search. Subsequent research has focused mainly on techniques to speed up the search. We examine how best to use the shapelet primitive to construct classifiers. We propose a single-scan shapelet algorithm that finds the best \(k\) shapelets, which are used to produce a transformed dataset, where each of the \(k\) features represent the distance between a time series and a shapelet. The primary advantages over the embedded approach are that the transformed data can be used in conjunction with any classifier, and that there is no recursive search for shapelets. We demonstrate that the transformed data, in conjunction with more complex classifiers, gives greater accuracy than the embedded shapelet tree. We also evaluate three similarity measures that produce equivalent results to information gain in less time. Finally, we show that by conducting post-transform clustering of shapelets, we can enhance the interpretability of the transformed data. We conduct our experiments on 29 datasets: 17 from the UCR repository, and 12 we provide ourselves.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12
Fig. 13
Fig. 14

Similar content being viewed by others

Explore related subjects

Discover the latest articles, news and stories from top researchers in related subjects.

References

  • Bagnall A, Hills J, Lines J (2012) Shapelet based time series classification. http://www.uea.ac.uk/computing/machine-learning/Shapelets. Accessed 14 May 2013

  • Bagnall A, Davis L, Hills J, Lines J (2012) Transformation based ensembles for time series classification. Proceedings of the twelfth SIAM conference on data mining (SDM)

  • Batista G, Wang X, Keogh E (2011) A complexity-invariant distance measure for time series. Proceedings of the eleventh SIAM conference on data mining (SDM)

  • Bober M (2001) Mpeg-7 visual shape descriptors. IEEE Trans Circ Syst Video Technol 11(6):716–719

    Article  Google Scholar 

  • Breiman L (2001) Random forests. Mach Learn 45(1):5–32

    Article  MATH  Google Scholar 

  • Buza K (2011) Fusion methods for time-series classification. Ph.D. thesis, University of Hildesheim, Germany

  • Campana S, Casselman J (1993) Stock discrimination using otolith shape analysis. Can J Fish Aquat Sci 50(5):1062–1083

    Article  Google Scholar 

  • Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273–297

    MATH  Google Scholar 

  • Davis LM, Theobald B-J, Lines J, Toms A, Bagnall A (2012) On the segmentation and classification of hand radiographs. Int J Neural Syst 22(5):1250020

    Google Scholar 

  • Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30

    MATH  MathSciNet  Google Scholar 

  • Deng H, Runger G, Tuv E, Vladimir M (2011) A time series forest for classification and feature extraction. Tech. rep., Arizona State University

  • De Vries D, Grimes C, Prager M (2002) Using otolith shape analysis to distinguish eastern gulf of mexico and atlantic ocean stocks of king mackerel. Fish Res 57(1):51–62

    Article  Google Scholar 

  • Ding H, Trajcevski G, Scheuermann P, Wang X, Keogh E (2008) Querying and mining of time series data: experimental comparison of representations and distance measures. Proc VLDB Endow 1(2):1542–1552

    Google Scholar 

  • Duarte-Neto P, Lessa R, Stosic B, Morize E (2008) The use of sagittal otoliths in discriminating stocks of common dolphinfish (coryphaena hippurus) off northeastern brazil using multishape descriptors. ICES J Mar Sci J du Conseil 65(7):1144–1152

    Article  Google Scholar 

  • Friedman N, Geiger D, Goldszmidt M (1997) Bayesian network classifiers. Mach Learn 29(2–3):131–163

    Article  MATH  Google Scholar 

  • Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH (2009) The WEKA data mining software: an update. ACM SIGKDD Explor Newsl 11(1):10–18

    Article  Google Scholar 

  • Hartmann B, Link N (2010) Gesture recognition with inertial sensors and optimized dtw prototypes. Systems man and cybernetics (SMC), 2010 IEEE international conference on. IEEE pp 2102–2109

  • He Q, Dong Z, Zhuang F, Shi Z (2012) Fast Time Series Classification based on infrequent shapelets. Machine learning and applications (ICMLA), 2012 11th international conference on. IEEE, pp 215–219

  • Hoare C (1962) Quicksort. Comput J 5(1):10–16

    Article  MATH  MathSciNet  Google Scholar 

  • Image Processing and Informatics Lab, University of Southern California, The Digital Hand Atlas Database System. http://www.ipilab.org/BAAweb

  • Janacek G, Bagnall A, Powell M (2005) A likelihood ratio distance measure for the similarity between the fourier transform of time series. Proceedings of the Ninth Pacific-Asia Conference on knowledge discovery and data mining (PAKDD)

  • Jeong Y, Jeong M, Omitaomu O (2010) Weighted dynamic time warping for time series classification. Pattern Recognit 44:2231–2240

    Article  Google Scholar 

  • Hu B, Chen Y, Keogh E (2013) Time series classification under more realistic assumptions. Proceedings of the thirteenth SIAM conference on data mining (SDM)

  • Kruskal W (1952) A nonparametric test for the several sample problem. Ann Math Stat 23(4):525–540

    Article  MATH  MathSciNet  Google Scholar 

  • Latecki L, Lakamper R, Eckhardt T (2000) Shape descriptors for non-rigid shapes with a single closed contour. Computer vision and pattern recognition, 2000. Proceedings. IEEE conference on vol. 1. IEEE, pp 424–429

  • Lines J, Bagnall A (2012) Alternative quality measures for time series shapelets. Intelligent data engineering and automated learning (IDEAL). Lect Notes Comput Sci 7435:475–483

    Article  Google Scholar 

  • Lines J, Davis L, Hills J, Bagnall A (2012) A shapelet transform for time series classification. Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, pp 289–297

  • Mood AM, Graybill FA, Boes DC (1974) Introduction to the theory of statistics, 3rd edn. McGraw-Hill, New York

  • Mueen A, Keogh E, Young N (2011) Logical-shapelets: an expressive primitive for time series classification. Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, pp 1154–1162

  • Rakthanmanon T, Campana B, Mueen A, Batista G, Westover B, Zhu Q, Zakaria J, Keogh E (2012) Searching and mining trillions of time series subsequences under dynamic time warping. Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining, pp 262–270

  • Rakthanmanon T, Keogh E (2013) Fast Shapelets: A Scalable Algorithm for Discovering Time Series Shapelets. Proceedings of the thirteenth SIAM conference on data mining (SDM)

  • Rodriguez J, Alonso C (2005) Support vector machines of interval-based features for time series classification. Knowl-Based Syst 18:171–178

    Google Scholar 

  • Rodriguez JJ, Kuncheva LI, Alonso CJ (2006) Rotation forest: a new classifier ensemble method. Pattern Anal Mach Intell IEEE Trans 28(10):1619–1630

    Article  Google Scholar 

  • Shannon C, Weaver W, Blahut R, Hajek B (1949) The mathematical theory of communication, vol 117. University of Illinois press, Urbana

    MATH  Google Scholar 

  • Sivakumar P et al. (2012) Human gait recognition and classification using time series shapelets. International conference on advances in computing and communications (ICACC)

  • Stransky C (2005) Geographic variation of golden redfish (sebastes marinus) and deep-sea redfish (s. mentella) in the north atlantic based on otolith shape analysis. ICES J Mar Sci J du Conseil 62(8):1691–1698

    Article  Google Scholar 

  • Wu Y, Agrawal D, Abbadi AE (2000) A comparison of dft and dwt based similarity search in time-series databases. Proceedings of the ninth international conference on information and knowledge management (ACM CIKM)

  • Xing Z, Pei J, Yu P (2012) Early classification on time series. Knowl Inform syst 31(1):105–127

    Article  Google Scholar 

  • Xing Z, Pei J, Yu P, Wang K (2011) Extracting interpretable features for early classification on time series. Proceedings of the eleventh SIAM conference on data mining (SDM)

  • Ye L, Keogh E (2009) Time series shapelets: a new primitive for data mining. Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, pp 947–956

  • Ye L, Keogh E (2011) Time series shapelets: a novel technique that allows accurate, interpretable and fast classification. Data Min Knowl Discov 22(1):149–182

    Article  MATH  MathSciNet  Google Scholar 

  • Zakaria J, Mueen A, Keogh E (2012) Clustering time series using unsupervised-shapelets. Data mining (ICDM), 2012 IEEE 12th international conference. pp. 785–794

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Jon Hills.

Additional information

Responsible editor: Eamonn Keogh.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Hills, J., Lines, J., Baranauskas, E. et al. Classification of time series by shapelet transformation. Data Min Knowl Disc 28, 851–881 (2014). https://doi.org/10.1007/s10618-013-0322-1

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10618-013-0322-1

Keywords