DOI: 10.1145/2339530.2339576
research-article

Searching and mining trillions of time series subsequences under dynamic time warping

Published: 12 August 2012

Abstract

Most time series data mining algorithms use similarity search as a core subroutine, and thus the time taken for similarity search is the bottleneck for virtually all time series data mining algorithms. The difficulty of scaling search to large datasets largely explains why most academic work on time series data mining has plateaued at considering a few million time series objects, while much of industry and science sits on billions of time series objects waiting to be explored. In this work we show that by using a combination of four novel ideas we can search and mine truly massive time series for the first time. We demonstrate the following extremely unintuitive fact: in large datasets we can exactly search under dynamic time warping (DTW) much more quickly than the current state-of-the-art Euclidean distance search algorithms. We demonstrate our work on the largest set of time series experiments ever attempted. In particular, the largest dataset we consider is larger than the combined size of all of the time series datasets considered in all data mining papers ever published. We show that our ideas allow us to solve higher-level time series data mining problems, such as motif discovery and clustering, at scales that would otherwise be untenable. In addition to mining massive datasets, we will show that our ideas also have implications for real-time monitoring of data streams, allowing us to handle much faster arrival rates and/or use cheaper and lower-powered devices than is currently possible.
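To make the core subroutine concrete, here is a minimal sketch of exact subsequence search under DTW. It is not the method of the paper: the abstract does not describe its four ideas, so this sketch only combines widely known building blocks (z-normalization, a Sakoe-Chiba warping band, and LB_Keogh-style lower-bound pruning). All function names and the `window` parameter are illustrative assumptions.

```python
import math


def znorm(x):
    """Z-normalize a sequence to zero mean and unit variance."""
    n = len(x)
    mean = sum(x) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    if std == 0:
        return [0.0] * n  # constant window: map to all zeros
    return [(v - mean) / std for v in x]


def dtw(a, b, window):
    """Squared-cost DTW between equal-length sequences, restricted to a
    Sakoe-Chiba band of half-width `window`."""
    n = len(a)
    inf = float("inf")
    prev = [inf] * (n + 1)
    prev[0] = 0.0
    for i in range(1, n + 1):
        curr = [inf] * (n + 1)
        for j in range(max(1, i - window), min(n, i + window) + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            curr[j] = cost + min(prev[j], prev[j - 1], curr[j - 1])
        prev = curr
    return prev[n]


def lb_keogh(query, candidate, window):
    """Cheap lower bound on dtw(query, candidate, window), computed from the
    upper and lower envelope of the query."""
    n = len(query)
    total = 0.0
    for i, c in enumerate(candidate):
        band = query[max(0, i - window):min(n, i + window + 1)]
        upper, lower = max(band), min(band)
        if c > upper:
            total += (c - upper) ** 2
        elif c < lower:
            total += (c - lower) ** 2
    return total


def best_match(series, query, window):
    """Return (start, distance) of the subsequence of `series` closest to
    `query` under banded DTW, skipping candidates the lower bound rules out."""
    m = len(query)
    q = znorm(query)
    best_dist, best_pos = float("inf"), -1
    for start in range(len(series) - m + 1):
        cand = znorm(series[start:start + m])
        if lb_keogh(q, cand, window) >= best_dist:
            continue  # cannot beat the best-so-far, so skip the full DTW
        d = dtw(q, cand, window)
        if d < best_dist:
            best_dist, best_pos = d, start
    return best_pos, best_dist
```

Because the lower bound never exceeds the true banded DTW distance, pruning with it leaves the result exact; it only avoids the quadratic DTW computation for candidates that cannot improve on the best match found so far.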

Supplementary Material

JPG File (best_paper_4.jpg)
MP4 File (best_paper_4.mp4)

    Published In

    KDD '12: Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining
    August 2012
    1616 pages
    ISBN:9781450314626
    DOI:10.1145/2339530

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. lower bounds
    2. similarity search
    3. time series

    Qualifiers

    • Research-article

    Conference

    KDD '12

    Acceptance Rates

    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%
