Abstract
Given a long time series, the distance profile of a query time series is the sequence of distances between the query and every possible subsequence of the long time series. MASS (Mueen’s Algorithm for Similarity Search) is an algorithm that efficiently computes the distance profile under z-normalized Euclidean distance (Mueen et al. 2017). MASS is recognized as a useful tool in many data mining works. However, complete documentation of the increasingly efficient versions of the algorithm does not exist. In this paper, we formalize the notion of a distance profile, describe four versions of the MASS algorithm, show several extensions of distance profiles under various operating conditions, describe how MASS improves the performance of existing data mining algorithms, and finally, show the utility of MASS in domains including seismology, robotics, and power grids.
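To make the definition concrete, the following is a minimal Python sketch of a distance profile under z-normalized Euclidean distance, computed once by brute force and once with FFT-based sliding dot products. It is an illustration only and is not claimed to match any of the four MASS versions described in the paper. The function names (znorm, naive_distance_profile, fft_distance_profile) are hypothetical, NumPy is an assumed dependency, and the sketch assumes that no subsequence of the long time series is constant (a zero standard deviation would cause a division by zero).

```python
import numpy as np

def znorm(x):
    # z-normalize with the population standard deviation (assumes std > 0)
    return (x - x.mean()) / x.std()

def naive_distance_profile(query, series):
    # Brute force: z-normalize every subsequence and compare it to the query.
    m = len(query)
    q = znorm(query)
    return np.array([
        np.linalg.norm(q - znorm(series[i:i + m]))
        for i in range(len(series) - m + 1)
    ])

def fft_distance_profile(query, series):
    # Illustrative FFT-based variant (hypothetical helper, not the paper's code).
    n, m = len(series), len(query)
    q = znorm(query)

    # Sliding dot products QT[i] = sum_j q[j] * series[i + j], obtained as a
    # circular convolution of the series with the reversed, zero-padded query.
    conv = np.fft.irfft(np.fft.rfft(series, n) * np.fft.rfft(q[::-1], n), n)
    qt = conv[m - 1:n]  # indices m-1 .. n-1 are free of wrap-around

    # Rolling standard deviation of every length-m subsequence via cumulative sums.
    c1 = np.cumsum(np.insert(series, 0, 0.0))
    c2 = np.cumsum(np.insert(series ** 2, 0, 0.0))
    mean = (c1[m:] - c1[:-m]) / m
    var = (c2[m:] - c2[:-m]) / m - mean ** 2
    sigma = np.sqrt(np.maximum(var, 0.0))

    # With a z-normalized query (mean 0, sum of squares m):
    # d[i]^2 = 2 * (m - QT[i] / sigma[i])
    d2 = 2.0 * (m - qt / sigma)
    return np.sqrt(np.maximum(d2, 0.0))  # clip tiny negative rounding errors

# Small demonstration on synthetic data: the query is cut from the series itself.
rng = np.random.default_rng(0)
series = rng.standard_normal(200)
query = series[50:70].copy()
naive = naive_distance_profile(query, series)
fast = fft_distance_profile(query, series)
```

In this sketch both functions return one distance per subsequence position, so the profile has length n - m + 1 for a series of length n and a query of length m. The brute-force version costs O(nm) operations, while the FFT-based version costs O(n log n), which is the kind of saving the abstract refers to.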
References
Abdoli A, Alaee S, Imani S, Murillo A, Gerry A, Hickle L, Keogh E (2020) Fitbit for chickens? Time series data mining can increase the productivity of poultry farms. In: Proceedings of the 26th ACM SIGKDD international conference on knowledge discovery & data mining. KDD ’20. Association for Computing Machinery, New York, NY, USA, pp 3328–3336. https://doi.org/10.1145/3394486.3403385
Alshaer M, Garcia-Rodriguez S, Gouy-Pailler C (2020) Detecting anomalies from streaming time series using matrix profile and shapelets learning. In: 2020 IEEE 32nd international conference on tools with artificial intelligence (ICTAI), pp 376–383. https://doi.org/10.1109/ICTAI50040.2020.00066
Arfken GB, Weber HJ, Harris FE (2013) Chapter 20—Integral transforms. In: Arfken GB, Weber HJ, Harris FE (eds) Mathematical methods for physicists, 7th edn. Academic Press, Boston, pp 963–1046. https://doi.org/10.1016/B978-0-12-384654-9.00020-7
Bagnall A, Lines J, Vickers W, Keogh E (2023) The UEA & UCR time series classification repository. www.timeseriesclassification.com
Bastogne T, Noura H, Richard A, Hittinger J-M (1997) Application of subspace methods to the identification of a winding process. In: 1997 European control conference (ECC), pp 2168–2173. https://doi.org/10.23919/ECC.1997.7082426
Camerra A, Palpanas T, Shieh J, Keogh E (2010) iSAX 2.0: indexing and mining one billion time series. In: 2010 IEEE international conference on data mining, pp 58–67. https://doi.org/10.1109/ICDM.2010.124
Chandrasekar S, Coble JB, List F, Carver K, Beauchamp S, Godfrey A, Paquit V, Babu SS (2022) Similarity analysis for thermal signature comparison in metal additive manufacturing. Mater Des 224:111261. https://doi.org/10.1016/j.matdes.2022.111261
Cormen TH, Leiserson CE, Rivest RL, Stein C (2009) Introduction to algorithms. The MIT Press, Cambridge
Fast Fourier transform with CuPy (2023). https://docs.cupy.dev/en/stable/user_guide/fft.html
Franch G, Jurman G, Coviello L, Pendesini M, Furlanello C (2019) MASS-UMAP: fast and accurate analog ensemble search in weather radar archives. Remote Sens 11(24):2922. https://doi.org/10.3390/rs11242922
Frigo M, Johnson SG (1998) FFTW: an adaptive software architecture for the FFT. In: Proceedings of the 1998 IEEE international conference on acoustics, speech and signal processing, ICASSP’98 (Cat. No. 98CH36181), vol 3. https://doi.org/10.1109/ICASSP.1998.681704
Frigo M, Johnson SG (2005) The design and implementation of FFTW3. In: Proceedings of the IEEE, vol 93. https://doi.org/10.1109/JPROC.2004.840301
Frigo M, Johnson SG (2020) FFTW manual. https://fftw.org/fftw3.pdf
Harris FJ (1987) Chapter 8—Time domain signal processing with the DFT. In: Elliott DF (ed) Handbook of digital signal processing. Academic Press, San Diego, pp 633–699. https://doi.org/10.1016/B978-0-08-050780-4.50013-8
Heo H, Kim HJ, Kim WS, Lee K (2017) Cover song identification with metric learning using distance as a feature. In: Cunningham SJ, Duan Z, Hu X, Turnbull D (eds) Proceedings of the 18th international society for music information retrieval conference, ISMIR 2017, Suzhou, China, October 23–27, 2017, pp 628–634. https://ismir2017.smcnus.org/wp-content/uploads/2017/10/33_Paper.pdf
Johnson SG, Frigo M (2007) A modified split-radix FFT with fewer arithmetic operations. IEEE Trans Signal Process 55:111–119. https://doi.org/10.1109/TSP.2006.882087
Kammerer K, Hoppenstedt B, Pryss R, Stökler S, Allgaier J, Reichert M (2019) Anomaly detections for manufacturing systems based on sensor data-insights into two challenging real-world production settings. Sensors 19(24):5370. https://doi.org/10.3390/s19245370
Keogh E (2017) The UCR matrix profile page. https://www.cs.ucr.edu/eamonn/MatrixProfile.html
Keogh EJ, Pazzani MJ (2000) Scaling up dynamic time warping for datamining applications. In: Proceedings of the sixth ACM SIGKDD international conference on knowledge discovery and data mining—KDD ’00. ACM Press, New York, New York, USA, pp 285–289. https://doi.org/10.1145/347090.347153. http://dl.acm.org/citation.cfm?id=347090.347153
Lai E (2003) 4–frequency-domain representation of discrete-time signals. In: Lai E (ed) Practical digital signal processing. Newnes, Oxford, pp 61–78. https://doi.org/10.1016/B978-075065798-3/50004-7
Lu Y, Wu R, Mueen A, Zuluaga MA, Keogh E (2022) Matrix profile xxiv: scaling time series anomaly detection to trillions of datapoints and ultra-fast arriving data streams. In: Proceedings of the 28th ACM SIGKDD conference on knowledge discovery and data mining. KDD ’22. Association for Computing Machinery, New York, NY, USA, pp 1173–1182. https://doi.org/10.1145/3534678.3539271
Mercer R, Alaee S, Abdoli A, Senobari NS, Singh S, Murillo A, Keogh E (2022) Introducing the contrast profile: a novel time series primitive that allows real world classification. Data Min Knowl Disc 36:877–915. https://doi.org/10.1007/s10618-022-00824-5
Mercer R, Keogh E (2022) Matrix profile xxv: introducing novelets: a primitive that allows online detection of emerging behaviors in time series. In: 2022 IEEE international conference on data mining (ICDM), pp 338–347. https://doi.org/10.1109/ICDM54844.2022.00044
Mollah MP, Souza VMA, Mueen A (2021) Multi-way time series join on multi-length patterns. In: 2021 IEEE international conference on data mining (ICDM), pp 429–438. https://doi.org/10.1109/ICDM51629.2021.00054
Mueen A, Keogh E (2010) Online discovery and maintenance of time series motifs. In: Proceedings of the 16th ACM SIGKDD international conference on knowledge discovery and data mining—KDD ’10. ACM Press, New York, New York, USA, p 1089. https://doi.org/10.1145/1835804.1835941. http://dl.acm.org/citation.cfm?id=1835804.1835941
Mueen A, Keogh E, Young N (2011) Logical-shapelets: an expressive primitive for time series classification. In: The 17th ACM SIGKDD international conference, pp 1154–1162. https://doi.org/10.1145/2020408.2020587
Mueen A, Nath S, Liu J (2010) Fast approximate correlation for massive time-series data. In: Proceedings of the 2010 ACM SIGMOD international conference on management of data, pp 171–182. https://doi.org/10.1145/1807167.1807188
Mueen A, Zhu Y, Yeh M, Kamgar K, Viswanathan K, Gupta C, Keogh E (2017) The fastest similarity search algorithm for time series subsequences under Euclidean distance. http://www.cs.unm.edu/~mueen/FastestSimilaritySearch.html
Multiple GPU cuFFT transforms (2023). https://docs.nvidia.com/cuda/cufft/index.html#multiple-gpu-2d-and-3d-transforms-on-permuted-input
Piatov D, Helmer S, Dignös A, Gamper J (2019) Interactive and space-efficient multi-dimensional time series subsequence matching. Inf Syst 82:121–135. https://doi.org/10.1016/j.is.2018.08.002
Rakthanmanon T, Campana B, Mueen A, Batista G, Westover B, Zhu Q, Zakaria J, Keogh E (2012) Searching and mining trillions of time series subsequences under dynamic time warping. In: Proceedings of the 18th ACM SIGKDD international conference on knowledge discovery and data mining, pp 262–270. https://doi.org/10.1145/2339530.2339576
Rakthanmanon T, Keogh EJ, Lonardi S, Evans S (2011) Time series epenthesis: clustering time series streams requires ignoring some data. In: Proceedings—IEEE international conference on data mining, ICDM. ICDM ’11, pp 547–556. https://doi.org/10.1109/ICDM.2011.146
Shao X, Johnson SG (2008) Type-II/III DCT/DST algorithms with reduced number of arithmetic operations. Signal Process 88:1553–1564. https://doi.org/10.1016/j.sigpro.2008.01.004
Shieh J, Keogh E (2008) iSAX: indexing and mining terabyte sized time series. In: Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’08, pp 623–631. https://doi.org/10.1145/1401890.1401966. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.155.4531
Silva DF, Batista GEAPA, Keogh E (2016) Prefix and suffix invariant dynamic time warping. In: 2016 IEEE 16th international conference on data mining (ICDM), pp 1209–1214. https://doi.org/10.1109/ICDM.2016.0161
Silva DF, Yeh CM, Batista GEAPA, Keogh EJ (2016) Simple: assessing music similarity using subsequences joins. In: Mandel MI, Devaney J, Turnbull D, Tzanetakis G (eds) Proceedings of the 17th international society for music information retrieval conference, ISMIR 2016, New York City, United States, August 7–11, 2016, pp 23–29. https://wp.nyu.edu/ismir2016/wp-content/uploads/sites/2294/2016/07/099_Paper.pdf
Silva DF, Yeh C-CM, Zhu Y, Batista GEAPA, Keogh E (2019) Fast similarity matrix profile for music analysis and exploration. IEEE Trans Multimedia 21(1):29–38. https://doi.org/10.1109/TMM.2018.2849563
Stefan A, Athitsos V, Das G (2013) The move-split-merge metric for time series. IEEE Trans Knowl Data Eng 25(6):1425–1438. https://doi.org/10.1109/TKDE.2012.88
Uudeberg T, Belikov J, Päeske L, Hinrikus H, Liiv I, Bachmann M (2023) In-phase matrix profile: a novel method for the detection of major depressive disorder. Biomed Signal Process Control 88:105378. https://doi.org/10.1016/j.bspc.2023.105378
Vlachos M, Hadjieleftheriou M, Gunopulos D, Keogh E (2003) Indexing multi-dimensional time-series with support for multiple distance measures. In: Proceedings of the ninth ACM SIGKDD international conference on knowledge discovery and data mining. KDD ’03. ACM, New York, NY, USA, pp 216–225. https://doi.org/10.1145/956750.956777
Wilhelm S, Kasbauer J (2021) Exploiting smart meter power consumption measurements for human activity recognition (HAR) with a motif-detection-based non-intrusive load monitoring (NILM) approach. Sensors 21(23):8036. https://doi.org/10.3390/s21238036
Yang D (2018) Ultra-fast preselection in lasso-type spatio-temporal solar forecasting problems. Sol Energy 176:788–796. https://doi.org/10.1016/j.solener.2018.08.041
Yang D, Alessandrini S (2019) An ultra-fast way of searching weather analogs for renewable energy forecasting. Sol Energy 185:255–261. https://doi.org/10.1016/j.solener.2019.03.068
Yang D, Wu E, Kleissl J (2019) Operational solar forecasting for the real-time market. Int J Forecast 35(4):1499–1519. https://doi.org/10.1016/j.ijforecast.2019.03.009
Yankov D, Keogh E, Rebbapragada U (2008) Disk aware discord discovery: finding unusual time series in terabyte sized datasets. In: Knowledge and information systems, vol 17, pp 241–262. https://doi.org/10.1007/s10115-008-0131-9
Yeh C-CM, Zhu Y, Ulanova L, Begum N, Ding Y, Dau HA, Silva DF, Mueen A, Keogh E (2017) Matrix profile I: all pairs similarity joins for time series: a unifying view that includes motifs, discords and shapelets. In: Proceedings—IEEE international conference on data mining, ICDM. https://doi.org/10.1109/ICDM.2016.89
Yeh C-CM, Zhu Y, Ulanova L, Begum N, Ding Y, Dau HA, Zimmerman Z, Silva DF, Mueen A, Keogh E (2017) Time series joins, motifs, discords and shapelets: a unifying view that exploits the matrix profile. Data Min Knowl Disc. https://doi.org/10.1007/s10618-017-0519-9
Zhong S, Souza VMA, Mueen A (2020) FilCorr: filtered and lagged correlation on streaming time series. In: 2020 IEEE international conference on data mining (ICDM), pp 1436–1441. https://doi.org/10.1109/ICDM50108.2020.00190
Zhu L, Lu C, Sun Y (2016) Time series shapelet classification based online short-term voltage stability assessment. IEEE Trans Power Syst 31(2):1430–1439. https://doi.org/10.1109/TPWRS.2015.2413895
Zhu Y, Mueen A, Keogh E (2018) Admissible time series motif discovery with missing data. arXiv preprint arXiv:1802.05472
Zhu Y, Yeh C-CM, Zimmerman Z, Kamgar K, Keogh E (2018) Matrix profile xi: SCRIMP++: time series motif discovery at interactive speeds. In: 2018 IEEE international conference on data mining (ICDM), pp 837–846. https://doi.org/10.1109/ICDM.2018.00099
Zhu Y, Zimmerman Z, Senobari NS, Yeh C-CM, Funning G, Mueen A, Brisk P, Keogh E (2016) Matrix profile ii: exploiting a novel algorithm and GPUs to break the one hundred million barrier for time series motifs and joins. In: 2016 IEEE 16th international conference on data mining (ICDM), pp 739–748. https://doi.org/10.1109/ICDM.2016.0085
Acknowledgments
This material is based on work supported by the National Science Foundation under #2104537.
Additional information
Responsible editor: Eamonn Keogh.
About this article
Cite this article
Zhong, S., Mueen, A. MASS: distance profile of a query over a time series. Data Min Knowl Disc 38, 1466–1492 (2024). https://doi.org/10.1007/s10618-024-01005-2
DOI: https://doi.org/10.1007/s10618-024-01005-2