Data Series Similarity Using Correlation-Aware Measures

Authors:

Katsiaryna Mirylenka,

Michele Dallachiesa,

Themis Palpanas

SSDBM '17: Proceedings of the 29th International Conference on Scientific and Statistical Database Management

Article No.: 11, Pages 1 - 12

https://doi.org/10.1145/3085504.3085515

Published: 27 June 2017

Abstract

The increased availability of unprecedented amounts of sequential data (generated by Internet-of-Things, as well as scientific applications) has led in the past few years to a renewed interest and attention to the field of data series processing and analysis. Data series collections are processed and analyzed using a large variety of techniques, most of which are based on the computation of some distance function. In this study, we revisit this basic operation of data series distance calculation. We observe that the popular distance measures are oblivious to the correlations inherent in neighboring values in a data series. Therefore, we evaluate the plausibility and benefit of incorporating into the distance function measures of correlation, which enable us to capture the associations among neighboring values in the sequence. We propose four such measures, inspired by statistical and probabilistic approaches, which can effectively model these correlations. We analytically and experimentally demonstrate the benefits of the new measures using the 1NN classification task, and discuss the lessons learned. Finally, we propose future research directions for enabling the proposed measures to be used in practice.
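As a hedged illustration of the observation that popular distance measures ignore the correlations among neighboring values, the sketch below uses plain Euclidean distance as a stand-in for such measures. It is not one of the four measures proposed in the paper. The toy sine series, the helper names `euclidean` and `lag1_autocorr`, and the use of lag-1 autocorrelation as a proxy for neighbor correlation are all assumptions made for this example. Applying the same random reordering of time positions to both series leaves the Euclidean distance unchanged, even though the reordering destroys the relationship between neighboring values.

```python
import math
import random


def euclidean(x, y):
    """Plain Euclidean distance between two equal-length series."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def lag1_autocorr(x):
    """Lag-1 sample autocorrelation: how strongly neighboring values move together."""
    mean = sum(x) / len(x)
    num = sum((x[i] - mean) * (x[i + 1] - mean) for i in range(len(x) - 1))
    den = sum((v - mean) ** 2 for v in x)
    return num / den


# Two smooth, strongly autocorrelated toy series (hypothetical data).
n = 64
x = [math.sin(2 * math.pi * t / n) for t in range(n)]
y = [math.sin(2 * math.pi * t / n + 0.3) for t in range(n)]

# Apply the SAME random reordering of time positions to both series.
rng = random.Random(0)
perm = list(range(n))
rng.shuffle(perm)
x_shuf = [x[i] for i in perm]
y_shuf = [y[i] for i in perm]

d_original = euclidean(x, y)
d_shuffled = euclidean(x_shuf, y_shuf)

# The distance is the same up to floating-point rounding, while the
# neighbor correlation of the series itself changes.
print("distance, original order:", d_original)
print("distance, shuffled order:", d_shuffled)
print("lag-1 autocorrelation, original:", lag1_autocorr(x))
print("lag-1 autocorrelation, shuffled:", lag1_autocorr(x_shuf))
```

Because Euclidean distance sums independent per-position differences, it cannot distinguish the ordered series from the shuffled ones. A correlation-aware measure would, by design, be sensitive to this difference.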


Published In

SSDBM '17: Proceedings of the 29th International Conference on Scientific and Statistical Database Management

June 2017

373 pages

ISBN:9781450352826

DOI:10.1145/3085504

General Chair:
Alok Choudhary,
Program Chair:
Kesheng Wu,
Publications Chair:
Bin Dong

Copyright © 2017 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

In-Cooperation

Northwestern University

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

SSDBM '17

SSDBM '17: 29th International Conference on Scientific and Statistical Database Management

June 27 - 29, 2017

Chicago, IL, USA

Acceptance Rates

Overall Acceptance Rate 56 of 146 submissions, 38%
