research-article

Data Series Progressive Similarity Search with Probabilistic Quality Guarantees

Authors:

Theophanis Tsandilas,

Karima Echihabi,

Anastasia Bezerianos,

Themis Palpanas

SIGMOD '20: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data

Pages 1857 - 1873

https://doi.org/10.1145/3318464.3389751

Published: 31 May 2020

Abstract

Existing systems dealing with the increasing volume of data series cannot guarantee interactive response times, even for fundamental tasks such as similarity search. Therefore, it is necessary to develop analytic approaches that support exploration and decision making by providing progressive results before the final, exact ones have been computed. Prior works lack both efficiency and accuracy when applied to large-scale data series collections. We present and experimentally evaluate a new probabilistic learning-based method that provides quality guarantees for progressive Nearest Neighbor (NN) query answering. We provide both initial and progressive estimates of the final answer that improve as the similarity search proceeds, as well as suitable stopping criteria for the progressive queries. Experiments with synthetic and diverse real datasets demonstrate that our prediction methods constitute the first practical solution to the problem, significantly outperforming competing approaches.

Index Terms

Information systems

Published In

SIGMOD '20: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data

June 2020

2925 pages

ISBN: 9781450367356

DOI: 10.1145/3318464

General Chairs: David Maier (Portland State University, USA), Rachel Pottinger (University of British Columbia, Canada)

Program Chairs: AnHai Doan (University of Wisconsin, USA), Wang-Chiew Tan (Megagon Labs, USA)

Publications Chairs: Abdussalam Alawini (University of Illinois at Urbana-Champaign, USA), Hung Q. Ngo (RelationalAI, USA)

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMOD: ACM Special Interest Group on Management of Data

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

EU project NESTOR
EDF-THALES
Investir l'Avenir and Univ. of Paris IDEX Emergence en Recherche

Conference

SIGMOD/PODS '20

Sponsor:

SIGMOD

SIGMOD/PODS '20: International Conference on Management of Data

June 14 - 19, 2020

Portland, OR, USA

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%
