research-article

Multivariate Time Series Cleaning under Speed Constraints

Published: 20 December 2024

Abstract

Errors are common in time series due to unreliable sensor measurements. Existing methods focus on univariate data and do not exploit the correlation between dimensions. Cleaning each dimension separately may yield a less accurate result, because some errors can only be identified in the multivariate case. We also point out that the widely used minimum change principle is not always the best choice. Instead, we try to change the smallest number of data points, to avoid a significant change in the data distribution. In this paper, we propose MTCSC, a constraint-based method for cleaning multivariate time series. We formalize the repair problem, propose a linear-time method that supports online computing, and improve it by exploiting data trends. We also support adaptive capture of speed constraints. We analyze the properties of our proposals and compare them with state-of-the-art methods in terms of effectiveness and efficiency across error rates, data sizes, and applications such as classification. Experiments on real datasets show that MTCSC can achieve higher repair accuracy with less time consumption. Interestingly, it can be effective even when there are only weak or no correlations between the dimensions.
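
As a rough illustration of why some errors are only visible in the multivariate case, the sketch below checks a speed constraint on a small two-dimensional series. It is not MTCSC itself and performs no repair. It assumes that speed is the Euclidean distance between consecutive points divided by the elapsed time, which the abstract does not state. The names `speed`, `violations`, and `s_max`, and the sample data, are hypothetical.

```python
import math

def speed(t0, p0, t1, p1):
    """Euclidean distance between two points divided by elapsed time (assumed definition)."""
    return math.dist(p0, p1) / (t1 - t0)

def violations(times, points, s_max):
    """Return indices i whose step from point i-1 exceeds the speed bound s_max."""
    return [
        i for i in range(1, len(points))
        if speed(times[i - 1], points[i - 1], times[i], points[i]) > s_max
    ]

# Hypothetical 2-D series: the step from index 1 to index 2 moves by (1, 1).
times = [0, 1, 2, 3]
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 1.0), (3.0, 1.0)]
s_max = 1.2  # illustrative speed bound

# Joint check over both dimensions at once.
joint = violations(times, points, s_max)

# Separate check on each dimension, treating it as a univariate series.
per_dim = [
    violations(times, [(p[d],) for p in points], s_max)
    for d in range(2)
]

print(joint)    # indices flagged by the joint check
print(per_dim)  # indices flagged per dimension
```

In this example the joint check flags index 2, because the combined step has length sqrt(2), which exceeds 1.2, while each dimension on its own moves by at most 1 per time unit and so passes the same bound.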

    Published In

    Proceedings of the ACM on Management of Data  Volume 2, Issue 6
    SIGMOD
    December 2024
    792 pages
    EISSN:2836-6573
    DOI:10.1145/3709598

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. data cleaning
    2. multivariate time series
    3. speed constraint
