DOI: 10.1145/3638985.3639007
research-article
Open access

Understanding Integrity of Time Series IoT Datasets through Local Outlier Detection with Steep Peak and Valley

Published: 11 March 2024

Abstract

Advances in IoT sensor technologies have made vast amounts of environmental data available, and this data supports predictive and precise services that help prepare for adverse impacts. However, data acquired by IoT sensors can be corrupted by external environmental factors, which undermines the integrity of data interpretation. To address this problem, a prior study proposed outlier detection techniques using transform-based sparse profiles. That approach loses much of its value without a methodology for evaluating data integrity once outlier detection has been applied to a dataset. In addition, it did not consider data with steep peaks or data that depends on other data, both of which are common in real-world scenarios such as the soil moisture data used in this paper. We therefore propose a process that preprocesses defective soil moisture sensor data using local pattern-based outlier detection (LPOD) and evaluates the integrity of the data after outlier detection. Specifically, this paper aims to: 1) detect outliers in the original soil IoT datasets using local and global outlier detection (OD), eliminating faulty data that could lead to wrong decisions; 2) use the results of statistical evaluation to determine whether the outliers have been properly eliminated; and 3) find the ground-truth pattern of soil IoT datasets while accounting for precipitation. Experiments on real-world soil moisture datasets show that the LPOD method outperforms other statistical outlier detection methods, suggesting that the preprocessing can improve the integrity of IoT datasets.
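
The distinction between global and local outlier detection drawn in the abstract can be illustrated with a minimal sketch. This is not the paper's LPOD method, whose details are not given on this page. It is a generic comparison of a whole-series z-score against a sliding-window median/MAD rule. The function names, the window size, the thresholds, and the toy soil moisture values are all assumptions made for illustration.

```python
from statistics import mean, median, pstdev

def global_outliers(values, z_thresh=3.0):
    """Flag indices whose z-score against the whole series exceeds z_thresh."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > z_thresh]

def local_outliers(values, half_window=3, k=5.0, min_scale=0.5):
    """Flag indices that deviate from the median of their neighbours
    by more than k times the neighbours' median absolute deviation (MAD).
    min_scale (hypothetical) keeps the rule from firing on tiny noise
    when the neighbourhood is almost constant."""
    flagged = []
    for i, v in enumerate(values):
        lo = max(0, i - half_window)
        hi = min(len(values), i + half_window + 1)
        neighbours = values[lo:i] + values[i + 1:hi]  # exclude the point itself
        med = median(neighbours)
        mad = median([abs(x - med) for x in neighbours])
        if abs(v - med) > k * max(mad, min_scale):
            flagged.append(i)
    return flagged

# Toy soil moisture readings (%): a dry baseline with one glitch at index 3,
# followed by a steep, legitimate rise after rainfall and a slow decay.
series = [20.0, 20.0, 20.0, 26.0, 20.0, 20.0,
          40.0, 39.0, 38.0, 37.0, 36.0, 35.0, 34.0, 33.0]

print(global_outliers(series))  # []
print(local_outliers(series))   # [3]
```

In this toy series the rainfall event widens the overall spread, so the global rule misses the glitch, while the local rule flags it without flagging the steep rise itself. Real sensor data would need the window and thresholds tuned to the sampling rate and noise level.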

    Published In

    ICIT '23: Proceedings of the 2023 11th International Conference on Information Technology: IoT and Smart City
    December 2023
    266 pages
    ISBN:9798400709043
    DOI:10.1145/3638985
    This work is licensed under a Creative Commons Attribution International 4.0 License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. IoT data analytics
    2. data patterns
    3. outlier detection
    4. outliers
    5. time-series data

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    ICIT 2023
    ICIT 2023: IoT and Smart City
    December 14 - 17, 2023
    Kyoto, Japan
