
Dimensionality Reduction of Service Monitoring Time-Series: An Industrial Use Case

  • Original Research
  • Published in: SN Computer Science

Abstract

Our study proposes a dimensionality reduction approach for efficiently processing a service monitoring application's high-dimensional, unlabeled time-series dataset. The approach aims to improve data quality while reducing the feature space as much as possible. Because the dataset is vast and the reduction approach demands substantial computational resources, we divide it into several weekly sub-datasets. Using clustering methods and metrics, we thoroughly evaluate the approach's efficacy on each sub-dataset and show that information loss after the data transformation is minimal. Moreover, we assess each sub-dataset's trustworthiness and similarity to verify that the transformed data acquire the same cluster labels as the original data. Since the experiments reveal high data quality, the industrial partner can use the new data in their decision-making tasks.
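The pipeline the abstract describes (reduce one weekly sub-dataset, cluster the data before and after reduction, then check information loss via cluster-label agreement and embedding trustworthiness) can be sketched as follows. This is a minimal illustration with scikit-learn, not the authors' implementation: the reducer, cluster count, and all parameter values are assumptions, and random data stands in for the proprietary monitoring time-series.

```python
# Illustrative sketch: dimensionality reduction of one weekly sub-dataset,
# then clustering quality checks before vs. after the transformation.
# All choices (kernel PCA, k=4, 10 components) are hypothetical.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import KernelPCA
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score, adjusted_rand_score
from sklearn.manifold import trustworthiness

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 50))  # stand-in for one weekly sub-dataset

# Standardize, then reduce the feature space.
X = StandardScaler().fit_transform(X)
X_red = KernelPCA(n_components=10, kernel="rbf").fit_transform(X)

# Cluster the original and the reduced data with the same k.
k = 4
labels_orig = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
labels_red = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_red)

# Quality checks: cluster cohesion, label agreement, and whether local
# neighborhoods are preserved in the low-dimensional embedding.
print("silhouette (original):", silhouette_score(X, labels_orig))
print("silhouette (reduced): ", silhouette_score(X_red, labels_red))
print("label agreement (ARI):", adjusted_rand_score(labels_orig, labels_red))
print("trustworthiness:      ", trustworthiness(X, X_red, n_neighbors=5))
```

A trustworthiness close to 1 and a high adjusted Rand index would indicate, as in the study, that the transformed data retain the neighborhood structure and cluster labels of the original data.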


[Figures 1–7: not shown in this preview; captions unavailable.]



Acknowledgements

We express our gratitude to the Global AI Accelerator (GAIA), Ericsson, Montreal, for collaborating with us on this research, and to the Observability team for granting us access to the data.

Funding

This research is supported by MITACS Accelerate Canada (project number: IT16751).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Samira Sadaoui.

Ethics declarations

Conflict of interest

The authors declare that no conflict of interest is related to this publication.

Ethics approval

Not applicable.

Consent to participate

All authors have read and approved the manuscript and agreed to the authorship.

Consent for publication

All authors have given consent for publishing the manuscript.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the topical collection: “Advances on Agents and Artificial Intelligence” guest-edited by Jaap van den Herik, Ana Paula Rocha and Luc Steels.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article


Cite this article

Anowar, F., Sadaoui, S. & Dalal, H. Dimensionality Reduction of Service Monitoring Time-Series: An Industrial Use Case. SN COMPUT. SCI. 4, 23 (2023). https://doi.org/10.1007/s42979-022-01428-y
