DOI: 10.1145/3448016.3452757
short-paper

FeatTS: Feature-based Time Series Clustering

Published: 18 June 2021
  • Abstract

    Clustering time series is a recurrent problem in real-life applications involving data science and data analytics pipelines. Existing time series clustering algorithms are ineffective for feature-rich real-world time series because they either compare the time series on raw data alone or use a fixed set of features to determine similarity. In this paper, we showcase FeatTS, a feature-based semi-supervised clustering framework that addresses these issues for variable-length and heterogeneous time series. Specifically, FeatTS leverages a graph encoding of the time series that is obtained by considering a large number of significant extracted features. It then employs community detection and builds upon a co-occurrence matrix in order to unify all the best clustering results. We let the user explore the various steps of FeatTS by visualizing the initial data, its graph encoding, and its division into communities, along with the obtained clusters. We show how the user can interact with the process by choosing the features and by varying the percentage of input labels and the other parameters. In view of its characteristics, FeatTS outperforms the state-of-the-art clustering methods and is the first to be able to digest domain-specific time series such as healthcare time series, while still being robust and scalable.
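
    The abstract describes combining several clustering results through a co-occurrence matrix. The following is a minimal sketch of that general idea only, not the FeatTS implementation: the function names, the toy label lists, and the 0.5 threshold are all assumptions made for illustration, and the final grouping uses plain connected components as a stand-in for whatever consensus step the paper actually applies.

```python
from itertools import combinations


def co_occurrence_matrix(labelings):
    """Return, for each pair of series, the fraction of clusterings
    in which the two series were assigned to the same cluster."""
    n = len(labelings[0])
    counts = [[0.0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1.0
    k = len(labelings)
    return [[c / k for c in row] for row in counts]


def consensus_clusters(matrix, threshold=0.5):
    """Group series whose co-occurrence exceeds `threshold`.

    Assumption: connected components over the thresholded matrix,
    computed with a small union-find. This is an illustrative
    stand-in, not the method used in the paper.
    """
    n = len(matrix)
    parent = list(range(n))

    def find(x):
        # Follow parent links to the root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if matrix[i][j] > threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical input: three clusterings of five time series, e.g. one
# per feature-derived graph. Label ids are arbitrary within each list.
labelings = [
    [0, 0, 1, 1, 1],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 1, 1],
]
matrix = co_occurrence_matrix(labelings)
clusters = consensus_clusters(matrix, threshold=0.5)
print(clusters)
```

    In this toy input, series 0 and 1 share a cluster in all three clusterings, while series 2, 3 and 4 are linked through pairs that co-occur in two of the three, so the sketch groups them as two consensus clusters. Because label ids are compared only within a single clustering, the individual clusterings do not need a shared labeling scheme.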

    Supplementary Material

    MP4 File (3448016.3452757.mp4)
    The problem of clustering time series has several applications in real-life contexts, especially in data science and data analytics pipelines. Existing time series clustering algorithms are ineffective for feature-rich real-world time series since they only compute the similarity of time series based on raw data or use a fixed set of features. In this paper, we showcase FeatTS, a feature-based semi-supervised clustering framework addressing the above issues for variable-length and heterogeneous time series. Specifically, it first relies on a graph encoding of the time series that is obtained by considering a high number of significant extracted features. It then employs community detection and leverages a co-occurrence matrix in order to group together all the best clustering results. We let the user delve in the various steps of FeatTS by visualizing the initial data, its graph encoding and its division into communities along with the obtained clusters. We show how the user can interact with the process for the choice of the features and for varying the percentage of input labels and the parameters. In view of its characteristics, FeatTS outperforms the state of the art clustering methods and is the first to be able to digest domain-specific time series such as healthcare time series, while still being robust and scalable.

    Published In

    SIGMOD '21: Proceedings of the 2021 International Conference on Management of Data
    June 2021
    2969 pages
    ISBN:9781450383431
    DOI:10.1145/3448016

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. clustering for data science
    2. community detection
    3. features selection
    4. semi-supervised clustering

    Qualifiers

    • Short-paper

    Conference

    SIGMOD/PODS '21
    Acceptance Rates

    Overall Acceptance Rate 785 of 4,003 submissions, 20%

    Article Metrics

    • Downloads (Last 12 months): 112
    • Downloads (Last 6 weeks): 12
    Reflects downloads up to 29 Jul 2024

    Cited By

    • (2024) From Peaks to Troughs. Using Strategy Analytics for Business Value Creation and Competitive Advantage, 188-206. DOI: 10.4018/979-8-3693-2823-1.ch009. Online publication date: 28-Jun-2024
    • (2024) Unsupervised feature selection using chronological fitting with Shapley Additive explanation (SHAP) for industrial time-series anomaly detection. Applied Soft Computing 155:C. DOI: 10.1016/j.asoc.2024.111426. Online publication date: 2-Jul-2024
    • (2023) Querying Similar Multi-Dimensional Time Series with a Spatial Database. ISPRS International Journal of Geo-Information 12:4 (179). DOI: 10.3390/ijgi12040179. Online publication date: 21-Apr-2023
    • (2023) A pipeline architecture for feature-based unsupervised clustering using multivariate time series from HPC jobs. Information Fusion 93:C (1-20). DOI: 10.1016/j.inffus.2022.12.017. Online publication date: 1-May-2023
    • (2023) PLAHS. Applied Soft Computing 147:C. DOI: 10.1016/j.asoc.2023.110718. Online publication date: 1-Nov-2023
    • (2022) Time2Feat. Proceedings of the VLDB Endowment 16:2 (193-201). DOI: 10.14778/3565816.3565822. Online publication date: 1-Oct-2022
