Robust Factorization of Real-world Tensor Streams with Patterns, Missing Values, and Outliers

Lee, Dongjin; Shin, Kijung

arXiv:2102.08466 (cs)

[Submitted on 16 Feb 2021 (v1), last revised 9 Jun 2022 (this version, v2)]

Abstract: Consider multiple seasonal time series being collected in real-time, in the form of a tensor stream. Real-world tensor streams often include missing entries (e.g., due to network disconnection) and at the same time unexpected outliers (e.g., due to system errors). Given such a real-world tensor stream, how can we estimate missing entries and predict future evolution accurately in real-time? In this work, we answer this question by introducing SOFIA, a robust factorization method for real-world tensor streams. In a nutshell, SOFIA smoothly and tightly integrates tensor factorization, outlier removal, and temporal-pattern detection, which naturally reinforce each other. Moreover, SOFIA integrates them in linear time, in an online manner, despite the presence of missing entries. We experimentally show that SOFIA is (a) robust and accurate: yielding up to 76% lower imputation error and 71% lower forecasting error; (b) fast: up to 935X faster than the second-most accurate competitor; and (c) scalable: scaling linearly with the number of new entries per time step.
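To make the setting in the abstract concrete, the following is a minimal sketch of one online step on a single incoming tensor slice: fit low-rank temporal weights from the observed entries only, damp the influence of outliers, and use the reconstruction to fill the missing entries. It is not the SOFIA algorithm. The function name `update_temporal_factor`, the residual-clipping rule, the threshold `k`, and the assumption that the non-temporal factor matrices `A` and `B` are already known are all illustrative choices, and temporal-pattern detection is not covered.

```python
import numpy as np

def update_temporal_factor(Y, mask, A, B, k=3.0, n_iter=10):
    """Fit c so that Y ~= A @ diag(c) @ B.T on observed entries (hypothetical helper).

    Y    : (I, J) slice; entries where mask is False are ignored.
    mask : (I, J) boolean array, True where Y is observed.
    A, B : (I, R) and (J, R) factor matrices, assumed fixed here.
    """
    I, J = Y.shape
    # Row (i, j) of the design matrix is the elementwise product A[i] * B[j].
    design = (A[:, None, :] * B[None, :, :]).reshape(I * J, -1)
    obs = mask.reshape(-1).astype(bool)
    X, y = design[obs], Y.reshape(-1)[obs]

    # Plain least squares on the observed entries as a starting point.
    c = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(n_iter):
        resid = y - X @ c
        # Robust residual scale from the median absolute deviation.
        scale = 1.4826 * np.median(np.abs(resid - np.median(resid))) + 1e-12
        # Clip large residuals so outliers pull the fit only weakly, then refit.
        y_clean = X @ c + np.clip(resid, -k * scale, k * scale)
        c = np.linalg.lstsq(X, y_clean, rcond=None)[0]

    recon = (A * c) @ B.T  # low-rank estimate for every entry, observed or not
    return c, recon

# Synthetic example: rank-2 slice with missing entries and a few large outliers.
rng = np.random.default_rng(0)
I, J, R = 10, 9, 2
A, B = rng.normal(size=(I, R)), rng.normal(size=(J, R))
c_true = np.array([2.0, -1.0])
Y = (A * c_true) @ B.T
mask = rng.random((I, J)) < 0.8
obs_idx = np.argwhere(mask)
pick = rng.choice(len(obs_idx), size=3, replace=False)
Y[tuple(obs_idx[pick].T)] += 50.0   # inject outliers into observed entries
Y[~mask] = np.nan                   # missing entries carry no information

c_hat, recon = update_temporal_factor(Y, mask, A, B)
print("max abs error in temporal weights:", np.max(np.abs(c_hat - c_true)))
```

The cost of each step is dominated by the least-squares solves over the observed entries, so it grows with the number of observed entries in the slice, which is the kind of per-step scaling the abstract emphasizes.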

Comments:	Published at ICDE 2021
Subjects:	Machine Learning (cs.LG); Databases (cs.DB)
Cite as:	arXiv:2102.08466 [cs.LG]
	(or arXiv:2102.08466v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2102.08466

Submission history

From: Dongjin Lee
[v1] Tue, 16 Feb 2021 22:01:25 UTC (678 KB)
[v2] Thu, 9 Jun 2022 19:01:00 UTC (678 KB)
