research-article

ASAP: prioritizing attention via time series smoothing

Published: 01 August 2017

Abstract

Time series visualization of streaming telemetry (i.e., charting of key metrics such as server load over time) is increasingly prevalent in modern data platforms and applications. However, many existing systems simply plot the raw data streams as they arrive, often obscuring large-scale trends due to small-scale noise. We propose an alternative: to better prioritize end users' attention, smooth time series visualizations as much as possible to remove noise, while retaining large-scale structure to highlight significant deviations. We develop a new analytics operator called ASAP that automatically smooths streaming time series by adaptively optimizing the trade-off between noise reduction (i.e., variance) and trend retention (i.e., kurtosis). We introduce metrics to quantitatively assess the quality of smoothed plots and provide an efficient search strategy for optimizing these metrics that combines techniques from stream processing, user interface design, and signal processing via autocorrelation-based pruning, pixel-aware preaggregation, and on-demand refresh. We demonstrate that ASAP can improve users' accuracy in identifying long-term deviations in time series by up to 38.4% while reducing response times by up to 44.3%. Moreover, ASAP delivers these results several orders of magnitude faster than alternative search strategies.
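The trade-off described in the abstract can be illustrated with a minimal sketch. It is not the ASAP operator itself: it assumes a simple moving average as the smoother, measures noise as the standard deviation of first differences (an assumed "roughness" metric), treats trend retention as keeping kurtosis at or above that of the raw series, and searches window sizes exhaustively rather than with the autocorrelation-based pruning and pixel-aware preaggregation the abstract mentions. All function names are hypothetical.

```python
import numpy as np


def moving_average(x, w):
    """Simple moving average with window w (only fully covered positions)."""
    return np.convolve(x, np.ones(w) / w, mode="valid")


def roughness(x):
    """Standard deviation of first differences; lower means smoother."""
    return float(np.std(np.diff(x)))


def kurtosis(x):
    """Non-excess kurtosis: fourth central moment over squared variance."""
    d = x - np.mean(x)
    return float(np.mean(d ** 4) / np.mean(d ** 2) ** 2)


def choose_window(x, max_window):
    """Brute-force search (an assumption, not ASAP's pruned search):
    pick the window that minimizes roughness while keeping the
    kurtosis of the smoothed series at or above the original's."""
    base_kurt = kurtosis(x)
    best_w, best_r = 1, roughness(x)
    for w in range(2, max_window + 1):
        y = moving_average(x, w)
        r = roughness(y)
        if kurtosis(y) >= base_kurt and r < best_r:
            best_w, best_r = w, r
    return best_w


if __name__ == "__main__":
    # Synthetic telemetry: a slow oscillation plus small-scale noise.
    rng = np.random.default_rng(0)
    series = np.sin(np.linspace(0, 20, 400)) + rng.normal(0, 0.5, 400)
    w = choose_window(series, max_window=40)
    print("chosen window:", w)
    print("roughness before:", roughness(series))
    print("roughness after:", roughness(moving_average(series, w)))
```

A window of 1 means no candidate satisfied the kurtosis constraint, so the series is left unsmoothed. The exhaustive loop recomputes each moving average from scratch, which is the cost the paper's search strategy is designed to avoid.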

Published In

Proceedings of the VLDB Endowment  Volume 10, Issue 11
August 2017
432 pages
ISSN:2150-8097

Publisher

VLDB Endowment
