DOI: 10.5555/1315451.1315500
Article

Adaptive, hands-off stream mining

Published: 09 September 2003

Abstract

Sensor devices and embedded processors are becoming ubiquitous. Their limited resources (CPU, memory and/or communication bandwidth, and power) pose some interesting challenges. We need "languages" that are both powerful and concise for representing the important features of the data. Such languages should (a) adapt to and handle arbitrary periodic components, including bursts, and (b) require little memory and only a single pass over the data.
We propose AWSOM (Arbitrary Window Stream mOdeling Method), which allows sensors in remote or hostile environments to efficiently and effectively discover interesting patterns and trends. This can be done automatically, i.e., with no user intervention or expert tuning before or during data gathering. Our algorithms require limited resources and can thus be incorporated into sensors, possibly alongside a distributed query processing engine [9, 5, 22]. Updates are performed in constant time, using logarithmic space. Existing state-of-the-art forecasting methods (SARIMA, GARCH, etc.) fall short on one or more of these requirements. To the best of our knowledge, AWSOM is the first method that has all the above characteristics.
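The abstract does not spell out the data structure behind "constant time, using logarithmic space". A multi-level Haar wavelet decomposition, the tool studied in several of the cited works [14, 17, 23], is one way to meet that bound. The sketch below assumes it. This is an illustration, not the authors' code, and the class `IncrementalHaar` and its `update` interface are hypothetical. Each level holds at most one smooth value that is waiting for a partner, so after N points only about log2(N) + 1 values are stored. Each new point triggers one Haar step per completed pair. That amounts to roughly one step per point on average (amortized constant time), although a single update can cascade through O(log N) levels.

```python
import math


class IncrementalHaar:
    """One-pass orthonormal Haar transform over a stream (illustrative sketch).

    pending[l] holds the single smooth coefficient at level l that is still
    waiting for a partner, so memory is O(log N) after N points. Detail
    coefficients are handed back to the caller as soon as they are complete
    and are not stored here.
    """

    def __init__(self):
        self.pending = []  # pending[l]: unpaired smooth value at level l, or None

    def update(self, x):
        """Consume one sample; return the newly completed (level, detail) pairs."""
        completed = []
        value, level = float(x), 0
        while True:
            if level == len(self.pending):
                self.pending.append(None)  # first time the stream reaches this level
            left = self.pending[level]
            if left is None:
                self.pending[level] = value  # wait for a partner at this level
                return completed
            self.pending[level] = None
            # Haar step on the pair (left, value): the detail goes out, the smooth carries up
            completed.append((level, (left - value) / math.sqrt(2)))
            value = (left + value) / math.sqrt(2)
            level += 1


if __name__ == "__main__":
    haar = IncrementalHaar()
    for t in range(8):
        for level, d in haar.update(t + 1):
            print(f"t={t} level={level} detail={d:.4f}")
    print("stored smooth values:", [v for v in haar.pending if v is not None])
```

Because the transform is orthonormal, the squared details plus the remaining smooth values preserve the energy of the input. A downstream model can therefore consume coefficients as they are emitted and discard them.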
Experiments on real and synthetic datasets demonstrate that AWSOM discovers meaningful patterns over long time periods. Because these patterns span long periods, they can also be used to make long-range forecasts, which are notoriously difficult to perform. In fact, AWSOM outperforms manually configured auto-regressive models both in long-term pattern detection and modeling and, by a factor of at least 10, in resource consumption.
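Producing forecasts from auto-regressive structure on a stream requires a model that can be refit on every arrival without revisiting old data. The abstract does not give AWSOM's model equations. The sketch below shows only a generic recursive least squares (RLS) update, the kind of recursive estimator covered in [27], applied to an AR(p) model on the raw series. It does not attempt to reproduce how AWSOM combines such models with its own representation. The names `OnlineAR`, `forget` and `delta` are hypothetical. For a fixed order p, each update costs O(p^2) time and memory, independent of how many points have been seen.

```python
import math
from collections import deque


class OnlineAR:
    """AR(p) coefficients re-estimated per sample by recursive least squares.

    Illustrative sketch: O(p^2) work and memory per update, independent of
    stream length. `forget` < 1 discounts old samples; `delta` scales the
    initial inverse-covariance matrix P (large value = weak prior on w = 0).
    """

    def __init__(self, order, forget=1.0, delta=1e4):
        self.p = order
        self.lam = forget
        self.w = [0.0] * order
        self.P = [[delta if i == j else 0.0 for j in range(order)] for i in range(order)]
        self.history = deque(maxlen=order)  # last p samples, oldest first

    def update(self, y):
        y = float(y)
        if len(self.history) == self.p:
            x = list(reversed(self.history))  # regressor: y[t-1], ..., y[t-p]
            Px = [sum(self.P[i][j] * x[j] for j in range(self.p)) for i in range(self.p)]
            gain_den = self.lam + sum(x[i] * Px[i] for i in range(self.p))
            k = [v / gain_den for v in Px]  # RLS gain vector
            err = y - sum(wi * xi for wi, xi in zip(self.w, x))  # a-priori prediction error
            self.w = [wi + ki * err for wi, ki in zip(self.w, k)]
            # P <- (P - k (P x)^T) / lambda, using the symmetry of P (x^T P = (P x)^T)
            self.P = [[(self.P[i][j] - k[i] * Px[j]) / self.lam for j in range(self.p)]
                      for i in range(self.p)]
        self.history.append(y)

    def forecast(self, steps):
        """Iterate the fitted recursion `steps` points ahead without changing state."""
        window = list(self.history)
        preds = []
        for _ in range(steps):
            x = list(reversed(window[-self.p:]))
            y_hat = sum(wi * xi for wi, xi in zip(self.w, x))
            preds.append(y_hat)
            window.append(y_hat)
        return preds


if __name__ == "__main__":
    # A pure sinusoid obeys y[t] = 2cos(w) y[t-1] - y[t-2], an exact AR(2) model.
    model = OnlineAR(order=2)
    for t in range(200):
        model.update(math.sin(0.3 * t))
    print("coefficients:", model.w, "expected:", [2 * math.cos(0.3), -1.0])
    print("next 5:", model.forecast(5))
```

Setting `forget` slightly below 1 lets the coefficients track slow drift, at the cost of noisier estimates. The order p still has to be chosen by hand here. That choice is exactly the kind of manual setup the abstract contrasts AWSOM against.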

References

[1] M. Akay, editor. Time Frequency and Wavelets in Biomedical Signal Processing. J. Wiley, 1997.
[2] A. Arasu, B. Babcock, S. Babu, J. McAlister, and J. Widom. Characterizing memory requirements for queries over continuous data streams. In Proc. PODS, 2002.
[3] J. Beran. Statistics for Long-Memory Processes. Chapman & Hall, 1994.
[4] T. Bollerslev. Generalized autoregressive conditional heteroskedasticity. J. Econometrics, 31:307-327, 1986.
[5] P. Bonnet, J. E. Gehrke, and P. Seshadri. Towards sensor database systems. In Proc. MDM, 2001.
[6] P. J. Brockwell and R. A. Davis. Time Series: Theory and Methods. Springer Series in Statistics. Springer-Verlag, 2nd edition, 1991.
[7] A. Bulut and A. K. Singh. SWAT: Hierarchical stream summarization in large networks. In Proc. 19th ICDE, 2003.
[8] L. R. Carley, G. R. Ganger, and D. Nagle. MEMS-based integrated-circuit mass-storage systems. CACM, 43(11):72-80, 2000.
[9] D. Carney, U. Cetintemel, M. Cherniack, C. Convey, S. Lee, G. Seidman, M. Stonebraker, N. Tatbul, and S. B. Zdonik. Monitoring streams: a new class of data management applications. In Proc. VLDB, 2002.
[10] Y. Chen, G. Dong, J. Han, B. W. Wah, and J. Wang. Multi-dimensional regression analysis of time-series data streams. In Proc. VLDB, 2002.
[11] M. Datar, A. Gionis, P. Indyk, and R. Motwani. Maintaining stream statistics over sliding windows. In Proc. SODA, 2002.
[12] A. Dobra, M. N. Garofalakis, J. Gehrke, and R. Rastogi. Processing complex aggregate queries over data streams. In Proc. SIGMOD, 2002.
[13] C. Faloutsos. Searching Multimedia Databases by Content. Kluwer Academic Inc., 1996.
[14] M. N. Garofalakis and P. B. Gibbons. Wavelet synopses with error guarantees. In Proc. SIGMOD, 2002.
[15] J. Gehrke, F. Korn, and D. Srivastava. On computing correlated aggregates over continual data streams. In Proc. SIGMOD, 2001.
[16] R. Gencay, F. Selcuk, and B. Whitcher. An Introduction to Wavelets and Other Filtering Methods in Finance and Economics. Academic Press, 2001.
[17] A. C. Gilbert, Y. Kotidis, S. Muthukrishnan, and M. Strauss. Surfing wavelets on streams: One-pass summaries for approximate aggregate queries. In Proc. VLDB, 2001.
[18] S. Guha and N. Koudas. Approximating a data stream for querying and estimation: Algorithms and performance evaluation. In Proc. ICDE, 2002.
[19] J. Hill, R. Szewczyk, A. Woo, S. Hollar, D. Culler, and K. Pister. System architecture directions for networked sensors. In Proc. ASPLOS-IX, 2000.
[20] P. Indyk, N. Koudas, and S. Muthukrishnan. Identifying representative trends in massive time series data sets using sketches. In Proc. VLDB, 2000.
[21] W. Leland, M. Taqqu, W. Willinger, and D. Wilson. On the self-similar nature of Ethernet traffic. IEEE/ACM Trans. on Networking, 2(1):1-15, 1994.
[22] S. R. Madden, M. A. Shah, J. M. Hellerstein, and V. Raman. Continuously adaptive continuous queries over streams. In Proc. SIGMOD, 2002.
[23] D. B. Percival and A. T. Walden. Wavelet Methods for Time Series Analysis. Cambridge Univ. Press, 2000.
[24] E. Riedel, C. Faloutsos, G. R. Ganger, and D. Nagle. Data mining on an OLTP system (nearly) for free. In Proc. SIGMOD, 2000.
[25] A. S. Weigend and N. A. Gershenfeld. Time Series Prediction: Forecasting the Future and Understanding the Past. Addison Wesley, 1994.
[26] B.-K. Yi, N. Sidiropoulos, T. Johnson, H. Jagadish, C. Faloutsos, and A. Biliris. Online data mining for co-evolving time sequences. In Proc. ICDE, 2000.
[27] P. Young. Recursive Estimation and Time-Series Analysis: An Introduction. Springer-Verlag, 1984.
[28] D. Zhang, D. Gunopulos, V. J. Tsotras, and B. Seeger. Temporal aggregation over data streams using multiple granularities. In Proc. EDBT, 2002.
[29] R. Zuidwijk and P. de Zeeuw. Fast algorithm for directional time-scale analysis using wavelets. In Proc. SPIE, Wavelet Applications in Signal and Image Processing VI, volume 3458, 1998.

Cited By

  • (2019) Classical and Contemporary Approaches to Big Time Series Forecasting. Proceedings of the 2019 International Conference on Management of Data, pp. 2042-2047. DOI: 10.1145/3299869.3314033. Online publication date: 25-Jun-2019.
  • (2019) Dynamic Modeling and Forecasting of Time-evolving Data Streams. Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 458-468. DOI: 10.1145/3292500.3330947. Online publication date: 25-Jul-2019.
  • (2018) Forecasting big time series. Proceedings of the VLDB Endowment, 11(12), pp. 2102-2105. DOI: 10.14778/3229863.3229878. Online publication date: 1-Aug-2018.



Information

Published In

VLDB '03: Proceedings of the 29th international conference on Very large data bases - Volume 29
September 2003
1134 pages

Sponsors

  • VLDB Endowment: Very Large Database Endowment

Publisher

VLDB Endowment


Qualifiers

  • Article

Conference

VLDB '03
Sponsor:
  • VLDB Endowment
VLDB '03: Very large data bases
September 9 - 12, 2003
Berlin, Germany


Article Metrics

  • Downloads (Last 12 months)2
  • Downloads (Last 6 weeks)0
Reflects downloads up to 01 Jan 2025


Citations

  • (2017) Employing traditional machine learning algorithms for big data streams analysis. Journal of Systems and Software, 127(C), pp. 249-257. DOI: 10.1016/j.jss.2016.06.016. Online publication date: 1-May-2017.
  • (2017) Ecosystem on the Web. World Wide Web, 20(3), pp. 439-465. DOI: 10.1007/s11280-016-0389-x. Online publication date: 1-May-2017.
  • (2016) Regime Shifts in Streams. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1045-1054. DOI: 10.1145/2939672.2939755. Online publication date: 13-Aug-2016.
  • (2016) Mining Big Time-series Data on the Web. Proceedings of the 25th International Conference Companion on World Wide Web, pp. 1029-1032. DOI: 10.1145/2872518.2891061. Online publication date: 11-Apr-2016.
  • (2016) Non-Linear Mining of Competing Local Activities. Proceedings of the 25th International Conference on World Wide Web, pp. 737-747. DOI: 10.1145/2872427.2883010. Online publication date: 11-Apr-2016.
  • (2015) The Web as a Jungle. Proceedings of the 24th International Conference on World Wide Web, pp. 721-731. DOI: 10.1145/2736277.2741092. Online publication date: 18-May-2015.
  • (2015) Mining and Forecasting of Big Time-series Data. Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, pp. 919-922. DOI: 10.1145/2723372.2731081. Online publication date: 27-May-2015.
