Article (DOI: 10.5555/1316689.1316719)

Memory-limited execution of windowed stream joins

Published: 31 August 2004

Abstract

We address the problem of computing approximate answers to continuous sliding-window joins over data streams when the available memory may be insufficient to keep the entire join state. One approximation scenario is to provide a maximum subset of the result, with the objective of losing as few result tuples as possible. An alternative scenario is to provide a random sample of the join result, e.g., if the output of the join is being aggregated. We show formally that neither approximation can be addressed effectively for a sliding-window join of arbitrary input streams. Previous work has addressed only the maximum-subset problem, and has implicitly used a frequency-based model of stream arrival. We address the sampling problem for this model. More importantly, we point out a broad class of applications for which an age-based model of stream arrival is more appropriate, and we address both approximation scenarios under this new model. Finally, for the case of multiple joins being executed with an overall memory constraint, we provide an algorithm for memory allocation across the joins that optimizes a combined measure of approximation in all scenarios considered. All of our algorithms have been implemented, and experimental results demonstrate their effectiveness.
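To make the setting concrete, here is a minimal Python sketch of a sliding-window equijoin whose stored state is capped, aimed at the maximum-subset scenario only. Everything in it is an assumption for illustration rather than the paper's method: the class name `MemoryLimitedWindowJoin` is hypothetical, windows are time-based with length `window`, `memory` bounds the total number of stored tuples across both streams, and eviction uses a simple frequency-based heuristic (drop the stored tuple whose join key has been seen least often in the opposite stream). The sampling scenario, the age-based model, and memory allocation across multiple joins are not shown.

```python
from collections import Counter
from typing import Dict, List, Tuple

StoredTuple = Tuple[int, int]  # (timestamp, join_key)


class MemoryLimitedWindowJoin:
    """Hypothetical sketch of a time-based sliding-window equijoin of streams
    "R" and "S" whose stored state is capped at `memory` tuples in total.
    The eviction rule is an illustrative frequency-based heuristic, not the
    algorithm from the paper."""

    def __init__(self, window: int, memory: int) -> None:
        if window <= 0 or memory < 0:
            raise ValueError("window must be positive and memory non-negative")
        self.window = window  # a tuple with timestamp t is live while now - t < window
        self.memory = memory  # assumed cap on stored tuples across both windows
        self.state: Dict[str, List[StoredTuple]] = {"R": [], "S": []}
        # Running count of join-key occurrences per stream (unbounded here for brevity).
        self.freq: Dict[str, Counter] = {"R": Counter(), "S": Counter()}

    def stored(self) -> int:
        return sum(len(tuples) for tuples in self.state.values())

    def _expire(self, now: int) -> None:
        # Drop tuples that have slid out of the window.
        for side in self.state:
            self.state[side] = [t for t in self.state[side] if now - t[0] < self.window]

    def _evict_one(self) -> None:
        # Evict the stored tuple whose key has been seen least often in the
        # opposite stream, i.e. the one expected to yield the fewest future matches.
        # Ties go to the first candidate scanned (R before S, oldest first).
        best = None
        for side, other in (("R", "S"), ("S", "R")):
            for i, (_, key) in enumerate(self.state[side]):
                score = self.freq[other][key]
                if best is None or score < best[0]:
                    best = (score, side, i)
        _, side, i = best
        del self.state[side][i]

    def insert(self, side: str, ts: int, key: int) -> List[Tuple[int, int, int]]:
        """Process one arrival and return new results as (key, r_ts, s_ts).
        Assumes arrivals come in non-decreasing timestamp order."""
        self._expire(ts)
        self.freq[side][key] += 1
        # Symmetric join: probe the opposite window before storing the new tuple.
        if side == "R":
            out = [(key, ts, s_ts) for s_ts, s_key in self.state["S"] if s_key == key]
        else:
            out = [(key, r_ts, ts) for r_ts, r_key in self.state["R"] if r_key == key]
        # Store the new tuple, then shed load until the memory cap holds again.
        self.state[side].append((ts, key))
        while self.stored() > self.memory:
            self._evict_one()
        return out
```

For example, with `window=10` and `memory=3`, feeding R(1, 5), S(2, 5), S(3, 7), R(4, 9) yields one result, `(5, 1, 2)`, and the fourth arrival forces an eviction of the just-stored R(4, 9), since key 9 has not yet appeared in S. Because each tuple probes before it is stored, an eviction never loses results already produced; it only forgoes future matches, which is exactly the loss the maximum-subset objective tries to minimize.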

Published In

VLDB '04: Proceedings of the Thirtieth International Conference on Very Large Data Bases - Volume 30
August 2004
1380 pages

Sponsors

  • VLDB Endowment: Very Large Database Endowment

Publisher

VLDB Endowment
