Continuous queries over data streams

Authors:

Jennifer Widom

ACM SIGMOD Record, Volume 30, Issue 3

Pages 109 - 120

https://doi.org/10.1145/603867.603884

Published: 01 September 2001

Abstract

In many recent applications, data may take the form of continuous data streams, rather than finite stored data sets. Several aspects of data management need to be reconsidered in the presence of data streams, offering a new research direction for the database community. In this paper we focus primarily on the problem of query processing, specifically on how to define and evaluate continuous queries over data streams. We address semantic issues as well as efficiency concerns. Our main contributions are threefold. First, we specify a general and flexible architecture for query processing in the presence of data streams. Second, we use our basic architecture as a tool to clarify alternative semantics and processing techniques for continuous queries. The architecture also captures most previous work on continuous queries and data streams, as well as related concepts such as triggers and materialized views. Finally, we map out research topics in the area of query processing over data streams, showing where previous work is relevant and describing problems yet to be addressed.

Index Terms

Continuous queries over data streams
1. Information systems
  1. Data management systems
    1. Database management system engines
      1. Database query processing
2. Theory of computation
  1. Theory and algorithms for application domains
    1. Database theory
      1. Database query processing and optimization (theory)

Published In

ACM SIGMOD Record Volume 30, Issue 3

September 2001

97 pages

ISSN: 0163-5808

DOI: 10.1145/603867

Copyright © 2001 Authors.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Bibliometrics

Total Citations: 504
Total Downloads: 3,449
Downloads (Last 12 months): 87
Downloads (Last 6 weeks): 11

Reflects downloads up to 16 Oct 2024
