Approximate join processing over data streams

DOI: 10.1145/872757.872765

Published: 09 June 2003

Abstract

We consider the problem of approximating sliding window joins over data streams in a data stream processing system with limited resources. In our model, we deal with resource constraints by shedding load in the form of dropping tuples from the data streams. We first discuss alternate architectural models for data stream join processing, and we survey suitable measures for the quality of an approximation of a set-valued query result. We then consider the number of generated result tuples as the quality measure, and we give optimal offline and fast online algorithms for it. In a thorough experimental study with synthetic and real data we show the efficacy of our solutions. For applications with demand for exact results we introduce a new Archive-metric which captures the amount of work needed to complete the join in case the streams are archived for later processing.
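The setting described in the abstract can be illustrated with a minimal sketch: a symmetric sliding-window equi-join in which each stream may keep only a bounded number of tuples in memory, and the quality of the approximation is the number of result tuples produced. The function name `window_join_count`, the one-tuple-per-stream-per-step arrival model, and the random eviction policy are assumptions made for illustration only; this is not the optimal offline algorithm or the online heuristics from the paper.

```python
import random


def window_join_count(r_keys, s_keys, window, capacity=None, seed=0):
    """Count the results of a sliding-window equi-join over two streams.

    At step t the tuples r_keys[t] and s_keys[t] arrive. A stored tuple
    stays joinable for `window` steps. If `capacity` is given, each stream
    keeps at most `capacity` tuples and sheds load by evicting a random
    stored tuple when full (an illustrative policy, not the paper's).
    """
    if capacity is not None and capacity < 1:
        raise ValueError("capacity must be at least 1")
    rng = random.Random(seed)
    mem_r, mem_s = [], []  # stored (arrival_time, key) pairs per stream
    results = 0
    for t, (r, s) in enumerate(zip(r_keys, s_keys)):
        # Expire tuples that have fallen out of the window.
        mem_r = [(a, k) for a, k in mem_r if t - a < window]
        mem_s = [(a, k) for a, k in mem_s if t - a < window]
        # Probe: each new tuple joins with stored tuples of the other stream.
        results += sum(1 for _, k in mem_s if k == r)
        results += sum(1 for _, k in mem_r if k == s)
        # The two tuples arriving at the same step may also join each other.
        results += int(r == s)
        # Insert the new tuples, dropping a stored tuple if memory is full.
        for mem, key in ((mem_r, r), (mem_s, s)):
            if capacity is not None and len(mem) >= capacity:
                mem.pop(rng.randrange(len(mem)))
            mem.append((t, key))
    return results


if __name__ == "__main__":
    rng = random.Random(1)
    r = [rng.randrange(10) for _ in range(1000)]
    s = [rng.randrange(10) for _ in range(1000)]
    exact = window_join_count(r, s, window=50)
    approx = window_join_count(r, s, window=50, capacity=10)
    print("exact result size:", exact, "approximate result size:", approx)
```

Because dropping tuples can only remove join partners, the approximate result is a subset of the exact one, so its size never exceeds the exact size. A smarter eviction policy would aim to close that gap.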

Published In

SIGMOD '03: Proceedings of the 2003 ACM SIGMOD international conference on Management of data
June 2003
702 pages
ISBN: 158113634X
DOI: 10.1145/872757
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

SIGMOD/PODS03
Acceptance Rates

SIGMOD '03 Paper Acceptance Rate 53 of 342 submissions, 15%;
Overall Acceptance Rate 785 of 4,003 submissions, 20%

Cited By

  • (2023) OneSketch: A Generic and Accurate Sketch for Data Streams. IEEE Transactions on Knowledge and Data Engineering 35:12 (12887-12901). DOI: 10.1109/TKDE.2023.3278028. Online publication date: 1-Dec-2023
  • (2023) A survey on the evolution of stream processing systems. The VLDB Journal 33:2 (507-541). DOI: 10.1007/s00778-023-00819-8. Online publication date: 22-Nov-2023
  • (2022) Scaling Equi-Joins. Proceedings of the 2022 International Conference on Management of Data (2163-2176). DOI: 10.1145/3514221.3526042. Online publication date: 10-Jun-2022
  • (2021) Data Streaming Processing Window Joined With Graphics Processing Units (GPUs). Encyclopedia of Information Science and Technology, Fifth Edition (602-623). DOI: 10.4018/978-1-7998-3479-3.ch043. Online publication date: 2021
  • (2021) Optimization of threshold functions over streams. Proceedings of the VLDB Endowment 14:6 (878-889). DOI: 10.14778/3447689.3447693. Online publication date: 1-Feb-2021
  • (2021) Online Topic-Aware Entity Resolution Over Incomplete Data Streams. Proceedings of the 2021 International Conference on Management of Data (1478-1490). DOI: 10.1145/3448016.3457238. Online publication date: 9-Jun-2021
  • (2020) Efficient Join Synopsis Maintenance for Data Warehouse. Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data (2027-2042). DOI: 10.1145/3318464.3389717. Online publication date: 11-Jun-2020
  • (2019) Edgewise. Proceedings of the 2019 USENIX Conference on Usenix Annual Technical Conference (929-945). DOI: 10.5555/3358807.3358887. Online publication date: 10-Jul-2019
  • (2019) Skyline queries over incomplete data streams. The VLDB Journal 28:6 (961-985). DOI: 10.1007/s00778-019-00577-6. Online publication date: 17-Oct-2019
  • (2018) Providing streaming joins as a service at Facebook. Proceedings of the VLDB Endowment 11:12 (1809-1821). DOI: 10.14778/3229863.3229869. Online publication date: 1-Aug-2018