Abstract
Effective Big Data analytics pose several difficult challenges for modern data management architectures. One key challenge arises from the streaming nature of big data, which calls for efficient algorithms for querying and analyzing massive, continuous data streams (that is, data that is seen only once and in a fixed order) with limited memory and CPU-time resources. Such streams arise naturally in emerging large-scale event monitoring applications; for instance, network-operations monitoring in large ISPs, where usage information from numerous sites needs to be continuously collected and analyzed for interesting trends. In addition to memory- and time-efficiency concerns, the inherently distributed nature of such applications also raises important communication-efficiency issues, making it critical to carefully optimize the use of the underlying network infrastructure. In this talk, we introduce the distributed data streaming model and discuss recent work on tracking complex queries over massive distributed streams, as well as new research directions in this space.
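To make the communication-efficiency concern concrete, the following is a minimal sketch of one simple way a coordinator might track a global count over several remote streams. It is an illustration only, not the method presented in the talk. The names `Site`, `Coordinator`, `slack`, and `run_demo` are hypothetical. Each site sees every item once, keeps constant state, and contacts the coordinator only when its local count has drifted by `slack` since its last report.

```python
from dataclasses import dataclass, field


@dataclass
class Coordinator:
    """Hypothetical coordinator: stores the last count reported by each site."""
    reported: dict = field(default_factory=dict)
    messages: int = 0

    def receive(self, site_id, count):
        self.reported[site_id] = count
        self.messages += 1

    def estimate(self):
        # Approximate global count, built only from the reports received so far.
        return sum(self.reported.values())


@dataclass
class Site:
    """Hypothetical remote site: one pass over its stream, constant memory."""
    site_id: str
    coordinator: Coordinator
    slack: int
    count: int = 0
    last_sent: int = 0

    def observe(self, _item):
        self.count += 1
        # Report only when the local count has drifted by `slack` (assumed policy).
        if self.count - self.last_sent >= self.slack:
            self.coordinator.receive(self.site_id, self.count)
            self.last_sent = self.count


def run_demo(num_sites=3, items_per_site=1020, slack=50):
    coord = Coordinator()
    sites = [Site(f"site{i}", coord, slack) for i in range(num_sites)]
    for _ in range(items_per_site):
        for site in sites:
            site.observe(None)  # item contents are irrelevant for counting
    true_total = sum(site.count for site in sites)
    return true_total, coord.estimate(), coord.messages


if __name__ == "__main__":
    print(run_demo())  # (3060, 3000, 60)
```

In this sketch the coordinator's estimate never exceeds the true total and lags it by less than `num_sites * slack`, while the number of messages drops from one per item to roughly one per `slack` items. A larger `slack` trades accuracy for less network traffic.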
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Garofalakis, M. (2014). Querying Distributed Data Streams. In: Manolopoulos, Y., Trajcevski, G., Kon-Popovska, M. (eds) Advances in Databases and Information Systems. ADBIS 2014. Lecture Notes in Computer Science, vol 8716. Springer, Cham. https://doi.org/10.1007/978-3-319-10933-6_1
DOI: https://doi.org/10.1007/978-3-319-10933-6_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10932-9
Online ISBN: 978-3-319-10933-6
eBook Packages: Computer Science (R0)