DOI: 10.1145/1989284.1989314
research-article

Theory of data stream computing: where to go

Published: 13 June 2011

Abstract

Computing power has been growing steadily, as have communication rates and memory sizes. Simultaneously, our ability to create data has been growing phenomenally, and with it the need to analyze that data. We now have examples of massive data streams that are created at a far higher rate than we can capture and store in memory economically, that are gathered in far greater quantity than can be transported to central databases without overwhelming the communication infrastructure, and that arrive far faster than we can compute with them in a sophisticated way.
This phenomenon has challenged how we store, communicate and compute with data. Theories developed over the past 50 years have relied on full capture, storage and communication of data. Instead, what we need for managing modern massive data streams are new methods built around working with less. The past 10 years have seen new theories emerge in computing (data stream algorithms), communication (compressed sensing), databases (data stream management systems) and other areas to address the challenges of massive data streams. Still, much remains open, and new applications of massive data streams have emerged recently. We present an overview of these challenges.

Supplementary Material

WMV File (1989314.wmv)


    Published In

    PODS '11: Proceedings of the thirtieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
    June 2011
    332 pages
    ISBN:9781450306607
    DOI:10.1145/1989284

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tag

    1. data streams

    Conference

    SIGMOD/PODS '11

    Acceptance Rates

    PODS '11 Paper Acceptance Rate 25 of 113 submissions, 22%;
    Overall Acceptance Rate 642 of 2,707 submissions, 24%

    Cited By

    • (2023) Lazy regular sensing. Theoretical Computer Science 971:C. DOI: 10.1016/j.tcs.2023.114057. Online publication date: 6-Sep-2023
    • (2022) Lazy Regular Sensing. Descriptional Complexity of Formal Systems (155-169). DOI: 10.1007/978-3-031-13257-5_12. Online publication date: 29-Aug-2022
    • (2020) Pushing the Scalability of RDF Engines on IoT Edge Devices. Sensors 20:10 (2788). DOI: 10.3390/s20102788. Online publication date: 14-May-2020
    • (2017) Sensing as a Complexity Measure. Descriptional Complexity of Formal Systems (3-15). DOI: 10.1007/978-3-319-60252-3_1. Online publication date: 3-Jun-2017
    • (2014) A High-Throughput and Low-Latency Parallelization of Window-Based Stream Joins on Multicores. Proceedings of the 2014 IEEE International Symposium on Parallel and Distributed Processing with Applications (117-126). DOI: 10.1109/ISPA.2014.24. Online publication date: 26-Aug-2014
    • (2013) Computing (and Life) Is All about Tradeoffs. Space-Efficient Data Structures, Streams, and Algorithms (112-132). DOI: 10.1007/978-3-642-40273-9_9. Online publication date: 2013
