Data streams: algorithms and applications

Published: 01 August 2005

Abstract

In the data stream scenario, input arrives very rapidly and there is limited memory to store it. Algorithms have to work with one or a few passes over the data, space less than linear in the input size, or time significantly less than the input size. In the past few years, a new theory has emerged for reasoning about algorithms that work within these constraints on space, time, and number of passes. Some of the methods rely on metric embeddings, pseudo-random computations, sparse approximation theory, and communication complexity. The applications for this scenario include IP network traffic analysis, mining text message streams, and processing massive data sets in general. Researchers in Theoretical Computer Science, Databases, IP Networking, and Computer Systems are working on the data stream challenges. This article is an overview and survey of data stream algorithmics and is an updated version of [1].
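
To make the constraints above concrete, here is a minimal Python sketch of one classic one-pass, small-space method: a counter-based frequent-items summary in the style of Misra and Gries [40]. The choice of algorithm, the function name `misra_gries`, and the parameter `k` are illustrative assumptions, not something the abstract prescribes.

```python
from typing import Dict, Hashable, Iterable


def misra_gries(stream: Iterable[Hashable], k: int) -> Dict[Hashable, int]:
    """One-pass frequent-items summary using at most k - 1 counters.

    Hypothetical helper for illustration: every item that occurs more
    than n/k times in a stream of n items is guaranteed to remain in
    the returned dictionary. Stored counts are lower bounds on the
    true frequencies.
    """
    if k < 2:
        raise ValueError("k must be at least 2")

    counters: Dict[Hashable, int] = {}
    for item in stream:  # single pass; items are never stored or revisited
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement every counter and drop those
            # that reach zero. The new item itself is discarded.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


if __name__ == "__main__":
    # "a" occurs 4 times out of 7, so it must survive with k = 2.
    print(misra_gries(iter(["a", "b", "a", "c", "a", "d", "a"]), k=2))
```

For a stream of n items the summary never holds more than k - 1 counters, so its size is independent of n. Each stored count underestimates the true frequency by at most n/k; a second pass over the data, when one is available, can replace the estimates with exact counts.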

References

[1] S. Muthukrishnan, "Data streams: Algorithms and applications," http://www.cs.rutgers.edu/~muthu/stream-1-1.ps.
[2] J. Spencer and P. Winkler, "Three thresholds for a liar," Combinatorics, Probability and Computing, vol. 1, no. 1, pp. 81-93, 1992.
[3] Y. Minsky, A. Trachtenberg, and R. Zippel, "Set reconciliation with nearly optimal communication complexity," Technical Report 2000-1796, Cornell Univ.
[4] J. von zur Gathen and J. Gerhard, Modern Computer Algebra, Cambridge University Press, 1999.
[5] A. Broder, M. Charikar, A. Frieze, and M. Mitzenmacher, "Min-wise independent permutations," in Proc. ACM STOC, 1998, pp. 327-336.
[6] P. Indyk, "A small approximately min-wise independent family of hash functions," Journal of Algorithms, vol. 38, no. 1, pp. 84-90, 2001.
[7] I. Good, "The population frequencies of species and the estimation of population parameters," Biometrika, vol. 40, no. 16, pp. 237-264, 1953.
[8] A. Orlitsky, N. Santhanam, and J. Zhang, "Always Good Turing: Asymptotically optimal probability estimation," in Proc. IEEE FOCS, 2003, pp. 179-188.
[9] M. Datar and S. Muthukrishnan, "Estimating rarity and similarity in window streams," in Proc. ESA, 2002, pp. 323-334.
[10] J. Tarui, "Finding duplicates in passes," Personal communication and http://weblog.fortnow.com/2005/03/finding-duplicates.html#comments.
[11] Lance Fortnow, Blog on 03/2005.
[12] M. Hansen, "Slogging," Keynote plenary talk at SIAM Conf. Data Mining, 2005.
[13] J. Vitter, "External memory algorithms and data structures: Dealing with massive data," ACM Computing Surveys, vol. 33, no. 2, pp. 209-271, 2001.
[14] J. Gray and T. Hey, "In search of petabyte databases," http://www.research.microsoft.com/~Gray/talks/.
[15] J. Gray, P. Sundaresan, S. Eggert, K. Baclawski, and P. Weinberger, "Quickly generating billion-record synthetic databases," in Proc. ACM SIGMOD, 1994, pp. 243-252.
[16] http://www.tpc.org/. Details of transactions testing at http://www.tpc.org/tpcc/detail.asp.
[17] C. Cortes, D. Pregibon, and C. Volinsky, "Communities of interest," in Proc. of Intelligent Data Analysis, 2001, pp. 105-114.
[18] A. Gilbert, Y. Kotidis, S. Muthukrishnan, and M. Strauss, "Surfing wavelets on streams: One pass summaries for approximate aggregate queries," VLDB Journal, pp. 79-88, 2001.
[19] M. Henzinger, P. Raghavan, and S. Rajagopalan, "Computing on data streams," Technical Note 1998-011, May 1998, Digital Systems Research Center, Palo Alto.
[20] C. Estan and G. Varghese, "New directions in traffic measurement and accounting: Focusing on the elephants, ignoring the mice," ACM Transactions on Computer Systems, vol. 21, no. 3, pp. 270-313, 2003.
[21] A. Gilbert, Y. Kotidis, S. Muthukrishnan, and M. Strauss, "Quicksand: Quick summary and analysis of network data," DIMACS Technical Report, 2001-43.
[22] K. Levchenko, R. Paturi, and G. Varghese, "On the difficulty of scalably detecting network attacks," ACM Conference on Computer and Communications Security, pp. 12-20, 2004.
[23] T. Johnson, S. Muthukrishnan, O. Spatscheck, and D. Srivastava, "Streams, security and scalability," Keynote talk, appears in Proc. of 19th Annual IFIP Conference on Data and Applications Security, Lecture Notes in Computer Science 3654, 2005, Springer-Verlag, pp. 1-15.
[24] M. Balazinska, H. Balakrishnan, and M. Stonebraker, "Load management and high availability in the Medusa distributed stream processing system," Proc. ACM SIGMOD, pp. 929-930, 2004.
[25] P. Juang, H. Oki, Y. Wang, M. Martonosi, L. Peh, and D. Rubenstein, "Energy-efficient computing for wildlife tracking: Design tradeoffs and early experiences with ZebraNet," ASPLOS-X Conference, pp. 96-107, 2002.
[26] A. Mainwaring, J. Polastre, R. Szewczyk, D. Culler, and J. Anderson, "Wireless sensor networks for habitat monitoring," in Proc. WSNA, 2002, pp. 88-97.
[27] S. Chen, A. Gaur, S. Muthukrishnan, and D. Rosenbluth, "Wireless in loco sensor data collection and applications," Workshop on Mobile Data Access (MOBEA) II, Held with WWW Conf, 2004.
[28] M. Greenwald and S. Khanna, "Space-efficient online computation of quantile summaries," in Proc. ACM SIGMOD, 2001.
[29] G. Manku and R. Motwani, "Approximate frequency counts over data streams," Proc. VLDB, pp. 346-357, 2002.
[30] G. Cormode, S. Muthukrishnan, and I. Rozenbaum, "Summarizing and mining inverse distributions on data streams via dynamic inverse sampling," in Proc. VLDB, 2005, pp. 25-36.
[31] N. Alon, N. Duffield, C. Lund, and M. Thorup, "Estimating sums of arbitrary selections with few probes," Proc. ACM PODS, 2005.
[32] T. Johnson, S. Muthukrishnan, and I. Rozenbaum, "Sampling algorithms in a stream operator," in Proc. ACM SIGMOD, 2005, pp. 1-12.
[33] I. Pohl, "A minimum storage algorithm for computing the median," IBM TR 12713, 1969.
[34] I. Munro and M. Paterson, "Selection and sorting with limited storage," in Proc. IEEE FOCS, 1978, pp. 253-258. Also, Theoretical Computer Science, vol. 12, pp. 315-323, 1980.
[35] G. Manku, S. Rajagopalan, and B. Lindsay, "Random sampling techniques for space efficient online computation of order statistics of large datasets," in Proc. ACM SIGMOD, 1999, pp. 251-262.
[36] N. Shrivastava, C. Buragohain, D. Agrawal, and S. Suri, "Medians and beyond: New aggregation techniques for sensor networks," in Proc. ACM SenSys, 2004.
[37] N. Alon, Y. Matias, and M. Szegedy, "The space complexity of approximating the frequency moments," Proc. ACM STOC, pp. 20-29, 1996.
[38] E. Kushilevitz and N. Nisan, Communication Complexity, Cambridge University Press, 1997.
[39] M. Fischer and S. Salzberg, "Finding a majority among n votes: Solution to problem 81-5," J. Algorithms, vol. 3, pp. 376-379, 1982.
[40] J. Misra and D. Gries, "Finding repeated elements," Science of Computer Programming, pp. 143-152, 1982.
[41] R. Karp, C. Papadimitriou, and S. Shenker, "A simple algorithm for finding frequent elements in sets and bags," ACM Transactions on Database Systems, pp. 51-55, 2003.
[42] L. Golab, D. DeHaan, E. Demaine, A. Lopez-Ortiz, and I. Munro, "Identifying frequent items in sliding windows over on-line packet streams," Internet Measurement Conference, pp. 173-178, 2003.
[43] A. Metwally, D. Agrawal, and A. El Abbadi, "Efficient computation of frequent and top-k elements in data streams," in Proc. ICDT, 2005, pp. 398-412.
[44] G. Cormode, F. Korn, S. Muthukrishnan, and D. Srivastava, "Finding hierarchical heavy hitters in data streams," in Proc. VLDB, 2003, pp. 464-475.
[45] C. Estan, S. Savage, and G. Varghese, "Automatically inferring patterns of resource consumption in network traffic," in Proc. SIGCOMM, 2003, pp. 137-148.
[46] G. Cormode, F. Korn, S. Muthukrishnan, and D. Srivastava, "Diamond in the rough: Finding hierarchical heavy hitters in multi-dimensional data," in Proc. ACM SIGMOD, 2004, pp. 155-166.
[47] J. Hershberger, N. Shrivastava, S. Suri, and C. Toth, "Space complexity of hierarchical heavy hitters in multi-dimensional data streams," in Proc. ACM PODS, 2005.
[48] Z. Bar-Yossef, T. Jayram, R. Kumar, D. Sivakumar, and L. Trevisan, "Counting distinct elements in a data stream," in Proc. RANDOM, 2000, pp. 1-10.
[49] G. Frahling, P. Indyk, and C. Sohler, "Sampling in dynamic data streams and applications," in Proc. ACM SoCG, 2005, pp. 142-149.
[50] D. Geiger, V. Karamcheti, Z. Kedem, and S. Muthukrishnan, "Detecting malicious network traffic using inverse distributions of packet contents," in Proc. MineNet, 2005, Held with Proc. ACM SIGCOMM, 2005.
[51] N. Duffield, C. Lund, and M. Thorup, "Flow sampling under hard resource constraints," Sigmetrics, pp. 85-96, 2004.
[52] M. Szegedy, "Near optimality of the priority sampling procedure," ECCC TR05-001, 2005.
[53] S. Chaudhuri, R. Motwani, and V. Narasayya, "Random sampling for histogram construction: How much is enough?," in Proc. SIGMOD, 1998, pp. 436-447.
[54] F. Korn, J. Gehrke, and D. Srivastava, "On computing correlated aggregates over continual data streams," in Proc. ACM SIGMOD, 2001, pp. 13-24.
[55] D. Coppersmith and R. Kumar, "An improved data stream algorithm for frequency moments," in Proc. ACM-SIAM SODA, 2004, pp. 151-156.
[56] P. Indyk and D. Woodruff, "Tight lower bounds for the distinct elements problem," in Proc. IEEE FOCS, 2003.
[57] M. Saks and X. Sun, "Space lower bounds for distance approximation in the data stream model," in Proc. ACM STOC, 2002, pp. 360-369.
[58] Z. Bar-Yossef, T. Jayram, R. Kumar, and D. Sivakumar, "Information statistics approach to data stream and communication complexity," Proc. IEEE FOCS, pp. 209-218, 2002.
[59] A. Chakrabarti, S. Khot, and X. Sun, "Near-optimal lower bounds on the multi-party communication complexity of set disjointness," IEEE Conference on Computational Complexity, pp. 107-117, 2003.
[60] P. Indyk, "Stable distributions, pseudorandom generators, embeddings and data stream computation," in Proc. IEEE FOCS, 2000, pp. 189-197.
[61] J. Chambers, C. Mallows, and B. Stuck, "A method for simulating stable random variables," Journal of the American Statistical Association, vol. 71, no. 354, pp. 340-344, 1976.
[62] G. Cormode, "Stable distributions for stream computations: It's as easy as 0,1,2," Workshop on Management and Processing of Massive Data Streams (MPDS) at FCRC, 2003.
[63] B. Krishnamurthy, S. Sen, Y. Zhang, and Y. Chen, "Sketch-based change detection: Methods, evaluation, and applications," in Proc. ACM SIGCOMM Internet Measurement Conference, 2003, pp. 234-247.
[64] N. Alon, P. Gibbons, Y. Matias, and M. Szegedy, "Tracking join and self-join sizes in limited storage," Proc. ACM PODS, pp. 10-20, 1999.
[65] G. Cormode and S. Muthukrishnan, "What is hot and what is not: Tracking most frequent items dynamically," in Proc. ACM PODS, 2003, pp. 296-306.
[66] A. Gilbert, Y. Kotidis, S. Muthukrishnan, and M. Strauss, "How to summarize the universe: Dynamic maintenance of quantiles," in Proc. VLDB, 2002, pp. 454-465.
[67] A. Gilbert, S. Guha, Y. Kotidis, P. Indyk, S. Muthukrishnan, and M. Strauss, "Fast, small space algorithm for approximate histogram maintenance," in Proc. ACM STOC, 2002, pp. 389-398.
[68] G. Cormode, M. Datar, P. Indyk, and S. Muthukrishnan, "Comparing data streams using Hamming norms (how to zero in)," in Proc. VLDB, 2002, pp. 335-345.
[69] M. Charikar, K. Chen, and M. Farach-Colton, "Finding frequent items in data streams," in Proc. ICALP, 2002, pp. 693-703.
[70] G. Cormode and S. Muthukrishnan, "An improved data stream summary: The count-min sketch and its applications," J. Algorithms, vol. 55, no. 1, pp. 58-75, April 2005.
[71] G. Cormode and S. Muthukrishnan, "Summarizing and mining skewed data streams," in Proc. SIAM SDM, 2005.
[72] G. Cormode, T. Johnson, F. Korn, S. Muthukrishnan, O. Spatscheck, and D. Srivastava, "Holistic UDAFs at streaming speeds," in Proc. ACM SIGMOD, 2004, pp. 35-46.
[73] G. Zipf, Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology, Addison Wesley, 1949.
[74] L. Adamic, "Zipf, power-law, Pareto - a ranking tutorial," 2000, http://www.hpl.hp.com/research/idl/papers/ranking/.
[75] P. Flajolet and G. Martin, "Probabilistic counting," in Proc. FOCS, 1983, pp. 76-82.
[76] P. Gibbons and S. Tirthapura, "Estimating simple functions on the union of data streams," in Proc. ACM SPAA, 2001, pp. 281-291.
[77] G. Cormode and S. Muthukrishnan, "Estimating dominance norms on multiple data streams," in Proc. ESA, 2003, pp. 148-160.
[78] D. Woodruff, "Optimal space lower bounds for all frequency moments," in Proc. ACM-SIAM SODA, 2004, pp. 167-175.
[79] S. Chien, L. Rasmussen, and A. Sinclair, "Clifford algebras and approximating the permanent," in Proc. ACM STOC, 2002, pp. 222-231.
[80] J. Feigenbaum, S. Kannan, M. Strauss, and M. Viswanathan, "An approximate L1 difference algorithm for massive data streams," in Proc. IEEE FOCS, 1999, pp. 501-511.
[81] A. R. Calderbank, A. Gilbert, K. Levchenko, S. Muthukrishnan, and M. Strauss, "Improved range-summable random variable construction algorithms," in Proc. ACM-SIAM SODA, 2005, pp. 840-849.
[82] K. Levchenko and Y. Liu, "Counting solutions of polynomial equations," Manuscript, 2005.
[83] D. Du and F. Hwang, Combinatorial Group Testing and Its Applications, World Scientific Singapore, 2nd edition, 2000.
[84] G. Cormode and S. Muthukrishnan, "What is new: Finding significant differences in network data streams," in Proc. INFOCOM, 2004.
[85] A. Gilbert, S. Muthukrishnan, and M. Strauss, "Improved time bounds for near-optimal sparse Fourier representations," in SPIE Conf. Wavelets, 2005. See also: A. Gilbert, S. Guha, P. Indyk, S. Muthukrishnan, and M. Strauss, "Near-optimal sparse Fourier estimation via sampling," Proc. ACM STOC, pp. 152-161, 2002.
[86] L. Gasieniec and S. Muthukrishnan, "Deterministic algorithms for estimating heavy-hitters on turnstile data streams," Manuscript, 2005.
[87] S. Guha, N. Koudas, and K. Shim, "Data streams and histograms," in Proc. ACM STOC, 2001, pp. 471-475.
[88] S. Muthukrishnan, R. Shah, and J. Vitter, "Finding deviants on data streams," in Proc. SSDBM, 2004, pp. 41-50.
[89] G. Cormode and S. Muthukrishnan, "The string edit distance matching problem with moves," in Proc. ACM-SIAM SODA, 2002, pp. 667-676.
[90] A. Haar, "Zur Theorie der orthogonalen Funktionensysteme," Math. Ann., vol. 69, pp. 331-371, 1910.
[91] M. Parseval, http://encyclopedia.thefreedictionary.com/Parseval's+theorem, 1799.
[92] K. Beauchamp, "Walsh functions and their applications," 1975.
[93] D. Hirschberg, "A linear space algorithm for computing maximal common subsequences," Comm. ACM, vol. 18, no. 6, pp. 341-343, 1975.
[94] S. Guha, "Space efficiency in synopsis construction algorithms," in Proc. VLDB, 2005.
[95] S. Guha, P. Indyk, S. Muthukrishnan, and M. Strauss, "Histogramming data streams with fast per-item processing," in Proc. ICALP, 2002, pp. 681-692.
[96] S. Guha, N. Koudas, and K. Shim, "Approximation and streaming algorithms for histogram construction problems," Journal version.
[97] S. Muthukrishnan and M. Strauss, "Approximate histogram and wavelet summaries of streaming data," DIMACS TR 2004-52, Survey.
[98] S. Şahinalp and U. Vishkin, "Data compression using locally consistent parsing," Technical Report, University of Maryland Department of Computer Science, 1995.
[99] S. Şahinalp and U. Vishkin, "Symmetry breaking for suffix tree construction," in Proc. of 26th Symposium on Theory of Computing, 1994, pp. 300-309.
[100] S. Şahinalp and U. Vishkin, "Efficient approximate and dynamic matching of patterns using a labeling paradigm," in Proc. IEEE FOCS, 1996, pp. 320-328.
[101] G. Cormode, M. Paterson, S. Şahinalp, and U. Vishkin, "Communication complexity of document exchange," in Proc. ACM-SIAM SODA, 2000, pp. 197-206.
[102] S. Muthukrishnan and S. Şahinalp, "Approximate nearest neighbors and sequence comparison with block operations," in Proc. STOC, 2000, pp. 416-424.
[103] R. Cole and U. Vishkin, "Deterministic coin tossing and accelerating cascades: micro and macro techniques for designing parallel algorithms," in Proc. ACM STOC, 1986, pp. 206-219.
[104] M. Charikar, C. Chekuri, T. Feder, and R. Motwani, "Incremental clustering and dynamic information retrieval," in Proc. ACM STOC, 1997, pp. 626-635.
[105] S. Guha, N. Mishra, R. Motwani, and L. O'Callaghan, "Clustering data streams," in Proc. IEEE FOCS, 2000, pp. 359-366.
[106] M. Charikar, L. O'Callaghan, and R. Panigrahy, "Better streaming algorithms for clustering problems," in Proc. ACM STOC, 2003, pp. 693-703.
[107] P. Indyk, "Streaming algorithms for geometric problems," Invited talk at CCCG'04.
[108] T. Chan, "Data stream algorithms in computational geometry," Workshop on New Horizons in Computing (Kyoto), 2005.
[109] F. Korn, S. Muthukrishnan, and D. Srivastava, "Reverse nearest neighbor aggregates over data streams," in Proc. VLDB, 2002, pp. 814-825.
[110] M. Datar, A. Gionis, P. Indyk, and R. Motwani, "Maintaining stream statistics over sliding windows," in Proc. ACM-SIAM SODA, 2002, pp. 635-644.
[111] P. Indyk, "Algorithms for dynamic geometric problems over data streams," in Proc. ACM STOC, 2004, pp. 373-380.
[112] G. Frahling and C. Sohler, "Coresets in dynamic geometric data streams," in Proc. ACM STOC, 2005, pp. 209-217.
[113] K. Mulmuley, Computational Geometry: An Introduction through Randomized Algorithms, Prentice Hall, 1993.
[114] R. Motwani and P. Raghavan, Randomized Algorithms, Cambridge University Press, 1995.
[115] R. Seidel and C. Aragon, "Randomized search trees," Algorithmica, vol. 16, pp. 464-497, 1996.
[116] G. Cormode and S. Muthukrishnan, "Space efficient mining of multigraph streams," in Proc. ACM PODS, 2005.
[117] E. Cohen and H. Kaplan, "Spatially-decaying aggregation over a network: Model and algorithms," in Proc. ACM SIGMOD, 2004, pp. 707-718.
[118] S. Muthukrishnan and M. Strauss, "Rangesum histograms," ACM-SIAM SODA, pp. 233-242, 2003.
[119] S. Muthukrishnan and M. Strauss, "Maintenance of multidimensional histograms," in Proc. FSTTCS, 2003, pp. 352-362.
[120] S. Muthukrishnan, M. Strauss, and X. Zheng, "Workload-optimal histograms on streams," in Proc. ESA, 2005.
[121] P. Indyk and D. Woodruff, "Optimal approximations of the frequency moments of data streams," in Proc. ACM STOC, 2005, pp. 202-208.
[122] G. Varghese, "Detecting packet patterns at high speeds," Tutorial at ACM SIGCOMM, 2002.
[123] C. Cortes, K. Fisher, D. Pregibon, and A. Rogers, "Hancock: A language for extracting signatures from data streams," in Proc. KDD, 2000, pp. 9-17.
[124] J. Chen, D. DeWitt, F. Tian, and Y. Wang, "NiagaraCQ: A scalable continuous query system for internet databases," in Proc. ACM SIGMOD, 2000, pp. 379-390.
[125] D. Abadi, D. Carney, U. Cetintemel, M. Cherniack, C. Convey, S. Lee, M. Stonebraker, N. Tatbul, and S. Zdonik, "Aurora: A new model and architecture for data stream management," VLDB Journal, vol. 12, no. 2, pp. 120-139, 2003. See also: "Aurora: A data stream management system," Proc. ACM SIGMOD 2003, Demo.
[126] S. Krishnamurthy, S. Chandrasekaran, O. Cooper, A. Deshpande, M. Franklin, J. Hellerstein, W. Hong, S. Madden, F. Reiss, and M. Shah, "TelegraphCQ: An architectural status report," IEEE Data Engineering Bulletin, vol. 26, no. 1, pp. 11-18, 2003.
[127] A. Arasu, B. Babcock, S. Babu, M. Datar, K. Ito, I. Nishizawa, J. Rosenstein, and J. Widom, "STREAM: The Stanford stream data manager," in Proc. ACM SIGMOD, 2003, Demo.
[128] C. Cranor, T. Johnson, V. Shkapenyuk, and O. Spatscheck, "The Gigascope stream database," IEEE Data Engineering Bulletin, vol. 26, no. 1, pp. 27-32, 2003. See also: C. Cranor, Y. Gao, T. Johnson, V. Shkapenyuk, and O. Spatscheck, "Gigascope: High performance network monitoring with an SQL interface," ACM SIGMOD 2002, Demo.
[129] S. Kannan, "Open problems in streaming," Ppt slides. Personal communication.
[130] R. Kumar and R. Rubinfeld, "Sublinear time algorithms," Algorithms column in SIGACT News 2003.
[131] N. Mishra, D. Oblinger, and L. Pitt, "Sublinear time approximate clustering," in Proc. ACM-SIAM SODA, 2001, pp. 439-447.
[132] T. Batu, S. Guha, and S. Kannan, "Inferring mixtures of Markov chains," in Proc. COLT, 2004, pp. 186-199.
[133] J. Kleinberg, "Bursty and hierarchical structure in streams," in Proc. ACM KDD, 2002, pp. 91-101.
[134] L. Villemoes, "Best approximation with Walsh atoms," Constructive Approximation, vol. 13, pp. 329-355, 1997.
[135] B. Natarajan, "Sparse approximate solutions to linear systems," SIAM J. Computing, vol. 25, no. 2, pp. 227-234, 1995.
[136] G. Davis, S. Mallat, and M. Avellaneda, "Greedy adaptive approximation," Journal of Constructive Approximation, vol. 13, pp. 57-98, 1997.
[137] M. Garey and D. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, W. H. Freeman, 1979.
[138] C. Lund and M. Yannakakis, "On the hardness of approximating minimization problems," Journal of ACM, vol. 41, pp. 960-981, 1994.
[139] U. Feige, "A threshold of ln n for approximating set cover," Journal of ACM, vol. 45, no. 4, pp. 634-652, 1998.
[140] A. Gilbert, S. Muthukrishnan, and M. Strauss, "Approximation of functions over redundant dictionaries using coherence," in Proc. ACM-SIAM SODA, 2003, pp. 243-252.
[141] A. Gilbert, S. Muthukrishnan, M. Strauss, and J. Tropp, "Improved sparse approximation over quasi-coherent dictionaries," Intl Conf on Image Processing (ICIP), pp. 37-40, 2003.
[142] J. Tropp, "Greed is good: Algorithmic results for sparse approximation," IEEE Trans. Inform. Theory, vol. 50, no. 10, pp. 2231-2242, 2004.
[143] S. Muthukrishnan, "Nonuniform sparse approximation with Haar wavelet basis," DIMACS TR, 2004-42.
[144] Y. Matias and D. Urieli, "Optimal workload-based wavelet synopses," in Proc. ICDT, 2005, pp. 368-382.
[145] M. Garofalakis and A. Kumar, "Deterministic wavelet thresholding for maximum-error metrics," in Proc. ACM PODS, 2004, pp. 166-176.
[146] S. Guha and B. Harb, "Wavelet synopsis for data streams: Minimizing non-euclidean error," in Proc. ACM KDD, 2005.
[147] R. DeVore and G. Lorentz, Constructive Approximation, Springer-Verlag, New York, 1993.
[148] V. Temlyakov, "The best m-term approximation and greedy algorithms," Advances in Computational Math., vol. 8, pp. 249-265, 1998.
[149] D. Donoho, "Compressed sensing," Manuscript, 2004, http://www-stat.stanford.edu/~donoho/Reports/2004/CompressedSensing091604.pdf.
[150] http://www2.ece.rice.edu/~duarte/compsense/.
[151] S. Muthukrishnan, V. Poosala, and T. Suel, "On rectangular partitionings in two dimensions: Algorithms, complexity and applications," in Proc. ICDT, 1999, pp. 236-256.
[152] M. Bender, A. Fernandez, D. Ron, A. Sahai, and S. Vadhan, "The power of a pebble: Exploring and mapping directed graphs," in Proc. ACM STOC, 1998, pp. 269-278.
[153] R. Cole, "On the dynamic finger conjecture for splay trees, part II, the proof," Technical Report TR1995-701, 1995, Courant Institute, NYU.
[154] G. Blelloch, B. Maggs, S. Leung, and M. Woo, "Space efficient finger search on degree-balanced search trees," in Proc. ACM-SIAM SODA, 2003, pp. 374-383.
[155] S. Sarawagi, "Query processing in tertiary memory databases," in Proc. VLDB, 1995, pp. 585-596.
[156] P. Indyk, "Stream-based geometric algorithms," in Proc. ACM/DIMACS Workshop on Management and Processing of Data Streams (MPDS), 2003.
[157] P. Indyk, "Better algorithms for high dimensional proximity problems via asymmetric embeddings," in Proc. ACM-SIAM SODA, 2003, pp. 539-545.
[158] J. Feigenbaum, S. Kannan, and J. Zhang, "Computing diameter in the streaming and sliding window models," Algorithmica, vol. 41, no. 1, pp. 25-41, 2004.
[159] J. Hershberger and S. Suri, "Adaptive sampling for geometric problems over data streams," in Proc. ACM PODS, 2004, pp. 252-262.
[160] S. Suri, C. Toth, and Y. Zhou, "Range counting over multidimensional data streams," in Proc. ACM SoCG, 2004, pp. 160-169.
[161] A. Bagchi, A. Chaudhary, D. Eppstein, and M. Goodrich, "Deterministic sampling and range counting in geometric data streams," Proc. ACM SoCG, pp. 144-151, 2004.
[162] P. K. Agarwal, S. Har-Peled, and K. Varadarajan, "Geometric approximation via coresets," Survey. Available at http://valis.cs.uiuc.edu/~sariel/papers/04/survey/.
[163] G. Cormode and S. Muthukrishnan, "Radial histograms for spatial streams," DIMACS Technical Report, 2003-11.
[164] T. Chan and E. Chen, "Multi-pass geometric algorithms," in Proc. ACM SoCG, 2005, pp. 180-189.
[165] O. Reingold, "Undirected st-connectivity in logspace," in Proc. STOC, 2005, pp. 376-385.
[166] Z. Bar-Yossef, R. Kumar, and D. Sivakumar, "Reductions in streaming algorithms, with an application to counting triangles in graphs," in Proc. ACM-SIAM SODA, 2002, pp. 623-632.
[167] P. Raghavan, "Graph structure of the web: A survey," in Proc. LATIN, 2000, pp. 123-125.
[168] L. Buriol, D. Donato, S. Leonardi, and T. Matzner, "Using data stream algorithms for computing properties of large graphs," in Workshop on Massive Geometric Datasets, With ACM SoCG 2005, Pisa.
[169] S. Venkataraman, D. Song, P. Gibbons, and A. Blum, "New streaming algorithms for superspreader detection," Network and Distributed Systems Security Symposium, 2005.
[170] S. Nath, H. Yu, P. Gibbons, and S. Seshan, "Synopsis diffusion for robust aggregation in sensor networks," Intel Tech Report, IRP-TR-04-13, 2004.
[171] J. Feigenbaum, S. Kannan, A. McGregor, S. Suri, and J. Zhang, "Graph distances in the streaming model: The value of space," in Proc. ACM-SIAM SODA, 2005, pp. 745-754.
[172] J. Feigenbaum, S. Kannan, A. McGregor, S. Suri, and J. Zhang, "On graph problems in a semi-streaming model," in Proc. of ICALP, 2004, pp. 531-543.
[173] M. Elkin and J. Zhang, "Efficient algorithms for constructing (1 + ε, β)-spanners in the distributed and streaming models," in Proc. ACM PODC, 2004, pp. 160-168.
[174] M. Magdon-Ismail, M. Goldberg, W. Wallace, and D. Siebecker, "Locating hidden groups in communication networks using hidden Markov models," in Proc. ISI, 2003, pp. 126-137.
[175] N. Thaper, S. Guha, P. Indyk, and N. Koudas, "Dynamic multidimensional histograms," in Proc. ACM SIGMOD, 2002, pp. 428-439.
[176] D. Shah, S. Iyer, B. Prabhakar, and N. McKeown, "Maintaining statistics counters in router line cards," IEEE Micro, pp. 76-81, 2002.
[177] Y. Zhang, S. Singh, S. Sen, N. Duffield, and C. Lund, "Online identification of hierarchical heavy hitters: Algorithms, evaluation, and applications," in Proc. of the Internet Measurement Conference (IMC), 2004, pp. 101-114.
[178] G. Humphreys, M. Houston, Y. Ng, R. Frank, S. Ahern, P. Kirchner, and J. Klosowski, "Chromium: A stream processing framework for interactive rendering on clusters," in Proc. ACM SIGGRAPH, 2002, pp. 693-702.
[179] S. Guha, K. Munagala, K. Shankar, and S. Venkatasubramanian, "Application of the two-sided depth test to CSG rendering," in Proc. I3D, ACM Interactive 3D graphics, 2003.
[180] S. Venkatasubramanian, "The graphics card as a stream computer," in Proc. ACM/DIMACS Workshop on Management and Processing of Data Streams (MPDS), 2003. See also: http://www.research.att.com/~suresh/papers/mpds/index.html.
[181] D. Manocha, "Interactive geometric computations using graphics hardware," Course, ACM SIGGRAPH, 2002.
[182] G. Cormode, S. Muthukrishnan, and C. Sahinalp, "Permutation editing and matching via embeddings," in Proc. ICALP, 2001, pp. 481-492.
[183] M. Ajtai, T. Jayram, S. R. Kumar, and D. Sivakumar, "Counting inversions in a data stream," Proc. ACM STOC, pp. 370-379, 2002.
[184] A. Gupta and F. Zane, "Counting inversions in lists," ACM-SIAM SODA, pp. 253-254, 2003.
[185] A. Arasu and G. Manku, "Approximate counts and quantiles over sliding windows," Proc. ACM PODS, pp. 286-296, 2004.
[186] C. Cortes and D. Pregibon, "Signature-based methods for data streams," Data Mining and Knowledge Discovery, vol. 5, no. 3, pp. 167-182, 2001.
[187] M. Hoffman, S. Muthukrishnan, and R. Raman, "Location streams: Models and algorithms," DIMACS TR, 2004-28.
[188] A. Aboulnaga and S. Chaudhuri, "Self-tuning histograms: Building histograms without looking at data," Proc. ACM SIGMOD, pp. 181-192, 1998.
[189] A. Manjhi, V. Shkapenyuk, K. Dhamdhere, and C. Olston, "Finding (recently) frequent items in distributed data streams," in Proc. ICDE, 2005, pp. 767-778.
[190] S. Ganguly, M. Garofalakis, and R. Rastogi, "Tracking set-expression cardinalities over continuous update streams," VLDB Journal, vol. 13, no. 4, pp. 354-369, 2004.
[191] G. Cormode, M. Garofalakis, S. Muthukrishnan, and R. Rastogi, "Holistic aggregates in a networked world: Distributed tracking of approximate quantiles," in Proc. ACM SIGMOD, 2005, pp. 25-36.
[192] G. Cormode and M. Garofalakis, "Sketching streams through the net: Distributed approximate query tracking," in Proc. VLDB, 2005, pp. 13-24.
[193] T. Dasu and T. Johnson, Exploratory Data Mining and Data Quality, Wiley, May 2003, ISBN: 0-471-26851-8.
[194] T. Dasu, T. Johnson, S. Muthukrishnan, and V. Shkapenyuk, "Mining database structure or how to build a data quality browser," in Proc. ACM SIGMOD, 2002, pp. 240-251.
[195] F. Korn, S. Muthukrishnan, and Y. Zhu, "Checks and balances: Monitoring data quality in network traffic databases," in Proc. VLDB, 2003, pp. 536-547.
[196] D. Achlioptas and F. McSherry, "Fast computation of low rank approximation," Proc. ACM STOC, pp. 611-618, 2001.
[197] P. Drineas and R. Kannan, "Pass efficient algorithms for approximating large matrices," in Proc. ACM-SIAM SODA, 2003, pp. 223-232.
[198] P. Drineas, R. Kannan, and M. Mahoney, "Fast Monte Carlo algorithms for matrices I: Approximating matrix multiplication," Yale Technical Report YALEU/DCS/TR-1269, 2004, To appear in SIAM J. Computing.
[199] P. Drineas, R. Kannan, and M. Mahoney, "Fast Monte Carlo algorithms for matrices II: Computing low-rank approximations to a matrix," Yale Technical Report YALEU/DCS/TR-1270, 2004, To appear in SIAM J. Computing.
[200] P. Drineas, R. Kannan, and M. Mahoney, "Fast Monte Carlo algorithms for matrices III: Computing an efficient approximate decomposition of a matrix," Yale Technical Report YALEU/DCS/TR-1271, 2004, To appear in SIAM J. Computing.
[201]
{201} D. Donoho, "High-dimensional data analysis: The curses and blessings of dimensionality," Manuscript, 2000, http://www-stat.stanford.edu/~donoho/.]]
[202]
{202} E. Kohler, J. Li, V. Paxson, and S. Shenker, "Observed structure of addresses in ip traffic," Internet Measurement Workshop, pp. 253-266, 2002.]]
[203]
A. Wong, L. Wu, P. Gibbons, and C. Faloutsos, "Fast estimation of fractal dimension and correlation integral on stream data," Inf. Process. Lett., vol. 93, no. 2, pp. 91-97, 2005.
[204]
F. Korn, S. Muthukrishnan, and Y. Wu, "Model fitting of IP network traffic at streaming speeds," Manuscript, 2005.
[205]
S. Balakrishnan and D. Madigan, "A one-pass sequential Monte Carlo method for Bayesian analysis of massive datasets," Manuscript, 2004.
[206]
A. Razborov, A. Wigderson, and A. Yao, "Read-once branching programs, rectangular proofs of the pigeonhole principle and the transversal calculus," in Proc. STOC, 1997, pp. 739-748.
[207]
R. Fagin, M. Naor, and P. Winkler, "Comparing information without leaking it: Simple solutions," Communications of the ACM, vol. 39, no. 5, pp. 77-85, 1996.
[208]
A. Yao, "Protocols for secure computations," in Proc. IEEE FOCS, 1982, pp. 160-164.
[209]
O. Goldreich, "Secure multiparty computation," 1998, Book at http://philby.ucsd.edu/cryptolib/BOOKS/oded-sc.html.
[210]
B. Pinkas, "Cryptographic techniques for privacy-preserving data mining," SIGKDD Explorations, the newsletter of the ACM Special Interest Group on Knowledge Discovery and Data Mining, January 2003.
[211]
G. Aggarwal, N. Mishra, and B. Pinkas, "Secure computation of the k'th-ranked element," Advances in Cryptology - Eurocrypt, LNCS 3027, May 2004, Springer-Verlag, pp. 40-55.
[212]
M. Freedman, K. Nissim, and B. Pinkas, "Efficient private matching and set intersection," Advances in Cryptology - Eurocrypt, LNCS 3027, May 2004, Springer-Verlag, pp. 1-19.
[213]
G. Jagannathan and R. Wright, "Privacy-preserving distributed k-means clustering over arbitrarily partitioned data," in Proc. ACM KDD, 2005.
[214]
H. Subramaniam, R. Wright, and Z. Yang, "Experimental analysis of privacy-preserving statistics computation," in Proc. of the Workshop on Secure Data Management (held in conjunction with VLDB), 2004, Springer, LNCS 3178.
[215]
J. Feigenbaum, Y. Ishai, T. Malkin, K. Nissim, M. Strauss, and R. Wright, "Secure multiparty computation of approximations," in Proc. ICALP, 2001, pp. 927-938.
[216]
D. Knuth, The Art of Computer Programming, Volume III: Sorting and Searching, Addison-Wesley, 1973.
[217]
J. Feigenbaum, "Massive graphs: Algorithms, applications, and open problems," Invited Lecture, Combinatorial Pattern Matching, 1999.
[218]
B. Babcock, S. Babu, M. Datar, R. Motwani, and J. Widom, "Models and issues in data stream systems," in Proc. ACM PODS, 2002, pp. 1-16.
[219]
http://www7.nationalacademies.org/bms/Massive_Data_Workshop.html.

Published In

Foundations and Trends® in Theoretical Computer Science, Volume 1, Issue 2
August 2005
120 pages

Publisher

Now Publishers Inc.

Hanover, MA, United States

