DOI: 10.1145/1244002.1244108

A priority random sampling algorithm for time-based sliding windows over weighted streaming data

Published: 11 March 2007

Abstract

This paper introduces the problem of random sampling from time-based sliding windows over weighted streaming data and presents a priority random sampling (PRS) algorithm for it. The algorithm extends the classic reservoir-sampling algorithm and the weighted random sampling (WRS) algorithm with a reservoir to handle the expiration of data items from a time-based sliding window, and it avoids the drawbacks of both. PRS assigns each data item in the time-based sliding window a key that balances the item's weight against its arrival time, and it works even when the number of data items in the window varies dynamically over time. Experiments show that the PRS algorithm is somewhat superior to the WRS algorithm.
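
The abstract does not state the PRS key formula, so the following is not the paper's algorithm. It is a minimal Python sketch of the general idea under stated assumptions: each item gets the key u^(1/w) used by weighted random sampling with a reservoir (Efraimidis and Spirakis), items expire by arrival timestamp, and an item is discarded early once k later arrivals have larger keys, since it can then never return to the top k. The class name `WindowedWeightedSampler`, its parameters, and the requirement that timestamps be non-decreasing are all hypothetical choices made for illustration.

```python
import random
from collections import deque


class WindowedWeightedSampler:
    """Illustrative sketch (not the paper's PRS): weighted sample of up to
    k items among those that arrived in the last `window` time units."""

    def __init__(self, k, window, rng=None):
        self.k = k
        self.window = window
        self.rng = rng if rng is not None else random.Random()
        # Entries are [arrival_time, key, item, later_items_with_larger_key],
        # stored in arrival order (timestamps assumed non-decreasing).
        self.entries = deque()

    def _expire(self, now):
        # The window covers arrival times in (now - window, now].
        while self.entries and self.entries[0][0] <= now - self.window:
            self.entries.popleft()

    def add(self, item, weight, now):
        if weight <= 0:
            raise ValueError("weight must be positive")
        self._expire(now)
        # Assumed key: u ** (1 / w) with u uniform in [0, 1).
        key = self.rng.random() ** (1.0 / weight)
        kept = deque()
        for entry in self.entries:
            if entry[1] < key:
                entry[3] += 1
            # Beaten by k later arrivals: it can never be in the top k again,
            # because those arrivals all expire after it does.
            if entry[3] < self.k:
                kept.append(entry)
        kept.append([now, key, item, 0])
        self.entries = kept

    def sample(self, now):
        self._expire(now)
        top = sorted(self.entries, key=lambda e: e[1], reverse=True)[: self.k]
        return [e[2] for e in top]


# Usage: four weighted items arriving at t = 0, 1, 2, 3.
sampler = WindowedWeightedSampler(k=2, window=10.0, rng=random.Random(7))
for t, (item, w) in enumerate([("a", 1.0), ("b", 5.0), ("c", 0.5), ("d", 2.0)]):
    sampler.add(item, w, now=float(t))
print(sampler.sample(now=3.0))   # two of the four items, biased toward heavy ones
print(sampler.sample(now=12.5))  # only "d" is still inside the window
```

The early-discard rule keeps memory below the full window size without changing the result: any item that belongs to the top k of the current window has fewer than k later arrivals with larger keys, so it is always still stored.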

Cited By

  • (2024) Approximate Matrix Multiplication over Sliding Windows. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 3896-3906. DOI: 10.1145/3637528.3671819. Online publication date: 25-Aug-2024
  • (2016) Matrix Sketching Over Sliding Windows. Proceedings of the 2016 International Conference on Management of Data, pp. 1465-1480. DOI: 10.1145/2882903.2915228. Online publication date: 26-Jun-2016
  • (2015) Weighted Random Sampling over Data Streams. Algorithms, Probability, Networks, and Games, pp. 183-195. DOI: 10.1007/978-3-319-24024-4_12. Online publication date: 22-Nov-2015

    Published In

    SAC '07: Proceedings of the 2007 ACM symposium on Applied computing
    March 2007
    1688 pages
    ISBN:1595934804
    DOI:10.1145/1244002

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. data stream
    2. random sampling algorithm
    3. sliding window

    Acceptance Rates

    Overall Acceptance Rate 1,650 of 6,669 submissions, 25%
