DOI: 10.1145/2882903.2882946
research-article

Time Adaptive Sketches (Ada-Sketches) for Summarizing Data Streams

Published: 14 June 2016

Abstract

Obtaining frequency information from data streams in limited space is a well-recognized problem in the literature. A number of recent practical applications (such as those in computational advertising) require temporally aware solutions: obtaining historical count statistics for both time points and time ranges. In these scenarios, accuracy of estimates is typically more important for recent instances than for older ones; we call this desirable property Time Adaptiveness. With this observation, [20] introduced the Hokusai technique based on count-min sketches for estimating the frequency of any given item at any given time. The proposed approach is problematic in practice, as its memory requirements grow linearly with time, and it produces discontinuities in the estimation accuracy. In this work, we describe a new method, Time-adaptive Sketches (Ada-sketch), that overcomes these limitations while extending, and providing a strict generalization of, several popular sketching algorithms. The core idea of our method is inspired by the well-known digital Dolby noise reduction procedure that dates back to the 1960s. The theoretical analysis presented could be of independent interest, as it provides clear results for the time-adaptive nature of the errors. An experimental evaluation on real streaming datasets demonstrates the superiority of the described method over Hokusai in estimating point and range queries over time. The method is simple to implement and offers a variety of design choices for future extensions. The simplicity of the procedure and the method's generalization of classic sketching techniques give hope for wide applicability of Ada-sketches in practice.
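
The abstract does not spell out the algorithm, so the following is only a rough illustration of the idea it points to: a count-min sketch keyed by (item, time), with counts scaled up on insertion and scaled back down on query, in the spirit of the pre-emphasis and de-emphasis used in Dolby noise reduction. The class names, the exponential weight `growth ** t`, and the hashing scheme are all assumptions made for this sketch and are not taken from the paper.

```python
import hashlib


class CountMinSketch:
    """Plain count-min sketch: `depth` rows of `width` counters."""

    def __init__(self, width=2048, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0.0] * width for _ in range(depth)]

    def _index(self, row, key):
        # One salted hash per row stands in for independent hash functions.
        digest = hashlib.blake2b(
            repr(key).encode(), salt=row.to_bytes(8, "little")
        ).digest()
        return int.from_bytes(digest[:8], "little") % self.width

    def add(self, key, amount=1.0):
        for row in range(self.depth):
            self.table[row][self._index(row, key)] += amount

    def estimate(self, key):
        # Collisions only ever add mass, so the minimum is the tightest bound.
        return min(
            self.table[row][self._index(row, key)] for row in range(self.depth)
        )


class TimeWeightedSketch:
    """Hypothetical time-weighted wrapper (illustrative, not the paper's code)."""

    def __init__(self, growth=1.05, **cms_args):
        self.growth = growth  # assumed weight function: growth ** t
        self.cms = CountMinSketch(**cms_args)

    def _weight(self, t):
        return self.growth ** t

    def add(self, item, t, count=1):
        # Pre-emphasis: later time steps are stored with larger weights.
        self.cms.add((item, t), count * self._weight(t))

    def estimate(self, item, t):
        # De-emphasis: divide the weight back out at query time.
        return self.cms.estimate((item, t)) / self._weight(t)


sketch = TimeWeightedSketch(growth=1.05, width=256, depth=4)
for t in range(20):
    sketch.add("ad-42", t, count=t + 1)
print(sketch.estimate("ad-42", 19))
```

Under these assumptions, collision noise contributed by an older time step is divided by the larger weight of a more recent one, so recent queries see proportionally less of it. Very large `t` would overflow the floating-point weight, which a real implementation would have to handle.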

References

[1]
N. Alon, Y. Matias, and M. Szegedy. The space complexity of approximating the frequency moments. In Proceedings of the twenty-eighth annual ACM symposium on Theory of computing, pages 20--29. ACM, 1996.
[2]
B. Babcock, S. Babu, M. Datar, R. Motwani, and J. Widom. Models and issues in data stream systems. In Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, 2002.
[3]
S. Bhattacharyya, A. Madeira, S. Muthukrishnan, and T. Ye. How to scalably and accurately skip past streams. In Data Engineering Workshop, 2007 IEEE 23rd International Conference on, pages 654--663. IEEE, 2007.
[4]
B. H. Bloom. Space/time trade-offs in hash coding with allowable errors. Communications of the ACM, 13(7):422--426, 1970.
[5]
O. Chapelle. Modeling delayed feedback in display advertising. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 1097--1105. ACM, 2014.
[6]
M. Charikar, K. Chen, and M. Farach-Colton. Finding frequent items in data streams. In Automata, Languages and Programming, pages 693--703. Springer, 2002.
[7]
G. Cormode and S. Muthukrishnan. An improved data stream summary: the count-min sketch and its applications. Journal of Algorithms, 55(1):58--75, 2005.
[8]
G. Cormode, S. Tirthapura, and B. Xu. Time-decayed correlated aggregates over data streams. Statistical Analysis and Data Mining: The ASA Data Science Journal, 2(5--6):294--310, 2009.
[9]
R. Dolby. Noise reduction systems, Nov. 5 1974. US Patent 3,846,719.
[10]
C. Estan and G. Varghese. Data streaming in computer networking.
[11]
C. Estan and G. Varghese. New directions in traffic measurement and accounting, volume 32. ACM, 2002.
[12]
P. Flajolet and G. N. Martin. Probabilistic counting algorithms for data base applications. Journal of computer and system sciences, 31(2):182--209, 1985.
[13]
P. B. Gibbons and Y. Matias. Synopsis data structures for massive data sets.
[14]
A. C. Gilbert, Y. Kotidis, S. Muthukrishnan, and M. Strauss. Surfing wavelets on streams: One-pass summaries for approximate aggregate queries. In VLDB, volume 1, pages 79--88, 2001.
[15]
T. Graepel, J. Q. Candela, T. Borchert, and R. Herbrich. Web-scale bayesian click-through rate prediction for sponsored search advertising in Microsoft's bing search engine. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), pages 13--20, 2010.
[16]
D. Hillard, S. Schroedl, E. Manavoglu, H. Raghavan, and C. Leggetter. Improving ad relevance in sponsored search. In Proceedings of the third ACM international conference on Web search and data mining, pages 361--370. ACM, 2010.
[17]
O. Kennedy, C. Koch, and A. Demers. Dynamic approaches to in-network aggregation. In Data Engineering, 2009. ICDE'09. IEEE 25th International Conference on, pages 1331--1334. IEEE, 2009.
[18]
G. Krempl, I. Žliobaitė, D. Brzezinski, E. Hüllermeier, M. Last, V. Lemaire, T. Noack, A. Shaker, S. Sievi, M. Spiliopoulou, et al. Open challenges for data stream mining research. ACM SIGKDD Explorations Newsletter, 2014.
[19]
G. S. Manku and R. Motwani. Approximate frequency counts over data streams. In Proceedings of the 28th international conference on Very Large Data Bases, pages 346--357. VLDB Endowment, 2002.
[20]
S. Matusevych, A. Smola, and A. Ahmed. Hokusai-sketching streams in real time. In UAI, 2012.
[21]
H. B. McMahan, G. Holt, D. Sculley, M. Young, D. Ebner, J. Grady, L. Nie, T. Phillips, E. Davydov, D. Golovin, et al. Ad click prediction: a view from the trenches. In KDD, 2013.
[22]
S. Muthukrishnan. Data streams: Algorithms and applications. Now Publishers Inc, 2005.
[23]
M. Richardson, E. Dominowska, and R. Ragno. Predicting clicks: estimating the click-through rate for new ads. In Proceedings of the 16th international conference on World Wide Web, pages 521--530. ACM, 2007.
[24]
S. V. Vaseghi. Advanced digital signal processing and noise reduction. John Wiley & Sons, 2008.
[25]
Q. Yang and X. Wu. 10 challenging problems in data mining research. International Journal of Information Technology & Decision Making, 2006.

    Published In

    SIGMOD '16: Proceedings of the 2016 International Conference on Management of Data
    June 2016
    2300 pages
    ISBN:9781450335317
    DOI:10.1145/2882903
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. approximate counting algorithms
    2. big-data mining
    3. count-min sketches
    4. hashing
    5. randomized algorithms
    6. sketching
    7. streaming

    Qualifiers

    • Research-article

    Conference

    SIGMOD/PODS'16: International Conference on Management of Data
    June 26 - July 1, 2016
    San Francisco, California, USA

    Acceptance Rates

    Overall Acceptance Rate 785 of 4,003 submissions, 20%

    Cited By

    • (2024) Unbiased Real-Time Traffic Sketching. IEEE Transactions on Network Science and Engineering, 11:3 (2371-2383). DOI: 10.1109/TNSE.2023.3284004. Online publication date: May-2024
    • (2024) Scout Sketch+: Finding Both Promising and Damping Items Simultaneously in Data Streams. IEEE/ACM Transactions on Networking, 32:6 (5491-5506). DOI: 10.1109/TNET.2024.3469196. Online publication date: Dec-2024
    • (2024) A Probabilistic Sketch for Summarizing Cold Items of Data Streams. IEEE/ACM Transactions on Networking, 32:2 (1287-1302). DOI: 10.1109/TNET.2023.3316426. Online publication date: Apr-2024
    • (2024) Priority Sketch: A Priority-aware Measurement Framework. 2024 International Conference on Satellite Internet (SAT-NET) (18-23). DOI: 10.1109/SAT-NET62854.2024.00012. Online publication date: 25-Oct-2024
    • (2024) Scout Sketch: Finding Promising Items in Data Streams. IEEE INFOCOM 2024 - IEEE Conference on Computer Communications (1561-1570). DOI: 10.1109/INFOCOM52122.2024.10621279. Online publication date: 20-May-2024
    • (2024) Synopsis: a Scalable Byzantine Distributed Ledger for IoT Networks. 2024 6th International Conference on Blockchain Computing and Applications (BCCA) (30-38). DOI: 10.1109/BCCA62388.2024.10844486. Online publication date: 26-Nov-2024
    • (2023) MicroscopeSketch: Accurate Sliding Estimation Using Adaptive Zooming. Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (2660-2671). DOI: 10.1145/3580305.3599432. Online publication date: 6-Aug-2023
    • (2023) Compressing Distributed Network Sketches With Traffic-Aware Summaries. IEEE Transactions on Network and Service Management, 20:2 (1962-1975). DOI: 10.1109/TNSM.2022.3172299. Online publication date: Jun-2023
    • (2023) BurstSketch: Finding Bursts in Data Streams. IEEE Transactions on Knowledge and Data Engineering, 35:11 (11126-11140). DOI: 10.1109/TKDE.2022.3223686. Online publication date: 1-Nov-2023
    • (2023) A survey on sliding window sketch for network measurement. Computer Networks, 226 (109696). DOI: 10.1016/j.comnet.2023.109696. Online publication date: May-2023