poster

An efficient algorithm for approximate biased quantile computation in data streams

Published: 06 November 2007

Abstract

We propose an efficient algorithm for approximate biased quantile computation in large data streams. Our algorithm computes decomposable biased quantile summaries on fixed-size blocks and dynamically maintains the biased quantile summary for the entire stream as an exponential histogram over the block-wise quantile summaries. The algorithm is computationally efficient and achieves an amortized computational cost of O(log((1/ε) log(εn))) and a space requirement of O(log^3 n). Our algorithm does not assume prior knowledge of the stream sizes or the range of data values in the streams. In practice, our algorithm efficiently maintains summaries over large data streams with tens of millions of observations and achieves significant performance improvement over prior algorithms.
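
The block-and-merge structure described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it uses a simpler Munro-Paterson-style merge-and-halve rule, which gives uniform rather than biased rank error, and it only shows how fixed-size blocks can be combined level by level in the manner of an exponential histogram. The class name `BlockedQuantileSketch`, the `block_size` parameter, and the compression rule are all assumptions made for illustration.

```python
from heapq import merge


class BlockedQuantileSketch:
    """Illustrative only: block-wise summaries combined level by level."""

    def __init__(self, block_size=64):
        self.block_size = block_size  # hypothetical parameter, not from the paper
        self.buffer = []              # raw values of the block being filled
        self.levels = {}              # level k -> sorted sample; each item stands for 2**k inputs
        self.count = 0                # number of values seen so far

    def insert(self, value):
        self.buffer.append(value)
        self.count += 1
        if len(self.buffer) == self.block_size:
            self._carry(sorted(self.buffer), 0)
            self.buffer = []

    def _carry(self, summary, level):
        # Like binary addition: two summaries at one level merge into one
        # summary at the next level, so only O(log n) levels are ever stored.
        while level in self.levels:
            other = self.levels.pop(level)
            merged = list(merge(summary, other))
            summary = merged[1::2]    # keep every second item; its weight doubles
            level += 1
        self.levels[level] = summary

    def query(self, phi):
        """Return an approximate phi-quantile for 0 < phi <= 1."""
        weighted = [(v, 1) for v in self.buffer]
        for level, summary in self.levels.items():
            weighted.extend((v, 2 ** level) for v in summary)
        if not weighted:
            raise ValueError("sketch is empty")
        weighted.sort()
        target = phi * self.count
        seen = 0
        for value, weight in weighted:
            seen += weight
            if seen >= target:
                return value
        return weighted[-1][0]
```

Because the stream length is never needed up front, a sketch of this shape can be fed values one at a time with `insert` and queried at any point with `query`. The biased (relative-error) guarantee of the paper would require a different per-block summary and compression rule than the uniform halving used here.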

Published In

CIKM '07: Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
November 2007
1048 pages
ISBN:9781595938039
DOI:10.1145/1321440

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. biased quantiles
  2. streaming algorithms

Qualifiers

  • Poster

Conference

CIKM07

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Article Metrics

  • Downloads (last 12 months): 13
  • Downloads (last 6 weeks): 0
Reflects downloads up to 30 Aug 2024

Cited By

  • (2024) Simple & Optimal Quantile Sketch: Combining Greenwald-Khanna with Khanna-Greenwald. Proceedings of the ACM on Management of Data 2:2 (1-25). DOI: 10.1145/3651610. Online publication date: 14-May-2024
  • (2023) Relative Error Streaming Quantiles. Journal of the ACM 70:5 (1-48). DOI: 10.1145/3617891. Online publication date: 16-Oct-2023
  • (2023) Together is Better: Heavy Hitters Quantile Estimation. Proceedings of the ACM on Management of Data 1:1 (1-25). DOI: 10.1145/3588937. Online publication date: 30-May-2023
  • (2022) Relative Error Streaming Quantiles. ACM SIGMOD Record 51:1 (69-76). DOI: 10.1145/3542700.3542717. Online publication date: 1-Jun-2022
  • (2022) Parallel Batch-Dynamic Minimum Spanning Forest and the Efficiency of Dynamic Agglomerative Graph Clustering. Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures (233-245). DOI: 10.1145/3490148.3538584. Online publication date: 11-Jul-2022
  • (2021) Relative Error Streaming Quantiles. Proceedings of the 40th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems (96-108). DOI: 10.1145/3452021.3458323. Online publication date: 20-Jun-2021
  • (2020) A Tight Lower Bound for Comparison-Based Quantile Summaries. Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems (81-93). DOI: 10.1145/3375395.3387650. Online publication date: 14-Jun-2020
  • (2014) Feature-based high-availability mechanism for quantile tasks in real-time data stream processing. Software—Practice & Experience 44:7 (855-871). DOI: 10.1002/spe.2244. Online publication date: 1-Jul-2014
  • (2013) Fast computation of approximate biased histograms on sliding windows over data streams. Proceedings of the 25th International Conference on Scientific and Statistical Database Management (1-12). DOI: 10.1145/2484838.2484851. Online publication date: 29-Jul-2013
  • (2011) Lightweight Detection of Additive Watermarking in the DWT-Domain. IEEE Transactions on Image Processing 20:2 (474-484). DOI: 10.1109/TIP.2010.2064327. Online publication date: 1-Feb-2011
