DOI: 10.1145/1055558.1055582

Deterministic wavelet thresholding for maximum-error metrics

Published: 14 June 2004

Abstract

Several studies have demonstrated the effectiveness of the wavelet decomposition as a tool for reducing large amounts of data down to compact wavelet synopses that can be used to obtain fast, accurate approximate answers to user queries. While conventional wavelet synopses are based on greedily minimizing the overall root-mean-squared (i.e., L2-norm) error in the data approximation, recent work has demonstrated that such synopses can suffer from important problems, including severe bias and wide variance in the quality of the data reconstruction, and lack of non-trivial guarantees for individual approximate answers. As a result, probabilistic thresholding schemes have been recently proposed as a means of building wavelet synopses that try to probabilistically control other approximation-error metrics, such as the maximum relative error in data-value reconstruction, which is arguably the most important for approximate query answers and meaningful error guarantees.

One of the main open problems posed by this earlier work is whether it is possible to design efficient deterministic wavelet-thresholding algorithms for minimizing non-L2 error metrics that are relevant to approximate query processing systems, such as maximum relative or maximum absolute error. Obviously, such algorithms can guarantee better wavelet synopses and avoid the pitfalls of probabilistic techniques (e.g., "bad" coin-flip sequences) leading to poor solutions. In this paper, we address this problem and propose novel, computationally efficient schemes for deterministic wavelet thresholding with the objective of optimizing maximum-error metrics. We introduce an optimal low polynomial-time algorithm for one-dimensional wavelet thresholding; our algorithm is based on a new Dynamic-Programming (DP) formulation, and can be employed to minimize the maximum relative or absolute error in the data reconstruction.

Unfortunately, directly extending our one-dimensional DP algorithm to multi-dimensional wavelets results in a super-exponential increase in time complexity with the data dimensionality. Thus, we also introduce novel, polynomial-time approximation schemes (with tunable approximation guarantees for the target maximum-error metric) for deterministic wavelet thresholding in multiple dimensions.
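
To make the error metrics in the abstract concrete, the following Python sketch builds a one-dimensional Haar synopsis in two ways: the conventional greedy L2 selection, and an exhaustive search that minimizes a maximum-error metric. This is an illustration only and is not the paper's DP algorithm; the exhaustive search is exponential in the number of coefficients and is usable only on tiny inputs. All function names, the unnormalized averaging/differencing form of the Haar transform, the power-of-two input length, and the `sanity_bound` parameter in the relative-error metric are assumptions introduced for this example.

```python
from itertools import combinations
from math import sqrt


def haar_decompose(data):
    """Unnormalized Haar transform (pairwise averages and half-differences).

    Assumes len(data) is a power of two. Returns coefficients ordered as
    [overall average, coarsest detail, ..., finest details].
    """
    current = list(data)
    details_so_far = []
    while len(current) > 1:
        pairs = list(zip(current[0::2], current[1::2]))
        details_so_far = [(a - b) / 2 for a, b in pairs] + details_so_far
        current = [(a + b) / 2 for a, b in pairs]
    return current + details_so_far


def haar_reconstruct(coeffs):
    """Invert haar_decompose: each (average, detail) pair yields two values."""
    current = [coeffs[0]]
    pos = 1
    while pos < len(coeffs):
        details = coeffs[pos:pos + len(current)]
        pos += len(current)
        current = [v for avg, d in zip(current, details)
                   for v in (avg + d, avg - d)]
    return current


def threshold(coeffs, kept):
    """Zero out every coefficient whose index is not in `kept`."""
    return [c if i in kept else 0.0 for i, c in enumerate(coeffs)]


def max_abs_error(data, approx):
    return max(abs(d - a) for d, a in zip(data, approx))


def max_rel_error(data, approx, sanity_bound=1.0):
    # sanity_bound is an assumed guard against division by very small values.
    return max(abs(d - a) / max(abs(d), sanity_bound)
               for d, a in zip(data, approx))


def greedy_l2_synopsis(coeffs, budget):
    """Keep the `budget` coefficients that contribute most to the L2 error."""
    n = len(coeffs)

    def l2_weight(i):
        # Dropping coefficient i shifts every value in its support by |c_i|.
        support = n if i == 0 else n >> (i.bit_length() - 1)
        return abs(coeffs[i]) * sqrt(support)

    ranked = sorted(range(n), key=l2_weight, reverse=True)
    return set(ranked[:budget])


def brute_force_synopsis(data, coeffs, budget, error_fn):
    """Try every subset of at most `budget` coefficients (tiny inputs only)."""
    best_kept = set()
    best_err = error_fn(data, haar_reconstruct(threshold(coeffs, best_kept)))
    for size in range(1, budget + 1):
        for kept in combinations(range(len(coeffs)), size):
            approx = haar_reconstruct(threshold(coeffs, set(kept)))
            err = error_fn(data, approx)
            if err < best_err:
                best_kept, best_err = set(kept), err
    return best_kept, best_err


if __name__ == "__main__":
    data = [2, 2, 0, 2, 3, 5, 4, 4]
    coeffs = haar_decompose(data)
    greedy = greedy_l2_synopsis(coeffs, 3)
    greedy_err = max_abs_error(data, haar_reconstruct(threshold(coeffs, greedy)))
    _, optimal_err = brute_force_synopsis(data, coeffs, 3, max_abs_error)
    print("greedy L2 synopsis, max abs error:", greedy_err)
    print("exhaustive search, max abs error:", optimal_err)
```

By construction, the exhaustive search can never do worse than the greedy L2 choice on the chosen maximum-error metric, which is the gap a deterministic maximum-error thresholding algorithm aims to close efficiently.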

Published In

PODS '04: Proceedings of the twenty-third ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
June 2004
350 pages
ISBN: 158113858X
DOI: 10.1145/1055558

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

SIGMOD/PODS04

Acceptance Rates

Overall Acceptance Rate 642 of 2,707 submissions, 24%

Cited By

  • (2024) Lossy Compression of Z-Map Based Shape Models Using Daubechies Wavelet Transform and Quickselect. International Journal of Automation Technology 18:5 (613-620). DOI: 10.20965/ijat.2024.p0613. Online publication date: 5-Sep-2024
  • (2022) Digital Forensics AI: Evaluating, Standardizing and Optimizing Digital Evidence Mining Techniques. KI - Künstliche Intelligenz 36:2 (143-161). DOI: 10.1007/s13218-022-00763-9. Online publication date: 12-May-2022
  • (2022) Efficient two-dimensional Haar+ synopsis construction for the maximum absolute error measure. The VLDB Journal 28:5 (675-701). DOI: 10.1007/s00778-019-00551-2. Online publication date: 11-Mar-2022
  • (2021) An Edge Device for Personal Big Data Tracking. 2021 7th Annual International Conference on Network and Information Systems for Computers (ICNISC) (500-505). DOI: 10.1109/ICNISC54316.2021.00095. Online publication date: Jul-2021
  • (2021) Research on R & D post demand analysis system based on K-means clustering algorithm. 2021 2nd International Conference on Information Science and Education (ICISE-IE) (467-470). DOI: 10.1109/ICISE-IE53922.2021.00113. Online publication date: Nov-2021
  • (2021) Wavelet-based dynamic and privacy-preserving similitude data models for edge computing. Wireless Networks 27:1 (351-366). DOI: 10.1007/s11276-020-02457-2. Online publication date: 1-Jan-2021
  • (2020) An Optimal Online Semi-connected PLA Algorithm with Maximum Error Bound. IEEE Transactions on Knowledge and Data Engineering (1-1). DOI: 10.1109/TKDE.2020.2981319. Online publication date: 2020
  • (2019) Multidimensional Analysis of Big Data. Emerging Perspectives in Big Data Warehousing (198-224). DOI: 10.4018/978-1-5225-5516-2.ch009. Online publication date: 2019
  • (2019) Two-dimensional wavelet synopses with maximum error bound and its application in parallel compression. Journal of Intelligent & Fuzzy Systems 37:3 (3499-3511). DOI: 10.3233/JIFS-179154. Online publication date: 9-Oct-2019
  • (2019) Maintaining Wavelet Synopses for Sliding-Window Aggregates. Proceedings of the 31st International Conference on Scientific and Statistical Database Management (73-84). DOI: 10.1145/3335783.3335793. Online publication date: 23-Jul-2019
