Sampling algorithms: lower bounds and applications

Published: 06 July 2001

Abstract

We develop a framework to study probabilistic sampling algorithms that approximate general functions of the form $f: A^n \rightarrow B$, where $A$ and $B$ are arbitrary sets. Our goal is to obtain lower bounds on the query complexity of functions, namely the number of input variables $x_i$ that any sampling algorithm needs to query to approximate $f(x_1,\ldots,x_n)$.
We define two quantitative properties of functions, the block sensitivity and the minimum Hellinger distance, that give us techniques to prove lower bounds on the query complexity. These techniques are quite general, easy to use, yet powerful enough to yield tight results. Our applications include the mean and higher statistical moments, the median and other selection functions, and the frequency moments, where we obtain lower bounds that are close to the corresponding upper bounds.
We also point out some connections between sampling and streaming algorithms and lossy compression schemes.
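
To make the query model concrete, here is a minimal sketch of a sampling algorithm for one of the applications named above, the mean. It is not taken from the paper. It assumes inputs in [0, 1] and uses the standard Hoeffding bound to choose how many positions to query; the function names `hoeffding_sample_size` and `estimate_mean` are illustrative. The query count it returns is an upper bound of the kind the abstract's lower bounds are compared against.

```python
import math
import random


def hoeffding_sample_size(epsilon, delta):
    """Queries sufficient for additive error epsilon with probability
    at least 1 - delta, assuming every input value lies in [0, 1].

    Follows from Hoeffding's inequality:
    P(|estimate - mean| >= epsilon) <= 2 * exp(-2 * q * epsilon**2).
    """
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


def estimate_mean(x, epsilon, delta, rng=None):
    """Approximate the mean of x by querying random positions.

    Returns (estimate, number_of_queries). The number of queries
    depends only on epsilon and delta, not on len(x).
    """
    rng = rng or random.Random()
    q = hoeffding_sample_size(epsilon, delta)
    # Query q positions uniformly at random, with replacement.
    total = sum(x[rng.randrange(len(x))] for _ in range(q))
    return total / q, q


if __name__ == "__main__":
    data = [i % 2 for i in range(100_000)]  # true mean is 0.5
    estimate, queries = estimate_mean(data, epsilon=0.1, delta=0.05)
    print(f"estimate={estimate:.3f} using {queries} of {len(data)} inputs")
```

The sample size here grows as 1/epsilon^2 and log(1/delta) and is independent of n, which is why lower bounds on the number of queries, rather than on running time, are the natural measure in this setting.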

Published In

STOC '01: Proceedings of the thirty-third annual ACM symposium on Theory of computing
July 2001
755 pages
ISBN:1581133499
DOI:10.1145/380752

Publisher

Association for Computing Machinery

New York, NY, United States

Acceptance Rates

STOC '01 Paper Acceptance Rate 83 of 230 submissions, 36%;
Overall Acceptance Rate 1,469 of 4,586 submissions, 32%
