Article (DOI: 10.5555/645844.668482)

Sequential Sampling Algorithms: Unified Analysis and Lower Bounds

Published: 13 December 2001

Abstract

Sequential sampling algorithms have recently attracted interest as a way to design scalable algorithms for data mining and KDD processes. In this paper, we identify an elementary sequential sampling task (estimation from examples), from which one can derive many other tasks appearing in practice. We present a generic algorithm to solve this task and an analysis of its correctness and running time that is simpler and more intuitive than those existing in the literature. For two specific tasks, frequency and advantage estimation, we derive lower bounds on running time in addition to the general upper bounds.
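
To make the frequency estimation task concrete, the following is a minimal illustrative sketch of a sequential sampler. It is not the algorithm from the paper, whose details are not given in this abstract. It assumes a textbook construction: a two-sided Hoeffding bound combined with a union bound over steps, with sampling stopped once the confidence radius is small relative to the current estimate. The names hoeffding_radius and estimate_frequency, and the parameters draw, epsilon, delta, and max_samples, are hypothetical.

```python
import math
import random
from typing import Callable, Tuple


def hoeffding_radius(t: int, delta: float) -> float:
    """Confidence radius after t samples (assumed construction, not from the paper).

    By the two-sided Hoeffding bound, the estimate deviates from the true
    frequency by more than this radius at step t with probability at most
    delta / (t * (t + 1)); summed over all t >= 1 this is at most delta.
    """
    return math.sqrt(math.log(2.0 * t * (t + 1) / delta) / (2.0 * t))


def estimate_frequency(
    draw: Callable[[], bool],
    epsilon: float,
    delta: float,
    max_samples: int = 1_000_000,
) -> Tuple[float, int]:
    """Sample until the estimate has relative error at most epsilon.

    Returns (estimate, number_of_samples_used). The guarantee holds with
    probability at least 1 - delta, provided the stopping rule fires
    before max_samples is reached.
    """
    successes = 0
    for t in range(1, max_samples + 1):
        successes += 1 if draw() else 0
        p_hat = successes / t
        radius = hoeffding_radius(t, delta)
        # If |p_hat - p| <= radius, then p >= p_hat - radius, so this test
        # implies radius <= epsilon * p, i.e. relative error <= epsilon.
        if radius <= epsilon * (p_hat - radius):
            return p_hat, t
    # Budget exhausted: return the best estimate without the guarantee.
    return successes / max_samples, max_samples


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical data source: each example is positive with probability 0.5.
    estimate, used = estimate_frequency(lambda: rng.random() < 0.5, 0.2, 0.05)
    print(f"estimate={estimate:.3f} after {used} samples")
```

The point of the sequential rule is that the number of samples adapts to the unknown frequency: a rare event forces more samples before the relative-error test can pass, while a common one stops early.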

Published In

SAGA '01: Proceedings of the International Symposium on Stochastic Algorithms: Foundations and Applications
December 2001
202 pages
ISBN: 3540430253

Publisher

Springer-Verlag

Berlin, Heidelberg

Author Tags

  1. Chernoff bounds
  2. adaptive sampling
  3. data mining
  4. random sampling
  5. sequential sampling
