DOI: 10.5555/777092.777117
Article

Progressive Rademacher sampling

Published: 28 July 2002

Abstract

Sampling can make processing of large training example databases more efficient, but without knowing all of the data, or the process producing the examples, it is impossible to know in advance what sample size to choose in order to guarantee good performance. Progressive sampling has been suggested to circumvent this problem: the sample size is increased according to some schedule until an accuracy close to that obtainable from all of the data is reached. How to determine this stopping time efficiently and accurately is a central difficulty in progressive sampling. We study stopping time determination by approximating the generalization error of the hypothesis, rather than by assuming the often observed shape of the learning curve and trying to detect whether its final plateau has been reached. We use data-dependent generalization error bounds. Instead of the common cross-validation approach, we use the recently introduced Rademacher penalties, which have been observed to give good results on simple concept classes. We experiment with two-level decision trees built by the learning algorithm T2, which finds a hypothesis with minimal error with respect to the sample. The theoretically well-motivated stopping time determination based on Rademacher penalties gives results much closer to those attained using heuristics based on assumed learning curve shapes than distribution-independent estimates based on the VC dimension do.
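To make the stopping rule concrete, here is a minimal sketch, assuming a toy one-dimensional data source and a finite class of threshold classifiers in place of T2's two-level decision trees; the schedule parameters, tolerance, and the exact form of the confidence term are illustrative assumptions rather than the paper's construction. The sample grows geometrically, an empirical risk minimizer is fit at each stage, and sampling stops once the data-dependent slack (twice the Rademacher penalty plus a confidence term) certifies that the hypothesis generalizes to within the chosen tolerance.

```python
# A minimal sketch of progressive sampling with a Rademacher stopping rule.
# The data source, the finite class of threshold classifiers (standing in for
# T2's two-level decision trees), and the bound constants are illustrative
# assumptions, not the paper's exact construction.
import numpy as np

rng = np.random.default_rng(0)

def draw_sample(n):
    """Toy example generator: one feature, threshold concept at 0.5, 10% label noise."""
    x = rng.uniform(0.0, 1.0, size=n)
    y = (x > 0.5).astype(int)
    flip = rng.uniform(size=n) < 0.10
    return x, np.where(flip, 1 - y, y)

def loss_matrix(x, y, thresholds):
    """0/1 losses of each classifier h_t(x) = [x > t] on the sample; shape (|H|, n)."""
    preds = (x[None, :] > thresholds[:, None]).astype(int)
    return (preds != y[None, :]).astype(float)

def rademacher_penalty(L):
    """Data-dependent penalty: sup over hypotheses of (1/n) * sum_i sigma_i * loss_i,
    for one draw of i.i.d. Rademacher signs sigma_i in {-1, +1}."""
    n = L.shape[1]
    sigma = rng.choice([-1.0, 1.0], size=n)
    return float(np.max(L @ sigma)) / n

def progressive_rademacher_sampling(n0=100, factor=2, tol=0.05, delta=0.05, max_n=100_000):
    """Grow the sample geometrically; stop once the data-dependent slack
    (2 * penalty + confidence term) falls below `tol` or the budget is spent."""
    thresholds = np.linspace(0.0, 1.0, 101)          # finite hypothesis class H
    n = n0
    while True:
        x, y = draw_sample(n)
        L = loss_matrix(x, y, thresholds)
        emp_risks = L.mean(axis=1)
        best = int(np.argmin(emp_risks))             # empirical risk minimizer
        slack = 2 * rademacher_penalty(L) + np.sqrt(np.log(2 / delta) / (2 * n))
        print(f"n={n:6d}  empirical risk={emp_risks[best]:.3f}  slack={slack:.3f}")
        if slack <= tol or n * factor > max_n:       # bound tight enough, or budget spent
            return thresholds[best], emp_risks[best] + slack
        n *= factor                                  # geometric sampling schedule

if __name__ == "__main__":
    h, bound = progressive_rademacher_sampling()
    print(f"chosen threshold: {h:.2f}, generalization error bound: {bound:.3f}")
```

The point of this design is that the stopping decision depends only on quantities computable from the current sample, so no assumption about the shape of the learning curve and no distribution-independent VC-dimension bound is needed.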


Published In

Eighteenth National Conference on Artificial Intelligence
July 2002
1068 pages
ISBN: 0262511290

Sponsors

  • NSF: National Science Foundation
  • Alberta Informatics Circle of Research Excellence (iCORE)
  • SIGAI: ACM Special Interest Group on Artificial Intelligence
  • Naval Research Laboratory
  • AAAI: American Association for Artificial Intelligence
  • NASA Ames Research Center
  • DARPA: Defense Advanced Research Projects Agency

Publisher

American Association for Artificial Intelligence

United States


Cited By

  • MiSoSouP. Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (2018), 2130-2139. DOI: 10.1145/3219819.3219989
  • ABRA. ACM Transactions on Knowledge Discovery from Data 12(5) (2018), 1-38. DOI: 10.1145/3208351
  • A Session-Based Approach to Fast-But-Approximate Interactive Data Cube Exploration. ACM Transactions on Knowledge Discovery from Data 12(1) (2018), 1-26. DOI: 10.1145/3070648
  • ABRA. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2016), 1145-1154. DOI: 10.1145/2939672.2939770
  • Mining Frequent Itemsets through Progressive Sampling with Rademacher Averages. Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2015), 1005-1014. DOI: 10.1145/2783258.2783265
  • A dynamic adaptive sampling algorithm (DASA) for real world applications. Proceedings of the 15th International Conference on Foundations of Intelligent Systems (2005), 631-640. DOI: 10.1007/11425274_65
  • Selective Rademacher Penalization and Reduced Error Pruning of Decision Trees. The Journal of Machine Learning Research 5 (2004), 1107-1126. DOI: 10.5555/1005332.1044696
