Diverse ensembles for active learning

Authors:

Raymond J. Mooney

ICML '04: Proceedings of the twenty-first international conference on Machine learning

Page 74

https://doi.org/10.1145/1015330.1015385

Published: 04 July 2004

Abstract

Query by Committee is an effective approach to selective sampling in which disagreement amongst an ensemble of hypotheses is used to select data for labeling. Query by Bagging and Query by Boosting are two practical implementations of this approach that use Bagging and Boosting, respectively, to build the committees. For effective active learning, it is critical that the committee be made up of consistent hypotheses that are very different from each other. DECORATE is a recently developed method that directly constructs such diverse committees using artificial training data. This paper introduces ACTIVE-DECORATE, which uses DECORATE committees to select good training examples. Extensive experimental results demonstrate that, in general, ACTIVE-DECORATE outperforms both Query by Bagging and Query by Boosting.
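
The committee-disagreement idea described in the abstract can be illustrated with a small sketch. This is not ACTIVE-DECORATE and not the authors' code: it is a minimal Query-by-Bagging-style example in plain Python, under several assumptions made here for illustration. The base learner is a toy one-dimensional threshold rule (`train_stump`), disagreement is measured with vote entropy (the paper may use a different measure), and all function names and data are hypothetical.

```python
import math
import random
from collections import Counter


def train_stump(sample):
    """Fit a 1-D threshold rule; a toy stand-in for a real base learner."""
    xs = sorted({x for x, _ in sample})
    # Candidate thresholds: below all points, between neighbours, above all points.
    candidates = [xs[0] - 1.0] + [(a + b) / 2 for a, b in zip(xs, xs[1:])] + [xs[-1] + 1.0]
    best = None
    for t in candidates:
        for flip in (0, 1):
            errors = sum(1 for x, y in sample if (int(x > t) ^ flip) != y)
            if best is None or errors < best[0]:
                best = (errors, t, flip)
    _, t, flip = best
    return lambda x: int(x > t) ^ flip


def build_committee(labeled, size, rng):
    """Bagging-style committee: each member is trained on a bootstrap resample."""
    return [train_stump([rng.choice(labeled) for _ in labeled]) for _ in range(size)]


def vote_entropy(committee, x):
    """Disagreement on x: entropy of the committee's vote distribution (assumed measure)."""
    votes = Counter(member(x) for member in committee)
    n = len(committee)
    return sum(-(c / n) * math.log(c / n) for c in votes.values())


def select_query(committee, pool):
    """Pick the unlabeled example on which the committee disagrees most."""
    return max(pool, key=lambda x: vote_entropy(committee, x))


def demo(seed=0):
    """One selection step on made-up data; returns the example chosen for labeling."""
    rng = random.Random(seed)
    labeled = [(0.0, 0), (1.0, 0), (9.0, 1), (10.0, 1)]  # hypothetical labeled set
    pool = [0.5, 5.0, 9.5]                               # hypothetical unlabeled pool
    committee = build_committee(labeled, size=7, rng=rng)
    return select_query(committee, pool)
```

In a full active-learning loop, the selected example would be labeled, moved from the pool into the labeled set, and the committee rebuilt before the next selection. Swapping `build_committee` for a different committee construction method changes which examples are queried, which is the design axis the abstract compares.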

Published In

ICML '04: Proceedings of the twenty-first international conference on Machine learning

July 2004

934 pages

ISBN: 1581138385

DOI: 10.1145/1015330

Conference Chair:
Carla Brodley
Purdue University/Tufts University

Copyright © 2004 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Acceptance Rates

Overall Acceptance Rate 140 of 548 submissions, 26%
