DOI: 10.1145/1015330.1015385
Article

Diverse ensembles for active learning

Published: 04 July 2004

Abstract

    Query by Committee is an effective approach to selective sampling in which disagreement amongst an ensemble of hypotheses is used to select data for labeling. Query by Bagging and Query by Boosting are two practical implementations of this approach that use Bagging and Boosting, respectively, to build the committees. For effective active learning, it is critical that the committee be made up of consistent hypotheses that are very different from each other. DECORATE is a recently developed method that directly constructs such diverse committees using artificial training data. This paper introduces ACTIVE-DECORATE, which uses DECORATE committees to select good training examples. Extensive experimental results demonstrate that, in general, ACTIVE-DECORATE outperforms both Query by Bagging and Query by Boosting.
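The selection step of Query by Committee described above can be sketched as follows. This is a minimal illustration, not the ACTIVE-DECORATE procedure: the abstract does not say how disagreement is measured, so vote entropy is assumed here, and the names `vote_entropy`, `select_query`, and the toy threshold committee are hypothetical.

```python
import math
from collections import Counter


def vote_entropy(votes):
    """Entropy (in bits) of the committee's label votes; 0 means full agreement."""
    total = len(votes)
    counts = Counter(votes)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def select_query(committee, pool):
    """Return the unlabeled example on which the committee disagrees most.

    `committee` is a list of callables mapping an example to a predicted label
    (an assumption for this sketch); `pool` is a list of unlabeled examples.
    """
    return max(pool, key=lambda x: vote_entropy([h(x) for h in committee]))


# Hypothetical committee: three threshold classifiers on a 1-D input.
committee = [
    lambda x: int(x > 0.3),
    lambda x: int(x > 0.5),
    lambda x: int(x > 0.7),
]
pool = [0.1, 0.4, 0.9]

# The members agree on 0.1 and 0.9 but split on 0.4, so 0.4 is selected for labeling.
query = select_query(committee, pool)
print(query)
```

In a full active learning loop, the selected example would be labeled, added to the training set, and the committee rebuilt (for instance by Bagging, Boosting, or DECORATE) before the next selection.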


    Published In

    ICML '04: Proceedings of the twenty-first international conference on Machine learning
    July 2004
    934 pages
    ISBN: 1581138385
    DOI: 10.1145/1015330
    Conference Chair: Carla Brodley

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Acceptance Rates

    Overall Acceptance Rate 140 of 548 submissions, 26%
