research-article

Combining active learning and semi-supervised for improving learning performance

Authors:

Jin Wang

ISABEL '11: Proceedings of the 4th International Symposium on Applied Sciences in Biomedical and Communication Technologies

Article No.: 173, Pages 1 - 5

https://doi.org/10.1145/2093698.2093871

Published: 26 October 2011

Abstract

In many learning tasks, unlabeled samples are abundant but labeled training samples are scarce, because labeling requires the effort and expertise of human annotators. Three major techniques address this problem: semi-supervised learning, transductive learning, and active learning. Semi-supervised and transductive learning deal with methods that automatically exploit unlabeled samples, in addition to the labeled ones, to improve learning performance. Active learning deals with methods that assume the learner has control over the whole input space. Combining the advantages of semi-supervised learning and active learning is therefore a practical technique for improving learning performance. This paper proposes a general framework for combining active learning (AL) and semi-supervised learning (SSL) algorithms. It then introduces ensemble learning for combining AL and SSL algorithms, denoted ASC (AL and SSL by Committee). Finally, the ensemble learning and the confidence measure of ASC are discussed.
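The abstract gives no algorithmic detail, so the following is only a minimal sketch of one common way a committee can drive both AL and SSL; it is not the paper's ASC procedure. Everything in it is an assumption made for illustration: the 1-D threshold learners, the class-stratified bootstrap, the vote-share confidence, and the names `train_stump`, `train_committee`, `vote`, `run_committee_loop`, and `conf_threshold`. In each round the committee's least confident pool sample is sent to a human oracle (the AL step), and samples on which the vote share reaches `conf_threshold` are pseudo-labeled automatically (the SSL step).

```python
import random


def train_stump(samples):
    """Fit a 1-D threshold rule (predict 1 when x > threshold).

    Assumes `samples` holds (x, label) pairs with both labels present.
    """
    zeros = [x for x, y in samples if y == 0]
    ones = [x for x, y in samples if y == 1]
    return (max(zeros) + min(ones)) / 2.0


def train_committee(labeled, n_members, rng):
    """Train each member on a class-stratified bootstrap of the labeled set."""
    zeros = [s for s in labeled if s[1] == 0]
    ones = [s for s in labeled if s[1] == 1]
    committee = []
    for _ in range(n_members):
        boot = ([rng.choice(zeros) for _ in zeros]
                + [rng.choice(ones) for _ in ones])
        committee.append(train_stump(boot))
    return committee


def vote(committee, x):
    """Return (majority label, confidence); confidence is the agreeing share."""
    p1 = sum(1 for t in committee if x > t) / len(committee)
    return (1, p1) if p1 >= 0.5 else (0, 1.0 - p1)


def run_committee_loop(labeled, pool, oracle, rounds=3, n_members=7,
                       conf_threshold=1.0, seed=0):
    """Alternate one oracle query (AL) with confident pseudo-labeling (SSL)."""
    rng = random.Random(seed)
    labeled, pool = list(labeled), list(pool)  # leave the caller's lists intact
    queries = 0
    for _ in range(rounds):
        if not pool:
            break
        committee = train_committee(labeled, n_members, rng)
        votes = {x: vote(committee, x) for x in pool}
        # AL step: ask the oracle about the sample the committee disputes most.
        query = min(pool, key=lambda x: votes[x][1])
        pool.remove(query)
        labeled.append((query, oracle(query)))
        queries += 1
        # SSL step: accept the committee's label where agreement is high enough.
        for x in [x for x in pool if votes[x][1] >= conf_threshold]:
            pool.remove(x)
            labeled.append((x, votes[x][0]))
    return train_committee(labeled, n_members, rng), labeled, pool, queries


if __name__ == "__main__":
    # Hypothetical toy data: the true label is 1 when x > 0.5.
    seed_set = [(0.05, 0), (0.3, 0), (0.7, 1), (0.95, 1)]
    unlabeled = [0.1, 0.2, 0.4, 0.45, 0.55, 0.6, 0.8, 0.9]
    committee, labeled, remaining, queries = run_committee_loop(
        seed_set, unlabeled, oracle=lambda x: int(x > 0.5))
    print("oracle queries:", queries)
    print("labeled set size:", len(labeled), "pool left:", len(remaining))
```

In this sketch, lowering `conf_threshold` below 1.0 pseudo-labels more of the pool per round but raises the risk that a wrong committee label enters the training set, which is the trade-off a confidence measure is meant to control.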

Index Terms

Combining active learning and semi-supervised for improving learning performance
1. Computing methodologies
  1. Machine learning
    1. Learning paradigms
      1. Supervised learning
        Supervised learning by classification
    2. Machine learning approaches
      1. Classification and regression trees

Published In

ISABEL '11: Proceedings of the 4th International Symposium on Applied Sciences in Biomedical and Communication Technologies

October 2011

949 pages

ISBN:9781450309134

DOI:10.1145/2093698

Conference Chair: Simone Frattasi

Program Chair: Nicola Marchetti

Copyright © 2011 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Universitat Pompeu Fabra
IEEE
Technical University of Catalonia (UPC), Spain
River Publishers
CTTC: Technological Center for Telecommunications of Catalonia
CTIF: Kyranova Ltd, Center for TeleInFrastruktur

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

National Science Foundation

Conference

ISABEL '11

ISABEL '11: International Symposium on Applied Sciences in Biomedical and Communication Technologies

October 26 - 29, 2011

Barcelona, Spain
