Active learning in multimedia annotation and retrieval: A survey

Published: 24 February 2011

Abstract

Active learning is a machine learning technique that selects the most informative samples for labeling and uses them as training data. It has been widely explored in the multimedia research community because it reduces human annotation effort. In this article, we survey efforts to leverage active learning in multimedia annotation and retrieval. We focus mainly on two application domains: image/video annotation and content-based image retrieval. We first briefly introduce the principle of active learning and then analyze the sample selection criteria. We categorize the existing sample selection strategies used in multimedia annotation and retrieval into five criteria: risk reduction, uncertainty, diversity, density, and relevance. We then introduce several classification models used in active-learning-based multimedia annotation and retrieval, including semi-supervised learning, multilabel learning, and multiple-instance learning. We also discuss several future trends in this research direction, in particular cost analysis of human annotation and large-scale interactive multimedia annotation.
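The selection principle described in the abstract can be illustrated with a minimal pool-based loop. The sketch below is not taken from the article: it assumes one-dimensional samples, a toy nearest-centroid classifier, and an uncertainty criterion defined as the smallest gap between a sample's distances to the two class centroids. The names `centroid`, `margin`, `select_most_uncertain`, `active_learning`, and `oracle` are hypothetical and chosen only for illustration.

```python
def centroid(points):
    """Mean of a non-empty list of 1-D samples."""
    return sum(points) / len(points)


def margin(x, pos_mean, neg_mean):
    """Gap between distances to the two class centroids.

    A small value means the toy classifier is unsure about x.
    """
    return abs(abs(x - pos_mean) - abs(x - neg_mean))


def select_most_uncertain(pool, labeled):
    """Pick the unlabeled sample with the smallest margin.

    Assumption: `labeled` already holds at least one sample of each class.
    """
    pos_mean = centroid([x for x, y in labeled if y == 1])
    neg_mean = centroid([x for x, y in labeled if y == 0])
    return min(pool, key=lambda x: margin(x, pos_mean, neg_mean))


def active_learning(pool, oracle, seed_labeled, budget):
    """Query the oracle (the human annotator) for `budget` samples.

    Each round re-estimates the centroids from all labels gathered so far,
    then asks for the label of the currently most uncertain sample.
    """
    pool = list(pool)
    labeled = list(seed_labeled)
    for _ in range(budget):
        if not pool:
            break
        x = select_most_uncertain(pool, labeled)
        pool.remove(x)
        labeled.append((x, oracle(x)))
    return labeled
```

With a hypothetical oracle that labels a sample positive when it is at least 5.0, the loop spends its queries on samples near that boundary rather than on samples far from it. This is the intuition behind the uncertainty criterion; the other four criteria named in the abstract would replace `margin` with a different scoring function.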

    Published In

    ACM Transactions on Intelligent Systems and Technology  Volume 2, Issue 2
    February 2011
    175 pages
    ISSN:2157-6904
    EISSN:2157-6912
    DOI:10.1145/1899412

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 24 February 2011
    Accepted: 01 August 2010
    Revised: 01 June 2010
    Received: 01 February 2010
    Published in TIST Volume 2, Issue 2

    Author Tags

    1. Active learning
    2. content-based image retrieval
    3. image annotation
    4. model learning
    5. sample selection
    6. video annotation

    Qualifiers

    • Research-article
    • Survey
    • Refereed
