Abstract
We propose a novel active learning strategy, based on the compression framework of [9], for label ranking functions which, given an input instance, predict a total order over a predefined set of alternatives. Our approach is theoretically motivated by an extension to ranking and active learning of Kääriäinen’s generalization bounds using unlabeled data [7], initially developed in the context of classification. The bounds we obtain suggest a selective sampling strategy, provided that a sufficiently large, yet reasonably sized, initial labeled dataset is available. Experiments on Information Retrieval corpora from automatic text summarization and question answering show that the proposed approach substantially reduces the labeling effort in comparison to random and heuristic-based sampling strategies.
References
Amini, M., Usunier, N., Gallinari, P.: Automatic text summarization based on word clusters and ranking algorithms. In: Proc. of the 27th ECIR (2005)
Brinker, K.: Active learning of label ranking functions. In: Proc. of the 21st International Conference on Machine Learning (2004)
Brinker, K., Fürnkranz, J., Hüllermeier, E.: Label ranking by learning pairwise preferences. Journal of Machine Learning Research (2005)
Chapelle, O.: Active learning for Parzen window classifier. In: AISTATS (2005)
Crammer, K., Singer, Y.: A family of additive online algorithms for category ranking. Journal of Machine Learning Research 3(6), 1025–1058 (2003)
Floyd, S., Warmuth, M.: Sample compression, learnability, and the Vapnik-Chervonenkis dimension. Machine Learning 21(3), 269–304 (1995)
Kääriäinen, M.: Generalization error bounds using unlabeled data. In: Proceedings of the 18th Annual Conference on Learning Theory, pp. 127–142 (2005)
Laviolette, F., Marchand, M., Shah, M.: Margin-sparsity trade-off for the set covering machine. In: Proc. of the 16th ECML, pp. 206–217 (2005)
Littlestone, N., Warmuth, M.: Relating data compression and learnability. Technical Report, University of California (1986)
Marcu, D.: The automatic construction of large-scale corpora for summarization research. In: Proceedings of the 22nd ACM SIGIR, pp. 137–144 (1999)
Tong, S., Koller, D.: Support vector machine active learning with applications to text classification. J. Mach. Learn. Res. 2, 45–66 (2002)
Usunier, N., Amini, M., Gallinari, P.: Boosting weak ranking functions to enhance passage retrieval for question answering. In: IR4QA-workshop, SIGIR (2004)
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Amini, M., Usunier, N., Laviolette, F., Lacasse, A., Gallinari, P. (2006). A Selective Sampling Strategy for Label Ranking. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds) Machine Learning: ECML 2006. ECML 2006. Lecture Notes in Computer Science, vol 4212. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11871842_7
DOI: https://doi.org/10.1007/11871842_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45375-8
Online ISBN: 978-3-540-46056-5
eBook Packages: Computer Science, Computer Science (R0)