Abstract
This work focuses on multiple instance learning (MIL) with sparse positive bags, which we call sparse MIL. A structural representation is presented to encode both instances and bags. This representation leads to a non-i.i.d. MIL algorithm, miStruct, which uses a structural similarity to compare bags. Furthermore, MIL with this representation is shown to be equivalent to a document classification problem. Document classification faces the same difficulty: only a few paragraphs or words are useful in revealing the category of a document. Using the TF-IDF representation, which has excellent empirical performance in document classification, we propose the miDoc method. The proposed methods achieve significantly higher accuracy and AUC (area under the ROC curve) than state-of-the-art methods on a large number of sparse MIL problems, and the document classification analogy explains their efficacy in sparse MIL problems.
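To make the document analogy concrete, the following is a minimal sketch of TF-IDF weighting applied to bags, not the miDoc method itself, whose details are not given in this abstract. It assumes each instance has already been mapped to a discrete token id (a hypothetical preprocessing step), so that a bag reads like a document made of words. The function name `tf_idf` and the toy data are illustrative only, and the weighting is the textbook form tf × log(N / df).

```python
import math
from collections import Counter


def tf_idf(bags):
    """Return one {token: weight} dict per bag, using tf * log(N / df).

    Assumption: each bag is a list of discrete token ids, one per instance.
    """
    n_bags = len(bags)
    # Document frequency: number of bags in which each token occurs at least once.
    df = Counter(token for bag in bags for token in set(bag))
    vectors = []
    for bag in bags:
        counts = Counter(bag)
        vectors.append({
            token: (count / len(bag)) * math.log(n_bags / df[token])
            for token, count in counts.items()
        })
    return vectors


# Toy data: token 0 occurs in every bag, token 7 only in the first bag.
bags = [[0, 0, 7], [0, 1], [0, 1, 1]]
vectors = tf_idf(bags)
```

In this toy example, token 0 appears in every bag and therefore receives zero weight, while the rare token 7 keeps a positive weight. Down-weighting components shared by most bags is the intuition the document analogy relies on for sparse positive bags.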
Notes
One instance can appear in more than one bag, e.g., \({x_{1}^{1}}\) and \({x_{2}^{1}}\) can have the same values.
The moralization used here differs from the one for DAGs in two ways: first, cycles are permitted in G = (X, E); second, multiple marriage edges can exist between two instances. We are therefore slightly abusing this concept.
This means that the component of every \(z_{i}\) corresponding to that instance is non-zero for most bags.
For convenience of presentation, we report AUC as a percentage.
Acknowledgments
This research was supported by the National Natural Science Foundation of China under Grant Nos. 61300163 and 61422203.
Cite this article
Yan, S., Zhu, X., Liu, G. et al. Sparse multiple instance learning as document classification. Multimed Tools Appl 76, 4553–4570 (2017). https://doi.org/10.1007/s11042-016-3567-z
DOI: https://doi.org/10.1007/s11042-016-3567-z