Multiple-instance Learning from Triplet Comparison Bags

Published: 12 February 2024, ACM Transactions on Knowledge Discovery from Data, Volume 18, Issue 4
    Abstract

    Multiple-instance learning (MIL) addresses the problem where training instances are grouped into bags and a binary (positive or negative) label is provided for each bag. Most existing MIL studies require fully labeled bags to train an effective classifier, yet such data can be hard to collect in many real-world scenarios due to the high cost of the labeling process. Fortunately, unlike fully labeled data, triplet comparison data can be collected in a more accurate and human-friendly way. Therefore, in this article, we investigate for the first time MIL from only triplet comparison bags, where a triplet (Xa, Xb, Xc) carries the weak supervision that bag Xa is more similar to Xb than to Xc. To solve this problem, we propose to train a bag-level classifier within the empirical risk minimization framework and theoretically provide a generalization error bound. We also show that a convex formulation can be obtained only when specific convex binary losses, such as the square loss and the double hinge loss, are used. Extensive experiments validate that our proposed method significantly outperforms other baselines.
    Appendices

    A Generation Process of Triplet Comparison Bags

    Recall the assumption that the three bags in a triplet are sampled independently. Therefore, for a triplet \((X_{a}, X_{b}, X_{c})\) , if the first bag is more similar to the second bag than to the third bag, the bag labels \((Y_a,Y_b,Y_c)\) can only be one of the following cases:
    \(\begin{eqnarray*} \nonumber \mathcal {Y}_{1} = \lbrace (+1,+1,+1),(+1,+1,-1),(+1,-1,-1),(-1,+1,+1),(-1,-1,+1),(-1,-1,-1)\rbrace . \end{eqnarray*}\)
    Otherwise, the first bag is more similar to the third bag than to the second bag, and in this case, \((Y_a,Y_b,Y_c)\) must be one of the following cases:
    \(\begin{eqnarray*} \nonumber \mathcal {Y}_{2} = \lbrace (+1,-1,+1),(-1,+1,-1)\rbrace . \end{eqnarray*}\)
    According to the above label sets \(\mathcal {Y}_{1}\) and \(\mathcal {Y}_{2}\) , we can collect two distinct types of datasets as follows:
    \(\begin{eqnarray*} \nonumber \mathcal {D}_{1} = \lbrace (X_{a},X_{b},X_{c})|(Y_a,Y_b,Y_c)\in \mathcal {Y}_{1}\rbrace , \quad \mathcal {D}_{2} = \lbrace (X_{a},X_{b},X_{c})|(Y_a,Y_b,Y_c)\in \mathcal {Y}_{2}\rbrace . \end{eqnarray*}\)
    The two types of datasets \(\mathcal {D}_{1}\) and \(\mathcal {D}_{2}\) can be considered to be generated from the following underlying distributions:
    \(\begin{eqnarray*} \nonumber p_{1}(X_{a},X_{b},X_{c}) &= \frac{p(X_{a},X_{b},X_{c}, (Y_a,Y_b,Y_c)\in \mathcal {Y}_{1})}{\theta _{T}}, \\ \nonumber p_{2}(X_{a},X_{b},X_{c}) &= \theta _{+}p_{+}(X_{a})p_{-}(X_{b})p_{+}(X_{c}) + \theta _{-}p_{-}(X_{a})p_{+}(X_{b})p_{-}(X_{c}), \end{eqnarray*}\)
    where \(\theta _{T} = 1- \theta _{+}\theta _{-}\) , \(\theta _{+} = p(y=+1)\) , \(\theta _{-} = p(y=-1)\) , \(p_{+}(X)=p(X|y=+1)\) , and \(p_{-}(X)=p(X|y=-1)\) . Then, we have
    \(\begin{eqnarray*} \nonumber \mathcal {D}_1 = \lbrace (X_{1,a},X_{1,b},X_{1,c})\rbrace ^{m_1}\sim p_{1}(X_{a},X_{b},X_{c}), \quad \mathcal {D}_2 = \lbrace (X_{2,a},X_{2,b},X_{2,c})\rbrace ^{m_2}\sim p_{2}(X_{a},X_{b},X_{c}). \end{eqnarray*}\)
    Furthermore, we denote the pointwise data collected from \(\mathcal {D}_1\) and \(\mathcal {D}_2\) by ignoring the triplet comparison relation as \(\mathcal {D}_{1,a} = \lbrace X_{1,a}\rbrace ^{m_{1}}\) , \(\mathcal {D}_{1,b} = \lbrace X_{1,b}\rbrace ^{m_{1}}\) , \(\mathcal {D}_{1,c} = \lbrace X_{1,c}\rbrace ^{m_{1}}\) , \(\mathcal {D}_{2,a} = \lbrace X_{2,a}\rbrace ^{m_{2}}\) , \(\mathcal {D}_{2,b} = \lbrace X_{2,b}\rbrace ^{m_{2}}\) and \(\mathcal {D}_{2,c} = \lbrace X_{2,c}\rbrace ^{m_{2}}\) . From Theorem 1 in Cui et al. [12], samples in \(\mathcal {D}_{1,a}\) , \(\mathcal {D}_{1,c}\) , \(\mathcal {D}_{2,a}\) and \(\mathcal {D}_{2,c}\) are independently drawn from
    \(\begin{eqnarray*} \nonumber \tilde{p}_{1}(X) = \theta _{+}p_{+}(X) + \theta _{-}p_{-}(X), \end{eqnarray*}\)
    samples in \(\mathcal {D}_{1,b}\) are independently drawn from
    \(\begin{eqnarray*} \nonumber \tilde{p}_{2}(X) = \frac{(\theta _{+}^3+2\theta _{+}^2\theta _{-})p_{+}(X) + (2\theta _{+}\theta _{-}^2+ \theta _{-}^3)p_{-}(X)}{\theta _{T}}, \end{eqnarray*}\)
    and samples in \(\mathcal {D}_{2,b}\) are independently drawn from
    \(\begin{eqnarray*} \nonumber \tilde{p}_{3}(X) = \theta _{-}p_{+}(X) + \theta _{+}p_{-}(X). \end{eqnarray*}\)
    These results indicate that, from triplet comparison data, we can essentially obtain samples drawn independently from three different distributions. We then denote the three aggregated datasets (from the respective distributions) as
    \(\begin{eqnarray*} \nonumber \tilde{\mathcal {D}}_{1} =\lbrace \tilde{X}_{i}^{1}\rbrace ^{n_1}_{i=1}= \mathcal {D}_{1,a}\cup \mathcal {D}_{1,c}\cup \mathcal {D}_{2,a}\cup \mathcal {D}_{2,c}, \quad \tilde{\mathcal {D}}_{2} =\lbrace \tilde{X}_{i}^{2}\rbrace ^{n_2}_{i=1} = \mathcal {D}_{1,b}, \quad \tilde{\mathcal {D}}_{3} =\lbrace \tilde{X}_{i}^{3}\rbrace ^{n_3}_{i=1} = \mathcal {D}_{2,b}, \end{eqnarray*}\)
    where
    \(\begin{eqnarray*} \nonumber \tilde{\mathcal {D}}_{1} \sim \tilde{p}_{1}(X), \quad \tilde{\mathcal {D}}_{2} \sim \tilde{p}_{2}(X), \quad \tilde{\mathcal {D}}_{3} \sim \tilde{p}_{3}(X). \end{eqnarray*}\)
    Letting \(C = \frac{\theta _{+}^3+2\theta _{+}^2\theta _{-}}{\theta _{T}}\) and \(D = \frac{2\theta _{+}\theta _{-}^2+ \theta _{-}^3}{\theta _{T}}\) , we can express the relationship among these densities as
    \(\begin{eqnarray*} \nonumber \begin{bmatrix} \tilde{p}_{1}(X)\\ \tilde{p}_{2}(X)\\ \tilde{p}_{3}(X) \end{bmatrix} = \begin{bmatrix} \theta _{+} &\theta _{-}\\ C & D\\ \theta _{-} &\theta _{+} \end{bmatrix}\begin{bmatrix} p_{+}(X)\\ p_{-}(X) \end{bmatrix}. \end{eqnarray*}\)
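    To make the generation process concrete, the following minimal simulation sketch (ours, not from the paper; the class prior \(\theta _{+}=0.6\) and the sample size are arbitrary assumptions) draws i.i.d. label triples, splits them into \(\mathcal {D}_{1}\) and \(\mathcal {D}_{2}\) according to whether the labels fall in \(\mathcal {Y}_{1}\) or \(\mathcal {Y}_{2}\) , aggregates them into \(\tilde{\mathcal {D}}_{1}\) , \(\tilde{\mathcal {D}}_{2}\) , and \(\tilde{\mathcal {D}}_{3}\) , and checks that the empirical positive-class proportions match the coefficients \(\theta _{+}\) , \(C\) , and \(\theta _{-}\) of \(p_{+}(X)\) in the matrix above.

    import numpy as np

    # Minimal sketch (not the authors' code): simulate the label-level generation
    # process of Appendix A and check the positive-class weights of p~1, p~2, p~3.
    rng = np.random.default_rng(0)
    theta_pos = 0.6                      # assumed class prior theta_+
    theta_neg = 1.0 - theta_pos          # theta_-
    theta_T = 1.0 - theta_pos * theta_neg

    # Draw i.i.d. label triples (Y_a, Y_b, Y_c).
    m = 200_000
    labels = rng.choice([+1, -1], size=(m, 3), p=[theta_pos, theta_neg])

    # A triple lies in Y_2 exactly when it has the pattern (+1,-1,+1) or (-1,+1,-1).
    in_Y2 = (labels[:, 0] == labels[:, 2]) & (labels[:, 0] != labels[:, 1])
    D1, D2 = labels[~in_Y2], labels[in_Y2]

    # Aggregate as in Appendix A: D~1 = first/third bags of both datasets,
    # D~2 = middle bags of D_1, D~3 = middle bags of D_2.
    D_tilde1 = np.concatenate([D1[:, 0], D1[:, 2], D2[:, 0], D2[:, 2]])
    D_tilde2, D_tilde3 = D1[:, 1], D2[:, 1]

    # Theoretical positive-class weights of p~1, p~2, p~3.
    C = (theta_pos**3 + 2 * theta_pos**2 * theta_neg) / theta_T
    print("p~1:", (D_tilde1 == 1).mean(), "vs", theta_pos)
    print("p~2:", (D_tilde2 == 1).mean(), "vs", C)
    print("p~3:", (D_tilde3 == 1).mean(), "vs", theta_neg)

    Because each bag is drawn from \(p_{+}(X)\) or \(p_{-}(X)\) according to its label, matching these label proportions is equivalent to matching the mixture weights of the three densities.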

    B Proof of Theorem 1

    Recall that by using a loss function that satisfies the linear-odd condition, \(\widehat{R}_{\mathrm{Trip}}(g)\) can also be represented as
    \(\begin{align*} \widehat{R}_{\mathrm{Trip}}(g) =&\,\, \frac{1}{n_1}\sum \limits _{i=1}^{n_1}\Big ((\lambda _1+\lambda _2)\ell _+\left(g\left(X_i^1\right)\right)+\lambda _2g(X_i^1)\Big) +\frac{1}{n_2}\sum \limits _{i=1}^{n_2}\Big ((\lambda _3+\lambda _4)\ell _+\left(g\left(X_i^2\right)\right)+\lambda _4 g\left(X_i^2\right)\!\Big)\\ &+\frac{1}{n_3}\sum \limits _{i=1}^{n_3}\Big ((\lambda _5+\lambda _6)\ell _-\left(g\left(X_i^3\right)\right)-\lambda _5 g\left(X_i^3\right)\!\Big). \end{align*}\)
    In this way, we can represent \({R}_{\mathrm{Trip}}(g)\) as
    \(\begin{align*} {R}_{\mathrm{Trip}}(g) =&\,\, \mathbb {E}_{\widetilde{p}_{1}(X)}\Big [ (\lambda _1+\lambda _2)\ell _+(g(X^1))+\lambda _2 g(X^1)\Big ] +\mathbb {E}_{\widetilde{p}_{2}(X)}\Big [ (\lambda _3+\lambda _4)\ell _+(g(X^2))+\lambda _4 g(X^2)\Big ] \\ &+\mathbb {E}_{\widetilde{p}_{3}(X)}\Big [ (\lambda _5+\lambda _6)\ell _-(g(X^3))-\lambda _5 g(X^3)\Big ], \end{align*}\)
    where we assume that the collected data \(\lbrace X^1_{i}\rbrace _{i=1}^{n_1}\) , \(\lbrace X^2_{i}\rbrace _{i=1}^{n_2}\) , and \(\lbrace X^3_{i}\rbrace _{i=1}^{n_3}\) are sampled independently from \(\widetilde{p}_{1}(X)\) , \(\widetilde{p}_{2}(X)\) , and \(\widetilde{p}_{3}(X)\) , respectively. Let us further introduce
    \(\begin{eqnarray*} \nonumber \widehat{R}_{1}(g) =\frac{1}{n_1}\sum \limits _{i=1}^{n_1}\left((\lambda _1+\lambda _2)\ell _+\left(g\left(X_i^1\right)\!\right) +\lambda _2g\left(X_i^1\right)\!\right)\!, \quad R_{1}(g) =\mathbb {E}_{\widetilde{p}_{1}(X)}\Big [(\lambda _1+\lambda _2)\ell _+(g(X^1))+\lambda _2 g(X^1)\Big ],\\ \nonumber \widehat{R}_{2}(g) = \frac{1}{n_2}\sum \limits _{i=1}^{n_2}\left((\lambda _3+\lambda _4)\ell _+\left(g\left(X_i^2\right)\!\right)+\lambda _4 g\left(X_i^2\right)\!\right)\!, \quad R_{2}(g) =\mathbb {E}_{\widetilde{p}_{2}(X)}\Big [ (\lambda _3+\lambda _4)\ell _+(g(X^2))+\lambda _4 g(X^2)\Big ],\\ \nonumber \widehat{R}_{3}(g) = \frac{1}{n_3}\sum \limits _{i=1}^{n_3}\left((\lambda _5+\lambda _6)\ell _-\left(g\left(X_i^3\right)\right)-\lambda _5 g\left(X_i^3\right)\!\right)\!, \quad R_{3}(g) =\mathbb {E}_{\widetilde{p}_{3}(X)}\Big [ (\lambda _5+\lambda _6)\ell _-(g(X^3))-\lambda _5 g(X^3)\Big ]. \end{eqnarray*}\)
    In this way, we have
    \(\begin{eqnarray*} \nonumber \widehat{R}_{\mathrm{Trip}}(g) =\widehat{R}_{1}(g) + \widehat{R}_{2}(g) + \widehat{R}_{3}(g),\quad {R}_{\mathrm{Trip}}(g) = R_{1}(g) + R_{2}(g) + R_{3}(g). \end{eqnarray*}\)
    Thus,
    \(\begin{eqnarray*} \nonumber \sup _{g\in \mathcal {G}}\left|{R}_{\mathrm{Trip}}(g)-\widehat{R}_{\mathrm{Trip}}(g)\right|\le \sup _{g\in \mathcal {G}}\left|{R}_{1}(g)-\widehat{R}_{1}(g)\right|+\sup _{g\in \mathcal {G}}\left|{R}_{2}(g)-\widehat{R}_{2}(g)\right| +\sup _{g\in \mathcal {G}}\left|{R}_{3}(g)-\widehat{R}_{3}(g)\right|. \end{eqnarray*}\)
    Hence, the problem becomes how to find an upper bound for each term on the right-hand side of the inequality.
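    As a side note, the decomposition \(\widehat{R}_{\mathrm{Trip}}(g) =\widehat{R}_{1}(g) + \widehat{R}_{2}(g) + \widehat{R}_{3}(g)\) is straightforward to compute once \(g\) has been evaluated on the three aggregated datasets. The sketch below (an assumed interface, not the authors' implementation; the coefficients and the hinge-type surrogate are placeholders for illustration) mirrors the three empirical terms defined above.

    import numpy as np

    # Empirical risk decomposition R^_Trip(g) = R^_1(g) + R^_2(g) + R^_3(g).
    # scores_k holds g(X) on the aggregated dataset D~_k, lam = (lambda_1, ..., lambda_6),
    # and ell_plus / ell_minus are the surrogate losses l_+ and l_- from the main text.
    def risk_trip(scores_1, scores_2, scores_3, lam, ell_plus, ell_minus):
        l1, l2, l3, l4, l5, l6 = lam
        r1 = np.mean((l1 + l2) * ell_plus(scores_1) + l2 * scores_1)   # R^_1(g)
        r2 = np.mean((l3 + l4) * ell_plus(scores_2) + l4 * scores_2)   # R^_2(g)
        r3 = np.mean((l5 + l6) * ell_minus(scores_3) - l5 * scores_3)  # R^_3(g)
        return r1 + r2 + r3

    # Illustration only: random scores, placeholder coefficients, hinge-type surrogate.
    rng = np.random.default_rng(1)
    g1, g2, g3 = (rng.normal(size=n) for n in (100, 80, 90))
    lam = (0.5, 0.1, 0.3, 0.2, 0.4, 0.6)
    hinge = lambda z: np.maximum(0.0, 1.0 - z)
    print(risk_trip(g1, g2, g3, lam, hinge, lambda z: hinge(-z)))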
    Lemma 1.
    With the introduced definitions and conditions in Theorem 1, for any \(\delta \gt 0\) , with probability at least \(1-\delta\) , we have
    \(\begin{eqnarray*} \nonumber \sup _{g\in \mathcal {G}}\left| R_{1}(g)-\widehat{R}_{1}(g)\right| \le & (\left|\lambda _1\right|+\left|\lambda _2\right|)\left(\frac{2C_{\mathcal {G}}}{\sqrt {n_1}}+C_{\boldsymbol {w}}C_{\boldsymbol {\phi }}\sqrt {\frac{\log \frac{2}{\delta }}{2n_1}}\right)\!. \end{eqnarray*}\)
    Proof.
    First, it is easy to verify that the double hinge loss \(\ell _{\mathrm{DH}}\) is 1-Lipschitz. If an example in \(\widehat{R}_{1}(g)\) is replaced by another arbitrary example, then the change of \(\sup _{g\in \mathcal {G}}\big (R_{1}(g)-\widehat{R}_{1}(g)\big)\) is no greater than \((\left|\lambda _1\right|+\left|\lambda _2\right|)C_{\boldsymbol {w}}C_{\boldsymbol {\phi }}/n_{1}\) . Then, by applying McDiarmid’s inequality [28], for any \(\delta \gt 0\) , with probability at least \(1-\frac{\delta }{2}\) ,
    \(\begin{eqnarray*} \nonumber \sup _{g\in \mathcal {G}}\big (R_{1}(g)-\widehat{R}_{1}(g)\big) &\le \mathbb {E}\Big [\sup _{g\in \mathcal {G}}\big (R_{1}(g)-\widehat{R}_{1}(g)\big)\Big ]+ (\left|\lambda _1\right|+\left|\lambda _2\right|)C_{\boldsymbol {w}}C_{\boldsymbol {\phi }}\sqrt {\frac{\log \frac{2}{\delta }}{2n_1}}. \end{eqnarray*}\)
    Moreover, it is routine [30] to show that
    \(\begin{eqnarray*} \nonumber \mathbb {E}\Big [\sup _{g\in \mathcal {G}}\big (R_{1}(g)-\widehat{R}_{1}(g)\big)\Big ]\le 2(\left|\lambda _1\right|+\left|\lambda _2\right|)\mathfrak {R}_{n_1}(\mathcal {G}), \end{eqnarray*}\)
    where we have used Talagrand’s lemma (Lemma 4.2 in Mohri et al. [30]), i.e., \(\mathfrak {R}_{n}(\ell \circ \mathcal {G})\le \rho \mathfrak {R}_n(\mathcal {G})\) if \(\ell\) is a \(\rho\) -Lipschitz loss function. By further applying \(\mathfrak {R}_n(\mathcal {G})\le C_{\mathcal {G}}/\sqrt {n}\) , we have
    \(\begin{eqnarray*} \nonumber \sup _{g\in \mathcal {G}}\big (R_{1}(g)-\widehat{R}_{1}(g)\big) \le & (\left|\lambda _1\right|+\left|\lambda _2\right|)\left(\frac{2C_{\mathcal {G}}}{\sqrt {n_1}}+C_{\boldsymbol {w}}C_{\boldsymbol {\phi }}\sqrt {\frac{\log \frac{2}{\delta }}{2n_1}}\right)\!. \end{eqnarray*}\)
    By further taking into account the other side \(\sup _{g\in \mathcal {G}}\big (\widehat{R}_{1}(g)-R_{1}(g)\big)\) , we have for any \(\delta \gt 0\) , with probability at least \(1-\delta\) ,
    \(\begin{eqnarray*} \nonumber \sup _{g\in \mathcal {G}}\left|R_{1}(g)-\widehat{R}_{1}(g)\right| \le & (\left|\lambda _1\right|+\left|\lambda _2\right|)\left(\frac{2C_{\mathcal {G}}}{\sqrt {n_1}}+C_{\boldsymbol {w}}C_{\boldsymbol {\phi }}\sqrt {\frac{\log \frac{2}{\delta }}{2n_1}}\right)\!, \end{eqnarray*}\)
    which completes the proof of Lemma 1. □
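    To make the rate explicit, the following small sketch (with hypothetical values for the constants \(\lambda _1\) , \(\lambda _2\) , \(C_{\mathcal {G}}\) , \(C_{\boldsymbol {w}}\) , and \(C_{\boldsymbol {\phi }}\) ) evaluates the right-hand side of Lemma 1, showing its \(O(1/\sqrt {n_1})\) decay.

    import numpy as np

    # Right-hand side of Lemma 1 with hypothetical constants, as a function of n_1.
    def lemma1_bound(n1, delta, lam1, lam2, C_G, C_w, C_phi):
        coeff = abs(lam1) + abs(lam2)
        return coeff * (2 * C_G / np.sqrt(n1)
                        + C_w * C_phi * np.sqrt(np.log(2 / delta) / (2 * n1)))

    for n1 in (100, 1_000, 10_000):
        print(n1, lemma1_bound(n1, delta=0.05, lam1=0.5, lam2=0.1,
                               C_G=1.0, C_w=1.0, C_phi=1.0))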
    Lemma 2.
    With the introduced definitions and conditions in Theorem 1, for any \(\delta \gt 0\) , with probability at least \(1-\delta\) , we have
    \(\begin{eqnarray*} \nonumber \sup _{g\in \mathcal {G}}\left| R_{2}(g)-\widehat{R}_{2}(g)\right| \le & (\left|\lambda _3\right|+\left|\lambda _4\right|)\left(\frac{2C_{\mathcal {G}}}{\sqrt {n_2}}+C_{\boldsymbol {w}}C_{\boldsymbol {\phi }}\sqrt {\frac{\log \frac{2}{\delta }}{2n_2}}\right)\!. \end{eqnarray*}\)
    Lemma 3.
    With the introduced definitions and conditions in Theorem 1, for any \(\delta \gt 0\) , with probability at least \(1-\delta\) , we have
    \(\begin{eqnarray*} \nonumber \sup _{g\in \mathcal {G}}\left| R_{3}(g)-\widehat{R}_{3}(g)\right| \le & (\left|\lambda _5\right|+\left|\lambda _6\right|)\left(\frac{2C_{\mathcal {G}}}{\sqrt {n_3}}+C_{\boldsymbol {w}}C_{\boldsymbol {\phi }}\sqrt {\frac{\log \frac{2}{\delta }}{2n_3}}\right)\!. \end{eqnarray*}\)
    Lemmas 2 and 3 can be proved in the same way as Lemma 1; hence, we omit their proofs. By combining Lemmas 1, 2, and 3, Theorem 1 is immediately proved. \(\Box\)
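    The two properties of the double hinge loss used above can also be checked numerically. The sketch below (ours; it assumes the double hinge loss \(\ell _{\mathrm{DH}}(z)=\max (-z,\max (0,\tfrac{1}{2}-\tfrac{1}{2}z))\) of du Plessis et al. [14]) verifies that the loss is 1-Lipschitz and satisfies the linear-odd condition \(\ell (z)-\ell (-z)=-z\) , which is what allows \(\widehat{R}_{\mathrm{Trip}}(g)\) to be rewritten as at the start of this appendix.

    import numpy as np

    # Double hinge loss l_DH(z) = max(-z, max(0, 1/2 - z/2)).
    def double_hinge(z):
        z = np.asarray(z, dtype=float)
        return np.maximum(-z, np.maximum(0.0, 0.5 - 0.5 * z))

    z = np.linspace(-5.0, 5.0, 10_001)

    # Linear-odd condition: l(z) - l(-z) + z should vanish identically.
    assert np.allclose(double_hinge(z) - double_hinge(-z) + z, 0.0)

    # 1-Lipschitz: finite-difference slopes never exceed 1 in magnitude.
    slopes = np.diff(double_hinge(z)) / np.diff(z)
    assert np.all(np.abs(slopes) <= 1.0 + 1e-12)
    print("double hinge loss: linear-odd and 1-Lipschitz checks passed")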

    References

    [1]
    Jaume Amores. 2013. Multiple instance classification: Review, taxonomy, and comparative study. Artific. Intell. 201 (2013), 81–105.
    [2]
    Martin S. Andersen, Joachim Dahl, and Lieven Vandenberghe. 2013. CVXOPT: Python software for convex optimization. Retrieved from https://cvxopt.org
    [3]
    Stuart Andrews, Ioannis Tsochantaridis, and Thomas Hofmann. 2002. Support vector machines for multiple-instance learning. In Proceedings of the NeurIPS. 577–584.
    [4]
    Boris Babenko, Ming-Hsuan Yang, and Serge Belongie. 2009. Visual tracking with online multiple instance learning. In Proceedings of the CVPR. 983–990.
    [5]
    Han Bao, Gang Niu, and Masashi Sugiyama. 2018. Classification from pairwise similarity and unlabeled data. In Proceedings of the ICML. 452–461.
    [6]
    Han Bao, Tomoya Sakai, Issei Sato, and Masashi Sugiyama. 2018. Convex formulation of multiple instance learning from positive and unlabeled bags. Neural Netw. 105 (2018), 132–141.
    [7]
    Peter L. Bartlett and Shahar Mendelson. 2002. Rademacher and Gaussian complexities: Risk bounds and structural results. J. Mach. Learn. Res. 3, 11 (2002), 463–482.
    [8]
    Yuzhou Cao, Lei Feng, Yitian Xu, Bo An, Gang Niu, and Masashi Sugiyama. 2021. Learning from similarity-confidence data. In Proceedings of the ICML. 1272–1282.
    [9]
    Marc-André Carbonneau, Veronika Cheplygina, Eric Granger, and Ghyslain Gagnon. 2018. Multiple instance learning: A survey of problem characteristics and applications. Pattern Recogn. 77 (2018), 329–353.
    [10]
    Yixin Chen, Jinbo Bi, and James Ze Wang. 2006. MILES: Multiple-instance learning via embedded instance selection. IEEE Trans. Pattern Anal. Mach. Intell. 28, 12 (2006), 1931–1947.
    [11]
    Tarin Clanuwat, Mikel Bober-Irizar, Asanobu Kitamoto, Alex Lamb, Kazuaki Yamamoto, and David Ha. 2018. Deep learning for classical Japanese literature. arXiv:1812.01718.
    [12]
    Zhenghang Cui, Nontawat Charoenphakdee, Issei Sato, and Masashi Sugiyama. 2020. Classification from triplet comparison data. Neural Comput. 32, 3 (2020), 659–681.
    [13]
    Thomas G. Dietterich, Richard H. Lathrop, and Tomás Lozano-Pérez. 1997. Solving the multiple instance problem with axis-parallel rectangles. Artific. Intell. 89, 1-2 (1997), 31–71.
    [14]
    Marthinus Christoffel du Plessis, Gang Niu, and Masashi Sugiyama. 2015. Convex formulation for learning from positive and unlabeled data. In Proceedings of the ICML. 1386–1394.
    [15]
    Lei Feng, Senlin Shu, Yuzhou Cao, Lue Tao, Hongxin Wei, Tao Xiang, Bo An, and Gang Niu. 2021. Multiple-instance learning from similar and dissimilar bags. In Proceedings of the KDD. 374–382.
    [16]
    James Richard Foulds and Eibe Frank. 2010. A review of multi-instance learning assumptions. Knowl. Eng. Rev. 25 (2010), 1–25.
    [17]
    Thomas Gärtner, Peter A. Flach, Adam Kowalczyk, and Alexander J. Smola. 2002. Multi-instance kernels. In Proceedings of the ICML. 179–186.
    [18]
    Bo Han, Quanming Yao, Xingrui Yu, Gang Niu, Miao Xu, Weihua Hu, Ivor Tsang, and Masashi Sugiyama. 2018. Co-teaching: Robust training of deep neural networks with extremely noisy labels. In Proceedings of the NeurIPS.
    [19]
    Sheng-Jun Huang, Wei Gao, and Zhi-Hua Zhou. 2018. Fast multi-instance multi-label learning. IEEE Trans. Pattern Anal. Mach. Intell. 41, 11 (2018), 2614–2627.
    [20]
    Maximilian Ilse, Jakub Tomczak, and Max Welling. 2018. Attention-based deep multiple instance learning. In Proceedings of the ICML. PMLR, 2127–2136.
    [21]
    Takashi Ishida, Gang Niu, and Masashi Sugiyama. 2018. Binary classification for positive-confidence data. In Proceedings of the NeurIPS. 5917–5928.
    [22]
    Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. 1998. Gradient-based learning applied to document recognition. Proc. IEEE 86, 11 (1998), 2278–2324.
    [23]
    Christian Leistner, Amir Saffari, and Horst Bischof. 2010. MIForests: Multiple-instance learning with randomized trees. In Proceedings of the ECCV. 29–42.
    [24]
    Xin-Chun Li, De-Chuan Zhan, Jia-Qi Yang, and Yi Shi. 2021. Deep multiple instance selection. Sci. China Info. Sci. 64 (2021), 1–15.
    [25]
    Dong Liang, Xinbo Gao, Wen Lu, and Jie Li. 2021. Deep blind image quality assessment based on multiple instance regression. Neurocomputing 431 (2021), 78–89.
    [26]
    Nan Lu, Gang Niu, Aditya K. Menon, and Masashi Sugiyama. 2019. On the minimal supervision for training any binary classifier from only unlabeled data. In Proceedings of the ICLR.
    [27]
    James MacQueen et al. 1967. Some methods for classification and analysis of multivariate observations. In Proceedings of the BSMSP. 281–297.
    [28]
    Colin McDiarmid. 1989. On the method of bounded differences. Surveys Combinator. 141, 1 (1989), 148–188.
    [29]
    Shahar Mendelson. 2008. Lower bounds for the empirical minimization algorithm. IEEE Trans. Info. Theory 54, 8 (2008), 3797–3803.
    [30]
    Mehryar Mohri, Afshin Rostamizadeh, and Ameet Talwalkar. 2012. Foundations of Machine Learning. MIT Press, Cambridge, MA.
    [31]
    Gang Niu, Marthinus Christoffel du Plessis, Tomoya Sakai, Yao Ma, and Masashi Sugiyama. 2016. Theoretical comparisons of positive-unlabeled learning against positive-negative learning. In Proceedings of the NeurIPS. 1199–1207.
    [32]
    Soumya Ray and Mark Craven. 2005. Learning statistical models for annotating proteins with function information using biomedical text. BMC Bioinform. 6, Suppl. 1 (2005), S18.
    [33]
    Soumya Ray and Mark Craven. 2005. Supervised versus multiple instance learning: An empirical comparison. In Proceedings of the ICML. ACM, 697–704.
    [34]
    Tomoya Sakai, Marthinus Christoffel du Plessis, Gang Niu, and Masashi Sugiyama. 2017. Semi-supervised classification based on classification from positive and unlabeled data. In Proceedings of the ICML. 2998–3006.
    [35]
    Florian Schroff, Dmitry Kalenichenko, and James Philbin. 2015. Facenet: A unified embedding for face recognition and clustering. In Proceedings of the CVPR. 815–823.
    [36]
    Takuya Shimada, Han Bao, Issei Sato, and Masashi Sugiyama. 2020. Classification from pairwise similarities/dissimilarities and unlabeled data via empirical risk minimization. Neural Comput. 33, 5 (2020), 1234–1268.
    [37]
    Qingping Tao, Stephen Scott, N. V. Vinodchandran, and Thomas Takeo Osugi. 2004. SVM-based generalized multiple-instance learning via approximate box counting. In Proceedings of the ICML. 101.
    [38]
    Kiri Wagstaff, Claire Cardie, Seth Rogers, Stefan Schrödl et al. 2001. Constrained k-means clustering with background knowledge. In Proceedings of the ICML. 577–584.
    [39]
    Zhuang Wang, Vladan Radosavljevic, Bo Han, Zoran Obradovic, and Slobodan Vucetic. 2008. Aerosol optical depth prediction from satellite observations by multiple instance regression. In Proceedings of the ICDM. SIAM, 165–176.
    [40]
    Hong-Xin Wei, Lei Feng, Xiang-Yu Chen, and Bo An. 2020. Combating noisy labels by agreement: A joint training method with co-regularization. In Proceedings of the CVPR. 13726–13735.
    [41]
    Han Xiao, Kashif Rasul, and Roland Vollgraf. 2017. Fashion-MNIST: A novel image dataset for benchmarking machine learning algorithms. arXiv:1708.07747.
    [42]
    Ling Xiao, Renfa Li, Juan Luo, et al. 2006. Sensor localization based on nonmetric multidimensional scaling. STRESS 2, 1 (2006).
    [43]
    Xin Xu and Eibe Frank. 2004. Logistic regression and boosting for labeled bags of instances. In Proceedings of the PAKDD. Springer, 272–281.
    [44]
    Cha Zhang and Paul Viola. 2007. Multiple-instance pruning for learning efficient cascade detectors. In Proceedings of the NeurIPS. 1681–1688.
    [45]
    Min-Ling Zhang, Fei Yu, and Cai-Zhi Tang. 2017. Disambiguation-free partial label learning. IEEE Trans. Knowl. Data Eng. 29, 10 (2017), 2155–2167.
    [46]
    Qi Zhang and Sally A. Goldman. 2001. EM-DD: An improved multiple-instance learning technique. In Proceedings of the NeurIPS. 1073–1080.
    [47]
    Teng Zhang and Hai Jin. 2020. Optimal margin distribution machine for multi-instance learning. In Proceedings of the IJCAI. 2383–2389.
    [48]
    Weijia Zhang, Xuanhui Zhang, Min-Ling Zhang et al. 2022. Multi-instance causal representation learning for instance label prediction and out-of-distribution generalization. In Proceedings of the NeurIPS. 34940–34953.
    [49]
    Zhi-Li Zhang and Min-Ling Zhang. 2006. Multi-instance multi-label learning with application to scene classification. In Proceedings of the NeurIPS.
    [50]
    Zhi-Hua Zhou. 2018. A brief introduction to weakly supervised learning. Natl. Sci. Rev. 5, 1 (2018), 44–53.
    [51]
    Zhi-Hua Zhou, Yu-Yin Sun, and Yu-Feng Li. 2009. Multi-instance learning by treating instances as non-iid samples. In Proceedings of the ICML. 1249–1256.
    [52]
    Xiaojin Zhu and Andrew B. Goldberg. 2009. Introduction to semi-supervised learning. Synth. Lect. Artific. Intell. Mach. Learn. 3, 1 (2009), 1–130.
