-
Incorporating intratumoral heterogeneity into weakly-supervised deep learning models via variance pooling
Authors:
Iain Carmichael,
Andrew H. Song,
Richard J. Chen,
Drew F. K. Williamson,
Tiffany Y. Chen,
Faisal Mahmood
Abstract:
Supervised learning tasks such as cancer survival prediction from gigapixel whole slide images (WSIs) are a critical challenge in computational pathology that requires modeling complex features of the tumor microenvironment. These learning tasks are often solved with deep multi-instance learning (MIL) models that do not explicitly capture intratumoral heterogeneity. We develop a novel variance pooling architecture that enables a MIL model to incorporate intratumoral heterogeneity into its predictions. Two interpretability tools based on representative patches are illustrated to probe the biological signals captured by these models. An empirical study with 4,479 gigapixel WSIs from the Cancer Genome Atlas shows that adding variance pooling onto MIL frameworks improves survival prediction performance for five cancer types.
Submitted 19 November, 2022; v1 submitted 17 June, 2022;
originally announced June 2022.
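The variance pooling idea in the abstract above can be illustrated with a minimal sketch. It assumes that heterogeneity is summarized as the variance of patch features along a few projection directions and is concatenated with a mean-pooled summary. The function name, the `projections` matrix (random here, learned in a real model), and the optional `attention` weights are hypothetical; this is not the authors' implementation.

```python
import numpy as np


def mean_and_variance_pool(patch_features, projections, attention=None):
    """Pool a bag of patch features into a single slide-level vector.

    patch_features: (n_patches, d) array, one row per patch.
    projections:    (d, k) array of projection directions (hypothetical;
                    a trained model would learn these).
    attention:      optional (n_patches,) non-negative weights; uniform if None.
    """
    H = np.asarray(patch_features, dtype=float)
    n_patches = H.shape[0]

    if attention is None:
        w = np.full(n_patches, 1.0 / n_patches)
    else:
        w = np.asarray(attention, dtype=float)
        w = w / w.sum()  # normalize weights to sum to one

    # Standard (weighted) mean pooling over patches.
    mean_pooled = w @ H  # shape (d,)

    # Variance pooling: weighted variance of each 1-D projection of the bag.
    scores = H @ projections            # shape (n_patches, k)
    centered = scores - w @ scores      # subtract the weighted mean per direction
    var_pooled = w @ centered**2        # shape (k,)

    return np.concatenate([mean_pooled, var_pooled])


# Illustrative data: a homogeneous bag (identical patches) and a heterogeneous one.
rng = np.random.default_rng(0)
projections = rng.normal(size=(4, 3))
homogeneous_bag = np.tile(rng.normal(size=4), (10, 1))
heterogeneous_bag = rng.normal(size=(10, 4))

pooled_homogeneous = mean_and_variance_pool(homogeneous_bag, projections)
pooled_heterogeneous = mean_and_variance_pool(heterogeneous_bag, projections)
```

In this sketch the last `k` entries of the pooled vector are zero for a bag of identical patches and positive for a varied bag, which is the kind of signal that mean pooling alone discards.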
-
mvlearn: Multiview Machine Learning in Python
Authors:
Ronan Perry,
Gavin Mischler,
Richard Guo,
Theodore Lee,
Alexander Chang,
Arman Koul,
Cameron Franz,
Hugo Richard,
Iain Carmichael,
Pierre Ablin,
Alexandre Gramfort,
Joshua T. Vogelstein
Abstract:
As data are generated more and more from multiple disparate sources, multiview data sets, where each sample has features in distinct views, have ballooned in recent years. However, no comprehensive package exists that enables non-specialists to use these methods easily. mvlearn is a Python library which implements the leading multiview machine learning methods. Its simple API closely follows that of scikit-learn for increased ease-of-use. The package can be installed from Python Package Index (PyPI) and the conda package manager and is released under the MIT open-source license. The documentation, detailed examples, and all releases are available at https://mvlearn.github.io/.
Submitted 25 May, 2021; v1 submitted 24 May, 2020;
originally announced May 2020.
-
Geometric Insights into Support Vector Machine Behavior using the KKT Conditions
Authors:
Iain Carmichael,
J. S. Marron
Abstract:
The support vector machine (SVM) is a powerful and widely used classification algorithm. This paper uses the Karush-Kuhn-Tucker conditions to provide rigorous mathematical proof for new insights into the behavior of SVM. These insights provide perhaps unexpected relationships between SVM and two other linear classifiers: the mean difference and the maximal data piling direction. For example, we show that in many cases SVM can be viewed as a cropped version of these classifiers. By carefully exploring these connections we show how SVM tuning behavior is affected by characteristics including: balanced vs. unbalanced classes, low vs. high dimension, separable vs. non-separable data. These results provide further insights into tuning SVM via cross-validation by explaining observed pathological behavior and motivating improved cross-validation methodology. Finally, we also provide new results on the geometry of complete data piling directions in high dimensional space.
Submitted 10 October, 2018; v1 submitted 3 April, 2017;
originally announced April 2017.
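The two linear classifiers that the abstract above compares with SVM can be sketched in a few lines. The sketch assumes labels coded as +1 and -1 and uses the usual definition of the maximal data piling direction from the data piling literature (the pseudo-inverse of the global sample covariance applied to the mean difference); whether the paper uses exactly this normalization is an assumption. The function names are hypothetical, and the sketch does not reproduce the paper's results about SVM itself.

```python
import numpy as np


def mean_difference_direction(X, y):
    """Unit vector pointing from the class -1 mean to the class +1 mean."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    w = X[y == 1].mean(axis=0) - X[y == -1].mean(axis=0)
    return w / np.linalg.norm(w)


def maximal_data_piling_direction(X, y):
    """Pseudo-inverse of the global sample covariance times the mean difference.

    When the dimension is at least n - 1 and the data are in general position,
    each class projects onto a single point along this direction.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    mean_diff = X[y == 1].mean(axis=0) - X[y == -1].mean(axis=0)
    X_centered = X - X.mean(axis=0)
    cov = X_centered.T @ X_centered / (X.shape[0] - 1)
    # Explicit cutoff so numerically zero eigenvalues are not inverted.
    w = np.linalg.pinv(cov, rcond=1e-10) @ mean_diff
    return w / np.linalg.norm(w)


# Illustrative high-dimensional, low-sample-size data: 10 points in 50 dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 50))
y = np.array([1] * 5 + [-1] * 5)
X[y == 1] += 0.5  # shift the positive class slightly

md_proj = X @ mean_difference_direction(X, y)
mdp_proj = X @ maximal_data_piling_direction(X, y)
```

Inspecting `mdp_proj` by class shows the data piling effect: all points of a class share one projected value, while `md_proj` retains within-class spread.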