Abstract
We propose a non-parametric approach for characterizing heterogeneous diseases in large-scale studies. We target diseases in which multiple types of pathology are present simultaneously in each subject and more severe disease manifests as a higher level of tissue destruction. For each subject, we model the collection of local image descriptors as samples generated by an unknown subject-specific probability density. Instead of approximating the probability density via a parametric family, we propose to sidestep parametric inference by directly estimating the divergence between subject densities. Our method maps the collection of local image descriptors to a signature vector that is used to predict a clinical measurement. By studying the divergences, we can interpret the prediction of the clinical variable at both the population and individual levels. We illustrate an application of this method on simulated data as well as on a large-scale lung CT study of Chronic Obstructive Pulmonary Disease (COPD). Our approach outperforms classical methods on both simulated and COPD data and demonstrates state-of-the-art prediction of an important physiologic measure of airflow (the forced expiratory volume in one second, FEV1).
References
Alexander, D.H., Novembre, J., Lange, K.: Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19(9), 1655–1664 (2009)
Batmanghelich, N.K., Saeedi, A., Cho, M., Estepar, R.S.J., Golland, P.: Generative method to discover genetically driven image biomarkers. Int. Conf. Inf. Process. Med. Imaging 17(1), 30–42 (2015)
Binder, P., Batmanghelich, N.K., Estepar, R.S.J., Golland, P.: Unsupervised discovery of emphysema subtypes in a large clinical cohort. In: Wang, L., Adeli, E., Wang, Q., Shi, Y., Suk, H.-I. (eds.) MLMI 2016. LNCS, vol. 10019, pp. 180–187. Springer, Cham (2016). doi:10.1007/978-3-319-47157-0_22
Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)
Depeursinge, A., Chin, A.S., Leung, A.N., Terrone, D., Bristow, M., Rosen, G., Rubin, D.L.: Automated classification of usual interstitial pneumonia using regional volumetric texture analysis in high-resolution computed tomography. Invest. Radiol. 50(4), 261–267 (2015)
Efron, B., Hastie, T., Johnstone, I., Tibshirani, R.: Least angle regression. Ann. Stat. 32(2), 407–499 (2004)
Gao, W., Oh, S., Viswanath, P.: Breaking the bandwidth barrier: geometrical adaptive entropy estimation (2016). http://arxiv.org/abs/1609.02208
Holzer, M., Donner, R.: Over-segmentation of 3D medical image volumes based on monogenic cues. In: CVWW, pp. 35–42 (2014). http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.707.2473&rep=rep1&type=pdf
Lauritzen, S.L., Barndorff-Nielsen, O.E., Kass, R.E., Rao, C.R.: Chapter 4: Statistical Manifolds, pp. 163–216. Institute of Mathematical Statistics (1987). http://projecteuclid.org/euclid.lnms/1215467061
Liu, K., Skibbe, H., Schmidt, T., Blein, T., Palme, K., Brox, T., Ronneberger, O.: Rotation-invariant HOG descriptors using Fourier analysis in polar and spherical coordinates. Int. J. Comput. Vis. 106(3), 342–364 (2014)
Loader, C.R.: Local likelihood density estimation. Ann. Stat. 24(4), 1602–1618 (1996)
Mendoza, C.S., et al.: Emphysema quantification in a multi-scanner HRCT cohort using local intensity distributions. In: 2012 9th IEEE International Symposium on Biomedical Imaging (ISBI), pp. 474–477. IEEE (2012)
Muja, M., Lowe, D.G.: Scalable nearest neighbour algorithms for high dimensional data. IEEE Trans. Pattern Anal. Mach. Intell. 36(11), 2227–2240 (2014)
Póczos, B., Schneider, J.G.: On the estimation of alpha-divergences. In: AISTATS, pp. 609–617 (2011)
Póczos, B., Xiong, L., Schneider, J.: Nonparametric divergence estimation with applications to machine learning on distributions. In: Uncertainty in Artificial Intelligence (2011)
Regan, E.A., Hokanson, J.E., Murphy, J.R., Make, B., Lynch, D.A., Beaty, T.H., Curran-Everett, D., Silverman, E.K., Crapo, J.D.: Genetic epidemiology of COPD (COPDGene) study design. COPD: J. Chronic Obstructive Pulm. Dis. 7(1), 32–43 (2011)
Satoh, K., Kobayashi, T., Misao, T., Hitani, Y., Yamamoto, Y., Nishiyama, Y., Ohkawa, M.: CT assessment of subtypes of pulmonary emphysema in smokers. CHEST J. 120(3), 725–729 (2001)
Sorensen, L., Shaker, S.B., De Bruijne, M.: Quantitative analysis of pulmonary emphysema using local binary patterns. IEEE Trans. Med. Imaging 29(2), 559–569 (2010)
Shapiro, S.D.: Evolving concepts in the pathogenesis of chronic obstructive pulmonary disease. Clin. Chest Med. 21(4), 621–632 (2000)
Song, L., Siddiqi, S.M., Gordon, G., Smola, A.: Hilbert space embeddings of hidden Markov models. In: The 27th International Conference on Machine Learning (ICML2010), pp. 991–998 (2010)
Sorensen, L., Nielsen, M., Lo, P., Ashraf, H., Pedersen, J.H., De Bruijne, M.: Texture-based analysis of COPD: a data-driven approach. IEEE Trans. Med. Imaging 31(1), 70–78 (2012)
Vogl, W.-D., Prosch, H., Müller-Mang, C., Schmidt-Erfurth, U., Langs, G.: Longitudinal alignment of disease progression in fibrosing interstitial lung disease. In: Golland, P., Hata, N., Barillot, C., Hornegger, J., Howe, R. (eds.) MICCAI 2014. LNCS, vol. 8674, pp. 97–104. Springer, Cham (2014). doi:10.1007/978-3-319-10470-6_13
Zhang, Q., Goncalves, B.: Why should I trust you? Explaining the predictions of any classifier, p. 4503. ACM (2015)
Zhang, Z., Wang, J.: MLLE: modified locally linear embedding using multiple weights. In: Advances in Neural Information Processing Systems, pp. 1593–1600 (2006)
Acknowledgements
This work was supported in part by NLM Training grant T15LM007059, NIH NIBIB NAMIC U54-EB005149, NIH NCRR NAC P41-RR13218 and NIH NIBIB NAC P41-EB015902, NHLBI R01HL089856, R01HL089897, K08HL097029, R01HL113264, 5K25HL104085, 5R01HL116931, and 5R01HL116473. The COPDGene study (NCT00608764) is also supported by the COPD Foundation through contributions made to an Industry Advisory Board comprised of AstraZeneca, Boehringer Ingelheim, Novartis, Pfizer, Siemens, GlaxoSmithKline and Sunovion.
A Appendix: Non-parametric Inference
In this appendix, we first show that the unnormalized density \(f(x)\) has a closed form under a locally constant approximation. Then, we show why the second-order approximation is computationally expensive for our problem. Finally, we provide more detail on the approximation of the KL and HE divergences.
Assuming a locally constant model, \(f(x) = \exp (a_0)\), we can compute a closed-form solution for \(a_0\) by differentiating Eq. 4 with respect to \(a_0\) and setting the derivative to zero:
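The stationarity condition can be sketched as follows, under the assumption that Eq. 4 has the form of the local log-likelihood of Loader [11] without the sample-size factor (an assumption, since Eq. 4 is not restated in this appendix). The symbols \(\psi(v)\), the descriptor of patch \(v\), and \(S_i\), the set of patches of subject \(i\), follow the definitions used later in this appendix; \(\mathcal{L}_x\) is a name introduced here for illustration only.

```latex
% Sketch under the assumption stated above; not copied from the paper.
\begin{align*}
\mathcal{L}_x(a_0)
  &= a_0 \sum_{v \in S_i} w\!\left(\frac{\psi(v) - x}{h}\right)
     - \exp(a_0) \int w\!\left(\frac{u - x}{h}\right) du, \\
\frac{\partial \mathcal{L}_x}{\partial a_0} = 0
  \;&\Longrightarrow\;
  \exp(a_0)
  = \frac{\sum_{v \in S_i} w\!\left(\frac{\psi(v) - x}{h}\right)}
         {\int w\!\left(\frac{u - x}{h}\right) du}.
\end{align*}
```

In this form the sum over patches and the integral of the window are the two terms whose values depend on the choice of \(h\) and \(w\).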
If we set \(h \equiv \rho _{k, S_i } (x) \) and use the step window function (\(w(x) = \mathbb {I}(\Vert x \Vert \le 1)\)), the first term on the right-hand side becomes exactly k and the second term is the volume of a d-dimensional ball with radius h, which is \(C_d h^d\), and we arrive at Eq. 5. For the Gaussian window function, the first term becomes a weighted sum over the k points in the vicinity of x and the second term has the same closed form as the normalizer of the Gaussian distribution.
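A minimal Python sketch of the step-window case described above, assuming Eq. 5 reads \(f(x) = k / (C_d\, \rho_{k,S_i}(x)^d)\), which is what the two terms in the text suggest. The function names `unit_ball_volume` and `knn_density` are hypothetical, and a brute-force distance sort stands in for whatever nearest-neighbour search is used in practice.

```python
import math
import numpy as np


def unit_ball_volume(d):
    """Volume C_d of the d-dimensional unit ball."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)


def knn_density(x, samples, k):
    """Locally constant, unnormalized estimate k / (C_d * rho_k(x)^d).

    samples: (n, d) array of descriptors for one subject (assumed layout).
    rho_k(x) is the distance from x to its k-th nearest sample.
    """
    samples = np.asarray(samples, dtype=float)
    x = np.asarray(x, dtype=float)
    d = samples.shape[1]
    # Brute-force distances from x to every sample, sorted ascending.
    dists = np.sort(np.linalg.norm(samples - x, axis=1))
    rho = dists[k - 1]
    return k / (unit_ball_volume(d) * rho ** d)
```

For example, with one-dimensional samples at 0, 1 and 3, the query point 0.5 and k = 2, the second-nearest sample lies at distance 0.5 and \(C_1 = 2\), so the estimate is 2 / (2 × 0.5).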
If we set h to a constant and use the Gaussian window function and a second-order polynomial, i.e., \(\log f(u) |_x \approx a_0 + (u -x)^T a_1 + (u-x)^T a_2 (u-x)\), the local parameters have closed forms [7, 11]:
where \(\alpha _v (x) \equiv \text {exp}\left( - \frac{ \Vert \psi (v) - x \Vert ^2 }{ 2 h^2 } \right) \), \(D(x,v) \equiv \frac{1}{h} ( \psi (v) - x ) \), \(A_0 \equiv \sum _{v \in S_i} { \alpha _v (x) }\), \(A_1 \equiv \sum _{v \in S_i} { \alpha _v (x) D(x,v) }\), and \(A_2 \equiv \sum _{v \in S_i} { \alpha _v (x) D(x,v) D(x,v)^T }\). Computing \(a_2\) requires inverting a \(d\times d\) matrix, an \(O(d^3)\) operation that must be repeated for every patch, which makes the second-order approximation computationally prohibitive for our problem.
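To make the cost concrete, the Gaussian-weighted moments \(A_0\), \(A_1\) and \(A_2\) defined above can be sketched in Python as follows. This is only an illustration of those three definitions; it does not reproduce the closed forms for \(a_0\), \(a_1\), \(a_2\), and the function name `local_moments` and the array layout are assumptions.

```python
import numpy as np


def local_moments(x, descriptors, h):
    """Gaussian-weighted moments A_0, A_1, A_2 around the point x.

    descriptors: (n, d) array whose rows are psi(v) for v in S_i (assumed layout).
    h: fixed bandwidth of the Gaussian window.
    """
    x = np.asarray(x, dtype=float)
    psi = np.asarray(descriptors, dtype=float)
    diff = psi - x                                            # psi(v) - x, shape (n, d)
    alpha = np.exp(-np.sum(diff ** 2, axis=1) / (2 * h ** 2))  # alpha_v(x), shape (n,)
    D = diff / h                                              # D(x, v), shape (n, d)
    A0 = alpha.sum()                                          # scalar
    A1 = alpha @ D                                            # shape (d,)
    A2 = (D * alpha[:, None]).T @ D                           # shape (d, d)
    return A0, A1, A2
```

Forming \(A_2\) already costs on the order of \(n d^2\) operations per query point, and any subsequent inversion of a \(d\times d\) matrix adds the \(O(d^3)\) term noted above.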
The KL divergence estimator follows by direct substitution of Eq. 5. Our estimator for HE was proposed by Póczos et al. [14] and is also based on substitution. The minor adjustment (the term in front of the summation in Eq. 6) ensures that the estimator is unbiased.
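The substitution idea for KL can be sketched in Python as follows. This is the standard k-nearest-neighbour plug-in estimator, written under the assumption that both densities are normalized by their sample sizes; it may differ by a constant from the estimator used with the unnormalized density of Eq. 5. The function names `kth_nn_dist` and `knn_kl` are hypothetical, the samples are assumed to contain no duplicate points, and the HE estimator of Eq. 6 is not shown.

```python
import numpy as np


def kth_nn_dist(queries, refs, k, exclude_self=False):
    """Distance from each query to its k-th nearest point in refs.

    With exclude_self=True, queries and refs are the same set and the
    zero distance of each point to itself is skipped.
    """
    diff = queries[:, None, :] - refs[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    dist.sort(axis=1)
    idx = k if exclude_self else k - 1
    return dist[:, idx]


def knn_kl(X, Y, k=1):
    """k-NN plug-in estimate of KL(p || q) from samples X ~ p and Y ~ q."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n, d = X.shape
    m = Y.shape[0]
    rho = kth_nn_dist(X, X, k, exclude_self=True)  # k-NN distance within X
    nu = kth_nn_dist(X, Y, k)                      # k-NN distance from X into Y
    # log(p_hat / q_hat) at each x_i reduces to d*log(nu/rho) + log(m/(n-1)).
    return d * np.mean(np.log(nu / rho)) + np.log(m / (n - 1))
```

The unit-ball volume \(C_d\) and the neighbour count k cancel in the ratio of the two density estimates, which is why only the two neighbour distances and the sample sizes appear in the result.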
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Schabdach, J., Wells, W.M., Cho, M., Batmanghelich, K.N. (2017). A Likelihood-Free Approach for Characterizing Heterogeneous Diseases in Large-Scale Studies. In: Niethammer, M., et al. Information Processing in Medical Imaging. IPMI 2017. Lecture Notes in Computer Science, vol 10265. Springer, Cham. https://doi.org/10.1007/978-3-319-59050-9_14
DOI: https://doi.org/10.1007/978-3-319-59050-9_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-59049-3
Online ISBN: 978-3-319-59050-9
eBook Packages: Computer Science, Computer Science (R0)