A Likelihood-Free Approach for Characterizing Heterogeneous Diseases in Large-Scale Studies

Schabdach, Jenna; Wells, William M.; Cho, Michael; Batmanghelich, Kayhan N.

doi:10.1007/978-3-319-59050-9_14

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 10265)

Included in the following conference series:

International Conference on Information Processing in Medical Imaging

Abstract

We propose a non-parametric approach for characterizing heterogeneous diseases in large-scale studies. We target diseases where multiple types of pathology present simultaneously in each subject and a more severe disease manifests as a higher level of tissue destruction. For each subject, we model the collection of local image descriptors as samples generated by an unknown subject-specific probability density. Instead of approximating the probability density via a parametric family, we propose to sidestep parametric inference by directly estimating the divergence between subject densities. Our method maps the collection of local image descriptors to a signature vector that is used to predict a clinical measurement. By studying the divergences, we are able to interpret the prediction of the clinical variable at both the population and individual levels. We illustrate an application of this method on simulated data as well as on a large-scale lung CT study of Chronic Obstructive Pulmonary Disease (COPD). Our approach outperforms classical methods on both simulated and COPD data and achieves state-of-the-art prediction of an important physiologic measure of airflow (the forced expiratory volume in one second, FEV1).

References

1. Alexander, D.H., Novembre, J., Lange, K.: Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19(9), 1655–1664 (2009)
2. Batmanghelich, N.K., Saeedi, A., Cho, M., Estepar, R.S.J., Golland, P.: Generative method to discover genetically driven image biomarkers. Int. Conf. Inf. Process. Med. Imaging 17(1), 30–42 (2015)
3. Binder, P., Batmanghelich, N.K., Estepar, R.S.J., Golland, P.: Unsupervised discovery of emphysema subtypes in a large clinical cohort. In: Wang, L., Adeli, E., Wang, Q., Shi, Y., Suk, H.-I. (eds.) MLMI 2016. LNCS, vol. 10019, pp. 180–187. Springer, Cham (2016). doi:10.1007/978-3-319-47157-0_22
4. Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)
5. Depeursinge, A., Chin, A.S., Leung, A.N., Terrone, D., Bristow, M., Rosen, G., Rubin, D.L.: Automated classification of usual interstitial pneumonia using regional volumetric texture analysis in high-resolution computed tomography. Invest. Radiol. 50(4), 261–267 (2015)
6. Efron, B., Hastie, T., Johnstone, I., Tibshirani, R., Ishwaran, H., Knight, K., Loubes, J.M., Massart, P., Madigan, D., Ridgeway, G., Rosset, S., Zhu, J.I., Stine, R.A., Turlach, B.A., Weisberg, S., Hastie, T., Johnstone, I., Tibshirani, R.: Least angle regression. Ann. Stat. 32(2), 407–499 (2004)
7. Gao, W., Oh, S., Viswanath, P.: Breaking the bandwidth barrier: geometrical adaptive entropy estimation (2016). http://arxiv.org/abs/1609.02208
8. Holzer, M., Donner, R.: Over-segmentation of 3D medical image volumes based on monogenic cues. In: CVWW, pp. 35–42 (2014). http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.707.2473&rep=rep1&type=pdf
9. Lauritzen, S.L., Barndorff-Nielsen, O.E., Kass, R.E., Lauritzen, S.L., Rao, C.R.: Chapter 4: Statistical Manifolds, pp. 163–216. Institute of Mathematical Statistics (1987). http://projecteuclid.org/euclid.lnms/1215467061
10. Liu, K., Skibbe, H., Schmidt, T., Blein, T., Palme, K., Brox, T., Ronneberger, O.: Rotation-invariant HOG descriptors using fourier analysis in polar and spherical coordinates. Int. J. Comput. Vis. 106(3), 342–364 (2014)
11. Loader, C.R.: Local likelihood density estimation. Ann. Stat. 24(4), 1602–1618 (1996)
12. Mendoza, C.S., et al.: Emphysema quantification in a multi-scanner HRCT cohort using local intensity distributions. In: 2012 9th IEEE International Symposium on Biomedical Imaging (ISBI), pp. 474–477. IEEE (2012)
13. Muja, M., Lowe, D.G.: Scalable nearest neighbour algorithms for high dimensional data. IEEE Trans. Pattern Anal. Mach. Intell. 36(11), 2227–2240 (2014)
14. Póczos, B., Schneider, J.G.: On the estimation of alpha-divergences. In: AISTATS, pp. 609–617 (2011)
15. Póczos, B., Xiong, L., Schneider, J.: Nonparametric divergence estimation with applications to machine learning on distributions. Uncertainty in Artificial Intelligence (2011)
16. Regan, E.A., Hokanson, J.E., Murphy, J.R., Make, B., Lynch, D.A., Beaty, T.H., Curran-Everett, D., Silverman, E.K., Crapo, J.D.: Genetic epidemiology of COPD (COPDGene) study design. COPD: J. Chronic Obstructive Pulm. Dis. 7(1), 32–43 (2011)
17. Satoh, K., Kobayashi, T., Misao, T., Hitani, Y., Yamamoto, Y., Nishiyama, Y., Ohkawa, M.: CT assessment of subtypes of pulmonary emphysema in smokers. CHEST J. 120(3), 725–729 (2001)
18. Shaker, S.B., Bruijne, M.D., Sorensen, L., Shaker, S.B., De Bruijne, M.: Quantitative analysis of pulmonary emphysema using local binary patterns. IEEE Trans. Med. Imaging 29(2), 559–569 (2010)
19. Shapiro, S.D.: Evolving concepts in the pathogenesis of chronic obstructive pulmonary disease. Clin. Chest Med. 21(4), 621–632 (2000)
20. Song, L., Siddiqi, S.M., Gordon, G., Smola, A.: Hilbert space embeddings of hidden Markov models. In: The 27th International Conference on Machine Learning (ICML2010), pp. 991–998 (2010)
21. Sorensen, L., Nielsen, M., Lo, P., Ashraf, H., Pedersen, J.H., De Bruijne, M.: Texture-based analysis of COPD: a data-driven approach. IEEE Trans. Med. Imaging 31(1), 70–78 (2012)
22. Vogl, W.-D., Prosch, H., Müller-Mang, C., Schmidt-Erfurth, U., Langs, G.: Longitudinal alignment of disease progression in fibrosing interstitial lung disease. In: Golland, P., Hata, N., Barillot, C., Hornegger, J., Howe, R. (eds.) MICCAI 2014. LNCS, vol. 8674, pp. 97–104. Springer, Cham (2014). doi:10.1007/978-3-319-10470-6_13
23. Zhang, Q., Goncalves, B.: Why should I trust you? Explaining the predictions of any classifier, p. 4503. ACM (2015)
24. Zhang, Z., Wang, J.: MLLE: modified locally linear embedding using multiple weights. In: Advances in Neural Information Processing Systems, pp. 1593–1600 (2006)

Acknowledgements

This work was supported in part by NLM Training grant T15LM007059, NIH NIBIB NAMIC U54-EB005149, NIH NCRR NAC P41-RR13218 and NIH NIBIB NAC P41-EB015902, NHLBI R01HL089856, R01HL089897, K08HL097029, R01HL113264, 5K25HL104085, 5R01HL116931, and 5R01HL116473. The COPDGene study (NCT00608764) is also supported by the COPD Foundation through contributions made to an Industry Advisory Board comprised of AstraZeneca, Boehringer Ingelheim, Novartis, Pfizer, Siemens, GlaxoSmithKline and Sunovion.

Author information

Authors and Affiliations

Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, USA
Jenna Schabdach & Kayhan N. Batmanghelich
Intelligence Systems Program, University of Pittsburgh, Pittsburgh, USA
Kayhan N. Batmanghelich
Brigham and Women’s Hospital, Harvard Medical School, Boston, USA
William M. Wells III & Michael Cho

Corresponding author

Correspondence to Kayhan N. Batmanghelich.

Editor information

Editors and Affiliations

University of North Carolina, Chapel Hill, North Carolina, USA
Marc Niethammer
University of North Carolina, Chapel Hill, North Carolina, USA
Martin Styner
Kitware Inc., Carrboro, North Carolina, USA
Stephen Aylward
University of North Carolina, Chapel Hill, North Carolina, USA
Hongtu Zhu
University of Pennsylvania, Philadelphia, Pennsylvania, USA
Ipek Oguz
University of North Carolina, Chapel Hill, North Carolina, USA
Pew-Thian Yap
University of North Carolina, Chapel Hill, North Carolina, USA
Dinggang Shen

A Appendix: Non-parametric Inference

In this section, we first show that the unnormalized density f(x) has a closed form under a locally constant approximation. Then, we show why the second-order approximation is computationally expensive for our problem. Finally, we provide more detail on the approximation of the KL and HE divergences.

Assuming a locally constant form $f(x) = \exp (a_0)$, we can compute a closed-form solution for $a_0$ by differentiating Eq. 4 with respect to $a_0$ and setting the derivative to zero:

$$\begin{aligned} \frac{d \mathcal {L}_x (f_i)}{da_0} = \sum _{ v \in S_i } { w \left( \frac{x - \psi (v)}{h} \right) } - | S_i | \int { w \left( \frac{y - x}{h} \right) e^{a_0} dy} = 0 \end{aligned}$$

If we set $h \equiv \rho _{k, S_i } (x) $ and use the step window function ($w(x) = \mathbb {I}(\Vert x \Vert \le 1)$), the first term of the derivative becomes exactly k and the integral in the second term is the volume of a d-dimensional hypersphere of radius h, which is $C_d h^d$, and we arrive at Eq. 5. For the Gaussian window function, the first term becomes a weighted sum over the k points in the vicinity of x and the second term has the same closed form as the normalizer of the Gaussian distribution.
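A minimal sketch of the step-window case, assuming the stationarity condition above is solved directly for $e^{a_0}$, which gives $k / (|S_i| \, C_d h^d)$ with $h$ the distance from x to its k-th nearest sample. The function names `unit_ball_volume` and `knn_density` are illustrative, not from the chapter, and whether Eq. 5 keeps the $1/|S_i|$ factor is not shown in this appendix.

```python
import math

def unit_ball_volume(d):
    """Volume C_d of the unit ball in d dimensions."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)

def knn_density(x, samples, k):
    """Locally constant estimate e^{a_0} = k / (n * C_d * h^d).

    h is the distance from x to its k-th nearest sample, so exactly k
    samples fall inside the step window (assuming no ties at distance h).
    """
    d = len(x)
    dists = sorted(math.dist(x, s) for s in samples)
    h = dists[k - 1]  # bandwidth rho_k(x)
    return k / (len(samples) * unit_ball_volume(d) * h ** d)

# One-dimensional toy data: four samples on a line.
samples = [[0.0], [1.0], [2.0], [3.0]]
estimate = knn_density([0.5], samples, k=2)
```

In this toy case the second-nearest sample lies at distance 0.5 and $C_1 = 2$, so the estimate is $2 / (4 \cdot 2 \cdot 0.5) = 0.5$.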

If we set h to a constant and use the Gaussian window function with a second-order polynomial, i.e., $\log f(u) |_x \approx a_0 + (u -x)^T a_1 + (u-x)^T a_2 (u-x)$, the local parameters have closed forms [7, 11]:

$$\begin{aligned} a_0&= \log (A_0 ) - \frac{ \Vert A_1 \Vert ^2}{A_0^2} - \left( d \log \sqrt{2\pi } + (d+1) \log n \right) , \\ a_1&= \frac{1}{ h A_0 } A_1, \\ a_2&= \frac{1}{ 2h^2 } I_{d \times d} - \frac{A_0}{ 2h^2 } \left( A_2 - A_1 A_1^T \right) ^{-1} \end{aligned}$$

where $\alpha _v (x) \equiv \text {exp}\left( - \frac{ \Vert \psi (v) - x \Vert ^2 }{ 2 h^2 } \right) $, $D(x,v) \equiv \frac{1}{h} ( \psi (v) - x ) $, $A_0 \equiv \sum _{v \in S_i} { \alpha _v (x) }$, $A_1 \equiv \sum _{v \in S_i} { \alpha _v (x) D(x,v) }$, and $A_2 \equiv \sum _{v \in S_i} { \alpha _v (x) D(x,v) D(x,v)^T }$. Computing $a_2$ requires inverting a $d\times d$ matrix, an $O(d^3)$ operation that must be repeated for every patch, which makes the second-order approximation computationally prohibitive for our problem.
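To make the cost concrete, the following sketch accumulates the weighted moments $A_0$, $A_1$ and $A_2$ at a single query point and forms the matrix $A_2 - A_1 A_1^T$ whose inverse appears in $a_2$. It is an illustration under the definitions above, not the authors' implementation; the names `local_moments` and `matrix_to_invert` are hypothetical, and the inversion itself is left out.

```python
import math

def local_moments(x, samples, h):
    """Gaussian-weighted moments A_0 (scalar), A_1 (d-vector), A_2 (d x d) at x."""
    d = len(x)
    A0 = 0.0
    A1 = [0.0] * d
    A2 = [[0.0] * d for _ in range(d)]
    for s in samples:
        D = [(si - xi) / h for si, xi in zip(s, x)]     # D(x, v)
        alpha = math.exp(-0.5 * sum(c * c for c in D))  # alpha_v(x)
        A0 += alpha
        for p in range(d):
            A1[p] += alpha * D[p]
            for q in range(d):
                A2[p][q] += alpha * D[p] * D[q]
    return A0, A1, A2

def matrix_to_invert(A1, A2):
    """A_2 - A_1 A_1^T, the d x d matrix whose inverse appears in a_2."""
    d = len(A1)
    return [[A2[p][q] - A1[p] * A1[q] for q in range(d)] for p in range(d)]

# Two samples placed symmetrically around the query point.
A0, A1, A2 = local_moments([0.0, 0.0], [[1.0, 0.0], [-1.0, 0.0]], h=1.0)
M = matrix_to_invert(A1, A2)
```

Accumulating the moments already costs $O(|S_i| d^2)$ per query point. Inverting `M` adds the $O(d^3)$ term, and both are incurred again at every patch where the density is evaluated.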

The KL divergence estimator is obtained by straightforward substitution of Eq. 5. Our estimator for HE is the one proposed by Póczos et al. [14]. The HE estimator is also based on substitution. The minor adjustment (the term behind the summation in Eq. 6) makes sure that the estimator is unbiased.
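The substitution for KL can be sketched as follows, assuming the standard k-nearest-neighbour plug-in form: each density in $\frac{1}{n}\sum_i \log \frac{p(x_i)}{q(x_i)}$ is replaced by its k-NN estimate, the constants $k$ and $C_d$ cancel, and only the ratio of neighbour distances remains. The exact normalization used in the chapter's equations is not visible in this appendix, so the leave-one-out factor $n - 1$ and the names `kth_nn_distance` and `knn_kl_divergence` are assumptions.

```python
import math

def kth_nn_distance(x, points, k):
    """Distance from x to its k-th nearest neighbour among points."""
    return sorted(math.dist(x, p) for p in points)[k - 1]

def knn_kl_divergence(X, Y, k=1):
    """Plug-in k-NN estimate of KL(p || q) from samples X ~ p and Y ~ q.

    Assumes X and Y are lists of equal-length coordinate lists with no
    duplicate points, since a zero neighbour distance breaks the log ratio.
    """
    n, m, d = len(X), len(Y), len(X[0])
    total = 0.0
    for i, x in enumerate(X):
        rho = kth_nn_distance(x, X[:i] + X[i + 1:], k)  # leave x itself out
        nu = kth_nn_distance(x, Y, k)
        total += math.log(nu / rho)
    return d * total / n + math.log(m / (n - 1))

# Toy one-dimensional samples from two subjects.
X = [[0.0], [1.0], [2.0]]
Y = [[0.5], [1.5], [2.5]]
kl_xy = knn_kl_divergence(X, Y, k=1)
```

With so few samples the estimate can be negative, as it is here, even though the true divergence never is. A brute-force neighbour search is used for clarity; an approximate nearest-neighbour index would be the natural replacement for large descriptor sets.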

Cite this paper

Schabdach, J., Wells, W.M., Cho, M., Batmanghelich, K.N. (2017). A Likelihood-Free Approach for Characterizing Heterogeneous Diseases in Large-Scale Studies. In: Niethammer, M., et al. Information Processing in Medical Imaging. IPMI 2017. Lecture Notes in Computer Science, vol 10265. Springer, Cham. https://doi.org/10.1007/978-3-319-59050-9_14

DOI: https://doi.org/10.1007/978-3-319-59050-9_14
Published: 23 May 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-59049-3
Online ISBN: 978-3-319-59050-9
eBook Packages: Computer Science (R0)
