Interleaved Text/Image Deep Mining on a Large-Scale Radiology Image Database

Shin, Hoo-Chang; Lu, Le; Kim, Lauren; Seff, Ari; Yao, Jianhua; Summers, Ronald

doi:10.1007/978-3-319-42999-1_17

Part of the book series: Advances in Computer Vision and Pattern Recognition (ACVPR)

Abstract

Exploiting very large-scale (>100K patients) medical image databases, and learning from them effectively, has remained a major challenge despite noteworthy progress in computer vision. This chapter proposes an interleaved text/image deep learning system that extracts and mines the semantic interactions between radiologic images and reports from a national research hospital’s Picture Archiving and Communication System. The method applies unsupervised learning (e.g., latent Dirichlet allocation, feedforward/recurrent neural net language models) to document- and sentence-level text to generate semantic labels, and trains supervised deep ConvNets with categorization and cross-entropy loss functions to map images to the label spaces. Keywords can be predicted for images in a retrieval manner, and the presence or absence of some frequent disease types can be predicted with probabilities. The resulting large-scale datasets of extracted key images, together with their categorization, embedded vector labels, and sentence descriptions, can be harnessed to alleviate deep learning’s “data-hungry” challenge in the medical domain.
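The two prediction steps named in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the chapter's implementation: the topic scores, the three-dimensional word vectors, and the helper names (`cross_entropy`, `retrieve_keywords`) are hypothetical, and ranking by cosine similarity is an assumption made here to show what predicting keywords "in a retrieval manner" could look like.

```python
import math

def softmax(logits):
    """Turn raw class scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label):
    """Negative log-probability assigned to the correct label index."""
    return -math.log(softmax(logits)[label])

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def retrieve_keywords(predicted_vec, word_vecs, k=2):
    """Return the k vocabulary words closest to a predicted vector."""
    ranked = sorted(
        word_vecs,
        key=lambda word: cosine(predicted_vec, word_vecs[word]),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical ConvNet scores for three text-derived topic labels.
logits = [2.0, 0.5, -1.0]
loss = cross_entropy(logits, label=0)

# Hypothetical word embeddings and a vector predicted for one image.
word_vecs = {
    "liver": [0.9, 0.1, 0.0],
    "cyst": [0.7, 0.6, 0.1],
    "lung": [0.0, 0.2, 0.9],
}
predicted = [0.8, 0.3, 0.05]
top_keywords = retrieve_keywords(predicted, word_vecs, k=2)
print(round(loss, 4), top_keywords)
```

In this toy setup the loss shrinks as the score of the correct topic grows relative to the others, and the retrieved keywords are simply the nearest neighbors of the predicted vector in the embedding space.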

Notes

1. Imaging modalities of magnetic resonance imaging (MRI).
2. Natural language expressions for imaging modalities of magnetic resonance imaging (MRI).
3. While RNNs [46, 47] are a popular choice for learning language models [48, 49], deep convolutional neural networks [3, 50] are more suitable for image classification.

References

1. Deng J, Dong W, Socher R, Li LJ, Li K, Fei-Fei L (2009) Imagenet: a large-scale hierarchical image database. In: Computer vision and pattern recognition. IEEE, pp 248–255
2. Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A, Bernstein M et al (2014) Imagenet large scale visual recognition challenge. arXiv:1409.0575
3. Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. In: Advances in neural information processing systems, pp 1097–1105
4. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556
5. Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2014) Going deeper with convolutions. arXiv:1409.4842
6. Ordonez V, Deng J, Choi Y, Berg A, Berg T (2013) From large scale image categorization to entry-level categories. In: ICCV
7. Babenko A, Slesarev A, Chigorin A, Lempitsky V (2014) Neural codes for image retrieval. In: ECCV
8. Kulkarni G, Premraj V, Ordonez V, Dhar S, Li S, Choi Y, Berg A, Berg T (2013) Babytalk: understanding and generating simple image descriptions. IEEE Trans Pattern Anal Mach Intell 35(12):2891–2903
9. Blei DM, Ng AY, Jordan MI (2003) Latent Dirichlet allocation. J Mach Learn Res 3:993–1022
10. Frome A, Corrado G, Shlens J, Bengio S, Dean J, Ranzato M, Mikolov T (2013) Devise: a deep visual-semantic embedding model. In: NIPS, pp 2121–2129
11. Kiros R, Szepesvári C (2012) Deep representations and codes for image auto-annotation. In: NIPS, pp 917–925
12. Jia Y, Shelhamer E, Donahue J, Karayev S, Long J, Girshick R, Guadarrama S, Darrell T (2014) Caffe: convolutional architecture for fast feature embedding. arXiv:1408.5093
13. Gupta S, Girshick R, Arbeláez P, Malik J (2014) Learning rich features from RGB-D images for object detection and segmentation. In: ECCV
14. Gupta A, Ayhan M, Maida A (2013) Natural image bases to represent neuroimaging data. In: ICML
15. Mikolov T, Chen K, Corrado G, Dean J (2013) Efficient estimation of word representations in vector space. arXiv:1301.3781
16. Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J (2013) Distributed representations of words and phrases and their compositionality. In: Advances in neural information processing systems, pp 3111–3119
17. Ganin Y, Lempitsky V (2014) N4-fields: neural network nearest neighbor fields for image transforms. CoRR. arXiv:1406.6558
18. Deselaers T, Ney H (2008) Deformations, patches, and discriminative models for automatic annotation of medical radiographs. PRL 29:2003
19. Carrivick L, Prabhu S, Goddard P, Rossiter J (2005) Unsupervised learning in radiology using novel latent variable models. In: CVPR
20. Barnard K, Duygulu P, Forsyth D, Freitas N, Blei D, Jordan M (2003) Matching words and pictures. J Mach Learn Res 3:1107–1135
21. Blei D, Jordan M (2003) Modeling annotated data. In: ACM SIGIR
22. Socher R, Ganjoo M, Manning CD, Ng A (2013) Zero-shot learning through cross-modal transfer. In: Advances in neural information processing systems, pp 935–943
23. Krizhevsky A, Hinton G (2009) Learning multiple layers of features from tiny images. Computer Science Department, University of Toronto, Technical report
24. Lampert CH, Nickisch H, Harmeling S (2014) Attribute-based classification for zero-shot visual object categorization. IEEE Trans Pattern Anal Mach Intell 36(3):453–465
25. Scheirer W, Kumar N, Belhumeur P, Boult T (2012) Multi-attribute spaces: calibration for attribute fusion and similarity search. In: CVPR
26. Lampert CH, Nickisch H, Harmeling S (2009) Learning to detect unseen object classes by between-class attribute transfer. In: CVPR, pp 951–958
27. Rashtchian C, Young P, Hodosh M, Hockenmaier J (2010) Collecting image annotations using Amazon’s Mechanical Turk. In: Proceedings of the NAACL HLT 2010 workshop on creating speech and language data with Amazon’s Mechanical Turk. Association for Computational Linguistics, pp 139–147
28. Jaderberg M, Vedaldi A, Zisserman A (2014) Deep features for text spotting. In: ECCV, pp 512–528
29. Ordonez V, Kulkarni G, Berg TL (2011) Im2text: describing images using 1 million captioned photographs. In: Advances in neural information processing systems, pp 1143–1151
30. Hofmann T (1999) Probabilistic latent semantic indexing. In: Proceedings of the 22nd annual international ACM SIGIR conference on research and development in information retrieval. ACM, pp 50–57
31. Lee DD, Seung HS (1999) Learning the parts of objects by non-negative matrix factorization. Nature 401(6755):788–791
32. Stevens K, Kegelmeyer P, Andrzejewski D, Buttler D (2012) Exploring topic coherence over many models and many topics. In: Proceedings of the 2012 joint conference on empirical methods in natural language processing and computational natural language learning. Association for Computational Linguistics, pp 952–961
33. Girolami M, Kabán A (2003) On an equivalence between PLSI and LDA. In: Proceedings of the 26th annual international ACM SIGIR conference on research and development in information retrieval. ACM, pp 433–434
34. Ding C, Li T, Peng W (2006) Nonnegative matrix factorization and probabilistic latent semantic indexing: equivalence, chi-square statistic, and a hybrid method. In: Proceedings of the national conference on artificial intelligence, vol 21. Menlo Park, CA; Cambridge, MA; London; AAAI Press; MIT Press; 1999, p 342
35. Gaussier E, Goutte C (2005) Relation between PLSA and NMF and implications. In: Proceedings of the 28th annual international ACM SIGIR conference on research and development in information retrieval. ACM, pp 601–602
36. Ramage D, Rosen E (2011) Stanford topic modeling toolbox. http://www-nlp.stanford.edu/software/tmt
37. Kiapour H, Yamaguchi K, Berg A, Berg T (2014) Hipster wars: discovering elements of fashion styles. In: ECCV
38. Ordonez V, Berg T (2014) Learning high-level judgments of urban perception. In: ECCV
39. Mikolov T, Yih WT, Zweig G (2013) Linguistic regularities in continuous space word representations. In: HLT-NAACL, pp 746–751 (Citeseer)
40. Openi - an open access biomedical image search engine. http://openi.nlm.nih.gov. Lister Hill National Center for Biomedical Communications, U.S. National Library of Medicine
41. Shin HC, Orton MR, Collins DJ, Doran SJ, Leach MO (2013) Stacked autoencoders for unsupervised feature learning and multiple organ detection in a pilot study using 4d patient data. IEEE Trans Pattern Anal Mach Intell 35(8):1930–1943
42. Vincent P, Larochelle H, Bengio Y, Manzagol PA (2008) Extracting and composing robust features with denoising autoencoders. In: Proceedings of the 25th international conference on machine learning. ACM, pp 1096–1103
43. Vincent P, Larochelle H, Lajoie I, Bengio Y, Manzagol PA (2010) Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion. J Mach Learn Res 11:3371–3408
44. Schriml LM, Arze C, Nadendla S, Chang YWW, Mazaitis M, Felix V, Feng G, Kibbe WA (2012) Disease ontology: a backbone for disease semantic integration. Nucleic Acids Res 40(D1):D940–D946
45. Sutskever I, Vinyals O, Le QV (2014) Sequence to sequence learning with neural networks. In: Advances in neural information processing systems
46. Rumelhart DE, Hinton GE, Williams RJ (1988) Learning representations by back-propagating errors. Cognitive modeling
47. Werbos PJ (1990) Backpropagation through time: what it does and how to do it. Proc IEEE 78(10):1550–1560
48. Bengio Y, Schwenk H, Senécal JS, Morin F, Gauvain JL (2006) Neural probabilistic language models. Innovations in machine learning. Springer, Berlin, pp 137–186
49. Mikolov T, Karafiát M, Burget L, Černocký J, Khudanpur S (2010) Recurrent neural network based language model. In: INTERSPEECH, pp 1045–1048
50. LeCun Y, Huang FJ, Bottou L (2004) Learning methods for generic object recognition with invariance to pose and lighting. In: Proceedings of the 2004 IEEE computer society conference on computer vision and pattern recognition, CVPR 2004, vol 2. IEEE, pp II–97
51. Li S, Kulkarni G, Berg T, Berg A, Choi Y (2011) Composing simple image descriptions using web-scale n-grams. In: ACM CoNLL, pp 220–228
52. Mitchell M, Han X, Dodge J, Mensch A, Goyal A, Berg A, Yamaguchi K, Berg T, Stratos K, Daume H (2012) Midge: generating image descriptions from computer vision detections. In: EACL, pp 747–756
53. Mittelman R, Lee H, Kuipers B, Savarese S (2013) Weakly supervised learning of mid-level features with beta-Bernoulli process restricted Boltzmann machines. In: CVPR
54. Oquab M, Bottou L, Laptev I, Sivic J (2014) Weakly supervised object recognition with convolutional neural networks. Technical report. HAL-01015140, INRIA
55. Pinheiro P, Collobert R (2014) Weakly supervised object segmentation with convolutional neural networks. Technical report. Idiap-RR-13-2014, Idiap
56. Berg A, Berg T, Daume H, Dodge J, Goyal A, Han X, Mensch A, Mitchell M, Sood A, Stratos K, Yamaguchi K (2012) Understanding and predicting importance in images. In: CVPR
57. Gong Y, Wang L, Hodosh M, Hockenmaier J, Lazebnik S (2014) Improving image-sentence embeddings using large weakly annotated photo collections. In: ECCV

Author information

Authors and Affiliations

Imaging Biomarkers and Computer-Aided Diagnosis Laboratory, Radiology and Imaging Sciences, National Institutes of Health Clinical Center, Bethesda, MD, 20892-1182, USA
Hoo-Chang Shin, Le Lu, Lauren Kim, Ari Seff, Jianhua Yao & Ronald Summers

Corresponding author

Correspondence to Le Lu.

Editor information

Editors and Affiliations

NIH Clinical Center, Bethesda, Maryland, USA
Le Lu
Siemens Healthcare Technology Center, Princeton, New Jersey, USA
Yefeng Zheng
University of Adelaide, Adelaide, South Australia, Australia
Gustavo Carneiro
University of Florida, Gainesville, Florida, USA
Lin Yang

About this chapter

Cite this chapter

Shin, HC., Lu, L., Kim, L., Seff, A., Yao, J., Summers, R. (2017). Interleaved Text/Image Deep Mining on a Large-Scale Radiology Image Database. In: Lu, L., Zheng, Y., Carneiro, G., Yang, L. (eds) Deep Learning and Convolutional Neural Networks for Medical Image Computing. Advances in Computer Vision and Pattern Recognition. Springer, Cham. https://doi.org/10.1007/978-3-319-42999-1_17

DOI: https://doi.org/10.1007/978-3-319-42999-1_17
Published: 14 July 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-42998-4
Online ISBN: 978-3-319-42999-1
eBook Packages: Computer Science; Computer Science (R0)