Abstract
Learning effectively from very large-scale (>100K patients) medical image databases remains a major challenge despite noteworthy progress in computer vision. This chapter presents an interleaved text/image deep learning system that extracts and mines the semantic interactions between radiologic images and reports from a national research hospital’s Picture Archiving and Communication System. The method applies unsupervised learning (e.g., latent Dirichlet allocation, feedforward/recurrent neural net language models) to document- and sentence-level texts to generate semantic labels, and trains supervised deep ConvNets with categorization and cross-entropy loss functions to map images to the label spaces. Keywords can be predicted for images in a retrieval manner, and the presence or absence of some frequent disease types can be predicted with probabilities. The resulting large-scale datasets of extracted key images, together with their categorizations, embedded vector labels, and sentence descriptions, can be harnessed to alleviate deep learning’s “data-hungry” challenge in the medical domain.
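As a rough illustration of two ideas named in the abstract, the sketch below computes a softmax cross-entropy loss for a categorization output and ranks keywords for an image by similarity to a predicted embedding vector. It is not the authors' implementation: the toy word vectors, the function names, and the choice of cosine similarity for retrieval are all assumptions made for this example.

```python
import math

def softmax(logits):
    """Turn raw class scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target_index):
    """Cross-entropy loss for one example with a single correct label."""
    return -math.log(softmax(logits)[target_index])

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def retrieve_keywords(predicted_vector, word_vectors, top_k=2):
    """Return the top_k words whose vectors are closest to the prediction.

    Cosine similarity is an assumption here; the abstract only says that
    keywords are predicted "in a retrieval manner".
    """
    ranked = sorted(
        word_vectors,
        key=lambda w: cosine(predicted_vector, word_vectors[w]),
        reverse=True,
    )
    return ranked[:top_k]

# Hypothetical 3-dimensional word embeddings, invented for illustration.
WORD_VECTORS = {
    "liver": [0.9, 0.1, 0.0],
    "cyst": [0.7, 0.6, 0.1],
    "lung": [0.0, 0.2, 0.9],
}

if __name__ == "__main__":
    # Pretend a ConvNet regressed this embedding from a key image.
    predicted = [1.0, 0.2, 0.0]
    print(retrieve_keywords(predicted, WORD_VECTORS))
    # Loss for a two-class output where both classes score equally.
    print(cross_entropy([0.0, 0.0], 0))
```

In a real system the predicted vector would come from a trained ConvNet and the word vectors from a language model trained on the report text; here both are hard-coded so the example runs on its own.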
Notes
1. Imaging modalities of magnetic resonance imaging (MRI).
2. Natural language expressions for imaging modalities of magnetic resonance imaging (MRI).
Copyright information
© 2017 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Shin, HC., Lu, L., Kim, L., Seff, A., Yao, J., Summers, R. (2017). Interleaved Text/Image Deep Mining on a Large-Scale Radiology Image Database. In: Lu, L., Zheng, Y., Carneiro, G., Yang, L. (eds) Deep Learning and Convolutional Neural Networks for Medical Image Computing. Advances in Computer Vision and Pattern Recognition. Springer, Cham. https://doi.org/10.1007/978-3-319-42999-1_17
DOI: https://doi.org/10.1007/978-3-319-42999-1_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-42998-4
Online ISBN: 978-3-319-42999-1
eBook Packages: Computer Science (R0)