Interleaved Text/Image Deep Mining on a Large-Scale Radiology Image Database

Deep Learning and Convolutional Neural Networks for Medical Image Computing

Part of the book series: Advances in Computer Vision and Pattern Recognition ((ACVPR))

Abstract

Exploiting and learning effectively from very large-scale (>100K patients) medical image databases has been a major challenge, despite noteworthy progress in computer vision. This chapter proposes an interleaved text/image deep learning system to extract and mine the semantic interactions of radiologic images and reports from a national research hospital’s Picture Archiving and Communication System. It introduces a method that performs unsupervised learning (e.g., latent Dirichlet allocation, feedforward/recurrent neural net language models) on document- and sentence-level texts to generate semantic labels, and that trains supervised deep ConvNets with categorization and cross-entropy loss functions to map from images to label spaces. Keywords can be predicted for images in a retrieval manner, and the presence or absence of some frequent types of disease can be predicted with probabilities. The resulting large-scale datasets of extracted key images and their categorization, embedded vector labels, and sentence descriptions can be harnessed to alleviate deep learning’s “data-hungry” challenge in the medical domain.
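
As a rough illustration of the two prediction modes the abstract mentions, the Python sketch below shows a softmax cross-entropy loss over topic labels and a cosine-similarity keyword lookup. It is a toy under stated assumptions, not the chapter's implementation: the function names, the three-dimensional vectors, and the small vocabulary are all hypothetical, and in the chapter the image-to-label mapping is learned by deep ConvNets rather than supplied by hand.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities (shifted by the max for stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Negative log-probability assigned to the true label index."""
    return -math.log(softmax(logits)[label])


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def retrieve_keywords(image_vec, word_vecs, k=2):
    """Return the k vocabulary words whose vectors lie closest to image_vec."""
    ranked = sorted(
        word_vecs,
        key=lambda word: cosine(image_vec, word_vecs[word]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical word embeddings; a real system would learn these from report text.
WORD_VECS = {
    "liver": [0.9, 0.1, 0.0],
    "cyst": [0.7, 0.6, 0.1],
    "lung": [0.0, 0.2, 0.9],
}

# Hypothetical vector predicted for one image.
IMAGE_VEC = [1.0, 0.2, 0.0]

if __name__ == "__main__":
    # Loss for an image whose assumed true topic label is index 0.
    print(cross_entropy([2.0, 0.5, -1.0], label=0))
    # Keywords ranked by similarity to the predicted image vector.
    print(retrieve_keywords(IMAGE_VEC, WORD_VECS, k=2))
```

The loss falls as the score of the true label rises relative to the others, which is the signal a classifier is trained on; the lookup ranks candidate keywords purely by direction in the embedding space.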

Notes

  1. Imaging modalities of magnetic resonance imaging (MRI).

  2. Natural language expressions for imaging modalities of magnetic resonance imaging (MRI).

  3. While the RNN [46, 47] is a popular choice for learning language models [48, 49], the deep convolutional neural network [3, 50] is more suitable for image classification.

References

  1. Deng J, Dong W, Socher R, Li LJ, Li K, Fei-Fei L (2009) Imagenet: a large-scale hierarchical image database. In: Computer vision and pattern recognition. IEEE, pp 248–255

  2. Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A, Bernstein M et al (2014) Imagenet large scale visual recognition challenge. arXiv:1409.0575

  3. Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. In: Advances in neural information processing systems, pp 1097–1105

  4. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556

  5. Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2014) Going deeper with convolutions. arXiv:1409.4842

  6. Ordonez V, Deng J, Choi Y, Berg A, Berg T (2013) From large scale image categorization to entry-level categories. In: ICCV

  7. Babenko A, Slesarev A, Chigorin A, Lempitsky V (2014) Neural codes for image retrieval. In: ECCV

  8. Kulkarni G, Premraj V, Ordonez V, Dhar S, Li S, Choi Y, Berg A, Berg T (2013) Babytalk: understanding and generating simple image descriptions. IEEE Trans Pattern Anal Mach Intell 35(12):2891–2903

  9. Blei DM, Ng AY, Jordan MI (2003) Latent Dirichlet allocation. J Mach Learn Res 3:993–1022

  10. Frome A, Corrado G, Shlens J, Bengio S, Dean J, Ranzato M, Mikolov T (2013) Devise: a deep visual-semantic embedding model. In: NIPS, pp 2121–2129

  11. Kiros R, Szepesvári C (2012) Deep representations and codes for image auto-annotation. In: NIPS, pp 917–925

  12. Jia Y, Shelhamer E, Donahue J, Karayev S, Long J, Girshick R, Guadarrama S, Darrell T (2014) Caffe: convolutional architecture for fast feature embedding. arXiv:1408.5093

  13. Gupta S, Girshick R, Arbeláez P, Malik J (2014) Learning rich features from RGB-D images for object detection and segmentation. In: ECCV

  14. Gupta A, Ayhan M, Maida A (2013) Natural image bases to represent neuroimaging data. In: ICML

  15. Mikolov T, Chen K, Corrado G, Dean J (2013) Efficient estimation of word representations in vector space. arXiv:1301.3781

  16. Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J (2013) Distributed representations of words and phrases and their compositionality. In: Advances in neural information processing systems, pp 3111–3119

  17. Ganin Y, Lempitsky V (2014) N4-fields: neural network nearest neighbor fields for image transforms. CoRR. arXiv:1406.6558

  18. Deselaers T, Ney H (2008) Deformations, patches, and discriminative models for automatic annotation of medical radiographs. PRL 29:2003

  19. Carrivick L, Prabhu S, Goddard P, Rossiter J (2005) Unsupervised learning in radiology using novel latent variable models. In: CVPR

  20. Barnard K, Duygulu P, Forsyth D, Freitas N, Blei D, Jordan M (2003) Matching words and pictures. J Mach Learn Res 3:1107–1135

  21. Blei D, Jordan M (2003) Modeling annotated data. In: ACM SIGIR

  22. Socher R, Ganjoo M, Manning CD, Ng A (2013) Zero-shot learning through cross-modal transfer. In: Advances in neural information processing systems, pp 935–943

  23. Krizhevsky A, Hinton G (2009) Learning multiple layers of features from tiny images. Computer Science Department, University of Toronto, Technical report

  24. Lampert CH, Nickisch H, Harmeling S (2014) Attribute-based classification for zero-shot visual object categorization. IEEE Trans Pattern Anal Mach Intell 36(3):453–465

  25. Scheirer W, Kumar N, Belhumeur P, Boult T (2012) Multi-attribute spaces: calibration for attribute fusion and similarity search. In: CVPR

  26. Lampert CH, Nickisch H, Harmeling S (2009) Learning to detect unseen object classes by between-class attribute transfer. In: CVPR, pp 951–958

  27. Rashtchian C, Young P, Hodosh M, Hockenmaier J (2010) Collecting image annotations using amazon’s mechanical turk. In: Proceedings of the NAACL HLT 2010 workshop on creating speech and language data with amazon’s mechanical turk. Association for Computational Linguistics, pp 139–147

  28. Jaderberg M, Vedaldi A, Zisserman A (2014) Deep features for text spotting. In: ECCV, pp 512–528

  29. Ordonez V, Kulkarni G, Berg TL (2011) Im2text: describing images using 1 million captioned photographs. In: Advances in neural information processing systems, pp 1143–1151

  30. Hofmann T (1999) Probabilistic latent semantic indexing. In: Proceedings of the 22nd annual international ACM SIGIR conference on research and development in information retrieval. ACM, pp 50–57

  31. Lee DD, Seung HS (1999) Learning the parts of objects by non-negative matrix factorization. Nature 401(6755):788–791

  32. Stevens K, Kegelmeyer P, Andrzejewski D, Buttler D (2012) Exploring topic coherence over many models and many topics. In: Proceedings of the 2012 joint conference on empirical methods in natural language processing and computational natural language learning. Association for Computational Linguistics, pp 952–961

  33. Girolami M, Kabán A (2003) On an equivalence between PLSI and LDA. In: Proceedings of the 26th annual international ACM SIGIR conference on research and development in information retrieval. ACM, pp 433–434

  34. Ding C, Li T, Peng W (2006) Nonnegative matrix factorization and probabilistic latent semantic indexing: equivalence chi-square statistic, and a hybrid method. In: Proceedings of the national conference on artificial intelligence, vol 21. Menlo Park, CA; Cambridge, MA; London; AAAI Press; MIT Press; 1999, p 342

  35. Gaussier E, Goutte C (2005) Relation between PLSA and NMF and implications. In: Proceedings of the 28th annual international ACM SIGIR conference on research and development in information retrieval. ACM, pp 601–602

  36. Ramage D, Rosen E (2011) Stanford topic modeling toolbox. http://www-nlp.stanford.edu/software/tmt

  37. Kiapour H, Yamaguchi K, Berg A, Berg T (2014) Hipster wars: discovering elements of fashion styles. In: ECCV

  38. Ordonez V, Berg T (2014) Learning high-level judgments of urban perception. In: ECCV

  39. Mikolov T, Yih WT, Zweig G (2013) Linguistic regularities in continuous space word representations. In: HLT-NAACL, pp 746–751 (Citeseer)

  40. Openi - an open access biomedical image search engine. http://openi.nlm.nih.gov. Lister Hill National Center for Biomedical Communications, U.S. National Library of Medicine

  41. Shin HC, Orton MR, Collins DJ, Doran SJ, Leach MO (2013) Stacked autoencoders for unsupervised feature learning and multiple organ detection in a pilot study using 4d patient data. IEEE Trans Pattern Anal Mach Intell 35(8):1930–1943

  42. Vincent P, Larochelle H, Bengio Y, Manzagol PA (2008) Extracting and composing robust features with denoising autoencoders. In: Proceedings of the 25th international conference on Machine learning. ACM, pp 1096–1103

  43. Vincent P, Larochelle H, Lajoie I, Bengio Y, Manzagol PA (2010) Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion. J Mach Learn Res 11:3371–3408

  44. Schriml LM, Arze C, Nadendla S, Chang YWW, Mazaitis M, Felix V, Feng G, Kibbe WA (2012) Disease ontology: a backbone for disease semantic integration. Nucleic Acids Res 40(D1):D940–D946

  45. Sutskever I, Vinyals O, Le QV (2014) Sequence to sequence learning with neural networks. In: Advances in neural information processing systems

  46. Rumelhart DE, Hinton GE, Williams RJ (1988) Learning representations by back-propagating errors. Cognitive modeling

  47. Werbos PJ (1990) Backpropagation through time: what it does and how to do it. Proc IEEE 78(10):1550–1560

  48. Bengio Y, Schwenk H, Senécal JS, Morin F, Gauvain JL (2006) Neural probabilistic language models. Innovations in machine learning. Springer, Berlin, pp 137–186

  49. Mikolov T, Karafiát M, Burget L, Černocký J, Khudanpur S (2010) Recurrent neural network based language model. In: INTERSPEECH, pp 1045–1048

  50. LeCun Y, Huang FJ, Bottou L (2004) Learning methods for generic object recognition with invariance to pose and lighting. In: Proceedings of the 2004 IEEE computer society conference on computer vision and pattern recognition, CVPR 2004, vol 2. IEEE, pp II–97

  51. Li S, Kulkarni G, Berg T, Berg A, Choi Y (2011) Composing simple image descriptions using web-scale n-grams. In: ACM CoNLL, pp 220–228

  52. Mitchell M, Han X, Dodge J, Mensch A, Goyal A, Berg A, Yamaguchi K, Berg T, Stratos K, Daume H (2012) Midge: generating image descriptions from computer vision detections. In: EACL, pp 747–756

  53. Mittelman R, Lee H, Kuipers B, Savarese S (2013) Weakly supervised learning of mid-level features with beta-Bernoulli process restricted Boltzmann machines. In: CVPR

  54. Oquab M, Bottou L, Laptev I, Sivic J (2014) Weakly supervised object recognition with convolutional neural networks. Technical report. HAL-01015140, INRIA

  55. Pinheiro P, Collobert R (2014) Weakly supervised object segmentation with convolutional neural networks. Technical report. Idiap-RR-13-2014, Idiap

  56. Berg A, Berg T, Daume H, Dodge J, Goyal A, Han X, Mensch A, Mitchell M, Sood A, Stratos K, Yamaguchi K (2012) Understanding and predicting importance in images. In: CVPR

  57. Gong Y, Wang L, Hodosh M, Hockenmaier J, Lazebnik S (2014) Improving image-sentence embeddings using large weakly annotated photo collections. In: ECCV

Corresponding author

Correspondence to Le Lu.

Copyright information

© 2017 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Shin, HC., Lu, L., Kim, L., Seff, A., Yao, J., Summers, R. (2017). Interleaved Text/Image Deep Mining on a Large-Scale Radiology Image Database. In: Lu, L., Zheng, Y., Carneiro, G., Yang, L. (eds) Deep Learning and Convolutional Neural Networks for Medical Image Computing. Advances in Computer Vision and Pattern Recognition. Springer, Cham. https://doi.org/10.1007/978-3-319-42999-1_17

  • DOI: https://doi.org/10.1007/978-3-319-42999-1_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-42998-4

  • Online ISBN: 978-3-319-42999-1

  • eBook Packages: Computer Science, Computer Science (R0)
