Abstract
Slif (the Structured Literature Image Finder) combines text mining and image processing to extract information from figures in the biomedical literature. It also extends traditional latent topic modeling to provide new ways to traverse the literature. Slif is available as a public, searchable database (http://slif.cbi.cmu.edu).
Slif originally focused on fluorescence microscopy images. We have now extended it to classify panels into more image types. We also improved the classification into subcellular classes by building a more representative training set. To get the most out of the human labeling effort, we used active learning to select which images to label.
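As an illustration of how active learning can prioritize labeling effort, the sketch below ranks unlabeled images by the entropy of a classifier's predicted class distribution and picks the most uncertain ones. This is uncertainty sampling, one common selection strategy chosen here for brevity; the abstract does not state which strategy Slif uses. The function names and the example probabilities are hypothetical.

```python
import math


def entropy(dist):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in dist if p > 0)


def select_for_labeling(pool_probs, k):
    """Return the indices of the k pool items with the highest entropy.

    pool_probs holds one predicted class distribution per unlabeled item.
    Ties keep their original order because sorted() is stable.
    """
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: entropy(pool_probs[i]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical classifier outputs for four unlabeled panels, three classes.
pool_probs = [
    [0.98, 0.01, 0.01],  # confident prediction, little to gain from a label
    [0.34, 0.33, 0.33],  # nearly uniform, the most uncertain item
    [0.70, 0.20, 0.10],
    [0.50, 0.49, 0.01],
]

picked = select_for_labeling(pool_probs, k=2)
print(picked)
```

In a full loop, the selected items would be labeled by a person, added to the training set, and the classifier retrained before the next round of selection.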
We developed models that take into account the structure of the document (with panels inside figures inside papers) and the multi-modality of the information (free and annotated text, images, information from external databases). This has allowed us to provide new ways to navigate a large collection of documents.
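The nesting described above (panels inside figures inside papers) can be pictured as a small container hierarchy. The sketch below is only an illustration of that structure; the class names, fields, and image-type tags are assumptions and are not taken from Slif itself.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Panel:
    label: str       # panel letter within the figure, e.g. "A"
    image_type: str  # hypothetical type tag, e.g. "fluorescence"


@dataclass
class Figure:
    caption: str
    panels: List[Panel] = field(default_factory=list)


@dataclass
class Paper:
    title: str
    figures: List[Figure] = field(default_factory=list)


def iter_panels(paper: Paper) -> Iterator[Tuple[int, Panel]]:
    """Yield (figure number, panel) pairs in document order."""
    for fig_no, fig in enumerate(paper.figures, start=1):
        for panel in fig.panels:
            yield fig_no, panel


# A made-up paper with two figures.
paper = Paper(
    title="Example paper",
    figures=[
        Figure("Localization of protein X.", [
            Panel("A", "fluorescence"),
            Panel("B", "gel"),
        ]),
        Figure("Time course.", [
            Panel("A", "plot"),
            Panel("B", "fluorescence"),
        ]),
    ],
)

# Collect every fluorescence panel together with its figure number.
hits = [(n, p.label) for n, p in iter_panels(paper)
        if p.image_type == "fluorescence"]
print(hits)
```

Keeping the figure number alongside each panel lets a query on panel-level properties lead back to the enclosing figure and paper.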
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Coelho, L.P. et al. (2010). Structured Literature Image Finder: Extracting Information from Text and Images in Biomedical Literature. In: Blaschke, C., Shatkay, H. (eds) Linking Literature, Information, and Knowledge for Biology. Lecture Notes in Computer Science, vol 6004. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13131-8_4
DOI: https://doi.org/10.1007/978-3-642-13131-8_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13130-1
Online ISBN: 978-3-642-13131-8
eBook Packages: Computer Science, Computer Science (R0)