Abstract
Protein-protein interaction (PPI) information plays a vital role in biological research. This work proposes a two-step, machine-learning-based method for extracting PPI information from biomedical literature. Both steps use a Maximum Entropy (ME) model. The first step estimates whether a sentence in the literature contains PPI information. The second step judges whether each protein pair in a sentence interacts. The two steps are combined by adding the outputs of the first step to the second-step model as features. Experiments show that the method achieves an overall accuracy of 81.9% on the BC-PPI corpus and that the outputs of the first step effectively improve the performance of PPI information extraction.
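The two-step combination described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: a binary ME model is equivalent to logistic regression, so it is trained here with plain stochastic gradient ascent. The toy sentences, the `PROT` placeholders, the feature templates, and every function name are assumptions made for this example.

```python
import math
from collections import defaultdict

def predict(weights, feats):
    """Return P(label=1 | feats) under a binary ME (logistic) model."""
    score = sum(weights.get(name, 0.0) * value for name, value in feats.items())
    score = max(-30.0, min(30.0, score))  # clamp to keep exp() well-behaved
    return 1.0 / (1.0 + math.exp(-score))

def train_maxent(samples, epochs=200, lr=0.5):
    """Fit weights by stochastic gradient ascent on the log-likelihood."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for feats, label in samples:
            error = label - predict(weights, feats)
            for name, value in feats.items():
                weights[name] += lr * error * value
    return weights

def sentence_features(tokens):
    """Step 1 features (hypothetical): bag of lower-cased words plus a bias."""
    feats = {"bias": 1.0}
    for tok in tokens:
        feats["word=" + tok.lower()] = 1.0
    return feats

def pair_features(tokens, i, j, step1_prob):
    """Step 2 features (hypothetical): words between the pair, plus step 1's output."""
    feats = {"bias": 1.0, "step1_prob": step1_prob}
    for tok in tokens[i + 1:j]:
        feats["between=" + tok.lower()] = 1.0
    return feats

# Toy data (assumed): tokens, sentence label, protein positions (i, j), pair label
DATA = [
    ("PROT1 binds PROT2".split(), 1, (0, 2), 1),
    ("PROT1 interacts with PROT2".split(), 1, (0, 3), 1),
    ("PROT1 and PROT2 were measured".split(), 0, (0, 2), 0),
    ("PROT1 levels and PROT2 levels increased".split(), 0, (0, 3), 0),
]

# Step 1: does the sentence contain PPI information?
step1 = train_maxent([(sentence_features(t), s) for t, s, _, _ in DATA])

# Step 2: does this protein pair interact? Step 1's probability is a feature.
step2 = train_maxent([
    (pair_features(t, i, j, predict(step1, sentence_features(t))), y)
    for t, _, (i, j), y in DATA
])

def extract(tokens, i, j):
    """Run both steps and return the pair-level interaction probability."""
    p_sentence = predict(step1, sentence_features(tokens))
    return predict(step2, pair_features(tokens, i, j, p_sentence))

for tokens, _, (i, j), gold in DATA:
    print(" ".join(tokens), "| gold:", gold, "| p:", round(extract(tokens, i, j), 3))
```

Passing the first step's probability as a real-valued feature is only one way to combine the steps. The abstract does not say whether the authors used the raw probability, a binned value, or the predicted label.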
Copyright information
© 2007 Springer Berlin Heidelberg
About this paper
Cite this paper
Sun, C., Lin, L., Wang, X., Guan, Y. (2007). Using Maximum Entropy Model to Extract Protein-Protein Interaction Information from Biomedical Literature. In: Huang, DS., Heutte, L., Loog, M. (eds) Advanced Intelligent Computing Theories and Applications. With Aspects of Theoretical and Methodological Issues. ICIC 2007. Lecture Notes in Computer Science, vol 4681. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74171-8_72
DOI: https://doi.org/10.1007/978-3-540-74171-8_72
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74170-1
Online ISBN: 978-3-540-74171-8
eBook Packages: Computer Science, Computer Science (R0)