Abstract
Manual curation of biological databases is an expensive and labor-intensive process in Genomics and Systems Biology. We report the implementation of a state-of-the-art, rule-based Natural Language Processing system that creates computer-readable networks of regulatory interactions directly from abstracts and full-text papers. We evaluate its output against a manually curated standard database, and test the possibilities and limitations of automatic and semi-automatic curation of the so-called biobibliome. We also propose a novel Regulatory Interaction Mining Markup Language suited for representing this data, useful both for biologists and for text-mining specialists.
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Rodríguez-Penagos, C., Salgado, H., Martínez-Flores, I., Collado-Vides, J. (2007). NLP-Based Curation of Bacterial Regulatory Networks. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2007. Lecture Notes in Computer Science, vol 4394. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-70939-8_51
DOI: https://doi.org/10.1007/978-3-540-70939-8_51
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-70938-1
Online ISBN: 978-3-540-70939-8
eBook Packages: Computer Science (R0)