research-article

Adding text mining workflows as web services to the BioCatalogue

Authors:

Georgios Kontonasios,

Ioannis Korkontzelos,

BalaKrishna Kolluru,

Sophia Ananiadou

SWAT4LS '11: Proceedings of the 4th International Workshop on Semantic Web Applications and Tools for the Life Sciences

Pages 50 - 57

https://doi.org/10.1145/2166896.2166913

Published: 07 December 2011

Abstract

In this paper, we address the issue of automatically adding U-Compare workflows to the BioCatalogue by exploring the compatibility of UIMA with REST-like web services. We aim to make workflows consisting of state-of-the-art text mining components available to the bioinformatics community without requiring programming expertise or the installation of software library dependencies. We present a framework for developing UIMA-compliant, REST-like text mining services, which can either supplement the existing components or be used as bespoke, stand-alone workflows. The framework embodies U-Compare's component library and refactors Apache UIMA SimpleServer to provide a post-processing component for the analysis results, a human-readable access mechanism, and documentation templates. As an application, we implemented a number of new services, which are registered with the BioCatalogue.
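To illustrate what calling a REST-like text mining service of this kind might look like from the client side, the following is a minimal Python sketch. It is not the authors' implementation. The endpoint URL, the `text` and `mode` parameter names, and the shape of the XML response are all assumptions made for illustration; the actual services registered with the BioCatalogue may differ.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET


def build_request_url(base_url, text, mode="xml"):
    """Build a GET URL that submits `text` to a REST-like service.

    The parameter names `text` and `mode` are assumptions, not a documented API.
    """
    return base_url + "?" + urlencode({"text": text, "mode": mode})


def extract_annotations(xml_payload):
    """Post-process an XML result into (annotation type, covered text) pairs.

    Assumes a flat response where each child element is one annotation.
    """
    root = ET.fromstring(xml_payload)
    return [(element.tag, (element.text or "").strip()) for element in root]


# Hypothetical endpoint and hypothetical response, used only for illustration.
HYPOTHETICAL_ENDPOINT = "http://localhost:8080/services/ner"
SAMPLE_RESPONSE = "<result><Protein>p53</Protein><Species>human</Species></result>"

url = build_request_url(HYPOTHETICAL_ENDPOINT, "p53 in human cells")
annotations = extract_annotations(SAMPLE_RESPONSE)
```

The sketch keeps request construction separate from result post-processing, so the parsing step can be exercised on a stored response without contacting any server.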



Published In

SWAT4LS '11: Proceedings of the 4th International Workshop on Semantic Web Applications and Tools for the Life Sciences

December 2011

139 pages

ISBN:9781450310765

DOI:10.1145/2166896

Conference Chairs:
Adrian Paschke, Freie Universitaet Berlin, Germany
Albert Burger, Heriot-Watt University, and Medical Research Council, Edinburgh, Scotland, United Kingdom
Paolo Romano, IST National Cancer Research Institute, Genova, Italy
M. Scott Marshall, University of Amsterdam, The Netherlands
Andrea Splendiani, Rothamsted Research, UK

Copyright © 2011 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Ontotext
Corporate Semantic Web: Corporate Semantic Web
BBRC: Biotechnology and Biological Sciences Research Council
NCBO: National Center for BioMedical Ontology
BioMed Central: BioMed Central

In-Cooperation

SIGBio: ACM Special Interest Group on Bioinformatics
IW3C2: International World Wide Web Conference Committee

Publisher

Association for Computing Machinery

New York, NY, United States


Funding Sources

Biotechnology and Biological Sciences Research Council

Conference

SWAT4LS '11

SWAT4LS '11: 4th Int. Workshop on Semantic Web Applications and Tools for Life Sciences

December 7 - 9, 2011

London, United Kingdom
