DOI: 10.1145/2166896.2166913
Research article

Adding text mining workflows as web services to the BioCatalogue

Published: 07 December 2011

Abstract

In this paper, we address the issue of automatically adding U-Compare workflows to the BioCatalogue by exploring the compatibility of UIMA with REST-like web services. Our aim is to make workflows built from state-of-the-art text mining components available to the bioinformatics community without requiring expertise in programming or in managing software library dependencies. We present a framework for developing UIMA-compliant, REST-like text mining services, which can either supplement the existing components or be used as bespoke, stand-alone workflows. The framework incorporates U-Compare's component library and refactors Apache UIMA SimpleServer to provide a component for post-processing the analysis results, a human-readable access mechanism, and documentation templates. As an application, we implemented a number of new services and registered them with the BioCatalogue.
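The abstract describes exposing a text mining workflow as a REST-like service and post-processing its results into a human-readable form. The following is a minimal sketch of that idea under stated assumptions; it is not the authors' framework and does not use any UIMA, U-Compare, or SimpleServer API. A hypothetical `toy_annotator` stands in for a workflow and emits stand-off annotations (character offsets plus a type); a hypothetical `to_inline` post-processes them into in-line XML; and a hypothetical WSGI `app` accepts POSTed plain text and returns either stand-off JSON or in-line XML, selected by an assumed `format` query parameter. The JSON field names (`begin`, `end`, `type`) and the `<doc>` root element are also assumptions.

```python
import json
from typing import Iterable, List, Tuple
from urllib.parse import parse_qs
from xml.sax.saxutils import escape

# Stand-off annotation: (begin, end, type), with character offsets into the input text.
Annotation = Tuple[int, int, str]


def toy_annotator(text: str) -> List[Annotation]:
    """Hypothetical stand-in for a U-Compare/UIMA workflow: tags every
    occurrence of a fixed list of gene names."""
    found = []
    for gene in ("p53", "BRCA1"):
        start = text.find(gene)
        while start != -1:
            found.append((start, start + len(gene), "Gene"))
            start = text.find(gene, start + len(gene))
    return sorted(found)


def to_inline(text: str, annotations: Iterable[Annotation]) -> str:
    """Post-processing step: merge stand-off annotations into in-line XML.
    Only non-overlapping spans are supported in this sketch."""
    parts, cursor = [], 0
    for begin, end, label in sorted(annotations):
        if begin < cursor:
            raise ValueError("overlapping annotations are not supported")
        parts.append(escape(text[cursor:begin]))  # untagged text before the span
        parts.append(f"<{label}>{escape(text[begin:end])}</{label}>")
        cursor = end
    parts.append(escape(text[cursor:]))  # trailing untagged text
    return "".join(parts)


def app(environ, start_response):
    """Minimal REST-like WSGI endpoint (hypothetical): POST plain text and
    receive stand-off JSON, or in-line XML when ?format=inline is given."""
    if environ.get("REQUEST_METHOD") != "POST":
        start_response("405 Method Not Allowed", [("Allow", "POST")])
        return [b""]
    length = int(environ.get("CONTENT_LENGTH") or 0)
    text = environ["wsgi.input"].read(length).decode("utf-8")
    annotations = toy_annotator(text)
    fmt = parse_qs(environ.get("QUERY_STRING", "")).get("format", ["standoff"])[0]
    if fmt == "inline":
        body = f"<doc>{to_inline(text, annotations)}</doc>".encode("utf-8")
        content_type = "application/xml; charset=utf-8"
    else:
        records = [{"begin": b, "end": e, "type": t} for b, e, t in annotations]
        body = json.dumps(records).encode("utf-8")
        content_type = "application/json"
    start_response("200 OK", [("Content-Type", content_type),
                              ("Content-Length", str(len(body)))])
    return [body]
```

For local experimentation, the app can be served with the standard library, e.g. `wsgiref.simple_server.make_server("", 8000, app).serve_forever()`, and queried with a POST to `/?format=inline`. Keeping annotations stand-off internally and converting to in-line markup only at the output stage is one way to serve both machine clients (which typically prefer offsets) and human readers (who typically prefer marked-up text).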

Published In

ACM Other conferences
SWAT4LS '11: Proceedings of the 4th International Workshop on Semantic Web Applications and Tools for the Life Sciences
December 2011
139 pages
ISBN:9781450310765
DOI:10.1145/2166896
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

  • Ontotext
  • Corporate Semantic Web
  • BBRC: Biotechnology and Biological Sciences Research Council
  • NCBO: National Center for BioMedical Ontology
  • BioMed Central

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. BioCatalogue
  2. UIMA
  3. annotations
  4. components
  5. in-line
  6. stand-off
  7. u-compare
  8. web service
  9. workflow

Qualifiers

  • Research-article

Conference

SWAT4LS '11
Article Metrics

  • Total citations: 0
  • Total downloads: 161
  • Downloads (last 12 months): 1
  • Downloads (last 6 weeks): 0
Reflects downloads up to 12 Sep 2024
