Abstract
The increase in the volume and heterogeneity of biomedical data sources has motivated researchers to adopt Linked Data (LD) technologies to address the resulting integration challenges and enhance information discovery. As an integral part of the EU GRANATUM project, a Linked Biomedical Dataspace (LBDS) was developed to semantically interlink data from multiple sources and augment the design of in silico experiments for cancer chemoprevention drug discovery. The components of the LBDS enable both bioinformaticians and biomedical researchers to publish, link, query and visually explore the heterogeneous datasets. We have extensively evaluated the usability of the entire platform. In this paper, we showcase three workflows depicting real-world scenarios in which domain users employ the LBDS to intuitively retrieve meaningful information from the integrated sources. We report the important lessons learned from the challenges encountered and the experience accumulated during the collaborative processes, which should make it easier for LD practitioners to create such dataspaces in other domains. We also provide a concise set of generic recommendations for developing LD platforms useful for drug discovery.
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Hasnain, A. et al. (2014). Linked Biomedical Dataspace: Lessons Learned Integrating Data for Drug Discovery. In: Mika, P., et al. The Semantic Web – ISWC 2014. ISWC 2014. Lecture Notes in Computer Science, vol 8796. Springer, Cham. https://doi.org/10.1007/978-3-319-11964-9_8
DOI: https://doi.org/10.1007/978-3-319-11964-9_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11963-2
Online ISBN: 978-3-319-11964-9
eBook Packages: Computer Science (R0)