Abstract
Alternative access paths to literature beyond mere keyword or bibliographic search are a major success factor in today’s digital libraries. Especially in the sciences, users need complex knowledge spaces and facettations in which entities such as chemical substances, genes, or mathematical formulae play a central role. However, even for clear-cut entities, the requirements in terms of contextualized similarities or rankings may differ strongly. In this paper, we show how deep learning techniques applied to scientific corpora lead to a strongly contextualized description of entities. As an application case, we take pharmaceutical entities in the form of small molecules and demonstrate how their learned contexts and profiles reflect their actual use as well as possible new uses, e.g., for drug design or repurposing. As our evaluation shows, the results are comparable to expensive, manually maintained classifications in the field. Since our techniques rely only on deep embeddings of textual documents, our methodology promises to generalize to other use cases as well.
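To make the notion of contextualized similarity over embeddings more concrete, the following is a minimal sketch of ranking entities by cosine similarity between their embedding vectors. It is an illustration only, not the method evaluated in the paper: the function names, the entity names (`substance_a`, and so on), and the three-dimensional vectors are hypothetical placeholders, and real embeddings would be learned from a document corpus.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def rank_by_similarity(query, embeddings):
    """Return (name, score) pairs for all other entities, most similar first."""
    query_vec = embeddings[query]
    scored = [
        (name, cosine_similarity(query_vec, vec))
        for name, vec in embeddings.items()
        if name != query
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical placeholder vectors, NOT learned from any corpus.
toy_embeddings = {
    "substance_a": [1.0, 0.0, 0.0],
    "substance_b": [0.9, 0.1, 0.0],
    "substance_c": [0.0, 1.0, 0.0],
}

ranking = rank_by_similarity("substance_a", toy_embeddings)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In such a setup, the ranked neighbours of a substance would serve as its learned context, which could then be compared against a manually maintained classification.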
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Wawrzinek, J., Balke, WT. (2017). Semantic Facettation in Pharmaceutical Collections Using Deep Learning for Active Substance Contextualization. In: Choemprayong, S., Crestani, F., Cunningham, S. (eds) Digital Libraries: Data, Information, and Knowledge for Digital Lives. ICADL 2017. Lecture Notes in Computer Science, vol 10647. Springer, Cham. https://doi.org/10.1007/978-3-319-70232-2_4
DOI: https://doi.org/10.1007/978-3-319-70232-2_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-70231-5
Online ISBN: 978-3-319-70232-2
eBook Packages: Computer Science, Computer Science (R0)