Abstract
In this paper, we develop an approach to semantic search that uses high-dimensional vector representations to infer the nature of the relationship between query concepts and other concepts in relevant documents. We do so by incorporating outside knowledge drawn from tens of millions of concept-relation-concept triplets, known as semantic predications, which were extracted from the biomedical literature by SemRep, a Natural Language Processing (NLP) system. Inference is performed in high-dimensional space using Expansion-by-Analogy, a novel analogical approach to pseudo-relevance feedback in which the relationships between query concepts and other concepts in the documents where they occur guide the query expansion process. The semantic-vector-based approaches developed in this work outperform a baseline bag-of-concepts model, and these improvements are most pronounced on queries that are not conducive to keyword-based search.
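The abstract compresses two ideas: encoding predications in a vector space, and using the relation inferred between a query concept and a document concept to steer query expansion. The following is a minimal sketch of both ideas, assuming binary spatter-code style vectors (XOR binding, bitwise majority-vote bundling, Hamming-based similarity). The class `PredicationSpace`, its methods, the dimensionality, and the toy predications are hypothetical illustrations. They are neither the authors' implementation nor SemRep output, and the paper's actual Expansion-by-Analogy procedure and vector representations may differ.

```python
import random

DIM = 2048  # assumed dimensionality for this toy example


def random_vector(rng: random.Random) -> int:
    """Random binary vector, stored as a DIM-bit Python int."""
    return rng.getrandbits(DIM)


def bind(a: int, b: int) -> int:
    """XOR binding: self-inverse, so bind(bind(a, b), b) == a."""
    return a ^ b


def bundle(vectors: list, rng: random.Random) -> int:
    """Bitwise majority vote (superposition); ties are broken randomly."""
    counts = [0] * DIM
    for v in vectors:
        for i in range(DIM):
            counts[i] += (v >> i) & 1
    tie_breaker = rng.getrandbits(DIM)
    out = 0
    for i, c in enumerate(counts):
        if 2 * c > len(vectors) or (2 * c == len(vectors) and (tie_breaker >> i) & 1):
            out |= 1 << i
    return out


def similarity(a: int, b: int) -> float:
    """1.0 for identical vectors, about 0.0 for unrelated random vectors."""
    return 1.0 - 2.0 * bin(a ^ b).count("1") / DIM


class PredicationSpace:
    """Toy predication-based index: S(c) bundles R(rel) XOR E(obj) for each (c, rel, obj)."""

    def __init__(self, predications: list, seed: int = 0):
        rng = random.Random(seed)
        concepts = sorted({p[0] for p in predications} | {p[2] for p in predications})
        relations = sorted({p[1] for p in predications})
        self.elemental = {c: random_vector(rng) for c in concepts}
        self.relation = {r: random_vector(rng) for r in relations}
        bound = {}
        for subj, rel, obj in predications:
            bound.setdefault(subj, []).append(bind(self.relation[rel], self.elemental[obj]))
        self.semantic = {c: bundle(vs, rng) for c, vs in bound.items()}

    def infer_relation(self, query: str, other: str) -> str:
        """Unbind E(other) from S(query), then clean up against the known relation vectors."""
        probe = bind(self.semantic[query], self.elemental[other])
        return max(self.relation, key=lambda r: similarity(probe, self.relation[r]))

    def expand(self, query: str, feedback: list, k: int = 1) -> list:
        """Rank new concepts that bear the inferred relation(s) to the query concept."""
        relations = {self.infer_relation(query, d) for d in feedback}
        exclude = {query, *feedback}

        def score(c: str) -> float:
            # S(query) XOR R(rel) is most similar to E(obj) for objects
            # linked to the query concept by that relation
            return max(similarity(self.elemental[c], bind(self.semantic[query], self.relation[r]))
                       for r in relations)

        candidates = [c for c in self.elemental if c not in exclude]
        return sorted(candidates, key=score, reverse=True)[:k]


# Invented toy predications; not drawn from SemMedDB.
PREDICATIONS = [
    ("insulin", "TREATS", "diabetes"),
    ("insulin", "TREATS", "hyperglycemia"),
    ("insulin", "INTERACTS_WITH", "glucose"),
    ("metformin", "TREATS", "diabetes"),
    ("aspirin", "TREATS", "headache"),
    ("aspirin", "INTERACTS_WITH", "warfarin"),
]

space = PredicationSpace(PREDICATIONS, seed=42)
print(space.infer_relation("insulin", "diabetes"))          # TREATS
print(space.expand("insulin", feedback=["diabetes"], k=1))  # ['hyperglycemia']
```

Because XOR is its own inverse, unbinding uses the same operation as binding, and the clean-up step in `infer_relation` snaps the noisy probe to the nearest known relation vector. In this toy run, a pseudo-relevant document mentioning "diabetes" leads the sketch to infer TREATS and then to propose "hyperglycemia" as an expansion concept that stands in the same relation to "insulin".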
Acknowledgments
This research was supported by US National Library of Medicine grants R21 LM010826 and R01 LM011563. It was also supported in part by the Intramural Research Program of the US National Institutes of Health, National Library of Medicine. We would like to thank Lance De Vine for contributing the CHRR implementation that was used in this research.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Cohen, T., Widdows, D., Rindflesch, T. (2015). Expansion-by-Analogy: A Vector Symbolic Approach to Semantic Search. In: Atmanspacher, H., Bergomi, C., Filk, T., Kitto, K. (eds) Quantum Interaction. QI 2014. Lecture Notes in Computer Science, vol. 8951. Springer, Cham. https://doi.org/10.1007/978-3-319-15931-7_5
DOI: https://doi.org/10.1007/978-3-319-15931-7_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-15930-0
Online ISBN: 978-3-319-15931-7
eBook Packages: Computer Science, Computer Science (R0)