Abstract
Pseudo-Relevance Feedback is a method for improving search engine results. It automatically extracts information from the results of an initial search, uses that information to expand the original query, and then runs the search again with the expanded query. In this paper, we apply a genetic algorithm to improve the Pseudo-Relevance Feedback method for searching medical texts. First, a set of candidate terms is constructed by extracting keywords from the documents returned by the initial search with the original query. Then, our proposed genetic algorithm selects the seed terms from the candidate term set, and these are merged with the original query to create a new query. The new query is then searched, returning the final ranked list of documents. Experimental results on the TREC 2014 CDS (Clinical Decision Support) dataset show that the proposed method outperforms a baseline Pseudo-Relevance Feedback method that does not use a genetic algorithm.
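The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the keyword extractor, the chromosome encoding, the genetic operators, or the fitness function, so everything here is an assumption. In particular, `candidate_terms`, `select_terms_ga`, and `expand_query` are hypothetical names, keyword extraction is reduced to raw term frequency, each individual is a bit mask over the candidate terms, and `fitness` is a placeholder supplied by the caller (in practice it might score the retrieval quality of the expanded query).

```python
import random
from collections import Counter


def candidate_terms(feedback_docs, query_terms, max_terms=10):
    """Collect the most frequent non-query tokens from the top-ranked documents.

    Assumption: plain term frequency stands in for keyword extraction.
    """
    counts = Counter(
        tok
        for doc in feedback_docs
        for tok in doc.lower().split()
        if tok not in query_terms
    )
    return [term for term, _ in counts.most_common(max_terms)]


def select_terms_ga(candidates, fitness, pop_size=20, generations=30,
                    mutation_rate=0.1, seed=0):
    """Evolve a bit mask over `candidates`; bit i == 1 keeps candidates[i].

    Assumption: `fitness` maps a list of terms to a number (higher is better).
    """
    rng = random.Random(seed)
    n = len(candidates)
    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]

    def score(mask):
        return fitness([t for t, bit in zip(candidates, mask) if bit])

    for _ in range(generations):
        # Truncation selection: the better half survives unchanged (elitism).
        population.sort(key=score, reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, n - 1) if n > 1 else 0
            child = a[:cut] + b[cut:]  # one-point crossover
            # Bit-flip mutation, applied independently to each position.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
        population = parents + children

    best = max(population, key=score)
    return [t for t, bit in zip(candidates, best) if bit]


def expand_query(query_terms, selected_terms):
    """Merge the selected terms into the original query, skipping duplicates."""
    return list(query_terms) + [t for t in selected_terms if t not in query_terms]
```

In a full system, the documents passed to `candidate_terms` would be the top results of the initial search, and the list returned by `expand_query` would be submitted to the same retrieval model for the second search.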
Acknowledgments
This work is funded by Vietnam National University at Ho Chi Minh City under the grant number B2016-42-01.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Nguyen, L., Cao, T. (2018). Pseudo-Relevance Feedback for Information Retrieval in Medicine Using Genetic Algorithms. In: Nguyen, N., Hoang, D., Hong, TP., Pham, H., Trawiński, B. (eds) Intelligent Information and Database Systems. ACIIDS 2018. Lecture Notes in Computer Science, vol 10752. Springer, Cham. https://doi.org/10.1007/978-3-319-75420-8_38
DOI: https://doi.org/10.1007/978-3-319-75420-8_38
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-75419-2
Online ISBN: 978-3-319-75420-8
eBook Packages: Computer Science, Computer Science (R0)