research-article

Understanding the impact of query expansion on federated search

Published: 21 June 2023

Abstract

Query expansion (QE) has been studied extensively in traditional search settings because it is effective at improving retrieval performance. However, the level of performance achieved in traditional settings has not been reported in the literature on federated search. Possible reasons include the lack of complete information about the corpus statistics of the databases and the diversity of their content. Nevertheless, several studies have experimented with different QE approaches and reported mixed results. This paper extends the findings of these publications by studying how a different source of expansion terms affects federated search. Specifically, the expansion terms are extracted from the uniform resource locators (URLs) of the documents returned by each database. Retrieval experiments with the TREC 2013 FedWeb dataset demonstrate that the query expanded using the proposed approach performs better than the unexpanded query in many instances.
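
To make the idea of URL-derived expansion terms concrete, the following is a minimal sketch and not the procedure used in the paper. It assumes that terms are taken from the URL path only, that each term is counted once per URL, and that the most frequent terms not already in the query are appended. The function names, the stopword list, and the cut-off `k` are all hypothetical choices made for illustration.

```python
import re
from collections import Counter
from urllib.parse import urlparse, unquote

# Hypothetical stopword list for URL tokens (an assumption, not from the paper).
URL_STOPWORDS = {"www", "html", "htm", "php", "asp", "index", "the", "and", "for"}


def url_tokens(url):
    """Split the path of a URL into lowercase alphabetic tokens.

    Assumption: only the path is used, since host names mostly name the site.
    """
    path = unquote(urlparse(url).path).lower()
    return [t for t in re.findall(r"[a-z]+", path)
            if len(t) > 2 and t not in URL_STOPWORDS]


def expansion_terms(query, urls, k=3):
    """Return up to k candidate expansion terms drawn from the result URLs."""
    query_terms = set(re.findall(r"[a-z]+", query.lower()))
    counts = Counter()
    for url in urls:
        # Count each term once per URL so a single long URL cannot dominate.
        counts.update(set(url_tokens(url)) - query_terms)
    # Most frequent first; ties are broken alphabetically for determinism.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


def expand_query(query, urls, k=3):
    """Append the selected expansion terms to the original query string."""
    return " ".join([query] + expansion_terms(query, urls, k))


if __name__ == "__main__":
    # Made-up URLs standing in for the results returned by one database.
    urls = [
        "https://www.example.org/wiki/Solar_panel_installation",
        "http://example.com/guides/solar-panel-efficiency.html",
        "https://energy.example.net/panel/installation-cost",
    ]
    print(expand_query("solar panel", urls))
```

With the three made-up URLs above, the query "solar panel" becomes "solar panel installation cost efficiency". In a federated setting this step would presumably be repeated per database, since each database returns its own result URLs.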

Published In

Multimedia Tools and Applications, Volume 83, Issue 4
Jan 2024
2884 pages

Publisher

Kluwer Academic Publishers
United States

Publication History

Published: 21 June 2023
Accepted: 10 May 2023
Revision received: 27 March 2023
Received: 19 January 2022

Author Tags

1. Distributed Information Retrieval
2. Federated Search
3. Results Merging
4. Query Expansion
5. Uncooperative Environments
