DOI: 10.1145/2970398.2970405
research-article

Embedding-based Query Language Models

Published: 12 September 2016

Abstract

Word embeddings, which are low-dimensional vector representations of vocabulary terms that capture the semantic similarity between them, have recently been shown to achieve impressive performance in many natural language processing tasks. The use of word embeddings in information retrieval, however, has only begun to be studied. In this paper, we explore the use of word embeddings to enhance the accuracy of query language models in the ad-hoc retrieval task. To this end, we propose to use word embeddings to incorporate and weight terms that do not occur in the query but are semantically related to the query terms. We describe two embedding-based query expansion models with different assumptions. Since pseudo-relevance feedback methods, which use the top retrieved documents to update the original query model, are well known to be effective, we also develop an embedding-based relevance model, an extension of the effective and robust relevance model approach. In these models, we transform the widely used cosine similarity values with a sigmoid function to obtain more discriminative semantic similarity values. We evaluate our proposed methods on three TREC newswire and web collections. The experimental results demonstrate that the embedding-based methods significantly outperform competitive baselines in most cases. The embedding-based methods are also shown to be more robust than the baselines.
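The abstract combines two mechanisms: cosine similarity between word embeddings passed through a sigmoid, and an expansion step that adds semantically related non-query terms to the query language model. The Python sketch below illustrates both under stated assumptions. It is not the paper's estimation procedure. The sigmoid form 1/(1+exp(-a(cos - c))), the parameter names `a` and `c` and their defaults, the product-of-similarities scoring, the top-`k` cutoff, the interpolation weight `alpha`, and the function names are all hypothetical choices made for illustration. The embedding-based relevance model is not covered.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def sigmoid_similarity(u: Vector, v: Vector, a: float = 10.0, c: float = 0.8) -> float:
    """Sigmoid-transformed cosine: 1 / (1 + exp(-a * (cos(u, v) - c))).

    The steepness `a` and midpoint `c` are hypothetical defaults, not tuned values.
    """
    return 1.0 / (1.0 + math.exp(-a * (cosine(u, v) - c)))


def expanded_query_model(
    query: List[str],
    emb: Dict[str, Vector],
    k: int = 3,
    alpha: float = 0.5,
    a: float = 10.0,
    c: float = 0.8,
) -> Dict[str, float]:
    """Toy embedding-based query expansion (hypothetical scoring scheme)."""
    if not query:
        return {}

    # Maximum-likelihood estimate of the original query language model.
    mle: Dict[str, float] = {}
    for term in query:
        mle[term] = mle.get(term, 0.0) + 1.0 / len(query)

    # Score every non-query vocabulary term by the product of its
    # sigmoid similarities to the in-vocabulary query terms.
    scores: Dict[str, float] = {}
    for word, vec in emb.items():
        if word in mle:
            continue
        score = 1.0
        for q in query:
            if q in emb:
                score *= sigmoid_similarity(vec, emb[q], a, c)
        scores[word] = score

    # Keep the top-k candidates and normalize them into a distribution.
    top = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(s for _, s in top)
    if total == 0.0:
        return mle

    # Interpolate: alpha * p_MLE(w|q) + (1 - alpha) * p_expansion(w).
    model = {term: alpha * p for term, p in mle.items()}
    for word, s in top:
        model[word] = model.get(word, 0.0) + (1.0 - alpha) * s / total
    return model


# Toy 3-dimensional embeddings (made-up values, for illustration only).
emb = {
    "car": [1.0, 0.0, 0.0],
    "auto": [0.9, 0.1, 0.0],
    "vehicle": [0.8, 0.2, 0.1],
    "banana": [0.0, 0.0, 1.0],
}
model = expanded_query_model(["car"], emb, k=2)
print(model)
```

Interpolating with the original query model keeps the query terms dominant while giving related non-query terms ("auto", "vehicle") non-zero probability. A larger `a` spreads apart cosine values that sit near the midpoint `c`. This reflects the motivation the abstract gives for the sigmoid: raw cosine similarities between related words tend to be close to one another and therefore poorly discriminative.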

Information

Published In

ICTIR '16: Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval
September 2016
318 pages
ISBN:9781450344975
DOI:10.1145/2970398
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. language models
  2. pseudo-relevance feedback
  3. query expansion
  4. word embedding

Conference

ICTIR '16
Acceptance Rates

ICTIR '16 Paper Acceptance Rate 41 of 79 submissions, 52%;
Overall Acceptance Rate 235 of 527 submissions, 45%

Bibliometrics & Citations

Article Metrics

  • Downloads (last 12 months): 95
  • Downloads (last 6 weeks): 12
Reflects downloads up to 16 Oct 2024.

Cited By

  • (2024) Dimension Importance Estimation for Dense Information Retrieval. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1318-1328. DOI: 10.1145/3626772.3657691. Online publication date: 10-Jul-2024.
  • (2024) Exploring Image Similarity through Generative Language Models: A Comparative Study of GPT-4 with Word Embeddings and Traditional Approaches. 2024 IEEE International Conference on Electro Information Technology (eIT), pp. 275-279. DOI: 10.1109/eIT60633.2024.10609905. Online publication date: 30-May-2024.
  • (2024) Ranking Case Law by Context Dimensions Using Fuzzy Fingerprints. 2024 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), pp. 1-8. DOI: 10.1109/FUZZ-IEEE60900.2024.10612086. Online publication date: 30-Jun-2024.
  • (2024) Conditional variational autoencoder for query expansion in ad-hoc information retrieval. Information Sciences: an International Journal, 652(C). DOI: 10.1016/j.ins.2023.119764. Online publication date: 1-Jan-2024.
  • (2024) A Generative AI-Based Assistant to Evaluate Short and Long Answer Questions. SN Computer Science, 5(5). DOI: 10.1007/s42979-024-02965-4. Online publication date: 10-Jun-2024.
  • (2024) SeNSe: embedding alignment via semantic anchors selection. International Journal of Data Science and Analytics. DOI: 10.1007/s41060-024-00522-z. Online publication date: 20-Mar-2024.
  • (2024) Word2Vec-GloVe-BERT Embeddings for Query Expansion. Intelligent Systems Design and Applications, pp. 167-177. DOI: 10.1007/978-3-031-64836-6_17. Online publication date: 25-Jul-2024.
  • (2024) Event-Specific Document Ranking Through Multi-stage Query Expansion Using an Event Knowledge Graph. Advances in Information Retrieval, pp. 333-348. DOI: 10.1007/978-3-031-56060-6_22. Online publication date: 16-Mar-2024.
  • (2023) Semantics-aware query expansion using pseudo-relevance feedback. Journal of Information Science. DOI: 10.1177/01655515231184831. Online publication date: 22-Jul-2023.
  • (2023) ColBERT-PRF: Semantic Pseudo-Relevance Feedback for Dense Passage and Document Retrieval. ACM Transactions on the Web, 17(1), pp. 1-39. DOI: 10.1145/3572405. Online publication date: 16-Jan-2023.
