research-article

Modeling Queries with Contextual Snippets for Information Retrieval

Authors:

Jimmy Xiangji Huang,

Liang He

ACM Transactions on Intelligent Systems and Technology (TIST), Volume 9, Issue 4

Article No.: 47, Pages 1 - 26

https://doi.org/10.1145/3161607

Published: 31 January 2018

Abstract

Query expansion under the pseudo-relevance feedback (PRF) framework has been extensively studied in information retrieval. However, most expansion methods rely on the statistics of single terms, which can introduce many irrelevant query terms and degrade retrieval performance. To alleviate this problem, we propose an approach that incorporates PRF-based contextual snippets into a context-aware topic model to enhance query representations. Specifically, instead of selecting a series of independent terms, we exploit the contextual information of the query and focus on snippets of length n in the PRF documents. Furthermore, we propose a context-aware topic (CAT) model to mine the topic distributions of the query-relevant snippets, which we call fine contextual snippets. In contrast to traditional topic models, which infer topics from the whole corpus, we establish a bridge between the snippets and their corresponding PRF documents, which allows the topics to be modeled more precisely and efficiently. Finally, the topic distributions of the fine snippets are used to build context-aware and topic-sensitive query representations. To evaluate our approach, we integrate the resulting queries into a topic-based hybrid retrieval model and conduct extensive experiments on various TREC collections. The experimental results show that our query-modeling approach boosts retrieval performance more effectively than state-of-the-art methods.
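The snippet-selection step can be illustrated with a minimal sketch. The abstract does not specify how a snippet of length n is positioned, so the code below assumes a snippet is a window of up to n tokens roughly centered on each occurrence of a query term in a PRF document. The function name `extract_snippets` and the whitespace tokenization are hypothetical choices made for illustration, not the authors' implementation, and the CAT topic model itself is not shown.

```python
from typing import List


def extract_snippets(doc_tokens: List[str], query_terms: List[str], n: int) -> List[List[str]]:
    """Collect a window of up to n tokens around each query-term occurrence.

    Assumption (not from the paper): the window starts n // 2 tokens
    before the matched term and is clipped at the document boundaries.
    """
    query = {t.lower() for t in query_terms}
    half = n // 2
    snippets = []
    for i, tok in enumerate(doc_tokens):
        if tok.lower() in query:
            start = max(0, i - half)
            end = min(len(doc_tokens), start + n)
            snippets.append(doc_tokens[start:end])
    return snippets


# A toy feedback document, tokenized on whitespace for illustration only.
doc = ("the pseudo relevance feedback framework expands a query "
       "with terms from top ranked documents").split()
snips = extract_snippets(doc, ["query"], n=5)
print(snips)
```

In this sketch each snippet keeps the words surrounding a query term together, which is the contextual information that single-term expansion statistics would discard.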

Supplementary Material

a47-chen-apndx.pdf (chen.zip)

Supplemental movie, appendix, image, and software files for Modeling Queries with Contextual Snippets for Information Retrieval

76.25 KB

Cited By

Lichtenstein M, Rucks-Ahidiana Z. (2021). Contextual Text Coding: A Mixed-methods Approach for Large-scale Textual Data. Sociological Methods & Research 52:2 (606-641). Online publication date: 8-Feb-2021.
https://doi.org/10.1177/0049124120986191
Chen S, Hu Q, Song Y, He Y, Wu H, He L. (2019). Self-Attention based Network For Medical Query Expansion. 2019 International Joint Conference on Neural Networks (IJCNN) (1-9). Online publication date: Jul-2019.
https://doi.org/10.1109/IJCNN.2019.8852269
Chen Q, Hu Q, Huang J, He L. (2018). TAKer: Fine-Grained Time-Aware Microblog Search with Kernel Density Estimation. IEEE Transactions on Knowledge and Data Engineering 30:8 (1602-1615). Online publication date: 1-Aug-2018.
https://dl.acm.org/doi/10.1109/TKDE.2018.2794538

Index Terms

Modeling Queries with Contextual Snippets for Information Retrieval
1. Information systems
  1. Information retrieval

Published In

ACM Transactions on Intelligent Systems and Technology Volume 9, Issue 4

Research Survey and Regular Papers

July 2018

280 pages

ISSN:2157-6904

EISSN:2157-6912

DOI:10.1145/3183892

Editor:
Yu Zheng
Microsoft Research, China

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 31 January 2018

Accepted: 01 November 2017

Revised: 01 October 2017

Received: 01 July 2017

Published in TIST Volume 9, Issue 4

Qualifiers

Research-article
Research
Refereed

Funding Sources

National Natural Science Foundation of China
High Technology Research and Development Program of China
ORF-RE (Ontario Research Fund-Research Excellence) award in BRAIN Alliance
Natural Sciences & Engineering Research Council (NSERC) of Canada
NSERC CREATE award in ADERSIM
York Research Chairs (YRC)
