research-article

Generic Intent Representation in Web Search

Authors:

Paul N. Bennett,

Saurabh Tiwary

SIGIR'19: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval

Pages 65 - 74

https://doi.org/10.1145/3331184.3331198

Published: 18 July 2019

Abstract

This paper presents the GEneric iNtent Encoder (GEN Encoder), which learns a distributed representation space for user intent in search. Using large-scale user clicks from Bing search logs as weak supervision of user intent, GEN Encoder learns end-to-end to map queries with shared clicks into similar embeddings, and is then fine-tuned on multiple paraphrase tasks. Experimental results on an intrinsic evaluation task, query intent similarity modeling, demonstrate GEN Encoder's robust and significant advantages over previous representation methods. Ablation studies reveal the crucial role of learning from implicit user feedback in representing user intent, and the contribution of multi-task learning to the generality of the representation. We also demonstrate that GEN Encoder alleviates the sparsity of tail search traffic: by using efficient approximate nearest neighbor search to identify previous queries with the same search intent, it cuts the number of unseen queries in half. Finally, we demonstrate that distances between GEN encodings reflect certain information-seeking behaviors in search sessions.
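The lookup of previously seen queries with the same intent can be illustrated with a small sketch. This is not the GEN Encoder or the paper's retrieval system: the three-dimensional vectors are made-up stand-ins for learned query embeddings, the function names and the 0.9 similarity threshold are hypothetical, and an exact brute-force cosine search is used in place of the approximate nearest neighbor search the abstract mentions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_seen_query(new_vec, seen, threshold=0.9):
    """Return (query, similarity) for the closest seen query,
    or None when no seen query reaches the threshold.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    best_query, best_sim = None, -1.0
    for query, vec in seen.items():
        sim = cosine(new_vec, vec)
        if sim > best_sim:
            best_query, best_sim = query, sim
    if best_sim >= threshold:
        return best_query, best_sim
    return None


# Hypothetical embeddings; a trained encoder would produce these.
seen_queries = {
    "weather seattle": [0.9, 0.1, 0.0],
    "python list sort": [0.0, 0.2, 0.9],
}

# An unseen query whose embedding lies close to a seen one is matched to it.
match = nearest_seen_query([0.8, 0.2, 0.1], seen_queries)

# An unseen query far from every seen embedding stays unmatched.
miss = nearest_seen_query([0.1, 0.9, 0.1], seen_queries)
```

At the scale of search logs, the linear scan would be replaced by an approximate nearest neighbor index; the matching logic stays the same.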

Supplementary Material

MP4 File (cite2-11h20-d1.mp4), 506.60 MB

Index Terms

Generic Intent Representation in Web Search
1. Information systems
  1. Information retrieval
    1. Information retrieval query processing

Published In

SIGIR'19: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval

July 2019

1512 pages

ISBN:9781450361729

DOI:10.1145/3331184

General Chairs: Benjamin Piwowarski (CNRS - Sorbonne Universite, France), Max Chevalier (Universite de Toulouse, CNRS, France), Eric Gaussier (Universite Grenoble Alpes, CNRS, France)

Program Chairs: Yoelle Maarek (Amazon Research, Israel), Jian-Yun Nie (University of Montreal, Canada), Falk Scholer (RMIT University, Australia)

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGIR: ACM Special Interest Group on Information Retrieval

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

SIGIR '19

SIGIR '19: The 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval

July 21 - 25, 2019

Paris, France

Acceptance Rates

SIGIR'19 Paper Acceptance Rate: 84 of 426 submissions, 20%

Overall Acceptance Rate: 792 of 3,983 submissions, 20%
