Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
10.1145/3331184.3331198acmconferencesArticle/Chapter ViewAbstractPublication PagesirConference Proceedingsconference-collections
research-article

Generic Intent Representation in Web Search

Published: 18 July 2019 Publication History
  • Get Citation Alerts
  • Abstract

    This paper presents GEneric iNtent Encoder (GEN Encoder) which learns a distributed representation space for user intent in search. Leveraging large scale user clicks from Bing search logs as weak supervision of user intent, GEN Encoder learns to map queries with shared clicks into similar embeddings end-to-end and then fine-tunes on multiple paraphrase tasks. Experimental results on an intrinsic evaluation task - query intent similarity modeling - demonstrate GEN Encoder's robust and significant advantages over previous representation methods. Ablation studies reveal the crucial role of learning from implicit user feedback in representing user intent and the contributions of multi-task learning in representation generality. We also demonstrate that GEN Encoder alleviates the sparsity of tail search traffic and cuts down half of the unseen queries by using an efficient approximate nearest neighbor search to effectively identify previous queries with the same search intent. Finally, we demonstrate distances between GEN encodings reflect certain information seeking behaviors in search sessions.

    Supplementary Material

    MP4 File (cite2-11h20-d1.mp4)

    References

    [1]
    Eugene Agichtein, Eric Brill, and Susan Dumais. 2006. Improving web search ranking by incorporating user behavior information. In Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval. ACM, 19--26.
    [2]
    Michael Bendersky, Donald Metzler, and W. Bruce Croft. 2011. Parameterized concept weighting in verbose queries. In Proceedings of the 34th annual international ACM SIGIR conference on Research and Development in Information Retrieval (SIGIR 2011). ACM, 605--614.
    [3]
    Andrei Broder. 2002. A taxonomy of web search. In ACM Sigir forum, Vol. 36. ACM, 3--10.
    [4]
    Andrei Z. Broder, Marcus Fontoura, Evgeniy Gabrilovich, Amruta Joshi, Vanja Josifovski, and Tong Zhang. 2007. Robust classification of rare queries using web Knowledge. In Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2007). ACM, 231--238.
    [5]
    Ben Carterette, Evangelos Kanoulas, Mark Hall, and Paul Clough. 2014. Overview of the TREC 2014 session track. In Proceedings of The 23rd Text Retrieval Conference (TREC 2014).
    [6]
    Daniel Cer, Yinfei Yang, Sheng-yi Kong, Nan Hua, Nicole Limtiaco, Rhomni St. John, Noah Constant, Mario Guajardo-Cespedes, Steve Yuan, Chris Tar, et al. 2018. Universal sentence encoder. arXiv preprint arXiv:1803.11175 (2018).
    [7]
    W. Bruce Croft, Donald Metzler, and Trevor Strohman. 2010. Search Engines: Information Retrieval in Practice. Addison-Wesley Reading.
    [8]
    Zhuyun Dai, Chenyan Xiong, Jamie Callan, and Zhiyuan Liu. 2018. Convolutional Neural Networks for Soft-Matching N-Grams in Ad-hoc Search. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining (WSDM 2018). ACM, 126--134.
    [9]
    Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018).
    [10]
    Fernando Diaz, Bhaskar Mitra, and Nick Craswell. 2016. Query Expansion with Locally-Trained Word Embeddings. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL 2016). 367--377.
    [11]
    Doug Downey, Susan Dumais, and Eric Horvitz. 2007. Heads and tails: studies of web search with common and rare queries. In Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval. ACM, 847--848.
    [12]
    Lev Finkelstein, Evgeniy Gabrilovich, Yossi Matias, Ehud Rivlin, Zach Solan, Gadi Wolfman, and Eytan Ruppin. 2002. Placing search in context: The concept revisited. ACM Transactions on information systems 20, 1 (2002), 116--131.
    [13]
    Jiafeng Guo, Yixing Fan, Qingyao Ai, and W. Bruce Croft. 2016. A deep relevance matching model for ad-hoc retrieval. In Proceedings of the 25th ACM International on Conference on Information and Knowledge Management. ACM, 55--64.
    [14]
    Ahmed Hassan, Ryen W. White, Susan T. Dumais, and Yi-Min Wang. 2014. Struggling or exploring? disambiguating long search sessions. In Proceedings of the 7th ACM international conference on Web search and data mining. ACM, 53--62.
    [15]
    Jian Hu, Gang Wang, Fred Lochovsky, Jian-tao Sun, and Zheng Chen. 2009. Understanding user's query intent with wikipedia. In Proceedings of the 18th international conference on World wide web. ACM, 471--480.
    [16]
    Po-Sen Huang, Xiaodong He, Jianfeng Gao, Li Deng, Alex Acero, and Larry Heck. 2013. Learning deep structured semantic models for web search using clickthrough data. In Proceedings of the 22nd ACM international conference on Conference on information & knowledge management (CIKM 2013). ACM, 2333--2338.
    [17]
    Yoon Kim, Yacine Jernite, David Sontag, and Alexander M. Rush. 2016. CharacterAware Neural Language Models. In AAAI. 2741--2749.
    [18]
    Victor Lavrenko and W. Bruce Croft. 2001. Relevance based language models. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2001). ACM, 120--127.
    [19]
    Xiao Li, Ye-YiWang, and Alex Acero. 2008. Learning query intent from regularized click graphs. In Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval. ACM, 339--346.
    [20]
    Yury A. Malkov and Dmitry A. Yashunin. 2018. Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs. IEEE transactions on pattern analysis and machine intelligence (2018).
    [21]
    Donald Metzler and W. Bruce Croft. 2005. A Markov random field model for term dependencies. In Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2005). ACM, 472--479.
    [22]
    Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Proceedings of the 27th Advances in Neural Information Processing Systems 2013 (NIPS 2013). NIPS, 3111--3119.
    [23]
    Andriy Mnih and Koray Kavukcuoglu. 2013. Learning word embeddings efficiently with noise-contrastive estimation. In Advances in neural information processing systems. 2265--2273.
    [24]
    Eric Nalisnick, Bhaskar Mitra, Nick Craswell, and Rich Caruana. 2016. Improving document ranking with dual word embeddings. In Proceedings of the 25th International Conference Companion on World Wide Web. International World Wide Web Conferences Steering Committee, 83--84.
    [25]
    Liang Pang, Yanyan Lan, Jiafeng Guo, Jun Xu, Jingfang Xu, and Xueqi Cheng. 2017. Deeprank: A new deep architecture for relevance ranking in information retrieval. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management. ACM, 257--266.
    [26]
    Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. Glove: Global vectors for word representation. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP 2014). 1532--1543.
    [27]
    Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep contextualized word representations. In Proc. of NAACL.
    [28]
    Navid Rekabsaz, Mihai Lupu, Allan Hanbury, and Hamed Zamani. 2017. Word Embedding Causes Topic Shifting; Exploit Global Context! In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 1105--1108.
    [29]
    Gerard Salton and Chris Buckley. 1990. Improving retrieval performance by relevance feedback. Journal of the American society for information science 41, 4 (1990), 288--297.
    [30]
    Tobias Schnabel, Igor Labutov, David Mimno, and Thorsten Joachims. 2015. Evaluation methods for unsupervised word embeddings. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. 298--307.
    [31]
    Dou Shen, Jian-Tao Sun, Qiang Yang, and Zheng Chen. 2006. Building bridges for web query classification. In Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval. ACM, 131--138.
    [32]
    Yelong Shen, Xiaodong He, Jianfeng Gao, Li Deng, and Grégoire Mesnil. 2014. Learning semantic representations using convolutional neural networks for web search. In Proceedings of the 23rd International Conference on World Wide Web. ACM, 373--374.
    [33]
    Rupesh K. Srivastava, Klaus Greff, and Jürgen Schmidhuber. 2015. Training very deep networks. In Advances in neural information processing systems (NeuIPS 2015). 2377--2385.
    [34]
    AlexWang, Amapreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel R. Bowman. 2018. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. arXiv preprint arXiv:1804.07461 (2018).
    [35]
    Chenyan Xiong and Jamie Callan. 2015. Query expansion with Freebase. In Proceedings of the fifth ACM International Conference on the Theory of Information Retrieval (ICTIR 2015). ACM, 111--120.
    [36]
    Chenyan Xiong, Zhuyun Dai, Jamie Callan, Zhiyuan Liu, and Russell Power. 2017. End-to-end neural ad-hoc ranking with kernel pooling. In Proceedings of the 40th annual international ACM SIGIR conference on Research and Development in Information Retrieval (SIGIR 2017). ACM, 55--64.
    [37]
    Xiaoxin Yin and Sarthak Shah. 2010. Building taxonomy of web search intents for name entity queries. In Proceedings of the 19th international conference on World wide web (WWW 2010). ACM, 1001--1010.
    [38]
    Hamed Zamani and W. Bruce Croft. 2016. Embedding-based query language models. In Proceedings of the 2016 ACM international conference on the theory of information retrieval (ICTIR 2016). ACM, 147--156.
    [39]
    Hamed Zamani and W. Bruce Croft. 2016. Estimating embedding vectors for queries. In Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval (ICTIR 2016). ACM, 123--132.
    [40]
    Hamed Zamani and W. Bruce Croft. 2017. Relevance-based word embedding. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2017). ACM, 505--514.
    [41]
    Hamed Zamani, Bhaskar Mitra, Xia Song, Nick Craswell, and Saurabh Tiwary. 2018. Neural Ranking Models with Multiple Document Fields. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining (WSDM 2018). 700--708.
    [42]
    Guoqing Zheng and James P. Callan. 2015. Learning to reweight terms with distributed representations. In Proceedings of the 38th annual international ACM SIGIR conference on Research and Development in Information Retrieval (SIGIR 2015). ACM, 575--584.

    Cited By

    View all
    • (2023)Can Generative LLMs Create Query Variants for Test Collections? An Exploratory StudyProceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval10.1145/3539618.3591960(1869-1873)Online publication date: 19-Jul-2023
    • (2023)Proactive Human-Machine ConversationsNeural Approaches to Conversational Information Retrieval10.1007/978-3-031-23080-6_7(145-167)Online publication date: 17-Mar-2023
    • (2022)Learning-to-Spell: Weak Supervision based Query Correction in E-Commerce Search with Small Strong LabelsProceedings of the 31st ACM International Conference on Information & Knowledge Management10.1145/3511808.3557113(3431-3440)Online publication date: 17-Oct-2022
    • Show More Cited By

    Recommendations

    Comments

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    SIGIR'19: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval
    July 2019
    1512 pages
    ISBN:9781450361729
    DOI:10.1145/3331184
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 18 July 2019

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. generic intent representation
    2. query embedding
    3. user intent

    Qualifiers

    • Research-article

    Conference

    SIGIR '19
    Sponsor:

    Acceptance Rates

    SIGIR'19 Paper Acceptance Rate 84 of 426 submissions, 20%;
    Overall Acceptance Rate 792 of 3,983 submissions, 20%

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)50
    • Downloads (Last 6 weeks)1
    Reflects downloads up to 26 Jul 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2023)Can Generative LLMs Create Query Variants for Test Collections? An Exploratory StudyProceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval10.1145/3539618.3591960(1869-1873)Online publication date: 19-Jul-2023
    • (2023)Proactive Human-Machine ConversationsNeural Approaches to Conversational Information Retrieval10.1007/978-3-031-23080-6_7(145-167)Online publication date: 17-Mar-2023
    • (2022)Learning-to-Spell: Weak Supervision based Query Correction in E-Commerce Search with Small Strong LabelsProceedings of the 31st ACM International Conference on Information & Knowledge Management10.1145/3511808.3557113(3431-3440)Online publication date: 17-Oct-2022
    • (2022)Recognizing Medical Search Query Intent by Few-shot LearningProceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval10.1145/3477495.3531789(502-512)Online publication date: 6-Jul-2022
    • (2022)IntentVizor: Towards Generic Query Guided Interactive Video Summarization2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)10.1109/CVPR52688.2022.01025(10493-10502)Online publication date: Jul-2022
    • (2022)CEQE to SQET: A study of contextualized embeddings for query expansionInformation Retrieval Journal10.1007/s10791-022-09405-y25:2(184-208)Online publication date: 22-Mar-2022
    • (2021)Hierarchical Composition Learning for Composed Query Image RetrievalProceedings of the 3rd ACM International Conference on Multimedia in Asia10.1145/3469877.3490601(1-7)Online publication date: 1-Dec-2021
    • (2021)Learning Multiple Intent Representations for Search QueriesProceedings of the 30th ACM International Conference on Information & Knowledge Management10.1145/3459637.3482445(669-679)Online publication date: 26-Oct-2021
    • (2021)Natural Language Understanding with Privacy-Preserving BERTProceedings of the 30th ACM International Conference on Information & Knowledge Management10.1145/3459637.3482281(1488-1497)Online publication date: 26-Oct-2021
    • (2021)APRF-Net: Attentive Pseudo-Relevance Feedback Network for Query CategorizationProceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval10.1145/3404835.3463041(1603-1607)Online publication date: 11-Jul-2021
    • Show More Cited By

    View Options

    Get Access

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media