research-article

Convolutional Neural Networks for Soft-Matching N-Grams in Ad-hoc Search

Authors:

Zhiyuan Liu

WSDM '18: Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining

Pages 126 - 134

https://doi.org/10.1145/3159652.3159659

Published: 02 February 2018

Abstract

This paper presents Conv-KNRM, a Convolutional Kernel-based Neural Ranking Model that models n-gram soft matches for ad-hoc search. Instead of exactly matching query and document n-grams, Conv-KNRM uses Convolutional Neural Networks to represent n-grams of various lengths and soft-matches them in a unified embedding space. The n-gram soft matches are then used by the kernel pooling and learning-to-rank layers to generate the final ranking score. Conv-KNRM can be learned end-to-end and fully optimized from user feedback. The learned model's generalizability is investigated by testing how well it performs in a related domain with small amounts of training data. Experiments on English search logs, Chinese search logs, and TREC Web track tasks demonstrated consistent advantages of Conv-KNRM over prior neural IR methods and feature-based methods.
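
The pipeline the abstract describes (convolution over word embeddings to form n-gram vectors, soft matching in a shared space, kernel pooling, then a ranking layer) can be illustrated with a minimal, untrained sketch in plain Python. Everything concrete here is an assumption made for illustration and is not taken from the abstract: the function names, the ReLU activation, cosine similarity as the soft-match measure, RBF kernels with a single shared width `sigma`, the log-sum pooling, the `tanh` output, and the random toy embeddings and weights.

```python
import math
import random

def conv_ngrams(emb, filters, h):
    """Slide a window of h word vectors over the text. Each filter produces one
    output dimension (ReLU assumed), giving one vector per n-gram of length h."""
    out = []
    for i in range(len(emb) - h + 1):
        window = [x for vec in emb[i:i + h] for x in vec]  # concatenate h embeddings
        out.append([max(0.0, sum(w * x for w, x in zip(f, window))) for f in filters])
    return out

def cosine(u, v):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)

def kernel_pool(sim, mus, sigma):
    """For each RBF kernel (mean mu), count soft matches per query n-gram over
    all document n-grams, then sum the logs of those counts over the query."""
    feats = []
    for mu in mus:
        total = 0.0
        for row in sim:
            k = sum(math.exp(-((s - mu) ** 2) / (2 * sigma ** 2)) for s in row)
            total += math.log(max(k, 1e-10))  # clamp to avoid log(0)
        feats.append(total)
    return feats

def conv_knrm_score(q_emb, d_emb, filters_by_len, mus, sigma, weights, bias):
    """Cross-match query and document n-grams of every length pair, pool each
    similarity matrix with the kernels, and combine the features linearly."""
    q_grams = {h: conv_ngrams(q_emb, f, h) for h, f in filters_by_len.items()}
    d_grams = {h: conv_ngrams(d_emb, f, h) for h, f in filters_by_len.items()}
    feats = []
    for hq in sorted(q_grams):
        for hd in sorted(d_grams):
            sim = [[cosine(q, d) for d in d_grams[hd]] for q in q_grams[hq]]
            feats.extend(kernel_pool(sim, mus, sigma))
    return math.tanh(sum(w * f for w, f in zip(weights, feats)) + bias)

# Toy setup: random embeddings and weights stand in for learned parameters.
rng = random.Random(0)
dim, n_filters = 4, 3

def rand_vecs(n, d):
    return [[rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(n)]

filters_by_len = {h: rand_vecs(n_filters, h * dim) for h in (1, 2)}  # unigrams, bigrams
mus = [1.0, 0.5, 0.0, -0.5]
sigma = 0.1
n_feats = len(filters_by_len) ** 2 * len(mus)
weights = [rng.uniform(-0.1, 0.1) for _ in range(n_feats)]

query = rand_vecs(3, dim)  # 3-word query
doc = rand_vecs(6, dim)    # 6-word document
score = conv_knrm_score(query, doc, filters_by_len, mus, sigma, weights, 0.0)
print(score)
```

In a trained model, the embeddings, convolution filters, and ranking weights would be learned jointly from relevance signals such as the search logs mentioned above; the sketch only traces the forward computation.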

Index Terms

1. Information systems
  1. Information retrieval
    1. Retrieval models and ranking

Published In

WSDM '18: Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining

February 2018

821 pages

ISBN:9781450355810

DOI:10.1145/3159652

General Chairs: Yi Chang (Jilin University, Huawei Inc.), Chengxiang Zhai (University of Illinois Urbana-Champaign)

Program Chairs: Yan Liu (University of Southern California), Yoelle Maarek (Amazon)

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

National Science Foundation

Conference

WSDM 2018

WSDM 2018: The Eleventh ACM International Conference on Web Search and Data Mining

February 5 - 9, 2018

Marina Del Rey, CA, USA

Acceptance Rates

WSDM '18 Paper Acceptance Rate 81 of 514 submissions, 16%;

Overall Acceptance Rate 498 of 2,863 submissions, 17%
