DOI: 10.1145/3543873.3587629
Research Article · Open Access

Blend and Match: Distilling Semantic Search Models with Different Inductive Biases and Model Architectures

Published: 30 April 2023

Abstract

Commercial search engines use different semantic models to augment lexical matches. These models retrieve candidate items for a user's query from a target space of millions to billions of items. Models with different inductive biases produce relatively different predictions, making it desirable to launch multiple semantic models in production. However, latency and resource constraints make deploying multiple models simultaneously impractical. In this paper, we introduce a distillation approach, called Blend and Match (BM), to unify two different semantic search models into a single model. We use a Bi-encoder semantic matching model as our primary model and propose a novel loss function to incorporate the predictions of an eXtreme Multi-label Classification (XMC) model as the secondary model. Our experiments, conducted on two large-scale datasets collected from a popular e-commerce store, show that our proposed approach significantly improves the recall of the primary Bi-encoder model by 11% to 17% with a minimal loss in precision. We show that traditional knowledge distillation approaches result in sub-optimal performance in our setting, and that our BM approach yields rankings comparable to strong Rank Fusion (RF) methods, which would otherwise require deploying multiple models.
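The abstract does not spell out the BM objective, so the following is a minimal, hypothetical PyTorch sketch of the general idea it describes: a Bi-encoder student trained on labeled query-item pairs while also matching soft targets derived from an XMC model's scores over the same candidates. The function name, the blending weight alpha, the temperature, and the KL-based distillation term are illustrative assumptions, not the authors' exact loss.

    # Hypothetical sketch: blend a supervised Bi-encoder objective with soft
    # targets from an XMC teacher. Not the paper's exact BM formulation.
    import torch
    import torch.nn.functional as F

    def blend_and_match_loss(query_emb, item_emb, relevance_labels, xmc_scores,
                             alpha=0.5, temperature=2.0):
        """query_emb:        (B, d)    student query embeddings
           item_emb:         (B, K, d) embeddings of K candidate items per query
           relevance_labels: (B, K)    binary ground-truth relevance
           xmc_scores:       (B, K)    teacher (XMC) scores for the same candidates"""
        # Student similarity between each query and its K candidates.
        sims = torch.einsum("bd,bkd->bk", query_emb, item_emb)

        # Primary term: supervised softmax cross-entropy against labeled positives.
        log_probs = F.log_softmax(sims, dim=-1)
        target = relevance_labels / relevance_labels.sum(dim=-1, keepdim=True).clamp(min=1.0)
        supervised = -(target * log_probs).sum(dim=-1).mean()

        # Secondary term: match the teacher's score distribution (KL divergence),
        # so the student absorbs the XMC model's different inductive bias.
        teacher_probs = F.softmax(xmc_scores / temperature, dim=-1)
        student_log_probs = F.log_softmax(sims / temperature, dim=-1)
        distill = F.kl_div(student_log_probs, teacher_probs,
                           reduction="batchmean") * (temperature ** 2)

        return (1.0 - alpha) * supervised + alpha * distill

    # Toy usage with random tensors (B=4 queries, K=8 candidates, d=32 dims).
    q = torch.randn(4, 32)
    items = torch.randn(4, 8, 32)
    labels = (torch.rand(4, 8) > 0.7).float()
    teacher = torch.randn(4, 8)
    loss = blend_and_match_loss(q, items, labels, teacher)

Setting alpha to 0 recovers plain supervised Bi-encoder training, while larger values push the student toward the secondary model's predictions; the paper's actual trade-off mechanism may differ.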


Published In

WWW '23 Companion: Companion Proceedings of the ACM Web Conference 2023
April 2023
1567 pages
ISBN:9781450394192
DOI:10.1145/3543873
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 30 April 2023

Author Tags

  1. Model Blending
  2. Product Search
  3. Ranking Distillation
  4. Semantic Search

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

WWW '23: The ACM Web Conference 2023
April 30 - May 4, 2023
Austin, TX, USA

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

