short-paper

Cross-Batch Negative Sampling for Training Two-Tower Recommenders

Authors:

Xiuqiang He

SIGIR '21: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval

Pages 1632 - 1636

https://doi.org/10.1145/3404835.3463032

Published: 11 July 2021

Abstract

The two-tower architecture has been widely applied for learning item and user representations, which is important for large-scale recommender systems. Many two-tower models are trained with in-batch negative sampling strategies, whose effectiveness inherently depends on the mini-batch size. However, training two-tower models with a large batch size is inefficient: it demands a large volume of memory for item and user contents and spends considerable time on feature encoding. Interestingly, we find that neural encoders output relatively stable features for the same input once the training process has warmed up. Based on this observation, we propose a simple yet effective sampling strategy called Cross-Batch Negative Sampling (CBNS), which reuses the encoded item embeddings from recent mini-batches to boost model training. Both theoretical analysis and empirical evaluations demonstrate the effectiveness and efficiency of CBNS.
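The abstract does not spell out the mechanics, so the following is only a minimal sketch of one plausible reading: item embeddings from recent mini-batches are cached in a fixed-size first-in-first-out queue and appended to the in-batch candidates as extra negatives. The names `ItemEmbeddingQueue`, `softmax_loss`, and `random_batch` are hypothetical, random vectors stand in for the outputs of the two towers, scores are assumed to be inner products, and no gradients are computed.

```python
import math
import random
from collections import deque


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def softmax_loss(user_vecs, item_vecs, extra_negatives=()):
    """Mean softmax cross-entropy over a batch.

    Row i of user_vecs is paired with row i of item_vecs (the positive).
    Every other in-batch item and every vector in extra_negatives acts as
    a negative for that user.
    """
    candidates = list(item_vecs) + list(extra_negatives)
    total = 0.0
    for i, u in enumerate(user_vecs):
        logits = [dot(u, c) for c in candidates]
        m = max(logits)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_z - logits[i]
    return total / len(user_vecs)


class ItemEmbeddingQueue:
    """Fixed-size FIFO store of item embeddings from recent mini-batches.

    Hypothetical helper: the queue size and eviction policy are assumptions.
    """

    def __init__(self, max_size):
        self._queue = deque(maxlen=max_size)  # oldest entries fall off

    def negatives(self):
        return list(self._queue)

    def push(self, item_vecs):
        # Store copies so later in-place changes cannot alter cached values.
        self._queue.extend(list(v) for v in item_vecs)


def random_batch(rng, batch_size, dim):
    """Random vectors standing in for encoder outputs."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(batch_size)]


rng = random.Random(0)
queue = ItemEmbeddingQueue(max_size=8)
for step in range(3):
    users = random_batch(rng, 4, 3)  # stand-in for user-tower outputs
    items = random_batch(rng, 4, 3)  # stand-in for item-tower outputs
    in_batch = softmax_loss(users, items)
    cross_batch = softmax_loss(users, items, queue.negatives())
    queue.push(items)  # reuse these embeddings in later steps
    print(step, len(queue.negatives()), round(in_batch, 4), round(cross_batch, 4))
```

In this sketch the cached embeddings cost no additional encoder passes, which is the efficiency argument the abstract makes; the trade-off is that they were produced by slightly older encoder parameters, which is why the stability of features after warm-up matters.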

Index Terms

Cross-Batch Negative Sampling for Training Two-Tower Recommenders
1. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
      1. Recommender systems

Published In

SIGIR '21: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval

July 2021

2998 pages

ISBN:9781450380379

DOI:10.1145/3404835

General Chairs: Fernando Diaz (Google), Chirag Shah (University of Washington), Torsten Suel (New York University)

Program Chairs: Pablo Castells (Universidad Autónoma de Madrid, Amazon), Rosie Jones (Spotify), Tetsuya Sakai (Waseda University)

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGIR: ACM Special Interest Group on Information Retrieval

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Short-paper

Conference

SIGIR '21

Sponsor:

SIGIR

SIGIR '21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval

July 11 - 15, 2021

Virtual Event, Canada

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%

Bibliometrics

Total Citations: 23
Total Downloads: 933
Downloads (last 12 months): 102
Downloads (last 6 weeks): 15

Reflects downloads up to 10 Nov 2024