DOI: 10.1145/3404835.3463032
Short paper

Cross-Batch Negative Sampling for Training Two-Tower Recommenders

Published: 11 July 2021

Abstract

The two-tower architecture has been widely applied to learn user and item representations, which are important for large-scale recommender systems. Many two-tower models are trained with in-batch negative sampling strategies, whose effectiveness inherently depends on the mini-batch size. However, training two-tower models with large batches is inefficient: it requires a large amount of memory for user and item content and spends considerable time on feature encoding. Interestingly, we find that after a warm-up phase of training, neural encoders output relatively stable features for the same input. Based on this observation, we propose a simple yet effective sampling strategy called Cross-Batch Negative Sampling (CBNS), which reuses the encoded item embeddings from recent mini-batches as additional negatives to boost model training. Both theoretical analysis and empirical evaluations demonstrate the effectiveness and efficiency of CBNS.
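To make the mechanism concrete, here is a minimal sketch of cross-batch negatives in plain Python. It assumes dot-product scoring between the user and item towers, a plain softmax cross-entropy loss with no sampling-bias correction, and a fixed-size FIFO queue holding the item embeddings of the last few mini-batches. The names `CrossBatchNegatives` and `cbns_loss` are hypothetical, and this is an illustration of the idea, not the authors' implementation. Each user's paired item is the positive; the other in-batch items and all cached items act as negatives, so the number of negatives grows without encoding any extra items.

```python
import math
from collections import deque


def dot(u, v):
    """Dot-product score between a user embedding and an item embedding."""
    return sum(a * b for a, b in zip(u, v))


def softmax_xent(logits, target):
    """Softmax cross-entropy with the positive at index `target` (log-sum-exp for stability)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


class CrossBatchNegatives:
    """Hypothetical FIFO memory of item embeddings from the most recent mini-batches."""

    def __init__(self, num_batches):
        # deque with maxlen evicts the oldest mini-batch automatically
        self.memory = deque(maxlen=num_batches)

    def negatives(self):
        # flatten all cached mini-batches into one list of item embeddings
        return [e for batch in self.memory for e in batch]

    def push(self, item_embs):
        # store plain copies; in an autograd framework these would be gradient-stopped
        self.memory.append([list(e) for e in item_embs])


def cbns_loss(user_embs, item_embs, bank):
    """Mean loss over a mini-batch where user i's positive is item i.

    Negatives for user i: the other in-batch items plus every cached item.
    """
    cached = bank.negatives()
    total = 0.0
    for i, u in enumerate(user_embs):
        logits = [dot(u, v) for v in item_embs] + [dot(u, v) for v in cached]
        total += softmax_xent(logits, i)
    # enqueue this batch's items only after its own loss is computed
    bank.push(item_embs)
    return total / len(user_embs)
```

In an autograd framework the cached embeddings would be stored with gradients stopped (for example, `tensor.detach()` in PyTorch), because the encoder that produced them has since been updated. The abstract's observation that encoder outputs stabilize after warm-up is what makes these slightly stale embeddings usable, so a reasonable choice is to start filling the queue only after warm-up. Note that the cache may also hold a copy of a user's own positive item, which this sketch simply treats as a negative.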


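Cited By

  • (2024) Scalable Cross-Entropy Loss for Sequential Recommendations with Large Item Catalogs. Proceedings of the 18th ACM Conference on Recommender Systems, 475-485. DOI: 10.1145/3640457.3688140. Online publication date: 8 Oct 2024.
  • (2024) On the Effectiveness of Sampled Softmax Loss for Item Recommendation. ACM Transactions on Information Systems 42(4), 1-26. DOI: 10.1145/3637061. Online publication date: 22 Mar 2024.
  • (2024) RECE: Reduced Cross-Entropy Loss for Large-Catalogue Sequential Recommenders. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, 3772-3776. DOI: 10.1145/3627673.3679986. Online publication date: 21 Oct 2024.
  • (2024) ReFer: Retrieval-Enhanced Vertical Federated Recommendation for Full Set User Benefit. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 1763-1773. DOI: 10.1145/3626772.3657763. Online publication date: 10 Jul 2024.
  • (2024) Graph neural network recommendation algorithm based on improved dual tower model. Scientific Reports 14(1). DOI: 10.1038/s41598-024-54376-3. Online publication date: 15 Feb 2024.
  • (2024) Adaptive Fusion and Transfer Learning for Enhanced E-Commerce Recommendations. Procedia Computer Science 229(C), 345-356. DOI: 10.1016/j.procs.2023.12.037. Online publication date: 14 Mar 2024.
  • (2023) Global selection of contrastive batches via optimization on sample permutations. Proceedings of the 40th International Conference on Machine Learning, 29542-29562. DOI: 10.5555/3618408.3619637. Online publication date: 23 Jul 2023.
  • (2023) Batch-Mix Negative Sampling for Learning Recommendation Retrievers. Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, 494-503. DOI: 10.1145/3583780.3614789. Online publication date: 21 Oct 2023.
  • (2023) Revisiting Neural Retrieval on Accelerators. Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 5520-5531. DOI: 10.1145/3580305.3599897. Online publication date: 6 Aug 2023.
  • (2023) Que2Engage: Embedding-based Retrieval for Relevant and Engaging Products at Facebook Marketplace. Companion Proceedings of the ACM Web Conference 2023, 386-390. DOI: 10.1145/3543873.3584633. Online publication date: 30 Apr 2023.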


    Published In

    SIGIR '21: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval
    July 2021
    2998 pages
    ISBN: 9781450380379
    DOI: 10.1145/3404835
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. information retrieval
    2. neural networks
    3. recommender systems


    Conference

    SIGIR '21

    Acceptance Rates

    Overall acceptance rate: 792 of 3,983 submissions (20%)


    Article Metrics

    • Downloads (last 12 months): 102
    • Downloads (last 6 weeks): 15
    Reflects downloads up to 10 Nov 2024

