DOI: 10.1145/3580305.3599263
Research Article · Open Access

BatchSampler: Sampling Mini-Batches for Contrastive Learning in Vision, Language, and Graphs

Published: 04 August 2023

Abstract

In-batch contrastive learning is a state-of-the-art self-supervised method that pulls semantically similar instances together while pushing dissimilar instances apart within a mini-batch. Key to its success is the negative-sharing strategy, in which every instance serves as a negative for the others within the mini-batch. Recent studies aim to improve performance by sampling hard negatives within the current mini-batch, but the quality of those negatives is bounded by the mini-batch itself. In this work, we instead propose to improve contrastive learning by sampling the mini-batches themselves from the input data. We present BatchSampler (the code is publicly available) to sample mini-batches of hard-to-distinguish instances, i.e., instances that are hard but true negatives of each other. To reduce false negatives in each mini-batch, we build a proximity graph over randomly selected instances. To form a mini-batch, we then run random walk with restart on the proximity graph, which concentrates the sample on hard-to-distinguish instances. BatchSampler is a simple and general technique that can be directly plugged into existing contrastive learning models in vision, language, and graphs. Extensive experiments on datasets of three modalities show that BatchSampler consistently improves strong contrastive models, with significant gains for SimCLR on ImageNet-100 (vision), SimCSE on STS (language), and GraphCL and MVGRL on graph datasets.
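To make the two-stage procedure concrete, the sketch below re-implements the idea from the abstract in NumPy: build a k-nearest-neighbor proximity graph over the embeddings of randomly selected instances, then run a random walk with restart (RWR) on that graph so that the visited nodes, which are mutually similar and therefore hard to distinguish, form the mini-batch. This is a minimal sketch under our own assumptions; the function names and hyperparameters (build_proximity_graph, sample_batch_rwr, k, restart_prob, max_steps) are illustrative, not the paper's API, and details such as graph construction and walk termination may differ from the released code.

```python
import numpy as np

def build_proximity_graph(embeddings, k=10):
    """k-NN proximity graph over randomly selected instances.

    Each node is linked to its k most cosine-similar neighbors, so
    adjacent instances are hard negatives for each other.
    """
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T
    np.fill_diagonal(sim, -np.inf)            # exclude self-loops
    return np.argsort(-sim, axis=1)[:, :k]    # (n, k) neighbor indices

def sample_batch_rwr(neighbors, batch_size, restart_prob=0.15,
                     max_steps=10_000, rng=None):
    """One mini-batch via random walk with restart on the proximity graph.

    The walk stays inside a neighborhood of mutually similar instances;
    restarting at the seed keeps it from drifting away, so the sampled
    batch is dense in hard-to-distinguish pairs.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = len(neighbors)
    seed = current = int(rng.integers(n))
    batch = {seed}
    for _ in range(max_steps):
        if len(batch) >= batch_size:
            break
        if rng.random() < restart_prob:
            current = seed                    # restart at the seed
        else:
            current = int(rng.choice(neighbors[current]))  # step to a neighbor
        batch.add(current)
    # Fallback: pad with random instances if the walk stalls in a
    # component smaller than the batch size.
    while len(batch) < batch_size:
        batch.add(int(rng.integers(n)))
    return list(batch)

# Usage: sample index batches for any in-batch contrastive objective
# (e.g., InfoNCE as used by SimCLR, SimCSE, or GraphCL).
emb = np.random.default_rng(0).normal(size=(1_000, 128)).astype(np.float32)
graph = build_proximity_graph(emb, k=10)
batch_idx = sample_batch_rwr(graph, batch_size=64)
```

In training, a natural choice is to rebuild the proximity graph periodically from the current encoder's embeddings so that the hardness of sampled batches tracks the model as it improves.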

Supplementary Material

MP4 File (BatchSampler-video.mp4)
Pre-recorded Presentation Video for KDD 2023

References

[1]
Eneko Agirre, Carmen Banea, Claire Cardie, Daniel Cer, Mona Diab, Aitor Gonzalez-Agirre, Weiwei Guo, Inigo Lopez-Gazpio, Montse Maritxalar, Rada Mihalcea, et al. 2015. SemEval-2015 Task 2: Semantic textual similarity, English, Spanish and pilot on interpretability. In SemEval'15. 252--263.
[2]
Eneko Agirre, Carmen Banea, Claire Cardie, Daniel M Cer, Mona T Diab, Aitor Gonzalez-Agirre, Weiwei Guo, Rada Mihalcea, German Rigau, and Janyce Wiebe. 2014. SemEval-2014 Task 10: Multilingual Semantic Textual Similarity. In SemEval'14. 81--91.
[3]
Eneko Agirre, Carmen Banea, Daniel Cer, Mona Diab, Aitor Gonzalez-Agirre, Rada Mihalcea, German Rigau, and Janyce Wiebe. 2016. SemEval-2016 Task 1: Semantic textual similarity, monolingual and cross-lingual evaluation. In SemEval'16.
[4]
Eneko Agirre, Daniel Cer, Mona Diab, and Aitor Gonzalez-Agirre. 2012. SemEval-2012 Task 6: A pilot on semantic textual similarity. In SemEval'12. 385--393.
[5]
Eneko Agirre, Daniel Cer, Mona Diab, Aitor Gonzalez-Agirre, and Weiwei Guo. 2013. *SEM 2013 shared task: Semantic textual similarity. In SemEval'13. 32--43.
[6]
Reid Andersen, Fan Chung, and Kevin Lang. 2006. Local graph partitioning using PageRank vectors. In FOCS'06.
[7]
Konstantin Avrachenkov, Remco van der Hofstad, and Marina Sokol. 2014. Personalized pagerank with node-dependent restart. In WAW'14. 23--33.
[8]
Mathilde Caron, Ishan Misra, Julien Mairal, Priya Goyal, Piotr Bojanowski, and Armand Joulin. 2020. Unsupervised learning of visual features by contrasting cluster assignments. In NIPS'20. 9912--9924.
[9]
Chih-Chung Chang and Chih-Jen Lin. 2011. LIBSVM: A library for support vector machines. TIST'11 (2011), 1--27.
[10]
Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. 2020. A simple framework for contrastive learning of visual representations. In ICML'20. PMLR, 1597--1607.
[11]
Xinlei Chen, Saining Xie, and Kaiming He. 2021. An empirical study of training self-supervised vision transformers. In ICCV'21. 9640--9649.
[12]
Ching-Yao Chuang, Joshua Robinson, Lin Yen-Chen, Antonio Torralba, and Stefanie Jegelka. 2020. Debiased Contrastive Learning. In NIPS'20.
[13]
Fan Chung and Alexander Tsiatas. 2010. Finding and visualizing graph clusters using pagerank optimization. In WAW'10. 86--97.
[14]
Fan Chung and Wenbo Zhao. 2010. PageRank and random walks on graphs. In Fete of combinatorics and computer science. 43--62.
[15]
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In NAACL'19. 4171--4186.
[16]
Tianyu Gao, Xingcheng Yao, and Danqi Chen. 2021. SimCSE: Simple contrastive learning of sentence embeddings. In EMNLP'21.
[17]
Aditya Grover and Jure Leskovec. 2016. node2vec: Scalable feature learning for networks. In KDD'16. 855--864.
[18]
Michael Gutmann and Aapo Hyvärinen. 2010. Noise-Contrastive Estimation: A new estimation principle for unnormalized statistical models. In AISTATS'10. 297--304.
[19]
Kaveh Hassani and Amir Hosein Khasahmadi. 2020. Contrastive multi-view representation learning on graphs. In ICML'20. PMLR, 4116--4126.
[20]
Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, and Ross Girshick. 2020. Momentum contrast for unsupervised visual representation learning. In CVPR'20.
[21]
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In CVPR'16. 770--778.
[22]
Jui-Ting Huang, Ashish Sharma, Shuying Sun, Li Xia, David Zhang, Philip Pronin, Janani Padmanabhan, Giuseppe Ottaviano, and Linjun Yang. 2020. Embedding-based retrieval in facebook search. In KDD'20. 2553--2561.
[23]
Tinglin Huang, Yuxiao Dong, Ming Ding, Zhen Yang, Wenzheng Feng, Xinyu Wang, and Jie Tang. 2021. MixGCF: An Improved Training Method for Graph Neural Network-based Recommender Systems. In KDD'21. 665--674.
[24]
Tri Huynh, Simon Kornblith, Matthew R Walter, Michael Maire, and Maryam Khademi. 2022. Boosting contrastive self-supervised learning with false negative cancellation. In WACV'22. 2785--2795.
[25]
Ashish Jaiswal, Ashwin Ramesh Babu, Mohammad Zaki Zadeh, Debapriya Banerjee, and Fillia Makedon. 2020. A survey on contrastive self-supervised learning. Technologies, Vol. 9, 1 (2020), 2.
[26]
Jeff Johnson, Matthijs Douze, and Hervé Jégou. 2019. Billion-scale similarity search with GPUs. IEEE Transactions on Big Data, Vol. 7, 3 (2019), 535--547.
[27]
Yannis Kalantidis, Mert Bulent Sariyildiz, Noe Pion, Philippe Weinzaepfel, and Diane Larlus. 2020. Hard Negative Mixing for Contrastive Learning. In NIPS'20.
[28]
Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih. 2020. Dense passage retrieval for open-domain question answering. In EMNLP'20.
[29]
Donald Ervin Knuth. 1997. The art of computer programming: Fundamental Algorithms. Vol. 1. Pearson Education.
[30]
Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692 (2019).
[31]
Marco Marelli, Stefano Menini, Marco Baroni, Luisa Bentivogli, Raffaella Bernardi, and Roberto Zamparelli. 2014. A SICK cure for the evaluation of compositional distributional semantic models. In LREC'14. 216--223.
[32]
Yu Meng, Chenyan Xiong, Payal Bajaj, Paul Bennett, Jiawei Han, and Xia Song. 2021. Coco-lm: Correcting and contrasting text sequences for language model pretraining. In NIPS'21.
[33]
Annamalai Narayanan, Mahinthan Chandramohan, Rajasekar Venkatesan, Lihui Chen, Yang Liu, and Shantanu Jaiswal. 2017. graph2vec: Learning distributed representations of graphs. In MLG'17.
[34]
Aaron van den Oord, Yazhe Li, and Oriol Vinyals. 2018. Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748 (2018).
[35]
Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. 1999. The PageRank citation ranking: Bringing order to the web. Stanford University Technical Report, (1999).
[36]
Petar Veličković, William Fedus, William L. Hamilton, Pietro Liò, Yoshua Bengio, and R. Devon Hjelm. 2019. Deep graph infomax. In ICLR'19.
[37]
Jiezhong Qiu, Qibin Chen, Yuxiao Dong, Jing Zhang, Hongxia Yang, Ming Ding, Kuansan Wang, and Jie Tang. 2020. GCC: Graph contrastive coding for graph neural network pre-training. In KDD'20. 1150--1160.
[38]
Joshua Robinson, Ching-Yao Chuang, Suvrit Sra, and Stefanie Jegelka. 2021. Contrastive Learning with Hard Negative Samples. In ICLR'21.
[39]
Nino Shervashidze, Pascal Schweitzer, Erik Jan van Leeuwen, Kurt Mehlhorn, and Karsten M Borgwardt. 2011. Weisfeiler-Lehman graph kernels. Journal of Machine Learning Research, Vol. 12, 9 (2011).
[40]
Daniel A Spielman and Shang-Hua Teng. 2013. A local clustering algorithm for massive graphs and its application to nearly linear time graph partitioning. SIAM Journal on Computing (2013).
[41]
Fan-Yun Sun, Jordan Hoffmann, Vikas Verma, and Jian Tang. 2020. InfoGraph: Unsupervised and semi-supervised graph-level representation learning via mutual information maximization. In ICLR'20.
[42]
Afrina Tabassum, Muntasir Wahed, Hoda Eldardiry, and Ismini Lourentzou. 2022. Hard negative sampling strategies for contrastive representation learning. arXiv preprint arXiv:2206.01197 (2022).
[43]
Junfeng Tian, Zhiheng Zhou, Man Lan, and Yuanbin Wu. 2017. ECNU at SemEval-2017 Task 1: Leverage kernel-based traditional NLP features and neural networks to build a universal model for multilingual and cross-lingual semantic textual similarity. In SemEval'17. 191--197.
[44]
Hanghang Tong, Christos Faloutsos, and Jia-Yu Pan. 2006. Fast random walk with restart and its applications. In ICDM'06.
[45]
Guangrun Wang, Keze Wang, Guangcong Wang, Philip HS Torr, and Liang Lin. 2021. Solving inefficiency of self-supervised representation learning. In ICCV'21. 9505--9515.
[46]
Mike Wu, Milan Mosse, Chengxu Zhuang, Daniel Yamins, and Noah Goodman. 2021. Conditional negative sampling for contrastive learning of visual representations. In ICLR'21.
[47]
Yawen Wu, Zhepeng Wang, Dewen Zeng, Yiyu Shi, and Jingtong Hu. 2021. Enabling on-device self-supervised contrastive learning with selective data contrast. In DAC'21. IEEE, 655--660.
[48]
Zhuofeng Wu, Sinong Wang, Jiatao Gu, Madian Khabsa, Fei Sun, and Hao Ma. 2020. CLEAR: Contrastive learning for sentence representation. arXiv preprint arXiv:2012.15466 (2020).
[49]
Zhirong Wu, Yuanjun Xiong, Stella X. Yu, and Dahua Lin. 2018. Unsupervised feature learning via non-parametric instance discrimination. In CVPR'18. 3733--3742.
[50]
Lee Xiong, Chenyan Xiong, Ye Li, Kwok-Fung Tang, Jialin Liu, Paul Bennett, Junaid Ahmed, and Arnold Overwijk. 2021. Approximate nearest neighbor negative contrastive learning for dense text retrieval. In ICLR'21.
[51]
Dongkuan Xu, Wei Cheng, Dongsheng Luo, Haifeng Chen, and Xiang Zhang. 2021. InfoGCL: Information-aware graph contrastive learning. In NIPS'21, Vol. 34. 30414--30425.
[52]
Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. 2019. How powerful are graph neural networks? In ICLR'19.
[53]
Pinar Yanardag and SVN Vishwanathan. 2015. Deep graph kernels. In KDD'15. 1365--1374.
[54]
Zhen Yang, Ming Ding, Chang Zhou, Hongxia Yang, Jingren Zhou, and Jie Tang. 2020. Understanding negative sampling in graph representation learning. In KDD'20. 1666--1676.
[55]
Zhitao Ying, Jiaxuan You, Christopher Morris, Xiang Ren, Will Hamilton, and Jure Leskovec. 2018. Hierarchical graph representation learning with differentiable pooling. In NIPS'18, Vol. 31.
[56]
Yuning You, Tianlong Chen, Yang Shen, and Zhangyang Wang. 2021. Graph contrastive learning automated. In ICML'21. PMLR, 12121--12132.
[57]
Yuning You, Tianlong Chen, Yongduo Sui, Ting Chen, Zhangyang Wang, and Yang Shen. 2020. Graph Contrastive Learning with Augmentations. In NIPS'20. 5812--5823.
[58]
Junchi Yu, Tingyang Xu, Yu Rong, Yatao Bian, Junzhou Huang, and Ran He. 2021. Graph information bottleneck for subgraph recognition. In ICLR'21.
[59]
Junchi Yu, Tingyang Xu, Yu Rong, Yatao Bian, Junzhou Huang, and Ran He. 2021. Recognizing predictive substructures with subgraph information bottleneck. IEEE Transactions on Pattern Analysis and Machine Intelligence (2021).
[60]
Shaofeng Zhang, Meng Liu, Junchi Yan, Hengrui Zhang, Lingxiao Huang, Xiaokang Yang, and Pinyan Lu. 2022. M-Mix: Generating Hard Negatives via Multi-sample Mixing for Contrastive Learning. In KDD'22. 2461--2470.
[61]
Weinan Zhang, Tianqi Chen, Jun Wang, and Yong Yu. 2013. Optimizing top-n collaborative filtering via dynamic negative item sampling. In SIGIR'13. 785--788.
[62]
Mingkai Zheng, Fei Wang, Shan You, Chen Qian, Changshui Zhang, Xiaogang Wang, and Chang Xu. 2021. Weakly supervised contrastive learning. In ICCV'21. 10042--10051.
[63]
Kun Zhou, Beichen Zhang, Wayne Xin Zhao, and Ji-Rong Wen. 2022. Debiased Contrastive Learning of Unsupervised Sentence Representations. In ACL'22.
[64]
Yanqiao Zhu, Yichen Xu, Qiang Liu, and Shu Wu. 2021a. An empirical study of graph contrastive learning. arXiv preprint arXiv:2109.01116 (2021).
[65]
Yanqiao Zhu, Yichen Xu, Feng Yu, Qiang Liu, Shu Wu, and Liang Wang. 2021b. Graph contrastive learning with adaptive augmentation. In WWW'21. 2069--2080.
[66]
Jiří Šíma and Satu Elisa Schaeffer. 2006. On the NP-completeness of some graph cluster measures. In SOFSEM'06.

Cited By

  • Does Negative Sampling Matter? A Review With Insights Into Its Theory and Applications. IEEE Transactions on Pattern Analysis and Machine Intelligence 46:8 (Aug 2024), 5692--5711. DOI: 10.1109/TPAMI.2024.3371473
  • StockCL: Selective Contrastive Learning for Stock Trend Forecasting via Learnable Concepts. In Database Systems for Advanced Applications (Sep 2024), 275--284. DOI: 10.1007/978-981-97-5575-2_20


Published In

KDD '23: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
August 2023, 5996 pages
ISBN: 9798400701030
DOI: 10.1145/3580305
This work is licensed under a Creative Commons Attribution 4.0 International License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. contrastive learning
    2. global hard negatives
    3. mini-batch sampling


Funding Sources

    • Technology and Innovation Major Project of the Ministry of Science and Technology of China
    • NSF of China for Distinguished Young Scholars

    Conference

    KDD '23

    Acceptance Rates

    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

