Abstract
Code search is a crucial task in software engineering that aims to retrieve relevant code from a codebase given a natural language query. Deep-learning-based code search methods have demonstrated impressive performance, and recent advances in contrastive learning have further enhanced their representation learning. Despite these improvements, existing methods still have limitations in representing multi-modal data: they suffer from semantic loss when encoding code and fail to fully exploit functionally relevant code pairs during representation learning. To address these limitations, we propose A Representation Fusion based Multi-View Momentum Contrastive Learning Framework for Code Search, named RFMC-CS. RFMC-CS retains both the semantic and the structural information of code through multi-modal representation and fusion. Through an elaborately designed multi-view momentum contrastive learning scheme, RFMC-CS further learns the correlations between different modalities of a sample and between semantically relevant samples. Experimental results on the CodeSearchNet benchmark show that RFMC-CS outperforms seven strong baselines on the MRR and Recall@k metrics. Ablation experiments illustrate the effectiveness of each component, and portability experiments show that RFMC-CS has good portability.
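To make the momentum-contrastive objective and the reported evaluation metrics concrete, the following is a minimal, illustrative PyTorch sketch of a MoCo-style momentum encoder update, an InfoNCE loss over a queue of negatives, and in-batch MRR/Recall@k computation. The function names, encoder interfaces, and hyperparameter values here are assumptions for illustration only, not RFMC-CS's exact implementation or configuration.

```python
import torch
import torch.nn.functional as F

# Illustrative MoCo-style momentum contrastive learning for code search.
# Encoder objects, queue handling, and hyperparameters (m, temperature, k)
# are assumptions for this sketch, not the paper's reported configuration.

@torch.no_grad()
def momentum_update(query_encoder, key_encoder, m=0.999):
    """Exponential-moving-average update of the momentum (key) encoder."""
    for q_param, k_param in zip(query_encoder.parameters(), key_encoder.parameters()):
        k_param.data.mul_(m).add_(q_param.data, alpha=1.0 - m)

def info_nce_loss(query_emb, pos_code_emb, neg_queue, temperature=0.07):
    """InfoNCE loss: a query embedding (e.g. a docstring) should score higher
    with its paired code embedding than with momentum-encoded negatives."""
    q = F.normalize(query_emb, dim=-1)         # (B, D)
    k_pos = F.normalize(pos_code_emb, dim=-1)  # (B, D)
    queue = F.normalize(neg_queue, dim=-1)     # (K, D)
    l_pos = (q * k_pos).sum(dim=-1, keepdim=True)  # (B, 1) positive logits
    l_neg = q @ queue.t()                          # (B, K) negative logits
    logits = torch.cat([l_pos, l_neg], dim=1) / temperature
    labels = torch.zeros(logits.size(0), dtype=torch.long, device=logits.device)
    return F.cross_entropy(logits, labels)  # the positive sits at index 0

def mrr_and_recall_at_k(similarity, k=10):
    """similarity: (N, N) query-to-candidate scores where the correct code
    snippet for query i is candidate i (the usual CodeSearchNet setup)."""
    gold = similarity.diagonal().unsqueeze(1)        # (N, 1) gold scores
    ranks = (similarity >= gold).sum(dim=1).float()  # 1-based rank of the gold snippet
    mrr = (1.0 / ranks).mean().item()
    recall_at_k = (ranks <= k).float().mean().item()
    return mrr, recall_at_k
```

In such a setup the query encoder is trained by gradient descent while momentum_update keeps the key encoder a slowly moving average of it, so the queued negative embeddings remain consistent across batches.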
Data availability
No datasets were generated or analysed during the current study.
Acknowledgements
This work was partially supported by the National Natural Science Foundation of China (No. 62250610224), and CCF-Zhipu Large Model Innovation Fund (No. CCF-Zhipu202408).
Author information
Contributions
GC: Conceptualization of this study, Methodology, Software, Investigation, Writing—original draft. WL: Methodology, Formal analysis, Investigation, Validation, Writing—review. XX: Conceptualization of this study, Supervision, Funding acquisition, Writing—review.
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Chen, G., Liu, W. & Xie, X. RFMC-CS: a representation fusion based multi-view momentum contrastive learning framework for code search. Autom Softw Eng 32, 16 (2025). https://doi.org/10.1007/s10515-025-00487-8