RFMC-CS: a representation fusion based multi-view momentum contrastive learning framework for code search

Published in: Automated Software Engineering

Abstract

Code search is a crucial task in software engineering that aims to retrieve relevant code from a codebase given a natural language query. While deep-learning-based code search methods have demonstrated impressive performance, recent advances in contrastive learning have further enhanced the representation learning of these models. Despite these improvements, existing methods still have limitations in representation learning over multi-modal data. Specifically, they suffer from semantic loss when learning code representations and fail to fully exploit functionally relevant code pairs during representation learning. To address these limitations, we propose a Representation Fusion based Multi-View Momentum Contrastive Learning Framework for Code Search, named RFMC-CS. RFMC-CS effectively retains the semantic and structural information of code through multi-modal representation and fusion. Through its carefully designed multi-view momentum contrastive learning, RFMC-CS further learns the correlations between different modalities of a sample and between semantically relevant samples. Experimental results on the CodeSearchNet benchmark show that RFMC-CS outperforms seven advanced baselines on the MRR and Recall@k metrics. Ablation experiments illustrate the effectiveness of each component, and portability experiments show that RFMC-CS is readily portable.
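
As background, the sketch below shows the generic MoCo-style momentum update and InfoNCE objective that momentum contrastive methods build on (He et al. 2020). It is a minimal illustration under our own naming, not the authors' RFMC-CS implementation; the batch and queue shapes are assumptions for the example.

```python
import torch
import torch.nn.functional as F

def momentum_update(query_encoder, key_encoder, m=0.999):
    # EMA update of the key (momentum) encoder from the query encoder.
    for q_p, k_p in zip(query_encoder.parameters(), key_encoder.parameters()):
        k_p.data = m * k_p.data + (1.0 - m) * q_p.data

def info_nce_loss(query_emb, key_emb, queue, temperature=0.07):
    # InfoNCE over one positive per query plus a queue of cached negatives.
    q = F.normalize(query_emb, dim=1)                  # (B, D)
    k = F.normalize(key_emb, dim=1)                    # (B, D)
    l_pos = (q * k).sum(dim=1, keepdim=True)           # (B, 1) positive logits
    l_neg = q @ queue.t()                              # (B, K) negative logits
    logits = torch.cat([l_pos, l_neg], dim=1) / temperature
    labels = torch.zeros(q.size(0), dtype=torch.long)  # positives sit at index 0
    return F.cross_entropy(logits, labels)

# Toy usage: 8 query/key pairs of dim 16 and a queue of 64 negatives.
q_emb, k_emb = torch.randn(8, 16), torch.randn(8, 16)
queue = F.normalize(torch.randn(64, 16), dim=1)
loss = info_nce_loss(q_emb, k_emb, queue)
```

For the reported metrics, MRR averages the reciprocal rank 1/r of the correct code snippet over all queries, and Recall@k is the fraction of queries whose correct snippet appears in the top k results.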

Data availability

No datasets were generated or analysed during the current study.

Notes

  1. BGE: https://huggingface.co/BAAI/bge-large-en-v1.5.

  2. Tree-sitter: https://tree-sitter.github.io/tree-sitter/.
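
For readers unfamiliar with Tree-sitter, the snippet below sketches how source code can be parsed into a syntax tree with the py-tree-sitter bindings. The example code, the walk helper, and the assumed packages (py-tree-sitter >= 0.22 plus the tree-sitter-python grammar wheel) are illustrative, not necessarily the authors' exact setup.

```python
from tree_sitter import Language, Parser
import tree_sitter_python as tspython  # prebuilt Python grammar

PY_LANGUAGE = Language(tspython.language())
parser = Parser(PY_LANGUAGE)

tree = parser.parse(b"def add(a, b):\n    return a + b\n")

def walk(node, depth=0):
    # Print each node type of the syntax tree, indented by depth.
    print("  " * depth + node.type)
    for child in node.children:
        walk(child, depth + 1)

walk(tree.root_node)  # module -> function_definition -> parameters -> ...
```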

References

  • Cambronero, J., Li, H., Kim, S., Sen, K., Chandra, S.: When deep learning met code search. In: Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pp. 964–974. Association for Computing Machinery, New York, NY, USA (2019). https://doi.org/10.1145/3338906.3340458

  • Chai, Y., Zhang, H., Shen, B., Gu, X.: Cross-domain deep code search with meta learning. In: Proceedings of the 44th International Conference on Software Engineering, pp. 487–498. Association for Computing Machinery, New York, NY, USA (2022). https://doi.org/10.1145/3510003.3510125

  • Chen, T., Kornblith, S., Norouzi, M., Hinton, G.: A simple framework for contrastive learning of visual representations. In: Proceedings of the 37th International Conference on Machine Learning, vol. 119, pp. 1597–1607. PMLR (2020)

  • Cheng, Y., Kuang, L.: CSRS: code search with relevance matching and semantic matching. In: 2022 IEEE/ACM 30th International Conference on Program Comprehension, pp. 533–542 (2022). https://doi.org/10.1145/3524610.3527889

  • Devlin, J., Chang, M.-W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, vol. 1, pp. 4171–4186. Association for Computational Linguistics, Minneapolis, Minnesota (2019). https://doi.org/10.18653/v1/N19-1423

  • Di Grazia, L., Pradel, M.: Code search: a survey of techniques for finding code. ACM Comput. Surv. (2023). https://doi.org/10.1145/3565971

  • Ding, Y., Buratti, L., Pujar, S., Morari, A., Ray, B., Chakraborty, S.: Contrastive learning for source code with structural and functional properties (2021). CoRR arXiv:2110.03868 [cs.PL]

  • Fang, H., Wang, S., Zhou, M., Ding, J., Xie, P.: CERT: contrastive self-supervised learning for language understanding (2020). CoRR arXiv:2005.12766 [cs.CL]

  • Feng, Z., Guo, D., Tang, D., Duan, N., Feng, X., Gong, M., Shou, L., Qin, B., Liu, T., Jiang, D., Zhou, M.: CodeBERT: a pre-trained model for programming and natural languages. In: Findings of the Association for Computational Linguistics: EMNLP 2020, pp. 1536–1547. Association for Computational Linguistics (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.139

  • Gao, T., Yao, X., Chen, D.: SimCSE: simple contrastive learning of sentence embeddings. In: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pp. 6894–6910. Association for Computational Linguistics, Online and Punta Cana, Dominican Republic (2021). https://doi.org/10.18653/v1/2021.emnlp-main.552

  • Giorgi, J., Nitski, O., Wang, B., Bader, G.: DeCLUTR: deep contrastive learning for unsupervised textual representations. In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, vol. 1, pp. 879–895. Association for Computational Linguistics (2021). https://doi.org/10.18653/v1/2021.acl-long.72

  • Gu, X., Zhang, H., Kim, S.: Deep code search. In: Proceedings of the 40th International Conference on Software Engineering, pp. 933–944. Association for Computing Machinery, New York, NY, USA (2018). https://doi.org/10.1145/3180155.3180167

  • Guo, D., Lu, S., Duan, N., Wang, Y., Zhou, M., Yin, J.: UniXcoder: unified cross-modal pre-training for code representation. In: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, vol. 1, pp. 7212–7225. Association for Computational Linguistics, Dublin, Ireland (2022). https://doi.org/10.18653/v1/2022.acl-long.499

  • Guo, D., Ren, S., Lu, S., Feng, Z., Tang, D., Liu, S., Zhou, L., Duan, N., Svyatkovskiy, A., Fu, S., Tufano, M., Deng, S.K., Clement, C., Drain, D., Sundaresan, N., Yin, J., Jiang, D., Zhou, M.: GraphCodeBERT: pre-training code representations with data flow. In: International Conference on Learning Representations (2021)

  • He, X., Deng, K., Wang, X., Li, Y., Zhang, Y., Wang, M.: LightGCN: simplifying and powering graph convolution network for recommendation. In: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 639–648. Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3397271.3401063

  • He, K., Fan, H., Wu, Y., Xie, S., Girshick, R.: Momentum contrast for unsupervised visual representation learning. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9726–9735 (2020). https://doi.org/10.1109/CVPR42600.2020.00975

  • Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. In: International Conference on Learning Representations (2019)

  • Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997). https://doi.org/10.1162/neco.1997.9.8.1735

  • Hu, Y., Jiang, H., Hu, Z.: Measuring code maintainability with deep neural networks. Front. Comput. Sci. 17(6), 176214 (2023). https://doi.org/10.1007/s11704-022-2313-0

  • Husain, H., Wu, H.-H., Gazit, T., Allamanis, M., Brockschmidt, M.: CodeSearchNet challenge: evaluating the state of semantic code search (2019). CoRR arXiv:1909.09436 [cs.LG]

  • Jiang, X., Zheng, Z., Lyu, C., Li, L., Lyu, L.: TreeBERT: a tree-based pre-trained model for programming language. In: Uncertainty in Artificial Intelligence, pp. 54–63. PMLR (2021)

  • Kim, K., Ghatpande, S., Kim, D., Zhou, X., Liu, K., Bissyandé, T.F., Klein, J., Le Traon, Y.: Big code search: a bibliography. ACM Comput. Surv. (2023). https://doi.org/10.1145/3604905

  • Kipf, T.N., Welling, M.: Semi-supervised classification with graph convolutional networks. In: 5th International Conference on Learning Representations, ICLR 2017. OpenReview.net (2017)

  • Li, X., Gong, Y., Shen, Y., Qiu, X., Zhang, H., Yao, B., Qi, W., Jiang, D., Chen, W., Duan, N.: CodeRetriever: a large scale contrastive pre-training method for code search. In: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pp. 2898–2910. Association for Computational Linguistics, Abu Dhabi, United Arab Emirates (2022)

  • Li, J., Liu, F., Li, J., Zhao, Y., Li, G., Jin, Z.: MCodeSearcher: multi-view contrastive learning for code search. In: Proceedings of the 14th Asia-Pacific Symposium on Internetware, pp. 270–280. Association for Computing Machinery, New York, NY, USA (2023). https://doi.org/10.1145/3609437.3609456

  • Li, Z., Yin, G., Wang, T., Zhang, Y., Yu, Y., Wang, H.: Correlation-based software search by leveraging software term database. Front. Comput. Sci. 12(5), 923–938 (2018). https://doi.org/10.1007/s11704-017-6573-z

  • Linstead, E., Bajracharya, S., Ngo, T., Rigor, P., Lopes, C., Baldi, P.: Sourcerer: mining and searching internet-scale software repositories. Data Min. Knowl. Discov. 18(2), 300–336 (2009). https://doi.org/10.1007/s10618-008-0118-x

  • Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: RoBERTa: a robustly optimized BERT pretraining approach (2019). CoRR arXiv:1907.11692 [cs.CL]

  • Liu, C., Xia, X., Lo, D., Gao, C., Yang, X., Grundy, J.: Opportunities and challenges in code search tools. ACM Comput. Surv. (2021). https://doi.org/10.1145/3480027

  • Lv, F., Zhang, H., Lou, J.-G., Wang, S., Zhang, D., Zhao, J.: CodeHow: effective code search based on API understanding and extended Boolean model. In: 2015 30th IEEE/ACM International Conference on Automated Software Engineering, pp. 260–270 (2015). https://doi.org/10.1109/ASE.2015.42

  • McMillan, C., Grechanik, M., Poshyvanyk, D., Xie, Q., Fu, C.: Portfolio: finding relevant functions and their usage. In: 2011 33rd International Conference on Software Engineering, pp. 111–120 (2011). https://doi.org/10.1145/1985793.1985809

  • Sennrich, R., Haddow, B., Birch, A.: Neural machine translation of rare words with subword units (2016). CoRR arXiv:1508.07909 [cs.CL]

  • Shi, E., Wang, Y., Gu, W., Du, L., Zhang, H., Han, S., Zhang, D., Sun, H.: CoCoSoDa: effective contrastive learning for code search. In: Proceedings of the 45th International Conference on Software Engineering, pp. 2198–2210. IEEE Press (2023). https://doi.org/10.1109/ICSE48619.2023.00185

  • Shi, Z., Xiong, Y., Zhang, Y., Jiang, Z., Zhao, J., Wang, L., Li, S.: Improving code search with multi-modal momentum contrastive learning. In: 2023 IEEE/ACM 31st International Conference on Program Comprehension (ICPC), pp. 280–291 (2023). https://doi.org/10.1109/ICPC58990.2023.00043

  • Shuai, J., Xu, L., Liu, C., Yan, M., Xia, X., Lei, Y.: Improving code search with co-attentive representation learning. In: Proceedings of the 28th International Conference on Program Comprehension, pp. 196–207. Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3387904.3389269

  • Sun, W., Fang, C., Chen, Y., Tao, G., Han, T., Zhang, Q.: Code search based on context-aware code translation. In: Proceedings of the 44th International Conference on Software Engineering, pp. 388–400. Association for Computing Machinery, New York, NY, USA (2022). https://doi.org/10.1145/3510003.3510140

  • Wan, Y., Shu, J., Sui, Y., Xu, G., Zhao, Z., Wu, J., Yu, P.: Multi-modal attention network learning for semantic source code retrieval. In: 2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE), pp. 13–25. IEEE, San Diego, CA, USA (2019). https://doi.org/10.1109/ASE.2019.00012

  • Wang, Y., Wang, W., Joty, S., Hoi, S.C.H.: CodeT5: identifier-aware unified pre-trained encoder-decoder models for code understanding and generation. In: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pp. 8696–8708. Association for Computational Linguistics, Online and Punta Cana, Dominican Republic (2021). https://doi.org/10.18653/v1/2021.emnlp-main.685

  • Wang, X., Wang, Y., Mi, F., Zhou, P., Wan, Y., Liu, X., Li, L., Wu, H., Liu, J., Jiang, X.: SynCoBERT: syntax-guided multi-modal contrastive pre-training for code representation (2021). CoRR arXiv:2108.04556 [cs.CL]

  • Wu, Z., Xiong, Y., Yu, S., Lin, D.: Unsupervised feature learning via non-parametric instance-level discrimination (2018). CoRR arXiv:1805.01978 [cs.CV]

  • Xu, L., Yang, H., Liu, C., Shuai, J., Yan, M., Lei, Y., Xu, Z.: Two-stage attention-based model for code search with textual and structural features. In: 2021 IEEE International Conference on Software Analysis, Evolution and Reengineering, pp. 342–353 (2021). https://doi.org/10.1109/SANER50967.2021.00039

  • Yan, Y., Li, R., Wang, S., Zhang, F., Wu, W., Xu, W.: ConSERT: a contrastive framework for self-supervised sentence representation transfer. In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, vol. 1, pp. 5065–5075. Association for Computational Linguistics (2021). https://doi.org/10.18653/v1/2021.acl-long.393

  • Zhu, Q., Sun, Z., Liang, X., Xiong, Y., Zhang, L.: OCoR: an overlapping-aware code retriever. In: Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering, pp. 883–894. Association for Computing Machinery, New York, NY, USA (2021). https://doi.org/10.1145/3324884.3416530

Acknowledgements

This work was partially supported by the National Natural Science Foundation of China (No. 62250610224), and CCF-Zhipu Large Model Innovation Fund (No. CCF-Zhipu202408).

Author information

Contributions

GC: Conceptualization of this study, Methodology, Software, Investigation, Writing—original draft. WL: Methodology, Formal analysis, Investigation, Validation, Writing—review. XX: Conceptualization of this study, Supervision, Funding acquisition, Writing—review.

Corresponding author

Correspondence to Xiaoyuan Xie.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Chen, G., Liu, W. & Xie, X. RFMC-CS: a representation fusion based multi-view momentum contrastive learning framework for code search. Autom Softw Eng 32, 16 (2025). https://doi.org/10.1007/s10515-025-00487-8
