Cross-Domain Deep Code Search with Meta Learning

Chai, Yitian; Zhang, Hongyu; Shen, Beijun; Gu, Xiaodong

doi:10.1145/3510003.3510125

arXiv:2201.00150v4 (cs)

[Submitted on 1 Jan 2022 (v1), revised 30 Aug 2022 (this version, v4), latest version 12 Mar 2024 (v6)]

Abstract: Recently, pre-trained programming language models such as CodeBERT have demonstrated substantial gains in code search. Despite their strong performance, they rely on large amounts of parallel data to fine-tune the semantic mappings between queries and code. This restricts their practicality in domain-specific languages, where data is relatively scarce and expensive. In this paper, we propose CDCS, a novel approach for domain-specific code search. CDCS employs a transfer learning framework in which an initial program representation model is pre-trained on a large corpus of common programming languages (such as Java and Python) and is then adapted to domain-specific languages such as SQL and Solidity. Unlike cross-language CodeBERT, which is directly fine-tuned in the target language, CDCS adapts a few-shot meta-learning algorithm called MAML to learn a good initialization of model parameters that can be best reused in a domain-specific language. We evaluate the proposed approach on two domain-specific languages, namely SQL and Solidity, with models transferred from two widely used languages (Python and Java). Experimental results show that CDCS significantly outperforms conventional pre-trained code models that are directly fine-tuned in domain-specific languages, and that it is particularly effective when data is scarce.
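The idea of using MAML to learn "a good initialization" can be illustrated with a toy sketch. This is not the CDCS implementation: it swaps the pre-trained code model for a single scalar weight, swaps code search for 1-D linear regression, and uses the first-order approximation of MAML rather than the full second-order update. All function names, learning rates, and task slopes below are hypothetical choices made for illustration.

```python
import random


def loss_and_grad(w, data):
    """Mean squared error of the model y = w * x, and its gradient w.r.t. w."""
    n = len(data)
    loss = sum((w * x - y) ** 2 for x, y in data) / n
    grad = sum(2 * (w * x - y) * x for x, y in data) / n
    return loss, grad


def make_task(slope, rng, n=10):
    """Sample a support set and a query set from the task y = slope * x."""
    data = [(x, slope * x) for x in (rng.uniform(-1, 1) for _ in range(2 * n))]
    return data[:n], data[n:]


def adapt(w, support, inner_lr=0.1, steps=1):
    """Inner loop: a few gradient steps on one task's support set."""
    for _ in range(steps):
        _, g = loss_and_grad(w, support)
        w -= inner_lr * g
    return w


def fomaml(slopes, inner_lr=0.1, outer_lr=0.05, steps=500, seed=0):
    """Outer loop: learn an initial w that adapts well to every task.

    First-order MAML: the query-set gradient is evaluated at the adapted
    weight and applied directly to the shared initialization.
    """
    rng = random.Random(seed)
    w = 0.0
    for _ in range(steps):
        meta_grad = 0.0
        for slope in slopes:
            support, query = make_task(slope, rng)
            w_task = adapt(w, support, inner_lr)
            _, g_query = loss_and_grad(w_task, query)
            meta_grad += g_query
        w -= outer_lr * meta_grad / len(slopes)
    return w


if __name__ == "__main__":
    # "Source" tasks stand in for common languages, the new slope for a
    # scarce-data target domain (an analogy only).
    w0 = fomaml([1.0, 3.0])
    support, query = make_task(2.5, random.Random(1))
    print("meta-learned init:", w0)
    print("query loss before adaptation:", loss_and_grad(w0, query)[0])
    print("query loss after one step:", loss_and_grad(adapt(w0, support), query)[0])
```

In this sketch the learned initialization settles between the source-task slopes, so a single gradient step on a handful of target examples already lowers the target loss. That mirrors, in miniature, the motivation the abstract gives for meta-learning when target-domain data is scarce.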

Comments:	Accepted by ICSE 2022 (The 44th International Conference on Software Engineering)
Subjects:	Software Engineering (cs.SE)
Cite as:	arXiv:2201.00150 [cs.SE]
	(or arXiv:2201.00150v4 [cs.SE] for this version)
	https://doi.org/10.48550/arXiv.2201.00150
Related DOI:	https://doi.org/10.1145/3510003.3510125

Submission history

From: Xiaodong Gu
[v1] Sat, 1 Jan 2022 09:00:48 UTC (3,883 KB)
[v2] Tue, 28 Jun 2022 12:12:02 UTC (3,883 KB)
[v3] Sun, 21 Aug 2022 12:25:21 UTC (3,883 KB)
[v4] Tue, 30 Aug 2022 12:18:38 UTC (3,883 KB)
[v5] Sat, 3 Dec 2022 06:25:06 UTC (3,883 KB)
[v6] Tue, 12 Mar 2024 05:31:50 UTC (3,887 KB)
