Learning to Detect and Localize Multilingual Bugs

Authors:

Haoran Yang,

Yu Nong,

Tao Zhang,

Xiapu Luo,

Haipeng Cai

Proceedings of the ACM on Software Engineering, Volume 1, Issue FSE

Article No.: 97, Pages 2190 - 2213

https://doi.org/10.1145/3660804

Published: 12 July 2024

Abstract

A growing number of studies have shown that bugs in multi-language software, especially those induced by language interactions (i.e., multilingual bugs), are a critical gap in modern software quality assurance. Yet existing tool support for bug detection/localization remains largely limited to single-language software, despite the long-standing prevalence of multi-language systems in various real-world software domains. Existing static/dynamic analysis and deep learning (DL) based approaches all face major challenges in addressing multilingual bugs. In this paper, we present xLoc, a DL-based technique/tool for detecting and localizing multilingual bugs. Motivated by the results of our bug-characteristics study on the top locations of multilingual bugs, xLoc first learns general knowledge relevant to differentiating various multilingual control-flow structures. This is achieved by pre-training a Transformer model with customized position encoding against novel objectives. Then, xLoc learns task-specific knowledge for multilingual bug detection/localization through another new position encoding scheme (based on cross-language API vicinity), which allows the model, during fine-tuning, to attend particularly to the control-flow constructs that bear the most multilingual bugs. We have implemented xLoc for Python-C software and curated a dataset of 3,770 buggy and 15,884 non-buggy Python-C samples, which enabled our extensive evaluation of xLoc against two state-of-the-art baselines: fine-tuned CodeT5 and zero-shot ChatGPT. Our results show that xLoc achieved 94.98% F1 and 87.24% Top-1 accuracy, which are significantly (up to 162.88% and 511.75%) higher than the baselines. Ablation studies further confirmed the significant contribution of each of the novel design elements in xLoc. Given the respective bug-location characteristics and labeled bug datasets for fine-tuning, our design may be applied to other language combinations beyond Python-C.
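For readers unfamiliar with the reported metrics, the following is a minimal sketch of how F1 (for detection), Top-k accuracy (for localization), and a relative improvement over a baseline are conventionally computed. It assumes the standard textbook definitions; the abstract does not state the paper's exact evaluation protocol, and the function names and toy inputs here are hypothetical, not taken from xLoc.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 of the buggy class (label 1) for binary bug detection."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def top_k_accuracy(
    ranked_lines: Sequence[Sequence[int]],
    true_lines: Sequence[int],
    k: int = 1,
) -> float:
    """Fraction of samples whose true buggy line is among the k top-ranked lines."""
    if not ranked_lines:
        return 0.0
    hits = sum(
        1 for ranked, truth in zip(ranked_lines, true_lines) if truth in ranked[:k]
    )
    return hits / len(ranked_lines)


def relative_improvement(ours: float, baseline: float) -> float:
    """Percentage by which `ours` exceeds `baseline` (baseline must be non-zero)."""
    return (ours - baseline) / baseline * 100.0


if __name__ == "__main__":
    # Toy data (hypothetical): 5 samples for detection, 3 for localization.
    print(f1_score([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
    print(top_k_accuracy([[3, 1], [2, 5], [7, 4]], [3, 5, 4], k=1))
    print(relative_improvement(0.9, 0.3))
```

Under these definitions, a relative improvement above 100% means the score is more than double the baseline's. That is how gains such as 162.88% and 511.75% can coexist with absolute scores below 100%, assuming the abstract's figures are relative gains.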
