Cross-language code search using static and dynamic analyses

Authors:

Kathryn T. Stolee

ESEC/FSE 2021: Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering

Pages 205 - 217

https://doi.org/10.1145/3468264.3468538

Published: 18 August 2021

Abstract

As code search permeates most activities in software development, code-to-code search has emerged to support using code as a query and retrieving similar code in the search results. Applications include duplicate code detection for refactoring, patch identification for program repair, and language translation. Existing code-to-code search tools rely on static similarity approaches, such as the comparison of tokens and abstract syntax trees (ASTs), to approximate dynamic behavior, leading to low precision. Most tools do not support cross-language code-to-code search, and those that do rely on machine learning models that require labeled training data. We present Code-to-Code Search Across Languages (COSAL), a cross-language technique that uses both static and dynamic analyses to identify similar code and does not require a machine learning model. Code snippets are ranked using non-dominated sorting based on code token similarity, structural similarity, and behavioral similarity. We empirically evaluate COSAL on two datasets, one of 43,146 Java and Python files and one of 55,499 Java files, and find that 1) code search based on non-dominated ranking of static and dynamic similarity measures is more effective than single or weighted measures; and 2) COSAL has better precision and recall than state-of-the-art within-language and cross-language code-to-code search tools. We explore the potential for using COSAL on large open-source repositories and discuss scalability to more languages and similarity metrics, providing a gateway for practical, multi-language code-to-code search.
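To illustrate the ranking idea named in the abstract, the following is a minimal sketch of non-dominated sorting over three similarity scores. It is not COSAL's implementation: the function names, the file names, and the score values are hypothetical, and the sketch assumes each candidate already has a token, structural, and behavioral similarity in [0, 1] where higher is better. It peels off successive Pareto fronts with a simple quadratic comparison rather than an optimized algorithm.

```python
from typing import Dict, List, Tuple

# (token, structural, behavioral) similarity to the query; higher is better.
# The three-score layout is an assumption made for this illustration.
Scores = Tuple[float, float, float]


def dominates(a: Scores, b: Scores) -> bool:
    """Return True if a is at least as good as b on every measure
    and strictly better on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def non_dominated_sort(candidates: Dict[str, Scores]) -> List[List[str]]:
    """Group candidates into successive Pareto fronts.

    Front 0 holds results that no other candidate dominates; front 1 holds
    results dominated only by members of front 0; and so on.
    """
    remaining = dict(candidates)
    fronts: List[List[str]] = []
    while remaining:
        # A candidate is in the current front if nothing left dominates it.
        front = sorted(
            name
            for name, scores in remaining.items()
            if not any(
                dominates(other_scores, scores)
                for other, other_scores in remaining.items()
                if other != name
            )
        )
        fronts.append(front)
        for name in front:
            del remaining[name]
    return fronts


# Hypothetical search results with made-up similarity scores.
candidates = {
    "A.java": (0.9, 0.8, 1.0),
    "B.py": (0.4, 0.9, 1.0),
    "C.py": (0.3, 0.5, 0.5),
    "D.java": (0.9, 0.8, 0.5),
}
fronts = non_dominated_sort(candidates)
for rank, front in enumerate(fronts):
    print(rank, front)
```

In this sketch, `A.java` and `B.py` share the top front because neither is better on every measure, which is the property that distinguishes non-dominated ranking from collapsing the scores into a single weighted sum.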

Index Terms

Cross-language code search using static and dynamic analyses
1. Information systems
  1. Information retrieval
    1. Retrieval models and ranking
      1. Similarity measures
2. Software and its engineering
  1. Software notations and tools
    1. Software maintenance tools

Published In

ESEC/FSE 2021: Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering

August 2021

1690 pages

ISBN:9781450385626

DOI:10.1145/3468264

General Chairs:

Diomidis Spinellis, Athens University of Economics and Business, Greece

Georgios Gousios, Facebook, Netherlands / Delft University of Technology, Netherlands

Program Chairs:

Marsha Chechik, University of Toronto, Canada

Massimiliano Di Penta, University of Sannio, Italy

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

Sponsors

SIGSOFT: ACM Special Interest Group on Software Engineering

Publisher

Association for Computing Machinery

New York, NY, United States

Badges

Artifacts Available / v1.1

Qualifiers

Research-article

Funding Sources

National Science Foundation

Conference

ESEC/FSE '21

Sponsor:

SIGSOFT

ESEC/FSE '21: 29th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering

August 23 - 28, 2021

Athens, Greece

Acceptance Rates

Overall Acceptance Rate 112 of 543 submissions, 21%
