Out of step: Code clone detection for mobile apps across different language codebases

Published: 18 July 2024

Abstract

Clone detection provides insight into replicated fragments in a code base. With the rise of multi-language code bases, new techniques for cross-language code clone detection enable the analysis of polyglot systems. Such techniques have not yet been applied to the mobile app domain, which is naturally polyglot: native mobile app developers must keep their code base synchronized in at least two different programming languages. App synchronization is a difficult and time-consuming maintenance task, as features can rapidly diverge between platforms, and feature identification must be performed manually. The end goal of this work is to provide an analysis framework that reduces the impact of app synchronization. A first step in this direction is a structural algorithm for cross-language clone detection, called Out of Step, which exploits the idea behind enriched concrete syntax trees. Such trees are used as a common intermediate representation, built from the grammars of the programming languages, to detect similarities between app code bases. Our technique finds code similarities with over 80% for the evaluation of language features, where Type 1-3 clones are manually injected for the analysis of both single- and cross-language cases for Kotlin and Dart. We validate the feasibility and correctness of our approach through the evaluation of the main language constructs of Kotlin and Dart. To validate its effectiveness, we use a first case study detecting clones between 12 sorting algorithms across Kotlin and Dart, identifying clone similarities with a precision between 67% and 95%. Finally, we use a corpus of 144 mobile apps implemented in Kotlin and Dart, correctly identifying code similarities for the full application logic.
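
The idea of comparing code from two languages through a shared tree representation can be illustrated with a small sketch. The code below is not the Out of Step algorithm and does not build enriched concrete syntax trees from grammars; it only assumes that fragments from both languages have already been mapped to trees whose nodes carry language-neutral labels. The `Node` class, the label names (`LOOP`, `CONDITION`, `BLOCK`, `ASSIGNMENT`, `CALL`), and the Dice-coefficient scoring are all hypothetical choices made for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node of a language-neutral tree; `label` is a hypothetical universal category."""
    label: str
    children: list = field(default_factory=list)


def shapes(node, bag):
    """Add the structural shape of every subtree to `bag`; return this node's shape."""
    inner = ",".join(shapes(child, bag) for child in node.children)
    shape = f"{node.label}({inner})"
    bag[shape] += 1
    return shape


def similarity(a, b):
    """Dice coefficient over the multisets of subtree shapes of two trees (an assumed metric)."""
    bag_a, bag_b = Counter(), Counter()
    shapes(a, bag_a)
    shapes(b, bag_b)
    shared = sum((bag_a & bag_b).values())  # multiset intersection
    return 2 * shared / (sum(bag_a.values()) + sum(bag_b.values()))


# Hypothetical trees for a loop written once in Kotlin and once in Dart.
kotlin_loop = Node("LOOP", [Node("CONDITION"), Node("BLOCK", [Node("ASSIGNMENT")])])
dart_loop = Node("LOOP", [Node("CONDITION"), Node("BLOCK", [Node("ASSIGNMENT")])])
# The same loop with one extra statement, loosely analogous to a Type 3 clone.
dart_variant = Node(
    "LOOP", [Node("CONDITION"), Node("BLOCK", [Node("ASSIGNMENT"), Node("CALL")])]
)

print(similarity(kotlin_loop, dart_loop))
print(similarity(kotlin_loop, dart_variant))
```

Because the comparison only sees the shared labels, two fragments with identical structure score 1.0 regardless of their source language, while an added statement lowers the score without reducing it to zero. A threshold on such a score is one possible way to decide whether two fragments count as clones.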

Published In

Science of Computer Programming, Volume 236, Issue C
Sep 2024
300 pages

Publisher

Elsevier North-Holland, Inc.

United States

Author Tags

  1. Code-clone detection
  2. Mobile apps
  3. Program analysis

Qualifiers

  • Research-article
