
BigCloneBench

  • Chapter in Code Clone Analysis

Abstract

Many clone detection tools and techniques have been created to tackle various use-cases, including syntactical clone detection, semantic clone detection, inter-project clone detection, and large-scale clone detection and search. While a few clone benchmarks are available, none target this breadth of usage. BigCloneBench is a clone benchmark designed to evaluate clone detection tools across a variety of use-cases. It was built by mining a large inter-project source repository for functions implementing known functionalities, which produced a large benchmark of inter-project and intra-project semantic clones spanning the full spectrum of syntactical similarity. The benchmark is augmented with an evaluation framework named BigCloneEval, which simplifies tool evaluation studies and allows the user to slice the benchmark by clone properties in order to evaluate a tool for a particular use-case. We have used BigCloneBench in a number of studies that demonstrate its value, and we show where it has been used by the research community. In this chapter, we discuss clone benchmarking theory and the existing benchmarks, describe how BigCloneBench was created, and give an overview of the BigCloneEval evaluation procedure. We conclude by summarizing BigCloneBench's usage in the literature and presenting ideas for future improvements and expansion of the benchmark.
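
To make the idea of slicing the benchmark concrete, the sketch below shows how per-category recall could be computed once reference clone pairs are tagged with BigCloneBench's syntactical-similarity categories (Type-1, Type-2, and the Type-3/Type-4 bands VST3, ST3, MT3, and WT3/T4). This is a minimal illustration in Python, not BigCloneEval's actual interface; the data layout, identifiers, and example values are assumptions made only for the sketch.

    from collections import defaultdict

    # Hypothetical reference data: (function_id_1, function_id_2, category),
    # where category is one of BigCloneBench's similarity bands:
    # "T1", "T2", "VST3", "ST3", "MT3", "WT3/T4".
    reference_pairs = [
        (101, 202, "T1"),
        (103, 204, "ST3"),
        (105, 206, "WT3/T4"),
        # ... the real benchmark contains millions of such tagged pairs
    ]

    # Hypothetical detector output: unordered pairs of function ids.
    detected_pairs = {frozenset((101, 202)), frozenset((105, 206))}

    def recall_by_category(reference, detected):
        """Slice the reference corpus by clone category and compute recall per slice."""
        found = defaultdict(int)
        total = defaultdict(int)
        for f1, f2, category in reference:
            total[category] += 1
            if frozenset((f1, f2)) in detected:
                found[category] += 1
        return {cat: found[cat] / total[cat] for cat in total}

    print(recall_by_category(reference_pairs, detected_pairs))
    # e.g. {'T1': 1.0, 'ST3': 0.0, 'WT3/T4': 1.0}

BigCloneEval automates this kind of slicing against the full benchmark database, including restricting the evaluation to particular clone types, similarity ranges, functionalities, or to intra-project versus inter-project clones.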


Notes

  1. https://github.com/jeffsvajlenko/BigCloneEval.



Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this chapter


Cite this chapter

Svajlenko, J., Roy, C.K. (2021). BigCloneBench. In: Inoue, K., Roy, C.K. (eds) Code Clone Analysis. Springer, Singapore. https://doi.org/10.1007/978-981-16-1927-4_7


  • DOI: https://doi.org/10.1007/978-981-16-1927-4_7


  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-16-1926-7

  • Online ISBN: 978-981-16-1927-4

  • eBook Packages: Computer Science, Computer Science (R0)
