Analyzing the adoption of database management systems throughout the history of open source projects

Empirical Software Engineering

Abstract

The appropriate selection of Database Management Systems (DBMSs) is relevant for the success of modern software applications. Relational DBMSs are popular for structured data management, while non-relational systems, such as NoSQL databases, have gained traction for handling unstructured data and scaling in dynamic environments. These differing characteristics have led to an increasing trend of combining multiple DBMSs within a single application to meet diverse requirements. However, existing work does not analyze, in a broad scope, whether DBMSs are replaced or used together. This paper presents an empirical study on DBMS usage across 362 popular open-source Java projects hosted on GitHub. Our analysis focuses on the most widely adopted DBMSs, both relational and non-relational, as ranked by the DB-Engines website. By examining DBMS integration patterns, stability, and migration trends, we aim to uncover insights into the factors driving DBMS choices in real-world applications. We investigated DBMS popularity, usage stability, migration patterns, synergy among DBMSs, and the role of Object-Relational Mappers (ORMs) in DBMS interactions. We applied heuristics to detect DBMS presence, tracked usage trends over time, and analyzed the coexistence and replacement of different systems. We also examined ORM frameworks to understand their impact on DBMS management and query-building practices. Our findings reveal that MySQL and PostgreSQL are the most popular DBMSs, although some projects replace them with other DBMSs. While certain popular DBMSs (e.g., Redis, MongoDB) usually stay in a project after they are introduced (and therefore their adoption is stable), others (e.g., HyperSQL) are frequently replaced as project requirements evolve. We also observed patterns of polyglot persistence, where multiple DBMSs coexist to handle varied data types. Notably, Informix, a relational DBMS designed to handle real-time data processing, is always used with other DBMSs. Additionally, we identified ORM usage trends that facilitate database interactions and mitigate migration complexities. These insights contribute to a broader understanding of DBMS adoption, providing valuable guidance for developers and architects in selecting and managing database infrastructure over time.

Data Availability

All data and code used in our analysis are publicly available in our GitHub repository https://github.com/gems-uff/db-mining.

Notes

  1. https://db-engines.com/en/ranking_definition

  2. https://hibernate.org/

  3. https://openjpa.apache.org/documentation.html

  4. https://www.eclipse.org/eclipselink/

  5. https://mybatis.org/mybatis-3/index.html

  6. https://spring.io/

  7. https://www.jooq.org/

  8. https://github.com/moparisthebest/JdbcMapper

  9. https://www.tiobe.com/tiobe-index/

  10. https://tinyurl.com/mr2t77st

  11. http://www.github.com/bitcoin-wallet/bitcoin-wallet

  12. https://github.com/TeamNewPipe/NewPipe

  13. https://tinyurl.com/mr2t77st

  14. We used a Ryzen 7735HS with 16 GB of GDDR5 RAM and a 512 GB NVMe 4.0 SSD to measure the performance of our analysis on a sample project.

  15. https://www.philippe-fournier-viger.com/spmf/index.php?link=citations.php

  16. https://patterncounter.readthedocs.io/en/latest/

  17. https://tinyurl.com/y25cm6e5

  18. https://tinyurl.com/3f8kp763

  19. http://tinyurl.com/3acawe3k

  20. https://patterncounter.readthedocs.io/en/latest/

  21. http://www.h2database.com/html/history.html

  22. https://www.ibm.com/products/informix

  23. https://www.ibm.com/docs/en/informix-servers/12.10?topic=overview-getting-started

  24. https://postgis.net/

  25. https://tinyurl.com/2nyt757h

  26. https://tinyurl.com/5dbvdkju

  27. https://github.com/scoophealth/oscar

Acknowledgements

The authors would like to thank the National Science Foundation (NSF) grants 2247929, 2303042, 2303612, and 2303612; CNPq grants 305020/2019-6, 311955/2020-7, and 309410/2023-1; CNPq/MCTI/FNDCT grant 408812/2021-4; MCTIC/CGI/FAPESP grant 2021/06662-1; Fundação Araucaria - Parana State Government grant PRD2023361000043; and FAPERJ grants E26/201.038/2021 and E-26/210.478/2024, for the financial support. This paper has immensely benefited from the comments and suggestions of the three anonymous reviewers, to whom we are deeply thankful. We also acknowledge the use of Grammarly and ChatGPT 4.o for the improvement of spelling, grammar, vocabulary, and style of the text. We also utilized ChatGPT 4.o to speed up the writing of Python code and to classify projects, as mentioned earlier. All suggestions were carefully examined, tested, and often corrected by us, whereby we take full responsibility for the form and content of the paper.

Author information

Corresponding author

Correspondence to Camila A. Paiva.

Ethics declarations

Conflict of interest

The authors declare that there are no financial or non-financial interests that are directly or indirectly related to this work.

Additional information

Communicated by: Denys Poshyvanyk.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Paiva, C.A., Maximino, R., Paiva, F. et al. Analyzing the adoption of database management systems throughout the history of open source projects. Empir Software Eng 30, 71 (2025). https://doi.org/10.1007/s10664-025-10627-z

  • DOI: https://doi.org/10.1007/s10664-025-10627-z
