Abstract
Selecting an appropriate DBMS (Database Management System) is important for the success of modern software applications. Relational DBMSs are popular for structured data management, while non-relational systems, such as NoSQL databases, have gained traction for handling unstructured data and scaling in dynamic environments. These differing characteristics have led to a growing trend of combining multiple DBMSs within a single application to meet diverse requirements. However, existing work has not analyzed, at a broad scale, whether DBMSs are replaced or used together. This paper presents an empirical study of DBMS usage across 362 popular open-source Java projects hosted on GitHub. Our analysis focuses on the most widely adopted DBMSs, both relational and non-relational, as ranked by the DB-Engines website. By examining DBMS integration patterns, stability, and migration trends, we aim to uncover the factors driving DBMS choices in real-world applications. We investigated DBMS popularity, usage stability, migration patterns, synergy among DBMSs, and the role of Object-Relational Mappers (ORMs) in DBMS interactions. We applied heuristics to detect DBMS presence, tracked usage trends over time, and analyzed the coexistence and replacement of different systems. We also examined ORM frameworks to understand their impact on DBMS management and query-building practices. Our findings reveal that MySQL and PostgreSQL are the most popular DBMSs, although some projects replace them with other DBMSs. While certain popular DBMSs (e.g., Redis, MongoDB) usually remain in a project once introduced, making their adoption stable, others (e.g., HyperSQL) are frequently replaced as project requirements evolve. We also observed patterns of polyglot persistence, where multiple DBMSs coexist to handle varied data types. Notably, Informix, a relational DBMS designed to handle real-time data processing, is always used alongside other DBMSs.
Additionally, we identified ORM usage trends that facilitate database interactions and mitigate migration complexities. These insights contribute to a broader understanding of DBMS adoption, providing guidance for developers and architects in selecting and managing database infrastructure over time.
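To make the coexistence analysis mentioned above more concrete, the following is a minimal sketch of how DBMS co-occurrence across projects could be counted. It is not the study's implementation: the project names, the detected DBMS sets, and the helper functions `cooccurrence_counts` and `always_combined` are all hypothetical and serve only as an illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy input: project name -> set of DBMSs detected in it.
# These projects and their DBMS sets are invented for illustration only;
# they are not data from the study.
projects = {
    "project-a": {"MySQL", "Redis"},
    "project-b": {"PostgreSQL", "Redis", "MongoDB"},
    "project-c": {"MySQL"},
    "project-d": {"MySQL", "Redis"},
}


def cooccurrence_counts(detected):
    """Count in how many projects each unordered pair of DBMSs appears together."""
    counts = Counter()
    for dbms_set in detected.values():
        # Sorting makes each pair a canonical (alphabetical) tuple.
        for pair in combinations(sorted(dbms_set), 2):
            counts[pair] += 1
    return counts


def always_combined(detected):
    """Return the DBMSs that never appear as the only DBMS of a project."""
    seen = set().union(*detected.values())
    alone = {next(iter(s)) for s in detected.values() if len(s) == 1}
    return seen - alone


pair_counts = cooccurrence_counts(projects)
```

In this toy input, `always_combined(projects)` would flag every DBMS except MySQL, which appears alone in one project. An observation of the kind reported for Informix corresponds to a DBMS landing in that set.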


















Data Availability
All data and code used in our analysis are publicly available in our GitHub repository https://github.com/gems-uff/db-mining.
Notes
We measured the performance of our analysis on a sample project using a Ryzen 7735HS with 16 GB of GDDR5 RAM and a 512 GB NVMe 4.0 SSD.
Acknowledgements
The authors would like to thank the following for their financial support: National Science Foundation (NSF) grants 2247929, 2303042, and 2303612; CNPq grants 305020/2019-6, 311955/2020-7, and 309410/2023-1; CNPq/MCTI/FNDCT grant 408812/2021-4; MCTIC/CGI/FAPESP grant 2021/06662-1; Fundação Araucária - Paraná State Government grant PRD2023361000043; and FAPERJ grants E26/201.038/2021 and E-26/210.478/2024. This paper has benefited immensely from the comments and suggestions of the three anonymous reviewers, to whom we are deeply thankful. We also acknowledge the use of Grammarly and ChatGPT 4o to improve the spelling, grammar, vocabulary, and style of the text. We also used ChatGPT 4o to speed up the writing of Python code and to classify projects, as mentioned earlier. All suggestions were carefully examined, tested, and often corrected by us, and we take full responsibility for the form and content of the paper.
Ethics declarations
Conflict of interest
The authors declare that there are no financial or non-financial interests that are directly or indirectly related to this work.
Additional information
Communicated by: Denys Poshyvanyk.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Paiva, C.A., Maximino, R., Paiva, F. et al. Analyzing the adoption of database management systems throughout the history of open source projects. Empir Software Eng 30, 71 (2025). https://doi.org/10.1007/s10664-025-10627-z
DOI: https://doi.org/10.1007/s10664-025-10627-z