Analyzing the adoption of database management systems throughout the history of open source projects

Paiva, Camila A.; Maximino, Raquel; Paiva, Frederico; Vieira, Rafael Accetta; Espanha, Nicole; Pimentel, João Felipe; Wiese, Igor; Gerosa, Marco Aurélio; Steinmacher, Igor; Murta, Leonardo; Braganholo, Vanessa

doi:10.1007/s10664-025-10627-z

Published: 22 February 2025

Volume 30, article number 71 (2025)

Empirical Software Engineering

Abstract

The appropriate selection of Database Management Systems (DBMSs) is relevant for the success of modern software applications. Relational DBMSs are popular for structured data management, while non-relational systems, such as NoSQL databases, have gained traction for handling unstructured data and scaling in dynamic environments. These varying DBMS characteristics have led to an increasing trend of combining multiple systems within a single application to meet diverse requirements. However, existing work does not analyze, in a broad scope, whether DBMSs are replaced or used together. This paper presents an empirical study on DBMS usage across 362 popular open-source Java projects hosted on GitHub. Our analysis focuses on the most widely adopted DBMSs, both relational and non-relational, as ranked by the DB-Engines website. By examining DBMS integration patterns, stability, and migration trends, we aim to uncover insights into the factors driving DBMS choices in real-world applications. We investigated DBMS popularity, usage stability, migration patterns, synergy among DBMSs, and the role of Object-Relational Mappers (ORMs) in DBMS interactions. We applied heuristics to detect DBMS presence, tracked usage trends over time, and analyzed the coexistence and replacement of different systems. We also examined ORM frameworks to understand their impact on DBMS management and query-building practices. Our findings reveal that MySQL and PostgreSQL are the most popular DBMSs, although some projects replace them with other DBMSs. While certain popular DBMSs (e.g., Redis, MongoDB) usually stay in the project after they are introduced (and therefore their adoption is stable), others (e.g., HyperSQL) are frequently replaced as project requirements evolve. We also observed patterns of polyglot persistence, where multiple DBMSs coexist to handle varied data types. Notably, Informix, a relational DBMS designed to handle real-time data processing, is always used with other DBMSs. Additionally, we identified ORM usage trends that facilitate database interactions and mitigate migration complexities. These insights contribute to a broader understanding of DBMS adoption, providing valuable guidance for developers and architects in selecting and managing database infrastructure over time.
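
To make the notions of stable adoption, replacement, and coexistence concrete, the following minimal Python sketch classifies the DBMSs of a project from a sequence of snapshots. It is an illustration under our own assumptions, not the heuristics used in the study: the project names, the snapshot data, and the `classify` function are hypothetical, and a DBMS is treated as "stable" simply if it is still present in the last snapshot.

```python
from itertools import combinations

# Hypothetical input: for each project, the set of DBMSs detected at
# successive points in its history (oldest first). Illustrative data only.
history = {
    "project-a": [{"HyperSQL"}, {"HyperSQL", "MySQL"}, {"MySQL"}],
    "project-b": [{"PostgreSQL"}, {"PostgreSQL", "Redis"}, {"PostgreSQL", "Redis"}],
}


def classify(snapshots):
    """Return (stable, dropped, coexisting_pairs) for one project.

    Assumes a non-empty list of snapshots.
    stable: DBMSs ever seen that are still present in the last snapshot.
    dropped: DBMSs ever seen that are absent from the last snapshot.
    coexisting_pairs: pairs of DBMSs that appear together in some snapshot.
    """
    seen = set().union(*snapshots)
    final = snapshots[-1]
    stable = seen & final
    dropped = seen - final
    pairs = set()
    for snap in snapshots:
        # Sort so each pair has one canonical order.
        pairs.update(combinations(sorted(snap), 2))
    return stable, dropped, pairs


for name, snapshots in history.items():
    stable, dropped, pairs = classify(snapshots)
    print(name, "stable:", sorted(stable), "dropped:", sorted(dropped),
          "coexisting:", sorted(pairs))
```

In this toy data, HyperSQL is dropped from `project-a` after MySQL is introduced, while PostgreSQL and Redis coexist in `project-b` until the last snapshot. A real analysis would also need to handle DBMSs that are removed and later reintroduced, which this sketch ignores.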

Data Availability

All data and code used in our analysis are publicly available in our GitHub repository https://github.com/gems-uff/db-mining.

Acknowledgements

The authors would like to thank the National Science Foundation (NSF) grants 2247929, 2303042, 2303612, and 2303612; CNPq grants 305020/2019-6, 311955/2020-7, and 309410/2023-1; CNPq/MCTI/FNDCT grant 408812/2021-4; MCTIC/CGI/FAPESP grant 2021/06662-1; Fundação Araucaria - Parana State Government grant PRD2023361000043; and FAPERJ grants E26/201.038/2021 and E-26/210.478/2024, for the financial support. This paper has immensely benefited from the comments and suggestions of the three anonymous reviewers, to whom we are deeply thankful. We also acknowledge the use of Grammarly and ChatGPT 4.o for the improvement of spelling, grammar, vocabulary, and style of the text. We also utilized ChatGPT 4.o to speed up the writing of Python code and to classify projects, as mentioned earlier. All suggestions were carefully examined, tested, and often corrected by us, whereby we take full responsibility for the form and content of the paper.

Author information

Authors and Affiliations

Instituto de Computação, Universidade Federal Fluminense, Niterói, Rio de Janeiro, Brazil
Camila A. Paiva, Raquel Maximino, Frederico Paiva, Rafael Accetta Vieira, Nicole Espanha, João Felipe Pimentel, Leonardo Murta & Vanessa Braganholo
Universidade Tecnológica Federal do Paraná, Campo Mourão, Paraná, Brazil
Igor Wiese
Northern Arizona University, Flagstaff, AZ, USA
Marco Aurélio Gerosa & Igor Steinmacher

Corresponding author

Correspondence to Camila A. Paiva.

Ethics declarations

Conflict of interest

The authors declare that there are no financial or non-financial interests that are directly or indirectly related to this work.

Additional information

Communicated by: Denys Poshyvanyk.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Paiva, C.A., Maximino, R., Paiva, F. et al. Analyzing the adoption of database management systems throughout the history of open source projects. Empir Software Eng 30, 71 (2025). https://doi.org/10.1007/s10664-025-10627-z

Accepted: 10 February 2025
Published: 22 February 2025
DOI: https://doi.org/10.1007/s10664-025-10627-z
