Many software projects rely on a relational database to realize part of their functionality. Various database frameworks and object-relational mappings have been developed and used to facilitate data manipulation. Little is known about whether and how such frameworks co-occur, how they complement or compete with each other, and how this changes over time. We empirically studied these aspects for 5 Java database frameworks, based on a corpus of 3,707 GitHub Java projects. In particular, we analysed whether certain database frameworks co-occur frequently, and whether some database frameworks get replaced over time by others. Using the statistical technique of survival analysis, we explored the survival of the database frameworks in the considered projects. This provides useful evidence to software developers about which frameworks can be used successfully in combination and which combinations should be avoided.
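As an illustration of the survival analysis mentioned above, here is a minimal sketch of the Kaplan-Meier estimator, one standard technique in that family. The abstract does not say which estimator the study used, so this choice is an assumption, and the function name `kaplan_meier` and the sample data are hypothetical. Each duration stands for the time a framework stayed in a project; `observed` is False when the framework was still present at the end of the observation period (a right-censored case).

```python
def kaplan_meier(durations, observed):
    """Return [(t, S(t))] for every time t at which a removal was observed.

    durations: time each framework stayed in a project (hypothetical unit: months)
    observed:  True if the framework was removed at that time,
               False if it was still present when observation ended (censored)
    """
    at_risk = len(durations)   # subjects still under observation
    survival = 1.0             # running product of conditional survival
    curve = []
    for t in sorted(set(durations)):
        # removals observed exactly at time t
        removed = sum(1 for d, o in zip(durations, observed) if d == t and o)
        # everyone leaving the risk set at time t (removed or censored)
        leaving = sum(1 for d in durations if d == t)
        if removed > 0:
            survival *= 1 - removed / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical data: four projects, one of them censored at month 2.
curve = kaplan_meier([1, 2, 2, 3], [True, True, False, True])
for t, s in curve:
    print(f"month {t}: estimated survival {s:.2f}")
```

Censored cases lower the number at risk without counting as removals, which is why the estimator suits projects whose history ends while a framework is still in use.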
Software repository mining research extracts and analyses data originating from multiple software repositories to understand the historical development of software systems, and to propose better ways to evolve such systems in the future. Of particular interest is the study of the activities of, and interactions among, the people involved in the software development process.
DI-fusion, the digital institutional repository of the ULB, is the tool for referencing the ULB's scientific output. The DI-fusion search interface gives access to the publications of ULB researchers and to the theses defended there.
Software ecosystems are coherent collections of software projects that evolve together and are maintained by the same developer community. Tools for analysing and visualising the evolution of software ecosystems must take into account not only the software product but also the development community.
Software systems are among the most complex systems that humans have ever built. Researchers try to measure, understand, and analyse this complexity in order to provide automated tools that make it possible to control or even reduce it.
Interactions between user and developer communities on the one hand, and open-source software (OSS) evolution and quality on the other hand, have not been intensively studied. However, these communities significantly influence how the software evolves. Empirical studies of this influence could offer a way to propose changes to the software development process in order to improve overall software quality.
Empirical software engineering is concerned with statistical studies that aim to understand and improve certain aspects of the software development process. Many of these focus on the evolution and maintenance of software projects. They rely on repository mining techniques to extract relevant data from software repositories or other data sources frequently used by software developers.
Most empirical studies of open source software repositories focus on the analysis of isolated projects, or restrict themselves to the study of the relationships between technical artifacts. In contrast, we have carried out a case study that focuses on the actual contributors to software ecosystems, that is, collections of software projects that are maintained by the same community. To this end, we defined a new series of workload and involvement metrics, as well as a novel approach …
Refactoring is a recurring task in software development. Depending on the methodology adopted, it can take place at the beginning or end of a release as well as at the end of a working day. The goal of refactoring is to improve the internal quality of the software while preserving its visible behaviour. Many tools assist developers in refactoring their code, but there is no magic button that produces a genuinely useful refactoring without effort.
This document is my second report for the advisory committee of my thesis, begun in September 2009 and carried out in the Software Engineering Lab (service de Génie Logiciel) of UMONS thanks to the ARC project AUWB-08/12-UMH "Model-Driven Software Evolution". It outlines the research area of my thesis and presents the problems it raised over the course of this year, as well as the activities related to my doctoral training.
Over the past couple of decades, open source software has gained popularity thanks to the savings it offers and the ability of users to modify and improve the software themselves. As the number of projects whose entire history is available grows over time, so does the number of empirical studies of them. Most of these empirical studies consider no artefacts other than source code.
Empirical software engineering is concerned with empirical studies that aim to understand and improve certain aspects of the software process. Many of them are devoted to the evolution of software projects. They extract relevant data from software repositories or other data sources commonly used by developers. We suggest broadening this type of empirical study by taking into account information about developer communities and the way they work, interact, and communicate. The underlying hypothesis is that social aspects significantly influence the quality of the software product, as well as the way the product evolves over time. In this talk, we will present a tool for extracting, visualising, and analysing information about the communities surrounding a software project. We show some empirical studies that have been carried out, and we present research directions in this field, which combines social network analysis and empirical software engineering.
Numerous empirical studies analyse evolving open source software (OSS) projects, and try to estimate the activity and effort in these projects. Most of these studies, however, focus only on a limited set of artefacts, namely source code and defect data. In our research, we extend the analysis by also taking into account mailing list information. The main goal of this article is to find evidence for the Pareto principle in this context, by studying how the activity of developers and users involved in OSS projects is distributed: it appears that most of the activity is carried out by a small group of people. Following the GQM paradigm, we provide evidence for this principle. We selected a range of metrics used in economics to measure inequality in the distribution of wealth, and adapted these metrics to assess how OSS project activity is distributed. Regardless of whether we analyse version repositories, bug trackers, or mailing lists, and for all three projects we studied, it turns out that the distribution of activity is highly imbalanced.
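To make the idea of an economic inequality metric applied to project activity concrete, here is a minimal sketch using the Gini coefficient. The abstract does not name the metrics that were selected, so the Gini coefficient is an assumed example of such a metric, and the function name `gini` and the commit counts are hypothetical.

```python
def gini(values):
    """Gini coefficient of non-negative activity counts.

    0.0 means activity is spread evenly across contributors;
    values approaching 1.0 mean a few contributors do almost all of it.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Closed form on sorted data, with ranks i = 1..n:
    #   G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


# Hypothetical commit counts per contributor.
even_team = [10, 10, 10, 10]
core_heavy = [0, 0, 0, 10]
print(gini(even_team))   # evenly shared activity
print(gini(core_heavy))  # one person does everything
```

The same function can be applied to commits, bug reports, or mailing list posts, as long as each entry is one person's activity count.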
Nowadays, most empirical studies in open source software evolution are based on the analysis of program code alone. In order to get a better understanding of how software evolves over time, many more entities that are part of the software ecosystem need to be taken into account. We present a general framework to automate the analysis of the evolution of software ecosystems. The framework incorporates a database that stores all relevant information obtained from several mining tools, and provides a unified data source to visualisation tools. One such visualisation tool is integrated in order to get a first quick overview of the evolution of different aspects of the software project under study. The framework is extensible in order to accommodate more and different types of input and output, depending on the needs of the user. We compare our framework against existing solutions, and show how we can use this framework for carrying out concrete ecosystem evolution experiments.
Papers by Mathieu Goeminne