No abstract available.
Success factors of Business Intelligence
Business Intelligence (BI) has proven to be a competitive advantage for organizations, allowing them to better measure, manage, and optimize their operations. It has provided the means to improve data-driven decision making and to harmonize an ...
A brief history of software — from Bell Labs to Microsoft Research
In the mid-1990s, I was (tangentially) part of an effort in Bell Labs called the “Code Decay” project. The hypothesis of this project was that over time code becomes fragile (more difficult to change without introducing problems), and that this process ...
The promises and perils of mining git
We are now witnessing the rapid growth of decentralized source code management (DSCM) systems, in which every developer has her own repository. DSCMs facilitate a style of collaboration in which work output can flow sideways (and privately) between ...
Amassing and indexing a large sample of version control systems: Towards the census of public source code history
The source code and its history represent the output and process of software development activities and are an invaluable resource for study and improvement of software development practice. While individual projects and groups of projects have been ...
MapReduce as a general framework to support research in Mining Software Repositories (MSR)
Researchers continue to demonstrate the benefits of Mining Software Repositories (MSR) for supporting software development and research activities. However, as the mining process is time and resource intensive, they often create their own distributed ...
A platform for software engineering research
Research in the fields of software quality, maintainability and evolution requires the analysis of large quantities of data, which often originate from open source software projects. Collecting and preprocessing data, calculating metrics, and ...
Evaluating the relation between coding standard violations and faults within and across software versions
In spite of the widespread use of coding standards and tools enforcing their rules, there is little empirical evidence supporting the intuition that they prevent the introduction of faults in software. In previous work, we performed a pilot study to ...
Tracking concept drift of software projects using defect prediction quality
Defect prediction is an important task in the mining of software repositories, but the quality of predictions varies strongly within and across software projects. In this paper we investigate the reasons why the prediction quality is so fluctuating due ...
Does calling structure information improve the accuracy of fault prediction?
Previous studies have shown that software code attributes, such as lines of source code, and history information, such as the number of code changes and the number of faults in prior releases of software, are useful for predicting where faults will ...
Mining source code to automatically split identifiers for software analysis
Automated software engineering tools (e.g., program search, concern location, code reuse, quality assessment, etc.) increasingly rely on natural language information from comments and identifiers in code. The first step in analyzing words from ...
Code siblings: Technical and legal implications of copying code between applications
Source code cloning does not happen within a single system only. It can also occur between one system and another. We use the term code sibling to refer to a code clone that evolves in a different system than the code from which it originates. Code ...
Author entropy vs. file size in the gnome suite of applications
We present the results of a study in which author entropy was used to characterize author contributions per file. Our analysis reveals three patterns: banding in the data, uneven distribution of data across bands, and file size dependent distributions ...
Evaluating process quality in GNOME based on change request data
The lifecycle of defect reports and enhancement requests collected in the Bugzilla database of the GNOME project provides valuable information on the evolution of the change request process and for the assessment of process quality in the GNOME sub ...
Mining the coherence of GNOME bug reports with statistical topic models
We adapt Latent Dirichlet Allocation to the problem of mining bug reports in order to define a new information-theoretic measure of coherence. We then apply our technique to a snapshot of the GNOME Bugzilla database consisting of 431,863 bug reports for ...
Visualizing Gnome with the Small Project Observatory
We analyzed the Gnome family of systems with the Small Project Observatory, our online ecosystem visualization platform. We begin by briefly introducing the model of SPO. We then observe and discuss several phases in the activity of the Gnome ecosystem. ...
On the use of Internet Relay Chat (IRC) meetings by developers of the GNOME GTK+ project
Developers of open source projects are distributed across the world. They rely on email, mailing lists, instant messaging, IRC channels and more recently IRC meetings to communicate. Most of the studies thus far focus on the use of mailing lists by OSS ...
Mining search topics from a code search engine usage log
We present a topic modeling analysis of a year-long usage log of Koders, one of the major commercial code search engines. This analysis contributes to the understanding of what users of code search engines are looking for. Observations on the prevalence ...
From work to word: How do software developers describe their work?
Developers take notes about their work sessions, either to remember the work status and share it with collaborators, or because employers explicitly require this for project management matters. We report on an exploratory study which aims at ...
Assigning bug reports using a vocabulary-based expertise model of developers
For popular software systems, the number of daily submitted bug reports is high. Triaging these incoming reports is a time-consuming task. Part of the bug triage is the assignment of a report to a developer with the appropriate expertise. In this paper, ...
Mining the history of synchronous changes to refine code ownership
When software repositories are mined, two distinct sources of information are usually explored: the history log and snapshots of the system. Results of analyses derived from these two sources are biased by the frequency with which developers commit ...