REPORT ON THE SECOND WORKSHOP ON SUSTAINABLE SOFTWARE
FOR SCIENCE: PRACTICE AND EXPERIENCES (WSSSPE2)
DANIEL S. KATZ(1), SOU-CHENG T. CHOI(2), NANCY WILKINS-DIEHR(3), NEIL CHUE HONG(4),
COLIN C. VENTERS(5), JAMES HOWISON(6), FRANK SEINSTRA(7), MATTHEW JONES(8),
KAREN CRANSTON(9), THOMAS L. CLUNE(10), MIGUEL DE VAL-BORRO(11), RICHARD LITTAUER(12)
Abstract. This technical report records and discusses the Second Workshop on Sustainable Soft-
ware for Science: Practice and Experiences (WSSSPE2). The report includes a description of the
alternative, experimental submission and review process, two workshop keynote presentations, a
series of lightning talks, a discussion on sustainability, and five discussions from the topic areas of
exploring sustainability; software development experiences; credit & incentives; reproducibility &
reuse & sharing; and code testing & code review. For each topic, the report includes a list of tangible
actions that were proposed and that would lead to potential change. The workshop recognized that
reliance on scientific software is pervasive in all areas of world-leading research today. The workshop
participants explored different perspectives on the concept of sustainability, and key enablers of
and barriers to sustainable scientific software were identified from their experiences. In
addition, recommendations with new requirements such as software credit files and software prize
frameworks were outlined for improving practices in sustainable software engineering. There was
also broad consensus that formal training in software development or engineering was rare among
the practitioners. Significant strides need to be made in building a sense of community via training
in software and technical practices, in increasing the size and scope of such training, and in better
integrating it directly into graduate education programs. Finally, journals can define and publish
policies to improve reproducibility, and reviewers can insist that authors provide sufficient information
and access to data and software to allow them to reproduce the results in the paper. Hence a list of
criteria is compiled for journals to provide to reviewers so as to make it easier to review software
submitted for publication as a “Software Paper.”
(1) Computation Institute, University of Chicago & Argonne National Laboratory, Chicago, IL, USA; dsk@uchicago.edu.
(2) NORC at the University of Chicago and Illinois Institute of Technology, Chicago, IL, USA; sctchoi@uchicago.edu.
(3) University of California-San Diego, San Diego, CA, USA; wilkinsn@sdsc.edu.
(4) Software Sustainability Institute, University of Edinburgh, Edinburgh, UK; N.ChueHong@software.ac.uk.
(5) University of Huddersfield, School of Computing and Engineering, Huddersfield, UK; C.Venters@hud.ac.uk.
(6) University of Texas at Austin, Austin, TX, USA; jhowison@ischool.utexas.edu.
(7) Netherlands eScience Centre, Amsterdam, Netherlands; F.Seinstra@esciencecenter.nl.
(8) National Center for Ecological Analysis and Synthesis, Santa Barbara, CA, USA; jones@nceas.ucsb.edu.
(9) National Evolutionary Synthesis Center, Durham, NC, USA; karen.cranston@nescent.org.
(10) NASA Goddard Space Flight Center, Greenbelt, MD, USA; Thomas.L.Clune@nasa.gov.
(11) Department of Astrophysical Sciences, Princeton University, Princeton, NJ, USA; mdevalbo@astro.princeton.edu.
(12) University of Saarland, Germany; richard.littauer@gmail.com.
1. Introduction
The Second Workshop on Sustainable Software for Science: Practice and Experiences
(WSSSPE2)1 was held on 16 November, 2014 in the city of New Orleans, Louisiana, USA, in con-
junction with the International Conference for High Performance Computing, Networking, Storage
and Analysis (SC14)2. WSSSPE2 followed the model of a general initial workshop, WSSSPE13 [1, 2],
which co-occurred with SC13, and a focused workshop, WSSSPE1.14, which was organized in July
2014 jointly with the SciPy conference5.
Progress in scientific research is dependent on the quality and accessibility of software at all levels.
Hence it is critical to address challenges related to the development, deployment, maintenance, and
overall sustainability of reusable software as well as education around software practices. These
challenges can be technological, policy based, organizational, and educational, and are of interest to
developers (the software community), users (science disciplines), software-engineering researchers,
and researchers studying the conduct of science (science of team science, science of organizations,
science of science and innovation policy, and social science communities). The WSSSPE1 workshop
engaged the broad scientific community to identify challenges and best practices in areas of inter-
est to creating sustainable scientific software. WSSSPE2 invited the community to propose and
discuss specific mechanisms to move towards an imagined future practice for software development
and usage in science and engineering. The workshop included multiple mechanisms for partici-
pation, encouraged team building around solutions, and identified risky solutions with potentially
transformative outcomes. It strongly encouraged participation of early-career scientists, postdoc-
toral researchers, and graduate students. Funds provided to the organizers by the Moore Foundation
and the National Science Foundation (NSF) supported the travel of, respectively, potential
participants who would not otherwise be able to attend, and early-career participants and those from
underrepresented groups. These funds allowed 17 additional participants to attend,
and each was offered the chance to present a lightning talk.
This report extends a previous report that discussed the submission, peer-review, and peer-
grouping processes in detail [3]. It is also based on a collaborative set of notes taken with Google
Docs during the workshop [4]. Overall, the report discusses the organization work done before the
workshop (§2); the keynotes (§3); a series of lightning talks (§4), intended to give an opportunity
for attendees to quickly highlight an important issue or a potential solution; a session on defining
sustainability (§5). The report also gives summaries of action plans proposed by five breakout
sessions, which explored specific areas: exploring sustainability (§6); software development expe-
riences (§7); credit & incentives (§8); reproducibility, reuse, & sharing (§9); code testing & code
review (§10). Lastly, the report also includes some conclusions (§11) and an incomplete list of
attendees (Appendix A).
2. Submissions, Peer-Review, and Peer-Grouping
WSSSPE2 began with a call for papers [3]. Based on the goal of encouraging a wide range
of submissions from those involved in software practice, ranging from initial thoughts and partial
studies to mature deployments, but focusing on papers that are intended to lead to changes, the
organizers wanted to make submission as easy as possible. The call for papers stated:
We invite short (4-page) actionable papers that will lead to improvements for
sustainable software science. These papers could be a call to action, or could provide
position or experience reports on sustainable software activities. The papers will be
used by the organizing committee to design sessions that will be highly interactive
and targeted towards facilitating action. Submitted papers should be archived by
1http://wssspe.researchcomputing.org.uk/wssspe2/
2http://sc14.supercomputing.org
3http://wssspe.researchcomputing.org.uk/wssspe1/
4http://wssspe.researchcomputing.org.uk/wssspe1-1/
5https://conference.scipy.org/scipy2014/participate/wssspe/

a third-party service that provides DOIs. We encourage submitters to license their
papers under a Creative Commons license that encourages sharing and remixing, as
we will combine ideas (with attribution) into the outcomes of the workshop.
The call included the following areas of interest:
• defining software sustainability in the context of science and engineering software
– how to evaluate software sustainability
• improving the development process that leads to new software
– methods to develop sustainable software from the outset
– effective approaches to reusable software created as a by-product of research
– impact of computer science research on the development of scientific software
• recommendations for the support and maintenance of existing software
– software engineering best practices
– governance, business, and sustainability models
– the role, operation, and sustainability of community software repositories
– reproducibility, transparency needs that may be unique to science
• successful open source software implementations
– incentives for using and contributing to open source software
– transitioning users into contributing developers
• building large and engaged user communities
– developing strong advocates
– measurement of usage and impact
• encouraging industry’s role in sustainability
– engagement of industry with volunteer communities
– incentives for industry
– incentives for community to contribute to industry-driven projects
• recommending policy changes
– software credit, attribution, incentive, and reward
– issues related to multiple organizations and multiple countries, such as in-
tellectual property, licensing, etc.
– mechanisms and venues for publishing software, and the role of publishers
• improving education and training
– best practices for providing graduate students and postdoctoral researchers
in domain communities with sufficient training in software development
– novel uses of sustainable software in education (K-20)
– case studies from students on issues around software development in the
undergraduate or graduate curricula
• careers and profession
– successful examples of career paths for developers
– institutional changes to support sustainable software such as promotion and
tenure metrics, job categories, etc.
Thirty-one submissions were received; all but one used arXiv6 or figshare7 to self-publish their papers.
The review process was fairly standard. First, reviewers bid for papers. Then an automated
system matched the bids to determine assignments. After the reviewers completed their assigned
reviews (with an average of 4.9 reviews per paper and 4.1 reviews per reviewer), they used EasyChair8
to record scores on relevance and comments. Finally, the organizers accessed the information
to decide which papers to associate with the workshop and provided authors with the comments to
help them improve their papers.
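As a rough consistency check (not stated in the report, and assuming every submission was reviewed), these averages imply on the order of 150 reviews completed by roughly 37 reviewers:

    # Back-of-the-envelope check of the reported review statistics;
    # the report itself does not give total numbers of reviews or reviewers.
    papers = 31
    reviews_per_paper = 4.9
    reviews_per_reviewer = 4.1

    total_reviews = papers * reviews_per_paper                 # ~152 reviews
    implied_reviewers = total_reviews / reviews_per_reviewer   # ~37 reviewers
    print(round(total_reviews), round(implied_reviewers))      # 152 37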
6http://arxiv.org
7http://figshare.com
8http://easychair.org/

The organizers decided to list 28 of the papers as significantly contributing to the workshop, a
very high acceptance rate, but one that is reasonable, given the goal of broad participation and the
fact that the reports were already self-published.
The organizers wanted very interactive sessions, with the process of creating the sessions open
to the full program committee, the paper authors, and others who might attend the workshop. In
order to do this, the organizers used Well Sorted9 with the following steps:
(1) Authors were asked to create Well Sorted “cards” for the papers. These cards have a title (50
characters maximum) and a body (255 characters maximum).
(2) Authors, members of the WSSSPE program committee, and mailing list subscribers were asked
to sort the cards. Each person dragged the cards, one by one, into groups. A group could have
as many cards as the person wanted it to have, and it could have any meaning that made sense
to that person.
(3) Well Sorted produced a set of averages of all the sorts, with various numbers of card clusters
(a rough illustration of such grouping appears at the end of this section). The organizers then
chose a sort that contained five groups that felt most meaningful. After that, they decided on
names for the five groups:
• Exploring Sustainability
• Software Development Experiences
• Credit & Incentives
• Reproducibility & Reuse & Sharing
• Code Testing & Code Review.
Finally, since some of the papers were not represented by cards in the process, they were not
placed in groups by the peer-grouping system. The authors of these papers were asked which groups
seemed the best for their papers; these papers were then placed in those groups. Sections 6-10 discuss
the breakout groups, including a list of the papers associated with each group.
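Well Sorted's internal averaging method is not described in this report; the following is a minimal, hypothetical Python sketch of one common way to combine many card sorts: count how often pairs of cards are placed in the same group, then greedily merge the most co-occurring clusters. The card names, example sorts, and target number of groups are illustrative only.

    from collections import defaultdict
    from itertools import combinations

    # Each participant's sort: a list of groups, each group a set of card ids.
    # These example cards and sorts are purely illustrative.
    sorts = [
        [{"credit", "citation"}, {"testing", "ci"}, {"training"}],
        [{"credit", "citation", "training"}, {"testing", "ci"}],
        [{"credit"}, {"citation", "training"}, {"testing", "ci"}],
    ]

    # Count how often each pair of cards was placed in the same group.
    together = defaultdict(int)
    cards = sorted({c for sort in sorts for group in sort for c in group})
    for sort in sorts:
        for group in sort:
            for a, b in combinations(sorted(group), 2):
                together[(a, b)] += 1

    def cooccurrence(c1, c2):
        return sum(together.get(tuple(sorted((a, b))), 0) for a in c1 for b in c2)

    # Greedy agglomeration: repeatedly merge the two clusters whose members
    # co-occur most often, until the desired number of groups remains.
    clusters = [{c} for c in cards]
    target_groups = 2
    while len(clusters) > target_groups:
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda ij: cooccurrence(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] |= clusters[j]
        del clusters[j]

    print(clusters)  # two groups: {'ci', 'testing'} and {'citation', 'credit', 'training'}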
3. Keynotes
The workshop featured two keynote addresses. In the opening keynote presentation, Kaitlin
Thaney of the Mozilla Science Lab talked about her organization’s work and policy to enable and
support sustainable and reproducible scientific research through the open web. The second keynote
speaker was Neil Chue Hong of the Software Sustainability Institute. He shed light on how scientific
software pervasively drives advances in many science and engineering fields. Both keynote
speeches spawned further discussion among workshop participants on the crucial notion of software
sustainability, the central theme of the workshop.
3.1. Kaitlin Thaney, Designing for Truth, Scale, and Sustainability [5]. Kaitlin Thaney
is the Director of the Mozilla Science Lab (hereafter Mozilla), which is a non-profit organization
interested in openness, news, website creation, and science, all taking advantage of the open web.
Thaney started by noting the unfortunate fact that many current systems create friction that hinders
users, an unintended consequence of designs originally meant to do good.
An example is the National Cancer Institute’s caBIG. A total of $350 million was spent, including
more than $60 million for management. More than 70 tools were created, but caBIG is still seen
as a failure10. The tools that received the least investment were the most used; those that received
the most investment were the least used.
Thaney emphasized that for efficient reproducible open research, we would need research tools
(e.g., software repositories), social capital (e.g., incentives), and capacity (e.g., training and mentor-
ship). Our systems would need to communicate with each other. A point was made by a member
of the audience that as systems become less monolithic, it often becomes harder to sustain the links
between them11.
9http://www.well-sorted.org
10Report Blasts Problem-Plagued Cancer Research Grid, http://tinyurl.com/maf6dz2
11See, for example, http://tinyurl.com/l76tba2.

Thaney spoke about Mozilla’s work around code citation, through a collaboration and prototype
crafted between Mozilla, GitHub, figshare, and Zenodo. This work was presented at a closed meeting
in May 2014 at the National Institutes of Health (NIH) around these issues, sparking a conversation
from that meeting around what a Software Discovery Index12 might look like. The meeting included
a number of publishers, researchers, and those behind major scientific software efforts such as
Bioconductor13, Galaxy14, and nanoHUB15. Ted Habermann commented from the audience that minimal
metadata is less onerous for data providers but more burdensome for users; it can be challenging to
balance what must be captured against what would be ideal without losing user engagement, as in the
case of the old Harvard Dataverse, where often only the first four fields of three pages of metadata
were filled out.
The speaker concluded her talk by urging the audience to design scientific software with the general
community, not an individual, in mind, and to design so as to unlock the latent potential of our systems. In
addition, she encouraged everyone to rethink how we reward researchers and support roles.
3.2. Neil Chue Hong, We are the 92% [6]. Neil Chue Hong is the Director of the Software
Sustainability Institute (SSI) in the United Kingdom (UK). The SSI was founded to support the
UK’s research software community by cultivating better, more sustainable research software to
enable world-class research. Chue Hong’s keynote started by making the point that the use of –
and reliance on – software is pervasive in all areas of world-leading research, showing examples
from disciplines as diverse as the humanities and high-energy physics, quoting Kerstin Kleese Van
Dam of the Pacific Northwest National Laboratory via a petition campaign16 at change.org,
“Today there are very few science areas left which do not rely on IT and thus software for the
majority of their research work. More importantly key scientific advances in experimental and
observational science would have been impossible without better software.” He also cited Daniel Katz,
Software Infrastructure for Sustained Innovation (SI2) Program Director of the NSF, “Scientific
discovery and innovation are advancing along fundamentally new pathways opened by development of
increasingly sophisticated software. Software is an integral enabler of computation, experiment and
theory, and directly responsible for increased scientific productivity and enhancement of researchers’
capabilities.”
Chue Hong drew attention to the issue that, in the cyber-infrastructure and high-performance
computing community, the hundreds of thousands of researchers developing software are all too often
disregarded or considered the long tail; in fact, the numbers show that they are the mainstream.
He emphasized that software is no longer special; it is both essential to and common in scientific
research. A 2014 survey17 conducted by the SSI polled researchers from 15 research-intensive UK
universities (406 respondents covering a representative range of funders, disciplines, and seniority).
The survey reported that 92% of respondents confirmed the use of research software and 89% affirmed
that it would be impossible or difficult to conduct research without software. Nevertheless the
British research community is just starting to understand the magnitude of the issue. Whilst many
researchers make use of software such as MATLAB, SPSS, and Excel, data from the aforementioned
SSI survey shows that over half (56%) of respondents developed their own research software (which
equates to over 140,000 researchers if extrapolated across the UK) and yet 71% of all UK researchers
had no formal software-development training, having to rely on their own coding skills.
Examining another aspect of the size of the research software community, Chue Hong noted that
the costs of software-reliant research in the UK included £840 million of investment in the financial
year 2013–2014, and this amount has risen by 3% on average over the past four years. About 30%
of total research investment has been spent on research that relies on software over the last four
financial years. These numbers stemmed from an analysis by the SSI of data from 49,650 grant
12Software Discovery Index, http://softwarediscoveryindex.org/report/
13Bioconductor, http://www.bioconductor.org
14Galaxy, http://galaxyproject.org
15nanoHUB, https://nanohub.org
16http://tinyurl.com/nkn5tnv
17http://tinyurl.com/ooajs7m

titles and abstracts published on Gateway to Research between years 2010 and 2014. A similar
analysis of university jobs advertised in the same period discovered that despite this investment,
only 4% of positions were software-development related, and of these only 17% were explicitly
advertised as software developer or software engineer positions; the vast majority were advertised as
research associate or research assistant positions. This in turn leads to the issue of career paths for
those bridging the research and software worlds, who are essential to support the use and further
development of research software, a point highlighted by a graphic of UK STEM graduate
career paths18 showing that only 3.5% were able to secure permanent positions.
To conclude, Chue Hong led the audience in discussing the following questions: What are we
going to do to help and benefit the 92% of researchers who rely on software? Who do we need to
persuade? What are the incentives we need to put in place? Finally, he challenged the workshop
participants to change the current deficient practices in research and academia.
4. Lightning Talks
Lightning talks were a new feature in WSSSPE2. Since the workshop program was mostly
dedicated to discussions, the organizers wanted to give the attendees a chance to also ‘make a pitch’
for an idea, either representing a contributed paper or something different. Eighteen attendees
volunteered to participate in the lightning talks, each given only two minutes to speak and at most
one slide to show. The talks were presented in reverse alphabetic order of speakers’ last names. In
the rest of this section, we highlight the gist of some of the speakers’ messages.
(1) Colin C. Venters: The Nebuchadnezzar Effect: Dreaming of Sustainable Software
through Sustainable Software Architectures [7]. Venters proposed that sustainable soft-
ware is a composite, first-class, non-functional requirement (NFR) that is at a minimum a
measure of a system’s maintainability and extensibility, but may also include other NFRs such
as efficiency (e.g., energy, cost), interoperability, portability, reusability, scalability, and usabil-
ity. To achieve technically sustainable software, Venters suggested that software architectures
are fundamental as they are the primary carrier of system NFRs, i.e., pre-system understanding;
and influence how developers are able to understand, analyze, extend, test, and maintain a soft-
ware system, i.e., post-deployment system understanding. In addition, Venters highlighted that
sustainability of software architectures needs to be addressed to endure different types of change
and evolution in order to mitigate architectural drift, erosion, and knowledge vaporization.
(2) Marlon Pierce: Patching It Up, Pulling It Forward [8]. Pierce discussed how open
open source is. Open software needs a diverse, openly governed community behind it, just as
it needs open licensing and a public code repository. To probe the level of governance within
open source projects, he and his co-authors (Marru and Mattmann) suggested a contest to
encourage individual developers to submit patches and requests to projects that are important
to them. This simple mechanism would expose several aspects of governance, such as how easy
it is for independent developers to communicate with project leadership, how projects accept
and license third-party contributions, and how projects make decisions such as granting source
tree write access.
(3) John Peterson: Continuous Integration for Concurrent MOOSE Framework and
Application Development on GitHub [9]. Peterson from the Idaho National Laboratory re-
ported that in March 2014, the MOOSE framework was released under an open source license on
GitHub, significantly expanding and diversifying the pool of current active and potential future
contributors on the project. The MOOSE team employs an extensive continuous integration
test suite to ensure that both the framework and the applications based on the framework are
verified before any code changes are accepted into the repository. They use a combination of
built-in Git features such as branching and submodules, GitHub API integration capabilities,
18Source: The Scientific Century, Royal Society, 2010 (revised to reflect first stage clarification from “What Do PhD’s Do?” study)

and in-house developed testing software to perform this verification and update the dependent
applications in a relatively seamless manner for users (an illustrative sketch of one such CI step
appears at the end of this section).
(4) Abani Patra: Execute it [10]. Patra discussed the value of an easily accessible platform
for executing scientific software, e.g., HUBzero to access XSEDE or other computing resources.
Such a platform for executing benchmark problems (even at a small scale) allows the developer
community access a reference implementation and provides an easy way to train the larger user
community. A second idea of this talk was that for true usability, much more attention and
support needs to be given to the integrated use of simulation tools inside complex workflows.
(5) Daniel S. Katz: Implementing Transitive Credit with JSON-LD [11]. Science and en-
gineering research increasingly relies on activities that facilitate research but are not rewarded
or recognized, such as: data sharing; developing common data resources, software and method-
ologies; and annotating data and publications. To promote and advance these activities, we
must develop mechanisms for assigning credit, facilitate the appropriate attribution of research
outcomes, devise incentives for activities that facilitate research, and allocate funds to maximize
return on investment. Katz discussed the issue of assigning credit for both direct and indirect
contributions by using JSON-LD to implement a prototype transitive credit system (a hypothetical
sketch of such a credit map appears at the end of this section).
(6) Samin Ishtiaq: Daemons, Notifications and Sustaining Software. The reproduction
and replication of novel results has become a major issue in computer science, systems biology,
and other computational disciplines. The problems include both the inability to re-implement novel
algorithms and approaches and the lack of agreement on how, and on what, to benchmark these
algorithms. Ishtiaq from Microsoft Research Cambridge pointed out these problems and
made several suggestions to address them.
(7) James Howison: Retract all Bit-Rotten Publications. Howison sought to provoke dis-
cussion by proposing that papers whose workflows are not kept current with the changing
software ecosystem should be automatically retracted. This would create an incentive for au-
thors to keep their software current and usable, rather than the current situation in which every
potential user has to do this individually. A softer version of the proposal would identify papers
whose software workflow has become bit-rotten and allow others to keep the code up to date,
either adding them as new authors of the paper or providing credit for their academic service
in some other form.
(8) Robert Downs: Community Recommendations for Improving Sustainable Scientific
Software Practices [12]. Robert Downs, of the Columbia University Center for International
Earth Science Information Network (CIESIN), described a focus group study conducted with
the Science Software Cluster (SSC) of the Federation of Earth Science Information Partners
(ESIP). For the study, almost 300 attendees of the 2014 Summer ESIP Meeting were invited to
participate in simultaneous roundtable discussions on sustainability of science software. Over
two-thirds of the roundtable focus groups responded to a semi-structured survey that contained
three sets of questions eliciting recommendations for near-term actions of the community to
improve sustainable software practices. Initial analysis of the participants’ responses to the
questionnaire revealed several suggestions, which included improving community engagement
and collaborative activities, increasing understanding and awareness, and creating incentives to
motivate sustainable science software practices. The ESIP SSC plans to engage the community
in the recommended activities for improving sustainable scientific software practices.
(9) Carl Boettiger: rOpenSci: Building Sustainable Software by Fostering a Diverse
Community [13]. Boettiger described how the rOpenSci project has been successful by focus-
ing not just on building software but also on building a community of researchers who learn
and adopt their approaches to reproducible research and sustainable software practice. Through
outreach, mentoring, workshops, and hackathons, they have not only reached new users, but
also turned users into co-developers of robust software and good practices to support data
science research across a growing set of disciplines.

(10) Jakob Blomer: The Need for a Versioned Data Analysis Environment [14]. Large-
scale scientific endeavors, such as the discovery of the Higgs boson at the Large Hadron Col-
lider (LHC), often rely on complex software stacks. Experience maintaining thousands of dependencies
among software libraries and operating system versions has shown that, despite source code availabil-
ity, the setup and validation of a minimal usable analysis environment can easily become
prohibitively expensive. In high-energy physics, CernVM-FS, a special-purpose, open-source,
versioning, and snapshotting file system used to capture and distribute entire software stacks,
proved to be useful for providing instant access to ready-to-run data analysis environments.
(11) Alice Allen: Find it! Cite it! The Astrophysics Source Code Library (ASCL) is an online
registry of scientist-written software used in astronomy research. Its primary interest is
rendering research more transparent by making this software more discoverable for examination.
The ASCL is treated as a publication by an indexing resource for astronomy, the Astrophysics
Data System (ADS). ADS tracks citations to what it indexes, including citations to ASCL
entries. Increasing rewards for writing software, whether through citation, transitive credit or
other methods, gives software authors a powerful reason to take the time to build sustainability
into their software and is an excellent way to drive community change.
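The MOOSE team's actual continuous-integration tooling is not reproduced in this report. As an illustration of one ingredient of such a setup, the hypothetical Python sketch below runs a test command and reports the outcome back to GitHub as a commit status via the REST Statuses API; the repository name, commit SHA, token, and test command are all placeholders, not values from the MOOSE project.

    import subprocess
    import requests  # third-party HTTP client

    # Placeholder values; none of these come from the MOOSE project.
    REPO = "example-org/example-framework"
    SHA = "abc123def4567890abc123def4567890abc123de"  # commit being verified
    TOKEN = "ghp_exampletoken"                        # GitHub API token
    TEST_CMD = ["./run_tests", "--all"]               # stand-in for a test suite

    # Run the test suite for the candidate commit.
    result = subprocess.run(TEST_CMD)
    state = "success" if result.returncode == 0 else "failure"

    # Report the outcome as a commit status so it appears on the pull request.
    resp = requests.post(
        f"https://api.github.com/repos/{REPO}/statuses/{SHA}",
        headers={"Authorization": f"token {TOKEN}"},
        json={
            "state": state,                    # "success", "failure", "error", or "pending"
            "context": "ci/regression-tests",  # label shown next to the commit
            "description": "Framework and application test suite",
        },
    )
    resp.raise_for_status()

Similarly, the exact JSON-LD schema proposed by Katz [11] is not reproduced here; the sketch below shows what a hypothetical credit map might look like, with fractional weights assigned to people and to software dependencies and credit flowing multiplicatively along the chain. The @context URL, field names, identifiers, and weights are all illustrative.

    import json

    # Hypothetical JSON-LD credit map for one research product.
    credit_map = {
        "@context": "https://example.org/transitive-credit/context.jsonld",
        "@id": "https://doi.org/10.9999/example-tool",
        "name": "example-analysis-tool",
        "contribution": [
            {"@id": "https://orcid.org/0000-0000-0000-0001", "role": "author", "weight": 0.55},
            {"@id": "https://orcid.org/0000-0000-0000-0002", "role": "maintainer", "weight": 0.25},
            {"@id": "https://doi.org/10.9999/example-library", "role": "dependency", "weight": 0.20},
        ],
    }

    def transitive_share(weight_in_product, weight_in_dependency):
        """Credit flows multiplicatively along the contribution chain."""
        return weight_in_product * weight_in_dependency

    # A contributor holding 50% of the credit for example-library would receive
    # 0.20 * 0.50 = 10% of the credit for example-analysis-tool.
    print(transitive_share(0.20, 0.50))   # 0.1
    print(json.dumps(credit_map, indent=2))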
5. Defining Sustainability
In the first interactive session, the attendees divided themselves into groups to discuss software
sustainability. They were asked to
(1) discuss what the term “software sustainability” meant to them
(2) determine three things they considered to be significant enablers of software sustainability
(3) determine three things they considered to be significant barriers to software sustainability.
Once each group had come up with answers, all the answers were compiled, and the attendees voted
on which they thought were important by a show of hands.
The general responses to what software sustainability meant were:
• keeping software scientifically useful
• separating techniques in code from knowledge in code
• that an adequately large community finds value in software and is willing to sustain it.
The enablers of and barriers to software sustainability, roughly ranked by attendee votes, are
shown in Tables 1 and 2, respectively.19
Table 1. Enablers of software sustainability, with 0 to 10 ‘*’s roughly indicating
the fraction of attendees who voted for an item as important.
Importance Item
********** healthy and vibrant communities; vibrant community to champion software
********** designing for growth and extension – open development
*******    culture in community for reuse
****       portability
****       culture in development community to support transition between developers
***        interdisciplinary people: science + IT experience
**         planning for end of life
**         make smart choices about dependencies
*          thinking of software as product lines – long term vs. short term view
not all communities need new software
converting use into resources
19A few other items were suggested as barriers, but were not voted on due to lack of time in the session: layering up
dependencies; using software past its sustainable life; using software past its usable life; inertia for accepted answers
versus wrong or right answers; monolithic or poor code; and need to restructure code when hardware/software/libraries
change.

Table 2. Barriers to software sustainability, with 0 to 10 ‘*’s roughly indicating the
fraction of attendees who voted for an item as important.
Importance Item
*******    lack of incentives, including promotion and tenure process; promotion and tenure process in academia is incompatible with sustainability
*****      absent or poor documentation
*****      funding to ensure sustainability is difficult to obtain
****       developers are not computer scientists; don’t have software engineering practices (in particular, those needed to scale up projects to support and be developed by a large sustainable community)
***        overreliance on one or two people – the ‘bus test’20
**         rate of change of underlying technologies
**         lack of business models for sustainability
*          lack of training for how to build sustainability into the system
maintenance for software is not visible, appears to “just happen”
licensing issues
staff turnover – lack of continuity
6. Exploring Sustainability
Six papers were categorized under the theme of Exploring Sustainability. The group included four
of the authors of the six submitted papers and a number of additional participants who expressed
an interest in the theme at the workshop. Each paper offered a different perspective on the concept
of sustainability, which ranged from the sustainment of communities to defining sustainability as a
first-class, composite non-functional requirement.
6.1. Discussion and Actions.
6.1.1. Discussion. Each author was invited to outline the key action from their paper as a potential
discussion point for the group; where the authors were not present, the group facilitators outlined
the actionable points from their papers. The key actions from the six papers included recommen-
dations for improving practice in sustainable software engineering [15, 16]; development of Software
and Infrastructure as a Service as a mechanism for fostering sustainable science communities [17];
developer incentives for code contributions to open source projects [18]; establishment of a set of
software engineering principles based on scalability, reproducibility, and energy efficiency [19]; and
applying software architectures as a mechanism for architectural-level reasoning about sustainabil-
ity [20]. The group took the position of viewing sustainability from the perspective of addressing
the challenges related to the development, deployment, and maintenance of reusable software [2].
The initial discussion focused on how to foster cultural change towards developing sustainable
software in academic environments. It was suggested that funding agencies could introduce a new
requirement for research projects in which software is an intrinsic part of enabling the research
program: an additional element within the grant proposal that provides a plan for sustaining the
software. This type of initiative would provide the necessary incentive
and motivation for researchers to consider how to sustain their software beyond the lifetime of the
project.
The discussion then focused on the need for a common language and a definition of the con-
cept of sustainability that moves beyond the current fuzzy definitions, in which time is the sole
measure of sustainment. It was suggested that there was a need to identify tangible actions
that underpin sustainability that developers could incorporate into the development stream of their
20“Bus test” in Table 2 refers to the smallest number of key people a project would need to lose to become non-viable
(the larger the number, the healthier the project).

software. This prompted a debate regarding whether sustainability should be considered as a non-
functional requirement or software quality as defined within ISO/IEC 25010 [21]. The focus of the
discussion was based on the paper by Venters et al. [20], which suggested that sustainability is
a first-class, composite, non-functional requirement composed of a number of sub-characteristics.
It was generally agreed that maintainability and extensibility were key qualities underlying sus-
tainability. In addition, the group also discussed what other non-functional requirements would
contribute to the development of sustainable software, e.g., reusability and scalability. However, it
was recognized that there was a need to identify appropriate metrics and measures.
The group also discussed whether the concept of sustainability itself was a barrier to achieving
sustainable software. It was suggested that “the first rule of software sustainability is do not talk
about software sustainability.” Instead there should be a focus on best software engineering practice.
Playing devil’s advocate, it was asked: if the focus on the concept of sustainability were set aside,
what current software engineering practices and principles could be utilized by software developers
and domain scientists to achieve sustainability? This raised the question of why existing software
engineering knowledge, such as that contained in SWEBOK [22], was largely ignored and to what
extent the environment has a strong influence on practice within the scientific and engineering
community. As a result, there is a need to identify best practices and reach a consensus beyond the
theories that underpin the discipline of software engineering. Similarly, how could we translate or
distill some of the key building blocks that underpin software engineering practice?
A final key point of the discussion was the role that software design and patterns play before
committing to writing a line of code. It was suggested that modeling must play a major role in
attaining sustainable software. The point was raised that design involves making decisions and a
mechanism would be necessary for capturing these decision points. This introduced the idea of
software provenance that moved beyond commits in software repositories to how to capture and
maintain relationships between sources and design decisions to prevent knowledge vaporization.
Whether this could be achieved through software architectures is an open research challenge.
6.1.2. Actions. The main action to come from the group was a proposal to identify the ten best
software engineering practices, similar to Philip Bourne’s Ten Simple Rules approach [23].
6.2. Papers. The papers that were discussed in the Exploring Sustainability group are:
• Mario Rosado de Souza, Robert Haines, and Caroline Jay. Defining sustainability through
Developers’ Eyes: Recommendations from an Interview Study [15]
• Robert Downs, W. Christopher Lenhardt, Erin Robinson, Ethan Davis, and Nicholas Weber.
Community Recommendations for Sustainable Scientific Software [16]
• Abani Patra, Matthew Jones, Steven Gallo, Kyle Marcus, and Tevfik Kosar. Role of Online
Platforms, Communications and Workflows in Developing Sustainable Software for Science
Communities [17]
• Marlon Pierce, Suresh Marru, and Chris Mattmann. WSSSPE2: Patching It Up, Pulling It
Forward [18]
• Justin Shi. Seeking the principles of sustainable software engineering [19]
• Colin C. Venters, Michael K. Griffiths, Violeta Holmes, Rupert R. Ward, and David J. Cooke.
The Nebuchadnezzar Effect: Dreaming of Sustainable Software through Sustainable Software
Architectures [20]
7. Software Development Experiences
Of the short actionable papers that would lead to improvements for sustainable software science,
11 submissions were categorized in the Software Development Experiences group. Because of the
large number, we split these into two subgroups prior to the event. Some common themes helped in
this division. For example, several papers that addressed education and training issues, including
best practices and case studies, were grouped together. Others discussed experiences with registries,
developer collectives and specific examples of successful, sustainable software (including a valuable
industry perspective).

7.1. Discussion and Actions. Subgroup A consisted of ten participants who discussed papers
surrounding training and successful community software initiatives. Subgroup B had six partic-
ipants who discussed four papers. Because of the nature of the papers, training emerged as a
common theme. However, the conversation was wide-ranging, including incentives, reproducibility,
and funding to promote sustainability. In the end, both groups discussed training, though from
somewhat different points of view and resulting in somewhat different suggested actions.
7.1.1. Subgroup A Discussion. Group discussion started with a position statement by each person
about what they had learned from their software development experiences and how those
lessons might be translated into actionable outcomes. Participants came from a range of back-
grounds, and represented multiple software and training initiatives, including Software Carpentry,
Data Carpentry, Open Science for Synthesis (OSS), the Community Surface Dynamics Modeling
Systems group, ROpenSci, DataONE, the HDF Group, and others. Some of these software ex-
periences were focused on development of new products for use in the sciences (e.g., ROpenSci,
CSDMS), and these recounted the difficulties of engaging with disciplinary scientists in writing
software. Software was clearly utilized broadly across the various science disciplines represented,
and it was developed within those disciplines as well. Some researchers created software for sta-
tistical analysis and modeling, while others used it to control instrumentation, collate data across
networks, collect information from respondents, and for many other purposes. There was broad consensus
that, within the disciplines represented, formal training in any type of software development or
engineering was rare among the practitioners; most are self-taught, and develop software to get
another job done. Any ancillary utility of the software outside of the specific science target was
generally unplanned and few researchers would want to invest more time to make their own software
more re-usable.
There was general agreement that this body of disciplinary software improvement needed to be
understood in terms of the scientific productivity that could be achieved. A software maturity
model is needed for science software, but it needs to be introduced in a way that fits the culture of
science, which largely thinks of software as a tool, rather than a product itself. The group was in
general consensus that more widespread training in software practice is needed within the domain
sciences, and several of the participants were involved in efforts along these lines. Participants
felt that projects that build a sense of community via training in software and technical practices
would have the most success in changing practices in that community, but that there was a need
for a managed introduction of these practices. Participants also recognized that these could not
be one-time, one-off training opportunities, as software and technologies change rapidly over time.
For example, while today Software Carpentry is focused on teaching version control via Git, there
has been a rapid evolution from RCS to CVS to SVN to Git over a short time frame, and thus
communities should expect the need to train for adaptability and a changing technology tool chain,
rather than assume that these technologies will stay fixed. Thus, although short term training that
introduces specific tools was considered highly valuable, these trainings were also not considered
sufficient to engender the changes in software practice that were deemed necessary. Combining
the need for changes in practice to be introduced incrementally with the need to minimize the
divergence of training from direct science goals and the difficulties of training for a rapidly evolving
technology space, participants concluded that multiple training efforts that targeted different parts
of the spectrum were needed. Short-term courses introducing immediately useful skills need to be
offered alongside more in-depth courses on software engineering and practice that allow students to
adapt to a changing landscape.
Finally, after agreeing that these complementary training models were needed, the group discussed
sustainability of training, and how the leading groups in this space are teaching only a small fraction
of the community that needs and wants training. Most graduate programs in the sciences do not
currently incorporate these approaches in their graduate curricula, although there is an increasing
number of quantitatively oriented courses around analysis and modeling. These still, however,
generally omit engineering practices such as version control, unit and regression testing, and software

modularity and abstraction, often because the instructors themselves in the domain sciences are
not familiar with these techniques. Thus, students who are being trained in these approaches are
doing so through short 2–3 day training workshops such as Software and Data Carpentry, rather
than through semester-long graduate education courses at their universities, which tend to focus on
statistical and modeling techniques. Hybrid programs like the three-week Open Science for Synthesis
(OSS) training that combine the three (science, quantitative techniques, and software engineering)
into a holistic course serve part of the need but reach only a few researchers at this time. Thus,
the group concluded that significant strides needed to be made in coordinating these trainings so
that they are complementary, in increasing their size and scope, and in better integrating them
directly into graduate education programs.
7.1.2. Subgroup B Discussion. The statement “applied computer science is being attempted in
academia without any formal training” kicked off our discussion. The group brought expertise in
several different training models, from a two-day Software Carpentry workshop and a three-week Open
Science for Synthesis (OSS) program to semester-long programs. We discussed the pros and cons
of different training approaches, touching on informal learning, for example, where people learn
the necessary skills by asking questions of cross-disciplinary people (“boundary scientists”) in their
work environments.
Some papers explored gaps in training of early-career scientists. Industry participants in our
group confirmed this observation. We asked ourselves, “Are traditional courses failing?” We think
yes. Changes to undergraduate curricula requirements are difficult. But as programming models
become more complex, we have to raise the skill levels. Skills must be improved not only in
traditional programming – learning languages and algorithms – but also around professional software
development. We want to get to a point where “Everyone has a new minimum now – everyone knows
Git.” Developers also need training on licensing choices. Just because some source code is in a public
Git repository does not mean that it is open.
“How do you get people to look for the training they need?” wondered one participant. People
seek out opportunities like Software Carpentry to augment skills. While some instruction can be
delivered through institutional curricula, independent groups (non-profits, institutes) have more flexibility.
Some asked whether Silicon Valley would be interested in funding training so that people are better
prepared to enter the workforce. Some felt that companies were reluctant to deliver training for fear
that their employees might then leave them, while others felt that pushing all training to industry
could lead to good technical people “getting on the Google bus” and leaving the sciences.
“How will we know when people are trained effectively in these new skills?” We discussed certifi-
cation. It can be hard to build recognition comparable to that of Java or database certifications across
all technologies. OSS offers badges to those completing training. However, we need to demonstrate
proficiency, not just completion. Google Summer of Code is a big CV augmentation for participants—can we
create something similar?
The group also discussed how to fund training. One participant observed that NYU runs a six-
week data-science training program; companies grab the graduates and pay for those they hire. OSS also used
the NSF Software Institutes program as a vehicle to fund training activities. Software Carpentry
uses a collaborative teaching approach where people publish open teaching materials and receive
credits for their use.
We then discussed career paths for those supporting sustainable software. While tenure track
is not the only option for graduate students, the challenges for those who remain in academia can
be large. Research scientists are entirely dependent on soft money, which can be unpredictable.
Postdocs and those who do pursue tenure track positions need to publish and see no rewards in
software development. These challenges were all identified at WSSSPE1. What actions would we
recommend to improve things? Altmetrics and download statistics may slowly change the system
and improve a developer’s ability to receive credit for time invested in software development. NSF’s
recognition of scientific products including datasets, software, and publications is also helping.

We asked ourselves, “Are there examples of things that are changing because of this and how
can we build momentum?” One example demonstrates change over time. In 2007, nanoHUB listed
the academic reward structure as a problem in an EDUCAUSE report where the authors note, “In
the future, nanoHUB researchers are hoping to change the research culture. While they recognize
that young faculty members are unlikely to get tenure based on their nanoHUB contributions, they
hope to encourage faculty to think beyond their own research needs to consider publishing tutorials
and other content in their fields on the nanoHUB site.” Fast forward seven years: in a 2014 iSGTW
article, nanoHUB Principal Investigator (PI) Gerhard Klimeck is quoted as saying, “A former
student of mine published eight tools on nanoHUB, serving over 6,000 people with his tools. He
then joined a university as a professor and introduced nanoHUB. Use of the gateway from that
university skyrocketed; he used nanoHUB in existing classes, created new classes, and infused it
in his research. Ultimately, the professor’s department head attributed his two-year rise to tenure
with the reputation and innovation he gained through nanoHUB.”
The group also discussed “attribution trees,” an idea put forward by Dan Katz and Arfon Smith
where a chain of attributions can, for example, give credit to those developing libraries and building
blocks that other pieces of software use. The group considered potential journals and media through
which to push the application of this idea. One participant noted that Dryad21 works with journals. If
a paper is accepted to one of those journals, the supporting data must be submitted to Dryad.
The group also discussed reproducibility as a component of the journal review process and “active
papers,” with immediate links to the data and software.
We then discussed variations among scientific domain areas and wondered “Are some communities
more or less open than others?” To some members of the group, biology seems to be more open,
while physics seems less so. Some felt that, with the more recent development of bioinformatics as a
field, there were fewer historical practices to undo in biology, whereas physics has a preprint philosophy
to overcome.
Environmental sciences may be mixed. The group felt that the biomedical area, however, was very
competitive and closed. One participant mentioned blueprints for going open source (as NWChem
recently did), in which authors outline how this helps, what to do, and what the next steps are.
We then moved beyond our training discussion to address funding that encourages sustainable
software – funding of both people and projects that create a true system to support sustainability.
We called this “institutionalized serendipity.” As science is increasingly reliant on software, one
participant observed that “software development can have much broader impact than publishing
individual research, but it is not viewed that way.” Because of this centrality, one participant
mentioned that training ought to be called Science Carpentry rather than Software Carpentry. We
felt we were beginning to see changes in the research community as a result of the NSF’s data
management plan requirements. PIs are thinking more about data and some university libraries are
offering data repositories. We wondered if a software management plan might be effective. “How
might funding programs need to adapt to reward good software development practices?” we asked
ourselves. We thought about best practices such as version control, test harnesses, mailing lists,
bug tracking, community contribution, and reuse where appropriate. We thought about measuring
success through usage statistics (downloads, altmetrics). “Should funders demand reproducibility?”
we wondered. In order for results to be reproduced, software would need to be carefully curated.
Again, our industry participants contributed unique viewpoints. Partnering with industry was
seen as one path to sustainability. The unique partnerships Kitware engages in promote academic
freedom while creating an income stream from certain portions of the software. This type of
approach to sustainable software frees researchers from performing tasks that do not offer the
rewards their institutions value. We also discussed successful models for industry partnerships
that preserve open science. Participants noted that there are some NSF programs that prohibit
partnerships with for-profit companies (but there are other programs in which this is encouraged).
7.1.3. Subgroup A Actions. Three main actions were identified by Subgroup A that would be of
interest to participants and benefit the research community. These focused on the desire to amplify
21The Dryad Digital Repository, http://datadryad.org.

the current community efforts in training and software engineering by connecting the current train-
ing initiatives (e.g., Software and Data Carpentry, rOpenSci, OSS) that are of different durations
at appropriate stages. Generally it was felt that short-term workshops (SwC, DC), medium-term
trainings (e.g., OSS), and longer-term courses (e.g., BIDS Data Science) would benefit from a modicum
of interaction: coordinating curricula, discussing and aligning prerequisites, and coordinating the
timing of courses. A training coordination effort would go a long way towards
amplifying the value of the individual efforts and make them all more effective.
• Action 1: Create a roadmap of research software training initiatives. Such a roadmap would
provide a taxonomy of training opportunities: what they deliver, and what attendees need
to know going in (prerequisites). It would also show which recommended roadmap actions
will have a clear and immediate payoff, and which will have long-term payoffs. The training
roadmap would emphasize time savings and efficiency gains to be had from each training.
• Action 2: Build a report card characterizing use of best practices in scientific software. Gener-
ally, people felt that researchers would be very willing to migrate practices if they could identify
where they needed to improve. Such a report card would ask simple questions to characterize
use of best practices in science software, and could be structured similarly to Joel Spolsky’s
Software Maturity questionnaire (also known as the Joel Test [24]). The survey would create
a report card that shows areas where a project could improve, and then link those areas to
specific training offerings from the training taxonomy from Action 1 (see the sketch after this list).
• Action 3: Create a science software review forum. It was generally acknowledged that a lit-
tle code review can have a tremendous impact on the quality of software in a project, but
that sites for science code review are lacking. While people can ask questions on the Stack
Exchange sites, they are generally not open to questions of style, approach, or appropriate-
ness, as they try to avoid subjective commentary. Instead, we need a site where code gets
discussed/summarized/described (in small bites) by the science software community. The tar-
get audience would be graduate and undergraduate students in the sciences, and there would
need to be mechanisms to keep the review positive and constructive, and not get pedantic or
judgmental. This could be tied to a software registry (such as the nascent GeoSoft project), or
to language repositories like R’s CRAN repository, and could lead to the report card discussed
in Action 2.
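The report card in Action 2 is not specified beyond the analogy to the Joel Test; as a rough illustration only, the Python sketch below scores a project against a handful of commonly cited practices. The questions and scoring are placeholders, not an agreed community instrument.

    # Hypothetical report card in the spirit of the Joel Test: yes/no questions,
    # one point each. The questions are placeholders, not an agreed instrument.
    QUESTIONS = [
        "Is the source code under version control?",
        "Is there an automated test suite?",
        "Do tests run automatically on every change (continuous integration)?",
        "Is there user-facing documentation?",
        "Does the project have an open-source license file?",
        "Is there a public issue tracker?",
        "Are releases versioned and archived with a citable identifier (e.g., a DOI)?",
    ]

    def report_card(answers):
        """answers: dict mapping each question to True/False."""
        score = sum(1 for q in QUESTIONS if answers.get(q, False))
        gaps = [q for q in QUESTIONS if not answers.get(q, False)]
        return score, gaps

    # Example: a hypothetical project that satisfies the first four practices.
    answers = {q: True for q in QUESTIONS[:4]}
    score, gaps = report_card(answers)
    print(f"Score: {score}/{len(QUESTIONS)}")
    for q in gaps:
        print("Could improve:", q)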
7.1.4. Subgroup B Actions. Subgroup B then focused on actions it could take. It discussed de-
velopment of a white paper that describes a matrix approach to training (multi-day, multi-week,
semester). The white paper might include a survey of existing techniques. There are many, some dat-
ing back many years, for example the Interuniversity Consortium for Political and Social Research
(ICPSR) and various summer institutes. The white paper could include a call for a comprehensive
assessment of these techniques. We need to think carefully about the right venue for such a white
paper, where it would have the most impact. The subgroup believes it would need to approach
editors directly to ascertain this.
The group felt that training in techniques that promote sustainability has a range of benefits:
career paths, educated reviewers, reproducible science, and more. There is some information on
how that has been approached and assessed, but more is needed. The group felt that this training
is undervalued and that it is important to communicate the return on investment (ROI) – both
individual ROI (skills that make scientists more effective and more marketable) and funder ROI
(better use of taxpayer funds, research more likely to be reproducible because sustainable software
exists, better trained reviewers).
7.2. Papers. The papers that were discussed in the Software Development Experiences Subgroup A
are:
• Michael R. Crusoe and C. Titus Brown. Channeling Community Contributions to Scientific
Software: A Hackathon Experience [25]
• Marcus Hanwell, Patrick O’Leary, and Bob O’Bara. Sustainable Software Ecosystems: Software
Engineers, Domain Scientists, and Engineers Collaborating for Science [26]

• W. Christopher Lenhardt, Stanley Ahalt, Matt Jones, J. Aukema, S. Hampton, S. R. Hespanh,
R. Idaszak, and M. Schildhauer. ISEES-WSSI Lessons for Sustainable Science Software from
an Early Career Training Institute on Open Science Synthesis [27]
• Jory Schossau and Greg Wilson. Which Sustainable Software Practices do Scientists Find Most
Useful? [28]
The papers that were discussed in the Software Development Experiences Subgroup B are:
• Jordan Adams, Sai Nudurupati, Nicole Gasparini, Daniel Hobley, Eric Hutton, Gregory Tucker,
and Erkan Istanbulluoglu. Landlab: Sustainable Software Development in Practice [29]
• Alice Allen and Judy Schmidt. Looking before Leaping: Creating a Software Registry [30]
• Carl Boettiger, Ted Hart, Scott Chamberlain, and Karthik Ram. Building Software, Building
Community: Lessons from the ROpenSci Project [31]
• Yolanda Gil, Eunyoung Moon, and James Howison. No Science Software Is an Island: Collab-
orative Software Development Needs in Geosciences [32]
• Ted Habermann, Andrew Collette, Steve Vincena, Werner Benger, Jay Jay Billings, Matt
Gerring, Konrad Hinsen, Pierre de Buyl, Mark Könnecke, Filipe Rnc Maia, and Suren Byna.
The Hierarchical Data Format (HDF): A Foundation for Sustainable Data and Software [33]
• Eric Hutton, Mark Piper, Irina Overeem, Albert Kettner, and James Syvitski. Building Sus-
tainable Software - The CSDMS Approach [34]
• James S. Spencer, Nicholas S. Blunt, William A. Vigor, Fionn D. Malone, W. M. C. Foulkes,
James J. Shepherd, and Alex J. W. Thom. The Highly Accurate N-DEterminant (HANDE)
Quantum Monte Carlo Project: Open-source Stochastic Diagonalisation for Quantum Chem-
istry [35]
8. Credit & Incentives
This group, with just three papers but a large amount of interest and participation from atten-
dees, focused on the institutional, social, and cultural mechanisms that encourage the creation and
maintenance of shared software, in the context of what now exists, what mechanisms are desired,
and how we might achieve them.
8.1. Discussion and Actions. In the first discussion session, this group decided to break into two
smaller groups, each independently working through the same general topic: credit and incentives.
8.1.1. First breakout discussion: Group A. The first sub-group discussed issues around the current
system for credit and incentives, which it called “hacking the incentive structure.” The group
considered four potential points of leverage:
First, that we currently have systems that collect information, and these could be modified to
collect different information, then map that information to actions. We could initially build a proof-
of-concept for a new use of a given system, then determine what actions would be needed to make
this use more common.
Second, that we could create entirely new systems, perhaps because the existing systems are too
tied to what they measure, and modifying them is not practical.
Third, that we could change academic culture, rather than worrying about the systems. This
was mostly focused on citations, because they matter for hiring, promotion, and tenure decisions.
The group discussed how we could weigh the citations within papers better than we now do. How
could we identify the five citations that really matter for a paper, distinguishing them from the
related works and general background that are also cited? Perhaps we could break these out in
the reference list, working with publishers to implement this. Or maybe we could also break out
categories of citations, such as the most important software used, the previous publication that we
are building from, the data that we actually used, etc. The Moore Foundation’s award in data
science was given as an example, asking proposers: What are the five canonical citations that are
most important to your work? [36] This would be a way of giving credit and assigning importance

to these works, differently from how we just count citations today. A possible action that the group
discussed was conducting a longitudinal study of the most useful software, data, etc., in a discipline.
Fourth, that we could change the ways funders make decisions, and use these funding policies as
incentives.
After this discussion, group A brainstormed about incentives, with the following items suggested:
• running programming contests, creating bounties for contributing to open source software, etc.
• augmenting author lists to give credit to people who do not now get credit (and making them
machine readable)
• developing a microcitation standard and mechanism (for both software and data)
• developing a well-defined standard for author contribution – what level of contribution rises to
the level of authorship?
• leveraging social media for citation and reviewing of content – then using social media to bring
more people into the review process than is traditional
• determining where else software can be cited and recorded (e.g., acknowledgments sections of
papers)
• developing a taxonomy of contributors (e.g., Project CRediT [37]) tied to places where these metrics
are already stored (e.g., ORCID [38])
• making metadata easier to add for software, creating an incentive for providing software metadata
– noting that this cannot be centralized
• creating something like the h-index that tenure committees can make use of – a simple way of
measuring and documenting the overall credit given to an individual over different projects (a
minimal illustrative sketch appears after this list)
• thinking about publishing software versus journal papers – software does not have to be novel
• determining guidelines for recommending software characteristics for tenure – perhaps drafting
guidelines and then getting ACM or IEEE agreement.
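As a minimal sketch of the h-index-like measure suggested above (the per-project credit counts below are hypothetical inputs, and the workshop did not settle on what exactly should be counted), such an index could be computed as:

    # Illustrative sketch only: an h-index-style summary of software credit.
    # The "credit counts" per project (e.g., citations or acknowledged uses)
    # are hypothetical inputs; no specific metric was agreed at the workshop.

    def software_h_index(credit_counts):
        """Largest h such that h projects each have at least h credits."""
        counts = sorted(credit_counts, reverse=True)
        h = 0
        for i, c in enumerate(counts, start=1):
            if c >= i:
                h = i
            else:
                break
        return h

    if __name__ == "__main__":
        # One contributor's projects and the credit each has accumulated.
        print(software_h_index([25, 8, 5, 4, 1]))  # prints 4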
8.1.2. First breakout discussion: Group B. This group started by discussing who should be incen-
tivized, thinking of two categories of people: those in science (who could be incentivized to do
better, more shareable, more sustainable work), and those in industry but interested in science
(who could be incentivized to contribute to science). It was pointed out that we are not yet clear
enough on exactly what we want to incentivize, suggesting that we need to have a clearer picture
of “good computational work” and what sort of contributions are truly generative for science.
The discussion of incentivizing those in science acknowledged that the publications system was far
from perfect for incentivizing good software work. Nonetheless, the discussion focused on bringing
software people into publications. There were two main suggestions. The first is to focus on end
users of software and encourage them to cite software the “right” way. James Howison suggested that
his research showed that few projects were making a formal request for citation (but that authors
weren’t necessarily following those suggestions anyway) [39]. He suggested making access to the
software conditional on a license that requires citation. Others found this “too confrontational”
and preferred to concentrate on making it easier to do the right thing. The second was focusing on
those leading software projects, and the group was more enthusiastic about “forcing” PIs to include
their “software people” on publications, although there were few ideas on how exactly to do this.
Another technique mentioned was that when scientific software projects are hosted in organizations
like Apache, the scientific contributors can benefit from building their reputations, perhaps yielding
job offers that they can use to negotiate better job and career packages.
The discussion on incentivizing those outside science focused on accessing the well of affection
that those working in software have for scientific research. How can the interest and skills of this
group be marshaled towards sustainable contributions? There is evidence that the migration of
scientific software projects to the Apache Software Foundation (ASF) has created opportunities for
those not employed in the scientific sector to contribute to projects initiated by scientists (especially
where there is cross-over with industry needs, such as provenance and workflow).
The group also discussed developing “software prizes,” arguing that while it is hard to “mint” other
new sources of reputation, prizes are possible without getting too many others on board. The prize
criteria can form a template for describing what we mean by scientific contributions made through
software, particularly focusing on building active communities, not only writing great code.
8.1.3. Group merger and redivision. After the first breakout session, the groups A and B came
together and discussed a rollup of the ideas from the subgroups at a high level:
• citation ecosystem – traces of usage (metrics)
• taxonomy of contributorship, understanding roles
• prizes
• new metrics (for people’s activities in software)
• guidelines for evaluating scientific contribution through software (perhaps using new metrics).
In the remaining discussion sessions, the group chose to split into three subgroups to discuss a
version of these topics: citation ecosystems, taxonomy of contributorship/guidelines for software for
tenure review, and prizes. The subgroups were asked to clearly identify
• the problem to be solved
• steps towards a solution.
8.1.4. Remaining breakout discussions: Citation ecosystems. This group defined its goal as creating
a low-barrier-of-entry method for recording names and roles (and in a second phase, optionally
including weights) of contributors (coders and other intellectual contributions) to a software package
in a machine readable way (to be called a credit file), then encouraging the scientific community to
adopt this practice.
The following general points were initially discussed:
• The FLASH [40] project was suggested as an example of how something like this has been done.
• This data could be a file that can be associated with a citation to the software, either through
use of a DOI (digital object identifier) for the credit file, or by uploading the credit files as
associated with the paper.
• This idea could also be applied to data.
• The credit file should be part of the metadata that are freely available to those who have not
paid for access to the journal, like citations are now in most cases.
• It was suggested that there should also be a separate file to track software dependencies.
The group came up with the following actions to be performed:
(1) Build a tool that can automatically determine who the contributors are (from a Git or other
repository), then allow the user to manually edit the output to add or remove people and define
roles (a minimal illustrative sketch appears after this list).
(2) Work with repositories to encourage them to provide the information we need based on what
they already store.
(3) Define what a citation file should look like and what it should be called.
(4) Test adoption with a substantial scientific organization such as the Lawrence Berkeley National
Laboratory (LBNL).
(5) Create credit file for a set of software.
(6) Build a validator (and perhaps a visualizer) for credit files.
(7) Write a tool to collect files and visualize/output interconnections (which software is used with
which), based on an existing project.
(8) When we (the group members) write papers, we should track the software we use, and encourage
the software developers to make their software citable and create credit files.
(9) Build a tool to export the credit file to BibTeX and other citation styles.
(10) Make sure the BibTeX entries (exported from internal data) are somewhat standardized so that
they can be imported into papers. Also make sure that standard LaTeX style files understand
and accept these entries.
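As a purely illustrative sketch of actions (1), (3), and (9) (the file name CREDIT.json, its fields, and the default role are assumptions made for this example, not an agreed standard), a short Python script could seed a credit file from a Git repository’s commit authors and export it as a BibTeX entry:

    # Illustrative sketch only.  The file name (CREDIT.json), its fields, and the
    # default role are assumptions for this example, not an agreed standard.
    import json
    import subprocess
    from collections import Counter

    def contributors_from_git(repo="."):
        """Seed a contributor list from commit authors; roles are edited by hand."""
        names = subprocess.run(
            ["git", "-C", repo, "log", "--format=%aN"],
            capture_output=True, text=True, check=True,
        ).stdout.splitlines()
        return [{"name": name, "commits": n, "role": "author"}
                for name, n in Counter(names).most_common()]

    def write_credit_file(software, version, repo=".", path="CREDIT.json"):
        credit = {"software": software, "version": version,
                  "contributors": contributors_from_git(repo)}
        with open(path, "w") as f:
            json.dump(credit, f, indent=2)
        return credit

    def to_bibtex(credit, key, year, url):
        """Export the credit file as a BibTeX @misc entry (action 9)."""
        authors = " and ".join(c["name"] for c in credit["contributors"])
        return (
            f"@misc{{{key},\n"
            f"  author = {{{authors}}},\n"
            f"  title  = {{{credit['software']} (version {credit['version']})}},\n"
            f"  year   = {{{year}}},\n"
            f"  url    = {{{url}}},\n"
            "}"
        )

    if __name__ == "__main__":
        credit = write_credit_file("ExampleTool", "1.2.0")
        print(to_bibtex(credit, "exampletool-1.2.0", 2015,
                        "https://example.org/exampletool"))

In practice the automatically seeded list would only be a starting point; roles, non-coding contributors, and any weights would be edited by hand before the file is archived alongside the software.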

8.1.5. Remaining breakout discussions: Taxonomy of contributorship/guidelines for software for
tenure review. At the start of the discussion, the breakout group brought forth the important
observation of the wide disparity in commonly accepted habits of publication in different research
fields. In domains which have, historically, relied on large groups of researchers collaborating towards
a common goal (e.g., high-energy physics, astronomy), publications often have tens or even hundreds
of co-authors (with some papers in experimental particle physics having over 3000). In other
domains, the number of co-authors is typically much smaller with, in some cases, even a preference
for single-author papers. Similarly, the various platforms for publication are valued differently in
different domains. Most commonly, publications in peer-reviewed scientific journals are regarded
as the most important and most impactful. However, in certain domains, especially in Computer
Science, many researchers typically regard conference proceedings as their prime publication target.
It is often suggested that this difference is due to the rapid developments in information technology,
a pace that cannot be upheld by traditional peer-reviewed journals. Whatever the causes, any
useful taxonomy of contributorship or guideline for tenure review should take such differences into
account.
Despite these differences, and despite the fact that software often has taken the role of a proper,
albeit less tangible, scientific research instrument, neither the software nor its creators are com-
monly credited as part of a scientific publication. The group acknowledged the need for more
recognition for the creators of such software instruments, and indicated a number of possible path-
ways. First and foremost, domain scientists must be made aware of the important role of software,
and include the developers as co-authors of papers. A second approach is to fully embrace an
open badging infrastructure (such as Mozilla’s Open Badges), where a badge is a free, transferable,
evidence-based indicator of an accomplishment, skill, quality, or interest. A third approach is for
the scientific community to support the increasing momentum of peer-reviewed journals specializing
in the open source/open access publication of scientific research software, such as Computer Physics
Communications, F1000 Research, Journal of Open Research Software, and SoftwareX.
Recognizing publication of research software as a proper scientific contribution raises several
important but currently unsolved questions, however. For example, is the number of users of the
software a relevant measure of impact? What standards of coding quality must be followed in order
to justify publication and hence recognition? Should the release of a new version of the software
be eligible for a new publication; if so, under what conditions? And above all: should software
publications be valued in the same way as traditional scientific publications? Or is there a need for
new measures of productivity and impact?
In part, the answers will come from the scientific community at large, as a natural consequence
of growing awareness and mindset change. Some of the answers, however, also should be based
on decades of experience in (and developing standards for) implementing, maintaining, refactoring,
documenting, testing, and deploying software instruments in scientific research. Care should be
taken, however, not to impose such standards for all domains in equal ways right from the start.
Forerunners should serve as an example, but should not scare away domains that have based their
progress on much less advanced methods of software carpentry. Nevertheless, proper guidelines are
needed, which eventually should be followed across all domains. The group also recognized that
funding bodies, universities, and publishers should eventually demand that research projects follow
such guidelines and implement a proper software sustainability plan.
To enable a form of standardized crediting for developers of research software, the group proposed
to work towards a taxonomy for software-based contributorship. The taxonomy should be derived
from, or extend, existing taxonomies for research impact and contributorship, such as those defined
by CASRAI (in particular the Wellcome-Harvard contributorship taxonomy22), VIVO, or ISNI.
An interesting measure of impact raised by the group was betweenness centrality, an indicator
of a person’s centrality (and hence importance) in a scientific collaboration network. It is expected
that developers of research software often play such a central role.
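As a minimal, purely illustrative sketch (the people and collaboration edges below are made up; a real analysis would build the graph from publication or repository metadata), betweenness centrality can be computed with the NetworkX library:

    # Toy collaboration graph; nodes are people, edges are collaborations.
    import networkx as nx

    G = nx.Graph()
    G.add_edges_from([
        ("scientist A", "research software engineer"),
        ("scientist B", "research software engineer"),
        ("scientist C", "research software engineer"),
        ("scientist A", "scientist B"),
    ])

    # The research software engineer, who links otherwise separate collaborators,
    # receives the highest betweenness centrality score.
    for person, score in sorted(nx.betweenness_centrality(G).items(),
                                key=lambda item: -item[1]):
        print(f"{score:.2f}  {person}")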
22 Project CRediT, http://credit.casrai.org/

The group defined the following actions to be performed:
(1) Investigate existing taxonomies for roles and contributorships.
(2) Investigate prototype badging initiatives.
(3) Investigate journals focusing on publishing peer-reviewed research software.
(4) Investigate guidelines and checklists of best practices.
(5) Communicate the results of the above investigations to the WSSSPE community and decision-
making bodies (funders, publishers, universities, and tenure committee representatives).
(6) Ensure engagement of the broader research community in this discussion.
8.1.6. Remaining breakout discussions: Prizes. This group discussed the idea of prizes. Prizes are
expected to reframe software as “instrument building,” but will prizes be good or bad, and how can
we make sure there are no negative effects and that the process cannot be gamed?
Prizes in different categories were discussed (like Academy Awards), for example: best contri-
bution (non-founder), broadest diversity of contributions, best tutorials or documentation, best
leadership transition (award ex-leader and new leader), best generalization (taking something that
was limited and making it more general), and best mentorship of contributors (bringing others into
the community).
Other (non-prize) forms of incentives are
• Converting reputation by joining the ASF, Google, etc.
• Inviting ASF and open source people to contribute to scientific code (at least in areas where
there is overlapping interest).
• Template for assessing scientific contributions made through software.
The group suggested that to determine who should give out prizes, perhaps we should find those
who we think would be awarded prizes, then ask them who they would want to receive a prize from,
and subsequently contact those organizations to see if they are willing to be involved in the process.
One of the group’s ideas was to create a funding program for disciplines or other organizations
to create a prize program. We would provide a framework and a set of requirements, for example:
awards to individuals; must award in 5 or 6 areas; must have a jury that includes senior/junior
domain experts and technologists (and should have objective criteria); must have the recipients
awarded at a relevant event; and must provide citations that explain why the prizes should be
awarded. Different organizations could then decide to sign up to the framework and give out
awards under this general brand. However, there was a concern that having many organizations
award prizes may reduce the impact of the prizes.
Possible groups that might give out awards, either under our framework or more generally, are
AAAS, Nature Publishing Group, ACM, IEEE, Astro, the Ecological Society of America, etc. Perhaps
this could be a joint technology/science partnership, for example, the [Apache | Mozilla]-[AAAS |
disciplinary society] prize.
Some potential criteria for prizes the group suggested are: community engagement, helping out
others; number of unique contributors; adding new pieces of functionality to software; integrating
software into broader ecosystem; championing broad principles of sustainability, open science, open
source, etc.; improving accessibility to software, to scientific software (perhaps championing inclu-
siveness or making software accessible?); documentation; tutorials; commits/patching; leadership
transition; and best contribution by a non-founder.
The group discussed if there should be different criteria for “established” members of the com-
munity versus junior members, and if prizes should be restricted to junior members, but left these
as open questions.
A point the group considered important is that we do not want to give prizes just to reward
people who are really good at this one thing, but rather we want to reward people who are building
the culture we want as scientists.
8.2. Papers. The papers that were discussed in the Credit & Incentives group are:

• James Howison. Retract Bit-rotten Publications: Aligning Incentives for Sustaining Scientific
Software [41]
• Daniel S. Katz and Arfon M. Smith. Implementing Transitive Credit with JSON-LD [42]
• Ian Kelley. Publish or Perish: The Credit Deficit to Making Software and Generating Data [43]
9. Reproducibility & Reuse & Sharing
This group discussed five papers with a wide variety of ideas about how to support reproducibility
and reuse. It focused on identifying concrete practices that the attendees could work on together,
and which would have a positive effect on the community.
9.1. Discussion and Actions. In the first discussion session, this group broke up into two smaller
groups working on different topics: Reproducibility, and Reuse and Sharing. In the second discussion
session, three subgroups worked on creating specific pieces of guidance.
9.1.1. First breakout discussion: Reproducibility group. The first group discussed ways in which
the reproducibility of papers could be improved. A consensus formed around the provision of
examples: demonstrating to others how to achieve reproducibility. Major public funding investments
go into research that relies heavily on software, so a lack of reproducibility raises major concerns.
Two main avenues could be used to implement policy that would improve reproducibility and
drive top-down culture change:
• Funders can aim to get more software expertise onto funding review panels and provide more
guidance on what is required of software related to research (cf. data management plans)
• Journals can define and publish journal policies to improve reproducibility, and reviewers can
insist that authors provide sufficient information and access to data and software to allow them
to reproduce the results in the paper. Stronger policies have recently come into place even at
some high-impact journals, for example the Nature Publishing Group journals.
If journals do enforce greater reproducibility constraints, it is important to lower the barriers to
reviewers attempting to verify the correctness of the software used to generate the results. A
major issue is that a lot of software only builds on certain systems. Should journals provide more
tools and support for reviewers, and if so, what should that be? An idea that came from the groups
was to define a set of support services that should be available to software paper reviewers. Another
was to provide the ability for reviewers to flag the requirement for a ‘software verification,’ similar
to the ability to flag that a paper needs to be seen by a statistician.
Other discussion focused on ways in which researchers themselves could improve the reproducibility
of research. One way would be to establish tracks at conferences that subject papers to reproduction,
awarding an additional badge to those papers that pass (similar to OOPSLA). Another is simply
to get more people to use your software, for instance by outreach to high school students: can your
software be used by them? Finally, there is a role for community-curated benchmarks to validate
the performance and capabilities of tools.
9.1.2. First breakout discussion: Reuse and Sharing group. The second group discussed ways in
which software could be more easily reused and shared. From the perspective of both user and
developer, any solutions must be 1) easy; 2) cheap; 3) not too time-consuming.
Reuse and sharing were considered to be distinct but linked. In many cases, suitable pre-existing
software does not exist, so new software is written, but even then it is not shared afterwards. The
principal barriers to reuse are the difficulty of finding out what software is available, understanding
whether it is usable, and then installing and running it once it is located.
Discovery of relevant software is still a fundamental issue: we need standard vocabularies and
metadata, better tools, and approaches that are sustainable. Publications are an easy entry point for
locating suitable software, but should the publishers lead the way, or is this the responsibility of “the
community”? Some communities have had significant initiatives to improve software discovery, e.g.,

the NIH Software Discovery Index23. Likewise, there were examples of journals which had made
software more discoverable: ACM Transactions on Software offers a reproducibility review; the
Journal of Biostatistics has an opt-in to provide code and a certification mark if it can be run; the
Journal of Open Research Software requires software to be deposited in suitable repositories and
referenced.
An issue around the sustainability of software catalogues is that their usefulness often depends
on the domain. For instance, in the biosciences, there is more homogeneous data and standard
shared code. In areas like ecology, code is often very specific to a problem, meaning that the level
of re-use might be at a general statistical level of abstraction, but then every research use is highly
customized.
Finally, it was clear from the discussion that there were many ways in which software could be
made available in more reusable ways than just a tarball sitting on a personal website. Using code
repositories like GitHub24 provides an archived, shared platform and allows the reusability of your
software to be improved incrementally, for instance by adding a license or by archiving it (with a
DOI) in Figshare25 or Zenodo26. Docker27 might be a solution to the issue of dependencies, allowing
binaries and libraries to be bundled in a more lightweight fashion than a virtual machine image.
The key enabler for reuse and sharing was to get domain scientists more effectively connected
with programmers and analysts. Both have skills and experience that are necessary to make the right
decisions for improving the reusability and discoverability of software, and to apply community
pressure to change practice.
9.1.3. Second breakout discussion: Categorization of software journals. This group aimed to come
up with a categorization of journals that publish software papers. Starting from the list of
journals28 maintained by the Software Sustainability Institute, the group chose seven journals and
looked at their advice to authors and reviewers. These were the Journal of Open Research Software29,
PLoS ONE30, Journal of Statistical Software31, Methods in Ecology and Evolution32, Transactions
on Mathematical Software33, GigaScience34, and PLoS Computational Biology35. From these, a set
of common categories was synthesized, against which all journals could be compared:
• Journal Policies
– Accessibility of papers
∗ Open access
∗ “Freely” available
– Repositories
∗ Provides suggestions for recommended repositories
∗ Provides its own repository
– Review
∗ Reviewing software is mandatory
· Must check that software runs
· Must check quality of code
· Must check performance of code if paper makes claims on relative performance
23 NIH Software Discovery Index: http://softwarediscoveryindex.org/
24 GitHub: http://github.com/
25 FigShare: http://figshare.com/
26 Zenodo: http://zenodo.org/
27 Docker: http://www.docker.com/
28 http://www.software.ac.uk/resources/guides/which-journals-should-i-publish-my-software
29 http://openresearchsoftware.metajnl.com/about/editorialPolicies#peerReviewProcess
30 http://www.plosone.org/static/guidelines#software
31 http://www.jstatsoft.org/instructions
32 http://www.methodsinecologyandevolution.org/view/0/authorGuidelines.html
33 http://toms.acm.org/Authors.html
34 http://www.gigasciencejournal.com/about
35 http://www.ploscompbiol.org/static/guidelines#software

– Supporting data
∗ Must be publicly available
∗ Must be in an open access repository
∗ Must have a DOI
– Article processing charge (APC)
∗ APC is transparent
∗ APC waiver program
• Paper Policies
– Required sections
– Keywords
∗ Paper provides keywords to help describe software
– Papers can be updated when new releases of software are made
∗ At no extra cost/at significantly reduced cost
• Software Policies
– Software must have a license
∗ Software must have an open source license
– Availability
∗ Software must be openly available and accessible
– Deposit policies
∗ Software should be in a public repository
· Of particular stature/with a preservation plan
∗ Software should have a permanent identifier
∗ Software deposit doesn’t count as a prior publication
– Runnability and dependencies
∗ Provide documentation to understand how to run
∗ Provide sample data
∗ Provide dependency information
The follow-up actions to this work are to use this set of categories on all the journals in the list,
refining the categories if necessary, and then to identify whether any of the categories are more
useful for promoting reproducibility, reuse, and sharing.
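One way to carry out that comparison is to encode the checklist in machine-readable form. The sketch below is purely illustrative: the category keys are shortened versions of the headings above, and the example answers are placeholders rather than assessments of any real journal.

    # Illustrative only: a machine-readable version of the category checklist.
    CATEGORIES = [
        "journal/open_access",
        "journal/recommends_repositories",
        "journal/software_review_mandatory",
        "journal/review_checks_software_runs",
        "journal/supporting_data_public",
        "journal/apc_transparent",
        "paper/required_sections",
        "paper/updatable_on_new_release",
        "software/license_required",
        "software/openly_available",
        "software/public_repository",
        "software/permanent_identifier",
        "software/dependency_information",
    ]

    def coverage(answers):
        """Fraction of categories satisfied; unanswered categories count as no."""
        return sum(bool(answers.get(c, False)) for c in CATEGORIES) / len(CATEGORIES)

    # Placeholder answers, not an assessment of any real journal.
    journals = {
        "Example Journal A": {"journal/open_access": True,
                              "software/license_required": True},
        "Example Journal B": {"journal/software_review_mandatory": True,
                              "software/public_repository": True,
                              "software/permanent_identifier": True},
    }

    for name, answers in journals.items():
        print(f"{name}: {coverage(answers):.0%} of categories covered")

Encoding the checklist this way would make it straightforward to tabulate and compare the full list of journals once their policies have been assessed.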
9.1.4. Second breakout discussion: What should journals provide reviewers of software papers? This
group discussed whether they could come up with a list of things that a journal should provide its
reviewers to make it easier to review software submitted for publication as a “Software Paper.”
Journals should provide guidelines about what to consider when reviewing a software submission.
A good example of relatively comprehensive guidelines for reviewers (and thus, in turn, authors)
is that of JORS36. Journals might also learn from organizations such as the ASF as to what is
good practice for software submissions. Guidance is needed on what constitutes an incremental
improvement that is significant enough to qualify for publication; otherwise this assessment can be
very subjective.
Journals should provide mechanisms to enable and track communication between reviewers and
authors. For journals with anonymous peer review, the anonymity of reviewers should be maintained. If
communication is necessary, that may itself be a sign that the software is not well documented. Journals
should also provide a set of simple metrics for software evaluation that reviewers can use for ratings,
similar to Consumer Reports.
Journals should provide guidelines about requirements for documentation of code, both inline in
the code and in manuals, web pages, etc. Journal editors could perform documentation checks before a
submission goes out for review. This should include a requirement for well-specified use cases for the
software, with references to executable test cases that demonstrate that each use case is met (at least
in the form of the test case).
36 http://openresearchsoftware.metajnl.com/about/editorialPolicies

Journals should support mechanisms to run software. Sometimes this may be very hard to
accomplish, despite best efforts. Journals could provide an execution environment for any software
submitted for review, perhaps via a Docker container or a virtual machine. If not, instructions must
be adequate to compile and execute the software; to interpret the results (output files, formats,
etc.); and the full source code must be accessible to the reviewer. Mechanisms to quantify what
has changed compared to a previously published version would assist version comparison. However,
this cannot simply take lines of code into account. For example, a speedup of an algorithm may
not result in a huge code difference, but may nonetheless provide a breakthrough. Journals should
require software submissions to also provide ‘test materials’ – sample data, parameters to validate
that code is working as intended. In certain cases, well-selected benchmark datasets may be required
to assess performance and accuracy.
Journals should ensure that minimal metadata are provided, as is already the case for certain
kinds of data (though there this is often enforced by data repositories): both generic, Dublin Core-style
metadata (creator, owner) and more specific metadata (platform and compiler dependencies, sample
benchmarks of performance, etc.). Journals should provide guidance or constraints as to software
licensing conditions.
The follow-up actions to this work are to work with journals and reviewers to identify whether
any of these suggestions can be easily provided, perhaps for a range of journals.
9.1.5. Second breakout discussion: Reproducibility Meta-track toolkit. This group worked on defining
a “toolkit” for running a reproducibility meta-track at a conference. They decided to take the work
done during the workshop and publish it as a paper.
9.2. Papers. The papers that were discussed in the Reproducibility & Reuse & Sharing group are:
• Jakob Blomer, Dario Berzano, Predrag Buncic, Ioannis Charalampidis, Gerardo Ganis, George
Lestaris and René Meusel. The Need for a Versioned Data Analysis Software Environment [44]
• Ryan Chamberlain and Jennifer Schommer. Using Docker to Support Reproducible Re-
search [45]
• Neil Chue Hong. Minimal Information for Reusable Scientific Software [46]
• Tom Crick, Benjamin A. Hall, and Samin Ishtiaq. “Can I Implement your Algorithm?”: A
Model for Reproducible Research Software [47]
• Bryan Marker, Don Batory, Field G. Van Zee, and Robert van de Geijn. Making Scientific
Computing Libraries Forward Compatible [48]
• Stephen Piccolo. Building Portable Analytical Environments to Improve Sustainability of
Computational-Analysis Pipelines in the Sciences [49]
10. Code Testing & Code Review
This group began by recognizing that they knew what the problems of code testing and code
review are, so the purpose of the group was really to think about concrete things that people in the
group would be willing to commit to, in order to improve the sustainability of software.
10.1. Discussion and Actions. The consensus of the group was that small, concrete actions
would have the greatest chance of proper follow-through by members of the group. An example of
this approach was the Architecture of Open Source Applications book [50], containing contributed
chapters on open source software. The group contemplated creating an analogous book or web
resource for testing of scientific software.
A key point of discussion was that there are two major challenges to software testing in science.
The first is convincing developers to incorporate testing. This challenge is both social and technical.
We need to communicate the value of testing and also teach developers how to choose and use testing
frameworks. The second challenge—and the one that the group felt was more difficult—is choosing
appropriate tests for scientific software. This is the difference between learning how to use, say,
Python unittest and knowing how to test that one’s code correctly implements, for example, a
Lattice-Boltzmann model in computational fluid dynamics.

10.1.1. Choosing and implementing testing frameworks. There was general consensus that once a
developer has been exposed to a testing framework in a nontrivial fashion, he/she will subsequently
insist upon using a framework for further work. The acknowledged challenge was how to create
that crucial first exposure to such frameworks.
Within the group, there were several different paths by which participants had their first exposure
to code testing:
• Formal tutorials at conferences (for example, a test-driven development (TDD) tutorial at a Software
Developers Best Practices conference): These often require self-learning following a tutorial, but
it can be hard to find a tutorial that matches the programming language and/or the domain of
the participant.
• As a way to be confident in other software: When breaking out pieces of software from a larger
application, a developer wants to ensure the software works as expected. These types of tests
may be ephemeral though, living long enough to give the author of the test confidence in the
code, but not handed on and made available to the next user. These experiences can lead to more
systematic testing.
• Through experience with coding: As you write more software, your confidence in code is reduced
and the importance of testing becomes clearer. Attendees joked that they started testing “when
their skepticism/guilt became larger than their arrogance.”
• From other projects: When building on top of projects with good testing frameworks, you
realize that you should also adopt those good software engineering habits.
The two non-self-taught paths were direct learning and indirect learning through other software
(learning by example). The group recognized that teaching about testing needs to start early in
projects and careers. We need to teach people how to test short bits of code, rather than waiting
until they have thousands of lines of code and then being surprised when they don’t write tests. The
group generated a set of practical suggestions for improving adoption of code testing. Established
software projects can encourage adoption through a simple rule of accepting only software patches
with associated tests and through good documentation of testing practices and requirements.
Simpler, standard ways of setting up appropriate testing infrastructure will be important for
adoption by scientists. Jenkins37 was suggested as a common open-source solution, but the initial
configuration was considered to be challenging for typical scientists.
One suggestion was to teach testing by beginning with a (smallish) piece of code that lacks
appropriate tests and developing an exercise that involves refactoring the code into a more presentable
form through creating or deriving unit tests. Students would extract unit tests through reverse-engineering
and/or questioning a partner who is a developer or expert.
This led into a more general discussion about whether programming courses should be part of
the standard curriculum for science students and the challenges of fitting additional material into
university curricula. This is the reason that short workshops such as Software Carpentry and
others exist.
Ironically, Software Carpentry no longer teaches testing [51]. This boils down to two issues. First,
that scientific computing doesn’t (yet) have the cultural norms for error bars that experimental
sciences have, and second, that there is a breathtaking diversity of scientific code; scientific research
is highly specialized, which means that the tests scientists write are much less transferable or
reusable than those found in other fields.
10.1.2. Writing tests for scientific software. The second major discussion thread in the group ad-
dressed the issues identified by Software Carpentry – the difficulties in writing tests appropriate
for scientific software: What tests are appropriate to ensure that a complex method is working
correctly? When is the result of a numerical computation ‘close enough’ to pass a test?
Several members of the group indicated that while they valued the concept of software testing in
theory, they were unaware of how to test certain kinds of software. Or, more importantly, they lacked
relevant examples of software testing that would be suitable for new members of their teams. As an
37 https://jenkins-ci.org/

example, the group asked whether it could produce a 100-line example of how to test neutron transport
(or another specific scientific problem) targeted at a sophomore. This would be a demonstration of
“How do I do this right?”
Another question was “How do I test so that I know that the code is not the source of my prob-
lems?” A common challenge is that of stripping things down to an appropriate level for tests. The
group recognized that the difficulty is different for so-called software infrastructure than for numer-
ical/scientific layers of software applications. There is some discussion of the issue in the computer
science literature [52, 53] but less in domain-specific journals. Some lessons on multiphysics software
verification can be found in a recent paper [54].
Two important pieces of the barrier are a) picking and implementing testing frameworks (a technical
barrier) and b) deciding what actual tests need to be written. A possible method to address
the latter is to ask students “how is what you are doing different from typing random keys on the
keyboard?” and then turn their answer into the concept of tests. That is to say, expand the process
of creating code from one that is solely about “how?” into one that also includes “why?”
Another point of discussion is that many scientists are not aware that “testing the code” and
“testing the science” are distinct issues. A related question that came up is “How do you know that
your method produces a result that is ‘close enough’?”
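As a minimal illustration of the distinction (the integration routine and the tolerance below are chosen purely for the example), a test of numerical code typically compares a result to a known reference value within an explicitly stated tolerance, rather than requiring exact equality:

    # Minimal illustration: testing numerical code against a reference value
    # with an explicit tolerance rather than exact equality.
    import math
    import unittest

    def trapezoid(f, a, b, n):
        """Composite trapezoidal rule for the integral of f over [a, b]."""
        h = (b - a) / n
        s = 0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n))
        return h * s

    class TestTrapezoid(unittest.TestCase):
        def test_integral_of_sin(self):
            # The exact integral of sin(x) over [0, pi] is 2.
            result = trapezoid(math.sin, 0.0, math.pi, 1000)
            # "Close enough" is stated explicitly as a tolerance, chosen here
            # from the known O(h^2) error of the trapezoidal rule.
            self.assertAlmostEqual(result, 2.0, delta=1e-5)

    if __name__ == "__main__":
        unittest.main()

Here the choice of delta encodes the “close enough” decision explicitly: it comes from the known error behaviour of the method and is documented in the test rather than left implicit.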
The group then decided that a useful action would be to create a set of scientific codes with
associated tests. They developed the following basic structure for these examples and sketched out
six specific examples:
(1) A paragraph or two explaining the scientific problem the code addresses
(2) The size of the simplest piece of relevant, interesting code (i.e., an estimate of lines of code)
(3) A point-form list of the test cases you would use.
The follow-up actions from this work include creating a compilation of testing examples in sci-
entific software. Some of the examples from the workshop have become the starting seed of a
collaborative book (https://github.com/swcarpentry/close-enough-for-scientific-work) in which sci-
entists provide concrete examples of testing scientific software. The goal is a set of testing examples,
aimed at sophomores in science and engineering, that cover a broad range of domains and problems
and could be easily incorporated into other workshops or courses.
10.2. Papers. The papers that were discussed in the Code Testing & Code Review group are:
• Thomas Clune, Michael Rilee, and Damian Rouson. Testing as an Essential Process for Devel-
oping and Maintaining Scientific Software [55]
• Marian Petre and Greg Wilson. Code Review for and by Scientists [56]
• Andrew E. Slaughter, Derek R. Gaston, John Peterson, Cody J. Permann, David Andrs, and
Jason M. Miller. Continuous Integration for Concurrent MOOSE Framework and Application
Development on GitHub [57]
11. Conclusions
The WSSSPE2 workshop continued our experiment from WSSSPE1 in how we can collaboratively
build a workshop agenda, and we began a new experiment in how to build a series of workshops
into an ongoing community activity.
The differences in workshop organization between WSSSPE2 and WSSSPE1 are the use of an existing
service (EasyChair) to handle submissions and reviews, rather than an ad hoc process, and the use of an
existing service (Well Sorted) to allow collaborative grouping of papers into themes by all authors,
reviewers, and the community, rather than this being done in an ad hoc manner by the organizers
alone.
The fact remains that contributors also want to get credit for their participation in the process.
And the workshop organizers will want to make sure that the workshop content and their efforts
are recorded. Ideally, there would be a service that would index the contributions to the workshop,
serving the authors, the organizers, and the larger community. Since there still isn’t such a service
today, the workshop organizers are writing this initial report and making use of arXiv as a partial
solution to provide a record of the workshop.
WSSSPE actively used the online social network Twitter, with the hashtag “#WSSSPE”. Tweet
(message) activity was substantially higher during the days of the WSSSPE2, WSSSPE1.1, and
WSSSPE1 workshops than at other times. Out of about 670 tweets as of Apr 18, 2015, more than 225
were about WSSSPE2 and about 180 were posted during the day of the workshop. Some of the main
points and highlights of the meeting are shown in Table 3, which summarizes the top #WSSSPE
tweets from the day of the workshop, selected as those with more than five retweets or more than
five favorites and with a combined total greater than ten.
Table 3. Top tweets tagged #WSSSPE on Nov 16, 2014. Each entry gives the author, the tweet, and its retweet and favorite counts.
Neil P Chue Hong (7 retweets, 4 favorites): Here’s @SoftwareSaved guidance on Writing and using a software management plan used by EPSRC software grants http://www.software.ac.uk/resources/guides/software-management-plans
Neil P Chue Hong (4 retweets, 7 favorites): @jameshowison as well as software plans http://www.software.ac.uk/resources/guides/software-management-plans we provide a software evaluation tool: http://www.software.ac.uk/online-sustainability-evaluation
Tom Crick (14 retweets, 8 favorites): 56% of UK researchers develop their own software → 140,000 UK researchers write research software w/out any formal training
Karthik Ram (8 retweets, 6 favorites): OH: “Institutionalize metadata before metadata institutionalizes you”
Josh Greenberg (13 retweets, 8 favorites): @jameshowison: “1. retract any paper with bitrotten dependencies” *mic drop* “2. add anyone who fixes bitrot as an author” *mic drop*
Ethan White (9 retweets, 3 favorites): “@rOpenSci is all about community... our measures of success [include] how many faces are up on our community page”
Ethan White (9 retweets, 7 favorites): Daniel Katz talking about implementing transitive credit for software http://arxiv.org/abs/1407.5117 Work with @arfon
Kaitlin Thaney (4 retweets, 8 favorites): Great point by @tracykteal about planning for “end of life” with scientific software projects and sustainability.
Aleksandra Pawlik (10 retweets, 4 favorites): Lack of training as one of the main barriers for sustainable software at @Supercomputing. @swcarpentry @datacarpentry can fix that!
Kaitlin Thaney (11 retweets, 12 favorites): My slides from this morning’s keynote at WSSSPE on Designing for Truth, Scale + Sustainability: http://www.slideshare.net/kaythaney/designing-for-truth-scale-and-sustainability-wssspe2-keynote
Neil P Chue Hong (9 retweets, 4 favorites): @kaythaney shout out for @swcarpentry @datacarpentry @rOpenSci @stilettofiend around open training activities for sustainability
Neil P Chue Hong (5 retweets, 12 favorites): For those interested in Github - Figshare/Zenodo integration, but want SWORD/DSpace/Fedora/ePrints see: http://blog.stuartlewis.com/2014/09/09/github-to-repository-deposit/
Hilmar Lapp (7 retweets, 6 favorites): Re: adopting the unix philosophy, consider signing the Small Tools in Bioinformatics Manifesto: https://github.com/pjotrp/bioinformatics
Andre Luckow (12 retweets, 3 favorites): “Traditions last not because they are excellent, but because influential people are averse to change...” C. Sunstein
Tom Crick (9 retweets, 8 favorites): “Can I Implement Your Algorithm?”: A Model for Reproducible Research Software http://arxiv.org/abs/1407.5981
Mozilla Science Lab (10 retweets, 5 favorites): At a loose end this Sunday? Care about reproducibility, software + #openscience? Follow the #WSSSPE hashtag for more, live from New Orleans.
Kaitlin Thaney (9 retweets, 9 favorites): I’m in New Orleans at #WSSSPE , speaking at 9:50 ET on scientific software + sustainability. Tune in! Live stream: http://ustre.am/17ddh
Daniel S. Katz (10 retweets, 1 favorite): #WSSSPE Agenda (Sunday): http://wssspe.researchcomputing.org.uk/wssspe2/agenda/ URL for live stream of keynotes & lightning talks: http://ustre.am/17ddh
In terms of building community activities, we wanted to focus primarily on working groups, which
we were able to do, as discussed above, but we also wanted to make sure that attendees felt they
had a chance to get their ideas across to the whole group, which was the purpose of the lightning
talks. Overall, this seemed to be successful at the time, in terms of both the lightning talks and
the breakout groups, and the discussion of sustainability also led to interesting and useful results.
However, the challenge that we have discovered since WSSSPE2 is that it is very hard to continue
the breakout groups’ activities. The WSSSPE2 participants were willing to dedicate their time to

the groups while they were at the meeting, but afterwards, they have gone back to their (paid) jobs.
We need to determine how to tie the WSSSPE breakout activities to people’s jobs, so that they feel
that continuing them is a higher priority than it is now, perhaps through funding the participants,
or through funding coordinators for each activity, or perhaps by getting the workshop participants
to agree to a specific schedule of activities during the workshop.
Acknowledgments
Work by Katz was supported by the National Science Foundation while working at the Foun-
dation. Any opinion, finding, and conclusions or recommendations expressed in this material are
those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
Appendix A. Attendees
The following is a partial list of workshop attendees who registered on the collaborative notes
document [4] that was used for shared note-taking at the meeting, or who participated in a breakout
group and were noted in that group’s notes.
Jordan Adams – Tulane University
Alice Allen – Astrophysics Source Code Library (ASCL)
Gabrielle Allen – University of Illinois Urbana-Champaign
Pierre-Yves Aquilanti – TOTAL E&P R&T USA
Wolfgang Bangerth – Texas A&M University
David Bernholdt – Oak Ridge National Laboratory
Jakob Blomer
Carl Boettiger – University of California Santa Cruz & rOpenSci
Chris Bogart – ISR/CMU
Steven R. Brandt – Louisiana State University
Neil Chue Hong – Software Sustainability Institute & University of Edinburgh
Tom Clune – NASA GSFC
John W. Cobb
Dirk Colbry – Michigan State University
Karen Cranston – NESCent
Tom Crick – Cardiff Metropolitan University, UK
Ethan Davis – UCAR Unidata
Robert R Downs – CIESIN, Columbia University
Anshu Dubey – Lawrence Berkeley National Laboratory
Nicole Gasparini – Tulane University, New Orleans
Yolanda Gil – Information Sciences Institute, University of Southern California
Kurt Glaesemann – Pacific Northwest National Laboratory
Sol Greenspan – National Science Foundation
Ted Habermann – The HDF Group
Marcus D. Hanwell – Kitware
Sarah Harris – University of Leeds
David Henty – EPCC, The University of Edinburgh
James Howison – University of Texas
Maxime Hughes
Eric Hutton – University of Colorado
Ray Idaszak – RENCI/UNC
Samin Ishtiaq – Microsoft Research Cambridge, UK
Matt Jones – University of California Santa Barbara
Nick Jones – New Zealand eScience Infrastructure, University of Auckland
Daniel S. Katz – University of Chicago & Argonne National Laboratory
Ian Kelley
Hilmar Lapp – National Evolutionary Synthesis Center (NESCent) & Duke University
Christopher Lenhardt
Richard Littauer – University of Saarland
Frank Löffler – Louisiana State University
Andre Luckow – Rutgers
Berkin Malkoc – Istanbul Technical University
Kyle Marcus – University at Buffalo
Bryan Marker – The University of Texas at Austin
Suresh Marru – Indiana University
Robert H. McDonald – Data to Insight Center/Libraries, Indiana University
Rupert Nash
Andy Nutter-Upham – Whitehead Institute
Abani Patra – University at Buffalo
Aleksandra Pawlik – Software Sustainability Institute
Cody J. Permann – Idaho National Laboratory
John W. Peterson – Idaho National Laboratory
Benjamin Pharr – University of Mississippi
Stephen Piccolo – Brigham Young University, Utah
Marlon Pierce – Indiana University
Ray Plante – NCSA, University of Illinois Urbana-Champaign
Sushil Prasad – Georgia State University, Atlanta
Karthik Ram – Berkeley Institute for Data Science, University of California Berkeley & rOpenSci
Mike Rilee – NASA/GSFC & Rilee Systems Technologies
Erin Robinson – Foundation for Earth Science
Mark Schildhauer – NCEAS, Univ. California, Santa Barbara
Jory Schossau – Michigan State University
Frank Seinstra – Netherlands eScience Center
James Shepherd – Rice University
Justin Shi
Ardita Shkurti – University of Nottingham
Alan Simpson – EPCC, The University of Edinburgh
Carol Song – Purdue University
James Spencer – Imperial College London
Tracy Teal – Data Carpentry
Kaitlin Thaney – Mozilla Science Lab
Matt Turk – NCSA, University of Illinois Urbana-Champaign
Colin C. Venters – University of Huddersfield
Nathan Weeks
Ethan White – University of Florida/Utah State University
Nancy Wilkins-Diehr – San Diego Supercomputer Center, University of California San Diego
Greg Wilson – Software Carpentry
References
[1] Katz DS, Allen G, Chue Hong N, Parashar M, Proctor D. First Workshop on on Sustainable Software for Science:
Practice and Experiences (WSSSPE): Submission and Peer-Review Process, and Results. arXiv; 2013. 1311.3523.
http://arxiv.org/abs/1311.3523.
[2] Katz DS, Choi SCT, Lapp H, Maheshwari K, Löffler F, Turk M, et al. Summary of the First Workshop on
Sustainable Software for Science: Practice and Experiences (WSSSPE1). Journal of Open Research Software.
2014;2(1). http://dx.doi.org/10.5334/jors.an.
[3] Katz DS, Allen G, Chue Hong N, Cranston K, Parashar M, Proctor D, et al. Second Workshop on Sustainable
Software for Science: Practice and Experiences (WSSSPE2): Submission, Peer-Review and Sorting Process, and
Results. arXiv; 2014. 1411.3464. http://arxiv.org/abs/1411.3464.
[4] WSSSPE2 attendees. WSSSPE2 Collaborative Notes;. Accessed: 2015-02-15. https://docs.google.com/
document/d/1-BxkYWDQ6nNNBXBStUL0xcKF9qCTlEALwf928J MemI/.
[5] Thaney K. Designing for Truth, Scale, and Sustainability [slides]; 2014. Retrieved from http://www.slideshare.
net/kaythaney/designing-for-truth-scale-and-sustainability-wssspe2-keynote.
[6] Chue Hong N. We are the 92% [slides]; 2014. Retrieved from http://dx.doi.org/10.6084/m9.figshare.1243288.
[7] Venters C. The Nebuchadnezzar Effect: Dreaming of Sustainable Software through Sustainable Software Archi-
tectures [poster]; 2014. Retrieved from http://figshare.com/articles/The Nebuchadnezzar Effect Dreaming of
Sustainable Software through Sustainable Software Architectures/1243322.

[8] Pierce M. Patching It Up, Pulling It Forward [poster]; 2014. Retrieved from http://wssspe.researchcomputing.
org.uk/wp-content/uploads/2014/11/Pierce.pdf.
[9] Peterson J. Continuous Integration for Concurrent MOOSE Framework and Application Devel-
opment on GitHub [poster];
2014. Retrieved from https://drive.google.com/file/d/0B9BK7pg8se
iWkdBYkhkM1d4bkN3QVNoekFGWE9WODVnaUxB/view?pli=1.
[10] Patra A. Execute it [poster]; 2014. Retrieved from http://wssspe.researchcomputing.org.uk/wp-content/uploads/
2014/11/Patra.pdf.
[11] Katz DS, Smith A. Implementing Transitive Credit with JSON-LD [poster]; 2014. Retrieved from http://dx.doi.
org/10.6084/m9.figshare.1234063.
[12] Downs R, Lenhardt WC, Robinson E, Davis E, Weber N. Community Recommendations for Improving Sustain-
able Scientific Software Practices [poster]; 2014. Retrieved from http://wssspe.researchcomputing.org.uk/wp-
content/uploads/2014/11/Downs.pdf.
[13] Boettiger C. rOpenSci: Building Sustainable Software by Fostering a Diverse Community [poster]; 2014. Retrieved
from http://wssspe.researchcomputing.org.uk/wp-content/uploads/2014/11/Boettiger1.pdf.
[14] Blomer J. The Need for a Versioned Data Analysis Environment [poster]; 2014. Retrieved from http://wssspe.
researchcomputing.org.uk/wp-content/uploads/2014/11/Blomer.pdf.
[15] Rosado de Souza M, Haines R, Jay C. Defining Sustainability through Developers’ Eyes: Recommendations from
an Interview Study. figshare; 2014. 1111925. http://dx.doi.org/10.6084/m9.figshare.1111925.
[16] Downs R, Lenhardt WC, Robinson E, Davis E, Weber N. Community Recommendations for Sustainable Scientific
Software. ESIP Commons; 2014. P3VX0DFX. http://dx.doi.org/10.7269/P3VX0DFX.
[17] Patra A, Jones M, Gallo S, Marcus K, Kosar T. Role of Online Platforms, Communications and Workflows in
Developing Sustainable Software for Science Communities. figshare; 2014. 1112569. http://dx.doi.org/10.6084/
m9.figshare.1112569.
[18] Pierce M, Marru S, Mattmann C. WSSSPE2: Patching It Up, Pulling It Forward. figshare; 2014. 1112540.
http://dx.doi.org/10.6084/m9.figshare.1112540.
[19] Shi J. Seeking the Principles of Sustainable Software Engineering. arXiv; 2014. 1405.4464. http://arxiv.org/abs/
1405.4464.
[20] Venters CC, Griffiths MK, Holmes V, Ward RR, Cooke DJ. The Nebuchadnezzar Effect: Dreaming of Sustainable
Software through Sustainable Software Architectures. figshare; 2014. 1112484. http://dx.doi.org/10.6084/m9.
figshare.1112484.
[21] ISO/IEC. ISO/IEC 25010:2011 Systems and software engineering: Systems and Software Quality Requirements
and Evaluation (SQuaRE): System and software quality models;. Available from: http://www.iso.org/iso/
catalogue detail.htm?csnumber=35733.
[22] Bourque P, Fairley RE, editors. SWEBOK v3.0: Guide to the Software Engineering Body of Knowledge. IEEE
Press; 2014.
[23] Bourne PE, et al. Ten Simple Rules Collection. PLOS Computational Biology; 2005–2015. Retrieved from http://www.ploscollections.org/article/browse/issue/info%3Adoi%2F10.1371%2Fissue.pcol.v03.i01.
[24] Spolsky J. The Joel Test: 12 Steps to Better Code. In: Joel on Software. joelonsoftware.com; 2000. [blog] http://www.joelonsoftware.com/articles/fog0000000043.html.
[25] Crusoe MR, Brown CT. Channeling community contributions to scientific software: A hackathon experience.
figshare; 2014. 1112541. http://dx.doi.org/10.6084/m9.figshare.1112541.
[26] Hanwell M, O’Leary P, O’Bara B. Sustainable Software Ecosystems: Software Engineers, Domain Scientists, and
Engineers Collaborating for Science. figshare; 2014. 1112482. http://dx.doi.org/10.6084/m9.figshare.1112482.
[27] Lenhardt WC, Ahalt S, Jones M, Aukema J, Hampton S, Hespanha SR, et al. ISEES-WSSI Lessons for Sustainable Science Software from an Early Career Training Institute on Open Science Synthesis. figshare; 2014. 1112560. http://dx.doi.org/10.6084/m9.figshare.1112560.
[28] Schossau J, Wilson G. Which Sustainable Software Practices Do Scientists Find Most Useful? arXiv; 2014.
1407.6220. http://arxiv.org/abs/1407.6220.
[29] Adams J, Nudurupati S, Gasparini N, Hobley D, Hutton E, Tucker G, et al. Landlab: Sustainable Software
Development in Practice. figshare; 2014. 1097629. http://dx.doi.org/10.6084/m9.figshare.1097629.
[30] Allen A, Schmidt J. Looking before leaping: Creating a software registry. arXiv; 2014. 1407.5378. http://arxiv.org/abs/1407.5378.
[31] Boettiger C, Hart T, Chamberlain S, Ram K. Building software, building community: lessons from the rOpenSci project. figshare; 2014. 1112581. http://dx.doi.org/10.6084/m9.figshare.1112581.
[32] Gil Y, Moon E, Howison J. No Science Software is an Island: Collaborative Software Development Needs in
Geosciences. figshare; 2014. 1112561. http://dx.doi.org/10.6084/m9.figshare.1112561.
[33] Habermann T, Collette A, Vincena S, Benger W, Billings JJ, Gerring M, et al. The Hierarchical Data Format (HDF): A Foundation for Sustainable Data and Software. figshare; 2014. 1112485. http://dx.doi.org/10.6084/m9.figshare.1112485.
[34] Hutton E, Piper M, Overeem I, Kettner A, Syvitski J. Building Sustainable Software - The CSDMS Approach.
arXiv; 2014. 1407.4106. http://arxiv.org/abs/1407.4106.

[35] Spencer JS, Blunt NS, Vigor WA, Malone FD, Foulkes WMC, Shepherd JJ, et al. The Highly Accurate N-
DEterminant (HANDE) quantum Monte Carlo project: Open-source stochastic diagonalisation for quantum
chemistry. arXiv; 2014. 1407.5407. http://arxiv.org/abs/1407.5407.
[36] Stalzer M, Mentzel C. A Preliminary Review of Influential Works in Data-Driven Discovery. arXiv; 2015.
1503.08776. http://arxiv.org/abs/1503.08776.
[37] CASRAI. Project Credit;. Accessed: 2015-03-31. http://credit.casrai.org.
[38] Open Researcher and Contributor ID (ORCID);. Accessed: 2015-03-31. http://orcid.org/.
[39] Howison J, Bullard J. Software in the scientific literature: Problems with seeing, finding, and using software mentioned in the biology literature. Journal of the Association for Information Science and Technology (JASIST); in press. Available from: http://dx.doi.org/10.6084/m9.figshare.1146366.
[40] Dubey A, Antypas K, Ganapathy MK, Reid LB, Riley K, Sheeler D, et al. Extensible Component Based Architecture for FLASH, A Massively Parallel, Multiphysics Simulation Code. Parallel Computing. 2009;35(10-11):512–522.
[41] Howison J. Retract bit-rotten publications: Aligning incentives for sustaining scientific software. figshare; 2014.
1111632. http://dx.doi.org/10.6084/m9.figshare.1111632.
[42] Katz DS, Smith AM. Implementing Transitive Credit with JSON-LD. arXiv; 2014. 1407.5117. http://arxiv.org/abs/1407.5117.
[43] Kelley I. Publish or perish: the credit deficit to making software and generating data. figshare; 2014. 1112579.
http://dx.doi.org/10.6084/m9.figshare.1112579.
[44] Blomer J, Berzano D, Buncic P, Charalampidis I, Ganis G, Lestaris G, et al. The Need for a Versioned Data
Analysis Software Environment. arXiv; 2014. 1407.3063. http://arxiv.org/abs/1407.3063.
[45] Chamberlain R, Schommer J. Using Docker to Support Reproducible Research. figshare; 2014. 1101910. http://dx.doi.org/10.6084/m9.figshare.1101910.
[46] Chue Hong N. Minimal information for reusable scientific software. figshare; 2014. 1112528. http://dx.doi.org/10.6084/m9.figshare.1112528.
[47] Crick T, Hall BA, Ishtiaq S. “Can I Implement Your Algorithm?”: A Model for Reproducible Research Software.
arXiv; 2014. 1407.5981. http://arxiv.org/abs/1407.5981.
[48] Marker B, Batory D, Zee FGV, van de Geijn R. Making Scientific Computing Libraries Forward Compatible.
figshare; 2014. 1101873. http://dx.doi.org/10.6084/m9.figshare.1101873.
[49] Piccolo S. Building Portable Analytical Environments to improve sustainability of computational-analysis
pipelines in the sciences. figshare; 2014. 1112571. http://dx.doi.org/10.6084/m9.figshare.1112571.
[50] Brown A, Wilson G, editors. The Architecture of Open Source Applications. lulu.com; 2012. Available from: http://aosabook.org.
[51] Wilson G. Why We Don’t Teach Testing (Even Though We’d Like To); 2014. [blog] http://software-carpentry.org/blog/2014/10/why-we-dont-teach-testing.html.
[52] Hook D, Kelly D. Mutation Sensitivity Testing. Computing in Science & Engineering. 2009 Nov;11(6):40–47.
[53] Hook D, Kelly D. Testing for trustworthiness in scientific software. In: Proceedings of the 2009 ICSE Workshop on Software Engineering for Computational Science and Engineering (SECSE ’09); 2009. p. 59–64.
[54] Dubey A, Weide K, Lee D, Bachan J, Daley C, Olofin S, et al. Ongoing verification of a multiphysics community code: FLASH. Software: Practice and Experience. 2015;45(2):233–244. Available from: http://dx.doi.org/10.1002/spe.2220.
[55] Clune T, Rilee M, Rouson D. Testing as an Essential Process for Developing and Maintaining Scientific Software.
figshare; 2014. 1112520. http://dx.doi.org/10.6084/m9.figshare.1112520.
[56] Petre M, Wilson G. Code Review For and By Scientists. arXiv; 2014. 1407.5648. http://arxiv.org/abs/1407.5648.
[57] Slaughter AE, Gaston DR, Peterson J, Permann CJ, Andrs D, Miller JM. Continuous Integration for Concurrent MOOSE Framework and Application Development on GitHub. figshare; 2014. 1112585. http://dx.doi.org/10.6084/m9.figshare.1112585.