Preserving news apps present huge challenges
Newspaper Research Journal
2015, Vol. 36(3) 299–313
© 2015 NOND of AEJMC
DOI: 10.1177/0739532915600742
nrj.sagepub.com
By Meredith Broussard
Abstract
Newspapers’ digital archives do not currently preserve news apps, the interactive, database-driven multimedia projects that news developers build. Because a news app needs multiple elements working together before anyone can access it, converting the dynamic news app into static HTML pages is one possible avenue for future archiving.
Keywords
archiving, software preservation, data journalism, news apps, copyright,
reproducible research
As digital technology has become more complex in recent years, archiving the news has also grown far more difficult. Today’s digital news organizations create interactive data visualizations, video, animation and apps in addition to print artifacts. These multimedia projects are undoubtedly a crucial part of how users experience news in the digital age, yet they are not preserved in database versions of newspapers or in library archives. Nor can these multimedia elements simply be added to existing library databases, just as a floppy disk cannot be inserted into a USB port.
This paper argues that entirely new engineering solutions are required in order to
effectively preserve today’s multimedia news for tomorrow’s scholars. I outline the
substantial technological challenges involved in preserving today’s most cutting-edge
multimedia artifact, a type of database-driven online story that news developers call a
news app. This is not the same as the news app that one might use to read a newspaper on a phone or mobile device, although the same term is used for both. News developers have identified the news app as a priority for preservation efforts. I describe how news apps work and why their unique design presents particular challenges to archivists. By borrowing preservation strategies from video games and contemporary art, media scholars can begin to develop an innovative path forward that will allow us to preserve the first draft of news app history.

Broussard is an assistant professor in the Arthur L. Carter Journalism Institute at New York University. Broussard is the corresponding author: merbroussard@gmail.com
Literature Review and Background
Defining News Apps
News apps, or interactive news applications, can be thought of as online story packages. Klein writes:
Inside newsrooms, these interactive databases are sometimes called ‘news
applications’1—but don’t be confused. They’re interactive databases published
on the web, not something you buy on your smartphone. Think Dollars for Docs,
not Flipboard.2
ProPublica’s “Dollars for Docs” project is a news app that allows readers to search
for payments drug companies made to individual doctors and health professionals for
promotional talks, research and consulting. It is a searchable online database accompanied by investigative stories. Another example of a news app is The New York Times’
“Red Carpet Project,” which allows readers to search and view 19 years of Oscar
fashion photos. The most common type of news app includes an interactive online
database and one or more accompanying stories. Unlike a print story displayed online,
a news app is created using computer programming techniques.3 A news app thus has
multiple components:
• A database
• The data in the database
• The unique graphical interface that appears in the browser, through which the user interacts with the database
• One or more text-based stories
• Photos or illustrations
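One way to make this list operational is to treat it as a checklist for a preservation package. The following Python sketch is purely illustrative: the class `NewsAppPackage`, the component labels and the example package are hypothetical, not part of any existing archiving tool.

```python
from dataclasses import dataclass, field

# Labels for the five components listed above (this sketch's own shorthand).
REQUIRED_COMPONENTS = {"database", "data", "interface", "stories", "images"}


@dataclass
class NewsAppPackage:
    """Hypothetical preservation package for one news app."""

    title: str
    components: set[str] = field(default_factory=set)

    def missing_components(self) -> set[str]:
        # Required components that have not yet been added to the package.
        return REQUIRED_COMPONENTS - self.components

    def is_complete(self) -> bool:
        return not self.missing_components()


# Example: a package holding the data, stories and images, but not the
# database software or the browser interface that made the app work.
package = NewsAppPackage("Example app", {"data", "stories", "images"})
print(sorted(package.missing_components()))  # ['database', 'interface']
```

A package that saves only the text and photos, as today’s library databases do for print stories, would fail this check on the two components that make a news app interactive.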
News developers engage in highly specialized labor to make all of these elements
interoperable on a newspaper’s unique Web server. Preserving a news app would
involve packaging all of these elements and making them work on a succession of different servers in perpetuity. This is easier said than done, for reasons that will be
spelled out shortly.
Literature in Adjacent Fields
Because journalists have only begun producing artifacts labeled news apps in the past
few years,4 and because very few news apps are produced each year, communication
scholars are just now beginning to look at news app preservation issues. Thus, much of
the relevant literature may be found in adjacent disciplines such as contemporary art, computer history, game development and library science.5 The problem of how to preserve digital artifacts remains open, and definitive long-term strategies about what to preserve and how to preserve it are still being developed.6
Rothenberg writes of four challenges for digital archives: “physical decay of media,
loss of information about the format, encoding, or compression of files, obsolescence
of hardware, and unavailability of software.”7 He notes that the practical physical
lifetime of a magnetic disk is 5-10 years, and the average time until the disk is obsolete
is only five years. Today’s news app storage solutions will likely be ported to future
technologies, just as newspapers were converted to microfilm and then microfilm was
widely digitized.
Loss of information is of particular interest to communication scholars who use
full-text databases, created by aggregators such as Lexis-Nexis Academic or
EBSCOhost, to acquire material for content analysis. Youngblood et al.8 write that
media researchers assume—incorrectly—that the database version of a newspaper is
identical to the print version. They argue that the mismatch has substantial implications for content analysis.
News apps form a subset of the material that does not appear in aggregators’ databases, and as such their content is not available to scholars for systematic analysis.
However, news apps represent some of newsrooms’ most innovative and technologically advanced work today, and as such they are of clear interest to communication
researchers. A method to collect and analyze these artifacts would benefit the academy
and would allow newsrooms to preserve their work product more effectively.
Although we cannot know what scholars in the future will want to know about
news apps, we can confidently predict that they will want to know about them and that
they will want to view news apps on platforms and devices that do not exist today. This
will require us to develop standards and communicate them in order to ensure that
today’s software can run on tomorrow’s computers.
Software Preservation Obstacles
Grad wrote of preserving software:
Many of us in the software business believe that by studying the systems and
applications software produced over the past 50 years, historians can gain
special insight into the economic, political and social changes that have modified
the world and led to the dramatic increase in globalization and democracy.9
It is important to address the logistical issues associated with preserving software,
however.10 A news app or any Web-based software runs in a layered structure in the
following order:
• Web browser
• News app
• Application/program software
• Operating system
• Hardware
Any piece of software is built to run on top of other software called an operating
system, which in turn runs on top of specific hardware. Today’s Mac laptop cannot run
a program written in BASIC on a Commodore 64 from the 1980s, in part because the
hardware is different in the Mac and the C64. Rothenberg11 and Rinehart argue that
using emulators is one viable strategy for running software in the future. Creating a
hardware emulator will allow us to install and run actual copies of today’s software in
the future. Bollacker writes of the inevitable problem associated with emulation:
Emulation is now a common technique used to run old software on new hardware. It
does, however, have a problem of recursion—what happens when there is no longer
compatible hardware to run the emulator itself? Emulators can be layered like
Matryoshka dolls, one running inside another running inside another.12
An effective strategy for preserving news apps will address these known challenges
and will address the nuances of preserving digital artifacts for future scholars.
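Bollacker’s recursion problem can be modeled in a few lines. The hypothetical Python sketch below treats each emulator as a pair: the platform it imitates and the platform it must itself run on. The platform names and the emulator table are invented placeholders. The function follows the chain of nested emulators, one Matryoshka doll at a time, and returns `None` as soon as a layer has no emulator on the hardware at hand.

```python
# Hypothetical table: platform being imitated -> platform the emulator runs on.
EMULATORS = {
    "commodore64": "legacy_pc",
    "legacy_pc": "current_laptop",
}


def emulator_chain(target, available_hardware, emulators):
    """Return the stack of nested emulators needed to run software written for
    `target` on `available_hardware`, or None if the chain breaks."""
    chain = []
    platform = target
    visited = set()
    while platform != available_hardware:
        if platform in visited or platform not in emulators:
            # No emulator exists for this layer (or the table loops back on
            # itself): the software can no longer be run.
            return None
        visited.add(platform)
        host = emulators[platform]
        chain.append((platform, host))  # emulate `platform` on top of `host`
        platform = host
    return chain


print(emulator_chain("commodore64", "current_laptop", EMULATORS))
# [('commodore64', 'legacy_pc'), ('legacy_pc', 'current_laptop')]
print(emulator_chain("commodore64", "future_device", EMULATORS))
# None
```

In this sketch, the only repair for the broken chain is yet another emulator, for example an entry mapping `current_laptop` to `future_device`, which adds one more doll to the stack. That is why the emulators themselves must be preserved and kept up to date.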
Research Questions
As a first step toward effective news app preservation, researchers must make strategic decisions about which news apps should be preserved and what type of documentation or contextual material should accompany them. This study asks three
research questions:
RQ1: Which news apps should be preserved, and how should this determination be made?
RQ2: Which components of the apps should be preserved?
RQ3: What are the known technological and legal challenges that must be addressed?
Methods
In the digital age, journalism has increasingly borrowed from qualitative approaches
in the social sciences, notably multi-method research strategies in sociology, anthropology and social psychology.13 This study relied on a “grounded theory” approach to
qualitative research as outlined by Glaser and Strauss.14 Theoretical saturation15 was
achieved by combining ethnographic participant-observation, document analysis and
focused interviews. As ethnography often requires the researcher to draw on past
experiences to develop rapport with informants as well as to interpret accurately the
expertise of cultural insiders as “local knowledge,”16 this researcher relied on her
admittedly eclectic professional experience as a former section editor of an American newspaper in a top-five media market, as well as on academic training in computer science and several years spent in industry as a professional software developer.
I conducted ethnographic interviews with 25 data journalists at major news organizations, who served as key informants, as well as with scholars, librarians and developers. I reviewed the interview transcripts in order to identify common themes. I met many of these informants while doing participatory fieldwork at a full-day software preservation conference and planning session in March 2014 at the Newseum in Washington, DC. Organized by the Mozilla Open News Foundation, that event was the first gathering of journalists and scholars concerned about archival issues in digital news.17 I also collected the outputs from the event and analyzed them for related background material and contextual evidence. These outputs included a collaboratively developed document, a “hackpad,” about next steps and strategies; tweets from the event, hashtagged #apparchive; and multiple blog posts. Additionally, I analyzed the archives of the NICAR-L listserv, the primary communication avenue for the international community of data journalists. In keeping with recently developed approaches to qualitative research conducted online, I spent a year as a “virtual” participant-observer on the NICAR-L listserv.18 Finally, I supplemented this research by assembling a bibliography of scholarly and popular sources in the adjacent disciplines of game development, library science, visual art (specifically new media art or software-based installations) and software preservation.
Findings
RQ1: Which news apps should be preserved, and how should this determination be made?
The case of EveryBlock.com, a very early news app that was not effectively preserved, illustrates the myriad issues associated with preservation. EveryBlock can be identified as an app that probably should have been preserved, but only in retrospect: news developers now treat EveryBlock (and its demise) as both a cultural touchstone and a cautionary tale.
EveryBlock began in 2005 when journalist and programmer Adrian Holovaty launched a site called Chicagocrime.org. The site was revolutionary in the field as the first example of a journalist combining geo-location and public data. Chicago magazine wrote of the project:
Google Maps had just launched. The Chicago Police Department had put some
of its statistics online. Holovaty combined the two and created Chicagocrime.org, a website that allowed anyone to search for crimes by location, type and
date—and on a map, no less.19
Chicagocrime.org won substantial acclaim for digital hyper-local news, including a
2005 Knight-Batten Award for Innovation in Journalism and a $1.1 million grant from
the Knight Foundation.
As the number of people involved in the project grew, the software changed. Holovaty used the Knight grant to expand Chicagocrime.org into EveryBlock.com, a neighborhood news and discussion site, in 2007. EveryBlock used geo-location to feed users relevant nearby news. Readers could search for local news and other information by entering a zip code, neighborhood or address. Msnbc.com bought EveryBlock, both the company and the site, in 2009, and the site expanded to 16 cities. Msnbc.com was later acquired by NBC News. Holovaty, who is also known for co-creating the Django open-source framework used by a number of news organizations to develop original news apps, left EveryBlock in 2012. In early 2013, NBC News shut down the site. A small part of the project was resurrected as Chicago.everyblock.com, focusing only on Chicago data, but the rest of the site is gone.
A national news app registry could potentially address the question of which news apps should be preserved. An organization such as the Library of Congress or a professional association such as IRE/NICAR could maintain the registry. To return to the distinction between a “mobile app” and a “news app” drawn in the introduction, the difference is one of perception. Both are pieces of software, and both present a viewer or reader with journalistic stories. However, news developers consider the news app a work of journalism; the mobile app, when it is considered at all, is regarded as merely a delivery mechanism. It seems unfair to ask an archivist to parse out the nuances of why one piece of software is considered more journalistically prestigious than another. Thus it would seem useful for developers or journalists themselves to nominate projects that they collectively deem important to preserve.
Prompted in part by an emerging online conversation about preserving apps, DocumentCloud developer Ted Han started a Reddit page where anyone can contribute a link to a news app project. Han’s idea was that such a list would be the first step toward determining what apps are out there, how many of them exist and which ones news developers or Reddit users deem noteworthy. He wrote on the NICAR-L listserv:
I’ve started informally collecting links to news apps here: http://www.reddit.com/r/newsapps. I would entreat other folks who are interested in collecting
links to join me in doing so, since as far as I know there isn’t any sort of public
index to this info (would love to know if folks have tried elsewhere, though!).
Archives of articles and retrospective access to documents are important to
projects like DocumentCloud. Some have written articles around viewers
embedded from our site, so we both care about whether and how those articles
are available and whether they get reformatted or transformed in the future.
We’d also like to make sure uploaded documents that we maintain remain stable
and available over the long term.20
Han was motivated to create this informal collection, he said, after discovering in February 2014 that U.S. News & World Report was no longer making archived content from before 2007 available on its website. U.S. News had switched to a different website content management system and had determined that it was infeasible to continue to maintain these older archives using this new technology; readers were encouraged to consult EBSCO, LexisNexis or bound volumes for archival material. As of March 2014, Han’s list included links to 37 news apps or stories about news apps. This is a relatively small number and could serve as a preliminary list of preservation targets.
RQ2: Which components of a news app should be preserved?
Preserving code alone is not enough. In preparing software for the future, it is
important to think about preserving code as well as documentation and information
about the development process and its infrastructure. In the future, a scholar might ask
questions like: What led Adrian Holovaty to take data published by the Chicago Police
Department and display it in a searchable Google Maps interface? What did the code
look like? How did the constraints of the programming language influence the visual
design of the project?
Some insight can be gained by asking current news developers what they would
like to know about EveryBlock. Two developers wrote of what they were curious
about:
It’s not just the code that Adrian wrote or the map itself, though his reverse
engineering of the Google Maps Flash API was one of its great innovations
when it first came out. We want to know about his process. We want to know the
infrastructure on which he built the app (indeed, making his use of Google Maps
even more impressive). We want to know about how it was designed, how the
user interactions worked. We want to know the impact it had and who responded
to it.21
In visual art, today’s archivists try to preserve some documentation about the art
along with the physical artwork. “You really cannot understand contemporary art without its documentation,”22 said Pilar Garcia, archive director at the Museo Universitario Arte Contemporáneo, in a 2013 interview on the Museum of Modern Art/PS1 blog. Consider Marcel Duchamp’s 1917 “Fountain,” which is important because it is an iconic example of Dadaist artwork. Without an explanation of how and why the artist turned this everyday object into art and why Dada was historically significant, in 100 years “Fountain” will look like an ordinary urinal. The same fate awaits the collection of bits and bytes that make up a news app. Unless some historical information is preserved to explain the context, news apps in 100 years will look like piles of unreadable 0s and 1s.
Interestingly, a connection to the art world might have saved components of
EveryBlock from being lost forever. Future scholars might be able to piece together a
representation of the site by using material in the MoMA archives. Just after
Chicagocrime.org was absorbed into EveryBlock, Holovaty wrote on his blog:
This story has a fitting epilogue. In just a few weeks after Chicagocrime.org
goes offline, the site will be featured in an exhibition at New York’s Museum of
Modern Art, called Design and the Elastic Mind. Chicagocrime.org will have
ended its life and become a museum piece.23
A news app is usually developed in an iterative fashion, meaning that a version of
the app is released to the public and the technology or presentation is fine-tuned over
the course of the next few days or weeks. This raises obvious questions about which
version of a news app should be preserved. Should it be the first version or the final?
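One way to defer the choice between the first and final versions is to keep every distinct version. The Python sketch below is a hypothetical illustration rather than an existing tool: it records each capture with a timestamp and a SHA-256 digest and skips captures that are byte-for-byte identical to the previous one. The class name, the sample content and the dates are invented.

```python
import hashlib
from datetime import datetime, timezone


class VersionArchive:
    """Hypothetical store that keeps every distinct published version of a news app."""

    def __init__(self):
        # Each entry is (capture time, SHA-256 hex digest, page content).
        self.versions = []

    def snapshot(self, content: bytes, when: datetime) -> bool:
        """Store `content` unless it is identical to the latest stored version."""
        digest = hashlib.sha256(content).hexdigest()
        if self.versions and self.versions[-1][1] == digest:
            return False
        self.versions.append((when, digest, content))
        return True

    def first(self):
        return self.versions[0]

    def final(self):
        return self.versions[-1]


archive = VersionArchive()
archive.snapshot(b"<h1>Launch</h1>", datetime(2014, 3, 1, tzinfo=timezone.utc))
archive.snapshot(b"<h1>Launch</h1>", datetime(2014, 3, 2, tzinfo=timezone.utc))  # unchanged, skipped
archive.snapshot(b"<h1>Launch, with fixes</h1>", datetime(2014, 3, 9, tzinfo=timezone.utc))
print(len(archive.versions))  # 2
```

Keeping all distinct versions does not settle which one a future scholar should see, but it keeps both options open at the cost of extra storage.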
In addition to versioning concerns, attention might be paid to which components of
an app might be reused in the future. Data-driven apps have two major components:
the underlying data and the presentation layer. The presentation layer includes the app
architecture, the data analysis and the user interface. The underlying data is potentially
reusable. Just as social scientists preserve and share their data through the ICPSR data
library, so too could journalists share data through a central organization, such as the
IRE Data Library, for the benefit of other data journalists.
So far, news app history has been preserved only on an ad hoc basis. At the 2014 IRE/NICAR annual conference, a group of data journalists presented a panel called “Save the data: going from Zip (drive) to news by rescuing, analyzing old data.” Cheryl Phillips, the multiple Pulitzer Prize-winning reporter and data innovation editor at The Seattle Times, showed photos of her storage method for old data. She keeps her old computers in a pile in her basement. Cardboard boxes nearby house Zip disks and floppy disks. Other journalists contributed photos and samples of their own archives. Paul Overberg of USA Today demonstrated a nine-track tape, a half-inch magnetic tape reel that was used on minicomputers and mainframes from the 1970s to the 1990s.24 Overberg had received this particular nine-track tape in response to a request for Census data, and it had seemed important to hang onto it. For decades. In the basement.
There is some psychology attached to keeping these objects: perhaps the idea is that
if the physical storage medium is still in the reporter’s possession, the story could be
fact-checked or the data recovered. Cheryl Phillips no longer owns a Zip drive, but she
does have the Zip disks and the computer that once ran them.
Phillips spoke at NICAR of her reasoning for organizing the panel on data storage
and recovery: “Why do we care about this? Because the data we’re using now is going
to be just like this,”25 she said, gesturing at the 9-track tape.
We’re not going to have any way to get it because it will be on a USB drive, it’s
going to be on little floppies; we won’t be able to access it unless we figure out
now a way to save it for future geeks as well as aggregate it for good stories.
That’s why we need to document our data, and share it.26
Overberg wrote on a listserv about the utility of using older data as an efficient
starting point for new stories:
Phil Meyer said that precision journalists must adopt the scientific method,
including replicability, way back when “transparency” was just a UN buzzword.
So he pushed us at USA Today to document our work and archive data so we
could share it, including with our later selves. The archive from our 1997
Interstate speeding ticket project gave our 2004 project a lot more legs. Our
Census 2000 archive saved us a huge amount of work setting up for Census
2010. And archiving weekly best-selling book data let us produce an interactive
of every book that has topped our list when it turned 20 last fall.27
Each layer of a news app has potential value, whether it is the reuse value of the underlying data or what a future scholar can learn from seeing a multimedia news artifact created in the cultural context of America in 2014. Looking at each layer separately may help scholars to prioritize their efforts to develop technical solutions for the challenge of preserving news apps.
RQ3: What are the known technological and legal challenges that must be addressed as researchers develop standards?
The 2013 loss of EveryBlock prompted news developers to start asking what content of this kind should be preserved, and how. An online and offline conversation identified problems and opportunities. News developer Matt Waite, a professor at the University of Nebraska, wrote in Source in September 2013:
News people know there’s value in longevity. A good project becomes a resource,
or a monument to a moment in our history. And you can’t be the first draft of
history if you delete the draft.28
Prompted by Waite’s piece, The New York Times news developer Jacob Harris organized a panel at a conference called Newsfoo at which developers tried to explain the
need for archiving news apps. Harris then published an essay on Source called “And
remember—this is for posterity,”29 in which he argued for the benefits of archiving
dynamic sites as static pages.30
Static versus dynamic is the current methodological battleground among news developers. Whether to archive static or dynamic pages is tied to both the human factors and the technical constraints associated with archiving. Chicagocrime.org and EveryBlock.com were taken offline because of human factors: high-level business decisions.
Unfortunately, the tumultuous corporate history of Chicagocrime.org is typical of
news projects and contemporary companies. Internet companies, even digital media
companies, are bought, sold, consolidated and bankrupted at a rapid rate. The media
landscape will only get more complicated: Pew estimates that there were 438 small
digital news organizations in the US in 2013, most of which are digital-first
startups.31
Unlike legacy media organizations, these startups do not have archiving contracts with news database companies like LexisNexis or Thomson Reuters. Their digital content may be archived through a snapshot captured by the Internet Archive, which attempts to preserve the history of the Internet. However, the content may not show up in the Internet Archive, depending on the back-end technology the media company uses. Internet Archive snapshots preserve static web pages, not dynamic ones, and images are often not stored with the text of a web page. Figure 1 shows EveryBlock as captured by the Internet Archive in 2008.
The live site likely had content in the large area that appears blank in this archived version. Clicking on any of the links or attempting to use the search box yields the page shown in Figure 2.
Figure 1
Screencap of Everyblock Taken by Internet Archive, 2008
The technical challenge of preserving static versus dynamic content has wide-ranging implications.
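A simplified model shows one reason dynamic pages go missing. The Python sketch below, which uses only the standard library’s `html.parser`, collects the URLs that a basic link-following crawler would queue from a page. It is an illustration under stated assumptions, not a description of how the Internet Archive’s crawler actually works: links are followed, but a search form’s results exist only after a reader submits a query, so in this model none of those result pages are ever requested. The sample page and its paths are hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Records what a simple link-following crawler would see on one page."""

    def __init__(self):
        super().__init__()
        self.links = []  # URLs the crawler would queue and fetch
        self.forms = []  # form actions it would NOT submit

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and attributes.get("href"):
            self.links.append(attributes["href"])
        elif tag == "form":
            # Results pages behind a form depend on what a reader types,
            # so a crawler that only follows links never requests them.
            self.forms.append(attributes.get("action"))


# Hypothetical page modeled loosely on a neighborhood news site.
page = """
<a href="/about/">About</a>
<a href="/chicago/">Chicago</a>
<form action="/search/"><input name="q"></form>
"""

collector = LinkCollector()
collector.feed(page)
print(collector.links)  # ['/about/', '/chicago/']
print(collector.forms)  # ['/search/']
```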
In addition to information about file encoding, compression and format, scholars
may want to consider storing copyright and/or licensing information alongside news
apps. Copyright issues govern the completeness of the library databases that media
researchers rely on to construct samples. Chen32 writes that thousands of articles were deleted from library databases in the wake of the 2001 New York Times Co. v. Tasini ruling, which held that freelance writers, not publishers, own the electronic rights to their articles.
News apps, because they are interactive databases, pose another potential copyright issue that news organizations may have to navigate in the future. A database can be copyrightable, in part because it may include an original “selection and arrangement”33 of facts.
News organizations are familiar with copyright issues around text and photos, but will
likely need education around the additional concerns of database copyright. Should
news app developers drift into creating unique software, which is not a far stretch from
creating databases, publishers may find themselves in the realm of software development and intellectual property rights. Intellectual property rights around software are
another complex field that will challenge archivists.34 If a newspaper’s staffers develop
a news app, usually the same employment agreement governs the intellectual property
that the staffers produce. However, if freelancers contribute to the news app, each
freelancer’s contract can contain unique provisions regarding intellectual property.
The underlying data in a news app may also have licensing information associated
with it that will affect future use; archivists will want to preserve this licensing
information.
Figure 2
Result of Clicking Links or Using Search Box on Everyblock, 2008
The nuances of copyright and modes of digital expression have been explored most
extensively by visual artists and curators. In order to preserve code-based artworks
submitted to the Rhizome ArtBase, a platform for new media art, Rinehart35 developed
a questionnaire for visual artists seeking to preserve their work. The Rhizome ArtBase
serves as a registry and documentation repository for such artworks. The questionnaire
focuses on emulation: a submitting artist must specify what hardware and software are necessary to emulate the artwork at a future point in time. The questionnaire also asks about rights associated with performance, so that future curators do not accidentally violate the artist’s copyright.36
The reason that news apps don’t get archived at legacy media organizations has to do with the back-end technology of the newsroom. Story text, bylines and images are typically stored inside a newspaper’s content management systems, or CMSs. Newspapers typically have two: one that pushes stories and images to the Web and one that pushes stories and images to the printing press. USA Today, for example, uses the CCI
NewsDesk Editorial and Pagination System for layout and page design. Reporters file
their stories in CCI, designers lay out the pages and the issue is transmitted via satellite
to 36 US printing plants and four printing plants in Europe and Asia.37 After stories and
images are entered into CCI, they are edited and approved, and they are pushed to the
Web content management system (CMS). The Web CMS delivers Web pages to users
who visit usatoday.com and related URLs.
CCI, Saxotech, Hermes and other print production management systems have been
in existence much longer than any Web CMS. Automatic archiving systems are set up to pull content from the print CMS, not the Web CMS. For example, if LexisNexis pulls an automatic feed of material from The Philadelphia Inquirer, the feed is set up to pull from Hermes, the Inquirer’s print CMS, not Clickability, the Web CMS. Reporters’ blog posts and other material posted on Clickability will not be automatically archived.
The CMS issue gets even more complicated for news apps because interactive
news apps do not appear in print and they are typically made outside of the regular
Web CMS. There are convincing technical reasons for this, starting with the fact that
most Web content management systems are unstable technologies. Harris explains:
Almost any news programmer generally loathes their organization’s Content
Management System; its codified formats and rigid workflows often feel more
like strictures to our project. And so, we do our work outside the CMS, skinning
our pages so they look like the main news site while remaining architecturally
apart. For instance, look at our how we reported election results in 2012.38 It’s
actually hosted on Amazon S3 and skinned to look like The New York Times
content. Why go through this extra work just to make it look like articles produced
via the CMS in the end? In our case, controlling our own technology stack
enabled us to do dynamic projects like election results that wouldn’t be possible
within the CMS. Also, the CMS model for stories is a foolish fit for data projects
that may include many thousands of browsable (sic) pages; you just can’t and
shouldn’t represent a relational database in a CMS. So, we do our work outside
the bounds of the CMS, but it has a cost.39
The convenience of creating news apps outside of a CMS currently comes at the
cost of easy archiving. Automatic bulk archiving is one of the reasons for extensive
newspaper archives today; however, if news apps are outside of the automated system,
an important first step is figuring out how to manually archive these journalistic projects. As one developer put it, “News apps are the artisanal cheeses of the journalism
world.”40 They are unique and exciting, but they are expensive to produce and very
hard to store.
Considering that preserving the underlying software is so complex, “baking out” dynamic news apps as static pages seems like a practical strategy for news apps at the end of their life cycle. A dynamic site could be converted into a set of dozens or thousands of static HTML pages. Because static HTML pages are flat text files that need no server-side software to display, they seem more likely than the original app to remain readable for the foreseeable future. The images associated with the Web pages could be rendered as TIFFs, which have emerged as a popular format for archived images.41 A useful next step would be to select a single app for preservation, test an emulator solution on it and develop documentary metadata for it.
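A minimal sketch of the baking-out step, assuming a Python toolchain: every URL the live app can serve is rendered once and written to disk as a flat `index.html` file. The names `bake_out`, `routes_for` and `render_toy_app`, and the toy data they use, are hypothetical; in practice `render` would be replaced by whatever produces a page in the live app, such as a request to a locally running copy of it.

```python
import tempfile
from pathlib import Path
from typing import Callable, Iterable


def bake_out(render: Callable[[str], str], routes: Iterable[str], out_dir: Path) -> list[Path]:
    """Render each route of a dynamic app once and save the HTML as a flat file.

    `render` is a hypothetical stand-in for whatever produces a page in the
    live app (for example, an HTTP request to a local copy of it).
    """
    written = []
    for route in routes:
        html = render(route)
        # "/" -> index.html; "/state/alpha/" -> state/alpha/index.html, so the
        # URLs keep the same shape when the files sit on a plain static host.
        rel = route.strip("/")
        target = (out_dir / rel if rel else out_dir) / "index.html"
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(html, encoding="utf-8")
        written.append(target)
    return written


# Hypothetical stand-in for a database-driven news app: one page per row.
TOY_RESULTS = {"alpha": "Candidate A leads", "beta": "Candidate B leads"}


def routes_for(rows: dict[str, str]) -> list[str]:
    # A database-driven app has one URL per record, so the route list is
    # generated from the data rather than discovered by following links.
    return ["/"] + [f"/state/{key}/" for key in rows]


def render_toy_app(route: str) -> str:
    if route == "/":
        links = "".join(f'<li><a href="state/{k}/">{k}</a></li>' for k in TOY_RESULTS)
        return f"<ul>{links}</ul>"
    key = route.strip("/").split("/")[-1]
    return f"<h1>{key}</h1><p>{TOY_RESULTS[key]}</p>"


# Usage: bake the toy app into a temporary directory.
demo_dir = Path(tempfile.mkdtemp())
pages = bake_out(render_toy_app, routes_for(TOY_RESULTS), demo_dir)
```

Generating the route list from the underlying data, rather than crawling links, is one way to capture pages that are reachable only through search boxes or filters; whether that is feasible depends on still having access to the app's database when it is baked out.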
Discussion
Important strides have been made toward the goal of preserving news apps; however, it is clear that more work remains to be done.
If the goal is to allow future journalists and historians to experience today’s news
apps, an important first step is to identify which news apps should be preserved. The
fact that there are 400-plus media organizations producing digital content, some of
which are news apps, suggests the need for some kind of registry of data journalism
projects.
Such a registry might maintain standardized documentation listing the app’s runtime environment, its copyright and intellectual property restrictions and other crucial
metadata. In addition to the app presentation layer, the registry could make available
the underlying data that powers the app, allowing other journalists to use the cleaned
data as a starting point for other investigations.
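The registry described above amounts to one structured record per app. One possible shape for such a record is sketched below in Python; the class name `NewsAppRecord` and every field name are hypothetical choices made for illustration, not an existing metadata standard, and a real registry might instead build on an established archival metadata schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class NewsAppRecord:
    """One entry in a hypothetical registry of data journalism projects.

    Field names are illustrative only; they are not an existing standard.
    """

    title: str
    publisher: str
    url: str
    published: str  # ISO 8601 date string, e.g. "2012-11-06"
    runtime: dict[str, str]  # software the live app needs, e.g. {"database": "PostgreSQL"}
    rights: str  # copyright and intellectual property restrictions
    data_files: list[str] = field(default_factory=list)  # cleaned data behind the app
    static_snapshot: Optional[str] = None  # location of a baked-out copy, if one exists

    def missing_fields(self) -> list[str]:
        """Return the names of required documentation fields that are still empty."""
        required = {
            "title": self.title,
            "publisher": self.publisher,
            "url": self.url,
            "published": self.published,
            "rights": self.rights,
        }
        missing = [name for name, value in required.items() if not value.strip()]
        if not self.runtime:
            missing.append("runtime")
        return missing

    def to_json(self) -> str:
        """Serialize the record so it can be stored alongside the archived app."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Hypothetical example entry; none of these values describe a real app.
example = NewsAppRecord(
    title="Example election results app",
    publisher="Example News",
    url="https://example.org/elections/2012/",
    published="2012-11-06",
    runtime={"language": "Python", "framework": "Django", "database": "PostgreSQL"},
    rights="",
    data_files=["results_cleaned.csv"],
)
```

Calling `example.missing_fields()` on the record above flags `rights` as undocumented, the kind of gap a registry could surface before an app reaches the end of its life cycle.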
The logistical issues associated with running today’s software on tomorrow’s
machines must be addressed if any preservation efforts are to succeed. Hardware and
software emulators may solve this issue if properly preserved and kept up to date. It
will also be helpful if, in the future, news apps can be archived automatically just as
traditional print stories and images are now archived automatically.
The journalists who are making news apps today are making data journalism history. Developing methods for preserving news apps is an important step toward making sure that this first draft of history is available to future generations.
Notes
1. While there is a whole universe of digital artifacts that could be archived, this paper focuses on what news developers call “news apps.” Software developers tend to use the term “app” generically to mean “application,” but the specific meaning of “application” varies depending on the situation. The electronic artifacts that news organizations generate, all of which could potentially be preserved, include data visualizations, video, animation and news apps. The important distinction is that a news app is considered a piece of journalism, and thus radically different from an app that one might download from the iTunes Store in order to read individual articles in the newspaper. For the sake of clarity, below are some definitions: News app is short for “interactive news application.” Scott Klein, senior editor for news development at ProPublica, gives the following definition in The Data Journalism Handbook: “A news application is a big interactive database that tells a news story. Think of it like you would any other piece of journalism. It just uses software instead of words and pictures.” See Scott Klein, “News Apps at ProPublica,” datajournalismhandbook.org, <http://datajournalismhandbook.org/1.0/en/delivering_data_2.html> (May 28, 2015). Web app refers to a piece of software that runs inside a Web browser. A Web app may be accessed on a desktop computer, a laptop computer or a mobile device such as an iPhone, iPad, tablet or Android phone. A news app is usually a Web app, because it is custom software designed to be viewed within a Web browser. Native app refers to a piece of software designed to work with a mobile device’s native operating system. For Android phones, this means the Android operating system; for iPhones and iPads, this means the iOS operating system. Native apps are typically obtained via proprietary online stores such as Apple’s App Store or Google Play. News apps are rarely native apps, but many news organizations also publish native apps. For example, The New York Times published twelve different native apps as of March 2014. Among these were the NYTimes app for iPad, the NYTimes app for Android, the NYTimes app for Kindle Fire, the NYTimes Crosswords app, The Scoop: NYC app for iPhone and the NYTimes Real Estate app. The NYTimes app for iPad, Android or Kindle Fire is a native app that readers would use to read articles from the newspaper. It includes mobile advertising and is akin to an electronic version of a newspaper. The NYTimes Crosswords app is a delivery device for crossword puzzles. The Scoop let readers sort through restaurant reviews and ideas for New York City outings. The NYTimes Real Estate app presents real estate listings and content from the real estate section. These native apps all repurpose content that journalists have produced. Mobile app may be used to refer to a native app, the mobile version of a Web app or the mobile version of a news app.
2. Scott Klein and Tyler Fisher, “A Conceptual Model for Interactive Databases in News,” propublica.org, March 18, 2014, <http://www.propublica.org/nerds/item/a-conceptual-model-for-interactive-databases-in-news> (May 28, 2015).
3. Alexander Howard, “Aron Pilhofer on Data Journalism, Culture and Going Digital,” towcenter.org,
March 27, 2014, <http://towcenter.org/aron-pilhofer-on-data-journalism-culture-and-going-digital/>
(June 22, 2015).
4. Meredith Broussard, “Future-Proofing News Apps,” pbs.org, April 23, 2014, <http://www.pbs.org/mediashift/2014/04/future-proofing-news-apps/> (May 28, 2015).
5. National Digital Information Infrastructure and Preservation Program of the Library of Congress, “PRESERVING.EXE: Toward a National Strategy for Software Preservation,” digitalpreservation.gov, October 2013, <http://www.digitalpreservation.gov/multimedia/documents/PreservingEXE_report_final101813.pdf> (June 22, 2015).
6. Jeff Rothenberg, “Ensuring the Longevity of Digital Information,” clir.org, February 22, 1999, <http://www.clir.org/pubs/archives/ensuring.pdf> (May 28, 2015).
7. Ibid.
8. Norman E. Youngblood, Barbara A. Bishop and Debra L. Worthington, “Database Search Results Can Differ from Newspaper Microfilm,” Newspaper Research Journal 34, no. 1 (winter 2013): 36-49.
9. Burton Grad, “Preserving the Software Industry’s Past,” IEEE Annals of the History of Computing 25, no. 1 (January 2003): 88.
10. For more on the myriad logistical challenges of metadata around software preservation, see Kurt D. Bollacker, “Avoiding a Digital Dark Age,” americanscientist.org, <http://www.americanscientist.org/issues/pub/avoiding-a-digital-dark-age/1> (May 28, 2015); James Mitchell Crow, “Cultural Decay,” New Scientist 206, no. 2765 (June 2010): 42-45; Helen R. Tibbo, “On the Nature and Importance of Archiving in the Digital Age,” Advances in Computers 57 (2003): 1-67; Omar Alam, Bram Adams and Ahmed E. Hassan, “Preserving Knowledge in Software Projects,” Journal of Systems and Software 85, no. 10 (October 2012): 2318-2330; James W. Cortada, “Think Piece: Preserving Records of the Past, Today,” IEEE Annals of the History of Computing 31, no. 2 (April 2009): 88-87; Michael W. Godfrey, “Understanding Software Artifact Provenance,” Science of Computer Programming 97, part 1 (January 2015): 86-90.
11. Jeff Rothenberg, “Avoiding Technological Quicksand: Finding a Viable Technical Foundation for Digital Preservation,” clir.org, January 1998, <http://www.clir.org/pubs/reports/rothenberg/contents.html> (May 28, 2015).
12. Bollacker, “Avoiding a Digital Dark Age.”
13. Sharon Hartin Iorio, ed., Qualitative Research in Journalism: Taking It to the Streets (Mahwah, NJ: Lawrence Erlbaum Associates, 2004).
14. Barney Glaser and Anselm Strauss, The Discovery of Grounded Theory: Strategies for Qualitative Research (Chicago: Aldine Publishing Company, 1999).
15. Ibid.
16. Clifford Geertz, Local Knowledge: Further Essays in Interpretive Anthropology (New York: Basic Books, 2000).
17. Erika Owens, “Smart People Working on a Tough Problem: NICAR News Apps Archive Designathon,” erikaowens.com, February 7, 2014, <http://erikaowens.com/blog/smart-people-working-tough-problem-nicar-news-apps-archive-designathon> (May 28, 2015).
18. Tom Boellstorff, ed., Ethnography and Virtual Worlds: A Handbook of Method (Princeton, NJ: Princeton University Press, 2012).
19. Andrew Huff, “Street Wise,” chicagomag.com, June 2009, <http://www.chicagomag.com/ChicagoMagazine/June-2009/Street-Wise/> (May 28, 2015).
20. Ted Han, “Re: NICAR News Apps Archive Designathon,” zotero.org, February 19, 2014, <https://www.zotero.org/groups/app_archive/items/itemKey/DVKGNI37> (June 22, 2015).
21. Tyler Fisher and Scott Klein, “Preserving Interactive News Projects with Newseum, OpenNews and Pop Up Archive,” knightlab.northwestern.edu, March 18, 2014, <http://knightlab.northwestern.edu/2014/03/18/preserving-interactive-news-projects-with-newseum-opennews-and-pop-up-archive/> (May 28, 2015).
22. Naomi Kuromiya, “Examining Archives Exhibition Strategies in Mexico City,” moma.org, October 7, 2013, <http://www.moma.org/explore/inside_out/2013/10/07/examining-archives-exhibition-strategies-in-mexico-city> (May 28, 2015).
23. Adrian Holovaty, “In Memory of chicagocrime.org,” holovaty.com, January 31, 2008, <http://www.holovaty.com/writing/chicagocrime.org-tribute/> (May 28, 2015).
24. Frank da Cruz, “IBM Mainframe Magnetic Storage Media,” columbia.edu, July 2010, <http://www.columbia.edu/cu/computinghistory/media.html> (March 29, 2014).
25. Cheryl Phillips, “Save the Data: Going from Zip (Drive) to News by Rescuing, Analyzing Old Data” (paper presented at the IRE/NICAR 2014 Conference, Baltimore, MD, February 2014).
26. Ibid.
27. Paul Overberg, “Archiving News Applications,” zotero.org, January 21, 2014, <https://www.zotero.org/groups/app_archive/items/VEFHZ7QX> (June 22, 2015).
28. Matt Waite, “Kill All Your Darlings,” source.opennews.org, September 12, 2013, <https://source.opennews.org/en-US/learning/kill-all-your-darlings/> (May 28, 2015).
29. Jacob Harris, “And Remember, This Is for Posterity,” source.opennews.org, November 14, 2013, <https://source.opennews.org/en-US/learning/and-remember-ones-posterity/> (May 28, 2015).
30. Ibid.
31. Mark Jurkowitz, “The Growth of Digital Reporting,” journalism.org, March 26, 2014, <http://www.journalism.org/2014/03/26/the-growth-in-digital-reporting/> (May 28, 2015).
32. Xiaotian Chen, “Embargo, Tasini, and ‘Opted Out’: How Many Journal Articles are Missing from Full-Text Databases,” Internet Reference Services Quarterly 7, no. 4 (September 2002): 23-34.
33. I. Trotter Hardy, “Project Looking Forward: Sketching the Future of Copyright in a Networked World,” copyright.gov, May 1998, <http://www.copyright.gov/reports/thardy.pdf> (May 28, 2015).
34. National Research Council (U.S.), The Digital Dilemma: Intellectual Property in the Information Age (Washington, DC: National Academy Press, 2000); Feng-Cheng Chang, Chin-Yuan Chang and Hsueh-Ming Hang, “A Study on the Meta-Data Design for Long-Term Digital Multimedia Preservation,” in Intelligent Information Hiding and Multimedia Signal Processing (proceedings from IIHMSP International Conference, Harbin, China, 2008); Len Shustek, “What Should We Collect to Preserve the History of Software?” IEEE Annals of the History of Computing 28, no. 4 (October 2006): 112-111.
35. Richard Rinehart, “Preserving the Rhizome ArtBase,” archive.rhizome.org, September 2002, <http://archive.rhizome.org/artbase/preserving-the-rhizome-artbase-richard-rinehart/> (May 28, 2015).
36. For more on contemporary art preservation issues, see Berkeley Art Museum and Pacific Film Archive, “Archiving the Avant-Garde: Documenting and Preserving Digital/Media Art,” bampfa.berkeley.edu, 2001, <http://www.bampfa.berkeley.edu/about/avant-garde> (June 22, 2015); Dirk Von Suchodoletz and Jeffrey Van der Hoeven, “Emulation: From Digital Artefact to Remotely Rendered Environments,” International Journal of Digital Curation 4, no. 3 (December 2009): 146-155; Alain Depocas, Jon Ippolito and Caitlin Jones, “Permanence through Change: The Variable Media Approach,” variablemedia.net, <http://www.variablemedia.net/pdf/Permanence.pdf> (May 28, 2015); Jon Ippolito, “The Museum of the Future: A Contradiction in Terms?” Cross Talk ArtByte 1, no. 2 (July 1998): 18-19; Richard Rinehart, “The Straw That Broke the Museum’s Back? Collecting and Preserving Digital Media Artworks for the Next Century,” switch.sjsu.edu, June 14, 2000, <http://switch.sjsu.edu/web/v6n1/article_a.htm> (May 28, 2015).
37. USA Today, “How the Newspaper Is Produced,” usatoday30.usatoday.com, <http://usatoday30.usatoday.com/marketing/media_kit/pressroom/press_kit_usat_how_newspaper_produced.html> (March 21, 2014).
38. See The New York Times, “President Map,” elections.nytimes.com, November 29, 2012, <http://elections.nytimes.com/2012/results/president> (May 28, 2015).
39. Harris, “And Remember, This Is for Posterity.”
40. Quote taken from author’s notes from software preservation conference and planning session at Newseum, Washington, DC, March 2014.
41. Leslie Johnston, Library of Congress digital archivist, personal communication with author, March 2, 2014.