ARTIFICIAL INTELLIGENCE FOR
INVESTIGATIVE REPORTING
Using an expert system to enhance
journalists’ ability to discover original public
affairs stories
Downloaded by [Temple University Libraries] at 08:32 15 December 2014
Meredith Broussard
This paper describes an artificial intelligence-based software system that augments public
affairs reporters’ ability to sort through data and identify investigative storytelling opportunities. A prototype of the model was developed and was used to analyze education data. The
successful prototype and the social impact of the stories derived from it suggest that
this approach is a valid option for newsrooms that seek to tell more compelling, data-rich
stories about public affairs issues.
KEYWORDS artificial intelligence; computational journalism; data journalism; expert
systems; innovation; public affairs journalism
Introduction
“Readers don’t care about bureaucracy,” one of my colleagues tells her students
on the first day of her public affairs journalism class. “To make people care about public
affairs, you have to tell a story that taps into our shared humanity.” The work of telling
routine public affairs stories becomes second nature to a beat reporter. But for an
investigative reporter, storytelling requires an additional layer of cognitive complexity.
The investigative reporter must come up with an original idea—a creative act—and
must then find sources and turn the idea into a narrative. Ideas are easy to generate.
Original ideas are much harder. Original ideas that can turn into successful investigative
stories are even more difficult to create. Once the idea exists, the timeline is uncertain:
investigative stories can take a very long time to conceive and report. Many of today’s
economically challenged newsrooms do not feel they can afford such a luxury. While a
computer cannot generate original story ideas, computational methods for accelerating
human creativity offer a possible solution for newsrooms seeking to amplify their investigative reporting capacity. This paper describes a model for leveraging artificial intelligence to accelerate the process of discovering investigative ideas on public affairs
beats such as education, transportation, or campaign finance.
Digital Journalism, 2014
http://dx.doi.org/10.1080/21670811.2014.985497
© 2014 Taylor & Francis
The model, which I call the “Story Discovery Engine,” derives from a type of artificial intelligence software called an expert system. In this paper, I explain how the
engine works to facilitate the discovery of investigative ideas. First, I outline the conceptual process involved in generating new investigative story ideas. I describe expert
systems, outline one of the logical rules embedded in the software, and show the difference between a classical expert system and the Story Discovery Engine. I demonstrate how I tested the system by developing a prototype of the Story Discovery
Engine that analyzed education data from the School District of Philadelphia, the
eighth-largest school district in the United States. That prototype was published online
as a project called “Stacked Up.” Stacked Up consists of a set of investigative stories as
well as a reporting tool made of dynamic, customizable data visualizations inside a narrative framework. I summarize the investigative stories that were produced from the
reporting tool and the policy changes that resulted. The implementation and resulting
investigative news stories, plus the project’s impact, suggest that the Story Discovery
Engine model can add value to investigative reporting.
Creativity and Investigative Story Ideas
Social scientists have engaged with the notion of investigative reporting as a
cultural construction produced inside a particular organizational culture (Gans 2004;
Tuchman 1978). For the purposes of this paper, investigative reporting is defined as a
type of enterprise journalism that is produced over time, outside of the day-to-day
deadline crunch, and includes diverse sources (Hansen 1991). The cognitive process of
coming up with an original investigative story idea is a creative act under Sternberg’s
(1999) definition of creativity as the production of work that is both novel (as in
original) and appropriate (as in useful).
Experienced investigative reporters build up a set of strategies for finding story
ideas, but novice investigative reporters often struggle to find opportunities for novel
enterprise stories. Training and education materials for novice investigators focus on
places to look for stories: follow the money, look at specific lines on financial filings,
and so on.1 The complexity of the process is part of the reason that so much investigative journalism is reactive, resulting from a whistleblower’s tip, rather than proactive (Protess 1991).
Journalism innovation theorists have suggested that tremendous possibilities exist
in analyzing data to find investigative ideas (Appelgren and Nygren 2014; Dick 2013;
Flaounas et al. 2013; Pavlik 2013). Hamilton and Turner (2009) write that the future of
watchdog journalism may be found in using algorithms (precisely defined problem-solving procedures) for accountability:
The best algorithms will essentially be public interest data mining. They will generate
leads, hunches, and anomalies to investigate. It will remain for reporters and others
interested in government performance to take the next step of tracking down the story
behind the data pattern.
Tracking down a story in data requires specialized technical skills (to do the data-crunching) as well as journalistic expertise (to refine the story idea and craft appropriate
prose). These skills until recently have tended to be segregated into different job
categories and experience levels. A novice reporter might have sufficient technical skills
to use pivot tables in a spreadsheet, for example, but might not have sufficient job
experience to know that pivot table analysis could be applied to monthly government
data releases on a particular beat. The promise of computational journalism is that such
walls would be broken down through collaboration and training (Flew et al. 2012). A
successful computational journalism project might thus be described as one that uses
computational thinking to bridge a knowledge gap.
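To make the pivot-table example concrete, the following is a minimal Python sketch (standard library only) of the cross-tabulation a reporter might run on a monthly government data release. The field names (`department`, `month`, `amount`), the `pivot` helper, and the figures are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

def pivot(rows, row_key, col_key, value_key):
    """Sum value_key for every (row_key, col_key) pair, as a spreadsheet pivot table does."""
    table = defaultdict(lambda: defaultdict(float))
    for row in rows:
        table[row[row_key]][row[col_key]] += row[value_key]
    # Convert the nested defaultdicts to plain dicts for printing
    return {r: dict(cols) for r, cols in table.items()}

# Hypothetical monthly spending release: one row per payment
payments = [
    {"department": "Parks", "month": "2014-01", "amount": 1200.0},
    {"department": "Parks", "month": "2014-02", "amount": 900.0},
    {"department": "Streets", "month": "2014-01", "amount": 500.0},
    {"department": "Parks", "month": "2014-01", "amount": 300.0},
]

summary = pivot(payments, "department", "month", "amount")
for department, by_month in sorted(summary.items()):
    print(department, by_month)
```

Rerunning the same summary on each new release and scanning for columns that jump is the sort of routine an experienced beat reporter applies almost automatically; the technical step is easy, and knowing where to apply it is the knowledge gap.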
This knowledge gap between the experienced and the novice reporter involves
two types of knowledge: formal and informal (Scribner and Cole 1973). Formal knowledge includes rules of a system, as in knowing the rules of English grammar. An experienced education reporter has formal knowledge of his or her state’s laws and policies
around education. Informal knowledge includes domain expertise and rules of thumb
based on experience. Informal knowledge for an experienced investigative reporter
might include a rule of thumb like this:
If you have a natural disaster like Hurricane Sandy, and there is a big pool of money
for hurricane relief, some of those funds will be misused; after a natural disaster, always
follow up and find out where things went wrong with the government funds, and
you’ll find a story.
To come up with ideas the way an experienced reporter would, the novice reporter
needs the informal knowledge that the experienced reporter has about where to find
stories plus some of the formal knowledge about education policies.
Origin of the Project
In 2011, I found myself staring into exactly this type of knowledge gap. I was an
experienced reporter, but not on the public affairs beat. I wanted to investigate a question in education: do Philadelphia public school children have enough books to learn
the material on the state-mandated standardized tests? I had data, I had methods, but I
did not have contacts. I wanted to talk to parents, teachers, and students at the city’s
best schools, and the city’s worst schools, and see if there was a difference in the students’ access to books. To do that, I needed to figure out which were the best schools,
and which were the worst schools; I also needed to find people to talk to at each.
There were more than 200 schools. The task was daunting.
Educational data is abundant, but the specific analysis I wanted had not been
done before. It also involved numerous interdependencies and micro-judgments. To
investigate the story I wanted to write, I turned to data journalism.
Data journalism is the practice of finding stories in numbers, and using numbers
to tell stories (Broussard, quoted in Howard 2014). It is an evolving practice (Appelgren
and Nygren 2014) that may also be called data-driven journalism or computational journalism. Public affairs reporting is particularly suited to data journalism, and specifically
expert system analysis, because public affairs reporting depends on interpreting the
rules of a local system. An education beat reporter must be familiar with a dizzying
array of laws and policies at the federal and state level. Fortunately, these laws and policies are articulated in text-based rules that are easily available online. The government
uses data to track the success of its programs, and that data is frequently published
online. Other data sets are available to reporters and citizens under the Freedom of
Information Act of 1966. Clearly articulated rules in the real world can be translated
easily into computer logic rules. Applying the rules to the data allows the computational “intelligence” to uncover social problems.
Thus, the first step was creating a software system that would do some of the
necessary investigative thinking for me. Embedding formal and informal knowledge
into the software would allow me (or any other reporter) to use the software as a
reporting tool to refine story ideas and more efficiently find sources.
This is the essence of the Story Discovery Engine. It is possible to take some of
the experienced reporter’s knowledge, abstract out the high-level rules or heuristics,
and implement these rules inside an expert system in the form of database logic. The
data about the real world is fed in, the logical rules are applied, and the system
presents the reporter with a visual representation of a specific site within the system.
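The cycle described above (data in, rules applied, a per-site result out) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Stacked Up code: the `Rule` class, the two example rules, the field names, and the sample school are all hypothetical, and the sketch prints text where the Story Discovery Engine presents a visualization.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Rule:
    name: str
    triggered: Callable[[Dict], bool]  # True when a site may hold a story
    hint: str                          # what the reporter might check next

# Hypothetical rules, each abstracted from a reporter's heuristic
RULES = [
    Rule("book_shortage",
         lambda site: site["books"] < site["students"],
         "Fewer books than students: ask how materials are shared."),
    Rule("below_district_scores",
         lambda site: site["pct_proficient"] < site["district_pct_proficient"],
         "Scores below the district average: compare with similar schools."),
]

def evaluate(site: Dict, rules: List[Rule] = RULES) -> List[Rule]:
    """Apply every rule to one site and return the rules that fire."""
    return [rule for rule in rules if rule.triggered(site)]

# Hypothetical record for a single school
school = {"name": "Example Elementary", "students": 120, "books": 85,
          "pct_proficient": 41.0, "district_pct_proficient": 50.0}

for rule in evaluate(school):
    print(f"{school['name']}: {rule.name} -> {rule.hint}")
```

Keeping the rules as data, separate from the loop that applies them, is one way to let a developer add rules for another beat without rewriting the engine.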
The Prototype
An investigation often arises when a reporter perceives a difference between
what is (the observed reality) and what should be (as articulated in law or policy). A
high-impact investigative story looks at a situation where what is differs from what
should be, and explains why. The reader can then use the narrative to create or enact
a path to remedy the situation.
The idea for Stacked Up arose from just such a difference. “The school is terrific,”
my neighbor said of her daughters’ public school, considered one of the best in the city.
“But if you’re a parent there, you have to be prepared to do a lot of fundraising for basic
things like textbooks.” A few years later, I noticed that I was getting the same email at
the beginning of every semester from the students in my college classes: it said that the
student was very sorry, but he or she could not do the homework because the course
books had not yet arrived in the mail. Those students always seemed to be the students
who received the lowest grades at the end of the semester. It made sense: they could
not do the work required to pass the class if they did not have the books. I wondered:
could book shortages be a factor in Philadelphia public schools’ consistently low standardized test scores? (Many parents do not have the resources to fundraise to get books
for a school—my neighbor is an outlier, as are many of the other parents at that particular school.) The District currently has 131,262 students in grades pre-kindergarten
through 12, 87.3 percent of whom are economically disadvantaged. This is a significant
issue because even if parents at each school fundraised, they might not be able to raise
enough money to buy all of the books needed.
Most people would be surprised at the idea that a public school would not have
enough books. After all, Pennsylvania law specifically says that the state provides books.
In Philadelphia, however, students and parents regularly complain of textbook shortages. A 10th grader at Parkway West High School told me that students often have to
share books in class and cannot take them home to do homework. Many books are in
poor condition: “There were pictures of testicles drawn on every page,” she said of one
of her ninth-grade books. The logistical challenges of getting multiple books to hundreds of thousands of students at hundreds of schools overwhelm many major school
districts (Labbé and Haynes 2007).
Access to books is particularly critical because a school today is labeled a success
or failure based on students’ performance on high-stakes tests. The tests are highly specific and are aligned with state educational standards. The tests are also aligned with
the textbooks sold by the three educational publishers that dominate the educational
publishing market. These same publishers design and grade the standardized tests. It
therefore stands to reason that if students do not have the right textbooks, they will
not be able to do well on the tests even if they want to.
Answering the question of whether a single school has enough books is complex
because each student in each grade studies at least four subjects every year. Asking if
there are enough books in an entire school district is a massive task. With more than
200 schools, the School District of Philadelphia is the eighth-largest school district in
the country. Many of the schools have high student turnover because students switch
schools as they navigate the child welfare or juvenile justice systems (Department of
Human Services, City of Philadelphia 2012). The Children and Youth Division of the
Philadelphia Department of Human Services serves an estimated 20,000 children and
their families each year (Department of Human Services, City of Philadelphia 2014).
This background helped to pose what became the central research question: are
enough books available for Philadelphia students to allow them to prepare adequately
for state-mandated standardized tests?
I designed an algorithm and a database architecture that would let me calculate
the answer to my investigative question. The algorithm is designed to check whether
students are provided with the materials specified in the rules of the educational
system. If they are not, there is likely to be a violation, and there is probably an opportunity for a story.
Implementing the Prototype
The Story Discovery Engine prototype launched online as a project called
“Stacked Up.” It has two parts: it is both a reporting tool and a presentation system for
the stories I wrote using the reporting tool. The presentation system provides the user
with a set of investigative stories and some explanatory text about the project (see
Figure 1). The reporting tool is a set of dynamic data visualizations that allowed me to
write the investigative stories. The statistics and data that supported each story were
original, derived from the data analysis resulting from the algorithm that forms the
backbone of the project.
In the reporting tool view, the reporter sees a page representing a single school.
The page shows different types of data, organized so that specific types of investigative
questions can be easily answered (see Figure 2). Some such questions include:
How many students are in each grade in this school?
Where is the school located in the city?
How do this school’s test results compare to the rest of the district?
Do there seem to be enough books for the students enrolled?
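The third question, how a school’s results compare with the rest of the district, comes down to a small calculation. A minimal sketch, assuming each school has a single percent-proficient figure; the school labels and scores are hypothetical, and percentile rank plus distance from the district median is one reasonable choice of comparison, not necessarily the one Stacked Up uses.

```python
import statistics

def percentile_rank(value, all_values):
    """Percentage of schools in the district scoring strictly below `value`."""
    values = list(all_values)
    below = sum(1 for v in values if v < value)
    return 100.0 * below / len(values)

# Hypothetical percent-proficient figures for five schools
district_scores = {"A": 72.0, "B": 35.5, "C": 51.0, "D": 18.0, "E": 44.0}
district_median = statistics.median(district_scores.values())

for school in ("B", "C"):
    score = district_scores[school]
    rank = percentile_rank(score, district_scores.values())
    print(f"School {school}: {score}% proficient, "
          f"{score - district_median:+.1f} points vs. district median, "
          f"percentile rank {rank:.0f}")
```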
FIGURE 1
Presentation system and reporting tool shown on project home page
FIGURE 2
Reporting tool view
The system design anticipates the data points that a reporter needs to write a
data-rich story and presents them in a centralized, easy-to-navigate format. The reporter
leverages their domain expertise, clicks around to adjust some what-ifs to prompt the
creative process, and comes up with a story idea. Because the story idea is targeted, it
immediately becomes easier to identify appropriate sources.
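The what-ifs mentioned above can be as simple as recomputing a shortage under a different assumption. In this hypothetical sketch the adjustable parameter is how many students share one copy of a book, echoing the book sharing the Parkway West student described; the function names and figures are illustrative only.

```python
import math

def books_needed(students, students_per_book=1):
    """Copies required if each copy is shared by `students_per_book` students."""
    return math.ceil(students / students_per_book)

def shortfall(students, copies, students_per_book=1):
    """Additional copies needed under the sharing assumption (0 if there are enough)."""
    return max(0, books_needed(students, students_per_book) - copies)

# Hypothetical grade: 95 students, 40 copies of the math textbook
for ratio in (1, 2, 3):
    print(f"{ratio} student(s) per book: short {shortfall(95, 40, ratio)} copies")
```

In this made-up grade the shortage persists even at two students per book; deciding whether that is a story remains the reporter’s call.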
The key is that the software does not try to solve a problem faced by all journalists on every beat. It tries to solve a specific problem on a specific beat, and in the process creates a way to solve other problems on that same beat. The Story Discovery
Engine prototype was created and applied to education data, but the model can easily
be applied to other beats as well.
A list of the rules used in the system is beyond the scope of this paper, as is a
depiction of the object model used to represent relationships between the entities
involved; however, additional technical details are available on request. For the sake of
illustration, one of the rules can be explained as follows:
Core_subjects = math, reading, social studies, science.
School_curriculum = a curriculum package published by a major educational publisher (e.g., “Everyday Math”).
Necessary_material = the minimum books or workbooks necessary to teach the
school’s curriculum package. This often means two items: a textbook and a workbook.
For each school in School_District
    For each grade in school
        For each Core_subject
            For each Necessary_material in School_curriculum
                If NumberOf(necessary_material) >= NumberOf(students_in_grade)
                Then Enough_materials = yes
                Else Enough_materials = no.
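The rule above can be expressed as runnable Python. This is a sketch, not the project’s code: the nested-dictionary layout, the function name `check_materials`, and the sample data are assumptions; the curriculum is simplified to one list of materials per subject rather than one per grade; and a material counts as sufficient when there is at least one copy per enrolled student.

```python
CORE_SUBJECTS = ("math", "reading", "social studies", "science")

def check_materials(district, curriculum):
    """Apply the enough-materials rule to every school, grade, and core subject.

    district:   {school: {grade: {"students": int, "inventory": {material: copies}}}}
    curriculum: {subject: [necessary materials]}
    Returns {(school, grade, subject, material): True if enough, else False}.
    """
    results = {}
    for school, grades in district.items():
        for grade, info in grades.items():
            for subject in CORE_SUBJECTS:
                for material in curriculum.get(subject, []):
                    # A necessary material absent from the inventory counts as zero copies
                    copies = info["inventory"].get(material, 0)
                    results[(school, grade, subject, material)] = copies >= info["students"]
    return results

# Hypothetical curriculum and a single third grade
curriculum = {
    "math": ["math textbook", "math workbook"],
    "reading": ["reading textbook"],
    "social studies": ["social studies textbook"],
    "science": ["science textbook"],
}
district = {
    "School A": {
        3: {"students": 30,
            "inventory": {"math textbook": 30, "math workbook": 12,
                          "reading textbook": 35, "science textbook": 30}},
    },
}

shortages = [key for key, enough in check_materials(district, curriculum).items() if not enough]
for school, grade, subject, material in shortages:
    print(f"{school}, grade {grade}, {subject}: not enough copies of {material}")
```

Because a necessary material missing from a school’s inventory is treated as zero copies, gaps in the records surface as shortages rather than disappearing from the results.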
Once the prototype existed, I looked at the data analysis and interviewed people
to validate the findings. I developed hypotheses, reported them out, revised the
hypotheses, and considered story formats as part of a months-long process. As predicted, the data revealed multiple potential stories about how books were “stacked up”
in Philadelphia city schools.
Theoretical Background
The Story Discovery Engine draws on adjacent, occasionally overlapping concepts
from the fields of communication, cognition, and computation. I will explain each in
turn and how it relates to the Story Discovery Engine. These fields are not generally
placed in dialogue with each other, but there are enormous productive possibilities if
they are put together in conversation.
Computation
The Story Discovery Engine software belongs to a class of artificial intelligence
programs called knowledge-based expert systems. Benfer offers an excellent definition:
Expert systems are computer programs that perform sophisticated tasks once thought
possible only for human experts. If performance were the sole criterion for labelling a
program an expert system, however, many decision support systems, statistical analyses, and spreadsheet programs could be called expert systems. Instead, the term
“expert system” is generally reserved for systems that achieve expert-level performance,
using artificial intelligence programming techniques such as symbolic representation,
inference, and heuristic search (Buchanan 1985). Knowledge-based systems can be distinguished from other branches of artificial intelligence research by their emphasis on
domain-specific knowledge, rather than more general problem-solving strategies.
Because their strength derives from such domain-specific knowledge rather than more
general problem-solving strategies (Feigenbaum 1977), expert systems are often called
“knowledge-based.” Since the knowledge of experts tends to be domain-specific rather
than general, most expert systems representing this knowledge reflect the specialized
nature of such expertise. (Benfer 1991, 4)
Benfer argues that expert systems can provide an important mechanism for prompting
new social science thinking, and expert system developers can learn from social scientists’ rigorous methods of data collection and validation. He was the first to deploy an
expert system in journalism:
MUckraker, an expert system under development by New Directions in News and the
Investigative Reporters and Editors Association at Missouri University, is a program to
advise investigative reporters on how to approach people for interviews, how to prepare
for those interviews, and how to examine a wide range of public documents in the conduct of an investigation. This program is designed to act much as an expert investigative
reporter might, advising the user on strategies to try when sources are reluctant to be
interviewed, pointing out documents that might be relevant to the investigation, and
advising the user on how to organize his or her work. (Benfer 1991, 4)
Under the expert system model Benfer describes, the expert system would deliver to
the reporter “advice” about whether the quantity of books in a school would be the
appropriate basis for a story.
The innovation in the Story Discovery Engine is that instead of advice, the expert
system delivers an interactive data visualization. The data visualization is specifically
designed to answer the most common questions a reporter might ask in order to
assess whether a story might be found at a particular school.
I decided that using the human reporter’s judgment was more efficient than a
computer’s for assessing newsworthiness in this case because the system is designed
to be used in the deadline-driven, time-sensitive environment of a newsroom. The
notion that computer-based quantitative methods should augment humans, not
replace them, is one of the principles of automated text analysis put forward by
Grimmer and Stewart (2013) in their analysis of possible pitfalls in automated content
analysis. In recent years, communication scholars have frequently used the human
workers who participate in Amazon’s Mechanical Turk in order to code content in large
data sets. In the Story Discovery Engine model, the reporter is a similarly essential part
of the system (see Figure 3).
Using the vast “computational” resources of the human brain, the reporter takes
only moments to look at the data revealed by the system, leverage formal and informal
knowledge, and make a judgment about the likelihood of a story. It would require vast
amounts of computing power to get the computer to draw the same conclusions; also,
it would take years to tease out all of the subtleties of human news judgment and
implement them computationally. The human brain thus becomes an efficient part of
the story-generating process, aided and augmented by the computational system.
It is significant that Benfer used social science methods in crafting an expert system for journalism. Social science thinking is at the heart of what today we call data
journalism. Meyer pioneered the application of social science methods to journalism in
his 1967 Pulitzer Prize-winning story about race riots in Detroit; those methods were
later codified in Precision Journalism: A Reporter’s Introduction to Social Science Methods
(Meyer 2002). Precision journalism methods informed computer-assisted reporting,
which flourished in the 1980s with the advent of desktop computers in the newsroom.
Today’s online data journalists are incubated and organized by the Investigative Reporters and Editors Association through the National Institute for Computer-Assisted Reporting, which offers the Philip Meyer Journalism Award for data-driven projects each year.
FIGURE 3
A classical expert system compared to the Story Discovery Engine
Three other computation concepts deserve mention: open data, open source, and
big data. Data journalism can only flourish if data sets are available. Structural changes in
the US government have allowed data to be more freely distributed. Influenced in part
by the open data movement, President Barack Obama (2009) released a memorandum
declaring a new openness around data access and availability. “My Administration is
committed to creating an unprecedented level of openness in Government,” it reads.
Information maintained by the Federal Government is a national asset. My Administration will take appropriate action, consistent with law and policy, to disclose information
rapidly in forms that the public can readily find and use. Executive departments and
agencies should harness new technologies to put information about their operations
and decisions online and readily available to the public. (Obama 2009)
The idea is that citizens can take government data and analyze it to increase transparency and accountability. The Story Discovery Engine is an intentional system: its analysis
is presented with the intent of increasing government accountability. It is nonpartisan
software, but it proceeds from the assumption that there are problems in the social
system that need to be exposed through the available data.
Open data is often mentioned in conjunction with open source software tools.
Stacked Up was implemented using almost exclusively open source tools. It consists of
43,000 lines of code, all of which are available on an open source version control site
called GitHub. Just like the data it analyzes, the software is publicly available for anyone
to peruse and fact-check. This adds an extra layer of transparency to a transparency-producing activity.
It is worth mentioning at this point the relationship between software tools and
reporters’ productivity. Several Web-based tools have been developed to help journalists be more efficient at their investigative tasks. Tabula, for example, extracts tables from PDF files so that they can be analyzed as data. This matters because one of the most consistent points of conflict between reporters and officials is the
way that the officials provide information, which often arrives in formats that are not machine-readable. Entire books have been written about the
nuances of negotiating for access to public records (Cuillier 2011; Marburger 2011). A
successful tool for investigative journalism allows reporters to surmount common difficulties that interfere with reporting. Likewise, several data visualization tools have
become popular for exploring structured data. Putting census data into a data visualization
tool like Tableau, which displays maps, bubble charts, and other chart types, allows the
reporter to see patterns that would otherwise be invisible.
A small but growing subset of journalists is comfortable using data to enhance
their abilities to investigate stories. However, those reporters are limited to using the
number of data sets that they, or their newsroom team, can manage. Analyzing one
data set is usually enough for a story. Analyzing two or three data sets and turning
them into a story package requires a team that includes a programmer, designer,
writer, and editor (Domingo 2008; Parasie and Dagiral 2012; Royal 2010).
This is where big data comes in. The next frontier in investigative reporting is
using a computer to analyze multiple data sets at a time.
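Analyzing several data sets together usually starts with joining them on a shared identifier. A minimal sketch, assuming each data set is a list of records with a hypothetical `school_id` field and at most one record per school; real district files typically need cleaning before a join like this works.

```python
def join_on(key, *datasets):
    """Inner-join several data sets (lists of dicts) on a shared key.

    Assumes at most one record per key value in each data set; a field name
    that appears in more than one data set keeps the last value seen.
    """
    indexed = [{record[key]: record for record in ds} for ds in datasets]
    common = set(indexed[0]).intersection(*indexed[1:])
    merged = {}
    for value in sorted(common):
        row = {}
        for index in indexed:
            row.update(index[value])
        merged[value] = row
    return merged

# Hypothetical extracts from three separate district data sets
enrollment = [{"school_id": "101", "students": 420}, {"school_id": "102", "students": 310}]
inventory = [{"school_id": "101", "math_books": 380}, {"school_id": "102", "math_books": 310}]
scores = [{"school_id": "101", "pct_proficient": 38.0}, {"school_id": "103", "pct_proficient": 61.0}]

combined = join_on("school_id", enrollment, inventory, scores)
for school_id, row in combined.items():
    print(school_id, row)
```

Schools 102 and 103 drop out because each is missing from one file. Records that fail to match are worth checking by hand, since a school missing from one data set can itself be a lead.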
“Big data” means many things: lots of data (meaning a large quantity of data, as
in terabytes or yottabytes) or lots of different types of data (meaning a great number
of data sets) (boyd and Crawford 2012). Each is difficult in a newsroom. Newsrooms
tend to have minimal equipment (Domingo 2008), and it is hard to justify to an editor
why a reporter would need thousands of dollars’ worth of specialized equipment to
analyze terabytes of data. It is also hard to crunch a number of data sets in a newsroom because it requires computer-programming expertise. Reporters have to either
develop their own programming skills (which is difficult) or convince an editor to
devote in-house programming expertise to the project (which is also difficult, because
the few programmers in newsrooms tend to be overextended). Resource and personnel
shortages are practical reasons why big data analysis seldom happens in the
newsroom (Royal 2010).
A software system, properly implemented, can shortcut this long process and can
make more efficient use of limited newsroom developer resources. Stacked Up analyzes
15 data sets, which is more than a typical newsroom can handle given staffing and
time constraints. It took three developers six months to implement, which is more time
than can usually be devoted to a news development project. However, now that the
system architecture exists, the analysis can be replicated in other states or districts in a
matter of days or weeks, not months. The system is based on standardized data, whose
structure (as the name suggests) does not vary significantly from one district to another. This is consistent with the software
design principle of “write once, run anywhere.” Any newsroom can take the software,
analyze local data, and generate dozens of original investigative stories that matter to
the newsroom’s specific audience. The Story Discovery Engine is thus a tool for improving a reporter’s productivity in finding both original investigative ideas and sources.
Communication
The project derives from two significant theories about the future of news. The
first is the paradigm proposed by Remler, Waisanen, and Gabor (2013): that collaborative efforts between journalists, programmers, academics, and foundations provide
opportunities for innovation. Stacked Up was created out of a partnership between a
nonprofit journalism organization under the aegis of Temple University’s Center for
Public Interest Journalism (CPIJ) and me, an independent journalist and academic. CPIJ
founded the organization with funding from the William Penn Foundation and the
Wyncote Foundation. The team also looked at best practices developed and publicized
by data journalism organizations. Data teams at ProPublica, the Chicago Tribune, and
the Washington Post all maintain “nerd blogs” that they use to communicate methodology behind their data projects; methodologies are also discussed on Source, a data
blog maintained by the Mozilla Foundation.
The other significant theoretical concept behind Stacked Up is the notion of
accountability through algorithm. In “Accountability Through Algorithm: Developing the
Field of Computational Journalism,” Hamilton and Turner (2009) define computational
journalism (of which data journalism is a subset) as: “The combination of algorithms,
data, and knowledge from the social sciences to supplement the accountability
function of journalism.” They write that computational journalism has the potential to
help sustain watchdog reporting because it can “hold leaders accountable, unmask
malfeasance, and make visible critical social trends.”
Accountability through algorithm can mean reverse-engineering an algorithm to
discover how a company used it to influence the public (Diakopoulos 2013,
2014; Sweeney 2013) or it can mean designing an algorithm that is used to hold
decision-makers accountable. I employ the latter meaning.
Cognition
To understand the cognitive labor-saving dimension of the Story Discovery
Engine model, it is useful to consider the role of creativity in newsroom production.
Reporters use what López-Ortega (2013) calls “deliberate creativity” in order to create
original prose on deadline. Spontaneous creativity, or waiting for inspiration to strike,
does not allow reporters to meet the demands of the job. Reporters employ a set of
creative problem-solving strategies to generate ideas, create interview questions,
observe events, and synthesize this information into prose that conforms to the
appropriate publication style (Gans 2004; Tuchman 1978). Boden writes of the creative
process:
Creativity is a fundamental feature of human intelligence, and a challenge for AI [Artificial Intelligence]. AI techniques can be used to create new ideas in three ways: by producing novel combinations of familiar ideas; by exploring the potential of conceptual
spaces; and by making transformations that enable the generation of previously impossible ideas. (Boden 1998, 347)
Many human beings—including (for example) most professional scientists, artists, and
jazz-musicians—make a justly respected living out of exploratory creativity. That is, they
inherit an accepted style of thinking from their culture, and then search it, and perhaps
superficially tweak it, to explore its contents, boundaries, and potential. But human
beings sometimes transform the accepted conceptual space, by altering or removing
one (or more) of its dimensions, or by adding a new one. Such transformation enables
ideas to be generated which (relative to that conceptual space) were previously impossible. The more fundamental the transformation, and/or the more fundamental the
dimension that is transformed, the more different the newly-possible structures will be.
(Boden 1998, 348)
A computer interface can provide the “fundamental transformation” that Boden calls
for:
It can be said that deliberate creativity is facilitated by objective manipulation of a conceptual space. Also, the iterative process that triggers spontaneous creativity can be
promoted by computer programs that transform repeatedly interim creations, while a
creative subject judges their value. This iterative activity leads to preserve, change,
combine or erase parameters as thought convenient. Therefore, computer-assisted software must facilitate both, deliberate and spontaneous creativity. To do so, cognitive
processes associated to creativity, as well as their complex interplay, must be characterized properly and then a computational solution can be proposed and implemented.
(López-Ortega 2013, 3460)
A computer-assistance tool to enhance creativity must possess algorithms that help
computing divergent exploration. The outcome of divergent exploration must be
unique ideas. In this sense, a software tool must help overcoming the inherent limits
of the individual for producing divergent solutions. (López-Ortega 2013, 3461)
The Story Discovery Engine helps the individual overcome “inherent limits” because it
analyzes more data sets than an individual could process alone. It tests levels of meaning embedded in social rules: if we have ideals of equal access to education, and if we
have a public education system with standards, and if we have state-mandated assessments that measure how well students have met those standards, and if we have
teachers who are provided with the standards, and if we grant that objects (books or
other learning materials) are necessary to practice the material and concepts associated
with the standards: is this an equal system? If not, do we have enough money to make
it equal? If not, what do we do? The rules embedded in the expert system correspond
to the rules articulated in laws and public policies. Ordinarily, only a subject matter
expert would be able to render judgments about whether a scenario is within the law
or not. The Story Discovery Engine makes some of these decisions for the reporter,
freeing the reporter for higher-level creative thinking.
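The chain of conditions above is the kind of rule an expert system encodes. The paper does not publish the Stacked Up code, so what follows is only a minimal Python sketch under assumed data: a hypothetical `School` record holding enrollment by grade and counts of curriculum-approved books by subject and grade, plus an assumed rule that each enrolled student needs one approved book in each core subject. The subject list, field names, and the one-book-per-student rule are illustrative assumptions, not the District's actual policy or the engine's implementation.

```python
from __future__ import annotations

from dataclasses import dataclass

# ASSUMPTION: illustrative list of core subjects; a real engine would load
# these from the district's published curriculum guidelines.
REQUIRED_SUBJECTS = ("math", "reading", "science", "social studies")


@dataclass
class School:
    """Hypothetical record for one school; field names are illustrative."""
    name: str
    enrollment: dict[int, int]           # grade -> number of enrolled students
    books: dict[tuple[str, int], int]    # (subject, grade) -> copies of approved books


def find_gaps(school: School) -> list[tuple[str, int, int, int]]:
    """Apply the assumed rule: one approved book per student per core subject.

    Returns (subject, grade, students, copies) for every shortfall found.
    """
    gaps = []
    for grade, students in sorted(school.enrollment.items()):
        for subject in REQUIRED_SUBJECTS:
            copies = school.books.get((subject, grade), 0)  # missing entry = no books
            if copies < students:
                gaps.append((subject, grade, students, copies))
    return gaps


def has_enough_books(school: School) -> bool:
    """A school passes only if no required subject/grade pair falls short."""
    return not find_gaps(school)


# Usage with made-up numbers for a single hypothetical third grade.
example = School(
    name="Hypothetical Elementary",
    enrollment={3: 60},
    books={("math", 3): 60, ("reading", 3): 45, ("science", 3): 60},
)
for subject, grade, students, copies in find_gaps(example):
    print(f"{example.name}, grade {grade} {subject}: {students} students, {copies} books")
```

In this invented example the rule flags third-grade reading (45 books for 60 students) and social studies (no books recorded). A flagged gap is a starting point for reporting rather than a conclusion: the books may exist but sit uncatalogued somewhere, so the reporter still has to check.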
Findings and Implications for Further Research
I theorized that the Story Discovery Engine model could accelerate the production of ideas and stories on a public affairs beat. I prototyped the software and used it
to report on a specific beat. The successful implementation of the project suggests the
Story Discovery Engine model as a valid option for creating impactful news.
The following were among the project’s findings:
- Only a handful of Philadelphia schools seem to have enough books and learning materials to teach students adequately under the district’s academic guidelines.
- At least 10 schools appear to have no books at all, others seem to have books that are wildly out of date, and some seem to have only the books that fit the curriculum guidelines established by a chief academic officer who left the district years ago.
- Despite investing in custom software to track its textbook inventory, the District did not require any of its employees to use the software.
- The District spent $111 million on textbooks between 2008 and 2013. Its inventory showed more than a million books. Nobody knew where they were; boxes and boxes of books lay unused and un-catalogued in the basement at District headquarters.
- The District published a recommended core curriculum, but did not know if any of its schools were using it. There was no systematic way to determine whether struggling schools had the books and resources they needed for student success.
These findings, once published, were shared extensively on social media and
prompted a number of changes at the School District of Philadelphia. Outcomes in subsequent weeks included:
- One highly paid administrator was found to be responsible for a number of textbook tracking failures. That administrator retired.
- An internal investigation revealed that several school principals were buying textbooks from sales representatives with whom they had personal relationships instead of buying the textbooks recommended by the central administration. Some of the reps were former school principals. This practice was eliminated and cost savings were achieved (Jessica Diaz, personal communication 2013).
- The School District of Philadelphia closed 24 schools at the end of the 2012–2013 school year, displacing approximately 4000 students. Originally, the District planned to send all the books from the closing schools to the schools that were slated to receive the students. Instead, the District collected all of the books from the closing schools at a central location. An attempt was made to organize the books and reallocate them judiciously.
- An audit was performed so that the central administration was made aware of the curriculum officially in use at each school.
- Several local news organizations picked up the investigative stories and re-published them on their own websites, amplifying the audience for the stories.
This modest impact suggests that the reporting could be duplicated in other
large cities like Philadelphia, all of which struggle with similar logistical issues around
public education resources.
The Story Discovery Engine model also solves a particular logistical issue that
newsrooms struggle with. A newsroom depends on specialized labor. The writers are
good at writing, the editors are good at editing, the Web producers are good at the
nuances of the content management system, and the programmers are good at writing
programs. It makes sense to have the programmers write the code that teases out the
facts the reporters need to write stories. Getting the reporters to write high-level code
is less practical. However, few newsrooms have the staff that would be required to
write high-level code (McChesney 2012; Parasie and Dagiral 2012; Royal 2010). Writing
code is difficult. Royal writes that the more experience a reporter has, the more they
tend to appreciate the complexity of data journalism:
Experience is correlated to the perceived level of difficulty of working with data journalism for journalists in general. In this case, the more experience the journalist has,
the more likely he or she is to agree that data journalism is difficult for most journalists. This might indicate that the journalists with some or extensive data journalism
experience tend to value this expertise as unique and a skill that not everyone can
master. (Royal 2010)
Despite the enthusiasm for data journalism, the logistics of performing data journalism
have proved formidable for many news organizations.
Creating a Story Discovery Engine for a metropolitan area, then opening it up to
the public, allows more people to leverage the code to write stories. The engine could
also be implemented by a foundation and opened up to the public; the local press
could use it to write stories without having to fund the development or hire and manage a software staff.
A number of story prompts arose over the course of reporting for Stacked Up.
Any of them could be used to write education beat stories in any
district in the United States. Some prompts include:
- Some schools with active home–school associations fundraise for basic school supplies
like paper. Use social media to find a school that is fundraising for books or paper.
Use Stacked Up to check whether the school seems to have enough books.
Explore a few scenarios:
- The school may be trying something new and interesting with its curriculum,
and the home–school association is trying to raise money to support it.
- The school was not allocated enough money to buy books for its students.
- The school was allocated enough money for books, but the money went to
something else.
- Additional scenarios not mentioned here.
- Use Stacked Up to find a school that seems not to have books. Arrange a visit and
ask to see the book storage room. Are any of the “missing” books sitting in the storage
area? If so, why?
- A school is known to have a one-to-one laptop program where each student receives
a school-issued laptop. The school still uses printed textbooks in addition to the laptops, but uses fewer textbooks. What happened to the books that were in the school
when the laptop program began? Were they redistributed to other students? If not,
where did they go?
- Every time state education standards change, every school needs to buy new books
to match the new standards. When did your state last update its standards?
- Who were the politicians on the committee that made the standards change? Is there
anything intriguing in their campaign donations?
- Districts have guidelines for how long textbooks should stay in use. Generally, a textbook lasts about five years. What happens to books after they are used for five years?
Are they recycled, or is there a depository?
- In Detroit, the book depository became a dumping ground (Dawsey 2008; Griffioen
2008). What is happening to old books in your city?
- When schools do not have enough books, teachers often compensate by making photocopies. Find a school that lacks books, and check how much they spend on photocopies. Is this an efficient economic choice?
- Some schools claim they have replaced print textbooks with digital textbooks. Digital
textbooks are password-protected. People regularly lose passwords and get locked out
of password-protected systems. Are kids and parents able to get to the digital textbooks when they need them?
- Use Stacked Up to find a school that is using social studies textbooks that are more
than five or eight years old. How do they teach civics or social studies with books that
do not include the name of the current US President?
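Several of these prompts (textbook lifespans, outdated social studies books) reduce to filtering an inventory by age. As a hedged sketch only, assuming a hypothetical inventory table that records a copyright year for each title (the paper does not describe Stacked Up's schema), a Python filter might look like this. The five-year default mirrors the rough textbook lifespan mentioned above; the school names, titles, and years in the usage example are invented.

```python
from typing import NamedTuple, Optional


class InventoryRow(NamedTuple):
    """One line of a hypothetical textbook inventory (schema is assumed)."""
    school: str
    subject: str
    title: str
    copyright_year: int


def outdated(rows: list, current_year: int,
             max_age: int = 5, subject: Optional[str] = None) -> list:
    """Return rows older than max_age years, optionally limited to one subject."""
    return [
        row for row in rows
        if (subject is None or row.subject == subject)
        and current_year - row.copyright_year > max_age
    ]


# Invented sample data for illustration only.
inventory = [
    InventoryRow("School A", "social studies", "Our Nation", 2003),
    InventoryRow("School A", "math", "Math Today", 2011),
    InventoryRow("School B", "social studies", "Civics Now", 2010),
]

# Schools holding social studies books more than five, then more than eight, years old.
for limit in (5, 8):
    stale = outdated(inventory, current_year=2013, max_age=limit, subject="social studies")
    print(limit, sorted({row.school for row in stale}))
```

Making `max_age` a parameter lets a reporter try both thresholds the prompt mentions without changing the code.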
These 10 ideas took me about 30 minutes to generate. Each of them could probably result in a series of at least three stories, plus two follow-up stories based on the
school district’s reaction. That is 50 original investigative stories, an entire year’s worth
of stories for a reporter writing one story a week. An interested reader will probably
generate additional questions while reading the story prompts; each of those questions
might produce five original investigative stories as well. The potential pool of story
ideas could multiply if an entire newsroom of people practiced deliberate
creativity.
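The estimate in the preceding paragraph can be written out explicitly. Using its own assumptions (a series of three stories plus two follow-ups, so five stories per prompt, and one story a week), the story pool S grows linearly with the number of prompts n:

```latex
S = n \times (3 + 2) = 5n, \qquad
S\big|_{n=10} = 5 \times 10 = 50 \text{ stories}
\;\Rightarrow\; 50 \text{ weeks at one story per week} \approx 1 \text{ year}
```

Each additional reader question that yields a five-story series raises n by one and S by five, so the pool grows with the number of people generating prompts.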
Having a virtual fountain of story ideas is especially useful for the modern newsroom, where online publishing means that reporters and editors need to “feed the
beast” almost constantly. Writing only one story a week is a luxury in today’s marketplace, especially at online publications where writers are urged to publish multiple stories a day and editors may edit 30–40 stories a week (June 2013; Peters 2010).
High-impact investigative stories can take a tremendous amount of time to conceive and report, a timeline that is the opposite of the current market imperative. A
software tool to accelerate the investigative process can add significant value to the
newsroom.
NOTE
1.
Books such as The Investigative Reporter’s Handbook (Houston and Investigative
Reporters and Editors, Inc. 2009) offer readers a set of places to look for stories
inside different beats such as education, transportation, or nonprofits. Likewise,
Investigative Reporters and Editors, Inc., the nonprofit formed in 1975 to help
“improve the quality of investigative reporting,” focuses significant educational
efforts on strategies to help reporters find story ideas: a February 2014 electronic
search of the Investigative Reporters and Editors library returned 127 tipsheets for
the search query “investigative story ideas.”
REFERENCES
Appelgren, Ester, and Gunnar Nygren. 2014, February. “Data Journalism in Sweden: Introducing New Methods and Genres of Journalism into ‘Old’ Organizations.” Digital Journalism: 1–12. doi:10.1080/21670811.2014.884344.
Benfer, Robert Alfred. 1991. Expert Systems. Sage University Papers Series, no. 07-077. Newbury
Park, Calif: Sage. http://dx.doi.org/10.4135/9781412984225.
Boden, Margaret A. 1998. “Creativity and Artificial Intelligence.” Artificial Intelligence 103:
347–356.
boyd, danah, and Kate Crawford. 2012. “Critical Questions for Big Data: Provocations for a
Cultural, Technological and Scholarly Phenomenon.” Information, Communication &
Society 15 (5): 662–679. doi:10.1080/1369118X.2012.678878.
Buchanan, Bruce G. 1985. “Expert Systems.” Journal of Automated Reasoning 1 (1): 28–35.
Cuillier, David. 2011. The Art of Access: Strategies for Acquiring Public Records. Washington,
DC: CQ Press.
Dawsey, Chastity Pratt. 2008. “Unsecured Schools given up to Thieves, Vandals.” Detroit Free
Press, April 4. http://www.freep.com/apps/pbcs.dll/article?AID=/20080404/NEWS01/804040302.
Department of Human Services, City of Philadelphia. 2012. 2011 Annual Report. Annual
Report. http://www.phila.gov/dhs/pdfs/DHS%20Annual%20report.pdf.
Department of Human Services, City of Philadelphia. 2014. “Children and Youth Division Home
Page.” http://dhs.phila.gov/intranet/pgintrahome_pub.nsf/content/cydhomepage.
Diakopoulos, Nicholas. 2013. “Rage against the Algorithms.” The Atlantic, October 3.
http://www.theatlantic.com/technology/archive/2013/10/rage-against-the-algorithms/280255/.
Diakopoulos, Nicholas. 2014. “Algorithmic Accountability Reporting: On the Investigation of
Black Boxes.” Tow Center for Digital Journalism at Columbia University.
http://towcenter.org/wp-content/uploads/2014/02/78524_Tow-Center-Report-WEB-1.pdf.
Diaz, Jessica. 2013. Personal Communication.
Dick, Murray. 2013, September. “Interactive Infographics and News Values.” Digital Journalism: 1–17. doi:10.1080/21670811.2013.841368.
Domingo, David. 2008. “Interactivity in the Daily Routines of Online Newsrooms: Dealing
with an Uncomfortable Myth.” Journal of Computer-Mediated Communication 13 (3):
680–704. doi:10.1111/j.1083-6101.2008.00415.x.
Feigenbaum, E. A. 1977. “The Art of Artificial Intelligence: Themes and Case Studies of Knowledge Engineering.” Proceedings IJCAI 5. Cambridge, MA.
Flaounas, Ilias, Omar Ali, Thomas Lansdall-Welfare, Tijl De Bie, Nick Mosdell, Justin Lewis, and
Nello Cristianini. 2013. “Research Methods in the Age of Digital Journalism: Massive-Scale Automated Analysis of News-Content—Topics, Style and Gender.” Digital
Journalism 1 (1): 102–116. doi:10.1080/21670811.2012.714928.
Flew, Terry, Christina Spurgeon, Anna Daniel, and Adam Swift. 2012. “The Promise of Computational Journalism.” Journalism Practice 6 (2): 157–171. doi:10.1080/17512786.2011.616655.
Gans, Herbert J. 2004. Deciding What’s News: A Study of CBS Evening News, NBC Nightly News,
Newsweek and Time. Visions of the American Press. Evanston, Ill:
Northwestern University Press.
Griffioen, James D. 2008. “The Knowledge of What Happened and What Will.” Sweet Juniper.
http://www.sweet-juniper.com/2008/04/knowledge-of-what-happened-and-what.html.
Grimmer, Justin, and Brandon M. Stewart. 2013. “Text as Data: The Promise and Pitfalls of
Automatic Content Analysis Methods for Political Texts.” Political Analysis 21 (3):
267–297. doi:10.1093/pan/mps028.
Hamilton, James T., and Fred Turner. 2009. Accountability through Algorithm: Developing the
Field of Computational Journalism. Center for Advanced Study in the Behavioral Sciences
Summer Workshop: Stanford University.
http://www.stanford.edu/~fturner/Hamilton%20Turner%20Acc%20by%20Alg%20Final.pdf.
Hansen, Kathleen A. 1991. “Source Diversity and Newspaper Enterprise Journalism.” Journalism
& Mass Communication Quarterly 68 (3): 474–482. doi:10.1177/107769909106800318.
Houston, Brant and Investigative Reporters and Editors, Inc. 2009. The Investigative Reporter’s
Handbook: A Guide to Documents, Databases and Techniques. 5th ed., edited by Brant
Houston, Investigative Reporters and Editors, Inc. Boston, MA: Bedford/St. Martin’s.
Howard, Alexander Benjamin. 2014. The Art & Science of Data-Driven Journalism. Tow/Knight
Reports. Tow Center for Digital Journalism: Columbia University.
http://towcenter.org/wp-content/uploads/2014/05/Tow-Center-Data-Driven-Journalism.pdf.
June, Laura. 2013. “Maura Johnston on Why She Opened Her iPad-Only Magazine to the
Web.” The Verge, July 10.
http://www.theverge.com/2013/7/10/4506824/maura-johnston-on-why-she-opened-her-ipad-only-magazine-to-the-web.
Labbé, Theola, and V. Dion Haynes. 2007. “Rhee Blasts Textbook Process for Letting Supplies
Languish.” The Washington Post, August 4.
http://www.washingtonpost.com/wp-dyn/content/article/2007/08/03/AR2007080302134_pf.html.
López-Ortega, Omar. 2013. “Computer-Assisted Creativity: Emulation of Cognitive Processes
on a Multi-Agent System.” Expert Systems with Applications 40 (9): 3459–3470.
doi:10.1016/j.eswa.2012.12.054.
Marburger, David. 2011. Access with Attitude: An Advocate’s Guide to Freedom of Information
in Ohio. Athens: Ohio University Press.
McChesney, Robert W. 2012. “Farewell to Journalism?: Time for a Rethinking.” Journalism
Practice 6 (5–6): 614–626. doi:10.1080/17512786.2012.683273.
Meyer, Philip. 2002. Precision Journalism: A Reporter’s Introduction to Social Science Methods.
4th ed. Lanham, Md: Rowman & Littlefield.
Obama, Barack. 2009. “Memorandum for the Heads of Executive Departments and Agencies
Re: Transparency and Open Government.” Federal Register.
http://www.whitehouse.gov/the_press_office/TransparencyandOpenGovernment.
Parasie, S., and E. Dagiral. 2012. “Data-Driven Journalism and the Public Good: ‘Computer-Assisted-Reporters’ and ‘Programmer-Journalists’ in Chicago.” New Media & Society 15
(6): 853–871. doi:10.1177/1461444812463345.
Pavlik, John V. 2013. “Innovation and the Future of Journalism.” Digital Journalism 1 (2):
181–193. doi:10.1080/21670811.2012.756666.
Peters, Jeremy W. 2010. “In a World of Online News, Burnout Starts Younger.” The New York
Times, July 18. http://www.nytimes.com/2010/07/19/business/media/19press.html.
Protess, David. 1991. The Journalism of Outrage: Investigative Reporting and Agenda Building
in America. New York: Guilford Press.
Remler, Dahlia K., Don J. Waisanen, and Andrea Gabor. 2013. “Academic Journalism: A Modest
Proposal.” Journalism Studies, August, 1–17. doi:10.1080/1461670X.2013.821321.
Royal, Cindy. 2010. “The Journalist as Programmer: A Case Study of the New York Times
Interactive News Technology Department.” The University of Texas at Austin.
https://online.journalism.utexas.edu/2010/papers/Royal10.pdf.
Scribner, S., and M. Cole. 1973. “Cognitive Consequences of Formal and Informal Education:
New Accommodations Are Needed between School-Based Learning and Learning
Experiences of Everyday Life.” Science 182 (4112): 553–559. doi:10.1126/science.182.4112.553.
Sternberg, Robert J., ed. 1999. Handbook of Creativity. Cambridge, UK; New York:
Cambridge University Press.
Sweeney, Latanya. 2013. “Discrimination in Online Ad Delivery.” Communications of the ACM
56 (5): 44. doi:10.1145/2447976.2447990.
Tuchman, Gaye. 1978. Making News: A Study in the Construction of Reality. New York: Free
Press.
Meredith Broussard, Department of Journalism, Temple University, USA. E-mail:
merbroussard@temple.edu. Web: http://meredithbroussard.com