Artificial Intelligence for Investigative Reporting

Meredith Broussard

2014, Digital Journalism

ARTIFICIAL INTELLIGENCE FOR INVESTIGATIVE REPORTING
Using an expert system to enhance journalists’ ability to discover original public affairs stories

Meredith Broussard

This paper describes an artificial intelligence-based software system that augments public affairs reporters’ ability to sort through data and identify investigative storytelling opportunities. A prototype of the model was developed and was used to analyze education data. The successful prototype and the social impact of the stories derived from the prototype suggest this approach as a valid option for newsrooms that seek to tell more compelling, data-rich stories about public affairs issues.

KEYWORDS artificial intelligence; computational journalism; data journalism; expert systems; innovation; public affairs journalism

Introduction

“Readers don’t care about bureaucracy,” one of my colleagues tells her students on the first day of her public affairs journalism class. “To make people care about public affairs, you have to tell a story that taps into our shared humanity.” The work of telling routine public affairs stories becomes second nature to a beat reporter. But for an investigative reporter, storytelling requires an additional layer of cognitive complexity. The investigative reporter must come up with an original idea, which is a creative act, and must then find sources and turn the idea into a narrative. Ideas are easy to generate. Original ideas are much harder. Original ideas that can turn into successful investigative stories are even more difficult to create. Once the idea exists, the timeline is uncertain: investigative stories can take a very long time to conceive and report. Many of today’s economically challenged newsrooms do not feel they can afford such a luxury.
While a computer cannot generate original story ideas, computational methods for accelerating human creativity offer a possible solution for newsrooms seeking to amplify their investigative reporting capacity. This paper describes a model for leveraging artificial intelligence to accelerate the process of discovering investigative ideas on public affairs beats such as education, transportation, or campaign finance.

The model, which I call the “Story Discovery Engine,” derives from a type of artificial intelligence software called an expert system. In this paper, I explain how the engine works to facilitate the discovery of investigative ideas. First, I outline the conceptual process involved in generating new investigative story ideas. I describe expert systems, outline one of the logical rules embedded in the software, and show the difference between a classical expert system and the Story Discovery Engine. I demonstrate how I tested the system by developing a prototype of the Story Discovery Engine that analyzed education data from the School District of Philadelphia, the eighth-largest school district in the United States. That prototype was published online as a project called “Stacked Up.” Stacked Up consists of a set of investigative stories as well as a reporting tool made of dynamic, customizable data visualizations inside a narrative framework. I summarize the investigative stories that were produced from the reporting tool and the policy changes that resulted. The implementation and resulting investigative news stories, plus the project’s impact, suggest that the Story Discovery Engine model can add value to investigative reporting.

Digital Journalism, 2014. http://dx.doi.org/10.1080/21670811.2014.985497 © 2014 Taylor & Francis

Creativity and Investigative Story Ideas

Social scientists have engaged with the notion of investigative reporting as a cultural construction produced inside a particular organizational culture (Gans 2004; Tuchman 1978). For the purposes of this paper, investigative reporting is defined as a type of enterprise journalism that is produced over time, outside of the day-to-day deadline crunch, and includes diverse sources (Hansen 1991). The cognitive process of coming up with an original investigative story idea is a creative act under Sternberg’s (1999) definition of creativity as the production of work that is both novel (as in original) and appropriate (as in useful).

Experienced investigative reporters build up a set of strategies for finding story ideas, but novice investigative reporters often struggle to find opportunities for novel enterprise stories. Training and education materials for novice investigators focus on places to look for stories: follow the money, look at specific lines on financial filings, and so on.1 The complexity of the process is part of the reason that so much investigative journalism is reactive, resulting from a tip from a whistle blower, rather than proactive (Protess 1991).

Journalism innovation theorists have suggested that tremendous possibilities exist in analyzing data to find investigative ideas (Appelgren and Nygren 2014; Dick 2013; Flaounas et al. 2013; Pavlik 2013). Hamilton and Turner (2009) write that the future of watchdog journalism may be found in using algorithms (precisely defined problem-solving procedures) for accountability:

The best algorithms will essentially be public interest data mining. They will generate leads, hunches, and anomalies to investigate. It will remain for reporters and others interested in government performance to take the next step of tracking down the story behind the data pattern.
Tracking down a story in data requires specialized technical skills (to do the data-crunching) as well as journalistic expertise (to refine the story idea and craft appropriate prose). These skills until recently have tended to be segregated into different job categories and experience levels. A novice reporter might have sufficient technical skills to use pivot tables in a spreadsheet, for example, but might not have sufficient job experience to know that pivot table analysis could be applied to monthly government data releases on a particular beat. The promise of computational journalism is that such walls would be broken down through collaboration and training (Flew et al. 2012). A successful computational journalism project might thus be described as one that uses computational thinking to bridge a knowledge gap.

This knowledge gap between the experienced and the novice reporter involves two types of knowledge: formal and informal (Scribner and Cole 1973). Formal knowledge includes rules of a system, as in knowing the rules of English grammar. An experienced education reporter has formal knowledge of his or her state’s laws and policies around education. Informal knowledge includes domain expertise and rules of thumb based on experience. Informal knowledge for an experienced investigative reporter might include a rule of thumb like this: If you have a natural disaster like Hurricane Sandy, and there is a big pool of money for hurricane relief, some of those funds will be misused; after a natural disaster, always follow up and find out where things went wrong with the government funds, and you’ll find a story. To come up with ideas the way an experienced reporter would, the novice reporter needs the informal knowledge that the experienced reporter has about where to find stories plus some of the formal knowledge about education policies.

Origin of the Project

In 2011, I found myself staring into exactly this type of knowledge gap. I was an experienced reporter, but not on the public affairs beat. I wanted to investigate a question in education: do Philadelphia public school children have enough books to learn the material on the state-mandated standardized tests? I had data, I had methods, but I did not have contacts. I wanted to talk to parents, teachers, and students at the city’s best schools, and the city’s worst schools, and see if there was a difference in the students’ access to books. To do that, I needed to figure out which were the best schools, and which were the worst schools; I also needed to find people to talk to at each. There were more than 200 schools. The task was daunting. Educational data is abundant, but the specific analysis I wanted had not been done before. It also involved numerous interdependencies and micro-judgments.

To investigate the story I wanted to write, I turned to data journalism. Data journalism is the practice of finding stories in numbers, and using numbers to tell stories (Broussard, quoted in Howard 2014). It is an evolving practice (Appelgren and Nygren 2014) that may also be called data-driven journalism or computational journalism.

Public affairs reporting is particularly suited to data journalism, and specifically expert system analysis, because public affairs reporting depends on interpreting the rules of a local system. An education beat reporter must be familiar with a dizzying array of laws and policies at the federal and state level. Fortunately, these laws and policies are articulated in text-based rules that are easily available online. The government uses data to track the success of its programs, and that data is frequently published online. Other data sets are available to reporters and citizens under the Freedom of Information Act of 1966.
Clearly articulated rules in the real world can be translated easily into computer logic rules. Applying the rules to the data allows the computational “intelligence” to uncover social problems. Thus, the first step was creating a software system that would do some of the necessary investigative thinking for me. Embedding formal and informal knowledge into the software would allow me (or any other reporter) to use the software as a reporting tool to refine story ideas and more efficiently find sources.

This is the essence of the Story Discovery Engine. It is possible to take some of the experienced reporter’s knowledge, abstract out the high-level rules or heuristics, and implement these rules inside an expert system in the form of database logic. The data about the real world is fed in, the logical rules are applied, and the system presents the reporter with a visual representation of a specific site within the system.

The Prototype

An investigation often arises when a reporter perceives a difference between what is (the observed reality) and what should be (as articulated in law or policy). A high-impact investigative story looks at a situation where what is differs from what should be, and explains why. The reader can then use the narrative to create or enact a path to remedy the situation.

The idea for Stacked Up arose from just such a difference. “The school is terrific,” my neighbor said of her daughters’ public school, considered one of the best in the city. “But if you’re a parent there, you have to be prepared to do a lot of fundraising for basic things like textbooks.” A few years later, I noticed that I was getting the same email at the beginning of every semester from the students in my college classes: it said that the student was very sorry, but he or she could not do the homework because the course books had not yet arrived in the mail. Those students always seemed to be the students who received the lowest grades at the end of the semester.
It made sense: they could not do the work required to pass the class if they did not have the books. I wondered: could book shortages be a factor in Philadelphia public schools’ consistently low standardized test scores? (Many parents do not have the resources to fundraise to get books for a school; my neighbor is an outlier, as are many of the other parents at that particular school.) The District currently has 131,262 students in grades pre-kindergarten through 12, 87.3 percent of whom are economically disadvantaged. This is a significant issue because even if parents at each school fundraised, they might not be able to raise enough money to buy all of the books needed.

Most people would be surprised at the idea that a public school would not have enough books. After all, Pennsylvania law specifically says that the state provides books. In Philadelphia, however, students and parents regularly complain of textbook shortages. A 10th grader at Parkway West High School told me that students often have to share books in class and cannot take them home to do homework. Many books are in poor condition: “There were pictures of testicles drawn on every page,” she said of one of her ninth-grade books. The logistical challenges of getting multiple books to hundreds of thousands of students at hundreds of schools overwhelm many major school districts (Labbé and Haynes 2007).

Access to books is particularly critical because a school today is labeled a success or failure based on students’ performance on high-stakes tests. The tests are highly specific and are aligned with state educational standards. The tests are also aligned with the textbooks sold by the three educational publishers that dominate the educational publishing market. These same publishers design and grade the standardized tests.
It therefore stands to reason that if students do not have the right textbooks, they will not be able to do well on the tests even if they want to.

Answering the question whether a single school has enough books is complex because each student in each grade studies at least four subjects every year. Asking if there are enough books in an entire school district is a massive task. With more than 200 schools, the School District of Philadelphia is the eighth largest school district in the country. Many of the schools have high student turnover because students switch schools as they navigate the child welfare or juvenile justice systems (Department of Human Services, City of Philadelphia 2012). The Children and Youth Division of the Philadelphia Department of Human Services serves an estimated 20,000 children and their families each year (Department of Human Services, City of Philadelphia 2014).

This background helped to pose what became the central research question: are enough books available for Philadelphia students to allow them to prepare adequately for state-mandated standardized tests? I designed an algorithm and a database architecture that would let me calculate the answer to my investigative question. The algorithm is designed to check whether students are provided with the materials specified in the rules of the educational system. If they are not, there is likely to be a violation, and there is probably an opportunity for a story.

Implementing the Prototype

The Story Discovery Engine prototype launched online as a project called “Stacked Up.” It has two parts: it is both a reporting tool and a presentation system for the stories I wrote using the reporting tool. The presentation system provides the user with a set of investigative stories and some explanatory text about the project (see Figure 1). The reporting tool is a set of dynamic data visualizations that allowed me to write the investigative stories.
The statistics and data that supported each story were original, derived from the data analysis resulting from the algorithm that forms the backbone of the project.

In the reporting tool view, the reporter sees a page representing a single school. The page shows different types of data, organized so that specific types of investigative questions can be easily answered (see Figure 2). Some such questions include: How many students are in each grade in this school? Where is the school located in the city? How do this school’s test results compare to the rest of the district? Do there seem to be enough books for the students enrolled?

The system design anticipates the data points that a reporter needs to write a data-rich story and presents them in a centralized, easy-to-navigate format. The reporter leverages their domain expertise, clicks around to adjust some what-ifs to prompt the creative process, and comes up with a story idea. Because the story idea is targeted, it immediately becomes easier to identify appropriate sources.

FIGURE 1 Presentation system and reporting tool shown on project home page

FIGURE 2 Reporting tool view

The key is that the software does not try to solve a problem faced by all journalists on every beat. It tries to solve a specific problem on a specific beat, and in the process creates a way to solve other problems on that same beat. The Story Discovery Engine prototype was created and applied to education data, but the model can easily be applied to other beats as well.

A list of the rules used in the system is beyond the scope of this paper, as is a depiction of the object model used to represent relationships between the entities involved; however, additional technical details are available by request.
For the sake of description, however, one of the rules could be explained as follows:

Core_subjects = math, reading, social studies, science.
School_curriculum = a curriculum package published by a major educational publisher (e.g., “Everyday Math”).
Necessary_material = the minimum books or workbooks necessary to teach the school’s curriculum package. This often means two items: a textbook and workbook.

For each school in School_District
    For each grade in school
        For each Core_subject
            For each Necessary_material in School_curriculum
                If NumberOf(students_in_grade) = NumberOf(necessary_material)
                    Then Enough_materials = yes
                    Else Enough_materials = no.

Once the prototype existed, I looked at the data analysis and interviewed people to validate the findings. I developed hypotheses, reported them out, revised the hypotheses, and considered story formats as part of a months-long process. As predicted, the data revealed multiple potential stories about how books were “stacked up” in Philadelphia city schools.

Theoretical Background

The Story Discovery Engine draws on adjacent, occasionally overlapping concepts from the fields of communication, cognition, and computation. I will explain each in turn and how it relates to the Story Discovery Engine. These fields are not generally placed in dialogue with each other, but there are enormous productive possibilities if they are put together in conversation.

Computation

The Story Discovery Engine software belongs to a class of artificial intelligence programs called knowledge-based expert systems. Benfer offers an excellent definition:

Expert systems are computer programs that perform sophisticated tasks once thought possible only for human experts.
If performance were the sole criterion for labelling a program an expert system, however, many decision support systems, statistical analyses, and spreadsheet programs could be called expert systems. Instead, the term “expert system” is generally reserved for systems that achieve expert-level performance, using artificial intelligence programming techniques such as symbolic representation, inference, and heuristic search (Buchanan 1985). Knowledge-based systems can be distinguished from other branches of artificial intelligence research by their emphasis on domain-specific knowledge, rather than more general problem-solving strategies. Because their strength derives from such domain-specific knowledge rather than more general problem-solving strategies (Feigenbaum 1977), expert systems are often called “knowledge-based.” Since the knowledge of experts tends to be domain-specific rather than general, most expert systems representing this knowledge reflect the specialized nature of such expertise. (Benfer 1991, 4) Benfer argues that expert systems can provide an important mechanism for prompting new social science thinking, and expert system developers can learn from social scientists’ rigorous methods of data collection and validation. He was the first to deploy an expert system in journalism: MUckraker, an expert system under development by New Directions in News and the Investigative Reporters and Editors Association at Missouri University, is a program to advise investigative reporters on how to approach people for interviews, how to prepare for those interviews, and how to examine a wide range of public documents in the conduct of an investigation. This program is designed to act much as an expert investigative reporter might, advising the user on strategies to try when sources are reluctant to be interviewed, pointing out documents that might be relevant to the investigation, and advising the user on how to organize his or her work. 
(Benfer 1991, 4)

Under the expert system model Benfer describes, the expert system would deliver to the reporter “advice” about whether the quantity of books in a school would be the appropriate basis for a story. The innovation in the Story Discovery Engine is that instead of advice, the expert system delivers an interactive data visualization. The data visualization is specifically designed to answer the most common questions a reporter might ask in order to assess whether a story might be found at a particular school. I decided that using the human reporter’s judgment was more efficient than a computer’s for assessing newsworthiness in this case because the system is designed to be used in the deadline-driven, time-sensitive environment of a newsroom.

The notion that computer-based quantitative methods should augment humans, not replace them, is one of the principles of automated text analysis put forward by Grimmer and Stewart (2013) in their analysis of possible pitfalls in automated content analysis. In recent years, communication scholars have frequently used the human workers who participate in Amazon’s Mechanical Turk in order to code content in large data sets. In the Story Discovery Engine model, the reporter is a similarly essential part of the system (see Figure 3). Using the vast “computational” resources of the human brain, the reporter takes only moments to look at the data revealed by the system, leverage formal and informal knowledge, and make a judgment about the likelihood of a story. It would require vast amounts of computing power to get the computer to draw the same conclusions; also, it would take years to tease out all of the subtleties of human news judgment and implement them computationally. The human brain thus becomes an efficient part of the story-generating process, aided and augmented by the computational system.
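
As a rough illustration of the book-count rule described earlier, and of the choice to hand the reporter evidence rather than advice, the rule might be sketched in Python as follows. This is a minimal sketch and not the Stacked Up code: the names `GradeRecord` and `check_school`, the data layout, and the sample numbers are all hypothetical. The sketch also assumes that a surplus of materials counts as enough (`>=`), whereas the rule as stated above uses strict equality.

```python
from dataclasses import dataclass

# Core subjects as named in the rule above.
CORE_SUBJECTS = ("math", "reading", "social studies", "science")


@dataclass
class GradeRecord:
    """One grade in one school (hypothetical layout)."""
    grade: str
    students: int
    # materials[subject][item] = number of copies on hand
    materials: dict


def check_school(name, grades, curriculum):
    """Return one row per (grade, subject, item) for a reporter to inspect.

    curriculum maps each subject to the list of necessary items,
    for example {"math": ["textbook", "workbook"]}.
    """
    rows = []
    for g in grades:
        for subject in CORE_SUBJECTS:
            for item in curriculum.get(subject, []):
                on_hand = g.materials.get(subject, {}).get(item, 0)
                rows.append({
                    "school": name,
                    "grade": g.grade,
                    "subject": subject,
                    "item": item,
                    "students": g.students,
                    "on_hand": on_hand,
                    # Assumption: a surplus also counts as enough.
                    "enough": on_hand >= g.students,
                })
    return rows


# Invented sample data for a single grade and subject.
curriculum = {"math": ["textbook", "workbook"]}
grades = [GradeRecord("3", 60, {"math": {"textbook": 60, "workbook": 41}})]

rows = check_school("Example Elementary", grades, curriculum)
shortfalls = [r for r in rows if not r["enough"]]
for r in shortfalls:
    print(f"{r['school']} grade {r['grade']} {r['subject']} {r['item']}: "
          f"{r['on_hand']} for {r['students']} students")
```

The function returns rows of counts rather than a verdict, which leaves the judgment about newsworthiness to the reporter, in keeping with the design described above.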
It is significant that Benfer used social science methods in crafting an expert system for journalism. Social science thinking is at the heart of what today we call data journalism. Meyer pioneered the application of social science methods to journalism in his 1967 Pulitzer Prize-winning story about race riots in Detroit; those methods were later codified in Precision Journalism: A Reporter’s Introduction to Social Science Methods (Meyer 2002). Precision journalism methods informed computer-assisted reporting, which flourished in the 1980s with the advent of desktop computers in the newsroom. Today’s online data journalists are incubated and organized by the Investigative Reporters and Editors Association through the National Institute for Computer-Assisted Reporting, which offers the Phil Meyer Reporting Award for a data-driven project each year.

FIGURE 3 A classical expert system compared to the Story Discovery Engine

Three other computation concepts deserve mention: open data, open source, and big data. Data journalism can only flourish if data sets are available. Structural changes in the US government have allowed data to be more freely distributed. Influenced in part by the open data movement, President Barack Obama (2009) released a memorandum declaring a new openness around data access and availability. “My Administration is committed to creating an unprecedented level of openness in Government,” it reads.

Information maintained by the Federal Government is a national asset. My Administration will take appropriate action, consistent with law and policy, to disclose information rapidly in forms that the public can readily find and use. Executive departments and agencies should harness new technologies to put information about their operations and decisions online and readily available to the public.
(Obama 2009)

The idea is that citizens can take government data and analyze it to increase transparency and accountability. The Story Discovery Engine is an intentional system: its analysis is presented with the intent of increasing government accountability. It is nonpartisan software, but it proceeds from the assumption that there are problems in the social system that need to be exposed through the available data.

Open data is often mentioned in conjunction with open source software tools. Stacked Up was implemented using almost exclusively open source tools. It consists of 43,000 lines of code, all of which are available on an open source version control site called GitHub. Just like the data it analyzes, the software is publicly available for anyone to peruse and fact-check. This adds an extra layer of transparency to a transparency-producing activity.

It is worth mentioning at this point the relationship between software tools and reporters’ productivity. Several Web-based tools have been developed to help journalists be more efficient at their investigative tasks. Tabula, for example, turns PDFs into text. One of the most consistent points of conflict between reporters and officials is the way that the officials provide information. Entire books have been written about the nuances of negotiating for access to public records (Cuillier 2011; Marburger 2011). A successful tool for investigative journalism allows reporters to surmount common difficulties that interfere with reporting. Likewise, several data visualization tools have become popular to use on structured data. Putting census data into a data visualization tool like Tableau, which displays maps and bubble charts and other forms, allows the reporter to see patterns that would otherwise be invisible.

A small but growing subset of journalists is comfortable using data to enhance their abilities to investigate stories.
However, those reporters are limited to using the number of data sets that they, or their newsroom team, can manage. Analyzing one data set is usually enough for a story. Analyzing two or three data sets and turning them into a story package requires a team that includes a programmer, designer, writer, and editor (Domingo 2008; Parasie and Dagiral 2012; Royal 2010). This is where big data comes in. The next frontier in investigative reporting is using a computer to analyze multiple data sets at a time.

“Big data” means many things: lots of data (meaning a large quantity of data, as in terabytes or yottabytes) or lots of different types of data (meaning a great number of data sets) (boyd and Crawford 2012). Each is difficult in a newsroom. Newsrooms tend to have minimal equipment (Domingo 2008), and it is hard to justify to an editor why a reporter would need thousands of dollars’ worth of specialized equipment to analyze terabytes of data. It is also hard to crunch a number of data sets in a newsroom because it requires computer-programming expertise. Reporters have to either develop their own programming skills (which is difficult) or convince an editor to devote in-house programming expertise to the project (which is also difficult, because the few programmers in newsrooms tend to be overextended). Resource and personnel shortages are practical reasons for why big data analysis seldom happens in the newsroom (Royal 2010).

A software system, properly implemented, can shortcut this long process and can make more efficient use of limited newsroom developer resources. Stacked Up analyzes 15 data sets, which is more than a typical newsroom can handle given staffing and time constraints. It took three developers six months to implement, which is more time than can usually be devoted to a news development project.
However, now that the system architecture exists, the analysis can be replicated in other states or districts in a matter of days or weeks, not months. The system is based on standardized data, which (as the name suggests) does not vary significantly. This is consistent with a software design principle of “write once, run anywhere.” Any newsroom can take the software, analyze local data, and generate dozens of original investigative stories that matter to the newsroom’s specific audience. The Story Discovery Engine is a tool to improve productivity in both original investigative ideas and sources.

Communication

The project derives from two significant theories about the future of news. The first is the paradigm proposed by Remler, Waisanen, and Gabor (2013): that collaborative efforts between journalists, programmers, academics, and foundations provide opportunities for innovation. Stacked Up was created out of a partnership between a nonprofit journalism organization under the aegis of Temple University’s Center for Public Interest Journalism (CPIJ) and me, an independent journalist and academic. CPIJ founded the organization with funding from the William Penn Foundation and the Wyncote Foundation. The team also looked at best practices developed and publicized by data journalism organizations. Data teams at ProPublica, the Chicago Tribune, and the Washington Post all maintain “nerd blogs” that they use to communicate methodology behind their data projects; methodologies are also discussed on Source, a data blog maintained by the Mozilla Foundation.

The other significant theoretical concept behind Stacked Up is the notion of accountability through algorithm.
In “Accountability Through Algorithm: Developing the Field of Computational Journalism,” Hamilton and Turner (2009) define computational journalism (of which data journalism is a subset) as: “The combination of algorithms, data, and knowledge from the social sciences to supplement the accountability function of journalism.” They write that computational journalism has the potential to help sustain watchdog reporting because it can “hold leaders accountable, unmask malfeasance, and make visible critical social trends.” Accountability through algorithm can mean reverse-engineering an algorithm to discover how a company used an algorithm to influence the public (Diakopoulos 2013, 2014; Sweeney 2013) or it can mean designing an algorithm that is used to hold decision-makers accountable. I employ the latter meaning.

Cognition

To understand the cognitive labor-saving dimension of the Story Discovery Engine model, it is useful to consider the role of creativity in newsroom production. Reporters use what López-Ortega (2013) calls “deliberate creativity” in order to create original prose on deadline. Spontaneous creativity, or waiting for inspiration to strike, does not allow reporters to meet the demands of the job. Reporters employ a set of creative problem-solving strategies to generate ideas, create interview questions, observe events, and synthesize this information into prose that conforms to the appropriate publication style (Gans 2004; Tuchman 1978). Boden writes of the creative process:

Creativity is a fundamental feature of human intelligence, and a challenge for AI [Artificial Intelligence]. AI techniques can be used to create new ideas in three ways: by producing novel combinations of familiar ideas; by exploring the potential of conceptual spaces; and by making transformations that enable the generation of previously impossible ideas.
(Boden 1998, 347) Many human beings—including (for example) most professional scientists, artists, and jazz-musicians—make a justly respected living out of exploratory creativity. That is, they inherit an accepted style of thinking from their culture, and then search it, and perhaps superficially tweak it, to explore its contents, boundaries, and potential. But human beings sometimes transform the accepted conceptual space, by altering or removing one (or more) of its dimensions, or by adding a new one. Such transformation enables ideas to be generated which (relative to that conceptual space) were previously impossible. The more fundamental the transformation, and/or the more fundamental the dimension that is transformed, the more different the newly-possible structures will be. (Boden 1998, 348) A computer interface can provide the “fundamental transformation” that Boden calls for: It can be said that deliberate creativity is facilitated by objective manipulation of a conceptual space. Also, the iterative process that triggers spontaneous creativity can be promoted by computer programs that transform repeatedly interim creations, while a creative subject judges their value. This iterative activity leads to preserve, change, combine or erase parameters as thought convenient. Therefore, computer-assisted software must facilitate both, deliberate and spontaneous creativity. To do so, cognitive processes associated to creativity, as well as their complex interplay, must be characterized properly and then a computational solution can be proposed and implemented. (López-Ortega 2013, 3460) A computer-assistance tool to enhance creativity must possess algorithms that help computing divergent exploration. The outcome of divergent exploration must be unique ideas. In this sense, a software tool must help overcoming the inherent limits of the individual for producing divergent solutions. 
(López-Ortega 2013, 3461)

The Story Discovery Engine helps the individual overcome “inherent limits” because it analyzes more data sets than an individual could analyze alone. It tests levels of meaning embedded in social rules: if we have ideals of equal access to education, and if we have a public education system with standards, and if we have state-mandated assessments that measure how well students have met those standards, and if we have teachers who are provided with the standards, and if we grant that objects (books or other learning materials) are necessary to practice the material and concepts associated with the standards: is this an equal system? If not, do we have enough money to make it equal? If not, what do we do? The rules embedded in the expert system correspond to the rules articulated in laws and public policies. Ordinarily, only a subject matter expert would be able to render judgments about whether a scenario is within the law or not. The Story Discovery Engine makes some of these decisions for the reporter, freeing the reporter up for higher-level cognitive imaginings.

Findings and Implications for Further Research

I theorized that the Story Discovery Engine model could accelerate the production of ideas and stories on a public affairs beat. I prototyped the software and used it to report on a specific beat. The successful implementation of the project suggests the Story Discovery Engine model as a valid option for creating impactful news. The following were among the project’s findings: Only a handful of Philadelphia schools seem to have enough books and learning materials to teach students adequately under the district’s academic guidelines.
At least 10 schools appear to have no books at all, others seem to have books that are wildly out of date, and some seem to have only the books that fit the curriculum guidelines established by a chief academic officer who left the district years ago. Despite investing in custom software to track its textbook inventory, the District did not require any of its employees to use the software. The District spent $111 million on textbooks between 2008 and 2013. Its inventory showed more than a million books. Nobody knew where they were; boxes and boxes of books lay unused and un-catalogued in the basement at District headquarters. The District published a recommended core curriculum, but did not know if any of its schools were using it. There was no systematic way to determine whether struggling schools had the books and resources they needed for student success.

These findings, once published, were shared extensively on social media and prompted a number of changes at the School District of Philadelphia. Outcomes in subsequent weeks included: One highly paid administrator was found to be responsible for a number of textbook tracking failures. That administrator retired. An internal investigation revealed that several school principals were buying textbooks from sales representatives with whom they had personal relationships instead of buying the textbooks recommended by the central administration. Some of the reps were former school principals. This practice was eliminated and cost savings were achieved (Jessica Diaz, personal communication 2013). The School District of Philadelphia closed 24 schools at the end of the 2012–2013 school year, displacing approximately 4000 students. Originally, the District planned to send all the books from the closing schools to the schools that were slated to receive the students.
Instead, the District collected all of the books from the closing schools at a central location. An attempt was made to organize the books and reallocate them judiciously. An audit was performed so that the central administration was made aware of the curriculum officially in use at each school. Several local news organizations picked up the investigative stories and re-published them on their own websites, amplifying the audience for the stories. This modest impact suggests that the reporting could be duplicated in other large cities like Philadelphia, all of which struggle with similar logistical issues around public education resources. The Story Discovery Engine model also solves a particular logistical issue that newsrooms struggle with. A newsroom depends on specialized labor. The writers are good at writing, the editors are good at editing, the Web producers are good at the nuances of the content management system, and the programmers are good at writing programs. It makes sense to have the programmers write the code that teases out the facts the reporters need to write stories. Getting the reporters to write high-level code is less practical. However, few newsrooms have the staff that would be required to write high-level code (McChesney 2012; Parasie and Dagiral 2012; Royal 2010). Writing code is difficult. Royal writes that the more experience a reporter has, the more they tend to appreciate the complexity of data journalism: Experience is correlated to the perceived level of difficulty of working with data journalism for journalists in general. In this case, the more experience the journalist has, the more likely he or she is to agree that data journalism is difficult for most journalists. This might indicate that the journalists with some or extensive data journalism experience tend to value this expertise as unique and a skill that not everyone can master. 
(Royal 2010)

Despite the enthusiasm for data journalism, the logistics of performing data journalism have proved formidable for many news organizations. Creating a Story Discovery Engine for a metropolitan area, then opening it up to the public, allows more people to leverage the code to write stories. The engine could also be implemented by a foundation and opened up to the public; the local press could use it to write stories without having to fund the development or hire and manage a software staff.

A number of story prompts arose over the course of reporting for Stacked Up. Any of them could be used to write education beat stories in any district in the United States. Some prompts include:

- Some schools with active home–school associations fundraise for basic school supplies like paper. Find a school that is using social media to raise money for books or paper. Use Stacked Up to check whether the school seems to have enough books. Explore a few scenarios:
  - The school may be trying something new and interesting with its curriculum, and the home–school association is trying to raise money to support it.
  - The school was not allocated enough money to buy books for its students.
  - The school was allocated enough money for books, but the money went to something else.
  - Additional scenarios not mentioned here.
- Use Stacked Up to find a school that seems not to have books. Arrange a visit and ask to see the book storage room. Are any of the “missing” books sitting in the storage area? If so, why?
- A school is known to have a one-to-one laptop program where each student receives a school-issued laptop. The school still uses printed textbooks in addition to the laptops, but uses fewer textbooks. What happened to the books that were in the school when the laptop program began? Were they redistributed to other students? If not, where did they go?
- Every time state education standards change, every school needs to buy new books to match the new standards. When did your state last update its standards?
- Who were the politicians on the committee that made the standards change? Is there anything intriguing in their campaign donations?
- Districts have guidelines for how long textbooks should stay in use. Generally, a textbook lasts about five years. What happens to books after they are used for five years? Are they recycled, or is there a depository?
- In Detroit, the book depository became a dumping ground (Dawsey 2008; Griffioen 2008). What is happening to old books in your city?
- When schools do not have enough books, teachers often compensate by making photocopies. Find a school that lacks books, and check how much they spend on photocopies. Is this an efficient economic choice?
- Some schools claim they have replaced print textbooks with digital textbooks. Digital textbooks are password-protected. People regularly lose passwords and get locked out of password-protected systems. Are kids and parents able to get to the digital textbooks when they need them?
- Use Stacked Up to find a school that is using social studies textbooks that are more than five or eight years old. How do they teach civics or social studies with books that do not include the name of the current US President?

These 10 ideas took me about 30 minutes to generate. Each of them could probably result in a series of at least three stories, plus two follow-up stories based on the school district’s reaction. That is 50 original investigative stories, an entire year’s worth of stories for a reporter writing one story a week. An interested reader will probably generate additional questions while reading the story prompts; each of those questions might produce five original investigative stories as well.
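Several of the prompts above begin with “Use Stacked Up to find a school that...”. Such a lookup amounts to applying a simple rule to inventory data. The following is a minimal sketch of that idea, not the Stacked Up implementation: the `School` record, its field names, the `flag_schools` helper, and the sample figures are all hypothetical, and the five-year threshold is taken only from the general textbook lifespan mentioned in the prompts.

```python
from dataclasses import dataclass


@dataclass
class School:
    # Hypothetical record layout; real district data will differ.
    name: str
    enrollment: int           # students who need the textbook
    books_on_hand: int        # copies recorded in the inventory
    newest_edition_year: int  # publication year of the newest copy


def flag_schools(schools, current_year, max_age_years=5):
    """Return {school name: [reasons]} for schools that break either rule."""
    flagged = {}
    for school in schools:
        reasons = []
        # Rule 1: every enrolled student should have a book.
        if school.books_on_hand < school.enrollment:
            reasons.append("fewer books than students")
        # Rule 2: books should not be older than the assumed guideline.
        if current_year - school.newest_edition_year > max_age_years:
            reasons.append("textbooks older than guideline")
        if reasons:
            flagged[school.name] = reasons
    return flagged


# Invented sample data, for illustration only.
schools = [
    School("School A", enrollment=400, books_on_hand=120, newest_edition_year=2005),
    School("School B", enrollment=350, books_on_hand=360, newest_edition_year=2011),
    School("School C", enrollment=500, books_on_hand=520, newest_edition_year=2004),
]

leads = flag_schools(schools, current_year=2013)
for name, reasons in leads.items():
    print(f"{name}: {', '.join(reasons)}")
```

Each flagged school is a lead for the reporter to check in person, not a finding; the rule only narrows down where to look.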
The potential pool of story ideas could multiply if an entire newsroom of people practiced deliberate creativity. Having a virtual fountain of story ideas is especially useful for the modern newsroom, where online publishing means that reporters and editors need to “feed the beast” almost constantly. Writing only one story a week is a luxury in today’s marketplace, especially at online publications where writers are urged to publish multiple stories a day and editors may edit 30–40 stories a week (June 2013; Peters 2010). High-impact investigative stories can take a tremendous amount of time to conceive and report, a timeline that runs counter to the current market imperative. A software tool to accelerate the investigative process can add significant value to the newsroom.

NOTE

1. Books such as The Investigative Reporter’s Handbook (Houston and Investigative Reporters and Editors, Inc. 2009) offer readers a set of places to look for stories inside different beats such as education, transportation, or nonprofits. Likewise, Investigative Reporters and Editors, Inc., the nonprofit formed in 1975 to help “improve the quality of investigative reporting,” focuses significant educational efforts on strategies to help reporters find story ideas: a February 2014 electronic search of the Investigative Reporters and Editors library returned 127 tipsheets for the search query “investigative story ideas.”

REFERENCES

Appelgren, Ester, and Gunnar Nygren. 2014, February. “Data Journalism in Sweden: Introducing New Methods and Genres of Journalism into ‘Old’ Organizations.” Digital Journalism: 1–12. doi:10.1080/21670811.2014.884344.
Benfer, Robert Alfred. 1991. Expert Systems. Sage University Papers Series, no. 07-077. Newbury Park, Calif: Sage. http://dx.doi.org/10.4135/9781412984225.
Boden, Margaret A. 1998. “Creativity and Artificial Intelligence.” Artificial Intelligence 103: 347–356.
boyd, danah, and Kate Crawford. 2012. “Critical Questions for Big Data: Provocations for a Cultural, Technological and Scholarly Phenomenon.” Information, Communication & Society 15 (5): 662–679. doi:10.1080/1369118X.2012.678878.
Buchanan, Bruce G. 1985. “Expert Systems.” Journal of Automated Reasoning 1 (1): 28–35.
Cuillier, David. 2011. The Art of Access: Strategies for Acquiring Public Records. Washington, DC: CQ Press.
Dawsey, Chastity Pratt. 2008. “Unsecured Schools given up to Thieves, Vandals.” Detroit Free Press, April 4. http://www.freep.com/apps/pbcs.dll/article?AID=/20080404/NEWS01/804040302.
Department of Human Services, City of Philadelphia. 2012. 2011 Annual Report. http://www.phila.gov/dhs/pdfs/DHS%20Annual%20report.pdf.
Department of Human Services, City of Philadelphia. 2014. “Children and Youth Division Home Page.” http://dhs.phila.gov/intranet/pgintrahome_pub.nsf/content/cydhomepage.
Diakopoulos, Nicholas. 2013. “Rage against the Algorithms.” The Atlantic, October 3. http://www.theatlantic.com/technology/archive/2013/10/rage-against-the-algorithms/280255/.
Diakopoulos, Nicholas. 2014. “Algorithmic Accountability Reporting: On the Investigation of Black Boxes.” Tow Center for Digital Journalism at Columbia University. http://towcenter.org/wp-content/uploads/2014/02/78524_Tow-Center-Report-WEB-1.pdf.
Diaz, Jessica. 2013. Personal Communication.
Dick, Murray. 2013, September. “Interactive Infographics and News Values.” Digital Journalism: 1–17. doi:10.1080/21670811.2013.841368.
Domingo, David. 2008. “Interactivity in the Daily Routines of Online Newsrooms: Dealing with an Uncomfortable Myth.” Journal of Computer-Mediated Communication 13 (3): 680–704. doi:10.1111/j.1083-6101.2008.00415.x.
Feigenbaum, E.A. 1977. “The Art of Artificial Intelligence: Themes and Case Studies of Knowledge Engineering.” Proceedings IJCAI 5. Cambridge, MA.
Flaounas, Ilias, Omar Ali, Thomas Lansdall-Welfare, Tijl De Bie, Nick Mosdell, Justin Lewis, and Nello Cristianini. 2013. “Research Methods in the Age of Digital Journalism: Massive-Scale Automated Analysis of News-Content—Topics, Style and Gender.” Digital Journalism 1 (1): 102–116. doi:10.1080/21670811.2012.714928.
Flew, Terry, Christina Spurgeon, Anna Daniel, and Adam Swift. 2012. “The Promise of Computational Journalism.” Journalism Practice 6 (2): 157–171. doi:10.1080/17512786.2011.616655.
Gans, Herbert J. 2004. Deciding What’s News: A Study of CBS Evening News, NBC Nightly News, Newsweek and Time. Visions of the American Press. Evanston, Ill: Northwestern University Press.
Griffioen, James D. 2008. “The Knowledge of What Happened and What Will.” Sweet Juniper. http://www.sweet-juniper.com/2008/04/knowledge-of-what-happened-and-what.html.
Grimmer, Justin, and Brandon M. Stewart. 2013. “Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts.” Political Analysis 21 (3): 267–297. doi:10.1093/pan/mps028.
Hamilton, James T., and Fred Turner. 2009. Accountability through Algorithm: Developing the Field of Computational Journalism. Center For Advanced Study in the Behavioral Sciences Summer Workshop: Stanford University. http://www.stanford.edu/~fturner/Hamilton%20Turner%20Acc%20by%20Alg%20Final.pdf.
Hansen, Kathleen A. 1991. “Source Diversity and Newspaper Enterprise Journalism.” Journalism & Mass Communication Quarterly 68 (3): 474–482. doi:10.1177/107769909106800318.
Houston, Brant, and Investigative Reporters and Editors, Inc. 2009. The Investigative Reporter’s Handbook: A Guide to Documents, Databases and Techniques. 5th ed. Edited by Brant Houston, Investigative Reporters and Editors, Inc. Boston, MA: Bedford/St. Martin’s.
Howard, Alexander Benjamin. 2014. The Art & Science of Data-Driven Journalism. Tow/Knight Reports. Tow Center for Digital Journalism: Columbia University. http://towcenter.org/wp-content/uploads/2014/05/Tow-Center-Data-Driven-Journalism.pdf.
June, Laura. 2013. “Maura Johnston on Why She Opened Her IPad-Only Magazine to the Web.” The Verge, July 10. http://www.theverge.com/2013/7/10/4506824/maura-johnston-on-why-she-opened-her-ipad-only-magazine-to-the-web.
Labbé, Theola, and V. Dion Haynes. 2007. “Rhee Blasts Textbook Process for Letting Supplies Languish.” The Washington Post, August 4. http://www.washingtonpost.com/wp-dyn/content/article/2007/08/03/AR2007080302134_pf.html.
López-Ortega, Omar. 2013. “Computer-Assisted Creativity: Emulation of Cognitive Processes on a Multi-Agent System.” Expert Systems with Applications 40 (9): 3459–3470. doi:10.1016/j.eswa.2012.12.054.
Marburger, David. 2011. Access with Attitude: An Advocate’s Guide to Freedom of Information in Ohio. Athens: Ohio University Press.
McChesney, Robert W. 2012. “Farewell to Journalism?: Time for a Rethinking.” Journalism Practice 6 (5–6): 614–626. doi:10.1080/17512786.2012.683273.
Meyer, Philip. 2002. Precision Journalism: A Reporter’s Introduction to Social Science Methods. 4th ed. Lanham, Md: Rowman & Littlefield.
Obama, Barack. 2009. “Memorandum for the Heads of Executive Departments and Agencies Re: Transparency and Open Government.” Federal Register. http://www.whitehouse.gov/the_press_office/TransparencyandOpenGovernment.
Parasie, S., and E. Dagiral. 2012. “Data-Driven Journalism and the Public Good: ‘Computer-Assisted-Reporters’ and ‘Programmer-Journalists’ in Chicago.” New Media & Society 15 (6): 853–871. doi:10.1177/1461444812463345.
Pavlik, John V. 2013. “Innovation and the Future of Journalism.” Digital Journalism 1 (2): 181–193. doi:10.1080/21670811.2012.756666.
Peters, Jeremy W. 2010. “In a World of Online News, Burnout Starts Younger.” The New York Times, July 18. http://www.nytimes.com/2010/07/19/business/media/19press.html.
Protess, David. 1991. The Journalism of Outrage: Investigative Reporting and Agenda Building in America. New York: Guilford Press.
Remler, Dahlia K., Don J. Waisanen, and Andrea Gabor. 2013. “Academic Journalism: A Modest Proposal.” Journalism Studies, August, 1–17. doi:10.1080/1461670X.2013.821321.
Royal, Cindy. 2010. “The Journalist as Programmer: A Case Study of the New York Times Interactive News Technology Department.” The University of Texas at Austin. https://online.journalism.utexas.edu/2010/papers/Royal10.pdf.
Scribner, S., and M. Cole. 1973. “Cognitive Consequences of Formal and Informal Education: New Accommodations Are Needed between School-Based Learning and Learning Experiences of Everyday Life.” Science 182 (4112): 553–559. doi:10.1126/science.182.4112.553.
Sternberg, Robert J., ed. 1999. Handbook of Creativity. Cambridge, U.K.; New York: Cambridge University Press.
Sweeney, Latanya. 2013. “Discrimination in Online Ad Delivery.” Communications of the ACM 56 (5): 44. doi:10.1145/2447976.2447990.
Tuchman, Gaye. 1978. Making News: A Study in the Construction of Reality. New York: Free Press.

Meredith Broussard, Department of Journalism, Temple University, USA. E-mail: merbroussard@temple.edu. Web: http://meredithbroussard.com
