
Transformation of Communication Processes: Data Journalism

This article gives a brief review of the history of data journalism, as well as the prerequisites for its appearance. The authors describe the advantages of employing data-journalism skills in newsrooms. Finally, the article provides a review of data-driven journalism projects around the world, as of 2014.

N. Kayser-Bril (Journalism++), A. Valeeva (University of Siegen), I. Radchenko (ITMO University)

Introduction

Data journalism can be mistaken for the new buzzword for infographics. This is true, but only to some extent. Visualizing data to tell stories started long before the Internet came into being. What makes an infographic attractive is not great design but, more importantly, the insight it gives. John Snow's map of cholera outbreaks in nineteenth-century London is a classic example. The English physician mapped the cases of cholera deaths in the Soho district of London. Back then, the very notion of germs did not exist. Once mapped, the plain data gave an insight: the deaths clustered around a single water pump. This is just one of many examples of maps and charts that give another perspective on the same dataset.

Another important part of data journalism – or, more precisely, data-driven journalism – is the use of computers, math and statistical analysis. Computer-assisted reporting was born in American newsrooms in the late 1960s, where technological advances met the social sciences. It was during the US national elections of 1952 that journalists first used a computer to predict the outcome of the vote – and the machine got it right. Fifteen years later, Philip Meyer, a journalist working at the Detroit Free Press, used an IBM 360 to cover the Detroit riots of 1967. Through a survey analyzed by the machine, he was able to investigate the dataset and sketch a profile of the rioters. This Pulitzer-winning story was the first attempt by a journalist to use analytical methods from sociology, behavioral science and similar domains in the context of a newsroom. Philip Meyer himself called this "precision journalism." The increasing size of datasets was another step on the way to data-driven journalism.
In 2006, Adrian Holovaty, an American web developer, journalist and entrepreneur, wrote a blogpost which turned out to be a manifesto for data-driven journalism.1 His main point was that "newspapers need to stop the story-centric worldview." What is required from media is "structured information: the type of information that can be sliced-and-diced, in an automated fashion, by computers." As Holovaty wrote, the media of the future "have to build an infrastructure that turns them into reliable data hubs, able to analyze even very large and complex datasets internally and to build stories on their insights."

1 See <http://chicago.everyblock.com/crime/>, an example of Holovaty's implementation of his own manifesto.

What changed in the late 2000s was that the Internet gave access to an unprecedented wealth of information and computing power. Data from public bodies and corporations are becoming increasingly available, in a movement known as open data. The vast amount of information calls for new methods to find and convey meaning in the original data. At the other end, the tools to handle data allow this evolution in the newsroom to happen. What used to be the exclusive domain of computer scientists can now be done by any journalist. Free software allows anyone to manage, analyze and visualize data. Open source alternatives abound to analyze or visualize geographic data. Further, solutions for scraping, collecting and storing vast amounts of data are starting to come to the market.

Newsrooms meet data journalism

The first major news organization to adopt the term was The Guardian, which launched its Datablog in March 2009. Its editor Simon Rogers and his team regularly published stories based on datasets which they also made public for others to use.
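The kind of task that has moved from computer scientists into journalists' hands can be illustrated with a short, self-contained sketch: extracting structured records from an HTML page and counting them to find a first story lead. The page fragment, names and party labels below are invented for illustration; a real project would download the page and cope with far messier markup.

```python
from html.parser import HTMLParser
from collections import Counter

# A toy HTML fragment standing in for a scraped parliamentary page.
# The names and party labels are invented for illustration.
PAGE = """
<table>
  <tr><td class="name">A. Dupont</td><td class="party">Party X</td></tr>
  <tr><td class="name">B. Martin</td><td class="party">Party Y</td></tr>
  <tr><td class="name">C. Bernard</td><td class="party">Party X</td></tr>
</table>
"""

class CellExtractor(HTMLParser):
    """Collects the text of <td> cells, keyed by their class attribute."""
    def __init__(self):
        super().__init__()
        self.rows = []              # one {class: text} dict per <tr>
        self._current_class = None  # class of the <td> being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append({})
        elif tag == "td":
            self._current_class = dict(attrs).get("class")

    def handle_data(self, data):
        if self._current_class and data.strip():
            self.rows[-1][self._current_class] = data.strip()
            self._current_class = None

parser = CellExtractor()
parser.feed(PAGE)

# A first "story lead": how many members per party?
seats = Counter(row["party"] for row in parser.rows)
print(seats.most_common())  # [('Party X', 2), ('Party Y', 1)]
```

Nothing here goes beyond the Python standard library, which is part of the point: no specialist infrastructure is needed to turn a web page into countable records.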
At the same time in France, Le Post (now Le Huffington Post), a property of Le Monde, commissioned an interactive ranking of French members of Parliament (MPs), summarizing the number of elective mandates they held besides their position as MPs. The work involved scraping, visualization and investigation and was done by a team of external contractors,2 in-house developers and journalists. What would today be called data journalism was then done with no clear framework and no descriptive term.

2 Disclosure: Nicolas Kayser-Bril was one of them.

These first experiments created the foundations of European data journalism for three reasons. The first one is technical, the second political, the third circumstantial.

In 2007, Apple launched the first version of its iPhone. It did not allow Adobe's Flash software to run on the device, mostly as a way to save power. Flash had been the ubiquitous tool for interactive content for the previous ten years. Apple's move meant that content produced with Flash would not be visible on the iPhone and, later, on the iPad. It dramatically hastened the demise of Flash and the rise of JavaScript-based, browser-rendered interactives. Without delving into technical details, it meant that new skills had to be harnessed as content producers moved away from Flash. These new, non-Flash teams would be the basis for the upcoming data journalism ones.

The second catalyst for the rise of data journalism lies in the dynamism of the open data movement. Outside of the newsrooms, groups of computer developers and activists began to challenge the monopoly of news organizations. The best example is the "parliament watch" initiatives: websites that measured and structured parliamentary activities.
In France, it was an NGO called "Regards Citoyens", which started in 2008 in the wake of the passing of the Hadopi law, which allowed for the automated discontinuation of a contract between an internet service provider and its client if the latter was suspected of sharing copyrighted materials. In the late 2000s, Slovakia's Zuzana Wienk created the Fair Play Alliance, which focused on disclosing money flows to and from the public administration by building data-driven online tools. The biggest data-related NGO in Europe, however, remains Open Knowledge (OK), which came to continent-wide popularity with the release of the dataset management software CKAN, adopted by several open data portals. New agents also emerged – international journalism groups such as the International Consortium of Investigative Journalists or JournalismFund.eu, and specialized agencies like OpenDataCity in Berlin, Journalism++ in five European cities or Dataninja in Bologna. These NGOs and data journalism teams both considered that development skills should be used to gather and communicate socially relevant information. It is not rare for data journalists to have worked at an NGO committed to open data.3

The third reason that explains the rise of data journalism at the beginning of the 2010s in Europe lies in Wikileaks' publication strategy for the Afghan Warlogs in July 2010. From 2006 to 2010, Wikileaks, headed by Julian Assange, had published documents only on its own website. To maximize impact, Wikileaks tried a new strategy, setting up a consortium with The Guardian and Der Spiegel in Europe and The New York Times in the United States. For the first time, traditional newsrooms had to deal with source material in database format and needed the skills to work with data. This gave small data teams an opportunity to gain acceptance within their newsrooms. At The Guardian, Simon Rogers and others worked with journalists on the documents.
In France, OWNI, a 4-person news operation with a focus on data journalism, worked on the Afghan documents once they were published, helping popularize the concept of data journalism in other newsrooms. Subsequent leaks of databases, organized by Wikileaks and, later, by the ICIJ (International Consortium of Investigative Journalists), Al Jazeera and others, continued to pressure publishers into paying attention to this trend.

3 For instance, developers Stefan Wehrmeyer and Friedrich Lindenberg both worked for Open Knowledge Germany before or after working in the newsrooms of Spiegel or Correct!v, and developer-journalist Annabel Church co-organized events in Berlin with OK's Lucy Chambers.

The combination of Adobe Flash's demise, the rapid push of open data and database leaks moved data journalism from a side practice into a widely accepted concept. That data journalism courses started at the same time in journalism schools across Europe is no coincidence.4

Stories from the databases

Databases allow for different modes of work and types of participation. Dealing with a large dataset is complicated, if feasible at all, for one person alone. Collaboration with others – both inside and outside the newsroom – becomes inevitable. This is for the better, however: it strengthens the journalism culture, restores trust between the media and the audience and makes more sense of the data. There are different types of database-driven journalism stories.

Shared database for journalists from different media

Media have to engage in a collaborative publishing endeavor: to publish the information on one day and to share a common database. In the journalistic spirit of competition and rivalry, this might seem problematic; however, it spreads the burden of handling the data and increases the impact of the publication. The Afghan Warlogs publication was a case in which one database was shared between three mainstream media: The Guardian, Der Spiegel and The New York Times.
The reporters of all three media outlets joined forces, using a large database to organize the material, then plugging in search terms and combing the documents for newsworthy content. Together they assembled the material into a conveniently searchable and secure database. They brainstormed topics to explore and exchanged search results.

Global collaboration between journalists

Cross-border investigations allow for a broader reach and a localized focus at the same time. The Offshore Leaks project, led by the ICIJ, allowed 110 reporters from 47 countries to work together. Gerard Ryle, an Australian journalist, received a hard drive containing 260 gigabytes of information concerning more than 100,000 companies registered in tax havens. The publication of their findings, on the same day, helped to create a large impact and offer a wider range of coverage on the issue in question. The Offshore Leaks investigation was the largest project of its kind at the time.

4 At the Ecole Supérieure de Journalisme in Lille, France, courses started in 2011.

Joint forces of journalists, activists and technologists

Confiscati Bene (literally, "Well Confiscated") is a participatory project aiming at stimulating an effective re-use of the assets seized from the mafia.5 Confiscati Bene is carried out by a diverse group of journalists, activists and technologists. The project investigates the current condition of the assets through the analysis of relevant data coming both from official sources and from bottom-up citizen monitoring initiatives.

Crowdsourcing

Outside collaboration can be done via crowdsourcing. The Guardian chose in 2009 to ask for the help of its readers. The expenses of all Members of Parliament were published in June 2009 by the House of Commons.6 But the number of files, and the way they were edited, led the journalists to build a microsite allowing readers to read and annotate the 700,000 documents.
The main goal was to enable "users to fully investigate the documents and track what they – and other users – found", Janine Gibson, editor-in-chief of The Guardian website, explained in a press release. The Guardian experiment worked well because readers could identify and look for their own Member of Parliament.

Open Source Intelligence (OSINT)

This method implies collecting already published information to find news. One of the best-known examples is The Migrant Files,7 where a team of ten data journalists across Europe gathered publicly available information to build an exclusive database of all the deaths of migrants and refugees at the European Union's borders. All the information was available but spread across a variety of data sources, from news reports to spreadsheets maintained by NGOs. The main task was to gather the information, clean it and construct a single database. On April 2nd, 2014, the team published the database and the articles on nine websites across Europe. Their investigation helped to assess the danger of each route taken by the migrants and quantified the deaths not counted by EU officials. To publish the database, the team used Detective.io, a tool developed by Journalism++ to help journalists make their own databases.

5 See <http://www.confiscatibene.it/it/about-project-english-version>.
6 See <http://www.theguardian.com/news/datablog/2009/jun/18/mps-expenses-houseofcommons>.
7 See <https://www.detective.io/detective/the-migrants-files/>.

Database for users to research

After extensive data collection and analysis, data teams can come up with apps or interactive graphics that allow end users to get insights relevant to them personally. For instance, Dollars for Docs by ProPublica8 allows users to see how their doctors are sponsored by big pharmaceutical companies, and the famous "Rent or Buy"9 calculator by The New York Times helps to make the right decision on housing.
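The core OSINT workflow described above – gathering scattered records, normalizing them and merging them into one deduplicated database – can be sketched in a few lines. This is not the actual Migrant Files pipeline; the two "sources", their column names and every record in them are fictional, invented only to show the shape of the task.

```python
import csv
import io
from datetime import datetime

# Two invented "sources" mimicking scattered inputs: an NGO spreadsheet
# and a list compiled from news reports. All records are fictional.
NGO_CSV = """date,place,dead
2014-03-02,Lampedusa,12
2014-03-10,Evros,3
"""

NEWS_CSV = """when,location,casualties
02/03/2014,Lampedusa,12
15/04/2014,Ceuta,5
"""

def load(text, date_col, place_col, dead_col, date_fmt):
    """Read one CSV source into records with normalized field names and dates."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        records.append({
            "date": datetime.strptime(row[date_col], date_fmt).date().isoformat(),
            "place": row[place_col].strip(),
            "dead": int(row[dead_col]),
        })
    return records

merged = (load(NGO_CSV, "date", "place", "dead", "%Y-%m-%d")
          + load(NEWS_CSV, "when", "location", "casualties", "%d/%m/%Y"))

# Deduplicate: two records with the same date and place are assumed
# to describe the same incident.
database = {(r["date"], r["place"]): r for r in merged}

total = sum(r["dead"] for r in database.values())
print(len(database), "incidents,", total, "deaths")  # 3 incidents, 20 deaths
```

The hard editorial work in a real project lies precisely in the step this sketch waves away: deciding when two differently reported records describe the same incident.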
Evolution of data journalism after 2010

Newsrooms kept experimenting with data journalism. Zeit Online, in Berlin, commissioned several interactive projects from external developers in 2011. In Paris, Le Monde hired a developer-journalist from OWNI in 2011. All across Europe, in newsrooms small and large, individuals tried tools and techniques. Top management almost always ignored, let alone supported, these experiments (a notable exception might be Zeit Online, where executives directly pushed for innovation).

Some newsrooms did capitalize on their employees' efforts at data journalism and set up processes, teams and positions dedicated to the practice. Zeit Online created "developer-journalist" positions in 2012. In Zurich, Neue Zürcher Zeitung created a dedicated data journalism team that same year, with Sylke Grunwald at its helm. In London, the Financial Times reorganized its interactive team under the guidance of Martin Stabe in 2013. The size of the newsroom is no indicator of its willingness to embrace data journalism. Regional outlets, such as Stimme.de in Heilbronn or L'Avenir, a regional newspaper group in Belgium, have had programs to push for data journalism since 2013. Some media outlets that came late to the game did catch up by rapidly creating data teams. However, some media, like Le Monde and Le Figaro, despite being among the first to experiment with data journalism, did not create teams where developers and journalists could work together. There, developers remain in their own, purely technical teams. As a result, all data journalism activities there took place independently from one another, preventing the organization from capitalizing on its experiences and from gaining productivity.

8 See <https://projects.propublica.org/docdollars/>.
9 See <http://www.nytimes.com/interactive/2014/upshot/buy-rent-calculator.html?abt=0002&abg=1>.
In Russia, several news organizations and NGOs have experimented with data journalism, but no team has emerged that considers the discipline its main activity. Several databases are available for use, among them the state procurement website (http://zakupki.gov.ru/) and the site run by the state treasury of the Russian Federation (http://bus.gov.ru/). There is also an open data portal run by open data enthusiasts (http://hubofdata.ru/).10 One of the best practitioners of data journalism in Russia so far was the infographics team at the RIA news agency. After its dismantling, what is left are agencies specializing in data visualization, like Mercator, and a few data journalism projects delivered on an ad-hoc basis, such as those published by Slon.ru.

Conclusion: data journalism will become the norm

Six years after the term gained acceptance, data journalism remains a new and vaguely defined practice. Rather than being defined by content, inputs or outputs, a consistent definition of data journalism has to do with teamwork and processes. The way developers, project managers and reporters work together allows content to be produced in a new, more efficient way. Projects such as Swiss Leaks, The Migrant Files or the Fifa Files show that these processes are now accepted and replicated as needed. Some newsrooms invested in data journalism by creating new positions, while others remained on the sidelines. The term data journalism helped frame the debate regarding how journalism was adapting to the new economic and technical landscape, and is now being integrated in the concept of "journalism" as a whole. Content production routinely includes developers and project managers, regardless of whether or not the people involved call themselves data journalists. Journalism schools are adding classes in code and statistics to their curricula, next to their courses in data journalism.11
What defines data journalism – teamwork across disciplines – will soon become the norm in content production. Newsrooms that inherit a print-based brand still have a long way to go before they produce content by having personnel with different skill sets working together as a matter of routine. But as competition on content increases, with NGOs, corporations and institutions joining the fray, the share of content produced in a single-person, text-only way will be marginalized.

10 For the full list of Russian portals with data, check <http://iradche.ru/2014/08/opendata-2/>.
11 Coding classes started in 2013 at the School of Journalism of Sciences-Po Paris, France. Statistics were introduced in 2014 at the Académie du Journalisme et des Médias in Neuchâtel, Switzerland.

The digitalization of content has radically transformed the media industry. Now that every Internet user can commit acts of journalism, professional journalists are in dire need of a redefinition of their work and purpose. With the rise of data, journalists can move up the value chain and focus on collecting and analyzing information. In doing so, they can use data as a resource for social change.

References

Gray J., Bounegru L., Chambers L. Data Journalism Handbook // O'Reilly. 2012.
Holovaty A. A Fundamental Way Newspaper Sites Need to Change. 2006. <http://www.holovaty.com/writing/fundamental-change/>.
Howard A. The Growing Importance of Data Journalism // O'Reilly. 2010. 21 December.
Keller B. Dealing with Assange and the WikiLeaks Secrets // The New York Times. 2011. 26 January. <http://www.nytimes.com/2011/01/30/magazine/30Wikileaks-t.html?pagewanted=all&_r=1&>.
Léchenet A. Global Database Investigations: The Role of the Computer-Assisted Reporter / Reuters Institute Fellowship Paper. University of Oxford, 2014.
Lorenz M. From Attention to Trust: Data-Driven Journalism and the Urban Future. Presentation // Picnic. Amsterdam, Netherlands, 2011. 23 September.
Lorenz M., Kayser-Bril N., McGhee G. Voices: News Organizations Must Become Hubs of Trusted Data in a Market Seeking (and Valuing) Trust // Nieman Lab. 2011. 1 March.
Rogers S. Facts are Sacred. Faber and Faber, 2013.