Hi, and welcome back.

In earlier modules we progressed from learning to define our problem to learning how to do so with the use of data and evidence. In fact, we had two modules: one on data analytical thinking, explaining why to use data to define your problem, and another focused on strategies for how to do so, learning to use data in practice to accelerate your own projects. Now we're going to deepen our discussion of data to introduce you to one of the most powerful and important governance innovations of the last decade: the policy of open government data. As we're going to see, open data policies are a key enabler for one of the most important sources of information and evidence that you can avail yourself of when defining your problems. Okay, let's dive in and get started.

By the end of this module, you should be able to do a few things. First, you'll be able to define open data and open data policy. Second, you'll understand how open data policies can be used to make data available to the public, including to you. And third, you'll understand how open data might be available for you to use when defining your problem.

Let's start with a story, as we usually do. Edmond Halley, the astronomer who lent his name to Halley's Comet, published an article on annuities in 1693. His population table was based on data collected for the years 1687 to 1691 in the city of Breslau, today called Wrocław. The data had been collected by the town's Protestant pastor, Caspar Neumann. Halley's work is now seen as a major event in the history of demography and the first major work of actuarial science. What makes this project so instructive is that it illustrates the value of sharing and publishing data openly. By sharing the raw information, Neumann enabled far more value and insight to be created than if he had simply kept the data for himself. Such collaboration is what makes open data truly transformative.
The organization or individual that collects and maintains data is not always the one best able to use it. By opening up and sharing data, institutions enable people with diverse skills, talents, and insights to work together to generate more value from it. By making data open, you enable others to bring fresh perspectives, fresh insights, and additional resources to your data, and that's when it can become really valuable to you and to others for public problem solving.

So what, specifically, is open data? Governments and other organizations have always collected data. Government gathers information from companies when it regulates them, tracks statistics about the economy and society in its role as a policy-making body, and collects data from citizens in its role as a provider of public goods and services. Universities, companies, and others also regularly collect data. What distinguishes open data from other types of data is that it is publicly available, it can be freely accessed and used, and, importantly, it is capable of being processed by machine. In other words, to be considered open, data has to be both technically and legally accessible. To be technically accessible, data must be available in a form that a computer can read, so that it can be analyzed. To be legally accessible, data must be licensed in such a way that anyone can use and reuse it without fee and without restriction or condition. When data is both legally and technically open, anyone with the right tools, whether or not they collected the data, can build sophisticated and useful applications, conduct analysis across data sets to enable empirical problem solving, and use data to advance social good and, as we'll see, potentially economic growth.
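To make the two tests concrete, here is a minimal Python sketch that treats a data set as open only if it passes both the technical test (a machine-readable file format) and the legal test (an open license). The `DatasetRecord` structure, the format list, and the license list are hypothetical assumptions for illustration; real catalogs use their own vocabularies. The license identifiers follow SPDX-style naming, and the list includes attribution-only licenses, which widely used definitions such as the Open Definition count as open.

```python
from dataclasses import dataclass

# Hypothetical, illustrative vocabularies; real catalogs define their own.
MACHINE_READABLE_FORMATS = {"csv", "json", "xml", "geojson", "xlsx"}
# SPDX-style license identifiers. CC-BY and ODC-By only require attribution,
# which common open data definitions still treat as open.
OPEN_LICENSES = {"cc0-1.0", "pddl-1.0", "cc-by-4.0", "odc-by-1.0"}


@dataclass
class DatasetRecord:
    title: str
    file_format: str  # e.g. "CSV" or "PDF"
    license_id: str   # e.g. "CC-BY-4.0"; empty string if no license is stated


def is_technically_open(record: DatasetRecord) -> bool:
    """True if the file format is one a program can parse directly."""
    return record.file_format.strip().lower() in MACHINE_READABLE_FORMATS


def is_legally_open(record: DatasetRecord) -> bool:
    """True if the license is on the (hypothetical) list of open licenses."""
    return record.license_id.strip().lower() in OPEN_LICENSES


def is_open_data(record: DatasetRecord) -> bool:
    # Open data has to pass both tests at once.
    return is_technically_open(record) and is_legally_open(record)


if __name__ == "__main__":
    examples = [
        DatasetRecord("School budgets (scanned)", "PDF", "CC0-1.0"),
        DatasetRecord("School budgets (no license)", "CSV", ""),
        DatasetRecord("School budgets", "CSV", "CC-BY-4.0"),
    ]
    for record in examples:
        print(record.title, "->", is_open_data(record))
```

The design choice is deliberate: a scanned PDF with a generous license and a tidy CSV with no license both fail, because either gap alone keeps others from reusing the data.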
Let's take an example from Mexico, a project called Mejora tu escuela ("Improve Your School"). Created by the Mexican Institute for Competitiveness (IMCO), Mejora tu escuela is an online platform that makes government data about Mexico's schools publicly available. The website gives parents comparative data so that they can compare their own school's performance with that of other schools, which empowers parents and students to demand better-quality education. Mejora tu escuela publishes expenditure data as well, giving activists, administrators, policy makers, and journalists the means to dig deeper, spot fraud and corruption, and ultimately advocate for change. That is exactly what happened in 2014, when an IMCO report revealed that more than 1,400 teachers on public school payrolls were, according to the data, over 100 years old, most of them sharing exactly the same birthday and, most suspiciously of all, earning more than the president of Mexico. No, the school board had not discovered Ponce de León's mythical fountain of youth. Rather, the story of Mejora tu escuela illustrates how open data can help solve problems when a government, or any institution for that matter, makes information free of charge and readily downloadable in digital form. In this case, federal authorities had already required states to report on school conditions, payrolls, and other expenditures. But only when civil society activists at IMCO, outside of government, built a platform that made that information accessible to citizens, to journalists, and to themselves was the information truly scrutinized. That is when the rampant, previously hidden malfeasance was exposed.
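The red flags in the IMCO story can be expressed as simple screening rules once payroll data is downloadable. The Python sketch below flags rows that imply an age over 100, share a birth date with a cluster of other rows, or exceed a salary ceiling. The rows, field names, reference date, cluster size, and ceiling are all hypothetical; this is not IMCO's data or method, and a flag marks a row for follow-up rather than proving fraud.

```python
from collections import Counter
from datetime import date

# Hypothetical payroll rows (field names and values invented for illustration).
payroll = [
    {"teacher_id": "T1", "birth_date": date(1980, 5, 2), "monthly_salary": 18_000},
    {"teacher_id": "T2", "birth_date": date(1900, 1, 1), "monthly_salary": 250_000},
    {"teacher_id": "T3", "birth_date": date(1900, 1, 1), "monthly_salary": 240_000},
    {"teacher_id": "T4", "birth_date": date(1975, 9, 14), "monthly_salary": 21_000},
]

AS_OF = date(2014, 1, 1)   # hypothetical reference date for computing ages
MAX_PLAUSIBLE_AGE = 100    # the age threshold mentioned in the lecture
MIN_CLUSTER = 2            # toy value; a real payroll would need a much higher bar
SALARY_CEILING = 200_000   # hypothetical stand-in for the president's salary


def age_on(birth: date, as_of: date) -> int:
    """Whole years elapsed between birth and as_of."""
    had_birthday = (as_of.month, as_of.day) >= (birth.month, birth.day)
    return as_of.year - birth.year - (0 if had_birthday else 1)


def flag_anomalies(rows, as_of=AS_OF):
    """Map teacher_id to the list of screening rules that row trips."""
    # Count identical full birth dates across the whole payroll.
    birth_date_counts = Counter(row["birth_date"] for row in rows)
    flags = {}
    for row in rows:
        reasons = []
        if age_on(row["birth_date"], as_of) > MAX_PLAUSIBLE_AGE:
            reasons.append("age over 100")
        if birth_date_counts[row["birth_date"]] >= MIN_CLUSTER:
            reasons.append("shared birth date")
        if row["monthly_salary"] > SALARY_CEILING:
            reasons.append("salary above ceiling")
        if reasons:
            flags[row["teacher_id"]] = reasons
    return flags


for teacher_id, reasons in flag_anomalies(payroll).items():
    print(teacher_id, ", ".join(reasons))
```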
Although the government initially prevaricated, claiming clerical error, the ensuing media frenzy over the website and what it revealed helped prompt reform and a shift of responsibility over education from the states to the federal government. Ultimately, the activists and the federal bureaucracy worked in parallel, collaborating to address this local-level corruption and to improve Mexico's schools.

Open data matters for the same reasons that using data matters in the first place. We can use open data to spot mistakes, outliers, and rare events. We can use it to target scarce resources more effectively. And we can use it to tell stories, in the ways we've previously discussed. Let's consider a few more examples of open data being put to work, to further understand the benefits of data analysis for defining our problems.

First, open data can achieve greater government accountability. In the United States, at the federal level, open data facilitated the creation of usaspending.gov, a set of online tools for exploring federal spending. At the local level, opening government data about public works in Zanesville, Ohio revealed a 50-year pattern of discriminatory water service provision. While water lines from the city of Zanesville spread throughout the rest of Muskingum County, residents of the predominantly African-American area of Zanesville were left to use contaminated rainwater or to drive to the nearest water tower or store and haul water back to their homes in bottles. Opening the data laid bare what was going on: it revealed the disparity in water provision between African-American and white residents and led to a successful civil rights lawsuit against Zanesville in 2008.
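A disparity like the one in Zanesville becomes visible once service records can be grouped and compared. This Python sketch computes, for each group of households, the share connected to a water line. The records and group labels are invented for illustration and are not the Zanesville data; a real analysis would also need to join service records to demographic data, for example from the census.

```python
from collections import defaultdict

# Invented household records for illustration only (not the Zanesville data).
WHITE = "predominantly white area"
BLACK = "predominantly African-American area"
households = (
    [{"group": WHITE, "has_water_line": True} for _ in range(8)]
    + [{"group": WHITE, "has_water_line": False} for _ in range(2)]
    + [{"group": BLACK, "has_water_line": True} for _ in range(1)]
    + [{"group": BLACK, "has_water_line": False} for _ in range(9)]
)


def service_rates(rows, group_key="group", served_key="has_water_line"):
    """Return, for each group, the share of rows where served_key is True."""
    totals = defaultdict(int)
    served = defaultdict(int)
    for row in rows:
        totals[row[group_key]] += 1
        served[row[group_key]] += int(row[served_key])
    return {group: served[group] / totals[group] for group in totals}


rates = service_rates(households)
for group, rate in rates.items():
    print(f"{group}: {rate:.0%} connected")
```

Rates rather than raw counts are compared so that groups of different sizes can be set side by side.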
Second, open data can improve the delivery of services at the state and local level. Increasing access to open data has allowed entrepreneurs and developers to build tools such as smart transit apps, citizen-facing information services, and business- or government-facing data visualization and analysis platforms. For example, both transit authorities and commercial providers (think of the MTA or, of course, Google Maps) use open transportation data to tell commuters when to expect their bus or train along their route. Retroficiency analyzes energy consumption data to allow utilities, energy service providers, or building owners to identify buildings with high energy-savings potential.

Third, open data enables the creation of tools that improve consumer choice and citizen decision making in the marketplace. For example, the Department of Education has transformed data that government collects from universities into a calculator known as the College Scorecard, designed to help parents and students make more informed financial decisions about college.

Sometimes the benefits of open data ripple out beyond government accountability and services. Open data can also be used to catalyze business competition, entrepreneurship, and job creation. Think of the wealth and the jobs created by government's release of weather data and geolocation data: they enabled weather apps as well as GPS devices. The Open Data Institute in the UK notes that the global market for open data could be as high as 5 trillion dollars, and thousands of companies worldwide already use open government data as a core business asset.
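Transit apps like the ones described above commonly read schedules published in the General Transit Feed Specification (GTFS), where the `stop_times.txt` file lists when each trip reaches each stop. The Python sketch below finds the next scheduled arrivals at one stop from a tiny, made-up excerpt in that layout. It is a deliberate simplification: it ignores service calendars (`calendar.txt`) and real-time feeds, both of which a production app would need.

```python
import csv
import io

# A tiny, made-up excerpt laid out like a GTFS stop_times.txt file.
STOP_TIMES_CSV = """trip_id,arrival_time,departure_time,stop_id,stop_sequence
T100,08:05:00,08:05:30,S1,1
T100,08:12:00,08:12:30,S2,2
T101,08:35:00,08:35:30,S1,1
T101,08:42:00,08:42:30,S2,2
T102,24:10:00,24:10:30,S1,1
"""


def to_seconds(hhmmss: str) -> int:
    """Convert a GTFS HH:MM:SS time to seconds relative to the service day.

    GTFS allows hours of 24 or more for trips that run past midnight.
    """
    hours, minutes, seconds = (int(part) for part in hhmmss.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def next_arrivals(stop_times_csv: str, stop_id: str, now: str, limit: int = 2):
    """Return (trip_id, whole minutes until arrival) for upcoming trips at stop_id."""
    now_s = to_seconds(now)
    upcoming = []
    for row in csv.DictReader(io.StringIO(stop_times_csv)):
        if row["stop_id"] != stop_id:
            continue
        arrival_s = to_seconds(row["arrival_time"])
        if arrival_s >= now_s:
            upcoming.append((arrival_s, row["trip_id"]))
    upcoming.sort()  # earliest arrival first
    return [(trip_id, (arrival_s - now_s) // 60) for arrival_s, trip_id in upcoming[:limit]]


print(next_arrivals(STOP_TIMES_CSV, "S1", "08:00:00"))  # → [('T100', 5), ('T101', 35)]
```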
One example is the company BrightScope, which worked with previously closed Department of Labor Form 5500 retirement plan data to offer better decision-making tools to investors. When the data became available as open data, BrightScope was able to rapidly build tools to help people decide which retirement plan had the lowest fees.

A decade ago, open data was but an idea, a call to action by pro-democracy activists who wanted government to be more transparent. Today, however, it encompasses a broader movement focused on solving public problems, and open data policies have helped drive that change. On his first day in office in 2009, fulfilling an earlier campaign promise, President Obama signed the Memorandum on Transparency and Open Government. That memorandum declared, and here I quote, "information maintained by the federal government is a national asset." It called on agencies to use new technologies to put information about their operations and decisions online and make it readily available to the public. In addition, the government's open data policy made clear that because the collection of data by government is already paid for by taxpayers, it makes sense to give that data back to the public to use for free. When the federal government's open data repository, data.gov (you can check out that website), launched in May 2009, it made just 47 data sets searchable. But by turning the principles of the memorandum into practice, creating a tangible and central place for agencies to list government data and, more importantly, a place for the public to find that data, data.gov was instrumental in unleashing the open data movement. Later that year, the Office of Management and Budget directed federal agencies to release not only data about the workings of government but also what it termed high-value information.
This choice to broaden the 40-year-old definition of government transparency, from data about government only to data that agencies collect, responded to what the technologies of both big data and collaboration now make possible. The directive emphasized the broad public benefits of disclosing new kinds of government information as open data in machine-readable formats, such as the locations of reported crimes, weather information, or, as we've discussed, GPS data, which could foster new businesses. In 2013, the federal government recommitted to and expanded its open data work by issuing an executive order making open and machine-readable the new default for government information. It was designed to advance and accelerate open data implementation by federal agencies, getting them to put ever more information online. That order emphasizes entrepreneurship and innovation, not just government accountability, making clear that making information resources easy to find, accessible, and usable can fuel entrepreneurship, innovation, and scientific discovery that improve Americans' lives and contribute significantly to job creation.

Further laws have followed, broadening the scope of data covered by open data statutes and policies. The Digital Accountability and Transparency Act of 2014, known as the DATA Act, calls for publishing all federal government spending data as open data in standardized formats. Another statute, the Open, Public, Electronic, and Necessary Government Data Act, or OPEN Government Data Act for short, was signed into law in early 2019. It calls for inventorying and publishing all government information, not just spending data, as open data. Today, there are a quarter of a million federal data sets on data.gov. That's a long way from 47.
And just about every state and hundreds of cities now release some data as open data, with some form of open data portal: a website like data.gov, but for a state or a city. Despite this, the need for continued open data policy making is as strong and urgent as ever. An Open Data Barometer survey of 1,725 data sets covering 115 countries found that nearly 90 percent of priority data sets (the ones people most want) remain closed and unavailable. By that survey's count, only 7 percent of the data governments collect is fully open, only one in two data sets is machine-readable, and only one in four has an open license that makes it free to use and reuse. At the same time, bipartisan interest in evidence-based approaches to governing has fueled demand for more access to administrative information of all kinds, including the data agencies collect about companies, workplaces, and the environment.

Using open data is a great way to get data that you can use to define and understand your problem. However, before you push ahead to identify how to use open data, make sure that you have started by defining your problem, as we discussed in earlier modules. Without knowing the problem, and especially its root causes, it's going to be hard to know what type of data you actually need. So be sure to go over the exercises for defining your problem before selecting your data sets.

Okay, let's take a minute to finish up by considering whether the data you need is, or might be, available as open data. First, consider availability. Does somebody already collect and publish it? Is there a government or another institution, such as a university or a company, making that data available online?
You can start with your own community's open data catalog, that of your city or your state, but these are often not comprehensive. There are also other relevant agencies you might engage to identify available data sets, and numerous aggregators of open data you can consider. For example, you might try the census bureau in your country. If you're in the U.S., you might look at the Urban Institute, which is a fabulous source of data about communities in the United States. Depending on your field of interest, federal agencies such as the EPA for environmental data, the U.S. Department of Labor for labor data, or the FBI for crime data might offer access to the machine-readable, high-quality, comprehensive data you need to tackle challenges. OpenCorporates is the largest open database of companies in the world.

Once you find the data you want, check whether it is fully open and accessible to you in a machine-readable form, so that you can readily use it for analysis. If not, can you identify external or internal partners with the relevant expertise who can help you prepare the data for use? One strategy is to organize what's sometimes called a hackathon or datathon, also known as a data dive. Data dives are high-energy, marathon-style events where teams of volunteer data scientists, developers, statisticians, and designers help mission-driven organizations and individuals, whether government agencies, NGOs, or activists, to organize, manipulate, clean, or visualize their data. And if the data is not collected at all, what will it take to collect it?
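Many catalogs, including data.gov's, are built on the open-source CKAN platform, which exposes a search API. The Python sketch below assumes data.gov's CKAN endpoint at `https://catalog.data.gov/api/3/action/package_search` and the standard CKAN response shape (`result`, then `results`, then each data set's `resources`); check the current data.gov API documentation before relying on either. It builds a search URL and keeps only resources whose declared format is machine-readable, a first pass at the "is it fully open and machine-readable?" question.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: data.gov's catalog runs on CKAN, which serves package_search.
CKAN_SEARCH = "https://catalog.data.gov/api/3/action/package_search"


def build_search_url(query: str, rows: int = 10) -> str:
    """Build a CKAN package_search URL for a free-text query."""
    return f"{CKAN_SEARCH}?{urlencode({'q': query, 'rows': rows})}"


def search(query: str, rows: int = 10) -> dict:
    """Fetch and decode search results (requires network access)."""
    with urlopen(build_search_url(query, rows), timeout=30) as response:
        return json.load(response)


def machine_readable_resources(response: dict, formats=("CSV", "JSON")):
    """Yield (data set title, license title, resource URL) for wanted formats."""
    wanted = {fmt.upper() for fmt in formats}
    for package in response.get("result", {}).get("results", []):
        for resource in package.get("resources", []):
            # The declared format is publisher-supplied metadata, so treat it as a hint.
            if (resource.get("format") or "").upper() in wanted:
                yield package.get("title"), package.get("license_title"), resource.get("url")
```

With network access, `machine_readable_resources(search("school expenditures"))` would yield candidate files along with their license titles. The declared format is only a hint, so it is still worth opening each file to confirm it actually parses, and reading the license before reuse.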
We've previously discussed methods such as interviews and surveys that you can use to collect original data yourself, and in our next module, on open innovation, we'll look in detail at how to use crowdsourcing to collect data through distributed participation.

Next, consider your readiness to make use of the data. Do you have access to the expertise needed not simply to collect the data but to analyze it and make sense of it? Again, reaching out to partners, especially in universities, can be one way to obtain that expertise. Another is to hold a competition. New York City, for example, hosts competitions and challenges that attract data-savvy individuals to analyze its data and create new tools. The city's BigApps competition invites private companies and individuals to solve public problems using open data. Overseen by the city's Economic Development Corporation, BigApps engages agency leadership throughout the planning process to open up more data that the public can use to create new tools. For example, a past BigApps winning team used targeted, geolocated data to create an app called Mind My Business, a tool designed to assist brick-and-mortar food service establishments in New York by sending alerts that help owners predict changes in customer traffic, operate more efficiently, and avoid fines. Private-sector platforms like Kaggle offer an online community of data scientists ready to solve problems, usually in exchange for a prize or some other monetary incentive.

Okay, that concludes our much too short session on the power of open data. Used well, open data can generate new insights and enable us to define problems using empirical evidence.
But collecting that data and deriving insight from it and ultimately designing solutions to public problems is, as we've discussed, going to require collaboration. That's why in our next module we turn to exploring ways of using new technology to organize such collaboration efficiently and effectively to allow us to do more using both quantitative and qualitative methods. Until then, see you next time.