Introduction To Open Data For Journalists
Before we start
Wi-Fi = ODINET
Password: OpenData
Please download the latest Chrome at: http://tinyurl.com/install-google-chrome-now
Please set up a Google account if you don't have one already at: https://accounts.google.com/SignUp
Introductions
Ulrich Atz
Statistician, Open Data Institute
@statshero
Kathryn Corrick
Head of training, Open Data Institute
@kcorrick
Introductions
Your name
Where you've come from
Role
Your aims for the day
Agenda - Today
What is data?
Using data in journalism
Finding reliable data sources
Is there really a story in this data?
Using your data, including law and licensing
Cleaning and visualising your data
Common data analysis mistakes
Presenting your story
Special invited guest
WHAT IS DATA?
Discussion
In your groups, discuss: what is data to you?
http://theodi.github.io/data-definitions/
Open data is data that can be freely used, reused and redistributed by anyone, subject only, at most, to the requirement to attribute and share alike.
OpenDefinition.org
http://www.theguardian.com/news/datablog/2012/may/24/data-journalism-punk
http://www.ft.com/cms/s/0/4b1a2f64-2048-11e3-9a9a-00144feab7de.html
http://numeroteca.org/2011/12/05/surface-newspapers-front-pagesvs-twitter-nov30th-occupy-ows/
http://road.cc/content/news/93687-bikes-faster-public-transport-most-londonjourneys-under-8-miles
http://wheredoesmymoneygo.org/
http://openspending.org/
http://behindthewire.theglobalmail.org/
Case Study
http://smtm.labs.theodi.org/
An approach: the process of Data Percolation (also see: Data Journalism Handbook)
Source
Prepare
Analyse
We recommend you offer someone a coffee and find a second pair of eyes for your work.
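The Source, Prepare and Analyse steps can be pictured as three small functions chained together. This is a minimal Python sketch under stated assumptions: the CSV text, its column names (`borough`, `year`, `spend_gbp`) and all the figures are invented for illustration, and the function names are our own, not part of any tool or of the Data Journalism Handbook.

```python
import csv
import io
from collections import defaultdict

# Source: a hypothetical CSV extract (invented figures, for illustration only).
RAW = """borough,year,spend_gbp
Camden,2012,"1,200"
Camden,2013,1500
Hackney,2012,
Hackney,2013, 900
"""

def source(text):
    """Source: read the raw rows as dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))

def prepare(rows):
    """Prepare: trim whitespace, remove thousands separators, drop rows with no value."""
    clean = []
    for row in rows:
        value = row["spend_gbp"].strip().replace(",", "")
        if value:  # skip blanks rather than silently treating them as zero
            clean.append({
                "borough": row["borough"].strip(),
                "year": int(row["year"]),
                "spend_gbp": int(value),
            })
    return clean

def analyse(rows):
    """Analyse: total spend per borough."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["borough"]] += row["spend_gbp"]
    return dict(totals)

print(analyse(prepare(source(RAW))))
```

Keeping each step separate makes it easier for that second pair of eyes to check exactly what was dropped or changed during preparation.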
Discussion
What makes a trusted (data) source?
Database definition
A collection of independent works, data or other materials which are arranged in a systematic or methodical way and are individually accessible by electronic or other means
Databases
Copyright
Creative effort and substantial investment in the selection and presentation
Individual components of the database
Database rights
Substantial investment in obtaining, verifying and presenting the database
Rule of thumb
Do you have rights or permission to publish?
Do you have rights to use the information/data?
Is the data derived from other sources?
(see licensing)
Rule of thumb
Leaks and whistleblowing: get your editor and the legal team involved
Data Protection
Personal Data
Data Protection Act 1998
Data relating to a living identifiable person must be processed fairly and lawfully
Processing that is not immediately apparent to users (e.g. cookies) is subject to new laws and guidance
Damages are available to data subjects
Rule of thumb
Does this data contain personally identifiable data?
Could this data be combined with another dataset to create personally identifiable data?
Anonymisation is hard.
See:
http://www.scribd.com/doc/128356210/Business-considerationsfor-privacy-and-open-data-how-not-to-get-caught-out
http://www.scribd.com/doc/125638490/Getting-to-grips-with-theNational-Pupil-Database-personal-data-in-an-open-data-world
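Why is anonymisation hard? Removing names is not enough if a dataset keeps quasi-identifiers such as postcode, date of birth and sex, because these can be joined against another dataset that does contain names. This minimal Python sketch shows such a linkage; both datasets, the postcodes and the people in them are invented, and `link` is a hypothetical helper written for this example.

```python
# Two invented datasets. The "anonymised" one has no names, only quasi-identifiers.
anonymised_health = [
    {"postcode": "XX1 1AA", "birth_date": "1970-03-14", "sex": "F", "diagnosis": "asthma"},
    {"postcode": "XX2 2BB", "birth_date": "1985-11-02", "sex": "M", "diagnosis": "diabetes"},
]
public_register = [
    {"name": "Jane Example", "postcode": "XX1 1AA", "birth_date": "1970-03-14", "sex": "F"},
    {"name": "John Sample", "postcode": "XX3 3CC", "birth_date": "1985-11-02", "sex": "M"},
]

QUASI_IDENTIFIERS = ("postcode", "birth_date", "sex")

def key(record):
    """Combine the quasi-identifier fields into a single join key."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)

def link(anonymised, identified):
    """Join two datasets on quasi-identifiers and return re-identified records."""
    names_by_key = {}
    for person in identified:
        names_by_key.setdefault(key(person), []).append(person["name"])
    matches = []
    for record in anonymised:
        names = names_by_key.get(key(record), [])
        if len(names) == 1:  # exactly one candidate: the record is re-identified
            matches.append((names[0], record["diagnosis"]))
    return matches

print(link(anonymised_health, public_register))
```

Only one of the two combinations matches here, but in real data a combination of full postcode, date of birth and sex is often unique to a single person.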
http://www.nationalarchives.gov.uk/doc/open-government-licence/version/2/
Rule of thumb
If you are uncertain what rights you have over a piece of content or a dataset, or how you can use it:
Contact the owner. Ask.
http://paidcontent.org/2013/05/24/crowdsourcing-thenews-do-we-need-a-public-license-for-citizenjournalism/
FOIA tips
Sign up to 'What Do They Know?'
https://www.whatdotheyknow.com/
https://code.google.com/p/google-refine/
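Google Refine helps clean messy columns by clustering values that are probably the same thing typed differently. One of its clustering methods builds a "fingerprint" key for each value (trim, lowercase, strip punctuation and accents, sort the unique words) and groups values whose keys collide. The sketch below is a simplified Python approximation of that idea, not Refine's own code; the council names are an invented example.

```python
import re
import unicodedata
from collections import defaultdict

def fingerprint(value):
    """Simplified key in the spirit of Google Refine's 'fingerprint' clustering."""
    text = unicodedata.normalize("NFKD", value.strip().lower())
    text = text.encode("ascii", "ignore").decode("ascii")  # drop accents
    text = re.sub(r"[^\w\s]", "", text)                    # drop punctuation
    return " ".join(sorted(set(text.split())))             # unique words, sorted

def cluster(values):
    """Group values whose fingerprints collide; return only groups of two or more."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [group for group in groups.values() if len(group) > 1]

# Invented messy column, e.g. council names typed in by hand.
councils = ["Camden Council", "camden council ", "Council, Camden", "Islington Council", "Hackney"]
print(cluster(councils))
```

Clustering only suggests candidates: a journalist still has to decide whether each group really refers to the same thing before merging it.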
Percentages
Know the difference between a percentage and a percentage point.
VAT increased from 17.5% to 20% in January 2011.
This is a rise of 2.5 percentage points, not a rise of 2.5%.
How much would a rise of 2.5% actually be?
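The difference is easy to check with a few lines of arithmetic. This minimal Python sketch (function names are our own) computes the change in percentage points, the same change as a relative percentage, and what a genuine 2.5% rise on 17.5% would give.

```python
def percentage_point_change(old_rate, new_rate):
    """Absolute change between two rates, in percentage points."""
    return new_rate - old_rate

def percent_change(old_rate, new_rate):
    """Relative change between two rates, as a percentage of the old rate."""
    return (new_rate - old_rate) / old_rate * 100

def apply_percent_rise(rate, rise_percent):
    """The rate after a relative rise of `rise_percent` per cent."""
    return rate * (1 + rise_percent / 100)

old_vat, new_vat = 17.5, 20.0
print(percentage_point_change(old_vat, new_vat))      # 2.5 (percentage points)
print(round(percent_change(old_vat, new_vat), 1))     # 14.3 (% rise relative to 17.5%)
print(round(apply_percent_rise(old_vat, 2.5), 4))     # 17.9375 (VAT rate after a 2.5% rise)
```

So the move from 17.5% to 20% was a rise of about 14.3% in the rate itself, while a true 2.5% rise would only have taken VAT to about 17.94%.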
Averages
Where is the mode?
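"Average" can mean the mean, the median or the mode, and on skewed data they tell different stories. This minimal Python sketch uses the standard-library `statistics` module on an invented, right-skewed list (think salaries in £000s with one very high earner).

```python
import statistics

# Invented, right-skewed figures (e.g. salaries in £000s), for illustration only.
salaries = [18, 21, 21, 21, 24, 26, 29, 35, 48, 150]

print(statistics.mean(salaries))    # 39.3, pulled up by the single outlier
print(statistics.median(salaries))  # 25.0, the middle value
print(statistics.mode(salaries))    # 21, the most common value
```

Only one person in this list earns more than the mean, so reporting it as the "average salary" would mislead; say which average you are using.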
Map projections
Mercator projection
Kavrayskiy VII
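Projections matter because they change how big places look. On a sphere, Mercator maps latitude φ to y = ln(tan(π/4 + φ/2)) and inflates areas by a factor of sec²φ, so high-latitude countries look far larger than they are. Kavrayskiy VII is a compromise projection with x = (3λ/2)·√(1/3 − (φ/π)²) and y = φ. This minimal Python sketch assumes a unit sphere and angles in degrees at the interface; the function names are our own.

```python
import math

def mercator(lon_deg, lat_deg):
    """Spherical Mercator on a unit sphere; returns (x, y)."""
    lam, phi = math.radians(lon_deg), math.radians(lat_deg)
    return lam, math.log(math.tan(math.pi / 4 + phi / 2))

def mercator_area_scale(lat_deg):
    """Factor by which Mercator inflates areas at a given latitude: sec^2(latitude)."""
    return 1 / math.cos(math.radians(lat_deg)) ** 2

def kavrayskiy_vii(lon_deg, lat_deg):
    """Kavrayskiy VII on a unit sphere; returns (x, y)."""
    lam, phi = math.radians(lon_deg), math.radians(lat_deg)
    return 1.5 * lam * math.sqrt(1 / 3 - (phi / math.pi) ** 2), phi

# How much bigger things look on Mercator as you move away from the equator.
for lat in (0, 30, 51.5, 60, 80):
    print(f"{lat:>5} deg: Mercator area scale x{mercator_area_scale(lat):.2f}")
```

At 60 degrees latitude the area scale is exactly 4, so a region there looks four times larger on Mercator than the same area at the equator.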
XKCD
http://www.scotland.gov.uk/Topics/Statistics/16002/DataTrendsInternet
Improved version
Data sources and links to your analysis
How they can be used and reused by others (licences)
http://globalnews.ca/news/622513/opendata-alberta-oil-spills-1975-2013/
Special guest
Nick Scott
Import.io
http://www.bbc.co.uk/podcasts/series/moreorless
Thank you!
Further reading and links: http://bit.ly/odidjlinks
The slides will be made available online after the course