GEO 802, Data Information Literacy
Fall 2020 – Lecture 2
Gary Seitz, MA
Lesson 2 Outline
Portals for data publication
Data repositories
Discipline-related repositories
Open data from organizations
Luis Prado from The Noun Project
Re3data
Registry of Research Data Repositories
Check the Registry of Research Data Repositories (www.re3data.org):
• Can you find data repositories in your field?
• List five data repositories where you think you could find data for your thesis.
Exercise 2.1
Open Access Directory: Data Repositories
Launched in 2008 and hosted by the Graduate School of Library and
Information Science at Simmons College, the Open Access Directory is
a wiki that lists links to over 50 open data repositories in the
disciplines of archaeology, biology, chemistry, environmental
sciences, geology, geosciences and geospatial data, marine sciences,
medicine and physics, as well as multidisciplinary open data
repositories.
Directories of repositories
Zenodo
• A research data repository. It was created by OpenAIRE and CERN to
provide a place for researchers to deposit datasets
• Has integration with GitHub to make code hosted in GitHub citable
• Provides secure archiving and referability, including digital object
identifiers (DOIs)
• Easy access
• Disadvantage: No curation, no quality control
Data repositories
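A deposit in Zenodo can also be searched for programmatically. The following is a minimal sketch, assuming Zenodo's public REST endpoint `https://zenodo.org/api/records` and its `q`, `type` and `size` query parameters; the helper name `zenodo_search_url` is hypothetical, and the sketch only builds the request URL rather than sending it.

```python
from urllib.parse import urlencode

# Assumption: the public Zenodo records endpoint (not taken from the slides).
ZENODO_RECORDS_API = "https://zenodo.org/api/records"


def zenodo_search_url(query, record_type="dataset", size=10):
    """Build a search URL for Zenodo records matching a free-text query."""
    params = {"q": query, "type": record_type, "size": size}
    return f"{ZENODO_RECORDS_API}?{urlencode(params)}"


if __name__ == "__main__":
    # Print the URL; open it in a browser or pass it to any HTTP client.
    print(zenodo_search_url("permafrost temperature", size=5))
```

Because Zenodo applies no curation, it is worth checking the metadata of each hit before relying on a dataset.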
• International repository of data underlying scientific and medical
publications
• All data files are associated with a published article, and are made
available for reuse under the terms of a Creative Commons Zero waiver
• Began charging submission fees in September 2013
• Data in Dryad receives a permanent, unique Digital object identifier (DOI)
Dryad
Data repositories
http://datadryad.org/
• For datasets associated with publications only
• $80 per data package, unless the journal sponsors the submission
• Discipline agnostic
• Some integration with journals
• Metadata: Dublin Core (DC)
A look at Dryad
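The charges quoted on this slide and in the editor's note on Dryad ($80 per data package, plus $15 for the first GB beyond 10 GB and $10 for each GB thereafter) can be worked through as a small calculation. This is only a sketch of the pricing as quoted in 2014; the function name `dryad_fee`, the rounding of partial gigabytes up to whole gigabytes, and the treatment of a sponsored submission as free for the submitter are all assumptions.

```python
import math

BASE_FEE = 80        # USD per data package, as quoted in the notes
INCLUDED_GB = 10     # size covered by the base fee
FIRST_EXTRA_GB = 15  # USD for the first GB beyond 10 GB
NEXT_EXTRA_GB = 10   # USD for each GB thereafter


def dryad_fee(size_gb, sponsored=False):
    """Estimate the submitter's charge in USD for one data package."""
    if sponsored:
        # Assumption: a sponsoring journal covers the whole charge.
        return 0
    # Assumption: a partial gigabyte is billed as a whole gigabyte.
    extra_gb = max(0, math.ceil(size_gb - INCLUDED_GB))
    if extra_gb == 0:
        return BASE_FEE
    return BASE_FEE + FIRST_EXTRA_GB + NEXT_EXTRA_GB * (extra_gb - 1)


if __name__ == "__main__":
    for size in (2, 10, 11, 13):
        print(size, "GB:", dryad_fee(size), "USD")
```

For a 13 GB package the sketch gives 80 + 15 + 2 × 10 = 115 USD.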
figshare
• Repository for data and files (figures, datasets, images, audio and video, articles
(including preprints), posters, software and file sets)
• Advantages: items are assigned a DOI; researchers can publish negative data;
download statistics are tracked for hosted materials, which in turn serve as a
source for altmetrics; partnership with PLOS
• Disadvantage: operated by Macmillan (Nature)
Data repositories
http://figshare.com/
A look at figshare
Browsing in figshare
Data repositories
GitHub
GitHub is a web-based hosting service for Git version control
repositories. It offers all of the distributed version control and
source code management (SCM) functionality of Git and adds its own
features. It provides access control and several collaboration
features, such as bug tracking, feature requests, task management,
and wikis, for every project.
Protocols.io
An up-to-date open access repository of science methods and a
collaborative, protocol-centered platform for finding and sharing
life science protocols.
DataCite
DataCite is a leading global non-profit organisation that provides
persistent identifiers (DOIs) for research data and other research
outputs. Organizations within the research community join DataCite
as members to be able to assign DOIs to all their research outputs.
Data repositories
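Since DOIs are the persistent identifiers these services assign, it helps to know how a DOI becomes a resolvable link. Below is a minimal sketch, assuming the public resolver at `https://doi.org/` and a commonly used (not official) pattern for DOI syntax; the helper names are hypothetical and the example DOI is a made-up placeholder, not a reference to a real record.

```python
import re
from urllib.parse import quote

# Assumption: a widely used heuristic for DOI syntax, not an official grammar.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")
KNOWN_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(raw):
    """Strip resolver or 'doi:' prefixes and check the basic DOI shape."""
    doi = raw.strip()
    for prefix in KNOWN_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a recognisable DOI: {raw!r}")
    return doi


def doi_url(raw):
    """Return the resolver URL for a DOI given in any common notation."""
    return "https://doi.org/" + quote(normalize_doi(raw), safe="/")


if __name__ == "__main__":
    # Placeholder DOI for illustration only.
    print(doi_url("doi:10.5281/zenodo.1234567"))
```

Citing the resolver URL rather than a repository's landing-page address keeps the reference valid if the repository reorganises its website.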
ROAR
The aim of ROAR is to promote the development of open access by
providing timely information about the growth and status of
repositories throughout the world.
ICSU World Data System
WDS aims to facilitate scientific research by coordinating and
supporting trusted scientific data services for the provision, use,
and preservation of relevant datasets, while strengthening their
links with the research community.
Research Data Australia
Research Data Australia helps you find, access, and reuse data for
research from over one hundred Australian research organisations,
government agencies, and cultural institutions.
Data repositories
Try to find data for your thesis in these repositories
Exercise 2.2
Repository              | Results | Remarks
Zenodo                  |         |
Dryad                   |         |
Figshare                |         |
GitHub                  |         |
Research Data Australia |         |
ROAR                    |         |
ICSU World Data System  |         |
Ecology
• Long Term Ecological Research (LTER): https://portal.lternet.edu/nis/home.jsp
• EcoTrends: http://www.ecotrends.info/
• Ecological Society of America (ESA) Data Registry and Archive:
http://data.esa.org/esa/style/skins/esa/index.jsp
• Knowledge Network for Biocomplexity (KNB):
https://knb.ecoinformatics.org/index.jsp
• Oceanographic Data Repositories: access to several oceanographic data
repositories created by the US Joint Global Ocean Flux Study and US
Global Ocean Ecosystem Dynamics programs
• Global Biodiversity Information Facility (GBIF): http://www.gbif.org/
Discipline-related repositories
Life and Biological Sciences
• Biogeographic Information and Observation System (BIOS)
• Protein Data Bank: experimentally determined structures of
macromolecules (proteins and nucleic acids); the site includes search
and visualization tools
• TreeBASE: http://treebase.org/treebase-web/home.html
Discipline-related repositories
Environmental and Geosciences
• Marine Geoscience Data System (MGDS): a data portal hosted at the
Lamont-Doherty Earth Observatory (Columbia University)
• National Climatic Data Center (NCDC): meteorology and paleoclimatology
• National Oceanographic Data Center (NODC): worldwide marine
environmental and ecosystem data
• National Snow and Ice Data Center (NSIDC): cryospheric datasets from
ground-based field research and satellites
• DataONE (Data Observation Network for Earth):
https://search.dataone.org/data
• Kompetenzzentrum Forschungsdaten: https://www.komfor.net/data-portal.html
• Polar Data Catalogue: https://www.polardata.ca/
Discipline-related repositories
Environmental and Geosciences
• DASH (University Corporation for Atmospheric Research & National
Center for Atmospheric Research)
• WDC (World Data Center for Climate):
https://cera-www.dkrz.de/WDCC/ui/cerasearch/
• Climate Data at the National Center for Atmospheric Research:
https://www.earthsystemgrid.org/home.html
• ENES (European Network for Earth System Modelling):
https://verc.enes.org/
• EarthChem: operates and maintains a suite of data systems and data
collections that provide access to a wide variety of solid earth data
Discipline-related repositories
Environmental and Geosciences
• Atmospheric radiation measurement data: focuses on obtaining
continuous measurements and providing data products that promote the
advancement of climate models
• CUAHSI: a list of web portals and websites with data, or links to
data, on water resources. The portals generally provide data that are
at a minimum national in scope, and many offer global data.
• British Atmospheric Data Centre (BADC): data centre for the
atmospheric sciences
• KNMI Climate Explorer: http://climexp.knmi.nl/
• USGS Water Data for the Nation: https://waterdata.usgs.gov/nwis
• PANGAEA, Data Publisher for Earth & Environmental Science:
http://www.pangaea.de/
Discipline-related repositories
GIS and Geography
• GeoCommons.com: GIS file repository and finding tool
• Federal Geographic Data Committee: provides access to the National
Spatial Data Infrastructure (NSDI) Clearinghouse Network and the
geodata.gov portal
• INSPIRE Geoportal (http://inspire-geoportal.ec.europa.eu/): the
central European access point to the data provided by EU Member States
and several EFTA countries under the INSPIRE Directive
• Geoportal.de, geodata from Germany: http://www.geoportal.de/
• Geodatenkatalog (geodata catalogue): https://wiki.gdi-de.org/display/gdk
Discipline-related repositories
Remote Sensing
• GEOSS data portal: http://www.geoportal.org
• CEOS: data and tools of the Committee on Earth Observation Satellites
Discipline-related repositories
Chemistry
• ORNL DAAC for Biogeochemical Dynamics: the Oak Ridge National
Laboratory Distributed Active Archive Center for biogeochemical
dynamics is one of the NASA Earth Observing System Data and Information
System (EOSDIS) data centers
• Cambridge Structural Database: small-molecule crystal structures
• ChemSpider: free-to-access collection of chemical structures and
their associated information
• eCrystals: X-ray crystallographic data
• PubChem: NCBI's repository of bioactivity/bioassay data and
information for "small" molecules (i.e., not macromolecules). Both
text-based and structure-based search tools are provided.
Discipline-related repositories
Social Sciences
• ICPSR (Inter-university Consortium for Political and Social
Research) at the University of Michigan
• Dataverse Network: a collection of social science research data
contained in virtual data archives called "dataverses"
• FORS: Swiss Centre of Expertise in the Social Sciences. FORS conducts
large national and international surveys and offers data and research
information services to researchers and academic institutions.
• SSOAR: Social Science Open Access Repository
Discipline-related repositories
Exercise 2.3
Look through discipline-related repositories in your field.
• Have a close look at the records to see how the repositories have
made them discoverable and accessible. List positive and negative
aspects of searching in those repositories.
• Can you already find data that you could use?
Save one dataset that you might be able to use.
Discipline-related repositories
DataSearch
As of June 2016, DataSearch indexes (completely or partially) the following content
sources:
a) Tables, figures and supplementary data associated with papers in ScienceDirect and arXiv
b) EarthChem Portal, Dryad, ICPSR, Harvard Dataverse, Mendeley Data, NeuroElectro,
PANGAEA and ThermoML
Data search engines
Google Dataset Search
Data search engines
How well does Google Dataset Search work, given your knowledge of and
experience with the data repositories you have looked at?
Exercise
How well do these two search engines work, given your knowledge of and experience
with the data repositories you have looked at?
Can you find again the data you retrieved from the repositories?
Data search engines
Earth System Science Data
Data papers & data journals
Geoscience Data Journal
Data papers & data journals
Biodiversity Data Journal
Data papers & data journals
Nature Scientific Data
Data papers & data journals
Journal of Open Psychology Data
Data papers & data journals
Journal of Open Research Software
Data papers & data journals
Geoscientific Model Development
Data papers & data journals
CODATA Data Science Journal
The CODATA Data Science Journal is a peer-reviewed, open access, electronic journal, publishing papers
on the management, dissemination, use and reuse of research data and databases across all research
domains, including science, technology, the humanities and the arts.
Data papers & data journals
Data in Brief
Data papers & data journals
Sciencematters.io
Data papers & data journals
Exercise 2.4
A list of further data journals is here:
https://www.wiki.ed.ac.uk/display/datashare/Sources+of+dataset+peer+review
Data papers & data journals
Have a look at the “About” page of one of these journals.
1. What didn’t you expect to see?
2. What do you think is the advantage of publishing your
data in a dedicated data journal?
3. What could be the advantage for the progress of
science and for the public?
Data Citation Index (I)
Data Citation Index (II)
• Started in October 2012
• About 380 repositories are covered in the DCI
• Cross-disciplinary, with a main focus on science; 50% of the
data are from medicine
• Linked with the bibliographic record in Web of
Science
• Links peer-reviewed articles with the underlying
research data
• Uniform metadata schema
Data Citation Index (III)
Data Citation Index – Descriptive Document
Data Citation Index: Search
- plasma membrane protein*
Result list
A Dataset with link to its source
GLOBAL OPEN DATA INDEX
UNdata
Google Public Data
UN Comtrade Database
Demographic Yearbook
Migration Data Catalogue
Millennium Development Goals Indicators
Monthly Bulletin of Statistics Online
ServiceTrade Statistics
Social Indicators
World Bank Data
How to get data from the world bank data portal
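The slide does not spell out the steps, so the following is only one possible route: a minimal sketch that builds a request URL for the World Bank's public Indicators API (version 2), assuming its `country/{code}/indicator/{id}` path and its `format`, `date` and `per_page` parameters. The helper name `indicator_url` is hypothetical; `SP.POP.TOTL` (total population) and the country code `CHE` are used purely as examples.

```python
from urllib.parse import urlencode

# Assumption: base address of the World Bank Indicators API, version 2.
WORLD_BANK_API = "https://api.worldbank.org/v2"


def indicator_url(country, indicator, start_year, end_year, per_page=100):
    """Build a URL that requests one indicator for one country as JSON."""
    params = {
        "format": "json",
        "date": f"{start_year}:{end_year}",
        "per_page": per_page,
    }
    # Keep ':' readable in the date range instead of percent-encoding it.
    query = urlencode(params, safe=":")
    return f"{WORLD_BANK_API}/country/{country}/indicator/{indicator}?{query}"


if __name__ == "__main__":
    # Total population of Switzerland, 2000 to 2010.
    print(indicator_url("CHE", "SP.POP.TOTL", 2000, 2010))
```

The same indicators can also be downloaded as spreadsheets from the portal's web interface, which may be simpler for a one-off lookup.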
Eurostat
opendata.swiss
Data Portals
datahub
https://www.ons.gov.uk/
https://www.ordnancesurvey.co.uk/business-and-government/products/opendata-products.html
https://data.gov.uk/
Exercise 2.5
1. Have a look at the Web of Science Data Citation Index. In what
respect could this database be useful for your master's
thesis? Can you find something you can use?
2. Choose one or two of the open statistics sites that you would
like to look at more closely, just out of interest or for your
private life.
data.ac.uk/
“A landmark site for academia providing a single point of
contact for linked open data development.”
It not only provides access to the know-how and tools to discuss and
create linked data and data aggregation sites, but also enables access
to, and the creation of, large aggregated data sets providing powerful
and flexible collections of information.
Finding, Citing, Documenting
Tips for searching for data (from the
Data Journalism Handbook)
• When searching for data, make sure that you
include both search terms relating to the
content of the data you’re trying to find as
well as some information on the format or
source that you would expect it to be in.
• Google and other search engines allow you to
search by file type.
http://datajournalismhandbook.org/1.0/en/getting_data_0.html
For example, you can look only for…
• Spreadsheets (by appending your search with ‘filetype:XLS
filetype:CSV’)
• Geodata (‘filetype:shp’)
• Database extracts (‘filetype:MDB, filetype:SQL, filetype:DB’).
• PDFs (‘filetype:pdf’).
• You can also search by part of a URL. Googling for
‘inurl:downloads filetype:xls’ will try to find all Excel files that
have “downloads” in their web address.
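The operators listed above can be combined mechanically. The sketch below is one way to do so, with a hypothetical helper `build_query`; joining several `filetype:` filters with `OR` is an assumption about how the search engine treats alternatives, and the Google search address is used only to show how the finished query would be passed along.

```python
from urllib.parse import urlencode


def build_query(terms, filetypes=(), inurl=None, site=None):
    """Combine content terms with file-type, URL and domain operators."""
    parts = [terms.strip()]
    if filetypes:
        # Assumption: alternatives are joined with OR so any listed type matches.
        parts.append(" OR ".join(f"filetype:{ft.lower()}" for ft in filetypes))
    if inurl:
        parts.append(f"inurl:{inurl}")
    if site:
        parts.append(f"site:{site}")
    return " ".join(parts)


def search_url(query, base="https://www.google.com/search"):
    """Return a search-engine URL that carries the query string."""
    return f"{base}?{urlencode({'q': query})}"


if __name__ == "__main__":
    q = build_query("glacier mass balance", filetypes=["xls", "csv"], inurl="downloads")
    print(q)
    print(search_url(q))
```

Keeping the content terms separate from the operators makes it easy to rerun the same topic search for spreadsheets, geodata or PDFs in turn.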
• Another popular trick is not to search for content
directly, but for places where bulk data may be
available. (If you find a single download, it’s often
worth checking what other results exist for the same
folder on the web server.)
• You can also limit your search to only those results
on a single domain name, by searching for, e.g.,
‘site:agency.gov’.
• For example, ‘site:agency.gov Directory Listing’ may
give you some listings generated by the web server
with easy access to raw files, while ‘site:agency.gov
Database Download’ will look for intentionally created
listings.
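The two tricks above, checking the folder that contains a known download and restricting a search to one domain, can be sketched as follows. The helper names `parent_folder` and `site_query` are hypothetical, and `agency.gov` is the placeholder domain from the slide, not a real data source.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def parent_folder(url):
    """Return the URL of the folder that contains the given file."""
    parts = urlsplit(url)
    folder = posixpath.dirname(parts.path)
    if not folder.endswith("/"):
        folder += "/"
    # Drop any query string or fragment; only the folder address is wanted.
    return urlunsplit((parts.scheme, parts.netloc, folder, "", ""))


def site_query(url, phrase):
    """Build a search restricted to the domain that served the file."""
    host = urlsplit(url).hostname
    return f"site:{host} {phrase}"


if __name__ == "__main__":
    found = "https://agency.gov/data/2019/rain.csv"
    print(parent_folder(found))
    print(site_query(found, "Directory Listing"))
    print(site_query(found, "Database Download"))
```

Whether the folder address actually shows a listing depends on how the web server is configured, so a failed request there says nothing about the data itself.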


Editor's Notes

  1. This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License. Please cite this work as “Whitmire, Amanda L. (2014). Research Data Management Curriculum, Lecture 15: Plan for Archiving and Preservation of Data. Oregon State University Libraries. Retrieved [date] from: http://guides.library.oregonstate.edu/grad521lectures.” Slides attributed to the UKDA have the following citation: “Research Data Management Team, UK Data Archive, University Of Essex (2012). Managing and Sharing Data: Training Resources. UK Data Service. Retrieved 29 May, 2012 from: http://data-archive.ac.uk/media/335419/trainingresources.zip.” Slides attributed to the DCC have the following citation: “Whyte, A. & Wilson, A. (2010). "How to Appraise and Select Research Data for Curation". DCC How-to Guides. Edinburgh: Digital Curation Centre. http://www.dcc.ac.uk/resources/how-guides.“ Slides attributed to DataONE have the following citation: “DataONE Education Module: Protecting Your Data: Backups, Archives, and Data Preservation. DataONE. Retrieved Jan. 5, 2014. From http://www.dataone.org/sites/all/documents/L06_DataProtectionBackups.pptx.”
  2. Image credit: Surveying by Luis Prado from The Noun Project
  3. “All journals with either integrated data submission or sponsored Data Publishing Charges are listed below. Our institutional sponsors fund submissions by their affiliated researchers. Note that for large data packages, submitters will be asked to pay an additional $15 for the first GB beyond 10GB and $10 for each GB thereafter.” I counted on 2/19/2014, and there were 64 journals listed as being integrated and/or sponsored entities. “$80 per data package, payable by the submitter”
  4. About figshare: free unlimited storage; takes files of any format; all items receive DOIs; all content is CC0 (data) or CC-BY (other material); reciprocal linking with PLOS, Nature, Faculty of 1000, more…; metadata: no rules or constraints (a can of worms…)
  5. Content under the “Health Sciences” category: formal datasets, posters, a blog post (?!), a citizen science dataset, PowerPoint presentation, etc.
  6. We have completely or partially indexed the following: Dryad; EarthChem Portal from The Interdisciplinary Earth Data Alliance (IEDA): Geochemistry of Rocks of the Oceans and Continents (GEOROC), MetPetDB, The North American Volcanic and Intrusive Rock Database (NAVDAT), PetDB, U.S. Geological Survey (USGS) Mineral Resources National Geochemical Database (MR NGDB); Harvard Dataverse; The Inter-university Consortium for Political and Social Research (ICPSR); Mendeley Data; NeuroElectro; PANGAEA; ThermoML from the Thermodynamics Research Center (TRC) at the National Institute of Standards and Technology (NIST). Metadata from: 4TU.Centre of Research Data; Apollo (University of Cambridge); DataSpace (Princeton University); DSpace (University of Washington); LSHTM Data Compass (London School of Hygiene & Tropical Medicine); Médecins Sans Frontières (MSF); Smithsonian; Zenodo. Tables, figures and supplementary data associated with papers from: arXiv; ScienceDirect.
  7. What is a data paper? http://guides.library.oregonstate.edu/data-management-data-papers-journals “Data papers facilitate the sharing of data in a standardized framework that provides value, impact, and recognition for authors. Data papers also provide much more thorough context and description than datasets that are simply deposited to a repository (which may have very minimal metadata requirements).” “Data papers thoroughly describe datasets, and do not usually include any interpretation or discussion (an exception may be discussion of different methods to collect the data, e.g.).” Earth System Science Data (ESSD) is an international, interdisciplinary journal for the publication of articles on original research data(sets), furthering the reuse of high (reference) quality data of benefit to Earth System Sciences. The editors encourage submissions on original data or data collections which are of sufficient quality and potential impact to contribute to these aims.
  8. Geoscience Data Journal provides an Open Access platform where scientific data can be formally published, in a way that includes scientific peer-review. Thus the dataset creator attains full credit for their efforts, while also improving the scientific record, providing version control for the community and allowing major datasets to be fully described, cited and discovered. An online-only journal, GDJ publishes short data papers cross-linked to, and citing, datasets that have been deposited in approved data centres and awarded DOIs. The journal will also accept articles on data services, and articles which support and inform data publishing best practices. Scope: a broad range of geoscience disciplines, including, but not limited to: Weather and Climate; Oceanography; Atmospheric and Ocean Chemistry; Cryosphere; Biosphere; Land Surface and Geology; Hydrology; Geochemistry; Geophysics; Planetary and Space Sciences.
  9. Biodiversity Data Journal (BDJ) is a community peer-reviewed, open-access, comprehensive online platform, designed to accelerate publishing, dissemination and sharing of biodiversity-related data of any kind. All structural elements of the articles (text, morphological descriptions, occurrences, data tables, etc.) will be treated and stored as DATA, in accordance with the Data Publishing Policies and Guidelines of Pensoft Publishers. Scope: taxonomic, floristic/faunistic, morphological, genomic, phylogenetic, ecological or environmental data.
  10. Scientific Data is a peer-reviewed, open-access journal for descriptions of scientifically valuable datasets, and research that advances the sharing and reuse of scientific data. Scope: a broad range of natural science disciplines, including, but not limited to, data from the life, biomedical and environmental sciences.
  11. The Journal of Open Psychology Data (JOPD) features peer reviewed data papers describing psychology datasets with high reuse potential. Data papers may describe data from unpublished work, including replication research, or from papers published previously in a traditional journal. Any kind of psychology data is acceptable, including from correlational, descriptive and experimental research, e.g. case studies, computer simulations, experimental results, interviews and surveys, neuroimaging data, etc.
  12. geoscientific model descriptions, from statistical models to box models to GCMs; new parameterizations or technical aspects of running models such as the reproducibility of results; new methods for assessment of models, including work on developing new metrics for assessing model performance and novel ways of comparing model results with observational data; papers describing new standard experiments for assessing model performance or novel ways of comparing model results with observational data; model experiment descriptions, including experimental details and project protocols; full evaluations of previously published models.
  13. The model was the cooperation between Pangaea and Elsevier. Subject focus: 80% natural sciences, 18% social sciences, 2% humanities.
  14. The categories are to be understood hierarchically: repositories contain data studies, and data studies contain datasets [??]. Pure institutional publication servers are not included, nor are metadata catalogues that point to data archived elsewhere.