The Australian Data Archive
- bringing the 19th and 20th
centuries into the 21st
Dr. Steve McEachern
Deputy Director, ADA

Future Perfect Conference
March 2012
Presentation
Overview
1.    About ADA
2.    Structure of the ADA
3.    ADA Deposit and Ingest
4.    Data access
5.    Data visualisation
6.    Infrastructure
7.    Current activities and future directions
1. About ADA
ADA in Brief
 •  The Social Science Data Archive (now ADA) was set up
 in 1981, housed in the Research School of Social
 Sciences, Australian National University, with a mission to
 collect and preserve Australian social science data on
 behalf of the social science research community
 •  Now includes nodes at University of Melbourne,
 University of Queensland, University of Western Australia,
 University of Technology Sydney, with infrastructure
 provided by the ANU Supercomputer Facility
 •  The Archive holds some 2400 data sets, including
 national election studies; public opinion polls; social
 attitudes surveys. Data holdings are sourced from
 academic, government and private sectors.
ADA NCRIS/NeAT development
The original research community needs identified by the ASSDA Advisory
   Panel to be addressed by the ASeSS project were as follows:
1.  A coherent single point of access for nationally significant social
    science and associated humanities resources, including access for
    researchers, students, government bodies, and other external
    agencies.
2.  Reliable access to the major national social surveys.
3.  Management of a diverse range of data forms needed to help answer
    research questions across these different forms: eg: unit record data,
    qualitative data, economics data, including a high level of data
    documentation that allows researchers to quickly identify its relevance
    and quality for research purposes.
4.  Easy access to specialised collections, eg: topic based data, such as
    data relating to ageing; colonial data; indigenous data.
5.  Fast search across all of this data.
6.  Easy access to data analysis tools, including the development of
    advanced analytical and visualisation tools and capability (outside of
    commercially available products) that provide additional value to the
    data archives and support the ‘unlocking’ of otherwise inaccessible data
    sets of national significance.
7.  Computational modelling, expertise and resources including
    computationally expensive statistical packages.
ADA Subarchives

•  Social Science – predominantly survey or polling based
   quantitative social science data
•  Historical – an archive of Australian census data tables
   from 1834 to the present day
•  Indigenous – A thematic archive bringing together
   research data about Aboriginal and Torres Strait Islanders
•  Longitudinal – major longitudinal cohort and panel surveys
   of the Australian population
•  Qualitative – a new collection which provides specialist
   data archiving and access services to qualitative
   researchers
•  Crime & Justice – major collections of data in crime, law
   and justice, including criminal justice administrative data
•  International – a central point of access for links to
   international data sources around the world
Structure of the ADA
Approach

•  Core archive website:
   –  http://www.ada.edu.au
•  Sub-archives focussed on specialised thematic or
   methodological areas
   -  eg. http://www.ada.edu.au/indigenous/home
•  “Add-on” systems for complex analysis or
   visualisation tasks:
   –    Nesstar
   –    GIS: http://gis-test.ada.edu.au
   –    Longitudinal visualisation: Panemalia
   –    Historical census data: http://hccda.ada.edu.au
New OAIS architecture
The ADA website
ADA Deposit and Ingest
Data deposit: ADAPT
Archival processing

Manual system with some automation tools
1.  Deposit:
   –  Review of ADAPT submission
   –  Storage via ADAPT to file store
2.  Data processing:
   –  File format conversion (usually to SPSS for processing)
   –  Privacy/confidentiality review
   –  Data cleaning (in consultation with depositor)
3.  Metadata processing:
   –    DDI-C metadata creation in Nesstar Publisher
4.  Publishing:
   –    Archival storage and access format creation
   –    Data publication to Nesstar server
   –    Metadata publication to Nesstar and ADA CMS
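
The four stages above can be sketched as an ordered checklist. This is only an illustration of the sequence on the slide, not ADA's actual tooling: the Study class, its study_id field and the stage-tracking logic are assumptions made for the example.

```python
from dataclasses import dataclass, field

# Stage and step names paraphrase the slide; the data structure itself is hypothetical.
STAGES = {
    "deposit": ["review ADAPT submission", "store via ADAPT to file store"],
    "data processing": [
        "convert file format (usually to SPSS)",
        "privacy/confidentiality review",
        "data cleaning with depositor",
    ],
    "metadata processing": ["create DDI-C metadata in Nesstar Publisher"],
    "publishing": [
        "create archival and access formats",
        "publish data to Nesstar server",
        "publish metadata to Nesstar and ADA CMS",
    ],
}


@dataclass
class Study:
    study_id: str  # hypothetical identifier, not an ADA study number
    completed: list = field(default_factory=list)

    def next_stage(self):
        """Return the first stage not yet completed, or None when all are done."""
        for stage in STAGES:
            if stage not in self.completed:
                return stage
        return None

    def complete(self, stage):
        """Mark a stage as done, refusing to skip ahead in the sequence."""
        expected = self.next_stage()
        if stage != expected:
            raise ValueError(f"expected {expected!r}, got {stage!r}")
        self.completed.append(stage)
```

Enforcing the order in complete() mirrors the point that, for example, publishing should not happen before the privacy/confidentiality review.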
Data Access
Finding data

There are two methods for finding data in the Australian
   Data Archive:
•  Browsing the ADA Data Catalogue
•  Searching for data using the ADA search engine



Searching or browsing from within one of the ADA
  subarchives automatically limits the results to data
  from within that subarchive.
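
A minimal sketch of how a search might be scoped to a subarchive, as described above. The CatalogueEntry class, the search function and the sample titles are all made up for illustration; the slides do not describe how the ADA search engine is implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogueEntry:
    title: str
    subarchive: str


def search(catalogue, term, subarchive=None):
    """Case-insensitive title search.

    When subarchive is given (i.e. the user is searching from within a
    subarchive), results are limited to entries from that subarchive.
    """
    term = term.lower()
    return [
        entry
        for entry in catalogue
        if term in entry.title.lower()
        and (subarchive is None or entry.subarchive == subarchive)
    ]


# Placeholder entries, not real ADA holdings.
CATALOGUE = [
    CatalogueEntry("Example election survey", "Social Science"),
    CatalogueEntry("Example panel survey", "Longitudinal"),
    CatalogueEntry("Example court records", "Crime & Justice"),
]
```

Calling search(CATALOGUE, "survey") searches the whole archive, while search(CATALOGUE, "survey", subarchive="Longitudinal") returns only the longitudinal entry.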
Search results
Browsing the catalogue
The ADA study page


Study information is available through the tabs at the top of the
   study:
•  Study: information including the investigators, abstract,
   sample, data collection methods, and access requirements.
•  Variables: a list of variables available in a quantitative dataset
•  Related Materials: additional documentation, links and other
   related studies (eg. others in the series) that may interest you
The study page is also the access point for the ADA Nesstar
   system, for:
•  Analysis of quantitative data online,
•  Download of data to your own computer.
The ADA Study Page
Data visualisation at ADA
Data visualisation

•  Interest in the use of data visualisation methods to
   explore survey data through web-based tools;
•  Used open-source tools and open standards such as the
   OGC WMS for web maps delivery, and Panemalia parallel
   coordinates plot software
•  GIS capability has had implications for the entire data
   workflow for archiving of survey data.
   –  Design of surveys to incorporate the accurate recording of
      geospatial identifiers,
   –  Maintaining confidentiality of geo-located respondents'
      information to prevent identification by unauthorised users
   –  Allowing researchers access to the data in new and powerful
      ways.
•  Longitudinal tool revealed new requirements for
   metadata, which varies in quality and requires further
   preprocessing
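
As an illustration of the OGC WMS standard mentioned above, the sketch below builds a WMS 1.3.0 GetMap request URL. The query parameters are those defined by the WMS specification; the endpoint URL and layer name are placeholders, since the slides do not give the actual service path or layer names of the ADA GIS server.

```python
from urllib.parse import urlencode


def getmap_url(base_url, layer, bbox, width, height):
    """Build a WMS 1.3.0 GetMap URL.

    bbox is (min_lon, min_lat, max_lon, max_lat); CRS:84 is used so that
    the axis order is longitude, latitude.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",  # empty value requests the server's default style
        "CRS": "CRS:84",
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
    }
    return base_url + "?" + urlencode(params)


# Placeholder endpoint and layer; the bounding box roughly covers Australia.
url = getmap_url(
    "http://example.org/wms",
    "example_layer",
    (112.0, -44.0, 154.0, -10.0),
    800,
    600,
)
```

Because the request is a plain HTTP GET returning an image, any web mapping client that speaks WMS can display the layer without custom server code.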
GIS visualisation

                    http://gis.ada.edu.au
Longitudinal visualisation –
parallel coordinate plots
Longitudinal visualisation
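
A parallel coordinates plot draws one vertical axis per variable (or survey wave) and one polyline per respondent. A common preparatory step, sketched below, is to rescale every axis to the range 0 to 1 so that variables with different units can share a plot. This is a generic illustration with made-up numbers; it does not describe how the ADA longitudinal tool prepares its data.

```python
def scale_axes(rows):
    """Rescale each column of rows to [0, 1] using min-max scaling.

    rows is a list of equal-length numeric records (one per respondent).
    A column with no variation is placed at 0.5, the middle of its axis.
    """
    scaled_columns = []
    for column in zip(*rows):
        low, high = min(column), max(column)
        span = high - low
        scaled_columns.append(
            [0.5 if span == 0 else (value - low) / span for value in column]
        )
    # Transpose back so each inner list is again one respondent's polyline.
    return [list(record) for record in zip(*scaled_columns)]


# Made-up records: two measurements for each of three respondents.
polylines = scale_axes([[10, 1], [20, 3], [30, 2]])
```

Each inner list of polylines gives the heights at which one respondent's line crosses the successive axes.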
Historical Census data


                         http://hccda.ada.edu.au
Infrastructure
ADA Infrastructure

•  Provided by NCI-ANUSF (National Computational
   Infrastructure)
•  As part of the current project, NCI-ANUSF migrated
   the Archive data services into its central cloud
   infrastructure.
•  This cloud infrastructure is a high-performance
   environment as well as providing a wide range of
   cloud services – from web frameworks to data-
   intensive analysis to robust archival capability.
•  This move has fundamentally changed the way ADA
   operates and has substantially increased the
   availability of our services.
Cloud infrastructure
Current experiences and future directions
Where are we now?

•  New archive interface: http://www.ada.edu.au
•  New thematic collections (indigenous, crime and
   justice, historical census, international)
•  New methodological collections (longitudinal,
   qualitative)
•  New analytical tools (particularly in visualisation)
Current experiences

Ingest and archiving
•  DDI provides the core of all our data deposit and archival
   processes
    –  Current work occurring for “qualitative” data
•  Nesstar and MySQL provide the storage foundation
•  CMS: Ruby on Rails and Postgres (also used for spatial data)
Access
•  Access services involve various transformations for data
   discovery and access
•  CMS consumes DDI metadata (via Nesstar)
•  Longitudinal and GIS viz systems require further processing:
    –  ADA’s use of geographic attributes is inconsistent over time
    –  Longitudinal data management not suited to DDI2/DDI-C
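
To illustrate what consuming DDI metadata can look like, the sketch below reads a study title and variable labels from a simplified DDI Codebook fragment. The element names (codeBook, stdyDscr, titl, dataDscr, var, labl) come from DDI Codebook, but the fragment omits namespaces and most required content, the sample values are invented, and the summarise function is hypothetical. The ADA CMS is a Ruby on Rails application and its actual import code is not shown in the slides.

```python
import xml.etree.ElementTree as ET

# Simplified, namespace-free fragment with invented content.
SAMPLE = """<codeBook>
  <stdyDscr>
    <citation><titlStmt><titl>Example study</titl></titlStmt></citation>
  </stdyDscr>
  <dataDscr>
    <var name="age"><labl>Age of respondent</labl></var>
    <var name="state"><labl>State of residence</labl></var>
  </dataDscr>
</codeBook>"""


def summarise(xml_text):
    """Return the study title and a mapping of variable name to label."""
    root = ET.fromstring(xml_text)
    title = root.findtext("stdyDscr/citation/titlStmt/titl")
    variables = {
        var.get("name"): var.findtext("labl")
        for var in root.iterfind("dataDscr/var")
    }
    return {"title": title, "variables": variables}


summary = summarise(SAMPLE)
```

A summary of this kind is roughly what the Study and Variables tabs of a study page need; wave-to-wave links between variables have no obvious place in such a flat structure, which is one reading of the remark that longitudinal data management is not suited to DDI2/DDI-C.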
Where to from here?
•  Audio-visual (LIEF 2011-12)
•  NeCTAR program: Data integration
   –    Secure data access (administrative data, data linkage)
   –    Qualitative data documentation and analysis
   –    Historical/time series spatial analysis
   –    Geospatial and temporal data integration
   –    Integration across content types – eg.
         •  Election results, poll results, candidate surveys
         •  Census, survey and administrative data on a topic (eg. crime)
Questions or comments?

  For further information
  Web: http://www.ada.edu.au
  Email: ada@anu.edu.au
