You are here: University of Vienna PHAIDRA Detail o:502901
Title
Designing Scalable Cyberinfrastructure for Metadata Extraction in Billion-Record Archives: Paper - iPRES 2016 - Swiss National Library, Bern
Language
English
Description (en)
We present a model and testbed for a curation and preservation infrastructure, "Brown Dog", that applies to heterogeneous and legacy data formats. "Brown Dog" is funded through a National Science Foundation DIBBs grant (Data Infrastructure Building Blocks) and is a partnership between the National Center for Supercomputing Applications at the University of Illinois and the College of Information Studies at the University of Maryland at College Park. In this paper we design and validate a "computational archives" model that uses the Brown Dog data services framework to orchestrate data enrichment activities at petabyte scale on a 100 million archival record collection. We show how this data services framework can provide customizable workflows through a single point of software integration. We also show how Brown Dog makes it straightforward for organizations to contribute new and legacy data extraction tools that will become part of their archival workflows, and those of the larger community of Brown Dog users. We illustrate one such data extraction tool, a file characterization utility called Siegfried, from development as an extractor, through to its use on archival data.
Author of the digital object
Gregory Jansen
Smruti Padhy
Richard Marciano
Publisher
Swiss National Library, Bern
Format
application/pdf
Size
1.2 MB
Licence Selected
CC BY-NC-SA 3.0 AT
Details
Object type
PDFDocument
Created
27.01.2017 03:35:00
Metadata