One Core Preservation System for all your Data. No Exceptions! Marco Klindt and Kilian Amrhein
Paper presented at the 12th International Conference on Digital Preservation, November 2-6, 2015. University of North Carolina at Chapel Hill.
Abstract:
In this paper, we describe an OAIS-aligned data model and architectural design that enables us to archive digital information with a single core preservation workflow. The data model normalizes metadata from widely varied domains, so that submitted information can be ingested and managed with only one generalized toolchain while access platforms can still be tailored to designated data consumer communities. The design of the preservation system does not depend on any of its components continuing to exist over the system's lifetime, as we anticipate changes in both technology and environment. The initial implementation depends mainly on the open-source tools Archivematica, Fedora/Islandora, and iRODS.
1. One Core Preservation System for All your Data. No Exceptions!
Marco Klindt, Kilian Amrhein
Zuse Institute Berlin (ZIB)
November 3, 2015
Frameworks for Digital Preservation
3. Why?
• Berlin funds digitization projects for cultural heritage institutions (LAM: libraries, archives, museums)
• Servicecenter for Digitization (digiS) @ ZIB supports these projects with training and technical solutions
• Sustainability demands a digital preservation service for digitization outputs
4. ZIB what?
• Zuse Institute Berlin: Research Institute for Applied Mathematics and Computer Science
• Namesake Konrad Zuse: an inventor of the computer (Z1, 1938, 22-bit floating-point processing... [German view])
8. Preservation is hard
• Digital preservation as well
• Not feasible for smaller institutions
• Provide preservation as a service utilizing ZIB infrastructure and expertise
9. Even as a service
• Community effort (learn from each other)
• Depends on multiple Communities:
– Preservation
– IT
– Cultural Heritage
10. Architectural Requirements
1. Self-contained, self-documented Information Packages (Intellectual Entities)
2. Anticipate obsolescence of formats, software tools, hardware, and organisation
3. Loosely coupled components with defined responsibilities
4. Use community (OSS) tools and standards
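Requirement 3, loosely coupled components with defined responsibilities, can be illustrated with a small Python sketch. It assumes a hypothetical `ArchivalStorage` interface (not an API of iRODS or of any other tool named in this talk): any backend that implements `put` and `get` can be swapped in without touching the code that uses it, which is also what keeps an exit strategy open.

```python
import tempfile
from pathlib import Path
from typing import Protocol


class ArchivalStorage(Protocol):
    """Hypothetical contract: the only thing other components may rely on."""

    def put(self, aip_id: str, data: bytes) -> None: ...

    def get(self, aip_id: str) -> bytes: ...


class InMemoryStorage:
    """Throwaway backend, e.g. for tests."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, aip_id: str, data: bytes) -> None:
        self._blobs[aip_id] = data

    def get(self, aip_id: str) -> bytes:
        return self._blobs[aip_id]


class FilesystemStorage:
    """Plain-directory backend: a fallback if a storage middleware is retired."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def put(self, aip_id: str, data: bytes) -> None:
        (self.root / aip_id).write_bytes(data)

    def get(self, aip_id: str) -> bytes:
        return (self.root / aip_id).read_bytes()


def store_and_verify(storage: ArchivalStorage, aip_id: str, data: bytes) -> bool:
    """Write an AIP and read it back; works with any backend honouring the contract."""
    storage.put(aip_id, data)
    return storage.get(aip_id) == data


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for backend in (InMemoryStorage(), FilesystemStorage(Path(tmp))):
            print(type(backend).__name__, store_and_verify(backend, "aip-0001", b"payload"))
```

The calling code only ever sees the interface, so replacing one backend with another is a configuration change rather than a rewrite.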
11. Open (Source)
• Do not reinvent the wheel (it still remains hard to maintain)
• Data workflow: Chapel Hill, USA
• Ingest workflow: Canada
• Access/management workflow: USA/Canada
• Glue code: in-house
16. Deposit/Transfer Components
• Administrative Information (Submission Manifest): to find stuff (the archive)
• Descriptive Information (Metadata): to find stuff (the depositor)
• Content Information (binary or textual data): the stuff
• Context Information (Submission Documentation): stuff (maybe useful for users)
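A minimal Python sketch of how a transfer might be sorted into the four components above. The folder names (`manifest/`, `metadata/`, `content/`, `documentation/`) and the `classify` helper are hypothetical; the talk does not prescribe a transfer layout.

```python
from pathlib import PurePosixPath

# Hypothetical transfer layout: one top-level folder per component.
# The slide names the four components; the folder names are our assumption.
COMPONENT_BY_FOLDER = {
    "manifest": "administrative",  # submission manifest: to find stuff (the archive)
    "metadata": "descriptive",     # domain metadata: to find stuff (the depositor)
    "content": "content",          # the stuff itself
    "documentation": "context",    # submission documentation
}


def classify(paths: list[str]) -> dict[str, list[str]]:
    """Group transfer paths by component; anything else lands in 'unassigned'."""
    groups: dict[str, list[str]] = {c: [] for c in COMPONENT_BY_FOLDER.values()}
    groups["unassigned"] = []
    for path in paths:
        top = PurePosixPath(path).parts[0]
        groups[COMPONENT_BY_FOLDER.get(top, "unassigned")].append(path)
    return groups
```

Keeping an explicit "unassigned" bucket makes stray files visible to the archive instead of silently dropping them.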
18. Descriptive Metadata
• Original description in domain-specific metadata formats
• Community standards
Diagram: a submission consisting of Metadata (DC, EAD, LIDO, MODS), Content (text, XML, and binary files; unstructured data such as text and emails), and Submission Documentation.
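Because descriptive metadata arrives in the depositor's own community format, ingest first has to recognise which standard a record uses. One way to sketch this in Python is to inspect the namespace of the XML root element. The `detect_format` helper is hypothetical, only the EAD 2002 namespace is covered for EAD, and the namespace list should be checked against each standard's schema documentation before relying on it.

```python
import xml.etree.ElementTree as ET

# Root-element namespaces for the four formats named on the slide.
NAMESPACE_TO_FORMAT = {
    "http://purl.org/dc/elements/1.1/": "DC",
    "http://www.openarchives.org/OAI/2.0/oai_dc/": "DC",
    "urn:isbn:1-931666-22-9": "EAD",  # EAD 2002
    "http://www.lido-schema.org": "LIDO",
    "http://www.loc.gov/mods/v3": "MODS",
}


def detect_format(xml_text: str) -> str:
    """Return 'DC', 'EAD', 'LIDO', 'MODS', or 'unknown' for a metadata record."""
    root = ET.fromstring(xml_text)
    # ElementTree renders namespaced tags as "{namespace}localname".
    if root.tag.startswith("{"):
        namespace = root.tag[1:].split("}", 1)[0]
        return NAMESPACE_TO_FORMAT.get(namespace, "unknown")
    return "unknown"
```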
19. Submission Metadata
• Submission Manifest
• YAML or METS
– Rights Information
– Contract and Contact Information of Depositor
– ...
• (nearly) complete SIP
Diagram: the SIP adds a Submission Manifest (Administrative Description Information, in Dublin Core) to the submitted Metadata (DC, EAD, LIDO, MODS), Content (text, XML, and binary files; unstructured data such as text and emails), and Submission Documentation.
20. MD Mapping
• Subset metadata mapping to Dublin Core from:
– Metadata Object Description Schema (MODS)
– Encoded Archival Description (EAD)
– Lightweight Information Describing Objects (LIDO)
• SIP now ready for ingest.
Diagram: the SIP with its Submission Manifest, with the domain metadata (DC, EAD, LIDO, MODS) additionally mapped to Dublin Core Description Information (DI).
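The subset mapping to Dublin Core can be sketched as a table-driven crosswalk. The Python below covers only four MODS elements and is an illustration, not the mapping used at ZIB; real MODS records (for example names split into several `namePart` elements) need more careful handling. The wrapper element `record` is hypothetical.

```python
import xml.etree.ElementTree as ET

MODS = "{http://www.loc.gov/mods/v3}"
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Illustrative subset: (path relative to the <mods> root, Dublin Core element).
MODS_TO_DC = [
    (f"{MODS}titleInfo/{MODS}title", "title"),
    (f"{MODS}name/{MODS}namePart", "creator"),
    (f"{MODS}originInfo/{MODS}dateIssued", "date"),
    (f"{MODS}abstract", "description"),
]


def mods_to_dc(mods_xml: str) -> ET.Element:
    """Copy the text of mapped MODS elements into flat dc:* elements."""
    source = ET.fromstring(mods_xml)
    record = ET.Element("record")  # hypothetical wrapper element
    for path, dc_name in MODS_TO_DC:
        for node in source.findall(path):
            if node.text and node.text.strip():
                ET.SubElement(record, f"{{{DC_NS}}}{dc_name}").text = node.text.strip()
    return record
```

Keeping the mapping in a table rather than in code makes the subset easy to review and extend per domain (EAD and LIDO would get their own tables).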
21. SIP Rejection
• We only require the Submission Manifest (administrative) metadata and DI
• If incomplete -> reject
Diagram: SIP with Administrative Description Information and Description Information (DI), both in Dublin Core, plus Content Information.
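The rejection rule reduces to a completeness check on the administrative and descriptive parts of the SIP; the content itself is not inspected at this step. In the Python sketch below, the `Sip` class and the required field names (`rights`, `depositor_contact`, `contract`, `title`) are assumptions chosen to echo the manifest contents listed on slide 19, not the actual manifest schema.

```python
from dataclasses import dataclass, field

# Hypothetical minimum requirements for acceptance.
REQUIRED_MANIFEST_KEYS = ("rights", "depositor_contact", "contract")
REQUIRED_DC_KEYS = ("title",)


@dataclass
class Sip:
    manifest: dict[str, str] = field(default_factory=dict)     # administrative metadata
    dublin_core: dict[str, str] = field(default_factory=dict)  # description information (DI)
    content_files: list[str] = field(default_factory=list)     # not checked here


def accept_or_reject(sip: Sip) -> tuple[bool, list[str]]:
    """Accept only if every required manifest and DC field is present and non-empty."""
    missing = [f"manifest.{k}" for k in REQUIRED_MANIFEST_KEYS if not sip.manifest.get(k)]
    missing += [f"dc.{k}" for k in REQUIRED_DC_KEYS if not sip.dublin_core.get(k)]
    return (not missing, missing)
```

Returning the list of missing fields, rather than a bare yes/no, gives the depositor something concrete to fix before resubmitting.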
26. Preservation Levels
• Preservation level is perceived, not assigned:
– Passive (known unknown)
– Active (known known)
• Core preservation actions:
– Re-identification
– Migration scheduling based on FPR (Format Policy Registry) changes
Diagram: technical MD, rules, and the beholder combine into a schedule.
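Read literally, the slide suggests that an object's preservation level follows from what we currently know about its format rather than from a label attached at ingest. A Python sketch under that reading: a file is "active" if its format is identified and covered by a rule, otherwise "passive"; when the rule set changes, passive files are queued for re-identification and active files whose rule changed are queued for migration. `StoredFile`, `schedule`, and the plain dict standing in for the format policy rules are hypothetical and do not reflect Archivematica's FPR data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StoredFile:
    name: str
    format_id: Optional[str]  # PRONOM-style identifier, or None if identification failed


def perceived_level(f: StoredFile, rules: dict[str, str]) -> str:
    """Known known (format identified and covered by a rule) -> 'active', else 'passive'."""
    return "active" if f.format_id is not None and f.format_id in rules else "passive"


def schedule(files: list[StoredFile], old_rules: dict[str, str],
             new_rules: dict[str, str]) -> list[tuple[str, str]]:
    """After a rule change: re-identify passive files, migrate files whose rule changed."""
    jobs: list[tuple[str, str]] = []
    for f in files:
        if perceived_level(f, new_rules) == "passive":
            jobs.append(("re-identify", f.name))
        elif old_rules.get(f.format_id) != new_rules[f.format_id]:
            jobs.append(("migrate", f.name))
    return jobs
```

Because the level is recomputed from the current rules each time, no stored label can go stale when the rules evolve.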
27. Diagram: system overview. SIP -> AIP -> DIP. Services: identification, characterisation, normalization. Data management: iRODS; archival storage: online & tape. Management/access: Fedora/Islandora repository holding the Admin Access Compound Object and the Content Access Compound Object. Packages carry submission documentation, PDI (PREMIS), Dublin Core-mapped description information, content information, and derivatives.
32. • Access and Management System
• Dark Archive
– only for Admins and Depositors
• One Object (AIP) – Two Views:
– Admin Access View (us)
– Content Access View (them)
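A Python sketch of the dark-archive access rule with one AIP and two views, assuming, as the slide's "us"/"them" suggests, that archive administrators get the admin view and a depositing institution gets the content view of its own AIPs only. The `Aip` class, the role names, and the use of `PermissionError` are illustrative choices.

```python
from dataclasses import dataclass


@dataclass
class Aip:
    aip_id: str
    depositor: str
    admin_record: dict[str, str]    # e.g. preservation events, rights, storage locations
    content_record: dict[str, str]  # e.g. Dublin Core description, access derivatives


def view(aip: Aip, user: str, role: str) -> dict[str, str]:
    """Dark archive: admins get the admin view, the owning depositor the content view."""
    if role == "admin":
        return aip.admin_record
    if role == "depositor" and user == aip.depositor:
        return aip.content_record
    raise PermissionError(f"{user} has no access to {aip.aip_id}")
```

Both views are derived from the same object, so there is only one AIP to preserve, whichever audience is looking at it.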
36. Admin Access Compound Object (AACO)
• Administrative description for access and management
Diagram: the AIP (Administrative Description Information and Description Information (DI) in Dublin Core, Preservation Description Information (PDI) in PREMIS, Content Information as normalized binary and text files, submission documentation) is represented in the repository as the AACO.
39. AIP Data Model
Diagram: the submission (Submission Manifest; metadata in DC, EAD, LIDO, MODS; content as text, XML, and binary files, including unstructured data such as text and emails; submission documentation) becomes an AIP with Administrative Description Information and Description Information (DI) mapped to Dublin Core, Preservation Description Information (PDI) in PREMIS, normalized binary and text files, and derivatives/access copies. The AIP is referenced by an Admin Access Compound Object (AACO) and a Content Access Compound Object (CACO).
42. Contingency – Exit Strategy
• Archivematica: only a new ingest workflow needs to be found
• iRODS: use the filesystem
• Islandora: reingest into another repository
• Organisation: self-contained AIP (transformation required)
• No strategy against evil, yet!