The current guidance for transferring electronic records to NARA is outdated and limited in scope. This project aims to develop new, flexible guidance for all types of electronic records that addresses formats, metadata, and the full records lifecycle. The goals of the new guidance are to give agencies clear direction on acceptable formats, support digital continuity, and balance open standards with business needs. The guidance will stress sustainability factors and recognize how formats influence preservation and access.
2. Current Guidance (Developed Between 2002-2004)
Reflect NARA’s
capabilities at the time.
National Archives and Records Administration
3. Current Guidance: limited scope that does not
address all format types
4. Current Guidance Products
Demonstrate a preference for open, standards-
based formats.
Require that agencies transform or normalize data
into acceptable formats prior to transfer*.
Have proven an obstacle to the steady transfer of
records.
Are referred to at many points of the lifecycle.
Cannot adapt to records with different retention
periods.
5. Project Scope
In scope:
This project seeks to support the work of federal agencies by
providing flexible and realistic electronic records file format
guidance on all electronic records types for use when transferring
permanent records to NARA in accordance with the Federal
Records Act.
This project will identify and recommend changes, but executing any
additional guidance (including guidance for all types of
metadata), revising business processes, and developing
standard operating procedures are beyond the scope of this
project.
Out of scope:
Format guidance for areas of the records lifecycle other than
transfer to NARA
Guidance on physical media
Records of the Executive Office of the President and the records
of the United States Congress. These branches are not covered
by the Federal Records Act and are therefore excluded from
consideration for this project
6. Project Phases
Phase 1: Planning and Preparation
July 1 – December 30, 2011
Phase 2: Conduct Informational Meetings
February 6 – August 12, 2012
Internal NARA SMEs
Future Perfect Conference
Agency representatives
Phase 3: Develop and Publish Guidance Product
May 29 – September 7, 2012
Phase 4: Evaluation and Completion
December 12 – December 21, 2012
7. Electronic Records Lifecycle
[Figure: electronic records lifecycle diagram. Labels recoverable from the
slide: System design/planning, Record Creation, Maintenance,
Scheduling/Appraisal, Transfer Planning, Preservation & Access Planning,
Destruction Planning, Migration, Decommissioning, Transformation, Transfer,
Processing, Ingest, Preservation/Maintenance, Access, Regular Public
Requests (FOIA, etc.)]
8. Revised Guidance Content Categories
Electronic Textual Records
Digital Still Images
Digital Audio Records
Digital Moving Image Records
Structured Data
Geospatial Records
CAD and Vector Graphics
Web Records
E-mail Records
9. Relevant Content Categories Definitions
Structured Data – the broad category of data that is stored in defined fields,
including:
Databases – Database formats are organized collections of associated data that conform
to a logical structure. Database formats are determined by “data models” that describe
specific data structures used to model an application and generally include navigational,
relational, and hybrid models.
Spreadsheets – Spreadsheets are electronic simulations of paper accounting
worksheets for financial plans, budgets, etc. Personal computer and server-based
spreadsheet programs [e.g. Microsoft Excel, Lotus 1-2-3, OpenOffice Calc] can create
proprietary files as well as software-independent files, including text or XML.
Cloud-based spreadsheets [e.g. Google Documents] include format export options such as .xls,
.csv, .txt, .ods, PDF and HTML files as well as import and conversion options for common
spreadsheet formats including .xls, .csv, and .ods.
Statistical Data – Statistical data is the result of scientific quantitative research and
analyses. Statistical data formats contain collections of data presented in both tabular
and non-tabular form. Datasets are formatted as strings of characters contained within a
markup language [e.g. XML] or as software-dependent proprietary files produced by commercial
statistical and qualitative data analysis software tools [e.g. SAS and SPSS].
Scientific Data – Scientific data refers to research data collected by instrumentation
tools during the scientific process. Scientific data formats are either domain-specific,
such as those used within a single field of study [e.g. Flexible Image Transport System
(FITS)], or multi-domain formats useful for transfer of scientific data between domains
[e.g. Common Data Format (CDF), HDF5].
10. Relevant Content Categories Definitions
Geospatial – Geospatial data includes files created by
geographic information systems (GIS) or other software
applications for spatial analysis using computer systems.
The data may be contained within a database to enable
analysis across the datasets (e.g. geo-database), united
within a complex file format structure where one geospatial
file is comprised of several distinct, but related, formats
(e.g. shapefile), or contained within a single file (e.g.
GML).
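The shapefile case above, where one geospatial record is spread across several related files, can be illustrated with a minimal sketch. It assumes the three component files that the shapefile format requires (.shp, .shx, .dbf); the function name and the idea of running such a check before transfer are illustrative assumptions, not part of NARA's guidance.

```python
from pathlib import Path

# Component files the shapefile format requires: geometry, index, attribute table.
REQUIRED_EXTENSIONS = (".shp", ".shx", ".dbf")


def missing_shapefile_parts(shp_path):
    """Return the required extensions that have no sibling file next to shp_path.

    Hypothetical helper: sibling files are matched by sharing the same base name.
    """
    base = Path(shp_path)
    present = {
        p.suffix.lower()
        for p in base.parent.iterdir()
        if p.is_file() and p.stem == base.stem
    }
    return [ext for ext in REQUIRED_EXTENSIONS if ext not in present]
```

An empty list means all three required parts are present; optional sidecar files such as a .prj projection file are deliberately not checked here.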
Computer Aided Design (CAD) and Vector Graphics –
Non-raster vector graphics formats use mathematical
expressions to create and manipulate computer graphics
and animations. Computer Aided Design (CAD) programs are
vector programs used in engineering and manufacturing
design to create animations and represent
three-dimensional surfaces of inanimate objects. CAD and
vector graphics programs can output binary and XML
11. Record Categories Held in Systems
[Diagram labels: Geospatial System, Geospatial Data Records; CAD/CAM System,
CAD/CAM Generated Records; Database System, Database Generated Records;
NARA/ERA]
12. Considerations*
What part(s) of the system represents the record?
Do we want to bring in the entire system?
Could ERA cope with the formats, file size, and/or
volume of files?
If we only want a subset or can only accept an export
then what is the “best” format for the electronic record
type in question?
What additional information should accompany the
data?
How should we validate and verify this data?
*These influence the transfer guidance but changes to existing work
processes are out of the scope of this project.
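One common way to approach the validation and verification question above is a fixity manifest: the sender records a checksum for every file, and the receiver recomputes the checksums after transfer. The following is a minimal sketch under that assumption. The slides do not say which method NARA uses, and the function names and the choice of SHA-256 are illustrative.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root, manifest):
    """Return the manifest entries that are missing or changed under root."""
    current = build_manifest(root)
    return sorted(
        name for name, digest in manifest.items() if current.get(name) != digest
    )
```

An agency would call build_manifest before transfer and send the result along with the records; the recipient would call verify_manifest on the received copy and expect an empty list.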
13. Goals
Provide clear, concise, and consistent direction to
agencies regarding formats that are acceptable
for use when transferring records to NARA.
Develop a flexible and extensible framework that
can adapt to future needs.
Balance preference for open formats with the
business needs of agencies and NARA.
Support digital continuity across the lifecycle of
electronic records.
14. Stress Sustainability*
Disclosure: the degree to which complete specifications and tools for
validating technical integrity exist.
Adoption: the degree to which the format is used by
creators, disseminators, or users.
Transparency: the degree to which the digital representation is open to
direct analysis with basic tools, including human readability using a text-
only editor.
Self-documentation: formats that contain all the metadata needed to
render the data as usable information.
External dependencies: refers to the degree to which a format
depends on particular hardware, operating system, or software for
rendering or use.
Impact of patents: Patents related to a digital format may inhibit the
ability of archival institutions to sustain content in that format.
Technical protection mechanisms: To preserve digital content and
provide service to users and designated communities decades
hence, NARA must be able to replicate the content on new
media, migrate and normalize it in the face of changing technology, and
disseminate it to researchers.
*adapted from http://www.digitalpreservation.gov/formats/
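The transparency factor above mentions human readability in a text-only editor. A rough first-pass check for that property can be sketched as follows. This is a heuristic under stated assumptions (UTF-8 encoding, a fixed-size sample, a hypothetical function name), not a measure defined by NARA or the cited source.

```python
import codecs


def looks_like_plain_text(path, sample_size=4096):
    """Heuristic: True if the start of the file is valid UTF-8 with no NUL bytes."""
    with open(path, "rb") as handle:
        sample = handle.read(sample_size)
    if b"\x00" in sample:
        return False
    # An incremental decoder tolerates a multi-byte character cut off at the
    # end of the sample, but still raises on bytes that are invalid UTF-8.
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(sample, final=False)
    except UnicodeDecodeError:
        return False
    return True
```

A CSV or XML export would normally pass this check, while most binary formats would fail it. Passing says nothing about the other factors, such as disclosure or external dependencies.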
15. Recognize That Formats …
Influence usability
Affect behavior and performance
Influence NARA’s capability to preserve
complex records like databases, video, and
GIS
16. Concluding Thoughts
NARA should:
expand the types of formats that NARA accepts
balance the business requirements of agencies with
NARA’s preservation and access needs
minimize the need for agencies to transform records
prior to transfer
develop guidance across the lifecycle of electronic
records to support digital continuity
17. Thank You!
Kevin L. De Vorsey
Supervisory Electronic Records Format Specialist
Electronic Records Format Section
Policy Analysis and Enforcement Division
Office of the Chief Records Officer
Agency Services
National Archives and Records Administration
kevin.devorsey@nara.gov