Electronic Records Transfer Guidance at NARA
Kevin De Vorsey
kevin.devorsey@nara.gov
Current Guidance (Developed Between 2002 and 2004)
Reflects NARA’s capabilities at the time.
                 National Archives and Records Administration
Current Guidance: limited scope that does not address all format types
Current Guidance Products
 • Demonstrate a preference for open, standards-based formats.
 • Require that agencies transform or normalize data into acceptable
   formats prior to transfer*.
 • Have proven an obstacle to the steady transfer of records.
 • Are referred to at many points of the lifecycle.
 • Cannot adapt to records with different retention periods.

Project Scope
 • In scope:
    • This project seeks to support the work of federal agencies by
      providing flexible and realistic electronic records file format
      guidance on all electronic record types for use when transferring
      permanent records to NARA in accordance with the Federal
      Records Act.
    • This project will identify and recommend changes, but the
      execution of any additional guidance (including guidance for all
      types of metadata), the revision of business processes, and the
      development of standard operating procedures are beyond the
      scope of this project.

 • Out of scope:
    • Format guidance for areas of the record lifecycle other than
      transfer to NARA
    • Guidance on physical media
    • Records of the Executive Office of the President and the records
      of the United States Congress. These branches are not covered
      by the Federal Records Act and are therefore excluded from
      consideration for this project.
Project Phases
 • Phase 1: Planning and Preparation
    • July 1 – December 30, 2011
 • Phase 2: Conduct Informational Meetings
    • February 6 – August 12, 2012
    • Internal NARA SMEs
    • Future Perfect Conference
    • Agency representatives
 • Phase 3: Develop and Publish Guidance Product
    • May 29 – September 7, 2012
 • Phase 4: Evaluation and Completion
    • December 12 – December 21, 2012
Electronic Records Lifecycle
[Diagram: electronic records lifecycle. Stage labels shown: System
design/planning; Record Creation; Scheduling/Appraisal; Maintenance;
Migration; Decommissioning; Transfer Planning; Destruction Planning;
Transfer; Processing; Ingest; Preservation & Access Planning;
Transformation; Preservation/Maintenance; Access; Regular Requests
(FOIA, etc.); Public.]
Revised Guidance Content Categories

  Electronic Textual Records
  Digital Still Images
  Digital Audio Records
  Digital Moving Image Records
  Structured Data
  Geospatial Records
  CAD and Vector Graphics
  Web Records
  E-mail Records


Relevant Content Categories Definitions
 • Structured Data – the broad category of data that is stored in
   defined fields. It includes:
    • Databases – Database formats are organized collections of
      associated data that conform to a logical structure. Database
      formats are determined by “data models” that describe the specific
      data structures used to model an application; these generally
      include navigational, relational, and hybrid models.
    • Spreadsheets – Spreadsheets are electronic simulations of paper
      accounting worksheets for financial plans, budgets, etc. Personal
      computer and server-based spreadsheet programs [e.g. Microsoft
      Excel, Lotus 1-2-3, OpenOffice Calc] can create both proprietary
      files and software-independent files, including text or XML.
      Cloud-based spreadsheets [e.g. Google Documents] offer export
      options such as .xls, .csv, .txt, .ods, PDF, and HTML, as well as
      import and conversion options for common spreadsheet formats
      including .xls, .csv, and .ods.
    • Statistical Data – Statistical data is the result of scientific
      quantitative research and analyses. Statistical data formats
      contain collections of data presented in both tabular and
      non-tabular form. Datasets are formatted as strings of characters
      contained within a markup language [e.g. XML] or as
      software-dependent proprietary files produced by commercial
      statistical and qualitative data analysis tools [e.g. SAS and
      SPSS].
    • Scientific Data – Scientific data is research data collected by
      instrumentation tools during the scientific process. Scientific
      data formats are either domain-specific, used within a single
      field of study [e.g. Flexible Image Transport System (FITS)], or
      multi-domain formats useful for transferring scientific data
      between domains [e.g. Common Data Format (CDF), HDF5].
Relevant Content Categories Definitions
 • Geospatial – Geospatial data includes files created by
   geographic information systems (GIS) or other software
   applications for spatial analysis using computer systems.
   The data may be contained within a database to enable
   analysis across the datasets (e.g. geo-database), united
   within a complex file format structure where one geospatial
   file is composed of several distinct, but related, formats
   (e.g. shapefile), or contained within a single file (e.g.
   GML).
 • Computer Aided Design (CAD) and Vector Graphics –
   Non-raster vector graphics formats use mathematical
   expressions to create and manipulate computer graphics
   and animations. Computer Aided Design (CAD) programs are
   vector programs used in engineering and manufacturing
   design to create animations and represent three-dimensional
   surfaces of inanimate objects. CAD and
   vector graphics programs can output binary and XML
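
The multi-file geospatial case mentioned above (e.g. shapefile) can be illustrated with a small completeness check. This is only a sketch, not a NARA procedure: it assumes a flat list of file names and relies on the fact that a shapefile dataset needs at least its .shp, .shx, and .dbf parts. The function name `missing_shapefile_parts` is hypothetical.

```python
from pathlib import PurePath

# Mandatory parts of a shapefile dataset: geometry, shape index, attributes.
REQUIRED_EXTENSIONS = {".shp", ".shx", ".dbf"}


def missing_shapefile_parts(filenames):
    """Return {base name: missing mandatory extensions} for incomplete sets.

    Assumes a flat listing: files are grouped by base name only, so
    directories are ignored when grouping.
    """
    found = {}
    for name in filenames:
        path = PurePath(name)
        found.setdefault(path.stem, set()).add(path.suffix.lower())

    report = {}
    for stem, extensions in found.items():
        # Only groups holding at least one mandatory part count as shapefiles.
        if extensions & REQUIRED_EXTENSIONS:
            missing = REQUIRED_EXTENSIONS - extensions
            if missing:
                report[stem] = missing
    return report


if __name__ == "__main__":
    listing = ["parcels.shp", "parcels.dbf", "readme.txt"]
    print(missing_shapefile_parts(listing))
```

A check like this matters because the parts are only usable together: transferring the geometry file without its index or attribute table would leave an incomplete record.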
Record Categories Held in Systems
[Diagram: three record categories shown with NARA/ERA: Geospatial
(Geospatial Data System Records), CAD/CAM (CAD/CAM System Generated
Records), and Database (Database System Generated Records).]
Considerations*
 • What part(s) of the system represents the record?
 • Do we want to bring in the entire system?
 • Could ERA cope with the formats, file size, and/or
   volume of files?
 • If we only want a subset or can only accept an export,
   then what is the “best” format for the electronic record
   type in question?
 • What additional information should accompany the
   data?
 • How should we validate and verify this data?


*These influence the transfer guidance but changes to existing work
  processes are out of the scope of this project.
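
One common way to approach the last question above (validating and verifying transferred data) is a fixity manifest: record a checksum for every file before transfer and recompute it on receipt. The sketch below is illustrative only; the slides do not say which method NARA uses, and the choice of SHA-256 and the names `build_manifest` and `verify_manifest` are assumptions.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large files are never fully loaded in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 checksum."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def verify_manifest(root, manifest):
    """Compare the files under root with a manifest built before transfer.

    Returns a sorted list of (relative path, problem) tuples; an empty
    list means every file is present and unchanged.
    """
    current = build_manifest(root)
    problems = []
    for name, expected in manifest.items():
        if name not in current:
            problems.append((name, "missing"))
        elif current[name] != expected:
            problems.append((name, "checksum mismatch"))
    for name in current:
        if name not in manifest:
            problems.append((name, "unexpected"))
    return sorted(problems)
```

The sending agency would run `build_manifest` on the transfer directory and send the result alongside the records; the receiver would run `verify_manifest` against its copy. This confirms that the bits arrived intact, not that the files are valid instances of their formats.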
Goals
  Provide clear, concise, and consistent direction to
   agencies regarding formats that are acceptable
   for use when transferring records to NARA.
  Develop a flexible and extensible framework that
   can adapt to future needs.
  Balance preference for open formats with the
   business needs of agencies and NARA.
  Support digital continuity across the lifecycle of
   electronic records.



Stress Sustainability*
 • Disclosure: the degree to which complete specifications and tools
   for validating technical integrity exist.
 • Adoption: the degree to which the format is used by
   creators, disseminators, or users.
 • Transparency: the degree to which the digital representation is
   open to direct analysis with basic tools, including human
   readability using a text-only editor.
 • Self-documentation: the degree to which a format contains the
   metadata needed to render the data as usable information.
 • External dependencies: the degree to which a format depends on
   particular hardware, operating system, or software for
   rendering or use.
 • Impact of patents: patents related to a digital format may inhibit
   the ability of archival institutions to sustain content in that
   format.
 • Technical protection mechanisms: to preserve digital content and
   provide service to users and designated communities decades
   hence, NARA must be able to replicate the content on new
   media, migrate and normalize it in the face of changing
   technology, and disseminate it to researchers. Protection
   mechanisms such as encryption can prevent this.

  *adapted from http://www.digitalpreservation.gov/formats/
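
The transparency factor above mentions readability in a text-only editor. A rough first-pass check can be sketched as follows. This is a heuristic under stated assumptions, not part of the cited guidance: it treats "readable" as "the leading bytes decode as UTF-8 and contain no control characters other than tab, newline, and carriage return", and the name `looks_like_plain_text` is hypothetical.

```python
import codecs


def looks_like_plain_text(data, sample_size=4096):
    """Heuristically decide whether bytes would be legible in a text editor.

    Only the first sample_size bytes are examined. An incremental decoder
    is used so that a multi-byte character cut off at the sample boundary
    is not mistaken for invalid data. Empty input is treated as text.
    """
    sample = data[:sample_size]
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        text = decoder.decode(sample, final=False)
    except UnicodeDecodeError:
        return False  # bytes that are not valid UTF-8: likely a binary format
    # Reject control characters (e.g. NUL) except common whitespace.
    return all(ch.isprintable() or ch in "\t\n\r" for ch in text)


if __name__ == "__main__":
    print(looks_like_plain_text(b"id,name\n1,Ada\n"))
    print(looks_like_plain_text(b"\x89PNG\r\n\x1a\n\x00\x00"))
```

A positive result says only that the bytes are legible, not that the format is well disclosed or self-documenting; the other factors still need to be assessed separately.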
Recognize That Formats …

   Influence usability
   Affect behavior and performance
   Influence NARA’s capability to preserve
   complex records like databases, video, and
   GIS




Concluding Thoughts
 • NARA should:
    • expand the types of formats that NARA accepts
    • balance the business requirements of agencies with
      NARA’s preservation and access needs
    • minimize the need for agencies to transform records
      prior to transfer
    • develop guidance across the lifecycle of electronic
      records to support digital continuity




Thank You!

               Kevin L. De Vorsey
Supervisory Electronic Records Format Specialist
       Electronic Records Format Section
   Policy Analysis and Enforcement Division
       Office of the Chief Records Officer
                 Agency Services
  National Archives and Records Administration
       kevin.devorsey@nara.gov

Kevin De Vorsey Past is Prologue
