System Recovery Plan
Omega Research Inc. Security Categorization: High Information System Contingency Plan (ISCP)
Version 2.2 02/21/2013
Prepared by Josh Carpenter @ Red Beacon 1600 Pennsylvania Avenue Washington, DC 22122
Plan Approval
As the designated authority for Omega Research Backup, I, Josh Carpenter, hereby certify that the information system contingency plan (ISCP) is complete and that the information contained in this ISCP provides an accurate representation of the application, its hardware, software, and telecommunication components. I further certify that this document identifies the criticality of the system as it relates to the mission of OMEGA RESEARCH and that the recovery strategies identified will provide the ability to recover the system functionality in the most expedient and cost-beneficial method in keeping with its level of criticality. I further attest that this ISCP for Omega Research Backup will be tested at least annually. This plan was last tested on 12/21/2012; the test, training, and exercise (TT&E) material associated with this test can be found in Appendix A. This document will be modified as changes occur and will remain under version control, in accordance with OMEGA RESEARCH contingency planning policy.
02/21/2013 Date
1. Introduction
Information systems are vital to Omega Research business processes; therefore, it is critical that services provided by Omega Research Backup are able to operate effectively without excessive interruption. This Information System Contingency Plan (ISCP) establishes comprehensive procedures to recover Omega Research Backup quickly and effectively following a service disruption.
1.1 Background
This Omega Research Backup Information System (IS) Contingency Plan (CP) establishes procedures to recover Omega Research Backup following a disruption. The following recovery plan objectives have been established:
Maximize the effectiveness of contingency operations through an established plan that consists of the following phases:
    Activation and Notification phase to activate the plan and determine the extent of damage;
    Recovery phase to restore Omega Research Backup operations; and
    Reconstitution phase to ensure that Omega Research Backup is validated through testing and that normal operations are resumed.
Identify the activities, resources, and procedures to carry out Omega Research Backup processing requirements during prolonged interruptions to normal operations.
Assign responsibilities to designated OMEGA RESEARCH personnel and provide guidance for recovering Omega Research Backup during prolonged periods of interruption to normal operations.
Ensure coordination with other personnel responsible for OMEGA RESEARCH contingency planning strategies. Ensure coordination with external points of contact and vendors associated with Omega Research Backup and execution of this plan.
1.2 Scope
This ISCP has been developed for Omega Research Backup, which is classified as a High-Impact system in accordance with Federal Information Processing Standards (FIPS) 199, Standards for Security Categorization of Federal Information and Information Systems. Procedures in this ISCP are for High-Impact systems and are designed to recover Omega Research Backup within 72 hours. This plan does not address replacement or purchase of new equipment, short-term disruptions lasting less than 48 hours, or loss of data at the onsite facility or at the user-desktop levels.
1.3 Assumptions
The following assumptions were used when developing this ISCP:
Omega Research Backup has been established as a High-Impact System, in accordance with FIPS 199.
Alternate processing sites and offsite storage are required and have been established for this system.
Current backups of the system software and data are intact and available at the offsite storage facility in Denver, CO.
Alternate facilities have been established at Denver, CO and are available if needed for relocation of Omega Research Backup.
The Omega Research Backup is inoperable at the OMEGA RESEARCH computer center and cannot be recovered within 48 hours.
Key Omega Research Backup personnel have been identified and trained in their emergency response and recovery roles; they are available to activate the Omega Research Backup Contingency Plan.
The Omega Research Backup ISCP does not apply to the following situations:
Overall recovery and continuity of business operations. The Business Continuity Plan (BCP) and Continuity of Operations Plan (COOP) address continuity of business operations.
Emergency evacuation of personnel. The Occupant Emergency Plan (OEP) addresses employee evacuation.
Emergency contact list for non-essential personnel. This is the list of all employees not associated with any recovery efforts.
2. Concept of Operations
The Concept of Operations section provides details about Omega Research Backup, an overview of the three phases of the ISCP (Activation and Notification, Recovery, and Reconstitution), and a description of roles and responsibilities of Omega Research personnel during a contingency activation.
2.1 System Description
2.2 Overview of Three Phases
Recovery Phase
Recovery activities follow common sequences and options, which correspond to the following recovery strategies:
1. Facility Recovery - Hot, Warm, Cold Site Transitions
2. Mirror Sites or TPS Redundant Sites
3. Mobile Sites
4. Activation of Mutual Aid Agreements/Reciprocal Agreements
5. Service/Utility Restoration
6. Multiple Processing Centers
7. Data Recovery and Fault Tolerance; RAID and Backups
8. Basic User Services Recovery
9. Basic Telecom Services Recovery
10. Resumption of critical processes outlined and prioritized in BIA
These options are the substance of the Disaster Recovery Plan (DRP). The DRP spells out, in terms of policy, procedures, and work instructions, how to activate any of these recovery strategies. Typically, the declaration of a disaster recovery (DR) event specifies which DRP to execute, and that DRP already calls for a particular recovery strategy, which points to one of the options listed above.
Reconstitution Phase
1. Cleanup activities and damage assessment
2. Insurance policy research
3. Reactivation of alarm systems
4. Coordinate repairs
Re-entry Phase
1. Restoring primary site to normal operating conditions
2. Implement and test facility, user, data, and telecommunications
Recovery Strategies (Toigo, 2003)
Activation and Notification Phase
Activation of the ISCP occurs after a disruption or outage that may reasonably extend beyond the RTO established for a system. The outage event may result in severe damage to the facility that houses the system, severe damage or loss of equipment, or other damage that typically results in long-term loss. Once the ISCP is activated, system owners and users are notified of a possible long-term outage, and a thorough outage assessment is performed for the system. Information from the outage assessment is presented to system owners and may be used to modify recovery procedures specific to the cause of the outage.
Recovery Phase
The Recovery phase details the activities and procedures for recovery of the affected system. Activities and procedures are written at a level that an appropriately skilled technician can recover the system without intimate system knowledge. This phase includes notification and awareness escalation procedures for communication of recovery status to system owners and users.
Reconstitution Phase
The Reconstitution phase defines the actions taken to test and validate system capability and functionality. This phase consists of two major activities: validating successful recovery and deactivation of the plan. During validation, the system is tested and validated as operational prior to returning operation to its normal state. Validation procedures may include functionality or regression testing, concurrent processing, and/or data validation. The system is declared recovered and operational by system owners upon successful completion of validation testing. Deactivation includes activities to notify users of system operational status. This phase also addresses recovery effort documentation, activity log finalization, incorporation of lessons learned into plan updates, and readying resources for any future recovery events.
2.3 Roles and Responsibilities
Leadership roles should include an ISCP Director, who has overall management responsibility for the plan, and an ISCP Coordinator, who is responsible for overseeing recovery effort progress, initiating any needed escalations or awareness communications, and establishing coordination with other recovery teams as appropriate:
ISCP Director: Has overall management responsibility for the plan.
ISCP Coordinator: Coordinates the tasks designated by the Director.
Recovery Team Team Lead: Begins the on-scene tasks and reports to the ISCP Coordinator.
Recovery Team Team Member: The most junior role; does most of the hands-on work and reports to the Recovery Team Team Lead.
3. Activation and Notification
The Activation and Notification Phase defines initial actions taken once an Omega Research Backup disruption has been detected or appears to be imminent. This phase includes activities to notify recovery personnel, conduct an outage assessment, and activate the ISCP. At the completion of the Activation and Notification Phase, Omega Research Backup ISCP staff will be prepared to perform recovery measures to restore system functions.
3.1 Activation Criteria
The Omega Research Backup ISCP may be activated if one or more of the following criteria are met:
1. The type of outage indicates Omega Research Backup will be down for more than 72 hours.
2. The facility housing Omega Research Backup is damaged and may not be available within 72 hours.
3.2 Notification
The first step upon activation of the Omega Research Backup ISCP is notification of appropriate business and system support personnel. Contact information for appropriate POCs is included in the Contact List appendix. For Omega Research Backup, the following method and procedure for notifications are used:
All notifications will be made by phone. Reaching the people on the contact list is imperative to get the BIA and DRP under way: if nobody is called, nobody will know to return to the company to begin the recovery process. If phones do not work, try email. If that also fails, use any other available means of contact.
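The phone-first, email-second order described above can be sketched as a small fallback loop. This is only an illustration under stated assumptions: the `Contact` record, the `notify_all` helper, and the two stub channels are hypothetical names, and a real deployment would replace the stubs with whatever telephony or mail tooling the organization actually uses.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Contact:
    name: str
    phone: Optional[str] = None
    email: Optional[str] = None


# A channel is any callable that tries to reach a contact and returns True on success.
Channel = Callable[[Contact], bool]


def notify_all(contacts: List[Contact],
               channels: List[Channel]) -> Dict[str, Optional[int]]:
    """Try each channel in order for every contact.

    Returns a mapping from contact name to the index of the channel that
    reached them, or None if no channel succeeded.
    """
    reached: Dict[str, Optional[int]] = {}
    for contact in contacts:
        reached[contact.name] = None
        for index, channel in enumerate(channels):
            if channel(contact):
                reached[contact.name] = index
                break
    return reached


# Hypothetical stand-ins for real phone and email integrations.
def phone_stub(contact: Contact) -> bool:
    return False  # simulate the phone network being down


def email_stub(contact: Contact) -> bool:
    return contact.email is not None


if __name__ == "__main__":
    people = [
        Contact("Coordinator", phone="555-0100", email="coordinator@example.org"),
        Contact("Team Lead", phone="555-0101"),
    ]
    print(notify_all(people, [phone_stub, email_stub]))
```

Anyone whose entry is still `None` after every channel has been tried is the person who needs follow-up by other means.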
Following notification, a thorough outage assessment is necessary to determine the extent of the disruption, any damage, and expected recovery time. This outage assessment is conducted by the OMEGA RESEARCH Incident Response Team. Assessment results are provided to the ISCP Coordinator to assist in the coordination of the recovery of Omega Research Backup.
4. Recovery
The Recovery Phase provides formal recovery operations that begin after the ISCP has been activated, outage assessments have been completed (if possible), personnel have been notified, and appropriate teams have been mobilized. Recovery Phase activities focus on implementing recovery strategies to restore system capabilities, repair damage, and resume operational capabilities at the original or new permanent location. At the completion of the Recovery Phase, Omega Research Backup will be functional and capable of performing the functions identified in the plan.
4.1 Sequence of Recovery Activities
The following activities occur during recovery of Omega Research Backup:
1. Identify recovery location (if not at original location);
2. Identify required resources to perform recovery procedures;
3. Retrieve backup and system installation media;
4. Recover hardware and operating system (if required); and
5. Recover system from backup and system installation media.
4.2 Recovery Procedures
The following procedures are provided for recovery of Omega Research Backup at the original or established alternate location. Recovery procedures are outlined per team and should be executed in the sequence presented to maintain an efficient recovery effort.
4.3 Escalation Notices/Awareness
1. Call the first person on the list.
2. Call the second person on the list.
3. Call the third person on the list.
4. Call the fourth person on the list.
5. Call the fifth person on the list.
6. Call the sixth person on the list.
5. Reconstitution
Reconstitution is the process by which a recovered system is tested to validate system capability and functionality. During Reconstitution, recovery activities are completed and normal system operations are resumed. If the original facility is unrecoverable, the activities in this phase can also be applied to preparing a new permanent location to support system processing requirements. This phase consists of two major activities: validating successful recovery and deactivation of the plan.
5.1 Concurrent Processing
In concurrent processing, a system operates at two separate locations concurrently until there is a level of assurance that the recovered system is operating correctly. Omega Research Backup does not have concurrent processing as part of validation. Once the system has been tested and validated, it will be placed into normal operations.
5.2 Validation Data Testing
Verify the last time a transaction was completely processed. If the backup files show the same time as the loss of power, the backup logs will be considered as complete as they can be. If an order was not completed, all customers and vendors involved will be contacted, relying on the memory of the salesperson who handled the sale; this is the only way to recover the order, because no records are kept for partially completed orders.
5.3 Validation Functionality Testing
Log on and off a few times to make sure the system completes the load process at startup
Conduct simple transactions
Send print jobs to printers
Send test emails
Connect to the internet
Connect to wireless
Browse the internet
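The data check in Section 5.2 (does the last backed-up transaction line up with the moment power was lost?) can be expressed as a timestamp comparison. The sketch below is an assumption-laden illustration: the function name `backup_is_current` and the one-minute default tolerance are hypothetical, since the plan does not state how close the two times must be.

```python
from datetime import datetime, timedelta


def backup_is_current(last_backup: datetime,
                      outage_start: datetime,
                      tolerance: timedelta = timedelta(minutes=1)) -> bool:
    """Return True when the last backed-up transaction falls within
    `tolerance` before the start of the outage.

    The tolerance value is an assumption; the plan only says the two
    times should be "the same".
    """
    gap = outage_start - last_backup
    return timedelta(0) <= gap <= tolerance


if __name__ == "__main__":
    power_lost = datetime(2013, 2, 21, 14, 0, 30)
    print(backup_is_current(datetime(2013, 2, 21, 14, 0, 0), power_lost))
    print(backup_is_current(datetime(2013, 2, 21, 13, 45, 0), power_lost))
```

A result of `False` is the case where the manual follow-up with customers and vendors described in Section 5.2 would apply.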
5.4 Recovery Declaration
Upon successfully completing testing and validation, the CEO will formally declare recovery efforts complete, and that Omega Research Backup is in normal operations. Omega Research Backup business and technical POCs will be notified of the declaration by the ISCP Coordinator.
5.5 Notifications (users)
Upon return to normal system operations, Omega Research Backup users will be notified by the Sr. Digital Forensics Examiner, first by email to their Firebird accounts and then on their OMEGA RESEARCH-issued mobile devices.
5.6 Cleanup
Clear debris from working area
Sweep area
Remove hanging debris
Pick up knocked-over furniture
Open windows to allow fresh air
Check restrooms
5.7 Offsite Data Storage
It is important that all backup and installation media used during recovery be returned to the offsite data storage location. The following procedures should be followed to return backup and installation media to its offsite data storage location.
All equipment will be brought back to the main site in one of two ways:
1. Ship it using local UPS/FedEx. All packages will be sent as soon as possible, and every package must have a tracking number.
2. Carry it onto the plane and stow it with the luggage. A padlock must be placed on the box to prevent theft of or damage to the equipment, including the recovery drives.
5.8 Data Backup
As soon as reasonable following recovery, the system should be fully backed up and a new copy of the current operational system stored for future recovery efforts. This full backup is then kept with other system backups. The procedures for conducting a full system backup are:
1. Turn the computer on.
2. Wait until it loads.
3. Press F12.
4. Select Network Boot.
5. Wait until the splash screen appears.
6. Sign in as root.
7. Double-click the Intella icon.
8. Follow the default instructions.
9. Map to the "X" drive.
10. Go to the ECA icon and hit Start. Allow everything to load.
11. Double-check the systems.
12. The system should be back to normal.
5.9 Event Documentation
It is important that all recovery events be well-documented, including actions taken and problems encountered during the recovery effort, and lessons learned for inclusion and update to this ISCP. It is the responsibility of each recovery team or person to document their actions during the recovery effort, and to provide that documentation to the ISCP Coordinator.
Provide details about the types of information each recovery team member is required to provide or collect for updating the ISCP with lessons learned. Types of documentation that should be generated and collected after a contingency activation include:
Activity logs (including recovery steps performed and by whom, the time the steps were initiated and completed, and any problems or concerns encountered while executing activities)
Functionality and data testing results
Test reports (conduct the test, start the report, and record the findings once the results are available)
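The activity-log fields listed above (step performed, by whom, start and completion times, problems encountered) map naturally onto a small record type. The following is a minimal sketch, assuming hypothetical names (`ActivityLogEntry`, `open_steps`); it is not a format prescribed by this plan.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class ActivityLogEntry:
    step: str                             # recovery step performed
    performed_by: str                     # who performed it
    started: datetime                     # time the step was initiated
    completed: Optional[datetime] = None  # time the step was completed, if it was
    problems: List[str] = field(default_factory=list)  # problems or concerns

    def duration_minutes(self) -> Optional[float]:
        """Elapsed minutes for a finished step, or None if still open."""
        if self.completed is None:
            return None
        return (self.completed - self.started).total_seconds() / 60


def open_steps(log: List[ActivityLogEntry]) -> List[str]:
    """Names of steps that were started but never marked complete."""
    return [entry.step for entry in log if entry.completed is None]


if __name__ == "__main__":
    log = [
        ActivityLogEntry("Retrieve backup media", "Team Lead",
                         datetime(2013, 2, 21, 9, 0), datetime(2013, 2, 21, 9, 45)),
        ActivityLogEntry("Recover operating system", "Team Member",
                         datetime(2013, 2, 21, 10, 0),
                         problems=["installation media missing"]),
    ]
    print(log[0].duration_minutes(), open_steps(log))
```

Keeping the entries structured this way makes it straightforward for the ISCP Coordinator to spot unfinished steps and collect the problems list for lessons learned.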
5.10 Deactivation
Once all activities have been completed and documentation has been updated, the ISCP Director will formally deactivate the ISCP recovery effort. Notification of this declaration will be provided to all business and technical POCs.
CONTACT LIST
Omega Research Backup ISCP Key Personnel

ISCP Director
Josh Carpenter
100 Acme Street, Washington, DC 22105
Work:
Home:
Cellular:
Email:

ISCP Director Alternate
Bob Dole
200 Acme Street, 600 Denver St., Denver, CO 802222
Work:
Home:
Cellular:
Email:

ISCP Coordinator
Dave Chappelle
10 Comedian Circle, San Diego, CA 98678
Work: 918-887-3241
Home: 918-887-6698
Cellular: 918-554-6658
Email: Dave@OmegaResearch.gov

ISCP Coordinator Alternate
Brittney Spears
100 Hollywood Blvd, Hollywood, CA 90210
Work: 412-688-9514
Home: 412-541-8812
Cellular: 456-981-6341
Email: Brittney@OmegaResearch.gov

Recovery Team Team Lead
Clint Eastwood
500 Make my Day road, Smith & Wesson, Kansas 70145
Work: 412-736-5541
Home: Same as cell
Cellular: 552-335-6674
Email: Clint@OmegaResearch.gov

Recovery Team Team Member
Oprah Winfrey
61 Talkshow Dr, Chicago, IL 60081
Work: Same as cell
Home: Same as cell
Cellular: 843-844-4444
Email: Oprah@OmegaResearch.gov
SunGard
Don Meltin (Test Coord.) Jack Fabrianni (Acct. Rep) Lincoln Balducci (Resource Coord.)
After the entire team has assembled at the designated meeting spot, the group will enter the building and begin making preparations to start the recovery. The following steps take place after the building has been entered:
Verify whether the machines are on, or find where the disconnect is. If the entire building has no power, assume that either the machines are off or the UPS has kicked in and the machines are in the same state in which they were last seen.
Find new power sources. Power cords and power strips may be taken from wherever the team sees fit. First taking
After power has been restored, begin powering up equipment as needed. Not all equipment needs to be powered up at once; equipment such as printers and individual workstations can be last on the list and/or powered up at one time.
Start looking for any issues with starting up the servers.
Recovery of data from tape and audit logs; and
APPENDIX D ALTERNATE PROCESSING PROCEDURES
During this phase, actual testing takes place. There are two main methods of testing: unit testing and integration testing. Unit testing concentrates on a single backup station (data recovery software, backup drives) at a time, while integration testing exercises the workings of the whole system. This may seem like an unnecessary distinction, but it is very important: unit testing will tell you things such as whether a particular server's tape drive works properly, and integration testing will show whether you can successfully take a tape that this drive wrote, move it to another machine, and use data recovery software there to rebuild your information.
Once the system has been recovered, the following steps will be performed to validate system data and functionality:
Procedure and Expected Results
1. Procedure: At the Command Prompt, type in sysname. Expected result: System Log-in Screen appears.
2. Procedure: Log in as user testuser, using password testpass. Expected result: Initial Screen with Main Menu shows.
3. Procedure: From Menu, select 5 - Generate Report. Expected result: Report Generation Screen shows.
4. Procedure: Select Current Date Report, select Weekly, select To Screen. Expected result: Report is generated on screen with last successful transaction included.
5. Procedure: Select Close. Expected result: Report Generation Screen shows.
6. Procedure: Select Return to Main Menu. Expected result: Initial Screen with Main Menu shows.
7. Procedure: Select Log-Off. Expected result: Log-in Screen appears.
Same Same
Yes Yes
Brittney Brittney
No No No
APPENDIX F ALTERNATE STORAGE, SITE, AND TELECOMMUNICATIONS
Alternate Storage: Denver, CO
1,836.2 miles from HQ in Herndon, VA.
The building is leased from a third party that maintains the facility.
John Doe, Head Facilities Manager, Denver, CO, 303-254-9178.
Normal UPS and FedEx delivery times. All backups recovered from the damaged site will need to be brought or mailed in secure packaging to the site. All other backups can be downloaded from the SAN.
The alternate site has the same type of secure access doors, offices, and gates as HQ. Your access cards will work at the alternate site. If they do not, please contact the Facilities Manager.
Any large-scale disasters will be handled as they come. If personnel cannot get to HQ or to the alternate site, all efforts should be made to address the situations the employees are in until they feel it is safe enough to come back to work.
Types of data located at alternate storage site, including databases, application software, operating systems, and other critical information system software; and
Alternate Processing Site: Denver, CO
Third party.
Bob Dole, ISCP Director Alternate, 303-288-1199, 508-692-6855, Bob@OmegaResearch.gov.
All personnel assigned to the Omega Research Incident Response Team are allowed access to the systems and tools, as well as any backups needed to bring the system back up and running.
A CAC card is used to enter through the gate. To enter the room, the CAC card will need to be scanned. See Appendix C.
It is a warm site, so some equipment will need to be brought, but most equipment will be there: workstations, desks, chairs, phones, server racks, and empty slots for backup drives.
Open-layout office with adjustable cubicles, with the server farm near the stairwell on the 2nd floor.
None.
If you cannot get to the backup site, given that it is in a different state, then there are larger issues at hand that need to be resolved first. Safety comes first, then work.
Alternate Telecommunications: None. If phones are down at one place then they will be down or clogged until the disaster is
APPENDIX J: TEST AND MAINTENANCE SCHEDULE
A formal test plan is developed prior to the functional test, and test procedures are developed to include key sections of the ISCP, including the following:
Notification procedures;
System recovery on an alternate platform from backup media;
Internal and external connectivity; and
Restoration to normal operations.
NOTE: Full functional tests of systems normally are failover tests to the alternate locations, and may be very disruptive to system operations if not planned well. Other systems located in the same physical location may be affected by or included in the full functional test. It is highly recommended that several functional tests be conducted and evaluated prior to conducting a full functional (failover) test.
Recovery of a server or database from backup media; and Setup and processing from a server at an alternate location.
Step 1: Identify failover test facilitator. Responsible party: ISCP Coordinator. Date scheduled: 12/05/2012. Date held: 12/05/2012.
Step 2: Determine scope of failover test (include other systems?). Responsible party: ISCP Coordinator, Test Facilitator. Date scheduled: 12/07/2012. Date held: 12/08/2012.
Step 3: Develop failover test plan. Responsible party: Test Facilitator. Date scheduled: 01/28/2012. Date held: 01/28/2012.
Step 4: Invite participants. Responsible party: Test Facilitator. Date scheduled: 03/11/2012. Date held: 03/11/2012.
Step 5: Conduct functional test. Responsible party: Test Facilitator, ISCP Coordinator. Date scheduled: 06/07/2012. Date held: 06/07/2012.
Step 6: Finalize after action report and lessons learned.
Step 7: Update ISCP based on lessons learned.
Step 8: Approve and distribute updated version of ISCP.
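As an illustration of how the schedule above could be audited, the sketch below compares each step's scheduled date with the date it was held, using the dates exactly as recorded in this appendix. The helper names (`parse_us_date`, `slip_days`, `slipped_steps`) are hypothetical, and the parser assumes dates are written as MM/DD/YYYY or MM/DD/YY.

```python
from datetime import date, datetime
from typing import List, Tuple


def parse_us_date(text: str) -> date:
    """Parse MM/DD/YYYY, also accepting the two-digit-year form MM/DD/YY."""
    for fmt in ("%m/%d/%Y", "%m/%d/%y"):
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")


def slip_days(scheduled: str, held: str) -> int:
    """Days between the scheduled date and the date held (positive = late)."""
    return (parse_us_date(held) - parse_us_date(scheduled)).days


def slipped_steps(schedule: List[Tuple[str, str, str]]) -> List[Tuple[str, int]]:
    """Return (step, days) for every step not held on its scheduled date."""
    return [(step, slip_days(scheduled, held))
            for step, scheduled, held in schedule
            if slip_days(scheduled, held) != 0]


# Step, date scheduled, date held, as recorded in the schedule.
SCHEDULE = [
    ("Identify failover test facilitator", "12/05/2012", "12/05/12"),
    ("Determine scope of failover test", "12/07/2012", "12/08/2012"),
    ("Develop failover test plan", "01/28/2012", "01/28/2012"),
    ("Invite participants", "03/11/2012", "03/11/2012"),
    ("Conduct functional test", "06/07/2012", "06/07/2012"),
]

if __name__ == "__main__":
    print(slipped_steps(SCHEDULE))
```

With the recorded dates, only the scoping step differs from its scheduled date, by one day.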