Using third party tools to perform backup and restore operations

All of these options have their good and bad qualities:
EXPORT
A database export is a logical copy of the structure and data contained in an Oracle database. You cannot apply archive log information against a database recovered using the import of an export file. This means that an export is a point-in-time recovery of a database. In this way, an export is like a cold backup of a database that is not in archivelog mode. Exports are useful in that they allow easy restoration of tables and other structures instead of having to bring back entire tablespaces as you would in most other forms of backup and recovery. The import process can also be used to rebuild tables and indexes into more optimal configurations or to place data into new locations. Another benefit is that export files can be copied across platforms; for example, an export file from a Win2K server can be copied to a Solaris server and applied there.
The drawbacks to exports are that they take a great deal of time to generate (depending on database size), they can only be performed against a running database, and they take a long time to recover (again based on database size). In some versions of Oracle there are also file size limitations.
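For readers who have not used the export utilities, here is a minimal sketch of a full export followed by a selective table-level import; the connection string, passwords, and file names are illustrative assumptions, not values from this paper's test systems:

$ exp system/manager@ault full=y file=ault_full.dmp log=exp_ault_full.log
$ imp system/manager@ault file=ault_full.dmp fromuser=scott touser=scott tables=emp ignore=y log=imp_emp.log

The second command restores only the EMP table from the full export, illustrating the table-level restore flexibility mentioned above.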
The connection string (in this case ault2) can only apply to a single instance, so the entry in tnsnames.ora for the ault2 connection would be:
ault2 =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (LOAD_BALANCE = OFF)
      (FAILOVER = ON)
      (ADDRESS = (PROTOCOL = TCP)(HOST = aultlinux2)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = ault)
      (INSTANCE_NAME = ault2)
    )
  )
For RAC there is a special requirement if the instances are using archive logs: a channel connection must be specified for each instance, and each connection must resolve to only one instance. For example, using the ault1 and ault2 instances from our previous example:
CONFIGURE DEFAULT DEVICE TYPE TO sbt;
CONFIGURE DEVICE TYPE sbt PARALLELISM 2;
CONFIGURE CHANNEL 1 DEVICE TYPE sbt CONNECT = 'SYS/kr87m@ault1';
CONFIGURE CHANNEL 2 DEVICE TYPE sbt CONNECT = 'SYS/kr87m@ault2';
This configuration only has to be specified once for a RAC environment and should only be changed if nodes are added to or removed from the RAC configuration. In this way it is known as a "persistent" configuration and need never be changed for the life of your RAC environment. One requirement of this configuration for RAC is that each of the nodes specified must be open (the database operational) or all must be closed (the database shut down). If even one of the instances specified is not in the same state as the others, the backup will fail. RMAN is also aware of the node affinity of the various database files and uses the node with the greatest affinity for a set of datafiles to back up those datafiles. Node affinity can, however, be overridden with manual commands:
BACKUP
  # Channel 1 gets datafiles 1,2,3
  (DATAFILE 1,2,3 CHANNEL ORA_SBT_TAPE_1)
  # Channel 2 gets datafiles 4,5,6,7
  (DATAFILE 4,5,6,7 CHANNEL ORA_SBT_TAPE_2);
The nodes chosen to back up an Oracle RAC cluster must have the ability to see all of the files that require backup. For example:
BACKUP DATABASE PLUS ARCHIVELOG;
This command requires that the nodes specified have access to all archive logs generated by all instances, which requires some special considerations when configuring the Oracle RAC environment. The essential steps for using RMAN in Oracle RAC are:
1. Configure the snapshot control file location
2. Configure the control file autobackup feature
3. Configure the archiving scheme
4. Change the archive mode of the database (optional)
5. Monitor the archiver process
First, let's look at the snapshot control file location configuration.
To change the default location, for example, to '/u04/backup/ault_rac/snap_ault.cf' the command would be:
CONFIGURE SNAPSHOT CONTROLFILE NAME TO '/u04/backup/ault_rac/snap_ault.cf';
Note that only a single location is specified, which in this example requires each node to have a '/u04/backup/ault_rac' directory. Note also that, like the channel specifications shown in the previous section, this is a persistent global specification. The control file can be automatically backed up during each backup operation by specifying the command:
CONFIGURE CONTROLFILE AUTOBACKUP ON;
This automatic control file backup is also a persistent setting and need never be specified again. By using the automatic control file backup RMAN can restore the control file even after loss of the recovery catalog and the current control files.
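As a hedged illustration of why this matters, a control file restore without a recovery catalog looks roughly like the following. The DBID shown is the one that appears in the example logs later in this paper; the sequence of commands is the standard RMAN approach, not a script taken from the paper:

RMAN> SET DBID 127952943;
RMAN> STARTUP NOMOUNT;
RMAN> RESTORE CONTROLFILE FROM AUTOBACKUP;
RMAN> ALTER DATABASE MOUNT;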
A duplicate of this file must then be placed, in a directory with the same name, on any node that performs backup operations. If only one node performs backup operations in a RAC environment, that node must have access to all archived logs. Another consideration is the setting of the parameter LOG_ARCHIVE_FORMAT; it must be specified identically on all nodes and should include the instance number and redo log thread number. How archive log locations are configured depends on the type of filesystems you use for your RAC environment. The filesystems available for RAC are:
  OCFS based filesystems
  RAW (non-OCFS) filesystems
An OCFS based filesystem setup allows any node to read the archive logs from any other node; in fact, all logs can be placed in a centralized area. This is the preferred setup for Oracle RAC. If node ault1 writes a log to /u01/backup/archives/ault_rac, any other RAC instance in an OCFS setup could see the log. This is demonstrated in Figure 1.
In a RAW filesystem setup the shared drives are configured into raw partitions. Only a single file can be written to a specific raw device, so raw devices cannot be used for archive logs. This means that for a RAC environment that uses a raw configuration, the archive logs are written to server-local disks that are not shared. Unless special configuration options such as NFS are used, no instance can see the archive logs of any other instance in a raw environment. This is demonstrated in Figure 2. Let's examine a technique for overcoming this problem in a raw filesystem environment.
To make all archive log files visible to all other nodes in a raw filesystem configuration, you must name the directories according to which node they service; in our ault1, ault2 configuration this would be:
On node aultlinux1:
/usr/backup/ault_rac1/archives1
On node aultlinux2:
/usr/backup/ault_rac2/archives2
In order to make the archive logs available, the /usr/backup/ault_rac2/archives2 directory would be NFS mounted to the aultlinux1 node and the /usr/backup/ault_rac1/archives1 directory NFS mounted to the aultlinux2 node. To make things even easier, a periodic job could be scheduled that copies the archive log files from the NFS mounts into the local archive destination. By copying the archive logs from the NFS mounted directories into the local archive log location you ensure that even if you lose the NFS mount the database will be recoverable. This setup is illustrated in Figure 3.
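A minimal sketch of such a copy job follows. The directory names come from the example configuration above; everything else (the file pattern, the use of cp, the cron scheduling) is an assumption for illustration only. The script also checks each file with the Unix fuser utility, discussed next, so that partially written logs are skipped:

#!/bin/bash
# Copy archive logs from the NFS-mounted remote archive directory into the
# local archive log directory, skipping any file that is still open.
NFS_DIR=/usr/backup/ault_rac2/archives2     # NFS mount from the other node (assumed)
LOCAL_DIR=/usr/backup/ault_rac1/archives1   # local archive log directory (assumed)

for f in "$NFS_DIR"/*.dbf; do
  [ -f "$f" ] || continue                   # nothing matched the pattern
  # fuser exits with success if any process still has the file open,
  # meaning the archiver has not finished writing it -- skip such files.
  if fuser "$f" >/dev/null 2>&1; then
    continue
  fi
  cp -p "$f" "$LOCAL_DIR"/
done

# Example crontab entry (every 15 minutes):
# 0,15,30,45 * * * * /usr/local/bin/copy_arch_logs.sh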
A Unix utility known as fuser verifies whether a file is open or closed. Even if an archive log is still being written, Unix will allow you to copy it, delete it, or do anything else to it, so using fuser before copying an archive file ensures that all files copied are complete and usable for recovery. Any script you write on a Unix or Linux box for copying archive logs should use this command to verify a log is not being written before attempting to copy it. About the only advantage of the raw configuration is that if you have multiple tape drives (one on each node), the archive log portion of the backup can be parallelized.
If only a single tape drive is available then you must use the NFS mount scheme, and if the NFS mount is lost you may not be able to fully recover the database should any archive log be lost. The initialization parameters should be set similar to:
ault1.LOG_ARCHIVE_DEST_1="LOCATION=/u01/backup/ault_rac1"
ault2.LOG_ARCHIVE_DEST_1="LOCATION=/u01/backup/ault_rac2"
By using the NFS mount scheme, either node can back up the archive logs of the other. On Linux the process to set up an NFS mounted drive is:
1. Configure the services such that the NFS services are running:
   NFS
   NFSLOCK
   NETFS
Figure 4 shows the Service Configuration GUI with these services checked.
2. Configure the NFS server on each node. Figure 5 shows the NFS Server GUI configuration screen for Linux.
The configuration for NFS mounts is performed by the root user.
3. On each RAC node, create the mount point directory exactly as it appears on each remote node. For example, in our demo system, for the server aultlinux2 the command to create the mount directory for the archive directory on aultlinux1 would be:
$ mkdir /usr/backup/ault_rac1/archives1
4. On each RAC node, use the mount command to mount the NFS drive(s) from the other nodes, using the mount directory created in the previous step. For our example setup this would be:
$ mount aultlinux1:/usr/backup/ault_rac1/archives1 /usr/backup/ault_rac1/archives1
The configuration of NFS mount points on Solaris 9 would be done using manual commands:
1. Start the NFS server:
# /etc/init.d/nfs.server start
5. On the target box create the required mount directory:
# mkdir /usr/backup/ault_rac1/archives1
6. On the target, mount the shared directory:
# mount -F nfs -o rw aultsolaris1:/usr/backup/ault_rac1/archives1 /usr/backup/ault_rac1/archives1
Make sure the mount directory is created with the same owner and group as you want to have access to the directory tree that is cross-mounted. In Windows 2000 you share the drives across the network to achieve the same capabilities.
The LOG_ARCHIVE_FORMAT parameter determines the format of the archive log names that are generated. The LOG_ARCHIVE_FORMAT parameter must be the same on all clustered nodes. The allowed format strings for LOG_ARCHIVE_FORMAT are:
%T -- Thread number, left-zero-padded, so LOG_ARCHIVE_FORMAT = ault_%T would give ault_0001
%t -- Thread number, not left-zero-padded, so LOG_ARCHIVE_FORMAT = ault_%t would give ault_1
%S -- Log sequence number, left-zero-padded, so LOG_ARCHIVE_FORMAT = ault_%S would give ault_0000000001
%s -- Log sequence number, not left-zero-padded, so LOG_ARCHIVE_FORMAT = ault_%s would give ault_1
The format strings can be combined to show both thread and sequence number. The %T or %t parameter is required for RAC archive logs.
In order to perform a complete recovery, a database, whether it is a normal database or a RAC database, must use archive logging. To turn on archive logging in RAC the following procedure should be used:
1. Shut down all instances associated with the RAC database.
2. Choose one instance in the RAC cluster and, in its unique initialization parameter file, set the CLUSTER_DATABASE parameter to FALSE. (Note: If you are using a server parameter file, qualify the parameter with the instance SID prefix, "<sid>.".)
3. In the instance parameter file, set the LOG_ARCHIVE_DEST_n, LOG_ARCHIVE_FORMAT and LOG_ARCHIVE_START parameters; for example, for our example instances:
ault1.LOG_ARCHIVE_DEST_1 = "LOCATION=/u01/backup/ault_rac1 MANDATORY"
ault2.LOG_ARCHIVE_DEST_1 = "LOCATION=/u01/backup/ault_rac2 MANDATORY"
LOG_ARCHIVE_FORMAT = ault_%T_%s
LOG_ARCHIVE_START = TRUE
4. Start the chosen instance in mount mode.
5. Issue ALTER DATABASE ARCHIVELOG.
6. Shut down the instance.
7. Edit the instance initialization parameter file or server parameter file to reset CLUSTER_DATABASE to TRUE.
8. Restart all of the instances.
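A minimal sketch of steps 4 through 6 from SQL*Plus follows; the connection string and password reuse the example values from earlier in this paper, so treat this as an illustration of the standard commands rather than a script lifted from a production system:

$ sqlplus /nolog
SQL> CONNECT SYS/kr87m@ault1 AS SYSDBA
SQL> STARTUP MOUNT
SQL> ALTER DATABASE ARCHIVELOG;
SQL> SHUTDOWN IMMEDIATE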
To be absolutely sure that only backed up archive logs are deleted, use the DELETE command with a MAINTENANCE connection in RMAN. In our example with instances ault1 and ault2:
ALLOCATE CHANNEL FOR MAINTENANCE DEVICE TYPE DISK CONNECT 'SYS/kr87m@ault1';
DELETE ARCHIVELOG LIKE '%arc_dest_1%' BACKED UP 1 TIMES TO DEVICE TYPE sbt;
RELEASE CHANNEL;
ALLOCATE CHANNEL FOR MAINTENANCE DEVICE TYPE DISK CONNECT 'SYS/kr87m@ault2';
DELETE ARCHIVELOG LIKE '%arc_dest_1%' BACKED UP 1 TIMES TO DEVICE TYPE sbt;
RELEASE CHANNEL;
Notice the BACKED UP 1 TIMES clause in the above commands. This tells RMAN not to delete the archive logs unless it has a record of them being backed up at least once. The '%arc_dest_1%' token tells RMAN which logs to remove and translates into the value specified for LOG_ARCHIVE_DEST_1 for the instance specified in the connection alias (for example, @ault1). RMAN is capable of autolocation of the files it needs to back up. RMAN, through the database synchronization and resync processes, is aware of which files it needs to back up for each node. RMAN can only back up the files it has autolocated for each node on that node. During recovery, autolocation means that only the files backed up from a specific node will be restored to that node.
The reason the backup set of commands is so simple for a CFS single tape drive backup is that in a CFS type system all nodes can see all drives in the CFS array; this allows any node to back up the entire database.
Note: if you are using a device type of DISK substitute DISK for sbt and specify the path to the backup directory as a part of the CHANNEL configuration, for example:
CONFIGURE CHANNEL 1 DEVICE TYPE disk FORMAT '/u01/backup/ault_rac1/b_%u_%p_%c' CONNECT 'sys/kr87m@ault1';
As we said before, this configuration only has to be specified once unless something happens to your RMAN catalog. Once the configuration is set, the command to perform the backup is fairly simple:
BACKUP DATABASE PLUS ARCHIVELOG DELETE INPUT;
The above commands will only work if the backup node has read/write access to the archive log directories for all nodes in the RAC database cluster.
Note: if you are using a device type of DISK substitute DISK for sbt and specify the path to the backup directory as a part of the CHANNEL configuration, for example:
CONFIGURE CHANNEL 1 DEVICE TYPE disk FORMAT '/u01/backup/ault_rac1/b_%u_%p_%c' CONNECT 'sys/kr87m@ault1';
As we said before, this configuration only has to be specified once unless something happens to your RMAN catalog. Once the configuration is set the command to perform the backup is fairly simple:
BACKUP DATABASE PLUS ARCHIVELOG DELETE INPUT;
Note that we use the DELETE INPUT clause in this case because the process doing the backup is local to each node and has read/write access to the archive log directory.
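For regular backups this command is typically wrapped in a small script and scheduled. The sketch below is an assumption-laden illustration only: the ORACLE_HOME path is invented, and the connection string reuses the example credentials from earlier in this paper; substitute your own environment and, ideally, a recovery catalog connection.

#!/bin/bash
# Scheduled RAC backup wrapper -- run from a node that can reach all
# archive log destinations (or from each node with its own channel).
export ORACLE_HOME=/oracle/product/9.2.0   # assumed path
export PATH=$ORACLE_HOME/bin:$PATH

rman target SYS/kr87m@ault1 <<EOF
BACKUP DATABASE PLUS ARCHIVELOG DELETE INPUT;
EOF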
EXAMPLE CONFIGURATION AND BACKUP USING OEM AND RMAN FOR A RAW FILESYSTEM
Let's take a look at actual screen shots of the configuration and execution of a backup in the Linux environment from the Oracle Enterprise Manager interface for RMAN. First we need to execute a one-time configuration job on our server. Figure 6 shows the OEM main GUI with the Job definition menu pulled down. You will use this menu to select the Create Job option.
The selection of the Create Job option invokes the OEM job edit menu as is shown in Figure 7. In this shot you can see we filled in the entries for the job name and the node to which it is to be submitted. In a RAC environment it doesn't matter which node we submit to for the configuration step as long as that node is available and will be used for future backup activities.
Next, we want to select the Tasks tab and select Run RMAN Script. This is shown in Figure 8.
Next, we use the Parameters screen to actually enter the RMAN commands we wish to execute. Notice that we do not have to bracket them with the RUN {} construct. Figure 9 shows the Job Parameters GUI with the script commands to configure our servers using tape drives.
One thing to notice in Figure 9 is that with the tape configuration the default device type must be explicitly set with "TO sbt" (disk is the default device type). Also, with a device type of disk you need to specify the directory where you want the backup files placed; this is done using the FORMAT option on the CONFIGURE commands as shown in Figure 10.
Once the commands are entered into the RMAN Script window, you can Submit, Save to Library, or Submit and Save to Library. I suggest the Submit and Save to Library option; if the job is not saved and you have made an error, you will have to re-enter it.
When the job is submitted, you can follow its progress by selecting the Jobs icon in the OEM menu tree and then double clicking on the entry that corresponds to your submitted job. A Job tracking screen, similar to the one in Figure 11 will be displayed.
Notice that the screen in Figure 11 is actually called the Edit Jobs screen; however, it is only used to show the job status and no actual editing is allowed. By clicking on the various steps of the job process you can get any output generated to be displayed. An example log from a configuration job is shown in Figure 12.
Recovery Manager: Release 9.2.0.2.0 - Production Copyright (c) 1995, 2002, Oracle Corporation. All rights reserved.
RMAN> 2>
connected to target database: AULT (DBID=127952943)
using target database controlfile instead of recovery catalog
RMAN>
RMAN> CONFIGURE DEFAULT DEVICE TYPE TO disk;
old RMAN configuration parameters:
CONFIGURE DEFAULT DEVICE TYPE TO DISK;
new RMAN configuration parameters:
CONFIGURE DEFAULT DEVICE TYPE TO DISK;
new RMAN configuration parameters are successfully stored
RMAN> CONFIGURE DEVICE TYPE disk PARALLELISM 2;
old RMAN configuration parameters:
CONFIGURE DEVICE TYPE DISK PARALLELISM 2;
new RMAN configuration parameters:
CONFIGURE DEVICE TYPE DISK PARALLELISM 2;
new RMAN configuration parameters are successfully stored
RMAN> CONFIGURE CHANNEL 1 DEVICE TYPE disk CONNECT = 'SYS/xenon137@ault1' FORMAT "/usr/backup/ault_rac1/%U";
new RMAN configuration parameters:
CONFIGURE CHANNEL 1 DEVICE TYPE DISK CONNECT 'SYS/xenon137@ault1' FORMAT "/usr/backup/ault_rac1/%U";
new RMAN configuration parameters are successfully stored
RMAN> CONFIGURE CHANNEL 2 DEVICE TYPE disk CONNECT = 'SYSTEM/xenon137@ault2' FORMAT "/usr/backup/ault_rac2/%U";
new RMAN configuration parameters:
CONFIGURE CHANNEL 2 DEVICE TYPE DISK CONNECT 'SYSTEM/xenon137@ault2' FORMAT "/usr/backup/ault_rac2/%U";
new RMAN configuration parameters are successfully stored
RMAN>
RMAN>
Once the configuration is complete (remember, this only has to be done once unless the configuration changes) we are ready to actually submit our backup job. To submit a backup job we once again invoke the OEM Job editor, and other than the job name (now RAC Backup) the steps are essentially the same until we get to the script entry, where we enter the actual backup commands as shown in Figure 13.
If your NFS mounts are read-only, exclude the DELETE INPUT portion of the command and use a manual deletion. Once the command(s) are entered, use the Submit and Save to Library selection to submit the job. The log file from this job is shown in Figure 14.
Recovery Manager: Release 9.2.0.2.0 - Production Copyright (c) 1995, 2002, Oracle Corporation. All rights reserved.
RMAN> 2>
connected to target database: AULT (DBID=127952943)
using target database controlfile instead of recovery catalog
RMAN>
RMAN> BACKUP DATABASE INCLUDE CURRENT CONTROLFILE PLUS ARCHIVELOG DELETE INPUT;
Starting backup at 15-FEB-03
current log archived
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=20 devtype=DISK
allocated channel: ORA_DISK_2
channel ORA_DISK_2: sid=20 devtype=DISK
channel ORA_DISK_1: starting archive log backupset
channel ORA_DISK_1: specifying archive log(s) in backup set
input archive log thread=1 sequence=21 recid=7 stamp=486059165
channel ORA_DISK_1: starting piece 1 at 15-FEB-03
channel ORA_DISK_2: starting archive log backupset
channel ORA_DISK_2: specifying archive log(s) in backup set
input archive log thread=2 sequence=23 recid=4 stamp=486046138
input archive log thread=2 sequence=24 recid=6 stamp=486046347
channel ORA_DISK_2: starting piece 1 at 15-FEB-03
channel ORA_DISK_1: finished piece 1 at 15-FEB-03
piece handle=/usr/backup/ault_rac1/0befhb51_1_1 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:09
channel ORA_DISK_1: deleting archive log(s)
archive log filename=/usr/backup/ault_rac1/archives1/1_21.dbf recid=7 stamp=486059165
channel ORA_DISK_1: starting archive log backupset
channel ORA_DISK_1: specifying archive log(s) in backup set
input archive log thread=2 sequence=25 recid=8 stamp=486059165
channel ORA_DISK_1: starting piece 1 at 15-FEB-03
channel ORA_DISK_1: finished piece 1 at 15-FEB-03
piece handle=/usr/backup/ault_rac1/0defhb5a_1_1 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:02
channel ORA_DISK_1: deleting archive log(s)
archive log filename=/usr/backup/ault_rac1/archives1/2_25.dbf recid=8 stamp=486059165
channel ORA_DISK_2: finished piece 1 at 15-FEB-03
piece handle=/usr/backup/ault_rac2/0cefhb0h_1_1 comment=NONE
channel ORA_DISK_2: backup set complete, elapsed time: 00:00:14
channel ORA_DISK_2: deleting archive log(s)
archive log filename=/usr/backup/ault_rac1/archives1/2_23.dbf recid=4 stamp=486046138
archive log filename=/usr/backup/ault_rac1/archives1/2_24.dbf recid=6 stamp=486046347
Finished backup at 15-FEB-03
Starting backup at 15-FEB-03
using channel ORA_DISK_1
using channel ORA_DISK_2
channel ORA_DISK_1: starting full datafile backupset
channel ORA_DISK_1: specifying datafile(s) in backupset
input datafile fno=00002 name=/oracle/oradata/ault_rac/ault_rac_raw_undotbs1_200m.dbf
input datafile fno=00005 name=/oracle/oradata/ault_rac/ault_rac_raw_example_140m.dbf
input datafile fno=00010 name=/oracle/oradata/ault_rac/ault_rac_raw_xdb_40m.dbf
input datafile fno=00006 name=/oracle/oradata/ault_rac/ault_rac_raw_indx_25m.dbf
input datafile fno=00009 name=/oracle/oradata/ault_rac/ault_rac_raw_users_25m.dbf
input datafile fno=00004 name=/oracle/oradata/ault_rac/ault_rac_raw_drsys_20m.dbf
channel ORA_DISK_1: starting piece 1 at 15-FEB-03
channel ORA_DISK_2: starting full datafile backupset
channel ORA_DISK_2: specifying datafile(s) in backupset
including current controlfile in backupset
input datafile fno=00001 name=/oracle/oradata/ault_rac/ault_rac_raw_system_411m.dbf
input datafile fno=00003 name=/oracle/oradata/ault_rac/ault_rac_raw_cwlite_20m.dbf
input datafile fno=00007 name=/oracle/oradata/ault_rac/ault_rac_raw_odm_20m.dbf
input datafile fno=00011 name=/oracle/oradata/ault_rac/ault_rac_raw_undotbs2_200m.dbf
input datafile fno=00008 name=/oracle/oradata/ault_rac/ault_rac_raw_tools_10m.dbf
channel ORA_DISK_2: starting piece 1 at 15-FEB-03
channel ORA_DISK_1: finished piece 1 at 15-FEB-03
piece handle=/usr/backup/ault_rac1/0eefhb5h_1_1 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:01:09
channel ORA_DISK_2: finished piece 1 at 15-FEB-03
piece handle=/usr/backup/ault_rac2/0fefhb11_1_1 comment=NONE
channel ORA_DISK_2: backup set complete, elapsed time: 00:01:53
Finished backup at 15-FEB-03
Starting backup at 15-FEB-03
current log archived
using channel ORA_DISK_1
using channel ORA_DISK_2
channel ORA_DISK_1: starting archive log backupset
channel ORA_DISK_1: specifying archive log(s) in backup set
input archive log thread=1 sequence=22 recid=9 stamp=486059309
channel ORA_DISK_1: starting piece 1 at 15-FEB-03
channel ORA_DISK_2: starting archive log backupset
channel ORA_DISK_2: specifying archive log(s) in backup set
input archive log thread=2 sequence=26 recid=10 stamp=486059309
channel ORA_DISK_2: starting piece 1 at 15-FEB-03
channel ORA_DISK_1: finished piece 1 at 15-FEB-03
piece handle=/usr/backup/ault_rac1/0gefhb9e_1_1 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:00
channel ORA_DISK_1: deleting archive log(s)
archive log filename=/usr/backup/ault_rac1/archives1/1_22.dbf recid=9 stamp=486059309
channel ORA_DISK_2: finished piece 1 at 15-FEB-03
piece handle=/usr/backup/ault_rac2/0hefhb4u_1_1 comment=NONE
channel ORA_DISK_2: backup set complete, elapsed time: 00:00:01
channel ORA_DISK_2: deleting archive log(s)
archive log filename=/usr/backup/ault_rac1/archives1/2_26.dbf recid=10 stamp=486059309
Finished backup at 15-FEB-03
RMAN>
**end-of-file**
RMAN>
Recovery Manager complete.
Figure 14: Example Backup Log
As you can see, using RMAN, RAC backup is much easier than doing manual scripting. The most difficult part of using OEM and RMAN is getting the intelligent agents properly configured on the RAC nodes. I cover OEM and RAC in another paper.
8. Once redo transactions are applied, all undo (rollback) records are applied; this eliminates non-committed transactions.
9. The database is now fully available to the surviving nodes.
Again, instance recovery is automatic and, other than the performance hit to the instances that survive and the disconnection of users who were using the failed instance, is basically invisible to the other instances. If you properly utilize RAC failover and transparent application failover (TAF) technologies, the only users that should see a problem are those with in-flight transactions. For a look at what the other instance sees in its alert log during a reconfiguration, look at Figure 15.
Sat Feb 15 16:39:09 2003
Reconfiguration started
List of nodes: 0,
Global Resource Directory frozen
one node partition
Communication channels reestablished
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
Resources and enqueues cleaned out
Resources remastered 1977
2381 GCS shadows traversed, 1 cancelled, 13 closed
1026 GCS resources traversed, 0 cancelled
3264 GCS resources on freelist, 4287 on array, 4287 allocated
set master node info
Submitted all remote-enqueue requests
Update rdomain variables
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
2381 GCS shadows traversed, 0 replayed, 13 unopened
Submitted all GCS remote-cache requests
0 write requests issued in 2368 GCS resources
2 PIs marked suspect, 0 flush PI msgs
Sat Feb 15 16:39:10 2003
Reconfiguration complete
Post SMON to start 1st pass IR
Sat Feb 15 16:39:10 2003
Instance recovery: looking for dead threads
Sat Feb 15 16:39:10 2003
Beginning instance recovery of 1 threads
Sat Feb 15 16:39:10 2003
Started first pass scan
Sat Feb 15 16:39:11 2003
Completed first pass scan
208 redo blocks read, 6 data blocks need recovery
Sat Feb 15 16:39:11 2003
Started recovery at
Thread 2: logseq 26, block 14, scn 0.0
Recovery of Online Redo Log: Thread 2 Group 4 Seq 26 Reading mem 0
Mem# 0 errs 0: /oracle/oradata/ault_rac/ault_rac_raw_rdo_2_2.log
Recovery of Online Redo Log: Thread 2 Group 3 Seq 27 Reading mem 0
Mem# 0 errs 0: /oracle/oradata/ault_rac/ault_rac_raw_rdo_2_1.log
Sat Feb 15 16:39:12 2003
Completed redo application
Sat Feb 15 16:39:12 2003
Ended recovery at
Thread 2: logseq 27, block 185, scn 0.5479311
6 data blocks read, 8 data blocks written, 208 redo blocks read
Ending instance recovery of 1 threads
SMON: about to recover undo segment 11
SMON: mark undo segment 11 as available
Figure 15: Alert Log Entries During Reconfiguration
One word of caution, however: during testing to get the listing in Figure 15, I stumbled upon a rare occurrence of not being able to get the instance up after an instance failure. In the Linux/RAC/raw environment I did a "kill -9" on the SMON process on aultlinux1; the above was the result for the database (aultlinux2 stayed up and operating and recovered the failed instance). However, when I attempted a restart of the instance on aultlinux1, I received a "Linux Error: 24: Too many open files" error. This was actually caused by something whacking the spfile link. Once I pointed the instance toward the proper spfile location during startup, it restarted with no problems.
If more than one tape device is available, the parallelism of the recovery can be increased, thus reducing recovery time; for example:
CONFIGURE DEVICE TYPE sbt PARALLELISM 2;
CONFIGURE DEFAULT DEVICE TYPE TO sbt;
CONFIGURE CHANNEL 1 DEVICE TYPE sbt CONNECT 'SYS/kr87m@ault1';
CONFIGURE CHANNEL 2 DEVICE TYPE sbt CONNECT 'SYS/kr87m@ault2';
RESTORE DATABASE;
RECOVER DATABASE;
Since Oracle RMAN uses autolocation, the channel connected to each node restores the files backed up by that node. Remember that the configure commands only have to be issued once.
However, this assumes all of the archived redo logs are available. If all of the logs are not available then an RMAN script similar to:
RUN {
  SET UNTIL SEQUENCE 2245 THREAD 2;
  RESTORE DATABASE;
  RECOVER DATABASE;
}
ALTER DATABASE OPEN RESETLOGS;
would be used to recover up to, but not including, the first unavailable log (in the example, log sequence 2245 in thread 2 is the first unavailable log). Notice the use of the ALTER DATABASE OPEN RESETLOGS command to open the database; the RESETLOGS option resets the archive log sequence and renders previous archive logs unusable for most recovery scenarios.
PARALLEL RECOVERY
Recovery in Oracle9i RAC is automatically parallelized for these three stages of recovery:
1. Restoration of data files
2. Application of incremental backups
3. Application of redo logs
The number of channels configured for a RAC database in RMAN determines the degree of parallelism for data file restoration. In our example configuration we could have two streams restoring data files, since we configured two channels. The degree of parallelism for restoration of incremental backups is likewise dependent on the number of configured channels. Redo logs are applied by RMAN using the degree of parallelism specified in the initialization parameter RECOVERY_PARALLELISM. Using manual recovery methods such as SQL*Plus (remember, in Oracle9i there is no Server Manager program, so all DBA functions are done through SQL*Plus) you can specify values for RECOVERY_PARALLELISM, but it cannot exceed your setting for PARALLEL_MAX_SERVERS. Using the DEGREE option of the RECOVER command you can also control the degree of parallelism for other recovery operations.
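As a hedged illustration (the parameter values and the degree shown are arbitrary examples, not recommendations from this paper), the relevant settings and a manual parallel recovery from SQL*Plus might look like:

# init.ora -- illustrative values only
RECOVERY_PARALLELISM = 4
PARALLEL_MAX_SERVERS = 8

SQL> RECOVER DATABASE PARALLEL (DEGREE 4);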
In the case of a normal (single-instance) database, Oracle provides processes on the primary node that handle the copying of archive logs to the standby server for a physical standby, and processes that read the changes from the redo logs and apply them as SQL statements for a logical standby. There are two types of standby database supported for RAC: you can create the standby database either on a single-node or on a clustered-node system. In Oracle9i (both releases) the Data Guard Manager does not support RAC, so you have to set up the Data Guard (standby) configuration manually.
The archive logs are placed on a shared disk if a cluster file system (CFS) is supported for the platform in use; otherwise, the archive logs are stored on each node's private disk area. You must use different formats for naming the archive logs to prevent overwriting; at a minimum, %t must be used in LOG_ARCHIVE_FORMAT to prevent this. The %t represents the thread number the log comes from. Thus, we have the following settings:
Instance ault1:
LOG_ARCHIVE_DEST_1=(location=/usr/backup/ault_rac1/archives1)
LOG_ARCHIVE_FORMAT=arc_%s_%t
Instance ault2:
LOG_ARCHIVE_DEST_1=(location=/usr/backup/ault_rac2/archives2)
LOG_ARCHIVE_FORMAT=arc_%s_%t
Next we add the log transport for each primary RAC instance to the corresponding standby RAC instance. As we have standby redo logs created and want maximum performance, we now have these settings:
Instance ault1:
LOG_ARCHIVE_DEST_1=(location=/usr/backup/ault_rac1/archives1)
LOG_ARCHIVE_DEST_2=(SERVICE=ault3 LGWR ...)
LOG_ARCHIVE_FORMAT=arc_%s_%t
Instance ault2:
LOG_ARCHIVE_DEST_1=(location=/usr/backup/ault_rac2/archives2)
LOG_ARCHIVE_DEST_2=(SERVICE=ault4 LGWR ...)
LOG_ARCHIVE_FORMAT=arc_%s_%t
Now we designate instance ault4 as the recovering instance (it could be either ault3 or ault4, but for this example we designate ault4). This means that the archive logs from instance ault3 have to be transferred to instance ault4. Instance ault4 also performs the archiving-to-disk process (again on the shared disk that is available to both standby instances). The resulting settings for instances ault3 and ault4 are now:
Instance ault3:
LOG_ARCHIVE_DEST_1=(SERVICE=ault4 ARCH SYNC)
Instance ault4:
LOG_ARCHIVE_DEST_1=(location=/usr/backup/ault_rac4/archives4)
Once all of this is done, you can STARTUP NOMOUNT and ALTER DATABASE MOUNT STANDBY DATABASE on all standby instances and put the recovering instance into recovery mode.
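A minimal sketch of those commands from SQL*Plus follows; the managed-recovery syntax shown is the standard Oracle9i form, and a SYSDBA connection to each standby instance is assumed, since this paper does not reproduce the exact commands:

-- On each standby instance (ault3 and ault4):
STARTUP NOMOUNT
ALTER DATABASE MOUNT STANDBY DATABASE;

-- On the recovering instance (ault4) only:
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION;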
Instance ault2:
LOG_ARCHIVE_DEST_1=(location=/usr/backup/ault_rac2/archives2)
LOG_ARCHIVE_DEST_2=(SERVICE=ault3 LGWR ...)
You must also create different standby redo log threads on the standby database. Instance ault3 now receives all of the logs and applies them to the standby database.
Instance ault3:
LOG_ARCHIVE_DEST_1=(location=/usr/backup/ault_rac3/archives3)
CROSS-INSTANCE ARCHIVAL
In addition to the parameters shown above, you can configure cross-instance archival. This means you can ship your archive logs from one primary RAC instance to another primary RAC instance. This can be helpful for gap resolution if one instance loses network connectivity or archive logs are deleted by mistake. To enable cross-instance archival, you simply take one free LOG_ARCHIVE_DEST_n parameter and configure it for shipping to another primary instance. For cross-instance archival only the archiver (ARCH) process is supported. If we again use our earlier example, the result would be:
Instance ault1:
LOG_ARCHIVE_DEST_1=(location=/usr/backup/ault_rac1/archives1)
LOG_ARCHIVE_DEST_2=(SERVICE=ault3 LGWR ...)
LOG_ARCHIVE_DEST_3=(SERVICE=ault2 ARCH ...)
LOG_ARCHIVE_FORMAT=arc_%s_%t
When cross-instance archival is configured, instance ault3 would now receive missing archive logs from instance ault2, and instance ault4 from instance ault1.
SUMMARY
Although RAC configurations can be complex, backup and recovery of RAC need not be a nightmare. Using RMAN, configuration, backup, and recovery in RAC environments are greatly simplified. Log transport in RAC can be configured to easily support standby or Data Guard configurations.
REFERENCES
Metalink Note 180031.1: Creating a Data Guard Configuration
Metalink Note 68537.1: Init.ora Parameter "LOG_ARCHIVE_DEST_n"
Metalink Note 150584.1: Data Guard 9i Setup with Guaranteed Protection Mode
Metalink Note 203326.1: Data Guard 9i Log Transportation on RAC
Metalink Note 186592.1: Backing Up Archivelogs in a RAC Cluster
Metalink Note 145178.1: RMAN9i: Automatic Channel Allocation and Configuration
Oracle9i Data Guard Concepts and Administration, Release 2 (9.2), A96653-01
Oracle9i Database Reference, Release 2 (9.2), A96536-01
Oracle9i SQL Reference, Release 2 (9.2), A96540-01
Oracle9i Real Application Clusters Administration, Release 2 (9.2), A96596-01, Chapters 6-8