Data Guard Deep Dive UKOUG 2012

1 – Configuration Considerations
2 – Performance Tuning
3 – Role Transition Best Practices
4 – Corruption Detection
5 – Integration Issues
Emre Baransel – DBA, Employee ACE – Oracle

Choosing the Protection Mode
Maximum Protection
•Redo transport: SYNC & LGWR
•Action with no standby database connection: the primary database must write redo to at least one standby database; otherwise it shuts down
•Risk of data loss: zero data loss is guaranteed
Maximum Availability
•Redo transport: SYNC & LGWR
•Action with no standby database connection: normally works with SYNC; if the primary database cannot write redo to any of its standby databases, it continues as in ASYNC mode
•Risk of data loss: zero data loss in normal operation, but not guaranteed
Maximum Performance
•Redo transport: ASYNC & (LGWR or ARCH)
•Action with no standby database connection: never waits for acknowledgment from the standby database
•Risk of data loss: potential for minimal data loss in normal operation

Choosing the Protection Mode
•If there are network bandwidth and latency issues
•use Maximum Performance
•ARCH is not recommended in 11g: it has no performance benefit over LGWR but gives less data protection
•When no data loss is acceptable & a service outage is preferred to any data loss
•make sure your network bandwidth is high enough
•and use Maximum Protection
•If data loss is not absolutely intolerable & you have high bandwidth
•use Maximum Availability
Required bandwidth (Mbps) =
((Max redo rate in bytes per sec. / 0.7) * 8) / 1,000,000
If the maximum redo generation rate is 500 MB per minute,
which is 8,738,133 bytes per second,
then the required bandwidth is about 100 Mbps
* this covers Data Guard traffic only
* latency is also important
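The bandwidth formula above can be checked with a few lines of Python. This is only a sketch of the slide's arithmetic: the function name and the `divisor` parameter name are made up for illustration, and the 0.7 divisor is taken from the formula as given.

```python
def required_bandwidth_mbps(max_redo_bytes_per_sec, divisor=0.7):
    """Required bandwidth in Mbps for a peak redo rate in bytes per second.

    `divisor` is the 0.7 factor from the slide's formula (hypothetical name).
    """
    bits_per_sec = (max_redo_bytes_per_sec / divisor) * 8
    return bits_per_sec / 1_000_000


# Slide example: 500 MB of redo per minute, with 1 MB = 1024 * 1024 bytes.
redo_bytes_per_sec = 500 * 1024 * 1024 / 60
bandwidth = required_bandwidth_mbps(redo_bytes_per_sec)

print(f"{redo_bytes_per_sec:.0f} bytes/s -> {bandwidth:.1f} Mbps")
# prints: 8738133 bytes/s -> 99.9 Mbps
```

Feeding in the peak redo rate rather than the average keeps the estimate on the safe side, since redo transport must keep up during the busiest period.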

SYNC Enhancement in 11g
Before 11g, the primary database first finished writing to the online redo log and only then sent the redo to the standby database. The commit had to wait for two consecutive I/O operations to complete.
[Diagram, before 11g: on commit, redo goes from the Redo Log Buffer to the Online Redo Log on the primary and then to the Standby Redo Log on the standby; each step is acknowledged with OK.]

SYNC Enhancement in 11g
In 11g these two I/O operations run in parallel. The primary database does not wait for the write to the online redo log to finish; it sends the redo data to the standby at the same time.
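[Diagram, in 11g: on commit, redo goes from the Redo Log Buffer to the Online Redo Log on the primary and to the Standby Redo Log on the standby at the same time; each write is acknowledged with OK.]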

No More Delay to Decrease RTO
Prefer Real-Time Apply with “Flashback On” rather than “Delay”. A delayed configuration increases RTO.
Instead of a delayed destination (DELAY is in minutes):
LOG_ARCHIVE_DEST_2='SERVICE=STANDBY LGWR ASYNC VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=ORCLSTD DELAY=120'
Turn on flashback (retention target is in minutes):
DB_RECOVERY_FILE_DEST='+FRA';
DB_RECOVERY_FILE_DEST_SIZE=500G;
DB_FLASHBACK_RETENTION_TARGET=120;
ALTER DATABASE FLASHBACK ON;
Use real-time apply:
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE DISCONNECT FROM SESSION;

Using Flashback Database...
You can reinstate the original primary database as a new standby database following a failover.
A failed switchover can be reversed easily.
Unwanted changes on the primary database can be reversed and queried on the standby database, even if flashback is not being used on the primary.

Using Real Time Apply...
Prefer Real-Time Apply to avoid ORA-01555 Snapshot Too Old errors on Active Data Guard standby databases.
Query fresh data from standby
RTO is decreased

11g Performance Improvements
11g Recovery performance improvements include:
•More parallelism by default
•More efficient asynchronous redo read, parse, and apply
•Fewer synchronization points in the parallel apply algorithm
•The media recovery checkpoint at a redo log boundary no longer blocks the apply of the next log
Source: Active Data Guard 11g Best Practices, Oracle Maximum Availability Architecture White Paper

Determining Redo Apply Rate
Method 1:
SQL> select * from v$recovery_progress;
23-SEP-11 Media Recovery Active Apply Rate KB/sec 15564 0
23-SEP-11 Media Recovery Average Apply Rate KB/sec 20890 0
Method 2:
SQL> select APPLY_RATE from V$STANDBY_APPLY_SNAPSHOT;
APPLY_RATE
----------
16305

Determining Redo Apply Rate
Method 3:
Run the following query twice, a few seconds apart:
SQL> SELECT PROCESS, SEQUENCE#, THREAD#, BLOCK#, BLOCKS, TO_CHAR(SYSDATE, 'DD-MON-YYYY HH24:MI:SS') TIME FROM V$MANAGED_STANDBY WHERE PROCESS='MRP0';
At 0 seconds:
PROCESS SEQUENCE# THREAD# BLOCK# BLOCKS TIME
--------- ---------- ---------- ---------- ---------- --------------------
MRP0 276877 1 147338 4097947 19-APR-2011 12:25:34
At 5 seconds:
PROCESS SEQUENCE# THREAD# BLOCK# BLOCKS TIME
--------- ---------- ---------- ---------- ---------- --------------------
MRP0 276877 1 645542 4097947 19-APR-2011 12:25:39
Get the redo block size (bytes):
SQL> SELECT lebsz LOG_BLOCK_SIZE FROM x$kccle;
Media Recovery Rate (MB/sec):
((BLOCK#_END – BLOCK#_BEG) * LOG_BLOCK_SIZE) / ((TIME_END – TIME_BEG) * 1024 * 1024)
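As a worked example, the media recovery rate formula can be applied to the two MRP0 samples shown above. This is a sketch, not part of the original slides: the function name is illustrative, and the 512-byte redo block size is an assumption (the slides do not show the result of the x$kccle query), so substitute the value returned on your system.

```python
def media_recovery_rate_mb_per_sec(block_beg, block_end, log_block_size, elapsed_seconds):
    """Apply rate in MB/s from two BLOCK# samples taken on the same log sequence."""
    applied_bytes = (block_end - block_beg) * log_block_size
    return applied_bytes / (elapsed_seconds * 1024 * 1024)


# ASSUMPTION: 512-byte redo blocks; use the lebsz value from x$kccle instead.
LOG_BLOCK_SIZE = 512

# BLOCK# values from the two samples, taken 5 seconds apart (12:25:34 and 12:25:39).
rate = media_recovery_rate_mb_per_sec(147338, 645542, LOG_BLOCK_SIZE, 5)

print(f"{rate:.2f} MB/s")
# prints: 48.65 MB/s
```

The calculation only makes sense when both samples report the same SEQUENCE# (276877 here), because BLOCK# restarts when MRP0 moves on to the next log.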

Redo Apply Tuning
•By default, recovery parallelism = CPU count - 1. Do not use any other value.
•Keep PARALLEL_EXECUTION_MESSAGE_SIZE >= 8192
•Keep DB_CACHE_SIZE >= Primary value
•Set DB_BLOCK_CHECKING = FALSE only if you have to
•System resources need to be assessed
Query what the MRP process is waiting on:
SQL> select a.sid, b.username, b.osuser, a.event, a.wait_time, a.p1, a.p1text, a.seconds_in_wait from gv$session_wait a, gv$session b where a.sid=b.sid and b.sid=(select SID from v$session where PADDR=(select PADDR from v$bgprocess where NAME='MRP0'));

Redo Transport Tuning
1 - Tune the LOG_ARCHIVE_MAX_PROCESSES parameter on the primary.
•Specifies the parallelism of redo transport
•Default value is 2 in 10g, 4 in 11g
•Increase it if there is a high redo generation rate and/or multiple standbys
•May need to be increased up to 30 in some cases
•Can significantly increase the redo transport rate
2 - Consider using Redo Transport Compression:
•In 11.2.0.2 redo transport compression can be always on
•Use it if network bandwidth is insufficient
•and CPU power is available
Also consider:
3 - Configuring TCP Send / Receive Buffer Sizes (RECV_BUF_SIZE / SEND_BUF_SIZE)
4 - Increasing SDU Size
5 - Setting TCP.NODELAY to YES
Source: Redo Transport Services Best Practices, Oracle® Database High Availability Best Practices 11g Release 1

Switchover Best Practices
Set the JOB_QUEUE_PROCESSES & AQ_TM_PROCESSES parameters to 0.
Use Real-Time Apply
Reduce LOG_ARCHIVE_MAX_PROCESSES to the minimum.
Properly set archiving destinations on the standby database.
Set LOG_ARCHIVE_TRACE=8191;
Enable Flashback Database or use Guaranteed Restore Points

Failover Best Practices
Enable Flashback Database
Use Real-Time Apply
Consider configuring multiple standby databases.
Consider using Fast-Start Failover
Set FastStartFailoverThreshold
Set FastStartFailoverAutoReinstate

Corruption Detection Parameters
DB_BLOCK_CHECKSUM (physical corruption)
•OFF (FALSE)
•TYPICAL (TRUE)
•FULL
DB_BLOCK_CHECKING (logical corruption)
•OFF (FALSE)
•LOW
•MEDIUM
•FULL (TRUE)
Source: Best Practices for Corruption Detection, Prevention, and Automatic Repair - in a Data Guard Configuration [ID 1302539.1]

Automatic Block Corruption Repair
•11gR2 feature
•ON with Physical Standby & Active Data Guard
•Corruptions are repaired automatically using the remote database.
You can also repair the corruption with the RMAN “RECOVER BLOCK” command. This operation tries to use the standby database first. If you do not want the standby database used for the repair, specify the EXCLUDE STANDBY option in the “RECOVER BLOCK” command.

“Lost-Write” Detection
•11gR1 feature
•A lost write is a serious corruption that originates in the I/O subsystem.
•Physical Standby, Active Data Guard and Real-Time Apply are needed
•DB_LOST_WRITE_PROTECT = “TYPICAL” on both primary and standby.
•When a lost write is detected, standby recovery stops
•The way to get rid of this corruption is to fail over to the standby database.

RMAN Integration
Beginning with 11g, standby databases can be used for RMAN's “Block Change Tracking” feature, which records changed blocks for incremental backups. This requires Active Data Guard. There are important bugs in this feature; check bugs 9869287, 9068088, 10094823.
Integration Requirements and Best Practices
•Only Physical Standby can be used for interchangeable backups.
•RMAN Catalog must be used. (In a separate location if possible)
•DB_UNIQUE_NAME must be different.
•General RMAN Best Practices must be preserved.

Integration with Oracle Applications
•Directs write operations to primary
•All read operations to Active Data Guard standby
•Applications developed with Oracle TopLink are able to be configured as “Active Data Guard aware”
•An ongoing study,
•Writes will work on primary and Reads on standby
•Automatic direction to primary in a case of lag
Configuring Oracle TopLink Applications with Oracle Active Data Guard, Oracle Maximum Availability Architecture White Paper
Configuring Oracle BI EE Server with Oracle Active Data Guard, Oracle Maximum Availability Architecture White Paper
Using Active Data Guard Reporting with Oracle E-Business Suite Release 12.1 and Oracle Database 11g [ID 1070491.1]
•Redirect Reports to Active Data Guard
•“fnd_adg_utility.enable_adg_support”

Best Practices Papers
http://www.oracle.com/technetwork/database/features/availability/dataguard11g-bestpractices-161724.html
