
About WAN Resiliency


NetBackup Whitepaper



This document briefly introduces the problems encountered in WAN environments and describes how the WAN Resiliency feature in NetBackup (NB) 7.5 addresses them. This document is for internal and external use. If you have any feedback or questions about this document, please email them to IMG-TPM-Requests@symantec.com, stating the document title.

This document applies to the following version(s) of NetBackup: 7.5

This document is provided for informational purposes only and is intended for distribution only to Symantec employees. If you have any feedback or questions about this document please email them to IMG-TPMRequests@symantec.com stating the document title. All warranties relating to the information in this document either express or implied are disclaimed to the maximum extent allowed by law. The information in this document is subject to change without notice. Copyright 2012 Symantec Corporation. All rights reserved. Symantec, the Symantec logo and NetBackup are trademarks or registered trademarks of Symantec Corporation or its affiliates in the U.S. and other countries. Other names may be trademarks of their respective owners

Document Control
Contributors: Harsha Muktamath, Brijesh Chougule


Abstract
The purpose of this white paper is to explain how WAN Resiliency can be used in a WAN environment. It describes the network interruptions that can occur and how the WAN Resiliency feature helps NetBackup recover from them.

Intended Audience
This paper is intended for both internal and external audiences. It gives an overview of potential WAN issues and how the WAN Resiliency feature overcomes them.

Introduction
Many companies need to communicate over a WAN, for example to send data between a Data Centre in one location and remote offices in other locations. A WAN enables data transfers across geographies, at the cost of exposure to various network interruptions. These interruptions include (but are not limited to) intermittent disconnections, latency or chit chat, packet loss, and delays. Several solutions exist for these problems. Third-party tools can help an application recover from such interruptions, but they involve software and/or hardware costs. Another approach is to make the application itself resilient to such interruptions. NetBackup 7.5 includes a feature, Resilient Network, which helps data transfers over a WAN by making NetBackup connections resilient to WAN interruptions.

Requirements
1) The master server, media servers, and clients should all be running NBU 7.5 or later. Back-level media servers and clients are supported in the environment, but for the Resilient Network feature to work, all machines involved must be running NBU 7.5 or later.
2) The figure below illustrates one scenario in which the master server and media servers are in the Data Centre and the clients are at remote locations.

Note: The figure depicts only some of the supported platforms. Please refer to the Hardware Compatibility guide for a complete list of supported platforms.


How it works:
The Resilient Network feature gives NetBackup 7.5 the ability to recover from network interruptions during operations between clients in a remote office and the master server and/or media server in the Data Centre. This is accomplished by the following:
1. Enhanced communication between the NetBackup Master/Media Server and the NetBackup Client. Connections between the client and the master server / media server are resilient to failures: they recover from interruptions during communication without causing operations to fail.
2. Enhanced administration and configuration to better suit the needs and requirements of NetBackup Clients in a remote office.

Below is a pictorial representation of the supported configuration in a WAN environment. Although the diagram shows the Media Server backing up only to BasicDisk or AdvancedDisk, the Media Server can back up to any type of storage, including tape and a PureDisk Deduplication Pool.

To set resiliency:
Resiliency can be set using the CLI or through the GUI.

CLI
The new resilient_clients command is located in:
Windows: install_path\NetBackup\bin\admincmd
UNIX: /usr/openv/netbackup/bin/admincmd

resilient_clients status | on | off <nbuclientname>
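The syntax above can be turned into an argument list for scripting. The following is a minimal sketch that only builds the command line and does not run it. It assumes the argument order shown above (action, then client name); the function name, the client name, and the Windows install path used in the example are hypothetical.

```python
from pathlib import PurePosixPath, PureWindowsPath

# UNIX directory as given in the text. The Windows install_path differs per
# installation, so the caller has to supply it.
UNIX_ADMINCMD = PurePosixPath("/usr/openv/netbackup/bin/admincmd")
ACTIONS = ("status", "on", "off")


def resilient_clients_argv(action, client, windows_install_path=None):
    """Build an argv list following: resilient_clients status|on|off <nbuclientname>.

    Hypothetical helper: it only assembles strings and does not validate
    anything against a real NetBackup installation.
    """
    if action not in ACTIONS:
        raise ValueError(f"action must be one of {ACTIONS}, got {action!r}")
    if not client:
        raise ValueError("client name must not be empty")
    if windows_install_path is None:
        exe = UNIX_ADMINCMD / "resilient_clients"
    else:
        exe = (PureWindowsPath(windows_install_path)
               / "NetBackup" / "bin" / "admincmd" / "resilient_clients")
    return [str(exe), action, client]


if __name__ == "__main__":
    # "remote-client-01" and "C:\NBU" are placeholder values.
    print(resilient_clients_argv("status", "remote-client-01"))
    print(resilient_clients_argv("on", "remote-client-01", r"C:\NBU"))
```

The resulting list could be passed to subprocess.run on a server where the command is installed. Building it separately makes it possible to review the exact command before anything is executed.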


GUI

Note: Please refer to the Symantec NetBackup Deduplication Guide for more information on how to configure the Resilient Network feature (using the GUI or the CLI).

Some information on the WAN network (Good or Bad)


The solution described in the previous section works best on High Availability networks. Availability is defined as follows:

Availability = MTBF / (MTBF + MTTR)

MTBF - Mean Time Between Failures
MTTR - Mean Time To Repair

One day contains 24 hours/day * 3600 seconds/hour = 86400 seconds, so:

Availability is 90%; downtime per day is 86400*0.10 = 8640 seconds.
Availability is 99%; downtime per day is 86400*0.01 = 864 seconds.
Availability is 99.9%; downtime per day is 86400*0.001 = 86.4 seconds.
Availability is 99.99%; downtime per day is 86400*0.0001 = 8.64 seconds.
Availability is 99.999%; downtime per day is 86400*0.00001 = 0.864 seconds.
Availability is 99.9999%; downtime per day is 86400*0.000001 = 0.0864 seconds.

Let us break it down a little more:
When availability is 90%, the downtime per hour is 8640/24 = 360 seconds, and the downtime per 10 minutes is 60 seconds.
When availability is 99%, the downtime per hour is 864/24 = 36 seconds, and the downtime per 10 minutes is 6 seconds.
When availability is 99.9%, the downtime per hour is 86.4/24 = 3.6 seconds, and the downtime per 10 minutes is 0.6 seconds.

Industry observations indicate that networks are available more than 99% of the time in most cases, and that the worst availability is 97.4%. Anything below this value constitutes a Bad Network, in which case no application (including NetBackup) will work. With the Resilient Network feature, a NetBackup resilient connection can survive network interruptions of up to 80 seconds and may survive interruptions longer than 80 seconds.
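The availability arithmetic in this section can be reproduced with a few lines of Python. This is a small sketch for checking the figures. The function names are illustrative, and the downtime is treated as a simple fraction of the time window, as in the text.

```python
SECONDS_PER_DAY = 24 * 3600  # 86400 seconds, as in the text


def availability(mtbf, mttr):
    """Availability = MTBF / (MTBF + MTTR); both values in the same time unit."""
    return mtbf / (mtbf + mttr)


def downtime_seconds(availability_percent, window_seconds=SECONDS_PER_DAY):
    """Downtime within a time window for an availability given in percent."""
    return window_seconds * (100.0 - availability_percent) / 100.0


if __name__ == "__main__":
    for pct in (90, 99, 99.9, 99.99, 99.999, 99.9999):
        per_day = downtime_seconds(pct)
        per_hour = downtime_seconds(pct, 3600)
        per_10_min = downtime_seconds(pct, 600)
        print(f"{pct}%: {per_day:.4f} s/day, "
              f"{per_hour:.4f} s/hour, {per_10_min:.4f} s/10 min")
```

Applied to the worst observed availability of 97.4%, the same function gives roughly 2246 seconds of downtime per day. Note that this is cumulative downtime. The 80-second figure quoted for resilient connections refers to a single interruption, so the two numbers are not directly comparable.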

A new NBRNTD daemon


To protect the connections between a NetBackup client and a NetBackup Media/Master server, a new daemon is introduced: NBRNTD, the NetBackup Remote Network Transport Daemon. This daemon maintains the connections and re-establishes a broken connection. It is started on demand. Network resiliency communication includes the following aspects:
1. Initiating a protected connection between the remote client and the data centre (VNETD-NBRNTD <------> NBRNTD-VNETD).
2. Connection re-establishment if the WAN connection goes down.
3. Data buffering.
4. Data integrity and consistency.
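This paper does not describe how NBRNTD implements buffering and re-establishment internally. As a generic illustration of the idea only, the sketch below shows one common technique: the sender keeps every byte that the peer has not yet confirmed, and after a reconnect it resends from the offset the peer reports. The class and method names are hypothetical and are not part of NetBackup.

```python
class ReplayBuffer:
    """Holds bytes that were sent but not yet confirmed by the peer.

    Generic illustration of buffering for reconnects; not NBRNTD's actual design.
    """

    def __init__(self):
        self._base = 0            # stream offset of the first buffered byte
        self._data = bytearray()  # sent-but-unconfirmed bytes

    @property
    def sent(self):
        """Total number of bytes handed to the network so far."""
        return self._base + len(self._data)

    def record_send(self, chunk):
        """Remember a chunk that has just been written to the connection."""
        self._data += chunk

    def confirm(self, peer_received):
        """Drop everything the peer reports as received."""
        if not self._base <= peer_received <= self.sent:
            raise ValueError("peer offset outside the buffered range")
        del self._data[: peer_received - self._base]
        self._base = peer_received

    def replay_from(self, peer_received):
        """After a reconnect, return the bytes the peer is still missing."""
        self.confirm(peer_received)
        return bytes(self._data)


if __name__ == "__main__":
    buf = ReplayBuffer()
    buf.record_send(b"hello ")
    buf.record_send(b"world")
    # The link drops; on reconnect the peer says it received 6 bytes.
    print(buf.replay_from(6))
```

Because both sides agree on byte offsets, nothing is duplicated or skipped when the stream resumes, which is the property the data integrity and consistency item refers to.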

Supportability with external applications:


1) File and folder backups: file-system related backup/restore.
2) Virtualization/Flash backups.
3) Database agent backup/restore.

What is not supported?


1) Resilient connections are not supported for:
   a. Master Server <-> Media Server communications
   b. Media Server <-> Media Server communications
2) Optimized duplication takes place between storage servers, and resiliency between them is not supported.
3) Resilient connections to a client are not supported if the client is also a Master Server or Media Server.


Troubleshooting techniques:
With the introduction of the new NBRNTD service, all connection and reconnection messages are recorded in the nbrntd logs.

Connection and reconnection events can be seen in the nbrntd logs:
9/7/2011 13:15:22.068 [wan_status_notify] [Statistic] 0000000000000204 link_id:8 Network broken! (../local_connection.cpp:1493)
9/7/2011 13:15:22.228 [wan_status_notify] [Statistic] 0000000000000204 link_id:8 Network recovered! (../local_connection.cpp:1487)
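To see how long each interruption lasted, the "Network broken!" and "Network recovered!" entries can be paired per link_id. The following is a minimal sketch that assumes the line layout of the two sample entries above and a month/day/year timestamp; other nbrntd log layouts may need a different pattern. The function name is illustrative.

```python
import re
from datetime import datetime

# Pattern derived from the two sample lines only (assumption).
LINE_RE = re.compile(
    r"^(?P<ts>\d{1,2}/\d{1,2}/\d{4} \d{2}:\d{2}:\d{2}\.\d+) "
    r"\[wan_status_notify\].*?link_id:(?P<link>\d+) "
    r"Network (?P<event>broken|recovered)!"
)

SAMPLE = [
    "9/7/2011 13:15:22.068 [wan_status_notify] [Statistic] 0000000000000204 "
    "link_id:8 Network broken! (../local_connection.cpp:1493)",
    "9/7/2011 13:15:22.228 [wan_status_notify] [Statistic] 0000000000000204 "
    "link_id:8 Network recovered! (../local_connection.cpp:1487)",
]


def outages(lines):
    """Return (link_id, seconds) for each broken -> recovered pair."""
    broken_at = {}
    result = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # not a wan_status_notify entry
        stamp = datetime.strptime(match["ts"], "%m/%d/%Y %H:%M:%S.%f")
        link = int(match["link"])
        if match["event"] == "broken":
            broken_at[link] = stamp
        elif link in broken_at:
            duration = stamp - broken_at.pop(link)
            result.append((link, duration.total_seconds()))
    return result


if __name__ == "__main__":
    for link, seconds in outages(SAMPLE):
        print(f"link_id {link}: interrupted for {seconds:.3f} s")
```

For the two sample entries, the interruption on link_id 8 lasted 0.160 seconds.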

Network failures can be identified by SESSION_SWAP messages in pdplugin.log:
09/07/11 14:40:05 [15544] [9532] [DEBUG] PDVFS: pdvfs_lib_log: Sending binary message to <NBU Media server>:10082: SESSION_SWAP 2941385056 4 0: 898404890 0 428019057 27988
09/07/11 14:40:05 [15544] [9532] [DEBUG] PDVFS: pdvfs_lib_log: __crSessionSetupAgain: client has sent 428019057 bytes and read 27988 bytes for session id 898404890
09/07/11 14:40:05 [15544] [9532] [DEBUG] PDVFS: pdvfs_lib_log: Received binary message from <NBU Media server>:10082: SESSION_SWAP 2941385056 4 0: 898404890 0 27988 427887969
09/07/11 14:40:05 [15544] [9532] [DEBUG] PDVFS: pdvfs_lib_log: __crSessionSetupAgain: server sent 0 status with 27988 sent bytes and 427887969 read bytes for session id 898404890
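The __crSessionSetupAgain entries report byte counters from both ends of the session. One plausible reading, which this paper does not state explicitly, is that the difference between what one side sent and what the other side read is the data that was in flight when the link dropped. The sketch below extracts the counters under that assumption. The patterns are derived from the sample entries only, and the function name is illustrative.

```python
import re

# Patterns derived from the sample pdplugin.log entries (assumption).
CLIENT_RE = re.compile(
    r"__crSessionSetupAgain: client has sent (?P<sent>\d+) bytes "
    r"and read (?P<read>\d+) bytes"
)
SERVER_RE = re.compile(
    r"__crSessionSetupAgain: server sent \d+ status with (?P<sent>\d+) sent bytes "
    r"and (?P<read>\d+) read bytes"
)


def in_flight_bytes(lines):
    """Return (client_to_server_gap, server_to_client_gap), or None if incomplete."""
    client = server = None
    for line in lines:
        match = CLIENT_RE.search(line)
        if match:
            client = (int(match["sent"]), int(match["read"]))
            continue
        match = SERVER_RE.search(line)
        if match:
            server = (int(match["sent"]), int(match["read"]))
    if client is None or server is None:
        return None
    # Bytes one side sent that the other side had not read at reconnect time.
    return client[0] - server[1], server[0] - client[1]


if __name__ == "__main__":
    sample = [
        "__crSessionSetupAgain: client has sent 428019057 bytes and read 27988 "
        "bytes for session id 898404890",
        "__crSessionSetupAgain: server sent 0 status with 27988 sent bytes and "
        "427887969 read bytes for session id 898404890",
    ]
    print(in_flight_bytes(sample))
```

With the counters from the sample entries, the client had sent 131088 bytes that the server had not yet read, and nothing was outstanding in the other direction.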

Connection and reconnection messages can also be seen in the deduplication engine (spoold) messages:
September 07 15:11:28 WARNING [0000000000C6F3F0]: 25015: resiliency reconnect check, sleeping for 9 seconds
September 07 15:11:29 WARNING [0000000000C6F3F0]: 25015: resiliency reconnect check, timeout comparing 300 vs 300, attempt no 53
September 07 15:11:29 ERR [0000000000C6F3F0]: 25015: Timed out reached while waiting for session id 3235169508 to reconnect. Reconnect Failed.
September 07 15:11:29 ERR [0000000000C6F3F0]: 25036: Could not receive binary message: expected 4 bytes got -1 bytes. Cause: connection reset by peer

Recommended Practices:
1) Use Client Side Deduplication (CD): This makes good use of bandwidth, and because less data flows across the WAN, less data has to be resent if a failure occurs. Once the WAN connection is re-established, the backup/restore resumes because of the resiliency feature.
2) Enable Checkpoint along with resiliency.
3) CD + Accelerator: NetBackup Accelerator significantly reduces the resources (time, network, and storage) that a full backup requires. Using NetBackup Accelerator, a very large file system can be fully backed up in the time previously taken by an incremental backup. Combined with CD and WAN resiliency, this gives optimum utilization of resources.


Conclusion
The WAN Resiliency feature provides an efficient way to perform backup and restore operations in a typical WAN environment. It re-establishes broken network connections and continues the operation without any loss. The key benefit of resilient jobs is that the administrator does not have to restart jobs that would otherwise fail because of network interruptions. The job remains active during the interruption and continues automatically once the connection is re-established over the WAN, which makes it resilient to WAN interruptions.

References
1) Symantec NetBackup Deduplication Guide.
2) Statistics on real high availability numbers: http://www.internetpulse.net/Main.aspx?xAxis=Destination&yAxis=Origin&zAxis=Metric&nAxis=Period
3) High Availability Network Fundamentals, e-book by Chris Oggerino.


About Symantec: Symantec is a global leader in providing storage, security and systems management solutions to help consumers and organizations secure and manage their information-driven world. Our software and services protect against more risks at more points, more completely and efficiently, enabling confidence wherever information is used or stored.

For specific country offices and contact numbers, please visit our Web site: www.symantec.com

Symantec Corporation World Headquarters 350 Ellis Street Mountain View, CA 94043 USA +1 (650) 527 8000 +1 (800) 721 3934

Copyright 2012 Symantec Corporation. All rights reserved. Symantec and the Symantec logo are trademarks or registered trademarks of Symantec Corporation or its affiliates in the U.S. and other countries. Other names may be trademarks of their respective owners.
