Docu58234 PDF
Abstract
This white paper provides an overview of VPLEX networking. It covers the
internal IP networks, the management network, WAN COM for Metro, and the VPN
for Metro and Witness. It provides guidance for the VS2 and VS6 hardware
platforms.
April 2019
H13552
Revisions
Date Description
April 2019 Revision 3
Acknowledgements
This paper was produced by the following:
VPLEX_CSE_TEAM@emc.com
The information in this publication is provided “as is.” Dell Inc. makes no representations or warranties of any kind with respect to the information in this
publication, and specifically disclaims implied warranties of merchantability or fitness for a particular purpose.
Use, copying, and distribution of any software described in this publication requires an applicable software license.
Copyright © April 2019 Dell Inc. or its subsidiaries. All Rights Reserved. Dell, EMC, Dell EMC and other trademarks are trademarks of Dell Inc. or its
subsidiaries. Other trademarks may be trademarks of their respective owners. [5/16/2019] [Best Practices] [H13552]
Table of contents
Revisions
Acknowledgements
Table of contents
Executive summary
Audience
Document Organization
1 Internal Networks
1.1 Description & Requirement
1.2 Management Network and Internal Management VPN
2 VPLEX Metro WAN Cluster Connectivity
2.1 VPLEX Metro Fibre Channel WAN Cluster Connectivity
2.1.1 Requirements
2.2 Metro over IP (10Gb/E)
2.2.1 Requirements
2.3 VPLEX Metro IP WAN Cluster Connectivity
3 Rule Sets
Executive summary
This VPLEX™ Networking Best Practices document covers the VPLEX network requirements for WAN
communications (both Fibre Channel and Ethernet) between clusters over distance in a VPLEX Metro
configuration. It also provides an overview of the internal communication layer within the cluster, both
between directors and between the management server and the directors. Finally, it covers the VPN
configuration between the management servers and the Cluster Witness Server (CWS).
Note: This document does not cover host cluster network considerations, because VPLEX does not directly
provide solutions for host stretch cluster networks. Host stretch clusters may require layer 2 network
connectivity; refer to the host requirements and to network vendor products that make it possible to separate
cluster nodes across data centers. Although host cluster network requirements are outside the scope of this
document, they remain an integral part of the overall solution.
Audience
These technical notes are for Dell-EMC field personnel and partners and customers who will be configuring,
installing, and supporting VPLEX. An understanding of these technical notes requires an understanding of the
following:
Document Organization
This technical note is one of a set of documents that supersede the monolithic Implementation Planning and
Best Practices for Dell-EMC VPLEX Technical Notes document that had previously been available. It is
intended to provide more concise, per-topic information that will be easier to maintain and keep up-to-date.
The following list represents the technical note best practice documents:
1 Internal Networks
Note: At no time should any of these internal networks be connected to customer networks.
The VS6 has two management ports, one on MMCS-A and the other on MMCS-B. Both ports must be
configured and connected to the customer network. MMCS-A is the management port that is accessed for
all management and monitoring purposes.
MMCS-B must also be configured so that processes such as NDU, and configuration changes such as setting
the cluster ID during installation, can be persisted to MMCS-B; otherwise those processes will fail.
This port provides the connectivity to facilitate the management and configuration of the cluster. For VPLEX
Metro, the management port is also utilized for a VPN which extends the internal management networks
between all of the cluster components in both clusters and, if installed, the VPLEX Witness.
Refer to the Security Guide for port information and the protocols that must be allowed through the firewall
(in both the outbound and inbound filters).
2 VPLEX Metro WAN Cluster Connectivity
2.1 VPLEX Metro Fibre Channel WAN Cluster Connectivity
2.1.1 Requirements
• Latency may be up to 10 ms RTT, depending on the use case. Many systems require 5 ms RTT or less.
Please consult the latest VPLEX Simple Support Matrix.
• Each director’s FC WAN ports must be able to see at least one FC WAN port on every other remote
director (required)
• Independent FC WAN links are strongly recommended for redundancy
• Each director has two FC WAN ports that should be configured on separate fabrics (or FC/IP external
hardware) to maximize redundancy and fault tolerance, so that a single failure external to VPLEX does
not cause a full WAN communications failure to a single director
• Configure FC WAN links between clusters like ISLs between FC switches. This does not require a
merged fabric between locations
• Logically isolate VPLEX Metro traffic from other traffic using zoning, VSANs or LSAN zones.
Additionally, physical isolation from the SAN can be achieved by connecting to separate switches
used to connect the data centers without requiring connection to the local SAN
• The director’s local com port is used for communications between directors within the cluster
Note: Cisco IVR is not supported on the VPLEX WAN COM fabric.
FC WAN connectivity uses Fibre Channel with standard synchronous distance limitations. Considerations
for Fibre Channel include latency/round-trip conditions and buffer-to-buffer credits (BB_Credits), including
the BB_Credits required for distance. An excellent source for additional information is the Dell-EMC Symmetrix
Remote Data Facility (SRDF) Connectivity Guide or the Dell-EMC Networked Storage Topology Guide, available
through E-Lab™ Interoperability Navigator at: http://elabnavigator.EMC.com.
Note: VPLEX short-write was introduced in VPLEX 5.2 SP1. The feature is not compatible with FCIP Fast
Write (Brocade) or FCIP Write Acceleration (Cisco). These FCIP features must be disabled on the FCIP
switches to ensure that VPLEX works properly. Please refer to the Distance Extension section of the latest
VPLEX Simple Support Matrix for details:
https://support.emc.com/docu31803_Simple-Support-Matrix-VPLEX-and-GeoSynchrony-.pdf
Latency/roundtrip conditions
Latency is generally expressed in milliseconds (ms) as the combined round-trip time (RTT) between the local
and remote clusters. An FC frame by itself takes approximately 1 ms to traverse a one-way distance of 100 km
from primary transmitter to secondary receiver.
For example, if two locations are 100 km apart, the RTT is about 2 ms. Because the standard Fibre Channel
protocol requires two round trips for a write I/O, 4 ms of latency (2 x RTT) is added to the write operation.
VPLEX short-write, introduced in VPLEX 5.2 SP1, reduces the number of required round trips for a write I/O
to the equivalent of 1.5 round trips. However, as more network components are attached to the configuration
in pure Fibre Channel environments, latency naturally increases. This latency can be caused by network
components such as host initiators, switches, fiber optics, and distance extension devices, as well as by
factors such as cable purity. The VPLEX application layer contributes additional delay on top of the network.
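As a rough worked example of the arithmetic above, the following Python sketch estimates the latency that distance alone adds to a write. It uses only the figures quoted in this section (about 1 ms per 100 km one way, two round trips for a standard Fibre Channel write, 1.5 with VPLEX short-write). The function and constant names are illustrative, and delays from switches, distance extension devices, and the VPLEX application layer are deliberately ignored.

```python
# Figures taken from this section; all names are illustrative.
ONE_WAY_MS_PER_100_KM = 1.0        # approximate one-way frame transit time per 100 km
STANDARD_WRITE_ROUND_TRIPS = 2.0   # standard Fibre Channel write
SHORT_WRITE_ROUND_TRIPS = 1.5      # VPLEX short-write (5.2 SP1 and later)


def added_write_latency_ms(distance_km, round_trips=STANDARD_WRITE_ROUND_TRIPS):
    """Estimate the latency (ms) that distance alone adds to one write I/O."""
    if distance_km < 0:
        raise ValueError("distance_km must be non-negative")
    one_way_ms = distance_km / 100.0 * ONE_WAY_MS_PER_100_KM
    rtt_ms = 2.0 * one_way_ms
    return round_trips * rtt_ms


if __name__ == "__main__":
    for km in (50, 100):
        standard = added_write_latency_ms(km)
        short = added_write_latency_ms(km, SHORT_WRITE_ROUND_TRIPS)
        print(f"{km} km: standard write +{standard:.1f} ms, short-write +{short:.1f} ms")
```

Real deployments should rely on measured round-trip times rather than this distance-only estimate.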
The supported network round trip latency is up to 10ms for VPLEX Metro. However, the actual supported
round trip latency depends not only on VPLEX but on the server OS, clustering software, and host level
applications. In addition, host virtualization, if applicable, may have requirements as well. Please see the
VPLEX Simple Support Matrix, as well as release notes and documentation for all host components (OS,
application, virtualization, and clustering).
Using performance monitoring tools, the roundtrip time can be determined for troubleshooting any WAN
latency issues within the network.
Buffer-to-buffer credits
Fibre Channel uses the BB_Credits (buffer-to-buffer credits) mechanism for hardware-based flow control. This
means that a port can pace the flow of frames into its processing buffers. This mechanism eliminates the
need for switching hardware to discard frames under heavy congestion. On VPLEX, during fabric login, the
VS2 8 Gbps FC ports advertise a BB credit value of up to 41. The switch responds to the login with its
maximum supported BB credit value. The FC port uses the BB credit value returned by the switch, up to a
maximum of 41. Dell-EMC testing has shown this mechanism to be extremely effective in its speed and
robustness. The BB credits between the two clusters must be configured for long distance.
Refer to the Dell-EMC Networked Storage Topology Guide for proper calculations and settings for your WAN
connectivity.
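To give a feel for why distance drives the credit requirement, the sketch below estimates how many full-size frames must be in flight to keep a long link busy. It is a first-principles approximation, not the calculation from the Networked Storage Topology Guide. The constants are assumptions: roughly 5 µs of one-way propagation delay per km of fiber, a 2148-byte maximum frame, and a nominal throughput of 100 MB/s per 1 Gbps of Fibre Channel speed. The function name is hypothetical.

```python
import math

# Assumptions for illustration only; use the Topology Guide for real settings.
PROPAGATION_US_PER_KM = 5.0   # assumed one-way delay of light in fiber
FULL_FRAME_BYTES = 2148       # assumed maximum-size FC frame including overhead
MB_PER_S_PER_GBPS = 100.0     # nominal FC throughput, e.g. 8 Gbps FC ~ 800 MB/s


def estimate_bb_credits(distance_km, speed_gbps, frame_bytes=FULL_FRAME_BYTES):
    """Estimate the credits needed to keep a link of the given length busy."""
    if distance_km < 0 or speed_gbps <= 0 or frame_bytes <= 0:
        raise ValueError("distance must be >= 0; speed and frame size must be > 0")
    # 1 MB/s equals 1 byte per microsecond, so this is the time on the wire in µs.
    frame_time_us = frame_bytes / (speed_gbps * MB_PER_S_PER_GBPS)
    round_trip_us = 2.0 * distance_km * PROPAGATION_US_PER_KM
    # One credit for the frame being sent, plus enough to cover the round trip
    # that elapses before the first credit is returned.
    return math.ceil(round_trip_us / frame_time_us) + 1


if __name__ == "__main__":
    for km in (1, 10, 50, 100):
        print(f"{km:>3} km at 8 Gbps: about {estimate_bb_credits(km, 8)} credits")
```

Smaller average frame sizes raise the requirement, because more frames fit into the same round-trip window. Comparing the estimate with the credit values mentioned above shows why the inter-cluster links need long-distance settings on the switches.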
A VPLEX Metro should be set up with redundant and completely independent Fibre Channel connectivity
between clusters. This provides maximum performance, fault isolation, fault tolerance, and availability.
Redundant fabrics are critically important because, when the directors in one cluster have inconsistent
connectivity with the directors in the remote cluster, the two clusters are logically split until the
connectivity issues are resolved. This is by design. The firmware requires full connectivity among all
directors for protocols such as cache coherence and inter-cluster communication. Every local director must
be able to communicate with every remote director and vice versa. Without full connectivity, the director
continues to run but brings the inter-cluster link down. The net result is that all volumes at the
non-preferred (or losing) site become unavailable, as dictated by the predefined per-volume cluster detach
rules. Recovery is simple but manual: connectivity must be restored between all directors before I/O
operations resume. This scenario is much less likely if two independent and redundant connections are used
between clusters.
Note: Connecting the redundant fabrics to a single switch, regardless of the hardware redundancy within the
switch, anywhere between the two clusters is a single point of failure resulting in loss of redundancy. The best
practice is to have complete “air gap” separation between port group fabrics. These fabrics should be isolated
from the front-end and backend SAN fabrics and switches.
At Site-1:
At Site-2:
Place ISLs between switch-1A and switch-2A and between switch-1B and switch-2B. The best practice is to
have your ISL traffic travel through independent links (and/or carrier) between your sites.
The Fibre Channel WAN COM ports are configured into two different port groups. Ports on the WAN COM
module labeled FC00 will be in port group 0 and the ports labeled FC01 will be in port group 1. The inter-
cluster connectivity will be spread across two separate fabrics for HA and resiliency. Port group 0 will be
connected to one fabric and port group 1 will be connected to the other fabric.
A Fibre Channel WAN COM port in one cluster will be zoned to all of the FC WAN ports in the same port
group at the remote site. This is roughly equivalent to one initiator zoned to multiple targets. This is repeated
for every Fibre Channel WAN COM port at both clusters.
This zoning provides additional fault tolerance and error isolation in the event of configuration error or a rogue
fabric device (when compared to a single large zone).
Although this requires more setup than a single zone, it is worth the effort and should not be considered out of
the norm for a SAN administrator.
Assuming two fabrics and dual-engine systems for Cluster A and Cluster B, each fabric would be zoned as
follows:
Key:
Zone configuration:
There would be 16 zones for the quad engine configuration. Please extrapolate from this example.
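The zoning pattern described above (one local WAN COM port zoned to every remote WAN COM port in the same port group) can be sketched in Python as follows. The director and port labels follow the naming seen elsewhere in this document (for example director-1-1-A and A2-FC00), but the zone names and the helper functions are hypothetical; real zones are built from port WWNs on the fabric switches.

```python
def wan_ports(cluster, engines, port_group):
    """List WAN COM port labels for one cluster and one port group."""
    return [
        f"director-{cluster}-{engine}-{side}:{side}2-FC0{port_group}"
        for engine in range(1, engines + 1)
        for side in ("A", "B")
    ]


def build_zones(engines_per_cluster, port_group):
    """Build one-to-many zones for a single fabric (one port group)."""
    zones = {}
    for local, remote in ((1, 2), (2, 1)):
        remote_ports = wan_ports(remote, engines_per_cluster, port_group)
        for port in wan_ports(local, engines_per_cluster, port_group):
            # Hypothetical naming scheme, for illustration only.
            name = f"zone_pg{port_group}_{port.replace(':', '_')}"
            zones[name] = [port] + remote_ports
    return zones


if __name__ == "__main__":
    for name, members in build_zones(engines_per_cluster=2, port_group=0).items():
        print(name)
        for member in members:
            print("   ", member)
```

With two engines per cluster this produces 8 zones per fabric, and with four engines it produces 16 per fabric, which is consistent with the quad-engine figure quoted above.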
Note: If VPLEX is deployed with an IP inter-cluster network (FCIP), the inter-cluster network must not be able
to route to the following reserved VPLEX subnets: 128.221.252.0/24, 128.221.253.0/24, and 128.221.254.0/24.
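A planned inter-cluster subnet can be checked against these reserved ranges with Python's standard ipaddress module. This is a minimal sketch: the function name is hypothetical, and it only detects address overlap, not whether a route to the reserved subnets actually exists in the network.

```python
import ipaddress

# Reserved VPLEX subnets listed in the note above.
RESERVED_VPLEX_SUBNETS = [
    ipaddress.ip_network(net)
    for net in ("128.221.252.0/24", "128.221.253.0/24", "128.221.254.0/24")
]


def conflicts_with_reserved(candidate):
    """Return the reserved subnets that overlap the candidate network."""
    network = ipaddress.ip_network(candidate, strict=False)
    return [str(r) for r in RESERVED_VPLEX_SUBNETS if network.overlaps(r)]


if __name__ == "__main__":
    for candidate in ("10.10.0.0/16", "128.221.253.17/24", "128.221.252.0/22"):
        hits = conflicts_with_reserved(candidate)
        print(candidate, "->", hits if hits else "no overlap")
```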
To check for FC WAN connectivity, log in to the VPLEX CLI and run the following command:
Any issues, whether hardware or zoning related, would result in a detailed output showing the error:
/engines/engine-2-1/directors/director-2-1-A/hardware/ports/A2-FC00 ->
Missing connectivity to /engines/engine-1-1/directors/director-1-1-A/hardware/ports/A2-FC00
/engines/engine-2-1/directors/director-2-1-B/hardware/ports/B2-FC00 ->
Missing connectivity to /engines/engine-1-1/directors/director-1-1-A/hardware/ports/A2-FC00
/engines/engine-1-1/directors/director-1-1-A/hardware/ports/A2-FC00 ->
Missing all expected connectivity.
VPlexcli:/>
…
Directors discovered by director-1-1-A, UUID 0x0000000043e008d2:
Director ID         Protocol  Address             Ports
------------------  --------  ------------------  -------
0x0000000043f008d2  COMIB     0x604800000a6fd0    A3-IB00
                              0x604800000a6fd8    A3-IB01
0x0000000043e00836  COMSCSI   0xc001448788361000  A2-FC00
                              0xc001448788361100  A2-FC01
0x0000000043f00836  COMSCSI   0xc001448788369000  A2-FC00
                              0xc001448788369100  A2-FC01
VPlexcli:/>
Check that the director has connectivity to the remote directors through both ports, “A2-FC00” and
“A2-FC01”, as shown in the “Ports” column. Connections on I/O Module 2 (A2 or B2) are WAN-COM
connections. The display also shows LOCAL-COM connections on ports in I/O Module 3 (A3 or B3).
Repeat this process for all the remaining directors in your system and check that each of them can reach
the remote directors using both of its FC WAN ports.
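When this check has to be repeated across many directors, the tabular output can be screened with a small script. The Python sketch below parses a listing in the format shown above and reports which expected WAN COM ports are missing for each remote director. The function names are hypothetical, the sample text is copied from this document, and treating the two director IDs reached through A2 ports as the remote directors is an assumption made for the example.

```python
SAMPLE = """\
Director ID Protocol Address Ports
------------------ -------- ------------------ -------
0x0000000043f008d2 COMIB 0x604800000a6fd0 A3-IB00
0x604800000a6fd8 A3-IB01
0x0000000043e00836 COMSCSI 0xc001448788361000 A2-FC00
0xc001448788361100 A2-FC01
0x0000000043f00836 COMSCSI 0xc001448788369000 A2-FC00
0xc001448788369100 A2-FC01
"""


def parse_discovered(text):
    """Return {director_id: [port, ...]} from the discovered-directors table."""
    directors = {}
    current = None
    for line in text.splitlines():
        fields = line.split()
        if not fields or not fields[0].startswith("0x"):
            continue  # header, separator, prompt, or blank line
        if len(fields) == 4:
            # Full row: director ID, protocol, address, port
            current = fields[0]
            directors[current] = [fields[3]]
        elif len(fields) == 2 and current is not None:
            # Continuation row: address and port for the previous director
            directors[current].append(fields[1])
    return directors


def missing_wan_paths(directors, remote_ids, expected_ports=("A2-FC00", "A2-FC01")):
    """Map each remote director ID to the expected ports that do not reach it."""
    problems = {}
    for remote_id in remote_ids:
        seen = directors.get(remote_id, [])
        missing = [port for port in expected_ports if port not in seen]
        if missing:
            problems[remote_id] = missing
    return problems


if __name__ == "__main__":
    remote = ["0x0000000043e00836", "0x0000000043f00836"]  # assumed remote directors
    print(missing_wan_paths(parse_discovered(SAMPLE), remote) or "all WAN paths present")
```

For a B-side director, the expected ports would be B2-FC00 and B2-FC01 instead.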
2.2 Metro over IP (10Gb/E)
2.2.1 Requirements
• Latency may be up to 10 ms RTT, depending on the use case. Many systems require 5 ms RTT or less.
Please consult the latest VPLEX Simple Support Matrix.
• Must follow all the network requirements outlined in the Metro IP section below
IP WAN connectivity
• Each director’s IP WAN ports must be able to see at least one WAN port on every other remote
director (required).
• Independent WAN links are strongly recommended for redundancy
• Each director has two WAN ports that should be configured on separate hardware to maximize
redundancy and fault tolerance.
• Configure WAN links between clusters on network components that offer the same Quality of Service
(QoS).
• VPLEX uses best available path load balancing and is capable of adapting to changing WAN latency
• Logically isolate VPLEX Director WAN traffic from other WAN traffic
• The supported network round trip latency is less than or equal to 10ms.
Note: VPLEX Metro IP WAN supports only IPv4 for GeoSynchrony releases 5.2 and older and for releases
6.0.1 and newer. IPv6 is available with GeoSynchrony 5.3, 5.4, and 5.5.
A VPLEX Metro should be set up with redundant and completely independent networks between clusters
located over geographically different paths. This provides maximum performance, fault isolation, fault
tolerance, and availability.
Redundant networks are critically important because, when the directors in one cluster have inconsistent
connectivity with the directors in the remote cluster, the two clusters are logically split until the
connectivity issues are resolved. This is by design. The firmware requires full connectivity among all
directors for protocols such as cache coherence and inter-cluster communication. Without full connectivity,
the director continues to run but brings the inter-cluster link down. The net result is that all volumes at
the losing site become unavailable, as dictated by the predefined per-volume or per-consistency-group
cluster detach rules, or by VPLEX Witness if it is in control.
Maximum Transmission Unit (MTU) size is also a configurable attribute. Performance measurements for
GeoSynchrony 6.0.1 and newer do not show significant improvements with jumbo frames. It is
recommended that the default MTU of 1500 be kept.
3 Rule Sets
It is recommended that Consistency Groups be used for setting and maintaining detach rules.
At a minimum, set the detach timer to 5 seconds. Setting the detach delay lower than 5 seconds can result in
unnecessary or repeated storage detaches during periods of network instability. Multiple detaches in a short
period of time can also result in many unnecessary data rebuilds and, in turn, reduced performance.
Configure detach rules based on the cluster/site that you expect to continue I/O during any network outage.
Avoid conflicting detach situations. Each distributed device (or consistency group) must have a rule set
assigned to it. When a cluster’s distributed device detaches during a link outage or other communications
issue with the other members of a distributed device, the detached device can resume I/O. Therefore, it is
important to understand the nature of the outage and which cluster is set to automatically detach. It is
recommended that the rule set configuration for each distributed device or group of devices be documented,
along with plans for handling the various outage types.
It is important to note that rule sets are applied on a distributed device basis or to a number of devices within
a consistency group. It is within normal parameters for different distributed devices to resume I/O on different
clusters during an outage. However, if a host application uses more than one distributed device, most likely
all of the distributed devices for that application should have the same rule set to resume I/O on the same
cluster.
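One way to act on this recommendation is to keep the documented rule set assignments in a simple inventory and check it for applications whose distributed devices would resume I/O on different clusters. The Python sketch below assumes such an inventory exists as a list of (application, device, rule set) entries. The function name, the application and device names, and the rule set labels are all placeholders, not output from any VPLEX tool.

```python
from collections import defaultdict


def apps_with_mixed_rule_sets(devices):
    """Return {application: [rule sets]} for applications using more than one rule set.

    devices: iterable of (application, device_name, rule_set) tuples.
    """
    rule_sets = defaultdict(set)
    for application, _device, rule_set in devices:
        rule_sets[application].add(rule_set)
    return {app: sorted(found) for app, found in rule_sets.items() if len(found) > 1}


if __name__ == "__main__":
    # Placeholder inventory for illustration only.
    inventory = [
        ("billing", "dd_billing_data", "cluster-1-detaches"),
        ("billing", "dd_billing_logs", "cluster-2-detaches"),
        ("web", "dd_web_01", "cluster-1-detaches"),
        ("web", "dd_web_02", "cluster-1-detaches"),
    ]
    for app, found in apps_with_mixed_rule_sets(inventory).items():
        print(f"{app}: devices use different rule sets: {', '.join(found)}")
```

Placing all of an application's distributed devices in one consistency group, as recommended at the start of this section, avoids this kind of mismatch by construction.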