
Active-Active DC


How to Achieve True Active-Active Data Centre Infrastructures
BRKDCT-2615
Carlos Pereira – Distinguished Systems Engineer II – WW Data Center / Cloud – @capereir
1. This session material has more slides and information than we will be able to cover in the 120 minutes, so you can have "even more fun" after this full session.

2. Some slides have an icon on the top right corner. It means they relate to an interesting reference or associated information included for the sake of completeness. I won't cover them during the session.

3. Some slides have an icon on the top right corner. We will "fly over" those, if any.

4. Questions are welcome at any time. But if …
1. … we (myself, as the speaker, and any other Cisco representative in the room) don't know the answer, I will find out and come back to you with the answer later;
2. … the question leads to a "rat-hole", I will respectfully move ahead in order to keep the timing and flow going, but will have it covered offline during the breaks, lunch or any available time during the week at Cisco Live.
Active / Active Data Centres
Typical Process

Everybody wants that
… then try to figure this out
… then divide & conquer
… and feel tired (or panic)

Objectives
1. Understand the Active/Active Data Centre requirements and considerations
2. Provide considerations for Active/Active DC design – including for Metro areas – from storage, DCI, LAN extension, ACI and network services perspectives
3. Analyse the Active/Active aspects of cloud workloads and cloud-native applications with containers and micro-services
4. Deep dive on ACI fabric extension, stretched fabrics, application portability, VM mobility, policy synchronisation, etc.
5. Share experiences with stateful device placement and its impact within a DCI environment

Legend (icons used in this deck): Load Balancer, SSL Offloader, APIC (Application Policy Infrastructure Controller), SVI / HSRP Default Gateway, IDS / IPS, WAN Accelerator, Firewall

Agenda
• Active-Active (A/A) Data Centre:
– Market & Business Drivers
– Terminology, Criticality levels and Solutions Overview

• A/A Data Centre Design Considerations:


– Storage Extension
– Data Centre Interconnect (DCI) – L2 & L3 scenarios
• OTV, MPLS, VXLAN
– Containers and Micro-services

• Cisco ACI and Active / Active Data Centre


– Multi-Site
– Multi-POD
– Hybrid-Cloud

• A/A Metro Data Centres Designs


- Network Services and Applications (Path optimisation)

• Q&A
Industry Evolution & Data Centres
Digitisation and IoT/IoE

Traditional Applications → Cloud-native applications
• Monolithic model, multi-tier apps → Openness, business agility with the cloud model, micro-services / bi-modal IT / DevOps

Manual interaction → Policy and automation
• IT-silo-based approach, configuration driven → Enterprise-wide policy, hyper-convergence and cross-domain DevOps automation; consumption driven with analytics and programmability

Focus on products → Focus on business solutions
• Disjoint approaches to solve technical demands, cohesiveness as an "afterthought" → The Data Centre is the foundation for business agility, delivered as a solution and/or as a service, with agility and scale

Global – Data Centre / Cloud Traffic Forecast (2014-2019)

Data Centre Traffic Highlights
• Globally, data centre traffic will reach 10.4 Zettabytes per year (863 Exabytes per month) by 2019, up from 3.4 Zettabytes per year (287 Exabytes per month) in 2014.
• Globally, data centre traffic will grow 3.0-fold by 2019, at a CAGR of 25% from 2014 to 2019.
• Globally, data centre traffic grew 35% in 2014, up from 2.6 Zettabytes per year (213 Exabytes per month) in 2013.
• Globally, 73.1% of data centre traffic will remain within the data centre by 2019, compared to 75.4% in 2014.
• Globally, 18.2% of data centre traffic will travel to end users by 2019, compared to 17.8% in 2014.
• Globally, 8.7% of data centre traffic will travel between data centres by 2019, compared to 6.8% in 2014.
• Globally, consumer data centre traffic will represent 66% of total data centre traffic by 2019, compared to 61% in 2014.

Cloud Traffic Highlights
• Globally, cloud data centre traffic will represent 83% of total data centre traffic by 2019, compared to 61% in 2014.
• Globally, cloud data centre traffic will reach 8.6 Zettabytes per year (719 Exabytes per month) by 2019, up from 2.1 Zettabytes per year (176 Exabytes per month) in 2014.
• Globally, cloud data centre traffic will grow 4.1-fold by 2019, at a CAGR of 33% from 2014 to 2019.
• Globally, cloud data centre traffic grew 52% in 2014, up from 1.4 Zettabytes per year (116 Exabytes per month) in 2013.
• Globally, consumer traffic will represent 69% of cloud data centre traffic by 2019, compared to 63% in 2014.

http://www.cisco.com/c/en/us/solutions/service-provider/gci-highlights-tool/index.html
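The growth multiples and CAGR figures above are self-consistent; a quick Python sketch (not part of the original deck) verifies the arithmetic:

```python
# Sanity check of the forecast arithmetic quoted above:
# CAGR = (end / start) ** (1 / years) - 1
dc_2014, dc_2019 = 3.4, 10.4          # Zettabytes per year
cloud_2014, cloud_2019 = 2.1, 8.6     # Zettabytes per year
years = 5

dc_cagr = (dc_2019 / dc_2014) ** (1 / years) - 1
cloud_cagr = (cloud_2019 / cloud_2014) ** (1 / years) - 1

print(f"DC traffic CAGR 2014-2019:    {dc_cagr:.0%}  ({dc_2019 / dc_2014:.1f}-fold growth)")
print(f"Cloud traffic CAGR 2014-2019: {cloud_cagr:.0%}  ({cloud_2019 / cloud_2014:.1f}-fold growth)")
# Prints roughly 25% / ~3-fold and 33% / ~4.1-fold, matching the figures quoted above.
```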
Global – Data Centre / Cloud Traffic Forecast (2014-2019)

http://www.cisco.com/c/dam/assets/sol/sp/gci/global-cloud-index-infographic.html
Two Market Transitions – One DC Network

• Applications: from Virtual Machines to LXC / Docker Containers and PaaS – driven by apps portability, cross-platform support and automation.
• Infrastructure: from Traditional Data Centre Networking (network, switching) to Application Centric Infrastructure (ACI) – network + services, DC abstraction & automation, application policy – and HyperScale data centres.

The App Market Transition – From Traditional to Cloud-native
Traditional Monolithic → Multi-tier App → Cloud-Aware App (IaaS → PaaS)
Terminology
• The Terminology around Workload and Business Availability / Continuity is not
always consistent
• Some examples:

“Availability Zone”

• AWS - Availability Zones are distinct locations within a region that are engineered to be
isolated from failures in other Availability Zones

• OpenStack - An availability zone is commonly used to identify a set of servers that have a
common attribute. For instance, if some of the racks in your data centre are on a
separate power source, you can put servers in those racks in their own availability zone.
Availability Zone & Regions – AWS Definitions

• Regions are large and widely


dispersed into separate
geographic locations.

• Availability Zones are distinct


locations within a region that are
engineered to be isolated from
failures in other Availability Zones
and provide inexpensive, low
latency network connectivity to
other Availability Zones in the
same region
Availability Zone and Regions - Openstack Definitions
• Regions - Each Region has its own full
Openstack deployment, including its own
API endpoints, networks and compute
resources. Different Regions share one
set of Keystone and Horizon to provide
access control and Web portal. (Newer
deployments do not share Keystone)

• Availability Zones - Inside a Region, compute nodes can be logically grouped into Availability Zones; when launching a new VM instance, we can specify an AZ, or even a specific node within an AZ, to run the instance.
In-Region and Out-of-Region Data Centres
Business Continuity and Disaster Recovery
• Active/active — Traffic intended for the failed node is either passed on to an existing node or load balanced across the remaining nodes.

• Active/passive — Provides a fully redundant instance of each node, which is only brought online when its associated primary node fails.

• Out-of-Region — Beyond the "blast radius" of any disaster.
Business Continuity and Disaster Recovery
Ability to Absorb the Impact of a Disaster and Continue to Provide an Acceptable Level of Service.

"Operational Continuity / Disaster Avoidance (DA)": planned or unplanned service continuity within the Metro (including the loss of a single data centre) – Active-Active Metro Area (DC-1, DC-2).

"Disaster Recovery (DR) / Hybrid-Cloud": loss of the "regional data centres" leads to recovery in a remote data centre (DC-3) – pervasive data protection + infrastructure rebalance.

"Applications and services extended across Metro and Geo distances are a natural next step."
Industry Standard Measurements of Business Continuity

[Figure: timeline of a failure event — data lost (how far back the last usable copy is) and time to recover, i.e. the RPO and RTO measured around the moment of failure.]
Application Resiliency and Business Criticality Levels
Defining How A Service Outage Impacts Business Will Dictate A Redundancy Strategy (And Cost)

Each Data Centre should accommodate all levels… cost is an important factor.

• C1 – Mission Imperative (lowest RTO/RPO: 0 to 15 mins): Any outage results in immediate cessation of a primary function, equivalent to immediate and critical impact to revenue generation, brand name and/or customer satisfaction; no downtime is acceptable under any circumstances. C1 + C2 ≈ 20% of apps. Requires 100% duplicate resources, ~2x cost multiplier (most costly).
• C2 – Mission Critical (RTO/RPO: 0 to 15 mins): Any outage results in immediate cessation of a primary function, equivalent to major impact to revenue generation, brand name and/or customer satisfaction.
• C3 – Business Critical (RTO/RPO: 15+ mins): Any outage results in cessation over time or an immediate reduction of a primary function, equivalent to minor impact to revenue generation, brand name and/or customer satisfaction. ~20% of apps. Many-to-one resource sharing, lower cost multiplier (less costly).
• C4 – Business Operational (RTO/RPO: 15+ mins): A sustained outage results in cessation or reduction of a primary function. ~40% of apps.
• C5 – Business Administrative (highest RTO/RPO): A sustained outage has little to no impact on a primary function. ~20% of apps.

RTO/RPO of 0-15 minutes and 15+ minutes are two of the most common RTO/RPO targets.

Mapping Applications to Business Criticality Levels
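As a minimal illustration (not from the deck) of mapping applications to these criticality levels, the table can be captured as a simple data structure that an inventory script might consume; the application names below are hypothetical examples.

```python
# Illustrative only: RTO/RPO bands and app-distribution shares follow the slide above;
# the application names are hypothetical.
CRITICALITY = {
    "C1": {"label": "Mission Imperative",      "rto_rpo": "0-15 min",    "share": 0.20},  # shared with C2
    "C2": {"label": "Mission Critical",        "rto_rpo": "0-15 min",    "share": None},
    "C3": {"label": "Business Critical",       "rto_rpo": "15+ min",     "share": 0.20},
    "C4": {"label": "Business Operational",    "rto_rpo": "15+ min",     "share": 0.40},
    "C5": {"label": "Business Administrative", "rto_rpo": "best effort", "share": 0.20},
}

APP_MAPPING = {                     # hypothetical applications mapped to criticality levels
    "payments-gateway": "C1",
    "order-entry": "C2",
    "reporting": "C4",
    "intranet-wiki": "C5",
}

for app, level in APP_MAPPING.items():
    c = CRITICALITY[level]
    print(f"{app:20s} -> {level} ({c['label']}), target RTO/RPO {c['rto_rpo']}")
```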
Data Centre Interconnect (DCI)
Business Drivers
• Data Centres are extending beyond traditional boundaries
• Virtualised applications are driving DCI across PODs and across Data Centres

Drivers → Business Solution:
• Business Recovery → Disaster Recovery Plan
• Operation Cost Containment → Data Centre Maintenance / Migration / Consolidation
• Business Continuity → Disaster Avoidance
• Resource & Optimisation → Workload Elasticity
• Cloud Services → Inter-Cloud Networking, XaaS

IT technologies involved: GSLB, geo-clusters, HA clusters, DCI LAN extension, DCI Layer 3, Metro Virtual DC, VM mobility, virtualisation, automation.
Constraints: stateless vs. stateful network services, LAN extension, bandwidth, hair-pinning, latency, flexibility.
Application Centric View of Data Centre Interconnect (DCI)
• Applications consume resources across the Cloud DC infrastructure
• If an Application moves between sites, each element of the Application
Environment must also adjust to the new location
• DCI extends the Application Environment between Geographic sites within
Private Clouds and Public Clouds
• Critical IT Use Cases including Business Continuity, Workload Mobility, and
Disaster Recovery within Public and Private Clouds, impact each element of the
Application Environment

Application Environment elements: Multi-DC WAN and Cloud, DC Fabric Networking, L4–L7 Services, Hypervisors and Virtual Networking, Compute, Storage.

DCI EXTENDS THE APPLICATION ENVIRONMENT ACROSS MULTIPLE SITES, SUPPORTING PHYSICAL AND VIRTUAL ELEMENTS
DCI Enables Critical Use Cases within Private Clouds and Public Clouds
Including Business Continuity and Workload Mobility between Metro/Geo Sites

Business use cases (between Site 1, Site 2 and Public Clouds):
• Business Continuity
• Workload Mobility
• Disaster Recovery and Avoidance
• Site Migrations
• Load Balanced Workloads
• Operations Maintenance
• Operations Rebalancing
• Application Geo-Clusters

Application Environment: Multi-DC WAN and Cloud, DC Fabric Networking, L4–L7 Services, Hypervisors and Virtual Networking, Compute, Storage.
Ex.1: Requirements for the Active-Active Metro Design – Hot Live Migration
Move Virtualised Workload Across Metro Data Centres While Maintaining Stateful Services

Business Continuity Use Cases for Live Mobility


Most Business Critical Applications (Lowest RPO/RTO)
Stateful Live Workload Migrations
Operations Rebalancing / Maintenance / Consolidation of Live Workloads
Disaster Avoidance of Live Workloads
Application HA-Clusters spanning Metro DCs (<10 ms)

Hypervisor Tools for Live Mobility


VMware vMotion or Hyper-V Live Migration
Stretched HA-Clusters across Metro DCs (<10 ms)
Host Affinity rules to manage resource allocation
Distributed vCenter or System Centre across Metro DCs

Metro DC Infrastructure to support Live Workload Mobility


Network: LAN Extension
Data Centre Interconnect and Localised E-W traffic
Virtual Switches Distributed across Metro distances
Maintain Multi-Tenant Containers
Localised E-W traffic using distributed Default Gateway
Services: Maintain Stateful Services for active connections
Minimise traffic tromboning between Metro DCs
Compute: Support Single-Tier and Multi-Tier Applications
Storage: Shared Storage extended across Metro
Synchronous Data Replication
Distributed Virtual Volumes
Hyper-V Shared Nothing Live Migration (Storage agnostic)

Ex.2: Requirements for Metro/Geo Data Centres – Cold Migration
Move a Stopped Virtualised Workload across Metro/Geo DCs, Reboot Machine at New Site

Business Continuity Use Cases for Cold Mobility


Less Business Critical Applications (Medium to High RPO/RTO)
Planned Workload Migrations of Stopped VMs
Operations Rebalancing / Maintenance / Consolidation of Stopped Workloads
Subnet Extension
Disaster Avoidance of Stopped Workloads
Disaster Recovery of Stopped Workloads

Hypervisor Tools for Cold Mobility


VMware Site Recovery Manager (SRM) or Hyper-V Failover Clustering
Geo-Clusters across A/A or A/S Geographically dispersed DCs
Host Affinity rules to manage resource allocation
Many-to-One Site Recovery Scenarios

VMDC Infrastructure to support Cold Workload Mobility


Network: Subnet Extension
Data Centre Interconnect LAN Extension optional,
Localised N-S traffic using Ingress Path Optimisation
Create new Multi-Tenant Containers
Cold migration across unlimited distances
Services: Service connections temporarily disrupted
New service containers created at new site
Traffic tromboning between DCs can be reduced
Compute: Support Single-Tier and Multi-Tier Applications
Storage: Asynchronous Data Replication to remote site (ex.: NetApp
SnapMirror)
Hyper-V Replica Asynchronous Data Replication (Storage agnostic)
Virtual Volumes silo’d to each DC

Cloud-Native Apps: Why Does it Matter to Customers?
Flexibility: cloud-native applications are fully portable (not dependent on a particular infrastructure or cloud implementation).

Why / Where are Customers Struggling with this Transition?

Operations:

• "Traditional" apps on a "cloud-native" DC infrastructure = availability challenge

• "Cloud-native" apps on a "traditional" DC infrastructure = resource optimisation & scale challenges

Traditional = "Reliable"   Cloud-Native = "Agile"
Policy-based Management (Contiv)

• Leads industry thought leadership around the need for infrastructure policies for containerised applications on a shared infrastructure
• Contiv allows us to maintain leadership with Cisco Networking, Cisco Storage, and policy / operational intent in the container communities

http://www.contiv.io
Ex.3: Multi-site Abstraction and Portability of Network Metadata and
Cloud-native Applications Based on Micro-services (with Docker Containers)

ACI Application Network Profile


ACI Fabric – DC 01 ACI Fabric – DC 02

Data Centre 01 Data Centre 02

Docker-based Cloud-native Application Docker-based Cloud-native Application


Agenda
• Active-Active (A/A) Data Centre:
– Market & Business Drivers
– Terminology, Criticality levels and Solutions Overview

• A/A Data Centre Design Considerations:


– Storage Extension
– Data Centre Interconnect (DCI) – L2 & L3 scenarios
• OTV, MPLS, VXLAN
– Containers and Micro-services

• Cisco ACI and Active / Active Data Centre


– Multi-Site
– Multi-POD
– Hybrid-Cloud

• A/A Metro Data Centres Designs


- Network Services and Applications (Path optimisation)

• Q&A
Data Centre Interconnect
SAN Extension

 Synchronous I/O implies strict distance limitations
 Localisation of the active storage is key
 Consistent LUN IDs must be maintained for stateful migration (ESX-A source in DC 1, ESX-B target in DC 2 over the extended SAN)

SAN Extension
Synchronous vs. Asynchronous Data Replication
• Synchronous data replication: the application receives the acknowledgement for I/O completion only when both the primary and the remote disks are updated. This is also known as zero-data-loss replication (zero RPO).
• Metro distances (depending on the application, roughly 50-300 km max)

• Asynchronous data replication: the application receives the acknowledgement for I/O completion as soon as the primary disk is updated, while the copy continues to the remote disk.
• Unlimited distances

[Figure: write/acknowledge sequences for synchronous vs. asynchronous data replication]
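The distance limits above follow directly from propagation delay; a rough latency-budget sketch (illustrative, not from the slides), assuming ~5 µs/km one-way in fibre:

```python
# Rough latency budget: light in fibre propagates at roughly 5 microseconds per km, one way.
US_PER_KM_ONE_WAY = 5.0

def round_trip_ms(distance_km: float) -> float:
    """Propagation-only round-trip time over fibre, in milliseconds."""
    return 2 * distance_km * US_PER_KM_ONE_WAY / 1000.0

for km in (50, 100, 300):
    # Every synchronous write waits for at least one round trip to the remote array,
    # so this RTT is added to application write latency (before any device latency).
    print(f"{km:>4} km: ~{round_trip_ms(km):.1f} ms RTT added per synchronous write")

# Compare against the stretch limits quoted elsewhere in this deck:
# ~10 ms RTT for live VM mobility / stretched clusters, ~5 ms for VPLEX Metro.
```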
Storage Deployment in DCI
Option 1 - Shared Storage

[Figure: a single shared storage system (initiator/target volumes) accessed by ESX-A (source, DC 1) and ESX-B (target, DC 2) over the extended SAN, with L2 extension of the vMotion network across the core, under one Virtual Center.]
Storage Deployment in DCI
Shared Storage Improvement Using Cisco IOA

[Figure: same shared-storage topology as Option 1, with Cisco I/O Acceleration (IOA) enabled on the MDS fabric between DC 1 and DC 2.]

 Improve latency using the Cisco Write Acceleration feature on the MDS fabric
Storage Deployment in DCI
Option 2 - NetApp FlexCache (Active/Cache)

[Figure: origin NAS in DC 1 with a FlexCache in DC 2 — writes from ESX-B are forwarded to the origin and acknowledged only after the origin acknowledges; reads are served from the local cache once the data is present.]

 FlexCache does NOT act as a write-back cache
 FlexCache responds to the host only if/when the original subsystem has acknowledged the write
 No imperative need to protect a FlexCache from a power failure
http://www.cisco.com/en/US/docs/solutions/Enterprise/Data_Center/DCI/4.0/Netapp/dciNetapp.html
Storage Deployment in DCI
Option 3 - EMC VPLEX Metro (Active/Active)

 Hosts at both sites instantly access the Distributed Virtual Volume
 Synchronisation starts at Distributed Volume creation
 WRITEs are protected on storage at both Site A and Site B
 READs are serviced from the VPLEX cache or local storage

[Figure: Distributed Virtual Volume spanning DC A and DC B over Fibre Channel, within synchronous latency limits.]
Storage Deployment in DCI
Option 3 - EMC VPLEX Metro (Active/Active)

[Figure: from the host's perspective, the VPLEX virtual layer presents the same virtual LUN (LUNv) in DC 1 and DC 2; from the storage perspective, the VPLEX engines at each site virtualise the back-end arrays (e.g. EMC CLARiiON, EMC VMAX).]

VPLEX synchronous latency requirement: 5 ms max between VPLEX engines.

http://www.cisco.com/en/US/docs/solutions/Enterprise/Data_Center/DCI/4.0/EMC/dciEmc.html
Storage Deployment in DCI
Option 4 - EMC VPLEX Geo (Active/Active)
 Active/Active storage virtualisation platform for the private and hybrid cloud
 Enables workload mobility over long, asynchronous distances (e.g. using Microsoft Hyper-V)
 Asynchronous latency of up to 50 ms max between VPLEX Geo clusters
 Uses existing WAN bandwidth, which can be optimised (e.g. OTV for LAN extension, WAAS over the L3 network infrastructure)
 Windows Failover Clustering is used to provide HA features, Live Migration and Cluster Shared Volume capability
 Distributed Virtual Volumes span the two sites (DC A and DC B); the back-end can be EMC or non-EMC storage
http://www.emc.com/collateral/hardware/white-papers/h8214-application-mobility-vplex-geo-wp.pdf
Evolution of Storage & the Tale of 2 ITs

• Speed: Cisco Hyperconverged Solutions & Software-Defined Storage
• Efficiency: Cisco Integrated Infrastructure Solutions
Cisco's Storage Strategy
Operational simplicity across applications (traditional and emerging), operations (Mode 1 and Mode 2) and infrastructure (array-based and server-based):

1. Comprehensive array storage integration – Cisco Integrated Infrastructure Solutions
2. Optimisation for server-based storage – UCS Hyperconverged Solutions, UCS Software-Defined Storage Solutions
3. Simplified consumption – Cisco UCS Director storage orchestration, UCS Manager storage profiles
Cisco Storage Portfolio Completion
• Integrated Infrastructure (current offering): fourth-generation UCS Mini and UCS B-Series
• Hyperconverged (new offering): UCS HyperFlex Systems
• Scale Out (current offering): UCS M-Series Modular Servers, UCS C3160

One System, One Management Model – from the Edge to the Core Data Centre to the Cloud.
UCS as the common platform for heterogeneous workloads (with storage) in the Data Centre.
Contiv Storage Provides Policy-Rich Container Storage that
Leverages Ceph/NFS Underneath - volplugin
"volplugin", despite the name, is actually a suite of
components:

• volmaster is the master process. It exists to coordinate the


volplugins in a way that safely manages container volumes. It
talks to etcd to keep its state. volmaster is completely
stateless, and can run as multi-host for redundancy

• volplugin is the slave process. It exists to bridge the state


management between volmaster and docker, and to mount
volumes on specific hosts. volplugin needs to run on every host
that will be running containers

• volcli is a utility for managing volmaster's data. It makes both


REST calls into the volmaster and additionally can write directly
to etcd.

https://docs.docker.com/engine/extend/plugins/
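Once volplugin is registered as a Docker volume driver on a host, volumes backed by Ceph/NFS can be requested through the normal Docker volume workflow. A minimal sketch using the Docker Python SDK is shown below; the driver name, volume name and container image are assumptions for illustration, not taken from the Contiv documentation.

```python
# Minimal sketch, assuming a host where the Contiv volplugin is installed and registered
# as a Docker volume driver named "volplugin", and where the docker Python SDK is available.
import docker

client = docker.from_env()

# Ask Docker to create a volume through the volplugin driver; volmaster decides how the
# backing Ceph/NFS storage is carved out according to its policy intent.
vol = client.volumes.create(
    name="db-data",          # hypothetical volume name
    driver="volplugin",      # assumed driver name for the Contiv volume plugin
)

# Attach the volume to a container; volplugin mounts the backing storage on this host.
client.containers.run(
    "postgres:9.6",
    detach=True,
    volumes={vol.name: {"bind": "/var/lib/postgresql/data", "mode": "rw"}},
)
```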
Agenda
• Active-Active (A/A) Data Centre:
– Market & Business Drivers
– Terminology, Criticality levels and Solutions Overview

• A/A Data Centre Design Considerations:


– Storage Extension
– Data Centre Interconnect (DCI) – L2 & L3 scenarios
• OTV, MPLS, VXLAN
– Containers and Micro-services

• Cisco ACI and Active / Active Data Centre


– Multi-Site
– Multi-POD
– Hybrid-Cloud

• A/A Metro Data Centres Designs


- Network Services and Applications (Path optimisation)

• Q&A
Extending Virtual Tenant Space outside the Fabric
Logical View of Multi-tier Applications

• L3 connectivity outside the fabric: maintain the L3 segmentation of the public networks (VRF) at the Layer 3 edge gateway toward the outside world (IP network).
• L2 connectivity outside the fabric: maintain the L2 segmentation end-to-end toward the remote DC — the Web segment (public network), App segment (private network) and DB segment (private network) are each extended between the two sites.


Scope of the L2 DCI Requirements

Must have:
 Failure domain containment: control-plane independence (STP domain confined inside the DC, EVPN multi-domains), control-plane MAC learning, reduce any flooding, control the BUM*
 Multi-homing and site independence: dual-homing with independent paths, reduced hair-pinning
 Localised E-W traffic: FHRP isolation, Anycast L3 default gateway
 Fast convergence
 Transport agnostic

Additional improvements:
 ARP suppression and ARP caching
 VLAN translation
 IP Multicast or non-IP-Multicast transport choice
 Path diversity (VLAN-based / flow-based / IP-based) and load balancing (A/S, VLAN-based, flow-based)
 Localised N-S traffic (for long distances): ingress path optimisation (LISP), working in conjunction with egress path optimisation (FHRP localisation, Anycast L3 gateway)
 Distributed L2 gateway on the ToR

* Broadcast, Unknown Unicast and Multicast


LAN Extension for DCI
Technology Selection

Over dark fibre or protected DWDM (Metro-style Ethernet):
 VSS & vPC – dual-site interconnection
 FabricPath, VXLAN & ACI Stretched Fabric – multiple-site interconnection

MPLS transport (SP-style MPLS):
 EoMPLS – transparent point-to-point
 VPLS – large scale & multi-tenant, point-to-multipoint
 PBB-EVPN – large scale & multi-tenant, point-to-multipoint

IP transport (Enterprise-style IP):
 OTV – inter-site MAC routing
 LISP – for subnet extension and path optimisation
 VXLAN/EVPN (evolving) – emerging A/A site interconnect (Layer 2 only, or with Anycast L3 gateway)
DCI LAN Extension
IP-Based Solution: OTV
Overlay Transport Virtualisation – Technology Pillars
OTV is a "MAC in IP" technique to extend Layer 2 domains OVER ANY TRANSPORT.

Dynamic encapsulation: no pseudo-wire state maintenance, optimal multicast replication, multipoint connectivity, point-to-cloud model.
Protocol learning: preserves the failure boundary, built-in loop prevention, automated multi-homing, site independence.

Platforms: Nexus 7000 was the first platform to support OTV (since the NX-OS 5.0 release); the ASR 1000 also supports OTV (since the IOS XE 3.5 release).
Overlay Transport Virtualisation
OTV Control Plane
• Neighbour discovery and adjacency over
• Multicast (Nexus 7000 and ASR 1000)
• Unicast (Adjacency Server Mode currently available with Nexus 7000 from 5.2 release)

• OTV proactively advertises/withdraws MAC reachability (control-plane learning)


• IS-IS is the OTV Control Protocol - No specific configuration required

[Figure: three OTV sites (West, East, South) connected over the L3 core. (1) New MACs are learned on VLAN 100 at the West site; (2-3) OTV updates are exchanged via the L3 core; (4) the East and South edge devices install VLAN 100 / MAC A, VLAN 100 / MAC B and VLAN 300 / MAC C in their MAC tables with the West join address "IP A" as the next hop.]
Overlay Transport Virtualisation
Inter-Site Packet Flow

[Figure: (1) Server 1 in the West site sends a frame to MAC 3 in the East site; (2-3) the West OTV edge device performs a Layer 2 lookup, finds MAC 3 reachable via IP B and encapsulates the frame in IP (source IP A, destination IP B); (4-5) the transport infrastructure delivers the packet to the East edge device, which decapsulates it; (6-7) a normal Layer 2 lookup then forwards the original frame out of Eth 3 to Server 3.]
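To make the control-plane learning and the encapsulation decision concrete, here is a toy model in Python (illustrative only, not Cisco code): locally learned MACs are advertised to the peer edge devices, and remote MACs resolve to the peer's join address instead of a local interface.

```python
# Toy model of OTV-style control-plane MAC learning and the data-plane lookup.
from typing import Dict, Tuple

MacKey = Tuple[int, str]          # (VLAN, MAC)

class OtvEdge:
    def __init__(self, join_ip: str):
        self.join_ip = join_ip
        self.mac_table: Dict[MacKey, str] = {}   # value: local interface or remote join IP
        self.peers = []                          # other edge devices in the overlay

    def learn_local(self, vlan: int, mac: str, interface: str) -> None:
        """Data-plane learning on an internal interface, then a control-plane
        advertisement (an IS-IS update in real OTV) to every peer -- no flooding."""
        self.mac_table[(vlan, mac)] = interface
        for peer in self.peers:
            peer.mac_table[(vlan, mac)] = self.join_ip

    def forward(self, vlan: int, dst_mac: str) -> str:
        next_hop = self.mac_table.get((vlan, dst_mac))
        if next_hop is None:
            return "drop (unknown unicast is not flooded across the overlay)"
        if next_hop.startswith("Eth"):
            return f"bridge out of {next_hop}"
        return f"encapsulate {self.join_ip} -> {next_hop} over the IP transport"

west, east = OtvEdge("IP A"), OtvEdge("IP B")
west.peers, east.peers = [east], [west]
west.learn_local(100, "MAC 1", "Eth 2")   # Server 1 behind the West edge device
east.learn_local(100, "MAC 3", "Eth 3")   # Server 3 behind the East edge device

print(west.forward(100, "MAC 3"))   # encapsulate IP A -> IP B over the IP transport
print(east.forward(100, "MAC 3"))   # bridge out of Eth 3
```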
Placement of the OTV Edge Device
Option 1 | 2: OTV in the DC Aggregation or in the DC Core

Option 1 – OTV at the aggregation layer:
 L2-L3 boundary at aggregation; the DC core performs only an L3 role
 STP and L2 broadcast domains isolated between PODs
 Intra-DC and inter-DC LAN extension provided by OTV
o Requires the deployment of dedicated OTV VDCs (Nexus 7000 required in aggregation)
 Ideal for single-aggregation-block topologies; recommended for greenfield deployments

Option 2 – OTV in the DC core:
 L2-L3 boundary in the DC core; the DC core devices perform the L2, L3 and OTV functions
 Easy deployment for brownfield
Placement of the OTV Edge Device
Option 3 - SVIs enabled on different platforms

 The default gateway (SVI) is distributed among the leafs (Anycast Gateway), or the firewalls host the default gateway
 No SVIs at the aggregation layer or DCI layer
 No need for an OTV VDC

[Figure: spine/leaf fabric with an Anycast L3 gateway on the leafs (or firewalls hosting the default gateway at the aggregation layer), OTV devices attached to a dedicated DCI layer, and the L2/L3 boundary kept out of the aggregation and DCI layers.]
OTV
Summary
 Extension over any transport (IP, MPLS)
 Failure boundary preservation
 Site independence
 Optimal BW utilisation (IP multicast)
 Automated built-in multi-homing
 End-to-end loop prevention
 Operational simplicity (only 5 CLI commands)
 VXLAN encapsulation
 Improvements:
• Selective unicast flooding
• F-series internal interfaces
• Logical source interfaces with multiple uplinks*
• Dedicated distribution group for data broadcasting
• Tunnel depolarisation
• VLAN translation
• Improved scale & convergence
DCI LAN Extension
MPLS-based Solution
E-VPN
Main Principles

• Control-plane distribution of customer MAC addresses (C-MACs) using BGP
• Each PE continues to learn C-MACs over its attachment circuits
• When multiple PEs announce the same C-MAC, a hash picks one PE
• MP2MP/P2MP LSPs for multicast traffic distribution
• MP2P (like L3VPN) LSPs for unicast distribution
• A full mesh of pseudowires is no longer required
Supported Access Topologies

• Single-Home Device (SHD) / Single-Home Network (SHN): CE attached to one PE via physical or bundle interfaces
• Dual-Home Device (DHD), Active/Active per-flow load balancing: CE attached to PE1 and PE2 via a single bundle interface
• Dual-Home Device (DHD), Active/Active per-service load balancing: CE attached to PE1 and PE2 via physical or bundle interfaces, per VLAN (VID X / VID Y)
• Dual-Home Network (DHN), Active/Active per-service load balancing: access network dual-homed to PE1 and PE2 (MST-AG / REP-AG, MST on N-PE, G.8032, P-mLACP)
Optimise L2 Forwarding with PBB-EVPN

Optimised L2 forwarding:
 Per-flow load balancing, ECMP
 Fast network convergence
 Active/Active MC-LAG
 Optimised multicast forwarding with LSM (P2MP-TE)

High scale:
 2M MACs (10M+ with per-linecard MAC learning in the future)
 64K+ bridge domains
 PBB-EVPN doesn't require pseudowires – ultra-high DC site scale
 128K VLAN scale with flexible VLAN translation: all ToR switches can keep the same 4K VLAN provisioning

[Figure: PBB-EVPN and IP-VPN/L3 over the core, with Nexus 7K/6K/5K and FEX 2K below — a two-layer architecture with high control-plane scale; more ToRs mean more tenant scale.]
ASR 9000 MACsec: 100G Line-rate Encryption

• Line-rate, link-level MACsec encryption on every 100G port on the linecard: industry-best solution
• Supports the maximum security protection available commercially: NSA Suite-B standards
• Supports both P2P and P2MP encryption over the cloud
• Other devices in the cloud can be MACsec-agnostic (acting as pass-through) and still deliver end-to-end encryption benefits

[Figure: CE-to-CE frames over an EoMPLS/VPLS/E-VPN MPLS cloud — the MACsec header and protected payload sit after the VLAN tag and are carried unchanged inside the MPLS label stack, so the payload stays MACsec-protected end to end across the PEs.]

Interconnecting VXLAN Fabrics
Stretched Fabric Considerations

• No clear boundary demarcation; shared multicast domain
• Metro distance – dark fibre / DWDM
• Gateway localisation per site:
o Anycast gateway, E/W traffic routed locally
o N/S egress path optimisation; N/S ingress requires additional techniques (LISP, RHI, GSLB)
• Hardware based:
o One global L3-only fabric, Anycast VTEP L2 or L3 gateway distributed
o VXLAN EVPN (ToR) or ACI (ToR)
o VLAN translation with locally significant VLANs
Network Dual-Fabric Considerations

 DCI model with dedicated DCI devices (OTV/EVPN, dual-site vPC), over Metro (dark fibre / DWDM) or any distance (L3 WAN)
 Failure domain isolation
 E-W traffic localisation: distributed active L3 gateways
 N-S traffic localisation: egress path optimisation, ingress path optimisation (LISP or IGP assist)
 Dual homing and flow control between sites
 VLAN translation (per site, per ToR, per port)
 Unicast L3 WAN supported; path diversity

N-S traffic localisation is a trade-off between efficiency (for latency-sensitive applications) and design complexity.
Connecting a VXLAN Fabric to the Outside
Overview and Models

VXLAN/EVPN Stretched Fabric using transit leaf nodes:
o Host reachability information is distributed end-to-end
o The transit leaf (or spine) nodes can be pure Layer-3-only platforms; they do not necessarily terminate the overlay tunnel
o The data plane is stretched end-to-end (VXLAN tunnels are established from site to site), with traffic encapsulated and de-encapsulated at each far-end VTEP

When to use a VXLAN/EVPN Stretched Fabric?
o Across Metro distances, private L3 DCI, IP Multicast available end-to-end
o Currently up to 256 leaf nodes end-to-end

Why use it?
o VXLAN/EVPN intra-fabric across multiple greenfield DCs

What is the Cisco value?
o VXLAN EVPN MP-BGP control plane
o IRB symmetric routing and Anycast L3 gateway
o Storm control, BPDU Guard, HMM route tracking
o ARP suppression
o Bud-node support

[Figure: two sites (iBGP AS 100 and iBGP AS 200) interconnected via eBGP through DCI leaf nodes; the VXLAN tunnel and host reachability are end-to-end.]
VXLAN Stretched Fabric Design Considerations
Control-Plane Function Delineation

 Underlay network (Layer 3): used to exchange VTEP reachability information; separate IGP areas, with Area 0 on the inter-site links
 Overlay routing control plane: separate MP-iBGP domains (AS 1 and AS 2) interconnected via MP-eBGP sessions across the DCI/L3 network
 Data plane & reachability information are end-to-end: the VXLAN fabric and VTEP tunnels extend inside and across sites, and host reachability information is distributed end-to-end (vPC domains optional on the DCI leaf nodes)
VXLAN Stretched Fabric Design Considerations
IP Multicast Separation
 BUM traffic relies either on IP Multicast or on ingress replication
 BUM handling intra-site cannot be dissociated from the inter-site capabilities:
o If IP Multicast is not supported for inter-site communication, ingress replication must be used for the whole stretched fabric
 The total number of IP Multicast groups supported within the VXLAN Stretched Fabric is bounded by the maximum number of IP Multicast groups supported inter-site (e.g. with L3 managed services)
 Recommended to keep the domains independent, with PIM-based Anycast-RP intra-site and MSDP between sites
 Preferred mode: IP Multicast end-to-end, for performance and scalability

[Figure: PIM-SM Domain A and PIM-SM Domain B, each with PIM Anycast RPs, interconnected with MSDP; IP Multicast runs end-to-end across the DCI leaf nodes.]
vPC and Stretched VXLAN Fabric

 A stretched VXLAN fabric doesn't require VTEP-capable DCI nodes (no encap/decap functions on the transit nodes)
 MAC-to-UDP encapsulation requires increasing the MTU in the transport network by at least 50 bytes
• It is common practice to deploy jumbo frames end to end (intra- and inter-DC)
 When a transit leaf node also becomes a compute or service leaf, it may be part of a vPC domain (vPC configuration with Anycast VTEP)
• It encapsulates/de-encapsulates (VTEP) VXLAN traffic from and to the locally attached nodes while acting as an IP transit device for DCI (requires 'bud node' support)
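The "at least 50 bytes" figure is simply the standard VXLAN encapsulation overhead (without an outer 802.1Q tag); a quick check:

```python
# VXLAN outer-header overhead added on top of the original Ethernet frame.
outer_headers = {
    "outer Ethernet": 14,   # outer destination/source MAC + EtherType
    "outer IPv4": 20,
    "outer UDP": 8,
    "VXLAN": 8,
}
overhead = sum(outer_headers.values())
print(f"VXLAN overhead: {overhead} bytes")                    # 50
print(f"Transport MTU for 1500-byte payloads: {1500 + overhead}")
# Hence the common practice of simply enabling jumbo frames (e.g. MTU 9216) end to end.
```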
VXLAN Dual-Fabrics
Interconnecting Multiple Fabrics using DCI Solutions

When to use VXLAN/EVPN Dual-Fabrics?
o Separate, independent VXLAN/EVPN fabrics interconnected using a DCI approach
o Supports any distance, from Metro to inter-continental
o Isolated data-plane and control-plane overlays; the DCI leaf nodes do not extend the overlay tunnel between fabrics
o Host reachability information is contained within each site, while L2 extension is still provided end-to-end through the DCI layer

Why use it?
o Scalability
o Clear delineation at the fabric interconnection (Dot1Q VLAN hand-off toward the Layer 3 / L2 DCI)

What is the Cisco value?
o Multi-site Anycast L3 gateway
o Storm control, BPDU Guard
o DCI-technology agnostic (over IP, over MPLS, DWDM, …)
Dual VXLAN Fabrics Interconnection
Inter-Fabric Connectivity using a DCI Solution

• Interconnect VXLAN/EVPN fabrics with traditional DCI solutions: OTV, VPLS, PBB-EVPN, vPC, VXLAN
• Separate control-plane domains (EVPN) and separate data-plane encapsulations per fabric (e.g. VNI 30000 in fabric 1, VNI 31000 in fabric 2)
• VLAN hand-off toward the DCI layer (e.g. VLAN 300 peering on each side)
• Can be initiated from the spine layer or from the border leaf
Dual VXLAN Fabrics Interconnection
Inter-Fabric Connectivity using an OTV Solution

• The OTV DCI nodes can be deployed "on a stick" from the border spine, at the L3 boundary between the DC fabric and the core routers / WAN
• VLAN hand-off toward the DCI layer, initiated from the spine or the border leaf
• Design option: FHRP filtering for an Active/Active default gateway on both sites
• Domain failure containment: unknown unicast frames are discarded by the OTV function
• Each fabric keeps its own EVPN iBGP control plane and VNI space (e.g. VNI 30000 and VNI 31000)
Agenda
• Active-Active (A/A) Data Centre:
– Market & Business Drivers
– Terminology, Criticality levels and Solutions Overview

• A/A Data Centre Design Considerations:


– Storage Extension
– Data Centre Interconnect (DCI) – L2 & L3 scenarios
• OTV, MPLS, VXLAN
– Containers and Micro-services

• Cisco ACI and Active / Active Data Centre


– Multi-Site
– Multi-POD
– Hybrid-Cloud

• A/A Metro Data Centres Designs


- Network Services and Applications (Path optimisation)

• Q&A
Differentiation for Nexus/ACI Solutions with Contiv

• ACI: automated networking, policies, prioritisation and network uniformity for various workloads (e.g. WEB / APP / DB tiers)
• Native apps: better visibility, diagnostics and analytics; interoperable and standards-based
• Network SLAs for applications, app to app, with physical infrastructure integration
Contiv Networking Provides Policy-rich Container Networking
That Integrates with Cisco Nexus and ACI

• Contiv.io is an open-source project that creates a policy framework for the different domains of containers (application composition + policy intent)
• Network policies: policies for application security, prioritisation and network resource allocation
• Network services for apps (virtual or physical service appliances)
• Analytics / diagnostics
• Integrates with Cisco ACI, Nexus and UCS solutions
• Architecture: a Contiv master plus Docker | Kubernetes | Mesos plugin agents on each node (Node 1 … Node n)
• Status: Beta
Policy-Based Container Networking with Project Contiv
– Netplugin (1/2)
• Contiv netplugin is a generic networking plugin that is designed to provide multi host policy based networking for containerised
applications.

• Netplugin is designed to be used with docker containers and with cluster schedulers like Swarm, Kubernetes and Mesos

Netplugin architecture & components:

• Netmaster is the master daemon that is responsible for


storing the Intent and distributing state to all nodes in
the cluster.

• Netplugin: Netplugin is a long running daemon on every


node in the cluster. As a docker network plugin, it is
responsible for setting up network and policy for each
container.

• Netctl: netctl is a command line tool to modify the intent

https://docs.docker.com/engine/extend/plugins/
Policy-Based Container Networking with Project Contiv
– Netplugin (2/2)
Contiv Object Model:
• Contiv object model provides a way for users to specify their Intent.
• Netmaster provides a REST API to view and modify contiv object model.
• Netctl command-line tool is a convenient utility to interact with the object model

[Figure: the network-related objects of the Contiv object model]
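Since netmaster exposes the object model over REST (with netctl as a CLI convenience on top), the intent can also be driven programmatically. The sketch below is illustrative only: the netmaster address, endpoint paths and JSON fields are assumptions, not taken from the Contiv documentation.

```python
# Hypothetical sketch: endpoint paths and payload fields are assumptions, not the
# documented Contiv API; only the idea (netctl as a thin wrapper over netmaster's
# REST API) comes from the slide above.
import requests

NETMASTER = "http://netmaster.example.local:9999"   # assumed address/port

def list_objects(kind: str) -> list:
    """Read part of the object model, e.g. kind='policys' or 'networks' (assumed names)."""
    r = requests.get(f"{NETMASTER}/api/v1/{kind}/", timeout=5)
    r.raise_for_status()
    return r.json()

def create_policy(tenant: str, name: str) -> None:
    """Modify the intent by posting a (hypothetical) policy object."""
    body = {"tenantName": tenant, "policyName": name}
    r = requests.post(f"{NETMASTER}/api/v1/policys/{name}/", json=body, timeout=5)
    r.raise_for_status()

if __name__ == "__main__":
    create_policy("default", "web-to-db")
    print(list_objects("policys"))
```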
How Does it Work? ACI and Contiv Plugin
Agenda
• Active-Active (A/A) Data Centre:
– Market & Business Drivers
– Terminology, Criticality levels and Solutions Overview

• A/A Data Centre Design Considerations:


– Storage Extension
– Data Centre Interconnect (DCI) – L2 & L3 scenarios
• OTV, MPLS, VXLAN
– Containers and Micro-services

• Cisco ACI and Active / Active Data Centre


– Multi-Site
– Multi-POD
– Hybrid-Cloud

• A/A Metro Data Centres Designs


- Network Services and Applications (Path optimisation)

• Q&A
Fabric Infrastructure
Important Concepts – Inside and Outside the APIC Fabric

• The 'Outside' EPG (per tenant VRF) is associated with external network policies (OSPF, BGP peering). Endpoints that are 'outside' the fabric are found via redistributed routes sourced from the externally peered routers (network-level granularity).
• Forwarding policy for 'inside' EPGs (e.g. Web, App, DB) is defined by the associated Bridge Domain and by the network policies between EPGs (QoS, service insertion, filters). Endpoints that are 'inside' the fabric are found via the proxy mapping database (host-level granularity).
ACI MultiPod/MultiSite Use Cases
 Single Site Multi-Fabric
Multiple Fabrics connected within the same DC (between halls, buildings, … within the same
Campus location)
Cabling limitations, HA requirements, Scaling requirements

 Single Region Multi-Fabric (Classic Active/Active scenario)


Scoped by the application mobility domain of 10 msec RTT
BDs/IP Subnets can be stretched between sites
Desire is reducing as much as possible fate sharing across sites, yet maintaining operational
simplicity

 Multi Region Multi-Fabric


Disaster Recovery – Minimal Cross Site Communication
Deployment of applications not requiring Layer 2 adjacency
ACI Multi-Fabric Design Options

Single APIC cluster / single domain:
• Stretched Fabric – one ACI fabric stretched across Site 1 and Site 2
• Multi-POD (Q3CY16) – PODs interconnected by an IP network, MP-BGP EVPN between PODs, one APIC cluster

Multiple APIC clusters / multiple domains:
• Dual-Fabric Connected – ACI Fabric 1 and ACI Fabric 2 interconnected with L2 and L3 extension
• Multi-Site (future) – sites interconnected by an IP network, MP-BGP EVPN between sites
ACI Stretched Fabric

 The fabric is stretched across two sites (DC Site 1 and DC Site 2) and works as a single fabric deployed within one DC
 One APIC cluster – one management and configuration point
 Anycast gateway on all leaf switches
 Works with one or more transit leafs per site – any leaf node can be a transit leaf
 The number of transit leafs and links is dictated by redundancy and bandwidth capacity decisions
Stretched ACI Fabric
Support for 3 Interconnected Sites

 Transit leafs in all sites connect to the local and remote spines (2x40G or 4x40G links between sites)
ACI Multi-POD Solution
Overview

 Multiple ACI PODs connected by an IP Inter-POD (L3) Network; each POD consists of leaf and spine nodes
 Managed by a single APIC cluster – a single management and policy domain
 Forwarding control plane (IS-IS, COOP) fault isolation per POD, with MP-BGP EVPN between PODs
 Data-plane VXLAN encapsulation between PODs
 End-to-end policy enforcement
ACI Multi-POD Solution
Use Cases

• Handling 3-tier physical cabling layouts: cabling constraints (multiple buildings, campus, metro) require a second tier of "spines"; preferred option when compared to a ToR FEX deployment
• Evolution of the Stretched Fabric design: Metro area (dark fibre, DWDM) or L3 core, with more than two interconnected sites, under one APIC cluster
ACI Multi-POD Solution
SW and HW Requirements

Software
The solution will be available from Q3CY16 SW Release

Hardware
The Multi-POD solution can be supported with all currently shipping Nexus
9000 platforms
The requirement is to use multicast in the Inter-POD Network for handling BUM
(L2 Broadcast, Unknown Unicast, Multicast) traffic across PODs
ACI Multi-POD Solution
Supported Topologies

• Intra-DC: PODs connected through a 40G/100G Inter-POD Network (10G/40G/100G within each POD), single APIC cluster
• Two DC sites connected back-to-back: dark fibre / DWDM, up to 10 ms RTT
• Three DC sites: dark fibre / DWDM, up to 10 ms RTT
• Multiple sites interconnected by a generic L3 network: 40G/100G links to the IPN, up to 10 ms RTT
ACI Multi-POD Solution
Inter-POD Network (IPN) Requirements
 Not managed by APIC; must be pre-configured
 The IPN topology can be arbitrary; it is not mandatory to connect to all spine nodes
 Main requirements:
• 40G/100G interfaces to connect to the spine nodes
• Multicast (BiDir PIM) – needed to handle BUM traffic
• DHCP Relay – to enable spine/leaf node discovery across PODs
• OSPF – to peer with the spine nodes and learn VTEP reachability
• Increased MTU support – to handle VXLAN-encapsulated traffic
• QoS – to prioritise intra-APIC-cluster communication

ACI Multi-POD Solution
APIC Cluster Deployment

 The APIC cluster is stretched across multiple PODs
• Central management for all the PODs (VTEP addresses, VNIDs, class-IDs, GIPo, etc.) and centralised policy definition
• It is recommended not to connect more than two APIC nodes per POD (due to the creation of three replicas per 'shard')
 The first APIC node connects to the 'seed' POD and drives auto-provisioning for all the remote PODs
 PODs can be auto-provisioned and managed even without a locally connected APIC node
ACI Multi-POD Solution
Inter-POD MP-BGP EVPN Control Plane

 MP-BGP EVPN is used to communicate endpoint (EP) and multicast group information between PODs
• All remote-POD entries are associated with a Proxy VTEP next-hop address (e.g. 172.16.1.10 is known locally via Leaf 1 in POD A, but as "Proxy A" in POD B)
 Single BGP AS across all the PODs
 BGP EVPN runs on multiple spines in each POD (a minimum of two for redundancy); some spines may also provide the route-reflector functionality (one in each POD)
Multi-POD and the "GOLF" Project
Multi-DC Deployment – Control Plane

 GOLF devices inject host routes into the WAN or register them in the LISP database
 Host routes for endpoints belonging to public BD subnets in POD 'A' and POD 'B' are advertised to the GOLF devices via the MP-BGP EVPN control plane, with a single APIC cluster across the PODs
ACI Multi-POD Solution
Summary

 ACI Multi-POD solution represents the natural evolution of the


Stretched Fabric design
 Combines the advantages of a centralised mgmt and policy
domain with fault domain isolation (each POD runs independent
control planes)
 Control and data plane integration with WAN Edge devices
(Nexus 7000/7700, ASR 9000 and ASR 1000) completes and
enriches the solution
 The solution is planned to be available in Q3CY16 and will be
released with a companion Design Guide
ACI Dual-Fabric Solution

 Independent ACI fabrics (ACI Fabric 1 and 2) interconnected via L2 and L3 DCI technologies
 Each ACI fabric is independently managed by a separate APIC cluster – separate management and policy domains
 Data-plane VXLAN encapsulation is terminated at the edge of each fabric, with a VLAN hand-off to the DCI devices providing the Layer 2 extension service
 Requires classifying inbound traffic in order to provide end-to-end policy extensibility
Multi-Site Fabrics
Reachability
Fabric ‘A’ Fabric ‘B’
mBGP - EVPN

DB Multiple APIC Clusters


Web/App Web/App

 Host Level Reachability Advertised between Fabrics via BGP


 Transit Network is IP Based
 Host Routes do not need to be advertised into transit network
 Policy Context is carried with packets as they traverse the transit IP Network
 Forwarding between multiple Fabrics is allowed (not limited to two sites)
Multi-Fabric Scenarios
Multi-Site

[Figure: Fabric 'A' and Fabric 'B' interconnected with MP-BGP EVPN, each with its own APIC cluster. Fabric 'A' exports its Web, App and DB EPGs to Fabric 'B' and imports Web & App from Fabric 'B'; Fabric 'B' imports Web, App and DB from Fabric 'A' and exports its Web & App to Fabric 'A'.]
Multi-Site Fabrics
Policy
 EPG policy is exported by source site to desired peer target site fabrics
Fabric ‘A’ advertises which of its endpoints it allows other sites to see
 Target site fabrics selectively imports EPG policy from desired source sites
Fabric ‘B’ controls what it wants to allow its endpoints to see in other sites
 Policy export between multiple Fabrics is allowed (not limited to two sites)

Scope of Policy

 Policy is applied at the provider side of the contract (always at the fabric where the provider endpoint is connected)
 Rationale:
• Scoping of changes – no need to propagate all policies to all fabrics
• Different policy can be applied based on the source EPG (i.e. which fabric it comes from)
Policy Extensibility Across Sites (L3 Connectivity Only)
Host-Route Programming with the ACI Toolkit Multisite App (11.2(x) Release)

[Figure: the ACI Toolkit registers for endpoint notifications in the local 'WEB1' EPG (Subnet/BD 'A') and creates a corresponding 'Ext-WEB1' EPG on the remote site's L3Out across the WAN.]

 The ACI Toolkit Multisite app peers with the local and remote APIC clusters and specifies:
• Which local EPG needs to be "exported" to a remote site ('WEB1') and its name in the remote location ('Ext-WEB1')
• Which contracts will be consumed/provided by that EPG in the remote site
• The L3Out in the remote site where the host routes are to be programmed
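A minimal sketch of the "peers with the local and remote APIC clusters" part, using the standard APIC REST API (aaaLogin and a class query). Hostnames and credentials are placeholders, and the actual EPG export / host-route programming performed by the toolkit is not reproduced here.

```python
# Illustrative only: authenticate to two APIC clusters and read the EPG objects
# (class fvAEPg) that would be candidates for export between sites.
import requests

def apic_login(base_url: str, user: str, password: str) -> requests.Session:
    """Authenticate against an APIC and return a session carrying the auth cookie."""
    s = requests.Session()
    s.verify = False  # lab-style sketch; use proper certificate validation in practice
    body = {"aaaUser": {"attributes": {"name": user, "pwd": password}}}
    r = s.post(f"{base_url}/api/aaaLogin.json", json=body, timeout=10)
    r.raise_for_status()
    return s

def list_epgs(session: requests.Session, base_url: str) -> list:
    """Return the EPG objects known to this fabric."""
    r = session.get(f"{base_url}/api/class/fvAEPg.json", timeout=10)
    r.raise_for_status()
    return r.json().get("imdata", [])

site_a = "https://apic-site-a.example.com"   # placeholder controllers
site_b = "https://apic-site-b.example.com"
local, remote = apic_login(site_a, "admin", "password"), apic_login(site_b, "admin", "password")
print(len(list_epgs(local, site_a)), "EPGs on site A,", len(list_epgs(remote, site_b)), "EPGs on site B")
```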
Stretched Application Deployment with CliQr CloudCenter & ACI

Application-centric management (CliQr CloudCenter, via the northbound API) + application-centric infrastructure (Cisco ACI) = the full power of Software Defined Networking (SDN).

http://goo.gl/CU6MSb
Four Deployment Topologies
Single Pod | Multi Pod | Multi Site | Multi Cloud

• Multi-Site deployment: the application tiers are deployed and stretched across multiple sites.
• Multi-Cloud deployment: the database tier stays on-premise while the application tier runs in the public cloud.


Agenda
• Active-Active (A/A) Data Centre:
– Market & Business Drivers
– Terminology, Criticality levels and Solutions Overview

• A/A Data Centre Design Considerations:


– Storage Extension
– Data Centre Interconnect (DCI) – L2 & L3 scenarios
• OTV, MPLS, VXLAN
– Containers and Micro-services

• Cisco ACI and Active / Active Data Centre


– Multi-Site
– Multi-POD
– Hybrid-Cloud

• A/A Metro Data Centres Designs


- Network Services and Applications (Path optimisation)

• Q&A
Network Service Placement for Metro Distances
A/S Stateful Devices Stretched Across 2 Locations – Nominal Workflow

• Historically this has been well accepted for most Metro Virtual DC (Twin-DC) designs; almost 80% of Twin-DC deployments follow this model
• Network services are usually active in the primary DC
• A distributed pair of Active/Standby FW & SLB in each location
• Additional VLANs extended for state synchronisation between peers (FW failover and session sync, SLB session sync)
• Source NAT for the SLB VIP

Note: with a traditional cluster pair, this scenario is limited to 2 sites.

[Figure: outside, inside, VIP, front-end and back-end VLANs (subnet A) extended between Primary DC-1 and Secondary DC-2 below the L3 core.]


Network Service Placement for Metro Distances • Historically limited to
Network services and HA
Ping-pong Impact with A/S Stateful Devices Stretched Across 2 Locations clusters offering stateful
failover & fast
convergences
• Not optimum, but has
been usually accepted to
work in “degraded mode”
with predictable mobility of
Network Services under
short distance
L3 Core

• FW failover to remote site


• Source NAT for SLB VIP
Outside VLAN • Consider +/- 1 ms for each round trip for
100 km
• For Secured multi-tier software
Inside VLAN architecture, it is possible to measure + 10
round-trips from the initial client request
VIP VIP VLAN
up to the result.
Src-NAT • Interface tracking optionally enabled to
maintain active security and network
Front-end VLAN services on the same site

Subnet A
Subnet A Back-end VLAN

100 km +/- 1 ms per round trip


Primary DC-1 Secondary DC-2
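A back-of-the-envelope view of that "ping-pong" cost, using the numbers quoted above (~1 ms RTT per 100 km, ~10 round trips per client request):

```python
# Added latency when a multi-tier request hair-pins between two metro data centres.
def trombone_penalty_ms(distance_km: float, round_trips: int) -> float:
    rtt_ms = distance_km / 100.0        # ~1 ms of round-trip time per 100 km of fibre
    return round_trips * rtt_ms

for trips in (1, 10):
    print(f"{trips:>2} inter-DC round trips at 100 km: +{trombone_penalty_ms(100, trips):.0f} ms per request")
# With a multi-tier app hair-pinning ~10 times between sites, a 100 km stretch already adds
# ~10 ms to every request -- which is why service placement and FHRP filtering matter.
```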
Network Service Placement for Metro Distances
Additional Ping-pong Impact with IP Mobility Between 2 Locations

• The network team is not necessarily aware of application/VM mobility: an uncontrolled degraded mode with unpredictable mobility of network services and applications
• FW failover to the remote site, while the front-end server farm moves to the remote site
• Source NAT for the SLB VIP maintains the return path through the active SLB
• A partial move of a server farm is not optimised
• Understand and identify the multi-tier frameworks

[Figure: front-end VMs now in DC-2 while the active FW/SLB placement still forces flows back through DC-1 across the 100 km interconnect (~1 ms per round trip).]
Network Service Placement for Metro Distances
Stateful Devices and the Trombone Effect for IP Mobility Between 2 Locations

• Limited interaction between the server team (VM mobility), the network team (HSRP filtering) and the services team (FW, SLB, IPS, …)
• The ping-pong effect caused by active service placement may impact performance
• It is preferred to migrate the whole multi-tier framework and enable FHRP filtering to reduce the trombone effect
• FHRP filtering is enabled on the front-end and back-end gateways
• Source NAT for the SLB VIP maintains the return path through the active SLB
• Understand and identify the multi-tier frameworks

[Figure: subnet A active in both DCs with HSRP filtering on the front-end and back-end VLANs; the VIP and stateful services remain in DC-1, 100 km away (~1 ms per round trip).]
Network Service Placement for Metro Distances
Intelligent Placement of Network Services Based on IP Mobility

• Improving relations between silo'ed organisations (server/app, network/HSRP filtering, services & security, storage) increases workflow localisation efficiency
• Reduce tromboning with intelligent active-service placement:
• Move the FW context associated with the application of interest
• Interface tracking to keep the stateful devices in the same location when possible
• Return traffic stays symmetric via the stateful devices and source NAT
• Intra-DC path optimisation is almost achieved; however, ingress path optimisation may still be required

[Figure: per-application FW contexts and SLB VIPs active in the DC where the application now lives, with HSRP filtering on the front-end and back-end VLANs between DC-1 and DC-2 (100 km, ~1 ms per round trip).]
Network Service Placement for Long Distances
Active/Standby Network Services per Site with Extended LAN (Hot Live Migration)

• The subnet is extended with the VLAN; source NAT on each FW is mandatory
• No configuration or state synchronisation between sites: per-box, per-site configuration
• FW and SLB maintain stateful sessions per DC (routed mode, source NAT in each DC)
• Ingress path optimisation can be used to reduce the trombone effect caused by active-service placement
• With ingress path optimisation, currently active TCP sessions are interrupted and re-established on the remote site
• No real limit on the number of DCs
• Granular migration is possible only using LISP, IGP Assist or RHI (if the enterprise owns the L3 core)
• Move the whole application framework (front-end and back-end)

[Figure: DC-1 and DC-2 each with a local active FW/SLB pair in routed mode, subnet A and the front-end/back-end VLANs extended, HSRP filtering at each site, and IP localisation in the L3 core.]
Single ASA Cluster Stretched Across Multiple DCs
Case 1: LISP Extended Subnet Mode with ASA Clustering (Stateful Live Migration with LAN Extension)

• Symmetric flow establishment is achieved via the CCL (cluster control link) extended between sites
• Currently active sessions are maintained statefully; new sessions are optimised dynamically
• Up to 10 ms max one-way latency; extend the ASA cluster for configuration and state synchronisation

Workflow:
1 - The end user sends a request to the App
2 - The ITR intercepts the request and checks the location
3 - The Map-Server relays the request to the ETR of interest, which replies that subnet A is located behind ETR DC-1
3" - The ITR encapsulates the packet and sends it to RLOC ETR-DC-1
4 - The App moves; LISP Multi-hop notifies the ETR in DC-2 about the move
5 - Meanwhile, ETR DC-2 informs the Map-Server about the new location of the App
6 - The Map-Resolver updates ETR DC-1
7 - ETR DC-1 updates its table (App: Null0)
8 - The ITR sends traffic to ETR DC-1
9 - ETR DC-1 replies with a Solicit Map-Request
10 - The ITR sends a Map-Request and redirects the traffic to ETR DC-2

[Figure: ITR in the L3 core, ETRs in DC-1 and DC-2, CCL and data-VLAN extension between the sites, HSRP filtering at each site.]
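The essence of the flow above is the ITR's map-cache being refreshed after the move; a toy model (illustrative only) of that behaviour:

```python
# Toy model of the LISP behaviour walked through above: the ITR caches
# EID-prefix -> RLOC mappings, and after the app moves the old ETR answers with a
# Solicit Map-Request so the ITR re-resolves the mapping toward DC-2.
class Itr:
    def __init__(self):
        self.map_cache = {}                      # EID prefix -> RLOC (ETR address)

    def send(self, eid_prefix, mapping_system):
        rloc = self.map_cache.get(eid_prefix)
        if rloc is None:                         # cache miss: query the mapping system
            rloc = mapping_system[eid_prefix]
            self.map_cache[eid_prefix] = rloc
        return f"encapsulate to {rloc}"

mapping_system = {"Subnet A": "ETR-DC-1"}
itr = Itr()
print(itr.send("Subnet A", mapping_system))      # traffic initially goes to DC-1

# The app moves: DC-2 registers the new location, and DC-1 answers with a
# Solicit Map-Request (modelled here as simply invalidating the stale cache entry).
mapping_system["Subnet A"] = "ETR-DC-2"
itr.map_cache.pop("Subnet A", None)              # triggered by the Solicit Map-Request
print(itr.send("Subnet A", mapping_system))      # new traffic now goes straight to DC-2
```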
Single ASA Cluster Stretched Across Multiple DCs
Case 2: LISP Across Subnet Mode with ASA Clustering (Cold Migration with Layer 3)

• Servers can migrate with the same IP identifiers (LISP ASM takes care of it)
• Cold migration implies the server restarts; business continuity implies the user re-establishes a new session (the TCP session is re-initiated)
• Up to 10 ms max one-way latency; extend the ASA cluster for configuration synchronisation

Workflow:
1 - The end user sends a request to the App
2 - The ITR intercepts the request and checks the location
3 - The Map-Server relays the request to the ETR of interest, which replies that subnet A is located behind ETR DC-1
3" - The ITR encapsulates the packet and sends it to RLOC ETR-DC-1
4 - The App moves; LISP Multi-hop notifies the ETR in DC-2 about the move
5 - ETR DC-2 informs the Map-Server about the new location of the App
6 - The Map-Resolver updates ETR DC-1
7 - ETR DC-1 updates its table (App: Null0)
8 - The ITR sends traffic to ETR DC-1
9 - ETR DC-1 replies with a Solicit Map-Request
10 - The ITR sends a Map-Request and redirects the flow to ETR DC-2

[Figure: subnet A in DC-1 and subnet B in DC-2 (no data-plane LAN extension), with the CCL extended between sites for the ASA cluster.]
Single ASA Cluster Stretched Across Multiple DCs
Case 4: IGP Assist – Extended Subnet Mode (Stateful Live Migration with LAN Extension)

• Selective and automatic notification; IGP Assist is available with the LISP First-Hop Router (FHR) function (Nexus 7000)
• It offers very granular optimisation (per-host /32)
• It requires the L3 core to support host routes (usually when the enterprise owns the L3 network)
• With the ASA cluster assisting, active sessions are maintained and new sessions are optimised

Workflow:
1 - The end user sends a request to the App
2 - The request is attracted to DC-1 (best metric) for subnet A (/25 out of the /24)
3 - The App moves from DC-1 to DC-2 and continues sending traffic outward (stateful live migration)
4 - LISP IGP Assist detects the new EID and map-notifies all LISP FHRs (DC-2 and DC-1) via the LAN extension
5 - Meanwhile, the LISP FHRs redistribute the LISP routes into the IGP
6 - In DC-2, where the detection was made, a /32 LISP interface route for the host (EID) is installed
7 - In DC-1, the original location of the EID, the /32 LISP interface route for the host is removed
8 - The host route is propagated into the core, attracting the traffic to its new destination (more specific route)

Result: active sessions are maintained statefully; new session workflows are optimised.

[Figure: DC-1 and DC-2 with OTV VLAN extension, CCL extension for the ASA cluster, HSRP filtering at each site, and LISP FHRs redistributing /32 host routes into the IGP.]
Single ASA Cluster Stretched Across Multiple DCs
Case 5: IGP Assist – Across Subnet Mode (Cold Migration)

• Selective and automatic notification; IGP Assist is available with the LISP First-Hop Router (FHR) function (Nexus 7000)
• It offers very granular optimisation (per-host /32)
• It requires the L3 core to support host routes (usually when the enterprise owns the L3 network)
• Active/Active DCs: sessions are distributed and redirected to the DC of interest
• Current sessions are interrupted; new sessions are optimised

Workflow:
1 - The end user sends a request to the App
2 - The request is attracted to DC-1 (best metric) for subnet A (/25), while DC-2 is the primary DC for subnet B
3 - The App migrates and restarts in DC-2 (cold move)
4 - LISP IGP Assist detects the new EID and map-notifies its local LISP xTR (DC-2)
5 - The LISP FHR in DC-2 redistributes the LISP routes into the IGP
6 - It installs a /32 LISP interface route for the host (EID)
7 - It sends a Map-Register for the new EID to the Map-Server
8 - The Map-Server sends a Map-Notify to the LISP FHR in DC-1
9 - The LISP FHR in DC-1 redistributes the LISP routes into the IGP and removes its /32 LISP interface route for the host
10 - The host route is propagated into the core, attracting the traffic to its new destination (more specific route)

Result: new session workflows are optimised.

[Figure: DC-1 (subnet A) and DC-2 (subnet B) without data-plane LAN extension, a CCL per site, HSRP filtering, and LISP FHRs exchanging Map-Register / Map-Notify with the mapping database.]
Agenda
• Active-Active (A/A) Data Centre:
– Market & Business Drivers
– Terminology, Criticality levels and Solutions Overview

• A/A Data Centre Design Considerations:


– Storage Extension
– Data Centre Interconnect (DCI) – L2 & L3 scenarios
• OTV, MPLS, VXLAN
– Containers and Micro-services

• Cisco ACI and Active / Active Data Centre


– Multi-Site
– Multi-POD
– Hybrid-Cloud

• A/A Metro Data Centres Designs


- Network Services and Applications (Path optimisation)

• Q&A
Q&A
Complete Your Online Session Evaluation
Give us your feedback and receive a Cisco 2016 T-Shirt by completing the Overall Event Survey and 5 Session Evaluations:
– Directly from your mobile device on the Cisco Live Mobile App
– By visiting the Cisco Live Mobile Site http://showcase.genie-connect.com/ciscolivemelbourne2016/
– Visit any Cisco Live Internet Station located throughout the venue

T-Shirts can be collected Friday 11 March at Registration.

Learn online with Cisco Live! Visit us online after the conference for full access to session videos and presentations: www.CiscoLiveAPAC.com
Thank you
