Cisco Data Center Fundamentals
Cisco Press
All rights reserved. This publication is protected by copyright, and permission must be obtained from the
publisher prior to any prohibited reproduction, storage in a retrieval system, or transmission in any form
or by any means, electronic, mechanical, photocopying, recording, or likewise. For information regarding
permissions, request forms, and the appropriate contacts within the Pearson Education Global Rights &
Permissions Department, please visit www.pearson.com/permissions.
No patent liability is assumed with respect to the use of the information contained herein. Although
every precaution has been taken in the preparation of this book, the publisher and author assume no
responsibility for errors or omissions. Nor is any liability assumed for damages resulting from the use of
the information contained herein.
Library of Congress Control Number: 2022909825
ISBN-13: 978-0-13763824-6
ISBN-10: 0-13-763824-8
The information is provided on an “as is” basis. The authors, Cisco Press, and Cisco Systems, Inc. shall
have neither liability nor responsibility to any person or entity with respect to any loss or damages
arising from the information contained in this book or from the use of the discs or programs that may
accompany it.
The opinions expressed in this book belong to the author and are not necessarily those of
Cisco Systems, Inc.
Trademark Acknowledgments
All terms mentioned in this book that are known to be trademarks or service marks have been
appropriately capitalized. Cisco Press or Cisco Systems, Inc., cannot attest to the accuracy of this
information. Use of a term in this book should not be regarded as affecting the validity of any
trademark or service mark.
Special Sales
For information about buying this title in bulk quantities, or for special sales opportunities (which may
include electronic versions; custom cover designs; and content particular to your business, training
goals, marketing focus, or branding interests), please contact our corporate sales department at
corpsales@pearsoned.com or (800) 382-3419.
For questions about sales outside the U.S., please contact intlcs@pearson.com.
Feedback Information
At Cisco Press, our goal is to create in-depth technical books of the highest quality and value. Each book
is crafted with care and precision, undergoing rigorous development that involves the unique expertise of
members from the professional technical community.
Readers’ feedback is a natural continuation of this process. If you have any comments regarding how we
could improve the quality of this book, or otherwise alter it to better suit your needs, you can contact us
through email at feedback@ciscopress.com. Please make sure to include the book title and ISBN in your
message.
Alliances Manager, Cisco Press: Arezou Gol
Technical Editor: Ozden Karakok, CCIE No. 6331
Director, ITP Product Management: Brett Bartow
Editorial Assistant: Cindy Teeters
Cisco has more than 200 offices worldwide. Addresses, phone numbers, and fax numbers are listed on the Cisco Website at www.cisco.com/go/offices.
Cisco and the Cisco logo are trademarks or registered trademarks of Cisco and/or its affiliates in the U.S. and other countries. To view a list of Cisco trademarks,
go to this URL: www.cisco.com/go/trademarks. Third party trademarks mentioned are the property of their respective owners. The use of the word partner does
not imply a partnership relationship between Cisco and any other company. (1110R)
Education is a powerful force for equity and change in our world. It has the potential to
deliver opportunities that improve lives and enable economic mobility. As we work with
authors to create content for every product and service, we acknowledge our responsibil-
ity to demonstrate inclusivity and incorporate diverse scholarship so that everyone can
achieve their potential through learning. As the world’s leading learning company, we have
a duty to help drive change and live up to our purpose to help more people create a
better life for themselves and to create a better world.
■ Our educational products and services are inclusive and represent the rich diversity
of learners
■ Our educational content accurately reflects the histories and experiences of the
learners we serve
■ Our educational content prompts deeper discussions with learners and motivates
them to expand their own learning (and worldview)
While we work hard to present unbiased content, we want to hear from you about any
concerns or needs with this Pearson product so that we can investigate and address them.
Iskren Nikolov, CCIE No. 20164, CCSI No. 32481, MCT Alumni, is a content architect,
engineer, and developer with the Cisco Learning & Certifications Data Center and
Cloud team. He is responsible for designing, developing, and reviewing official Cisco
Data Center learning courses, including lab infrastructures and exercises. He holds a master's
degree in computer systems and management from the Technical University-Sofia,
Bulgaria. Iskren has more than 26 years of experience in designing, implementing, and
supporting solutions based on technologies such as data center, security, storage, wide
area networking, software-defined networking, cloud, hybrid cloud, and multicloud, including 11
years of teaching and developing Cisco Data Center and Cloud courses and Microsoft
Azure courses. Because of his vast experience across technologies from multiple vendors,
such as Cisco Systems, VMware, Microsoft, and Barracuda, combined with the different
perspectives he has gained from his different work roles and working with customers
from different industries, Iskren has a unique view of the current data center technologies
and future trends. You can reach Iskren on LinkedIn: https://www.linkedin.com/in/
iskrennikolov.
Dedications
Somit Maloo:
To AUM.
Iskren Nikolov:
To my loving family—my wife Petya and my kids Diana and Valentin—for their
unlimited support!
Acknowledgments
Somit Maloo:
I would like to thank my co-author, Iskren Nikolov, for teaming up with me to
complete this book. Without his support, it would not have been possible. I am thank-
ful to the professional editors and the whole production team at Cisco Press, especially
James Manly and Ellie Bru, for their patience and guidance at every step of the book pub-
lishing process. I would also like to thank our technical editor, Ozden Karakok, for her
keen attention to detail and for taking time out of her busy schedule to review the book.
Iskren Nikolov:
I would like to thank my co-author, Somit Maloo. He not only worked with me as a team
on this book but guided me through the process of being a Cisco Press author. I am
thankful to the whole production team, especially James Manly and Ellie Bru, for their
professionalism and endless patience with me! Also, special thanks to our technical
editor, Ozden Karakok, for providing another valuable perspective on how we can tell a
better story about this technology!
Contents at a Glance
Introduction xxiv
Part I Networking
Chapter 1 Data Center Architectures 1
Chapter 2 Describing the Cisco Nexus Family and Cisco NX-OS Software 21
Part II Storage
Chapter 10 Data Center Storage Concepts 361
Part IV Automation
Chapter 19 Using APIs 613
Part V Cloud
Chapter 21 Cloud Computing 645
Index 673
Reader Services
Register your copy at www.ciscopress.com/title/ISBN for convenient access to down-
loads, updates, and corrections as they become available. To start the registration pro-
cess, go to www.ciscopress.com/register and log in or create an account*. Enter the
product ISBN 9780137638246 and click Submit. When the process is complete, you
will find any available bonus content under Registered Products.
*Be sure to check the box that you would like to hear from us to receive exclusive
discounts on future editions of this product.
Contents
Introduction xxiv
Part I Networking
Chapter 2 Describing the Cisco Nexus Family and Cisco NX-OS Software 21
Cisco Nexus Data Center Product Overview 21
Cisco Nexus 9000 Series Switches 21
Cisco Nexus 9500 Platform Switches 22
Cisco Nexus 9300 Platform Switches 24
Cisco Nexus 9200 Platform Switches 25
Cisco Nexus 7000 Series Switches 26
Cisco Nexus 7000 Platform Switches 27
Cisco Nexus 7700 Platform Switches 28
Cisco Nexus 3000 Series Switches 29
HSRP Versions 71
HSRP Configuration 72
Virtual Router Redundancy Protocol 78
VRRP Tracking 79
VRRP Router Priority and Preemption 79
VRRP Load Balancing 80
VRRP Configuration 81
Gateway Load Balancing Protocol 82
GLBP Operation 83
GLBP Interface Tracking 84
Summary 86
References 86
Part II Storage
Part IV Automation
Part V Cloud
Index 673
■ Boldface indicates commands and keywords that are entered literally as shown. In
actual configuration examples and output (not general command syntax), boldface
indicates commands that are manually input by the user (such as a show command).
■ Braces within brackets ([{ }]) indicate a required choice within an optional element.
Introduction
This book is intended to give you an understanding of Cisco data center products and
data center–related protocols. This book is for those who are new to the data center
technologies, such as computer networking students, systems engineers, and server
administrators, as well as network engineers with many years of experience administering
larger data center networks. The only prerequisite is a basic understanding of networking
protocols.
This book is written as a self-study guide for learning data center technologies. The infor-
mation has been organized to help those who want to read the book from cover to cover
and also for those looking for specific information. A brief look at the contents of this
book will give you an idea of what is covered and what is needed to have a good under-
standing of data center technologies.
Our approach for writing this book was to do our best to explain each concept in a sim-
ple, step-by-step approach, as well as to include the critical details. It was challenging to
balance between providing as much information as possible and not overwhelming you,
the reader. Data center technologies are not difficult to learn, but they involve multiple
protocols and processes that might be new to some.
RFCs and Cisco’s official configuration guides are cited throughout the book. It was
important to include these references, as we wanted to give you the authoritative source
for the material in this book so that you have resources for more information. If you are
not familiar with reading RFCs, don’t be intimidated. Most of them are not difficult to
read, and they do their best to explain their topic clearly.
At times in this book, we introduce a technology or concept but state that it is covered in
more detail in a later chapter. We do this to explain the concept as it relates to the topic
being discussed without getting lost in the details. The details are covered where appro-
priate.
The objective of this book is to explain data center technologies as clearly as possible. At
times, it was like herding cats, trying to decide which topic to cover first. The first chap-
ter covers various data center architectures, including data center basics, and is designed
to give you an overview of the main topics. Having this overview will make it easier as
you progress through the rest of the book.
The book provides much more than a simple overview of the various data center product
architectures, protocols, and features. This information helps you understand the various
pieces involved in setting up and managing a data center.
The book has been organized such that you might begin with little or no knowledge of
Cisco Data Center products. The book discusses installation and deployment of the data
center technologies, and some possible applications are also discussed to show some
basic capabilities of these products. It is largely up to you to decide how to best utilize
these technologies within your organization.
The reader should be familiar with networking protocols, because having a basic under-
standing of them is a prerequisite for this book.
Finally, this book is especially useful to people who are preparing for the CCNP DCCOR
exam. This book acts as a bridge between CCNA and CCNP DCCOR concepts and intro-
duces Cisco Data Center products and technologies related to the CCNP DCCOR exam.
■ Part I consists of Chapters 1 to 9 and covers Cisco’s data center switches along with
basic networking concepts related to the data center.
■ Part II consists of Chapters 10 to 15 and covers Cisco’s data center storage products
and storage concepts related to the data center.
■ Part III consists of Chapters 16 to 18 and covers Cisco’s compute products and con-
cepts related to data center compute.
■ Part IV consists of Chapters 19 and 20 and covers basic automation concepts related
to data center products.
The following list highlights the topics covered in each chapter and the book’s
organization:
■ Chapter 1, “Data Center Architectures”: This chapter discusses data center basics
and analyzes the Data Center Architecture from a holistic point of view, which is
Cisco’s unique approach to the creation of a unified data center. It also discusses dif-
ferent core components of a data center—network infrastructure, storage infrastruc-
ture, and computing infrastructure.
■ Chapter 2, “Describing the Cisco Nexus Family and Cisco NX-OS Software”:
This chapter discusses Cisco Nexus Data Center products, Cisco Fabric Extenders
(FEXs), and Cisco NX-OS Software Architecture and concludes by exploring the
Cisco NX-OS CLI.
■ Chapter 4, “Port Channels and vPCs”: This chapter discusses Ethernet port chan-
nels including port channel modes, port channel compatibility requirements, and
port channel load balancing. It also discusses virtual port channels (vPCs), including
various vPC topology implementations, vPC components, vPC control plane and
data plane, vPC failure scenarios, and vPC configuration and verification.
■ Chapter 6, “Nexus Switch Routing”: This chapter discusses the underlying con-
cepts along with the configuration and verification for the Layer 3 unicast routing
protocols in Cisco NX-OS, including RIPv2, EIGRP, and OSPFv2. It also discusses
the multicast fundamentals, including PIM configuration and verification in Cisco
NX-OS.
■ Chapter 8, “Describing Cisco ACI”: This chapter provides an overview of Cisco ACI
and discusses building blocks, deployment models, hardware components, and fabric
startup discovery, along with Cisco ACI policy model, including logical constructs,
fabric policies, and access policies. It also briefly discusses packet forwarding within
the ACI fabric.
■ Chapter 9, “Operating Cisco ACI”: This chapter discusses Cisco ACI external
connectivity options, including L2Out and L3Out, Cisco ACI and VMM integra-
tion, Cisco ACI and L4–L7 integration, Cisco ACI management options, Cisco ACI
Anywhere, and Cisco Nexus Dashboard.
■ Chapter 10, “Data Center Storage Concepts”: This chapter discusses the data cen-
ter storage protocols and design, including block-based and file-based storage proto-
cols, NAS, DAS, and SAN, as well as the Fibre Channel protocol and topologies, and also provides an
overview of the Cisco MDS family of Fibre Channel switches.
■ Chapter 12, “Describing VSANs and Fibre Channel Zoning”: This chapter
describes VSANs and Fibre Channel zoning and the implementation of these fea-
tures on the Cisco MDS switches.
■ Chapter 13, “Storage Virtualization”: This chapter describes the storage virtualiza-
tion features supported on the Cisco Data Center switches, including the NPV and
NPIV.
■ Chapter 15, “Describing FCoE”: This chapter discusses the FCoE protocol, includ-
ing the FCoE architecture, FCIP, and FCoE configuration on Cisco Data Center
switches.
■ Chapter 16, “Describing Cisco UCS Components”: This chapter describes the
Cisco UCS components of the Cisco compute solution, including the Cisco UCS
Fabric Interconnects, blade chassis, B- and C-series servers, as well as the Cisco
HyperFlex platform and the latest generation of the Cisco UCS X-series modular
system.
■ Chapter 17, “Describing Cisco UCS Abstraction”: This chapter discusses the hard-
ware abstraction approach used on the Cisco UCS, and what basic configuration is
needed, including the Cisco UCS service profiles, templates, policies, and an over-
view of the Cisco UCS Central.
■ Chapter 18, “Server Virtualization”: This chapter discusses key server virtualization
components, including the virtual machine and its components, types of hypervi-
sors, VMware vSphere architecture, VMware ESXi, and the VMware vCenter Server
appliance installation procedure.
■ Chapter 19, “Using APIs”: This chapter starts the discussion about automation
in the data center by introducing the APIs and the data model–based framework,
including the YANG data models; the NETCONF, RESTCONF, and gRPC configura-
tion protocols; and the JSON, XML, and YAML data formats. This chapter also pro-
vides an overview of the APIs supported on the Cisco NX-OS and the Cisco UCS.
■ Chapter 20, “Automating the Data Center”: This chapter discusses the data center
automation, orchestration, management, and toolsets.
■ Chapter 21, “Cloud Computing”: This chapter discusses cloud computing, looking
at its definition, characteristics, services, and deployment models. This chapter also
provides an overview of the Cisco Intersight hybrid cloud platform.
Figure Credits
Figure 18-6 Oracle
Chapter 1

Data Center Architectures
A data center is home to the computational power, storage, and applications necessary to
support an enterprise business. The data center infrastructure is central to the IT archi-
tecture from which all content is sourced or through which all content passes. Proper
planning of the data center infrastructure design is required, considering performance,
resiliency, and scalability needs. Also, the data center design should be flexible in quickly
deploying and supporting new services. Such a design requires solid initial planning and
thoughtful consideration in the areas of port density, access layer uplink bandwidth, true
server capacity, and oversubscription, to name just a few.
In this chapter, we will discuss the data center basics and analyze the data center archi-
tecture from a holistic point of view that is Cisco’s unique approach to creating a unified
data center. We will also discuss different core components of a data center: the network
infrastructure, storage infrastructure, and computing infrastructure.
At its simplest, a data center is a physical facility that organizations use to house their
critical applications and data. A data center’s design is based on a network of computing
and storage resources that enable the delivery of shared applications and data. The key
components of a data center design include routers, switches, firewalls, storage systems,
servers, and application-delivery controllers.
Modern data centers are very different from what they were just a short time ago.
Infrastructure has shifted from traditional on-premises physical servers to virtual net-
works that support applications and workloads across pools of physical infrastructure
and into a multicloud environment. In this era, data exists and is connected across
multiple data centers, the edge, and public and private clouds. The data center must be
able to communicate across these multiple sites—both on-premises and in the cloud.
Even the public cloud is a collection of data centers. When applications are hosted in the
cloud, they use data center resources from the cloud provider.
In the world of enterprise IT, data centers are designed to support the following business
applications and activities:
■ Productivity applications
■ Network infrastructure: This connects servers (physical and virtualized), data center
services, storage, and external connectivity to end-user locations.
■ Storage infrastructure: Data is the fuel of the modern data center. Storage systems
are used to hold this valuable commodity.
■ The first wave saw the shift from proprietary mainframes to x86-based servers,
based on-premises and managed by internal IT teams.
■ The third wave finds us in the present, where we are seeing the move to cloud,
hybrid cloud, and cloud-native (that is, applications born in the cloud).
This evolution has given rise to distributed computing. This is where data and applica-
tions are distributed among disparate systems, connected and integrated by network ser-
vices and interoperability standards to function as a single environment. It has meant the
term data center is now used to refer to the department that has responsibility for these
systems irrespective of where they are located.
Since data center components store and manage business-critical data and applications,
data center security is critical in the data center design. The following data center ser-
vices are typically deployed to protect the performance and integrity of the core data
center components:
Data center components require significant infrastructure to support the center’s hard-
ware and software. This includes power subsystems, uninterruptible power supplies
(UPSs), ventilation, cooling systems, fire suppression, backup generators, and connections
to external networks.
■ Tier 1: Basic site infrastructure. A tier 1 data center offers limited protection against
physical events. It has single-capacity components and a single, nonredundant distri-
bution path.
■ Tier 4: Fault-tolerant site infrastructure. This data center provides the highest levels
of fault tolerance and redundancy. Redundant-capacity components and multiple
independent distribution paths enable concurrent maintainability, and one fault any-
where in the installation doesn’t cause downtime.
Data center types differ depending on whether they are owned by one or many organizations,
how they fit (if they fit) into the topology of other data centers, what technologies they use
for computing and storage, and even their energy efficiency. There are four main types of data centers:
■ Enterprise data centers: These are built, owned, and operated by companies and are
optimized for their end users. Most often they are housed on the corporate campus.
■ Managed services data centers: These data centers are managed by a third party (or
a managed services provider) on behalf of a company. The company leases the equip-
ment and infrastructure instead of buying it.
■ Colocation data centers: In colocation (“colo”) data centers, a company rents space
within a data center owned by others and located off company premises. The colo-
cation data center hosts the infrastructure (building, cooling, bandwidth, security,
and so on), while the company provides and manages the components, including
servers, storage, and firewalls.
■ Cloud data centers: In this off-premises form of data center, data and applica-
tions are hosted by a cloud services provider such as Amazon Web Services (AWS),
Microsoft Azure, IBM Cloud, or another public cloud provider.
Organizations can choose to build and maintain their own hybrid cloud data centers,
lease space within colocation facilities (colos), consume shared compute and storage ser-
vices, or use public cloud-based services. The net effect is that applications today no lon-
ger reside in just one place. They operate in multiple public and private clouds, managed
offerings, and traditional environments. In this multicloud era, the data center has become
vast and complex, geared to drive the ultimate user experience.
The Cisco Unified Data Center is a comprehensive, fabric-based platform that integrates
computing, networking, security, virtualization, and management solutions into a single,
highly efficient, and simplified architecture. The platform does not just accommodate
virtualization; it was designed to build on the advantages of virtualization, increasing the
density, performance, mobility, and security of data center resources. The result is a plat-
form that is significantly easier to scale than other solutions, whether you are adding pro-
cessing power within the data center, expanding geographic coverage by linking multiple
data centers, or securely connecting additional users and devices.
This less complex, unified approach also facilitates the introduction of automation, which
can dramatically increase data center efficiency, productivity, and agility, including the
capability to manage deployment and operations across physical and virtual resources,
which is critical to the delivery of IT as a Service (ITaaS). Less complexity means faster
time to value—a significant advantage over other data center architectures.
The Cisco Unified Data Center is based on three pillars of Cisco innovation: Unified
Fabric, Unified Computing, and Unified Management, as shown in Figure 1-1.
Unified Fabric
Cisco’s fabric-based approach to data center infrastructure eliminates the tiered silos
and inefficiencies of multiple network domains, instead offering a flatter, unified fabric
that allows consolidation of local area network (LAN), storage area network (SAN), and
network-attached storage (NAS) over one high-performance and fault-tolerant network.
Cisco Unified Fabric delivers massive scalability and resiliency to the data center by cre-
ating large pools of virtualized network resources that can be easily moved and rapidly
reprovisioned. This approach reduces complexity and enables automated deployment
of new virtual machines and applications. Deep integration between the architecture of
the server and the network enables delivery of secure IT services within the data center,
between data centers, or beyond the data center to users from any device. Cisco Unified
Fabric is based on the Cisco Nexus family of switches, which runs on a common operat-
ing system named Cisco NX-OS.
Unified Computing
This highly scalable computing solution integrates servers, flash-memory acceleration,
and networking with embedded management and automation to simplify operations for
physical, virtual, and cloud workloads. Cisco Unified Computing solution is based on
the Cisco Unified Computing System (UCS). Cisco UCS integrates industry-standard
x86-architecture servers, access and storage networking, and enterprise-class management
into a single system for greater speed, simplicity, and scalability. Cisco UCS eliminates
the multiple redundant devices that populate traditional blade servers and add layers
of management complexity. When used within the high-bandwidth, low-latency Cisco
Unified Fabric framework, Cisco UCS gives IT managers a wire-once platform for provid-
ing highly elastic and agile pools of virtualized resources.
Cisco UCS is massively scalable to hundreds of blades and thousands of virtual machines,
all with a single point of connectivity and management. Every aspect of the system’s
configuration can be programmed through an intuitive GUI using automated rules and
policies and operating across bare-metal, virtualized, and cloud computing environments.
Open standards-based application programming interfaces (APIs) offer exceptional flex-
ibility for integration of diverse application, virtualization, storage, and system manage-
ment solutions.
Unified Management
Automation, orchestration, and lifecycle management tools simplify deployment and
operation of physical and bare-metal environments, virtual environments, and private,
public, and hybrid cloud environments. Most organizations have dozens of different
management solutions that do not necessarily work all that well together. Cisco offers the
open platform for centrally managing all data center resources across physical, virtual,
and cloud environments. The flexible automation of Cisco Unified Management solutions
reduces the time and cost of setting up and provisioning infrastructure. Role- and policy-
based provisioning using service profiles and templates simplifies operations. By provid-
ing lifecycle management and process automation, Cisco Unified Management solutions
deliver greater agility and scalability for the data center while reducing complexity and
risk. The Cisco Unified Management solution includes Cisco UCS Manager, Cisco UCS
Central, Cisco Intersight, Cisco Nexus Dashboard, Cisco Nexus Insights, Cisco Network
Assurance Engine (NAE), and more.
[Figure: Traditional three-tier data center design, with core and aggregation layers at Layer 3 and the access layer at Layer 2]
The architecture consists of core routers, aggregation routers (sometimes called distribu-
tion routers), and access switches. Between the aggregation routers and access switches,
Spanning Tree Protocol is used to build a loop-free topology for the Layer 2 part of the
network. Spanning Tree Protocol provides several benefits. For example, it is simple,
and it is a plug-and-play technology requiring little configuration. VLANs are extended
within each pod, and servers can move freely within a pod without the need to change IP
address and default gateway configurations. However, Spanning Tree Protocol cannot use
parallel forwarding paths, and it always blocks redundant paths in a VLAN.
In 2010, Cisco introduced virtual port channel (vPC) technology to overcome the limita-
tions of Spanning Tree Protocol. vPC eliminates spanning tree’s blocked ports, provides
active-active uplink from the access switches to the aggregation routers, and makes full
use of the available bandwidth, as shown in Figure 1-3. With vPC technology, Spanning
Tree Protocol is still used as a failsafe mechanism.
vPC technology works well in a relatively small data center environment where most traf-
fic consists of northbound and southbound communication between clients and servers.
We will discuss vPCs in detail in Chapter 4, “Port Channels and vPCs.”
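As a quick preview of the configuration covered in Chapter 4, the following minimal sketch shows how a vPC domain between two aggregation switches might be defined in Cisco NX-OS. The domain ID, port channel numbers, and keepalive addresses are illustrative assumptions, not values taken from this book:

feature vpc
feature lacp
vpc domain 10
  peer-keepalive destination 192.168.1.2 source 192.168.1.1 vrf management
interface port-channel 1
  description vPC peer link between the two aggregation switches
  switchport mode trunk
  vpc peer-link
interface port-channel 20
  description vPC toward an access switch
  switchport mode trunk
  vpc 20

The same vPC number (20 in this sketch) is configured on both peer switches so that the downstream access switch sees one logical port channel with active-active uplinks.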
Since 2003, with the introduction of virtualization technology, the computing, networking, and
storage resources that were segregated in Layer 2 pods in the three-tier data center
design can be pooled. This revolutionary technology created a need for a larger Layer 2
domain, from the access layer to the core layer, as shown in Figure 1-4.
[Figure 1-3: Three-tier design with vPC providing active-active uplinks between the access and aggregation layers]
[Figure 1-4: Three-tier design with the Layer 2 domain extended toward the core layer using vPC]
With Layer 2 segments extended across all the pods, the data center administrator can
create a central, more flexible resource pool that can be reallocated based on needs.
Servers are virtualized into sets of virtual machines that can move freely from server to
server without the need to change their operating parameters.
A new data center design called the Clos network–based spine-leaf architecture was
developed to overcome these limitations. This architecture has been proven to deliver
high-bandwidth, low-latency, nonblocking server-to-server connectivity.
Spine-Leaf Network
In this two-tier Clos architecture, every lower-tier switch (leaf layer) is connected to each
of the top-tier switches (spine layer) in a full-mesh topology. The leaf layer consists of
access switches that connect to devices such as servers. The spine layer is the backbone
of the network and is responsible for interconnecting all leaf switches. Every leaf switch
connects to every spine switch in the fabric. The path is randomly chosen so that the traf-
fic load is evenly distributed among the top-tier switches. If one of the top-tier switches
were to fail, it would only slightly degrade performance throughout the data center.
[Figure: Spine-leaf topology, with the spine layer interconnecting all leaf switches]
If oversubscription of a link occurs (that is, if more traffic is generated than can be
aggregated on the active link at one time), the process for expanding capacity is straight-
forward. An additional spine switch can be added, and uplinks can be extended to every
leaf switch, resulting in the addition of interlayer bandwidth and the reduction of the
oversubscription. If device port capacity becomes a concern, a new leaf switch can be
added by connecting it to every spine switch and adding the network configuration to
the switch. The ease of expansion optimizes the IT department’s process of scaling the
network. If no oversubscription occurs between the lower-tier switches and their uplinks,
a nonblocking architecture can be achieved.
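As a simple worked example of this arithmetic (the port counts here are assumptions for illustration, not a recommendation): a leaf switch with 48 x 10Gbps server-facing ports can receive up to 480Gbps from its servers; with 4 x 40Gbps uplinks (160Gbps) to the spine, the oversubscription ratio is 480:160, or 3:1. Adding two more 40Gbps uplinks to additional spine switches raises the uplink capacity to 240Gbps and reduces the ratio to 2:1.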
With a spine-and-leaf architecture, no matter which leaf switch a server is connected to,
its traffic always has to cross the same number of devices to get to another server (unless
the other server is located on the same leaf). This approach keeps latency at a predictable
level because a payload only has to hop to a spine switch and another leaf switch to reach
its destination.
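The following is a minimal sketch of what the leaf side of such a fabric might look like in Cisco NX-OS, using routed point-to-point uplinks and OSPF as the underlay routing protocol. The interface numbers, process name, and /31 addressing are illustrative assumptions; real designs may instead use other routing protocols or run an overlay such as VXLAN EVPN on top of the underlay:

feature ospf
router ospf UNDERLAY
  router-id 10.0.0.11
interface Ethernet1/49
  description Routed uplink to Spine-1
  no switchport
  ip address 10.1.1.1/31
  ip router ospf UNDERLAY area 0.0.0.0
  no shutdown
interface Ethernet1/50
  description Routed uplink to Spine-2
  no switchport
  ip address 10.1.1.3/31
  ip router ospf UNDERLAY area 0.0.0.0
  no shutdown

Because every leaf has an equal-cost routed path through each spine, traffic is load-shared across the spine layer and any server-to-server flow crosses at most one spine.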
There are various benefits of using a SAN. It allows the storage disks to be connected
to servers over large distances, making it ideal for enterprise data center networks. The
same physical disk can be connected to more than one server, allowing for effective disk
utilization. A SAN provides high availability using multiple physical paths from server to
storage. A SAN also supports nondisruptive scalability (that is, additional storage can be added
to the storage network without affecting the devices currently using the network).
There are three types of SAN topologies: single-tier, two-tier, and three-tier.
Single-Tier Topology
In single-tier topology, also called collapsed-core topology, servers connect to the core
switches, which provide storage services. Storage devices connect to one or more core
switches, as shown in Figure 1-6. Core devices have a large number of blades to support
initiator (host) and target (storage) ports. High availability is achieved using two physical-
ly separate, but identical, redundant SAN fabrics. This topology has single management
per fabric and is suitable for small SAN environments.
[Figure 1-6: Single-tier (collapsed-core) SAN topology, with servers and storage attached to redundant Fibre Channel core switches in two separate fabrics]
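To give a flavor of how each fabric is configured, the following is a minimal Cisco MDS NX-OS sketch that creates a VSAN on the Fabric A core switch and assigns a server-facing and a storage-facing Fibre Channel interface to it. The VSAN number, name, and interface numbers are illustrative assumptions; VSANs and zoning are covered in detail in Chapter 12:

vsan database
  vsan 10 name FABRIC-A
  vsan 10 interface fc1/1
  vsan 10 interface fc1/2
interface fc1/1
  no shutdown
interface fc1/2
  no shutdown

The vsan 10 interface fc1/1 command moves that port out of the default VSAN 1; an equivalent configuration (typically with a different VSAN number) is applied independently on the Fabric B core switch.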
Two-Tier Topology
Two-tier topology, also called core-edge topology, is the most common SAN network
topology. In this topology, servers connect to the edge switches, whereas storage devices
connect to one or more core switches, as shown in Figure 1-7. Core switches provide
storage services to one or more edge switches, thus servicing more servers in the fabric.
Inter-Switch Links (ISLs) have to be designed in such a way that enough links/bandwidth
are available between the switches to avoid congestion when servers communicate with
storage devices. High availability is achieved using two physically separate, but identi-
cal, redundant SAN fabrics. This topology guarantees a single switch hop (edge to core)
reachability from servers to storage. The key drawback of this topology is that storage
connections and edge-switch ISLs contend for the same core switch ports, so it leaves only
minimal room for growth.
[Figure 1-7: Two-tier (core-edge) SAN topology, with servers connected to edge switches and storage connected to core switches in two redundant fabrics]
Three-Tier Topology
In three-tier topology, also called edge-core-edge topology, servers connect to the edge
switches. Storage devices connect to one or more edge switches, as shown in Figure 1-8.
Core switches provide storage services to one or more edge switches, thus servicing more
servers and storage in the fabric. ISLs have to be designed in such a way that enough
links/bandwidth are available between the switches to avoid congestion when servers
communicate with storage devices. High availability is achieved using two physically
separate, but identical, redundant SAN fabrics. In this topology, a core switch is used
exclusively for edge switch interconnections, allowing easy SAN expansion by storage
size (adding more edge switches connected to storage) or by computing power (adding
more edge switches connected to servers) independently.
[Figure 1-8: Three-tier (edge-core-edge) SAN topology, with servers and storage on separate edge switches interconnected through the core switches in two redundant fabrics]
We will discuss Cisco UCS in detail in Chapter 16, “Describing Cisco UCS Components.”
However, Figure 1-9 provides a sneak peek at the Cisco UCS anatomy. Cisco UCS is built
using the hierarchy of components illustrated in Figure 1-9.
Each Cisco UCS domain is established with a pair of Cisco UCS fabric interconnects,
with a comprehensive set of options for connecting blade, rack, multinode, and storage
servers to them either directly or indirectly. Cisco UCS fabric interconnects provide a sin-
gle point of connectivity and management for an entire Cisco UCS system. Deployed as
an active-active pair, the system’s fabric interconnects integrate all components into a sin-
gle, highly available management domain. Cisco fabric interconnects support low-latency,
line-rate, lossless Ethernet and Fibre Channel over Ethernet (FCoE) connectivity. A pair
of Cisco UCS fabric interconnects forms the single point of connectivity for a Cisco UCS
domain. Blade servers connect through fabric extenders, most rack servers can connect
through optional fabric extenders, and rack servers, multinode rack servers, and storage
servers can connect directly to the fabric interconnects, as illustrated in Figure 1-10.
Figure 1-10 Cisco UCS Connectivity Options for Blade, Rack, and Storage Servers
Cisco UCS virtual interface cards (VICs) extend the network fabric directly to both serv-
ers and virtual switches so that a single connectivity mechanism can be used to connect
both physical and virtual servers with the same level of visibility and control.
Cisco UCS B-Series blade servers provide massive amounts of computing power in a
compact form factor, helping increase density in computation-intensive and enterprise
application environments.
Cisco UCS C-Series rack servers can integrate into Cisco UCS through Cisco UCS fabric
interconnects or be used as standalone servers with Cisco or third-party switches. These
servers provide a wide range of I/O, memory, internal disk, solid-state disk (SSD) drive
and Non-Volatile Memory Express (NVMe) storage device capacity, enabling you to
easily match servers to workloads. Cisco UCS C4200 Series multinode rack servers are
designed for clustered workloads where high core density is essential.
Cisco UCS S-Series storage servers are modular servers that support up to 60 large-form-
factor internal drives to support storage-intensive workloads, including big data, content
streaming, online backup, and Storage as a Service applications. The servers support one
or two computing nodes with up to two CPUs each, connected to a system I/O controller
that links the server to the network. These servers offer the flexibility of compute pro-
cessing to balance the needed storage for workloads like big data, data protection, and
software-defined storage.
Multiple management tools are available that use Cisco UCS Unified API to manage
Cisco UCS. Cisco UCS Manager and Cisco Intersight are two of the most widely used
management tools.
Cisco Intersight Software as a Service provides a consistent management interface for all
your Cisco UCS instances, Cisco HyperFlex clusters, edge deployments, and standalone
rack servers, regardless of their location. You can access the Intersight platform through
the cloud or through an optional management appliance. Intersight is designed to inte-
grate management capabilities with a broader set of features, including a recommendation
engine, integration with Cisco Technical Assistance Center (TAC), contract management,
inventory management, and alerts.
Converged Infrastructure
Rapidly changing demand and workload diversity have forced IT organizations to
assemble large sets of hardware and software components—and spend time integrating them
into a workable solution. Cisco Converged Infrastructure is a powerful and unique inte-
gration of high-performance computing, networking, and storage coupled with resource
and performance management software, multicloud orchestration, and support for mul-
tiple deployment models.
Cisco’s Converged Infrastructure core components are built using Cisco Unified
Computing System (Cisco UCS), Cisco Nexus switches, and Cisco Intersight Software as
a Service management and enterprise-approved storage systems to offer a range of con-
verged solutions to meet an enterprise’s needs. Figure 1-11 shows the Cisco Converged
Infrastructure portfolio, including solutions for FlexPod with NetApp, FlashStack with
Pure Storage, Cisco and Hitachi Adaptive Solutions, VersaStack with IBM, and VxBlock
Systems with Dell EMC.
The following list provides some of the many benefits of Converged Infrastructure:
■ Lower total cost of ownership: The management teams recognize that opera-
tions—including people, management, software, and facilities—are the largest cost
in the data center, far greater than the cost of underlying hardware. Cisco Converged
Infrastructure solutions help to get more work done with the same resources and
enable the staff to reduce the amount of time they spend “keeping the lights on.”
With these innovative solutions, the IT staff can consolidate more workloads onto
fewer servers so there are fewer components to buy and manage. All components are
connected through a unified fabric that delivers high-performance data and storage
networking to simplify deployment, help ensure the quality of the user experience,
and reduce operating costs. These solutions also reduce cabling, power, and cooling
requirements and automate routine tasks to increase productivity.
■ Reduced risk: Deployment delays and service disruptions affect the company’s prof-
itability. That’s why it is important to deploy infrastructure correctly the first time.
Converged Infrastructure reduces risk and guesswork by giving the architects and
administrators a guidebook for implementing solutions. Converged Infrastructure
solutions are pre-validated and documented so that you can get environments up and
running quickly and with confidence.
Before we discuss HCI in detail, let’s first see what the difference is between Converged
and Hyperconverged Infrastructures. Converged and hyperconverged systems both aim
to simplify data center management. Converged Infrastructure has the same compo-
nents, but they’re discrete, separable, and cumbersome to manage compared with HCI.
Hyperconverged Infrastructure fully integrates all components and is software-defined.
In essence, HCI is designed to work as one system with software-managed storage, as
opposed to converged solutions and their separate components. HCI delivers deeper
abstraction and higher automation and scalability than Converged Infrastructure. HCI
simplifies administration by providing a single point of management. HCI fully integrates
with the entire data center, eliminating the need for separate servers and network storage
and delivering on-demand infrastructure for data-centric workloads.
Cisco HyperFlex engineered on the Cisco UCS is an HCI solution. Cisco HyperFlex
systems with Intel Xeon Scalable Processors deliver hyperconvergence with power and
simplicity for any application, on any cloud, anywhere. Cisco HyperFlex includes hybrid,
all-flash, all-NVMe, and edge configurations, an integrated network fabric, and power-
ful data optimization features that bring the full potential of hyperconvergence to a
wide range of workloads and use cases. Cisco UCS fabric interconnects provide a single
point of connectivity integrating Cisco HyperFlex HX-Series all-flash, all-NVMe, or
hybrid nodes and other Cisco UCS servers into a single unified cluster. More on Cisco
HyperFlex can be found in Chapter 16.
Figure 1-12 shows a typical Cisco HyperFlex system with virtualized and containerized
applications support.
The following list provides some of the many benefits of hyperconverged infrastructure:
■ Lower cost: Integrating components into one platform reduces storage footprint,
power use, maintenance costs, and total cost of ownership (TCO). Hyperconverged
systems eliminate the need to overprovision to accommodate growth and help enable
data centers to scale in small, easily managed steps.
including self-encrypting drives and tools that provide high levels of visibility.
Backup and disaster recovery are also built in.
Different products corresponding to the Unified Management pillar of the Cisco Unified
Data Center platform, such as Cisco UCS Manager, Cisco UCS Central, and so on, will be
discussed in detail in Chapter 17, “Describing Cisco UCS Abstraction.” Cisco Intersight
will be discussed in Chapter 21, “Cloud Computing,” and Cisco Nexus Dashboard in
Chapter 9, “Operating Cisco ACI.”
Summary
This chapter discusses data center basics, Cisco’s Unified Data Center platform architec-
ture, and core components of data center infrastructure, including the following points:
■ A data center is a physical facility that organizations use to house their critical appli-
cations and data.
■ As per the ANSI/TIA-942 standard, data centers are categorized into four tiers,
depending on the levels of redundancy and fault tolerance.
■ There are four main types of data centers, depending on whether they are owned by
one or many organizations, how they fit (if they fit) into the topology of other data
centers, and what technologies they use for computing and storage. These are enter-
prise data centers, managed services data centers, colocation data centers, and cloud
data centers.
■ The Cisco Unified Data Center is a comprehensive, fabric-based platform that inte-
grates computing, networking, security, virtualization, and management solutions in
a single, highly efficient, and simplified architecture.
■ The Cisco Unified Data Center is based on three pillars: Unified Fabric, Unified
Computing, and Unified Management.
■ A data center network infrastructure can be built using traditional three-tier network
architecture or two-tier Clos architecture.
■ A data center storage infrastructure can be built using single-tier, two-tier, or three-
tier architecture, depending on the size of the organization and taking into account
future growth consideration.
■ A data center computing infrastructure is built using Cisco UCS as its core
component.
References
“Cisco Unified Data Center: Simplified, Efficient, and Agile Infrastructure for the Data
Center,” https://www.cisco.com/c/dam/global/en_in/solutions/collateral/ns340/
ns517/ns224/solution_overview_c22-700817.pdf
“Cisco Data Center Spine-and-Leaf Architecture: Design Overview White Paper,” https://
www.cisco.com/c/en/us/products/collateral/switches/nexus-7000-series-switches/
white-paper-c11-737022.html
“Cisco Unified Computing System Solution Overview,” https://www.cisco.com/c/en/us/
products/collateral/servers-unified-computing/solution-overview-c22-744677.html
“Cisco HyperFlex Systems Solution Overview,” https://www.cisco.com/c/en/us/products/
collateral/hyperconverged-infrastructure/hyperflex-hx-series/solution-overview-
c22-744674.html
Relevant Cisco Live sessions: http://www.ciscolive.com
Chapter 2

Describing the Cisco Nexus Family and Cisco NX-OS Software
In this chapter, we will discuss the Cisco Nexus Data Center products, Cisco Fabric
Extenders (FEXs), and the Cisco NX-OS software architecture, and we will conclude by
exploring the Cisco NX-OS CLI.
The Cisco Nexus 9000 Series switches include the Nexus 9500 Series modular switches
and the Nexus 9200/9300 Series fixed switches, as shown in Figure 2-1.
The Cisco Nexus 9500 Series modular switch includes the components shown in
Figure 2-2.
The Cisco Nexus 9500 Series modular switches support several line cards and fabric
modules. A pair of redundant supervisor modules manage all switch operations using
a state-synchronized, active-standby model. The supervisor accepts an external clock
and supports management through multiple ports: two USB ports, a serial port, and a
10/100/1000Mbps Ethernet port. All supervisors support Cisco ACI or NX-OS deploy-
ments. Redundant supervisors should be of the same type within a chassis.
A pair of redundant system controllers offload chassis management functions from the
supervisor modules. The controllers are responsible for managing the power supplies and
fan trays; they are also the central point for the Gigabit Ethernet Out-of-Band Channel
(EOBC) between the supervisors, fabric modules, and line cards.
Each Cisco Nexus 9500 Series chassis supports up to six fabric modules, which plug in
vertically at the back of the chassis behind the fan trays.
The Cisco Nexus 9500 chassis supports two versions of hot-swappable fan trays that are
compatible with specific fabric modules. Each fan tray covers two fabric module slots
and enables front-to-back airflow for the entire chassis. An appropriate fabric module
blank card should be installed in all empty fabric module slots to ensure proper airflow
and cooling of the chassis.
The Cisco Nexus 9500 platform supports hot-swappable, front-panel-accessible AC, DC,
and universal high-voltage AC/DC power supplies. The total power budget required for
the mix and number of line cards and fabric modules installed in the chassis determines
the ability to support power supply redundancy modes (combined, n + 1, n + n, or input-
source redundancy).
Note Cisco introduces new models of line cards and fabric modules with enhanced
capacity and feature set from time to time. For the latest supported line cards and fabric
modules in 9500 Series modular chassis, refer to the datasheet located at https://www.
cisco.com/c/en/us/products/switches/nexus-9000-series-switches/datasheet-listing.html.
Table 2-1 summarizes the Cisco Nexus 9300 platform switch models.
Table 2-1 Cisco Nexus 9300 Platform Switch Models

Model                        Description
Cisco Nexus 93180YC-EX       48 x 1/10/25Gbps fiber ports and 6 x 40/100Gbps QSFP28 ports
Cisco Nexus 93180YC-FX       48 x 1/10/25Gbps fiber ports and 6 x 40/100Gbps QSFP28 ports
Cisco Nexus 93180YC-FX3      48 x 1/10/25Gbps fiber ports and 6 x 40/100Gbps QSFP28 ports
Cisco Nexus 93180YC-FX3S     48 x 1/10/25Gbps fiber ports and 6 x 40/100Gbps QSFP28 ports
Cisco Nexus 93240YC-FX2      48 x 1/10/25Gbps fiber ports and 12 x 40/100Gbps QSFP28 ports
Cisco Nexus 93360YC-FX2      96 x 1/10/25Gbps fiber ports and 12 x 40/100Gbps QSFP28 ports
Cisco Nexus 9364C            64 x 40/100Gbps QSFP28 ports and 2 x 1/10Gbps SFP+ ports
Cisco Nexus 9336C-FX2        36 x 40/100Gbps QSFP28 ports
Cisco Nexus 9336C-FX2-E      36 x 40/100Gbps QSFP28 ports
Cisco Nexus 9332C            32 x 40/100Gbps QSFP28 ports and 2 x 1/10Gbps SFP+ ports
Cisco Nexus 9364C-GX         64 x 100/40Gbps QSFP28 ports
Note Quad Small Form Factor Pluggable Double Density (QSFP-DD) transceivers can
support up to 400Gbps bandwidth, whereas QSFP+ and QSFP28 support a maximum
bandwidth of 40Gbps and 100Gbps, respectively.
Table 2-2 summarizes the Cisco Nexus 9200 platform switch models.
Note Refer to the following link for comparing hardware capabilities of various Nexus
9000 Series switches: https://www.cisco.com/c/en/us/products/switches/
nexus-9000-series-switches/models-comparison.html
The supervisor module delivers control-plane and management functions. The supervi-
sor controls the Layer 2 and Layer 3 services, redundancy capabilities, configuration
management, status monitoring, power and environmental management, and more. It pro-
vides centralized arbitration to the system fabric for all line cards. The fully distributed
forwarding architecture allows the supervisor to support transparent upgrades to I/O and
fabric modules with greater forwarding capacity. Two supervisors are required for a fully
redundant system, with one supervisor module running as the active device and the other
in hot-standby mode, providing exceptional high-availability features such as stateful
switchover and In-Service Software Upgrade (ISSU) on mission-critical data center–class
products.
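As a small hedged illustration, on a running chassis the supervisor placement and redundancy state can be checked with standard Cisco NX-OS show commands such as the following (output not shown here):

show module
show system redundancy status

show module lists the installed supervisor, I/O, and fabric modules with their status, and show system redundancy status reports which supervisor is active and whether the standby is ready for a stateful switchover.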
The fabric modules provide parallel fabric channels to each I/O and supervisor module
slot. All fabric modules connect to all module slots. The addition of each fabric module
increases the bandwidth to all module slots up to the system limit of five fabric modules
for 7000 platform switches and six fabric modules for 7700 platform switches. The archi-
tecture supports lossless fabric failover, with the remaining fabric modules load-balancing
the bandwidth to all the I/O module slots, helping ensure graceful removal and insertion.
■ Ten-slot chassis with ten front-accessible vertical module slots, front-to-back air-
flow, and an integrated cable management system.
■ Nine-slot chassis with nine front-accessible module slots and side-to-side airflow in a com-
pact horizontal form factor with purpose-built integrated cable management.
■ Four-slot chassis with all front-accessible module slots and side-to-back airflow in a
small form factor with purpose-built integrated cable management.
All Cisco Nexus 7000 Series chassis use a passive mid-plane architecture, providing
physical connectors and copper traces for interconnecting the fabric modules and the I/O
modules for direct data transfer. All intermodule switching is performed via the crossbar
fabric ASICs on the individual I/O modules and fabric modules. In the case of Cisco
Nexus 7004 chassis, since there are no fabric modules, the mid-plane provides the con-
nectors and traces to interconnect the fabric ASICs on the I/O modules directly.
The midplane design on the 9-slot, 10-slot, and 18-slot chassis and the backplane design
on the 4-slot chassis support flexible technology upgrades as your needs change, provid-
ing ongoing investment protection.
With more than 83Tbps of overall switching capacity, the Cisco Nexus 7700 switches
deliver the highest-capacity 10-, 40-, and 100-Gigabit Ethernet ports, with up to 768
native 10Gbps ports, 384 40Gbps ports, or 192 100Gbps ports. This high system capac-
ity is designed to meet the scalability requirements of the largest cloud environments.
Powered by Cisco NX-OS, the Cisco Nexus 7700 switches deliver a comprehensive set
of features with nonstop operations in four chassis form factors (that is, 2-, 6-, 10-, and
18-slot), as shown in Figure 2-4. All 7700 chassis have front-accessible module slots with
front-to-back airflow and an integrated cable management system.
A scalable, fully distributed fabric architecture uses up to six fabric modules to deliver up
to 1.32Tbps per slot of bandwidth in the Cisco Nexus 7700 6-, 10-, and 18-slot switches
on day one. In the case of the Cisco Nexus 7700 2-slot chassis, the fabric modules are
not required since it uses a single I/O module. The midplane design on the 2-, 6-, 10-, and
18-slot chassis supports flexible technology upgrades as your needs change, providing
ongoing investment protection.
Note Cisco introduces new supervisor modules, I/O modules, and fabric modules
with enhanced capacity and feature sets from time to time. For the latest supported line
cards and fabric modules in 7000 Series modular chassis, refer to the datasheet located at
https://www.cisco.com/c/en/us/products/switches/nexus-7000-series-switches/datasheet-
listing.html.
The Cisco Nexus 3000 Series switches include the Nexus 3100/3200/3400/3500 and
3600 platform fixed switches.
Table 2-3 summarizes the Cisco Nexus 3000 series switch models.
and/or blades) with Cisco Adapter FEX and Cisco Data Center VM-FEX technologies.
Although this chapter is for the Cisco Nexus family of products, we will also introduce
the fabric extender products used on the Unified Compute side so that you know what
fabric extender products exist today.
The Cisco Nexus 2200 platform fabric extenders behave like remote line cards for a par-
ent Cisco Nexus switch. The fabric extenders are essentially extensions of the parent
Cisco Nexus switch fabric, with the fabric extenders and the parent switch together form-
ing a distributed modular system. This architecture enables physical topologies with the flexibility and benefits of both ToR (top-of-rack) and EoR (end-of-row) deployments.
The Cisco Nexus 2200 platform fabric extenders provide two types of ports: ports for
end-host attachment (host interfaces) and uplink ports (fabric interfaces). Fabric interfac-
es, differentiated with a yellow color, are for connectivity to the upstream parent Cisco
Nexus switch.
Figure 2-6 shows various Cisco Nexus 2200 platform fabric extenders.
Figure 2-6 Cisco Nexus 2200 Platform Fabric Extenders: Cisco Nexus 2232TM-E (Top Left), Cisco Nexus 2248TP-E (Top Right), and Cisco Nexus 2232PP (Bottom)
Table 2-4 summarizes the Cisco Nexus 2200 platform fabric extender models.
The Cisco Nexus 2300 platform maintains all the existing Cisco Nexus 2200 Series fea-
tures, including a single point of management, high availability with virtual port channels
(vPC), vPC+, Enhanced vPC, and LAN and SAN convergence using Fibre Channel over
Ethernet (FCoE).
Figure 2-7 shows various Cisco Nexus 2300 platform fabric extenders.
Figure 2-7 Cisco Nexus 2300 Platform Fabric Extenders: Cisco Nexus 2332TQ (Top
Left), Cisco Nexus 2348UPQ (Middle Left), Cisco Nexus 2348TQ (Bottom Left), and Cisco
Nexus 2348TQ-E (Right)
Table 2-5 summarizes the Cisco Nexus 2300 platform fabric extender models.
The Cisco Nexus B22 blade fabric extender behaves like a remote line card for a parent
Cisco Nexus switch, together forming a distributed modular system. This architecture
simplifies data center access operations and architecture by combining the management
simplicity of a single high-density access switch with the cabling simplicity of integrated
blade switches and ToR access switches.
The Cisco Nexus B22 provides two types of ports: ports for blade server attachment
(host interfaces) and uplink ports (fabric interfaces). Fabric interfaces, located on the front
of the Cisco Nexus B22 module, are for connectivity to the upstream parent Cisco Nexus
switch.
The Cisco Nexus B22 comes in four models, as shown in Figure 2-8.
Figure 2-8 Cisco Nexus B22 Blade Fabric Extender for Dell (Left), IBM (Middle), HP
(Top Right), and Fujitsu (Bottom Right)
Table 2-6 summarizes the Cisco Nexus B22 fabric extender models.
Cisco FEX technology offers a flexible solution for adding access port connectivity to
the data center network as well as simplifying migration from 100Mbps or 1Gbps to
10Gbps at the server access layer. Cisco FEX provides flexibility and simplified cabling
of ToR designs as well as simplified management of EoR designs. Finally, unified ports
support flexible LAN and SAN deployments via Ethernet, Fibre Channel, and FCoE con-
nectivity.
The fabric extender integrates with its parent switch, which is a Cisco Nexus Series
device, to allow automatic provisioning and configuration taken from the settings on the
parent device. This integration allows large numbers of servers and hosts to be supported
by using the same feature set as the parent device with a single management domain. The
fabric extender and its parent switch enable a large multipath, loop-free data center topol-
ogy without the use of the Spanning Tree Protocol (STP).
The Cisco Nexus 2000 Series fabric extender forwards all traffic to its parent Cisco
Nexus Series device over 10-Gigabit Ethernet fabric uplinks, which allows all traffic to
be inspected by policies established on the Cisco Nexus Series device. No software is
included with the fabric extender. The software is automatically downloaded and upgrad-
ed from its parent device.
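The following is a minimal provisioning sketch on a parent switch; the FEX ID (101), the fabric port (Ethernet 1/47), and the prompts are illustrative assumptions rather than values from this chapter's examples.

! Enable the FEX feature set and associate a FEX ID to a fabric port (values assumed).
N9K(config)# install feature-set fex
N9K(config)# feature-set fex
N9K(config)# interface ethernet 1/47
N9K(config-if)# switchport mode fex-fabric
N9K(config-if)# fex associate 101
! After the FEX comes online, verify it from EXEC mode.
N9K# show fex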
■ Less flexible “per row” architecture. Platform upgrades/changes affect entire row.
■ More switches to manage. More ports are required in the aggregation switches.
■ Unique control plane per switch. Higher skill set needed for switch replacement.
■ Combines the benefits of both ToR and EoR architectures: high-density server
aggregation switch that physically resides on the top of each rack but logically acts
like an end-of-row access switch
FEX Forwarding
The Cisco Nexus 2000 (that is, the 2200 or 2300 platform) Series fabric extender does
not perform any local switching. All traffic is sent to the parent switch, which provides
central forwarding and policy enforcement, including host-to-host communications
between two systems that are connected to the same fabric extender, as shown in
Figure 2-12.
Figure 2-12 shows the hosts attached to the fabric extender host interfaces, with the fabric extender uplinked over its fabric interface to the parent switch.
The forwarding model facilitates feature consistency between the fabric extender and its
parent Cisco Nexus device. The fabric extender provides end-host connectivity into the
network fabric. As a result, BPDU Guard is enabled on all its host interfaces. If you con-
nect a bridge or switch to a host interface, that interface is placed in an error-disabled
state when a BPDU is received. You cannot disable BPDU Guard on the host interfaces of
the fabric extender.
The fabric extender supports egress multicast replication from the network to the host.
Packets sent from the parent switch for multicast addresses attached to the fabric extend-
er are replicated by the fabric extender ASICs and then sent to corresponding hosts.
Two methods (the static pinning fabric interface connection and the dynamic fabric inter-
face connection) allow the traffic from an end host to the parent switch to be distributed
when going through the Cisco Nexus 2000 Series fabric extender.
Static Pinning
Static pinning provides a deterministic relationship between the host interfaces and the
parent switch. You configure the fabric extender to use individual fabric interface con-
nections. In this configuration, the 10-Gigabit Ethernet fabric interfaces connect to the
parent switch, as shown in Figure 2-13. You can use any number of fabric interfaces, up
to the maximum available on the model of the fabric extender.
Figure 2-13 shows the fabric extender connected to the parent switch over four individual fabric interfaces (Fabric I/F 1 through Fabric I/F 4), with the hosts attached to the host interfaces.
When the fabric extender is brought up, its host interfaces are distributed equally among
the available fabric interfaces. As a result, the bandwidth that is dedicated to each end
host toward the parent switch is never changed by the switch but instead is always speci-
fied by you.
Note The FEX static pinning fabric interface connection is not supported on Nexus 7000
and 9000 Series switches. Static pinning was supported on Nexus 5000 and 6000 Series
switches, which are end-of-life/end-of-sale at the time of this writing. We have covered it
here, as Nexus 5000 and 6000 Series switches are still deployed at scale at various data
centers.
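For completeness, the following is a minimal static pinning sketch on a Nexus 5500 parent; the FEX ID, the interface range, and the max-links value are assumptions for illustration.

N5K(config)# feature fex
N5K(config)# fex 101
N5K(config-fex)# pinning max-links 4
N5K(config-fex)# exit
N5K(config)# interface ethernet 1/1-4
N5K(config-if-range)# switchport mode fex-fabric
N5K(config-if-range)# fex associate 101
! From EXEC mode, host interfaces can be repinned if fabric links are added later.
N5K# fex pinning redistribute 101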
Dynamic Pinning
Dynamic pinning provides load balancing between the host interfaces and the parent
switch. You configure the fabric extender to use a port channel fabric interface connec-
tion. This connection bundles 10-Gigabit Ethernet fabric interfaces into a single logical
channel, as shown in Figure 2-14.
Figure 2-14 shows the fabric interfaces between the fabric extender and the parent switch bundled into a single port channel fabric interface, with the hosts attached to the host interfaces.
When you configure the fabric extender to use a port channel fabric interface connection
to its parent switch, the switch load balances the traffic from the hosts that are connected
to the host interface ports by using the following load-balancing criteria to select
the link:
■ For a Layer 2 frame, the switch uses the source and destination MAC addresses.
■ For a Layer 3 frame, the switch uses the source and destination MAC addresses and
the source and destination IP addresses.
A fabric interface that fails in the port channel does not trigger a change to the host inter-
faces. Traffic is automatically redistributed across the remaining links in the port channel
fabric interface. If all links in the fabric port channel go down, all host interfaces on the
FEX are set to the down state.
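The following is a minimal dynamic pinning sketch on a Nexus 9000 or 7000 parent; the FEX ID (102), the fabric ports (Ethernet 1/47-48), and the port channel number are assumptions for illustration.

! Bundle the fabric interfaces into a port channel and associate the FEX to it.
N9K(config)# interface ethernet 1/47-48
N9K(config-if-range)# switchport mode fex-fabric
N9K(config-if-range)# fex associate 102
N9K(config-if-range)# channel-group 102
N9K(config-if-range)# interface port-channel 102
N9K(config-if)# switchport mode fex-fabric
N9K(config-if)# fex associate 102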
Note Nexus 2000 Series fabric extenders support various topologies using dynamic pin-
ning. Although covering all the supported/unsupported topologies is beyond the scope of
this chapter, it is highly recommended that you refer to the following link to know more
about them: https://www.cisco.com/c/en/us/support/docs/switches/nexus-2000-series-
fabric-extenders/200363-nexus-2000-fabric-extenders-supported-un.html.
The figure shows a VN-Tag being added to the Ethernet frame between the fabric extender and the parent switch. The tag carries a 16-bit Ethertype; the d, p, l, and r bits plus a version (ver) field; a 14-bit Dvif_id or vif_list_id; and a 12-bit Svif_id.
■ Ethertype: This field identifies a VN-Tag frame. IEEE reserved the value 0x8926 for
Cisco VN-Tag.
■ Direction bit (d): A 0 indicates that the frame is traveling from the FEX to the parent
switch. A 1 means that the frame is traveling from the parent switch to the FEX.
■ Pointer bit (p): A 1 indicates that a Vif_list_id is included in the tag. A 0 signals that
a Dvif_id is included in the frame.
■ Virtual Interface List Identifier (Vif_list_id): This is a 14-bit value mapped to a list
of host interfaces to which this frame must be forwarded.
■ Looped bit (l): This field indicates a multicast frame that was forwarded out the
switch port and later received. In this case, the FEX checks the Svif_id and filters the
frame from the corresponding port.
■ Source Virtual Interface Identifier (Svif_id): This is a 12-bit value mapped to the
host interface that received this frame (if it is going from the FEX to the parent
switch).
■ Host Interfaces (HIFs): These are physical user/host interfaces on the FEX. These
interfaces receive normal Ethernet traffic before it is encapsulated with the VN-Tag
header. Each HIF interface is assigned a unique VN-Tag ID that is used with the
encapsulation.
■ Logical Interface (LIF): This is a logical interface representation of a HIF and its
configuration on the parent switch. Forwarding decisions are based on the LIF.
■ Virtual Interface (VIF): This is a logical interface on the FEX. The parent switch assigns/pushes the configuration of a LIF to the VIF of an associated FEX, which is mapped to a physical HIF. This is why replacing a FEX is trivial: the broken FEX is unplugged, the replacement is plugged in, and the parent switch pushes the same configuration to it.
The figure shows the LIF defined on the parent switch, the network interface (NIF) uplink between the parent switch and the fabric extender, and the VIF on the fabric extender mapped to a physical HIF.
Cisco NX-OS provides a robust and comprehensive feature set that fulfills the switching
and storage networking needs of present and future data centers. Cisco NX-OS runs on
the Cisco Nexus family of network switches, which include Cisco Nexus 9000, 7000,
and 3000 Series switches, Cisco Nexus 2000 Series fabric extenders, and the Cisco MDS
family of storage network switches. A single Cisco NX-OS image runs on Cisco Nexus
switching platforms—Nexus 9000 and Nexus 3000 series switches based on Cisco Cloud
Scale ASICs and merchant silicon ASICs.
The figure shows the NX-OS software stack: the Linux kernel and hardware drivers at the bottom, the netstack and the HA and management infrastructure above them, and the individual features on top, with each layer exposed through APIs.
■ The Linux kernel brings the benefits of Linux to NX-OS, such as preemptive multi-
tasking and multithreading, and is the foundation on which all other processes run
on NX-OS. It has multi-CPU/core support.
■ Hardware drivers are chipset-specific code and provide a hardware abstraction layer
(HAL).
■ Message & Transaction Service (MTS): MTS is the message relay system for
inter-process communication (IPC) and provides reliable unicast and multicast
delivery. MTS is used for service-to-service and module-to-module messaging.
■ Architectural flexibility
■ Support for Layer 3 (v4/v6) unicast and multicast routing protocol suites such as BGP, OSPF, EIGRP, PIM-SM, SSM, and MSDP.
■ Support for VXLAN EVPN overlay fabrics, including VXLAN EVPN vPC fabric
peering for an enhanced dual-homing access solution.
■ Extensive programmability
■ Pervasive APIs for all-switch CLI functions with NX-API (JSON-based RPC over HTTP/HTTPS); see the enablement sketch after this list.
■ Pervasive visibility
■ Support for a flexible NetFlow feature that enables enhanced detection of network anomalies and security threats.
■ Support for monitoring real-time flows, flow paths, and latency, which allows
organizations to gain invaluable visibility into their fabrics with Cisco Nexus
Insights.
■ Enables service-level high availability with (a) process isolation and (b) process
restartability. Process isolation provides a highly fault-tolerant software infra-
structure and fault isolation between the services. Cisco NX-OS processes run
in protected memory spaces that are independent of each other and the kernel.
Process restartability ensures that process-level failures do not cause system-level
failure.
■ Network modeling
■ Cisco Nexus 9000v switch (virtual NX-OS), with both 9300 and 9500 form fac-
tors, extends automation and operational models for DevOps and NetOps inte-
gration, with images built for Vagrant, VMware ESXi, KVM, and Fusion.
■ Extensive support for Nexus 9000v is available in Cisco Virtual Internet Routing Lab (Cisco VIRL) and Cisco Modeling Labs (CML).
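As referenced in the programmability bullet, NX-API must be enabled before the switch accepts JSON-RPC calls over HTTP/HTTPS. A minimal enablement sketch follows; the prompts and the port value are illustrative.

N9K(config)# feature nxapi
N9K(config)# nxapi https port 443
N9K# show nxapi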
Note The Cisco NX-OS naming convention differs among the various Nexus product families. For example, Nexus 7000 and MDS switches follow one naming convention, whereas Nexus 9000 and Nexus 3000 switches follow another. Refer to the following links for the NX-OS naming conventions of the different Nexus products:
■ Nexus 7000 and MDS switches: https://tools.cisco.com/security/center/resources/ios_nx_os_reference_guide
Global configuration mode provides access to the broadest range of commands. The term
indicates characteristics or features that affect the device as a whole. You can enter com-
mands in global configuration mode to configure your device globally or to enter more
specific configuration modes to configure specific elements such as interfaces or proto-
cols.
Example 2-2 illustrates global configuration mode. The global configuration mode can be
enabled using the command configure terminal.
To configure interfaces on your device, you must specify the interface and enter interface
configuration mode from global configuration mode, as shown in Example 2-3. You can
enable many features on a per-interface basis. Interface configuration commands modify
the operation of the interfaces on the device, such as Ethernet interfaces or management
interfaces (mgmt0).
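A minimal illustration of moving between these modes follows; the device prompt (N9K) and the interface chosen are assumptions.

N9K# configure terminal
Enter configuration commands, one per line. End with CNTL/Z.
N9K(config)# interface mgmt 0
N9K(config-if)# end
N9K#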
From global configuration mode, you can access a configuration submode for configur-
ing VLAN interfaces called subinterfaces. In subinterface configuration mode, you can
configure multiple virtual interfaces on a single physical interface. Subinterfaces appear
to a protocol as distinct physical interfaces. Subinterfaces also allow multiple encapsula-
tions for a protocol on a single interface. For example, you can configure IEEE 802.1Q
encapsulation to associate a subinterface with a VLAN.
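A minimal subinterface sketch follows; the parent interface, subinterface number, VLAN, and IP address are illustrative assumptions (the parent port must first be made a routed port).

N9K(config)# interface ethernet 1/10
N9K(config-if)# no switchport
N9K(config-if)# interface ethernet 1/10.100
N9K(config-subif)# encapsulation dot1q 100
N9K(config-subif)# ip address 192.0.2.1/24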
The Cisco NX-OS software allows you to save the current command mode, configure
a feature, and then restore the previous command mode. The push command saves the
command mode, and the pop command restores the command mode.
Example 2-5 shows how to save and restore a command mode using the push and pop
commands.
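A minimal sketch of push and pop follows; the interface and the feature command used in between are arbitrary choices for illustration.

N9K(config)# interface ethernet 1/2
N9K(config-if)# push
N9K(config-if)# exit
N9K(config)# feature interface-vlan
N9K(config)# pop
N9K(config-if)#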
To exit a configuration command mode, you can use either the exit or end command.
The exit command exits from the current configuration command mode and returns to
the previous configuration command mode, as shown in Example 2-6.
The end command exits from the current configuration command mode and returns to
EXEC mode, as shown in Example 2-7.
The output from show commands can be lengthy and cumbersome. The Cisco NX-OS
software provides the means to search and filter the output so that you can easily locate
information. The searching and filtering options follow a pipe character (|) at the end of
the show command, as shown in Example 2-8.
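A few representative filters are sketched here; the strings being searched for are arbitrary.

N9K# show running-config | include feature
<output omitted>
N9K# show interface brief | grep up
<output omitted>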
Table 2-8 lists command key combinations that can be used in both EXEC and configura-
tion modes.
Keystrokes Description
Ctrl-U Deletes all characters from the cursor to the beginning of the command line.
Ctrl-V Removes any special meaning for the following keystroke. For example, press Ctrl-V before entering a question mark (?) in a regular expression.
Ctrl-W Deletes the word to the left of the cursor.
Ctrl-X, H Lists the history of commands you have entered. When using this key combination, press and release the Ctrl and X keys together before pressing H.
Ctrl-Y Recalls the most recent entry in the buffer (press keys simultaneously).
Ctrl-Z Ends a configuration session and returns you to EXEC mode. When used at the end of a command line in which a valid command has been typed, the resulting configuration is first added to the running configuration file.
Up arrow key Displays the previous command in the command history.
Down arrow key Displays the next command in the command history.
Right arrow key, Left arrow key Moves your cursor through the command string, either forward or backward, allowing you to edit the current command.
? Displays a list of available commands.
Tab Completes the word for you after you enter the first characters of the word and then press the Tab key. All options that match are presented.
■ Command names
You can display all the options following a pipe character using the CLI context-sensitive
help (?) facility, as shown in Example 2-9.
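For instance, typing ? after the pipe lists the available filters (such as grep, egrep, section, head, last, no-more, and wc; the exact set varies by platform and release).

N9K# show running-config | ?
<output omitted>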
Pressing the Tab key completes the word for you. If there is ambiguity and there are mul-
tiple commands possible, the Tab key lists all the possible options that match, as shown
in Example 2-10.
N9K# conf<Tab>
N9K# configure
N9K# sh run
feature interface-vlan
feature lldp
<output omitted>
N9K# conf t
Enter configuration commands, one per line. End with CNTL/Z.
N9K(config)# int e 1/2
N9K(config-if)#
To change the running configuration, use the configure terminal command to enter
global configuration mode. As you use the Cisco NX-OS configuration modes, com-
mands generally are executed immediately and are saved to the running configuration file
either immediately after you enter them or when you exit a configuration mode.
To change the startup configuration file, you can either save the running configuration
file to the startup configuration using the copy running-config startup-config command
or copy a configuration file from a file server to the startup configuration.
Example 2-13 illustrates various Nexus device configurations and their impact on running
configuration and startup configuration files.
Example 2-13 (output abbreviated) adds the description TEST to interface Ethernet1/5. The new description appears immediately in the running configuration, but it appears in the startup configuration only after the running configuration is copied to it.
The write erase command erases the entire startup configuration, except for the following:
■ Boot variable definitions
■ The IPv4 configuration on the mgmt0 interface, including the address and subnet mask
To remove the boot variable definitions and the IPv4/IPv6 configuration on the mgmt0 interface, use the write erase boot command. To remove all application persistency files in the /etc directory other than the configuration (such as patch RPMs, third-party RPMs, and application configuration), use the install reset command.
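Command aliases map a short keyword to a full command. A minimal sketch of defining the wr alias used in the following output (prompts assumed):

N9K# configure terminal
N9K(config)# cli alias name wr copy running-config startup-config
N9K(config)# end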
N9K# wr
[########################################] 100%
Copy complete, now saving to disk (please wait)...
Copy complete.
N9K# show startup-config | include alias
cli alias name wr copy running-config startup-config
N9K#
! Under the Software section of the show version command output, you can see that the installed NX-OS software version is 9.3(5), and under the Hardware section you can see that the device model is C93180YC-FX.
Software
BIOS: version 05.45
NXOS: version 9.3(5)
BIOS compile time: 07/05/2021
NXOS image file is: bootflash:///nxos.9.3.5.bin
NXOS compile time: 7/20/2020 20:00:00 [07/21/2020 06:30:11]
Hardware
cisco Nexus9000 C93180YC-FX Chassis
Intel(R) Xeon(R) CPU D-1528 @ 1.90GHz with 24569356 kB of memory.
Processor Board ID FDO23510ELW
plugin
Core Plugin, Ethernet Plugin
Active Package(s):
N9K#
The show module command displays all the hardware modules installed in a modular
chassis. For fixed models, the show module command displays the preinstalled modules,
as illustrated in Example 2-16.
Mod Sw Hw Slot
--- ----------------------- ------ ----
1 9.3(5) 1.2 NA
Summary
This chapter discusses Cisco Nexus Data Center, Cisco Fabric Extender (FEX), Cisco
NX-OS software architecture, and the Cisco NX-OS CLI, including the following points:
■ The Cisco Nexus switches are a foundational component of the Cisco Data Center
and are well-suited for both traditional and fully automated software-defined data
center deployments.
■ The Cisco Nexus 9000 Series switches operate in one of two modes: Cisco
Application Centric Infrastructure (Cisco ACI) or Cisco NX-OS. The Cisco Nexus
9000 Series switches include the Nexus 9500 Series modular switches and the Nexus
9200/9300 Series fixed switches.
■ Cisco Nexus 7000 Series switches are a modular data center–class product line
designed for highly scalable 1/10/40/100-Gigabit Ethernet networks. The Cisco
Nexus 7000 Series switches include the Nexus 7000 and 7700 platform modular
switches.
■ The Cisco Nexus 3000 Series switches are a comprehensive portfolio of 1-, 10-, 40-,
100-, and 400-Gigabit Ethernet switches built from a switch-on-a-chip (SoC) archi-
tecture.
■ The Cisco FEX technology solution is based on the IEEE 802.1BR standard and includes
a parent switch that can be a Nexus 7000 Series switch, Nexus 9000 Series switch,
or a Cisco UCS Fabric Interconnect. The parent switch is then extended to connect
to the server either as a remote line card with Nexus 2200/2300 Series fabric extend-
ers or logically partitioned or virtualized adapter ports to connect to any type of
server (rack and/or blades) with Cisco Adapter FEX and Cisco Data Center VM-FEX
technologies.
■ There are three major server deployment models: end-of-row (EoR) deployment
model, top-of-rack (ToR) deployment model, and Fabric Extender (FEX) deployment
model. The FEX deployment model enables physical topologies with the flexibility
and benefits of both ToR and EoR deployments.
■ FEX supports two forwarding models: static pinning, which provides a deterministic
relationship between the host interfaces and the parent switch, and dynamic pinning,
which provides load balancing between the host interfaces and the parent switch.
■ Cisco NX-OS software is a data center–class operating system built with modularity,
resiliency, and serviceability at its foundation and runs on the Cisco Nexus family
of network switches, which include Cisco Nexus 9000, 7000, 3000 Series switches,
Cisco Nexus 2000 Series fabric extenders, and the Cisco MDS family of storage net-
work switches.
■ Cisco NX-OS CLI supports various configuration modes such as EXEC mode, global
configuration mode, and subinterface configuration mode to configure specific
elements.
■ Cisco NX-OS CLI supports various features, such as command abbreviations and
command aliases, that help an administrator to configure and deploy the device
quickly with less effort.
References
Relevant Cisco Nexus Switches Data Sheets: https://www.cisco.com/c/en/us/products/
switches/nexus-9000-series-switches/datasheet-listing.html
“Cisco Nexus 9000 NX-OS Fundamentals Configuration Guide, Release 10.2(x),” https://
www.cisco.com/c/en/us/td/docs/dcn/nx-os/nexus9000/102x/configuration/funda-
mentals/cisco-nexus-9000-nx-os-fundamentals-configuration-guide-102x.html
Relevant Cisco Live sessions: http://www.ciscolive.com
“Cisco Nexus 2000 Series NX-OS Fabric Extender Configuration Guide for Cisco Nexus
9000 Series Switches, Release 10.2(x),” https://www.cisco.com/c/en/us/td/docs/
dcn/nx-os/nexus9000/102x/configuration/fex/cisco-nexus-2000-series-nx-os-fabric-
extender-configuration-guide-for-cisco-nexus-9000-series-switches-release-102x.
html
“Cisco Nexus 2000 Series Fabric Extender Software Configuration Guide for Cisco
Nexus 7000 Series Switches, Release 8.x,” https://www.cisco.com/c/en/us/td/docs/
switches/datacenter/nexus2000/sw/configuration/guide/b_Configuring_the_Cisco_
Nexus_2000_Series_Fabric_Extender_rel_8_x.html
“Cisco Adapter Fabric Extender: Solution Overview,” https://www.cisco.com/c/en/us/
products/collateral/switches/nexus-5000-series-switches/data_sheet_c78-657397.
html
Chapter 3
In this chapter, we will discuss the challenges presented by single default gateway config-
uration on end hosts, along with its solutions using Hot Standby Router Protocol (HSRP),
Virtual Router Redundancy Protocol (VRRP), and Gateway Load Balancing Protocol
(GLBP) as Layer 3 redundancy protocols.
Figure 3-1 illustrates the default gateway’s limitations. Router A is configured as the
default gateway for Host A. If Router A becomes unavailable, the routing protocols can
quickly and dynamically converge and determine that Router B can transfer packets to
the destination server that would otherwise have gone through Router A. However, most
workstations, servers, and printers do not receive this dynamic routing information.
Figure 3-1 shows Router A (10.1.10.2, MAC 0014.a855.1788) and Router B (10.1.10.3, MAC 0014.a866.2898) providing paths toward a server at 10.9.1.50. Host A has Router A as its only default gateway, so when Router A fails, Host A cannot reach its default gateway.
The default gateway limitation discussed earlier can be resolved by using router redun-
dancy. In router redundancy, multiple routers are configured to work together to pres-
ent the illusion of a single virtual router to the hosts on a particular IP segment. This is
achieved by sharing a virtual IP (Layer 3) address and a virtual MAC (Layer 2) address
between multiple routers. The IP address of the virtual router is configured as the default
gateway for the hosts on that particular IP segment.
Before an end host can send packets to a network other than its own, it uses Address Resolution Protocol (ARP) to resolve the MAC address
the MAC address of the virtual router. Frames that are sent to the MAC address of the
virtual router can be physically processed by an active router or standby router that is
part of that virtual router group, depending on the first-hop redundancy protocol used.
Host devices send traffic to the address of the virtual router. The physical router that for-
wards this traffic is transparent to the end stations. The redundancy protocol provides the
mechanism for determining which router should take the active role in forwarding traffic
and determining when a standby router must assume that role. In short, first-hop redun-
dancy provides a network the ability to dynamically recover from the failure of a device
acting as a default gateway.
Figure 3-2 shows a forwarding router and a standby router presenting a single virtual router toward the core.
In Figure 3-3, when the forwarding router fails, the standby router stops receiving hello
messages from the forwarding router. We will discuss the concept of hello messages in
router redundancy protocols later in this chapter. The standby router assumes the role
of the forwarding router and assumes the IP address and the MAC address of the virtual
router. During this failover, the end hosts see no disruption in service, as the end hosts are
still sending the data packets to the same virtual IP and MAC address of the default gate-
way configured on them.
Figure 3-3 shows the same pair after a failover: the former standby router is now the forwarding router for the virtual router.
Because hosts are configured with their default router as the HSRP virtual IP address,
hosts must communicate with the MAC address associated with the HSRP virtual IP
address. This MAC address is a virtual MAC address, 0000.0C07.ACxy, where xy is the
HSRP group number in hexadecimal based on the respective interface. For example,
HSRP group 1 uses the HSRP virtual MAC address 0000.0C07.AC01. Hosts on the
adjoining LAN segment use the normal Address Resolution Protocol (ARP) process to
resolve the associated MAC addresses.
Interfaces that run HSRP send and receive multicast User Datagram Protocol–based hello
messages to detect a failure and to designate active and standby routers. When the active
router fails to send a hello message within a configurable period of time, the standby
router with the highest priority becomes the active router. The transition of packet-for-
warding functions between the active and standby routers is completely transparent to all
hosts on the network.
HSRP hello packets are sent to the destination IP multicast address 224.0.0.2 (reserved
multicast address used to communicate to all routers) on User Datagram Protocol (UDP)
port 1985. The active router sources hello packets from its configured IP address and the
HSRP virtual MAC address, while the standby router sources hellos from its configured
IP address and the interface MAC address, which might be the burned-in address (BIA).
The BIA is the 6-byte MAC address assigned by the manufacturer of the network interface card (NIC).
Two objects you can track are the line protocol state of an interface and the reachability
of an IP route. If the specified object goes down, Cisco NX-OS reduces the HSRP prior-
ity by the configured amount. Object tracking allows you to route to a standby router if
the interface to the main network fails.
Figure 3-4 illustrates the HSRP object-tracking feature. In the left diagram, uplinks from
Router A and Router B were tracked by HSRP. Router A was the HSRP standby router
and Router B was the HSRP active router. When the uplink from Router B fails, HSRP
decreases the HSRP priority on Router B, making Router A the active router to process
the traffic.
Figure 3-4 shows Routers A and B with tracked uplink interfaces toward the core. On the left, Router A is standby and Router B is active; on the right, after Router B's tracked uplink fails, Router A is active and Router B is standby.
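A minimal object-tracking sketch follows; the object number, tracked interface, and decrement value are assumptions for illustration.

N7K-A(config)# track 1 interface ethernet 3/1 line-protocol
N7K-A(config-track)# exit
N7K-A(config)# interface vlan 100
N7K-A(config-if)# hsrp 100
N7K-A(config-if-hsrp)# track 1 decrement 30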
Figure 3-5 shows Routers A and B connected to the core. Router A is the spanning-tree root and HSRP active router for VLAN 10 (group 10) and the HSRP standby router for group 20; Router B is the spanning-tree root and HSRP active router for VLAN 20 (group 20) and the HSRP standby router for group 10.
Figure 3-5 shows two routers (A and B) and two HSRP groups (10 and 20). Router A is
the active router for group 10 but is the standby router for group 20. Similarly, Router B is
the active router for group 20 and the standby router for group 10. If both routers remain
active, HSRP load-balances the traffic from the hosts across both routers. If either router
fails, the remaining router continues to process traffic for both hosts.
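A minimal two-group load-sharing sketch for Router A follows; the VLAN interfaces, virtual IPs, and priority values are assumptions, and Router B would mirror the configuration with the priorities swapped.

N7K-A(config)# interface vlan 10
N7K-A(config-if)# hsrp 10
N7K-A(config-if-hsrp)# ip 192.168.10.1
N7K-A(config-if-hsrp)# priority 120
N7K-A(config-if-hsrp)# preempt
N7K-A(config-if-hsrp)# exit
N7K-A(config-if)# exit
N7K-A(config)# interface vlan 20
N7K-A(config-if)# hsrp 20
N7K-A(config-if-hsrp)# ip 192.168.20.1
N7K-A(config-if-hsrp)# priority 90
N7K-A(config-if-hsrp)# preempt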
HSRP States
Each HSRP router will go through a number of states before it ends up as an active or
standby router. Table 3-1 describes the various HSRP states.
State Definition
Speak The router sends periodic hello messages and actively participates in the election of
the active and/or standby router. A router cannot enter speak state unless the router
has the virtual IP address.
Standby The router is a candidate to become the next active router and sends periodic hello
messages. With the exclusion of transient conditions, there is, at most, one router in
the group in standby state.
Active The router currently forwards packets that are sent to the group virtual MAC
address. The router sends periodic hello messages. With the exclusion of transient
conditions, there must be, at most, one router in active state in the group.
Note The terminology router/switch might be a little confusing in Table 3-1. Cisco
Nexus devices are Layer 3 switches, meaning they are capable of both switching and rout-
ing. The naming switch or router is applied depending on the OSI protocol layer the con-
figuration applies to. HSRP is a Layer 3 redundancy protocol. Therefore, the Nexus devices
act as routers in this case.
HSRP Versions
Cisco NX-OS supports HSRP version 1 by default. You can configure an interface to use
HSRP version 2. With HSRP version 2, you gain the following enhancements:
■ Expanded group number range: HSRP version 1 supports group numbers from 0 to
255. HSRP version 2 supports group numbers from 0 to 4095.
■ Use of the new IP multicast address: HSRP version 2 uses the IPv4 multicast
address 224.0.0.102 to send hello packets instead of the multicast address of
224.0.0.2, which is used by HSRP version 1.
■ Use of the new MAC address range: HSRP version 2 uses the MAC address range
from 0000.0C9F.F000 to 0000.0C9F.FFFF. HSRP version 1 uses the MAC address
range 0000.0C07.AC00 to 0000.0C07.ACFF.
When you change the HSRP version, Cisco NX-OS reinitializes the HSRP group because
it now has a new virtual MAC address. Since the interface is reset during the reinitializa-
tion of the HSRP group, the version change process is disruptive in nature.
HSRP Configuration
Configuring basic HSRP is a multistep process. The following are the steps to configure a
basic HSRP configuration on the Cisco Nexus 7000 or 9000 Series switch:
Table 3-2 summarizes the NX-OS CLI commands related to basic HSRP configuration
and verification.
Table 3-2 Summary of NX-OS CLI Commands for HSRP Configuration and
Verification
Command Purpose
configure terminal Enters global configuration mode.
[no] feature hsrp Enables the HSRP feature. Use the no form of this command to disable HSRP for all groups.
interface vlan number Creates a VLAN interface. The number range is from 1 to 4094.
hsrp version {1 | 2} Configures the HSRP version. Version 1 is the default.
hsrp group-number [ipv4 | ipv6] Creates an HSRP group and enters HSRP configuration mode.
ip [ip-address [secondary]] Configures the virtual IP address for the HSRP group and enables the group. This address should be in the same subnet as the IPv4 address of the interface.
priority [value] Sets the priority level used to select the active router in an HSRP group. The range is from 0 to 255. The default is 100.
preempt [delay [minimum seconds] [reload seconds] [sync seconds]] Configures the router to take over as the active router for an HSRP group if it has a higher priority than the current active router. This command is disabled by default. Optionally, you can configure a delay of the HSRP group preemption by the configured time. The range is from 0 to 3600 seconds.
show hsrp [group group-number] [ipv4] Displays HSRP information.
show hsrp brief Displays a brief summary of the HSRP status for all groups in the device.
Examples 3-1 to 3-3 show the basic HSRP configuration and verification on the sample
topology shown in Figure 3-6. The base IP addresses have already been configured in
VLAN 100 on the sample topology. Here, we will focus on HSRP-specific configuration.
N7K-A and N7K-B will act as redundant gateways using HSRP.
Figure 3-6 shows the sample topology: N7K-A and N7K-B share the virtual IP .1 in VLAN 100, and N9K-A is attached to both of them through its Ethernet 1/49 and Ethernet 1/50 interfaces, which connect to Ethernet 6/8 on each N7K.
In Example 3-1, we will see the basic HSRP configuration on N7K-A and N7K-B.
! Enabling the HSRP feature.
N7K-A
N7K-A(config)# feature hsrp
N7K-B
N7K-B(config)# feature hsrp
! Configuring HSRP version 2 for interface Vlan 100. HSRP version 1 is the default.
N7K-A
N7K-A(config)# interface vlan 100
N7K-A(config-if)# hsrp version 2
N7K-B
N7K-B(config)# interface vlan 100
N7K-B(config-if)# hsrp version 2
! Configuring the HSRP group 100 for interface vlan 100 with 192.168.100.1 as the virtual IP.
N7K-A
N7K-A(config-if)# hsrp 100
N7K-A(config-if-hsrp)# ip 192.168.100.1
N7K-A(config-if-hsrp)#
N7K-B
N7K-B(config-if)# hsrp 100
N7K-B(config-if-hsrp)# ip 192.168.100.1
N7K-B(config-if-hsrp)#
! Setting higher priority for N7K-A with preemption feature. The default priority is 100. Preempt setting only applies to the router with higher priority if a router with lower priority is in active state. This usually means that there was a failure of the router with higher priority. Below configuration will assure that N7K-A becomes HSRP active router.
N7K-A
N7K-A(config-if-hsrp)# priority 120
N7K-A(config-if-hsrp)# preempt
! Verifying the HSRP configuration and state with the show hsrp brief command.
N7K-A
N7K-A# show hsrp brief
*:IPv6 group #:group belongs to a bundle
P indicates configured to preempt.
|
Interface Grp Prio P State Active addr Standby addr Group addr
Vlan100 100 120 P Active local 192.168.100.3 192.168.100.1
(conf)
N7K-A#
N7K-B
In Example 3-3, we see the impact of the HSRP preempt configuration on HSRP
operation.
! Shutting down interface vlan 100 on N7K-A HSRP active router. N7K-B takes over the active role. HSRP on N7K-A will be stuck in initial state.
N7K-A
N7K-A(config)# interface vlan 100
N7K-A(config-if)# shutdown
N7K-B
! Bringing the interface vlan 100 up on N7K-A. Preempt feature kicks in because
N7K-A has higher priority and N7K-A takes over the active role once again.
N7K-A
N7K-A(config-if)# no shutdown
N7K-A(config-if)# show hsrp brief
*:IPv6 group #:group belongs to a bundle
P indicates configured to preempt.
|
Interface Grp Prio P State Active addr Standby addr Group addr
Vlan100 100 120 P Active local unknown 192.168.100.1
(conf)
N7K-A(config-if)#
N7K-B
Figure 3-7 shows a basic VLAN topology. In this example, Routers A, B, and C form a
VRRP group. The IP address of the group is the same address that was configured for the
Ethernet interface of Router A (10.0.0.1).
Because the virtual IP address uses the IP address of the physical Ethernet interface of
Router A, Router A is the primary router (also known as the IP address owner). As the
primary, Router A owns the virtual IP address of the VRRP group and forwards packets
sent to this IP address. Clients 1 through 3 are configured with the default gateway IP
address of 10.0.0.1. Routers B and C function as backups. If the primary fails, the backup
router with the highest priority becomes the primary and takes over the virtual IP address
to provide uninterrupted service for the LAN hosts. When Router A recovers, it becomes
the primary again.
The VRRP primary sends VRRP advertisements to other VRRP routers in the same group.
The advertisements communicate the priority and state of the primary. Cisco NX-OS
encapsulates the VRRP advertisements in IP packets and sends them to the IP multicast
address 224.0.0.18, assigned to the VRRP group. Cisco NX-OS sends the advertisements
once every second, by default, but you can configure a different advertisement interval.
VRRP Tracking
VRRP supports the following options for tracking:
■ Native interface tracking: Tracks the state of an interface and uses that state to
determine the priority of the VRRP router in a VRRP group. The tracked state is
down if the interface is down or if the interface does not have a primary IP address.
■ Object tracking: Tracks the state of a configured object and uses that state to deter-
mine the priority of the VRRP router in a VRRP group. The tracked object can be an
interface IP routing state or IP route reachability.
If the tracked state (interface or object) goes down, VRRP updates the priority based on
what you have configured the new priority to be for the tracked state. When the tracked
state comes up, VRRP restores the original priority for the virtual router group. For
example, you might want to lower the priority of a VRRP group member if its uplink to
the network goes down so that another group member can take over as primary for the
VRRP group.
In Figure 3-8, if Router A (the primary in a LAN topology) fails, VRRP must determine if
one of the backups (B or C) should take over. If you configure Router B with priority 101
and Router C with the default priority of 100, VRRP selects Router B to become the pri-
mary because it has the higher priority. If you configure both Routers B and C with the
default priority of 100, VRRP selects the backup with the higher IP address to become
the primary. In this case, Router C will become the primary, as it has a higher IP address.
VRRP uses preemption to determine what happens after a VRRP backup router becomes
the primary. With preemption enabled by default, VRRP switches to a backup if that
backup comes online with a priority higher than the new primary. In the previous exam-
ple, if Router A is the primary and fails, VRRP selects Router B (next in order of priority).
If Router C comes online with a higher priority than Router B, VRRP selects Router C as
the new primary, even though Router B has not failed. If you disable preemption, VRRP
switches only if the original primary recovers (that is, Router A) or the new primary fails.
Figure 3-9 shows a LAN topology in which VRRP is configured so that Routers A and B
share the traffic to and from Clients 1 through 4. Routers A and B act as backups to each
other if either router fails. The topology contains two virtual IP addresses for two VRRP
groups that overlap. For VRRP group 1, Router A is the owner of IP address 10.0.0.1 and
is the primary. Router B is the backup to Router A. Clients 1 and 2 are configured with
the default gateway IP address of 10.0.0.1. For VRRP group 2, Router B is the owner of
IP address 10.0.0.2 and is the primary. Router A is the backup to Router B. Clients 3 and
4 are configured with the default gateway IP address of 10.0.0.2. If Router A fails, Router
B takes the role of the primary router and forwards the traffic from all the clients until
Router A becomes available again.
Figure 3-9 shows Router A (10.0.0.1) as the primary for virtual router 1 and the backup for virtual router 2, and Router B (10.0.0.2) as the primary for virtual router 2 and the backup for virtual router 1.
Table 3-3 lists various differences and similarities between VRRP and HSRP.
VRRP Configuration
Configuring basic VRRP is a multistep process. The following are the steps to configure a
basic VRRP configuration on the Cisco Nexus 7000 or 9000 Series switch:
First, you must globally enable the VRRP feature before you can configure VRRP groups.
Next, you configure a VRRP group on an interface and configure the virtual IP address
of the VRRP group. The virtual IP address should be in the same subnet as the IPv4
address of the interface. Next, you configure the VRRP priority on the interface. The
priority range for a virtual router is from 1 to 254 (1 is the lowest priority and 254 is the
highest). The default is 100 for backups and 255 for a primary router. Optionally, you can
configure simple text authentication for the VRRP group. Also, you can optionally con-
figure the VRRP group to adjust its priority based on the availability of an interface.
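A minimal VRRP sketch that follows these steps is shown here; the VLAN interface, group number, virtual IP, and priority are assumptions, and note that on NX-OS the VRRP group must be explicitly enabled with no shutdown.

N9K(config)# feature vrrp
N9K(config)# interface vlan 100
N9K(config-if)# vrrp 10
N9K(config-if-vrrp)# address 192.168.100.1
N9K(config-if-vrrp)# priority 120
N9K(config-if-vrrp)# no shutdown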
GLBP performs a similar function to the Hot Standby Router Protocol (HSRP) and
the Virtual Router Redundancy Protocol (VRRP). In HSRP and VRRP, multiple routers
participate in a virtual group configured with a virtual IP address. These protocols elect
one member as the active router to forward packets sent to the virtual IP address for the
group. The other routers in the group are redundant until the active router fails. GLBP
performs an additional load-balancing function that the other protocols do not provide.
GLBP load-balances over multiple routers (gateways) using a single virtual IP address and
multiple virtual MAC addresses. GLBP shares the forwarding load among all routers in a
GLBP group instead of allowing a single router to handle the whole load while the other
routers remain idle. You configure each host with the same virtual IP address, and all
routers in the virtual group participate in forwarding packets. GLBP members communi-
cate between each other using periodic hello messages.
Figure 3-10 shows GLBP group 1, with multiple gateways (numbered 1 through 4) sharing the forwarding load toward the Internet.
GLBP Operation
GLBP prioritizes gateways to elect an active virtual gateway (AVG). Other group mem-
bers provide backup for the AVG if that AVG becomes unavailable. If multiple gateways
have the same priority, the gateway with the highest real IP address becomes the AVG.
The group members request a virtual MAC address after they discover the AVG through
hello messages. The AVG assigns a virtual MAC address to each member of the GLBP
group. Each member is the active virtual forwarder (AVF) for its assigned virtual MAC
address, forwarding packets sent to its assigned virtual MAC address. The AVG also
answers Address Resolution Protocol (ARP) requests for the virtual IP address. Load
sharing is achieved when the AVG replies to the ARP requests with different virtual MAC
addresses.
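A minimal GLBP sketch on a Nexus 7000 follows; the interface, addresses, priority, and load-balancing method are assumptions, and the exact submode prompt can vary by release.

N7K(config)# feature glbp
N7K(config)# interface ethernet 2/1
N7K(config-if)# ip address 10.88.1.1/24
N7K(config-if)# glbp 1
N7K(config-glbp)# ip 10.88.1.10
N7K(config-glbp)# priority 110
N7K(config-glbp)# load-balancing round-robin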
Figure 3-11 illustrates the GLBP ARP resolution process. Router R1 is the AVG for a
GLBP group, and it is responsible for the virtual IP (vIP in the figure) address 10.88.1.10.
Router R1 is responsible for responding to ARP requests for default gateway (10.88.1.10)
and handing out a MAC address of an AVF. Router R1 is also an AVF for the virtual MAC
address 0000.0000.0001. Router R2 is a member of the same GLBP group and is desig-
nated as the AVF for the virtual MAC address 0000.0000.0002. Client A has a default
gateway IP address of 10.88.1.10 and, during initial ARP resolution for the default gate-
way IP address, receives a gateway MAC address of 0000.0000.0001. Client B shares the
same default gateway IP address but receives the gateway MAC address 0000.0000.0002
because R2 is sharing the traffic load with R1.
Figure 3-11 shows R1 (10.88.1.1) as the AVG and an AVF for virtual MAC 0000.0000.0001 and R2 (10.88.1.2) as the AVF for virtual MAC 0000.0000.0002. Both are configured with glbp 1 ip 10.88.1.10 on the 10.88.1.0/24 subnet, and R1 answers the ARP requests from Clients A (.4) and B (.5).
Figure 3-12 illustrates the traffic forwarding in the GLBP environment. Clients A and B
send their off-network traffic to separate next-hop routers because each has cached a
different MAC address for the single virtual gateway IP address (in this case, 10.88.1.10).
Each GLBP router is an AVF for the MAC address it has been assigned.
Figure 3-12 shows the same topology with Client A forwarding its off-network traffic to R1 (virtual MAC 0000.0000.0001) and Client B forwarding to R2 (virtual MAC 0000.0000.0002).
GLBP uses a weighting scheme together with load balancing to determine whether a GLBP group member acts as an AVF. You must
configure the initial weighting values and optional thresholds to enable or disable this
group member as an AVF. You can also configure the interface to track and the value
that reduces the interface’s weighting if the interface goes down. When the GLBP group
weighting drops below the lower threshold, the member is no longer an AVF and a sec-
ondary virtual forwarder takes over. When the weighting rises above the upper threshold,
the member can resume its role as an AVF.
Figure 3-13 illustrates the GLBP interface-tracking feature. Router R2 is the secondary
virtual forwarder for Client A and is configured for preemption. If Router R1 becomes
unavailable, the weight of the interface connecting Client A to R1 drops below the
lower threshold configured, and Router R2 preempts and takes over as AVF for vMAC
0000.0000.0001. Client A will not lose access to the outside network because R2 will
assume responsibility for forwarding packets sent to the virtual MAC address of R1 and
for responding to packets sent to its own virtual MAC address. R2 will also assume the
role of the AVG for the entire GLBP group. Communication for the GLBP members con-
tinues despite the failure of a router in the GLBP group.
Figure 3-13 shows Router R1 unavailable: R2 preempts, takes over as the AVF for virtual MAC 0000.0000.0001 in addition to its own 0000.0000.0002, and assumes the AVG role for the virtual IP 10.88.1.10.
Note At the time of this writing, GLBP is not supported on Nexus 9000 Series switches.
Nexus 7000 Series switches support GLBP.
Summary
This chapter discusses the challenges of single default gateway configuration on end
hosts along with the solutions using default gateway redundancy, including Hot Standby
Router Protocol (HSRP), Virtual Router Redundancy Protocol (VRRP), and Gateway
Load Balancing Protocol (GLBP). The following points were discussed:
■ End hosts are typically configured with a single default gateway IP address that does
not change when the network topology changes.
■ HSRP, VRRP, and GLBP are first-hop redundancy protocols (FHRPs) for establishing
a fault-tolerant default gateway for IP hosts on Ethernet networks.
■ HSRP selects one active router (responsible for forwarding traffic destined to the
HSRP virtual IP address) and one standby router, and the remaining routers are in the
listening state.
■ VRRP selects one primary router (responsible for forwarding traffic destined to the
VRRP virtual IP address) and multiple backup routers.
■ GLBP selects one active virtual gateway (AVG) and up to four active virtual forward-
ers (AVFs). AVG itself can act as AVF. All AVFs can forward traffic at the same time,
resulting in built-in load balancing.
■ HSRP and VRRP use one virtual IP and one virtual MAC, whereas GLBP uses one
virtual IP with multiple virtual MACs.
References
“Cisco Nexus 9000 Series NX-OS Unicast Routing Configuration Guide, Release 10.2(x),”
https://www.cisco.com/c/en/us/td/docs/dcn/nx-os/nexus9000/102x/configuration/
Unicast-routing/cisco-nexus-9000-series-nx-os-unicast-routing-configuration-guide-
release-102x.html
“Cisco Nexus 7000 Series NX-OS Unicast Routing Configuration Guide, Release 8.x,”
https://www.cisco.com/c/en/us/td/docs/switches/datacenter/nexus7000/sw/unicast/
config/cisco_nexus7000_unicast_routing_config_guide_8x.html
“Cisco Nexus 7000 Series NX-OS High Availability and Redundancy Guide, Release 8.x,”
https://www.cisco.com/c/en/us/td/docs/switches/datacenter/nexus7000/sw/high-
availability/config/cisco_nexus7000_high_availability_config_guide_8x.html
Relevant Cisco Live sessions: http://www.ciscolive.com
Chapter 4
In early Layer 2 Ethernet network environments, Spanning Tree Protocol (STP) was the
primary solution to limit the disastrous effects of a topology loop in the network. STP
has one suboptimal principle: to break loops in a network, only one active path is allowed
from one device to another, regardless of how many actual connections might exist in the
network. The single logical link creates two problems: one problem is that half (or more)
of the available system bandwidth is off limits to data traffic, and the other problem is
that a failure of the active link tends to cause multiple seconds of systemwide data loss
while the network re-evaluates the new “best” solution for network forwarding in the
Layer 2 network. In addition, no efficient dynamic mechanism exists for using all the
available bandwidth in a robust network with STP loop management. To overcome these
challenges, enhancements to Layer 2 Ethernet networks were made in the form of port
channel and virtual port channel (vPC) technologies. Port Channel technology allows
multiple links between two participating devices to be used to forward traffic by using
a load-balancing algorithm that equally balances traffic across the available Inter-Switch
Links (ISLs) while also managing the loop problem by bundling the links as one logical
link. vPC technology allows multiple devices to form a port channel. In vPC, a pair of
switches acting as a vPC peer endpoint looks like a single logical entity to port channel–
attached devices; the two devices that act as the logical port channel endpoint are still
two separate devices. The vPC environment combines the benefits of hardware redun-
dancy with the benefits of port channel loop management.
In this chapter, we will discuss Ethernet port channels, including port channel modes,
port channel compatibility requirements, and port channel load balancing. We will also
discuss virtual port channels, including various vPC topology implementations, vPC com-
ponents, vPC control and data planes, vPC failure scenarios, and vPC configuration and
verification.
You can create a Layer 2 port channel by bundling compatible Layer 2 interfaces, or you
can create Layer 3 port channels by bundling compatible Layer 3 interfaces. You cannot
combine Layer 2 and Layer 3 interfaces in the same port channel. You can also change
the port channel from Layer 3 to Layer 2. You can create port channels directly by creat-
ing the port channel interface, or you can create a channel group that acts to aggregate
individual ports into a bundle. When you associate an interface with a channel group, the
Cisco NX-OS software creates a matching port channel automatically if the port channel
does not already exist. In this instance, the port channel assumes the Layer 2 or Layer
3 configuration of the first interface. You can also create the port channel first. In this
instance, the Cisco NX-OS software creates an empty channel group with the same chan-
nel number as the port channel and takes the default Layer 2 or Layer 3 configuration as
well as the compatibility configuration.
You can configure Layer 2 port channels in either access or trunk mode. A Layer 2 port
channel interface and its member ports can have different STP parameters. Changing the
STP parameters of the port channel does not impact the STP parameters of the member
ports because a port channel interface takes precedence if the member ports are bundled.
After a Layer 2 port becomes part of a port channel, all switchport configurations must
be done on the port channel; you can no longer apply switchport configurations to indi-
vidual port channel members. Layer 3 port channel interfaces have routed ports as chan-
nel members. You cannot apply Layer 3 configurations to an individual port channel
member either; you must apply the configuration to the entire port channel. You can
configure a Layer 3 port channel with a static MAC address. If you do not configure this
value, the Layer 3 port channel uses the router MAC of the first channel member to
come up.
Figure 4-2 illustrates Layer 2 (access and trunk) and Layer 3 (routed) port channel inter-
faces. Port channel 20 is the L2 access port channel, with only VLAN 1 allowed on the
port channel. Port channel 21 is the L2 trunk port channel, with VLAN 1 and VLAN 2
allowed on the port channel. Port channel 22 is the L3 routed port channel, whereas Eth
2/3 is the regular Ethernet routed interface.
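A minimal sketch of these three port channel types follows; the IP address is an assumption, and the VLAN numbers mirror the figure description.

N9K(config)# interface port-channel 20
N9K(config-if)# switchport mode access
N9K(config-if)# switchport access vlan 1
N9K(config-if)# interface port-channel 21
N9K(config-if)# switchport mode trunk
N9K(config-if)# switchport trunk allowed vlan 1-2
N9K(config-if)# interface port-channel 22
N9K(config-if)# no switchport
N9K(config-if)# ip address 192.0.2.1/24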
For simplified port channel configuration, you can use static port channels with no
associated aggregation protocol. For more flexibility, you can use the Link Aggregation
Control Protocol (LACP), which is defined in the IEEE 802.1AX and IEEE 802.3ad stan-
dards. LACP controls how physical ports are bundled together to form one logical chan-
nel; for example, you can control the maximum number of bundled ports allowed. You
cannot configure LACP on shared interfaces.
The port channel is operationally up when at least one of the member ports is up and that
port’s status is channeling. The port channel is operationally down when all member ports
are operationally down. On Cisco Nexus 7000 Series switches, all ports in a port channel
must be in the same virtual device context (VDC).
When you disable LACP, the software creates a checkpoint before disabling the feature, and you can roll back to this checkpoint. You cannot disable LACP
while any LACP configurations are present. After you enable LACP globally on the
device, you enable LACP for each channel by setting the channel mode for each inter-
face to either active or passive. You can configure channel mode for individual links in
the LACP channel group when you are adding the links to the channel group. When you
delete the port channel, the software automatically deletes the associated channel group.
All member interfaces revert to their original configuration.
Both the passive and active modes allow LACP to negotiate between ports to determine
if they can form a port channel based on criteria such as the port speed and the trunking
state. The passive mode is useful when you do not know whether the remote system, or
partner, supports LACP.
Two devices can form an LACP port channel, even when their ports are in different LACP
modes, if the modes are compatible.
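A minimal LACP port channel sketch follows; the member interfaces, channel number, and trunk mode are assumptions (use mode on instead of mode active for a static port channel without LACP).

N9K(config)# feature lacp
N9K(config)# interface ethernet 1/1-2
N9K(config-if-range)# switchport mode trunk
N9K(config-if-range)# channel-group 10 mode active
N9K(config-if-range)# interface port-channel 10
N9K(config-if)# switchport mode trunk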
Table 4-2 shows various compatible channel modes for port channels.
When the interface joins a port channel, some of its individual parameters are removed
and replaced with the values on the port channel. The following list provides some of
these individual parameters:
■ Bandwidth
■ Delay
■ VRF
■ IP address
■ MAC address
■ Service policy
All the QoS service policies on the port channel are implicitly applied on the member
ports when they join the port channel. You will not see QoS service policies in the
running-config of the member ports. When you delete the port channel, the software
sets all member interfaces as if they were removed from the port channel.
Many interface parameters remain unaffected when the interface joins or leaves a port
channel, including those in the following list:
■ Description
■ CDP
■ UDLD
■ Rate mode
■ Shutdown
■ SNMP trap
Each port that is configured to use LACP has an LACP port priority. LACP uses the port
priority to decide which ports should be put in standby mode when there is a limitation
that prevents all compatible ports from aggregating and which ports should be put into
active mode. You can accept the default value of 32768 for the LACP port priority, or
you can configure a value between 1 and 65535. A higher port priority value means a
lower priority for LACP. You can configure the port priority so that specified ports have
a lower priority for LACP and are most likely to be chosen as active links rather than as
hot-standby links.
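For example, a specific member port could be made more likely to be chosen as an active link with a sketch such as the following; the interface and priority value are illustrative.
switch(config)# interface ethernet 1/1
switch(config-if)# lacp port-priority 1000
! 1000 is lower than the default of 32768, so this port has a higher LACP priority.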
You can configure the load-balancing mode to apply to all port channels that are config-
ured on the entire device or on specified modules. The per-module configuration takes
precedence over the load-balancing configuration for the entire device. You cannot con-
figure the load-balancing method per port channel. The default load-balancing method
for Layer 2 packets is src-dst-mac. The default method for Layer 3 packets is src-dst ip-
l4port.
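On a Cisco Nexus 9000, changing and verifying the device-wide load-balancing method might look like the following sketch; the chosen method is only an example.
switch(config)# port-channel load-balance src-dst ip-l4port
switch(config)# exit
switch# show port-channel load-balance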
You can configure the device to use one of the following methods to load-balance across
the port channel:
■ Destination IP address
■ Source IP address
Figure 4-3 vPC Physical Topology and vPC Logical Topology
■ Allows a single device to use a port channel across two upstream devices
B. Server dual-homing: In this topology, a server is connected via two interfaces to two
access switches, as shown in Figure 4-4(B).
Figure 4-4 (A) Dual-Uplink Layer 2 Access and (B) Server Dual-Homing
C. FEX supported topologies: FEX supports various vPC topologies with Cisco Nexus
7000 and 9000 Series as their parent switches.
■ Host vPC (single link or dual links) and FEX single-homed (port channel mode)
straight-through design: In this topology, you connect a server with dual or quad
network adapters that are configured in a vPC to a pair of FEXs that are con-
nected straight through to the Cisco Nexus 9000 or Cisco Nexus 7000 Series
switches. The link between the server and FEXs can be single link, as shown in
Figure 4-5(1), or Dual Links, as shown in Figure 4-5(2).
■ Host port channel and active-active (dual-homed) FEX (vPC) design: In this
topology, you connect the FEX to two upstream Cisco Nexus 9000 or Cisco
Nexus 7000 Series switches in vPC fashion and downstream to several single-
homed servers using port channel, as shown in Figure 4-6(2).
Figure 4-5 Single-Link (1) or Dual-Link (2) Connected Host vPC with
Single-Homed FEX
Figure 4-6 Single-Homed (1) or Port Channel (2) Connected Host and Dual-Homed
FEX vPC
Note FEX vPC is not supported between any model of FEX and the Cisco Nexus 9500
platform switches as the parent switches.
vPC Components
Figure 4-8 shows the components of vPC along with their naming conventions.
■ vPC: The combined port channel between the vPC peer devices and the downstream
device.
■ vPC peer device: One of a pair of devices connected with the special port chan-
nel known as the vPC peer-link. You can have only two devices as vPC peers; each
device can serve as a vPC peer to only one other vPC peer. The vPC peer devices can
also have non-vPC links to other devices.
■ vPC peer-keepalive link: The peer-keepalive link monitors the vitality of vPC peer
devices. The peer-keepalive link sends configurable, periodic keepalive messages
between vPC peer devices. It is highly recommended to associate a peer-keepalive
link to a separate virtual routing and forwarding (VRF) instance that is mapped to a
Layer 3 interface in each vPC peer device. If you do not configure a separate VRF,
the system uses the management VRF by default. However, if you use the manage-
ment interfaces for the peer-keepalive link, you must put a management switch con-
nected to both the active and standby management ports on each vPC peer device.
Do not use Ethernet crossover cables to connect the management ports on the vPC
peers to each other back-to-back because the peer-keepalive link will fail on supervi-
sor switchover. No data or synchronization traffic moves over the vPC peer-keepalive
link; the only traffic on this link is a message that indicates that the originating
switch is operating and running a vPC.
■ vPC peer-link: The vPC peer-link carries essential vPC traffic between the vPC peer
switches and is used to synchronize state between the vPC peer devices. The vPC
peer-link is a port channel and should consist of at least two dedicated 10-Gigabit
Ethernet links terminated on two different I/O modules, if at all possible, for high
availability. Higher-bandwidth interfaces (such as 25-Gigabit Ethernet, 40-Gigabit
Ethernet, 100-Gigabit Ethernet, and so on) may also be used to form the port chan-
nel. The peer-link should only allow traffic that is part of the vPC domain. If other
traffic is also allowed, it could overload the link during failures. The system cannot
bring up the vPC peer-link unless the peer-keepalive link is already up and running.
■ vPC member port: A port that is assigned to a vPC channel group. These ports form
the virtual port channel and are split between the vPC peers.
■ Host vPC port: A fabric extender host interface that belongs to a vPC.
■ Orphan port: A non-vPC port, also known as an orphaned port, is a port that is not
part of a vPC.
■ vPC domain: The vPC domain includes both vPC peer devices, the vPC peer-
keepalive link, and all of the port channels in the vPC connected to the downstream
devices. It is also associated to the configuration mode you must use to assign
vPC global parameters. Each vPC domain has a vPC instance number that is shared
between two devices. Only two devices can be part of the same vPC domain, but
you can have many vPC domains on a single device. The domain ID can be any value
between 1 and 1000, and the same value must be configured on both switches that
form the vPC pair. The vPC peer devices use the vPC domain ID to automatically
assign a unique vPC system MAC address. Each vPC domain has a unique MAC
address that is used as a unique identifier for the specific vPC-related operation.
Although the devices use the vPC system MAC addresses only for link-scope opera-
tions such as LACP, it is recommended that you create each vPC domain within the
contiguous Layer 2 network with a unique domain ID. You can also configure a
specific MAC address for the vPC domain rather than having Cisco NX-OS software
assign the address.
■ Cisco Fabric Services: The Cisco Fabric Services (CFS) is a reliable state transport
mechanism used to synchronize the actions of the vPC peer devices. CFS carries
messages and packets for many features linked with vPC, such as STP and IGMP.
Information is carried in CFS/CFS over Ethernet (CFSoE) protocol data units (PDUs).
When you enable the vPC feature, the device automatically enables CFSoE, and you
do not have to configure anything. CFSoE distributions for vPCs do not need the
capabilities to distribute over IP or the CFS regions. CFS messages provide a copy
of the configuration on the local vPC peer device to the remote vPC peer device.
All MAC addresses for those VLANs configured on both devices are synchronized
between vPC peer devices using the CFSoE protocol. The primary vPC device
synchronizes the STP state on the vPC secondary peer device using Cisco Fabric
Services over Ethernet (CFSoE).
■ vPC VLANs: The VLANs allowed on the vPC are called vPC VLANs. These VLANs
must also be allowed on the vPC peer-link.
■ Non-vPC VLANs: Any of the STP VLANs that are not carried over the vPC
peer-link.
CFSoE keeps the primary and secondary vPC peer devices, which act toward the rest of the network as a single entity, synchronized across the peer-link. It handles configuration consistency, MAC address table synchronization, member port status, primary and secondary vPC device roles, STP management, IGMP snooping synchronization, and ARP table synchronization.
Similar to regular port channels, virtual port channels are subject to consistency checks
and compatibility checks. CFSoE protocol communicates essential configuration infor-
mation to ensure configuration consistency between peer switches. During a compat-
ibility check, one vPC peer conveys configuration information to the other vPC peer to
verify that vPC member ports can actually form a port channel. For example, if two ports
that are going to join the channel carry a different set of VLANs, this is a misconfigura-
tion. Depending on the severity of the misconfiguration, vPC may either warn the user
(Type-2 misconfiguration) or suspend the port channel (Type-1 misconfiguration). In the
specific case of a VLAN mismatch, only the VLAN that differs between the vPC mem-
ber ports will be suspended on all the vPC port channels. You can verify the consistency
between vPC peers by using the show vpc consistency-parameters command. In addition
to compatibility checks for the individual vPCs, CFSoE also performs consistency checks
for a set of switch-wide parameters that must be configured consistently on the two peer
switches.
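A quick way to review these parameters on each peer is sketched below; port-channel 10 is an illustrative vPC member port channel.
switch# show vpc consistency-parameters global
switch# show vpc consistency-parameters interface port-channel 10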
The vPC peers must synchronize the Layer 2 forwarding table (that is, the MAC address
information between the vPC peers). If one vPC peer learns a new MAC address, that
MAC address is also communicated to the other vPC peer using the CFSoE protocol.
The other vPC peer then programs the new MAC address information into the Layer 2
forwarding table. This MAC address learning mechanism replaces the regular switch
MAC address learning mechanism and prevents traffic from being forwarded across the
vPC peer-link unnecessarily.
If one vPC member port goes down on a vPC peer (for instance, if a link from a NIC
goes down), the member is removed from the port channel without bringing down the
vPC entirely. The vPC peer where the member port went down informs the other vPC
peer using the CFSoE protocol. The vPC peer on which the remaining port is located
will allow frames to be sent from the peer-link to the vPC orphan port. The Layer 2 for-
warding table for the switch that detected the failure is also updated to point the MAC
addresses that were associated with the vPC port to the peer-link. When all vPC member
ports for a vPC go down on one of the vPC peer switches, Cisco Fabric Services notifies
the other vPC peer switch, and traffic received on the peer-link for that vPC is then
forwarded out of that switch's remaining vPC member ports.
When you configure the vPC peer-link, the vPC peer devices negotiate using the CFSoE
protocol and perform an election to determine the primary and secondary role of peer
switches. The Cisco NX-OS software elects as the primary device the peer with the lower
configured role priority; if the priorities are equal, the peer with the lower system
MAC address becomes the primary. The software takes different actions on each device (that is, the primary and
secondary) only in certain failover conditions. We will look at different failure scenarios
later in this chapter. vPCs do not support role preemption. If the primary vPC peer device
fails, the secondary vPC peer device takes over to become operationally the vPC primary
device. However, the original operational roles are not restored if the formerly primary
vPC comes up again.
Although vPCs provide a loop-free Layer 2 topology, STP is still required to provide
a fail-safe mechanism to protect against any incorrect or defective cabling or possible
misconfiguration. When you first bring up a vPC, STP reconverges. STP treats the vPC
peer-link as a special link and always includes the vPC peer-link in the STP active topol-
ogy. STP is distributed; that is, the protocol continues running on both vPC peer devices.
However, the configuration on the vPC peer device elected as the primary device controls
the STP process for the vPC interfaces on the secondary vPC peer device. The primary
vPC device synchronizes the STP state on the vPC secondary peer device using CFSoE.
The STP process for vPC also relies on the periodic keepalive messages to determine
when one of the connected devices on the vPC peer-link fails. It is recommended to con-
figure the primary vPC peer device as the STP primary root device and configure the
secondary vPC peer device to be the STP secondary root device. If the primary vPC peer
device fails over to the secondary vPC peer device, there is no change in the STP topol-
ogy. The vPC primary device sends and processes BPDUs on the vPC interfaces and uses
its own bridge ID. The secondary switch only relays BPDUs and does not generate any
BPDU. The vPC peer switch feature allows a pair of vPC peers to appear as a single STP
root in the Layer 2 topology. In vPC peer switch mode, STP BPDUs are sent from both
vPC peer devices, and both primary and secondary switches use the same bridge ID to
present themselves as a single switch. This improves vPC convergence. You must config-
ure both ends of vPC peer-link with the identical STP configuration.
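A minimal sketch of the vPC peer switch feature, assuming a hypothetical vPC domain 1 and illustrative VLAN range and priority, follows; the identical spanning-tree configuration must be applied on both vPC peers.
switch(config)# vpc domain 1
switch(config-vpc-domain)# peer-switch
switch(config-vpc-domain)# exit
switch(config)# spanning-tree vlan 1-100 priority 8192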
The IGMP snooping process on a vPC peer device shares the learned group information
with the other vPC peer device through the vPC peer-link using the CFSoE protocol.
When IGMP traffic enters a vPC peer switch through a vPC port channel, it triggers
hardware programming for the multicast entry on both vPC member devices. Multicast
traffic is copied over the peer-link to help ensure that orphan ports get the multicast
stream and to help with failure scenarios. This happens regardless of the presence of
receivers on the vPC peer.
The ARP table synchronization across vPC peers uses CFSoE. The ARP table synchroni-
zation feature enables faster convergence of address tables between the vPC peers. This
convergence overcomes the delay that occurs in ARP table restoration for IPv4 or ND
table restoration for IPv6 when the vPC peer-link port channel flaps or when a vPC peer
comes back online. This feature is disabled by default and can be enabled using the ip
arp synchronize or ipv6 nd synchronize command.
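A minimal sketch, assuming a hypothetical vPC domain 1 and applied on both peers:
switch(config)# vpc domain 1
switch(config-vpc-domain)# ip arp synchronize
switch(config-vpc-domain)# ipv6 nd synchronize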
When communicating with external networks, the vPC domain prioritizes forwarding
through local ports, except in certain situations such as traffic forwarding to orphan
devices and flooding traffic (broadcast, multicast, and unknown unicast traffic), which
uses the vPC peer-link. For forwarding regular vPC traffic, vPC peer-link is not used to
forward data packets. An exception to this rule is when a vPC peer switch has lost all its
member ports, resulting in orphan ports on the other peer switch. In this case, the vPC peer
switch, where the member ports are up, will be allowed to forward the traffic received on
the peer-link to one of the remaining active vPC member ports.
vPC peer switches commonly use an FHRP, such as HSRP, GLBP, or VRRP, for default
gateway redundancy. You can configure vPC peer devices to act as the gateway even
for packets destined to the vPC peer device’s MAC address using the peer-gateway
feature. The vPC peer-gateway capability allows a vPC switch to act as the active gate-
way for packets that are addressed to the router MAC address of the vPC peer. This
feature enables local forwarding of packets without the need to cross the vPC peer-link.
Configuring the peer-gateway feature must be done on both primary and secondary vPC
peers and is nondisruptive to the operations of the device or to the vPC traffic. VRRP
acts similarly to HSRP when running on vPC peer devices. When the primary vPC peer
device fails over to the secondary vPC peer device, the FHRP traffic continues to flow
seamlessly.
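A minimal sketch of the peer-gateway feature, again assuming a hypothetical vPC domain 1 and applied on both vPC peers:
switch(config)# vpc domain 1
switch(config-vpc-domain)# peer-gateway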
Figure 4-11 illustrates the traffic forwarding in a vPC environment. In the left diagram,
the data traffic reaching Cisco Nexus switches Agg1 and Agg2 from the core is for-
warded toward the access switches acc1, acc2, and acc3 without traversing the peer Cisco
Nexus switch device using the vPC peer-link. Similarly, traffic from the server directed to
the core reaches Cisco Nexus switches Agg1 and Agg 2, and the receiving Cisco Nexus
switch routes it directly to the core without unnecessarily passing it to the peer Cisco
Nexus device using the peer-link. This happens regardless of which Cisco Nexus device is
the primary HSRP device for a given VLAN.
■ vPC member port failure: When one vPC member port fails, the host detects a link
failure on one of the port channel members and redistributes the affected flows to
the remaining port channel members. Before the failure, the MAC address table entry
pointed to the primary port; after the failure, it points to the secondary port.
This is one of the scenarios where a vPC peer-link is used to carry data traffic.
■ vPC peer-link failure: In a vPC topology, one vPC peer switch is elected as the vPC
primary switch and the other switch is elected as the vPC secondary switch, based
on the configured role priority for the switch. In a scenario where the vPC peer-link
goes down, the vPC secondary switch shuts down all of its vPC member ports if it
can still receive keepalive messages from the vPC primary switch (which indicates
that the vPC primary switch is still alive). The vPC primary switch keeps all of its
interfaces up, as shown in Figure 4-12.
Figure 4-12 vPC Peer-Link Failure Scenario: The Secondary Switch Suspends Its vPC Member Ports
■ vPC peer-keepalive link failure: During a vPC peer-keepalive link failure, there is no
impact on traffic flow.
■ vPC keepalive-link failure followed by a peer-link failure: If the vPC keepalive link
fails first and then a peer-link fails, the vPC primary switch continues to be primary
but the vPC secondary switch becomes the operational primary switch and keeps its
vPC member ports up (this is also known as a dual active scenario). This can occur
when both the vPC switches are healthy but the failure has occurred because of a
connectivity issue between the switches. This situation is known as a split-brain sce-
nario. There is no loss of traffic for existing flows, but new flows can be affected as
the peer-link is not available. The two vPC switches cannot synchronize the unicast
MAC address and the IGMP groups and therefore cannot maintain the complete
unicast and multicast forwarding table. Also, there may be some duplicate packet
forwarding, as shown in Figure 4-13.
Figure 4-13 vPC Keepalive Link Failure Followed by a Peer-Link Failure Scenario
■ vPC peer-link and keepalive both fail but only keepalive returns: Initially a dual
active state will exist. When the keepalive link is restored, we can expect that the
configured primary will become the operational primary.
vPC Guidelines
vPCs have the following configuration guidelines and limitations:
■ A vPC can be deployed on two identical Cisco Nexus 9300 Series switches or two
identical Cisco Nexus 9500 Series switches. Both switches must be the exact same
model and both switches must consist of the same models of line cards, fabric mod-
ules, supervisor modules, and system controllers inserted in the same slots of the
chassis.
■ A vPC peer-link must consist of Ethernet ports with an interface speed of 10Gbps or
higher. It is recommended to use at least two 10-Gigabit Ethernet ports in dedicated
mode on two different I/O modules.
■ A vPC is a per-VDC function on the Cisco Nexus 7000 Series switches. A vPC can
be configured in multiple VDCs, but the configuration is entirely independent.
Each VDC requires an independent vPC peer-link and vPC peer-keepalive link. vPC
domains cannot be stretched across multiple VDCs on the same switch, and all ports
for a given vPC must be in the same VDC.
■ A vPC is a Layer 2 port channel. A vPC does not support the configuration of Layer
3 port channels. Dynamic routing from the vPC peers to routers connected on a vPC
is not supported. It is recommended that routing adjacencies be established on sepa-
rate routed links.
■ A vPC can be used as a Layer 2 link to establish a routing adjacency between two
external routers. The routing restrictions for vPCs only apply to routing adjacencies
between the vPC peer switches and routers that are connected on a vPC.
■ A vPC has support for static routing to FHRP addresses. The FHRP enhancements
for vPCs enable routing to a virtual FHRP address across a vPC.
vPC Configuration
Configuring a basic vPC is a multistep process. The following are the steps to enable a
basic vPC configuration on the Cisco Nexus 7000 or 9000 Series switch:
Step 3. Create a vPC domain and enter the vPC domain mode.
From the global configuration mode, you must enable the vPC feature before you can
configure and use vPCs. The next step is to create a vPC domain. Use a unique vPC
domain number throughout a single vPC domain. This domain ID is used to automati-
cally form the vPC system MAC address. You can then configure the destination IP for
the peer-keepalive link that carries the keepalive messages. Once the vPC peer-keepalive
link is configured, you can create the vPC peer-link by designating the port channel you
want on each device as the vPC peer-link for the specified vPC domain. Once the vPC
peer-link is configured, you can connect the downstream device. You create a port chan-
nel from the downstream device to the primary and secondary vPC peer devices. On
each vPC peer device, you assign a vPC number to the port channel that connects to the
downstream device.
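The following sketch walks through these steps on one vPC peer; the interface numbers, domain ID, VRF name, and addressing are illustrative assumptions, and the mirror-image configuration (with the keepalive addresses swapped) would be applied on the other peer.
switch(config)# feature vpc
switch(config)# feature lacp
! Layer 3 link for the peer-keepalive in a dedicated VRF (hypothetical addressing)
switch(config)# vrf context KEEPALIVE
switch(config-vrf)# exit
switch(config)# interface ethernet 1/1
switch(config-if)# no switchport
switch(config-if)# vrf member KEEPALIVE
switch(config-if)# ip address 10.1.1.1/30
switch(config-if)# exit
! vPC domain and peer-keepalive destination (the peer uses 10.1.1.2)
switch(config)# vpc domain 1
switch(config-vpc-domain)# peer-keepalive destination 10.1.1.2 source 10.1.1.1 vrf KEEPALIVE
switch(config-vpc-domain)# exit
! vPC peer-link
switch(config)# interface ethernet 1/2-3
switch(config-if-range)# switchport
switch(config-if-range)# switchport mode trunk
switch(config-if-range)# channel-group 1 mode active
switch(config-if-range)# interface port-channel 1
switch(config-if)# vpc peer-link
! Port channel toward the downstream device, assigned to vPC 10
switch(config-if)# interface ethernet 1/4-5
switch(config-if-range)# switchport
switch(config-if-range)# switchport mode trunk
switch(config-if-range)# channel-group 10 mode active
switch(config-if-range)# interface port-channel 10
switch(config-if)# vpc 10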
Table 4-3 summarizes the NX-OS CLI commands related to basic vPC configuration and
verification.
Table 4-3 Summary of NX-OS CLI Commands for vPC Configuration and Verification

configure terminal: Enters global configuration mode.
[no] feature vpc: Enables vPCs on the device.
[no] feature lacp: Enables LACP on the device.
vrf context vrf-name: Creates a new VRF and enters VRF configuration mode. The vrf-name can be any case-sensitive, alphanumeric string up to 32 characters.
interface interface-type slot/port: Enters interface configuration mode.
no switchport: Configures the interface as a Layer 3 interface.
vrf member vrf-name: Adds this interface to a VRF.
ip address ip-prefix/length: Configures an IP address for this interface. You must do this step after you assign this interface to a VRF.
switchport mode trunk: Sets the interface as a Layer 2 trunk port. A trunk port can carry traffic in one or more VLANs on the same physical link.
channel-group channel-number [force] [mode {on | active | passive}]: Configures the port in a channel group and sets the mode. The channel-number range is from 1 to 4096. This command creates the port channel associated with this channel group if the port channel does not already exist. All static port channel interfaces are set to mode on. You must set all LACP-enabled port channel interfaces to active or passive. The default mode is on.
vpc domain domain-id: Creates a vPC domain if it does not already exist and enters vpc-domain configuration mode. There is no default; the range is from 1 to 1000.
peer-keepalive destination ipaddress [source ipaddress | vrf {name | management | vpc-keepalive}]: Configures the IPv4 or IPv6 address for the remote end of the vPC peer-keepalive link.
interface port-channel channel-number: Selects the port channel and enters interface configuration mode.
vpc peer-link: Configures the selected port channel as the vPC peer-link.
vpc number: Configures the selected port channel into the vPC to connect to the downstream device. The range is from 1 to 4096.
Examples 4-1 to 4-3 show the basic vPC configuration and verification on the sample
topology shown in Figure 4-14. Layer 3 connectivity between N7K-A and N7K-B and
N9K-A is established in the backend. In this example, we will focus only on vPC configu-
ration and verification. We will configure N7K-A and N7K-B as vPC peers in vPC domain
11. We will configure the link connecting the interface Ethernet 3/25 on both vPC peers
as a vPC peer-keepalive link. We will also configure the link connecting the interfaces
Ethernet 3/26 and Ethernet 3/31 in the port channel on both vPC peers and configure
it as vPC peer-link. vPC 10 will be set up toward N9K-A on the interfaces shown in
Figure 4-14.
Figure 4-14 Sample vPC Topology: vPC Domain 11 with N7K-A and N7K-B Connected to N9K-A via vPC 10 (Ethernet 1/49 and 1/50)
In Example 4-1, we will do some pre-configuration, such as setting up the Layer 3 link
between vPC peers to be later utilized as a vPC keepalive link and setting up a port chan-
nel between vPC peers to be later utilized as a vPC peer-link during vPC configuration.
N7K-A
N7K-B(JAF1752AKJA)
Eth3/31 173 R S I s N7K-C7009 Eth3/31
N9K-A(FDO241519JZ)
Eth6/8 176 R S I s N9K-C93180YC-FX Eth1/49
N7K-B
N9K-A
N7K-B(JAF1752AKJA)
Eth1/50 174 R S I s N7K-C7009 Eth6/8
! Configuring the Layer 3 link between N7K-A and N7K-B and making it ready to be
later used as vPC Peer-Keepalive link. We will configure this link in vrf
VPC-KEEPALIVE and make sure the end-to-end connectivity between vPC peers N7K-A
and N7K-B is established via this link.
N7K-A
N7K-B
N7K-A
N7K-B
! Configuring Port-Channel 1 between N7K-A and N7K-B using links Ethernet 3/26 and
Ethernet 3/31 and making it ready to be later used as vPC Peer-Link. Although the
channel group number can be any value between 1 and 4096, matching the port channel
number of vPC Peer-Link with the vPC domain number may help with troubleshooting.
In this exercise, the same number is not used to demonstrate that it is not
required for configuring the vPC domain.
N7K-A
N7K-B
! Verifying the newly created Layer 2 Port-Channel. The flags next to the interfaces
are described by the legend at the beginning of the command output. The interface
port-channel 1 is a switched port (S) and is up (U); its member ports are flagged
with (P).
N7K-A
N7K-B
U - Up (port-channel)
M - Not in use. Min-links not met
--------------------------------------------------------------------------------
Group Port- Type Protocol Member Ports
Channel
--------------------------------------------------------------------------------
1 Po1(SU) Eth NONE Eth3/26(P) Eth3/31(P)
N7K-A
N7K-B
! Configuring the vPC domain 11 for the vPC and configuring the vPC peer-keepalive
link.
N7K-A
N7K-B
N7K-A
N7K-B
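On N7K-A, this step might look like the following sketch; the peer-keepalive addresses are hypothetical, and N7K-B mirrors the configuration with the source and destination addresses swapped.
N7K-A(config)# vpc domain 11
N7K-A(config-vpc-domain)# peer-keepalive destination 192.168.100.2 source 192.168.100.1 vrf VPC-KEEPALIVE
N7K-A(config-vpc-domain)# exit
N7K-A(config)# interface port-channel 1
N7K-A(config-if)# vpc peer-link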
! Configuring vPC Member Ports on vPC peers N7K-A and N7K-B. First, we will enable
LACP feature and configure the member ports on Port-Channel 10.
N7K-A
N7K-B
N9K-A
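A sketch of this step, consistent with the interfaces shown in Figure 4-14 and in the command output that follows; the trunk mode is an assumption.
N7K-A(config)# feature lacp
N7K-A(config)# interface ethernet 6/8
N7K-A(config-if)# switchport
N7K-A(config-if)# switchport mode trunk
N7K-A(config-if)# channel-group 10 mode active
N7K-A(config-if)# interface port-channel 10
N7K-A(config-if)# vpc 10
! N7K-B is configured the same way on its link toward N9K-A.
N9K-A(config)# feature lacp
N9K-A(config)# interface ethernet 1/49-50
N9K-A(config-if-range)# switchport
N9K-A(config-if-range)# switchport mode trunk
N9K-A(config-if-range)# channel-group 15 mode active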
--------------------------------------------------------------------------------
Group Port- Type Protocol Member Ports
Channel
--------------------------------------------------------------------------------
15 Po15(SU) Eth LACP Eth1/49(P) Eth1/50(P)
N9K-A#
vPC domain id : 11
Peer status : peer adjacency formed ok
vPC keep-alive status : peer is alive
Configuration consistency status : success
Per-vlan consistency status : success
Type-2 consistency status : success
vPC role : primary
Number of vPCs configured : 1
Peer Gateway : Disabled
Dual-active excluded VLANs and BDs : -
Graceful Consistency Check : Enabled
Auto-recovery status : Enabled, timer is off.(timeout = 240s)
Delay-restore orphan ports status : Timer is off.(timeout = 0s)
Operational Layer3 Peer-router : Disabled
Self-isolation : Disabled
vPC status
Id : 10
Port : Po10
Status : up
Consistency : success
Reason : success
Active Vlans : 1,200
vPC domain id : 11
Peer status : peer adjacency formed ok
vPC keep-alive status : peer is alive
Configuration consistency status : success
Per-vlan consistency status : success
Type-2 consistency status : success
vPC role : primary
Number of vPCs configured : 1
Peer Gateway : Disabled
Dual-active excluded VLANs and BDs : -
Graceful Consistency Check : Enabled
Auto-recovery status : Enabled, timer is off.(timeout = 240s)
Delay-restore status : Timer is off.(timeout = 30s)
Delay-restore SVI status : Timer is off.(timeout = 10s)
Delay-restore orphan ports status : Timer is off.(timeout = 0s)
Operational Layer3 Peer-router : Disabled
Self-isolation : Disabled
vPC status
------------------------------------------------------
id Port Status Consistency Active VLANs
----- ------------ ------ ----------- ----------------
10 Po10 up success 1,200
! Verifying vPC role of vPC peers. The show vpc role command also shows the vPC
system-mac created from vPC domain ID. The last octet (0b, or decimal 11) is
derived from the vPC domain ID 11.
TX
48 unicast packets 3587 multicast packets 2 broadcast packets
3587 output packets 364772 bytes
7 jumbo packets
0 output error 0 collision 0 deferred 0 late collision
0 lost carrier 0 no carrier 0 babble 0 output discard
0 Tx pause
Legend:
Type 1 : vPC will be suspended in case of mismatch
Legend:
Type 1 : vPC will be suspended in case of mismatch
version 8.4(2)
feature vpc
interface port-channel1
vpc peer-link
interface port-channel10
vpc 10
N7K-A#
Summary
This chapter discusses Ethernet port channels, virtual port channels (vPCs), and vPC
configuration and verification, including the following points:
■ A port channel bundles physical links into a channel group to create a single logical
link that provides an aggregate bandwidth of up to 32 physical links.
■ You can configure Layer 2 port channels in either access or trunk mode. Layer 3 port
channel interfaces have routed ports as channel members. You cannot combine Layer
2 and Layer 3 interfaces in the same port channel.
■ Individual interfaces in port channels are configured with channel modes. When you
run static port channels with no aggregation protocol, the channel mode is always
set to on. When you configure LACP port channels, the channel mode is set to either
active or passive.
■ In active mode, ports initiate negotiations with other ports by sending LACP pack-
ets. In passive mode, ports respond to LACP packets they receive but do not initiate
LACP negotiation.
■ When you add an interface to a channel group, the NX-OS software checks certain
interface and operational attributes to ensure that the interface is compatible with
the channel group. If you configure a member port with an incompatible attribute,
the software suspends that port in the port channel.
■ The Cisco NX-OS software load-balances traffic across all operational interfaces in a
port channel by hashing the addresses in the frame to a numerical value that selects
one of the links in the channel.
■ A vPC allows links that are physically connected to two different Cisco Nexus 7000
or 9000 Series devices to appear as a single port channel to a third device. You can
use only Layer 2 port channels in the vPC.
■ A vPC uses CFSoE as the primary control plane protocol for vPC.
■ vPC forwarding rule: a frame that enters the vPC peer switch from the peer-link can-
not exit the switch from a vPC member port.
■ vPC interacts differently with events triggered by failure of vPC peer-keepalive link,
vPC peer-link, and so on.
References
“Cisco Nexus 9000 NX-OS Interfaces Configuration Guide, Release 10.2(x),” https://www.cisco.com/c/en/us/td/docs/dcn/nx-os/nexus9000/102x/configuration/interfaces/cisco-nexus-9000-nx-os-interfaces-configuration-guide-102x.html
“Cisco Nexus 7000 Series NX-OS Interfaces Configuration Guide 8.x,” https://www.cisco.com/c/en/us/td/docs/switches/datacenter/nexus7000/sw/interfaces/config/cisco_nexus7000_interfaces_config_guide_8x.html
“Understand Virtual Port Channel (vPC) Enhancements,” https://www.cisco.com/c/en/us/support/docs/ios-nx-os-software/nx-os-software/217274-understand-virtual-port-channel-vpc-en.html
“Supported Topologies for Routing over Virtual Port Channel on Nexus Platforms,” https://www.cisco.com/c/en/us/support/docs/ip/ip-routing/118997-technote-nexus-00.html
“Best Practices for Virtual Port Channels (vPC) on Cisco Nexus 7000 Series Switches,” https://www.cisco.com/c/dam/en/us/td/docs/switches/datacenter/sw/design/vpc_design/vpc_best_practices_design_guide.pdf
“Nexus 2000 Fabric Extenders Supported/Unsupported Topologies,” https://www.cisco.com/c/en/us/support/docs/switches/nexus-2000-series-fabric-extenders/200363-nexus-2000-fabric-extenders-supported-un.html
Relevant Cisco Live sessions: http://www.ciscolive.com
Chapter 5
Switch Virtualization
Cisco Nexus Series switches support many virtualization options, including Layer 3
virtual routing and forwarding (VRF) instances and virtual device contexts (VDCs). A
VRF can be used to virtualize the Layer 3 forwarding and routing tables. VDCs allow
the Cisco Nexus 7000 Series switches to be virtualized at the device level. Cisco Nexus
Series switches also provide operational segmentation into functional planes to segment
the functions of the switch into functional layers. This segmentation enables features
such as control plane policing (CoPP) that prevent operational disruptions. All these func-
tionalities allow you to establish a stable data center environment with high performance
and easy management.
In this chapter, we discuss Cisco Nexus switch functional planes, Cisco Nexus switch
process separation and restartability, and VRF configuration and verification on Cisco
NX-OS. We will also discuss Cisco Nexus 7000 switch VDCs, including VDC architec-
ture, VDC types, VDC resources, VDC fault isolation, VDC high availability, VDC man-
agement, along with VDC configuration and verification.
■ Data plane: Handles all the data traffic. The basic functionality of a Cisco NX-OS
device is to forward packets from one interface to another. The packets that are not
meant for the switch itself are called the transit packets. These packets are handled
by the data plane.
■ Control plane: Handles all routing protocol control traffic. These protocols, such as
the Border Gateway Protocol (BGP) and the Open Shortest Path First (OSPF) pro-
tocol, send control packets between devices. These packets are destined to router
addresses and are called control plane packets.
■ Management plane: Runs the components meant for Cisco NX-OS device
management purposes, such as the command-line interface (CLI) and Simple
Network Management Protocol (SNMP).
The Cisco NX-OS device provides control plane policing (CoPP), which protects the
control plane and separates it from the data plane, thus ensuring network stability, reach-
ability, and packet delivery. The CoPP feature allows a policy map to be applied to the
control plane. This policy map looks like a normal quality of service (QoS) policy and is
applied to all traffic entering the switch from a non-management port.
The Cisco Nexus switch supervisor module has both the management plane and control
plane and is critical to the operation of the network. Any disruption to or attacks against
the supervisor module will result in serious network outages. For example, excessive traf-
fic to the supervisor module could overload and slow down the performance of the entire
Cisco NX-OS device. To protect the control plane, the Cisco NX-OS device segregates
different packets destined for the control plane into different classes. Once these classes
are identified, the Cisco NX-OS device polices the packets, which ensures that the super-
visor module is not overwhelmed.
■ Receive packets: Packets that have the destination address of a router. The destina-
tion address can be a Layer 2 address (such as a router MAC address) or a Layer 3
address (such as the IP address of a router interface). These packets include router
updates and keepalive messages. Multicast packets can also be in this category,
where packets are sent to multicast addresses used by a router.
■ Exception packets: Packets that need special handling by the supervisor module.
For example, if a destination address is not present in the Forwarding Information
Base (FIB) and results in a miss, the supervisor module sends an ICMP unreachable
packet back to the sender. Another example is a packet with IP options set.
■ Glean packets: If a Layer 2 MAC address for a destination IP address is not present
in the FIB, the supervisor module receives the packet and sends an ARP request to
the host.
All of these different packets could be maliciously used to attack the control plane and
overwhelm the Cisco NX-OS device. CoPP classifies these packets to different classes
and provides a mechanism to individually control the rate at which the supervisor mod-
ule receives these packets. For example, you might want to be less strict with a protocol
packet such as Hello messages but more strict with a packet that is sent to the supervisor
module because the IP option is set. You configure packet classifications and rate-
controlling policies using class maps and policy maps.
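For example, applying a CoPP profile and checking the result might look like the following sketch.
N9K(config)# copp profile strict
N9K(config)# exit
N9K# show copp status
N9K# show policy-map interface control-plane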
Table 5-1 summarizes the NX-OS CLI commands related to CoPP verification.
Example 5-1 shows the CoPP verification on a standalone Nexus 9000 switch.
! Reviewing CoPP status. In this output N9K is using strict profile for the CoPP.
N9K# show copp status
Last Config Operation: copp profile strict
Last Config Operation Timestamp: 13:12:26 UTC Dec 12 2021
Last Config Operation Status: Success
Policy-map attached to the control-plane: copp-system-p-policy-strict
ip access-list copp-system-p-acl-auto-rp
permit ip any 224.0.1.39/32
permit ip any 224.0.1.40/32
ip access-list copp-system-p-acl-bgp
permit tcp any gt 1023 any eq bgp
permit tcp any eq bgp any gt 1023
ipv6 access-list copp-system-p-acl-bgp6
permit tcp any gt 1023 any eq bgp
! Viewing the statistics that are compiled for the CoPP class-maps.
N9K# show policy-map interface control-plane
Control Plane
dropped 0 bytes;
5-min violate rate 0 byte/sec
violated 0 peak-rate byte/sec
module 1 :
transmitted 100947782 bytes;
5-minute offered rate 15 bytes/sec
conformed 51 peak-rate bytes/sec
at Wed Dec 22 08:22:02 2021
dropped 0 bytes;
5-min violate rate 0 byte/sec
violated 0 peak-rate byte/sec
<output omitted>
! Filtering the CoPP statistics to obtain an aggregate view of conformed and vio-
lated counters for all the CoPP class-maps.
N9K# show policy-map interface control-plane | include class|conform|violated
class-map copp-system-p-class-l3uc-data (match-any)
conformed 0 peak-rate bytes/sec
violated 0 peak-rate byte/sec
class-map copp-system-p-class-critical (match-any)
conformed 51 peak-rate bytes/sec
violated 0 peak-rate byte/sec
class-map copp-system-p-class-important (match-any)
conformed 66 peak-rate bytes/sec
violated 0 peak-rate byte/sec
class-map copp-system-p-class-openflow (match-any)
conformed 0 peak-rate bytes/sec
violated 0 peak-rate byte/sec
class-map copp-system-p-class-multicast-router (match-any)
conformed 19 peak-rate bytes/sec
violated 0 peak-rate byte/sec
class-map copp-system-p-class-multicast-host (match-any)
conformed 0 peak-rate bytes/sec
violated 0 peak-rate byte/sec
<output omitted>
! Comparing CoPP profiles. In this output we are comparing dense and strict CoPP
profiles.
N9K# show copp diff profile dense profile strict
Prior Profile Doesn't Exist.
- class copp-system-p-class-l3uc-data
- set cos 1
- police cir 800 kbps bc 32000 bytes conform transmit violate drop
- class copp-system-p-class-critical
- set cos 7
- police cir 36000 kbps bc 1280000 bytes conform transmit violate drop
- class copp-system-p-class-important
- set cos 6
- police cir 2500 kbps bc 1280000 bytes conform transmit violate drop
- class copp-system-p-class-openflow
- set cos 5
- police cir 1000 kbps bc 32000 bytes conform transmit violate drop
- class copp-system-p-class-multicast-router
- set cos 6
- police cir 2600 kbps bc 128000 bytes conform transmit violate drop
<output omitted>
The Cisco NX-OS service restart features allow you to restart a faulty service without
restarting the supervisor to prevent process-level failures from causing system-level
failures. You can restart a service depending on current errors, failure circumstances,
and the high-availability policy for the service. A service can undergo either a stateful or
stateless restart. Cisco NX-OS allows services to store runtime state information and mes-
sages for a stateful restart. In a stateful restart, the service can retrieve this stored state
information and resume operations from the last checkpoint service state. In a stateless
restart, the service can initialize and run as if it had just been started with no prior state.
Not all services are designed for a stateful restart. For example, Cisco NX-OS does not
store runtime state information for Layer 3 routing protocols such as Open Shortest Path
First (OSPF) and Routing Information Protocol (RIP). Their configuration settings are
preserved across a restart, but these protocols are designed to rebuild their operational
state using information obtained from neighbor routers.
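For example, a misbehaving OSPF process could be restarted in place and then checked with a sketch such as the following; the instance tag 1 is illustrative.
N9K# restart ospf 1
N9K# show system internal sysmgr service name ospf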
■ System Manager: The system manager directs overall system function, service man-
agement, and system health monitoring and enforces high-availability policies. The
system manager is responsible for launching, stopping, monitoring, and restarting
services as well as initiating and managing the synchronization of service states and
supervisor states for a stateful switchover.
■ Persistent storage service: Cisco NX-OS services use the persistent storage service
(PSS) to store and manage operational runtime information. The PSS component
works with system services to recover states in the event of a service restart. PSS
functions as a database of state and runtime information that allows services to make
a checkpoint of their state information whenever needed. A restarting service can
recover the last-known operating state that preceded a failure, which allows for a
stateful restart. Each service that uses PSS can define its stored information as pri-
vate (it can be read only by that service) or shared (the information can be read by
other services). If the information is shared, the service can specify that it is local
(the information can be read only by services on the same supervisor) or global (it
can be read by services on either supervisor or on modules). For example, if the PSS
information of a service is defined as shared and global, services on other modules
can synchronize with the PSS information of the service that runs on the active
supervisor.
■ Message and transaction service: The message and transaction service (MTS) is a
high-performance interprocess communications (IPC) message broker. MTS handles
message routing and queuing between services on and across modules and between
supervisors. MTS facilitates the exchange of messages such as event notification,
synchronization, and message persistency between system services and system com-
ponents. MTS can maintain persistent messages and logged messages in queues for
access even after a service restart.
■ High Availability (HA) policies: Cisco NX-OS allows each service to have an associ-
ated set of internal HA policies that define how a failed service is restarted. Each
service can have four defined policies—a primary policy and secondary policy when
two supervisors are present and a primary policy and secondary policy when only
one supervisor is present. If no HA policy is defined for a service, the default HA
policy to be performed upon a service failure is a switchover (if two supervisors are
present) or a supervisor reset (if only one supervisor is present).
■ All Layer 3 interfaces exist in the default VRF until they are assigned to another
VRF.
■ Routing protocols run in the default VRF context unless another VRF context is
specified.
■ The default VRF uses the default routing context for all show commands.
■ The default VRF is similar to the global routing table concept in Cisco IOS.
All unicast and multicast routing protocols support VRFs. When you configure a routing
protocol in a VRF, you set routing parameters for the VRF that are independent of rout-
ing parameters in another VRF for the same routing protocol instance. You can assign
interfaces and route protocols to a VRF to create virtual Layer 3 networks. An interface
exists in only one VRF.
By default, Cisco NX-OS uses the VRF of the incoming interface to select which routing
table to use for a route lookup. You can configure a route policy to modify this behavior
and set the VRF that Cisco NX-OS uses for incoming packets.
A fundamental feature of the Cisco NX-OS architecture is that every IP-based feature
is “VRF aware.” Table 5-2 shows VRF-aware services that can select a particular VRF to
reach a remote server or to filter information based on the selected VRF.
■ When you make an interface a member of an existing VRF, Cisco NX-OS removes all
Layer 3 configurations. You should configure all Layer 3 parameters after adding an
interface to a VRF.
■ If you configure an interface for a VRF before the VRF exists, the interface is opera-
tionally down until you create the VRF.
■ Cisco NX-OS creates the default and management VRFs by default. You should add
the mgmt0 interface to the management VRF and configure the mgmt0 IP address
and other parameters after you add it to the management VRF.
■ The write erase boot command does not remove the management VRF
configurations. You must use the write erase command and then the write erase
boot command to remove the management VRF configurations.
Table 5-3 summarizes the NX-OS CLI commands that are related to basic VRF configura-
tion and verification.
Table 5-3 Summary of NX-OS CLI Commands for VRF Configuration and Verification

configure terminal: Enters global configuration mode.
[no] vrf context name: Creates a new VRF and enters VRF configuration mode. The name can be any case-sensitive, alphanumeric string up to 32 characters. Using the no option with this command deletes the VRF and all associated configurations.
interface interface-type slot/port: Enters interface configuration mode.
vrf member vrf-name: Adds this interface to a VRF.
show vrf [vrf-name]: Displays VRF information.
Examples 5-2 to 5-6 show the basic VRF configuration and verification on the sample
topology shown in Figure 5-2. OSPF area 0 is preconfigured on the topology, and OSPF
neighborship is already fully functional. We will concentrate only on the VRF configura-
tion and its impact on the OSPF routing in this example.
Figure 5-2 Sample OSPF Topology: N9K-A (router ID 1.1.1.1, Lo0 10.10.10.10/32) and N9K-B (router ID 2.2.2.2, Lo0 20.20.20.20/32) in Area 0
Note OSPF fundamentals, along with configuration and verification, are covered
in detail in Chapter 6, “Nexus Switch Routing.”
interface mgmt0
vrf member management
ip address 10.10.1.6/24
! If you don't specify the vrf, a simple ping to the management interface will ini-
tiate the ping from the default vrf. Since the interface is under vrf management,
the ping will fail as default vrf is not aware of the management vrf interfaces.
N9K-A# ping 10.10.1.6
PING 10.10.1.6 (10.10.1.6): 56 data bytes
ping: sendto 10.10.1.6 64 chars, No route to host
Request 0 timed out
ping: sendto 10.10.1.6 64 chars, No route to host
Request 1 timed out
ping: sendto 10.10.1.6 64 chars, No route to host
Request 2 timed out
ping: sendto 10.10.1.6 64 chars, No route to host
Request 3 timed out
ping: sendto 10.10.1.6 64 chars, No route to host
Request 4 timed out
! Default vrf routing table is not aware of the management interface IP address.
N9K-A# show ip route
IP Route Table for VRF "default"
'*' denotes best ucast next-hop
! Pinging the management interface specifying the correct vrf. This time, the ping
will succeed because the management vrf routing table has reachability information
for all the IP addresses in the management vrf.
N9K-A# ping 10.10.1.6 vrf management
PING 10.10.1.6 (10.10.1.6): 56 data bytes
64 bytes from 10.10.1.6: icmp_seq=0 ttl=255 time=0.295 ms
64 bytes from 10.10.1.6: icmp_seq=1 ttl=255 time=0.178 ms
64 bytes from 10.10.1.6: icmp_seq=2 ttl=255 time=0.172 ms
64 bytes from 10.10.1.6: icmp_seq=3 ttl=255 time=0.169 ms
64 bytes from 10.10.1.6: icmp_seq=4 ttl=255 time=0.258 ms
N9K-A#
In Examples 5-3 and 5-4, we verify the OSPF configuration and end-to-end connectivity.
! Verifying if the interfaces are configured correctly. Note that all interfaces
are in vrf default.
N9K-A# show ip interface brief
IP Interface Status for VRF "default"(1)
Interface IP Address Interface Status
Lo0 10.10.10.10 protocol-up/link-up/admin-up
Eth1/3 192.168.1.45 protocol-up/link-up/admin-up
router ospf 1
router-id 1.1.1.1
interface loopback0
ip router ospf 1 area 0.0.0.0
interface Ethernet1/3
ip router ospf 1 area 0.0.0.0
! Verifying OSPF neighbors. Note that the OSPF neighborship is formed under vrf
default.
N9K-A# show ip ospf neighbors
OSPF Process ID 1 VRF default
Total number of neighbors: 1
Neighbor ID Pri State Up Time Address Interface
2.2.2.2 1 FULL/BDR 01:36:30 192.168.1.46 Eth1/3
! Verifying if the interfaces are configured correctly. Note that all interfaces
are in vrf default.
N9K-B# show ip interface brief
IP Interface Status for VRF "default"(1)
Interface IP Address Interface Status
Lo0 20.20.20.20 protocol-up/link-up/admin-up
Eth1/3 192.168.1.46 protocol-up/link-up/admin-up
router ospf 1
router-id 2.2.2.2
interface loopback0
ip router ospf 1 area 0.0.0.0
interface Ethernet1/3
ip router ospf 1 area 0.0.0.0
! Verifying OSPF neighbors. Note that the OSPF neighborship is formed under vrf
default.
N9K-B# show ip ospf neighbors
OSPF Process ID 1 VRF default
Total number of neighbors: 1
Neighbor ID Pri State Up Time Address Interface
1.1.1.1 1 FULL/DR 01:38:43 192.168.1.45 Eth1/3
In Examples 5-5 and 5-6, we configure the nondefault VRF DCFNDU on the Loopback 0
and Ethernet 1/3 interfaces on N9K-A and N9K-B and verify its impact on OSPFv2 rout-
ing. Since adding an interface to a VRF wipes its configuration, we need to reconfigure
the IP address and the OSPF configuration.
N9K-A
! Placing the Loopback 0 and Ethernet 1/3 interface into vrf instance DCFNDU.
N9K-A(config)# interface Loopback 0
N9K-A(config-if)# vrf member DCFNDU
Warning: Deleted all L3 config on interface loopback0
N9K-A(config-if)# ip address 10.10.10.10/32
N9K-A(config-if)# ip router ospf 1 area 0
N9K-A(config-if)# interface Ethernet 1/3
N9K-A(config-if)# vrf member DCFNDU
Warning: Deleted all L3 config on interface Ethernet1/3
N9K-A(config-if)# ip address 192.168.1.45/30
N9K-A(config-if)# ip router ospf 1 area 0
N9K-A(config-if)# end
N9K-A#
N9K-B
! Placing the Loopback 0 and Ethernet 1/3 interface into vrf instance DCFNDU.
N9K-B(config)# interface Loopback 0
N9K-B(config-if)# vrf member DCFNDU
Warning: Deleted all L3 config on interface loopback0
N9K-B(config-if)# ip address 20.20.20.20/32
N9K-B(config-if)# ip router ospf 1 area 0
N9K-B(config-if)# interface Ethernet 1/3
N9K-B(config-if)# vrf member DCFNDU
Warning: Deleted all L3 config on interface Ethernet1/3
N9K-B(config-if)# ip address 192.168.1.46/30
N9K-B(config-if)# ip router ospf 1 area 0
N9K-B(config-if)# end
N9K-B#
! Once the interfaces are moved under vrf instance DCFNDU, the ospf neighborship is
formed under vrf DCFNDU and not under default vrf.
N9K-A# show ip ospf neighbors
N9K-A# show ip ospf neighbors vrf DCFNDU
OSPF Process ID 1 VRF DCFNDU
Total number of neighbors: 1
Neighbor ID Pri State Up Time Address Interface
192.168.1.46 1 FULL/BDR 00:15:55 192.168.1.46 Eth1/3
! The OSPF routing table under vrf default doesn't show any routes because no
neighborship is formed under vrf instance default. All OSPF routes show up under
vrf instance DCFNDU.
N9K-A# show ip route ospf
IP Route Table for VRF "default"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>
! Ping between Loopback interfaces of N9K-A and N9K-B fails under vrf instance
default. End-to-End reachability between Loopback interfaces of N9K-A and N9K-B is
achieved only under vrf instance DCFNDU.
N9K-A# ping 20.20.20.20 source 10.10.10.10
ping: can't bind to address 10.10.10.10
VDCs virtualize the control plane, which includes all those software functions processed
by the CPU on the active supervisor module. A VDC contains its own unique and inde-
pendent set of VLANs and VRFs. Each VDC can have assigned to it physical ports,
thus allowing for the hardware data plane to be virtualized as well. Within each VDC, a
separate management domain can manage the VDC itself, thus allowing the management
plane itself to also be virtualized.
In its default state, the switch control plane runs a single device context (called VDC 1)
within which it will run approximately 80 processes. Some of these processes can have
other threads spawned, resulting in as many as 250 processes actively running on the sys-
tem at a time depending on the services configured. This single device context also has a
number of Layer 2 and 3 services running on top of the infrastructure and kernel compo-
nents of the OS, as shown in Figure 5-3.
This collection of processes constitutes what is seen as the control plane for a single
physical device (that being with no other virtual device contexts enabled). VDC 1 is
always active, always enabled, and can never be deleted. When you create a subsequent
(additional) VDC, the Cisco NX-OS software takes several of the control plane processes
and replicates them for each device context that exists in the switch. When this occurs,
duplication of VRF names and VLAN IDs is possible. For example, you could have a
VRF called “sales” in one device context and the same “sales” name applied to a VRF
in another virtual device context. Hence, each VDC administrator essentially interfaces
with its own set of processes and its own set of VRFs and VLANs, which in turn, rep-
resents its own logical (or virtual) switch context. This provides a clear delineation of
management contexts and forms the basis for configuration separation and independence
between VDCs.
Each VDC has a minimum of two VRF instances: a default VRF instance and a manage-
ment VRF instance. All Layer 3 interfaces and routing protocols exist in the default VRF
instance until they are assigned to another VRF instance. The mgmt0 interface exists in
the management VRF instance and is accessible from any VDC. Up to 4000 VRF instanc-
es per system are permitted. With each new VDC configured, the number of configu-
rable VRF instances per system is reduced by two because each VDC has a default VRF
instance and a management VRF instance that are not removable.
VDC Architecture
The Cisco NX-OS software provides the base upon which the VDCs are supported.
Figure 5-5 shows NX-OS in VDC mode.
At the heart of NX-OS are the kernel and the infrastructure layer. The kernel supports
all processes and all VDCs that run on the switch, but only a single instance of the kernel
exists at any one point in time. The infrastructure layer provides an interface between
the higher layer processes and the hardware resources of the physical switch (TCAM
and so on). Having a single instance of this layer reduces complexity (when managing the
hardware resources). Having a single infrastructure layer also helps scale performance by
avoiding duplication of this system’s management process.
Working under control of the infrastructure layer are a number of other important system
processes that also exist as unique entities. Of these, the VDC manager is a key process
when it comes to supporting VDCs. The VDC manager is responsible for the creation and
deletion of VDCs. More importantly, it provides VDC-related APIs for other infrastruc-
ture components, such as the system manager and resource manager, to perform their
own related functions.
When a VDC is created, the system manager is responsible for launching all services
required for VDC startup that run on a per-VDC basis. As new services are configured,
the system manager will launch the appropriate process. For example, if OSPF were
enabled in a VDC named Marketing, the system manager would launch an OSPF process
for that VDC. If a VDC is deleted, the system manager is responsible for tearing down all
related processes for that VDC.
The resource manager is responsible for managing the allocation and distribution of
resources between VDCs. VLANs, VRFs, port channels, and physical ports are examples
of the resources it manages.
Sitting above the infrastructure layer and its associated managers are processes that run
on a per-VDC basis. All the Layer 2 and Layer 3 protocol services run within a VDC.
Each protocol service started within a VDC runs independently of the protocol services
in other VDCs. The infrastructure layer protects the protocol services within a VDC so
that a fault or other problem in a service in one VDC does not impact other VDCs. The
Cisco NX-OS software creates these virtualized services only when a VDC is created.
Each VDC has its own instance of each service. These virtualized services are unaware
of other VDCs and only work on resources assigned to that VDC. Only a user with the
network-admin role can control the resources available to these virtualized services. (You
can find more on the network-admin role later in this chapter in the “VDC Management”
section.) The Cisco NX-OS software also creates a virtualized control plane for each
VDC that processes all the protocol-related events.
VDC Types
The use of VDCs with the Cisco Nexus 7000 Series Supervisor 2E or 3E modules allows
a single Cisco Nexus 7000 Series switch to be partitioned into up to eight VDCs: the
default VDC and seven additional VDCs. Another available choice is to create one admin
VDC and eight additional VDCs. More than four VDCs require additional licenses. The
VDC types are discussed in the sections that follow.
Default VDC
The physical device always has at least one VDC, the default VDC (VDC 1). When you
first log in to a new Cisco NX-OS device, you begin in the default VDC. Initially, all
hardware resources of the switch belong to the default VDC. The default VDC is a fully
functional VDC with all the capabilities and can be used for production traffic with no
issues. Some customers may choose to reserve it for administrative functions.
Some tasks can only be performed in the default VDC, including the following:
■ VDC creation/deletion/suspend
■ Licensing operations
The default VDC has a special role: it controls all hardware resources and can access all
other VDCs. VDCs are always created from the default VDC. Hardware resources, such
as interfaces and memory, are also allocated to other VDCs from the default VDC. Other
VDCs only have access to the resources allocated to them and cannot access any other
VDCs.
VDCs are separated on the data plane, control plane, and management plane. The only
exception to this rule is the default VDC, which can interact with the other VDCs on the
management plane. Control plane and data plane functions of the default VDC are still
separated from the other VDCs.
Admin VDC
You can enable an admin VDC at the initial system bootup through a setup script. It is an
optional step, and the creation of an admin VDC is not required. When an admin VDC is
enabled at bootup, it replaces the default VDC. An admin VDC is used for administrative
functions only and is not a fully functional VDC like the default VDC. If an admin VDC
is created, it does not count toward the maximum of eight VDCs on Cisco Nexus 7000
Series switches.
You can also change the default VDC to an admin VDC using one of the following methods:
■ When you enter the system admin-vdc command after bootup, the default VDC
becomes the admin VDC. The nonglobal configuration in the default VDC is lost
after you enter this command. This option is recommended for existing deployments
where the default VDC is used only for administration and does not pass any traffic.
■ You can change the default VDC to the admin VDC with the system admin-vdc
migrate new vdc name command. After you enter this command, the nonglobal
configuration on a default VDC is migrated to the new migrated VDC. This option is
recommended for existing deployments where the default VDC is used for produc-
tion traffic whose downtime must be minimized.
Once an admin VDC is created, it cannot be deleted and it cannot be changed back to
the default VDC without erasing the configuration and performing a fresh bootup.
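The two conversion methods map to the following commands. This is only a sketch; the confirmation prompts and warnings the switch displays are omitted, and the new VDC name is illustrative.
! Option 1: convert the default VDC to an admin VDC (nonglobal configuration is lost).
N7K-A(config)# system admin-vdc
! Option 2: convert and migrate the nonglobal configuration to a new VDC.
N7K-A(config)# system admin-vdc migrate Prod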
Admin VDCs are supported on Supervisor 1 and Supervisor 2/2e/3e modules. When an
admin VDC is enabled, only the mgmt0 port can be allocated to the admin VDC, which
means that for an admin VDC, only out-of-band management is possible through the
mgmt0 interface and console port. No other physical Ethernet or logical interfaces are
associated with the admin VDC.
The admin VDC provides access only to pure system administration tasks, including the
following:
■ Licensing operations
Nondefault VDC
Nondefault VDCs are created by the default VDC and are fully functional VDCs with
all capabilities. Changes done in a nondefault VDC only affect that particular VDC.
Each nondefault VDC has its own discrete configuration file and checkpoints. Nondefault
VDCs run independent processes for each protocol per VDC and thus provide fault isola-
tion. Nondefault VDCs can be of the Ethernet type or the storage type. VDCs that have only
Ethernet interfaces allocated to them are called Ethernet VDCs; they do not have any
storage ports, such as FCoE ports, allocated to them.
Storage VDC
Beginning with Cisco NX-OS Release 5.2(1), Nexus 7000 Series devices support Fibre
Channel over Ethernet (FCoE). To run FCoE, a dedicated storage VDC should be config-
ured on the Cisco Nexus 7000 Series devices. The storage VDC is one type of nondefault
VDC. A storage VDC separates LAN and SAN traffic on the same switch, maintaining one
physical infrastructure with separate logical data paths.
A storage VDC creates a virtual MDS switch within the Nexus 7000 chassis and par-
ticipates as a full Fibre Channel forwarder (FCF) in the network. A storage VDC can be
configured with zoning, a Fibre Channel alias, Fibre Channel domains, fabric binding, and
so on. After the storage VDC is created, FCoE VLANs can be configured, and interfaces
are specified as dedicated FCoE interfaces or shared interfaces. A shared interface can
carry both Ethernet and FCoE traffic; however, storage traffic is processed in the storage
VDC, while Ethernet traffic is processed in the Ethernet VDC. Traffic is split based
on Ethertype: traffic from the storage protocol is sent to the storage VDC, while the rest
is sent to the Ethernet VDC, as you can see in Figure 5-9.
The Ethernet VDC administratively “owns” the interface. The shared port must be config-
ured as an 802.1Q trunk in the Ethernet VDC. All ports on the ASIC (port group) must be
configured for sharing. Shutting down a shared interface in the Ethernet VDC shuts down
both Ethernet and storage VDC interfaces. However, shutting down a shared interface in
the storage VDC only shuts down the FCoE interface, not the Ethernet interface.
Figure 5-9 Ethernet and Storage VDCs Sharing an Interface to a Server CNA: Only FCoE Initialization Protocol (FIP, Ethertype 0x8914) and FCoE (Ethertype 0x8906) Frames Are Directed to the Storage VDC; All Other Ethertypes Are Directed to the Ethernet VDC
Although a storage VDC does not require the advanced license needed for additional VDCs,
it requires the FCoE license to enable the FCoE function on the modules. There can be only one storage
VDC on the Cisco Nexus 7000 Series device. A default VDC cannot be configured as the
storage VDC. Only the Cisco Nexus 7000 F-series module supports the storage VDC.
The M-series I/O modules do not support storage VDCs. F1 and F2/F2e Series modules
cannot be intermixed in the storage VDC.
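A minimal creation sketch follows, assuming the FCoE license is installed and F-series ports are available; the VDC name and interface range are illustrative, and additional FCoE configuration (VLANs, vFC interfaces, zoning) is omitted.
N7K-A# configure terminal
N7K-A(config)# vdc fcoe-vdc type storage
N7K-A(config-vdc)# allocate interface Ethernet4/1-4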
Figure 5-10 shows three VDCs configured on the same physical Nexus 7000 Switch:
M1-F1 mixed VDC, M1-XL only VDC, and F2 only VDC.
Table 5-4 shows the VDC module type compatibilities for Cisco NX-OS Release 8.x, the
latest version at the time of this writing.
VDC Resources
When you’re creating VDCs, certain resources are shared across VDCs while others must
be dedicated to a VDC. Three types of VDC resources can be defined:
■ Global resources: These resources can only be allocated, set, or configured glob-
ally for all VDCs. These resources include boot image configuration, Ethanalyzer
session, NTP servers, CoPP configuration, and in-band SPAN sessions. For example,
the boot string specifies the version of software that is used when the switch boots up;
it is not possible to run different versions of the Cisco NX-OS software in different VDCs.
■ Shared resources: These resources are shared between VDCs, such as the OOB
Ethernet management port on the supervisor. For example, if multiple VDCs are
configured and accessible from the management interface, they must share it, and
the OOB management interface cannot be allocated to a VDC like other regular
switch ports. The management interfaces of the VDCs should be configured with IP
addresses from the same IP subnet, within the management VRF instance.
■ Dedicated resources: These resources are allocated to a particular VDC, such as
VLANs, VRFs, port channels, and physical interfaces.
Types of Resources
VDC resources can also be classified into two broad categories: physical and logical.
Physical Resources
Network admins can allocate physical device resources exclusively for the use of a VDC.
Once a resource is assigned to a specific VDC, you can manage it only from that VDC.
Users logging directly into the VDC can only see this limited view of the device and can
manage only those resources that the network administrator explicitly assigns to that
VDC. Users within a VDC cannot view or modify resources in other VDCs.
The only physical resources that you can allocate to a VDC are the Ethernet interfaces.
For the Ethernet VDCs, each physical Ethernet interface can belong to only one VDC,
including the default VDC, at any given time. When you are working with shared inter-
faces in the storage VDC, the physical interface can belong to both one Ethernet VDC
and one storage VDC simultaneously, but to no more than one of each.
Initially, all physical interfaces belong to the default VDC (VDC 1). When you create a
new VDC, the Cisco NX-OS software creates the virtualized services for the VDC with-
out allocating any physical interfaces to it. After you create a new VDC, you can allocate
a set of physical interfaces from the default VDC to the new VDC. When you allocate an
interface to a VDC, all configuration for that interface is erased. The VDC administrator
must configure the interface from within the VDC. Only the interfaces allocated to the
VDC are visible for configuration. On many I/O modules, any port can be individually
assigned to a VDC. The exceptions to this rule include modules whose architecture uses
port groups that consist of two, four, or eight ports each, where the ports within the
same port group share some common hardware elements. In this case, ports in the same
group must be assigned to the same VDC. For example, the N7K-M132XP-12 line card
has eight port groups of four interfaces each, for a total of 32 interfaces; all N7K-
M132XP-12 line cards therefore require interface allocation in groups of four ports, and
interfaces belonging to the same port group must belong to the same VDC.
Figure 5-12 illustrates the interface allocation for port groups on a Cisco N7K-
M132XP-12 line card.
Figure 5-12 Interface Allocation for Port Groups on Cisco N7K-M132XP-12 Line Card
Note Beginning with Cisco NX-OS Release 5.2(1) for Cisco Nexus 7000 Series devices,
all members of a port group are automatically allocated to the VDC when you allocate an
interface.
Logical Resources
Each VDC acts as a separate logical device within a single physical device, which means
that all the namespaces are unique within a VDC. However, you cannot use an identical
namespace within a storage VDC and an Ethernet VDC.
When you create a VDC, it has its own default VLAN and VRF that are not shared with
other VDCs. You can also create other logical entities within a VDC for the exclusive
use of that VDC. These logical entities, which include SPAN monitoring sessions, port
channels, VLANs, and VRFs, are for Ethernet VDCs. When you create a logical entity in
a VDC, only users in that VDC can use it even when it has the same identifier as another
logical entity in another VDC. A VDC administrator can configure VLAN IDs indepen-
dently of the VLAN IDs used in other Ethernet VDCs on the same physical device. For
example, if VDC administrators for Ethernet VDC A and Ethernet VDC B both create
VLAN 10, these VLANs are internally mapped to separate unique identifiers as shown in
Figure 5-13. But remember, when you are working with both storage VDCs and Ethernet
VDCs, the VLAN ID and logical entity must be entirely separate for the storage VDCs.
Figure 5-13 The Same VLAN ID (VLAN 10) Configured Independently in VDC A and VDC B on One Physical Device
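A hypothetical sketch of the overlap shown in Figure 5-13 follows; the VDC names match the figure, while the VLAN names are invented for illustration.
N7K-A# switchto vdc VDC-A
N7K-A-VDC-A# configure terminal
N7K-A-VDC-A(config)# vlan 10
N7K-A-VDC-A(config-vlan)# name Sales-A
N7K-A-VDC-A(config-vlan)# end
N7K-A-VDC-A# switchback
N7K-A# switchto vdc VDC-B
N7K-A-VDC-B# configure terminal
N7K-A-VDC-B(config)# vlan 10
N7K-A-VDC-B(config-vlan)# name Sales-B
! Both VDCs use VLAN ID 10, but the switch maps them to separate internal identifiers.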
CPU shares are supported on Supervisor 2/2e/3e modules. You can configure the number
of CPU shares on a VDC. For example, a VDC with 10 CPU shares gets twice the CPU
time of a VDC that has 5 CPU shares, as shown in Figure 5-14.
Figure 5-14 CPU Shares Assigned per VDC (VDC 1: 8 Shares, VDC 2: 10 Shares, VDC 3: 5 Shares)
CPU shares are not used to limit the CPU available to a VDC but to allocate CPU cycles
to a VDC with a higher priority during times of congestion. CPU share configuration takes
effect immediately; there is no need to restart or reload the switch. The CPU time a VDC's
processes receive is proportional to the CPU shares assigned to that VDC relative to the
total CPU shares assigned across all VDCs.
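To make the proportion concrete with the share values from Figure 5-14 (and assuming only those three VDCs are defined), the total is 8 + 10 + 5 = 23 shares. During congestion, VDC 2 would therefore receive roughly 10/23, or about 43 percent, of the CPU time, VDC 1 about 35 percent, and VDC 3 about 22 percent. These figures illustrate the proportional model only; they are not output from a device.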
Note You cannot change the limits in the default VDC resource template.
Figure 5-15 shows how the distributed Layer 2 learning process is affected by the pres-
ence of VDCs. On line card 1, MAC address A is learned from port 1/2 in VDC 10. This
address is installed in the local Layer 2 forwarding table of line card 1. The MAC address
is then forwarded to both line cards 2 and 3. As line card 3 has no ports that belong
to VDC 10, it will not install any MAC addresses learned from that VDC. Line card 2,
however, does have a local port in VDC 10, so it will install MAC address A into its local
forwarding tables.
Figure 5-15 Distributed Layer 2 Learning with VDCs: MAC Address A, Learned on Port 1/2 in VDC 10, Is Propagated to Line Cards 2 and 3, but Only Line Card 2 Installs It Because It Has a Local Port in VDC 10
With this implementation of Layer 2 learning, the Cisco Nexus 7000 Series switch offers
a way to scale the use of the Layer 2 MAC address table more efficiently when VDCs are
unique to line cards.
When the default VDC is the only active VDC, learned routes and ACLs are loaded into
each line card TCAM table so that each line card has the necessary information local to it
to make an informed forwarding decision. Figure 5-16 illustrates this process, where the
routes for the default “red” VDC are present in the FIB and ACL TCAMs.
When physical port resources are split between VDCs, only the line cards associated
with that VDC are required to store forwarding information and associated ACLs. In this
way, the resources can be scaled beyond the default system limits seen in the preceding
example. Table 5-5 illustrates a resource separation example.
Allocating a subset of ports to a given VDC results in the FIB and ACL
TCAM for the respective line cards being primed with the forwarding information and
ACLs for that VDC. This extends the use of those TCAM resources beyond the simple
system limit described earlier. In the preceding example, a total of 188,000 forwarding
entries have been installed in a switch that, without VDCs, would have a system limit
of 128,000 forwarding entries. Likewise, a total of 80,000 ACEs have been installed,
where a single VDC would only allow 64,000 access control entries. More important,
FIB and ACL TCAM space on line cards 4, 6, 7, and 8 is free for use by additional VDCs
that might be created. This further extends the use of those resources well beyond the
defined system limits noted here.
Figure 5-18 shows that a fault in a process running in VDC 1 does not impact any of the
running processes in the other VDCs. Other equivalent processes will continue to run
uninhibited by any problems associated with the faulty running process.
Fault isolation is enhanced by the ability to run per-VDC debug commands. Per-VDC
logging of messages via syslog is another important characteristic of the VDC fault
isolation capabilities. Combined, these two features give administrators a powerful tool
for locating problems.
The creation of multiple VDCs also permits configuration isolation. Each VDC has its
own unique configuration file that is stored separately in NVRAM. There are a number
of resources in each VDC whose associated numbers and IDs can overlap between mul-
tiple VDCs without having an effect on another VDC’s configuration. For example, the
same VRF IDs, port channel numbers, VLAN IDs, and management IP address can exist
on multiple VDCs. More important, configuration separation in this manner not only
secures configurations between VDCs but also isolates a VDC from being affected by an
erroneous configuration change in another VDC.
Should a control plane failure occur, the administrator has a set of options that can be
configured on a per-VDC basis defining what action will be taken regarding that VDC.
Three actions can be configured: restart, bringdown, and reset. The restart option will
delete the VDC and then re-create it with the running configuration. This configured
action will occur regardless of whether there are dual supervisors or a single supervi-
sor present in the chassis. The bringdown option will simply delete the VDC. The reset
option will issue a reset for the active supervisor when there is only a single supervisor
in the chassis. If dual supervisors are present, the reset option will force a supervisor
switchover.
The default VDC always has a high-availability option of reset assigned to it, and this
cannot be changed. Subsequently created VDCs default to a single-supervisor action of
restart and a dual-supervisor action of switchover (as the show vdc detail output in
Example 5-8 confirms); these values can be changed under configuration control.
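Using the ha-policy syntax summarized later in Table 5-7, a hedged configuration sketch for overriding these defaults on a VDC named Pod8 might look as follows (the chosen actions are illustrative):
N7K-A# configure terminal
N7K-A(config)# vdc Pod8 ha-policy dual-sup switchover single-sup bringdown
! Verify the result with show vdc detail (vdc ha policy and vdc dual-sup ha policy fields).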
Stateful switchover is supported with dual supervisors in the chassis. During the course
of normal operation, the primary supervisor will constantly exchange and synchronize
its state with the redundant supervisor. A software process (watchdog) is used to moni-
tor the responsiveness of the active (primary) supervisor. Should the primary supervisor
fail, a fast switchover is enacted by the system. Failover occurs at both the control plane
and data plane layers. At supervisor switchover, the data plane continues to use the Layer
2– and Layer 3–derived forwarding entries simply by maintaining the state written into
the hardware. For the control plane, the graceful restart process that is part of nonstop
forwarding (NSF) is used to provide failover for Layer 3. For Layer 2, the control plane is
maintained by locally stateful PSS mechanisms. This process provides for the following:
■ A nondisruptive recovery mechanism that will not render the network unstable
during the recovery process
Table 5-6 shows the result of various policy configurations, depending on single-
supervisor or dual-supervisor module configuration.
ISSU is another important aspect of high availability that has a direct effect on VDCs.
ISSU allows the administrator to install and activate a new version of software in a chas-
sis that is running two supervisors. The software upgrade can be applied to the backup
supervisor, and then a switchover to that upgraded supervisor is invoked. The other
supervisor is then upgraded with the same new set of software; all the while the system
maintains data flow without interruption. ISSU cannot be applied on a per-VDC basis.
The installed software on the chassis is applicable for all active VDCs.
VDC Management
When several departments share a single physical device and each department has its
own administrator, sharing one administrator account with every department admin
presents a security concern. VDC user roles come to the rescue here. Using VDC user
roles, each VDC can be managed by a different VDC administrator. An action taken by a
VDC administrator in one VDC does not impact users in other VDCs. A VDC adminis-
trator within a VDC can create, modify, and delete the configuration for resources allo-
cated to that VDC with no impact on other VDCs.
The Cisco NX-OS software has default user roles that the network administrator can
assign to the user accounts that administer VDCs. These user roles make available a set
of commands the user can execute after logging into the device. All commands the user
is not allowed to execute are hidden from the user or return an error. You must have the
network-admin or vdc-admin role to create user accounts in a VDC.
The Cisco NX-OS software provides default user roles with different levels of authority
for VDC administration as follows:
■ network-admin: The first user account created on a Cisco Nexus 7000 Series switch
in the default VDC is the user “admin.” This user is automatically assigned the
network-admin role. The network-admin role, which exists only in the default VDC,
allows access to all the global configuration commands (such as reload and install)
and all the features on the physical device. A custom user role is not granted access
to these network-admin-only commands or to other commands that are scoped
admin-only. Only the network administrator can access all the commands related to
the physical state of the device. This role can perform system-impacting functions
such as upgrading software and running an Ethernet analyzer on the traffic. Network
administrators can create and delete VDCs, allocate resources for these VDCs, man-
age device resources reserved for the VDCs, and configure features within any VDC.
Network administrators can also access nondefault VDCs using the switchto vdc
command from the default VDC. When network administrators switch to a nonde-
fault VDC, they acquire vdc-admin permissions, which are the highest permissions
available in a nondefault VDC.
■ network-operator: The second default role that exists on Cisco Nexus 7000 Series
switches is the network-operator role. This role gives a user read-only rights in the
default VDC. The network-operator role, which exists only in the default VDC,
allows users to display information for all VDCs on the physical device. Users with
network-operator roles can access nondefault VDCs using the switchto vdc com-
mand from the default VDC. By default, there are no users assigned to this role. The
role must be specifically assigned to a user by a user who has network-admin rights.
■ vdc-admin: When a VDC is created, the first user account created on that VDC is
the user “admin,” similar to the way the admin user was created for the whole physi-
cal switch in default VDC. The admin user on a nondefault VDC is automatically
assigned the vdc-admin role. Users who have the vdc-admin role can configure all
features within a VDC. Users with either the network-admin or vdc-admin role can
create, modify, or remove user accounts within the VDC. All configurations for the
interfaces allocated to a VDC must be performed within the VDC. Users with the
vdc-admin role are not allowed to execute any configuration commands related to
the physical device.
■ vdc-operator: The vdc-operator role has read-only rights for a specific VDC. This
role has no rights to any of the other VDCs. Users assigned the vdc-operator role
can display information only for the VDC. Users with either the network-admin or
vdc-admin role can assign the vdc-operator role to user accounts within the VDC.
The vdc-operator role does not allow the user to change the configuration of the
VDC. When a user who has the network-admin or network-operator role accesses a
nondefault VDC using the switchto command, that user will be mapped to a role of
the same level in that VDC. A user with the network-admin role gets the vdc-admin role
in the nondefault VDCs, and a user with the network-operator role gets the vdc-operator
role in the nondefault VDCs.
Figure 5-19 shows various default user roles available for VDC administration.
Default VDC access is restricted to a select few administrators who are allowed to
modify the global configuration (the network-admin role). A few features (such as CoPP and
rate limits) can be configured only in the default VDC. If the default VDC is used for
data plane traffic, administrators who require default VDC configuration access but not
global configuration access should be assigned the vdc-admin role. This role restricts
administrative functions to the default VDC exclusively and prevents access to global
VDC configuration commands.
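A brief, hypothetical sketch of assigning these roles follows; the usernames and passwords are invented for illustration.
! From the default VDC, a network-admin creates a read-only operator account.
N7K-A(config)# username dcops password Str0ngPass1 role network-operator
! From within a nondefault VDC, the vdc-admin creates a VDC-scoped operator account.
N7K-A-Pod8(config)# username podops password Str0ngPass2 role vdc-operator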
Figure 5-20 illustrates that all the VDCs share the management network, but their virtual
management interface IP addresses are unique. Also, the services are using shared or
separate external services, such as syslog.
Figure 5-20 VDCs Sharing the Management Network, with a Shared Syslog Server (10.1.1.200) for VDC-1 and VDC-2 and a Separate Syslog Server (10.1.1.100) for VDC-3
Figure 5-21 illustrates how each VDC has management access from its own unique net-
work, and external services such as RADIUS and syslog are unique to each VDC.
Figure 5-21 Per-VDC Management Networks, with Dedicated AAA, Syslog, and SSH Services for Each VDC
VDC Configuration
Table 5-7 summarizes the NX-OS CLI commands related to basic VDC configuration and
verification.
Table 5-7 Summary of NX-OS CLI Commands for VDC Configuration and Verification
show resource
  Displays the VDC resource configuration for the current VDC.
show vdc [vdc-name]
  Displays the VDC configuration information.
show vdc detail
  Displays the VDC status information.
show running-config {vdc | vdc-all}
  Displays the VDC information in the running configuration.
show module
  Displays module information.
configure terminal
  Enters global configuration mode.
vdc {switch | vdc-name} [ha-policy {dual-sup {bringdown | restart | switchover} [single-sup {bringdown | reload | restart}]}] [id vdc-number] [template template-name] [type storage]
  Creates a VDC and enters the VDC configuration mode. The switch keyword specifies the
  default VDC; VDC number 1 is reserved for the default VDC.
[no] allocate interface ethernet slot/port - last-port
  Allocates a range of interfaces on the same module to the VDC.
limit-resource vrf minimum min-value maximum {max-value | equal-to-min}
  Specifies the limits for VRF instances. The equal-to-min keyword automatically sets the
  maximum limit equal to the minimum limit.
show vdc membership [status]
  Displays the status of VDC interface membership.
switchto vdc vdc-name
  Switches to the nondefault VDC.
show user-account
  Displays the role configuration.
Examples 5-7 and 5-8 show the basic VDC configuration and verification on a standalone
Nexus 7000 Series switch. In Example 5-7, we create a nondefault VDC named Pod8
from the admin VDC and allocate interfaces to it. In Example 5-8, we set up the newly
created nondefault VDC and do the final verification from the newly created nondefault
VDC Pod8 itself.
! Checking the configured resources for the Admin VDC. In the Admin VDC, the number of
VRFs is limited to 2048.
N7K-A# show resource
monitor-session-erspan-dst 0 23 0 0 23
vrf 2 2048 2 0 2046
port-channel 0 768 0 0 767
u4route-mem 96 96 1 95 95
u6route-mem 24 24 1 23 23
m4route-mem 58 58 0 58 58
m6route-mem 8 8 0 8 8
monitor-session-inband-src 0 1 0 0 1
anycast_bundleid 0 16 0 0 16
monitor-session-mx-excepti 0 1 0 0 1
monitor-session-extended 0 12 0 0 12
monitor-rbs-filter 0 12 0 0 12
monitor-rbs-product 0 12 0 0 12
! Checking running-config of Admin VDC. Note that the Default VDC was converted to
Admin VDC using the command system admin-vdc.
N7K-A# show running-config vdc | begin admin
system admin-vdc
vdc N7K-A id 1
cpu-share 5
limit-resource vlan minimum 16 maximum 4094
limit-resource monitor-session minimum 0 maximum 2
limit-resource monitor-session-erspan-dst minimum 0 maximum 23
limit-resource vrf minimum 2 maximum 2048
limit-resource port-channel minimum 0 maximum 768
limit-resource u4route-mem minimum 96 maximum 96
limit-resource u6route-mem minimum 24 maximum 24
limit-resource m4route-mem minimum 58 maximum 58
limit-resource m6route-mem minimum 8 maximum 8
limit-resource monitor-session-inband-src minimum 0 maximum 1
limit-resource anycast_bundleid minimum 0 maximum 16
limit-resource monitor-session-mx-exception-src minimum 0 maximum 1
limit-resource monitor-session-extended minimum 0 maximum 12
limit-resource monitor-rbs-filter minimum 0 maximum 12
limit-resource monitor-rbs-product minimum 0 maximum 12
<output omitted>
! Checking the modules installed in the Nexus 7000 switch chassis. All installed line cards
are F3 modules. Because the switch has a Sup2E supervisor module, it supports the Admin
VDC plus 8 nondefault VDCs, for a total of 9 VDCs. Currently only the Admin VDC plus 7
nondefault VDCs are configured.
N7K-A# show module
Mod Sw Hw
--- --------------- ------
1 8.4(2) 6.1
3 8.4(2) 1.0
4 8.4(2) 1.0
5 8.4(2) 1.1
6 8.4(2) 1.1
7 8.4(2) 1.1
8 8.4(2) 1.1
<output omitted>
! The Pod8 VDC was automatically assigned vdc_id 9 and, by default, supports the m1, m1xl,
m2xl, and f2e line card types.
N7K-A(config-vdc)# show vdc
! Because only F3 line cards are installed in the chassis, we limit the module-type resource
to f3 modules and allocate ports from modules 4 and 8 to the newly created nondefault
VDC Pod8.
N7K-A(config-vdc)# limit-resource module-type f3
This will cause all ports of unallowed types to be removed from this vdc. Continue
(y/n)? [yes] y
N7K-A(config-vdc)# allocate interface Ethernet4/25-32
Moving ports will cause all config associated to them in source vdc to be removed.
Are you sure you want to move the ports (y/n)? [yes] y
N7K-A(config-vdc)# allocate interface Ethernet8/7-12
Moving ports will cause all config associated to them in source vdc to be removed.
Are you sure you want to move the ports (y/n)? [yes] y
! Verifying the running-config of the newly created nondefault VDC Pod8. Note that the
maximum number of VRFs that can be configured on the Pod8 VDC is 4096.
N7K-A(config-vdc)# show running-config vdc | begin Pod8
vdc Pod8 id 9
limit-resource module-type f3
allow feature-set fabricpath
allow feature-set fabric
cpu-share 5
allocate interface Ethernet4/25-32
allocate interface Ethernet8/7-12
boot-order 1
limit-resource vlan minimum 16 maximum 4094
limit-resource monitor-session minimum 0 maximum 2
limit-resource monitor-session-erspan-dst minimum 0 maximum 23
limit-resource vrf minimum 2 maximum 4096
limit-resource port-channel minimum 0 maximum 768
limit-resource u4route-mem minimum 8 maximum 8
limit-resource u6route-mem minimum 4 maximum 4
limit-resource m4route-mem minimum 8 maximum 8
limit-resource m6route-mem minimum 5 maximum 5
limit-resource monitor-session-inband-src minimum 0 maximum 1
limit-resource anycast_bundleid minimum 0 maximum 16
limit-resource monitor-session-mx-exception-src minimum 0 maximum 1
limit-resource monitor-session-extended minimum 0 maximum 12
limit-resource monitor-rbs-filter minimum 0 maximum 12
limit-resource monitor-rbs-product minimum 0 maximum 12
<output omitted>
! Limiting the maximum number of VRFs to 2048 and verifying the change.
N7K-A(config-vdc)# limit-resource vrf minimum 2 maximum 2048
N7K-A(config-vdc)# show running-config vdc | begin Pod8 | include Pod8|vrf
vdc Pod8 id 9
limit-resource vrf minimum 2 maximum 2048
! Verifying the unallocated interfaces on the Nexus 7000 switch chassis. Note that
unallocated interfaces are assigned to vdc_id 0, not to the Admin VDC (vdc_id 1). The Pod8
VDC correctly shows the interfaces allocated in the previous steps.
N7K-A(config-vdc)# show vdc membership
Flags : b - breakout port
---------------------------------
<output omitted>
N7K-A(config-vdc)# end
N7K-A#
Example 5-8 Setup and Verification of the Newly Created Nondefault VDC Pod8
! Switching to the newly created nondefault VDC Pod8 and performing the initial setup of
the Pod8 VDC. The setup options are self-explanatory.
N7K-A# switchto vdc Pod8
This setup utility will guide you through the basic configuration of
the system. Setup configures only enough connectivity for management
of the system.
Would you like to enter the basic configuration dialog (yes/no): yes
Type of ssh key you would like to generate (dsa/rsa) [rsa]: rsa
Configuration update aborted: This vdc has had a global configuration change since
the last saved config. Please save config in default vdc before proceeding
! The show vdc command from within a nondefault VDC displays information only about the
nondefault VDC you are logged in to.
N7K-A-Pod8# show vdc
! Verifying the detailed VDC configuration of the nondefault VDC Pod8. Note the configured
vdc ha policy of RESTART and dual-sup ha policy of SWITCHOVER; these are the defaults
for single-supervisor and dual-supervisor module configurations.
N7K-A-Pod8# show vdc detail
vdc id: 9
vdc name: Pod8
vdc state: active
vdc mac address: e4:c7:22:15:2c:49
vdc ha policy: RESTART
vdc dual-sup ha policy: SWITCHOVER
vdc boot Order: 1
CPU Share: 5
CPU Share Percentage: 11%
vdc create time: Fri Jan 21 17:15:31 2022
vdc reload count: 0
vdc uptime: 0 day(s), 0 hour(s), 39 minute(s), 45 second(s)
vdc restart count: 1
vdc restart time: Fri Jan 21 17:15:31 2022
vdc type: Ethernet
vdc supported linecards: f3
Summary
This chapter discusses Cisco Nexus switch functional planes, Cisco Nexus switch
process separation and restartability, virtual routing and forwarding (VRF) instances, and
Cisco Nexus 7000 switch VDC configuration and verification on NX-OS, including the
following points:
■ A Cisco Nexus switch divides the traffic it manages into three functional components,
or planes: the data plane, control plane, and management plane.
■ The Cisco NX-OS service restart feature restarts a faulty service without restarting
the supervisor to prevent process-level failures from causing system-level failures.
■ A VRF virtualizes the Layer 3 forwarding and routing tables. Each VRF makes rout-
ing decisions independent of any other VRFs. Each NX-OS device has a default VRF
and a management VRF.
■ VDCs partition a single physical device into multiple logical devices that provide
fault isolation, management isolation, address allocation isolation, service differentia-
tion domains, and adaptive resource management.
■ There are three types of VDCs: default VDC, admin VDC, and nondefault VDC.
Nondefault VDCs are further classified as Ethernet VDCs and storage VDCs.
■ There are three types of VDC resources: global resources, dedicated resources, and
shared resources. VDC resources can also be classified as physical resources and
logical resources.
■ CPU shares are used to control the CPU resources among the VDCs to prioritize
VDC access to the CPU during CPU contention.
■ The Cisco NX-OS software provides four default user roles with different levels of
authority for VDC administration: network-admin, network-operator, vdc-admin, and
vdc-operator.
References
“Cisco Nexus 9000 Series NX-OS Security Configuration Guide, Release 10.2(x),”
https://www.cisco.com/c/en/us/td/docs/dcn/nx-os/nexus9000/102x/configuration/
Security/cisco-nexus-9000-nx-os-security-configuration-guide-102x.html
“Cisco Nexus 7000 Series NX-OS Security Configuration Guide, Release 8.x,” https://
www.cisco.com/c/en/us/td/docs/switches/datacenter/nexus7000/sw/security/config/
cisco_nexus7000_security_config_guide_8x.html
“Cisco Nexus 9000 Series NX-OS High Availability and Redundancy Guide, Release
10.1(x),” https://www.cisco.com/c/en/us/td/docs/dcn/nx-os/nexus9000/101x/con-
figuration/high-availability-and-redundancy/cisco-nexus-9000-series-nx-os-high-
availability-and-redundancy-guide-101x.html
“Cisco Nexus 7000 Series NX-OS High Availability and Redundancy Guide, Release 8.x,”
https://www.cisco.com/c/en/us/td/docs/switches/datacenter/nexus7000/sw/high-
availability/config/cisco_nexus7000_high_availability_config_guide_8x.html
“Cisco Nexus 9000 Series NX-OS Unicast Routing Configuration Guide, Release 10.2(x),”
https://www.cisco.com/c/en/us/td/docs/dcn/nx-os/nexus9000/102x/configuration/
Unicast-routing/cisco-nexus-9000-series-nx-os-unicast-routing-configuration-guide-
release-102x.html
“Cisco Nexus 7000 Series NX-OS Unicast Routing Configuration Guide, Release 8.x,”
https://www.cisco.com/c/en/us/td/docs/switches/datacenter/nexus7000/sw/unicast/
config/cisco_nexus7000_unicast_routing_config_guide_8x.html
“Cisco Nexus 7000 Series Virtual Device Context Configuration Guide, Release 8.x,”
https://www.cisco.com/c/en/us/td/docs/switches/datacenter/nexus7000/sw/vdc/con-
fig/cisco_nexus7000_vdc_config_guide_8x.html
Relevant Cisco Live sessions: http://www.ciscolive.com
Chapter 6
Modern-day data centers have rapidly evolving trends and technologies that underscore
the need for scale and fast convergence. In order to efficiently route traffic within the
data center, Interior Gateway Protocols (IGPs) are deployed either in the network core
or within the underlay network. The IGPs commonly used today are Open Shortest Path
First (OSPF) and Enhanced Interior Gateway Routing Protocol (EIGRP). In addi-
tion, traditional IP communication allows a host to send packets to a single host (unicast
transmission) or to all hosts (broadcast transmission). IP multicast provides a third pos-
sibility: allowing a host to send packets to a subset of all hosts as a group transmission.
Protocol-Independent Multicast (PIM) is the routing protocol for multicast forwarding.
In this chapter, we will discuss the underlying concepts along with the configuration and
verification for the Layer 3 unicast routing protocols in Cisco NX-OS, including RIPv2,
EIGRP, and OSPFv2. We will also discuss the multicast fundamentals, including PIM con-
figuration and verification in Cisco NX-OS.
Routing Fundamentals
Layer 3 unicast routing involves two basic activities: determining optimal routing paths
and packet switching. You can use routing algorithms to calculate the optimal path from
the router to a destination. This calculation depends on the algorithm selected, route met-
rics, and other considerations such as load balancing and alternate path discovery.
associations tell a router that an IP destination can be reached optimally by sending the
packet to a particular router that represents the next hop on the way to the final destina-
tion. When a router receives an incoming packet, it checks the destination address and
attempts to associate this address with the next hop.
Routing tables can contain other information, such as the data about the desirability of
a path. Routers compare metrics to determine optimal routes, and these metrics differ
depending on the design of the routing algorithm used.
Routers communicate with one another and maintain their routing tables by transmitting
a variety of messages. The routing update message is one such message that consists of
all or a portion of a routing table. By analyzing routing updates from all other routers, a
router can build a detailed picture of the network topology. A link-state advertisement,
which is another example of a message sent between routers, informs other routers of
the link state of the sending router. You can also use link information to enable routers to
determine optimal routes to network destinations.
A key aspect to measure for any routing algorithm is how much time a router takes to
react to network topology changes. When a part of the network changes for any reason,
such as a link failure, the routing information in different routers might not match. Some
routers will have updated information about the changed topology, while others will still
have the old information. Convergence is the amount of time it takes for all routers in
the network to have updated, matching routing information. The convergence time varies
depending on the routing algorithm. Fast convergence minimizes the chance of lost pack-
ets caused by inaccurate routing information.
Packet Switching
In packet switching, a host determines that it must send a packet to another host. Having
acquired a router address by some means, the source host sends a packet addressed spe-
cifically to the router physical (Media Access Control [MAC] layer) address but with the
IP (network layer) address of the destination host.
The router examines the destination IP address and tries to find the IP address in the
routing table. If the router does not know how to forward the packet, it typically drops
the packet. If the router knows how to forward the packet, it changes the destination
MAC address to the MAC address of the next-hop router and transmits the packet.
The next hop might be the ultimate destination host or another router that executes
the same switching decision process. As the packet moves through the internetwork,
its physical address changes but its protocol address remains constant, as shown in
Figure 6-1.
Figure 6-1 Packet Switching Through an Internetwork: The Destination MAC (Physical) Address Changes at Each Hop, While the Destination IP (Protocol) Address Remains Constant
Routing Metrics
Routing algorithms use many different metrics to determine the best route. Sophisticated
routing algorithms can base route selection on multiple metrics.
Path Length
The path length is the most common routing metric. Some routing protocols allow you to
assign arbitrary costs to each network link. In this case, the path length is the sum of the
costs associated with each link traversed. Other routing protocols define the hop count,
which is a metric that specifies the number of passes through internetworking products,
such as routers, that a packet must take from a source to a destination.
Reliability
The reliability, in the context of routing algorithms, is the dependability (in terms of the
bit-error rate) of each network link. Some network links might go down more often than
others. After a network failure, certain network links might be repaired more easily or more
quickly than other links. Reliability ratings are arbitrary numeric values that you usually
assign to network links based on whichever reliability factors you choose to take into account.
Routing Delay
The routing delay is the length of time required to move a packet from a source to a
destination through the internetwork. The delay depends on many factors, including the
bandwidth of intermediate network links, the port queues at each router along the way,
the network congestion on all intermediate network links, and the physical distance the
packet must travel. Because the routing delay is a combination of several important vari-
ables, it is a common and useful metric.
Bandwidth
The bandwidth is the available traffic capacity of a link. For example, a 10-Gigabit
Ethernet link is preferable to a 1-Gigabit Ethernet link. Although the bandwidth is the
maximum attainable throughput on a link, routes through links with greater bandwidth
do not necessarily provide better routes than routes through slower links. For example, if
a faster link is busier, the actual time required to send a packet to the destination could
be greater.
Load
The load is the degree to which a network resource, such as a router, is busy. You can
calculate the load in a variety of ways, including CPU usage and packets processed per
second. Monitoring these parameters on a continual basis can be resource intensive.
Communication Cost
The communication cost is a measure of the operating cost to route over a link. The com-
munication cost is another important metric, especially if you do not care about perfor-
mance as much as operating expenditures. For example, the line delay for a private line
might be longer than a public line, but you can send packets over your private line rather
than through the public lines, which cost money for usage time.
Router IDs
Each routing process has an associated router ID. You can configure the router ID to any
interface in the system. If you do not configure the router ID, Cisco NX-OS selects the
router ID based on the following criteria:
■ Cisco NX-OS prefers loopback0 over any other interface. If loopback0 does not
exist, Cisco NX-OS prefers the first loopback interface over any other interface type.
■ If you have not configured a loopback interface, Cisco NX-OS uses the first inter-
face in the configuration file as the router ID. If you configure any loopback inter-
face after Cisco NX-OS selects the router ID, the loopback interface becomes the
router ID. If the loopback interface is not loopback0 and you configure loopback0
with an IP address, the router ID changes to the IP address of loopback0.
■ If the IP address of the interface the router ID is based on changes, that new IP
address becomes the router ID. If any other interface changes its IP address, there is
no router ID change.
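As a hedged example using OSPF (the process tag 1 and the addresses are arbitrary), you can either let the router ID follow loopback0 or set it explicitly:
N9K-A(config)# feature ospf
N9K-A(config)# interface loopback0
N9K-A(config-if)# ip address 10.10.10.10/32
N9K-A(config-if)# exit
N9K-A(config)# router ospf 1
N9K-A(config-router)# router-id 10.10.10.10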
Autonomous Systems
An autonomous system (AS) is a portion of an internetwork under common admin-
istrative authority that is regulated by a particular set of administrative guidelines.
Autonomous systems divide global external networks into individual routing domains,
where local routing policies are applied. This organization simplifies routing domain
administration and makes consistent policy configuration easier. Each autonomous system
can support multiple interior routing protocols that dynamically exchange routing infor-
mation through route redistribution.
The autonomous system number assignment for public and private networks is governed
by the Internet Assigned Number Authority (IANA). A public autonomous system can be
directly connected to the Internet. This autonomous system number (AS number) identi-
fies both the routing process and the autonomous system. Private autonomous system
numbers are used for internal routing domains but must be translated by the router for
traffic that is routed out to the Internet. You should not configure routing protocols to
advertise private autonomous system numbers to external networks. By default, Cisco
NX-OS does not remove private autonomous system numbers from routing updates.
Administrative Distance
An administrative distance is a rating of the trustworthiness of a routing information
source. A higher value indicates a lower trust rating. Typically, a route can be learned
through more than one protocol. Administrative distance is used to distinguish between
routes learned from more than one protocol. The route with the lowest administrative dis-
tance is installed in the IP routing table.
Table 6-1 shows the default administrative distance for selected routing information
sources.
Routing protocols can use load balancing or equal-cost multipath (ECMP) to share traf-
fic across multiple paths. If the router receives and installs multiple paths with the same
administrative distance and cost to a destination, load balancing can occur. Load balanc-
ing distributes the traffic across all the paths, sharing the load. The number of paths used
is limited by the number of entries the routing protocol puts in the routing table. ECMP
does not guarantee equal load balancing across all links. It guarantees only that a particu-
lar flow will choose one particular next hop at any point in time.
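The number of equal-cost paths a protocol installs in the routing table is configurable per protocol instance. As a hedged OSPF sketch (the process tag 1 is arbitrary):
N9K-A(config)# router ospf 1
N9K-A(config-router)# maximum-paths 8
! Up to eight equal-cost paths per destination are installed for this OSPF instance.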
If you have multiple routing protocols configured in your network, you can configure
these protocols to share routing information by configuring route redistribution in each
protocol. The router that is redistributing routes from another protocol sets a fixed route
metric for those redistributed routes, which prevents incompatible route metrics between
the different routing protocols. For example, routes redistributed from EIGRP into OSPF
are assigned a fixed link cost metric that OSPF understands. Route redistribution also
uses an administrative distance to distinguish between routes learned from two different
routing protocols. The preferred routing protocol is given a lower administrative distance
so that its routes are picked over routes from another protocol with a higher administra-
tive distance assigned.
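In Cisco NX-OS, redistribution is configured with a route map, which is also where a fixed metric can be applied. The following sketch of redistributing EIGRP into OSPF uses illustrative instance tags, names, and metric values:
N9K-A(config)# route-map EIGRP-TO-OSPF permit 10
N9K-A(config-route-map)# set metric 100
N9K-A(config-route-map)# exit
N9K-A(config)# router ospf 1
N9K-A(config-router)# redistribute eigrp 100 route-map EIGRP-TO-OSPF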
Routing Algorithms
Routing algorithms determine how a router gathers and reports reachability informa-
tion, how it deals with topology changes, and how it determines the optimal route to a
destination. Various types of routing algorithms exist, and each algorithm has a different
impact on network and router resources. Routing algorithms use a variety of metrics that
affect calculation of optimal routes. You can classify routing algorithms in the following
major categories.
in environments where network traffic is relatively predictable and where network design
is relatively simple.
Because static routing systems cannot react to network changes, you should not use
them for large, constantly changing networks. Most routing protocols today use dynamic
routing algorithms that adjust to changing network circumstances by analyzing incoming
routing update messages. If the message indicates that a network change has occurred,
the routing software recalculates routes and sends out new routing update messages.
These messages permeate the network, triggering routers to rerun their algorithms and
change their routing tables accordingly.
Figure 6-3 shows a routing table example for a distance vector protocol.
To prevent routing loops, most distance vector algorithms use split horizon with poison
reverse, which means that the routes learned from an interface are set as unreachable and
advertised back along the interface that they were learned on during the next periodic
update. This process prevents the router from seeing its own route updates coming back.
Distance vector algorithms send updates at fixed intervals but can also send updates in
response to changes in route metric values. These triggered updates can speed up the
route convergence time. The Routing Information Protocol (RIP) is a distance vector
protocol.
The link-state protocols, also known as shortest path first (SPF), use the Dijkstra algo-
rithm to find the shortest path between two nodes in the network. Each router shares
information with neighboring routers and builds a link-state advertisement (LSA) that
contains information about each link and directly connected neighbor routers. Each LSA
has a sequence number. When a router receives an LSA and updates its link-state data-
base, the LSA is flooded to all adjacent neighbors. If a router receives two LSAs with the
same sequence number (from the same router), the router does not flood the last LSA it
received to its neighbors because it wants to prevent an LSA update loop. Because the
router floods the LSAs immediately after it receives them, the convergence time for link-
state protocols is minimized.
Figure 6-5 illustrates the link-state protocol update and convergence mechanism.
Figure 6-5 Link-State Protocol Update and Convergence: Link-State Packets Populate the Topological Database; the SPF Algorithm Builds the SPF Tree, Which Populates the Routing Table
The LSAs received by a router are added to the router’s link-state database. Each entry
consists of the following parameters:
■ Neighbor ID
■ Link cost
The router runs the SPF algorithm on the link-state database, building the shortest path
tree for that router. This SPF tree is used to populate the routing table. In link-state
algorithms, each router builds a picture of the entire network in its routing tables. The
link-state algorithms send small updates everywhere, while distance vector algorithms
send larger updates only to neighboring routers. Because they converge more quickly,
link-state algorithms are less likely to cause routing loops than distance vector algorithms.
However, link-state algorithms require more CPU power and memory than distance vector
algorithms, and they can be more expensive to implement and support. Link-state proto-
cols are generally more scalable than distance vector protocols. OSPF is an example of a
link-state protocol.
Table 6-2 compares a distance vector protocol with a link-state protocol.
The key supervisor components involved in unicast routing include the unicast routing information base (URIB) and the adjacency manager (AM), described in the following paragraphs.
The unicast routing information base (URIB) exists on the active supervisor. It maintains
the routing table with directly connected routes, static routes, and routes learned from
dynamic unicast routing protocols. The unicast RIB also collects adjacency information
from sources such as the Address Resolution Protocol (ARP). The unicast RIB determines
the best next hop for a given route and populates the FIB by using the services of the
unicast forwarding information base (FIB) distribution module (uFDM). Each dynamic
routing protocol must update the unicast RIB for any route that has timed out. The uni-
cast RIB then deletes that route and recalculates the best next hop for that route (if an
alternate path is available).
The adjacency manager (AM) exists on the active supervisor and maintains adjacency
information for different protocols, including ARP, Neighbor Discovery Protocol (NDP),
and static configuration. The most basic adjacency information is the Layer 3 to Layer 2
address mapping discovered by these protocols. Outgoing Layer 2 packets use the adja-
cency information to complete the Layer 2 header. The adjacency manager can trigger
ARP requests to find a particular Layer 3 to Layer 2 mapping. The new mapping becomes
available when the corresponding ARP reply is received and processed. For IPv6, the adja-
cency manager finds the Layer 3 to Layer 2 mapping information from NDP.
The unicast FIB distribution module (uFDM) exists on the active supervisor and distrib-
utes the forwarding path information from the unicast RIB and other sources. The unicast
RIB generates forwarding information that the unicast FIB (UFIB) programs into the
hardware forwarding tables on the standby supervisor and the modules. The unicast FDM
also downloads the FIB information to newly inserted modules. The unicast FDM gathers
adjacency information, rewrite information, and other platform-dependent information
when updating routes in the unicast FIB. The adjacency and rewrite information consists
of interface, next hop, and Layer 3 to Layer 2 mapping information. The interface and
next-hop information is received in route updates from the unicast RIB. The Layer 3 to
Layer 2 mapping is received from the adjacency manager.
The UFIB exists on supervisors and switching modules and builds the information used
for the hardware forwarding engine. The UFIB receives route updates from the uFDM
and sends the information to be programmed in the hardware forwarding engine. The
UFIB controls the addition, deletion, and modification of routes, paths, and adjacen-
cies. The unicast FIBs are maintained on a per-VRF and per-address-family basis (that is,
one for IPv4 and one for IPv6 for each configured VRF; for more about VRFs, refer to
Chapter 5, “Switch Virtualization”). Based on route update messages, the UFIB maintains
a per-VRF prefix and next-hop adjacency information database. The next-hop adjacency
data structure contains the next-hop IP address and the Layer 2 rewrite information.
Multiple prefixes could share a next-hop adjacency information structure.
The software forwarding path in Cisco NX-OS is used mainly to handle features that are
not supported in the hardware or to handle errors encountered during the hardware pro-
cessing. Typically, packets with IP options or packets that need fragmentation are passed
to the CPU on the active supervisor. All packets that should be switched in the software
or terminated go to the supervisor. The supervisor uses the information provided by the
unicast RIB and the adjacency manager to make the forwarding decisions. The module is
not involved in the software forwarding path. Software forwarding is controlled by con-
trol plane policies and rate limiters.
RIPv2 on NX-OS
The Routing Information Protocol (RIP) is a distance vector protocol that uses a hop
count as its metric. The hop count is the number of routers a packet can traverse before
reaching its destination. A directly connected network has a metric of 1; an unreachable
network has a metric of 16. This small range of metrics makes RIP an unsuitable routing
protocol for large networks. RIP is an Interior Gateway Protocol (IGP), which means that
it performs routing within a single autonomous system. RIP uses User Datagram Protocol
(UDP) data packets to exchange routing information. RIPv2 supports IPv4. RIP uses the following two message types:
■ Request: Sent to the multicast address 224.0.0.9 to request route updates from other
RIP-enabled routers.
■ Response: Sent every 30 seconds by default. The router also sends response mes-
sages after it receives a request message. The response message contains the entire
RIP route table. RIP sends multiple response packets for a request if the RIP routing
table cannot fit in one response packet.
RIPv2 supports an optional authentication feature. You can configure authentication on RIP messages to prevent unauthorized or invalid routing updates in your network. Cisco NX-OS supports a simple password or an MD5 authentication digest. RIPv2 supports key-based authentication using keychains for password management. A keychain is simply a named sequence of keys.
You can use split horizon to ensure that RIP never advertises a route out of the inter-
face where it was learned. Split horizon is a method that controls the sending of RIP
update and query packets. When you enable split horizon on an interface, Cisco NX-OS
does not send update packets for destinations that were learned from this interface.
Controlling update packets in this manner reduces the possibility of routing loops. You
can use split horizon with poison reverse to configure an interface to advertise routes
learned by RIP as unreachable over the interface that learned the routes. By default, split
horizon is enabled on all interfaces.
You can configure a route policy on a RIP-enabled interface to filter the RIP updates.
Cisco NX-OS updates the route table with only those routes that the route policy allows.
You can configure multiple summary aggregate addresses for a specified interface. Route
summarization simplifies route tables by replacing a number of more-specific addresses
with an address that represents all the specific addresses. For example, you can replace
10.1.1.0/24, 10.1.2.0/24, and 10.1.3.0/24 with one summary address, 10.1.0.0/16. If more
specific routes are in the routing table, RIP advertises the summary address from the
interface with a metric equal to the maximum metric of the more specific routes.
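As an informal sketch (not one of this chapter's numbered examples), the summarization just described could be applied at the interface level roughly as follows; the interface is borrowed from the earlier topology, and the exact syntax should be verified against the RIP chapter of the NX-OS unicast routing configuration guide for your platform.
! Advertising the 10.1.0.0/16 summary out Ethernet 6/8 instead of the individual /24 routes.
N7K(config)# interface Ethernet 6/8
N7K(config-if)# ip rip summary-address 10.1.0.0/16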
You can use RIP to redistribute static routes or routes from other protocols. You must
configure a route map with the redistribution to control which routes are passed into
RIP. A route policy allows you to filter routes based on attributes such as the destination,
origination protocol, route type, route tag, and so on. Whenever you redistribute routes
into a RIP routing domain, Cisco NX-OS does not, by default, redistribute the default
route into the RIP routing domain. You can generate a default route into RIP, which can
be controlled by a route policy. You also configure the default metric that is used for all
imported routes into RIP.
You can use load balancing to allow a router to distribute traffic over all the router
network ports that are the same distance from the destination address. Load balancing
increases the usage of network segments and increases effective network bandwidth.
Cisco NX-OS supports the Equal-Cost Multipath (ECMP) feature with up to 16 equal-
cost paths in the RIP route table and the unicast RIB. You can configure RIP to load-
balance traffic across some or all of those paths.
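The following minimal sketch, which is not taken from the chapter's examples, shows where these redistribution and load-balancing options typically live under the RIP instance's IPv4 address family; the route-map name STATIC-TO-RIP is only a placeholder, and command availability should be confirmed for your platform and release.
! Redistributing static routes into RIP through a placeholder route map, setting the metric applied to imported routes, and allowing up to eight equal-cost paths.
N7K(config)# router rip DCFNDU
N7K(config-router)# address-family ipv4 unicast
N7K(config-router-af)# redistribute static route-map STATIC-TO-RIP
N7K(config-router-af)# default-metric 3
N7K(config-router-af)# maximum-paths 8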
Cisco NX-OS supports stateless restarts for RIP. After a reboot or supervisor switcho-
ver, Cisco NX-OS applies the running configuration, and RIP immediately sends request
packets to repopulate its routing table.
Cisco NX-OS supports multiple instances of RIP that run on the same system. RIP sup-
ports virtual routing and forwarding (VRF) instances. VRFs exist within virtual device
contexts (VDCs). You can configure up to four RIP instances on a VDC. By default,
Cisco NX-OS places you in the default VDC and default VRF unless you specifically
configure another VDC and VRF.
Note Cisco NX-OS does not support RIPv1. If Cisco NX-OS receives a RIPv1 packet,
it logs a message and drops the packet. In this chapter, RIP and RIPv2 are used
interchangeably.
Configuring basic RIP is a multistep process: you enable the RIP feature, create a RIP instance, and then associate Layer 3 interfaces with that instance, as the examples that follow demonstrate.
Table 6-3 summarizes the NX-OS CLI commands that are related to basic RIPv2
configuration and verification.
Table 6-3  Summary of NX-OS CLI Commands for RIPv2 Configuration and Verification

■ configure terminal: Enters global configuration mode.
■ [no] feature rip: Enables the RIP feature.
■ show ip rip [instance instance-tag] neighbor [interface-type number] [vrf vrf-name]: Displays the RIP neighbor table.
■ show ip route: Displays routes from the unicast routing information base (URIB).
■ show ip rip [instance instance-tag] route [ip-prefix/length [longer-prefixes | shorter-prefixes]] [summary] [vrf vrf-name]: Displays the RIP route table.
■ show ip rip [instance instance-tag] interface [interface-type slot/port] [vrf vrf-name] [detail]: Displays RIP information for an interface.
Examples 6-1 through 6-4 show the basic RIPv2 configuration and verification on the sample topology shown in Figure 6-7. In this example, we will configure MD5 authentication on the interface connecting the two Nexus switches.
Figure 6-7  Sample topology for the RIPv2 examples: N7K (Lo0 10.10.10.10/32) connected to N9K-A (Lo0 20.20.20.20/32)
! The RIP feature is disabled by default. Enabling the RIP feature and confirming it.
N7K(config)# feature rip
N7K(config)# show feature | in rip
rip 1 enabled(not-running)
rip 2 enabled(not-running)
rip 3 enabled(not-running)
rip 4 enabled(not-running)
! Configuring the RIP instance and a keychain named MYKEYS with a single key whose key-string is cisco.
N7K(config)# router rip DCFNDU
N7K(config-router)# key chain MYKEYS
N7K(config-keychain)# key 1
N7K(config-keychain-key)# key-string cisco
N7K(config-keychain-key)# end
! Enabling RIP and configuring MD5 authentication using keychain MYKEYS on the interface connected to N9K-A.
N7K# configure terminal
Enter configuration commands, one per line. End with CNTL/Z.
N7K(config)# interface Ethernet 6/8
N7K(config-if)# ip router rip DCFNDU
N7K(config-if)# ip rip authentication mode md5
N7K(config-if)# ip rip authentication key-chain MYKEYS
! The RIP feature is disabled by default. Enabling the RIP feature and confirming it.
N9K-A(config)# feature rip
N9K-A(config)# show feature | in rip
rip 1 enabled(not-running)
rip 2 enabled(not-running)
rip 3 enabled(not-running)
rip 4 enabled(not-running)
! Configuring the RIP instance and a keychain named MYKEYS with a single key whose key-string is cisco.
N9K-A(config)# router rip DCFNDU
N9K-A(config-router)# key chain MYKEYS
N9K-A(config-keychain)# key 1
N9K-A(config-keychain-key)# key-string cisco
N9K-A(config-keychain-key)# end
! Enabling RIP and configuring MD5 authentication using keychain MYKEYS on the interface connected to N7K.
N9K-A# configure terminal
Enter configuration commands, one per line. End with CNTL/Z.
N9K-A(config)# interface Ethernet 1/49
N9K-A(config-if)# ip router rip DCFNDU
N9K-A(config-if)# ip rip authentication mode md5
N9K-A(config-if)# ip rip authentication key-chain MYKEYS
192.168.1.46, Ethernet6/8
Last Response sent/received: 00:00:17/00:00:23
Last Request sent/received: dead/dead
Bad Pkts Received: 0
Bad Routes Received: 0
! Verifying unicast routing table. N9K-A Loopback 0 interface is learned via RIP.
N7K# show ip route
IP Route Table for VRF "default"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>
192.168.1.45, Ethernet1/49
Last Response sent/received: 00:00:22/00:00:18
Last Request sent/received: dead/never
Bad Pkts Received: 0
Bad Routes Received: 0
! Verifying unicast routing table. N7K Loopback 0 interface is learned via RIP.
N9K-A# show ip route
IP Route Table for VRF "default"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>
EIGRP on NX-OS
The Enhanced Interior Gateway Routing Protocol (EIGRP) is a unicast routing proto-
col that has the characteristics of both distance vector and link-state routing protocols.
EIGRP relies on its neighbors to provide the routes. It constructs the network topology
from the routes advertised by its neighbors, similar to a link-state protocol, and uses this
information to select loop-free paths to destinations.
EIGRP sends out periodic Hello messages for neighbor discovery. Once EIGRP learns a
new neighbor, it sends a one-time update of all the local EIGRP routes and route metrics.
The receiving EIGRP router calculates the route distance based on the received metrics
and the locally assigned cost of the link to that neighbor. After this initial full route table
update, EIGRP sends incremental updates to only those neighbors affected by the route
change. This process speeds convergence and minimizes the bandwidth used by EIGRP.
EIGRP uses Reliable Transport Protocol, which includes the following message types:
■ Hello: Used for neighbor discovery and recovery. By default, EIGRP sends a periodic
multicast Hello message on the local network at the configured hello interval. By
default, the hello interval is 5 seconds.
■ Queries and replies: Sent as part of the Diffusing Update Algorithm (DUAL) used
by EIGRP.
DUAL calculates the routing information based on the destination networks in the topol-
ogy table. The topology table includes the following information:
■ IPv4 address/mask: The network address and network mask for this destination.
■ Successors: The IP address and local interface connection for all feasible successors
or neighbors that advertise a shorter distance to the destination than the current fea-
sible distance.
■ Feasible distance (FD): The lowest calculated distance to the destination. The feasible distance is the sum of the advertised distance from a neighbor plus the cost of the link to that neighbor.
DUAL uses the distance metric to select efficient, loop-free paths. DUAL selects routes to
insert into the unicast RIB based on feasible successors. When a topology change occurs,
DUAL looks for feasible successors in the topology table. If there are feasible successors,
DUAL selects the feasible successor with the lowest feasible distance and inserts that into
the unicast RIB, thus avoiding unnecessary recomputation. When there are no feasible
successors but there are neighbors advertising the destination, DUAL transitions from
the passive state to the active state and triggers a recomputation to determine a new suc-
cessor or next-hop router to the destination. The amount of time required to recompute
the route affects the convergence time. EIGRP sends Query messages to all neighbors,
searching for feasible successors. Neighbors that have a feasible successor send a Reply
message with that information. Neighbors that do not have feasible successors trigger a
DUAL recomputation.
When a topology change occurs, EIGRP sends an Update message with only the changed
routing information to affected neighbors. This Update message includes the distance
information to the new or updated network destination. The distance information in
EIGRP is represented as a composite of available route metrics, including bandwidth,
delay, load utilization, and link reliability. Each metric has an associated weight that deter-
mines if the metric is included in the distance calculation. You can configure these metric
weights. You can fine-tune link characteristics to achieve optimal paths, but using the
default settings for configurable metrics is recommended.
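As background (this formula is not quoted from the chapter), the classic EIGRP composite metric with the default weights K1 = K3 = 1 and K2 = K4 = K5 = 0 reduces to

metric = 256 x (10^7 / minimum path bandwidth in kbps + cumulative delay in tens of microseconds)

For example, a path whose slowest link is 1 Gbps (1,000,000 kbps) with 30 microseconds of total delay yields 256 x (10 + 3) = 3328.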
Internal routes are routes that occur between neighbors within the same EIGRP
autonomous system. These routes have the following metrics:
■ Bandwidth: The minimum bandwidth along the route to the destination network.
■ Delay: The sum of the delays configured on the interfaces that make up the route to the destination network. The delay is configured in tens of microseconds.
■ MTU: The smallest maximum transmission unit value along the route to the destina-
tion.
■ Hop count: The number of hops or routers that the route passes through to the des-
tination. This metric is not directly used in the DUAL computation.
By default, EIGRP uses the bandwidth and delay metrics to calculate the distance to the
destination. You can modify the metric weights to include the other metrics in the calcu-
lation.
External routes are routes that occur between neighbors in different EIGRP autonomous
systems. These routes have the following metrics:
■ Router ID: The ID of the router that redistributed this route into EIGRP.
■ Protocol ID: A code that represents the routing protocol that learned the destination
route.
■ Metric: The route metric for this route from the external routing protocol.
EIGRP adds all learned routes to the EIGRP topology table and the unicast RIB. When
a topology change occurs, EIGRP uses these routes to search for a feasible successor.
EIGRP also listens for notifications from the unicast RIB for changes in any routes redis-
tributed to EIGRP from another routing protocol. You can also configure an interface as a
passive interface for EIGRP. A passive interface does not participate in EIGRP adjacency,
but the network address for the interface remains in the EIGRP topology table.
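A quick sketch of a passive interface, reusing the DCFNDU instance tag that appears in this chapter's examples (verify the command against the EIGRP chapter of the NX-OS configuration guide for your platform):
! Suppressing EIGRP adjacencies on Ethernet 6/9 while still advertising its network.
N7K(config)# interface Ethernet 6/9
N7K(config-if)# ip passive-interface eigrp DCFNDU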
Cisco NX-OS supports MD5 authentication for EIGRP to prevent unauthorized or invalid routing updates. To use MD5 authentication, you configure a password that is shared at the local router and all remote EIGRP neighbors.
When an EIGRP message is created, Cisco NX-OS creates an MD5 one-way message
digest based on the message itself and the encrypted password and sends this digest
along with the EIGRP message. The receiving EIGRP neighbor validates the digest using
the same encrypted password. If the message has not changed, the calculation is identi-
cal, and the EIGRP message is considered valid. MD5 authentication also includes a
sequence number with each EIGRP message that is used to ensure no message is replayed
in the network.
You can use the EIGRP stub routing feature to improve network stability, reduce resource
usage, and simplify stub router configuration. Stub routers connect to the EIGRP net-
work through a remote router. When using EIGRP stub routing, you need to configure
the distribution and remote routers to use EIGRP and configure only the remote router as
a stub. EIGRP stub routing does not automatically enable summarization on the distribu-
tion router. In most cases, you need to configure summarization on the distribution rout-
ers. EIGRP stub routing allows you to prevent queries to the remote router.
You can configure a summary aggregate address for a specified interface. Route sum-
marization simplifies route tables by replacing a number of more-specific addresses
with an address that represents all the specific addresses. For example, you can replace
10.1.1.0/24, 10.1.2.0/24, and 10.1.3.0/24 with one summary address, 10.1.0.0/16. If more
specific routes are in the routing table, EIGRP advertises the summary address from the
interface with a metric equal to the minimum metric of the more specific routes.
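As a rough sketch of the distribution/remote pattern described in the preceding paragraphs (not one of the chapter's numbered examples), the distribution router summarizes toward the remote site while the remote router declares itself a stub; the interfaces and instance tag are reused from the chapter topology only as placeholders, and stub keyword options vary by release.
! On the distribution router, advertise only the 10.1.0.0/16 summary toward the remote site.
N7K(config)# interface Ethernet 6/8
N7K(config-if)# ip summary-address eigrp DCFNDU 10.1.0.0/16
! On the remote router, advertise stub status so the distribution router does not query it for lost routes.
N9K-A(config)# router eigrp DCFNDU
N9K-A(config-router)# stub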
You can use EIGRP to redistribute static routes, routes learned by other EIGRP autono-
mous systems, or routes from other protocols. You must configure a route map with the
redistribution to control which routes are passed into EIGRP. A route map allows you to
filter routes based on attributes such as the destination, origination protocol, route type,
route tag, and so on. You also configure the default metric that is used for all imported
routes into EIGRP. You use distribute lists to filter routes from routing updates.
You can use load balancing to allow a router to distribute traffic over all the router
network ports that are the same distance from the destination address. Load balancing
increases the usage of network segments, which increases effective network bandwidth.
Cisco NX-OS supports the equal-cost multipath (ECMP) feature with up to 16 equal-
cost paths in the EIGRP route table and the unicast RIB. You can configure EIGRP to
load-balance traffic across some or all of those paths.
You can use split horizon to ensure that EIGRP never advertises a route out of the inter-
face where it was learned. Split horizon is a method that controls the sending of EIGRP
update and query packets. When you enable split horizon on an interface, Cisco NX-OS
does not send update and query packets for destinations that were learned from this
interface. Controlling update and query packets in this manner reduces the possibility of
routing loops. Split horizon with poison reverse configures EIGRP to advertise a learned
route as unreachable back through the interface from which EIGRP learned the route. By
default, the split horizon feature is enabled on all interfaces.
EIGRP supports virtual routing and forwarding (VRF) instances. Cisco NX-OS supports
multiple instances of EIGRP that run on the same system. Every instance uses the same
system router ID. You can optionally configure a unique router ID for each instance.
Configuring basic EIGRP is a multistep process: you enable the EIGRP feature, create an EIGRP instance, and then associate interfaces with that instance, as the examples that follow demonstrate.
Table 6-4 summarizes the NX-OS CLI commands related to basic EIGRP configuration
and verification.
Table 6-4  Summary of NX-OS CLI Commands for EIGRP Configuration and Verification

■ configure terminal: Enters global configuration mode.
■ [no] feature eigrp: Enables the EIGRP feature.
■ key-string [encryption-type] text-string: Configures the text string for the key. The text-string argument is alphanumeric, case-sensitive, and supports special characters.
Examples 6-5 through 6-8 show the basic EIGRP configuration and verification on the sample topology shown in Figure 6-8. In this example, we will configure MD5 authentication on the interface connecting the two Nexus switches.
Figure 6-8  Sample topology for the EIGRP examples: N7K (Lo0 10.10.10.10/32) connected to N9K-A (Lo0 20.20.20.20/32)
! The EIGRP feature is disabled by default. Enabling the EIGRP feature and confirming it.
N7K(config)# feature eigrp
N7K(config)# show feature | in eigrp
eigrp 1 enabled(not-running)
eigrp 2 enabled(not-running)
eigrp 3 enabled(not-running)
<output omitted>
! Configuring a keychain named MYKEYS with a single key whose key-string is cisco.
N7K(config)# key chain MYKEYS
N7K(config-keychain)# key 1
N7K(config-keychain-key)# key-string cisco
N7K(config-keychain-key)# end
! Enabling EIGRP and configuring MD5 authentication using keychain MYKEYS on the interface connected to N9K-A.
N7K# configure terminal
Enter configuration commands, one per line. End with CNTL/Z.
N7K(config)# interface Ethernet 6/8
N7K(config-if)# ip router eigrp DCFNDU
N7K(config-if)# ip authentication key-chain eigrp DCFNDU MYKEYS
N7K(config-if)# ip authentication mode eigrp DCFNDU md5
! Configuring a keychain named MYKEYS with a single key whose key-string is cisco.
N9K-A(config)# key chain MYKEYS
N9K-A(config-keychain)# key 1
! Enabling EIGRP and configuring MD5 authentication using keychain MYKEYS on the interface connected to N7K.
N9K-A# configure terminal
Enter configuration commands, one per line. End with CNTL/Z.
N9K-A(config)# interface Ethernet 1/49
N9K-A(config-if)# ip router eigrp DCFNDU
N9K-A(config-if)# ip authentication key-chain eigrp DCFNDU MYKEYS
N9K-A(config-if)# ip authentication mode eigrp DCFNDU md5
! Verifying unicast routing table. N9K-A Loopback 0 interface is learned via EIGRP.
N7K# show ip route
IP Route Table for VRF "default"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>
! Verifying unicast routing table. N7K Loopback 0 interface is learned via EIGRP.
N9K-A# show ip route
IP Route Table for VRF "default"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>
OSPFv2 on NX-OS
OSPFv2 is a link-state protocol for IPv4 networks. An OSPFv2 router sends a special
message, called a hello packet, out each OSPF-enabled interface to discover other
OSPFv2 neighbor routers. Once a neighbor is discovered, the two routers compare infor-
mation in the Hello packet to determine if the routers have compatible configurations.
The neighbor routers try to establish adjacency, which means that the routers synchro-
nize their link-state databases to ensure they have identical OSPFv2 routing information.
Adjacent routers share link-state advertisements (LSAs) that include information about the
operational state of each link, the cost of the link, and any other neighbor information.
The routers then flood these received LSAs out every OSPF-enabled interface so that all
OSPFv2 routers eventually have identical link-state databases. When all OSPFv2 rout-
ers have identical link-state databases, the network is converged. Each router then uses
Dijkstra’s Shortest Path First (SPF) algorithm to build its route table.
OSPFv2 routers periodically send Hello packets on every OSPF-enabled interface. The
hello interval determines how frequently the router sends these Hello packets and is con-
figured per interface. OSPFv2 uses Hello packets for the following tasks:
■ Neighbor discovery: The Hello packet contains information about the originating
OSPFv2 interface and router, including the assigned OSPFv2 cost of the link, the
hello interval, and optional capabilities of the originating router. An OSPFv2 inter-
face that receives these Hello packets determines if the settings are compatible with
the receiving interface settings. Compatible interfaces are considered neighbors and
are added to the neighbor table.
■ Neighbor keepalive: If an OSPFv2 interface does not receive a Hello packet from a neighbor within the configured dead interval (usually a multiple of the hello interval), the neighbor is removed from the local neighbor table.
■ Bidirectional communications: Hello packets also include a list of router IDs for the
routers that the originating interface has communicated with. If the receiving inter-
face sees its own router ID in this list, bidirectional communication has been estab-
lished between the two interfaces.
■ Designated router election: Networks with multiple routers present a unique situ-
ation for OSPF. If every router floods the network with LSAs, the same link-state
information is sent from multiple sources. Depending on the type of network,
OSPFv2 might use a single router, the designated router (DR), to control the LSA
floods and represent the network to the rest of the OSPFv2 area. If the DR fails,
OSPFv2 selects a backup designated router (BDR) and uses the BDR. The DR and
BDR are selected based on the information in the Hello packet. When an interface
sends a Hello packet, it sets the priority field and the DR and BDR field if it knows
who the DR and BDR are. The routers follow an election procedure based on which
routers declare themselves in the DR and BDR fields and the priority field in the
Hello packet. As a final tie breaker, OSPFv2 chooses the highest router IDs as the DR
and BDR. All other routers establish adjacency with the DR and the BDR and use the
IPv4 multicast address 224.0.0.6 to send LSA updates to the DR and BDR. DRs are
based on a router interface. A router might be the DR for one network and not for
another network on a different interface.
OSPFv2 supports the following network types:
■ Point-to-point: A network that exists only between two routers. All neighbors on a point-to-point network establish adjacency, and there is no DR.
■ Broadcast: A network with multiple routers that can communicate over a shared medium that allows broadcast traffic, such as Ethernet. OSPFv2 routers establish a DR and a BDR that control LSA flooding on the network. OSPFv2 uses the well-known IPv4 multicast address 224.0.0.5 and the MAC address 0100.5e00.0005 to communicate with neighbors.
Before two OSPFv2 interfaces can become neighbors, they compare the following settings carried in the Hello packet:
■ Hello interval
■ Dead interval
■ Area ID
■ Authentication
■ Optional capabilities
If there is a match, the following information is entered into the neighbor table:
■ Priority: Priority of the neighbor. The priority is used for designated router election.
■ State: Indication of whether the neighbor has just been heard from, is in the process
of setting up bidirectional communications, is sharing the link-state information, or
has achieved full adjacency.
■ Dead time: Indication of the time since the last Hello packet was received from this
neighbor.
■ Designated router: Indication of whether the neighbor has been declared as the des-
ignated router or as the backup designated router.
■ Local interface: The local interface that received the Hello packet for this neighbor.
You can divide OSPFv2 networks into areas. Routers send most LSAs only within one
area, which reduces the CPU and memory requirements for an OSPF-enabled router. An
area is a logical division of routers and links within an OSPFv2 domain that creates sepa-
rate subdomains. LSA flooding is contained within an area, and the link-state database
is limited to links within the area. You can assign an area ID to the interfaces within the
defined area. The area ID is a 32-bit value that you can enter as a number or in dotted
decimal notation, such as 10.2.3.1. Cisco NX-OS always displays the area in dotted deci-
mal notation. If you define more than one area in an OSPFv2 network, you must also
define the backbone area, which has the reserved area ID of 0. If you have more than one
area, then one or more routers become area border routers (ABRs).
Figure 6-9 shows how an ABR connects to both the backbone area and at least one other
defined area.
The ABR has a separate link-state database for each area to which it connects. The ABR
sends Network Summary (type 3) LSAs from one connected area to the backbone area.
The backbone area sends summarized information about one area to another area. In
Figure 6-9, Area 0 sends summarized information about Area 5 to Area 3.
Figure 6-9  Area border routers: ABR1 connects Area 3 to the backbone Area 0, and ABR2 connects Area 0 to Area 5
OSPFv2 defines one other router type: the autonomous system boundary router (ASBR).
This router connects an OSPFv2 area to another autonomous system. An autonomous sys-
tem is a network controlled by a single technical administration entity. OSPFv2 can redis-
tribute its routing information into another autonomous system or receive redistributed
routes from another autonomous system.
Each OSPFv2 interface is assigned a link cost. The cost is an arbitrary number. By default, Cisco NX-OS assigns a cost that is the configured reference bandwidth divided by the interface bandwidth. The default reference bandwidth is 40 Gbps, so, for example, a 10-Gbps interface receives a default cost of 4 and a 40-Gbps interface receives a default cost of 1. The link cost is carried in the LSA updates for each link.
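The following sketch, which is not part of the chapter's examples, shows the two usual ways to influence the cost: raising the reference bandwidth for the whole instance or pinning an explicit cost on one interface. The values shown are arbitrary placeholders.
! Raising the reference bandwidth to 400 Gbps so that 100-Gbps links get cost 4 and 40-Gbps links get cost 10.
N7K(config)# router ospf 1
N7K(config-router)# auto-cost reference-bandwidth 400 Gbps
! Alternatively, setting an explicit cost on a single interface.
N7K(config)# interface Ethernet 6/8
N7K(config-if)# ip ospf cost 10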
OSPFv2 uses link-state advertisements (LSAs) to build its routing table. When an OSPFv2
router receives an LSA, it forwards that LSA out every OSPF-enabled interface, flooding
the OSPFv2 area with this information. This LSA flooding guarantees that all routers in
the network have identical routing information.
Opaque LSAs allow you to extend OSPF functionality. Opaque LSAs consist of a standard LSA header followed by application-specific information. This information might be used by OSPFv2 or by other applications. The three Opaque LSA types are defined as follows:
■ LSA type 9: Flooded only on the local network segment (link-local scope).
■ LSA type 10: Flooded only within the local OSPFv2 area (area-local scope).
■ LSA type 11: Flooded throughout the OSPFv2 autonomous system (AS scope).
Each router maintains a link-state database for the OSPFv2 network. This database con-
tains all the collected LSAs and includes information on all the routes through the net-
work. OSPFv2 uses this information to calculate the best path to each destination and
populates the routing table with these best paths. LSAs are removed from the link-state
database if no LSA update has been received within a set interval, called the MaxAge.
Routers flood a repeat of the LSA every 30 minutes to prevent accurate link-state infor-
mation from being aged out.
OSPFv2 runs the Dijkstra shortest path first algorithm on the link-state database. This
algorithm selects the best path to each destination based on the sum of all the link costs
for each link in the path. The resultant shortest path for each destination is then put in
the OSPFv2 route table. When the OSPFv2 network is converged, this route table feeds
into the unicast RIB.
In OSPFv2, you can limit the amount of external routing information that floods an area
by making it a stub area. A stub area is an area that does not allow AS External (type 5)
LSAs. These LSAs are usually flooded throughout the local autonomous system to propa-
gate external route information. Stub areas use a default route for all traffic that needs to
go through the backbone area to the external autonomous system.
A not-so-stubby area (NSSA) is similar to a stub area, except that an NSSA allows you
to import autonomous system external routes within an NSSA using redistribution. The
NSSA ASBR redistributes these routes and generates NSSA External (type 7) LSAs that
it floods throughout the NSSA. You can optionally configure the ABR that connects the
NSSA to other areas to translate this NSSA External LSA to AS External (type 5) LSAs.
The ABR then floods these AS External LSAs throughout the OSPFv2 autonomous sys-
tem. Summarization and filtering are supported during the translation. The backbone
Area 0 cannot be an NSSA. You can, for example, use NSSA to simplify administration
if you are connecting a central site using OSPFv2 to a remote site that is using a differ-
ent routing protocol. Before NSSA, the connection between the corporate site border
router and a remote router could not be run as an OSPFv2 stub area because routes for
the remote site could not be redistributed into a stub area. With NSSA, you can extend
OSPFv2 to cover the remote connection by defining the area between the corporate
router and remote router as an NSSA.
Virtual links allow you to connect an OSPFv2 area ABR to a backbone area ABR when a
direct physical connection is not available. You can also use virtual links to temporarily
recover from a partitioned area, which occurs when a link within the area fails, isolating
part of the area from reaching the designated ABR to the backbone area.
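A compact sketch of these area options follows; the area IDs and the virtual-link router ID are placeholders rather than values from the chapter's examples, and keyword support should be confirmed in the NX-OS OSPFv2 configuration guide.
N7K(config)# router ospf 1
! Making area 2 a stub area (no AS External type 5 LSAs).
N7K(config-router)# area 0.0.0.2 stub
! Making area 1 an NSSA so external routes can still be redistributed as type 7 LSAs.
N7K(config-router)# area 0.0.0.1 nssa
! Building a virtual link across regular area 3 to the ABR with router ID 3.3.3.3; virtual links cannot cross stub or NSSA areas.
N7K(config-router)# area 0.0.0.3 virtual-link 3.3.3.3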
OSPFv2 can learn routes from other routing protocols by using route redistribution. You
configure OSPFv2 to assign a link cost for these redistributed routes or a default link
cost for all redistributed routes. Route redistribution uses route maps to control which
external routes are redistributed. You must configure a route map with the redistribution
to control which routes are passed into OSPFv2. A route map allows you to filter routes
based on attributes such as the destination, origination protocol, route type, route tag,
and so on. You can use route maps to modify parameters in the AS External (type 5)
and NSSA External (type 7) LSAs before these external routes are advertised in the local
OSPFv2 autonomous system.
Because OSPFv2 shares all learned routes with every OSPF-enabled router, you can use
route summarization to reduce the number of unique routes that are flooded to every
OSPF-enabled router. Route summarization simplifies route tables by replacing more-
specific addresses with an address that represents all the specific addresses. For example,
you can replace 10.1.1.0/24, 10.1.2.0/24, and 10.1.3.0/24 with one summary address,
10.1.0.0/16. Typically, you would summarize at the boundaries of area border routers
(ABRs). Although you could configure summarization between any two areas, it is bet-
ter to summarize in the direction of the backbone so that the backbone receives all the
aggregate addresses and injects them, already summarized, into other areas. The two
types of summarization are as follows:
■ Inter-area route summarization: Configured on ABRs to summarize routes advertised between areas in the autonomous system.
■ External route summarization: Configured on ASBRs to summarize external routes that are injected into OSPFv2 through redistribution.
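A brief sketch of the two forms, using the 10.1.0.0/16 aggregate from the text and placeholder area IDs (not drawn from the chapter's examples):
N7K(config)# router ospf 1
! Inter-area summarization on an ABR: advertise one range for area 1 toward the other areas.
N7K(config-router)# area 0.0.0.1 range 10.1.0.0/16
! External route summarization on an ASBR: summarize prefixes injected through redistribution.
N7K(config-router)# summary-address 10.1.0.0/16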
Cisco NX-OS supports multiple instances of the OSPFv2 protocol that run on the same
node. You cannot configure multiple instances over the same interface. By default, every
instance uses the same system router ID. You must manually configure the router ID for
each instance if the instances are in the same OSPFv2 autonomous system.
Configuring basic OSPFv2 is a multistep process: you enable the OSPF feature, create an OSPFv2 instance, and then add interfaces to the instance and assign them to areas, as the examples that follow demonstrate.
Table 6-6 summarizes the NX-OS CLI commands related to basic OSPFv2 configuration
and verification.
Table 6-6  Summary of NX-OS CLI Commands for OSPFv2 Configuration and Verification

■ configure terminal: Enters global configuration mode.
■ [no] feature ospf: Enables the OSPFv2 feature.
■ [no] router ospf instance-tag: Creates a new OSPFv2 instance with the configured instance tag.
■ (Optional) router-id ip-address: Configures the OSPFv2 router ID. This IP address identifies this OSPFv2 instance and must exist on a configured interface in the system.
■ interface interface-type slot/port: Enters interface configuration mode.
■ ip router ospf instance-tag area area-id [secondaries none]: Adds the interface to the OSPFv2 instance and area.
■ ip ospf network {broadcast | point-to-point}: Sets the network type.
■ ip ospf authentication [message-digest]: Enables interface authentication mode for OSPFv2 for either cleartext or message digest type. Overrides area-based authentication for this interface. All neighbors must share this authentication type.
■ ip ospf message-digest-key key-id md5 [0 | 3 | 7] key: Configures message digest authentication for this interface. Use this command if the authentication is set to message digest. The key-id range is from 1 to 255. The 0, 3, and 7 options indicate whether the key that follows is entered as cleartext, as a 3DES-encrypted string, or as a Cisco type 7 encrypted string, respectively.
■ show ip route: Displays routes from the unicast routing information base (URIB).
■ show ip ospf route [ospf-route] [summary] [vrf {vrf-name | all | default | management}]: Displays the internal OSPFv2 routes.
Examples 6-9 through 6-14 show the basic OSPFv2 configuration and verification on
the sample topology shown in Figure 6-10. The link between N7K and N9K-A is con-
figured as a point-to-point network in Area 0, and the link between N7K and N9K-B is
configured as a broadcast network in Area 1. In this example, we will configure the MD5
authentication on the interfaces. Router N7K will act as an area border router.
Figure 6-10  Sample topology for the OSPFv2 examples: N7K (ABR, Lo0 10.10.10.10/32, router ID 1.1.1.1) connects through Ethernet 6/8 to N9K-A Ethernet 1/49 over 192.168.1.44/30 (Area 0, point-to-point network) and through Ethernet 6/9 to N9K-B Ethernet 1/49 over 192.168.1.48/30 (Area 1, broadcast network). N9K-A uses Lo0 20.20.20.20/32 and router ID 2.2.2.2; N9K-B uses Lo0 30.30.30.30/32 and router ID 3.3.3.3.
! The OSPF feature is disabled by default. Enabling the OSPF feature and confirming it.
N7K(config)# feature ospf
N7K(config)# show feature | in ospf
ospf 1 enabled(not-running)
ospf 2 enabled(not-running)
ospf 3 enabled(not-running)
<output omitted>
! Configuring Loopback 0 and Ethernet 6/8 in OSPF area 0 and Ethernet 6/9 in OSPF area 1.
! Ethernet 6/8 is configured as an OSPF point-to-point network, and Ethernet 6/9 as an OSPF broadcast network.
N7K(config-router)# interface loopback 0
N7K(config-if)# ip router ospf 1 area 0
N7K(config-if)# exit
N7K(config)# interface Ethernet 6/8
N7K(config-if)# ip ospf network point-to-point
N7K(config-if)# ip router ospf 1 area 0
N7K(config-if)# int eth 6/9
N7K(config-if)# ip ospf network broadcast
N7K(config-if)# ip router ospf 1 area 1
! Configuring md5 authentication on Ethernet 6/8 and Ethernet 6/9 interface with
message-digest-key as cisco.
N7K(config-if)# interface Ethernet 6/8, Ethernet 6/9
N7K(config-if-range)# ip ospf authentication message-digest
N7K(config-if-range)# ip ospf message-digest-key 1 md5 cisco
N7K(config-if-range)# end
N7K#
! Configuring Loopback 0 and Ethernet 1/49 in ospf area 0. Interface Ethernet 1/49
is configured as ospf point-to-point network with cisco as message-digest key for
md5 authentication.
N9K-A(config-router)# interface loopback 0
N9K-A(config-if)# ip router ospf 1 area 0
N9K-A(config-if)# interface Ethernet 1/49
N9K-A(config-if)# ip ospf network point-to-point
N9K-A(config-if)# ip router ospf 1 area 0
N9K-A(config-if)# ip ospf authentication message-digest
N9K-A(config-if)# ip ospf message-digest-key 1 md5 cisco
N9K-A(config-if)# end
N9K-A#
! Configuring Loopback 0 and Ethernet 1/49 in ospf area 1. Interface Ethernet 1/49
is configured as ospf broadcast network with cisco as message-digest key for md5
authentication.
N9K-B(config-router)# interface loopback 0
N9K-B(config-if)# ip router ospf 1 area 1
N9K-B(config-if)# interface Ethernet 1/49
N9K-B(config-if)# ip ospf network broadcast
N9K-B(config-if)# ip router ospf 1 area 1
N9K-B(config-if)# ip ospf authentication message-digest
N9K-B(config-if)# ip ospf message-digest-key 1 md5 cisco
N9K-B(config-if)# end
N9K-B#
! Verifying unicast routing table. N9K-A and N9K-B Loopback 0 interfaces are learned
via OSPF.
N7K# show ip route
IP Route Table for VRF "default"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>
! Verifying unicast routing table. N7K and N9K-B Loopback 0 interfaces are learned
via OSPF.
N9K-A# show ip route
IP Route Table for VRF "default"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>
! Verifying unicast routing table. N7K and N9K-A Loopback 0 interfaces are learned
via OSPF.
N9K-B# show ip route
IP Route Table for VRF "default"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>
N9K-B#
Note There are a few more commands you can use to verify OSPF configuration,
including show ip ospf, show ip ospf neighbors detail, and show ip ospf interface.
Multicast Fundamentals
IP multicast is a method of forwarding the same set of IP packets to a number of hosts
within a network. You can use multicast in IPv4 networks to provide efficient delivery of
data to multiple destinations. Multicast involves both a method of delivery and discovery
of senders and receivers of multicast data, which is transmitted on IP multicast addresses
called groups. A multicast address that includes a group and source IP address is often
referred to as a channel. The Internet Assigned Numbers Authority (IANA) has assigned
224.0.0.0 through 239.255.255.255 as IPv4 multicast addresses.
The routers in the network listen for receivers to advertise their interest in receiving mul-
ticast data from selected groups. The routers then replicate and forward the data from
sources to the interested receivers. Multicast data for a group is transmitted only to those
LAN segments with receivers that requested it.
Source Trees
A source tree represents the shortest path that the multicast traffic takes through the
network from the sources that transmit to a particular multicast group to receivers that
requested traffic from that same group. Because of the shortest path characteristic of a
source tree, this tree is often referred to as a shortest path tree (SPT). Figure 6-11 shows a
source tree for group 224.1.1.1 that begins at host A and connects to hosts B and C.
Figure 6-11  Source tree for group 224.1.1.1 from source host A to receiver hosts B (192.0.2.2) and C (192.0.2.3)
The notation (S, G) represents the multicast traffic from source S on group G. The SPT
in this figure is written (192.0.2.1, 224.1.1.1). Multiple sources can be transmitting on the
same group.
Shared Trees
A shared tree represents the shared distribution path that the multicast traffic takes
through the network from a shared root or rendezvous point (RP) to each receiver.
The RP creates an SPT to each source. A shared tree is also called an RP tree (RPT).
Figure 6-12 shows a shared tree for group 224.2.2.2 with the RP at router D. Source hosts
A and D send their data to router D, the RP, which then forwards the traffic to receiver
hosts B and C.
The notation (*, G) represents the multicast traffic from any source on group G. The
shared tree in this figure is written (*, 224.2.2.2).
Figure 6-12  Shared tree for group 224.2.2.2 rooted at the RP (router D), delivering traffic to receiver hosts B (192.0.2.2) and C (192.0.2.3)
Bidirectional Shared Trees

Figure 6-13  Bidirectional shared tree for group 224.2.2.2 rooted at the RP (router D)
The notation (*, G) represents the multicast traffic from any source on group G. The
bidirectional tree in the figure is written as (*, 224.2.2.2).
The forwarding and tree building process in bidirectional shared trees consists of three
main stages:
1. The DF is responsible for sending (*, G) joins toward the RP for the active bidirec-
tional group. Downstream routers address their (*, G) joins to upstream DFs. This is
accomplished by putting the IP address of the upstream DF in the upstream router
field of a PIM join message.
2. When the DF receives an (*, G) join, it adds the link to the OIL of the (*, G) entry and
joins toward the RP. If the interface exists in the OIL, the interface timer is refreshed.
Note Steps 1 and 2 describe the bidirectional shared tree building process and step 3
describes the multicast forwarding process in BIDIR-PIM.
Multicast Forwarding
In unicast routing, traffic is routed through the network along a single path from the
source to the destination host. A unicast router does not consider the source address; it
considers only the destination address and how to forward the traffic toward that des-
tination. The router scans through its routing table for the destination address and then
forwards a single copy of the unicast packet out the correct interface in the direction of
the destination.
In multicast forwarding, the source is sending traffic to an arbitrary group of hosts that
are represented by a multicast group address. The multicast router must determine which
direction is the upstream direction (toward the source) and which is the downstream
direction (or directions). If there are multiple downstream paths, the router replicates the
packet and forwards it down the appropriate downstream paths (best unicast route met-
ric), which is not necessarily all paths. Forwarding multicast traffic away from the source,
rather than to the receiver, is called reverse path forwarding (RPF).
PIM uses the unicast routing information to create a distribution tree along the reverse
path from the receivers toward the source. The multicast routers then forward packets
along the distribution tree from the source to the receivers. RPF is a key concept in
multicast forwarding. It enables routers to correctly forward multicast traffic down the
distribution tree. RPF makes use of the existing unicast routing table to determine the
upstream and downstream neighbors. A router will forward a multicast packet only if it is
received on the upstream interface. This RPF check helps to guarantee that the distribu-
tion tree will be loop-free.
Because multicast traffic is destined for an arbitrary group of hosts, the router uses
reverse path forwarding (RPF) to route data to active receivers for the group. When
receivers join a group, a path is formed toward the RP (ASM mode). The path from a
source to a receiver flows in the reverse direction from the path that was created when
the receiver joined the group.
For each incoming multicast packet, the router performs an RPF check. If the packet
arrives on the interface leading to the source, the packet is forwarded out each interface
in the outgoing interface (OIF) list for the group. Otherwise, the router drops the packet.
Figure 6-14 shows an example of RPF checks on packets coming in from different inter-
faces. The packet that arrives on E0 fails the RPF check because the unicast route table
lists the source of the network on interface E1. The packet that arrives on E1 passes
the RPF check because the unicast route table lists the source of that network on
interface E1.
Two protocols work together to deliver multicast traffic:
■ Internet Group Management Protocol (IGMP) is used between hosts on a LAN and
the routers on that LAN to track the multicast groups of which hosts are members.
■ Protocol-Independent Multicast (PIM) is used between routers so that they can track
which multicast packets to forward to each other and to their directly connected
LANs.
IGMP Version 1
In Version 1, only the following two types of IGMP messages exist:
■ Membership query
■ Membership report
Hosts send out IGMP membership reports corresponding to a particular multicast group
to indicate that they are interested in joining that group. The TCP/IP stack running on a
host automatically sends the IGMP membership report when an application opens a
multicast socket. The router periodically sends out an IGMP membership query to verify
that at least one host on the subnet is still interested in receiving traffic directed to that
group. When there is no reply to three consecutive IGMP membership queries, the router
times out the group and stops forwarding traffic directed toward that group. IGMPv1
supports membership report suppression, which means that if two hosts on the same
subnet want to receive multicast data for the same group, the host that receives a member
report from the other host suppresses sending its report. Membership report suppression
occurs for hosts that share a port.
IGMP Version 2
IGMPv1 has been superseded by IGMP Version 2 (IGMPv2). IGMPv2 is backward
compatible with IGMPv1. In Version 2, the following four types of IGMP messages exist:
■ Membership query
■ Version 1 membership report
■ Version 2 membership report
■ Leave group
IGMP Version 2 works basically the same way as Version 1 and supports membership
report suppression. The main difference is that there is a leave group message. With
this message, the hosts can actively communicate to the local multicast router that they
intend to leave the group. The router then sends out a group-specific query and deter-
mines if any remaining hosts are interested in receiving the traffic. If there are no replies,
the router times out the group and stops forwarding the traffic. The addition of the leave
group message in IGMP Version 2 greatly reduces the leave latency compared to IGMP
Version 1. Unwanted and unnecessary traffic can be stopped much sooner.
IGMP Version 3
IGMPv3 adds support for “source filtering,” which enables a multicast receiver host to
signal to a router the groups from which it wants to receive multicast traffic as well as
from which sources this traffic is expected. This membership information enables Cisco
NX-OS software to forward traffic from only those sources from which receivers request-
ed the traffic.
IGMPv3 supports applications that explicitly signal sources from which they want to
receive traffic.
Note The NX-OS device supports IGMPv2 and IGMPv3 as well as IGMPv1 report
reception.
The MLD protocol is equivalent to IGMP in the IPv6 domain and is used by a host
to request multicast data for a particular group. MLDv1 is derived from IGMPv2, and
MLDv2 is derived from IGMPv3. Also, IGMP uses IP Protocol 2 message types, while
MLD uses IP Protocol 58 message types, which is a subset of the ICMPv6 messages. In
this chapter, we cover only IGMP for IPv4. To learn more about MLD, refer to the “Cisco
Nexus 9000 Series NX-OS Multicast Routing Configuration Guide” for the latest version
of NX-OS code.
In PIM ASM mode, the NX-OS software chooses a designated router (DR) from the rout-
ers on each network segment. The DR is responsible for forwarding multicast data for
specified groups and sources on that segment. The DR for each LAN segment is deter-
mined as described in the Hello messages. In ASM mode, the DR is responsible for uni-
casting PIM register packets to the RP. When a DR receives an IGMP membership report
from a directly connected receiver, the shortest path is formed to the RP, which may or
may not go through the DR. The result is a shared tree that connects all sources transmit-
ting on the same multicast group to all receivers of that group.
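Consistent with the static-RP approach used for the configuration examples in this chapter, a minimal PIM sparse mode sketch might look like the following; the RP address 192.0.2.100 is purely a placeholder, and the commands should be checked against the NX-OS multicast routing configuration guide for your platform.
! Enabling PIM, pointing to a static RP for the entire multicast range, and enabling sparse mode on an interface.
N7K(config)# feature pim
N7K(config)# ip pim rp-address 192.0.2.100 group-list 224.0.0.0/4
N7K(config)# interface Ethernet 6/8
N7K(config-if)# ip pim sparse-mode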
PIM-SSM
Source-Specific Multicast (SSM) is an extension of the PIM protocol that allows for an
efficient data delivery mechanism in one-to-many communications. SSM enables a receiving client, once it has learned about a particular multicast source, to receive content directly from the source rather than through a shared RP. Source-
Specific Multicast (SSM) is a PIM mode that builds a source tree that originates at the
designated router on the LAN segment that receives a request to join a multicast source.
Source trees are built by sending PIM join messages in the direction of the source. The
SSM mode does not require any RP configuration. The SSM mode allows receivers to
connect to sources outside the PIM domain.
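A minimal SSM sketch, not one of the chapter's examples: the group range shown is the well-known default SSM range, and IGMPv3 is enabled on a receiver-facing interface (the interface shown is only a placeholder) so hosts can signal the specific sources they want.
! Defining the SSM group range and enabling IGMPv3 toward receivers.
N9K-A(config)# ip pim ssm range 232.0.0.0/8
N9K-A(config)# interface Ethernet 1/49
N9K-A(config-if)# ip igmp version 3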
BIDIR-PIM
Bidirectional PIM (BIDIR-PIM) is an enhancement of the PIM protocol that was designed
for efficient many-to-many communications, where each participant is a receiver as well as a
sender. Multicast groups in bidirectional mode can scale to an arbitrary number of sources.
The shared trees created in PIM sparse mode are unidirectional. This means that a
source tree must be created to bring the data stream to the RP (the root of the shared
tree) and then it can be forwarded down the branches to the receivers. The traffic from
sources to the RP initially flows encapsulated in register messages. This activity presents
a significant burden because of the encapsulation and de-encapsulation mechanisms.
Additionally, an SPT is built between the RP and the source, which results in (S,G) entries
being created between the RP and the source.
BIDIR-PIM eliminates the registration/encapsulation process and the (S,G) state. Packets
are natively forwarded from a source to the RP using the (*, G) state only. This capabil-
ity ensures that only (*, G) entries appear in multicast forwarding tables. The path that is
taken by packets flowing from the participant (source or receiver) to the RP and from the
RP to the participant will be the same by using a bidirectional shared tree.
In BIDIR-PIM, the packet-forwarding rules have been improved over PIM SM, allowing
traffic to be passed up the shared tree toward the RP. To avoid multicast packet looping,
BIDIR-PIM introduces a new mechanism called the designated forwarder (DF), which
establishes a loop-free SPT rooted at the RP. A single DF for a particular BIDIR-PIM
group exists on every link within a PIM domain. Elected on both multi-access and point-
to-point links, the DF is the router on the link with the best unicast route to the RP.
On its link, the DF is responsible for forwarding the following traffic:
■ Upstream traffic from the link toward the RP
■ Downstream traffic that flows down the shared tree onto the link
The DF does this for all the bidirectional groups served by the RP. The election mecha-
nism for the DF must ensure that all the routers on the link have a consistent view of the
path toward the RP.
On every link in the network, the BIDIR-PIM routers participate in a procedure called DF
election. The procedure selects one router as the DF for every RP of bidirectional groups.
The router with the best unicast route to the RP is elected as a DF. There is also an elec-
tion tie-breaking process if there are parallel equal-cost paths to the RP. If the elected DF
fails, it is detected via the normal PIM hello mechanism, and a new DF election process
will be initiated.
Note There are various ways of configuring and selecting the RP in a PIM-SM network,
such as static RP, BSR, Auto-RP, Anycast-RP, and so on, depending on whether there is a single RP or multiple RPs in a PIM domain. Covering all the RP configuration methods is beyond
the scope of this book. For simplicity, we will stick to static-RP configuration in multicast
configuration examples.
The Multicast Source Discovery Protocol (MSDP) was developed for peering between Internet service providers (ISPs). ISPs did not
want to rely on an RP maintained by a competing ISP to provide service to their custom-
ers. MSDP allows each ISP to have its own local RP and still forward and receive multi-
cast traffic to the Internet.
MSDP enables RPs to share information about active sources. RPs know about the receiv-
ers in their local domain. When RPs in remote domains hear about the active sources,
they can pass on that information to their local receivers and multicast data can then be
forwarded between the domains. A useful feature of MSDP is that it allows each domain
to maintain an independent RP that does not rely on other domains. MSDP gives the
network administrators the option of selectively forwarding multicast traffic between
domains or blocking particular groups or sources. PIM-SM is used to forward the traffic
between the multicast domains.
The RP in each domain establishes an MSDP peering session using a TCP connection
with the RPs in other domains or with border routers leading to the other domains. When
the RP learns about a new multicast source within its own domain (through the normal
PIM register mechanism), the RP encapsulates the first data packet in a Source-Active
(SA) message and sends the SA to all MSDP peers. MSDP uses a modified RPF check
in determining which peers should be forwarded the SA messages. This modified RPF
check is done at an AS level instead of a hop-by-hop metric. The SA is forwarded by
each receiving peer, also using the same modified RPF check, until the SA reaches every
MSDP router in the internetwork—theoretically, the entire multicast Internet. If the
receiving MSDP peer is an RP, and the RP has an (*, G) entry for the group in the SA (that
is, there is an interested receiver), the RP creates the (S, G) state for the source and joins
the shortest path tree for the source. The encapsulated data is decapsulated and forward-
ed down the shared tree of that RP. When the packet is received by the last-hop router
of the receiver, the last-hop router also may join the shortest path tree to the source.
The MSDP speaker periodically sends SAs that include all sources within the domain
of the RP.
When a receiver joins a group that is transmitted by a source in another domain, the RP
sends PIM join messages in the direction of the source to build a shortest path tree. The
DR sends packets on the source tree within the source domain, which can travel through
the RP in the source domain and along the branches of the source tree to other domains.
In domains where there are receivers, RPs in those domains can be on the source tree.
The peering relationship is conducted over a TCP connection.
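As a sketch of that TCP peering between two RPs (the peer address and source interface are placeholders, not values from the chapter):
! Enabling MSDP on the local RP and peering with the RP of a neighboring domain, sourcing the session from loopback0.
N7K(config)# feature msdp
N7K(config)# ip msdp peer 203.0.113.1 connect-source loopback0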
Figure 6-15 shows four PIM domains. The connected RPs (routers) are called MSDP peers
because they are exchanging active source information with each other. Each MSDP peer
advertises its own set of multicast source information to the other peers. Source host 2
sends the multicast data to group 224.1.1.1. On RP 6, the MSDP process learns about the
source through PIM register messages and generates Source-Active (SA) messages to its
MSDP peers that contain information about the sources in its domain. When RP 3 and
RP 5 receive the SA messages, they forward these messages to their MSDP peers. When
RP 5 receives the request from host 1 for the multicast data on group 224.1.1.1, it builds a
shortest path tree to the source by sending a PIM join message in the direction of host 2
at 192.1.1.1.
IGMP Snooping
The default behavior for a Layer 2 switch is to forward all multicast traffic to every port
that belongs to the destination LAN on the switch. This behavior reduces the efficiency
of the switch, whose purpose is to limit traffic to the ports that need to receive the data.
IGMP snooping efficiently handles IP multicast in a Layer 2 environment; in other words,
IGMP snooping makes Layer 2 switches IGMP-aware. IGMP snooping is used on subnets
that include end users or receiver clients.
The IGMP snooping feature operates on IGMPv1, IGMPv2, and IGMPv3 control plane
packets where Layer 3 control plane packets are intercepted and influence the Layer 2
forwarding behavior.
With IGMP snooping, the switch learns about connected receivers of a multicast group.
The switch can use this information to forward corresponding multicast frames only
to the interested receivers. The forwarding decision is based on the destination MAC
address. The switch uses the IP-to-MAC address mapping to decide which frames go to
which IP multicast receivers (or destination MAC).
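As a rough illustration of the state an IGMP snooping switch keeps, the following Python sketch (hypothetical class and port names, not NX-OS code) learns group membership from IGMP reports and constrains forwarding to interested ports plus router ports:

# Illustrative sketch of IGMP snooping state on a Layer 2 switch: group
# membership is learned from IGMP reports, and multicast frames are forwarded
# only to member ports and router ports instead of being flooded.

from collections import defaultdict

class IgmpSnoopingSwitch:
    def __init__(self, ports):
        self.ports = set(ports)
        self.members = defaultdict(set)   # group address -> set of member ports
        self.router_ports = set()         # ports toward multicast routers

    def on_igmp_report(self, group, port):
        self.members[group].add(port)     # learned from a membership report

    def on_igmp_leave(self, group, port):
        self.members[group].discard(port)

    def forward_ports(self, group, ingress_port):
        # Without snooping, the frame would flood to every port in the VLAN;
        # with snooping, it goes only to interested receivers and router ports.
        return (self.members.get(group, set()) | self.router_ports) - {ingress_port}

sw = IgmpSnoopingSwitch(ports=["e1/1", "e1/2", "e1/3", "e1/4"])
sw.router_ports.add("e1/1")
sw.on_igmp_report("239.1.1.1", "e1/3")
print(sw.forward_ports("239.1.1.1", ingress_port="e1/1"))   # {'e1/3'}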
The IANA owns a block of Ethernet MAC addresses that start with 01:00:5E in hexa-
decimal format. Half of this block is allocated for multicast addresses. The range from
0100.5e00.0000 through 0100.5e7f.ffff is the available range of Ethernet MAC addresses
for IP multicast.
For IPv4, mapping of the IP multicast group address to a Layer 2 multicast address
happens by taking the 23 low-order bits from the IPv4 address and adding them to the
01:00:5e prefix, as shown in Figure 6-17. By the standard, the upper 9 bits of the IP
address are ignored, and any IP addresses that only differ in the value of these upper bits
are mapped to the same Layer 2 address, since the lower 23 bits used are identical. For
example, 239.255.0.1 is mapped to the MAC multicast group address 01:00:5e:7f:00:01.
Up to 32 IP multicast group addresses can be mapped to the same Layer 2 address.
Because the upper 5 bits of the IP multicast address are dropped in this mapping, the
resulting address is not unique. In fact, 32 different multicast group IDs map to the same
Ethernet address, as shown in Figure 6-18. Network administrators should consider this
fact when assigning IP multicast addresses. For example, 224.1.1.1 and 225.1.1.1 map to
the same multicast MAC address on a Layer 2 switch. If one user subscribes to Group A
(designated by 224.1.1.1) and another user subscribes to Group B (designated by 225.1.1.1),
both receive both the A and B streams. This situation limits the effectiveness of this
multicast deployment.
Figure 6-18 shows that 32 IP multicast addresses (224.1.1.1, 224.129.1.1, 225.1.1.1,
225.129.1.1, ..., 238.1.1.1, 238.129.1.1, 239.1.1.1, 239.129.1.1) all map to the single
multicast MAC address 0100.5E01.0101.
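To make the mapping and the resulting address overlap concrete, this short Python sketch (illustrative only, not part of the book's configuration examples) derives the multicast MAC address from an IPv4 group address:

# Compute the Ethernet multicast MAC address for an IPv4 multicast group by
# copying the 23 low-order bits of the group address into the 01:00:5e prefix.

import ipaddress

def multicast_mac(group):
    low23 = int(ipaddress.IPv4Address(group)) & 0x7FFFFF    # keep 23 low-order bits
    mac = 0x01005E000000 | low23                             # 01:00:5e prefix
    return ':'.join(f'{(mac >> shift) & 0xFF:02x}' for shift in range(40, -1, -8))

print(multicast_mac('239.255.0.1'))   # 01:00:5e:7f:00:01
# 32 different groups collide on the same MAC address:
print(multicast_mac('224.1.1.1'))     # 01:00:5e:01:01:01
print(multicast_mac('225.1.1.1'))     # 01:00:5e:01:01:01 (same MAC)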
The PIM process begins when the router establishes PIM neighbor adjacencies by send-
ing PIM hello messages to the multicast IPv4 address 224.0.0.13. Hello messages are sent
periodically at the interval of 30 seconds. When all neighbors have replied, the PIM soft-
ware chooses the router with the highest priority in each LAN segment as the designated
router (DR). The DR priority is based on a value in the PIM hello message. If the DR pri-
ority value is not supplied by all routers, or if the priorities match, the highest IP address
is used to elect the DR.
The hello message also contains a hold-time value, which is typically 3.5 times the hello
interval. If this hold time expires without a subsequent hello message from its neighbor,
the device detects a PIM failure on that link.
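The election logic and the hold-time arithmetic described above can be sketched as follows. This is an illustrative Python model, not the actual PIM implementation; the neighbor structures are assumptions.

# Sketch of the PIM DR election described above: highest DR priority wins;
# if any neighbor omits the priority, or all priorities match, the highest
# IP address on the segment becomes the DR.

import ipaddress

HELLO_INTERVAL = 30
HOLD_TIME = 3.5 * HELLO_INTERVAL   # neighbor expires after 105 s without hellos

def elect_dr(neighbors):
    """neighbors: list of dicts with 'ip' and an optional 'priority' from hellos."""
    everyone_sent_priority = all('priority' in n for n in neighbors)
    distinct_priorities = {n['priority'] for n in neighbors if 'priority' in n}
    if everyone_sent_priority and len(distinct_priorities) > 1:
        # Compare DR priority first; the IP address breaks any remaining tie.
        key = lambda n: (n['priority'], int(ipaddress.IPv4Address(n['ip'])))
    else:
        # Priority not supplied by all routers, or all priorities match:
        # fall back to the highest IP address.
        key = lambda n: int(ipaddress.IPv4Address(n['ip']))
    return max(neighbors, key=key)['ip']

segment = [{'ip': '192.168.1.41', 'priority': 1},
           {'ip': '192.168.1.42', 'priority': 10}]
print(elect_dr(segment))   # 192.168.1.42 (higher DR priority)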
When the DR receives an IGMP membership report message from a receiver for a new
group or source, the DR creates a tree to connect the receiver to the source by sending a
PIM join message out the interface toward the rendezvous point (ASM mode). The ren-
dezvous point (RP) is the root of a shared tree, which is used by all sources and hosts in
the PIM domain in the ASM mode.
When the DR determines that the last host has left a group or source, it sends a PIM
prune message to remove the path from the distribution tree. The routers forward the
join or prune action hop by hop up the multicast distribution tree to create (join) or tear
down (prune) the path.
PIM requires that multicast entries are refreshed within a 3.5-minute timeout interval.
The state refresh ensures that traffic is delivered only to active listeners, and it keeps rout-
ers from using unnecessary resources.
To maintain the PIM state, the last-hop DR sends join-prune messages once per minute.
State creation applies to both (*, G) and (S, G) states as follows:
■ (*, G) state creation example: An IGMP (*, G) report triggers the DR to send an
(*, G) PIM join message toward the RP.
■ (S, G) state creation example: An IGMP (S, G) report triggers the DR to send an
(S, G) PIM join message toward the source.
If the state is not refreshed, the PIM software tears down the distribution tree by remov-
ing the forwarding paths in the multicast outgoing interface list of the upstream routers.
PIM register messages are unicast to the RP by DRs that are directly connected to multi-
cast sources. The PIM register message has the following functions:
■ To deliver multicast packets sent by the source to the RP for delivery down the
shared tree
The DR continues to send PIM register messages to the RP until it receives a Register-
Stop message from the RP. The RP sends a Register-Stop message in either of the
following cases:
■ The RP has joined the SPT to the source but has not started receiving traffic from the
source.
Configuring basic PIM sparse mode is a multistep process. The following are the steps to
enable a basic configuration of PIM sparse mode:
Step 1. Select the range of multicast groups that you want to configure in the multi-
cast distribution mode.
Table 6-7 summarizes the NX-OS CLI commands related to basic PIM configuration and
verification.
Table 6-7 Summary of NX-OS CLI Commands for PIM Configuration and Verification
Command Purpose
configure terminal Enters global configuration mode.
[no] feature pim Enables PIM. By default, PIM is disabled.
interface interface-type slot/port Enters interface configuration mode.
ip pim sparse-mode Enables PIM sparse mode on this interface. The default
is disabled.
ip pim hello-authentication ah-md5 auth-key Enables an MD5 hash authentication key in
PIM hello messages. You can enter an unencrypted (cleartext) key or an encryption-type
value followed by a space and the MD5 authentication key.
show ip pim neighbor [interface interface | ip-prefix] [vrf vrf-name | all] Displays neighbors by the interface.
show ip pim interface [interface | brief] [vrf vrf-name | all] Displays information by the interface.
show ip pim rp [ip-prefix] [vrf vrf-name | all] Displays rendezvous points known to the software, how they were learned, and their group ranges.
show ip pim group-range [ip-prefix] [vrf vrf-name | all] Displays the learned or configured group ranges and modes.
show ip mroute ip-address Displays the mroute entries.
Examples 6-15 through 6-24 show the basic PIM and IGMP configuration and verifica-
tion on the sample topology shown in Figure 6-19. In this example, we will configure the
MD5 authentication on the interfaces connecting the Nexus switches. We will configure
Loopback 0 on N9K-B as the multicast receiver for the multicast group 230.0.0.0. Source
is connected to the Eth 1/50 interface on N9K-A. OSPFv2 is already configured for the
topology in the background for the multicast RPF check to pass.
Figure 6-19 shows the sample topology: N7K (Loopback 0 10.10.10.10/32, acting as the RP)
connects over Ethernet 6/8 and Ethernet 6/9 to the Ethernet 1/49 interfaces of N9K-A and
N9K-B; N9K-A (Loopback 0 20.20.20.20/32) and N9K-B (Loopback 0 30.30.30.30/32, the
receiver) connect to each other over their Ethernet 1/3 interfaces; and the source connects
to Ethernet 1/50 on N9K-A. The point-to-point links are addressed from the /30 subnets
192.168.1.40/30, 192.168.1.44/30, 192.168.1.48/30, and 192.168.1.52/30.
! Enabling Loopback 0, Ethernet 6/8 and Ethernet 6/9 for pim sparse-mode and
configuring md5 authentication with authentication key cisco on interface
Ethernet 6/8 and Ethernet 6/9.
N7K(config)# interface Loopback 0
N7K(config-if)# ip pim sparse-mode
N7K(config-if)# interface Ethernet 6/8
N7K(config-if)# ip pim sparse-mode
N7K(config-if)# ip pim hello-authentication ah-md5 cisco
N7K(config-if)# interface Ethernet 6/9
N7K(config-if)# ip pim sparse-mode
N7K(config-if)# ip pim hello-authentication ah-md5 cisco
N7K(config-if)# exit
! Configuring the Loopback 0 interface of N7K as the static RP. Since we are not
specifying a group range for the RP, it will act as the RP for the whole multicast
range, i.e., 224.0.0.0/4.
N7K(config)# ip pim rp-address 10.10.10.10
N7K(config)# exit
N7K#
! Enabling Loopback 0, Ethernet 1/49 and Ethernet 1/3 for pim sparse-mode and
configuring md5 authentication with authentication key cisco on interface
Ethernet 1/49 and Ethernet 1/3.
N9K-A(config)# interface Loopback 0
N9K-A(config-if)# ip pim sparse-mode
N9K-A(config-if)# interface Ethernet 1/49
N9K-A(config-if)# ip pim sparse-mode
N9K-A(config-if)# ip pim hello-authentication ah-md5 cisco
N9K-A(config-if)# interface Ethernet 1/3
N9K-A(config-if)# ip pim sparse-mode
N9K-A(config-if)# ip pim hello-authentication ah-md5 cisco
N9K-A(config-if)# exit
! Configuring the Loopback 0 interface of N7K as the static RP. Since we are not
specifying a group range for the RP, it will act as the RP for the whole multicast
range, i.e., 224.0.0.0/4.
N9K-A(config)# ip pim rp-address 10.10.10.10
N9K-A(config)# exit
N9K-A#
! Enabling Loopback 0, Ethernet 1/49 and Ethernet 1/3 for pim sparse-mode and
configuring md5 authentication with authentication key cisco on interface
Ethernet 1/49 and Ethernet 1/3.
N9K-B(config)# interface Loopback 0
N9K-B(config-if)# ip pim sparse-mode
N9K-B(config-if)# interface Ethernet 1/49
N9K-B(config-if)# ip pim sparse-mode
N9K-B(config-if)# ip pim hello-authentication ah-md5 cisco
N9K-B(config-if)# interface Ethernet 1/3
N9K-B(config-if)# ip pim sparse-mode
N9K-B(config-if)# ip pim hello-authentication ah-md5 cisco
N9K-B(config-if)# exit
! Configuring the Loopback 0 interface of N7K as the static RP. Since we are not
specifying a group range for the RP, it will act as the RP for the whole multicast
range, i.e., 224.0.0.0/4.
N9K-B(config)# ip pim rp-address 10.10.10.10
N9K-B(config)# exit
N9K-B#
! Configuring Loopback 0 on N9K-B to join the 230.0.0.0 group and act as receiver
for the group.
N9K-B# configure terminal
Enter configuration commands, one per line. End with CNTL/Z.
N9K-B(config)# interface Loopback 0
N9K-B(config-if)# ip igmp version 3
N9K-B(config-if)# ip igmp join-group 230.0.0.0
Note: IGMP join-group disrupts forwarding on the outgoing interface list
N9K-B(config-if)# end
N9K-B#
Once we configure the receiver, a shared tree represented by the (*, G) entry is formed
from the RP to the receiver, as shown in Figure 6-20. When the receiver sends the IGMP
report and the resulting PIM join reaches the RP, you will see a (*, G) entry along the shared
tree path from the RP to the receiver (that is, on the N7K and N9K-B switches). The shared
tree path can be verified by checking the incoming interface and the outgoing interface list
in the multicast routing table, as shown in Example 6-22.
Figure 6-20 Shared Tree from the RP (N7K) to the Receiver (N9K-B) on the Sample Topology
Example 6-22 The mroute Verification on N7K, N9K-A, and N9K-B After Configuring
IGMP
! Verifying the multicast routing table on N7K, N9K-A and N9K-B to trace the shared
tree path from RP(N7K) to Receiver(N9K-B).
N7K
N9K-A
N9K-B
Before the source generates any traffic, we have to enable PIM sparse mode on the
Ethernet 1/50 interface on N9K-A, as shown in Example 6-23, to make it multicast
capable.
Example 6-23 Enabling the Interface Connected to the Source with PIM
Once the source generates some multicast traffic for the receivers, you will see an (S, G)
entry on the path from the source to the receiver (that is, N9K-A, N7K, and N9K-B
switch). The shortest path tree can be verified by checking the incoming interface and
outgoing interface list in the multicast routing table, as shown in Example 6-24.
Example 6-24 The mroute Verification on N7K, N9K-A, and N9K-B with Multicast
Traffic from Source to Receiver
! Verifying the multicast routing table on N7K, N9K-A and N9K-B to trace the Short-
est Path Tree (SPT) from the Source to the Receiver(N9K-B).
N7K
N9K-A
N9K-B
Figure 6-21 illustrates the shortest path tree (SPT) from the source to the receiver
(N9K-B).
Figure 6-21 Shortest Path Tree (S, G) from the Source to the Receiver and Shared Tree from the RP to the Receiver on the Sample Topology
Summary
This chapter discusses routing fundamentals, configuration and verification of RIPv2,
EIGRP, and OSPFv2 unicast routing protocols on NX-OS, multicast fundamentals, and
PIMv2 configuration and verification on NX-OS, including the following points:
■ Layer 3 unicast routing involves two basic activities: determining optimal routing
paths and packet switching.
■ Routing algorithms use many different metrics to determine the best path to the
destination.
■ Routing protocols that route packets between autonomous systems are called exte-
rior gateway protocols or interdomain protocols. Routing protocols used within an
autonomous system are called interior gateway protocols or intradomain protocols.
■ Distance vector protocols use distance vector algorithms (also known as Bellman-
Ford algorithms) that call for each router to send all or some portion of its routing
table to its neighbors.
■ In link-state protocols, also known as shortest path first (SPF), each router builds a
link-state advertisement (LSA) that contains information about each link and directly
connected neighbor router and shares the information with its neighboring routers.
■ The Routing Information Protocol (RIP) is a distance vector protocol that uses a hop
count as its metric.
■ OSPFv2 is a link-state protocol that uses a link cost as its metric and uses link-state
advertisements (LSAs) to build its routing table.
■ Multicast involves both a method of delivery and discovery of senders and receivers
of multicast data, which is transmitted on IP multicast addresses called groups.
■ A multicast distribution tree represents the path that multicast data takes between
the routers that connect sources and receivers. Protocol-Independent Multicast (PIM)
is used to dynamically create a multicast distribution tree.
■ MSDP enables RPs to share information about active sources across PIM-SM
domains.
References
“Cisco Nexus 9000 Series NX-OS Unicast Configuration Guide, Release 10.2(x),” https://
www.cisco.com/c/en/us/td/docs/dcn/nx-os/nexus9000/102x/configuration/Unicast-
routing/cisco-nexus-9000-series-nx-os-unicast-configuration-guide-release-102x.html
“Cisco Nexus 7000 Series NX-OS Unicast Routing Configuration Guide, Release 8.x,”
https://www.cisco.com/c/en/us/td/docs/switches/datacenter/nexus7000/sw/unicast/
config/cisco_nexus7000_unicast_routing_config_guide_8x.html
“Cisco Nexus 9000 Series NX-OS Multicast Routing Configuration Guide, Release
10.2(x),” https://www.cisco.com/c/en/us/td/docs/dcn/nx-os/nexus9000/102x/
configuration/multicast-routing/cisco-nexus-9000-series-nx-os-multicast-routing-
configuration-guide-release-102x.html
“Cisco Nexus 7000 Series NX-OS Multicast Routing Configuration Guide, Release 8.x,”
https://www.cisco.com/c/en/us/td/docs/switches/datacenter/nexus7000/sw/
multicast/config/cisco_nexus7000_multicast_routing_config_guide_8x.html
Relevant Cisco Live sessions: http://www.ciscolive.com
Chapter 7
Network Virtualization
In this chapter, we will discuss overlay network protocols such as Network Virtualization
using GRE (NVGRE), Cisco Overlay Transport Virtualization (OTV), and VXLAN
Overlay. We will also discuss network interface virtualization using FEX technology and
VMware vSphere virtual switches.
Network overlays are virtual networks of interconnected nodes that share an underlying
physical network, allowing deployment of applications that require specific network
topologies without the need to modify the underlying network. Multiple overlay net-
works can coexist at the same time.
■ Layer 2 overlays: Layer 2 overlays emulate a LAN segment and transport both IP
and non-IP packets, and forwarding is based on Ethernet frame headers. Mobility
is restricted to a single subnet (that is, a single L2 domain). Since it’s a single L2
domain, Layer 2 floods are not uncommon. Layer 2 overlays are useful in emulating
physical topologies.
Depending on the types of overlay edge devices (that is, where the virtualized network is
terminated), overlays are classified into three categories.
Overlay technologies allow the network to scale by focusing scaling on the network over-
lay edge devices. With overlays used at the fabric edge, the core devices are freed from
the need to add end-host information to their forwarding tables. Most overlay technolo-
gies used in the data center allow virtual network IDs to uniquely scope and identify indi-
vidual private networks. This scoping allows potential overlap in MAC and IP addresses
between tenants. The overlay encapsulation also allows the underlying infrastructure
address space to be administered separately from the tenant address space.
NVGRE endpoints (NVEs) are the ingress/egress points between the virtual and the
physical networks. Any physical server or network device can be an NVGRE endpoint.
One common deployment is for the endpoint to be part of a hypervisor. The primary
function of this endpoint is to encapsulate/decapsulate Ethernet data frames to and from
the GRE tunnel, ensure Layer 2 semantics, and apply isolation policy scoped on VSID.
The endpoint can optionally participate in routing and function as a gateway in the vir-
tual topology. To encapsulate an Ethernet frame, the endpoint needs to know the location
information for the destination address in the frame. This information can be provisioned
via a management plane or obtained via a combination of control-plane distribution or
data-plane learning approaches.
■ Outer Ethernet header: The source Ethernet address in the outer frame is set to the
MAC address associated with the NVGRE endpoint. The destination endpoint may
or may not be on the same physical subnet. The destination Ethernet address is set
to the MAC address of the next-hop IP address for the destination NVE.
■ Outer IP header: The IP address in the outer frame is referred to as the Provider
Address (PA). There can be one or more PAs associated with an NVGRE endpoint,
with policy controlling the choice of which PA to use for a given Customer Address
(CA) for a customer VM.
■ GRE header: In the GRE header, the C (Checksum Present) and S (Sequence Number
Present) bits must be 0 and the K (Key Present) bit must be set to 1. The 32-bit Key
field in the GRE header is used to carry the VSID and the Flow ID. Flow ID is an
8-bit value that is used to provide per-flow entropy for flows in the same VSID and
must not be modified by transit devices. If FlowID is not generated, it must be set to
all zeros. The protocol field in the GRE header is set to 0x6558.
■ Inner Ethernet and IP header: The inner Ethernet frame is composed of an inner
Ethernet header followed by optional inner IP header, followed by the IP payload.
The inner frame could be any Ethernet data frame, not just IP. The IP address con-
tained in the inner frame is referred to as the Customer Address (CA).
Figure 7-5 shows a typical NVGRE communication. The traffic is transported from VM1
(10.0.0.5) to VM2 (10.0.0.7). The two VM1s with the same IP address represent two
different customers using the same IP address space. The original IP packet is encapsulated
with a MAC header containing the MAC addresses of the specific customer VMs that
are communicating. On top of that, a GRE header is added
that contains a VSID used to identify each virtual network. The outer IP header con-
tains the source and destination IP address of the tunnel endpoints (that is, the provider
address).
In Figure 7-5, both customers' NVGRE packets travel between provider addresses
192.168.2.22 and 192.168.5.55 and carry the customer addresses 10.0.0.5 and 10.0.0.7
inside; only the GRE Key (VSID 5001 for one customer, VSID 6001 for the other)
distinguishes the two customers' overlapping traffic.
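Based on the header rules listed above, the GRE portion of an NVGRE packet can be sketched in Python as follows. This is a simplified illustration of the field layout only (flags, protocol type, and the Key field carrying the VSID and FlowID), not an NVGRE implementation.

# Sketch of the NVGRE-specific GRE header fields described above:
# C = 0, S = 0, K = 1, protocol 0x6558, and a 32-bit Key that carries the
# 24-bit VSID in the upper bits and the 8-bit FlowID in the lower bits.

import struct

def nvgre_gre_header(vsid, flow_id=0):
    if not 0 <= vsid < 2**24:
        raise ValueError("VSID must fit in 24 bits")
    flags = 0x2000                    # only the K (Key Present) bit is set
    protocol = 0x6558                 # Transparent Ethernet Bridging
    key = (vsid << 8) | (flow_id & 0xFF)
    return struct.pack('!HHI', flags, protocol, key)

print(nvgre_gre_header(vsid=5001, flow_id=0).hex())   # 2000655800138900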
■ Edge device: An edge device performs typical Layer 2 learning and forwarding on
the site-facing interfaces (internal interfaces) and performs IP-based virtualization
on the transport-facing interfaces. The edge device capability can be colocated in
a device that performs Layer 2 and Layer 3 functionality. OTV functionality only
occurs in an edge device. A given edge device can have multiple overlay interfaces.
You can also configure multiple edge devices on a site.
■ Transport network: The network that connects OTV sites. This network can be
customer managed, provided by a service provider, or a mix of both.
■ Join interface: One of the uplink interfaces of the edge device. The join interface is
a point-to-point routed interface. The edge device joins an overlay network through
this interface. The IP address of this interface is used to advertise reachability of a
MAC address present in this site.
■ Internal interface: The Layer 2 interface on the edge device that connects to the
VLANs that are to be extended. These VLANs typically form a Layer 2 domain
known as a site and can contain site-based switches or site-based routers. The inter-
nal interface is a Layer 2 access or trunk interface regardless of whether the internal
interface connects to a switch or a router.
■ MAC routing: Associates the destination MAC address of the Layer 2 traffic with
an edge device IP address. The MAC-to-IP association is advertised to the edge
devices through the OTV control-plane protocol. In MAC routing, MAC addresses
are reachable through the IP address of a remote edge device on the overlay network.
Layer 2 traffic destined to a MAC address is encapsulated in an IP packet based on
the MAC-to-IP mapping in the MAC table.
■ Overlay network: A logical network that interconnects remote sites for MAC rout-
ing of Layer 2 traffic. The overlay network is composed of multiple edge devices.
■ Site VLAN: OTV sends local hello messages on the site VLAN to detect other OTV
edge devices in the site and uses the site VLAN to determine the authoritative edge
device for the OTV-extended VLANs. VLAN 1 is the default site VLAN. It is recom-
mended to use a dedicated VLAN as a site VLAN. You should ensure that the site
VLAN is active on at least one of the edge device ports and that the site VLAN is
not extended across the overlay.
OTV offers unicast and multicast as transports between sites. For a small number of sites
such as two or three sites, unicast works just fine without losing any features or func-
tions. In unicast-only transport, edge devices register with an adjacency server (AS) edge
device and receive a full list of neighbors (oNL) from the AS. An edge device can be man-
ually configured to act as an AS edge device. OTV hellos and updates are encapsulated in
IP and unicast to each neighbor. Figure 7-7 illustrates the neighbor discovery process over
unicast-only transport.
Multicast is the preferred transport because of its flexibility and smaller overhead when
communicating with multiple sites. In multicast transport, one multicast address (the
control-group address) is used to encapsulate and exchange OTV control-plane protocol
updates. Each edge device that participates in the particular overlay network shares the
same control-group address with all the other edge devices. As soon as the control-group
address and the join interface are configured, the edge device sends an IGMP report mes-
sage to join the control group. The edge devices act as hosts in the multicast network and
send multicast IGMP report messages to the assigned multicast group address. Figure 7-8
illustrates the neighbor discovery process over multicast transport.
Edge devices participate in data-plane learning on internal interfaces to build up the list
of MAC addresses that are reachable within a site. OTV sends these locally learned MAC
addresses in the OTV control-plane updates to remote sites.
When an edge device receives a Layer 2 frame on an internal interface, OTV performs the
MAC table lookup based on the destination address of the Layer 2 frame. If the frame is
destined to a MAC address that is reachable through another internal interface, the frame
is forwarded out on that internal interface. OTV performs no other actions, and the pro-
cessing of the frame is complete.
If the frame is destined to a MAC address that was learned over an overlay interface,
OTV performs the following tasks, as illustrated in Figure 7-9:
1. Strips off the preamble and frame check sequence (FCS) from the Layer 2 frame.
2. Adds an OTV shim header to the Layer 2 frame and copies the 802.1Q information
into the OTV shim header. The outer OTV shim header contains the VLAN, overlay
number, and so on.
3. Adds the IP address to the packet, based on the initial MAC address table lookup.
This IP address is used as the destination address for the IP packet that is sent into the
core switch. In the process, 42 bytes of overhead is added to the IP MTU of the packet
for IPv4 (see the sketch after this list).
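The MTU impact of step 3 can be sketched with simple arithmetic. The following Python lines are illustrative only and assume the 42-byte IPv4 overhead quoted above; they are not part of any OTV implementation.

# Sketch of the OTV encapsulation overhead accounting described in the steps
# above: the original frame gains an OTV shim and a new IP header, adding
# 42 bytes for IPv4 transport.

OTV_IPV4_OVERHEAD = 42           # bytes added to the IP MTU for IPv4 transport

def core_mtu_required(site_mtu):
    """MTU the transport core must carry for a given site-facing IP MTU."""
    return site_mtu + OTV_IPV4_OVERHEAD

def max_site_mtu(core_mtu):
    """Largest site-facing IP MTU that fits in a given core MTU without fragmentation."""
    return core_mtu - OTV_IPV4_OVERHEAD

print(core_mtu_required(1500))   # 1542: the core must carry larger frames
print(max_site_mtu(1500))        # 1458: or the site-facing MTU must be lowered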
OTV traffic appears as IP traffic to the network core. At the destination site, the edge
device performs the reverse operation and presents the original Layer 2 frame to the local
site. That edge device determines the correct internal interface to forward the frame on,
based on the local MAC address table.
In Figure 7-10, the west site communicates with the east site over the overlay network.
Edge Device 1 receives the Layer 2 frame from MAC1, which belongs to Server 1, and
looks up the destination MAC address, MAC3, in the MAC table. The edge device
encapsulates the Layer 2 frame in an IP packet with the IP destination address set for
Edge Device 3 (IP B). When Edge Device 3 receives the IP packet, it strips off the IP
header and sends the original Layer 2 frame onto the VLAN and port that MAC3 is
connected to.
Figure 7-10 OTV MAC routing between the west site (edge device IP A) and the east site
(edge device IP B): in VLAN 100, the west edge device's MAC table points to MAC 1 and
MAC 2 through local Ethernet interfaces and to MAC 3 and MAC 4 through IP B, while
the east edge device holds the mirror-image entries; a Layer 2 lookup triggers OTV
encapsulation toward IP B and decapsulation at the remote site.
VXLAN Overlay
Traditional network segmentation has been provided by VLANs that are standardized
under the IEEE 802.1Q group. VLANs provide logical segmentation of Layer 2 boundar-
ies or broadcast domains. However, due to the inefficient use of available network links
with VLAN use, the rigid requirements on device placement in the data center network,
and the limited scalability to a maximum of 4094 VLANs, using VLANs has become a
limiting factor for IT departments and cloud providers as they build large multitenant
data centers. Virtual Extensible LAN (VXLAN) provides the solution to the data center
network challenges posed by traditional VLAN technology by providing elastic workload
placement and the higher scalability of Layer 2 segmentation required by today’s applica-
tion demands.
Virtual Extensible LAN (VXLAN) is a Layer 2 overlay scheme over a Layer 3 network
and provides a means to extend Layer 2 segments across a Layer 3 infrastructure using
MAC-in-UDP encapsulation and tunneling. VXLAN supports a flexible, large-scale mul-
titenant environment over a shared common physical infrastructure. The transport proto-
col over the physical data center network is IP plus UDP.
■ Flexible placement of workloads across the data center fabric: VXLAN provides a
way to extend Layer 2 segments over the underlying shared Layer 3 network infra-
structure so that tenant workloads can be placed across physical pods in a single
data center—or even across several geographically diverse data centers.
■ Higher scalability to allow more Layer 2 segments: VXLAN uses a 24-bit segment
ID, the VXLAN network identifier (VNID). This allows a maximum of 16 million
VXLAN segments to coexist in the same administrative domain. In comparison,
traditional VLANs use a 12-bit segment ID that can support a maximum of 4096
VLANs.
Before understanding the VXLAN operation, let’s first discuss a few important terms:
■ Virtual network instance (VNI): Each VNI identifies a specific virtual network in
the data plane and provides traffic isolation. VLANs are mapped to a VNI to extend
a VLAN across a Layer 3 infrastructure.
■ VXLAN tunnel endpoint (VTEP): VXLAN tunnel endpoints (VTEPs) are devices,
either physical or virtual, that terminate VXLAN tunnels. They perform VXLAN
encapsulation and de-encapsulation. Each VTEP has two interfaces. One is a Layer
2 interface on the local LAN segment to support a local endpoint communication
through bridging. The other is a Layer 3 interface on the IP transport network. The
IP interface has a unique address that identifies the VTEP device in the transport
network. The VTEP device uses this IP address to encapsulate Ethernet frames
and transmit the packets on the transport network. A VTEP discovers other VTEP
devices that share the same VNIs it has locally connected. It advertises the locally
connected MAC addresses to its peers. It also learns remote MAC address to VTEP
mappings through its IP interface.
A VXLAN frame adds 50 bytes to the original Layer 2 frame after encapsulation.
This overhead includes the outer MAC header, outer IP header, outer UDP header, and
VXLAN header. If the optional VLAN Type and VLAN ID Tag fields are used on the
Outer MAC header, the overhead will be 54 bytes.
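The overhead arithmetic is straightforward; the following Python lines simply restate the byte counts given above and are illustrative only.

# Overhead arithmetic for the VXLAN encapsulation described above.
OUTER_MAC = 14      # +4 more if the optional VLAN Type/ID tag is carried
OUTER_IP = 20
OUTER_UDP = 8
VXLAN_HDR = 8

overhead = OUTER_MAC + OUTER_IP + OUTER_UDP + VXLAN_HDR
print(overhead)            # 50 bytes
print(overhead + 4)        # 54 bytes with the optional VLAN tag
print(1500 + overhead)     # 1550: underlay MTU needed for 1500-byte payloads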
■ The “I” flag in the VXLAN Flags field must be set to 1 for a valid VXLAN network
ID (VNI). The other 7 bits (designated “R”) are reserved fields and must be set to 0.
■ The VNI field has a 24-bit value that is used to identify an individual VXLAN
overlay network on which the communicating VMs are situated. VMs in different
VXLAN overlay networks cannot communicate with each other.
■ The source port is calculated using a hash of fields from the inner Ethernet frame’s
headers. This enables a level of entropy for the ECMP/load balancing of the
VM-to-VM traffic across the VXLAN overlay.
Figure 7-11 VXLAN frame format: outer MAC header (14 bytes, plus an optional 4-byte
VLAN tag), outer IP header (20 bytes, carrying the source and destination addresses of the
VTEPs), outer UDP header (8 bytes, destination port 4789), VXLAN header (8 bytes, with
the RRRRIRRR flags, a 24-bit VNI, and reserved fields), the original Layer 2 frame, and the
FCS, for 50 (54) bytes of overhead. The UDP source port is a hash of the inner L2/L3/L4
headers, providing tunnel entropy for ECMP load balancing, and the 24-bit VNI allows
16M possible segments.
■ The destination MAC address can be either the MAC address of the target VTEP or
of an intermediate Layer 3 router.
■ The VLAN Type (2 bytes) and VLAN ID Tag (2 bytes) are optional. If present, they
may be used for delineating VXLAN traffic on the LAN.
Flood-and-Learn
In the flood-and-learn method, VXLAN uses existing Layer 2 mechanisms (flooding and
dynamic MAC address learning) to do the following:
■ Learn remote host MAC addresses and MAC-to-VTEP mappings for each VXLAN
segment
For the three traffic types (broadcast, unknown unicast, and multicast), IP multicast is used
to limit the flooding scope to the set of hosts participating in the VXLAN segment. Each
VXLAN segment, or VNID, is
mapped to an IP multicast group in the transport IP network. Each VTEP device is inde-
pendently configured and joins this multicast group as an IP host through the Internet
Group Management Protocol (IGMP). The IGMP joins trigger Protocol-Independent
Multicast (PIM) joins and signaling through the transport network for the particular mul-
ticast group. The multicast distribution tree for this group is built through the transport
network based on the locations of participating VTEPs. This multicast group is used to
transmit VXLAN broadcast, unknown unicast, and multicast traffic through the IP net-
work, limiting Layer 2 flooding to those devices that have end systems participating in
the same VXLAN segment. VTEPs communicate with one another through the flooded
or multicast traffic in this multicast group.
The Flood-and-Learn VXLAN implementation uses the classic Layer 2 data plane flood-
ing and learning mechanisms for remote VTEP discovery and tenant address learning.
Figure 7-12 shows the remote VTEP discovery and end-host address learning process.
The tenant VXLAN segment has VNID 10 and uses the multicast group 239.1.1.1 over
the transport network. It has three participating VTEPs in the data center. Assume that
no address learning has been performed between locations. End System A (with IP-A,
MAC-A) starts IP communication with End System B (with IP-B, MAC-B). The sequence
of steps is as follows (a simplified sketch of the learning logic appears after the step list):
1. End System A sends out an Address Resolution Protocol (ARP) request for IP-B on
its Layer 2 VXLAN network.
2. VTEP-1 receives the ARP request. It does not yet have a mapping for IP-B. VTEP-
1 encapsulates the ARP request in an IP multicast packet and forwards it to the
VXLAN multicast group. The encapsulated multicast packet has the IP address of
VTEP-1 as the source IP address and the VXLAN multicast group address as the
destination IP address.
3. The IP multicast packet is distributed to all members in the tree. VTEP-2 and VTEP-
3 receive the encapsulated multicast packet because they’ve joined the VXLAN
multicast group. They de-encapsulate the packet and check its VNID in the VXLAN
header. If it matches their configured VXLAN segment VNID, they forward the ARP
request to their local VXLAN network. They also learn the IP address of VTEP-1
from the outer IP address header and inspect the packet to learn the MAC address of
End System A, placing this mapping in the local table.
4. End System B receives the ARP request forwarded by VTEP-2. It responds with its
own MAC address (MAC-B) and learns the IP-A-to-MAC-A mapping.
5. VTEP-2 receives the ARP reply of End System B with MAC-A as the destination
MAC address. It now knows about MAC-A-to-IP-1 mapping. It can use the unicast
tunnel to forward the ARP reply back to VTEP-1. In the encapsulated unicast packet,
the source IP address is IP-2 and the destination IP address is IP-1. The ARP reply is
encapsulated in the UDP payload.
Figure 7-12 VXLAN Remote VTEP Discovery and End-Host Address Learning
6. VTEP-1 receives the encapsulated ARP reply from VTEP-2. It de-encapsulates and
forwards the ARP reply to End System A. It also learns the IP address of VTEP-2
from the outer IP address header and inspects the original packet to learn the MAC-
B-to-IP-2 mapping.
7. End System A receives the ARP reply sent from End System B. Subsequent IP pack-
ets between End Systems A and B are unicast forwarded, based on the mapping
information on VTEP-1 and VTEP-2, using the VXLAN tunnel between them.
8. VTEP-1 can optionally perform proxy ARPs for subsequent ARP requests for IP-B to
reduce the flooding over the transport network.
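The flood-and-learn behavior walked through above reduces to two rules: flood to the VNI's multicast group when the destination is unknown, and learn remote MAC-to-VTEP mappings from the outer source IP of decapsulated traffic. The following Python sketch models only those two rules; the class and names are hypothetical, not a VTEP implementation.

# Illustrative sketch of flood-and-learn VTEP behavior: unknown destinations
# are flooded to the VNI's multicast group, and remote MAC-to-VTEP mappings
# are learned from the outer source IP of decapsulated traffic.

class FloodAndLearnVtep:
    def __init__(self, my_ip, vni_to_mcast_group):
        self.my_ip = my_ip
        self.vni_to_mcast_group = vni_to_mcast_group   # e.g. {10: "239.1.1.1"}
        self.mac_to_vtep = {}                          # (vni, mac) -> remote VTEP IP

    def encap_destination(self, vni, dst_mac):
        """Outer destination IP for a frame received from a local host."""
        remote = self.mac_to_vtep.get((vni, dst_mac))
        if remote is None:
            return self.vni_to_mcast_group[vni]        # BUM: flood to the group
        return remote                                  # known host: unicast tunnel

    def on_decap(self, vni, outer_src_ip, inner_src_mac):
        """Data-plane learning when traffic arrives from the overlay."""
        self.mac_to_vtep[(vni, inner_src_mac)] = outer_src_ip

vtep1 = FloodAndLearnVtep("IP-1", {10: "239.1.1.1"})
print(vtep1.encap_destination(10, "MAC-B"))   # 239.1.1.1 (unknown, flood)
vtep1.on_decap(10, "IP-2", "MAC-B")           # learned from the ARP reply
print(vtep1.encap_destination(10, "MAC-B"))   # IP-2 (unicast tunnel)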
The MP-BGP EVPN control plane offers the following main benefits:
■ The EVPN address family carries both Layer 2 and Layer 3 reachability information,
thus providing integrated bridging and routing in VXLAN overlay networks.
■ It provides optimal forwarding for east-west and north-south traffic and supports
workload mobility with the distributed anycast function.
■ It provides VTEP peer discovery and authentication, mitigating the risk of rogue
VTEPs in the VXLAN overlay network.
IP transport devices provide IP routing in the underlay network. By running the MP-BGP
EVPN protocol, they become part of the VXLAN control plane and distribute the
MP-BGP EVPN routes among their MP-BGP EVPN peers. Devices might be MP-iBGP
EVPN peers or route reflectors, or MP External BGP (MP-eBGP) EVPN peers. Their
operating system software needs to support MP-BGP EVPN so that it can understand
the MP-BGP EVPN updates and distribute them to other MP-BGP EVPN peers using
the standards-defined constructs. For data forwarding, IP transport devices perform IP
routing based only on the outer IP address of a VXLAN encapsulated packet. They don’t
need to support the VXLAN data encapsulation and decapsulation functions.
VTEPs running MP-BGP EVPN need to support both the control-plane and data-plane
functions. In the control plane, they initiate MP-BGP EVPN routes to advertise their local
hosts. They receive MP-BGP EVPN updates from their peers and install the EVPN routes
in their forwarding tables. For data forwarding, they encapsulate user traffic in VXLAN
and send it over the IP underlay network. In the reverse direction, they receive VXLAN
encapsulated traffic from other VTEPs, decapsulate it, and forward the traffic with native
Ethernet encapsulation toward the host.
The correct switch platforms need to be selected for the different network roles. For IP
transport devices, the software needs to support the MP-BGP EVPN control plane, but the
hardware doesn’t need to support VXLAN data-plane functions. For VTEP, the switch
needs to support both the control-plane and data-plane functions.
The MP-BGP EVPN control plane provides integrated routing and bridging by distribut-
ing both the Layer 2 and Layer 3 reachability information for end hosts on VXLAN over-
lay networks. Communication between hosts in different subnets requires inter-VXLAN
routing. BGP EVPN enables this communication by distributing Layer 3 reachability
information in either a host IP address route or an IP address prefix. In the data plane,
the VTEP needs to support IP address route lookup and perform VXLAN encapsulation
based on the lookup result. This capability is referred to as the VXLAN routing function.
Like other network routing control protocols, MP-BGP EVPN is designed to distribute
network layer reachability information (NLRI) for the network. A unique feature of
EVPN NLRI is that it includes both the Layer 2 and Layer 3 reachability information for
end hosts that reside in the EVPN VXLAN overlay network. In other words, it advertises
both MAC and IP addresses of EVPN VXLAN end hosts. This capability forms the basis
for VXLAN integrated routing and bridging support.
Traffic between end hosts in the same VNI needs to be bridged in the overlay network,
which means that VTEP devices in a given VNI need to know about other MAC address-
es of end hosts in this VNI. Distribution of MAC addresses through BGP EVPN allows
unknown unicast flooding in the VXLAN to be reduced or eliminated. Layer 3 host IP
addresses are advertised through MP-BGP EVPN so that inter-VXLAN traffic can be
routed to the destination end host through an optimal path. For inter-VXLAN traffic that
needs to be routed to the destination end host, host-based IP routing can provide the
optimal forwarding path to the exact location of the destination host.
Each VTEP performs local learning to obtain MAC and IP
address information from its locally attached hosts and then distributes this information
through the MP-BGP EVPN control plane. Hosts attached to remote VTEPs are learned
remotely through the MP-BGP control plane. This approach reduces network flooding
for end-host learning and provides better control over end-host reachability information
distribution.
A VTEP in MP-BGP EVPN learns the MAC addresses and IP addresses of locally
attached end hosts through local learning. This learning can be local data plane based
using the standard Ethernet and IP learning procedures, such as source MAC address
learning from the incoming Ethernet frames and IP address learning when the hosts send
Gratuitous ARP (GARP) and Reverse ARP (RARP) packets or ARP requests for the gate-
way IP address on the VTEP.
After learning the localhost MAC and IP addresses, a VTEP advertises the host infor-
mation in the MP-BGP EVPN control plane so that this information can be distributed
to other VTEPs. This approach enables EVPN VTEPs to learn the remote end hosts in
the MP-BGP EVPN control plane. The EVPN routes are advertised through the L2VPN
EVPN address family. The BGP L2VPN EVPN routes include the following information:
■ L2 VNI: VNI of the bridge domain to which the end host belongs
MP-BGP EVPN uses the BGP extended community attribute to transmit the exported
route targets in an EVPN route. When an EVPN VTEP receives an EVPN route, it com-
pares the route target attributes in the received route to its locally configured route target
import policy to decide whether to import or ignore the route. This approach uses the
decade-old MP-BGP VPN technology (RFC 4364) and provides scalable multitenancy in
which a node that does not have a VRF locally does not import the corresponding routes.
A route target is an extended-community attribute used to filter the appropriate VPN
routes into the correct VRFs.
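The import decision itself is a simple set comparison, as the following illustrative Python sketch shows; the route-target strings are hypothetical examples, not values from the book.

# Sketch of the route-target import decision described above: a received EVPN
# route is imported only if at least one of its route targets matches the
# locally configured import policy; otherwise the route is ignored.

def should_import(route_targets_on_route, local_import_rts):
    """Both arguments are collections of route-target strings, e.g. '65000:10'."""
    return bool(set(route_targets_on_route) & set(local_import_rts))

# A VTEP with no VRF importing RT 65000:20 simply ignores that route.
print(should_import({"65000:20"}, {"65000:10"}))   # False -> ignore
print(should_import({"65000:10"}, {"65000:10"}))   # True  -> import into the VRF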
When a VTEP switch originates MP-BGP EVPN routes for its locally learned end hosts,
it uses its own VTEP address as the BGP next hop. This BGP next hop must remain
unchanged through the route distribution across the network because the remote VTEP
must learn the originating VTEP address as the next hop for VXLAN encapsulation when
forwarding packets for the overlay network.
Figure 7-13 shows local and remote end-host address learning and distribution in an
MP-iBGP EVPN using route reflectors.
In Figure 7-13, VTEP-1 learns the MAC addresses and IP addresses of locally attached
end hosts through local learning. VTEP-1 then sends a BGP update to the route-reflector
in the transit network, informing about the host IP (H-IP-1) and MAC (H-MAC-1) address
along with the L2-VNI information. The next-hop in the MP-BGP EVPN route update
is set to VTEP-1. When remote VTEP-2 and VTEP-3 receive the route update from the
route reflector, they install the host information in their routing information base (RIB)
and forwarding information base (FIB).
With the MP-BGP EVPN control plane, a VTEP device first needs to establish BGP
neighbor adjacency with other VTEPs or with Internal BGP (iBGP) route reflectors. In
addition to the BGP updates for end-host NLRI, VTEPs exchange the following informa-
tion about themselves through BGP:
■ Layer 3 VNI
■ VTEP address
As soon as a VTEP receives BGP EVPN route updates from a remote VTEP BGP neigh-
bor, it adds the VTEP address from that route advertisement to the VTEP peer list. This
VTEP peer list is then used as an allowed list of valid VTEP peers. VTEPs that are not on
this allowed list are considered invalid or unauthorized sources. VXLAN encapsulated
traffic from these invalid VTEPs will be discarded by other VTEPs.
Along with the VTEP address that promotes VTEP peer learning, BGP EVPN routes
carry VTEP router MAC addresses. Each VTEP has a router MAC address. Once a
VTEP’s router MAC address is distributed via MP-BGP and learned by other VTEPs, the
other VTEPs use it as an attribute of the VTEP peer to encapsulate inter-VXLAN routed
packets to that VTEP peer. The router MAC address is programmed as the inner destina-
tion MAC address for routed VXLAN.
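A minimal Python sketch of this allow-list behavior, with illustrative names only, might look like the following; it is not how any particular switch implements the check.

# Sketch of the VTEP peer allow-list behavior described above: peers are added
# as their EVPN routes are received, their router MACs are recorded, and VXLAN
# traffic sourced from VTEPs never seen in an EVPN update is discarded.

class EvpnVtep:
    def __init__(self):
        self.valid_peers = set()        # VTEP addresses learned from BGP EVPN
        self.peer_router_mac = {}       # VTEP address -> router MAC address

    def on_evpn_route(self, next_hop_vtep, router_mac):
        self.valid_peers.add(next_hop_vtep)
        self.peer_router_mac[next_hop_vtep] = router_mac

    def accept_vxlan(self, outer_src_ip):
        return outer_src_ip in self.valid_peers

vtep = EvpnVtep()
vtep.on_evpn_route("10.0.0.2", "router-mac-2")
print(vtep.accept_vxlan("10.0.0.2"))    # True
print(vtep.accept_vxlan("10.0.0.99"))   # False -> encapsulated traffic discarded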
For additional security, the existing BGP Message Digest 5 (MD5) authentication can be
conveniently applied to the BGP neighbor sessions so that switches can’t become BGP
neighbors to exchange MP-BGP EVPN routes until they successfully authenticate each
other with a preconfigured MD5 Triple Data Encryption Standard (3DES) key.
The same anycast gateway virtual IP address and MAC address are configured on all
VTEPs in the VNI.
In the ARP suppression cache on VTEP-1 (hosts in VLAN 10, VXLAN 5000), IP-1/MAC-1
is a local entry learned on interface E1/1, while IP-2/MAC-2 and IP-3/MAC-3 are remote
entries; when Host 1 sends an ARP request for IP-2, VTEP-1 answers directly with MAC-2
instead of flooding the request.
Because most end hosts send GARP or RARP requests to announce themselves to the
network right after they come online, the local VTEP will immediately have the opportu-
nity to learn their MAC and IP addresses and distribute this information to other VTEPs
through the MP-BGP EVPN control plane. Therefore, most active IP hosts in VXLAN
EVPN should be learned by the VTEPs either through local learning or control-plane-
based remote learning. As a result, ARP suppression reduces the network flooding caused
by host ARP learning behavior.
■ Layer 2 VNI: An EVPN VXLAN tenant can have multiple Layer 2 networks, each
with a corresponding VNI. These Layer 2 networks are bridge domains in the overlay
network. The VNIs associated with them are often referred to as Layer 2 (L2) VNIs.
A VTEP can have all or a subset of the Layer 2 VNIs in a VXLAN EVPN.
■ Layer 3 VNI: Each tenant VRF instance is mapped to a unique Layer 3 VNI in the
network. This mapping needs to be consistent on all the VTEPs in network. All inter-
VXLAN routed traffic is encapsulated with the Layer 3 VNI in the VXLAN header
and provides the VRF context for the receiving VTEP. The receiving VTEP uses this
VNI to determine the VRF context in which the inner IP packet needs to be for-
warded. This VNI also provides the basis for enforcing Layer 3 segmentation in the
data plane.
■ VTEP router MAC address: Each VTEP has a unique system MAC address that
other VTEPs can use for inter-VNI routing. This MAC address is referred to as the
router MAC address. The router MAC address is used as the inner destination MAC
address for the routed VXLAN packet. As shown in Figure 7-16, when a packet is
sent from VNI A to VNI B, the ingress VTEP routes the packet to the Layer 3 VNI.
It rewrites the inner destination MAC address to the egress VTEP’s router MAC
address and encodes the Layer 3 VNI in the VXLAN header. After the egress VTEP
receives the encapsulated VXLAN packet, it first decapsulates the packet by remov-
ing the VXLAN header. Then it looks at the inner packet header. Because the desti-
nation MAC address in the inner packet header is its own MAC address, it performs
a Layer 3 routing lookup. The Layer 3 VNI in the VXLAN header provides the VRF
context in which this routing lookup is performed.
Figure 7-16 Inter-VNI routing over the Layer 3 VNI: the ingress VTEP routes packets from
the source VNI to the Layer 3 VNI and sets the inner destination MAC to the egress VTEP's
router MAC, and the egress VTEP routes them from the Layer 3 VNI to the destination VNI
and VLAN; VTEP-1 through VTEP-4 share the anycast gateway MAC (GW-MAC), but each
has a unique router MAC.
When an EVPN VTEP performs forwarding lookup and VXLAN encapsulation for the
packets it receives from its local end hosts, it uses either a Layer 2 VNI or the Layer 3
VNI in the VXLAN header, depending on whether the packets need to be bridged or
routed. If the destination MAC address in the original packet header does not belong to
the local VTEP, the local VTEP performs a Layer 2 lookup and bridges the packet to the
destination end host located in the same Layer 2 VNI as the source host. The local VTEP
embeds this Layer 2 VNI in the VXLAN header. In this case, both the source and desti-
nation hosts are in the same Layer 2 broadcast domain. If the destination MAC address
belongs to the local VTEP switch (that is, if the local VTEP is the IP gateway for the
source host, and the source and destination hosts are in different IP subnets), the packet
will be routed by the local VTEP. In this case, it performs Layer 3 routing lookup. It then
encapsulates the packets with the Layer 3 VNI in the VXLAN header and rewrites the
inner destination MAC address to the remote VTEP’s router MAC address. Upon receipt
of the encapsulated VXLAN packet, the remote VTEP performs another routing lookup
based on the inner IP header because the inner destination MAC address in the received
packet belongs to the remote VTEP itself.
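The bridge-versus-route decision described in this paragraph can be sketched as follows. The Python below is an illustrative model with hypothetical table structures, not a VTEP implementation.

# Sketch of the lookup decision described above: if the inner destination MAC
# is the VTEP's gateway MAC, the packet is routed and encapsulated with the
# Layer 3 VNI toward the egress VTEP's router MAC; otherwise it is bridged
# with the Layer 2 VNI and the inner MACs are left unchanged.

def forward(frame, gw_mac, l3_vni, mac_table, host_routes):
    """
    frame       : dict with 'dst_mac', 'dst_ip', and 'l2_vni'
    mac_table   : (l2_vni, mac) -> remote VTEP          (bridging lookups)
    host_routes : dst_ip -> (remote VTEP, router MAC)   (routing lookups)
    """
    if frame['dst_mac'] != gw_mac:
        # Same subnet, same L2 VNI: bridge and keep the inner MAC addresses.
        vtep = mac_table[(frame['l2_vni'], frame['dst_mac'])]
        return {'outer_dst': vtep, 'vni': frame['l2_vni'],
                'inner_dst_mac': frame['dst_mac']}
    # Different subnet: route, rewrite the inner destination MAC to the egress
    # VTEP's router MAC, and use the Layer 3 VNI in the VXLAN header.
    vtep, router_mac = host_routes[frame['dst_ip']]
    return {'outer_dst': vtep, 'vni': l3_vni, 'inner_dst_mac': router_mac}

result = forward({'dst_mac': 'GW-MAC', 'dst_ip': 'H-IP-2', 'l2_vni': 5010},
                 gw_mac='GW-MAC', l3_vni=9999,
                 mac_table={}, host_routes={'H-IP-2': ('VTEP-4', 'Router-MAC-4')})
print(result)   # routed: VNI 9999, inner D-MAC Router-MAC-4, outer destination VTEP-4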
The destination VTEP address in the outer IP header of a VXLAN packet identifies the
location of the destination host in the underlay network. VXLAN packets are routed
toward the egress VTEP through the underlay network based on the outer destination
IP address. After the packet arrives at the egress VTEP, the VNI in the VXLAN header
is examined to determine the VLAN in which the packet should be bridged or the ten-
ant VRF instance to which it should be routed. In the latter case, the VXLAN header
is encoded with a Layer 3 VNI. A Layer 3 VNI is associated with a tenant VRF routing
instance, so the egress VTEP can directly map the routed VXLAN packets to the appro-
priate tenant routing instance. This approach makes multitenancy easier to support for
both Layer 2 and Layer 3 segmentation. The following two VXLAN data plane packet
walk examples illustrate the VXLAN bridging and routing concept.
Figure 7-17 VXLAN bridging packet walk: VTEP-1 (IP-1, 165.123.1.1) encapsulates
Host-A's frame with L2 VNID 5010 and an outer destination of IP-4 (VTEP-2, 140.123.1.1);
the underlay routers forward the packet based on the outer IP header, and VTEP-2
decapsulates it and delivers the original frame to Host-B.
In Figure 7-17, Host-A and Host-B in VXLAN segment 5010 communicate with each
other through the VXLAN tunnel between VTEP-1 and VTEP-2. This example assumes
that address learning has been done on both sides and that corresponding MAC-to-VTEP
mappings exist on both VTEPs.
When Host-A sends traffic to Host-B, it forms Ethernet frames with the MAC-B address
of Host-B as the destination MAC address and sends them out to VTEP-1. VTEP-1, with
a mapping of MAC-B to VTEP-2 in its mapping table, performs VXLAN encapsula-
tion on the packets by adding VXLAN, UDP, and outer IP address headers to them. In
the outer IP address header, the source IP address is the IP address of VTEP-1, and the
destination IP address is the IP address of VTEP-2. VTEP-1 then performs an IP address
lookup for the IP address of VTEP-2 to resolve the next hop in the transit network and
subsequently uses the MAC address of the next-hop device to further encapsulate the
packets in an Ethernet frame to send to the next-hop device.
The packets are routed toward VTEP-2 through the transport network based on their
outer IP address header, which has the IP address of VTEP-2 as the destination address.
After VTEP-2 receives the packets, it strips off the outer Ethernet, IP, UDP, and VXLAN
headers and then forwards the packets to Host-B, based on the original destination MAC
address in the Ethernet frame.
In Figure 7-18, Host-A belongs to VNI 5010 and Host-B belongs to VNI 5020, and both
belong to the same tenant with VRF A. Since intra-VRF communication is allowed,
Host-A can communicate with Host-B through the VXLAN tunnel in VRF A with L3
VNID 9999 between VTEP-1 and VTEP-2. Since the communication between Host-A
and Host-B is inter-VLAN/VNI, L3 VNI will be used instead of L2 VNI during VXLAN
encapsulation at the source VTEP. This example assumes that address learning has been
done on both sides using the MP-BGP EVPN control plane and that corresponding
IP-to-VTEP mappings exist on both VTEPs.
When Host-A sends traffic to Host-B, it forms Ethernet frames with an anycast default
gateway MAC GW-MAC address of VTEP-1 as the destination MAC address and sends
them out to VTEP-1. Since the destination MAC is the anycast gateway MAC, VTEP-1
does an L3 lookup and finds a mapping of IP-B to VTEP-2 in its routing table; it then
performs VXLAN encapsulation on the packets by adding VXLAN, UDP, and outer IP
address headers to them. This time, VTEP-1 uses Layer 3 VNI 9999 for the encapsula-
tion since the communication is between hosts in different VLAN/VNI. In the outer IP
address header, the source IP address is the IP address of VTEP-1, and the destination
IP address is the IP address of VTEP-2. VTEP-1 then performs an IP address lookup
for the IP address of VTEP-2 to resolve the next hop in the transit network and subse-
quently uses the MAC address of the next-hop device to further encapsulate the packets
in an Ethernet frame to send to the next-hop device. Since the packet is routed, VTEP-1
also rewrites the inner Ethernet L2 header, setting the source MAC to its own MAC address
and the destination MAC to VTEP-2's MAC address.
[Figure 7-18: VXLAN packet flow from Host-A to Host-B across VNIs using L3 VNI 9999: Host-A sends the frame to the anycast gateway (D-MAC: GW-MAC) on VTEP-1 (the L3 gateway), which routes it, rewrites the inner MAC header to the VTEP MAC addresses, and tunnels it to VTEP-2 across the underlay IP network (Router-1 and Router-2); the outer header carries the IP addresses of VTEP-1 and VTEP-2, while the inner IP addresses (IP-A to IP-B) are preserved]
The packets are routed toward VTEP-2 through the transport network based on their
outer IP address header, which has the IP address of VTEP-2 as the destination address.
After VTEP-2 receives the packets, it strips off the outer Ethernet, IP, UDP, and VXLAN
headers. VTEP-2 finds that the inner packet has a destination IP address of Host-B and
does the routing lookup, rewrites the packet with the source MAC of VTEP-2 and desti-
nation MAC of Host-B, and then forwards the packets to Host-B.
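The following Python sketch models this routed (L3 VNI) flow end to end. It is illustrative only and assumes the host routes were already learned through MP-BGP EVPN; the MAC and IP names mirror Figure 7-18.

```python
# Illustrative model of inter-VNI routing through the VXLAN tunnel (Figure 7-18).
ANYCAST_GW_MAC = "GW-MAC"   # anycast default gateway MAC shared by the VTEPs
L3_VNI = 9999               # L3 VNI of VRF A

# Host-to-VTEP reachability assumed to have been learned via MP-BGP EVPN.
ip_to_vtep = {"IP-B": {"vtep_ip": "IP-VTEP-2", "vtep_mac": "MAC-VTEP-2"}}

def ingress_vtep_route(frame, local_vtep_ip, local_vtep_mac):
    """VTEP-1: L3 lookup, inner MAC rewrite, then VXLAN encapsulation with the L3 VNI."""
    if frame["dst_mac"] != ANYCAST_GW_MAC:
        raise ValueError("frame is not addressed to the gateway; bridge it instead")
    route = ip_to_vtep[frame["dst_ip"]]                    # IP-B resolves to VTEP-2
    inner = dict(frame, src_mac=local_vtep_mac, dst_mac=route["vtep_mac"])
    return {"outer_ip": {"src": local_vtep_ip, "dst": route["vtep_ip"]},
            "vxlan": {"vni": L3_VNI},
            "inner_frame": inner}

def egress_vtep_route(pkt, local_vtep_mac, host_macs):
    """VTEP-2: strip the outer headers, route on the inner IP, rewrite toward Host-B."""
    inner = pkt["inner_frame"]
    return dict(inner, src_mac=local_vtep_mac, dst_mac=host_macs[inner["dst_ip"]])

pkt = ingress_vtep_route({"src_mac": "MAC-A", "dst_mac": ANYCAST_GW_MAC,
                          "src_ip": "IP-A", "dst_ip": "IP-B"},
                         local_vtep_ip="IP-VTEP-1", local_vtep_mac="MAC-VTEP-1")
frame_to_host_b = egress_vtep_route(pkt, local_vtep_mac="MAC-VTEP-2",
                                    host_macs={"IP-B": "MAC-B"})
print(frame_to_host_b)   # S-MAC: VTEP-2, D-MAC: MAC-B, S-IP: IP-A, D-IP: IP-B
```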
The MP-BGP EVPN control plane works transparently with the vPC VTEP. With an
MP-BGP EVPN control plane, vPC VTEPs continue to function as a single logical VTEP
with the anycast VTEP address for VTEP functions, but they operate as two separate
entities from the perspective of MP-BGP. They have different router IDs for BGP, form
BGP neighbor adjacency with the BGP peers separately, and advertise EVPN routes
independently, as illustrated in Figure 7-19. In the EVPN routes, they both use the any-
cast VTEP address as the next hop so that the remote VTEPs can use the learned EVPN
routes and encapsulate packets using the anycast VTEP address as the destination in the
outer IP header of encapsulated packets.
[Figure 7-19: vPC VTEP pair sharing an anycast VTEP address and a virtual port channel while forming separate BGP peerings (BGP Router ID 1 and BGP Router ID 2) over the underlay IP network; Layer 2 and Layer 3 links are shown]
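A small Python sketch of this behavior follows; the router IDs and the anycast VTEP address are hypothetical values used only to show that both vPC members advertise the same next hop.

```python
# Two vPC VTEPs advertise the same EVPN route independently (different BGP
# router IDs) but with the shared anycast VTEP address as the next hop.
ANYCAST_VTEP = "10.1.1.100"   # hypothetical anycast VTEP address of the vPC pair

evpn_advertisements = [
    {"bgp_router_id": "10.0.0.1", "mac": "MAC-X", "next_hop": ANYCAST_VTEP},
    {"bgp_router_id": "10.0.0.2", "mac": "MAC-X", "next_hop": ANYCAST_VTEP},
]

# A remote VTEP therefore resolves either advertisement to the same tunnel
# destination and uses it as the outer destination IP when encapsulating.
tunnel_destination = {adv["mac"]: adv["next_hop"] for adv in evpn_advertisements}
assert tunnel_destination == {"MAC-X": ANYCAST_VTEP}
```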
■ Cisco Adapter FEX: Typical deployments of virtual machines have an extra layer of
switching in the hypervisor. The software switches in the hypervisor emulate hard-
ware at the expense of application performance. Cisco virtual interface cards (VICs)
solve this problem by acting as adapter fabric extenders and bringing the network to
virtual machines (VMs) using Cisco Adapter FEX technology. Cisco Adapter FEX
enables the server adapter to be logically partitioned into multiple virtual network
interface cards (vNICs). Each vNIC behaves like a physical NIC port and meets the
network connectivity needs for each application so that security and quality of ser-
vice (QoS) policies can be applied for each vNIC and application.
■ Cisco Data Center VM-FEX: Network administrators have no control over the software switches in the hypervisor, which makes monitoring network traffic to individual VMs very cumbersome. Cisco VICs use the VN-Tag technology standardized as IEEE 802.1BR to manage each link from the VM as if it were a physical link. The VICs
can provide dynamic interfaces to virtual machines, allowing the network to bypass
the hypervisor and directly connect to the VM using VM-FEX technology. Cisco
Data Center VM-FEX partitions the server adapter into multiple vNICs, and each
vNIC is assigned to individual virtual machines, allowing network administrators to
monitor the ports easily. Additionally, the VMs can move from one server to another
with the same security policies and no compromises on the overall network security
to allow the move. Switching of VM traffic happens in hardware switches instead of
using a software switch within the hypervisor.
Figure 7-20 illustrates the difference in the implementation of Adapter FEX and VM-FEX technology using the Cisco VIC card.
[Figure 7-20: Adapter FEX vs. VM-FEX implementation using the Cisco Virtual Interface Card (VIC), which is partitioned into vNICs presented to the hypervisor and to the VMs]
Physical NICs of the host are connected to the uplink ports on the standard switch.
Uplink ports connect the virtual switch to the physical world. A virtual switch can have
one or more uplinks. Virtual machines’ network adapters (vNICs) are connected to the
port groups on the standard switch. Port groups are groups of virtual ports with similar
configurations. Each logical port on the standard switch is a member of a single port
group. Every port group can use one or more physical NICs to handle its network traf-
fic. If a port group does not have a physical NIC connected to it, virtual machines on
the same port group can only communicate with each other but not with the external
network.
The standard switch also handles VMkernel traffic. A VMkernel port (or the VMkernel
adapter or interface) is used by the hypervisor for VMkernel services when we need to
connect to the physical network. Every VMkernel adapter has an IP address by which this
service is accessible. VMkernel NICs support services such as management traffic, vMo-
tion traffic, IP storage traffic and discovery, fault tolerance traffic, vSphere replication
traffic, vSAN traffic, and more. Note that a port group can either be used for VMs or
VMkernel ports, not both simultaneously. You can create two port groups with the same
VLAN ID: one for VMs and one for VMkernel ports. A VLAN ID, which restricts port
group traffic to a logical Ethernet segment within the physical network, is optional.
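The following plain-Python sketch (not the vSphere API) models the standard-switch rules described above: a port group carries either VM or VMkernel traffic, network labels are unique per host, and a port group without an uplink can reach only other ports on the same switch.

```python
class PortGroup:
    """A group of virtual ports with the same configuration (label, VLAN, type)."""
    def __init__(self, label, vlan_id=None, kind="vm"):    # kind: "vm" or "vmkernel"
        self.label, self.vlan_id, self.kind = label, vlan_id, kind
        self.uplinks = []                                   # physical NICs (e.g., vmnic0)

    def can_reach_external_network(self):
        return bool(self.uplinks)        # no uplink: VMs talk only to each other

class StandardSwitch:
    def __init__(self):
        self.port_groups = {}

    def add_port_group(self, pg):
        assert pg.label not in self.port_groups, "network labels must be unique on the host"
        self.port_groups[pg.label] = pg

vswitch = StandardSwitch()
vswitch.add_port_group(PortGroup("Production", vlan_id=10, kind="vm"))
vswitch.add_port_group(PortGroup("Management", vlan_id=20, kind="vmkernel"))
vswitch.port_groups["Production"].uplinks.append("vmnic0")

print(vswitch.port_groups["Production"].can_reach_external_network())   # True
print(vswitch.port_groups["Management"].can_reach_external_network())   # False (internal only)
```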
[Figure 7-21: Port groups (Management, vMotion, Production Environment, and Dev Environment) created on the hosts' standard switches, with the physical servers connected to the same physical switch]
Each port group on a standard switch is identified by a network label, which must be
unique to the current host. You can use network labels to make the networking configu-
ration of virtual machines portable across hosts. You should give the same label to the
port groups in a data center that use physical NICs connected to one broadcast domain
on the physical network. Likewise, if two port groups are connected to physical NICs
on different broadcast domains, the port groups should have distinct labels. For example,
you can create Production and Dev environment port groups as virtual machine networks
on the hosts that share the same broadcast domain on the physical network, as shown in
Figure 7-21.
■ Forwarding of L2 frames
■ VLAN segmentation
A network switch in vSphere consists of two logical sections: the data plane and the man-
agement plane. The data plane implements the packet switching, filtering, tagging, and so
on. The management plane is the control structure you use to configure the data plane
functionality. A vSphere standard switch contains both data and management planes,
and the standard switch is configured and maintained individually. A vSphere distributed
switch separates the data plane and the management plane, as shown in Figure 7-22. The
management functionality of the distributed switch resides on the vCenter Server system,
which lets you administer the networking configuration of your environment on a data
center level. The data plane remains locally on every host associated with the distributed
switch. The data plane section of the distributed switch is called a host proxy switch. The
networking configuration you create on vCenter Server (the management plane) is auto-
matically pushed down to all host proxy switches (the data plane).
The vSphere distributed switch introduces two abstractions, the uplink port group and
the distributed port group, that create a consistent networking configuration for physical
NICs, virtual machines, and VMkernel services.
An uplink port group or dvuplink port group is defined during the creation of the dis-
tributed switch and can have one or more uplinks. An uplink is a template you use to
configure physical connections of hosts as well as failover and load-balancing policies.
Physical NICs of hosts are mapped to uplinks on the distributed switch. At the host level,
each physical NIC is connected to an uplink port with a particular ID. Once the policies
such as failover and load balancing are configured over uplinks, the policies are automati-
cally propagated to the host proxy switches, or the data plane. The automatic propaga-
tion of policies ensures consistent failover and load-balancing configuration for the physi-
cal NICs of all hosts associated with the distributed switch.
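The plane separation can be pictured with a short plain-Python sketch (again, not the vSphere API): a policy configured once on the distributed switch's management plane is pushed to every host proxy switch.

```python
class HostProxySwitch:
    """Data plane: the per-host rendering of the distributed switch."""
    def __init__(self, host):
        self.host, self.policies = host, {}

class DistributedSwitch:
    """Management plane: lives on vCenter Server and spans all associated hosts."""
    def __init__(self, hosts):
        self.proxies = [HostProxySwitch(h) for h in hosts]
        self.uplink_policies = {}

    def set_uplink_policy(self, name, value):
        self.uplink_policies[name] = value
        for proxy in self.proxies:        # automatic propagation to every host proxy switch
            proxy.policies[name] = value

dvs = DistributedSwitch(hosts=["esxi-host-1", "esxi-host-2"])
dvs.set_uplink_policy("teaming", "route-based-on-originating-port")
print({p.host: p.policies for p in dvs.proxies})   # identical policy on every host
```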
[Figure 7-22: vSphere distributed switch architecture: the management plane on vCenter Server holds the distributed port groups (Production network, VMkernel network) and the uplink port group (Uplink1, Uplink2, Uplink3), while the data plane runs on Host 1 and Host 2, whose physical NICs connect the virtual network to the physical switch]
Distributed port groups provide network connectivity to virtual machines and accommo-
date VMkernel traffic. Each distributed port group is identified by using a network label,
which must be unique to the current data center. Policies such as NIC teaming, failover,
load balancing, VLAN, security, traffic shaping, and other policies are configured on the
distributed port groups. The virtual ports connected to a distributed port group share the
same properties configured for the distributed port group. As with uplink port groups,
the configuration you set on distributed port groups on vCenter Server (the management
plane) is automatically propagated to all hosts on the distributed switch through the host
proxy switches (the data plane). A group of virtual machines associated to the same dis-
tributed port group share the same networking configuration.
A vSphere distributed switch supports all the features of a standard switch. In addition,
the vSphere distributed switch supports the following features:
■ Data-center-level management
■ Private VLANs
■ Port mirroring
■ NetFlow
Summary
This chapter discussed overlay network protocols such as NVGRE, Cisco OTV, and VXLAN, along with network interface virtualization using FEX and VMware vSphere virtual switches, including the following points:
■ Network overlays are virtual networks of interconnected nodes that share an under-
lying physical network, allowing deployment of applications that require specific
network topologies without the need to modify the underlying network.
■ Virtual Extensible LAN (VXLAN) is a Layer 2 overlay scheme over a Layer 3 net-
work that provides a means to extend Layer 2 segments across a Layer 3 infrastruc-
ture using MAC-in-UDP encapsulation and tunneling.
■ The VXLAN control plane can use one of two methods: flood-and-learn or MP-BGP
EVPN.
■ The VXLAN data plane uses stateless tunnels between VTEPs to transmit traffic of
the overlay Layer 2 network through the Layer 3 transport network.
■ Cisco Adapter FEX enables the server adapter to be logically partitioned into
multiple virtual network interface cards (vNICs). Cisco Data Center VM-FEX pro-
vides dynamic vNIC interfaces to virtual machines, bypassing the hypervisor and
directly connecting to the VM.
■ VMware vSphere virtual switches provide network connectivity to hosts and virtual
machines and support VLANs that are compatible with standard VLAN implementa-
tions.
Chapter 8
Organizations today are challenged to keep pace with ever-changing customer demands
and ever-present competitive threats. To succeed, organizations need to be much more
agile and focused on delivering a superior customer experience. This requires fine-tuning
and in many cases upgrading processes, culture, and technology; this is commonly
referred to as a digital transformation. Applications are at the center of the drive to
modernize traditional businesses and the foundation of their digital transformation. It
is critical to ensure that all applications are deployed rapidly, perform optimally, remain
highly available, and are secure. Technologies using software-defined capabilities are key
enablers of digital transformation. Cisco Application-Centric Infrastructure (ACI) is a
software-defined networking (SDN) solution designed for data centers. Cisco ACI allows
application requirements to define the network. The Cisco ACI architecture simplifies,
optimizes, and accelerates the entire application deployment lifecycle.
In this chapter, we will provide an overview of Cisco ACI and discuss its building blocks, deployment models, hardware components, and fabric startup discovery, along with the Cisco ACI policy model, including logical constructs, fabric policies, and access policies. We will also briefly discuss packet forwarding within the ACI fabric.
■ Cisco Nexus 9000 Series leaf switches: Represent connection points for end devices,
including APIC, and are connected to spine switches.
■ Cisco Nexus 9000 Series spine switches: Represent the backbone of the ACI fabric
and are connected to leaf switches.
Figure 8-1 shows a traditional network versus a Cisco ACI spine/leaf architecture.
[Figure 8-1: Traditional network with a complicated core/distribution/access topology vs. the simple Cisco ACI spine/leaf topology]
■ Layer 2 loops: Traditional networks rely on the Spanning Tree Protocol (STP) for a loop-free topology. Cisco ACI uses equal-cost multipath (ECMP), and since there is IP reachability between leaf and spine switches, there is no need for STP, and you do not have to block any ports to avoid Layer 2 loops.
Figure 8-2 shows a traditional network versus Cisco ACI loop avoidance.
[Figure 8-2: Traditional network relying on STP vs. the Cisco ACI fabric (infra VRF) with no STP]
■ Security: From the security perspective, in a traditional network device, all the traf-
fic is allowed by default, and you need to explicitly configure the device to block the
traffic. However, in Cisco ACI, an allow-list model is used. By default, everything is
blocked, unless you explicitly allow the traffic.
Figure 8-3 shows a traditional network versus the Cisco ACI security model.
[Figure 8-3: Traditional network security model vs. the Cisco ACI allow-list model, in which EPGs communicate only through contracts]
Figure 8-4 shows a traditional network versus Cisco ACI device management.
[Figure 8-4: Traditional per-device management vs. centralized management of the simple Cisco ACI spine/leaf topology]
■ Coordination between the network and server team: Typically, the network and
server teams are two different teams. They need to cooperate to make sure that, for
example, the new service has correct security rules, the correct VLAN, and that the
correct VLAN is deployed in the correct place. Sometimes, that communication is
not an easy task. By using dynamic integration (for example, VMware integration), you can dynamically push the configuration to vCenter Server. Then you can verify that the network (ACI) side has the configuration deployed and also that the server side has the mapped configuration.
Figure 8-6 shows a traditional network versus Cisco ACI coordination between
network and server teams.
Figure 8-6 Traditional Network vs. Cisco ACI Coordination Between the Network and
Server Teams
■ Centralized network management and visibility with full automation and real-time
network health monitoring.
■ Inherent security with a zero-trust allow list model and innovative features in policy
enforcement, microsegmentation, and analytics.
■ Open APIs and a programmable SDN fabric, with 65+ ecosystem partners.
use of two tiers of leaf switches, which provides the capability for vertical expansion of
the Cisco ACI fabric. This is useful for migrating from a traditional three-tier core/aggregation/access architecture that has been a common design model for many enterprise networks
and is still required today. The primary reason for this is cable reach, where many hosts
are located across floors or across buildings; however, due to the high pricing of fiber
cables and the limitations of cable distances, it is not ideal in some situations to build
a full-mesh two-tier fabric. In those cases, it is more efficient to build a spine-leaf-leaf
topology and continue to benefit from the automation and visibility of Cisco ACI.
[Figure: Cisco ACI multitier topology with spine switches, tier-1 leaf switches, and tier-2 leaf switches]
Leaf Switches
Leaf switches are the switches to which all endpoints (servers, storage, service nodes,
and so on) connect. Leaf switches are available with various port speeds, ranging from
100Mbps to 400Gbps. Leaf switches are at the edge of the fabric and provide the
VXLAN tunnel endpoint (VTEP) function. In Cisco ACI terminology, the IP address that
represents the leaf VTEP is called the physical tunnel endpoint (PTEP). The leaf switches
are responsible for routing or bridging tenant packets and for applying network policies.
In large-scale deployments, leaf switches are often dedicated and categorized by func-
tions:
■ Border leaf: Leaf switches that provide Layer 2 and Layer 3 connections to outside
networks.
■ Services leaf: Leaf switches that connect to Layer 4–7 service appliances such as
load balancers and firewalls.
■ Compute leaf: Leaf switches that connect to compute resources such as physical and
virtualized servers.
■ Storage leaf: Leaf switches that connect to storage devices for compute resources.
This can include iSCSI, NFS, and other Ethernet medium storage devices.
Leaf switches do not need to be dedicated to only one category. Depending on the
design, the categories can overlap. For example, a leaf switch serving as a border leaf
switch can also connect to compute resources.
Spine Switches
Spine switches interconnect leaf switches and provide the backbone of the ACI fabric.
Spine switches are available with various port speeds, ranging from 40Gbps to 400Gbps.
Within a pod, all tier-1 leaf switches connect to all spine switches, and all spine switches
connect to all tier-1 leaf switches, but no direct connectivity is allowed between spine
switches, between tier-1 leaf switches, or between tier-2 leaf switches. If you incorrectly
cable spine switches to each other or leaf switches in the same tier to each other, the
interfaces will be disabled. You may have topologies in which certain leaf switches are
not connected to all spine switches, but traffic forwarding may be suboptimal in this sce-
nario. Spine switches can also be used to build a Cisco ACI MultiPod fabric by connect-
ing a Cisco ACI pod to an IP network, or they can connect to a supported WAN device
for external Layer 3 connectivity. Spine switches also store all the endpoint-to-VTEP
mapping entries (spine switch proxies). Nexus 9000 Series switches used in the ACI fab-
ric run the ACI operating system instead of NX-OS.
Cisco APIC
The Cisco Application Policy Infrastructure Controller (APIC) is the central point of
management for the ACI fabric. It is a clustered network control and policy system that
provides image management, bootstrapping, and policy configuration for the Cisco ACI
fabric. APIC translates a policy created on it into a configuration and pushes it to the
right switches. The APIC appliance is deployed as a cluster. A minimum of three infra-
structure controllers are configured in a cluster to provide control of the scale-out Cisco
ACI fabric. The ultimate size of the controller cluster is directly proportional to the size of the Cisco ACI deployment and is based on the transaction-rate requirements. Any
controller in the cluster can service any user for any operation, and a controller can be
transparently added to or removed from the cluster. If you lose one of the controllers,
you can still change and add new configurations through the remaining controllers. Since
the APIC is not involved in data plane forwarding, even if all the controllers in the fabric
go down, the traffic flow is not impacted, and forwarding continues through the leaf
and spine switches. If configuration changes need to be made, you must bring the Cisco
APIC back up.
Cisco APICs are equipped with two network interface cards (NICs) for fabric connectiv-
ity. These NICs should be connected to different leaf switches for redundancy. Cisco
APIC connectivity is automatically configured for active-backup teaming, which means
that only one interface is active at any given time.
[Figure: APIC cluster connectivity: each APIC in the cluster connects to the leaf switches with an active link and a backup link]
■ Policy Manager: Manages the distributed policy repository responsible for the
definition and deployment of the policy-based configuration of Cisco ACI.
■ Topology Manager: Maintains up-to-date Cisco ACI topology and inventory infor-
mation.
■ Observer: The monitoring subsystem of the Cisco APIC; serves as a data repository
for Cisco ACI operational state, health, and performance information.
■ Boot Director: Controls the booting and firmware updates of the spine and leaf
switches as well as the Cisco APIC elements.
■ Appliance Director: Manages the formation and control of the Cisco APIC
appliance cluster.
■ Virtual Machine Manager (or VMM): Acts as an agent between the policy reposi-
tory and a hypervisor and is responsible for interacting with hypervisor management
systems such as VMware vCenter.
■ Event Manager: Manages the repository for all the events and faults initiated from
the Cisco APIC and the fabric switches.
■ Appliance element: Manages the inventory and state of the local Cisco APIC
appliance.
[Figure: Two Cisco ACI sites (Site A and Site B) interconnected through an interpod IP network]
provisioning for multiple Cisco ACI fabrics operating in a coordinated way. When this
solution is combined with the latest networking enhancements of Cisco ACI, organiza-
tions can manage extension network elements such as virtual routing and forwarding
(VRF) instances, bridge domains, and subnets across multiple fabrics. (More on VRF,
bridge domains, and subnets can be found later in this chapter.) It enables centralized
policy and security controls across geographically distributed fabrics and very large
scaled-out fabrics with automation and operations from a common point, allowing for a
global cloud-scale infrastructure.
The main features of Cisco Nexus Dashboard Orchestrator include the following:
■ Change control across multiple fabrics, allowing staging, testing, and, if required,
clean backout of any policy changes
[Figure: Cisco Nexus Dashboard Orchestrator managing multiple fabrics, including on-premises APIC- and DCNM-managed sites and cloud sites on AWS, Microsoft Azure, and Google Cloud]
them into native policy constructs for applications deployed across various cloud envi-
ronments. It uses a holistic approach to enable application availability and segmentation
for bare-metal, virtualized, containerized, or microservices-based applications deployed
across multiple cloud domains. The common policy and operating model drastically
reduce the cost and complexity of managing multicloud deployments. Cisco Cloud ACI
provides a single management console to configure, monitor, and operate multiple dis-
joint environments spread across multiple clouds. It allows you to securely connect and
segment workloads, not only in the public cloud, but also across public clouds. Cisco
Cloud ACI is available on AWS and Microsoft Azure; future availability for Google Cloud
has been announced at the time of this writing.
■ Cisco Cloud APIC: Manage multiple cloud regions and the Cisco Cloud Services
Routers (CSR) 1000v Series from a single instance of Cisco Cloud APIC and enable
consistent policy, security, and operations through secure interconnect for a multi-
cloud environment.
■ Cisco Cloud Services Router 1000v Series: Cloud ACI uses the Cisco Cloud
Services Router (CSR) 1000v Series as the cloud router for connectivity between
on-premises and cloud environments.
[Figure: Cisco Cloud ACI topology: the on-premises ACI fabric connects to cloud-hosted workloads (VMs) across any routed IP network]
[Figure: Cisco ACI Virtual Pod at a remote location: vSpines running on hypervisors establish a logical BGP EVPN connection to the on-premises spine switches across the IP network (IPN)]
Figure 8-13 shows an example of a mini Cisco ACI fabric with a physical APIC and two
virtual APICs (vAPICs).
[Figure 8-13: Mini Cisco ACI fabric: spine and leaf switches with one physical APIC and two virtual APICs running as virtual machines (VM 1 and VM 2)]
Virtual APIC is deployed using two hard disks. Table 8-2 shows the Cisco virtual APIC
requirements.
The Cisco Nexus 9000 Series switches operate in one of two modes: Cisco ACI or
Cisco NX-OS. The Cisco Nexus 9000 Series switches in ACI mode are the spine and leaf
switches that build the fabric. Spine switches are available in both modular and fixed
variants. Cisco Nexus 9500 Series modular chassis with 4, 8, and 16 slots are used as
modular spine switches. The Cisco Nexus 9500 Series modular switches support a com-
prehensive selection of line cards and fabric modules that provide 1-, 10-, 25-, 40-, 50-,
100-, 200-, and 400-Gigabit Ethernet interfaces. The supervisor, system controller, power supplies, and line cards are common across all three modular chassis. Each chassis, however, has unique fabric modules and fan trays that plug in vertically in the rear of the chassis. Cisco
Nexus 9300 Series switches are available in both spine and leaf variants.
Table 8-3 shows the various 9500 modular spine switch chassis available at the time of
this writing.
You can easily determine the capabilities of spine and leaf switches from the product ID of each switch. Try decoding the capabilities of the switches listed in Tables 8-4 through 8-7 from their product IDs using the taxonomy for N9K part numbers shown in Figure 8-15.
Table 8-4 shows the modular spine switch line cards available at the time of writing.
Note Cloud Scale in Table 8-4 refers to line cards using Cisco Cloud Scale ASICs.
Table 8-5 shows the fixed spine switches available at the time of writing.
Table 8-6 shows the fixed leaf switches available at the time of writing.
Table 8-6 Fixed Leaf Switches (Product ID and Description)
N9K-C93180YC-FX: Cisco Nexus 9300 platform switch with 48 1/10/25-Gigabit Ethernet SFP28 front-panel ports and six fixed 40/100-Gigabit Ethernet QSFP28 spine-facing ports.
N9K-C93216TC-FX2: Cisco Nexus 9300 platform switch with 96 1/10GBASE-T (copper) front-panel ports and 12 40/100-Gigabit Ethernet QSFP28 spine-facing ports.
N9K-C93180YC-FX-24: Cisco Nexus 9300 platform switch with 24 1/10/25-Gigabit Ethernet SFP28 front-panel ports and six fixed 40/100-Gigabit Ethernet QSFP28 spine-facing ports.
N9K-C9348GC-FXP: Cisco Nexus 9348GC-FXP switch with 48 100/1000-Megabit 1GBASE-T downlink ports, four 10/25-Gigabit SFP28 downlink ports, and two 40/100-Gigabit QSFP28 uplink ports.
N9K-C93108TC-EX: Cisco Nexus 9300 platform switch with 48 1/10GBASE-T (copper) front-panel ports and six 40/100-Gigabit QSFP28 spine-facing ports.
N9K-C93108TC-EX-24: Cisco Nexus 9300 platform switch with 24 1/10GBASE-T (copper) front-panel ports and six 40/100-Gigabit QSFP28 spine-facing ports.
N9K-C93180LC-EX: Cisco Nexus 9300 platform switch with 24 40-Gigabit front-panel ports and six 40/100-Gigabit QSFP28 spine-facing ports.
N9K-C93108TC-FX-24: Cisco Nexus 9300 platform switch with 24 1/10GBASE-T (copper) front-panel ports and six fixed 40/100-Gigabit Ethernet QSFP28 spine-facing ports.
N9K-C93180YC-EX: Cisco Nexus 9300 platform switch with 48 1/10/25-Gigabit front-panel ports and six 40/100-Gigabit QSFP28 spine-facing ports.
N9K-C93180YC-EX-24: Cisco Nexus 9300 platform switch with 24 1/10/25-Gigabit front-panel ports and six 40/100-Gigabit QSFP28 spine-facing ports.
N9K-C93120TX: Cisco Nexus 9300 platform switch with 96 1/10GBASE-T (copper) front-panel ports and six 40-Gigabit Ethernet QSFP spine-facing ports.
Table 8-7 lists the fixed Nexus 9300 series switches at the time of writing. These switches
can be used as either spine or leaf switches in an ACI fabric, as per the requirement.
Note Cisco often launches new leaf/spine switch models with added capabilities. For the
most up-to-date information, refer to https://www.cisco.com/c/en/us/products/switches/
nexus-9000-series-switches/index.html.
The Cisco Nexus ACI fabric software is bundled as an ISO image, which can be
installed on the Cisco APIC server through the KVM interface on the Cisco Integrated
Management Controller (CIMC). The Cisco Nexus ACI Software ISO contains the Cisco
APIC image, the firmware image for the leaf node, the firmware image for the spine node,
default fabric infrastructure policies, and the protocols required for operation.
The ACI fabric bootstrap sequence begins when the fabric is booted with factory-
installed images on all the switches. The Cisco Nexus 9000 Series switches that run the
ACI firmware and APICs use a reserved overlay for the boot process. This infrastructure
space is hard-coded on the switches. The APIC can connect to a leaf through the default
overlay, or it can use a locally significant identifier.
The ACI fabric uses an infrastructure space, which is securely isolated in the fabric and
is where all the topology discovery, fabric management, and infrastructure addressing is
performed. ACI fabric management communication within the fabric takes place in the
infrastructure space through internal private IP addresses. This addressing scheme allows
the APIC to communicate with fabric nodes and other Cisco APICs in the cluster. The
APIC discovers the IP address and node information of other Cisco APICs in the cluster
using a Link Layer Discovery Protocol–based discovery process.
■ Each APIC in the Cisco ACI uses an internal private IP address to communicate with
the ACI nodes and other APICs in the cluster. The APIC discovers the IP address of
other APICs in the cluster through an LLDP-based discovery process.
■ APICs maintain an appliance vector (AV), which provides a mapping from an APIC
ID to an APIC IP address and a universally unique identifier (UUID) of the APIC.
Initially, each APIC starts with an AV filled with its local IP address, and all other
APIC slots are marked as unknown.
■ When a switch reboots, the policy element (PE) on the leaf gets its AV from the
APIC. The switch then advertises this AV to all its neighbors and reports any discrep-
ancies between its local AV and the neighbors’ AVs to all the APICs in its local AV.
Using this process, the APIC learns about the other APICs in the ACI through switches.
After these newly discovered APICs in the cluster have been validated, they update their
local AV and program the switches with the new AV. Switches then start advertising this
new AV. This process continues until all the switches have the identical AV and all APICs
know the IP address of all the other APICs.
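The appliance vector exchange can be sketched in a few lines of Python; the cluster size, slot IDs, and IP addresses below are hypothetical, and the merge loop stands in for the switch-relayed advertisements.

```python
UNKNOWN = None

def new_av(cluster_size, my_id, my_ip):
    """Each APIC starts with only its own slot filled in its appliance vector."""
    av = {slot: UNKNOWN for slot in range(1, cluster_size + 1)}
    av[my_id] = my_ip
    return av

def merge(av_a, av_b):
    """Combine two appliance vectors, keeping every slot that either one knows."""
    return {slot: av_a[slot] or av_b[slot] for slot in av_a}

apics = {1: new_av(3, 1, "10.0.0.1"),
         2: new_av(3, 2, "10.0.0.2"),
         3: new_av(3, 3, "10.0.0.3")}

# Leaf switches relay AVs between neighbors; iterate until every vector converges.
converged = False
while not converged:
    combined = apics[1]
    for av in apics.values():
        combined = merge(combined, av)
    converged = all(av == combined for av in apics.values())
    apics = {apic_id: dict(combined) for apic_id in apics}

print(apics[1])   # every APIC now maps slots 1-3 to the discovered IP addresses
```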
The ACI fabric is brought up in a cascading manner, starting with the leaf nodes directly
attached to the APIC. LLDP and control-plane IS-IS convergence occurs in parallel to this
boot process. The ACI fabric uses LLDP- and DHCP-based fabric discovery to automati-
cally discover the fabric switch nodes, assign the infrastructure VXLAN tunnel endpoint
(VTEP) addresses, and install the firmware on the switches. The fabric uses an IS-IS
(Intermediate System to Intermediate System) environment utilizing Level 1 connections within the topology for advertising the loopback addresses used as the Virtual Extensible LAN tunnel endpoints (VTEPs), which are used in the integrated overlay and advertised to all
other nodes in the fabric for overlay tunnel use. Prior to this automated process, a mini-
mal bootstrap configuration must be performed on the Cisco APIC. After the APICs are
connected and their IP addresses assigned, the APIC GUI can be accessed by entering
the address of any APIC into a web browser. The APIC GUI runs HTML5 and eliminates
the need for Java to be installed locally.
change to the policy model. This policy model change then triggers a change to the actu-
al managed endpoint. This approach is called a model-driven framework.
■ The logical domain (ACI objects configured by a user in APIC) and concrete domain
(ACI objects upon which the switch’s operating system acts) are separated; the logi-
cal configurations are rendered into concrete configurations by applying the policies
in relation to the available physical resources. No configuration is carried out against
concrete entities. Concrete entities are configured implicitly as a side effect of the
changes to the APIC policy model. Concrete entities can be, but do not have to be,
physical (such as a virtual machine or a VLAN).
■ The system prohibits communications with newly connected devices until the policy
model is updated to include the new device.
The Cisco ACI fabric is composed of the physical and logical components recorded in
the Management Information Model (MIM), which can be represented in a hierarchi-
cal management information tree (MIT). The information model is stored and managed
by processes that run on the APIC. The APIC enables the control of managed resources
by presenting their manageable characteristics as object properties that can be inherited
according to the location of the object within the hierarchical structure of the MIT.
Each node in the tree represents a managed object (MO) or group of objects. MOs are
abstractions of fabric resources. An MO can represent a concrete object (such as a switch
or adapter) or a logical object (such as an application profile, endpoint group, or fault).
Figure 8-16 provides an overview of the MIT.
[Figure 8-16: Overview of the management information tree (MIT), with the policy universe at the root]
The hierarchical structure starts with the policy universe at the top (root) and contains
parent and child nodes. Each node in the tree is an MO, and each object in the fabric has
a unique distinguished name (DN) that describes the object and locates its place in the
tree. The following managed objects contain the policies that govern the operation of the
system:
■ User tenants are defined by the administrator according to the needs of users.
They contain policies that govern the operation of resources such as applications,
databases, web servers, network-attached storage, virtual machines, and so on.
■ The common tenant is provided by the system but can be configured by the fab-
ric administrator. It contains policies that govern the operation of resources acces-
sible to all tenants, such as firewalls, load balancers, Layer 4 to Layer 7 services,
intrusion detection appliances, and so on.
■ The infrastructure tenant is provided by the system but can be configured by the
fabric administrator. It contains policies that govern the operation of infrastruc-
ture resources such as the fabric VXLAN overlay. It also enables a fabric provider
to selectively deploy resources to one or more user tenants.
■ Access policies govern the operation of switch access ports that provide connectiv-
ity to resources such as storage, compute, Layer 2 and Layer 3 (bridged and routed)
connectivity, virtual machine hypervisors, Layer 4 to Layer 7 devices, and so on. If
a tenant requires interface configurations other than those provided in the default
link, Cisco Discovery Protocol (CDP), Link Layer Discovery Protocol (LLDP), Link
Aggregation Control Protocol (LACP), or Spanning Tree, an administrator must con-
figure access policies to enable such configurations on the access ports of the leaf
switches.
■ Fabric policies govern the operation of the switch fabric ports, including functions
such as Network Time Protocol (NTP) server synchronization, Intermediate System
to Intermediate System Protocol (IS-IS), Border Gateway Protocol (BGP) route reflec-
tors, Domain Name System (DNS), and so on. The fabric MO contains objects such
as power supplies, fans, chassis, and so on.
||||||||||||||||||||
■ Virtual machine (VM) domains group VM controllers with similar networking policy
requirements. VM controllers can share VLAN or Virtual Extensible Local Area
Network (VXLAN) space and application endpoint groups (EPGs). The APIC com-
municates with the VM controller to publish network configurations such as port
groups that are then applied to the virtual workloads.
■ Access, authentication, and accounting (AAA) policies govern the user privileges,
roles, and security domains of the Cisco ACI fabric.
The hierarchical policy model fits well with the REST API interface. When invoked, the
API reads from or writes to objects in the MIT. Any data in the MIT can be described as
a self-contained, structured tree text document encoded in XML or JSON.
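As a hedged illustration, the following Python snippet reads parts of the MIT through the APIC REST API with the requests library. The APIC address and credentials are placeholders; the /api/aaaLogin.json, /api/class/<class>.json, and /api/mo/<dn>.json paths follow the documented conventions, and responses come back as JSON subtrees of the MIT.

```python
import requests

APIC = "https://apic.example.com"   # placeholder APIC address
session = requests.Session()
session.verify = False              # lab convenience only; use valid certificates in production

# Authenticate; the APIC returns a session token that the Session keeps as a cookie.
login = {"aaaUser": {"attributes": {"name": "admin", "pwd": "password"}}}
session.post(f"{APIC}/api/aaaLogin.json", json=login)

# Query by class: every fvTenant managed object in the MIT.
tenants = session.get(f"{APIC}/api/class/fvTenant.json").json()

# Query by distinguished name (DN): one managed object and its place in the tree,
# for example the common tenant at uni/tn-common.
common = session.get(f"{APIC}/api/mo/uni/tn-common.json").json()

for obj in tenants.get("imdata", []):
    print(obj["fvTenant"]["attributes"]["dn"])   # e.g., uni/tn-common, uni/tn-infra
```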
Figure 8-17 provides an overview of the ACI policy model logical constructs.
[Figure 8-17: ACI policy model logical constructs: the APIC policy contains tenants; a tenant contains outside networks, VRFs, bridge domains, application profiles, contracts, and filters; bridge domains contain subnets, application profiles contain endpoint groups, and contracts contain subjects, with one-to-many (1:n) and many-to-many (n:n) relationships among the objects]
Tenant
A tenant is a logical container for application policies that enable an administrator to
exercise domain-based access control. A tenant represents a unit of isolation from a
policy perspective, but it does not represent a private network. Tenants can represent a
customer in a service provider setting, an organization or domain in an enterprise setting,
or just a convenient grouping of policies. Tenants can be isolated from one another or
can share resources. The primary elements that the tenant contains are filters, contracts,
outside networks, bridge domains, virtual routing and forwarding (VRF) instances, and
application profiles that contain endpoint groups (EPGs). Entities in the tenant inherit
||||||||||||||||||||
its policies. VRFs are also known as contexts; each VRF can be associated with multiple
bridge domains. Tenants are logical containers for application policies. The fabric can
contain multiple tenants. A tenant must be configured before you can deploy any Layer 4
to Layer 7 services. The ACI fabric supports IPv4, IPv6, and dual-stack configurations for
tenant networking.
VRFs
A virtual routing and forwarding (VRF) instance, or context, is a tenant network. A tenant can have multiple VRFs. A VRF is a unique Layer 3 forwarding and application policy domain and defines a Layer 3 address domain. One or more bridge domains are associated with a VRF. All of the endpoints within the Layer 3 domain must have unique IP addresses because it is possible to forward packets directly between these devices if the policy allows it. After an administrator creates a
logical device, the administrator can create a VRF for the logical device, which provides
a selection criteria policy for a device cluster. A logical device can be selected based on a
contract name, a graph name, or the function node name inside the graph.
Application Profiles
An application profile defines the policies, services, and relationships between endpoint
groups (EPGs). Application profiles contain one or more EPGs. Modern applications con-
tain multiple components. For example, an e-commerce application could require a web
server, a database server, data located in a storage area network, and access to outside
resources that enable financial transactions. The application profile contains as many (or
as few) EPGs as necessary that are logically related to providing the capabilities of an
application.
Endpoint Groups
The endpoint group (EPG) is the most important object in the policy model. An EPG is
a managed object that is a named logical entity that contains a collection of endpoints.
Endpoints are devices connected to the network directly or indirectly. They have an
address (identity), a location, and attributes (such as version or patch level), and they can
be physical or virtual. Knowing the address of an endpoint also enables access to all its
other identity details. Endpoint examples include servers, virtual machines, network-
attached storage, or clients on the Internet. Endpoint membership in an EPG can be
dynamic or static. An EPG can be statically configured by an administrator in the APIC,
or dynamically configured by an automated system such as vCenter or OpenStack. WAN
router connectivity (L3Out) to the fabric is an example of a configuration that uses a
static EPG. (More on ACI external connectivity options can be found in Chapter 9,
“Operating Cisco ACI.”) Virtual machine management connectivity to VMware vCenter
is an example of a configuration that uses a dynamic EPG. Once the virtual machine
management domain is configured in the fabric, vCenter triggers the dynamic configura-
tion of EPGs that enable virtual machine endpoints to start up, move, and shut down as
needed. EPGs contain endpoints that have common policy requirements such as security,
virtual machine mobility, QoS, and Layer 4 to Layer 7 services. Rather than
endpoints being configured and managed individually, they are placed in an EPG and
managed as a group. Policies apply to EPGs, never to individual endpoints.
■ Shared: The subnet can be shared with and exported to multiple VRFs in the same
tenant or across tenants as part of a shared service. An example of a shared service
is a routed connection to an EPG present in another VRF in a different tenant. This
enables traffic to pass in both directions across VRFs. An EPG that provides a shared
service must have its subnet configured under that EPG (not under a BD), and its
scope must be set to advertised externally and shared between VRFs.
is deployed on the port. Without a VLAN pool being defined in an AEP, a VLAN is not
enabled on the leaf port even if an EPG is provisioned.
Contracts
In addition to EPGs, contracts are key objects in the policy model. EPGs can only com-
municate with other EPGs according to contract rules. An administrator uses a contract
to select the type(s) of traffic that can pass between EPGs, including the protocols and
ports allowed. If there is no contract, inter-EPG communication is disabled by default.
There is no contract required for intra-EPG communication; intra-EPG communication is
always implicitly allowed. You can also configure contract preferred groups that enable
greater control of communication between EPGs in a VRF. If most of the EPGs in the
VRF should have open communication, but a few should only have limited communica-
tion with the other EPGs, you can configure a combination of a contract preferred group
and contracts with filters to control communication precisely. Contracts govern the com-
munication between EPGs that are labeled providers, consumers, or both. EPG providers
expose contracts with which a would-be consumer EPG must comply. The relationship
between an EPG and a contract can be either a provider or consumer. When an EPG pro-
vides a contract, communication with that EPG can be initiated from other EPGs as long
as the communication complies with the provided contract. When an EPG consumes a
contract, the endpoints in the consuming EPG may initiate communication with any end-
point in an EPG that is providing that contract.
Labels control which rules apply when communicating between a specific pair of EPGs.
Labels are managed objects with only one property: a name. Labels enable classifying
which objects can and cannot communicate with one another. Label matching is done
first. If the labels do not match, no other contract or filter information is processed.
The label match attribute can be one of these values: at least one (the default), all, none,
or exactly one. Labels can be applied to a variety of provider and consumer managed
objects, including EPGs, contracts, bridge domains, and so on. Labels do not apply
across object types; a label on an application EPG has no relevance to a label on a bridge
domain.
Filters are Layer 2 to Layer 4 fields, TCP/IP header fields such as Layer 3 protocol type,
Layer 4 ports, and so forth. According to its related contract, an EPG provider dictates
the protocols and ports in both the in and out directions. Contract subjects contain asso-
ciations to the filters (and their directions) that are applied between EPGs that produce
and consume the contract.
Subjects are contained in contracts. One or more subjects within a contract use filters to
specify the type of traffic that can be communicated and how it occurs. For example,
for HTTPS messages, the subject specifies the direction and the filters that specify the
IP address type (for example, IPv4), the HTTP protocol, and the ports allowed. Subjects
determine if filters are unidirectional or bidirectional.
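To show how these logical constructs nest, the following is a hedged sketch of a JSON payload (built as a Python dictionary) that could be posted to the APIC REST API at /api/mo/uni.json. The class names (fvTenant, fvCtx, fvBD, fvAp, fvAEPg, vzBrCP, vzSubj, and the relation objects) follow the ACI object model; the tenant, VRF, bridge domain, EPG, contract, and subnet names are placeholders.

```python
tenant_payload = {
    "fvTenant": {
        "attributes": {"name": "ExampleTenant"},
        "children": [
            {"fvCtx": {"attributes": {"name": "VRF-A"}}},                     # VRF (context)
            {"fvBD": {"attributes": {"name": "BD-Web"},                        # bridge domain
                      "children": [
                          {"fvRsCtx": {"attributes": {"tnFvCtxName": "VRF-A"}}},
                          {"fvSubnet": {"attributes": {"ip": "10.10.10.1/24"}}}]}},
            {"vzBrCP": {"attributes": {"name": "web-to-app"},                  # contract
                        "children": [
                            {"vzSubj": {"attributes": {"name": "http"}}}]}},   # subject
            {"fvAp": {"attributes": {"name": "App1"},                          # application profile
                      "children": [
                          # The Web EPG consumes the contract; the App EPG provides it.
                          {"fvAEPg": {"attributes": {"name": "Web"},
                                      "children": [
                                          {"fvRsBd": {"attributes": {"tnFvBDName": "BD-Web"}}},
                                          {"fvRsCons": {"attributes": {"tnVzBrCPName": "web-to-app"}}}]}},
                          {"fvAEPg": {"attributes": {"name": "App"},
                                      "children": [
                                          {"fvRsBd": {"attributes": {"tnFvBDName": "BD-Web"}}},
                                          {"fvRsProv": {"attributes": {"tnVzBrCPName": "web-to-app"}}}]}},
                      ]}},
        ],
    }
}
# Posting this subtree (for example with requests) creates the tenant and all of its children.
```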
Outside Networks
Outside network policies control connectivity to the outside. A tenant can contain mul-
tiple outside network objects. Outside network policies specify the relevant Layer 2 or
Layer 3 properties that control communications between an outside public or private
network and the ACI fabric. External devices, such as routers that connect to the WAN
and enterprise core, or existing Layer 2 switches, connect to the front panel interface of
a leaf switch. The leaf switch that provides such connectivity is known as a border leaf.
The border leaf switch interface that connects to an external device can be configured as
either a bridged or routed interface. In the case of a routed interface, static or dynamic
routing can be used. The border leaf switch can also perform all the functions of a nor-
mal leaf switch.
Fabric Policies
■ Switch profiles specify which switches to configure and the switch configuration
policy.
■ Module profiles specify which spine switch modules to configure and the spine
switch configuration policy.
■ Interface profiles specify which fabric interfaces to configure and the interface con-
figuration policy.
■ Global policies specify DNS, fabric MTU default, multicast tree, and load balancer
configurations to be used throughout the fabric.
■ Pod profiles specify date and time, SNMP, Council of Oracle Protocol (COOP),
IS-IS, and Border Gateway Protocol (BGP) route reflector policies.
Fabric policies configure interfaces that connect spine and leaf switches. Fabric poli-
cies can enable features such as monitoring (statistics collection and statistics export),
troubleshooting (on-demand diagnostics and SPAN), IS-IS, Council of Oracle Protocol
(COOP), SNMP, Border Gateway Protocol (BGP) route reflectors, DNS, and Network
Time Protocol (NTP).
[Figure: Fabric policy configuration workflow: the admin creates new (or selects from existing) fabric policies that configure available options; creates Policy Group 1 and Policy Group 2, which specify which policies to use; creates Interface Profile 1, associates it with Policy Group 1, and creates Interface Profile 2 and associates it with Policy Group 2; and creates Switch Profile 1, which specifies which switches use Interface Profile 1, and Switch Profile 2, which specifies which switch uses Interface Profile 2]
Figure 8-21 shows the result of applying Switch Profile 1 and Switch Profile 2 to the ACI
fabric.
[Figure 8-21: Result of applying Switch Profile 1 and Switch Profile 2 to the ACI fabric: the policies and interfaces specified by each profile are configured on the corresponding spine switches]
[Figure: Fabric access policy categories: switch profiles, module profiles, interface profiles, global DHCP, QoS, and AEP policies, VLAN and multicast pools, physical and L2/L3 external domains, and monitoring and troubleshooting]
■ Switch profiles specify which switches to configure and the switch configuration
policy.
■ Module profiles specify which leaf switch access cards and access modules to
configure and the leaf switch configuration policy.
■ Interface profiles specify which access interfaces to configure and the interface
configuration policy.
■ Global policies enable the configuration of DHCP, QoS, and attachable entity pro-
file (AEP) functions that can be used throughout the fabric. AEP profiles provide a
template to deploy hypervisor policies on a large set of leaf ports and associate a
Virtual Machine Manager (VMM) domain and the physical network infrastructure.
They are also required for Layer 2 and Layer 3 external network connectivity.
■ Pools specify VLAN, VXLAN, and multicast address pools. A pool is a shared
resource that can be consumed by multiple domains such as VMM and Layer 4 to
Layer 7 services.
■ External bridged domain Layer 2 domain profiles contain the port and VLAN
specifications that a bridged Layer 2 network connected to the fabric uses.
■ External routed domain Layer 3 domain profiles contain the port and VLAN
specifications that a routed Layer 3 network connected to the fabric uses.
[Figure: Access policy configuration workflow: the admin creates new (or selects from existing) interface access policies that configure available options; creates Policy Group 1 and Policy Group 2, which specify which interface configuration policies to use; creates Interface Profile 1, associates it with Policy Group 1, and creates Interface Profile 2 and associates it with Policy Group 2; and creates Switch Profile 1, which specifies which switches use Interface Profile 1, and Switch Profile 2, which specifies which switch uses Interface Profile 2]
Figure 8-24 shows the result of applying Switch Profile 1 and Switch Profile 2 to the ACI
fabric.
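To make the chain of associations concrete, the following plain-Python sketch (illustrative names only, not APIC objects) walks the access-policy relationships described above, from a switch profile down to the VLAN pool that ultimately allows a VLAN on a leaf port.

```python
vlan_pool = {"name": "vmm-pool", "vlans": list(range(100, 110))}
domain = {"name": "VMM-Domain", "vlan_pool": vlan_pool}
aep = {"name": "AEP-Servers", "domains": [domain]}

policy_group = {"name": "PG-Servers",
                "cdp": "enabled", "lldp": "enabled",     # example interface policies
                "aep": aep}

interface_profile = {"name": "IntProf-Servers",
                     "ports": ["eth1/1", "eth1/2"],
                     "policy_group": policy_group}

switch_profile = {"name": "SwProf-Leaf101",
                  "switches": [101],
                  "interface_profiles": [interface_profile]}

# Which VLANs may be deployed on leaf 101, port eth1/1? Follow the chain:
# switch profile -> interface profile -> policy group -> AEP -> domain -> VLAN pool.
allowed_vlans = (switch_profile["interface_profiles"][0]
                 ["policy_group"]["aep"]["domains"][0]["vlan_pool"]["vlans"])
print(allowed_vlans[:3])   # the VLANs become available on the port only through this chain
```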
Although the configuration steps for each logical construct, along with the Cisco ACI fabric and access policy components, are beyond the scope of this book, I highly recommend checking out the dCloud lab “Getting Started with Cisco ACI” (https://dcloud2-sng.cisco.com/content/demo/343552?returnPathTitleKey=content-view) to get a feel for the GUI and the configuration steps involved in configuring the individual components discussed in this chapter.
Within the fabric, ACI uses IS-IS and COOP for all forwarding of endpoint-to-endpoint communications. The forwarding across switch nodes is performed based on the Tunnel Endpoint (TEP) IP address in the VXLAN encapsulation. IS-IS provides IP reachability among TEP addresses. In case the ingress leaf is not aware of the destination endpoint location (TEP), ACI has a distributed database, maintained by the Council of Oracle Protocol (COOP) on each spine, that holds all the endpoint-to-TEP mappings. For propagating routing information between software-defined networks within the fabric and routers external to the fabric, ACI uses Multiprotocol Border Gateway Protocol (MP-BGP). All links in the ACI fabric are active, use equal-cost multipath (ECMP) forwarding, and reconverge quickly.
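A brief Python sketch of the ingress leaf's decision follows; the TEP addresses are hypothetical, and the spine proxy stands in for the COOP lookup.

```python
SPINE_PROXY_TEP = "10.0.0.64"                        # hypothetical anycast proxy TEP

local_endpoint_table = {"10.1.1.10": "10.0.32.65"}   # endpoint IP -> egress leaf TEP
coop_database = {"10.1.1.10": "10.0.32.65",          # spines hold every endpoint mapping
                 "10.1.2.20": "10.0.32.66"}

def ingress_leaf_outer_destination(dst_ip):
    """Known endpoint: tunnel straight to the egress leaf TEP.
    Unknown endpoint: tunnel to the spine proxy, which resolves it via COOP."""
    return local_endpoint_table.get(dst_ip, SPINE_PROXY_TEP)

print(ingress_leaf_outer_destination("10.1.1.10"))   # 10.0.32.65 (direct to the leaf TEP)
print(ingress_leaf_outer_destination("10.1.2.20"))   # 10.0.0.64 (spine proxy, COOP lookup)
print(coop_database["10.1.2.20"])                    # the spine forwards on to 10.0.32.66
```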
[Figure: The ACI IP fabric leverages VXLAN tagging to normalize different ingress encapsulations, such as 802.1Q (VLAN 10), VXLAN (VNID 5789 and VNID 11348), and NVGRE (VSID 7456), while the original payload is preserved]
Summary
This chapter discussed Cisco ACI building blocks, deployment models, hardware used in
a Cisco ACI solution, Cisco ACI fabric startup discovery, the Cisco ACI policy model,
and packet forwarding within the Cisco ACI fabric, including the following points:
■ ACI is a spine/leaf network of Nexus 9000 Series switches using the ACI operating system
with a management platform called APIC, which provides a single place from which
the network can be managed.
■ Cisco ACI has many benefits over a traditional network, such as simple spine/leaf
architecture, Layer 2 loop avoidance using ECMP, better security using an allow list
model, and REST API automation advantage.
■ Starting from Cisco ACI 4.1, Cisco ACI supports both two-tier and multitier topolo-
gies.
■ Leaf switches are the switches to which all endpoints (servers, storage, service nodes,
and so on) connect, and they provide the VXLAN tunnel endpoint (VTEP) function.
Leaf switches are often categorized by function, such as border leaf, compute leaf,
and so on.
■ Spine switches interconnect leaf switches and provide the backbone of the ACI fabric.
■ The Cisco Nexus 9000 Series switches operate in one of two modes: Cisco
Application-Centric Infrastructure (Cisco ACI) or Cisco NX-OS. The Cisco Nexus
9000 Series switches in ACI mode provide the spine and leaf switches that build the
fabric.
■ During startup discovery, the ACI fabric is brought up in a cascading manner, start-
ing with the leaf nodes that are directly attached to the APIC.
■ The ACI policy model enables the specification of application requirements policies.
The policy model manages the entire fabric, including the infrastructure, authentica-
tion, security, services, applications, and diagnostics.
■ Access policies govern the operation of switch access ports that provide connectiv-
ity to resources such as storage, compute, Layer 2 and Layer 3 (bridged and routed)
connectivity, virtual machine hypervisors, Layer 4 to Layer 7 devices, and so on.
■ Fabric policies govern the operation of the switch fabric ports, including functions
such as Network Time Protocol (NTP) server synchronization, Intermediate System
to Intermediate System Protocol (IS-IS), Border Gateway Protocol (BGP) route reflec-
tors, Domain Name System (DNS), and so on.
■ Virtual routing and forwarding (VRF) is a tenant network and defines a Layer 3
address domain.
■ An application profile defines the policies, services, and relationships between end-
point groups (EPGs). An EPG is a managed object that is a named logical entity that
contains a collection of endpoints. Endpoints are devices that are connected to the
network directly or indirectly.
■ A bridge domain represents a Layer 2 forwarding construct within the fabric. While
a VRF defines a unique IP address space, that address space can consist of multiple
subnets. Those subnets are defined in one or more BDs that reference the corre-
sponding VRF.
■ An attachable entity profile (AEP) represents a group of external entities with similar
infrastructure policy requirements.
■ EPGs can only communicate with other EPGs according to contract rules. Label,
filter, and subject managed objects enable mixing and matching among EPGs and
contracts so as to satisfy various applications or service delivery requirements.
■ All traffic in the ACI fabric is normalized as VXLAN packets. As traffic enters the
fabric, ACI encapsulates it in VXLAN packets and applies policy to it, forwards it as
needed across the fabric through a spine switch, and de-encapsulates it upon exiting
the fabric.
Chapter 9
In this chapter, we will discuss Cisco ACI external connectivity options, including L2Out
and L3Out, Cisco ACI and VMM integration, Cisco ACI and L4–L7 integration, Cisco
ACI management options, Cisco ACI Anywhere, and Cisco Nexus Dashboard.
The Layer 3 connection between an ACI fabric and an outside network is required in the
following scenarios:
■ Connecting to WAN routers in the data center so that a WAN router provides Layer
3 data center interconnect (DCI) or Internet access for tenants. In some scenarios, a
WAN router provides a VPN connection to a tenant’s on-premises network.
The Layer 2 connection between an ACI fabric and an outside network is required in the
following scenarios:
■ In the existing data centers, connecting the existing switching network to an ACI
leaf and stretching the same VLAN and subnet across ACI and the existing network.
■ Extending the Layer 2 domain from ACI to a DCI platform so that the Layer 2 domain of ACI can be extended to remote data centers.
Cisco ACI was originally built to be a stub network in a data center to manage endpoints.
The ACI L3Out was initially designed only as a border between the stub network formed
by ACI and the rest of the network, such as intranet, Internet, WAN, and so on, not as a
transit network, as shown in Figure 9-1.
Figure 9-1 ACI as a Stub Network
Due to this stub nature, traffic traversing from one L3Out to another through the ACI
network was originally not supported. Beginning with APIC Release 1.1, however, Cisco
ACI introduced the Transit Routing feature, which allows the ACI fabric to be a tran-
sit network so that traffic can traverse from one L3Out to another L3Out, as shown in
Figure 9-2.
Figure 9-2 ACI as a Transit Network
The border leafs (BLEAFs) are ACI leaf switches that provide Layer 3 connections to outside networks. Any ACI leaf can be a border leaf. In addition to supporting routing protocols
to exchange routes with external routers, the border leaf also applies and enforces policy
for traffic between internal and external endpoints.
Three different types of interfaces are supported on a border leaf switch to connect to an external router:
■ Layer 3 (routed) interface: The physical interface is dedicated to a single Layer 3 outside connection.
■ Sub-interface with 802.1Q tagging: With sub-interfaces, the same physical interface can be used to provide multiple outside connections for multiple tenants or VRFs.
■ Switched virtual interface (SVI): With an SVI, the same physical interface supports both Layer 2 and Layer 3, so it can be used for a Layer 2 outside connection as well as a Layer 3 outside connection.
Within the ACI fabric, Multiprotocol BGP (MP-BGP) is implemented between leaf and
spine switches to propagate external routes within the ACI fabric. The BGP route reflec-
tor technology is deployed in order to support a large number of leaf switches within a
single fabric. All of the leaf and spine switches are in one single BGP autonomous system
(AS). Once the border leaf learns the external routes, it can then redistribute the external
routes of a given VRF to an MP-BGP address family VPN version 4 (or VPN version 6
when IPv6 routing is configured). With address family VPN version 4, MP-BGP maintains
a separate BGP routing table for each VRF. Within MP-BGP, the border leaf advertises
routes to a spine switch, which is a BGP route reflector. The routes are then propagated to
all the leafs where the VRFs are instantiated.
The L3Out provides the necessary configuration objects for the following five key functions, which are also displayed in Figure 9-3:
1. Learn external routes from the external routers via routing protocols (or define static routes).
2. Distribute learned external routes (or static routes) to other leaf switches.
3. Advertise internal routes (bridge domain subnets) to the external networks.
4. Advertise learned external routes to other L3Outs (transit routing).
5. Allow traffic to arrive from or be sent to external networks via L3Out by using a contract.
Figure 9-3 The Five Key Functions Provided by an L3Out
There are two common ways of extending a Layer 2 domain outside the Cisco ACI fabric:
■ Extend the EPG out of the ACI fabric: You can extend an EPG out of the ACI
fabric by statically assigning a port (along with VLAN ID) to an EPG, as shown in
Figure 9-4. The leaf will learn the endpoint information, assign the traffic (by match-
ing the port and VLAN ID) to the proper EPG, and then enforce the policy. The
endpoint learning, data forwarding, and policy enforcement remain the same whether
the endpoint is directly attached to the leaf port or is behind a Layer 2 network
(provided the proper VLAN is enabled in the Layer 2 network). This approach works well for migration scenarios. Note that STP topology change notifications (TCNs) from the external Layer 2 network may impact ACI internal endpoints in the same VLAN; this can be avoided by using different VLANs for the external Layer 2 network and the internal endpoints.
Figure 9-4 Extending an EPG Out of the ACI Fabric
■ Extend the bridge domain out of the ACI fabric: Another option to extend the
Layer 2 domain is to create a Layer 2 outside connection (or L2Out, as it’s called
in the APIC GUI) for a given bridge domain, as shown in Figure 9-5. It effectively
extends the bridge domain to the outside network. The external Layer 2 network
belongs to its own dedicated EPG. In this scenario, STP TCN from the external Layer
2 network does not affect any internal EPs, which is good for complete separation.
Figure 9-5 Extending the Bridge Domain Out of the ACI Fabric
Cisco ACI supports virtual machine managers (VMMs) from the following products and vendors:
■ Cloud Foundry
■ Kubernetes
■ Microsoft System Center Virtual Machine Manager (SCVMM)
■ OpenShift
■ OpenStack
■ Red Hat Virtualization
■ VMware vSphere (vCenter)
VMM domain profiles specify connectivity policies that enable virtual machine control-
lers to connect to the ACI fabric. Figure 9-6 provides an overview of the VMM domain
policy model.
Figure 9-6 VMM Domain Policy Model Overview
The following are the essential components of an ACI VMM domain policy:
■ EPG association: Endpoint groups regulate connectivity and visibility among the
endpoints within the scope of the VMM domain policy. VMM domain EPGs behave
as follows:
■ The APIC pushes these EPGs as port groups into the VM controller.
■ An EPG can span multiple VMM domains, and a VMM domain can contain mul-
tiple EPGs.
■ Attachable entity profile association: Associates a VMM domain with the physi-
cal network infrastructure. An attachable entity profile (AEP) is a network interface
template that enables deploying VM controller policies on a large set of leaf switch
ports. An AEP specifies which switches and ports are available as well as how they
are configured.
■ VLAN pool association: A VLAN pool specifies the VLAN IDs or ranges used for
VLAN encapsulation that the VMM domain consumes.
The following modes of Cisco ACI and VMware VMM integration are supported:
■ VMware VDS: When integrated with Cisco ACI, the VMware vSphere distributed
switch (VDS) enables you to configure VM networking in the Cisco ACI fabric.
■ Cisco ACI Virtual Edge: Cisco ACI Virtual Edge is a hypervisor-independent dis-
tributed service VM that leverages the native distributed virtual switch that belongs
to the hypervisor. Cisco ACI Virtual Edge operates as a virtual leaf.
Figure 9-7 outlines the workflow of how an APIC integrates with VMM domain
(VMware vCenter in this case) and pushes policies to the virtual environment.
The APIC administrator configures the vCenter domain policies in the APIC and provides the vCenter connectivity information, such as the vCenter IP address and credentials.
Figure 9-7 APIC and VMware vCenter VMM Integration Workflow
The following outlines the workflow of how an APIC integrates with a VMM domain (VMware vCenter in this case) and pushes policies to the virtual environment:
1. The Cisco APIC performs an initial handshake, opens a TCP session with the
VMware vCenter specified by a VMM domain, and syncs the inventory.
2. The APIC creates the VDS—or uses an existing VDS if one is already created—
matching the name of the VMM domain. If you use an existing VDS, the VDS must
be inside a folder on vCenter with the same name.
3. The vCenter administrator or the compute management tool adds the ESXi host or
hypervisor to the APIC VDS and assigns the ESXi host hypervisor ports as uplinks
on the APIC VDS. These uplinks must connect to the ACI leaf switches.
4. The APIC learns the location of the hypervisor host using the LLDP or CDP information of the hypervisors.
5. The APIC administrator creates and associates application EPG policies.
6. The APIC administrator associates the application EPGs to the VMM domain.
7. The APIC automatically creates port groups in the VMware vCenter under the VDS.
This process provisions the network policy in the VMware vCenter.
8. The vCenter administrator or the compute management tool instantiates and assigns
VMs to the port groups.
9. The APIC learns about the VM placements based on the vCenter events. The APIC
automatically pushes the application EPG and its associated policy (for example,
contracts and filters) to the ACI fabric.
Cisco ACI allows you to define a sequence of meta-devices, such as a firewall of a certain type followed by a load balancer of a certain make and version. This is called a
service graph template, also known as an abstract graph. When a service graph template
is referenced by a contract, the service graph template is instantiated by being mapped to
concrete devices such as the firewall and load balancers present in the fabric. The map-
ping happens within the concept of a context. The device context is the mapping config-
uration that allows Cisco ACI to identify which firewalls and which load balancers can be
mapped to the service graph template. A logical device represents the cluster of concrete
devices. The rendering of the service graph template is based on identifying the suitable
logical devices that can be inserted in the path defined by a contract.
The following is the outline of the service graph workflow, as illustrated in Figure 9-8:
Figure 9-8 Service Graph Workflow
1. Define an L4–L7 device (for example, the ports to which the device is connected).
2. Create a service graph template that defines the flow of the traffic.
3. Attach the service graph template to a contract subject.
4. Create a device selection policy that ties the intended logical device to a service graph template and contract.
5. Configure the firewall and the load balancers with all the necessary rules for security
and load balancing, such as ACL configuration, server farm configuration, and so on.
management greatly, reduce the number of touch points, and decouple the switching
hardware from the desired configuration intent. These changes include the following:
■ Stateless hardware
In the ACI architecture, Cisco APIC provides the single point of management and access
to all configuration, management, monitoring, and health functions. Cisco APIC can be
configured using a graphical user interface (GUI), command-line interface (CLI), and API.
The underlying interface for all access methods is provided through a REST-based API,
which modifies the contents of a synchronized database that is replicated across APICs
in a cluster and provides an abstraction layer between all of the interfaces. This results
in a clean and predictable transition between the interfaces with no risk of inconsistency
between the various data interfaces.
The switching hardware in the ACI fabric is managed in a stateless fashion, meaning that hardware swaps can be faster, topology changes are less impactful, and network management is simplified.
The desired state model for configuration further complements the concepts of
controller-based management and statelessness by taking advantage of a concept known
as declarative control-based management, based on a concept known as the promise
theory. Declarative control dictates that each object is asked to achieve a desired state
and makes a “promise” to reach this state without being told precisely how to do so. This
stands in contrast with the traditional model of imperative control, where each managed
element must be told precisely what to do, be told how to do it, and take into account
the specific situational aspects that will impact its ability to get from its current state
to the configured state. A system based on declarative control is able to scale much
more efficiently than an imperative-based system, since each entity within the domain is
responsible for knowing its current state and the steps required to get to the desired state,
as dictated by the managing controller.
■ Access to monitoring (statistics, faults, events, audit logs), operational, and configuration data.
■ Access to the APIC and spine and leaf switches through a single sign-on mechanism.
■ Communication with the APIC using the same RESTful APIs available to third parties.
■ Implemented from the ground up in Python. You can switch between the Python
interpreter and the CLI.
Example 9-1 illustrates an SSH login to the APIC CLI and shows how to view the running
configuration.
The REST API is the interface into the management information tree (MIT) and allows
manipulation of the object model state. The same REST interface is used by the APIC
CLI, GUI, and SDK so that whenever information is displayed, it is read through the
REST API, and when configuration changes are made, they are written through the REST
API. The REST API also provides an interface through which other information can be
retrieved, including statistics, faults, and audit events. It even provides a means of sub-
scribing to push-based event notification so that when a change occurs in the MIT, an
event can be sent through a web socket.
Standard REST methods are supported on the API, including POST, GET, and DELETE operations through HTTP. The POST and DELETE methods are idempotent,
meaning that there is no additional effect if they are called more than once with the same
input parameters. The GET method is nullipotent, meaning that it can be called zero or
more times without making any changes (or that it is a read-only operation). Payloads to
and from the REST interface can be encapsulated through either XML or JSON encoding.
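To make this concrete, the following minimal Python sketch (using the third-party requests library) authenticates to an APIC and performs a class-level read of all tenants. The controller address and credentials are placeholders, and certificate verification is disabled purely for illustration; this is a sketch of the documented aaaLogin and class query endpoints, not production code.

import requests

APIC = "https://apic.example.com"   # placeholder APIC address

session = requests.Session()

# Authenticate: POST the aaaLogin payload; the APIC returns a token cookie
# that the session object stores and reuses automatically.
login = {"aaaUser": {"attributes": {"name": "admin", "pwd": "password"}}}
session.post(f"{APIC}/api/aaaLogin.json", json=login, verify=False)

# Class-level read: GET all managed objects of class fvTenant, JSON encoded.
resp = session.get(f"{APIC}/api/class/fvTenant.json", verify=False)
for mo in resp.json()["imdata"]:
    print(mo["fvTenant"]["attributes"]["dn"])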
All the switch nodes contain a complete copy of the concrete model. When an adminis-
trator creates a policy in the APIC that represents a configuration, the APIC updates the
logical model. The APIC then performs the intermediate step of creating a fully elabo-
rated policy that it pushes into all the switch nodes where the concrete model is updated.
The hierarchical structure starts at the top (root) and contains parent and child nodes.
Each node in this tree is an MO, and each object in the ACI fabric has a unique distin-
guished name (DN) that describes the object and its place in the tree. MOs are abstrac-
tions of the fabric resources. An MO can represent a physical object, such as a switch or
adapter, or a logical object, such as a policy or fault.
Configuration policies make up the majority of the policies in the system and describe
the configurations of different ACI fabric components. Policies determine how the sys-
tem behaves under specific circumstances. Certain MOs are not created by users but are
automatically created by the fabric (for example, power supply objects and fan objects).
By invoking the API, you can read and write objects to the MIM.
The information model is centrally stored as a logical model by the APIC, while each
switch node contains a complete copy as a concrete model. When a user creates a policy
in the APIC that represents a configuration, the APIC updates the logical model. The
APIC then performs the intermediate step of creating a fully elaborated policy from
the user policy and then pushes the policy into all the switch nodes where the concrete
model is updated. The models are managed by multiple data management engine (DME)
processes that run in the fabric. When a user or process initiates an administrative change
to a fabric component (for example, when you apply a profile to a switch), the DME first
applies that change to the information model and then applies the change to the actual
managed endpoint. This approach is called a model-driven framework.
Figure 9-12 shows the branch diagram of a leaf switch port that starts at the top root of the ACI fabric MIT and follows a hierarchy that is composed of a chassis with two line module slots, with a line module in slot 2.
|--root--------- (root)
|--sys-----------(sys)
|--ch-----------(sys/ch)
|--lcslot-1------(sys/ch/lcslot-1)
|--lcslot-2------(sys/ch/lcslot-2)
|--lc------(sys/ch/lcslot-2/lc)
|--leafport-1------(sys/ch/lcslot-2/lc/leafport-1)
Figure 9-12 ACI Fabric MIT Branch Diagram of a Leaf Switch Port
You can identify and access a specific object by its distinguished name (DN) or by its
relative name (RN), depending on the current location in the MIT. The RN identifies an
object from its siblings within the context of its parent object.
The DN enables you to unambiguously identify a specific target object. The DN consists
of a series of RNs:
dn = {rn}/{rn}/{rn}/{rn}...
<dn = "sys/ch/lcslot-1/lc/fabport-1"/>
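Because a DN is nothing more than the chain of RNs from the root down to the object, it can be decomposed mechanically. The short Python sketch below uses the fabric port DN from the example above purely as an illustration:

# A DN is the concatenation of RNs from the root down to the object.
dn = "sys/ch/lcslot-1/lc/fabport-1"
rns = dn.split("/")    # each element is an RN, relative to its parent object
print(rns)             # ['sys', 'ch', 'lcslot-1', 'lc', 'fabport-1']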
Because of the hierarchical nature of the tree and the attribute system used to identify
object classes, the tree can be queried in several ways for obtaining managed object infor-
mation. Queries can be performed on an object itself through its distinguished name, on
a class of objects such as a switch chassis, or on the tree level to discover all members of
an object:
■ Tree-level query: Tree-level queries return the referenced object and its child objects.
This approach is useful for discovering the components of a larger system.
■ Class-level query: Class-level queries return all the objects of a given class. This approach is useful for discovering all the objects of a certain type that are available in the MIT.
■ Object-level query: Object-level queries return the specific object referenced by its distinguished name.
Figure 9-13 illustrates different query levels. The two tree-level queries discover the cards
and ports of a given switch chassis. The class-level query used is Cards, which returns all
the objects of type Cards. The two object-level queries used are for Node 1 in Chassis 2
and for Node 1 in Chassis 1 in Card 1 in Port 2.
For all MIT queries, an administrator can optionally return the entire subtree or a partial
subtree. Additionally, the role-based access control (RBAC) mechanism in the system dic-
tates which objects are returned; only the objects that the user has rights to view will ever
be returned.
Figure 9-13 MIT Query Levels: Tree-Level, Class-Level, and Object-Level Queries
http://apic/api/mo/uni/tn-Cisco/ap-Software/epg-Download.xml
Because the REST API is HTTP based, defining the URI to access a certain resource
type is important. The first two sections of the request URI simply define the protocol
and access details of the APIC. Next in the request URI is the literal string /api, indicat-
ing that the API will be invoked. Generally, read operations are for an object or class,
as discussed earlier, so the next part of the URI specifies whether the operation will be
for an MO or class. The next component defines either the distinguished name (DN) being queried for object-based queries or the package and class name for class-based queries. The final mandatory part of the request URI is the encoding format: either
.xml or .json. The REST API supports a wide range of flexible filters, useful for narrow-
ing the scope of your search to allow information to be located more quickly. The filters
themselves are appended as query URI options, starting with a question mark (?) and
concatenated with an ampersand (&). Multiple conditions can be joined together to form
complex filters.
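As a hedged illustration of how such read URIs are put together, the short Python sketch below builds an object-level URI, a class-level URI with a property filter, and a combined query. The controller address, tenant name, and filter values are placeholders; verify the exact query options (such as query-target-filter and rsp-subtree) against your APIC release.

APIC = "https://apic.example.com"   # placeholder controller address

# Object-level read: /api/mo/ followed by the DN, ending with the encoding format.
obj_url = f"{APIC}/api/mo/uni/tn-Cisco.json"

# Class-level read with a property filter appended as a query option:
# return only fvTenant objects whose name equals "Cisco".
cls_url = (f"{APIC}/api/class/fvTenant.json"
           '?query-target-filter=eq(fvTenant.name,"Cisco")')

# Additional options are concatenated with & -- here, also return child objects.
full_url = cls_url + "&rsp-subtree=children"

print(obj_url, cls_url, full_url, sep="\n")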
Both create and update operations in the REST API are implemented using the POST
method so that if an object does not already exist, it will be created, and if it does
already exist, it will be updated to reflect any changes between its existing state and
desired state. Create and update operations use the same syntax as read operations,
except that they are always targeted at an object level, because you cannot make changes
to every object of a specific class (nor would you want to). The create or update opera-
tion should target a specific managed object, so the literal string /mo indicates that the
DN of the managed object will be provided, followed next by the actual DN. The pay-
load of the POST operation will contain the XML- or JSON-encoded data representing
the MO that defines the Cisco API command body. Figure 9-15 shows a sample REST
payload.
<fvTenant name="NewTenant">
  <fvAp name="NewApplication">
    <fvAEPg name="WebTier">
      <fvRsPathAtt encap="vlan-1" mode="regular" tDn="topology/pod-1/paths-17/pathep-[eth1/1]"/>
    </fvAEPg>
  </fvAp>
</fvTenant>
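As a hedged illustration, the Python sketch below posts that payload to the APIC using the literal string /mo and the target DN uni (the policy universe), which creates the tenant if it does not exist or updates it if it does. The controller address and credentials are placeholders, and certificate verification is disabled only to keep the sketch short.

import requests

APIC = "https://apic.example.com"   # placeholder APIC address
s = requests.Session()
s.post(f"{APIC}/api/aaaLogin.json", verify=False,
       json={"aaaUser": {"attributes": {"name": "admin", "pwd": "password"}}})

# Create or update: POST the XML-encoded managed object against the DN "uni".
payload = """
<fvTenant name="NewTenant">
  <fvAp name="NewApplication">
    <fvAEPg name="WebTier">
      <fvRsPathAtt encap="vlan-1" mode="regular"
                   tDn="topology/pod-1/paths-17/pathep-[eth1/1]"/>
    </fvAEPg>
  </fvAp>
</fvTenant>
"""
resp = s.post(f"{APIC}/api/mo/uni.xml", data=payload, verify=False)
print(resp.status_code)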
There are multiple ways to work with this data, at a variety of levels that cater to the user's level of comfort with programming, all of which use open standards and open source.
APIC is very flexible in terms of how it accepts configuration and provides administrative and operational state, in addition to how it extends that configuration into subordinate components. Two primary categories of interfaces facilitate these functions: the
northbound REST API and the southbound programmatic interfaces.
The northbound REST API is responsible for accepting configuration as well as providing
access to management functions for the controller. This interface is a crucial component
for the GUI and CLI and also provides a touch point for automation tools, provisioning
scripts and third-party monitoring and management tools. The REST API is a singular
entry point to the fabric for making configuration changes, and as such, it is a critical
aspect of the architecture for being able to provide a consistent programmatic
experience.
Southbound interfaces on APIC allow for the declarative model of intent to be extended
beyond the fabric, into subordinate devices. This is a key aspect to the openness of the
ACI fabric, in that policy can be programmed once via APIC and then pushed out to
hypervisors, L4–L7 devices, and third-party partner devices such as F5, Citrix, Embrane, Palo Alto, A10, Sourcefire, and so on, without the need to individually configure those
devices.
OpFlex is designed to allow a data exchange of a set of managed objects defined as part
of an informational model. OpFlex itself does not dictate the information model and can
be used with any tree-based abstract model in which each node in the tree has a universal
resource identifier (URI) associated with it. The protocol is designed to support XML
and JSON (as well as the binary encoding used in some scenarios) and to use standard
remote procedure call (RPC) mechanisms such as JSON-RPC over TCP.
For northbound and southbound API references and tools, Cisco DevNet offers a single
central repository. On this site, you can find learning materials for network programma-
bility basics, APIs, tools, a developer sandbox, sample code on GitHub (which includes
scripts and libraries for developers of Cisco ACI), and so on. Also, you can use this site
to find communities of interest, get access to support, and find more topics on this sub-
ject. You can register for Cisco DevNet at https://developer.cisco.com/.
and other future solutions yet to come. Together, ACI Anywhere solutions transform
ACI into a true hybrid cloud solution. Cisco ACI Anywhere facilitates application agility
and data center automation. It automates management of end-to-end connectivity and
enforcement of consistent security policies for applications running throughout the
edge-to-cloud continuum.
Figure 9-16 illustrates various ACI Anywhere solutions that facilitate the any workload,
any location, any cloud strategy.
Figure 9-16 ACI Anywhere: Remote Leaf/Virtual Pod, Multi-Site, Public Cloud, and Environment Extensions
In Chapter 8, “Describing Cisco ACI,” we discussed various Cisco ACI deployment mod-
els that make Cisco ACI Anywhere possible, including ACI MultiPod, Nexus Dashboard
Orchestrator, Cloud ACI, Remote Leaf, and so on. Here, we will discuss some of the pop-
ular integration solutions for Cisco ACI that are an integral part of Cisco ACI Anywhere.
■ Connect data centers, branches, campuses, colocation facilities, and cloud to improve network speed, security, and efficiency.
■ Enforce a common set of access control policies uniformly throughout the enterprise.
■ Dynamically maps the application and service components to the Cisco ACI network
elements, thus providing a shared view of the application and infrastructure across
teams
■ Provides a dynamic view of application use in the infrastructure for the network
operations team
With this integration, you can correlate application service-level management with infra-
structure monitoring. This new integration significantly reduces the time it takes to iden-
tify and troubleshoot end-to-end application performance issues.
As a result of this integration, the attack surface is greatly reduced, and any unauthor-
ized or suspicious access to resources and potential threats can quickly be controlled and
remediated.
This integration allows Cisco ACI to provide a ready-to-use, secure networking environ-
ment for Kubernetes. The integration maintains the simplicity of the user experience in
deploying, scaling, and managing containerized applications while still offering the con-
trols, visibility, security, and isolation required by an enterprise.
The Cisco ACI and Kubernetes solution offers the following benefits:
■ Secure multitenancy
and tools. From the Nexus Dashboard, you can cross-launch any of the sites’ controllers,
including APIC, Cloud APIC, and Cisco Data Center Network Manager (DCNM) fabrics,
which drives the adoption of cloud-native application practices.
■ Easy to use
■ Single sign-on (SSO) for a seamless user experience across operation services
■ Easy to maintain
Summary
This chapter discusses Cisco ACI external connectivity options, various third-party solu-
tion integrations with Cisco ACI, Cisco ACI management and automation options, Cisco
ACI Anywhere, and Cisco Nexus Dashboard, including the following points:
■ The Cisco ACI fabric supports a wide range of methods to interconnect the fabric
with external networks, data center environments, or segments, including Layer 2
(L2Out) and Layer 3 (L3Out) connections.
■ A Layer 2 domain can be extended beyond the ACI fabric to the existing Layer
2 network via one of two methods: extending the EPG out of the ACI fabric or
extending the bridge domain out of the ACI fabric.
■ Cisco ACI allows you to define a sequence of meta-devices, such as a firewall of a certain type followed by a load balancer of a certain make and version, using a service
graph template, also known as an abstract graph.
■ Cisco APIC can be configured using a graphical user interface (GUI), command-line
interface (CLI), and API. The underlying interface for all access methods is provided
through a REST-based API.
■ All the physical and logical components that comprise the ACI fabric are represented
in a hierarchical management information model (MIM), also referred to as the MIT.
■ Queries on the MIT can be performed using three levels: tree level, class level, and
object level.
■ The Cisco Nexus Dashboard has three main components: Nexus Dashboard
Orchestrator, Nexus Dashboard Insights, and Nexus Dashboard Data Broker.
■ Cisco Nexus Dashboard Insights provides the ability to monitor and analyze the
fabric in real time to identify anomalies, to provide root-cause analysis and capacity
planning, and to accelerate troubleshooting.
■ The Cisco Nexus Dashboard Data Broker is a simple, scalable, and cost-effective
solution for data center, enterprise, and service provider customers who need to
monitor high-volume and business-critical traffic.
References
“Cisco Application Centric Infrastructure Solution Overview,” https://www.cisco.com/c/
en/us/solutions/collateral/data-center-virtualization/application-centric-infrastruc-
ture/solution-overview-c22-741487.html
“Cisco Application Centric Infrastructure Fundamentals, Release 5.2(x),” https://www.
cisco.com/c/en/us/td/docs/dcn/aci/apic/5x/aci-fundamentals/cisco-aci-fundamentals-
52x.html
“ACI Fabric L3Out Guide,” https://www.cisco.com/c/en/us/solutions/collateral/data-
center-virtualization/application-centric-infrastructure/guide-c07-743150.html
Chapter 10
The data center is the home for applications. The servers provide the computing resources,
such as CPU and memory, for the applications to run. The applications consume these
resources as they process the requests from users or other applications and they return
the needed information. The applications need to be capable of receiving, organizing,
structuring, storing, and sending data. This all means that the applications work with
data, and this data needs to be not only processed but also stored in a secure and redun-
dant manner.
This chapter discusses the different options for storing data in the data center. You will
learn about the different components that make up the storage solutions. There are different ways of transporting and storing data, protocols that allow the operating systems and applications to communicate with the physical storage devices, and transport protocols that carry the data between the storage systems and the servers on which the applications are running.
are multiple applications, the generated amounts of data are usually very large and require
storage solutions that can scale up to meet the needs.
Storage solutions always start with two main components: the servers, which need to
store or read data and are the active part of such a communication (and are therefore
called initiators), and the storage systems, which receive and store data from the servers,
or read and send data to the servers in response to their requests. The storage systems do
not start communication with the servers; they cannot initiate it. They can only wait for
communication to start, and because it is intended for them, they are called targets.
The storage systems hold multiple storage elements: HDDs or SSDs (we’ll call them
disks for simplicity). This is the physical storage in a storage system. There are also the
controllers that manage these disks and that manage the communication of the storage
system. Also, there is the software that manages, abstracts, and virtualizes the underly-
ing hardware. In general, the physical disks are abstracted and virtualized on the storage system by combining the physical drives into logical units. Volumes are created from these logical units and are assigned identifiers called Logical Unit Numbers (LUNs). A LUN is the address the initiator needs in order to access the correct storage resource. Access to the LUNs is typically controlled through mechanisms such as the following:
■ LUN masking: The real LUNs are not exposed (visible) to the SAN and the initiators. The exposed LUNs are different, usually 0 or 1, and an internal table maps the initiator's request to the correct LUN based on the initiator's identity.
■ Initiator groups: These are similar in concept to access lists and define which initiators are allowed to access which volume.
Although this book focuses on Cisco data center solutions, the information for the
storage systems is very important, because you need to know it when you design a data
center storage solution as well as when you need to perform troubleshooting. When trou-
bleshooting the communication between the initiator and the target, you have to follow
the whole path of the communication to understand the protocol used, to know which
components are configured, and to know where they operate in the storage network
system.
The storage systems and the servers can be directly connected or can go through a com-
munication infrastructure. Communication between the initiators and the targets can use
different transport protocols. Examples are the Fibre Channel Protocol (FCP) and the
Internet Protocol (IP). What transport protocol will be chosen depends on the type of
network storage. There are two types: block-based and file-based. The decision of which
network storage system type will be used depends on the specific needs of the data cen-
ter, the requirements, and the infrastructure.
Storage Protocols
With block-based storage, the data is sent to the storage systems and stored in blocks.
The data is broken down into blocks of a specific, fixed size. The storage system decides
where, physically, each block will be stored. When the initiator requests specific data, the
storage system reads the blocks that form the data and returns them to the server. The protocols used for block-based storage communication use unique addressing so that the initiator can break the data down into blocks and transport them to the storage system, the storage system can write the blocks, and both sides know to which piece of data each block belongs. The addressing of the blocks is also important so that the blocks can be put back together in the correct order.
The block-based protocols used with local storage (the storage inside a PC) include ATA/SATA, SCSI, Serial Attached SCSI (SAS), and NVM Express (NVMe). The local storage is internal to the computer and uses a bus-based architecture. This means that the communication protocol uses a hardware bus to communicate between the operating system and the storage.
Figure 10-1 illustrates the different protocols using block-based storage and their trans-
port options.
Figure 10-1 Block-Based Storage Protocols (SCSI, Fibre Channel, iSCSI, FCIP, FCoE) and Their Transport Options
■ FC Protocol: The SCSI commands are encapsulated in FC frames and transported between the initiators and the targets. The communication takes place in SANs.
■ FC over IP (FCIP): The SCSI commands are encapsulated in FC frames, but the need to transport these frames over wide area networks, or over long distances, requires the FC frames to be encapsulated in TCP/IP and carried over IP networks.
■ FC over Ethernet (FCoE): The SCSI commands are encapsulated in FC frames, but the infrastructure is built with Ethernet switches that support the FCoE protocol and lossless Ethernet. The FC frames are encapsulated in Ethernet frames and transported over that infrastructure.
■ Internet Small Computer Systems Interface (iSCSI): The SCSI commands are encapsulated in TCP packets and transported over an IP network. The Transmission Control Protocol (TCP) provides reliable, in-order delivery for the transmissions.
A file-based protocol is used to transfer files between a server and a client. The com-
munication happens over a network. Note that with file-based storage, the atomic unit
transferred is a file, whereas with the block-based storage, it’s a block of data. The pro-
tocols used for the file-based storage communication are the Network File System (NFS)
on UNIX/Linux systems and Server Message Block (SMB), originally developed by IBM
Corporation, but later adopted and enhanced by Microsoft, Intel, IBM, 3Com, and others.
The most popular version (dialect) of SMB is the Common Internet File System (CIFS).
The CIFS and NFS protocols communicate over TCP/IP. Because of this, latency is higher than with block-based SAN protocols.
Which network storage approach (block- or file-based) is used and which protocols are
used depend on the specific storage needs. The block-based storage characteristics are as
follows:
■ Benefits:
■ Security and data safety: The protocols used to transfer data blocks are designed
to create secure, reliable, and safe communication between the initiators and the
targets.
■ Scalability and flexibility: The storage systems’ capacities and the SAN infra-
structures allow for rapid scaling to keep up with the growth and demands of the
organizations.
■ Easy file modifications: With block-based storage, when a file is changed, only the blocks that changed are written in the storage system, not the whole file.
■ Drawbacks:
■ Benefits:
■ Simplicity: File-based storage is simpler to implement and does not require spe-
cialized hardware.
■ File sharing: File-based storage uses sharing to provide access to file resources to
multiple users.
■ Common protocols: The SMB/CIFS and the NFS are components of the
Windows and Linux/Unix operating systems.
■ Cost: File-based storage allows for cheaper means to share backup/archive files, as
there is no need for specialized infrastructure or communication adapters.
■ Drawbacks:
■ Traffic isolation: Traffic isolation is based on the use of VLANs, which might
pose a security issue.
■ NFSv2: Defined in RFC 1094 (March 1989). Originally used the User Datagram Protocol (UDP) for stateless communication. Because it is 32-bit only, only the first 2GB of a file can be read.
■ NFSv3: Defined in RFC 1813 (June 1995). Developed to overcome some of the
NFSv2 limitations. It is 64-bit, which allows it to handle files bigger than 2GB.
Supports additional file attributes in the replies and asynchronous writes to the NFS
server. Added support for TCP-based communication.
■ NFSv4: First defined in RFC 3010 (December 2000) and revised in RFC 3530 (April
2003). There are additional minor versions, such as NFSv4.1 and NFSv4.2, that add
support for clustered server deployments, parallel access to files distributed among
different servers, server-side clone and copy, space reservation, application data block (ADB), and session trunking (also known as NFS multipathing). NFSv4 adds
support for end-to-end security, such as Kerberos 5. A big advantage is that the NFS
server runs the service on a single port (TCP/UDP 2049). The latter makes control-
ling the NFS communication through firewalls and applying quality of service (QoS)
much easier.
NFS is supported on many operating systems, including the following:
■ Solaris
■ AIX
■ HP-UX
■ Apple’s macOS
■ FreeBSD
■ Microsoft Windows (it’s more common to use SMB/CIFS on the Windows OS)
■ IBM OS/2
■ OpenVMS
The NFS is based on a client/server architecture. This approach is based on the concept
that one computer, which will be the server, has resources that are needed by another
computer. This other computer is the client. The server will share, or make available, the
needed resources to the client. The client and the server communicate with each other
through established protocols, which in this case are the protocols defined by the NFS.
The client will consume the shared resources as if they were local. The communication
takes place in a network environment. If there are multiple servers and multiple clients, or
multiple clients and one server, it is a distributed client/server network.
Using the client/server approach is beneficial because of the cost reduction, as there is no
need for any additional hardware and there are minimum space requirements. The clients
do not need a lot of local disk space because they use storage on the server. Other ben-
efits are the centralized management and maintenance, such as backups, performed on
the server.
As the NFS operates at Layer 7, the communication between the client and the server is
basically communication between two applications. The applications in a client/server
architecture communicate using Remote Procedure Call (RPC), which makes it pos-
sible for a client process to call a server process and ask it to execute the call as a local
process. The rpcbind, or portmap, is an RPC service that allows clients and servers to
communicate with one another by using interprocess communication methods. In other
words, the rpcbind (portmap) is a service that takes care of the addressing in the com-
munication between services. It is responsible for mapping the service to a port on which
it listens. The RPC services are assigned ports, as long as they are registered with the
portmap with a program number, version, and transport protocol. The portmap service is
usually running on TCP/UDP port 111, also known as a privileged port in Unix. The NFS
servers use the UDP/TCP port 2049 by default.
Based on this information, the NFS flow between the NFS server and client can be
described with the following sequence:
1. The NFS server must run the needed services and daemons, such as mountd, nlm_
main, NFSD, status monitor, quota daemon, and portmap (also known as rpcbind).
These services are required, as each has its own role in that communication. The
portmap is responsible for the mapping of the ports to the services and announcing
this; the mountd allows the client to mount the storage and so on.
2. The storage that will be shared and available must be exported under the /etc/exports
configuration file.
3. The client uses the mount command to connect to and use the shared resources (or
using the NFS terminology, to mount the shared storage).
a. The client will call the portmap service on the NFS server to find on which port
to communicate.
b. The client will connect to the NFSD.
c. The NFSD will proxy the call to the mountd service.
4. The client will be able to access the shared storage.
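The client side of this flow can be scripted. The minimal Python sketch below wraps the mount command from step 3; the server address, export path, and mount point are placeholders, and it assumes a Linux host with an NFS client installed and root privileges.

import subprocess

# Placeholders: adjust for your environment.
server = "192.0.2.10"        # NFS server address
export = "/srv/projects"     # path exported in /etc/exports on the server
mountpoint = "/mnt/projects" # local directory to attach the share to

# Step 3: the mount command contacts portmap/rpcbind on the server, locates the
# mountd and nfsd services, and attaches the remote export to the mount point.
subprocess.run(["mount", "-t", "nfs", f"{server}:{export}", mountpoint], check=True)

# Step 4: the shared storage is now accessible as if it were a local directory.
print(subprocess.run(["ls", mountpoint], capture_output=True, text=True).stdout)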
Figure 10-3 illustrates the NFS client/server architecture and the communication over the
network infrastructure.
■ SMB/CIFS/SMB1: The SMB/CIFS protocol was created to use NetBIOS over TCP/IP (NBT) communication. Starting with the Microsoft Windows 2000 Server operating system, the communication was changed to use TCP as a transport utilizing TCP port 445, used natively in Microsoft Windows Server 2003. In 1996, Microsoft renamed the protocol to Common Internet File System after Sun Microsystems announced the WebNFS initiative.
■ SMB 2.0: Released in 2006 and supported by Windows Server 2008 and Windows Vista. Improved the communication by decreasing the number of handshake messages. Supports symbolic links and the HMAC SHA-256 hashing algorithm for signing. SMB 2.0 uses 32-bit and 64-bit-wide storage fields, and 128 bits for file handles, which allowed for improved performance when copying large files. Fully supported in Samba 3.6.
■ SMB 3.0: Introduced the SMB Direct Protocol, SMB Multichannel, and SMB
Transparent Failover. These features provided better support for data centers.
■ SMB 3.1.1: Released in Windows Server 2016 and Windows 10. Requires secure
negotiation and supports AES-128 encryption.
The SMB/CIFS protocol is supported on the Microsoft Windows Server and desktop operating systems, but it can also be used with Linux/Unix and the Apple operating systems. This can be achieved by using the Samba software, developed in 1992 by Andrew Tridgell. Here are some of the services and protocols Samba supports:
■ SMB/CIFS
■ DCE/RPC
■ WINS
■ NTLM
As already mentioned, the CIFS uses the client/server architecture, but in fact it consists
of three separate entities:
■ CIFS client: This piece is on the end-user machine and is capable of communicating
with the CIFS server using the CIFS protocol. It is like a driver on your operating
system; it has all the needed functionality to communicate using this protocol. The
requests to the CIFS server always originate from the CIFS client.
■ CIFS Server: This entity provides the CIFS functionality. As the CIFS protocol pro-
vides access to shared file systems, but also supports the communications between
processes on different systems, the CIFS server includes multiple subcomponents and
subsystems responsible for the needed functionality. Such components and systems
are the SMB Trans, SMB Trans2, and NT Trans, forming the transactions-processing
subsystem. There are also the RPC and the user authentication pieces, as well as the
Remote Administration Protocol (RAP) and the Distributed File System (DFS).
■ CIFS Application: This entity triggers the communication between the CIFS client and the server. The CIFS client and server are the pieces that can communicate with each other using the CIFS protocol. The application is the piece that actually utilizes the functionality offered by the CIFS server. It cannot natively communicate with the server; that's why it uses the CIFS client.
The SCSI protocol was initially implemented for communication between the operating
system and the local storage, and later it was used for communication with storage that’s
reachable over a network.
Initially, the communication took place over a parallel SCSI bus. The parallel SCSI bus is half-duplex, which means that at a given moment only a single device can communicate: either a request can be sent or a response can be received. Because commands, data blocks, and status messages are all exchanged over this shared, multidrop channel, some commands might be dropped while the channel is busy. The parallel SCSI bus had limitations such as the following in its latest version:
■ A maximum of 16 devices
■ Half-duplex
As already mentioned in the overview of the storage protocols, there are other transports
for the SCSI protocol that allow its usage as a block-based protocol over larger distances
and bigger infrastructures.
The SCSI protocol itself has the task of performing two major functions:
■ To form and exchange units that contain commands, blocks of data, and status
messages
The SCSI protocol sits between the operating system and the peripheral resources.
Through the operating system, the applications are capable of using the SCSI protocol
for communication with the storage devices and other peripheral units, such as printers.
The architecture of the SCSI protocol is shown in Figure 10-4. The Extended Capabilities Port (ECP) is a standard signaling method for bidirectional parallel communication between a computer and peripheral devices. The figure illustrates that the communication of the SCSI protocol can happen over different transport mediums, such as ECP and IP networks.
The SCSI architecture defines the communication between the initiators and the targets.
As initially that communication was taking place over a parallel SCSI bus, the architecture
was defined for such a transport. Later, with the use of the iSCSI and FC Protocol, the
commands’ definition continued to be used with some minimal changes.
The SCSI commands are sent in a Command Descriptor Block (CDB), which consists of
an operation code and command-specific parameters. In general, the initiator starts the
communication with a request, which is a CDB, to the target. Then the target returns
a status message. There are more than 60 SCSI commands, grouped into four types: N
(non-data), W (write from initiator to target), R (read data), and B (bidirectional).
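To make the CDB structure more concrete, the Python sketch below builds a 10-byte READ(10) CDB, a common SCSI read command: a one-byte operation code followed by command-specific parameters such as the starting logical block address (LBA) and the transfer length. The LBA and block count values are arbitrary examples.

import struct

def build_read10_cdb(lba: int, num_blocks: int) -> bytes:
    """Build a 10-byte SCSI READ(10) Command Descriptor Block."""
    opcode = 0x28   # operation code for READ(10)
    flags = 0x00    # protection/DPO/FUA bits left clear
    group = 0x00
    control = 0x00
    # Big-endian fields: opcode, flags, 32-bit LBA, group number,
    # 16-bit transfer length (in blocks), control byte.
    return struct.pack(">BBIBHB", opcode, flags, lba, group, num_blocks, control)

cdb = build_read10_cdb(lba=2048, num_blocks=8)
print(cdb.hex())   # prints the ten CDB bytes in hexadecimal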
The SCSI protocol uses a specific addressing, called a SCSI ID, to support the commu-
nication between the different participants. For the purposes of better utilization and
flexibility, the physical storage is divided into logical units. Because of this, for the stor-
age systems it is not enough to use only a SCSI ID as a form of identification, as it points
to the physical devices only. It also needs to identify a specific logical unit on top of the
physical storage. For this, the Logical Unit Numbers (LUNs) are used, which are the num-
bers that identify the logical units. In this way, the combination between the SCSI ID and
the LUN forms the address used in the SCSI communication.
Each of these designs has its advantages and disadvantages, which makes them suitable
for different needs and applications.
On one hand, this is an advantage, as the storage provides all its resources to the host and
the access is controlled and secured, which also can be described as captive storage. But,
on the other hand, this can be looked at as a disadvantage, as the resources of the stor-
age system are available to and controlled by a single host, which means storage system
capacities can be underutilized.
The DAS can also be used by other hosts across a network if it is shared by its host. Still,
it’s the host that is in control of who uses the resources of the storage system and how
they are used. Implementing such a sharing approach can be complex and difficult to
monitor and manage.
The resources of the storage system are limited, and even though in certain situations the
host might not be able to utilize all of them, there might be situations in which the host
will need more resources and the storage system cannot provide more than it has. This
means that scalability with the DAS is limited. Using DAS for the purposes of creating
and storing backups for multiple hosts is not a good application.
An NAS device usually has multiple physical hard drives. The storage controllers on that
device or server support virtualization technologies, such as RAID, to manage the physi-
cal drives, create logical volumes from them, and provide the needed levels of redundan-
cy and data protection.
The speed with which the data can be accessed and transferred depends on both the
hardware resources and configuration of the physical hard drives on the NAS side as well
as the capacities and the speed of the networks.
These characteristics of NAS make it suitable for sharing files and directories with mul-
tiple users, where speed is not of critical importance. However, NAS is not suitable for
the purposes of running mission- and latency-critical applications, as it does not have the
needed scalability, secure data transfer, and secure storage that block-based storage does.
The initiators usually are the servers in the data center because the applications run on
them, and therefore they are the consumer of the services of the storage systems. The
servers also need to be physically equipped with HBAs to connect to the SAN.
The communication component of the SAN is built by using specialized switches. These switches use the Fibre Channel Protocol for the communication between the initiators and the targets connected to them. The three major participants—the initiators, the targets, and the FC switches—create the SAN fabric. The fabric is not only the sum of the
physical components but also the convergence of all the needed processes and services
for the FC Protocol to run in this environment.
The FC switches for physical connectivity use fiber optics and special FC transceivers.
This creates a very fast infrastructure, as the speeds supported can be 2, 4, 8, 16, 32, or
64Gbps.
The Fibre Channel Protocol (FCP) is the transport protocol used in the SAN. It provides
a high-speed, secure, and lossless exchange of data, sent in blocks, in the form of SCSI
commands. It is standardized by the Technical Committee T11 of INCITS, an ANSI-
accredited standards committee. Something interesting is that although nowadays the FCP
is used to transport the SCSI commands, it is still a transport protocol and can also be
used to transport the communication for other protocols, as long as you can encapsulate
the data units of another protocol in FC frames. This is not common; however, it is
just mentioned here to make a clear difference between the transport, the FCP, and the
block-storage protocol, in this case the SCSI protocol.
The FCP was standardized in 1988, with further approval from ANSI in 1994. Cisco
Systems entered the SAN market in 2002 with the acquisition of Andiamo Systems, Inc.,
which is the company that developed the first intelligent multilayer storage switches. The
same year, Cisco released the Cisco MDS switches, FC switches running the SAN-OS
(Cisco’s implementation of FCP).
A SAN, illustrated in Figure 10-7, provides a lot of benefits for data centers:
■ Up to 16 million devices
To connect to an FC SAN, the initiators and the targets need to use specialized communication adapters designed to process the Fibre Channel Protocol. These adapters, as already mentioned, are called host bus adapters (HBAs). The HBAs process the FCP in their silicon, which provides faster and more secure communication. Figure 10-8 shows a comparison between the stack of an HBA and an NIC.
Figure 10-8 Comparison in the Stack Processing Between an Ethernet NIC and an FC HBA
The HBAs are similar to the Ethernet network interface cards (NICs) in terms of function—both are used to provide connectivity to a network environment for the purposes of communication. However, there is a huge difference in the way they function.
The Ethernet NICs rely on the software drivers in the operating system for protocol-
related functions and processing, such as flow control, sequencing, error correction, seg-
mentation, and others.
In comparison, the processing of the Fibre Channel Protocol stack happens in the hard-
ware of the HBA. The SCSI protocol passes the commands to the HBA, where the FCP
frames are formed and the physical connectivity is established. This takes the load off the
device’s CPU and allows for better control.
The communication between the initiator and the target needs to be secure and reliable, as
the FC frames carry the data, divided into blocks. It is important that the data is reassem-
bled in the correct order at the destination of the communication and that all the blocks
are present. That’s why there are multiple mechanisms and stages of FCP communication to
take care of that. Figure 10-9 shows the different components of the FCP communication.
[Figure 10-9 shows the FCP building blocks between an initiator and a target: words form FC frames, FC frames are grouped into sequences (sequence 1, sequence 2, up to sequence N), and all the sequences together form an exchange.]
In FCP communication, the smallest piece of data transmitted over the FC links is
called a word. The size of a word is 32 bits (4 bytes), which is serialized into 40 bits using
8b/10b encoding. The words form the FC frames. A series of FC frames sent in one
direction is called a sequence, and all the sequences that form the whole conversation
between an initiator and a target are called an exchange.
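To make the hierarchy concrete, here is a minimal Python sketch (the frame and sequence sizes are hypothetical, not taken from the text) that groups words into frames, frames into sequences, and sequences into an exchange, and shows how 8b/10b encoding expands each 32-bit word to 40 bits on the wire:

```python
# Illustrative sketch of the FCP data hierarchy described above.
# The word counts per frame and the number of frames are made up.

WORD_BITS = 32            # one FC word = 4 bytes
ENCODED_WORD_BITS = 40    # 8b/10b: every 8 bits become a 10-bit character

def wire_bits(words: int) -> int:
    """Bits actually serialized on the fiber for a number of FC words."""
    return words * ENCODED_WORD_BITS

# An exchange is a list of sequences; a sequence is a list of frames;
# a frame is represented here simply by its size in words.
exchange = [
    [537, 537],   # sequence 1: two hypothetical full-size frames
    [537],        # sequence 2: one more frame back to the initiator
]

total_words = sum(sum(sequence) for sequence in exchange)
print(f"words in the exchange: {total_words}")
print(f"payload bits:          {total_words * WORD_BITS}")
print(f"bits on the wire:      {wire_bits(total_words)}  (25% 8b/10b overhead)")
```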
For clarity, here are two examples of FCP communication. The first one is a SCSI-FCP
read operation, where an initiator requests data stored on the target to be read and sent to
it. The second example is of a SCSI-FCP write operation, where an initiator opens com-
munication with a target and sends data to be stored.
The SCSI-FCP read operation, which is illustrated in Figure 10-10, consists of the
following steps:
Step 1. The initiator generates a SCSI read request (FCP_CMD). The server notifies
the storage system that it wants to read data stored on it.
Step 2. The HBA of the initiator encapsulates the SCSI Read command into an FC
frame and sends it to the target. This process is the first sequence (sequence 1)
of the whole exchange.
Step 3. The target receives and processes the FC frame. It retrieves the requested data
(FCP_DATA) from storage and encapsulates the data blocks in FC frames.
Step 4. The target sends the FC frames to the initiator in one sequence (sequence 2).
Step 5. The target generates a status command (FCP_RSP) that informs the initiator
that the requested data transmission is complete.
Step 6. The target encapsulates the status command in an FC frame and sends it in
another sequence (sequence 3).
[Figure 10-10 shows the read exchange between the initiator and the target: sequence 1 carries the FCP_CMD (SCSI read request), sequence 2 carries the FCP_DATA frames, and sequence 3 carries the FCP_RSP (transmission complete).]
At this point, the I/O operation is complete. The collection of all three sequences is
the entire exchange.
The SCSI-FCP write operation, illustrated in Figure 10-11, consists of the following steps:
Step 1. The initiator node generates a SCSI write request (FCP_CMD). The server
notifies the storage system that it wants to send it data to be stored.
Step 2. The HBA of the initiator encapsulates the SCSI-FCP Write command in an
FC frame and sends it to the target in a single sequence (sequence 1).
Step 3. The target node responds with a SCSI write request response (FCP_XFR_
RDY). The write request response is required for synchronization between the
initiator and target.
Step 4. The target response is encapsulated in an FC frame and sent to the initiator in
another sequence (sequence 2).
Step 5. The initiator retrieves the data (FCP_DATA) from its upper-layer protocol
(ULP) buffers and packages it.
Step 6. The data is encapsulated in FC frames and sent to the target in a sequence
(sequence 3).
Step 7. The target generates a status command (FCP_RSP) to confirm the end of the
exchange.
Step 8. The target encapsulates the status command in an FC frame and sends it as the
last sequence of the communication (on the diagram, this is sequence 4). All
four of these sequences form the entire exchange needed to store data on a
storage system. (A compact restatement of both exchanges follows these steps.)
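The two exchanges can be restated compactly as data. The following Python sketch mirrors the steps just described; the printing helper and the direction strings are only illustrative:

```python
# Simplified model of the SCSI-FCP read and write exchanges described above.
READ_EXCHANGE = [
    ("FCP_CMD",     "initiator -> target", "SCSI read request"),
    ("FCP_DATA",    "target -> initiator", "requested data blocks"),
    ("FCP_RSP",     "target -> initiator", "transmission complete"),
]

WRITE_EXCHANGE = [
    ("FCP_CMD",     "initiator -> target", "SCSI write request"),
    ("FCP_XFR_RDY", "target -> initiator", "target ready to receive"),
    ("FCP_DATA",    "initiator -> target", "data blocks to store"),
    ("FCP_RSP",     "target -> initiator", "transmission complete"),
]

def print_exchange(name, sequences):
    """Each tuple is one sequence of the exchange, in order."""
    for number, (ib, direction, meaning) in enumerate(sequences, start=1):
        print(f"{name} sequence {number}: {ib:11s} {direction:22s} {meaning}")

print_exchange("read", READ_EXCHANGE)
print_exchange("write", WRITE_EXCHANGE)
```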
[Figure 10-11 shows the write exchange between the initiator and the target: sequence 1 carries the FCP_CMD (SCSI write request), sequence 2 carries the FCP_XFR_RDY (write request synchronization), sequence 3 carries the FCP_DATA frames with the data to store, and sequence 4 carries the FCP_RSP (transmission complete).]
The FCP supports three connectivity topologies:
■ Point-to-Point
■ Arbitrated Loop
■ Switched Fabric
Point-to-Point Topology
The point-to-point topology is the smallest and simplest one. It can be qualified as a
DAS, as the target is directly connected to the initiator. The initiator and the target are
equipped with Fibre Channel HBAs and use fiber optics with FC transceivers for the
physical connectivity.
The obvious drawback of this architecture is that the target is dedicated to a single initia-
tor only, meaning that either the resources can be underutilized or they can’t match the
requirements of the server’s workloads. If designed correctly, such a topology might be
used for some specific applications requiring more security and control. As this topology
lacks scalability, it is not common in data centers. The topology is shown in
Figure 10-13.
[Figure 10-13 shows the point-to-point topology: a host with an HBA connected directly to FC storage. A companion figure shows several hosts with HBAs connected through an FC hub in an arbitrated loop.]
FC-Arbitrated Loop
The devices can be physically connected to each other through their HBAs in a loop
topology. In this setup, if a device's HBA malfunctions, the ring is broken and there is no
communication. To overcome this limitation, you can use a Fibre Channel hub. As it is a
passive device (unlike an FC switch), it does not process the communication but instead
(in case of a failed HBA) preserves the physical connectivity for the rest of the partici-
pants in the loop.
In the FC Protocol, the ports are assigned different roles, based on which participant
they belong to and what they connect to. The ports, which are used to connect to an
FC-AL, are assigned the role of NL_ports, which stands for Node Loop ports.
When there is an FC-AL that connects only NL_ports, meaning it is not connected to an
FC switch, this is called a “private arbitrated loop.”
When the FC-AL uses a hub, and one of the ports is connected to an FC switch, which
is called an FL_port (Fabric Loop port), this topology is usually referred to as a “public
arbitrated loop.”
The performance of the arbitrated loop is affected by the type of communication defined
by the physical connectivity in a ring—only two of the participants can speak at a time.
This introduces very high latencies.
Another drawback is the lack of redundancy in a private loop topology, as the failure of a
node will bring down the whole loop.
Switched Fabric
The iSCSI protocol supports IPsec for secure connectivity as well as authentication.
Because iSCSI allows for the communication between hosts that use the SCSI protocol
over a network, using TCP as a transport, there are some additional concepts and charac-
teristics to allow for the encapsulation and de-encapsulation of that traffic.
■ iSCSI initiator: This is the server/host that wants to use the resources of a storage
system
■ iSCSI target: A storage system that can communicate using the iSCSI protocol
Each iSCSI network entity contains an iSCSI node, which can be either an initiator or a
target. The iSCSI node is identified by an iSCSI node name.
The iSCSI node needs to be capable of utilizing the IP network connected to the
iSCSI network entity to which it belongs. For this, the so-called network portal is used.
This component has network access, supports TCP/IP, and is identified by an IP address.
Usually this is the network adapter, which can also be a wireless one.
Here, you can clearly see the different layers of processing data with iSCSI. The iSCSI
node takes care of the mapping and encapsulation between the SCSI protocol and the
underlying TCP transport protocol. The iSCSI node can be considered an overlay virtual
component on top of the network portal. The concept is shown in Figure 10-16.
[Figure 10-16 shows two iSCSI nodes, each overlaid on a network portal, communicating with each other over the IP network.]
The iSCSI nodes use specific addressing known as iSCSI node names. The size of the node
names can be up to 255 bytes. They need to use UTF-8 encoding in a human-readable
string. They are used for target discovery and for login and authorization purposes.
■ iSCSI Qualified Name (IQN), defined in RFC 3720, is one of the most popular types
of iSCSI addressing. The fields in an IQN carry the type identifier (“iqn”), a date
(year and month) associated with the naming authority, the reversed domain name of
the naming authority, and an optional string that uniquely identifies the node.
■ Extended Unique Identifier (EUI), which is a 64-bit address and starts with “eui”.
■ T11 Network Address Authority (NAA), which starts with “naa” (an informal format
check of the three prefixes follows this list).
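The following sketch is only illustrative; the example node names below are hypothetical and simply modeled on the three formats listed above:

```python
# Hypothetical iSCSI node names, one per naming format described above.
EXAMPLE_NAMES = [
    "iqn.2001-04.com.example:storage.disk1",   # IQN: date + reversed domain + string
    "eui.02004567A425678D",                    # EUI: 64-bit identifier in hex
    "naa.52004567BA64678D",                    # NAA: T11 Network Address Authority
]

def name_format(node_name: str) -> str:
    """Return which iSCSI naming format a node name appears to use."""
    for prefix in ("iqn.", "eui.", "naa."):
        if node_name.startswith(prefix):
            return prefix.rstrip(".")
    return "unknown"

for name in EXAMPLE_NAMES:
    print(f"{name_format(name):7s} {name}")
```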
Solid-state drives (SSDs) were developed later. These are storage drives, but instead of being
electromechanical devices with spinning components, they are based on flash memory.
The SSDs are much faster because they work much like RAM; however, they are non-
volatile memory, which means that the data will be preserved even after the power is off.
At the beginning, SSDs, which replaced HDDs, used the SCSI protocol for communica-
tion. This meant that there had to be a SCSI controller to manage them and also to com-
municate with the CPU, and the same message-based communication was still in place.
Figure 10-18 illustrates the SCSI SSD communication and compares it to the communica-
tion between the CPU and the RAM.
[Figure 10-18 shows the CPU reaching RAM directly over the memory channel, while the SCSI SSD is reached over PCIe through a SCSI controller.]
To overcome this challenge, Non-Volatile Memory Express (NVMe) was devel-
oped, also known as the Non-Volatile Memory Host Controller Interface Specification
(NVMHCIS). NVMe is a standard, or specification, that defines how non-volatile
media (NAND flash memory, in the form of SSDs and M.2 cards) can be accessed over
the PCIe bus using memory semantics. Put differently, it defines how flash memory
storage can be treated as memory, just like RAM, so that the SCSI controller and protocol
become obsolete. Figure 10-19 illustrates the NVMe concept.
[Figure 10-19 shows the CPU reaching RAM over the memory channel and the NVMe SSD directly over PCIe, with no SCSI controller in the path.]
NVMe has been designed from the ground up and brings significant advantages:
■ No adapter needed.
For the sake of comparison, with the SCSI protocol and a SATA controller, there is
support for one queue and up to 32 commands per queue. With NVMe, there are 65535
queues, each supporting 65535 commands. This shows the huge difference and impact
of NVMe as a technology, especially in the data center, where the constantly increas-
ing workloads demand an increase in resources, including the storage devices and the
communication with them.
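The scale of the difference is easy to quantify. The following snippet simply multiplies the numbers quoted above; it assumes nothing beyond them:

```python
# Outstanding-command comparison using the figures quoted in the text.
ahci_outstanding = 1 * 32             # SATA/AHCI: 1 queue x 32 commands
nvme_outstanding = 65535 * 65535      # NVMe: 65535 queues x 65535 commands each

print(f"AHCI/SATA outstanding commands: {ahci_outstanding}")
print(f"NVMe outstanding commands:      {nvme_outstanding:,}")
print(f"Roughly {nvme_outstanding // ahci_outstanding:,} times more")
```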
Cisco Unified Computing System (UCS) C-series rack and B-series blade servers support
NVMe-capable flash storage, which is best suited for applications requiring ultra-high
performance and very low latency. This is achieved by using non-oversubscribed PCIe
lanes as a connection between the multicore CPUs in the servers and the NVMe flash
storage, bringing the storage as close as possible to the CPU.
The NVMe brings a huge improvement in the performance of the local storage, but in
the data centers the data is stored on storage systems, as it needs to be accessed by dif-
ferent applications running on different servers. The adoption of SSDs as a technology
also changed the DC storage systems. Now there are storage systems whose drives are
only SSDs, also called all-flash arrays. For such storage systems to be able to communi-
cate with the servers and to bring the benefits of the flash-based storage, Non-Volatile
Memory Express over Fabrics (NVMe-oF) was developed. It specifies how a transport
protocol can be used to extend the NVMe communication between a host and a storage
system over a network. The first specification of this kind was NVMe over Fibre Channel
(FC-NVMe) from 2017. Since then, the supported transport protocols for the NVMe-oF
are as follows (see Figure 10-20):
■ TCP (NVMe/TCP)
■ InfiniBand (NVMe/IB)
[Figure 10-20 shows all-flash storage (SSDs) reached over NVMe-oF transports such as RDMA and TCP.]
The NVMe-oF uses the same idea as iSCSI, where the SCSI commands are encapsulated
in IP packets and TCP is the transport protocol, or Fibre Channel, where FCP is used to
transport the SCSI commands.
Fibre Channel is the preferred transport protocol for connecting all-flash arrays in the data
centers because of its secure, fast, scalable, and plug-and-play architecture. FC-NVMe
offers the best of Fibre Channel and NVMe. You get the improved performance of NVMe
along with the flexibility and scalability of the shared storage architecture.
FC-NVMe is supported by the Cisco MDS switches. Here are some of the advantages:
■ Multiprotocol support: NVMe and SCSI are supported over FC, FCoE, and FCIP.
■ Superior architecture: Cisco MDS 9700 Series switches have superior architecture
that can help customers build their mission-critical data centers. Their fully redun-
dant components, non-oversubscribed and nonblocking architecture, automatic
isolation of failure domains, and exceptional capability to detect and automatically
recover from SAN congestion are a few of the top attributes that make these switch-
es the best choice for high-demand storage infrastructures that support NVMe-
capable workloads.
■ Integrated storage traffic visibility and analytics: The 32Gbps products in the
Cisco MDS 9000 family of switches offer Cisco SAN Telemetry Streaming, which
can be combined with the FC-NVMe traffic with just a nondisruptive software-only
upgrade.
■ Strong ecosystem: Cisco UCS C-series rack servers with Broadcom (Emulex) and
Cavium (QLogic) HBAs.
Additionally, in the SAN-OS, and later in the NX-OS, there was interoperability support
for when the Cisco MDS FC switches would be deployed in a multivendor environment.
There were three interoperability modes supporting the specifics of Brocade, McData,
and standards-based Fibre Channel switches.
The SAN-OS was running on and managing the purpose-built Cisco MDS switches. There
were three separate management tools and options.
The SAN-OS provided a command line interface (CLI), which allowed administrators to
connect to the FC switch via a terminal emulator supporting Telnet and SSH. The alter-
native to the CLI was, and still is, the Device Manager. Figure 10-21 shows screenshots
from the Device Managers from different models of Cisco MDS switches.
Device Manager is a graphical user interface (GUI) that supports switch provisioning and
provides a graphical representation of the linecard and supervisor modules, ports, power
supplies, and so on. Here are some of the features supported:
The tool for discovering, provisioning, configuring, monitoring, and troubleshooting SAN
fabrics built with the Cisco MDS and Nexus switches was the Cisco Fabric Manager.
Figure 10-22 shows a screenshot from the Cisco Fabric Manager.
Later, with the development of the data center technologies, Cisco released the Nexus
family of switches. These switches support not only Ethernet environments but also the
Fibre Channel over Ethernet (FCoE) protocol. As the FCoE allows FC communication
over Ethernet networks, this provided support and flexibility for using different designs
in the data centers and led to optimization in the access layer by using Cisco Nexus
switches for both Ethernet and SAN communication. This duality in the nature of the
Cisco Nexus switches required a new operating system that could support both Ethernet
and FC communication. As a result, the NX-OS was released. It is a modular operating
system used on the Cisco Nexus and MDS switches. Based on the SAN-OS, the NX-OS
module was developed, which is responsible for storage communication.
These developments also affected the Cisco Fabric Manager. As the Cisco MDS and
Nexus switches were working together in the new converged designs for the data cen-
ter infrastructures, there was a need for a provisioning, management, monitoring, and
troubleshooting application that could support both the Ethernet and the SAN infra-
structures. Thus, the Cisco Fabric Manager evolved as the foundation for the Cisco
Data Center Network Manager (DCNM), a management solution for all NX-OS network
deployments spanning multitenant, multifabric LAN fabrics, SAN fabrics, and IP Fabric
for Media (IPFM) networking in the data center powered by Cisco.
■ Smart topology views showing virtual port channels (vPCs) and virtual device con-
texts for Cisco Nexus networks (topology views include VXLAN search)
■ Role-based access control (RBAC) within the fabric to separate administrative tasks
between functional domains
■ Storage bandwidth
■ Dashboards for a custom summary view of LAN and SAN domains and topology
groups
Figure 10-23 shows a screenshot from the topology view of the Cisco DCNM.
Cisco MDS 9000 multilayer SAN switches are designed based on the switched fab-
ric flexible hardware architecture. This is combined with the use of hardware buffers,
queues, and a virtual output queueing technique with a central arbiter to allow for cut-
through speeds, avoidance of head-of-line blocking, security, stability, and scalability.
Here are some of the important benefits of the Cisco MDS 9000 series SAN switches:
■ Flexibility: Multiprotocol support, support for 32Gbps Fibre Channel, and readi-
ness for 64Gbps Fibre Channel, 40G Fibre Channel over Ethernet (FCoE), and Non-
Volatile Memory Express (NVMe) over fabric.
The family of the Cisco MDS 9700 directors, shown in Figure 10-26, includes the
following platforms:
■ Cisco MDS 9718 is an 18-slot chassis with 16 linecard slots and up to 16 power
supplies. It’s the biggest one, occupying 26RU.
■ Cisco MDS 9710 is a 10-slot chassis with eight linecard slots and up to eight power
supplies. The middle-sized chassis is 14RU.
■ Cisco MDS 9706 is a six-slot chassis with four linecard slots and up to four power
supplies. It is 9RU.
Figure 10-26 Linecard Options and Scalability per Cisco MDS 9700 Director Chassis
Each director chassis supports up to two Supervisor modules. These modules run the
management and control planes, providing the management and control processes
needed for the switches' operation. Using two supervisors provides redundancy.
In the Cisco MDS 9700 Directors, the physical switching of the frames happens in the
crossbar fabric modules, also called just “fabric” modules. Each of the chassis has six
slots for fabric modules. Depending on the needs, there can be a different number of
fabrics installed in these slots, thus controlling the switching capacities. The minimum
number of fabric modules per chassis is three for the purposes of supporting a minimum
redundant configuration.
The linecards are line rate and support 48 × 32Gbps or 48 × 16Gbps FC ports, 48 × 10Gbps
FCoE ports, or 24 × 40Gbps FCoE ports.
■ Cisco MDS 9132T: 16 fixed FC ports operating at speeds of 4/8/16/32Gbps and a
slot for an additional module, which adds another 16 FC ports operating at the same
speeds
All of these 32Gbps Fabric Cisco MDS switches support the following features:
■ Autozone feature
■ NVMe/FC ready
■ Extended BB_Credits (up to 8270 per port or 8300 per port group)
■ The Cisco MDS 9148S switch provides up to 48 fixed FC ports capable of speeds of
2, 4, 8, and 16Gbps, and the Cisco MDS 9396S switch provides up to 96 autosensing
Fibre Channel ports that are capable of speeds of 2, 4, 8, 10, and 16Gbps of dedicat-
ed bandwidth for each port. The 16Gbps Cisco MDS 9148S and Cisco MDS 9396S
are the first generation of NVMe-ready Fibre Channel 1RU and 2RU switches.
■ The Cisco MDS 9396S switch offers more buffer-to-buffer credits and support for
more VSANs, making it an excellent choice for standalone small and midsize busi-
ness (SMB) Fibre Channel networks.
■ High availability: The Cisco MDS 9148S and Cisco MDS 9396S switches support
In-Service Software Upgrades (ISSU). This means that Cisco NX-OS Software can be
upgraded while the Fibre Channel ports carry traffic.
■ The Cisco MDS 9148S and Cisco MDS 9396S switches include dual redundant hot-
swappable power supplies and fan trays, port channels for Inter-Switch Link (ISL)
resiliency, and F-port channeling for resiliency on uplinks from a switch operating in
NPV mode.
■ The Cisco MDS 9148S and Cisco MDS 9396S switches offer built-in storage net-
work management and SAN plug-and-play capabilities. All features are available
through a command line interface (CLI) or Cisco Prime DCNM for SAN Essentials
Edition, a centralized management tool. Cisco DCNM task-based wizards simplify
management of single or multiple switches and fabrics. For virtual infrastructures, it
manages the entire path: from the virtual machine and switch to the physical storage.
■ The Cisco MDS 9148S and Cisco MDS 9396S switches also support PowerOn Auto
Provisioning (POAP) to automate software image upgrades and configuration file
installation on newly deployed switches. Additionally, they provide intelligent diag-
nostics, protocol decoding, network analysis tools, and Cisco Call Home for added
reliability, faster problem resolution, and reduced service costs.
■ The switches offer support for virtual SAN (VSAN) technology for hardware-
enforced, isolated environments within a physical fabric. They offer access control
lists (ACLs) for hardware-based, intelligent frame processing. Advanced traffic
management features, such as fabric-wide quality of service (QoS) and Inter-VSAN
Routing (IVR), are included in the optional Cisco MDS 9000 Family Enterprise
Package. QoS prioritizes application data traffic for better and more predictable net-
work service. Zone-based QoS simplifies configuration and administration by using
the familiar zoning concept. IVR facilitates resource sharing across VSANs without
compromising scalability, reliability, availability, and network security.
The Cisco MDS 9250i Multiservice Fabric switch is equipped with 40 Fibre Channel ports
operating at speeds of 2, 4, 8, and 16Gbps. There are two 1/10-Gigabit Ethernet ports on
the built-in IP storage services module that support FCIP communication. Another
eight 10-Gigabit Ethernet ports are used for FCoE communication. All of this fits
within a small fixed 2RU form factor. The Cisco SAN Extension over IP application
package license is enabled as standard on the two fixed 1/10-Gigabit Ethernet IP storage
services ports, enabling features such as FCIP and compression on the switch without the
need for additional licenses. Also, using the eight 10-Gigabit Ethernet FCoE ports, the
Cisco MDS 9250i platform attaches to directly connected FCoE and Fibre Channel stor-
age devices and supports multitiered unified network fabric connectivity directly over
FCoE.
Here are some of the unique advantages of the Cisco MDS 9250i Multiservice Fabric
switch:
■ Intelligent application services engine. The Cisco MDS 9250i includes as standard
a single application services engine that enables the included Cisco SAN Extension
over IP software solution package to run on the two fixed 1/10-Gigabit Ethernet
storage services ports.
■ Support for all the Cisco MDS 9000 family enhanced capabilities, including
VSANs, IVR, advanced traffic management, and network security across remote
connections.
■ IBM Control Unit Port (CUP) support enables in-band management of Cisco
MDS 9200 Series switches from the mainframe management console.
■ FICON tape acceleration reduces latency effects for FICON channel extension
over FCIP for FICON tape read and write operations to mainframe physical or
virtual tape.
■ Support for the IBM Extended Remote Copy (XRC) Acceleration feature, which
enables acceleration of dynamic updates for IBM z/OS Global Mirror (formerly
known as XRC).
■ Cisco Data Mobility Manager (DMM). This fabric-based data migration solution
transfers block data, without disruption, across heterogeneous storage volumes and
across distances, whether the host is online or offline.
■ Support for extended security such as RADIUS and TACACS+, FC-SP, SFTP, SSH
protocol, SNMPv3 implementing AES, VSANs, hardware-enforced zoning, ACLs,
and per-VSAN RBAC.
■ Support for IPv6 as mandated by the U.S. Department of Defense, Japan, and China.
IPv6 support is provided for FCIP, iSCSI, and management traffic routed both in
band and out of band.
Summary
This chapter covers the different storage connectivity options in the data center, the
different storage communication and transport protocols, and the Cisco MDS 9000
family of Fibre Channel switches. The following information was also covered:
■ The requirements for more complex storage infrastructures in the data centers are
dictated by the requirements of the applications.
■ The communication between the host and the storage device uses block-based and
file-based storage communication protocols, which in turn use different transport
protocols.
■ The Network File System (NFS) is a file-based storage protocol based on the client/
server architecture and supported on Linux/Unix systems.
■ SCSI is a block-based storage protocol. The major participants are the initiator, which
is the server, and the target, which is the storage device/system. SCSI uses specific
addressing called a Logical Unit Number.
■ There are different network storage system designs, depending on the complexity of
the infrastructure and the selected storage protocols.
■ DAS is when the storage system is directly attached to the host and it’s available only
to it or through it.
■ NAS is when the storage is reachable through a network environment, usually used in
data centers where file-based storage communication is implemented.
■ SANs are separate physical infrastructures built to provide isolated and secure com-
munication for block-based storage.
■ The Fibre Channel Protocol (FCP) is the transport protocol for the SANs.
■ In FCP communication, the initiators and the targets connect to the FC switches
using specialized communication adapters called HBAs.
■ The FCP topologies are point-to-point, arbitrated loop, and switched fabric.
■ The point-to-point topology involves DAS connectivity that uses the FCP and HBAs.
■ The switched fabric topology is also known as a SAN. It is built by using FC switch-
es to create a communication environment for the initiators and the targets.
■ The iSCSI protocol transports the SCSI protocol commands over an IP network
by encapsulating the data in IP packets. TCP is responsible for the reliable
transmission.
■ The NVMe standard allows the NAND flash storage devices to communicate with-
out the limitations of the SCSI protocol and to be accessed as memory.
■ NVMe over Fabric allows the NVMe storage devices to be accessed by initiators
over different communication infrastructures.
■ The Cisco MDS 9000 family of switches includes the Cisco Fibre Channel switches.
■ The Cisco MDS 9700 switches are director-class, modular chassis-based devices.
■ The Cisco MDS 9000 16Gbps and 32Gbps switches are suitable for the SAN access
layer.
■ The Cisco MDS 9250i multiservice switch supports different storage communication
protocols such as FCP, FCIP, FCoE, and iSCSI, allowing for the implementation of
complex and flexible storage communication.
Reference
“Storage Area Networking,” https://www.cisco.com/c/en/us/products/storage-networking/index.html
Chapter 11
The previous chapter discussed the building blocks of storage in the data center. The
need for scalability and flexibility led to the use of specialized storage infrastructures
called storage area networks (SANs). The transport protocol is the Fibre Channel Protocol
(FCP). The name comes from the British English spelling of “fibre” and from the fact that the storage
networks on which the FCP operates are built using fiber-optic physical connectivity
between the initiators, the targets, and the FC switches. In the SAN, the FCP runs on all
the layers of the data center stack—from the physical layer, up to the applications that
utilize the storage systems. In this chapter, you will get acquainted with the FCP layered
model, the components operating on the different layers, and the relationship between
them, such as the FC port types, the physical and logical addressing, and the routing of
the FC frames. When you think of storage communication, and specifically the type that
happens over the FCP, the most important aspect is the reliable and secure transmission
of the FC frames. To understand how this happens in the FCP, we will look at the connec-
tions established at the different levels to create the secure transmission environment and
the processes that control it.
Like other networking protocols, the FCP is described and defined by using a layered model. The FCP
consists of five layers, which are illustrated in Figure 11-1:
■ FC-4 Upper Layer Protocol (ULP) mapping: Responsible for the protocol mapping.
Identifies which ULP is encapsulated into a protocol data unit (PDU) for delivery to
the FC-2 layer. The major ULP using the FCP is the SCSI protocol. There are others,
including the Internet Protocol (IP), that are also defined in the standard.
■ FC-3 Common Services: The common services for the advanced FCP services, such
as striping (multiple node ports transmitting), hunt groups (multiple node ports
receiving), multicast, encryption, and redundancy algorithms.
■ FC-2 Signaling: This is the layer where the transport mechanism exists. The follow-
ing functions, defined by the standard, happen at this layer:
■ Ordered Set: A 4-byte transmission word that contains important frame delimiters,
such as Start-of-Frame (SOF) and End-of-Frame (EOF), as well as the
so-called primitive signals, such as Idle and Receiver Ready (R_RDY), which con-
trol, for example, when the transmitting side starts to send frames.
■ FC Frame: The block of communication in the FCP. There are two types of FC
frames: the Data frames and the Link_control frames. The Data frame contains
the ULP data blocks that have to be carried to the destination. The Link_control
frames are the Acknowledge (ACK) and the Link_response frames. The FC frames
start with a 4-byte start-of-frame (SOF) field, followed by a 24-byte frame head-
er, the Data field (up to 2112 bytes), and the CRC error check field, and they end with
the 4-byte end-of-frame (EOF) field (a byte-layout sketch follows this list).
■ Sequence: A series of FC frames sent in one direction from one N_Port to another.
■ Exchange: All the sequences in one conversation between an initiator and a target.
■ Protocol: The protocols for the FCP services. Here are the protocols running on
this layer:
■ Fabric Login Protocol: How the initiator or the target connect to the switched
fabric and what information is exchanged.
■ Port Login Protocol: Responsible for the exchange of the service parameters
and capabilities between two ports.
■ Port Logout Protocol: Used to disconnect a port from another port and to
free resources.
■ Flow control: The mechanism at FC-2 responsible for controlling the speed at
which the FC frames are exchanged between two ports and for avoiding dropped
frames.
■ FC-1 encoding: Defines the rules for the serial encoding and decoding of the data
to be transmitted over the fiber. With 8b/10b encoding, each 8 bits of
data are put into a 10-bit transmission character. This is related to how the signals
are transmitted over the physical media and to the clock synchronization for this
serial communication; these details go beyond the scope of the discussion in this
chapter. The 8b/10b encoding is used on links with speeds of 1, 2, 4, and 8Gbps.
For the faster links, with speeds of 10 and 16Gbps, the encoding used is 64b/66b.
The 32, 64, and 128Gbps links use 256b/257b encoding. The faster links are
backward compatible with the slower links that use 8b/10b encoding.
■ FC-0 physical layer: Covers the physical media, the electrical and optical signals, and the transmission rates.
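As referenced in the FC Frame bullet, the following minimal sketch simply adds up the frame fields described above; the 4-byte size of the CRC field is a standard value assumed here, since the text does not state it:

```python
# Byte budget of an FC frame, following the field sizes given above.
SOF_BYTES = 4
HEADER_BYTES = 24
MAX_DATA_BYTES = 2112
CRC_BYTES = 4          # CRC error check field (4 bytes assumed)
EOF_BYTES = 4

def frame_size(payload_bytes: int) -> int:
    """Total frame size on the link for a given payload, capped at 2112 bytes."""
    if payload_bytes > MAX_DATA_BYTES:
        raise ValueError("payload exceeds the 2112-byte FC data field")
    return SOF_BYTES + HEADER_BYTES + payload_bytes + CRC_BYTES + EOF_BYTES

print(frame_size(MAX_DATA_BYTES))   # 2148 bytes for a full-sized frame
```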
[Figure 11-1 shows the FCP layered model: the ULPs (such as SCSI) at the top, the FC-2 signaling protocol below them, and the FC-1 transmission coding above the physical layer.]
■ N_Port: This stands for “node port.” This is the port of the HBA that is installed on
an initiator or a target node. The N_Port represents a node. A node can have more
than one N_Port.
■ F_Port: The “fabric port” is a port on the FC switch that connects to an N_Port. The
F_Port is the point where the end node, initiator, or target connects to the fabric.
In standard connectivity, one N_Port always connects to one F_Port. Some virtualized
FC port types are mentioned later; they are discussed in more detail with the topics
of NPV/NPIV and FCoE.
■ E_Port: The “expansion port” describes a port on the fabric switch that connects
to another fabric switch, thus expanding the switched fabric. The E_Port to E_Port
links are called Inter-Switch Links (ISLs).
[A figure shows an N_Port on a host's HBA connected to an F_Port on an FC switch.]
After that, we have the FC-AL topology, where the devices are connected in a loop
directly to each other, or through an FC hub. Here are the FC port types for this connec-
tivity (see Figure 11-3):
■ L_Port: This is the “loop port,” which supports the arbitrated loop (AL) topology
functions.
■ FL_Port: This is the “fabric loop port,” which is an FC port on a switch and is used
to connect an FC-AL to the FC switched fabric.
[Figure 11-3 shows an arbitrated loop: hosts and a storage array connect their NL_Ports to an FC hub, and an FL_Port on an FC switch connects the loop to the switched fabric.]
Additionally, you can encounter the Fx_Port and the Nx_Port, which are ports that can
function as either F_Port/FL_Port or as N_Port/NL_Port, respectively.
When the FCP connectivity communicates over some other transport infrastructure,
such as an IP network, the physical connectivity is not native FC. In this situation, virtual
FC interfaces, or overlays, are created that are assigned the needed port types and con-
nect to each other using the FCP mechanisms. As illustrated in Figure 11-4, these virtual
FC port types are as follows:
■ VF_Port: The “virtual fabric port” is used as an overlay port on a switch when the
communication goes through an Ethernet port. The VF_Port is mapped to the under-
lying physical port.
■ VN_Port: This is the “virtual node port,” which is the overlay FC port, mapped to a
physical Ethernet interface in an end node. The VF_Port and the VN_Port are used
in FCoE and FCIP communication.
[Figure 11-4 shows a virtual FC link between a VN_Port on a server's converged network adapter (CNA) and a VF_Port on a Cisco MDS/Nexus FCoE switch.]
■ Port World Wide Name (pWWN/WWPN): This address uniquely identifies one FC
port. On one FC device there will be as many WWPNs as the number of FC ports.
For example, if an HBA has one FC port, then there will be one WWPN, which will
identify that FC port. If there are two FC interfaces, there will be two WWPNs—
one for each FC interface. Of course, the FC interface can be virtual, so there will be
a WWPN for the virtual FC interface as well.
■ Node World Wide Name (nWWN/WWNN): This is the physical address that
uniquely identifies an FC device. It can be an HBA or an FC switch. This means that
an HBA with one FC port will have one WWNN for itself and one WWPN for the
FC port. If there are two FC ports on the HBA, there will be one WWNN for the
HBA as an FC device and a WWPN for each of the FC ports.
It is important to know that the WWNNs and the WWPNs have exactly the same format.
Because each physical address is used to uniquely identify an FC port or FC device, it
has to be different and unique. There are rules for how these addresses are generated and
assigned. As they are physical addresses, the FC components come with these addresses
already burned in. However, they can be changed, just like the MAC addresses in the
Ethernet devices. In this case, it’s important that the addresses are unique! An overlap
between a WWPN and a WWNN or another WWPN will break the FC communication.
This is very important when you work with the Cisco Unified Computing System,
because the WWNNs and the WWPNs have to be created by you when FCoE is used.
That’s why in the Cisco UCS, WWPN and WWNN address pools are used, which
requires careful design and planning.
[Figure: FC Physical Addressing, with example WWNs from a dual-ported device: nWWN 20:00:00:45:68:01:EF:25, pWWN A 21:00:00:45:68:01:EF:25, and pWWN B 22:00:00:45:68:01:EF:25.]
As already mentioned, both the WWNNs and the WWPNs have the same structure:
■ The size is 64 or 128 bits (64-bit WWNs are the most commonly used).
The structure of a WWN is either 8 or 16 bytes, as the specific format and length are
defined by the Network Address Authority (NAA) bits. Then there is the organizationally
unique identifier (OUI) of the manufacturer, in addition to other vendor-specific informa-
tion.
The most common WWN format you will work with in the data center has the following
characteristics:
■ The first 2 bytes, which include the NAA nibble and three additional nibbles, can be
separated roughly as follows:
■ 10:00: The NAA value of 1 followed by three 0s. Usually assigned to HBAs.
■ 2x:xx: The NAA has a value of 2, but here the difference is that the other three
nibbles, marked with x’s, can be used by the vendor.
■ 5x:xx: WWNs assigned to vendors of storage equipment usually start with 5
as the NAA value (see the parsing sketch after this list).
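The following sketch is illustrative only: it extracts the NAA nibble from a WWN and maps it to the rough interpretations listed above, using example WWNs that appear in this chapter:

```python
# Rough WWN inspection based on the prefixes described above.
def describe_wwn(wwn: str) -> str:
    octets = wwn.split(":")
    naa_nibble = octets[0][0]                 # first hex digit is the NAA value
    hints = {
        "1": "NAA 1 (10:00:...), usually an HBA",
        "2": "NAA 2 (2x:xx:...), remaining nibbles are vendor-assigned",
        "5": "NAA 5 (5x:xx:...), usually storage equipment",
    }
    return hints.get(naa_nibble, f"NAA {naa_nibble} (other format)")

for wwn in ("20:00:00:45:68:01:EF:25",   # nWWN from the example above
            "21:00:00:45:68:01:EF:25",   # pWWN A from the example above
            "50:06:00:A0:98:CC:C3:0E"):  # a storage-style WWPN
    print(f"{wwn}  ->  {describe_wwn(wwn)}")
```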
■ Domain_ID: When the switch fabric is created by having one or more FC switches
to connect, interact together, and achieve convergence, it becomes one whole entity.
As it consists of multiple switches, and each switch has multiple ports, an identi-
fier for each of the switches inside the switched fabric is needed. This is the domain
ID—a unique ID assigned to the switch based on a specific mechanism. In general,
the switches (if it’s a single-switch fabric, there will be a single domain ID), when
first connected, establish links between each other and elect a principal switch. The
principal switch becomes responsible for ensuring that the domain IDs selected by the switches in
the fabric are unique. Therefore, each switch in an FC fabric has a unique domain
ID. Theoretically, there can be up to 239 domain IDs in a switched fabric, as the
hex values for the Domain_ID are in the range of 01 to EF, but in practice the largest
FC switched fabric achieved by Cisco has just 100 switches (Domain_IDs).
Domain_IDs 00 and F0–FF are reserved for the
switch fabric services.
■ Area_ID: Used to identify groups of N_Ports within a domain (switch). Areas are
also used to uniquely identify fabric-attached arbitrated loops. Also, the Area_ID
part can be used to encode specific vendor information for components that have
specific behavior. Allowed values are in the range of 00–FF (hex).
[A figure shows the FCID structure: the Domain_ID in bits 23 to 16 (239 possible domains, 01-EF), the Area_ID in bits 15 to 8 (a group of ports or vendor-specific information), and the Port_ID in bits 7 to 0 (identifying the N_Port:F_Port attachment).]
When the N_Port establishes a successful link to the F_Port (the fabric switch), which is
a successful login to the fabric (FLOGI process, which will be covered later), the N_Port
uses an initial FCID of 0x000000 to request an FCID assigned to it. The address manager
service, which runs on the switch, generates the FCID, as it uses the switch’s Domain_ID,
the Area_ID for the switch, and the Port_ID to identify the F_Port: N_Port for the device.
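A minimal sketch of how a 24-bit FCID splits into its three fields follows; the sample value 0x2B273B is taken from the show fcns database output shown later in this chapter:

```python
# Split a 24-bit FCID into Domain_ID, Area_ID, and Port_ID.
def parse_fcid(fcid: int) -> tuple[int, int, int]:
    domain_id = (fcid >> 16) & 0xFF   # bits 23-16: the switch
    area_id   = (fcid >> 8) & 0xFF    # bits 15-8:  a group of ports
    port_id   = fcid & 0xFF           # bits 7-0:   the N_Port:F_Port attachment
    return domain_id, area_id, port_id

domain, area, port = parse_fcid(0x2B273B)
print(f"Domain_ID 0x{domain:02X}, Area_ID 0x{area:02X}, Port_ID 0x{port:02X}")
# Domain_ID 0x2B, Area_ID 0x27, Port_ID 0x3B
```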
The FCP services use FCIDs as well. These FCIDs are referred to as the well-known
addresses. They cannot be used for anything other than their assigned and reserved ser-
vice. Table 11-1 provides well-known addresses for some of the FCP services.
The well-known addresses are the highest 16 addresses in the 24-bit address space for the
FCIDs.
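Because the printed table is not reproduced here, the following dictionary is only a hedged sketch listing the well-known FCIDs most commonly associated with these services; Table 11-1 and the FC standards remain the authoritative reference:

```python
# Commonly cited well-known FCIDs (assumed values; verify against Table 11-1).
WELL_KNOWN_ADDRESSES = {
    0xFFFFFF: "Broadcast",
    0xFFFFFE: "Fabric Login Server (F_Port server), used by FLOGI",
    0xFFFFFD: "Fabric Controller",
    0xFFFFFC: "Directory/Name Server (FCNS)",
    0xFFFFFB: "Time Server",
    0xFFFFFA: "Management Server",
}

for fcid, service in sorted(WELL_KNOWN_ADDRESSES.items(), reverse=True):
    print(f"0x{fcid:06X}  {service}")
```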
An essential service is the Fibre Channel Name Server (FCNS), also known as just the
name server. This server runs in a distributed manner; each FC switch runs its own name
server and synchronizes the database with the rest of the switches in the fabric. The
FCNS database exchanges information for which WWPNs and WWNNs use which
FCIDs, which services are supported, what connectivity is used, which VSAN the device
belongs to, and so on. This is important information used by the PLOGI and PRLI pro-
cesses as well as other FCP services. The information in the FCNS is also very useful for
monitoring and troubleshooting the switched fabric.
To establish communication in the switched fabric, an end node goes through three login processes:
■ FLOGI: First, the N_Port logs in to the fabric through the F_Port to which it is connected.
■ PLOGI: Then, the N_Port logs in to the N_Port with which it will communicate.
■ PRLI: Finally, the PRLI process ensures that the two N_Ports exchange the needed
information regarding the supported ULPs for successful communication between
the target and initiator.
[A figure shows N_Port A and N_Port B attached to F_Port A and F_Port B in the fabric: each N_Port performs FLOGI with its F_Port, and N_Port A then performs PLOGI with N_Port B.]
For all these negotiations and exchanges of information to be possible, as well as for
the switched fabric to be managed, multiple services run in the FCP, as defined in the
FC-SW-6 specification:
■ Login Service
■ Address Manager
■ Alias Service
■ Management Server
■ Time Server
■ Fabric Controller
FLOGI Process
After the N_Port and the F_Port are physically connected, the negotiations at the physical
layer are successful, and the electrical parameters are in sync, the link negotiation and the
registration to the fabric start. This is the FLOGI process, which can be described as the
initial bootstrap process that occurs when an N_Port is connected to an F_Port. The FLOGI
process is used by the N_Port to discover whether a fabric is present and what its capabili-
ties are as well as to register to it. The fabric uses FLOGI to get the information for the node
and assign it an FCID. Once the FLOGI process is successful, the N_Port can start the next
process of attempting to communicate with other N_Ports (that is, the PLOGI process).
1. The F_Port and the N_Port will reset the link between them, as shown in Figure
11-8. The goal is for the link initialization to start fresh and for both parties to
verify the exchanged parameters and information for this link establishment. This
happens by exchanging the Link Reset (LR) and Link Reset Response (LRR) commands.
[Figure 11-8 shows the N_Port and the F_Port on the switch exchanging LR and LRR to reset the link.]
2. After the link is initialized, as shown in Figure 11-9, it will be active and IDLE fill
words will flow in both directions on the link. The N_Port will use a source FCID
(SID) of 0x000000, as at this time it does not have an FCID assigned by the address
manager.
[Figure 11-9 shows the initialized link between the N_Port and the F_Port, with IDLE fill words flowing in both directions.]
3. The N_Port will send a FLOGI link services command to the switch login server
using the well-known address 0xFFFFFE. The N_Port will include its Node name
and N_Port name (that is, the WWNN and the WWPN it uses) as well as its service
parameters.
4. The login server sends an ACC response frame, as shown in Figure 11-10, that con-
tains the N_Port address in the Destination FCID (DID) field.
[Figure 11-10 shows the N_Port sending the FLOGI to the login server on the switch and the login server answering with an LS_ACC that carries the assigned FCID.]
5. After receiving an FCID, the N_Port logs in to the fabric name server at the address
0xFFFFFC. The N_Port transmits its service parameters, such as the number of
buffer credits it supports, its maximum payload size, and the supported classes of
service (CoS).
6. The name server responds with an LS_ACC frame, shown in Figure 11-11, that
acknowledges the information is registered. After that comes the next login pro-
cess—the port login between two N_Ports, when the initiator prepares to communi-
cate with a specific target.
[Figure 11-11 shows the N_Port registering with the name server on the switch and the name server responding with an LS_ACC.]
To verify a successful FLOGI on Cisco MDS switches, the show flogi database command
can be used on the command-line interface (CLI), as demonstrated in Figure 11-12.
The output lists the N_Ports that performed a successful FLOGI: which WWNN and
WWPN each one uses, which F_Port it is connected to, and which VSAN its communication
belongs to. This is a simple and effective way to find out which node is connected to which switch port.
The same information can also be seen in the Cisco MDS switch graphical user interface
(GUI), which is the Device Manager (see Figure 11-13). When you select the FC-enabled
interfaces from the menu, you will be taken to a new window where the FLOGI database
entries can be seen.
Figure 11-13 Verify the FLOGI Process in the Cisco MDS Device Manager
One of the results of the FLOGI process is that the N_Port will be assigned an FCID, and
the FCID is also provided in the output of the command.
All the information for the N_Port is also registered in the FCNS. In the CLI, the com-
mand to verify this, and also to get information for the N_Ports (or end devices) con-
nected or registered to the switched fabric (not limited to the local switch), is show fcns
database. The output of this command includes the vendor of the end node and its
capabilities (whether it is an initiator or a target).
VSAN 100:
----------------------------------------------------------------------
FCID TYPE PWWN (VENDOR) FC4-TYPE:FEATURE
----------------------------------------------------------------------
0x2b273b N 21:00:c4:19:b4:12:d8:24 scsi-fcp:init
0x4e0041 N 50:06:00:a0:98:cc:c3:0e (NetApp) scsi-fcp:target
0x4e0061 N 50:06:00:a0:98:cc:c3:ea (NetApp) scsi-fcp:target
From the Cisco MDS Device Manager, you can select the Name Server option under the
FC options to check the FCNS database. The Name Server database will open in a new
window, as shown in Figure 11-14.
Figure 11-14 Verifying the FCNS Database in the Cisco MDS Device Manager
PLOGI Process
After a successful FLOGI process, the N_Port can communicate in the switched fabric.
This means sending FC frames to another FC node. That other FC node (either an initia-
tor or target) is connected to the fabric through an N_Port as well. As you already know,
FCP is a protocol that is focused on the secure and reliable transmission of the frames.
That’s why, for the N_Port to be allowed to communicate with a destination N_Port,
it needs to establish and negotiate a connection to it. This is known as the port login
(PLOGI). The PLOGI must be completed successfully before the nodes can perform a
ULP operation.
1. The initiator N_Port sends a PLOGI frame that contains its operating parameters in
the payload (see Figure 11-15).
[Figure 11-15 PLOGI Initiation: the initiator N_Port sends the PLOGI frame across the fabric to the target N_Port.]
2. The target N_Port responds with an ACC frame that specifies the target N_Port
operating parameters (see Figure 11-16).
[Figure 11-16 PLOGI ACC: the target N_Port answers with an LS_ACC frame carrying its operating parameters.]
After the FLOGI and PLOGI processes are complete, the N_Port can use the following
ELS commands to query and verify the fabric and port parameters without performing
the PLOGI process and thus forcing a logout of the current session:
■ ADISC: The address discovery is used to confirm the address of another port or to
discover whether the other port has a hard-coded address.
■ PDISC: Port discovery is used to verify the service parameters of another N_Port.
Process Login
After the successful FLOGI and PLOGI processes, the N_Ports have knowledge of their
capabilities and operating parameters. The next step is the PRLI process, which is used to
establish a session between two FC-4-level logical processes. It is executed by the PRLI
command and allows one or more images of the N_Port to be related to another N_Port.
Thus, an image pair is created, or a specific communication between ULPs is negotiated.
1. The initiator N_Port sends a PRLI frame with the information for its ULP support
(see Figure 11-17).
[Figure 11-17 PRLI Initiation: the initiator N_Port sends the PRLI frame with its ULP support information to the target N_Port.]
2. The target N_Port responds with an ACC frame (see Figure 11-18). It contains the
information for the ULP support at the target. At this point, a channel has been suc-
cessfully opened and communication takes place. The relationship between the ini-
tiator process and the target process is known as an image pair.
[Figure 11-18 PRLI ACC: the target N_Port responds with an LS_ACC frame carrying its ULP support information, establishing the image pair.]
3. At the end of the data exchange, the initiator sends a PRLO frame.
4. The target responds with an ACC frame, and the image pair is then terminated. For a
new communication to happen, a new image pair needs to be established.
You have now seen how the connections are established, as well as how steps are in place at the different levels and on each link hop to make sure the
environment is stable and that the communicating devices are compatible, support the
needed FC classes of service and ULP mappings, and so on. One very important compo-
nent in the way the FC protocol communicates is the mechanism utilized to control the
flow of FC frames in a transmission between two FC ports.
The FCP uses a very strict flow control mechanism based on two major components:
■ The size of the buffers used to store received FC frames on each FC port. These buffers are
called BB_Credits, or buffer-to-buffer credits.
■ The Receiver Ready (R_RDY) primitive signal, with which the receiving port notifies the
transmitting port that it has buffers available.
These two components are the foundation of credit-based flow control, which creates a
lossless environment and supports huge traffic loads.
During the link initialization, the FC ports negotiate the electrical parameters, the
speeds, and the serialization, but they also exchange information about their capabili-
ties, such as the size of the memory buffers, which are used to store the FC frames
received at the port. Once the link is established, each FC port knows the size of the
BB_Credits of the port to which it is connected. There are two types of BB_Credits: the
BB_Credit_TX and the BB_Credit_RX. The former defines how many FC frames can be
stored at the port for transmission, and the latter defines how many frames the port can
receive and store before processing. This means that the FC port knows the maximum
number of FC frames the connected FC port can receive and store before it can start
processing them.
When FC frames have to be transmitted, the transmitting port (Tx_Port), which already
knows the size of the BB_Credit_RX at the receiving side (the Rx_Port), will have a vari-
able that holds this value. If the other port has reported a BB_Credit_RX size of 64, the
Tx_Port will store that value in the BB_Credit_RX variable. The Tx_Port then knows that the
Rx_Port has space to store 64 FC frames without the risk of them being dropped due to
a busy port that cannot process them.
Now that the Tx_Port knows how many FC frames to send to the Rx_Port, to be able to
control the number of frames sent, it also sets another variable, BB_Credit_CNT, to zero.
The Tx_Port is ready to start sending frames, but it cannot because transmission in the
FCP is controlled by the receiving side. This means that the Rx_Port must notify the Tx_
Port that it is ready to receive the frames. This happens when the Rx_Port sends a receive
ready notification (R_RDY) to the transmitting port. The R_RDY notification means that
the BB_Credit buffers at the Rx_Port are empty, and it is ready to receive and process the
FC frames. Figure 11-19 illustrates the credit-based flow control.
[Figure 11-19 shows the credit-based flow control between a Tx_Port and an Rx_Port: the Rx_Port sends an R_RDY acknowledgment, the Tx_Port starts transmitting FC frames with BB_Credit_CNT starting at 0, and it stops transmitting when the credits are exhausted.]
Once the Tx_Port receives the R_RDY notification, it starts to send frames. With every
frame sent, the BB_Credit_CNT value is incremented by one until it reaches the value of
BB_Credit_RX minus one. Then the Tx_Port stops sending FC frames, resets the value of
the BB_Credit_CNT, and waits for the R_RDY to start transmitting again.
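A toy simulation of the pacing just described follows. It is only a sketch of the simplified model in this section (send up to BB_Credit_RX frames, then wait for an R_RDY before continuing), not of the complete FC credit machinery, and the R_RDY here is simulated locally:

```python
from collections import deque

# Simplified BB_Credit pacing following the description above.
def transmit(frames, bb_credit_rx):
    rx_buffers = deque()        # stands in for the Rx_Port receive buffers
    bb_credit_cnt = 0           # frames sent since the last R_RDY
    sent = []

    def r_rdy():
        """Simulated Rx_Port: drain the buffers and signal readiness."""
        rx_buffers.clear()

    r_rdy()                                      # transmission starts on R_RDY
    for frame in frames:
        rx_buffers.append(frame)                 # frame lands in an Rx buffer
        sent.append(frame)
        bb_credit_cnt += 1
        if bb_credit_cnt >= bb_credit_rx:        # Rx buffers presumed full
            bb_credit_cnt = 0
            r_rdy()                              # wait for the next R_RDY
    return sent

print(len(transmit([f"frame-{i}" for i in range(10)], bb_credit_rx=4)))  # 10
```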
Just as the FCP secures the communication at many levels (such as the login processes,
where there is a login between an F_Port and an N_Port), the flow control is not limited
to two directly connected ports, such as an N_Port and an F_Port or two E_Ports. It
extends to the next level of communication: between the two N_Ports of the initiator
and the target. At that level, there is another variable, called the EE_Credits, which holds
the end-to-end flow control credits. In the same manner as the buffer-to-buffer flow
control, the transmitting N_Port also sets a variable called EE_Credit_CNT initially to
zero and then increments it during the transmission.
Additionally, the Cisco MDS switches implement an architecture based on using internal
buffer queues and a central arbiter—a component that monitors and controls the switch-
ing of the FC frames internally in the switch. It allows an FC frame to be sent to the
crossbar fabric in order to be switched in hardware to the egress switch port only when
it has received an acknowledgment that the port has enough free resources to store and
process the frame.
As you can see, there are various levels of flow control that make the FC communication
secure and reliable.
The Fabric Shortest Path First (FSPF) protocol establishes and maintains connectivity with
the neighbor switches using a Hello protocol. The Hello protocol is also used by the FC
switches to exchange the FSPF parameters and capabilities.
The FSPF protocol maintains a distributed topology database with the needed func-
tionality to keep it synchronized among the switches in the fabric, including the routing
updates.
The FSPF protocol implements a path computation mechanism to calculate and build the
routing topology in a way in which the fastest route will be used to avoid the occurrence
of loops. It bases the calculation on the following:
■ Link cost: Depending on the speed of the link, a cost value is assigned. It ranges
from 1 to 65535. For example, a link operating at 1Gbps will have a link cost of
1000. If a link operates at 2Gbps, the link cost will be 500.
■ Number of hops between the destination and the source switch: Each hop repre-
sents a node on the path of the Fibre Channel frame.
■ Link state: Either up or down. A link in down state is excluded from the topology.
The FSPF maintains a Link State Database that is populated with Link State Records
(LSRs). The standard defines one type of LSR—the Switch Link Record. This record
contains the information for the connectivity of each switch in the fabric.
■ Path cost: The path cost is calculated based on the link cost and the number of hops
for the available routes with active-state links. The route with the lowest path cost
is considered the fastest, although it might not be the route with the fewest hops,
and it is selected by the FSPF for use in the communication between the two
specific switches.
■ Per-VSAN instance: The FSPF runs per VSAN, which means that there is a separate
instance of the protocol for each VSAN, and there is also a separate FSPF database
set maintained for each VSAN.
Basically, the FSPF is a routing protocol that calculates the routes between the Domain_
IDs in a fabric. Each switch has a unique Domain_ID in a switched fabric. In this way,
the FSPF gains knowledge for building a database of how each pair of domain IDs, or
each pair of switches, can reach each other in the fastest and most secure way. If you
remember, the first field of the FCID is the Domain_ID of the switch, which is connected
to an N_Port. The FC frame also contains the SID and DID fields, which contain the
Source FCID of the N_Port sending the frames as well as the Destination FCID of the
destination N_Port, which is the target of that communication. When the source switch
receives the FC frame, it looks in the frame header—more specifically, in the DID. From
the destination FCID, it looks at the Domain_ID part. Once it knows the destination of
the switch, it needs to find a way to reach it (or a route to it), which it can look up in the
routing database created by the FSPF protocol. It selects the route with the lowest path
cost, calculated by FSPF, and forwards the FC frame to the next hop switch on the path.
The FSPF path selection is based on the cost, as illustrated in Figure 11-20.
[Figure 11-20 shows FSPF path selection: 2Gbps links, each with a cost of 500, lead toward Switch C, and the path with the lowest total cost is chosen.]
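To see how the lowest total cost wins, here is a small self-contained sketch. The topology and domain IDs are hypothetical, and the cost-per-speed helper simply extends the chapter's examples (1Gbps = 1000, 2Gbps = 500); actual FSPF cost defaults may differ per platform:

```python
import heapq

def link_cost(speed_gbps: float) -> int:
    """Cost derived from the chapter's examples: 1Gbps -> 1000, 2Gbps -> 500."""
    return int(1000 / speed_gbps)

# Hypothetical fabric: Domain_IDs 0x01, 0x02, 0x03 with mixed link speeds.
FABRIC = {
    0x01: {0x02: link_cost(4), 0x03: link_cost(1)},   # A-B at 4G, A-C at 1G
    0x02: {0x01: link_cost(4), 0x03: link_cost(4)},   # B-C at 4G
    0x03: {0x01: link_cost(1), 0x02: link_cost(4)},
}

def fspf_path_cost(src: int, dst: int) -> int:
    """Dijkstra over the link costs: the route with the lowest total cost wins."""
    best = {src: 0}
    queue = [(0, src)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == dst:
            return cost
        for neighbor, c in FABRIC[node].items():
            if cost + c < best.get(neighbor, float("inf")):
                best[neighbor] = cost + c
                heapq.heappush(queue, (cost + c, neighbor))
    raise ValueError("destination unreachable")

# The two-hop 4Gbps path (250 + 250 = 500) beats the direct 1Gbps link (1000).
print(fspf_path_cost(0x01, 0x03))
```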
Summary
This chapter covered the Fibre Channel Protocol fundamentals, physical and logical
addressing, the relationship between ports, the flow control at different levels, and rout-
ing in FCP. It also provided some examples from the CLI and the GUI of Cisco MDS
switches, in addition to the following information:
■ FC-3 runs the protocol’s common services, such as multicast, encryption, and so on.
■ FC-2 is the signaling layer, which is responsible for the FC frames, the ordered sets,
the sequencing, the exchange, the FCP services’ protocols, and the flow control.
■ FC-0 is the physical layer, and it covers the electrical signals and the media.
■ The ports in the FCP have different roles, depending on what device they belong to
and what other port and device they connect to.
■ The ports on the end devices, such as initiators and targets, are called N_Ports (node
ports).
■ F_Port, a fabric port, is the FC port on the FC switch to which an N_Port connects.
■ E_Ports, or expansion ports, are the switch ports that connect to other switches,
thus expanding the switched fabric.
■ The VF_Port and VN_Port are virtual FC ports used when physical communication
is realized over Ethernet networks.
■ There are two types of WWNs: the WWPN identifies an FC port, and the WWNN
identifies an FC device, which can have one or more FC ports.
■ The FCID is the logical addressing used in the FC routing and communication.
■ The FCID consists of three fields that identify the location of the end device: the
domain ID, which identifies the switch, the area ID, which identifies the group of
ports, and the port ID, which identifies the combination between an N_Port and
F_Port.
■ PLOGI is the process that allows an N_Port to establish communication with another
N_Port.
■ PRLI is the process that allows a successful negotiation at the process level between
the initiator and the target.
■ The communication between two N_Ports uses a similar flow control based on end-
to-end credits.
■ The Cisco MDS switches have a specific architecture that guarantees internal flow
control that’s managed by a central arbiter.
■ FSPF is a routing protocol that builds the routing topology based on the costs of the
links, their state, and the number of hops.
Reference
“Storage Area Networking,” https://www.cisco.com/c/en/us/products/storage-networking/
index.html
Chapter 12
The SANs in the data centers are built to secure the communication between the servers
(or the workloads running on the servers) and the storage systems. This is a very sensi-
tive communication, because if data is lost, it is lost forever. If the data is not received in
the correct order, it cannot be reassembled correctly, which means it’s lost in this case as
well. Then there is also the challenge of which initiators are allowed to communicate with
which targets. To solve these challenges, the Fibre Channel Protocol (FCP) implements
the use of Fibre Channel zones (or zoning). You will learn about this security mechanism
in this chapter.
Another challenge is the size and the complexity of the SANs. The FC switches and the
needed equipment are not cheap, and the more devices that are in use, the more expen-
sive the SAN. Also, management, monitoring, and troubleshooting become more compli-
cated. To solve these issues, Cisco in its implementation of the Fibre Channel Protocol
provides a new feature called a virtual SAN (VSAN).
Because VSANs divide the physical SAN, and because the FC zoning is a mechanism that
exists per SAN, both topics are covered in this chapter. It’s important to understand the
relationship between them, as it affects the communication between the initiators and the
targets.
VSAN Overview
It’s very common to use the analogy of Ethernet’s VLANs when explaining FC VSANs.
Certainly, there are a lot of similarities between the two, but do not forget that these are
features of two very different communication protocols, so there are some fundamental
differences.
So, what is a VSAN, and why do you need it in the data center?
To answer this question, you have to look back at the beginning of the SANs, before
Cisco introduced this feature in the FC Protocol (later adopted to some degree by
Brocade as well). Traditionally, as there was the need to keep the communication within
a SAN protected, under control, and reliable, the approach used was to build multiple
separate physical SANs, each of them dedicated to the specific storage communication
between a set of servers, running an application or a set of common applications, and the
storage system or systems used only by these servers. For example, let’s assume that the
servers used by the HR department are running the HR applications and databases. They
will use a dedicated set of storage systems only for their data. The HR servers and the HR
storage systems will be connected to each other with a physical dedicated SAN, built by
Fibre Channel switches. This means that the SAN will be able to communicate only with
the specified servers and storage systems, for the simple reason that only they are physi-
cally connected to the SAN. Traditionally, this is how we achieved the fabric isolation
and separation between the needs of different groups, workloads, and so on. Just imagine
how you would need multiple, separate, dedicated, physical SANs, servicing the needs of
other departments, groups, and applications in an organization. Usually, an organization
has multiple different departments and types of applications, with different functional-
ity. Therefore, there was a lot of Fibre Channel equipment, installed and connected in
separate SANs, to meet the requirements for separation and isolation between the fabrics.
That was expensive. Extremely expensive! Another issue was that these Fibre Channel
infrastructures had to be monitored and maintained. These operations were complicated
and difficult. This was also known as a siloed approach (that is, creating multiple SAN
silos), as shown in Figure 12-1.
Figure 12-1 Siloed SANs
The solution came from Cisco: the idea and functionality implemented in their version of
the Fibre Channel Protocol was that the physical SAN could be virtualized, following the
analogy with VLANs that the communication in an infrastructure can be separated and
isolated in different groups. However, because this is the Fibre Channel Protocol, there
was the need for this separation to be stricter compared to that with VLANs. With the
VLANs, based on the VLAN ID the traffic is tagged with, the communication can hap-
pen only where this tag is allowed. However, at the same time, it’s very easy to allow com-
munication between devices that communicate in different VLANs. This behavior is not
acceptable in Fibre Channel communication. That’s why the VSAN isolation is implemented
in the hardware of the Cisco MDS switches, at the level of the physical Fibre Channel
ports. When a Fibre Channel frame enters the Cisco MDS switch through an F_Port, it
is tagged with the appropriate VSAN tag in the hardware. From then on, it can be forwarded
only through ports that belong to the same VSAN. Basically, the VSANs do exactly what
their name states: they divide the physical Fibre Channel switched fabric, composed of
one or more switches, into multiple SANs, but virtual ones. The initiators and the targets
can communicate only with other end nodes that belong to the same VSAN. Devices from
different VSANs are not allowed to communicate with each other. Actually, there
is a functionality called Inter-VSAN Routing (IVR) that can allow such a communication,
but it needs to be specifically configured, and that configuration is complex. In other
words, you cannot allow the communication between devices from different VSANs by
accident. The traffic isolation implemented with the usage of the VSANs is a huge advan-
tage, as it allows fewer Fibre Channel switches to be used and at the same time have the
flexibility and scalability needed by the organization. Figure 12-2 shows the benefits of a
VSAN through traffic isolation.
Figure 12-2 VSAN Traffic Isolation
Each VSAN has an ID. The ID is a number in the range 1 to 4094. There are some VSAN
IDs that are reserved. VSAN 1 is the default VSAN, and it can be used. It just exists by
default and cannot be deleted.
There is the special VSAN 4094, which is used by the system to isolate orphaned FC
ports. Orphaned FC ports on the switch are those that belonged to a VSAN, but the
VSAN was deleted. These ports cannot automatically move to VSAN 1, as there might be
a configuration or end devices connected to them that would cause miscommunication or
loss of data. That’s why such ports are automatically put in VSAN 4094 and are isolated,
which means they can’t communicate with any other ports. This leaves the range of user-
assigned VSAN IDs from 2 to 4093. However, there is a little bit more to this. Table 12-1
shows which VSAN IDs can be assigned by the administrators and which are reserved. In
addition to its ID, each VSAN has the following attributes:
■ Name: The name must be unique for each VSAN. A string of up to 32 characters can
be used for management purposes. If a name is not configured, the system takes the
string VSAN and adds the number as a four-digit string. For example, the system will
provide a VSAN with an ID of 11 with the name VSAN0011.
■ State: The state can be active or suspended. Active state means that the VSAN is
configured and enabled, which also enables the Fibre Channel Protocol services
for the VSAN. The suspended state specifies that the VSAN is created but it’s not
enabled.
■ Load-balancing scheme: This is the load-balancing scheme used for path selection. It
can be based on the source-destination FCID (src-dst-id) or the default one, which is
based on using the combination of source and destination FCIDs together with the
originator exchange ID (OX ID), presented as src-id/dst-id/oxid. It means that the
communication between a specific source and destination for a specific exchange (or
conversation) will take the same path.
The VSAN’s goal is to mimic a SAN, and because of that most FCP services run per
VSAN. There are separate per-VSAN instances running for some of the FCP services, and
other services will use separate databases for each VSAN. No matter the approach, the
FCP services must be configured and managed per VSAN. These services are the zone
server, principal switch selection and domain ID distribution, management server, and
name server.
This being the case, the SAN is not limited only to the theoretical maximum of 239
domain IDs, as each VSAN is a separate virtual SAN, and there is now a theoretical maxi-
mum of 239 domain IDs per VSAN. Each switch will have a unique domain ID in each
VSAN. This domain ID is unique within the VSAN among the other switches that also
communicate in the same VSAN. The principal switch selection service will select one
of the switches for a VSAN to become the principal as well as control the domain ID
distribution within the VSAN. So, the domain IDs are now limited to 239 per VSAN, and
within the VSAN, each of the switches must have a unique domain ID. Because multiple
VSANs can be created on a switch, this means there will be a separate domain ID for
each of the VSANs. Can the domain ID of a specific switch have the same value for all
the VSANs that exist on the switch? It can, if this is configured manually, but this doesn’t
mean that this domain ID is the same! The domain IDs, even using the same value, are
separate and independent between the VSANs. The other important point is that a switch
can be elected to be the principal switch for one VSAN but not for another. It is not man-
datory that the same switch be the principal for all the VSANs that exist on it. It can be
the principal switch for one VSAN and not the principal switch for another.
The E_Ports are the Fibre Channel ports that connect the SAN switches to each other.
They expand the switched fabric. The VSANs have a new E_Port mode called the TE_
Port. When a port is in E_Port mode, it will allow communication only for the traffic of
the VSAN to which it belongs. Also, this means that the other E_Port to which it is con-
nected must also belong to the same VSAN. Otherwise, there will not be communication,
and even though the two ports come up, they will stay isolated. If there is a need to carry
traffic for multiple VSANs across the E-to-E_Port link, VSAN trunking must be enabled,
and the ports will be in TE_Port mode, which stands for trunking E_Ports. The configura-
tion must be the same on both sides of the TE_Ports, and they have to allow communica-
tion for the same VSANs. Figure 12-3 illustrates VSAN trunking.
Figure 12-3 VSAN Trunking
The same applies if the F_Port of the switch is connected to an NPV switch, and there
is the need to carry the communication for multiple VSANs. The F_Port will become a
TF_Port (trunking), and the NP_Port on the NPV switch will become TNP_Port (also
trunking). The same rules apply in this situation as well; the same VSANs need to be
allowed on both sides.
To summarize, VSANs have the following characteristics:
■ They are equal to SANs, with their own routing, zoning, naming, and addressing.
■ They are transparent for the initiators and targets, as the membership is defined on
the F_Ports.
■ The membership is enforced on every F_Port and E_Port through which the frames
travel.
VSAN Configuration
The configuration examples provided are for a Cisco MDS switch. To configure, modify,
or delete a VSAN, you have to be in configuration mode. Keep in mind that even though
the usable range of VSAN IDs is large, the switches support only 256 VSANs at the
same time.
Step 1. Connect to the command line interface (CLI) of the switch using a terminal
client and the SSH or Telnet protocol. If this is your first time configuring the
switch, you will connect through the console port of the switch.
Step 2. Log in to the switch and enter the configuration mode using the command
config:
mds-9200-a# config
Enter configuration commands, one per line. End with CNTL/Z.
mds-9200-a(config)#
Step 3. To be able to create or modify a VSAN, you need to do it from the VSAN
database. You enter it using the vsan database command:
mds-9200-a(config)# vsan database
mds-9200-a(config-vsan-db)#
Step 4. You must create a VSAN with an ID using the vsan X command:
mds-9200-a(config-vsan-db)# vsan 20
mds-9200-a(config-vsan-db)#
Step 5. If you don’t want to go with the name the system generates, use the command
vsan X name ABC to set the preferred name:
mds-9200-a(config-vsan-db)# vsan 20 name Sales
mds-9200-a(config-vsan-db)#
Step 6. Use the show vsan command to verify that the VSAN is created:
mds-9200-a(config-vsan-db)# sh vsan
vsan 1 information
name:VSAN0001 state:active
interoperability mode:default
loadbalancing:src-id/dst-id/oxid
operational state:down
vsan 20 information
name:Sales state:active
interoperability mode:default
loadbalancing:src-id/dst-id/oxid
operational state:down
Step 7. Using the vsan X ? command, you can get a list of options that can be used
for further configuring the VSAN. The settings that can be changed are the
load-balancing mechanism, the operational state, and the interoperability
mode. The latter is used when the Cisco MDS switch needs to communicate
with switches from other vendors that do not support the standard Fibre
Channel Protocol specifications.
mds-9200-a(config-vsan-db)# vsan 20 ?
<CR>
- Range separator
interface Add interfaces to vsan
interop Interoperability mode value
loadbalancing Configure loadbalancing scheme
name Assign a name to vsan
suspend Suspend vsan
mds-9200-a(config-vsan-db)#
At this point, you have created a VSAN and assigned it an ID and a name. You
also checked the load-balancing mechanism in use and whether the VSAN
was created. The next step is to assign interfaces that will communicate with
it or belong to it.
Step 8. To assign interfaces to a VSAN, use the vsan X interface fcX/Y command.
With the ? option, you can see the different supported interfaces on the
switch that can be assigned:
mds-9200-a(config-vsan-db)# vsan 20 interface ?
fc Fiber Channel interface
fv Fiber Channel Virtualization interface
port-channel Port Channel interface
mds-9200-a(config-vsan-db)# vsan 20 interface fc1/20
mds-9200-a(config-vsan-db)#
To verify which interfaces belong to which VSAN, use the show vsan mem-
bership command without specifying a VSAN ID:
mds-9200-a(config)# sh vsan membership
vsan 1 interfaces:
fc1/1 fc1/2 fc1/3 fc1/4
output omitted
fc1/38 fc1/39 fc1/40
vsan 20 interfaces:
fc1/20
mds-9200-a(config)#
From the output, you can see that interface fc1/20 belongs to VSAN 20 and
the rest of the interfaces on the switch belong to the default VSAN (VSAN 1).
There are no interfaces in the isolated VSAN (VSAN 4094). This step ends the
sequence of commands for creating a VSAN, assigning interfaces to it, and
verifying the correct configuration.
To delete a VSAN, use the no vsan X command (you have to be in the VSAN
database to execute it):
mds-9200-a(config)# no vsan 20
^
% Invalid command at '^' marker.
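When the no vsan command is executed from the VSAN database, the switch asks for confirmation before removing the VSAN; the exact wording of the prompt varies between NX-OS releases, but the exchange looks similar to the following:

mds-9200-a(config)# vsan database
mds-9200-a(config-vsan-db)# no vsan 20
Do you want to continue? (y/n) [n] y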
After you confirm the deletion, the VSAN will be removed from the switch, and you can
verify this with the show vsan command:
vsan 4079:evfp_isolated_vsan
vsan 4094:isolated_vsan
mds-9200-a(config-vsan-db)#
You can see that there is no more VSAN 20 on the switch. But, what happened to the
interface that was assigned to it? When you check the membership of the interfaces, as
shown in Example 12-1, you will see that interface 1/20 now belongs to VSAN 4094 (the
isolated VSAN). Therefore, it cannot communicate with any other interface, and it is iso-
lated to avoid any wrong configuration being introduced into the default VSAN (VSAN 1).
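The membership check looks similar to the following (output condensed):

mds-9200-a(config-vsan-db)# show vsan membership
vsan 1 interfaces:
    fc1/1    fc1/2    fc1/3    fc1/4
output omitted
vsan 4094 interfaces:
    fc1/20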
mds-9200-a(config-vsan-db)#
This is done to force you to check the configuration. If you are sure there is nothing
potentially dangerous in the configuration of an interface, you can manually add it to
another VSAN. In this case, there is only one VSAN, the default one, and the interface
will be manually assigned to it:
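Assuming the same interface as in the previous examples, the manual reassignment looks similar to the following (the switch may ask for confirmation, because traffic on the port can be impacted):

mds-9200-a(config-vsan-db)# vsan 1 interface fc1/20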
Example 12-2 shows how to verify that the interface is no longer under the isolated
VSAN.
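The verification looks similar to this (output condensed), with interface fc1/20 listed again under the default VSAN:

mds-9200-a(config-vsan-db)# show vsan membership
vsan 1 interfaces:
    fc1/1    fc1/2    fc1/3    fc1/4
output omitted
    fc1/20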
mds-9200-a(config-vsan-db)#
The interfaces operating as F_Ports allow the communication of the VSAN to which they
belong. By default, this is VSAN 1. They can allow the communication of the traffic for
multiple VSANs when trunking is enabled for the interface. This means that the interface
will operate in TF_Port mode and will allow the port to process frames with tags from
multiple VSANs. The trunking F_Ports can be used when a non-NPV switch is connected
to an NPV switch. Both N_Port Virtualization (NPV) and N_Port ID Virtualization
(NPIV) are explained in Chapter 13, “Storage Virtualization.” That’s why this chapter
focuses on configuring the trunking on the E_Ports, which form the ISL link between
two Fibre Channel switches.
The ports that interconnect two Fibre Channel switches operate as E_Ports. The link that
is formed by them is called an Inter-Switch Link (ISL). The E_Port allows the communi-
cation for a single VSAN. If you need to allow the communication of multiple VSANs
between two Fibre Channel switches, trunking on the E_Ports must be enabled. The
E_Ports become Trunking Extension ports (TE_Ports). When the trunking is enabled for
an ISL, the link becomes Extended Inter-Switch Link (EISL).
The default setting for the trunk mode is enabled on all Fibre Channel interfaces (E, F, FL,
Fx, ST, and SD) on non-NPV switches. On the NPV switches, the trunk mode is disabled
by default. You can configure trunk mode as on (enabled), off (disabled), or auto (auto-
matic). The trunk mode configuration at the two ends of an ISL, between two switches,
determines the trunking state of the link and the port modes at both ends. Table 12-2
provides information for the possible combinations.
To configure the ISL, the two Fibre Channel interfaces need to be connected to each
other and configured to operate in E_Port mode. Example 12-3 provides an example of
such a configuration.
mds-9100-a# conf t
Enter configuration commands, one per line. End with CNTL/Z.
mds-9100-a(config)# interface fc1/2
mds-9100-a(config-if)# switchport mode ?
E E mode
F F mode
Fx Fx mode
NP NP port mode for N-Port Virtualizer (NPV) only
SD SD mode
auto Auto mode
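Based on these options, the E mode is then selected for the port:

mds-9100-a(config-if)# switchport mode E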
Then you can check the state of the port, as shown in Example 12-4.
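The port state at this stage looks similar to the following (output columns abbreviated):

mds-9100-a(config-if)# show interface fc1/2 brief
--------------------------------------------------------------------------------
Interface  Vsan   Admin  Admin   Status     SFP    Oper  Oper    Port     Logical
                  Mode   Trunk                     Mode  Speed   Channel  Type
                         Mode                            (Gbps)
--------------------------------------------------------------------------------
fc1/2      1      E      on      down       swl    --    --      --       --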
The port is down, as it is not enabled yet. The other side also needs to be configured and
enabled for the ports to come up and become operational.
From the output you can see that the port is in E mode and that the trunking is on, which
is the default setting.
There is also another very interesting piece of information that is provided in this
output—the port VSAN is 1, which means that the interface belongs to VSAN 1 at
this moment. However, there is no information about which VSANs are allowed if the
interface trunk mode is enabled. This can be checked with the following command:
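One way to check is with the show interface command, which includes the trunk configuration of the port; the output here is abbreviated and illustrative, assuming that VSANs 1 and 999 exist on the switch:

mds-9100-a(config-if)# show interface fc1/2
output omitted
    Trunk vsans (admin allowed and active) (1,999)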
The trunk mode is set to on, but the interface is not trunking because it is not enabled; it
didn’t create an ISL and negotiate the trunking with the opposite port.
When the same configuration is performed on the other switch and the two E_Ports are
enabled, the output will be different, as shown in Example 12-5.
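The output is abbreviated, but the relevant lines look similar to the following:

mds-9100-a(config-if)# show interface fc1/2
fc1/2 is trunking
    Port mode is TE
    Port vsan is 1
    Trunk vsans (admin allowed and active) (1,999)
    Trunk vsans (up)                       (1)
    Trunk vsans (isolated)                 (999)
    Trunk vsans (initializing)             ()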
From this output you can see that the ISL is up and trunking, which means Extended
Inter-Switch Link (EISL). The port operates in TE mode. This means that the two ports
negotiated the trunking because it is set to on by default on both sides.
Regarding the trunking, there is also the important information of which VSANs are
allowed to communicate over the EISL. By default, all the VSANs that exist on a Fibre
Channel switch are allowed on the EISL, but the same VSANs must also exist on the
other switch. Otherwise, as you can see from the output, the VSANs that do not exist
on the two switches will become isolated, or the frames tagged in these VSANs will not
be allowed to traverse the link. In the example, VSAN 999 is isolated, as it exists only on
one of the switches. That’s why only VSAN 1 is allowed over this EISL.
When the same VSAN is created on the other switch, the situation will change, and the
output in Example 12-6 shows that VSAN 999 is now up and trunking on the link as well.
There is an option to define which VSANs will be allowed over a specific port.
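This is done per interface with the switchport trunk allowed vsan command; for example, to allow only VSANs 1 and 999 on a port (the interface and VSAN IDs are illustrative):

mds-9100-a(config)# interface fc1/2
mds-9100-a(config-if)# switchport trunk allowed vsan 1
mds-9100-a(config-if)# switchport trunk allowed vsan add 999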
Another important note is that the examples and explanations are given using a two-
switch setup, but this does not mean that the fabric is limited to two switches. If you
remember from previous discussions, the theoretical limit of switches forming a
switched fabric is 239.
Zoning Overview
Fibre Channel zoning is the security mechanism in the Fibre Channel Protocol that controls
which initiators are allowed to access which targets:
■ The primary goal is to prevent certain devices from accessing other fabric devices.
Understanding zoning correctly is important for creating and deploying stable switched
fabrics. Oftentimes the “limitations” of the zoning are discussed, with the idea that it is
supposed to provide other functions such as load balancing (bandwidth allocation) or
redundancy.
Zoning was designed to be a simple and effective security mechanism only! It prevents
devices from communicating with other unauthorized devices. That’s it—nothing more,
nothing less.
Because of that, I do not like to discuss the “limitations” of the zoning mechanism.
Instead, I focus on what it is and how it can be used. When one understands what zoning
is, how it is implemented and operates, what its options are, and how to configure and
manage it, then the switched fabric is stable and secure.
The specific rules of engagement for the zones are as follows (see Figure 12-4):
■ Unless an initiator and a target belong to the same zone, they cannot see each other.
■ Zones can be overlapping, which means that an initiator or a target can be a member
of multiple different zones.
Figure 12-4 Fibre Channel Zones
Zoning can be enforced in two ways:
■ Hard zoning
■ Enforced as access control lists (ACLs) in the hardware of the Fibre Channel port.
■ The ACL rules are applied to the whole data path, which means that these limita-
tions are applied in the hardware of every switch port on the path.
■ This is the default mode of operation on Cisco Fibre Channel switches; there is
no need to change it to soft zoning.
■ Soft zoning
■ The name server responds to discovery queries only with the devices that are in
the zones of the requester.
The soft zoning is based on a mechanism that relies on the information the name server
returns to the end devices, when they log in to the fabric and try to discover it. Then the
FCNS returns a list of the devices that can be accessed by the end device. These are the
devices that are configured to be in the same zones with the end device that requested
this information. In other words, the end device will know only of the devices that the
name server told it about. Then it will be able to communicate with them, as it knows
their addresses from the information in the FCNS response. This means that if the FCID
of a different target becomes known to this end device, regardless of whether or not that
FCID was in the response from the name server, communication will be possible because
there is nothing else on the path to apply the limitations in the zoning configuration.
With hard zoning, the control is total, as the limitations in the zoning configuration are
applied to each of the ports on the communication path between an initiator and a target.
They are applied in the silicon of the ports and are enforced on each frame of the
communication.
Zoning Configuration
The configuration of the zoning on the Cisco MDS and Nexus switches is specific to
Cisco’s implementation of the Fibre Channel Protocol, and even though it is based on the
Fibre Channel Protocol standard, there’s still the need to have a more flexible approach
that also considers features such as the VSANs—something that did not exist on other
FCP-capable switches until recently.
The zone is a list of members. The members can communicate with each other. Because
the zone is an ACL with permit statements, when members are added, the system gener-
ates the needed amount of permit rules to cover all the possible communication between
the members. The formula to calculate the number of ACL entries based on the number
of members (n) is n*(n–1). This means that if there are eight initiators and one target, the
number of ACLs the switch will have to generate is 72! This is a huge number, and a lot of
the resources of the switch will be consumed. At the same time, if you have a zone that
has one initiator and one target as members, you can easily find out by using the same
formula that the number of ACLs needed is two! If you take the first example of eight
initiators and one target and convert that zone into eight separate one-to-one zones, one
for each initiator, with the target belonging to all of them, then you will have two ACLs
per zone, with eight zones, for a total of 16 ACLs. This is a significantly smaller number
than 72. Also, in that zone with eight initiators and one target, most of these permissions
will be controlling the communication between initiators, which is a waste of resources.
That’s why it is always recommended to have multiple one-to-one zones, then a few
zones, but each with multiple members. It is also true that in data centers with a huge
number of initiators and targets, this approach can be an administrator’s nightmare. To
help solve this challenge, Cisco has implemented a feature called Smart Zoning, but it’s
not covered in this book.
There are two common approaches to grouping members into zones:
■ Multi-initiator, single target: The addresses of multiple initiators and the address of
a single target are placed in the same zone. Multiple devices (initiators) are able to
access the same target. The drawback is that each such zone generates more ACLs,
which results in more hardware resources being used.
■ Single-initiator, single target: The address of a single initiator and the address of a
single target are placed in the same zone. A single device (initiator) is able to access a
single target. This results in optimal hardware resource utilization, but it’s an admin-
istrator’s nightmare in environments with a huge number of end nodes.
Members can be identified in a zone by different values, such as the following:
■ WWPN.
■ FCID.
■ IP address (iSCSI).
■ Device alias or Fibre Channel alias. (The Cisco MDS switches have the option to
specify an alias (an administrator-friendly name) that maps to the WWPN of the
end node.)
When the needed zones are created, they are combined into a group, which is called a
zoneset. On a switch there can be multiple zonesets, each containing different zones with
different configuration. The group of all the zonesets that exist on a switch is called a full
zoneset. Figure 12-5 illustrates the difference between a full zoneset and an active zoneset.
Figure 12-5 Full Zoneset and Active Zoneset
The next step is to enforce one of the zonesets. This is called “activating a zoneset.” A
copy of this zoneset is created that is read-only; in other words, it cannot be modified.
The copy is sent to all the switches that have ports belonging to the same VSAN. The
ACL rules from the zones in this zoneset are applied to the silicon of the ports on the
switches. Only one zoneset can be active per VSAN.
When there is a need to modify the active zoneset, changes are made in the original zone-
set (that is, the one that was activated). As you’ll remember, the active zoneset is a copy,
which means that for the changes to be enforced, a new activation is required. Then the
modified zoneset will be activated, and the currently active one will be deactivated.
This also means that if one zoneset has been activated and then you activate another
zoneset, the same thing will happen—the currently active zoneset will be deactivated and
the new one will become active.
To configure the zoning, your first task is to create the needed zones, and for that you
need to know the identities of the members you want to add to the zones. You will be
working with VSAN 999, and to find the initiators and targets in it, you use the sh fcns
data vsan 999 command, as shown in Example 12-7.
Example 12-7 Output from the FCNS Command for a Specific VSAN
VSAN 999:
--------------------------------------------------------------------------
FCID TYPE PWWN (VENDOR) FC4-TYPE:FEATURE
--------------------------------------------------------------------------
0x0c0000 N 21:00:f4:e9:d4:58:d7:88 scsi-fcp:init
0x360000 N 21:00:00:11:0d:40:af:00 scsi-fcp:target
There is no zoning-related configuration for VSAN 999. This also means that the initiator
and the target cannot see each other and communicate.
Once you make sure you have the correct initiator and target, the next step is to create a
zone. You will have to specify a name for the zone as well as the VSAN to which it will
belong:
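For example, assuming an illustrative zone name of Z_VSAN999 (any unique name can be used), the command looks similar to the following:

mds-9200-a(config)# zone name Z_VSAN999 vsan 999
mds-9200-a(config-zone)#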
This will take you into the zone configuration submode. Here you can add the members.
The identity options for adding members are shown in the output of Example 12-8.
mds-9200-a(config-zone)# member ?
device-alias Add device-alias member to zone
domain-id Add member based on domain-id,port-number
fcalias Add fcalias to zone
fcid Add FCID member to zone
fwwn Add Fabric Port WWN member to zone
interface Add member based on interface
ip-address Add IP address member to zone
pwwn Add Port WWN member to zone
symbolic-nodename Add Symbolic Node Name member to zone
Based on the output of the show fcns database vsan 999 command, the WWPNs are
known for the initiator and the target. They will be added as members of this zone:
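Using the WWPNs from the FCNS output in Example 12-7, the members are added similar to the following:

mds-9200-a(config-zone)# member pwwn 21:00:f4:e9:d4:58:d7:88
mds-9200-a(config-zone)# member pwwn 21:00:00:11:0d:40:af:00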
Verify that the zone is created with the show zone vsan X command:
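The output looks similar to the following (the zone name Z_VSAN999 is the illustrative one used earlier):

mds-9200-a(config-zone)# show zone vsan 999
zone name Z_VSAN999 vsan 999
  pwwn 21:00:f4:e9:d4:58:d7:88
  pwwn 21:00:00:11:0d:40:af:00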
At this stage, a zone has been created. In order for you to work with it, it needs to
become a member of a zoneset; even if there is only a single zone, it still needs to be in a
zoneset. When a zoneset is created, a name and the VSAN need to be specified:
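Continuing with the illustrative names, the zoneset is created and the zone is added to it as a member similar to the following:

mds-9200-a(config)# zoneset name ZS_VSAN999 vsan 999
mds-9200-a(config-zoneset)# member Z_VSAN999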
Next, verify that the zoneset is created and the zone is a member of it:
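The verification looks similar to this:

mds-9200-a(config-zoneset)# show zoneset vsan 999
zoneset name ZS_VSAN999 vsan 999
  zone name Z_VSAN999 vsan 999
    pwwn 21:00:f4:e9:d4:58:d7:88
    pwwn 21:00:00:11:0d:40:af:00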
So far, a zone has been created and members have been added to it. Then, this zone was
added as a member of the newly created zoneset. However, is it active in the VSAN? To
check, use the following command:
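The check is made with the show zoneset active command, and at this stage the output looks similar to the following:

mds-9200-a(config)# show zoneset active vsan 999
Zoneset not present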
There is no currently active zoneset for VSAN 999. To activate the zoneset you just cre-
ated, use the following command:
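Assuming the illustrative zoneset name ZS_VSAN999, the activation looks similar to the following (the informational message may differ slightly between NX-OS releases):

mds-9200-a(config)# zoneset activate name ZS_VSAN999 vsan 999
Zoneset activation initiated. check zone status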
mds-9200-a(config)#
The switch also informs you that the zoneset was activated. Now, let’s verify it on both
the switches that communicate in VSAN 999 (see Example 12-9).
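The verification with the show zoneset active command looks similar to the following on the two switches; the FCIDs and WWPNs correspond to the FCNS output in Example 12-7, while the zone and zoneset names are the illustrative ones used in this section:

mds-9200-a(config)# show zoneset active vsan 999
zoneset name ZS_VSAN999 vsan 999
  zone name Z_VSAN999 vsan 999
  * fcid 0x0c0000 [pwwn 21:00:f4:e9:d4:58:d7:88]
  * fcid 0x360000 [pwwn 21:00:00:11:0d:40:af:00]

mds-9100-a# show zoneset active vsan 999
zoneset name ZS_VSAN999 vsan 999
  zone name Z_VSAN999 vsan 999
  * fcid 0x0c0000 [pwwn 21:00:f4:e9:d4:58:d7:88]
  * fcid 0x360000 [pwwn 21:00:00:11:0d:40:af:00]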
■ The zoneset was copied and then sent to and enforced on the switch mds-9100-a,
even though no zoning configuration was performed on it.
■ On the mds-9100-a switch, the full zoneset is empty, as no zoning configuration was
performed on it. It only has a copy of the zoneset that was activated on the other switch.
mds-9100-a(config)# sh zoneset vsan 999
Zoneset not present
mds-9100-a(config)#
■ An asterisk (*) appears in front of the members in the zone in the active zoneset,
which means that the members are connected and online in the VSAN and the zone.
They will be able to communicate with each other.
This is the flow of configuring zoning on the Cisco MDS/Nexus switches. The configura-
tion was performed in basic zone mode for VSAN 999.
To see the attributes and settings for the zoning in a VSAN, use the sh zone status vsan
999 command, as shown in Example 12-10.
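The output is abbreviated here and the exact fields vary between NX-OS releases, but it looks similar to the following:

mds-9200-a(config)# show zone status vsan 999
VSAN: 999 default-zone: deny distribute: active only Interop: default
    mode: basic merge-control: allow
    session: none
output omitted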
When the zoning for a VSAN is set to basic mode, the configuration can be made simul-
taneously from any switch in this VSAN. Also, it can simultaneously be activated in dif-
ferent zonesets from different switches. This can cause a serious misconfiguration, which
can also lead to a loss of data.
To avoid such a situation and to address full zoneset database consistency across the
switches, you can use enhanced zone mode. In this mode, you perform all configura-
tions within a single configuration session. When a session begins, the switch advertises
a lock to all switches in the entire fabric for the specific VSAN. The lock does not allow
any zoning configuration for this VSAN to be performed on any other switch in the same
VSAN. Once you have finished with the configuration and are sure it is correct, you need
to perform a commit with the zone commit vsan command. The commit will write the
configuration to the local full zoneset database and will synchronize it with the rest of
the switches. This approach ensures consistency within the fabric.
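The commit is a single command per VSAN; for example:

mds-9200-a(config)# zone commit vsan 999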
In basic zoning, even with distribute full enabled, it is possible that the full zone database
is different among switches. With enhanced zoning, it is not possible to change only the
local full zoning database, as shown in Figure 12-6.
Figure 12-6 Enhanced Zoning: Fabric-wide Commit
To change the zone mode, use the zone mode enhanced vsan 999 command, as shown
in Example 12-11.
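The zone mode is changed per VSAN with a single command; because enhanced mode distributes the zoning database fabric-wide, the switch may ask for confirmation before proceeding:

mds-9200-a(config)# zone mode enhanced vsan 999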
You can verify this using the command shown in Example 12-12.
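A quick way to verify is to filter the zone status output for the mode line; the output is abbreviated and its exact layout varies between releases:

mds-9200-a(config)# show zone status vsan 999 | include mode
    mode: enhanced merge-control: allow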
Zoning Management
When managing the zoning configuration, you need to be aware of some additional
guidelines, options, and recommendations. Therefore, before working on the zoning con-
figuration, consider the following:
■ Each VSAN can have multiple zonesets, but only one zoneset can be active at any
given time.
■ When you create a zoneset, that zoneset becomes a part of the full zoneset.
■ When a zoneset is activated, a copy of the zoneset from the full zoneset is used to
enforce zoning, and it’s called the active zoneset.
■ You can modify the full zoneset even if a zoneset with the same name is active. The
changes do not take effect until the zoneset is activated with the zoneset activate
name command.
■ During the activation, the active zoneset is stored in the persistent configuration.
This process allows the switch to preserve the active zoneset information across
switch resets. There is no need to use the copy running-config startup-config com-
mand to store the active zoneset. However, you need to save the running configura-
tion of the switch to store the full zoneset.
■ All other switches in the fabric receive the active zoneset so they can enforce zoning
in their respective switches.
■ An FCID or N_Port that is not part of the active zoneset belongs to the default zone.
The default zone information is not distributed to other switches.
■ If one zoneset is active and you activate another zoneset, the currently active zoneset
is automatically deactivated. You do not need to explicitly deactivate the currently
active zoneset before activating a new zoneset.
Two types of zones are supported on the Cisco MDS switches. Up until now, we have
discussed the regular zones. The other type is the default zone.
The default zone always exists for each VSAN on the Cisco Fibre Channel switches. All
the end devices that connect to a VSAN and are not added to a zone are automatically
put in the default zone. As the zoning is a security mechanism to allow communication
between specific initiators and targets, adding them automatically to a zone might pose
a significant risk. That’s why the communication policy inside the default zone is set to
“deny” by default. This means that every end node in the default zone will be isolated.
At this point, the logical question is, why is the default zone needed if the devices will
still be isolated until they become members of a zone? The answer is related to the design
and scale of your SAN infrastructure. If you are dealing with a very small infrastructure
with just a couple of initiators and a target, for example, you can set the policy
for the default zone to “allow,” and the communication of all the devices connected to
your switched fabric will be automatically allowed.
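The default zone policy is changed per VSAN with the zone default-zone command; for example, to permit communication inside the default zone of VSAN 999 (removing the command with the no form returns the policy to the default of deny):

mds-9200-a(config)# zone default-zone permit vsan 999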
This is still a very risky approach, and it is recommended that you create a specific zoning
configuration, no matter how small your environment is. Also, you should always set the
default zone policy to deny.
Another interesting feature of the Cisco MDS zoning configuration is that you can con-
trol exactly what access the initiators will have to the target in a zone. By default, when
an initiator and a target belong to a zone, and it is active, the initiator has read and write
access. It can retrieve data from the storage system, and it can send data to be stored. But
there might be a situation where you want a certain initiator (or initiators) to only be able
to read data from a target. Then you will put the initiator(s) in a zone and will set the zone
as read-only with the command attribute read-only in the zone configuration mode, as
shown here:
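For example, with an illustrative zone name of Z_REPORTS_RO, the configuration looks similar to the following:

mds-9200-a(config)# zone name Z_REPORTS_RO vsan 999
mds-9200-a(config-zone)# attribute read-only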
As already mentioned, the zones belong to a VSAN, or each VSAN has its own zoning
configuration. Also, if a VSAN with the same VSAN ID exists on two Fibre Channel
switches and they are not connected to each other, these two VSANs are actually sepa-
rate, different VSANs, regardless of whether you see the same VSAN ID. This also means
that each of the VSANs will have its own zoning configuration—one or multiple zonesets
with one or multiple zones per zoneset. It is interesting to see what happens when a link
is created between these two switches, because then the two separate VSANs with the
same VSAN ID will connect and will become one VSAN. In other words, a zone merge
will occur between the zone servers of the two VSANs. This is the process where two
zone servers need to compare the zones present in each of the VSANs and make sure
the zones and the zonesets are the same (that is, the zones have the same members). The
question here might be, why do the zone members have to be the same? Or, why do the
zones have to be the same? The answer is that zoning is a security feature, and the goal is
always to stay in control. If the zones were merged without this check, all the zones from
the two VSANs would be merged together, suddenly allowing communication that is not
supposed to happen, which can lead to serious consequences. Figure 12-7 illustrates the
zone merge.
So, the zone merge will happen when the two switches are connected with a link. If the
link is an ISL, which means that a single VSAN is allowed to communicate, the zone
merge will happen only for this VSAN.
If the switches are connected with an EISL, allowing multiple VSANs, the zone merge
will occur for each of the VSANs. Some VSANs might merge their zones successfully,
but for others there might be different zoning configurations and the merge will fail.
These VSANs will become isolated, and no traffic will be allowed to cross the EISL for
these VSANs.
Figure 12-7 Zone Merge: When the Full Zonesets Match, the Merge Succeeds; When the Zone Members Differ, the Merge Fails and the Traffic Is Isolated
To recover from a zone merge failure, you basically have three options, all of them
manual:
■ Import the full zoneset from the neighboring fabric, overwriting the local full zoneset.
■ Export the local full zoneset to the neighboring fabric, overwriting its full zoneset.
■ Manually edit the zoning configuration on one of the fabrics so that the full zonesets
match, and then bring the link up again.
To import or export the full zoneset, the command is zoneset {distribute | export |
import interface { fc slot-number | fcip interface-number | port-channel port-number}}
vsan vsan-id.
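For example, to import the full zoneset from the neighboring fabric over the ISL on interface fc1/2 for VSAN 999 (the interface and VSAN values are illustrative), the command looks similar to this:

mds-9200-a(config)# zoneset import interface fc1/2 vsan 999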
Summary
This chapter covers the Fibre Channel Protocol zoning and the Cisco-developed virtual
SAN feature, which allow for the secure and flexible management and control of SANs
and for overcoming the limitations in the protocol for scalable and virtualized data center
environments. In this chapter, you learned about the following:
■ VSAN traffic isolation is deployed in the hardware of the Fibre Channel ports on the
Cisco MDS switches.
■ The adoption of a VSAN allows you to better utilize your SANs and avoid having
separate physical siloed SANs.
■ The VSAN has a name, ID, operational and administrative states, and a load-
balancing scheme.
■ VSAN 1 is the default VSAN and cannot be deleted, but it can be suspended.
■ When a VSAN is deleted, the ports that belonged to it are moved to the isolated
VSAN to avoid introducing a dangerous configuration.
■ VSAN IDs 2–4078 and 4080–4093 can be used for VSAN assignment.
■ An ISL allows a single VSAN. When multiple VSANs are allowed, the ISL becomes
an EISL and the E_Ports become trunking E_Ports.
■ The traffic for a VSAN is tagged in the Fibre Channel Protocol frames.
■ To configure a VSAN, you must enter the VSAN database from the switch configu-
ration mode.
■ Once a VSAN is created, the next step is to assign ports to become members.
■ To check which ports are members of which VSAN, use the show vsan membership
command.
■ The command show vsan gives information for the VSANs existing on the current
switch.
■ In order for a link to become trunking, it must have the same configuration on both
sides.
■ If a VSAN is not allowed on one of the sides of a trunking link, it becomes isolated.
■ Fibre Channel zoning is a security mechanism that controls which initiators are
allowed to access which targets.
■ The Fibre Channel zoning configuration is mandatory because the Fibre Channel
Protocol will not allow any communication without it.
■ Fibre Channel zoning is created per VSAN, and it is only locally significant for the
VSAN.
■ When Fibre Channel zoning is configured, you first create zones and then add mem-
bers to them. Then you combine them into zonesets. The last step is to apply one of
these sets, and it becomes the active zoneset.
■ All the zonesets on a switch for a VSAN form the full zoneset for this VSAN.
■ Only a copy of the zoneset that is activated is distributed among the switches that
communicate in the same VSAN.
■ Changes are made in the full zoneset. Changes cannot be made in the active zoneset.
■ Members are added to a zone based on different identity values, such as WWNs,
FCID, device alias, and so on.
■ In basic mode zoning, the configuration can be changed at the same time from dif-
ferent switches and activated. This can lead to misconfiguration.
■ In enhanced zone mode, when you start to change the configuration of a full zone-
set, the configuration session lock is applied to all the switches for the specific
VSAN.
■ Once the changes are complete, you must commit them in order for them to be
implemented in the configuration of the switch and distributed to the rest of the
switches.
■ The command attribute read-only means the initiators are only allowed read access
to the targets in a zone.
■ By default, the internal communication policy for the default zone is deny.
■ During zone merges, the full zonesets on the merging switched fabrics must be the
same; otherwise, the merge will fail.
■ To recover from a merge fail, you can import one of the full zonesets to the other
fabric, export it, or manually fix the configuration.
Reference
“Storage Area Networking (SAN),” https://www.cisco.com/c/en/us/products/storage-
networking/index.html
Chapter 13
Storage Virtualization
When a storage area network (SAN) is designed and built, scalability and reliability are
fundamental. The Fibre Channel standards, as defined by the ANSI T11 committee,
impose some limitations; for example, the maximum theoretical number of supported
domain IDs is 239 per fabric or VSAN. As each Fibre Channel switch is identified by
a single domain ID, this limitation also puts a theoretical upper limit on the number of
switches per fabric. This directly affects the scalability of the SAN.
Cisco has validated stable and reliable performance in a fabric with up to 100 domain IDs
(that is, switches), while other vendors have tested 40 or do not specify a number. As a
result, the practical maximum number of switches per fabric is 80, if the fabric is built
with Cisco MDS/Nexus switches. Also, the number 100 is far from the theoretical maxi-
mum of 239. Figure 13-1 illustrates the scalability challenge with the SAN infrastructures
in the data center.
When you think about the different physical designs for the SAN in the data center, you
must take into account that the blade switches and the top-of-rack access layer switches
that are running the Fibre Channel protocol consume domain IDs as well. Additionally,
even if you run the Fibre Channel communication over Ethernet, meaning that you use
FCoE, the switches will still consume domain IDs.
As data centers grow, the number of devices in a converged fabric increases, and the
operational and management issues become more complex. In fact, the whole data center
design becomes more complex. This means that having a limitation such as the maximum
number of supported domain IDs is not good for the future growth and the adoption of
the SAN.
Another challenge is the growing virtualization in the data center. The need to utilize
resources at maximum capacity and the added need for flexibility makes virtualization a
de facto standard requirement for data center design.
Figure 13-1 The SAN Scalability Challenge in the Data Center
To respond to these challenges, storage virtualization technologies are used, such as run-
ning the storage switch in N_Port Virtualization (NPV) mode and using the N_Port ID
Virtualization (NPIV) feature.
In this chapter, you will learn about the fundamentals of storage virtualization technolo-
gies, what you can use them for, which Cisco devices support them, and how to config-
ure and verify them.
And one very important item of note: N_Port ID Virtualization (NPIV) is a feature on
the Cisco MDS/Nexus switches. You either enable it or disable it; it’s not disruptive for
the operation of the switch. On the other hand, N_Port Virtualization (NPV) is a mode in
which the switch operates. Changing between the Fibre Channel switching mode and the
Fibre Channel NPV mode is disruptive. The configuration must be saved before changing
the mode, and the switch will go through a restart. From a designer’s perspective and also
from the perspective of data center monitoring and support, such a change needs to be
planned for and executed during a maintenance window.
In a typical physical design, there is a server, or the initiator, that connects to a Fibre
Channel switch through its host bus adapter (HBA). There is a physical link between the
HBA Fibre Channel port and a Fibre Channel port on the switch. You already know about
the N_Port to F_Port connectivity. So, there is a single physical N_Port connected to a
single physical F_Port. And the Fibre Channel processes run as follows:
1. FLOGI: First, the N_Port performs a fabric login to the switched fabric through the
F_Port and receives its unique FCID.
2. PLOGI: Then the N_Port logs in to the N_Port with which it will communicate.
3. PRLI: The two N_Ports exchange the needed information for the supported upper-
layer protocols (ULPs) to ensure the target and the initiator can successfully
communicate.
If you look at the Fibre Channel ID (FCID), you have the domain ID of the switch to
which the server is connected and the port ID. If you remember, the port ID is a unique
N_Port-to-F_Port reference between physical ports.
However, in the data center there is now a lot of virtualization. Actually, when it comes
to the servers, it is all virtualization, with some minor exceptions when there are specific
requirements by the application to run on a bare-metal server, or in other words, to use its
own dedicated physical server.
In the case of virtualization, on the top of the physical server you run a specialized oper-
ating system, called a hypervisor, that creates virtual images of the physical resources,
and then you utilize them to create protected, isolated environments called virtual
machines (VMs) to run your applications in.
Let’s go back to the example from the beginning of this section and now add virtualiza-
tion to the server. This means that the communication of multiple different VMs, running
on top of it, will go through a single physical N_Port to the F_Port. Fine, but is this going
to be possible, as now there are multiple N_Ports that would like to log in to the fabric
and get their unique FCIDs? The domain ID will be the same (the area ID is not taken
into account), and the port ID will be the same as well, because all of these virtual N_
Ports communicate through the same physical N_Port. Figure 13-2 illustrates the N_Port
challenge in a virtualized environment.
Figure 13-2 The N_Port Challenge in a Virtualized Environment: Multiple VN_Ports Communicate Through a Single Physical N_Port, So the FC Switch Cannot Create Unique FCIDs
To solve this problem, N-Port ID Virtualization (NPIV) was developed (see Figure 13-3).
It allows the switch to assign multiple N-Port IDs or FCIDs to a single Fibre Channel host
connection (N-Port). This is possible because, on the server side, the NPIV-capable HBA
creates and assigns multiple virtual N_Ports to the VMs, and the virtual N_Ports have
unique World Wide Port Name (WWPNs) addresses and communicate through the same
physical N_Port. Because of that, the switch will see multiple different WWPNs, which
will allow it to create multiple different N_Port-to-F_Port references. Based on that, the
switch will be able to generate and assign the needed unique FCIDs even if the traffic for
all the VMs goes through the same physical N_Port.
Figure 13-3 NPIV: Each VN_Port Logs In with a Unique WWPN and Receives a Unique FCID Through the Same Physical N_Port
The device login process starts normally, with an initial FLOGI. Then for all subsequent
logins, the fabric discovery (FDISC) login process is used.
The NPIV feature is available in NX-OS and is supported on a range of Cisco data center
switches.
Support for the NPIV feature on the various Cisco data center products dynamically
changes. For the latest information, check the data sheets of the products on the Cisco
Systems website.
The command to enable this feature in the Cisco NX-OS is global. First, you must go into
the configuration mode and then to enable NPIV, use the feature command, as shown in
Example 13-2.
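Enabling the feature looks similar to the following:

mds-9200-a# config t
Enter configuration commands, one per line. End with CNTL/Z.
mds-9200-a(config)# feature npiv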
To verify that the NPIV feature is enabled, use the same show feature command as
before (see Example 13-3).
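The verification looks similar to the following (output filtered and abbreviated):

mds-9200-a(config)# show feature | include npiv
npiv                  1         enabled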
So, the NPV Edge switch is accepted as an end node, as it does not run the needed Fibre
Channel protocol services and does not join the switched fabric as a switch. However, from the perspec-
tive of the initiators and the targets, the real end nodes that connect to it, it is still a Fibre
Channel switch. There is still the communication between the N_Ports of the servers and
the storage systems as well as the F_Ports (the Fibre Channel ports on the NPV Edge
switch) to which the nodes connect. Figure 13-4 shows the NPV Core-Edge
topology.
Figure 13-4 The NPV Core-Edge Topology
The Fibre Channel ports on the NPV Edge switch facing the nodes are operating as
F_Ports, but the switch itself, as it does not run the needed services, and especially the
FLOGI and the FCNS services, can’t support the FLOGI requests from the N_Ports.
However, the NPV Core switch, as it is running the needed services and is part of the
switched fabric, can perform the FLOGI. It can also generate and assign the needed
FCIDs, as it has the NPIV enabled. The link between the end nodes and the NPV Core
switch happens as the F_Ports on the NPV Edge switch are mapped to the Fibre Channel
ports that connect to the NPV Core switch. The NPV Edge switch does not switch the
Fibre Channel frames from the nodes to the core switch, but it proxies them to the Fibre
Channel port connected to the NPV Core switch. That’s why this port operates as the
NP_Port, which stands for Node Proxy Port. And why do you need to have the NPIV
feature enabled on the NPV Core switch? Because, just like with the virtualized server
connected to the Fibre Channel switch, the NPV Edge switch will present multiple N_
Ports communicating through the NP_Port. In this way, the NPV Core switch will solve
the same challenge and will be capable of assigning multiple different FCIDs. The NPV
port roles are shown in Figure 13-5.
Figure 13-5 NPV Port Roles
The NPV Edge switches support the F_ and VF_Ports for the connectivity of the end
nodes. The latter are used when there is no native Fibre Channel physical connectivity,
and a protocol such as the FCoE is used to carry the Fibre Channel frames over Ethernet
connectivity. And because the FCoE protocol was mentioned, it is important to make
another note here. Sometimes there is a bit of confusion because of the impression that
the terms FCoE and NPV are interchangeable. They are not. NPV refers to the mode in
which the switch operates from the perspective of the Fibre Channel protocol. FCoE
is a way of encapsulating the Fibre Channel frames in Ethernet frames to be capable
of transmitting them over an Ethernet network. It’s important to realize that the FCoE
switch might operate in Fibre Channel switching mode or in NPV mode, if supported. As
a quick summary, the NPV Edge switch has these characteristics:
■ It relies on the NPV Core switch to provide the needed Fibre Channel protocol services.
■ It does not consume a domain ID in the switched fabric.
■ It appears as a switched fabric switch to the initiators and targets connected to it;
that is, it presents F_Ports to them.
For the purposes of keeping the explanation simple up until now, the link between the
NPV Edge and Core switches was explained as one between an NP_Port and an F_Port.
In the data center it is very rare to have a single link between two network devices, as
this represents a bottleneck and also a single point for a failure. That’s why usually there
are multiple links between the NPV Edge and Core switches. Depending on the NPV
Edge device, there are different approaches for how the F_Ports are mapped to the avail-
able NP_Ports. Usually this happens automatically, as the switch maps every new F_Port
to an NP_Port, selecting either the next NP_Port in a round-robin fashion or the least
busy one.
There is also the option to use manual mapping through a traffic map, which is a compo-
nent from the configuration. This allows the administrator to manually configure which
F_Ports will communicate upstream through which NP_Ports.
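For example, on an NPV Edge switch, a traffic map that forces the server-facing port fc1/1 to use the uplink fc1/2 (the interface numbers are illustrative) looks similar to the following:

mds-9100-a(config)# npv traffic-map server-interface fc1/1 external-interface fc1/2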
The NPV mode of operation is supported on a range of Cisco data center switches.
Like NPIV, NPV is enabled as a feature of NX-OS.
However, do not forget that by enabling this “feature,” you actually change the mode,
and the switch will need to be reloaded to start operating in the new mode. That’s why
you should first check whether or not the feature is enabled (see Example 13-4).
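The check looks similar to the following (output filtered; the filter matches both the npiv and npv features):

mds-9100-a# show feature | include npv
npiv                  1         disabled
npv                   1         disabled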
The second step is to enable the NPV (see Example 13-5). This will be disruptive, as the
NX-OS will warn you.
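Enabling NPV looks similar to the following; the warning text varies slightly between releases, but the meaning is the same:

mds-9100-a(config)# feature npv
Verify that boot variables are set and the changes are saved. Changing to npv mode erases the current configuration and reboots the switch in npv mode. Do you want to continue? (y/n):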
After you confirm that you want to continue, the switch will reload itself, which takes a
couple of minutes.
On the upstream Fibre Channel switch (the NPV Core switch), the NPIV feature needs to
be enabled (see Example 13-6).
mds-9200-a# conf t
Enter configuration commands, one per line. End with CNTL/Z.
mds-9200-a(config)# feature npiv
The next step on the NPV Edge switch is to configure the NP ports (see
Example 13-7).
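The NP ports are the uplinks toward the NPV Core switch; based on the interfaces used later in this chapter, the configuration looks similar to the following:

mds-9100-a(config)# interface fc1/2-3
mds-9100-a(config-if)# switchport mode NP
mds-9100-a(config-if)# no shutdown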
You also need to configure the ports to which the end nodes connect to operate in F mode
(see Example 13-8).
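The server-facing port is configured similar to the following:

mds-9100-a(config)# interface fc1/1
mds-9100-a(config-if)# switchport mode F
mds-9100-a(config-if)# no shutdown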
On the NPV Core switch, the ports that connect to the NP ports have to be configured
as F_Ports (see Example 13-9).
Example 13-9 Configuring the NPV Core Switch Ports Connecting to the NPV
Edge Switch
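A minimal sketch of this step, assuming fc1/2 is the Core switch port connected to the Edge switch, as in the interface status output that follows:
mds-9200-a(config)# interface fc1/2
mds-9200-a(config-if)# switchport mode F
mds-9200-a(config-if)# no shutdown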
Then, the ports will come up on the Core switch, as shown in Example 13-10.
--------------------------------------------------------------------------------
Interface Vsan Admin Admin Status SFP Oper Oper Port Logical
Mode Trunk Mode Speed Channel Type
Mode (Gbps)
--------------------------------------------------------------------------------
fc1/1 1 auto on down swl -- -- -- --
fc1/2 1 F on up swl F 16 -- edge
The same will happen on the NPV Edge switch, as shown in Example 13-11.
--------------------------------------------------------------------------------
Interface Vsan Admin Admin Status SFP Oper Oper Port Logical
Mode Trunk Mode Speed Channel Type
Mode (Gbps)
--------------------------------------------------------------------------------
fc1/1 1 F off up swl F 16 -- edge
The output of the show interface brief command shows that the interface connected to
the initiator operates in F mode and is up and that the interfaces connected to the NPV
Core switch are operating in NP mode and are also up.
By using the command show npv status on the Edge switch, you can check the mode of
operation for the external interfaces and the initiator-facing ones (see Example 13-12).
npiv is disabled
External Interfaces:
====================
Interface: fc1/2, VSAN: 1, FCID: 0x0a0600, State: Up
Interface: fc1/3, VSAN: 1, FCID: 0x0a0500, State: Up
Server Interfaces:
==================
Interface: fc1/1, VSAN: 1, State: Up
On the NPV core switch, you can check the FLOGI and FCNS databases to make sure
the NPV Edge switch logged in to the switched fabric as an NPV device and also that the
server connected to it was able to successfully log in to the fabric through the NPV
Core switch (see Example 13-13).
Example 13-13 Verify Successful Fabric Login on the NPV Core Switch
VSAN 1:
--------------------------------------------------------------------------
FCID TYPE PWWN (VENDOR) FC4-TYPE:FEATURE
--------------------------------------------------------------------------
0x0a0500 N 20:03:00:de:fb:ce:d9:00 (Cisco) npv
0x0a0600 N 20:02:00:de:fb:ce:d9:00 (Cisco) npv
0x0a0601 N 21:00:f4:e9:d4:58:d7:88 scsi-fcp:init
Back on the Edge switch, you can see that the show npv command has a couple of useful
options (see Example 13-14).
The flogi-table option provides information for the end nodes that reached the Core
switches through the Edge switch in order to log in to the fabric. It also shows the
mapping between the F_Port and the NP_Port (see Example 13-15).
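The command itself is simple (the exact columns in the output vary slightly by release):
mds-9100-a# show npv flogi-table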
The option external-interface-usage shows the mapping between the server interface and
the upstream interface. It also allows you to search based on a specific server interface
(that is, F_Port), as shown in Example 13-16.
Example 13-16 Server-Facing Port to Egress Port Mapping on the Edge Switch
----------------------------------------
Server Interface        External Interface
----------------------------------------
fc1/1                   fc1/2
----------------------------------------
As you can see, there are enough tools to provide the needed information for the status
and the operation of both the Core and Edge NPV switches, as well as the communica-
tion of the connected end nodes. Here are some of the benefits of NPV mode:
■ Simplified configuration on the NPV Edge switches, as they do not participate in the
SAN fabric. The configuration is limited to configuring the interfaces, the VSANs,
and traffic maps, if needed.
■ The NPV Edge switches utilize fewer resources, which allows for better and more reli-
able operation. Especially if these are Ethernet switches on which you run FCoE, the
freed resources can support other services.
Summary
This chapter reviewed the Fibre Channel protocol features, which allow for the use of
storage virtualization technologies to overcome limitations in the protocol for scalable
and virtualized data center environments. In this chapter you learned about the following:
■ The scalability of the SAN is limited by the theoretical maximum of 239 domain IDs
per fabric, which can be translated to a maximum number of 239 switches.
■ The largest stable switched fabric achieved by Cisco uses 80 domain IDs.
■ A virtualized physical server also faces the challenge of logging its VMs in to the switched
fabric, due to the requirement for a unique N_Port-to-F_Port reference for the gen-
eration of an FCID.
■ The N_Port ID Virtualization (NPIV) feature solves the virtual server challenge.
■ NPIV-capable HBAs are capable of using a separate WWPN for the N_Ports of
the VMs.
■ The NPIV allows the switch to see a unique N_Port for each VM communicating
through a single physical N_Port.
■ N_Port Virtualization (NPV) mode solves the challenge of limited domain IDs in a
switched fabric.
■ The switch that runs in NPV mode is called an NPV Edge switch.
■ The NPV Edge switch does not run the core Fibre Channel protocol services and
does not log in to the switched fabric.
■ The NPV Edge switch connects to an upstream Fibre Channel switch with the Fibre
Channel protocol services running.
■ The NPV Edge switch proxies the communication of the end nodes to the NPV
Core switch, where they can log in to the switched fabric.
■ The end nodes still see the NPV Edge switch as a Fibre Channel fabric switch and
connect to F_Ports on it.
■ Inside the NPV Edge switch, the traffic entering through the F_Ports is not processed
locally; it is mapped to the ports that connect to the upstream NPV Core switch.
These are the NP_Ports, or Node Proxy ports.
■ NPV is a mode in which the edge switch operates, and it is disruptive to enable it.
■ The command show npv allows one to gain valuable information on the status of
the NPV mode running on the switch, including the NP_Ports, F_Ports, the mapping
between them, and more.
References
“Storage Area Networking (SAN),” https://www.cisco.com/c/en/us/products/storage-networking/index.html
“Cisco MDS NX-OS Configuration Limits,” https://www.cisco.com/c/en/us/td/docs/dcn/mds9000/sw/9x/configuration/configuration-limits/cisco-mds-9000-nx-os-configuration-limits-9x.html
Chapter 14
The previous chapters discussed storage area networks (SANs) and the requirements for a
secure and reliable environment that can support the communication between initiators
and targets. The Fibre Channel Protocol (FCP) was selected as the transport for this type
of communication precisely because it is built to provide secure and reliable transmis-
sion. The credit-based flow control in the FCP, which is controlled by the receiving side,
is the basis of the reliable, secure, and lossless transmission of the FC frames. The FCP
runs on specially designed Fibre Channel switches, such as the Cisco MDS family, which
provides the needed capacity and resources to support it. Building and maintaining
SANs is expensive and does not always provide the needed flexibility and scalability
at the required cost. That's why other ways were developed to carry the Fibre Channel
frames over other infrastructures using less reliable protocols such as Ethernet. Fibre
Channel over Ethernet (FCoE) is one such protocol; it uses Ethernet as a transport and
carries the Fibre Channel frames over the local area networks
in the data centers. The FCoE protocol will be discussed in the next chapter. However,
before we discuss the protocol, it is important to understand how the Ethernet LANs
were modified to support the highly demanding communication of the Fibre Channel
Protocol.
This chapter will cover the IEEE Data Center Bridging extensions, which add the needed
functionality and features to the Ethernet protocol to allow for the secure and reliable
transmission of the Fibre Channel frames.
These enhancements might have been developed for the needs of the FCoE protocol, but
nowadays, based on them and the resulting reliable Ethernet infrastructure, new protocols
for accessing data between devices are supported, such as RoCE, which stands for Remote
Direct Memory Access (RDMA) over Converged Ethernet. This is a very new protocol, and Cisco
Systems has announced that certain models of the Nexus 9000 switches will support it.
This technology will be adopted in data centers in the near future.
The Cisco DCB architecture is based on the IEEE 802.1 DCB standards, with some further improve-
ments. The IEEE DCB includes the following standards, which enable the FCoE transport of
inherently lossless traffic over an infrastructure that is inherently susceptible to loss, as
Ethernet communication is:
■ IEEE 802.1Qbb Priority Flow Control (PFC): Lossless delivery for selected types
of traffic, based on their priority.
■ IEEE 802.1Qaz Data Center Bridging Exchange (DCBX): Protocol for exchang-
ing parameters between DCB devices. An extension of the Link Layer Discovery
Protocol (LLDP).
The IEEE 802.1AB LLDP is also worth mentioning, as the DCBX protocol uses it as
a means to communicate and negotiate FCoE capabilities between the participating
devices. The DCBX protocol uses specific LLDP type, length, value (TLV) parameters
to pack the FCoE parameters, and LLDP is used for the exchange. Because of that, even
though LLDP is not mentioned in the preceding list of DCB specifications, it needs to be
enabled, as it is used by the DCBX protocol.
Figure 14-1 Fibre Channel Credit-Based (BB_Credit) Flow Control Between a Transmitting Port and a Receiving Port
In the SAN, the only communication that takes place is the Fibre Channel Protocol com-
munication. Over a link between an N_Port and an F_Port, there is only communication
to or from the initiator, and it is only block-based data. In other words, the link is not
divided by different types of communication. It is dedicated to this communication.
In Ethernet, a mechanism known as IEEE 802.3X defines the native flow control
at the level of the link. It governs when Ethernet frames can be transmitted over a link:
in the case of congestion, the receiving side can send a pause frame instructing the
transmitting side to stop sending frames, as there are no resources to store and process
them. This means the receiving side makes the decision to send a pause frame based on its
available buffers. When the resources are available again, the receiving side sends another
pause frame to notify the Tx side that it can now start transmitting again. This is illus-
trated in Figure 14-2.
Figure 14-2 IEEE 802.3X Link-Level Flow Control Using a Pause Frame
Although the 802.3X pause frame mechanism is not the same, it is very similar to the
Fibre Channel protocol flow control. In both cases, the receiving side notifies the trans-
mitting side, and a decision is made based on the availability of resources at the receiving
side. In both situations, it is assumed that the link is dedicated only to one type of com-
munication. With both the FCP flow control and the 802.3X pause frame mechanism, all
the communication over the link will be stopped.
The challenge comes in using a converged Ethernet infrastructure to carry multiple types
of traffic. Over the same link will be different types of Ethernet communication as well
as FCoE data communication. If only the native flow control mechanisms are relied on,
all the traffic will be disrupted.
To solve these issues, IEEE 802.1Qbb Priority Flow Control (PFC) was developed (see
Figure 14-3). It uses the fact that in Ethernet environments, traffic is divided into differ-
ent VLANs, and for each VLAN, you can set a different class of service (CoS) value in
the VLAN tag. So, the IEEE 802.1p CoS and IEEE 802.1Q VLAN specifications can be
used to allow traffic over the shared link to be divided and marked with different CoS
values, based on which different priorities will be assigned. This also allows the pause
frame to be used, not at link level, but to stop only traffic for a specific priority!
Figure 14-3 IEEE 802.1Qbb Priority Flow Control: Per-CoS Virtual Links That Can Be Paused Independently
In summary, the PFC enables lossless behavior for the Layer 2 flows on an Ethernet seg-
ment as follows:
■ It defines which CoS values will be used for the pause control mechanism.
■ It uses the IEEE 802.3X pause control frames to communicate buffer availability to
the sender.
■ When the receive buffer for a CoS is full, the PFC on the switch sends a pause for this
class of traffic to prevent dropped frames.
If you implement PFC on the DCB link, you can create eight distinct virtual link types on
the physical link, with the option of any of these links being paused and restarted inde-
pendently, as shown in Figure 14-3. Not every type of traffic needs to be controlled with
a pause frame. The Fibre Channel traffic, which is carried by the FCoE protocol, cannot
be dropped. That’s why in the Cisco switches and converged network adapters (CNAs),
which support the unified fabric, the CoS value of 3 is reserved for FCoE communica-
tion, and the pause frame control mechanism is always used for that traffic. On the other
hand, the rest of the traffic, which is Ethernet communication, might not need to utilize
the same mechanism, as it relies on the upper protocols’ mechanisms for secure communi-
cation, such as the TCP protocol.
The switches negotiate PFC capability with the connected CNA or DCB switch by
default using the DCBX protocol. Then the PFC is enabled and link-level flow control
remains disabled, regardless of its configuration settings. If the PFC negotiation fails, you
can force PFC to be enabled on the interface, or you can enable IEEE 802.3X link-level
flow control, which is a flow-control scheme within full-duplex Ethernet networks. If
you do not enable PFC on an interface, you can enable the IEEE 802.3X link-level pause,
which is disabled by default.
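As a minimal sketch (the interface number is illustrative, and command availability depends on the platform and release), forcing PFC on an interface, or falling back to link-level pause when PFC is not used, looks like this:
switch(config)# interface ethernet 1/1
switch(config-if)# priority-flow-control mode on
! Alternatively, when PFC is not used on the interface:
switch(config-if)# flowcontrol receive on
switch(config-if)# flowcontrol send on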
The IEEE 802.1Qaz Enhanced Transmission Selection (ETS) manages how the bandwidth
of the converged link is shared among the traffic classes. The ETS defines two types of
classes: the bandwidth allocation classes and the strict-priority classes. This allows
time-sensitive or low-latency traffic to be assigned strict priorities and not be affected
by the bandwidth-sharing mechanism.
In summary, the IEEE 802.1Qaz ETS, shown in Figure 14-4, allows for the following:
■ Traffic class management: Uses strict-priority traffic classes, which guarantees com-
pliance with special requirements, and solves the challenge of bursty traffic.
Figure 14-4 IEEE 802.1Qaz Enhanced Transmission Selection (ETS)
DCBX Protocol
The PFC and the ETS provide the means to divide, mark, and control the traffic and
the bandwidth used, but there is still the need for these parameters and capabilities to
be exchanged and negotiated between the devices in the converged infrastructure. The
devices need to support the same standards. They need to be enabled. There is the need
to exchange information about which CoS value is assigned to which traffic class. What
is the priority for that class, and will the pause frame be used for it?
The converged, enhanced Ethernet fabric consists of switches that support the DCB
extensions to allow communication over the Ethernet infrastructure of both Ethernet and
FCoE (or similar) traffic. Connected to the converged Ethernet infrastructure, also called
a unified fabric, are end devices that support multiple different types of communication
as well as non-DCB Ethernet switches. Because of that, two challenges need to be solved
in such environments:
■ Define the borders of the unified fabric, or identify which devices, switches, and end
devices support the converged infrastructure and which do not.
■ Exchange information about DCB support and capabilities between the devices in
the unified fabric
To achieve that, the IEEE 802 working group specified the IEEE 802.1Qaz Data Center
Bridging Exchange (DCBX) protocol (see Figure 14-5), which takes care of the exchange
and negotiation of the following items:
■ Traffic classes
■ Logical link-down
Figure 14-5 DCBX Negotiation and Capabilities Exchange (PFC, ETS Bandwidth Allocation, Traffic Classes, FCoE) Between a DCB-Capable Cisco Nexus Switch and an Attached Device
The DCBX utilizes the IEEE 802.1AB LLDP as the transport protocol to communicate
between the two devices. The DCBX communication consists of request and acknowl-
edgment messages, and the information is encoded as TLV parameters.
When the devices, switches, and CNAs are connected to each other and the interfaces are
enabled, the Cisco switches immediately try to initiate DCBX communication with the
other devices. In case the other devices also support the DCB, the DCBX negotiations
continue. In the LLDP TLVs, the parameters for the PFC, ETS, and the rest of the required
information are exchanged to create a lossless environment for communication.
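There is no dedicated DCBX configuration on the Cisco switches beyond having LLDP enabled; as a minimal verification sketch (exact show commands differ by platform), you can confirm that the peer is seen over LLDP on the converged link:
switch# show lldp neighbors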
Summary
This chapter reviewed the standards known as the Data Center Bridging extensions,
which allow an Ethernet infrastructure to be converted into a lossless environment for
combined communication of both Ethernet and FCoE traffic. In this chapter you learned
about the following:
■ The IEEE Data Center Bridging extensions were developed by the IEEE 802 working
group.
■ The unified fabric allows for the communication of Ethernet frames and FCoE
frames over the same medium.
■ The IEEE 802.1Qbb Priority Flow Control enables the traffic to be divided into dif-
ferent types, marked with different class of service values for the different priorities,
and to use the pause frame to avoid drops for specific priorities.
■ The FCoE traffic in the Cisco unified fabric is marked with a CoS value of 3, and by
default a no-drop policy is applied to it (that is, the IEEE 802.3X pause frame mecha-
nism is used).
■ The IEEE 802.1Qaz Data Center Bridging Exchange is a protocol that exchanges and
negotiates the DCB support and capabilities among the connected devices.
■ The DCBX uses the LLDP TLVs to exchange information in the request and acknowl-
edgment messages.
References
“Storage Area Networking,” https://www.cisco.com/c/en/us/products/storage-
networking/index.html
Chapter 15
Describing FCoE
This chapter covers the Cisco Unified Fabric and the protocol it uses for storage
communication—Fibre Channel over Ethernet (FCoE). You will learn what the Cisco
Unified Fabric is, why it is needed, and what its building components are. This chapter
also focuses on the architecture and operation of FCoE and the adapters that support
this protocol, and some configuration examples are provided as well.
The LAN infrastructure provides connectivity between the users and the applications
running on the servers, and between the various applications in the data center, such as
databases, application front ends, and so on. LAN infrastructures are used for the com-
munication between computers using different network protocols. The most popular
ones nowadays are the protocols described by the Open Systems Interconnection (OSI) model. The
components of a LAN are the switches and routers, which build the infrastructure itself,
and the computers connected to them. The computers need to have a network interface
card (NIC), which is the LAN I/O interface used to connect and communicate on the net-
work. The NICs support the communication of the OSI model protocols, and they oper-
ate at different speeds, starting with 10/100/1000Mbps and reaching 10/40Gbps and even
100Gbps with the latest Cisco adapters.
The LAN infrastructure is usually consolidated, which in this case means that, unless
there are security or compliance requirements, the LAN is one infrastructure,
although it might be segmented at the different layers of the OSI model through the
use of VLANs, routing, and security access-control lists. It can grow and it has different
physical layers, but it stays one infrastructure.
There are also SAN infrastructures in the data center. SANs, as already covered in the
previous chapters, are used for the communication between the servers and the storage
systems. The servers and the storage systems need to be equipped with I/O controllers
that support the communication of the Fibre Channel Protocol. These controllers are
called host bus adapters (HBAs) and can communicate at speeds of 1/4/8/16 and 32Gbps.
The SAN infrastructures are built by using Cisco MDS switches, which are specialized
switches supporting the communication of the Fibre Channel Protocol. Historically,
before Cisco entered the SAN market, the only security mechanism supported for isolat-
ing the communication was zoning. However, this mechanism was not enough, as it didn’t
support true segmentation, and this led to the need for building multiple separate physi-
cal SAN infrastructures to separate and secure the data communication between different
applications and their storage.
Cisco’s implementation of the Fibre Channel Protocol introduced the concept of virtual
SANs (VSANs), which allowed for consolidating multiple separate physical SAN infra-
structures, and on the top of a single SAN, separate virtual SANs, totally isolated from
each other, could be created. This led to significant savings when organizations had to
plan for and deploy storage infrastructures in their data centers. However, there was still
the issue of having the two separate major infrastructures, the LAN and the SAN, each
built with different types of devices, requiring the servers to have both NICs and HBAs.
When specialized cabling is added into the budget, the cost becomes significant, espe-
cially with the price of the small form-factor pluggable (SFP) transceivers used for Fibre
Channel Protocol connectivity. A SAN is, in general, much more expensive to build
than a LAN infrastructure. Figure 15-1 illustrates separate LAN and
SAN infrastructures in the data center.
Figure 15-1 Separate LAN and SAN Infrastructures in the Data Center
To overcome these challenges and to offer a more flexible way of designing and deploy-
ing the needed communication infrastructures in the data center, Cisco came up with a
new approach by introducing the creation of a consolidated I/O infrastructure, better
known as the Cisco Unified Fabric. The idea is that the cheaper LAN infrastructure can
be modified, using the Data Center Bridging enhancements, to create a lossless environ-
ment that allows the use of overlay encapsulation for the secure communication of the
Fibre Channel Protocol frames. The Fibre Channel frames are encapsulated in Ethernet
frames and sent over the Cisco Unified Fabric switches to the storage systems. The Cisco
switches that support the FCoE protocol, and thus the Cisco Unified Fabric, include
some of the Cisco Nexus and MDS switches as well as some of the Cisco Nexus 2000
fabric extenders. At the time this chapter was written, the list of Cisco switch models that
support FCoE was constantly changing, so my advice would be to go and check with the
Cisco product documentation for the latest information. Figure 15-2 illustrates the Cisco
Unified Fabric.
Figure 15-2 The Cisco Unified Fabric
In the Cisco Unified Fabric, the servers must use a different I/O controller that supports
both the Ethernet and the FCoE protocols. These adapters are called converged adapters.
The Cisco converged adapters offer extended capabilities compared to the ones offered
by Broadcom and Emulex and are called Cisco virtual interface cards (VICs). The FCoE
adapters are discussed in a dedicated section later in this chapter.
With the FCoE adapter, the server connects to the FCoE-capable switches with Ethernet
cabling and also allows for FCoE communication over the same cheaper cabling.
Although the FCoE standard does not impose any specific minimum bandwidth for the
link, Cisco requires at least 10Gbps of bandwidth in order to support FCoE in a reliable
manner.
The Cisco Unified Fabric provides the following benefits and flexibility in the data
center:
■ Reduced number of I/O adapters in the servers. Instead of separate NICs and HBAs,
now converged adapters are used.
■ Reduced cost of the cabling, as there are fewer or no native Fibre Channel links,
which means less-expensive Fibre Channel Protocol transceivers.
■ Significant savings in the amount of power per rack when FCoE is employed.
■ Significant savings on power because less cooling is needed to compensate for the
heat produced.
■ Storage administrators manage their fabrics in the same manner they always have, as
the FCoE does not change the Fibre Channel Protocol model.
■ FCoE maps the Fibre Channel traffic onto lossless Ethernet. This results in perfor-
mance benefits over technologies that require a gateway.
FCoE Architecture
The FCoE is a transport protocol defined by the American National Standards Institute
(ANSI) T11 committee. It has the needed functionality and enhancements to create a
lossless environment and to carry the FC frames encapsulated into Ethernet frames. In
the FCoE protocol, the lower layers of the Fibre Channel Protocol are replaced with
unified fabric I/O consolidation over Ethernet. These are the layers that take care of the
encapsulation and de-encapsulation as well as the lossless transmission over Ethernet. The
upper-layer Fibre Channel Protocol services, such as domain IDs, Fabric Shortest Path
First (FSPF), Fibre Channel Name Server (FCNS), fabric login (FLOGI), zoning, and so on,
stay the same as in the Fibre Channel world, because it is the Fibre Channel protocol that
operates at these layers. Figure 15-3 shows which layers are changed.
In the Fibre Channel Protocol operation, the FC frame is created in the hardware of the
HBA, which is the FC-2 layer responsible for the framing and flow control. Afterward,
the FC frame is passed to the FC-1 layer to be encoded with the needed serialization in
place and transmitted over the native FC link. In the case of the Unified Fabric, there is
no native FC link but rather an Ethernet link. That’s why after the FC frame is formed
in the HBA part of the converged adapter, it is passed to the FCoE Logical End Point
(LEP). The FCoE LEP is a new component that is responsible for taking the FC frame
and encapsulating it in an Ethernet frame, with all the needed information required by
the Data Center Bridging (DCB) enhancements for the secure transmission of
the new Ethernet frame, which contains the FC frame. This new Ethernet frame
has some special characteristics. Because the standard FC frame is up to 2112 bytes in
size, and the standard Ethernet frame is usually 1500 bytes, without any overhead from
encryption or something similar, the maximum transmission unit (MTU) for the FCoE
communication must either be set to the default for the FCoE protocol (2240 bytes) or,
if the switch does not allow defining such a size, jumbo frames must simply be enabled,
which is sufficient (see Figure 15-4).
Figure 15-4 shows that the FC frame is inside the Ethernet frame, and there are not any
changes made to the original FC frame. It is encapsulated just by adding the needed
FCoE header and the Ethernet header. No manipulations are allowed to be performed
on the FC frame, as this will invalidate it. This also means that the FC header still has the
source and destination FCIDs needed for the FC protocol communication. Do not forget
that from the perspective of the Fibre Channel Protocol, the FCoE protocol is just a dif-
ferent cable/transport mechanism. The FCoE header contains control info. The Ethernet
header contains important information such as the Ethertype, as the value for the FCoE
protocol is 0x8906. This value notifies the switch that inside this Ethernet frame is a Fibre
Channel Protocol frame. Additionally, in the Ethernet header is the 802.1Q VLAN tag.
This is important because it defines to which VLAN the FCoE traffic will belong. In
Cisco’s implementation of the FCoE, dedicated VLANs are used for its traffic, which
allows the needed priority flow control and no-drop policies to be applied. And last but
not least, in the Ethernet header are the source and destination MAC addresses, as the
FCoE frame is basically an Ethernet frame that will be transported over an Ethernet infra-
structure. The formation of the MAC address and how it maps to the FCID in the servers’
converged adapter is a function of the FCoE Initialization Protocol (FIP), which takes
care of the negotiations before any FCoE communication can happen.
Figure 15-4 The FCoE Frame: An Ethernet Header (Source MAC, Destination MAC, Ethertype 0x8906), the FCoE Header, and the Encapsulated Fibre Channel Frame (up to 2112 bytes) with Its Header, Source and Destination FCIDs, Payload, and CRC
In the FCoE communication are two roles (or participating elements): the FCoE Ethernet
nodes (ENodes) and the Fibre Channel Forwarders (FCFs).
An FCF is an Ethernet switch that also supports the FCoE protocol. When a switch sup-
ports the FCoE protocol and is also an FCF, it means that this switch is composed of two
switches: an Ethernet switch with the Ethernet physical ports and a Fibre Channel switch
(or a component that can run and process the Fibre Channel protocol frames). This means
that this component runs all the needed Fibre Channel Protocol services, just like any
other physical standalone Fibre Channel switch. That’s why the FCF, shown in
Figure 15-5, can process the Fibre Channel Protocol logins, services, and frames.
When an FCoE frame enters the FCF, it is processed as an Ethernet frame, as it enters
through a physical Ethernet port. Based on the Ethertype, the switch knows that it is
an FCoE frame, and it must be sent to the FCoE LEP, where it is de-encapsulated. The
remaining FC frame is then processed based on the rules of the Fibre Channel Protocol.
Once it is processed, either it will be sent as a native FC frame through an egress native
FC port, if the FCF also has native FC connectivity, or it will be encapsulated again in an
Ethernet FCoE frame if it is supposed to leave through the Ethernet egress port.
Figure 15-5 The Fibre Channel Forwarder (FCF): An Ethernet Switch with a Fibre Channel Switching Component
The servers in the Cisco Unified Fabric are equipped with converged network adapters (CNAs).
This allows them to be physically connected to an Ethernet port of a switch with
native Ethernet connectivity but over that link to carry both their Ethernet communication
as well as their storage communication using the FCoE protocol, in case the switch is FCoE
capable. The hardware of the CNA, shown in Figure 15-6, is very different from the traditional
NICs and HBAs. For external physical connectivity, the CNA uses 10Gbps or faster Ethernet
ports, but inside, facing the server, separate NICs and HBAs are built in. The OS of the server
communicates with the CNA and sees the separate HBAs and NICs through the PCIe bus.
Figure 15-6 The Converged Network Adapter (CNA): External 10Gbps Ethernet Ports with FCoE and DCBX ASICs in Front of Internal NICs and HBAs Presented to the Server over PCIe
The NICs in the CNA natively use the Ethernet physical egress ports. The situation is
more complex with the Fibre Channel Protocol communication.
The HBAs in the CNA cannot communicate directly using the physical Ethernet ports.
That’s why there is a specialized ASIC, or silicone, that performs the function of the
FCoE LEP. The HBAs are acting as native FC ports, but as there are no physical FC ports,
the HBAs are presented as virtual Fibre Channel (VFC) ports. Through the ASIC, the
VFCs use the physical external ports for FCoE communication. By the way, it is the same
on the FCF side—because the FC switching component is behind the physical Ethernet
ports, the FC ports are virtual (VFCs) and act just like the physical FC ports. Therefore,
you have the standard Fibre Channel Protocol communication with the Fibre Channel
ports, with the appropriate port roles assigned, with the only difference that these are
virtual ports, and the roles are also presented as virtual to provide the information that
these roles are assigned to virtual FC interfaces. As with the Fibre Channel Protocol port
modes, the virtual ones are as follows (see Figure 15-7):
■ Virtual Node port (VN): The VFC on the CNA of the server.
■ Virtual Fabric port (VF): The FCF VFC port to which the VN ports are connected.
■ Virtual Expansion port (VE): An FCF VFC port that connects to another FCF VFC port.
■ Virtual Node Proxy port (VNP): Used when the FCF operates as an NPV edge switch (that
is, when it is not running the FC protocol services).
The VN port communicates with the VF port at the side of the switch in the same way
as in a SAN infrastructure the N_Port will connect and communicate with the F_Port on
the FC switch. The difference is that because there is no native FC physical connectivity,
the negotiations and the login processes will be performed by the FCoE Initialization
Protocol (FIP).
The VE ports connect multiple FCoE switches over the physical Ethernet connectivity,
just like with the native FC communication.
As the VFC ports are Fibre Channel ports, when they communicate, they use FCIDs.
However, because the Fibre Channel frames are encapsulated in Ethernet frames to be
transported, as the Ethernet frames use MAC addresses, this brings up the issue of what
MAC address to use for the FCoE communication of a specific VFC that will allow a
direct mapping between the FCID and the MAC address to be created in the FCoE LEP.
As the MAC addresses are 48 bits and the FCIDs are 24 bits, a direct mapping is not pos-
sible. That’s why there needs to be a component added to the value of the FCID. This
is called the Fibre Channel MAC Address Prefix (FC-MAP), which has a size of 24 bits,
and it represents the first part of the MAC address. The second part is the 24-bit FCID
of the VFC. In this way, there is a unique 48-bit MAC address used for the communica-
tion of this specific VFC. This also means that each VFC on the CNA will have its own
unique MAC address, as each will have its own unique FCID. The FC-MAP is a value
set on the FCF switch, and the default value is 0E.FC.00. As you’ll remember from the
previous chapters, the FCID in the Fibre Channel Protocol communication is created and
assigned by the Fibre Channel switch, after a successful fabric login from the end node.
With the FCoE protocol, the FCID is also created on the FC switch; in this case, it’s the
FCF switch. This means that both values that form the MAC address for the FCoE com-
munication are provided by the FCF switch, or the fabric. This method of creating and
assigning a MAC address is called a Fabric Provided MAC Address (FPMA) and is shown
in Figure 15-8. Based on the FC-BB-5 definition, there is a range of 256 FC-MAP values
that can be used. In environments where there might be overlapping FCIDs, or for other
purposes, administrators can create and use up to 256 different pools of MAC addresses.
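For example, combining the default FC-MAP with the FCID that appears in the verification output later in this chapter: an FCF that uses the FC-MAP 0E:FC:00 and assigns the FCID 0x390100 to a VFC produces the FPMA 0E:FC:00:39:01:00, which that VFC then uses as its source MAC address for FCoE traffic.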
With the FPMA approach and the FCoE protocol there is one challenge. For the VFC to
log in to the fabric, it needs to be capable of communicating over the Ethernet infrastruc-
ture using the FCoE protocol. However, for this to happen, a MAC address is needed, and
the MAC address can be created by the fabric after a successful fabric login. Therefore,
without an FCID there is no MAC address, and without a MAC address there is no
FCID. To solve this challenge, and others, there is one additional protocol that communi-
cates before the FCoE protocol, and that is the FCoE Initialization Protocol.
The FCF and the end node are capable of differentiating between the FIP and the FCoE
communication based on the Ethertype value. The FIP Ethertype value is 0x8914.
1. FIP VLAN discovery: The FCoE VLAN used by all other FIP protocols and the
FCoE communication is discovered. The FIP uses the native VLAN to discover the
FCoE VLAN. The FIP VLAN discovery protocol is the only FIP protocol running on the
native VLAN. The rest of the FIP and FCoE communication happens in the discov-
ered FCoE VLANs.
2. FIP FCF discovery: In the discovered FCoE VLANs, the ENode discovers the
VF_Port-capable FCFs and selects the FCF with which it will establish a virtual link.
The Ethertype is still FIP.
3. FCoE virtual link instantiation: The FIP carries the needed fabric login (FLOGI),
fabric discovery (FDISC), and logout (LOGO). The result is that the VFC port is
assigned an FCID and a MAC address for the subsequent FCoE communication. The
Ethertype is still FIP.
4. FCoE virtual link maintenance: After the previous step, all the needed information
has been exchanged between the devices and the needed negotiations have taken
place. The devices can communicate using the FCoE protocol to carry FC frames.
Therefore, the FCoE protocol is operating at this stage. In the meantime, the FIP
continues to regularly exchange control information between the devices to maintain
the virtual link between the VN and the VF ports. The Ethertype is FCoE.
During the first stage of the process, the devices communicate with each other with the
goal of the CNA reporting that it is FCoE capable and finding the FCoE VLAN, which
is used by the FIP. This happens as the CNA, using the Ethernet burnt-in address (BIA)
as a source MAC address, sends a solicitation to the All-FCF-MACs multicast address to
which all the FCFs listen. The Ethertype of the multicast solicitation frame is set to the
FIP value of 0x8914.
This communication happens in the native VLAN, and the FIP VLAN discovery is the
only stage that happens in it. The subsequent communication of the FIP takes place in
the FCoE VLAN used by the chosen FCF.
In the FC-BB-5 standard, the FIP VLAN discovery protocol is described as optional. This
means that if it is not used, the FCoE VLAN will default to VLAN 1002, and it might not
be the FCoE VLAN used by the Unified Fabric. That’s why Cisco strongly recommends
that the FIP VLAN discovery protocol be used.
Once the FCoE VLAN discovery finishes by providing information for the FCoE VLAN,
it then starts the next stage, where the FIP FCF discovery protocol takes place. It uses the
FCoE VLANs, and the goal is to find the FCFs and to start negotiations with them. For
this purpose, the CNA sends a multicast solicitation to the All-FCF-MACs multicast
MAC address. This message is used to identify the VF_Port-capable FCF MACs. One such
multicast solicitation will be sent to each of the FCoE VLANs discovered in the previous
phase.
Upon receiving the multicast solicitation frame, the FCF will respond with a unicast
advertisement. It will contain the first 24 bits of the FPMA; this is the FC-MAP value
configured on the FCF. The other information contained in the FCF advertisement is as
follows:
■ FCF priority
■ Switch name
■ Switch capabilities
Based on that information, the CNA will decide which FCF to communicate with. For
this decision, the CNA considers the FCF priority and the FCF MAC address.
The third phase will be to establish the virtual link between the VN and the VF port.
The ENode, or the host, will send FLOGI frames to the FCID of FF:FF:FE. This is a well-
known address for the fabric login communication. The Ethertype of the FCoE frames
will still be FIP.
The FCF will assign an FCID for the host and will respond to the FLOGI request. With
this, an FPMA MAC address will be created for the subsequent FCoE communication,
and the virtual link will be established.
This concludes the work of the FIP for negotiating and establishing the environment for
the FCoE communication. The subsequent frames will have the FCoE Ethertype. The FIP
will continue to communicate in the background with the goal of maintaining the virtual
link.
FCoE Configuration
The FCoE protocol is currently supported on the following Cisco switches:
■ Cisco UCS
The “References” section provides the links to the data sheets of the Cisco switches and
linecards that support the FCoE protocol.
The Cisco Nexus 5000, Cisco Nexus 5500, and Cisco Nexus 6000 Series switches
support FCoE; however, these models have reached end-of-sale and soon will no longer be
supported. The Cisco Nexus 6000 switches have already reached end-of-life status. For
these reasons, they are not covered in this section.
The sample configuration of the FCoE communication in this chapter refers to a Cisco
Nexus 7000/7700 switch with an F-linecard.
The FCoE is a licensed feature in the Cisco NX-OS. The needed license is described in
the Cisco NX-OS licensing guide. In the “References” section is the link to the guide on
the Cisco website.
The Nexus 7000/7700 switches do not have native Fibre Channel ports. They support
FCoE only through Ethernet ports. This means they cannot bridge between an FCoE
Ethernet infrastructure and a native Fibre Channel Switch fabric.
The storage VDC is a virtual Fibre Channel switch inside the Cisco Nexus 7000/7700
switches. It is the FCF that will own the VFC ports and will communicate with the FCoE
LEP. Only one storage VDC can exist on a switch. All of the subsequent FCoE configura-
tion and processing will happen in the storage VDC or will be linked to it.
The Ethernet interfaces used for the FCoE communication come in one of two types,
depending on how they are utilized by the storage VDC: dedicated or shared. The dedi-
cated Ethernet ports will be used only to carry the VLANs for the FCoE communication,
while the shared interfaces will be shared between the storage VDC and a data VDC. This
means that both data VLANs and the FCoE VLANs will be carried through these inter-
faces.
The FCoE feature set needs to be enabled first. This happens when you are in the default
VDC of the switch:
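A minimal sketch of this step (run from the default/admin VDC; the feature set is then enabled with the feature-set fcoe command in the VDC where FCoE will run):
switch# configure terminal
switch(config)# install feature-set fcoe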
The LLDP feature needs to be enabled for the DCBX protocol communication:
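For example:
switch(config)# feature lldp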
The next step is to enable the system quality of service (QoS), which supports the FCoE
communication. Without this step, the storage VDC creation will fail later:
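A minimal sketch, assuming the default FCoE-capable network-qos template on the Nexus 7000 (the exact policy name can differ by platform and release):
switch(config)# system qos
switch(config-sys-qos)# service-policy type network-qos default-nq-7e-policy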
Prepare the physical Ethernet interfaces that will be used for the FCoE communication.
They will need to allow the communication of multiple VLANs (one of them will be the
FCoE VLAN) and to be configured as edge ports in order for the Spanning Tree Protocol
not to block them:
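A minimal sketch (the interface and VLAN numbers are illustrative; VLAN 10 is used as the FCoE VLAN in the examples that follow):
switch(config)# interface ethernet 2/10
switch(config-if)# switchport
switch(config-if)# switchport mode trunk
switch(config-if)# switchport trunk allowed vlan 1,10
switch(config-if)# spanning-tree port type edge trunk
switch(config-if)# no shutdown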
Finally, save the configuration; it is always a good idea to do so:
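For example:
switch# copy running-config startup-config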
With this, the switch is prepared for the needed FCoE configuration.
Storage VDC
The storage VDC is a virtual Fibre Channel switch inside the Cisco Nexus 7000/7700
switches. It is the FCF, which will own the VFC ports and will communicate with the
FCoE LEP. All of the subsequent FCoE configuration and processing will happen in the
storage VDC or will be linked to it.
The first step is to create the storage VDC. Creating the storage VDC and assigning
resources to it happen in the admin VDC:
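A minimal sketch, using the VDC name that matches the prompts shown in the snippets that follow:
switch(config)# vdc fcoe_vdc type storage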
Now that the storage VDC with the name fcoe-vdc is created, the next step is to specify
which Ethernet physical ports will be used for the FCoE communication. In this example,
the first five interfaces are allocated as dedicated and the second five will be shared:
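A minimal sketch of the interface allocation (the interface ranges are illustrative; the shared form is used for ports that carry both data VLANs and FCoE VLANs):
switch(config)# vdc fcoe_vdc
switch(config-vdc)# allocate interface ethernet 2/1-5
switch(config-vdc)# allocate shared interface ethernet 2/6-10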
The other resource that needs to be assigned is a VLAN or a range of VLANs that will be
used to carry the FCoE traffic:
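A minimal sketch (the VLAN range and the source data VDC name, production, are illustrative, and the exact keyword form can vary by NX-OS release):
switch(config-vdc)# allocate fcoe-vlan-range 10-20 from vdcs production
Inside the storage VDC, each allocated FCoE VLAN is then mapped to a VSAN, as the next snippet shows.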
switch-fcoe_vdc(config-vsan-db)# vlan 10
switch-fcoe_vdc(config-vlan)# fcoe vsan 10
switch-fcoe_vdc(config-vlan)# exit
Specify the port mode. You can specify two port modes (E or F), depending on the
device that is connected to the Ethernet port through which the VFC will communicate.
The default port mode is F:
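A minimal sketch, using vfc10 (the VFC number is illustrative but matches the verification output later in this chapter):
switch-fcoe_vdc(config)# interface vfc10
switch-fcoe_vdc(config-if)# switchport mode F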
The next step is to bind the VFC to a physical Ethernet port for the FCoE traffic:
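Continuing in the same vfc10 interface context, and using Ethernet2/10 to match the bound interface shown in the verification output later:
switch-fcoe_vdc(config-if)# bind interface ethernet 2/10
switch-fcoe_vdc(config-if)# no shutdown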
Just like with the native FC ports, the VFC also needs to be assigned to a VSAN:
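A minimal sketch, placing vfc10 in VSAN 10 to match the FCoE VLAN-to-VSAN mapping shown earlier:
switch-fcoe_vdc(config)# vsan database
switch-fcoe_vdc(config-vsan-db)# vsan 10
switch-fcoe_vdc(config-vsan-db)# vsan 10 interface vfc10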
FCoE Verification
As the final step, you need to verify the interface status:
switch-fcoe_vdc# show interface vfc10
vfc10 is up
    Bound interface is Ethernet2/10
    Hardware is Virtual Fibre Channel
    Port WWN is 20:02:00:0d:ec:3d:16:13
    snmp link state traps are enabled
    Port mode is F, FCID is 0x390100
    Port vsan is 10
    1 minute input rate 0 bits/sec, 0 bytes/sec, 0 frames/sec
    1 minute output rate 0 bits/sec, 0 bytes/sec, 0 frames/sec
    0 frames input, 0 bytes 0 discards, 0 errors
    0 frames output, 0 bytes 0 discards, 0 errors
    Interface last changed at Wed Jan 6 01:42:11 2021
Summary
This chapter described the FCoE protocol, the Unified Fabric created by it, and the
FCoE Initialization Protocol (FIP), and it also provided an example of an FCoE-related
configuration on the Cisco Nexus 7000 switches. In this chapter, you also learned about
the following:
■ The Cisco Unified Fabric is a consolidated I/O infrastructure that allows Fibre
Channel protocol frames to be carried over an Ethernet infrastructure.
■ The Cisco Unified Fabric optimizes the utilization of the resources and minimizes
the cost of the infrastructure.
■ The FCoE protocol allows Fibre Channel frames to be encapsulated intact inside
Ethernet frames and to be sent for communication over the Ethernet infrastructure.
■ The DCBX extensions allow the Ethernet infrastructure to become lossless and
secure for FCoE communication.
■ The hosts are equipped with specialized communication controllers called converged
network adapters (CNAs).
■ The network switch that supports FCoE is actually two switches in one: a physi-
cal Ethernet switch and, inside it, a Fibre Channel protocol entity responsible for
processing the FC frames.
■ The FCoE switches that process the FCoE are called Fibre Channel forwarders.
■ The MAC address used for FCoE communication by the CNA is a fabric-provided
MAC address (FPMA), which consists of two parts: the FC-MAP and the FCID.
■ The FC-MAP is a 3-byte value configured on the FCF. The default value is 0E-FC-00.
■ Finally, FIP performs the FLOGI/FDISC operation in order to establish the virtual
FC link.
■ In this way, FIP supports the creation of the FPMA MAC address for subsequent
communication.
■ On Cisco Nexus 7000 switches, the FCF entity is the storage VDC.
■ Only a single storage VDC can exist on a Cisco Nexus 7000 switch.
■ To create the storage VDC, you must first install the FCoE feature set and then
enable the system QoS for FCoE.
■ The resources that need to be assigned to the storage VDC are the FCoE VLANs
and the physical Ethernet ports that will service the FCoE traffic.
■ Once the storage VDC is created, the VSAN must be created and mapped to an
FCoE VLAN.
■ For each VFC interface, you must configure which VSAN it will belong to, which
mode it will operate in, and which physical Ethernet port it will communicate
through.
References
“Storage Area Networking,” https://www.cisco.com/c/en/us/products/storage-networking/index.html
“Unified Fabric White Paper—Fibre Channel over Ethernet (FCoE),” https://www.cisco.com/c/en/us/td/docs/solutions/Enterprise/Data_Center/UF_FCoE_final.html
Design and configuration guides for Cisco Nexus switches: https://www.cisco.com/c/en/us/support/switches/index.html
“Cisco NX-OS Licensing Guide,” https://www.cisco.com/c/en/us/td/docs/switches/datacenter/sw/nx-os/licensing/guide/b_Cisco_NX-OS_Licensing_Guide.html
“Cisco Nexus 2000 FEXs Supporting FCoE,” https://www.cisco.com/c/en/us/products/switches/nexus-2000-series-fabric-extenders/models-comparison.html#~tab-10g
Chapter 16
This chapter covers the Cisco Unified Computing System (UCS). The Cisco UCS is
Cisco’s innovative approach to the compute component in the data center. It is a complex
integrated solution to address multiple challenges and consists of multiple components.
You will get acquainted with the different physical components of the Cisco UCS and the
different generations of the integrated system. You will take a closer look at the differ-
ent compute nodes and their technical characteristics. You will learn which can operate
in a standalone mode and which operate as an integrated component of the Cisco UCS.
Additionally, we will take a look at the Cisco HyperFlex system as an example of a fully
integrated infrastructure for the data center.
Cisco Systems has been in the network and storage business for a very long time, and
somewhere around 2008 it entered the server market with the Cisco Unified Computing
System (UCS). It was and still remains a revolutionary, next-generation, highly integrated
platform for providing the needed computing resources, together with their communica-
tion and storage needs, in a flexible way that allows for the optimal utilization of the
hardware computing resources through hardware abstraction.
You will investigate how exactly the hardware abstraction happens and what the benefits
are in the next chapter, where Cisco UCS management is discussed. However, to be able
to understand the Cisco UCS better, first you will learn about the different physical com-
ponents of the Cisco UCS and how they connect to each other. You will also learn about
the whole physical topology of the Cisco UCS as well as the standalone Cisco UCS serv-
ers and their benefits and applications.
Based on the Cisco UCS, Cisco Systems has developed Cisco HyperFlex. It is a hyper-
converged application-centric solution that integrates not only the hardware and the
management of the Cisco UCS but goes further and creates the needed environment for
the virtualization supporting containers. It also includes the needed storage and commu-
nication.
Before we can dig deeper into the Cisco UCS hardware abstraction or Cisco HyperFlex,
you will need to learn about the physical components that make up the Cisco UCS, as
illustrated in Figure 16-2.
■ Cisco UCS Manager (UCSM): This application manages and monitors all the com-
ponents of the Cisco UCS. In the latest generation of the Cisco UCS, the UCS X-Series, the
Cisco UCS Manager is replaced by the Cisco Intersight Managed Mode (IMM).
■ Cisco UCS Fabric Interconnects (FIs): The communication devices responsible for
the physical connectivity of the Cisco UCS to the LAN and SAN infrastructures. An
additional important function is that the Cisco UCS Manager runs on them.
■ Cisco UCS 5108 Blade Chassis: The chassis that houses the servers with a blade
physical form factor and the I/O modules (IOMs). It provides the connectivity, power
supply, and cooling for the blade servers. The new chassis (the UCS X9508), the new
blade servers, and their management are discussed in a separate topic.
■ Cisco I/O Modules (IOMs): These are the connectivity modules, also known as
Fabric Extenders (FEX). They are installed in the blade chassis and provide for the
physical connectivity of the servers and the chassis up to the Fabric Interconnects.
■ Cisco UCS B-series servers: The Cisco B-series servers are servers that have a
smaller physical form factor. This is the blade form factor, and that’s why they are
usually called blade servers. The Cisco B-series servers are installed in the Cisco
UCS blade chassis.
■ Cisco UCS C-series servers: The Cisco C-series servers are designed as rack-mount
servers. They can be either a component of the Cisco UCS, under the management
of the Cisco UCSM, or they can be used in the standalone mode, where they are
managed by their own Cisco Integrated Management Controller (CIMC).
■ Cisco virtual interface cards (VICs): The Cisco VIC is a purpose-built converged
network adapter to be used in the Cisco UCS servers. It allows for Ethernet, FC,
or FCoE communication but also supports the Cisco VM-FEX technology, which
allows for the network connectivity awareness and management up to the level of
the virtual machines.
These are the major components that build a Cisco Unified Computing System. The
whole system appears as one component in front of the rest of the data center compo-
nents.
All Cisco UCS blade servers come with Cisco UCS Manager capability. Cisco UCS with
Cisco UCS Manager provides the following:
■ Local (and optionally global) server profiles and templates for policy-driven server
provisioning
The Cisco UCS components are connected and managed in a specific way, and the result-
ing integrated system is known as a Cisco UCS domain (see Figure 16-3). The Cisco UCS
domain consists of two Cisco UCS Fabric Interconnects, which fulfill two roles:
■ Communication: All the servers’ communication to and from the data center goes
through them. Upstream, facing the data center LAN and SAN infrastructures, the
Cisco UCS Fabric Interconnects provide the connectivity for both the Ethernet data
communication and the storage communication. The storage communication can
be based on block or file storage protocols; as for the block-based communication,
the supported options cover the FC, FCoE, and the iSCSI protocols. The Fabric
Interconnects connect upstream to the LAN and SAN infrastructures. And, down-
stream, the FIs are responsible for the connectivity to the servers. Inside the Cisco
UCS, all the communication is based on Ethernet. There is only Ethernet physical
connectivity from the Fabric Interconnects down to the servers. That’s why the serv-
ers are equipped with converged network adapters, which externally use Ethernet
connectivity but also support FCoE communication between them and the Fabric
Interconnects.
■ Management: The Cisco UCS Manager application, which takes care of the discov-
ery, provisioning, configuration, and management of the Cisco UCS servers, runs on
the Fabric Interconnects. Additionally, the Fabric Interconnects provide a separate
out-of-band dedicated interface to connect to the management network in your data
center.
In a Cisco UCS domain, the two Fabric Interconnects are connected to each other with
a special link, called the cluster link. By bringing in the cluster link, you create a cluster;
the link is used for configuration synchronization and the exchange of state information
between the two Fabric Interconnects. In this way, if one of them becomes unresponsive,
the other takes care of the management. However, keep in mind that this applies to the
management application (that is, on which of them the management application will run
in active mode). When it comes to processing the data communication from the servers,
the two Fabric Interconnects are both active. This is another way of adding redundancy
and reliability.
Southbound from the Fabric Interconnects are the Cisco UCS servers. There are two types of
servers: the B-series and C-series servers (that is, the blade and rack servers). The C-series
servers are equipped with their own power supplies, cooling, and communication adapt-
ers. Therefore, when the C-series servers are part of a Cisco UCS domain, they are
connected directly to the Fabric Interconnects. There are other supported connectivity
options between the C-series servers and the Fabric Interconnects that involve the use of
Cisco Nexus 2000 Fabric Extenders (FEXs).
The B-series servers are installed in the Cisco UCS 5108 blade chassis. The blade chassis
takes care of the redundant power supply, cooling, and connectivity. For the connectiv-
ity, the chassis has installed the Cisco UCS 2000/2200/2300/2400 FEXs, or I/O Modules
(IOMs). The series represents the generation of the Cisco UCS FEX. For example, the
Cisco UCS 2000 FEXs are the first generation and are no longer sold or supported. The
Cisco UCS 2200 FEXs are the second generation and are to be used with the second gen-
eration of the Fabric Interconnects, and so on.
The IOMs, or FEXs, are not switches. This means they do not perform L2 lookups like
any other Ethernet switch, so there's no processing of the Ethernet frames. The IOMs
are communication devices that physically connect the uplinks between the FI and
the blade chassis with the servers installed in the blade chassis. This happens through
mapping of the server-facing (internal) ports to the uplink (external) ports of the IOM.
Depending on the model of the IOM/FEX, there’s a different number of internal ports.
They are hard-pinned to the slots in which the blade servers are installed. This is a huge
advantage, as it means the communication of the servers will always have to reach the
Fabric Interconnects, and they will be the first networking-capable devices to process the
Ethernet frames. In this way, the number of hops in the communication between the serv-
ers inside the UCS will be kept to a minimum.
Here is one very important point: the Fabric Interconnects can operate in two modes when it comes to Ethernet processing. The default mode is end host mode, in which the Fabric Interconnect behaves toward the upstream LAN like a host (an end node) rather than a switch. In this mode, the servers’ traffic is pinned (mapped) to an uplink port and sent to an upstream switch, where the Ethernet frame is processed. This is because in end host mode the Fabric Interconnects do not learn MAC addresses from the LAN infrastructure; the upstream network switch is the device that knows where to switch the Ethernet frames coming from the Cisco UCS servers. The Fabric Interconnects in this mode still learn MAC addresses on their server ports. These are the Ethernet ports that connect to the Cisco UCS IOMs in the blade chassis, or to the C-series servers connected directly to them. In this way, if two Cisco UCS servers communicate with each other through the same FI, the traffic is switched locally, without exiting the FI, and without the burden of additional network processing hops, as the IOMs are passive devices.
The second mode of operation for the Fabric Interconnects is the switching mode, where,
with some minor limitations, they operate as network switches.
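The difference between the two modes can be summarized with a small, purely illustrative Python sketch of the forwarding decision. This is not Cisco's implementation; it only restates the behavior described above: local switching for destinations learned on server ports, and pinning to an uplink for everything else while in end host mode.

# Illustrative decision logic only (not Cisco's implementation) for how a
# Fabric Interconnect treats a unicast frame in its two Ethernet modes.
def forward(dst_mac, server_mac_table, mode="end-host"):
    """Describe where a frame with the given destination MAC would go."""
    if dst_mac in server_mac_table:
        # MAC learned on a server port: switch locally, never leave the FI.
        return f"switch locally to server port {server_mac_table[dst_mac]}"
    if mode == "end-host":
        # No MAC learning from the LAN side: hand the frame to the upstream
        # switch over the uplink port the source server is pinned to.
        return "send out the pinned uplink to the upstream LAN switch"
    # Switching mode: behave (with minor limitations) like a regular switch,
    # learning and flooding as needed.
    return "use the regular MAC table / flood, like an Ethernet switch"

server_macs = {"00:25:b5:00:00:1a": "Eth1/1/1", "00:25:b5:00:00:2b": "Eth1/1/2"}
print(forward("00:25:b5:00:00:2b", server_macs))   # local switching
print(forward("00:50:56:aa:bb:cc", server_macs))   # forwarded upstream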
Note that so far the terms Fabric Extender (FEX) and I/O Module (IOM) have been used interchangeably. From Cisco’s perspective they mean the same thing in general, but historically they refer to different products. For example, the Cisco Nexus 2000/2200/2300 FEXs are standalone devices intended to increase port density when they are connected to upstream Nexus 5500, 5600, 6000, 7000, 7700, or 9000 Series switches. They belong to the LAN component of the data center, and their role is to extend the access layer. The Cisco UCS FEX/IOM devices are installed in the blade chassis (they do not have their own power supply or cooling) for the purpose of providing connectivity for the blade servers. Although they operate the same way as the Nexus 2K FEXs, they are different devices with a different application.
Now we reach the blade servers. As already mentioned, they are installed in the server
slots of the blade chassis, which has a total of eight server slots. The blade servers come
in two physical forms: half-width and full-width servers. The half-width blade server
occupies one server slot on the chassis. If equipped only with half-width blade servers,
the chassis can accommodate up to eight servers. The full-width blade servers occupy
two horizontal slots in that chassis, for a maximum of four full-width servers per chassis.
The B-series servers are equipped with a blade connector, which connects the server to
the chassis in the slot. The blade connector provides the power to the server, enables the
management communication, and also communicates with the server’s converged net-
work adapter. With that said, there are two major types of communication for the blade
server:
■ The data plane communication: This includes both the LAN and SAN communication. It happens through the converged adapter, which in the Cisco UCS is also called a mezzanine adapter. The mezzanine adapter provides the server with separate NICs and HBAs, and externally it has only 10Gbps or higher Ethernet interfaces, using FCoE for the FC communication.
■ The management communication: This is the communication of the server’s Cisco Integrated Management Controller (CIMC), used for the discovery, monitoring, and management of the server.
Both the management and data plane communication on the B-series server go through the blade connector, using separate paths of communication through the blade chassis midplane and through the chassis’ IOMs to reach the Fabric Interconnects.
With all of that said, it is very important to understand that Cisco UCS's physical connec-
tivity creates the underlying communication infrastructure. This infrastructure is available
for the communication needs of the servers. How it will be used by each of the servers
is a matter of configuration in the Cisco UCSM and relates to the hardware abstraction
principle on which the Cisco UCS is based, discussed in the chapter “Describing Cisco
UCS Abstraction”.
The Cisco UCS Fabric Interconnect does the following:
■ Provides LAN and SAN connectivity for all servers within the domain
■ Includes unified ports with support for Ethernet, Fibre Channel over Ethernet
(FCoE), and Fibre Channel
The Cisco UCS Fabric Interconnects are now in their fourth generation. The Cisco UCS
6100 Fabric Interconnects were the first generation. They are no longer supported or sold.
The second generation is the Cisco UCS 6248UP and Cisco 6296UP Fabric Interconnects,
which are shown in Figure 16-4.
They offer line-rate, low-latency, lossless 10-Gigabit Ethernet, FCoE, and Fibre Channel
functions. Cisco UCS 6200 Series Fabric Interconnects provide LAN and SAN connectiv-
ity for all blades within their domain. Cisco UCS 6200 Series uses a cut-through network-
ing architecture, supporting deterministic, low-latency, line-rate 10-Gigabit Ethernet on all
ports, a switching capacity of 2 terabits (Tb), and a bandwidth of 320-Gbps per chassis,
independent of packet size and enabled services.
The Cisco UCS 6248UP 48-Port Fabric Interconnect is a one-rack-unit (1-RU) 10-Gigabit
Ethernet, FCoE, and Fibre Channel switch offering up to 960-Gbps of throughput and
up to 48 ports. The switch has 32 1Gbps and 10Gbps fixed Ethernet, FCoE, and Fibre
Channel ports and one expansion slot.
The Cisco UCS 6296UP 96-Port Fabric Interconnect is a 2-RU 10-Gigabit Ethernet,
FCoE, and native Fibre Channel switch offering up to 1920-Gbps of throughput and
up to 96 ports. The switch has 48 1Gbps and 10Gbps fixed Ethernet, FCoE, and Fibre
Channel ports and three expansion slots.
The third generation is the Cisco 6300 Fabric Interconnects, shown in Figure 16-5. They
provide the following:
■ Up to 2.56-Tbps throughput
The Cisco UCS 6324 Fabric Interconnect extends the Cisco UCS architecture into envi-
ronments with requirements for smaller domains. Providing the same unified server and
networking capabilities as in the full-scale Cisco UCS solution, Cisco UCS 6324 Fabric
Interconnect embeds the connectivity within the Cisco UCS 5108 blade server chassis to
provide a smaller domain of up to 20 servers.
Cisco UCS 6324 Fabric Interconnect provides the management, LAN, and storage con-
nectivity for the Cisco UCS 5108 blade server chassis and direct-connect rack-mount
servers. It provides the same full-featured Cisco UCS management capabilities and XML
application programming interface (API) as the full-scale Cisco UCS solution, in addition
to integrating with Cisco UCS Central Software and Cisco UCS Director.
From a networking perspective, Cisco UCS 6324 Fabric Interconnect uses a cut-through
architecture, supporting deterministic, low-latency, line-rate 10-Gigabit Ethernet on all
ports, switching capacity of up to 500-Gbps, and 80-Gbps uplink bandwidth for each
chassis, independent of packet size and enabled services. Sixteen 10Gbps links connect to
the servers, providing a 20-Gbps link from each Cisco UCS 6324 Fabric Interconnect to
each server. The product family supports Cisco low-latency, lossless 10-Gigabit Ethernet
unified network fabric capabilities, which increase the reliability, efficiency, and scalabil-
ity of Ethernet networks. The Fabric Interconnect supports multiple traffic classes over a
lossless Ethernet fabric from the blade through the Fabric Interconnect.
The Cisco UCS 6400 Fabric Interconnects, shown in Figure 16-6, comprise the fourth
and latest generation:
■ 54 or 108 ports
■ Up to 7.64-Tbps of throughput
The Cisco UCS 6454 Fabric Interconnect is a core part of Cisco UCS, providing net-
work connectivity and management capabilities for the system. Cisco UCS 6454 Fabric
Interconnect offers line-rate, low-latency, lossless 10-, 25-, 40-, and 100-Gigabit Ethernet,
FCoE, and Fibre Channel functions.
The Cisco UCS Fabric Interconnects provide the management and communication back-
bone for Cisco UCS B-series blade servers, Cisco UCS 5108 B-series server chassis, Cisco
UCS Managed C-series rack-mount servers, and Cisco UCS S-series storage servers. All
servers attached to the Cisco UCS Fabric Interconnect become part of a single, highly
available management domain. In addition, by supporting a unified fabric, Cisco UCS
provides LAN and SAN connectivity for all servers within its domain.
From a networking perspective, Cisco UCS 6454 Fabric Interconnect uses a cut-through
architecture, supporting deterministic, low-latency, line-rate 10-, 25-, 40-, and 100-Gigabit
Ethernet ports, a switching capacity of 3.82-Tbps (terabits per second), and 320-Gbps
bandwidth between Cisco UCS 6454 Fabric Interconnect and the Cisco UCS 2208XP
I/O module per Cisco UCS 5108 blade server chassis, independent of packet size and
enabled services. The product family supports Cisco low-latency, lossless 10-, 25-, 40-,
and 100-Gigabit Ethernet unified network fabric capabilities, which increase the reli-
ability, efficiency, and scalability of Ethernet networks. The Fabric Interconnect supports
multiple traffic classes over a lossless Ethernet fabric from the server through the Fabric
Interconnect.
The Cisco UCS Fabric Interconnect comes with a fixed number of ports. You can option-
ally install expansion modules that allow you to have a greater number of unified ports.
server chassis. This simplicity eliminates the need for dedicated chassis management and
blade switches, reduces cabling, and enables Cisco UCS to scale to 20 chassis without
adding complexity.
The Cisco UCS 5108 blade server chassis is the first blade server chassis offering by
Cisco and is 6-RU. The chassis can mount in an industry-standard 19-inch (48-cm) rack
and uses standard front-to-back cooling.
At the front of the chassis, shown in Figure 16-7, are the server slots. It can accommodate
up to eight half-width or four full-width Cisco UCS B-Series blade server form factors
within the same chassis.
The power supplies are also installed at the front. The Cisco UCS 5108 blade server chassis supports up to four fully modular PSUs that are hot-swappable under certain power redundancy configurations:
■ N+1: The total number of PSUs to satisfy non-redundancy, plus one additional PSU
for redundancy, are turned on and equally share the power load for the chassis. If
any additional PSUs are installed, Cisco UCS Manager sets them to a “turned-off”
state. If the power to any PSU is disrupted, Cisco UCS Manager can recover without
an interruption in service.
■ Grid: Two power sources are turned on, or the chassis requires greater than N+1
redundancy. If one source fails (which causes a loss of power to one or two PSUs),
the surviving PSUs on the other power circuit continue to provide power to the
chassis.
The power policy is a global policy that specifies the redundancy for power supplies in
all chassis in the Cisco UCS domain. This policy is also known as the PSU policy.
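The following short Python sketch captures the arithmetic behind these redundancy options for the four PSU bays of the chassis. The wattage numbers in the example call are placeholders, not Cisco specifications; only the N+1 and Grid (N+N) logic mirrors the policies described above.

import math

# Placeholder arithmetic for the chassis power redundancy policies.
def active_psus_required(chassis_load_w, psu_capacity_w, policy):
    """Return how many PSUs the given policy keeps active for the load."""
    n = max(1, math.ceil(chassis_load_w / psu_capacity_w))  # PSUs needed for the load alone
    if policy == "non-redundant":
        return n
    if policy == "n+1":
        return n + 1        # one extra PSU shares the load with the others
    if policy == "grid":
        return 2 * n        # N+N: full capacity available on each power circuit
    raise ValueError(f"unknown policy: {policy}")

for policy in ("non-redundant", "n+1", "grid"):
    count = active_psus_required(chassis_load_w=4000, psu_capacity_w=2500, policy=policy)
    print(f"{policy:13s}: {count} active PSU(s) out of the 4 chassis bays")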
At the back, as shown in Figure 16-8, are installed the FEXs for the uplink connectivity
to the Fabric Interconnects and the fan modules.
As the blade servers use passive cooling, meaning that they do not have their own fans, the fan modules of the chassis take care of the cooling of the chassis, the servers, the power supplies, and the FEX modules. The chassis has a total of eight slots for fan modules to provide the needed redundant cooling capacity.
The FEX modules are also installed in the chassis. They go in the back of the chassis, where there are two slots for the two FEX modules. To build a Cisco UCS cluster, you need two Fabric Interconnects. The cluster link between them does not carry any data communication, because each of the Fabric Interconnects is a separate data path. This is very important because, upstream, the Cisco UCS connects to two different infrastructures—the LAN and the SAN. Each of them has different design requirements for supporting redundancy and increased reliability. In the SAN, a very important principle is always to have two physically separate paths between the initiator and the target. In the LAN infrastructure the approach is almost the opposite—the goal is to interconnect as
much as possible and then leave it to the switching and routing protocols as well as to
technologies such as port channels and virtual port channels (VPCs) to take care of the
redundancy. Inside the Cisco UCS, however, the same physical connectivity carries the communication to both the LAN and the SAN. This means that the internal Cisco UCS infrastructure must support the redundancy design for both the LAN and the SAN. As the LAN is more flexible, the challenge is to be compliant with the more restrictive design of the SAN infrastructure. That’s why, inside the Cisco UCS, it is important to preserve two physically separate paths of communication. Because of that, there are two Cisco UCS FEXs per chassis: one for the communication path through each of the Fabric Interconnects. Put differently, each UCS FEX connects to only one UCS Fabric Interconnect.
Cisco UCS 2200 Series extends the I/O fabric between Cisco UCS 6200 Series Fabric
Interconnects and the Cisco UCS 5100 Series blade server chassis, enabling a lossless and
deterministic FCoE fabric to connect all blades and chassis. Because the Fabric Extender
is similar to a distributed line card, it does not perform switching and is managed as an
extension of the Fabric Interconnects. This approach removes switching from the
chassis, reducing overall infrastructure complexity and enabling Cisco UCS to scale to
many chassis without multiplying the number of switches needed, reducing total cost of
ownership (TCO) and allowing all chassis to be managed as a single, highly available man-
agement domain.
Cisco UCS 2200 Series also manages the chassis environment (the power supply, fans,
and blades) along with the Fabric Interconnect. Therefore, separate chassis management
modules are not required. Cisco UCS 2200 Series Fabric Extenders fit into the back of
the Cisco UCS 5100 Series chassis. Each Cisco UCS 5100 Series chassis can support up
to two Fabric Extenders, allowing increased capacity and redundancy.
The Cisco UCS 2204XP Fabric Extender has four 10-Gigabit Ethernet, FCoE-capable,
SFP+ ports that connect the blade chassis to the Fabric Interconnect. Each Cisco UCS
2204XP has 16 10-Gigabit Ethernet ports connected through the midplane to each
half-width slot in the chassis. Typically configured in pairs for redundancy, two Fabric
Extenders provide up to 80-Gbps of I/O to the chassis.
The Cisco UCS 2208XP Fabric Extender has eight 10-Gigabit Ethernet, FCoE-capable,
Enhanced Small Form-Factor Pluggable (SFP+) ports that connect the blade chassis to the
Fabric Interconnect. Each Cisco UCS 2208XP has 32 10-Gigabit Ethernet ports connect-
ed through the midplane to each half-width slot in the chassis. Typically configured in
pairs for redundancy, two Fabric Extenders provide up to 160-Gbps of I/O to the chassis.
The Cisco UCS 2304 Fabric Extender has four 40-Gigabit Ethernet, FCoE-capable, Quad
Small Form-Factor Pluggable (QSFP+) ports that connect the blade chassis to the Fabric
Interconnect. Each Cisco UCS 2304 provides one 40-Gigabit Ethernet port connected through the midplane to each half-width slot in the chassis, giving it a total of eight 40G interfaces to the compute slots. Typically configured in pairs for redundancy, two Fabric
Extenders provide up to 320-Gbps of I/O to the chassis.
The Cisco UCS 2408 Fabric Extender has eight 25-Gigabit Ethernet, FCoE-capable,
Small Form-Factor Pluggable (SFP28) ports that connect the blade chassis to the Fabric
Interconnect. Each Cisco UCS 2408 provides four 10-Gigabit Ethernet ports connected through the midplane to each half-width slot in the chassis, giving it a total of 32 10G interfaces to the UCS blades. Typically configured in pairs for redundancy, two Fabric Extenders provide up to 400Gbps of I/O from the Cisco UCS 6400 Series Fabric Interconnects to the 5108 chassis.
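The per-chassis throughput figures quoted for each IOM generation follow directly from the uplink count and speed, doubled for the redundant pair of IOMs. The short Python snippet below reproduces that arithmetic.

# Reproduce the per-IOM and per-chassis bandwidth figures from the text.
iom_models = {
    # model: (fabric uplinks per IOM, speed per uplink in Gbps)
    "UCS 2204XP": (4, 10),
    "UCS 2208XP": (8, 10),
    "UCS 2304":   (4, 40),
    "UCS 2408":   (8, 25),
}

for model, (uplinks, gbps) in iom_models.items():
    per_iom = uplinks * gbps
    per_chassis = 2 * per_iom   # two IOMs are installed per chassis for redundancy
    print(f"{model}: {per_iom} Gbps per IOM, {per_chassis} Gbps per chassis")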
The Cisco UCS servers have either a two-CPU-socket mainboard or a four-CPU-socket mainboard. To distinguish between them, and to get more information quickly, you need to know the naming convention. For example, Cisco UCS B200 M5 means that this is a blade server (B). If the server name starts with a C, it is a C-series rack-mount server. The first digit, which here is 2, indicates the number of CPU sockets on the mainboard; this digit can be either 2 or 4 (the latter specifying a four-CPU-socket mainboard). The M5 at the end of the model name specifies that this is a fifth-generation server.
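The convention is simple enough to express as a small parser. The Python sketch below handles only the pattern described in the text (B or C, a leading 2 or 4 for the socket count, and the M generation suffix); real product names have more variants than this covers.

import re

def parse_ucs_model(name):
    """Parse names such as 'B200 M5' or 'C480 M5' per the convention above."""
    match = re.fullmatch(r"([BC])([24])(\d+)\s*M(\d+)", name.strip())
    if not match:
        raise ValueError(f"unrecognized model name: {name}")
    series, sockets, _rest, generation = match.groups()
    return {
        "form_factor": "blade" if series == "B" else "rack",
        "cpu_sockets": int(sockets),     # 2 or 4 CPU sockets on the mainboard
        "generation": int(generation),   # M5 = fifth generation, M6 = sixth, ...
    }

print(parse_ucs_model("B200 M5"))
print(parse_ucs_model("C480 M5"))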
The Cisco UCS blade servers come in full-width and half-width physical sizes. The
full-width size takes up two slots in a Cisco UCS 5108 chassis and the half-width size
occupies a single slot.
The Cisco UCS B200 M5 blade server has the following features:
■ Up to two Intel Xeon Scalable or second-generation Intel Xeon Scalable processors, with up to 28 cores per CPU.
■ Modular LAN On Motherboard (mLOM) card with Cisco UCS Virtual Interface
Card (VIC) 1440 or 1340, a two-port, 40-Gigabit Ethernet (GE), Fibre Channel over
Ethernet (FCoE) capable mLOM mezzanine adapter.
■ Optional rear mezzanine VIC with two 40-Gbps unified I/O ports or two sets of 4×
10-Gbps unified I/O ports, delivering 80-Gbps to the server; adapts to either 10 or
40-Gbps fabric connections.
■ Two optional, hot-pluggable, hard disk drives (HDDs), solid-state drives (SSDs), or
Nonvolatile Memory Express (NVMe) 2.5-inch drives with a choice of enterprise-
class RAID (redundant array of independent disks) or pass-through controllers.
■ Support for optional SD Card or M.2 SATA drives for flexible boot and local storage
capabilities.
■ Memory:
■ 3200 MHz DDR4 memory, plus other speeds, depending on the CPU installed
■ 16× DDR4 DIMMs + 16× Intel Optane persistent memory modules for up to
12TB of memory
■ Cisco UCS Virtual Interface Card (VIC) 1440 modular LAN On Motherboard
(mLOM)
■ One rear slot for the VIC (Cisco UCS VIC 1480 or port expander)
■ One front dedicated slot for a Cisco FlexStorage RAID controller, Cisco
FlexStorage passthrough, or Cisco M.2 RAID controller
The Cisco UCS B480 M5 blade server is the only currently offered full-width blade
server, and it has the following characteristics:
■ Four new second-generation Intel Xeon Scalable CPUs (up to 28 cores per socket)
■ Four existing Intel Xeon Scalable CPUs (up to 28 cores per socket)
■ Support for higher-density DDR4 memory, from 6TB (128G DDR4 DIMMs) to 12TB
(256G DDR4 DIMMs)
■ Intel Optane DC Persistent Memory Modules (DCPMMs): 128G, 256G, and 512G
■ Up to 18TB using 24× 256G DDR4 DIMMs and 24× 512G Intel Optane DC
Persistent Memory Modules
■ Cisco UCS Virtual Interface Card (VIC) 1340 modular LAN On Motherboard
(mLOM) and upcoming fourth-generation VIC mLOM
Only one CPU is required for normal system operation. If only one CPU is installed, it must go into the first socket. The CPUs must be identical on the same blade server but can differ between blade servers in the same chassis. Also, depending on which CPU sockets are populated, different memory configurations are supported.
The fifth-generation (M5) Cisco UCS C-Series rack servers, shown in Figure 16-12, are as follows:
The Cisco UCS C-series offers various models that can address different workload chal-
lenges through a balance of processing, memory, I/O, and internal storage resources.
When used with Cisco UCS Manager, Cisco UCS C-Series servers bring the power
and automation of unified computing to enterprise applications, including Cisco
Cisco UCS Manager uses service profiles, templates, and policy-based management to
enable rapid deployment and to help ensure deployment consistency. It also enables
end-to-end server visibility, management, and control in both virtualized and bare-metal
environments.
Cisco UCS C-Series M5 servers are Cisco Intersight ready. Cisco Intersight is a new
cloud-based management platform that uses analytics to deliver proactive automation and
support. By combining intelligence with automated actions, you can reduce costs dra-
matically and resolve issues more quickly.
The Cisco UCS C220 M5 rack server is among the most versatile, general-purpose, enter-
prise infrastructure and application servers in the industry. It is a high-density two-socket
rack server that delivers exceptional performance and efficiency for a wide range of work-
loads, including virtualization, collaboration, and bare-metal applications.
The Cisco UCS C220 M5 server extends the capabilities of the Cisco UCS portfolio in
a 1RU form factor. It incorporates Intel Xeon Scalable processors, which support up to
20% more cores per socket, twice the memory capacity, 20% greater storage density, and
five times more PCIe NVMe SSDs compared to the previous generation of servers. These
improvements deliver significant performance and efficiency gains that will improve your
application performance.
The Cisco UCS C220 M5 delivers outstanding levels of expandability and performance in
a compact package that includes the following:
■ Latest (second-generation) Intel Xeon Scalable CPUs, with up to 28 cores per socket
■ Support for first-generation Intel Xeon Scalable CPUs, with up to 28 cores per socket
■ Support for the Intel Optane DC Persistent Memory (128G, 256G, 512G)
■ Support for a 12-Gbps SAS modular RAID controller in a dedicated slot, leaving the
remaining PCIe Generation 3.0 slots available for other expansion cards
The Cisco UCS C240 M5 rack server is a two-socket, 2RU rack server that offers indus-
try-leading performance and expandability. It supports a wide range of storage and I/O-
intensive infrastructure workloads, from big data and analytics to collaboration.
■ The latest (second-generation) Intel Xeon Scalable CPUs, with up to 28 cores per
socket
■ Support for the Intel Optane DC Persistent Memory (128G, 256G, 512G)
■ Support for 12-Gbps SAS modular RAID controller in a dedicated slot, leaving the
remaining PCIe Generation 3.0 slots available for other expansion cards
■ Modular M.2 or Secure Digital (SD) cards that can be used for boot
The Cisco UCS C480 M5 rack server is a storage- and I/O-optimized enterprise-class rack
server that delivers industry-leading performance for in-memory databases, big data ana-
lytics, virtualization, and bare-metal applications. The C480 M5 comes in a 4RU form-
factor and has the following characteristics:
■ Four new second-generation Intel Xeon Scalable CPUs (up to 28 cores per socket)
■ Four existing Intel Xeon Scalable CPUs (up to 28 cores per socket)
■ Support for the Intel Optane DC Persistent Memory (128G, 256G, 512G)
■ Support for higher-density DDR4 memory, from 6TB (128G DDR4 DIMMs) to 12TB
(256G DDR4 DIMMs)
■ Intel Optane DC Persistent Memory Modules (DCPMM): 128G, 256G, and 512G
■ Up to 18TB using 24× 256G DDR4 DIMMs and 24× 512G Intel Optane DC
Persistent Memory Modules
The Cisco UCS C480 ML M5 rack server is a purpose-built server for deep learning. It is
storage- and I/O-optimized to deliver performance for training models. It comes in a 4RU
form factor and offers these capabilities:
■ The latest Intel Xeon Scalable processors with up to 28 cores per socket and support
for two processor configurations. It supports both first- and second-generation Intel
Xeon Scalable CPUs.
■ 2933MHz DDR4 memory and 24 DIMM slots for up to 7.5TB of total memory
■ Support for the Intel Optane DC Persistent Memory (128G, 256G, and 512G)
■ Intel Optane DC Persistent Memory Modules (DCPMM): 128G, 256G, and 512G
■ Up to 7.5TB using 12× 128G DDR4 DIMMs and 12× 512G Intel Optane DC
Persistent Memory Modules
■ Four PCI Express (PCIe) 3.0 slots for 100G UCS VIC 1495
Figure 16-13 shows the Cisco UCS C4200 rack server and the C125 M5 rack server node.
Figure 16-13 Cisco UCS C4200 Series Rack Server and the C125 M5 Rack Server Node
The C4200 chassis extends the capabilities of the Cisco UCS portfolio in a two-rack-unit
(2RU) form factor supporting up to four Cisco UCS C125 M5 rack server nodes. The lat-
est update includes support for AMD EPYC 2 (Rome) 7002 processors. The AMD EPYC
2 processors have higher core density (up to 64 cores) and higher performance with an
enhanced AMD Zen 2 core design. The existing AMD EPYC 7001 processors will con-
tinue to be offered for flexibility of customer choice. Both CPU types deliver significant
performance and efficiency gains in a compact form factor that will improve your appli-
cation performance while saving space. The C4200 and C125 M5 nodes deliver outstand-
ing levels of capability and performance in a highly compact package, with the following
features:
■ AMD EPYC 7002 (Rome) series processors with up to 64 cores per socket, and
AMD EPYC 7001 (Naples) series processors with up to 32 cores per socket
■ Up to 1TB of DRAM using 16 64GB DDR4 DIMMs for two-socket CPU configura-
tion (eight DIMMs/memory channels per CPU)
■ 3200MHz 16G/32G/64G DIMMs for AMD EPYC 7002 (Rome) CPUs, and 2666MHz 16G/32G/64G DIMMs for AMD EPYC 7001 (Naples) CPUs
■ Either six direct-attached SAS/SATA drives or two NVMe plus four SAS/SATA
drives
■ Optional dual SD cards or M.2 modular storage for increased storage or boot drive
capacity
■ Support for the Cisco 12-G 9460-8i PCIe SAS RAID controller with 2GB Flash-
Backed Write Cache (FBWC)
■ Support for the Cisco 12-G 9400-8i PCIe SAS controller for use with external disk
arrays
The sixth-generation (M6) Cisco UCS C-Series rack servers, shown in Figure 16-14, are as
follows:
The Cisco UCS C220 M6 rack server extends the capabilities of the Cisco UCS rack
server portfolio. It incorporates the third-generation Intel Xeon Scalable processors with
more than 40% more cores per socket and 33% more memory versus the previous genera-
tion. These improvements deliver up to 40% more performance that will improve your
application performance. The Cisco UCS C220 M6 rack server delivers outstanding levels
of expandability and performance. It offers the following features:
■ Memory:
■ 3200 MHz DDR4 memory plus other speeds, depending on the CPU installed
■ 16× DDR4 DIMMs + 16x Intel Optane persistent memory modules for up to
10TB of memory
■ Up to three PCIe 4.0 slots plus a modular LAN On Motherboard (mLOM) slot
■ Support for Cisco UCS VIC 1400 Series adapters as well as third-party options
The C225 M6 rack server is single-socket optimized: all I/O is tied to one CPU and its 128 PCIe lanes. Because each server supports up to 2TB of memory and 64 cores per socket, many customers find that a one-CPU server now meets their needs. This can reduce software licensing and support costs, leading to a better TCO.
■ One or two third-generation AMD EPYC CPUs, with up to 64 cores per socket
■ Memory:
■ 32 DIMM slots (16 DIMMs per CPU socket), 3200 MHz DDR4
■ Up to 4TB of capacity
■ Support for 1400 Series VIC and OCP 3.0 network cards
The Cisco UCS C240 M6 server extends the capabilities of the Cisco UCS rack server
portfolio with third-generation Intel Xeon Scalable processors supporting more than 43%
more cores per socket and 33% more memory when compared with the previous genera-
tion. This provides up to 40% more performance than the M5 generation for your most
demanding applications. It is well-suited for a wide range of storage and I/O-intensive
applications such as big data analytics, databases, collaboration, virtualization, consolida-
tion, and high-performance computing in its two-socket, 2RU form factor, as it offers the
following features:
■ Memory:
■ 3200 MHz DDR4 memory plus other speeds, depending on the CPU installed
■ 16× DDR4 DIMMs + 16× Intel Optane persistent memory modules for up to
12TB of memory
■ Up to eight PCIe 4.0 slots plus modular LAN On Motherboard (mLOM) slot
■ Support for Cisco UCS VIC 1400 Series adapters as well as third-party options
The Cisco UCS S3260 server is based on a chassis that supports two compute nodes, each based on second-generation Intel Xeon Scalable processors. The compute nodes provide the computing power, and the supported local storage is up to 1080TB in a compact four-rack-unit (4RU) form factor. The drives can be configured with enterprise-class RAID (redundant array of independent disks) redundancy or with a pass-through host bus adapter (HBA) controller. Network connectivity is provided by dual-port, up-to-40Gbps nodes in each server, with expanded unified I/O capabilities for data migration between network-attached storage (NAS) and SAN environments. This storage-optimized server comfortably fits in a standard 32-inch-depth rack, such as the Cisco R 42610 rack.
■ Dual two-socket server nodes based on Intel Xeon Scalable processors. The M5
server node supports the following processors: 4214, 5218, 5220, 6238, 6240,
6262V, 4210R, 4214R, 5218R, 5220R, 6226R, and 6230R.
■ Massive 1080TB data storage capacity that easily scales to petabytes with Cisco
UCS Manager software.
■ Supported drives:
■ PCIe Slot based with choice of Cisco UCS VIC 1455 Quad Port 10/25G, Cisco UCS
VIC 1495 Dual Port 40/100G, or third-party Ethernet and FC adapters.
■ Unified I/O for Ethernet or Fibre Channel to existing NAS or SAN storage environ-
ments.
■ Support for Cisco bidirectional transceivers, with 40Gbps connectivity over existing
10Gbps cabling infrastructure.
■ Choice of I/O Ethernet and Fibre Channel options: 1-, 10- or 40-Gigabit Ethernet
or 16-Gbps Fibre Channel
Cisco UCS management provides enhanced storage management functions for the Cisco
UCS S3260 and all Cisco UCS servers. Storage profiles give you flexibility in defining the
number of storage disks and the roles and uses of those disks and other storage param-
eters. You can select and configure the disks to be used for storage by a virtual drive.
A logical collection of physical disks is called a disk group, and a disk group configura-
tion policy defines the way in which a disk group is created and configured. A disk group
can be partitioned into virtual drives. Each virtual drive appears as an individual physical
device to the operating system. The policy specifies the RAID level for each disk group.
It also specifies either manual or automatic selection of disks for the disk group and the
roles for the disks. This feature allows optimization of the storage resources without
additional overhead and licensing costs. The RAID storage controller characteristics are as
follows:
■ M5 server node:
■ Dual-chip RAID controller based on LSI 3316 ROC with 4GB RAID cache per
chip and Supercap
■ Controller support for RAID 0, 1, 5, 10, 50, and 60 and JBOD mode, providing
enterprise-class data protection for all drives installed in the system
■ Pass-through controller:
■ Dual-chip pass-through controller with LSI IOC 3316 using LSI IT firmware
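Because the disk group configuration policy ties usable capacity directly to the chosen RAID level, a quick back-of-the-envelope calculation is often useful when sizing virtual drives. The Python sketch below uses the standard textbook formulas and ignores controller overhead, hot spares, and drive-size rounding; it covers only a subset of the levels the controller supports.

# Rough usable-capacity arithmetic for common RAID levels in a disk group.
def usable_capacity_tb(raid_level, drives, drive_tb):
    if raid_level == "0":
        return drives * drive_tb            # striping only, no redundancy
    if raid_level == "1":
        return (drives // 2) * drive_tb     # mirrored pairs
    if raid_level == "5":
        return (drives - 1) * drive_tb      # one drive's worth of parity
    if raid_level == "10":
        return (drives // 2) * drive_tb     # striped mirrors
    raise ValueError("RAID level not covered by this sketch")

for level in ("0", "1", "5", "10"):
    tb = usable_capacity_tb(level, drives=8, drive_tb=4.0)
    print(f"RAID {level:>2} with 8 x 4TB drives: {tb:.0f} TB usable")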
Nowadays, organizations must meet the challenges of growing data center requirements driven by a wider set of application deployment models:
■ Big data and analytics: Require scale-out architectures with large amounts of high-
performance storage. Data scientists need massive amounts of GPU acceleration for
artificial intelligence and machine learning processes.
Building and integrating data center infrastructures that meet these versatile requirements is complex, challenging, and resource intensive. Reducing this complexity is the only way to support existing and new applications, ensure service delivery, maintain control over data, and attain the necessary performance. This makes hyperconverged solutions the right answer, because they deliver compute, networking, storage, virtualization, and container support as one integrated and tested solution.
The Cisco HyperFlex System, shown in Figure 16-16, is an adaptive, hyperconverged sys-
tem built on the foundation of the Cisco UCS. It meets the requirements for the different
application deployments by supporting multicloud environments.
The Cisco HyperFlex platform is faster to deploy, simpler to manage, easier to scale,
and ready to provide a unified pool of resources to power your business applica-
tions. You harness these resources with your choice of centralized management tools:
Cisco Intersight cloud-based Management as a Service (covered in Chapter 21, “Cloud
Computing”), Cisco HyperFlex Connect, Microsoft System Center Virtual Machine
Manager, Hyper-V Manager, or a VMware vSphere plug-in. Cisco HyperFlex systems
integrate into the data center you have today without creating an island of IT resources.
You can deploy Cisco HyperFlex systems wherever you need them—from central data
center environments to remote locations and edge-computing environments.
Cisco HyperFlex systems are based on Cisco UCS M5 rack servers. Based on Intel Xeon
Scalable processors, these fifth-generation servers have faster processors, more cores, and
faster and larger-capacity memory than previous-generation servers. In addition, they are
ready for Intel 3D XPoint nonvolatile memory, which can be used as both storage and
system memory, increasing your virtual server configuration options and flexibility for
applications.
The Cisco HX240c M5 node is excellent for balanced-capacity clusters. The HX240c M5
LFF node delivers high-capacity clusters, and the HX240c M5 All Flash node is excellent
for balanced-performance and capacity clusters. Each node configuration includes the
following:
■ Storage:
■ Cisco 12Gbps modular SAS host bus adapter (HBA) with internal SAS
connectivity
■ Software:
■ VMware vSphere ESXi 6.0 software preinstalled (ESXi 6.5 supported but not pre-
installed)
Cisco HyperFlex nodes can be deployed with various Cisco UCS B-series blade servers
and C-series rack servers to create a hybrid cluster. With a single point of connectivity
and management, you can easily scale your cluster to support more workloads and deliver
the performance, bandwidth, and low latency your users and applications need.
■ The system operates in Intersight Managed Mode (IMM), as it is managed from Cisco Intersight.
■ The new Cisco UCS X9508 chassis has a midplane-free design. The I/O connectiv-
ity for the X9508 chassis is accomplished via frontloading, with vertically oriented
compute nodes intersecting with horizontally oriented I/O connectivity modules in
the rear of the chassis.
■ Cisco UCS 9108 Intelligent Fabric modules provide connectivity to the upstream
Cisco UCS 6400 Fabric Interconnects.
■ Cisco UCS X210c M6 compute nodes: blade servers designed for the new chassis.
The new Cisco UCS X9508 chassis provides a new and adaptable substitute for the first
generation of the UCS chassis. It is designed to be expandable in the future. As proof of
this, the X-Fabric slots are intended for future use. It has optimized cooling flows to sup-
port reliable operation for longer times. The major features are as follows:
■ A seven-rack-unit (7RU) chassis has 8× front-facing flexible slots. These can house
a combination of compute nodes and a pool of future I/O resources, which may
include GPU accelerators, disk storage, and nonvolatile memory.
■ 2× Cisco UCS 9108 Intelligent Fabric Modules (IFMs) at the top of the chassis that
connect the chassis to upstream Cisco UCS 6400 Series Fabric Interconnects. Each
IFM has the following features:
■ 8× 25-Gbps SFP28 uplink ports. The unified fabric carries management traffic
to the Cisco Intersight cloud-operations platform, Fibre Channel over Ethernet
(FCoE) traffic, and production Ethernet traffic to the fabric interconnects.
■ At the bottom are slots, ready to house future I/O modules that can flexibly con-
nect the compute modules with I/O devices. This connectivity is called “Cisco
UCS X-Fabric technology” because X is a variable that can evolve with new tech-
nology developments.
■ Six 2800W power supply units (PSUs) provide 54V power to the chassis with N,
N+1, and N+N redundancy. A higher voltage allows efficient power delivery with
less copper and reduced power loss.
The chassis supports up to eight Cisco UCS X210c M6 compute nodes. These are newly
designed Cisco UCS blade servers, specifically for the Cisco UCS X9508 chassis. The
main features are as follows:
■ mLOM virtual interface card: The Cisco UCS Virtual Interface Card (VIC) 14425
can occupy the server’s modular LAN On Motherboard (mLOM) slot, enabling up
to 50-Gbps of unified fabric connectivity to each of the chassis Intelligent Fabric
Modules (IFMs) for 100-Gbps connectivity per server.
■ Optional mezzanine virtual interface card: Cisco UCS Virtual Interface Card (VIC)
14825 can occupy the server’s mezzanine slot at the bottom of the chassis. This
card’s I/O connectors link to Cisco UCS X-Fabric Technology, which is planned
for future I/O expansion. An included bridge card extends this VIC’s 2× 50Gbps
of network connections through IFM connectors, bringing the total bandwidth to
100Gbps per fabric—for a total of 200Gbps per server.
Summary
This chapter described the components of the Cisco UCS and HyperFlex system. In this
chapter, you learned about the following:
■ The Cisco Unified Computing System (UCS) is a complex, highly integrated solution
for the compute component of the data center.
■ The Cisco UCS Manager provides the management functionality and runs on the
Cisco Fabric Interconnects.
■ The northbound Cisco Fabric Interconnects connect the Cisco UCS to the LAN and
SAN infrastructures of the data center.
■ The physical connectivity inside the Cisco UCS is based on Ethernet physical links
and the use of the FCoE protocol for storage communication.
■ The servers are equipped with converged network adapters, called mezzanine cards,
that support converged communication.
■ The B-series servers are installed in the Cisco UCS 5108 blade chassis.
■ The blade chassis provides eight slots for the servers and takes care of their cooling
and power supply needs.
■ The Cisco UCS 2200/2300/2400 FEXs are installed in the blade chassis. They con-
nect the chassis to the Fabric Interconnects.
■ Half-width blade servers, such as the Cisco UCS B200 M5 and B200 M6, occupy a
single slot in the chassis.
■ The Cisco UCS B480 M5 is a full-width blade server, as it occupies two horizontal
slots in the chassis.
■ The C-series servers are rack-mountable servers that can operate as a part of the
Cisco UCS or as standalone servers.
■ When operating in standalone mode, the C-series servers are managed through the
Cisco Integrated Management Controller (CIMC) directly.
■ The Cisco UCS C4200 rack server is itself a chassis, as the memory and CPUs reside on so-called compute nodes. This allows you to increase or decrease the computing capacity based on your needs.
■ The Cisco UCS S3260 is a rack-mount server that’s optimized to be a storage server, with support for a massive amount of local storage.
■ The latest generation of the Cisco UCS is the Cisco UCS X-Series Modular platform.
■ The Cisco UCS-X platform consists of the new Cisco X9508 chassis and the new
X210c compute nodes.
Reference
“Cisco Servers – Unified Computing System (UCS),” https://www.cisco.com/c/en/us/
products/servers-unified-computing/index.html
Chapter 17
This chapter discusses the Cisco Unified Computing System (UCS) abstraction. The
Cisco UCS is a complex system that consists of different physical components covered
in the previous chapter, such as the B- and C-series servers, which provide the computing
resources; the Cisco 5108 blade chassis; the IOMs (FEXs) for the physical connectivity of the servers; and the Fabric Interconnects, which connect the Cisco UCS with the other components of the data center and are the platform on which the Cisco UCS Manager application runs. Therefore, you can see that there are various components with
different functions that, through integration and centralized management, operate as a
single entity. And on top of this is the major benefit of using the Cisco UCS—the ability
to abstract the servers from the physical environment, to improve the utilization of the
computing resources, and to provide redundancy and flexibility. The abstraction in the
Cisco UCS is achieved by using two components to create a server—the physical server
node and the associated service profile. Only when the two are associated with each
other in a one-to-one relationship is there a server that can be used. The service profile
is the logical component created in the Cisco UCS Manager by the administrator, and it describes the server. In other words, it contains all the needed configurations, the identity values used by the server, and the communication to the data center LAN and SAN. Then, when the service profile is associated with the physical server node, either a B- or C-series server, the two components create a server that can be used for further installation of the operating system (OS) and the applications. In this chapter, you will be acquainted with the Cisco UCS Manager, the server environment, and the global policies that determine how the Cisco UCS Manager discovers the hardware it will manage. Then you will learn
about the server identities and how they are abstracted in the Cisco UCS. You will look
into what exactly a service profile and a service profile template are, what policies can be
used in them, and how to configure them in the Cisco UCS Manager. At the end of this
chapter, you will learn about a centralized management application that allows you to
oversee multiple Cisco UCS domains from a single pane of glass.
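Before going further, the one-to-one association between a service profile and a physical server node can be pictured with a small conceptual sketch. The Python code below is not the Cisco UCS object model; the class names and fields are illustrative only, standing in for the identities and policies a real service profile carries.

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ServiceProfile:
    """Conceptual stand-in for a UCS service profile (illustrative only)."""
    name: str
    uuid: str                                   # identity normally drawn from a pool
    vnic_macs: List[str] = field(default_factory=list)
    vhba_wwpns: List[str] = field(default_factory=list)
    associated_server: Optional[str] = None     # DN of the blade/rack node once associated

@dataclass
class PhysicalServer:
    dn: str                                     # e.g., "sys/chassis-1/blade-3"
    associated_profile: Optional[str] = None

def associate(profile, server):
    """Enforce the one-to-one relationship; only then is there a usable server."""
    if profile.associated_server or server.associated_profile:
        raise RuntimeError("both the profile and the server must be free")
    profile.associated_server = server.dn
    server.associated_profile = profile.name

blade = PhysicalServer(dn="sys/chassis-1/blade-3")
esx01 = ServiceProfile(name="esx01", uuid="0000-0001", vnic_macs=["00:25:b5:00:00:1a"])
associate(esx01, blade)
print(blade.associated_profile, "->", esx01.associated_server)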
The two Fabric Interconnects in a Cisco UCS cluster operate in a high-availability and
redundant mode. When it comes to the data communication of the servers, or the data
plane, the two Fabric Interconnects are active (that is, they both process the data commu-
nication or provide active data paths). At the level of the management and control planes,
they operate in an active-standby manner. This means that after the Fabric Interconnects have booted up, a negotiation between the two determines which one becomes the primary. On the primary Fabric Interconnect, the management and control applications and services run in active mode; on the other Fabric Interconnect (the subordinate), they run in standby mode. The active instances are distributed between the two for load balancing. The cluster link is used to synchronize configuration and state information between the active and standby instances of the applications and services.
But what are these applications and services that run on the Fabric Interconnects?
■ Kickstart image file: This is the Linux kernel of the operating system.
■ System image file: This file contains all the applicable modules of the Nexus OS
for the hardware platform on which it is used. Let’s not forget that Cisco uses the
Nexus OS on a variety of devices—the Cisco Nexus and MDS switches as well as
on the Cisco UCS Fabric Interconnects.
■ Cisco UCS Manager application: An XML-based application that manages the Cisco UCS.
So far, you know that a Cisco UCS domain consists of two Fabric Interconnects running
in a redundant mode, also known as a Cisco UCS cluster. On both Fabric Interconnects
the NX-OS is loaded and the services needed for the Cisco UCS are up and running
in a load-balancing and redundant mode. This means that there will be a single active
instance for each service and one standby. Which service will be active on which Fabric
Interconnect, regardless of whether it’s the primary or the subordinate, is decided dur-
ing the negotiations at the start. The primary Fabric Interconnect is the one on which
the Cisco UCSM instance runs in active mode. The other is the subordinate Fabric
Interconnect because on that one the Cisco UCSM application runs in standby
(passive) mode. It will receive configuration and state updates from the primary Fabric
Interconnect and then update itself. The following points are important to remember:
Figure 17-1 Cisco UCS Primary/Subordinate Cluster with Nexus OS and the Cisco
UCSM Application
The Cisco UCSM is an XML-based application. It is built on an XML schema that represents
the hierarchical structure of the logical and physical components of the Cisco UCS.
This allows for each of these components to be addressed and for the related configura-
tion and information to be applied to them. All the configuration and state information
is stored in the Cisco UCSM database in XML format. The physical components under
Cisco UCSM management are called managed endpoints, and they include the following:
■ The switch elements: The Fabric Interconnects, the Cisco FEXs, the switch modules
and ports for the Fabric Interconnects
■ The chassis elements: The chassis management controller (CMC), the chassis I/O
modules (IOMs), and the chassis power and fan modules.
■ Managed server endpoints: The servers and their components, such as the disks,
mezzanine adapters, the BIOS, the Cisco Integrated Management Controller (CIMC),
and so on
The Cisco UCSM, which is the central point for the management and configuration of
the whole Cisco UCS domain, sits between the administrator and the server system. The
administrator, or other applications and systems, can communicate with the Cisco UCSM
through the available management interfaces:
■ Cisco GUI: The graphical user interface initially was based on Java, and an addi-
tional client application was used to access it, but with the latest versions of the
Cisco UCSM, starting with major version 4, the GUI is based on HTML5 and only a
browser is needed.
■ Cisco CLI: The command line interface to access the Cisco UCSM
■ Third-party access: Using the Cisco UCS XML application programming interface
(API), third-party applications can connect to the system.
All the management access—whether you are using the GUI or the CLI, or it’s an applica-
tion trying to communicate—goes through the Cisco UCSM XML API. This means your
actions will be converted into XML and sent to the appropriate managed endpoint as a
configuration request, and the operational state will be communicated back.
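As a concrete illustration of that XML path, the following Python sketch logs in to the XML API and reads the blade inventory. It assumes the documented /nuova endpoint and the aaaLogin, configResolveClass, and aaaLogout methods; the hostname and credentials are placeholders, and certificate verification is disabled only for brevity, so verify the details against the Cisco UCS Manager XML API reference for your version before using anything like this.

import requests
import xml.etree.ElementTree as ET

UCSM_URL = "https://ucsm.example.com/nuova"   # placeholder UCSM address

def xml_call(body):
    """POST one XML API method and return the parsed response element."""
    resp = requests.post(UCSM_URL, data=body, verify=False, timeout=30)
    resp.raise_for_status()
    return ET.fromstring(resp.text)

# 1. Log in and obtain a session cookie.
login = xml_call('<aaaLogin inName="admin" inPassword="password" />')
cookie = login.get("outCookie")

# 2. Resolve every object of a class, for example the blade servers.
blades = xml_call(
    f'<configResolveClass cookie="{cookie}" classId="computeBlade" inHierarchical="false" />'
)
for blade in blades.iter("computeBlade"):
    print(blade.get("dn"), blade.get("model"), blade.get("operState"))

# 3. Log out to release the session.
xml_call(f'<aaaLogout inCookie="{cookie}" />')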
The operational state communicates information for the managed endpoint. This consists
of not only the state of the component but also monitoring and reporting information.
The Cisco UCSM supports widely used industry-standard protocols such as the
following:
■ KVM over IP
■ SNMP
■ SMASH CLP
■ CIM XML
■ IPMI
It also supports the following Cisco features for monitoring and reporting:
■ Call Home: This feature automates the notifications and even creates Cisco TAC
cases based on physical issues occurring within the system.
■ Cisco UCSM XML API: With the XML API, the Cisco UCSM can be integrated
with third-party monitoring and reporting tools.
■ Cisco GUI and CLI: The monitoring and reporting information is available in the
GUI and CLI of the UCSM as well.
The Cisco UCSM GUI, shown in Figure 17-3, consists of two major elements: the navi-
gational pane, where you traverse the XML hierarchical tree, and the content pane. Once
you select a component, either physical or logical, you will see in the content pane to the
right all the related information and options.
Additionally, the buttons in the top-right corner allow you to access the quick links, help,
administrative session properties, and “about” information as well as to exit.
Another convenient feature, located at the middle top of the GUI, is quick information
regarding the critical, major, and minor faults and warning messages (see Figure 17-4). By
clicking one of these icons, you will be taken to the appropriate page containing informa-
tion about what is happening with the system.
The navigational pane is divided into eight tabs. The Equipment tab, shown in Figure 17-5,
is where you can access all the physical components of the Cisco UCS. This is where you
can get information for and access to the Fabric Interconnects, the chassis and its compo-
nents, and the servers and their components.
The Equipment tab points only to physical components and to policies related to hardware discovery, power redundancy, fan control, and so on. You can access, each in a separate sub-tab, the topology view for the UCS domain, the Fabric Interconnects, the B- and C-series servers' hardware, the firmware management policies, thermal information, decommissioned servers, and so on. The information you can gather from the Equipment tab is important for understanding the physical connectivity and the resources in the UCS domain. The Equipment tab is also where you can monitor the discovery process and configure the policies related to it.
The UCS Manager is responsible for managing, pushing configurations to, and monitor-
ing the hardware components of a Cisco UCS domain. To be capable of doing this, the
UCS Manager needs to know what is connected and how. The process of learning and
acquiring this information is called discovery. In the discovery process, the UCS Manager, which runs on the Fabric Interconnects, communicates with the control and management components of the underlying hardware. These are the Chassis Management Controllers (CMCs) on the IOMs, the Cisco Integrated Management Controllers (CIMCs) on the B- and C-series servers, and, in the case of C-series server integration through Cisco Nexus 2000 FEXs, their CMCs as well. The whole process of discovery starts when
the UCS Manager finds Fabric Interconnect ports that are configured as server ports. This
means that servers are connected to these ports. The UCS Manager establishes communi-
cation with the CMCs from the IOMs. The CMCs, as they are the management and con-
trol components of the IOMs in the server chassis, have already gathered the following
information for the hardware in the chassis:
■ Thermal sensors
■ Fan status
■ Servers
For the servers’ discovery, the CMCs of the IOMs connect through dedicated manage-
ment interfaces in the mid-plane of the chassis to the CIMCs of the servers. The CIMCs
communicate all the information for the hardware of the server and its status to the
CMC. This means that when the UCS Manager starts to communicate with the CMCs,
they already have all the information for the hardware in the chassis, including the serv-
ers. Thus, the UCS Manager can build a topology and gather the information for the
available hardware resources and their connectivity.
■ Chassis/FEX Discovery Policy: Defines the minimum number of links that must exist between a Fabric Interconnect and an FEX in order for the FEX and the equipment behind it to be discovered (see the query sketch after this list).
■ Power Policy: Defines the power redundancy. Depending on the setting, there
might be requirements for the number of power supplies in the chassis.
■ Fan Control Policy: Controls the fan speed to optimize power consumption and
noise levels.
■ MAC Address Table Aging: Configures how long a MAC address will remain in
the MAC address table.
■ Global Power Allocation Policy: Specifies the power capping policies (how much
power is allowed per server) to apply to the servers in a chassis.
■ Info Policy: Enables the information policy to display the uplink switches to
which the Fabric Interconnects are connected. By default, this is disabled.
■ Global Power Profiling Policy: Defines how the power cap values for the servers
are calculated.
■ Hardware Change Discovery Policy: Any change in the server hardware compo-
nent will raise a critical “hardware inventory mismatch” fault.
■ Server Discovery Policies: These define the behavior of the UCS Manager when a
new server is discovered.
■ SEL Policy: A configuration that allows you to export the system event logs from
the hardware.
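Building on the XML API sketch shown earlier, the snippet below reads one of these global policies, the Chassis/FEX Discovery Policy. The DN org-root/chassis-discovery, the class name computeChassisDiscPolicy, and the attribute names are assumptions based on the publicly documented UCSM object model; confirm them (the DN is visible in the object's properties in the GUI) before relying on this.

def get_chassis_discovery_policy(xml_call, cookie):
    """Read the Chassis/FEX Discovery Policy via the XML API (names assumed)."""
    reply = xml_call(
        f'<configResolveDn cookie="{cookie}" dn="org-root/chassis-discovery" '
        'inHierarchical="false" />'
    )
    policy = reply.find(".//computeChassisDiscPolicy")
    if policy is None:
        return None
    # "action" is assumed to hold the minimum-link setting (e.g., "1-link").
    return {"action": policy.get("action"),
            "link_aggregation": policy.get("linkAggregationPref")}

# Example usage, reusing xml_call() and cookie from the earlier login sketch:
# print(get_chassis_discovery_policy(xml_call, cookie))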
The next tab is the Servers tab, shown in Figure 17-7. Whereas the Equipment tab is the only one that shows information for the physical equipment, the Servers tab provides all the logical configuration and state information related to the servers. Here is where you can create a service profile or look at the available ones. You can also create service profile templates and all the policies related to the servers, such as the boot policy or the BIOS policy. In the Cisco UCS Manager, anything created for a specific aspect of the server configuration is called a policy. That's why the Servers tab has a separate section for all the policies that can be created and used in a service profile. These include not only the aforementioned BIOS and boot policies, but also maintenance, memory, scrub, power, adapter, and many more policies. In general, the Cisco UCS Manager is so granular that you can create a policy for any specific piece of configuration you can think of. This allows for extreme flexibility when it comes to the configuration of the servers in your data center. At first glance this might not look very important, but when you start solving the challenges in a real data center, you will appreciate this flexibility.
The LAN tab, shown in Figure 17-8, is where you perform all the configurations related
to the network connectivity of the Cisco UCS as a whole and the network connectivity
of the servers in particular. Here, under the LAN Cloud option, you create the VLANs in
which all the server communication will occur, and you configure the communication of
directly attached network devices under the Appliances section. In this tab, you also cre-
ate the specific configuration for the servers’ communication.
The fourth tab is the SAN tab (see Figure 17-9). In the SAN tab, you create all the logical con-
figuration related to the storage communication of the servers. This is about the Fibre Channel–
based communication of the servers. All of the needed policies, identity pools, VSANs, and
HBA adapters’ configuration is created and managed in this tab—both for Cisco UCS commu-
nication with the data center SAN and with directly attached FC storage appliances.
The VM tab, shown in Figure 17-10, allows for the integration of the Cisco UCS Manager
with the VMware and Microsoft hypervisors to reach up to the level of the virtual
machines.
The Storage tab, shown in Figure 17-11, allows for the creation of local storage profiles
and policies for the provisioning of the local storage.
The Chassis tab, shown in Figure 17-12, is a new one. It allows the DC administrators to
focus on tasks specific to the management of multiple chassis in a Cisco UCS domain. In
one place, you can create and manage all the needed policies for the chassis of the blade
servers.
The Admin tab, shown in Figure 17-13, is where all the configuration related to the
administration of, monitoring of, and access to the Cisco UCS is configured. Here you
can access the faults, events, and audit logs. You can also configure the authentication
and authorization, which can be based on creating a database of local users or using
external authentication, authorization, and accounting (AAA) servers such as a Microsoft
Windows Active Directory (AD).
In order to keep things clear and simple, our explanation will focus on a Cisco UCS
cluster that consists of a pair of Fabric Interconnects, a blade chassis, and B-series servers
in the chassis.
A pair of Fabric Interconnects is a requirement for a Cisco UCS cluster. A topology with a single Fabric Interconnect is allowed only for lab and test purposes, as it does not provide redundancy. The two Fabric Interconnects must be connected to each other only through the dedicated cluster ports, which are marked L1 and L2 (L1 connects to L1, and L2 connects to L2). This is all you must do for the cluster link—connect the cluster ports. The system automatically detects the links and configures a port channel using the two physical links. This happens transparently.
When it comes to the data plane connectivity of the Cisco UCS, you have external con-
nectivity to the LAN and SAN infrastructures of the data center and internal connectiv-
ity to the system, starting from the Fabric Interconnects and going down to the servers.
The external connectivity to the LAN infrastructure is created by linking the Ethernet
ports of the Fabric Interconnects to the upstream LAN switches. Based on best practices,
and in order to achieve redundancy, each Fabric Interconnect can be connected to a pair
of upstream LAN switches. Technologies such as port channels or vPCs can be used for
adding redundancy and load balancing. For the Cisco UCSM to know that the Cisco UCS connects to the LAN infrastructure through these specific Ethernet ports, the ports need to be configured accordingly. This is done by selecting the port on the Fabric Interconnect and assigning it the Uplink Port role.
Creating and configuring the connectivity to the SAN infrastructure is similar, but there
are some differences. Again, you have to select ports to connect to the SAN. This time
there are some requirements and some options. The major choice that needs to be made,
depending on the design of your data center, is which storage communication protocol
to use. The options are iSCSI, NFS, CIFS/SMB, FC, and FCoE. For the iSCSI, NFS, and
CIFS/SMB options, there is no need for any additional upstream connectivity other than
the LAN connectivity, as these protocols are using Ethernet and IP connectivity to carry
the storage communication.
For the FCoE connectivity, the Ethernet uplink ports need to be configured as FCoE
Uplink ports or as Unified ports, depending on whether you will use dedicated Ethernet
links for the storage communication or you will use the same Ethernet links for both
Ethernet and FCoE traffic.
In both cases, you must consider the SAN design best practices, which require each server to be connected to two separate physical SAN infrastructures. Because all of the servers’ data plane communication goes through the Fabric Interconnects, each Fabric Interconnect can be connected to only one SAN, as the Fabric Interconnects become an extension of the SANs down to the server.
The same design considerations apply if you want to connect the Cisco UCS to the SANs
using native Fibre Channel connectivity. For this, you must use dedicated FC ports. Once again, each Fabric Interconnect connects to a single SAN (no cross-connects as in the LAN connectivity). As discussed in Chapter 11, “Fibre Channel Protocol Fundamentals,” redundancy is achieved in a different way than in the LAN infrastructure. The FC ports must be assigned the Fibre Channel Uplink Port role.
Figure 17-14 illustrates the internal and external physical connectivity of Cisco UCS.
Figure 17-14 Internal and External Physical Connectivity of the Cisco UCS (Ethernet uplinks to the LAN and FC uplinks to SAN A and SAN B from FI A and FI B; Ethernet/FCoE links through the chassis midplane to the blade servers’ mezzanine adapters)
Once external connectivity, or upstream connectivity to the other elements of the data
center, is created, it is important to understand how the internal connectivity is realized
so that you know what is available to the servers to consume.
As already mentioned, the servers’ communication always goes through the Fabric
Interconnects. Here we will explain the physical internal connectivity of the Cisco UCS
starting with the Fabric Interconnects and moving down to the servers.
In each Cisco UCS 5108 blade chassis are two FEXs that connect externally to the Fabric Interconnects, thus providing the blade chassis’s outside connectivity. Each FEX connects to only one of the Fabric Interconnects. This allows the separation of the physical infrastructure to be maintained down to the level of the physical server. This is Ethernet connectivity, as the Ethernet links are shared to carry the management and control communication, the Ethernet data communication of the servers, and their storage communication using the FCoE protocol. Depending on the FEX and Fabric Interconnect models, these links operate at speeds of 10, 25, or 40Gbps.
The number of links between the FEX and the Fabric Interconnect defines the total bandwidth of the chassis connectivity over that path. This is the communication capacity available to the servers in that chassis over this path. The other FEX is connected
to the other Fabric Interconnect in the same way. This ensures the blade chassis has two
paths of communication: one through Fabric Interconnect A and the other through Fabric
Interconnect B. Both paths are active and available.
Another important point is that the FEXs do not perform Layer 2 switching for the data plane communication. This removes one hop of processing from the servers’ communication, which makes transmission faster and decreases latency. This is possible because, instead of performing standard Layer 2 switching, the FEX maintains a mapping between its external ports and its internal ports.
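To make the idea of this port mapping (pinning) and the per-path bandwidth more concrete, the following Python sketch models it. The link count, link speed, and slot-to-uplink assignment used here are illustrative assumptions, not values tied to a specific FEX model.

# Illustrative sketch: an FEX maps (pins) each server slot's internal port to one
# of its external uplinks toward the Fabric Interconnect instead of switching
# traffic at Layer 2. The values below are assumptions for demonstration only.

FEX_UPLINKS = 4            # assumed number of links between this FEX and its FI
LINK_SPEED_GBPS = 10       # assumed per-link speed

def path_bandwidth(uplinks: int, speed_gbps: int) -> int:
    """Total bandwidth of one path (one FEX to one Fabric Interconnect)."""
    return uplinks * speed_gbps

def pin_slot_to_uplink(slot: int, uplinks: int) -> int:
    """Static mapping of a server slot's internal port to an external uplink."""
    return (slot - 1) % uplinks + 1

for slot in range(1, 9):   # a Cisco UCS 5108 chassis has eight server slots
    print(f"Slot {slot} -> uplink {pin_slot_to_uplink(slot, FEX_UPLINKS)}")

print(f"Bandwidth per path: {path_bandwidth(FEX_UPLINKS, LINK_SPEED_GBPS)} Gbps")
# With both FEXs cabled, the chassis has two such paths (one per Fabric Interconnect).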
The FEX is connected to the midplane of the blade chassis. The midplane is a passive
component that provides multiple electrical lanes for the signal, hard wiring the different
components as required. Also connected to the midplane are the blade servers when they
are installed in the chassis; this is done through a special blade connector. Therefore, the
midplane provides the internal physical connectivity at the lowest physical layer. Think of
the midplane as the physical network cabling. This physical connectivity is used on one side by the internal ports of the FEX and on the other side by the mezzanine adapter of the server. The FEX has dedicated internal ports for each server slot in the chassis. Once the FEX is installed in the blade chassis, these ports are connected to the midplane and are available to the server slots. How much of the internal FEX data connectivity a blade server will utilize depends on two things: the capabilities of the installed FEXs and the mezzanine adapter installed in the blade server.
The mezzanine adapter in the blade server is installed in a specific slot, which provides the physical connectivity in both directions:
■ External: The mezzanine adapter, through the mezzanine slot and the blade con-
nector, is connected to lanes from the midplane. For each mezzanine slot, half of
the lanes of the midplane are hard-wired to one of the FEXs, and the other half
lead to the other FEX. If we look at the Cisco UCS 1380 mezzanine adapter, shown
in Figure 17-15, we see that it has two 40-Gbps ports, and each connects through the midplane to one of the FEXs over one of the available paths. Each of the 40-Gbps
ports can be divided into four 10-Gbps ports. Depending on the capabilities of the
installed FEXs, the server can have up to 40-Gbps available bandwidth for each path.
■ Internal: Again, through the mezzanine slot, the adapter is connected to the PCIe
bus of the server. Through the PCIe bus, the OS is capable of seeing and utilizing the
vHBAs and the vNICs on the mezzanine adapter. The Cisco virtual interface cards
utilize a programmable silicon that allows you to program (or create) multiple NICs
and HBAs for the purposes of the OS on the server and for providing redundant con-
nectivity and load balancing. These logical adapters are called vNICs and vHBAs, but
these are not the vNICs and vHBAs from the VMware ESXi hypervisor. Take note
that although the terminology is the same, these are two different vNICs and vHBAs.
Let’s try to clarify it a bit more: as you know, the VMware ESXi creates a virtual
image of the physical hardware, and after that the virtualized resources are utilized
to create the isolated spaces, which are called virtual machines. In the case of net-
working, the ESXi hypervisor creates a virtual image of the network interface card
that it sees on the server and names it a vmNIC. Afterward, when a VM is created, a
logical network adapter is created, named vNIC, which connects to a virtual switch, which in turn is connected to a vmNIC for the external connectivity of the ESXi host and the VMs. When a Cisco VIC is used, the Cisco vNIC is seen as a “physical” network adapter by the ESXi host, and a corresponding vmNIC is created for it. Therefore, the Cisco VIC vNIC is equivalent to a VMware ESXi vmNIC. Keeping this distinction in mind will help you avoid confusion.
Figure 17-15 Cisco UCS 1380 Mezzanine Adapter: Two 40-Gbps Ports (Primary and Failover Paths to FI A and FI B) and the PCIe Bus Connection to the Server
To summarize, starting with the Cisco VIC in the server and going through the midplane
and the two FEXs, and reaching the two Fabric Interconnects, two communication paths
are available for each server. These paths are physically separated, and their bandwidth
depends on the capabilities of the hardware used, such as FEXs and mezzanine adapters.
However, this is the physical infrastructure for the communication of each server. What
and how much of it a server will utilize depends on how you configure it in the Cisco
UCS Manager by using a service profile. Before we can explain the service profiles, which
allow us to abstract the hardware resources of the Cisco UCS, you will need to under-
stand the challenge of the static hardware servers.
Each physical server has an identity that uniquely identifies it for the purposes of network and storage communication and to the operating system. This identity is used for security purposes and for licensing. As such, it is tightly connected to the behavior of the applications running on the server. The result is that the applications are tied to the
identity of the server. If something happens with the server, such as a hardware failure
or an upgrade, this identity changes. This can affect the applications in different ways,
from needing a slight reconfiguration to needing to be reinstalled with a new configura-
tion. This is usually disruptive and affects the provided services. These identity values are
the UUID of the server, the MAC addresses used for the network communication, the
WWPN and WWNN addresses for the Fibre Channel communication, and the BIOS and
firmware settings.
With the Cisco UCS, this challenge is solved through hardware abstraction, which is
achieved by removing the identity and configuration of the server from the physical
hardware. All of the identity information and the needed configuration for the server is
contained in a logical object created in the Cisco UCS Manager, called a service profile.
When the service profile is associated with a physical server, the identity and configura-
tion are applied to it. If the service profile is disassociated, the identity stays with the service profile and will be applied to the next physical server with which the service profile is associated. This mobility of the server identity and configuration, achieved through the
decoupling from the physical hardware via the use of service profiles, allows the physi-
cal server to be used only as the provider of the needed hardware resources. As long as
a physical server has the needed hardware resources, as required by the service profile,
it can be used. It does not matter which one is the physical server—the first, second, or
the fifth one in the blade chassis—or whether it’s going to be on the same chassis or on
another chassis from the same Cisco UCS domain.
The identity values can be manually configured for each service profile. To avoid over-
lapping values, you can use identity resource pools, which can be created in the Cisco
UCSM, and from them the service profiles can consume identity values.
Each computer, whether it’s a server, desktop computer, or laptop, has a unique identifier that is burned into the BIOS. This is called a universally unique identifier (UUID), which is a 128-bit value.
When the UUID is assigned manually, the whole 128-bit value needs to be entered. For
automating the assignment, you can use a UUID pool. When a UUID pool is created (see
Figure 17-17), the Cisco UCSM follows some rules (the example after this list illustrates them):
■ The Cisco UCSM divides the 128-bit UUID into two pieces: a 64-bit prefix and a
64-bit suffix pool.
■ To create a UUID, you take the prefix and a value from the suffix pool. This results
in a unique ID.
■ Cisco recommends using the same 24-bit organizationally unique identifier (OUI)
across all the identity pools.
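As a rough illustration of these rules, the following Python sketch builds UUIDs from a fixed 64-bit prefix and a pool of 64-bit suffixes. The prefix and suffix values are made up for the example and are not Cisco defaults.

import itertools

# Illustrative UUID pool: a fixed 64-bit prefix combined with values drawn
# from a 64-bit suffix range. The values below are arbitrary examples.
PREFIX = 0x1234567890ABCDEF                       # assumed 64-bit prefix
suffix_pool = itertools.count(0x0000000000000001) # sequential 64-bit suffixes

def next_uuid() -> str:
    """Combine the prefix with the next free suffix into a 128-bit UUID string."""
    suffix = next(suffix_pool)
    value = (PREFIX << 64) | suffix
    raw = f"{value:032x}"
    # Standard 8-4-4-4-12 UUID text formatting
    return f"{raw[0:8]}-{raw[8:12]}-{raw[12:16]}-{raw[16:20]}-{raw[20:32]}"

print(next_uuid())  # e.g. 12345678-90ab-cdef-0000-000000000001
print(next_uuid())  # the prefix stays the same; only the suffix changes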
For the network communication, you use physical network addresses called Media
Access Control (MAC) addresses. The MAC address uniquely identifies a network inter-
face and has a 48-bit value. In the Cisco UCS, the MAC address is assigned to the vNIC,
which is visible to the operating system (OS). Just like with the UUIDs, the MAC address
value can be configured manually in the service profile, or you can use a MAC pool from
which to derive the value (see Figure 17-18). Cisco recommends using a Cisco OUI of
00:25:B5 as the first 24 bits of the MAC address.
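The following Python sketch shows how sequential MAC addresses could be generated from a pool that starts with the recommended 00:25:B5 OUI. The starting suffix and the pool size are assumptions for the example.

# Illustrative MAC address pool using the Cisco-recommended OUI 00:25:B5 as the
# first 24 bits. The starting suffix and pool size are arbitrary for this example.
OUI = "00:25:B5"

def mac_pool(start_suffix: int, size: int):
    """Yield 'size' MAC addresses whose last 24 bits count up from start_suffix."""
    for value in range(start_suffix, start_suffix + size):
        suffix = f"{value:06X}"
        yield f"{OUI}:{suffix[0:2]}:{suffix[2:4]}:{suffix[4:6]}"

for mac in mac_pool(start_suffix=0x000001, size=4):
    print(mac)   # 00:25:B5:00:00:01 ... 00:25:B5:00:00:04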
The World Wide Node Name (WWNN) and the World Wide Port Name (WWPN)
addresses are the physical addresses used in the Fibre Channel protocol communication.
Both have the same structure and a size of 64 bits, but it is important to remember that
they identify different things in the communication. The WWNN uniquely identifies the
device. Usually, it is the HBA of the server, but in the Cisco UCS, because the HBAs are created in the silicon of the mezzanine adapter and are virtualized, the WWNN represents the server. The WWPN identifies an FC port in the storage communication. In the Cisco
UCS, a WWPN will uniquely identify a vHBA from the mezzanine adapter. Again, just
like with the previously described types of identities, the WWNN and WWPNs can be
set manually in the service profile, or you can use WWNN pools and WWPN pools (see
Figure 17-19). In both approaches, special attention has to be paid to the values used. The
WWNN must be a different value from the WWPN—especially when the WWNN and
WWPN ranges of values are configured in the pools. If they overlap, the Cisco UCSM
will fail to associate the service profile with the server, because it assigns the WWNN value first. When it then tries to assign the WWPNs, if one overlaps with a WWNN, it stops the procedure with a failure message.
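Because an overlap between WWNN and WWPN values causes the association to fail, it can help to check the configured ranges before using them. The following Python sketch performs such a check on two example ranges; the WWN values are placeholders, not recommended Cisco ranges.

# Illustrative check that a WWNN pool range and a WWPN pool range do not overlap.
# The example ranges below are placeholders, not recommended values.

def wwn_to_int(wwn: str) -> int:
    """Convert a colon-separated 64-bit WWN string to an integer."""
    return int(wwn.replace(":", ""), 16)

def ranges_overlap(first: tuple[str, str], second: tuple[str, str]) -> bool:
    a_start, a_end = (wwn_to_int(w) for w in first)
    b_start, b_end = (wwn_to_int(w) for w in second)
    return a_start <= b_end and b_start <= a_end

wwnn_range = ("20:00:00:25:B5:00:00:00", "20:00:00:25:B5:00:00:FF")
wwpn_range = ("20:00:00:25:B5:01:00:00", "20:00:00:25:B5:01:00:FF")

if ranges_overlap(wwnn_range, wwpn_range):
    print("Overlap detected: service profile association would fail")
else:
    print("Ranges are distinct: safe to use for WWNN and WWPN pools")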
Besides the identity resource pools, which can also be referred to as logical resource
pools since they are created by the Cisco UCS administrator and the values are generated,
there is one other type of pool in the Cisco UCSM: the server pool. The server pool is
also referred to as a physical pool because it is used to group and classify the existing
physical servers in a Cisco UCS domain, based on their hardware characteristics, such as
model and type of the CPU, number of cores, minimum and maximum memory, mezza-
nine adapters, and so on. Server pools allow you to fully utilize the mobility of the service profiles. When a server pool is specified in a service profile, the physical server will
be taken from it. In case of a failure, the service profile can get another physical server from the same pool, and that server will provide the required hardware, because matching that hardware is what qualified it for the server pool in the first place.
A physical server can belong to a server pool only when it is not associated with a service
profile. It can also belong to multiple server pools, as long as it matches the requirements
for the hardware (see Figure 17-20). The moment a service profile is associated with a physical server, that server becomes unavailable in the pool, or pools, to which it belongs.
There are two options for how a server can be added to a server pool:
■ Manual: The administrator manually adds the server to one or multiple server pools.
■ Automated: This is done using a server pool policy, which has an empty server pool
and a qualification policy. The qualification policy is a set of hardware requirements.
When a physical server is discovered by the UCSM, its hardware is checked against
the qualification policy. If it matches the requirements, it is added to the empty server pool (a simplified example after this list illustrates the idea).
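As a simplified illustration of how a qualification policy might classify discovered servers, consider the following Python sketch. The attribute names and thresholds are invented for the example and do not mirror the actual Cisco UCSM policy schema.

from dataclasses import dataclass

# Illustrative server pool qualification: the attributes and thresholds below are
# invented for the example and are not the actual Cisco UCSM policy fields.

@dataclass
class PhysicalServer:
    name: str
    cpu_cores: int
    memory_gb: int
    adapter_model: str

@dataclass
class QualificationPolicy:
    min_cores: int
    min_memory_gb: int
    adapter_model: str

    def matches(self, server: PhysicalServer) -> bool:
        return (server.cpu_cores >= self.min_cores
                and server.memory_gb >= self.min_memory_gb
                and server.adapter_model == self.adapter_model)

policy = QualificationPolicy(min_cores=16, min_memory_gb=256, adapter_model="VIC-1380")
discovered = [
    PhysicalServer("blade-1", 16, 384, "VIC-1380"),
    PhysicalServer("blade-2", 8, 128, "VIC-1340"),
]

# Servers that match the policy are placed into the (initially empty) server pool.
server_pool = [s.name for s in discovered if policy.matches(s)]
print(server_pool)   # ['blade-1']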
In the Cisco UCS, a server is the combination of two components:
■ Compute node: This is the physical server, which provides the needed computing
resources, such as CPUs, memory, local storage, and mezzanine adapters for physical
connectivity.
■ Service profile: This is the logical construct created in the Cisco UCSM that defines
the following items related to the server:
■ Identity: The combination of UUID, MAC addresses, and WWNN and WWPN
addresses.
■ Connectivity: This defines the number of vNICs and vHBAs that will be con-
nected to the available communication paths and used by the server in an active
manner.
■ Policies: These are the pieces of the service profile that define the specific con-
figuration and behavior related to the server and its components.
When a service profile, shown in Figure 17-21, is associated with a compute node, the
result is a server that can be used. Without a service profile, the physical server cannot
be used in the Cisco UCS. Also, there is a 1:1 relationship between the service profile
and the compute node. At any given time, a service profile can be associated only with
a single server, and a compute node can be associated only with a single service profile.
This also means that for each physical server used in the Cisco UCS, a service profile is
needed.
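To make the 1:1 relationship concrete, here is a small Python sketch of an association check. The class and attribute names are invented for illustration and are not part of the Cisco UCSM object model.

# Illustrative 1:1 association between a service profile and a compute node.
# Class and attribute names are invented for this example.

class AssociationError(Exception):
    pass

class ServiceProfile:
    def __init__(self, name: str):
        self.name = name
        self.compute_node = None          # at most one node at a time

    def associate(self, node: "ComputeNode") -> None:
        if self.compute_node is not None:
            raise AssociationError(f"{self.name} is already associated")
        if node.service_profile is not None:
            raise AssociationError(f"{node.name} already has a service profile")
        self.compute_node = node
        node.service_profile = self       # identity/config would be applied here

class ComputeNode:
    def __init__(self, name: str):
        self.name = name
        self.service_profile = None       # at most one profile at a time

profile = ServiceProfile("esx-host-01")
blade = ComputeNode("chassis-1/blade-3")
profile.associate(blade)                  # succeeds: both sides were free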
During the association process, the Cisco UCSM pushes the service profile down to the
CIMC of the server, together with the instruction to apply the configuration to the
compute node. Then the CIMC reboots the server so that it boots from a specialized operating system, residing in the CIMC’s storage, that is designed to take all the configuration from the service profile and apply it to the correct components of the server. Once this process has finished, the compute node is rebooted again. After
the second reboot, the association of the service profile with the server has finished, all
the needed configuration and identity values are applied to the server, and the server is
ready for the installation of the operating system (a bare-metal OS or a virtualization
hypervisor).
When the service profile is disassociated from the compute node, the same process is
followed, but this time the configuration is reset to the default values, and the identity
values are removed. The identity values always stay with the service profile, and when the
service profile is associated with another server, the identity is applied to the new com-
pute node.
The one challenge with service profiles is that a service profile is needed for every server. To optimize the process of creating the needed number of service profiles, especially when they are meant to be similar, you can use a service profile template.
The service profile template is a special type of service profile that is used to generate
multiple service profiles at once. They will be similar, which means they will have the
same configuration and policies, but each will have a unique identity. For example, all the
generated service profiles will have the same BIOS configuration policy, the same vNICs
and vHBAs, but each vNIC and vHBA will have a unique physical address. Because of
this, during the process of creating multiple service profiles from a service profile tem-
plate, you will need to get multiple unique identity values. This is achieved by using iden-
tity resource pools.
There are two types of service profile templates (detailed next), and it’s important to
know the difference between them, as this affects the service profiles generated from
them (see Figure 17-22):
■ Initial template: Service profiles created from an initial template inherit all the properties of the template at the time of creation. Any changes to the initial template do not automatically propagate to the bound service profiles (the example after this list illustrates the difference).
■ Updating template: The service profiles created from an updating template inherit
all the properties of the template and remain connected to the template. Any changes
to the template automatically update the service profiles created from the template.
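The difference between the two template types can be pictured with a short Python sketch. The class names and the bios_policy attribute are illustrative only.

# Illustrative difference between initial and updating templates: profiles made
# from an updating template keep following the template's settings, while
# profiles made from an initial template only copy them once.
# Names and attributes are invented for this example.

class Template:
    def __init__(self, bios_policy: str, updating: bool):
        self.bios_policy = bios_policy
        self.updating = updating
        self.bound_profiles = []

    def spawn(self, name: str) -> "Profile":
        profile = Profile(name, self.bios_policy, self if self.updating else None)
        if self.updating:
            self.bound_profiles.append(profile)
        return profile

    def change_bios_policy(self, new_policy: str) -> None:
        self.bios_policy = new_policy
        for profile in self.bound_profiles:       # only updating templates propagate
            profile.bios_policy = new_policy

class Profile:
    def __init__(self, name: str, bios_policy: str, template: "Template | None"):
        self.name, self.bios_policy, self.template = name, bios_policy, template

initial = Template("bios-default", updating=False)
updating = Template("bios-default", updating=True)
p1, p2 = initial.spawn("sp-01"), updating.spawn("sp-02")

initial.change_bios_policy("bios-v2")
updating.change_bios_policy("bios-v2")
print(p1.bios_policy, p2.bios_policy)   # bios-default bios-v2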
To create a service profile template, you must start the wizard, which is located in the Servers tab. It is similar to the Service Profile Wizard; the major difference is that you have to set the template type at the start.
The policies used in a service profile fall into two groups:
■ Configuration policies
■ Operational policies
Examples of configuration policies are the QoS policies, which are applied to the configuration of the vNICs and the vHBAs and define how their traffic is handled, and the boot policy, which configures the boot order.
Operational policies are the BIOS policy, power control policy, scrub policy, maintenance
policy, and so on.
Some policies are specific and need a little bit more explanation. The scrub policy
defines what happens with the BIOS configuration and with the data on the local storage
during a disassociation of the service profile. It allows the administrator to instruct the
system to erase the local data and reset the BIOS to the default settings.
If the physical server needs to run specific versions of the firmware for specific com-
ponents, a firmware policy can be used with the service profile. It can specify which
versions of the firmware are needed. Then, during the association process, the system
will first check the versions of the firmware currently running on the compute node, and
based on the information from the firmware policy, it will retrieve the needed versions
from the Fabric Interconnects and perform the upgrades before the rest of the configura-
tion in the service profile is applied.
One extremely important policy is the maintenance policy, and Cisco strongly recom-
mends that you always use it with every service profile. The maintenance policy defines
when the changes you make in a service profile are applied to the compute node. It
affects the behavior of a service profile already associated with a compute node. In this
situation there are certain changes that can require the use of a specialized OS from the
CIMC that applies these changes. As this will be disruptive to the operation of the server,
the maintenance policy allows you to define whether such changes will be applied after
the administrator’s acknowledgment or immediately.
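A rough way to picture the maintenance policy’s effect is the following Python sketch, which decides what to do with a disruptive change based on the policy setting. The policy values are simplified for the example (the wizard also offers a scheduled option, covered later in this chapter), and the names are illustrative.

# Illustrative maintenance policy behavior for disruptive service profile changes.
# The policy values ("immediate", "user-ack") are simplified for this example.

def handle_disruptive_change(policy: str, change: str, acknowledged: bool) -> str:
    if policy == "immediate":
        return f"Applying '{change}' now (server will reboot)"
    if policy == "user-ack":
        if acknowledged:
            return f"Acknowledged: applying '{change}' (server will reboot)"
        return f"'{change}' is pending administrator acknowledgment"
    raise ValueError(f"Unknown maintenance policy: {policy}")

print(handle_disruptive_change("user-ack", "new boot order", acknowledged=False))
print(handle_disruptive_change("user-ack", "new boot order", acknowledged=True))
print(handle_disruptive_change("immediate", "new BIOS policy", acknowledged=False))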
To create a service profile, you start the Service Profile Wizard in the Servers tab. At the first step, you have to specify a name for the service profile. It is important to note that names in the Cisco UCSM must be unique for the same type of object. Because the UCSM is an XML-based application, it refers to objects by their names; if two objects have different names, the UCSM treats them as different objects, no matter what their content is, even if it is identical. That’s why it’s important to use unique names.
After a name for the service profile is specified, the next thing to do is to specify the
UUID that will be used for the server (see Figure 17-24). There are three options:
■ Hardware Default: The service profile will not provide a UUID value. Instead, it will
use the UUID in the BIOS. This means this service profile will not be mobile.
■ Manual Using OUI: Allows the administrator to enter a UUID value manually.
■ Domain Pools: Select a UUID pool from which a UUID value will be taken. If the
administrator hasn’t created UUID pools in advance, there is a link that allows a
UUID pool to be created without exiting the Service Profile Wizard.
Figure 17-24 Cisco UCS Service Profile Wizard – Identity Service Profile
The second step is for storage-related configuration (see Figure 17-25). Here you can
select or create a storage profile that defines the number of storage disks, roles, and the
use of these disks, and other storage parameters. With it you can configure multiple
virtual drives and select the physical drives used by a virtual drive. Additionally, you can
configure the local disk configuration policy, which defines the local RAID controller
and disks configuration.
The third step is for networking (see Figure 17-26). The vNICs that will be used by the
server are created and configured here. You have to switch to Expert mode to be able
to add vNICs. A vNIC can use one communication path at a time. You have to specify a name for the vNIC and the method of acquiring a MAC address value; the options are to set it manually or to use a MAC address pool. You also have to select through which Fabric Interconnect this vNIC will communicate, A or B, and whether you want to enable the hardware-based failover. The vNICs in the Cisco UCS are capable of switching communication paths in case of a hardware failure. This switching is extremely fast and practically undetectable.
The next thing that needs to be defined is which VLANs are allowed to communicate
through this vNIC. Optionally, you can configure the MTU size, a QoS policy, and some
other operational policies.
The SAN Connectivity step, shown in Figure 17-27, allows you to create and configure
the vHBAs that will be used by the server to communicate with the SAN infrastructures.
Just like with the network configuration, you have to switch to Expert mode to be able to
add vHBAs. There is no hardware failover supported for the vHBAs because of the design
standards for the SANs; that’s why, if you need to have redundancy, it is strongly recom-
mended that you have at least one vHBA per communication path.
Once the WWNN assignment method is selected, manually or from a WWNN address
pool, you can add the vHBA.
For the vHBA you have to specify a name, the WWPN assignment method, which fabric
path will be used (A or B), which VSAN, the QoS, and so on.
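As a simple illustration of the recommendation to have at least one vHBA per communication path, the following Python sketch models a redundant pair of vHBAs. The names, VSAN IDs, and WWPN values are placeholders.

from dataclasses import dataclass

# Illustrative redundant vHBA definition: one vHBA per fabric path (A and B).
# Names, VSAN IDs, and WWPNs below are placeholders for the example.

@dataclass
class VHba:
    name: str
    fabric: str      # "A" or "B"
    vsan_id: int
    wwpn: str

redundant_vhbas = [
    VHba(name="vhba-a", fabric="A", vsan_id=100, wwpn="20:00:00:25:B5:0A:00:01"),
    VHba(name="vhba-b", fabric="B", vsan_id=200, wwpn="20:00:00:25:B5:0B:00:01"),
]

# Redundancy check: the server should have at least one vHBA on each fabric path.
fabrics_covered = {vhba.fabric for vhba in redundant_vhbas}
print("SAN redundancy:", fabrics_covered == {"A", "B"})   # True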
Figure 17-26 Cisco UCS Service Profile Wizard – Networking and vNIC Creation
Figure 17-27 Cisco UCS Service Profile Wizard – SAN Connectivity and vHBA
Creation
The Zoning step, shown in Figure 17-28, allows you to create a Fibre Channel zoning
configuration that will apply to the vHBAs of the server.
The vNIC/vHBA Placement step, shown in Figure 17-29, allows you to manually change
which lane on the midplane will be used by which vNIC and vHBA. The general recommen-
dation is to let the system make this decision, but in case of issues, you can change it here.
If you need to map a specific image to the server, you can create a vMedia policy in the
vMedia Policy step of the wizard (see Figure 17-30). The supported protocols are NFS, CIFS, HTTP, and HTTPS, and the software image can be mapped as a CDD (virtual CD/DVD drive) or an HDD (virtual hard disk drive).
The Server Boot Order step, shown in Figure 17-31, allows you to create the boot order
for the server. Here you can specify the local and remote boot devices and their order.
You can select a remote network-based boot, over the vNIC or an iSCSI vNIC, or specify
a boot from a specific storage system over the vHBA.
Figure 17-31 Cisco UCS Service Profile Wizard – Server Boot Order
The next step allows for the selection of a maintenance policy (see Figure 17-32). If there isn’t
one, you can create a new one. The maintenance policy, a highly recommended component,
allows you to specify when any disruptive changes in the service profile will be applied to
the server: immediately, after acknowledgment from the administrator, or scheduled.
The Server Assignment step, shown in Figure 17-33, is where you can select the physical
server that will be used by the service profile. It is not mandatory to select a server at this
point. You can just skip this step and assign a physical server later. The other options here are to select one of the existing servers or to select a server pool.
Another important option at this step is to select whether you need to use a firmware
management policy. With it, you can specify the versions of the firmware for the differ-
ent components of the server, such as BIOS, mezzanine adapter, and so on.
The last step, shown in Figure 17-34, is where you specify the operational policies. Here
you can select or create a custom configuration for the following items:
■ BIOS
■ Management IP Address
■ Scrub Policy
Once you click the Finish button, the service profile will be created. The system will
warn you of any conflicting situations. The service profile will appear under the appropri-
ate area in Service Profiles, and you can make changes (see Figure 17-35).
At this point the service profile can be associated with a physical server, if one was not selected during the wizard.
In the FSM tab you can see information for the association or disassociation processes,
as well as any firmware upgrades.
To manage multiple Cisco UCS domains from a single place, you can use the Cisco UCS Central application, which provides the following capabilities:
■ Centralized inventory and health status information for all Cisco UCS components
for a definitive view of the entire infrastructure
■ Global service profiles and templates that extend the power of policy-based manage-
ment introduced in Cisco UCS Manager well beyond the boundary of a single Cisco
UCS domain
■ A centralized keyboard, video, and mouse (KVM) manager to launch KVM sessions
anywhere in the Cisco UCS environment from a single pane
■ Global administrative policies that enable both global and local management of
Cisco UCS domains, help ensure consistency and standardization across domains,
and eliminate any configuration drift typically experienced after servers are initially
deployed
Summary
This chapter described the Cisco UCS abstraction. In this chapter, you learned about the
following:
■ The Cisco Unified Computing System (UCS) is a complex system that builds a physi-
cal compute infrastructure and deploys abstraction from the hardware by separating
the server into two components—the physical compute node and the logical object
of the service profile.
■ The Cisco UCS Manager (UCSM) provides management and runs on the Cisco
Fabric Interconnects.
■ On one of the Fabric Interconnects, the Cisco UCSM runs in active mode; on the
other, it runs in stand-by mode.
■ The Fabric Interconnect on which the UCSM is active is called the primary Fabric
Interconnect.
■ The software that runs on the Fabric Interconnect consists of three files: the kick-
start and system image files of NX-OS and the UCSM application.
■ The Cisco UCSM is the management point for the UCS administrator.
■ The UCSM communicates with the management end points in the UCS.
■ The management end points in the UCS are the Fabric Interconnects, the chassis ele-
ments, the blade and rack servers, and their components.
■ The Cisco UCSM is an XML application and, as such, is based on an XML schema.
■ The Cisco UCSM has an API to which you can connect in three ways: via the GUI, via the CLI, and programmatically using third-party tools and applications.
■ The Cisco UCSM GUI is divided into navigation and content panes.
■ The Equipment tab provides information for the physical equipment and policies.
■ The discovery process starts from the CMCs in the FEXs, which gather information
from the servers and provide it to the UCSM.
■ The Servers tab is for the logical objects related to the servers—policies and service
profiles.
■ The LAN tab is for the connectivity of the system to the LAN and any specific network policies.
■ The SAN tab is for the SAN communication and storage policies.
■ The VM tab allows for integration with hypervisors and visibility to the level of the
virtual machines.
■ The Chassis tab is for information, profiles, and policies related to the chassis.
■ The Admin tab is for monitoring, reporting, administering the system, and control-
ling administrative access to the UCS.
■ Physical connectivity depends on the mezzanine adapters in the servers, the models
of the FEXs in the chassis, and the number of links between an FEX and a Fabric
Interconnect.
■ The ports on the Fabric Interconnects connected to FEXs are configured as server
ports.
■ The links between the Fabric Interconnects and the FEXs are Ethernet.
■ These links are used for the data, management, and storage communication of the
servers.
■ The ports of the Fabric Interconnects that connect to the upstream LAN infrastruc-
ture are configured as uplink ports.
■ The FC ports that connect to the SAN infrastructures are configured as Fibre
Channel uplink ports.
■ The identity of a server is formed from the unique values for the UUID, MAC
addresses, and WWPN and WWNN addresses.
■ In the Cisco UCSM, the identity of the server does not depend on the hardware
values provided by the physical server; instead, they are generated and stored in the
service profile.
■ The service profile contains all the identity and configuration information for the
server.
■ To avoid overlaps and to automate the assignment of identity values to the service
profiles, you use identity resource pools.
■ There are UUID, MAC address, and WWNN and WWPN address pools.
■ They are also called logical resource pools, as the values in them are created by the
administrator.
■ The server pools are used to group the physical servers that are not associated to a
service profile based on their hardware characteristics.
■ The combination of the physical resources of a compute node and a service profile
creates a server in the Cisco UCS.
■ The relationship is 1:1, as at any moment one SP can be used by a single server, and a
single SP can be associated with only one server at a time.
■ To automate the creation of multiple service profiles, you can use service profile
templates.
■ Configuration and operational policies in the service profile define the behavior of
the server and how the different components will be configured.
■ The Service Profile Configuration Wizard consists of 11 steps that allow for the con-
figuration of the server’s identity, network and SAN communication, zoning, vNIC/
vHBA placement, maintenance and operational policies, boot order, vMedia map-
ping, and server assignment.
■ One UCS Manager can manage one UCS domain. To manage multiple UCS domains,
or UCS Managers, you can use the Cisco UCS Central application.
Reference
“Cisco Servers – Unified Computing System (UCS),” https://www.cisco.com/c/en/us/
products/servers-unified-computing/index.html
Chapter 18
Server Virtualization
In this chapter, we will be discussing the key components of server virtualization, includ-
ing the virtual machine and its components, types of hypervisors, VMware vSphere archi-
tecture, VMware ESXi, and VMware vCenter Server Appliance installation procedure.
Virtual Machine
A virtual machine (VM) is a software computer that, like a physical computer, runs an
operating system and applications. The virtual machine is composed of a set of specifi-
cation and configuration files and is backed by the physical resources of a host. One or
more virtual “guest” machines run on a physical “host” machine. Each virtual machine
runs its own operating system and functions separately from the other VMs, even when
they are all running on the same host. Every virtual machine has virtual devices that pro-
vide the same functionality as physical hardware and have additional benefits in terms of
portability, manageability, and security.
Virtual machines are easy to manage and maintain, and they offer several advantages over
physical machines:
■ Multiple VMs can run on a single physical host, allowing more efficient use of
resources and saving physical space, time, and management costs.
■ VMs can be easily moved between physical hosts, providing optimum performance, maintenance, and resource optimization options. Recovery is much faster with VMs than with physical servers: other hosts in the virtualized infrastructure can take over the VMs from a failed host, resulting in less downtime.
While virtual machines have several advantages over physical machines, there are also
some potential disadvantages:
■ Running multiple virtual machines on one physical machine can result in unstable
performance if infrastructure requirements are not met.
■ The failure of a critical hardware component of the physical host such as the moth-
erboard or power supply can bring down all the VMs that reside on the affected
host.
Figure 18-1 shows the components and capabilities of a vSphere 7.0.1 virtual machine, which uses the version 18 VM format.
CPU and memory are typically the two resources that strongly affect VM performance.
The number of vCPUs assigned to a VM depends on the logical cores present in the
ESXi host and the license that is purchased. Virtual random access memory (vRAM) cre-
ates many virtual address spaces and allows the ESXi host to allocate the virtual address
space to any licensed VM. Virtual memory for each VM is protected from other VMs.
The operating system detects the resources assigned to the VM as though they are physi-
cal resources.
Figure 18-1 Virtual Machine Hardware (VM Version 18): up to 768 vCPUs (with VMware SMP), up to 24TB of RAM, up to 4GB of graphics video memory, 1 IDE controller with up to 4 CD-ROMs, 1 floppy controller with up to 2 floppy drives, 1–4 SCSI adapters with 1–64 devices per adapter, and 1–10 vNICs (IDE = Integrated Drive Electronics; SCSI = Small Computer Systems Interface; SMP = Symmetric Multi-Processing)
Datastores/virtual disks are storage containers for files. Datastores are where the host
places virtual disk files and other VM files. Datastores hide the specifics of physical stor-
age devices and provide a uniform model for storing VM files. They could be located
on a local server hard drive or across the network on a SAN. Different types of control-
lers, including Integrated Drive Electronics (IDE), floppy, and Small Computer Systems
Interface (SCSI), let the VM mount one or more types of disks and drives. These control-
lers do not require any physical counterparts.
vNICs facilitate communication between virtual machines on the same host, between vir-
tual machines on different hosts, and between other virtual and physical machines. While
configuring a virtual machine, you can add the vNICs along with the adapter type speci-
fication. The adapter type used typically depends on the type of guest operating system
and applications installed. Selecting a wrong adapter type can result in low networking
performance or the inability of the guest operating system to properly detect the virtual-
ized hardware. Following are some network adapters that might be available for your VM.
■ Vlance: This adapter is also called PCnet32, supports 32-bit legacy guest operating
systems, and offers 10Mbps speeds.
■ E1000: This adapter supports various guest operating systems, including Windows
XP and later and Linux versions 2.4.19 and later.
■ E1000E: This is the default adapter for Windows 8 and Windows Server 2012.
■ VMXNET3: This adapter offers all the features available in VMXNET2 and adds
several new features such as IPv6 offloads.
Additional files are created when you perform certain tasks with the virtual machine:
■ A .hlog file is a log file used by vCenter Server to keep track of virtual machine files
that must be removed after a certain operation completes.
■ A .vmtx file is created when you convert a virtual machine to a template. The .vmtx
file replaces the virtual machine configuration (.vmx) file.
Note Every vendor has different file types that make up a virtual machine.
Containers
The container image contains all the information for the container to run, such as appli-
cation code, operating system, and other dependencies (for example, libraries). There
are multiple container image formats, with the most common of them being the Open
Container Initiative (OCI). The container engine pulls the container images from a reposi-
tory and runs them. There are a number of container engines, such as Docker, RKT, and
LXD. Container engines can run on any container host (such as a laptop), on a datacenter
physical server, or in the public cloud. A container is a container image that has been instantiated (executed) by the container engine.
Containers are often compared to VMs, as they are both portable single units of pack-
aged compute; however, they solve different problems. Whereas VMs aim to abstract
an operating system from the physical server, containers aim to create an environment
for application code to be executed in. Similar to how VM hypervisors virtualize the
hardware to host multiple isolated operating systems, the container engine virtualizes the
operating system to host multiple isolated applications. Containers are naturally smaller
in size compared to VMs, as they are purposely built to run applications and they pack-
age only the absolute minimum amount of data and executables required. Containers have introduced the concept of immutability: unlike virtual machines, they are not updated or patched in place. Any update requires an existing container to be destroyed and replaced with a new one.
Hypervisor
A hypervisor is a function that abstracts (that is, isolates) operating systems and applica-
tions from the underlying hardware. A hypervisor enables multiple operating systems to
share a single hardware host machine. Whereas each operating system appears to have the
dedicated use of the host’s processor, memory, and other resources, the hypervisor con-
trols and allocates only needed resources to each operating system and ensures that the
operating systems (VMs) do not disrupt each other.
The two types of hypervisors are Type 1 and Type 2. A Type 1 (bare-metal) hypervisor runs directly on the host hardware, whereas a Type 2 (hosted) hypervisor runs as an application on top of a host operating system.
Here are some of the common Type 1 hypervisors on the market today:
■ VMware ESXi
■ KVM
■ Microsoft Hyper-V
■ Citrix XenServer
Here are some of the common Type 2 hypervisors on the market today:
■ Windows Virtual PC
■ Parallels Desktop
■ Oracle VM VirtualBox
Figure 18-3 illustrates the difference in the deployment of Type 1 and Type 2
hypervisors.
Figure 18-3 Type 1 Hypervisor (Running Directly on the Hardware) Versus Type 2 Hypervisor (Running on a Host OS)
Virtual Switch
A virtual switch (vSwitch) is a software application that connects VMs with both virtual
and physical networks. The functionality of a virtual switch is quite similar to that of an
Ethernet switch, with added security controls provided specifically for virtual environ-
ments. A vSwitch does more than just forward data packets; it intelligently directs the
communication on a network by checking data packets before forwarding them to a
destination.
Virtual switches are usually embedded in the installed hypervisor software and provide the connectivity that each VM requires. A virtual switch is completely virtual and can connect to a physical NIC. Figure 18-4 illustrates typical vSwitch connectivity.
Figure 18-4 Typical vSwitch Connectivity: VM vNICs Connect to vSwitches in the Hypervisor, Which Uplink Through the Physical NICs to the Physical Switch
Since physical host network adapters are usually limited in number, it is not possible to
assign individual physical NICs to each VM for network connectivity. VMs are usually
assigned one or more vNICs, which are attached to a vSwitch. The physical NICs act as
uplink ports to the vSwitch for network access.
There are, however, a few challenges in using vSwitches. vSwitches create an additional
processing load on the physical host. vSwitches lack familiar management options such
as SSH and do not support all the features of a physical switch. Configuration performed
on a physical network switch that connects to a host will affect all the VMs on that host.
If the VM is moved between hosts, a different configuration on the network switch port
connected to the target host can impact it. In a VM environment, if you shut down the switch port connected to the server, network access is lost for all the VMs on that host, which has a much larger impact on the production environment.
VMware vSphere
VMware vSphere is the name of VMware’s server virtualization product suite. It was
formerly known as VMware Infrastructure, and it consists of ESXi, a Type 1 hypervi-
sor, vCenter Server, vSphere Client, and a few other important features that are not eas-
ily replicated on a physical infrastructure, such as vSphere vMotion and vSphere High
Availability, to ensure virtual servers are up and running. Figure 18-5 shows the VMware
vSphere components.
Figure 18-5 VMware vSphere Components
VMware ESXi
VMware ESXi (Elastic Sky X Integrated) is a Type 1 hypervisor that installs directly on
the physical server. VMware ESXi is based on the VMkernel operating system, which
interfaces with agents that run on top of it. With direct access to and control of under-
lying resources, VMware ESXi effectively abstracts the CPU, storage, memory, and
networking resources of the physical host into multiple virtual machines. This means
that applications running in virtual machines can access these resources without direct
access to the underlying hardware. Through the ESXi, you run the VMs, install operating
systems, run applications, and configure the VMs. Admins can configure VMware ESXi
using its console or a vSphere Client. An ESXi 7.0 host can support up to 1024 VMs.
VMware vCenter Server
vCenter Server requires an extra license and serves as a focal point for management of
ESXi hosts and their respective virtual machines. It correlates traffic between hosts for
functionalities that span more than a single host, such as vSphere vMotion, vSphere
Distributed Switch (vDS), vSphere Fault Tolerance, vSphere High Availability, and so on.
Failure of a vCenter Server does not stop production traffic or affect VM operation,
but features like central management of ESXi hosts, vSphere High Availability, vSphere
Distributed Switch, vSphere DRS, and vSphere vMotion are not available until the vCen-
ter Server is restored. It is always recommended to have a redundant vCenter deployment.
vCenter version 7.0 can manage up to 2500 ESXi hosts per vCenter Server and can have
45,000 registered VMs with 40,000 powered-on VMs.
The VMware vSphere environment supports integration with Active Directory (AD) as
the identity source, which simplifies user management. Therefore, when users log in to
the vSphere Client, they can use their domain credentials. This approach allows
administrators to define user roles for users with different permissions (administrator,
read-only, and so on) that can then be used to access and manage all systems connected
to the AD database. Depending on the access permissions configured, the user can access
some of the virtualized environment or the entire environment using vSphere Client. For
example, users can connect to the consoles of the VMs but not be able to start or stop
those VMs, or change the parameters of the underlying host. Users also use one set of
credentials to access multiple systems or devices. This allows administrators to create
user accounts, change user permissions, or disable user accounts via a central location
(AD database) and not on each system/device separately. Figure 18-6 shows the GUI
interface of vSphere Client.
vSphere vMotion
The vSphere vMotion feature migrates a live, running VM from one physical server to
another with zero downtime and continuous service availability. This capability is pos-
sible across vSwitches, clusters, and even clouds. The entire process takes less than two
seconds on a gigabit Ethernet network. The vMotion feature enables you to perform
hardware maintenance without scheduling downtime or disrupting business operations.
You can also move virtual machines away from failing or underperforming servers using
vMotion. It is a powerful tool for maintenance or resource distribution situations, but it
is not designed for disaster recovery. If a physical host fails, the VMs running on the host
go offline until another host recovers the VMs. Figure 18-7 shows the vSphere vMotion
feature in action.
Figure 18-7 vSphere vMotion Migrating a Running VM Between Hosts over Shared Storage
Figure 18-8 vSphere High Availability Restarting the VMs of a Failed ESXi Host on the Remaining Hosts in the Cluster
Figure 18-9 shows how VMware Fast Checkpointing synchronizes the primary and sec-
ondary VMs by following the changes on the primary VM and mirroring them to the
secondary VM.
Figure 18-9 VMware Fast Checkpointing and Instantaneous Failover Between the Primary and Secondary VMs
vSphere DRS
The VMware vSphere Distributed Resource Scheduler (DRS) feature ensures that VMs
and their applications are always getting the compute resources they need to run effi-
ciently. vSphere DRS provides resource management capabilities like load balancing and
virtual machine placement across the available hosts in an ESXi cluster to provide opti-
mum performance. Migrating the VMs is done automatically using the underlying vMo-
tion feature. If usage of the resources on a current host exceeds the defined limit, one or
more of the VMs are relocated to other hosts to prevent degraded performance. If you
need to perform an upgrade on one of the hosts in the group, vSphere DRS automatically migrates all of its VMs to other hosts once the host is placed into maintenance mode, without service disruption. Figure 18-10 shows how vSphere DRS helps dynamically balance VM workloads across resource pools.
Figure 18-10 vSphere DRS Distributing VM Workloads Across the Hosts in a Resource Pool
vSphere DPM
The VMware vSphere Distributed Power Management (DPM) feature reduces power
consumption in the datacenter by powering hosts on and off based on cluster resource
utilization. vSphere DPM monitors the cumulative demand of all virtual machines in the
cluster and migrates the VMs to as few of the hosts as possible and then powers off the
remaining hosts. Migrating the VMs is done automatically using the underlying vMo-
tion feature. Wake-up protocols are used to restart the standby host if more resources
are needed. Figure 18-11 depicts vSphere DPM consolidating VM workloads to reduce
power consumption.
VMware ESXi Installation
Step 1. Log in to the KVM (keyboard, video, mouse) Console of the server and acti-
vate virtual devices (Virtual Media, Activate Virtual Devices), as shown in
Figure 18-12.
Step 2. Map the drive where the ISO image of the ESXi hypervisor is located to the
virtual CD/DVD device on the KVM Console (Virtual Media, Map CD/
DVD, locate the ISO, Map Drive), as shown in Figures 18-13 and 18-14.
Step 3. Reset the system and enter the boot menu by pressing F6 when the Cisco BIOS screen with the Cisco logo appears (Power, Reset System (warm boot), press F6 on the Cisco BIOS screen), as shown in Figures 18-15 and 18-16.
Step 4. Choose Cisco vKVM-Mapped vDVD1.24 as the boot device and press Enter
(see Figure 18-17). Then select the VMware ESXi Installer from the VMware
ESXi boot menu (see Figure 18-18).
Step 5. Once the installer loads, press Enter to begin the installation (see Figure 18-19).
On the End User License Agreement (EULA) page, press F11 to proceed (see
Figure 18-20).
Step 6. Choose a disk (local or remote) where you want to install the ESXi hypervisor
(see Figure 18-21).
Step 7. Choose a keyboard layout (see Figure 18-22) and set the root password for
the ESXi host (see Figure 18-23), which will be used to log in to the ESXi
host later.
Step 8. Press F11 to confirm the installation of the ESXi hypervisor (see Figure 18-24).
Once the installation is complete, press Enter to reboot the server (see
Figure 18-25). The completed installation initiates a CD-ROM eject call,
which unmounts the virtual ESXi ISO image we mapped earlier in Step 2.
Figure 18-25 Rebooting the Server After the ESXi Hypervisor Is Installed
Step 9. Once the reboot has completed, you will see the VMware ESXi home screen
(see Figure 18-26). This completes the ESXi installation on the server. Before
you can connect to the HTTP management of the newly installed ESXi
hypervisor, you need to configure its management network and connectivity
options. There are two ways to configure a management IP address on an ESXi
hypervisor: via DHCP automatic IP assignment or via static IP configuration.
Step 10. To configure the management IP address statically, press F2 and log in using
the root password to go to the System Customization menu (see Figure 18-27).
In the System Customization menu, select Configure Management Network
and press Enter (see Figure 18-28).
Figure 18-27 Authenticating with Root Credentials for the System Customization
Menu
Step 11. Here, you can select which network adapter will be used for management
network connectivity as well as what VLAN it will be on. Also, you can
configure the static IP address along with the gateway for the management
interface (see Figures 18-29 and 18-30).
Step 12. SSH access to the ESXi hypervisor can be enabled from Troubleshooting
Options under the System Customization menu (see Figure 18-31).
Step 13. Once you configure the management network, you can connect to the ESXi
hypervisor at https://<Management IP of the ESXi Server> (see Figures 18-32
and 18-33).
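Once the management IP address is reachable, you can also verify access programmatically. The following Python sketch uses the pyVmomi library (VMware's Python SDK) to connect to the host's API and print its version; the hostname, credentials, and the decision to skip certificate verification are assumptions for a lab environment.

# Quick connectivity check against a newly installed ESXi host using pyVmomi
# (pip install pyvmomi). Hostname and credentials below are lab assumptions.
import ssl
from pyVim.connect import SmartConnect, Disconnect

context = ssl._create_unverified_context()   # lab only: skip certificate checks

si = SmartConnect(host="192.168.10.50",      # assumed ESXi management IP
                  user="root",
                  pwd="ESXiRootPassword!",   # assumed password set during install
                  sslContext=context)

about = si.RetrieveContent().about
print(f"Connected to {about.fullName} ({about.apiVersion})")

Disconnect(si)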
VMware vCenter Server Appliance Installation
Step 1. Locate the vCenter Server Appliance ISO image and then double-click the
installer.exe file located under the vcsa-ui-installer\<operating system folder>\
folder. Once the installer window opens, click Install (see Figure 18-34).
Step 2. In the Introduction step, shown in Figure 18-35, read the instructions and
click Next.
Step 3. In the End user license agreement step, shown in Figure 18-36, accept the
license agreement and click Next.
Step 4. In the Select deployment type step, shown in Figure 18-37, select Embedded
Platform Services Controller and click Next. The External Platform Services
Controller deployment model is deprecated.
Step 5. In the Appliance deployment target step, shown in Figure 18-38, enter the
details of your ESXi host where you want the vCenter Server Appliance VM
to run and then click Next. Here we will give the details of the ESXi host
installed in the previous section.
Step 6. In the Set up appliance VM step, shown in Figure 18-39, enter the name and
set the root password for the vCenter Server Appliance and then click Next.
Step 7. In the Select deployment size step, shown in Figure 18-40, select the deploy-
ment size as per your organization’s requirements and click Next.
Step 8. In the Select datastore step, shown in Figure 18-41, select the appropriate
datastore connected to the ESXi host and click Next.
Step 9. In the Configure network settings step, shown in Figure 18-42, fill in the
network settings such as the fully qualified domain name (FQDN), IP address,
subnet mask, default gateway, and DNS server details for the vCenter Server
appliance and click Next.
Step 10. In the Ready to complete stage 1 step, shown in Figure 18-43, review the
configuration and click Finish. In stage 1, the vCenter Server Appliance is
deployed (see Figure 18-44). Once the server is deployed, click Continue to
move to stage 2, where we will set up the vCenter Server Appliance deployed in stage 1 (see Figure 18-45).
Step 11. In stage 2’s Introduction step, shown in Figure 18-46, read the instructions
and click Next.
Step 12. In stage 2’s Appliance configuration step, shown in Figure 18-47, choose the
appropriate options for time synchronization mode and SSH access. In this
example, we will let the vCenter server synchronize time with the ESXi host
and will keep the SSH access to the vCenter Server disabled.
Step 13. In stage 2’s SSO configuration step, shown in Figure 18-48, you can either
create a new SSO domain or join an existing SSO domain. In this example, we
will create an SSO domain named vsphere.local and configure the SSO user-
name and password to log in to the vCenter Server Appliance using SSO.
Step 14. In stage 2’s Configure CEIP step, shown in Figure 18-49, you can join VMware’s Customer Experience Improvement Program (CEIP) as per your preference.
Step 15. In stage 2’s Ready to complete step, shown in Figures 18-50, 18-51, and
18-52, review the configuration and click Finish to complete the installation.
This completes the vCenter Server Appliance installation.
Step 16. Once the installation is complete, you can log in to the vCenter Server using
the link https://vcenter.pod.local:443, as per the configuration we did in Step 9.
Click the LAUNCH VSPHERE CLIENT(HTML5) button and log in using the
SSO credentials configured in Step 13 (see Figures 18-53, 18-54, and 18-55).
Figure 18-54 Logging in to the vCenter Server Appliance Using SSO Credentials
Step 17. Once the vCenter Server is up and running, you can add hosts to the vCen-
ter Server to manage them. These hosts can be segmented by their location
(vCenter sub-unit known as data center) and group (data center sub-unit
known as cluster). In this example, we will add the same ESXi host where the
vCenter Server Appliance is hosted. Right-click vCenter vcenter.pod.local in
the left pane and choose New Datacenter (see Figure 18-56). Provide a name
for the data center and click OK (see Figure 18-57).
Step 18. Right-click the newly created data center and choose Add Host (see
Figure 18-58).
Step 19. In the Add Host Wizard’s Name and location step, shown in Figure 18-59,
provide the IP address of the host you want to add. In this example, we will
add the ESXi host configured in the previous example.
Step 20. In the Add Host Wizard’s Connection settings step, shown in Figure 18-60,
enter the root login credentials of the ESXi host and click Next.
Step 21. In the Add Host Wizard’s Host summary step, shown in Figure 18-61, review
the host details and click Next.
Step 22. In the Add Host Wizard’s Assign license step, shown in Figure 18-62, assign
the appropriate license. For this example, we will use the evaluation license.
Step 23. In the Add Host Wizard’s Lockdown mode step, shown in Figure 18-63, choose
the appropriate lockdown mode. Lockdown mode prevents remote users from
logging directly in to the host. For this example, we will keep it disabled.
Step 24. In the Add Host Wizard’s VM location step, shown in Figure 18-64, choose
the data center where you want the VM to be located and click Next.
Step 25. In the Add Host Wizard’s Ready to complete step, shown in Figures 18-65
and 18-66, review the information and click Finish. Verify that the ESXi
host has been added successfully and that the vCenter virtual machine shows
underneath the ESXi host.
Summary
This chapter discusses server virtualization components such as the virtual machine,
hypervisor, virtual switch, VMware vSphere product suite, and VMware ESXi as well as
vCenter Server Appliance installation, including the following points:
■ Servers are often underutilized, and virtualization provides better performance and
efficiency from the existing computing resources using hardware abstraction.
■ A virtual machine is a software computer that uses abstracted resources via virtual-
ization software and can run an operating system and applications just like a physical
computer.
■ A virtual machine consists of components such as vCPU, vRAM, vNICs and so on,
and it is typically stored on the physical host’s datastore/virtual disk as a set of files.
■ There are two types of hypervisor: Type 1 runs on bare-metal x86 hardware archi-
tecture, and Type 2 runs on an operating system as a hosted environment.
■ A virtual switch (vSwitch) is a software application that connects VMs with both vir-
tual and physical networks.
■ VMware vCenter Server is server management software that enables you to manage
the vSphere environment from a centralized location.
■ The vSphere vMotion feature enables the migration of live, running VMs from one
physical server to another with zero downtime and continuous service availability.
■ The vSphere High Availability feature can automatically restart a failed VM on alter-
native host servers to reduce application downtime.
■ The vSphere Fault Tolerance feature allows a protected VM, called the primary VM,
to survive the failure of a host with zero downtime.
References
“VMware Infrastructure Architecture Overview” (White Paper), https://www.vmware.
com/pdf/vi_architecture_wp.pdf
“VMware vSphere Documentation,” https://docs.vmware.com/en/VMware-vSphere/
index.html
Relevant Cisco Live sessions: http://www.ciscolive.com
Chapter 19
Using APIs
Each application provides one or more services. Depending on the logic of the service,
the application architecture, and the processes, the different components will have to
communicate with each other. The applications will have to connect to other applica-
tions to exchange data, and there will be requests for data and responses with the needed
information. Because the applications are different, the communication, interfaces, and
data formats need to be standardized.
The APIs specify these sets of requirements, defining how the applications can connect
to each other and exchange data.
This chapter covers application programming interfaces (APIs) and the different protocols, data formats, and methods they use. Programmatic communication with the network, compute, and storage devices requires the use of standard protocols, data formats, and access methods.
An API is a set of definitions, functions, and procedures that enable the communica-
tion between devices, applications, or services. Each API, as shown in Figure 19-1, must
define which functions or endpoints to call, which data to provide as parameters, which
data to expect as outputs, which data formats are supported by the endpoint for the data
encoding, and what authentication is required. An API exposes internal functions to
the outside world, allowing external applications to utilize functionality within the
application.
Operationally, when an API is used, the communication consists of calls sent from one
application to another. For these calls to be successful, the target application must under-
stand the calls and accept them as commands.
The two most used API types are the remote procedure call (RPC) API and the REST
API.
The RPC API is based on remote procedure calls sent by the client to an application (see
Figure 19-2). The call contains parameters and represents a remote function call, as it trig-
gers an action on the application. The application responds with a message, and then the
client continues the execution. With the RPC API, the following must be considered:
■ Latency and overhead: These aspects depend on the transport and the load of the
application.
■ Error handling: Errors are handled by the application and error messages are sent to
the client.
The RPC API is an approach to (or architecture for) building an API, not a protocol.
Because of this, many different protocols use it.
The REST API is another style of building APIs (see Figure 19-3). The REST API is based on HTTP and uses HTTP methods such as GET, POST, PUT, and DELETE as its operations.
With the REST API, the application transfers a representation of the state of the objects
defined in the call by the client. The objects are the resources in this client/server archi-
tecture and are identified by a URL/URI. There are six specific characteristics of the REST API architecture:
■ Client/server: The client and the server are separated and can evolve independently, communicating only through the uniform interface.
■ Stateless: No client context needs to be stored on the application side. The client request contains all the required information.
■ Cacheable: On the web, HTTP responses can be cached. This results in caching
being used to overcome issues such as latency. That is why the REST API responses
need to contain information about whether they are cacheable.
■ Layered: The architecture must support multiple layers of functions, such as load
balancers, firewalls, proxy and cache servers in a transparent manner for the client.
■ Uniform interface: The representation of the resources provided to the client by the
server must contain enough information for the client to be able to work with them.
However, it is not mandatory that the internal representation of the resources be the
same.
■ Code on demand: This is an optional characteristic that defines the ability of the
server to temporarily extend the execution of code to the client, such as running
JavaScript or Java applets.
Based on the constraints of the REST API architecture, the RESTful API leverages HTTP
to define operations over the exposed resources. Based on the representations of the
resources and on which resource an HTTP method is used, the result can be different. For
example, if the method GET is used on a collection resource such as /users, the response
will contain the URIs of all the users, but if the same method is used on a member
resource, such as the user “John” (/users/john), the response will contain a representation
of the user John or all the attributes that define this object.
■ POST: Requests the resource to process the representation sent. Used to modify a
resource.
■ PUT: Creates or updates a resource state based on the representation in the request.
■ HEAD: Like GET, this method requests the representation of a resource, but without
the data enclosed in the response body.
Additional methods are available, but they are not discussed in this chapter.
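To make the behavior of these methods concrete, the following minimal Python sketch uses the requests library against a hypothetical REST endpoint; api.example.com and the /users resource are placeholders for illustration only and are not part of any product discussed in this chapter.

import requests

BASE = "https://api.example.com"  # hypothetical REST endpoint used only for illustration

# GET on a collection resource returns references to (or representations of) all members.
users = requests.get(f"{BASE}/users", timeout=10)
print(users.status_code, users.json())

# GET on a member resource returns the representation of that single object.
john = requests.get(f"{BASE}/users/john", timeout=10)
print(john.status_code, john.json())

# PUT sends a representation of the desired state of the resource in the request body.
resp = requests.put(f"{BASE}/users/john", json={"title": "IT Administrator"}, timeout=10)
print(resp.status_code)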
Each HTTP response also carries a status code. The HTTP status codes are
grouped as follows:
■ Informational (1xx): Indicates the request has been received and the server is continuing to process it.
■ Success (2xx): Indicates a successful operation. Here are some of the most com-
monly used codes:
■ 201 (Created): The request has been received and the resource created.
■ 204 (No content): The request has been received and successfully processed, but
the response body is empty.
■ Redirection (3xx): The resource is either moved permanently and the requests will
be redirected to a different URI (301) or the resource is found under a different URI
temporarily (302).
■ Client error (4xx): Specifies an error condition because of an issue with the client
request. Here are some of the most commonly used codes:
■ 400 (Bad request): The request has invalid syntax, routing, or size and cannot be processed.
■ 403 (Forbidden): A valid request was received, but the server refused to process it due to insufficient permissions on the resource.
■ Server error (5xx): Error messages generated because of an issue on the server side.
Here are some of the most commonly used codes:
■ 500 (Internal Server Error): A generic error message indicating an issue occurred
on the server, and there is no specific explanation of the problem.
■ 501 (Not Implemented): The server does not support the requested function.
■ 503 (Service Unavailable): The service or the server is not available because of a
system crash, power failure, and so on.
The traditional implementation of APIs presented many challenges, as APIs were initially developed to handle communication between applications, not to automate the components of the data center. Translating the command-line interface (CLI) was very difficult, because its output is unstructured text intended to be consumed by humans, not by machines. This led to the adoption of a model-based approach for creating device APIs.
The data models specify how the different components of the data center devices are
described as objects and how configurations can be created and applied to these objects.
This allows the whole API framework to be built based on this model-driven approach
(see Figure 19-4). The device-specific data models, YANG for the network devices (such as the Cisco Nexus switches and Cisco ACI) and a management information tree (MIT) for the Cisco UCS, form the base layer of the API framework.
The next layer of the framework contains the transport protocols used for communica-
tion with devices. This depends on the implementation, but some examples are HTTP/S, SSH, and TLS. The transport protocols are also used to provide the authentication services
for access through the API.
The encoding layer defines the data formats to be used in the communication with the
device. The most commonly used are JSON, YAML, and XML.
The final layer contains the network configuration protocols, such as NETCONF,
RESTCONF, and gRPC.
The data model used with the Cisco network devices in the data center is the YANG data model (see Figure 19-5). YANG is a data-modeling language used to describe configuration and operational data, remote procedure calls, and notifications for network devices. It provides the definitions for the data sent over network configuration and management protocols such as NETCONF and RESTCONF. Because YANG is protocol independent, it can be converted to any encoding format, such as XML, JSON, or YAML.
YANG creates a hierarchical data model consisting of namespaces with modules contain-
ing the definitions and the configurations for the objects. Cisco sticks to the open source
and open standards approach, and that’s why the YANG implementation consists of
two major namespaces: the YANG OpenConfig and the YANG native. Under the YANG
OpenConfig are vendor-agnostic configuration and management models. This allows
Cisco to adopt standards-based configuration and management of the network devices
and components in the data center. Features specific to the Cisco network devices are under the YANG native models. This allows for the precise configuration and
management of advanced features and functionality.
Figure 19-5 shows the YANG model tree with the native and OpenConfig namespaces, including deviation modules such as cisco-nx-openconfig-bgp-policy-deviations.yang and cisco-nx-openconfig-system-deviations.yang.
When data is exchanged, it must be understandable by both sides of the exchange. This
means we will first need to agree on the rules of how the data will be structured, how it
will look, and what the constraints will be, before it can be transferred. In other words,
what are the syntax and grammatical rules? These rules are organized and standardized as
sets called data formats. That is why the data formats are part of the APIs. The most com-
mon are JavaScript Object Notation (JSON), eXtensible Markup Language (XML), and
the YAML Ain’t Markup Language (YAML).
Each of the data formats has different semantics and syntax. One will use spaces, and
others will use commas, quotation marks, and so on. However, the common thing
between them is the concept of an object. This is the element that will be manipulated
and described with multiple attributes. Usually, the attributes are defined by using
key:value pairs. The key is the name of the attribute and is positioned on the left side
of the pair. The value component defines the attribute state. Because key:value pairs are
used to describe objects, it is important to know the syntax for the different data formats
in order to notate them.
JSON is a data format for transmitting data between web services. It is simpler and more
compact compared to XML. JSON is faster for humans to code in, and because it is bet-
ter suited for object-oriented systems, it is widely used with scripting platforms.
JSON is platform independent and language independent. There are parsers and libraries
for JSON in many different programming languages.
JSON is also plaintext and human readable. The JSON text format is syntactically identical to the code for creating JavaScript objects. It uses a hierarchical structure with nested values. JSON can use arrays, and there are no end tags or reserved words.
As shown in the example in Figure 19-6, JSON uses curly braces to define a new object. Keys and string values appear in double quotes, and the key and value in each pair are separated by a colon.
JSON Example
{
  "user": {
    "name": "George",
    "email": "George@company.com",
    "location": "Alabama",
    "title": "IT Administrator"
  }
}
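As a quick illustration of how such a JSON document is consumed programmatically, the following short Python sketch uses the standard json module to parse and re-serialize the user object shown above:

import json

text = """
{
  "user": {
    "name": "George",
    "email": "George@company.com",
    "location": "Alabama",
    "title": "IT Administrator"
  }
}
"""

obj = json.loads(text)               # parse the JSON text into a Python dictionary
print(obj["user"]["name"])           # -> George

obj["user"]["title"] = "IT Manager"  # manipulate the object
print(json.dumps(obj, indent=2))     # serialize it back to JSON text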
Like JSON, XML is human readable. XML uses tags in the following format: <tag></tag>. The tags are used to define both the
objects and the key:value pairs. The tag can be considered the key in the pair, as the value
will be enclosed between the opening and the closing tags. As shown in the example in
Figure 19-7, the key is <name> and the value is George. The object being described is a
user, and between the opening and the closing tags for the user object are all the attri-
butes describing this object.
XML Example
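The figure content is not reproduced here; based on the description above, the XML representation of the same user object likely resembles the following reconstruction:

<user>
    <name>George</name>
    <email>George@company.com</email>
    <location>Alabama</location>
    <title>IT Administrator</title>
</user>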
YAML has a very minimalistic syntax, as shown in Figure 19-8. It was designed to be easier for humans to work with. It is becoming more and more popular because it is not only convenient for humans but also easy for programs to use.
The key:value pairs are separated by colons, but in the syntax of YAML, the use of
whitespace is very important. Whitespace is used for indentation levels, and it defines
the structure of the objects and the YAML file. For example, all the data inside an object
must have the same indentation level.
YAML Example
User:
  name: George
  email: George@company.com
  location: Alabama
  title: IT Administrator
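The following minimal Python sketch shows how such a YAML document can be loaded programmatically; it assumes the third-party PyYAML package is installed:

import yaml  # requires the third-party PyYAML package (pip install pyyaml)

document = """
User:
  name: George
  email: George@company.com
  location: Alabama
  title: IT Administrator
"""

data = yaml.safe_load(document)  # indentation defines the structure of the objects
print(data["User"]["title"])     # -> IT Administrator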
NETCONF is a network configuration protocol built on a layered architecture. At its base is the secure transport layer, which provides secure transport (typically SSH) for the communication of the NETCONF protocol.
The NETCONF protocol also defines and uses objects called datastores. A datastore is a
complete set of configurations needed by a network device to get from its initial default
configuration state to a required operational configuration state. Explained in a different way, each network device has a running configuration, which resides in memory and is lost during a reset or power failure. The
configuration used during the boot of the device and stored in the permanent memory is
called the startup configuration. With NETCONF, the datastores use a similar approach
to the device configuration. There’s the “running” datastore, which holds the complete
configuration currently running on the device, just like the running configuration you are
used to working with on network devices. The second type of datastore is the startup,
which holds the complete configuration used during the boot of the device. An addi-
tional datastore is called the candidate datastore. It creates an environment in which con-
figuration operations are executed and configuration changes are made without affecting
the running configuration. After a commit operation, the candidate configuration will be
pushed to the running configuration of the device. There are also other datastores; the
datastores that exist on a device are reported as capabilities by the device during the start
of the NETCONF sessions.
The NETCONF protocol operations allow for retrieving, copying, configuring, and delet-
ing datastores and for exchanging state and operational information. The operations are
encoded in XML and use the YANG data models.
During the message exchanges, RPCs with three tags are used: <rpc>, <rpc-reply>, and <notification>, depending on the type of the message.
The NETCONF protocol defines a set of operations to support the basic CRUD (create, read, update, delete) approach and the use of datastores with the YANG data models. For example, the <get> operation retrieves the running configuration and the device state data.
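As an illustration only, the following Python sketch uses the open source ncclient library to open a NETCONF session and retrieve the running datastore; the device address and credentials are placeholders, and the device must have NETCONF enabled.

from ncclient import manager  # requires the ncclient package

# Placeholder device details; NETCONF listens on port 830 by default.
with manager.connect(
    host="192.0.2.10",
    port=830,
    username="admin",
    password="password",
    hostkey_verify=False,
) as m:
    reply = m.get_config(source="running")  # <get-config> against the running datastore
    print(reply.xml)                        # the reply is XML encoded per the YANG models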
RESTCONF provides a REST-like interface to the same YANG data models and is characterized by the following:
■ HTTP/S transport
■ HTTP methods
■ Subset of NETCONF operations
The supported operations are based on HTTP and include GET, POST, PUT, PATCH, and
DELETE.
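A RESTCONF call can therefore be made with any HTTP client. The following Python sketch is based on the generic RFC 8040 URL pattern and the standard ietf-interfaces YANG model; it is illustrative only, and the device address and credentials are placeholders.

import requests

# Placeholder RESTCONF-capable device; URL pattern and media type follow RFC 8040.
url = "https://192.0.2.10/restconf/data/ietf-interfaces:interfaces"
headers = {"Accept": "application/yang-data+json"}

resp = requests.get(url, auth=("admin", "password"), headers=headers, verify=False)
print(resp.status_code)   # the HTTP status code indicates the result of the operation
print(resp.json())        # interface configuration and state encoded in JSON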
The principles and concepts behind automation and orchestration in Cisco data cen-
ter devices throughout the lifecycle are covered in Chapter 20, “Automating the Data
Center.” Here you will get acquainted with the Cisco NX-OS programmability options.
The Cisco NX-OS software stack is built based on the Cisco Open NX-OS model-driven
programmability (MDP) architecture (see Figure 19-9). It is an object-oriented software
framework aimed at developing management systems. At the base of the MDP archi-
tecture is the object model, which is used to represent the components, configurations,
operational states, and features of the devices.
In the MDP architecture, the goal is to abstract the physical device and to represent its
components using a data model, such as the YANG data model. This allows the device’s
components and features to be accessed and configured through APIs, thus achieving
automation and supporting the needed levels of orchestration. The component from the
Cisco NX-OS MDP responsible for looking into the hardware and software features,
based on the YANG OpenConfig and native models to create this representation, called
a management information tree (MIT), is the data management engine (DME). The DME
creates the hierarchical namespaces populated with objects representing the underlying
components. It is part of the APIs, as it exposes the MIT to the external clients, applica-
tions, and administrator automation tools in different ways.
As a result, Cisco NX-OS offers the following automation and programmability frame-
work:
■ NX-API CLI: Provides support for embedding Cisco NX-OS CLI commands for
execution on the switch. The transport protocol is HTTP/S, and the commands need
to be structured using the JSON or XML data format.
■ Cisco NX-OS Software Development Kit (NX-SDK): A C++ abstraction and plug-
in library layer that streamlines access to infrastructure for automation and custom
application creation. The programming languages supported with the NX-SDK are
C++, Python, and Go.
The NX-API REST and CLI are backed by a built-in web server, called NGINX, which
is the endpoint for HTTP communication. It responds to HTTP requests and is used by
both the NX-API CLI and the NX-API REST.
The NX-API CLI extends the Cisco NX-OS CLI outside of the device and allows admin-
istrators to use a programmatic approach for configuration. The API communicates
through the NGINX web server using the HTTP/S protocols as a transport. This interface does not make a big difference when it comes to optimizing the configuration and achieving better automation, as you still have to work with a single device: you use the CLI commands, encode them in JSON or XML, and then push them to the device.
However, this API allows administrators to examine and learn the MIT, the different pro-
grammatic operations, integrations, and so on.
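As an illustration, the following Python sketch sends a show command to the NX-API CLI endpoint using the JSON message format. The switch address and credentials are placeholders, NX-API must be enabled on the switch, and the exact payload fields can vary by NX-OS release, so treat this as a sketch rather than a definitive implementation.

import requests

# Placeholder switch address and credentials; NX-API must be enabled on the switch.
url = "https://192.0.2.20/ins"
payload = {
    "ins_api": {
        "version": "1.0",
        "type": "cli_show",
        "chunk": "0",
        "sid": "1",
        "input": "show version",
        "output_format": "json",
    }
}

resp = requests.post(url, json=payload, auth=("admin", "password"), verify=False)
print(resp.json())   # the command output comes back structured in JSON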
The NX-API REST is a RESTful programmatic interface. The configuration and state data
for the objects is stored in the MIT. The NX-API REST has access to the MIT. When the
configuration is pushed to the MIT, the DME validates and rejects any incorrect values.
It is said that the DME operates in forgiving mode during the validation process, because
if there are values missing for the attributes of objects in the MIT, instead of rejecting the
whole configuration, the DME will use default values.
The Cisco NX-OS devices offer a unique option to run Python scripts directly on them.
This is achieved by embedding a native Python execution engine that supports the use of
a dynamic Python interpreter on the device. This is known as onboard Python. It can be
used to automate CLI commands and generate syslogs for event-based activities. It can
be used together with the Embedded Event Manager (EEM) and the scheduler. To start it,
simply type python in the CLI and press Enter.
Cisco NX-OS devices come with a preinstalled Python module called “cisco.” It can be used with three core methods (a usage sketch follows the list):
■ cli(): Returns the raw output of the CLI commands, including control and special
characters
■ clip(): Prints the output of the CLI commands directly to stdout and returns nothing
to Python
■ clid(): Returns a dictionary of attribute key:value pairs for CLI commands that sup-
port JSON
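The following is a minimal usage sketch for an onboard Python session; the exact module that exposes these helpers (cli or cisco) and their return types can vary between NX-OS releases, so the import shown here is an assumption.

# Run from the onboard Python interpreter on the switch (type "python" in the CLI).
from cli import cli, clid   # on some releases these helpers are exposed via the cisco module

raw = cli("show version")                  # raw CLI text output
print(raw)

structured = clid("show interface brief")  # structured output for commands that support JSON
print(structured)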
Cisco NX-OS also includes the Embedded Event Manager (EEM), whose policies combine event statements, which define what to watch for, with action statements:
■ Action statement: The actions to be taken when an event occurs. Actions include sending an email and disabling an interface.
This structure allows the EEM to be used to monitor and react when specific events
occur according to the actions specified. Additionally, it can be used with a scheduler
and integrated with the onboard Python interpreter.
Exploring the Cisco UCS Manager XML API Management Information Tree
The Cisco UCS Manager XML API exposes the representation of the underlying physical and logical resources. These resources are abstracted and represented as objects, with the needed attributes to describe them, by the DME in an MIT. In Figure 19-10, we can see a similar approach to the Cisco NX-OS APIs.
Each of the objects in the hierarchical structure of the MIT is described with attributes
that represent the configuration and the state.
The managed objects in the MIT have unique distinguished names (DNs), which describe
the object and its location in the tree.
Figure 19-10 illustrates how a DN is built by chaining the relative names (RNs) of the objects from the root of the tree, such as chassis-<id> for a chassis or ls-<name> for a service profile.
The DME, a user-level process that runs on the Fabric Interconnects, centrally stores and manages the information model. When a user initiates an administrative change
to a Cisco UCS component (for example, applying a service profile to a server), the DME
first applies that change to the information model and then applies the change to the
actual managed endpoint.
The Cisco UCS Manager XML API supports operations on a single object or an object
hierarchy. An API call can initiate changes to attributes of one or more objects such as
physical hardware, policies, and other configurable components. The API operates in
forgiving mode. Missing attributes are replaced with applicable default values that are
maintained in the internal DME.
To examine the MIT, you can use the built-in Cisco UCSM and CIMC-managed object
browser. It’s called Visore (see Figure 19-11) and can be accessed at the following URL:
http://<UCSM/CIMC-IP-Address>/visore.html.
Visore uses the Cisco UCS XML API to query the DME for information on the man-
aged objects. For each object, you can see all its information, including child and parent
objects, to which class it belongs, and more. A pink background color is used for the
fields that display the managed object instances and the class name. The property names
have a green background, and the values are in yellow.
There is also a reference to the Cisco UCS Manager information model on the Cisco
DevNet site, which can be accessed and researched for free at the following URL:
https://developer.cisco.com/site/ucs-mim-ref-api-picker/
Cisco UCS PowerTool suite is a set of PowerShell modules that helps automate all aspects
of Cisco UCS Manager. It also helps automate server, network, storage, and hypervisor
management. Cisco UCS PowerTool suite enables easy integration with existing IT man-
agement processes and tools. The PowerTool cmdlets work on the Cisco UCS MIT. The
cmdlets can be used to execute read, create, modify, and delete operations on all the
UCS managed objects (MOs) in the MIT. Cisco UCS PowerTool 2.0 also provides support
for Microsoft’s Desired State Configuration (DSC).
An additional functionality, which helps administrators understand the MIT and how
to use it with the Cisco UCS PowerTool, is the ability to convert actions in the UCS
Manager GUI into DSC configuration code. This functionality is provided by the
ConvertTo-UcsDSCConfig cmdlet.
The Cisco UCS Python SDK is a Python module that, similar to Cisco UCS PowerTool, helps automate all aspects of Cisco UCS management, including server, network, storage, and hypervisor management. Like the PowerTool cmdlets, it operates on the Cisco UCS Manager MIT, where it creates, modifies, and deletes managed objects.
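The following minimal sketch shows the general shape of a Cisco UCS Python SDK (ucsmsdk) session; the UCS Manager address and credentials are placeholders, and the attributes printed are only examples of what a managed object exposes.

from ucsmsdk.ucshandle import UcsHandle  # requires the ucsmsdk package

# Placeholder UCS Manager address and credentials.
handle = UcsHandle("ucsm.example.com", "admin", "password")
handle.login()

# Query managed objects in the MIT by class and read some of their attributes.
blades = handle.query_classid("ComputeBlade")
for blade in blades:
    print(blade.dn, blade.serial)

handle.logout()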
Summary
This chapter describes APIs, data formats, and the programmability options for the Cisco data center devices. In this chapter you learned about the following:
■ The RPC API is an architecture based on remote procedure calls sent by the client to
an application.
■ The REST API is an architecture based on HTTP and uses HTTP methods such as GET, POST, PUT, and DELETE for its operations.
■ HTTP uses methods to define the operations and status codes for the results.
■ Data models specify how the different components of the data center devices are
described as objects and how the configurations can be created and applied to these
objects.
■ The data model used with the Cisco network devices in the data center is the YANG
data model.
■ The data formats for encoding are JSON, XML, and YAML.
■ Cisco NX-OS offers NX-API REST, NX-API CLI, NX-SDK, NX-Toolkit, onboard
Python, and EEM.
■ The Cisco UCSM APIs support the model-driven framework based on an MIT.
■ Cisco UCS PowerTool is a set of PowerShell modules to help automate the Cisco
UCSM.
■ The Cisco UCS Python SDK is a Python module that helps automate the Cisco
UCSM.
Reference
The Cisco Data Center Dev Center: https://developer.cisco.com/site/data-center/
Chapter 20
Automating the Data Center
The data center is home to the applications that provide needed services. There are
multiple different components, starting with the physical characteristics of the premises,
such as the power supply and cooling, going through the infrastructure components, such
as the compute, network, and storage building blocks, and moving up through the appli-
cation stack to the operating systems, virtualization, microservices, and everything else
that’s needed to create the redundant environment for our applications to run in and offer
services. Multiple different professionals are involved in the designing and creation of a
data center, and it continues to evolve and change over time. It is obvious that building
and operating a data center is a complex effort. That is why one of the most important
goals has always been to optimize and simplify this effort. The way this can happen when
it comes to the data center infrastructure and the whole application stack is through auto-
mation. In Chapter 19, “Using APIs,” we discussed APIs and the model-driven API frame-
work with the data formats and network configuration protocols such as NETCONF and
RESTCONF. These tools and standards are used in the automation. In this chapter, we
discuss the need for automation, what automation and orchestration are, Infrastructure
as Code, and some of the toolsets supported with the Cisco data center solutions. The
focus will be on the data center infrastructure components—the network, storage, and
compute.
Automation Basics
The data center infrastructure components consist of multiple separate devices that need
to be interconnected and configured for initial deployment. Afterward, they need full
configuration in order to achieve the needed levels of integration. Then, when we start
to use them—or, in other words, to operate them—we have to monitor and troubleshoot
them in order to maintain a stable and redundant environment for our applications. This
effort requires multiple different administrators to plan, configure, and monitor multiple
separate devices. Each device needs to be accessed to perform the initial configuration and then needs to be fully configured to make sure it operates as designed.
Troubleshooting is another huge challenge in the data center. We all know about the big
support teams that are maintained by the vendors, their partners, and customers. This
is because when it comes to troubleshooting, a simple workflow is followed—define
the problem, find out the root cause for the problem, and apply a fix. Although this is a
simplified view of the process, which involves many more steps and much more effort,
we will use it to demonstrate the challenges of troubleshooting in a modern data center.
Now, to define the problem, engineers need to take multiple steps to isolate which device
(or devices) is affected. Is it a feature that runs on a single device? Is it a global running
process? For this to happen, even with the use of complex monitoring systems, multiple
devices still need to be accessed separately, endless logs need to be examined, and con-
figurations need to be checked. And the interesting thing is that in most cases, the issue
is a result of human intervention. A human-induced error results in a wrong configura-
tion, broken process, or disruption in the infrastructure that might even affect the avail-
ability of the services. There are two huge challenges when it comes to troubleshooting—
how to make sure the correct configuration is applied to the correct devices at scale, and
how to be proactive in preventing failures due to such errors.
These challenges accompany the whole operations lifecycle of the data center solutions.
The lifecycle, shown in Figure 20-1, is usually represented by days:
■ Day 0: Onboarding of the device in the infrastructure, which includes the very initial
basic setup.
■ Day 1: Configuration and operation. This is the stage at which the full configuration
is applied, depending on the solution the device is part of. The device
becomes fully operational as a component of the infrastructure.
■ Day 2: Monitoring and troubleshooting. The device is operational and is being moni-
tored to identify and fix any issues.
■ Day N: Optimization and upgrade. This stage includes all the activities related to the
optimization of the configuration, the solution, and the maintenance, such as apply-
ing patches and software upgrades.
Going through all the stages, from Day 0 to Day N, requires the administrators to access
each device separately, perform the initial setup, and make sure that each device has the
basic needed connectivity. Then, on each device separately, they need to create the full
configuration, test it, and then operate, monitor, and troubleshoot, again on a per-device
basis.
The solution to these challenges is automation, which is the idea of using a wide range of
technologies and methods to minimize human intervention. With automation, the goal is
to be able to work with multiple devices and configurations, as we want to have predict-
able results and avoid any issues created by human error. We want to have faster results,
and the data centers need to be agile, as the services require changes to be implemented
quickly in the data center.
For example, an administrator can use the device APIs to create the needed configuration in a supported programming language, test it, and then send it for execution on the device through the API. This allows the administrator to send the configuration to multiple devices simultaneously. And this is just one
example. Another use case is when another application uses the APIs to connect to other
devices to make the needed configurations and changes.
The automation goes hand-in-hand with another concept in the data center—orchestra-
tion. Orchestration is the ability, through automation, to manage and configure entire
infrastructures or data centers, up to the level of the application. To do so, the orchestra-
tion uses workflows, which specify the exact sequence of tasks to be executed on the
devices with the needed actions to be performed in order to manage the resources of
the data center. The tasks in the workflows are automated pieces of actions, as each is
executed on a specific device. An example would be to configure interfaces on a switch
or to create a virtual machine. Then, the tasks are put in the correct order of execution,
depending on which component needs to exist before the creation of another. This is how
the workflows are created and then automatically executed.
The automation and orchestration can be implemented in a data center at different levels.
We can have some limited automation, affecting some components of the infrastructure
and certain functionality, or we can have automation and orchestration that are fully
implemented, allowing for provisioning of resources up to the level of deploying applica-
tions. This last scenario represents the level of automation required from a data center that will offer cloud services. The adoption of automation and orchestration affects each of the operations lifecycle stages. Depending on the technologies in use, we can have automation and orchestration that automates the initial provisioning of the devices on Day 0 or orchestrates the full configuration and creation of virtual resources on Day 1. The automation can also be a component of Day 2, to facilitate proactive monitoring and troubleshooting approaches, or of the maintenance and optimization of Day N.
One example of the benefits of using automation and orchestration in a data center com-
ponent is software-defined networking (SDN). This approach uses the idea of separating
the control and management plane processes to a dedicated SDN controller and leaving
to the physical switches only the task of maintaining the data plane (that is, to switch
the Ethernet frames). The administrators will connect only to the SDN controller, which
is usually an application that is capable of orchestrating, monitoring, and managing the
switches in an automated manner, and this is where all the configurations will be created.
Then, these configurations will be pushed to the correct devices. The control plane pro-
cesses, such as routing protocols, will be running on the SDN controller, and the results of their operation will be pushed only to the devices that require this information.
Therefore, the resources of the data plane devices are utilized better, as they do not have
to run such processes any more, and only the devices that need them receive the appro-
priate configuration. At the same time, the SDN controller will collect streaming telem-
etry and other monitoring information, analyze it, and present it to the administrators.
Depending on the SDN solution, as there are a few on the market, the system can take
proactive measures to mitigate certain issues, or it can limit itself only to notifications for
certain events.
One of the best examples of a solution for the data center networking component is the Cisco Application Centric Infrastructure (ACI). You already know about ACI from the earlier chapters of this book. From the perspective of automation and orchestration, which are the main principles at the core of the SDN concept, Cisco ACI goes several steps further, as it utilizes a new approach to networking that starts from the communication needs of the application.
Cisco POAP
One technology for automating the initial provisioning during Day 0 is Cisco PowerOn
Auto Provisioning (POAP). It is supported in Cisco NX-OS and allows the processes
of configuring and upgrading devices to be automated. This significantly reduces the
manual tasks related to the deployment of networking and storage communication infra-
structures in the data center.
The POAP feature is enabled by default, and for its successful operation it needs to have
the following components in the network (Figure 20-2):
■ DHCP server: Used to assign IP addressing information (IP address, gateway, DNS)
■ TFTP server: Hosts the configuration script file that the switch downloads and executes
■ Server with the configuration files and software images: Stores the files that the configuration script downloads to the switch
The POAP process starts with booting up the switch. If the switch has a configuration,
it is loaded, and then the switch goes into normal operation. However, if there is no con-
figuration, by default, the switch will enter POAP mode. Then the following will occur:
■ DHCP discovery: The switch will send a DHCP request through the management
interface.
■ DHCP response: The DHCP server will reply with the needed IP addressing con-
figuration and with information about the IP address of the TFTP server and the
configuration script filename.
■ TFTP server communication: The switch will connect to the TFTP server and down-
load the configuration script file. It will execute the file.
■ Software images (kickstart and system image files) download: If the configuration
script file specifies that the switch has to upgrade its software image files, it will
download the images, install them, and reboot the switch.
■ Normal operation: At this point, the POAP process has finished. The switch runs the
required version of the software, has the needed configuration applied, and is func-
tioning as expected.
The POAP configuration script file is developed in Python, and Cisco provides a
reference one that can be modified and reused. The configuration script file contains
information such as the following:
■ A procedure to get the configuration file and copy it to the startup configuration file
The reference POAP configuration script file for the Cisco Nexus 9000 switches can be
found at https://github.com/datacenter/nexus9000/blob/master/nx-os/poap/poap.py.
Cisco POAP is a great example of automation and orchestration that fits in Day 0 and
can be used to perform the provisioning of the devices in the infrastructure and to auto-
mate the process of upgrading the operating system.
There are two main components to the scheduler (see Figure 20-3):
■ Job: This is the task that needs to be performed. There can be multiple jobs.
■ Schedule: This defines when and how often the assigned jobs are executed.
When you’re working with the scheduler, it is important to keep in mind that if a fea-
ture you intend to use as a job requires a specific license, you have to check whether the
license is present on the switch. Otherwise, if the scheduler attempts to execute the job
and the license is not present, the job will fail. The other limitation is that if the command
to be executed is interactive, the job will also fail.
For job creation, you use the standard command from the CLI. You also have the flex-
ibility to use variables. Some of the variables are predefined in the system, such as
TIMESTAMP, but you can also define a variable with the command cli var name.
The scheduler is a feature for automating tasks on a per-device basis. It needs to be con-
figured and executed on each device.
Cisco EEM
The Cisco Embedded Event Manager (EEM) is one extremely flexible tool for automating
the behavior of a Cisco NX-OS device. It allows you to configure events that will trigger
one or more actions.
■ Event: What will trigger the policy? Various events can start the execution of the
EEM policy, including the following:
■ CLI: When a specific command is executed in the CLI, the policy is triggered.
■ Action: What will happen when the event that triggers the policy occurs. This can
be one or more actions, including the following:
■ Applet: This is the configuration construct of the EEM that links events with
actions.
The applet is used to create an EEM policy in the CLI. There is also the option to create a
policy with a script in virtual shell and then copy it to the device and activate it.
Infrastructure as Code
With the growing need for more agility in the data center came the idea of changing
the way we look at the configuration of the infrastructure by looking at it as code. By
creating definitions of the desired infrastructure and then applying them to the needed
devices and components, we can create the needed environment in our data centers much
faster. Infrastructure as Code (IaC) is this process, and it uses machine-readable defini-
tions to orchestrate the infrastructure.
One of the best examples of why such an approach is needed is to look at the flow of
a DevOps workflow. In a very simplified way, three major teams are involved: the
developers, the testers, and the operations team. The developers have the task of devel-
oping an application. For this purpose, they set up an environment in the data center
infrastructure using some servers running in the needed software environment, with the
needed security, network, and storage connectivity. After the application is developed in
this environment, it needs to be tested. However, in order for the results from the testing
to be valid and to find any real issues, the application needs to be run in an environment
that is the same as the development environment. The third stage is when the application
needs to be deployed into production. The production environment also needs to be the
same as the development and testing environments in order to be sure the application will
behave as planned during the development and testing phases. An operations team is usu-
ally responsible for setting these environments, and they need to be able to quickly repli-
cate the same environment—for development, then testing, and finally production—and
they might have to be able to scale it. That’s why defining the state and configuration of
the environment as a machine code makes the process easier and faster and helps reduce
the risk of human-induced errors. Also, it guarantees that the environment will be the
same. This is guaranteed by a characteristic of the IaC called idempotence, which means
that certain mathematical or computer operations can be applied multiple times with the
same result. Put in a different way, the idempotence of the IaC approach guarantees that
every time the same configuration is applied, the result will be the same.
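The following Python sketch, which is not tied to any specific IaC tool, illustrates the idea of idempotence: the function describes a desired state and changes the tracked configuration only when it deviates from that state, so repeated runs produce the same result.

def ensure_vlan(device_state: dict, vlan_id: int, name: str) -> dict:
    """Apply a desired VLAN definition idempotently: running this once or many
    times with the same inputs leaves the tracked state identical."""
    vlans = device_state.setdefault("vlans", {})
    if vlans.get(vlan_id) != name:   # change only what deviates from the desired state
        vlans[vlan_id] = name
    return device_state

state = {}
ensure_vlan(state, 10, "web")
ensure_vlan(state, 10, "web")        # a repeated run causes no further change
print(state)                         # {'vlans': {10: 'web'}}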
There are two main approaches to defining the infrastructure as code:
■ Declarative: This approach focuses on what the desired configuration state is, leaving it to the tool to determine the steps needed to reach that state.
■ Imperative: This is a procedural approach that focuses on how to achieve the needed configuration state by defining the specific commands needed to achieve the desired configuration.
Examples of IaC tools include Chef, Puppet, Ansible, Terraform, and PowerShell DSC.
We will take a look at some of these tools that are supported by Cisco NX-OS.
Note Use of the terms “master” and “slave” is ONLY in association with the official
terminology used in industry specifications and standards, and in no way diminishes
Pearson’s commitment to promoting diversity, equity, and inclusion, and challenging,
countering and/or combating bias and stereotyping in the global population of the learners
we serve.
Cisco NX-OS supports integration with Puppet, an automation IaC tool developed by
Puppet, Inc., in 2005. The main product is Puppet Enterprise. There is also open source
Puppet software, which is convenient for small infrastructures consisting of several devic-
es. For medium-to-big infrastructures it’s recommended to use Puppet Enterprise. Cisco
NX-OS has a native Puppet agent that can communicate with the Puppet master server.
Puppet uses a declarative approach. The Puppet Domain Specific Language (DSL) is used
to create manifests, which are Puppet programs where the states that need to be achieved
by the devices are defined. As with any of the declarative IaC tools, the specific com-
mands to create the configuration to achieve the desired state are not used.
From an operational perspective, there are two components—the Puppet master and the
Puppet agent. The Puppet agent is installed on the end device. It collects information
for the device and communicates it to the Puppet master. Based on this information, the
Puppet master will have the knowledge of the IP address and hardware-specific informa-
tion as well as the supported device-specific commands. After a manifest is created, and
based on the information from the agent, Puppet will determine which commands are
needed for the device to get to the desired state, as described in the manifest.
The Puppet workflow then proceeds as follows:
■ A graph is built with the resources and interdependencies based on the information
from all the agents in the infrastructure. This allows the Puppet master to decide on
the order of the execution of the manifests.
■ The appropriate program is sent to each of the agents, and the state of the device is
configured based on the instructions.
■ The agent will send the master a report of the changes and any errors that might have
occurred.
In the Cisco data center infrastructure offerings, Puppet is supported on Cisco Nexus
9000, 7000, 5000, 3000 switches and the Cisco UCS. The appropriate Puppet agents need
to be installed. For the Nexus switches, this is done in the bash or guest shell. For the
Cisco UCS, the Puppet modules are built with Ruby and the Cisco UCS Python SDK.
Ansible uses a declarative approach by defining the desired state of the managed device.
The tasks are organized in playbooks, which are executed against the managed nodes.
When it comes to the nodes, the control node(s) have the task of managing the target
nodes. The control nodes are only supported on Linux at this time. The managed nodes,
or the target nodes, are described in inventory files in order for the control node to know
how to connect. One requirement for the managed nodes is to have Python installed. The
control nodes use a push model, which means they initiate the communication with the
target nodes.
Some of the main Ansible constructs are the following:
■ Inventory files: Define the target nodes, which can be organized hierarchically using groups.
■ Templates: Generate configurations using the information from the inventory files.
This helps with generating faster configuration for larger environments.
■ Roles: Allow specific, commonly used tasks with properties to be organized into a
role and then reused.
There is support for Ansible in Cisco NX-OS, which makes it a very convenient tool for
network automation, configuration management, orchestration, and deployment for the
Cisco Nexus switches and the Cisco UCS.
Red Hat Ansible Automation Platform, with the automation controller (formerly Ansible
Tower), allows the IT and DevOps teams to define, operate, manage, and control access
to the automation tasks at a centralized place. For up to 10 users, Ansible is free to use
and test, and it offers a web UI and REST API.
The configuration files are created using a proprietary syntax called HashiCorp
Configuration Language (HCL).
The Terraform configuration file works with resources as the fundamental construct. The
resources are organized in blocks. A resource block describes an infrastructure object,
along with the needed characteristics and the intent.
The target infrastructure is defined as providers, which contain the target IP address,
URL, credentials, and so on. HashiCorp maintains and continues to grow a huge list of
official providers; there is support for such external resources as AWS, Microsoft Azure,
Google Cloud Platform, OpenStack, VMware vSphere, and others.
Cisco Systems is an official provider for HashiCorp, and the Terraform software can
be used for automation and integration of the Cisco Data Center, Cloud, Security, and Enterprise products.
The only thing needed for a Python script to be executed is a Python interpreter. It is
installed natively on Cisco NX-OS. In the recent past, there was a requirement for the
Cisco Nexus switches to use Python v2.7, but this limitation is no longer valid. Cisco
NX-OS supports all the latest Python releases. In addition to the Python interpreter in
Cisco NX-OS, for the Cisco UCS Manager there is the Cisco UCS Python software devel-
opment kit (SDK), which provides support for the automation of all aspects of the Cisco
UCS.
Support for Python on the Cisco data center devices allows for the integration of those devices and solutions with automation and orchestration systems. Because of Python's open source nature, there is enormous support for extensions and ease of use.
There might be other questions that need answers as well; it all depends on the specific
situation, infrastructure, and capabilities. But in any situation, to be able to make the cor-
rect choice for your needs, you need to know the capabilities of your devices and what
they support.
The Cisco data center devices, such as the Cisco Nexus and MDS switches and the Cisco
UCS, run Cisco NX-OS. It contains open source software (OSS) and commercial tech-
nologies that provide automation, orchestration, programmability, monitoring, and com-
pliance support. Additionally, there are tools and SDKs to extend the support for automa-
tion, programmability, orchestration, proactive monitoring, and troubleshooting.
Table 20-1 summarizes the types of support for automation and orchestration during the
various operations stages.
You will need to have a deeper knowledge and understanding of the different automation
tools and platforms. This chapter is only intended to give you a glimpse and a starting
point into the world of automation and is not meant to be an extensive guide. That’s why
it is strongly recommended that you continue your education in the field of data center
automation. The official Cisco training material and books from Cisco Press can help you
in your journey.
Summary
This chapter describes the basics of data center automation and orchestration. In this
chapter you learned about the following:
■ Day 1 is the full configuration, and the device goes into normal operation.
■ Automation and orchestration can be used in each of the operation lifecycle stages.
■ Cisco POAP is a tool that can be used for provisioning devices in the infrastructure
on Day 0.
■ The Cisco NX-OS Scheduler is a per-device automation tool. It allows us to schedule
the execution of jobs without the need for human intervention.
■ Cisco EEM allows us to manage the behavior of a device when specific events occur.
■ To choose the correct toolset, you must know the capabilities of the different auto-
mation tools and also what your devices support.
Reference
The Cisco Data Center Dev Center: https://developer.cisco.com/site/data-center/
Chapter 21
Cloud Computing
The book so far has covered the different components of the data center, how they integrate
and interact with each other, and how the utilization of the resources is maximized
through abstraction and virtualization, with the sole purpose of creating the environment
in which the applications will run. This whole stack—starting with the physical infrastruc-
ture, building through the operating systems, the required middleware, up to the applica-
tions—is the de facto data center.
It is particularly important to understand the driving mechanisms for the evolution of the
data center. This will clarify the current stage of data center development—cloud com-
puting, or “the cloud.”
This chapter discusses cloud computing, its definitions and components, the different
deployment models, and the services provided. As there are a lot of cloud service provid-
ers and vendors of different cloud-oriented products and technologies, the cloud termi-
nology is often diluted and misused based on marketing needs and stories. That’s why the
cloud discussion here will be based on the definition of cloud by the National Institute
of Standards and Technology (NIST). We believe that this approach will allow readers to
understand and build a solid foundation of cloud computing concepts.
The ancient Greek philosopher Heraclitus had the following to say: “Everything changes
and nothing remains still.” Change is the only constant in human life. And, as the world
changes, so do the requirements for companies to develop their business, for organiza-
tions to continue their operations, and so on. This leads to the services offered by differ-
ent entities that address the need to continuously change and evolve, stay up to date, and
satisfy the needs of businesses or organizations.
So how does this change affect data centers, and what is the link to cloud computing?
The services are provided by applications. The applications run on top of servers, virtu-
alized or bare metal, which require operating systems and additional software to create
the environment for the applications. The applications have to be accessible by the end
users and administrators, and at the same time the applications have to be capable of
communicating with the databases, which are other applications running on servers, with
operating systems, and responsible for the data being organized and stored within a spe-
cific structure. Therefore, there is the need for communication, which is provided by the
networks in the data centers. The Cisco ACI is one such example, and it’s covered in
Chapter 8, “Describing Cisco ACI,” and Chapter 9, “Operating ACI,” of this book. All the
data needs to be stored somewhere, and this is where the storage infrastructures and the
storage systems are involved. Also, all these components need to be secured and moni-
tored because they are the components that build the data center. Said in a different way,
these components, working together, provide the needed computing resources for the
applications to run.
The aforementioned leads us to the conclusion that what will change, and what is chang-
ing, is how the computing resources provided by the data centers are utilized, managed,
and consumed! And this is the next step in the evolution of the data centers, and it is
known as “cloud computing.” Cloud computing is an approach in which the computing
resources are utilized in a way where their consumption can be measured, automated,
accessible, and managed.
One especially important clarification needs to be made: what are the “computing
resources”? The cloud resources are not only the resources of the server or virtual
machine required to run the applications. They also include the resources we need for
communication, as the network and security devices, whether physical or virtual, also use
CPUs, memory, operating systems, and so on. The storage infrastructures and systems
also need CPUs, memory, and operating systems. Therefore, the term “computing
resources,” from the perspective of defining and describing cloud computing, includes
any and all hardware and software resources in the data center.
Three important documents were created by the Computer Security Division at the
Information Technology Laboratory at NIST. These documents provided the first
definition of "cloud computing" and its components:

■ NIST Special Publication 800-144, Guidelines on Security and Privacy in Public Cloud Computing

■ NIST Special Publication 800-145, The NIST Definition of Cloud Computing

■ NIST Special Publication 800-146, Cloud Computing Synopsis and Recommendations

NIST Special Publication 800-145 defines cloud computing as follows: “Cloud computing
is a model for enabling ubiquitous, convenient, on-demand network access to a shared
pool of configurable computing resources (e.g., networks, servers, storage, applications,
and services) that can be rapidly provisioned and released with minimal management
effort or service provider interaction.”
This definition specifies a flexible, highly automated approach to how the computing
resources of a data center, or multiple data centers, can be consumed and rented to exter-
nal or internal users. The five essential characteristics describe in detail the requirements
for such an environment. Based on that, the consumer of the resources does not need to be
concerned with where exactly the resources are or how they are managed. That is the job
of the cloud service provider, the owner of the data centers. The cloud service provider is
also responsible for building the infrastructure with the needed redundancy, reliability, and
scalability. The required automation and orchestration need to be in place so that the
computing resources can be provisioned and made available transparently to the end user.
Nowadays, users are highly mobile. That is why these resources, which are called "cloud
services," must be accessible through network connectivity. Access through the Internet
or through protected lines is essential, as the user has to be able to connect from anywhere
and on any device. The computing resources offered span the whole application stack—from
compute in the form of virtual machines, or even dedicated physical servers, through
platforms, up to application resources that run in the cloud.
The cloud approach offers advantages for both the cloud users and the cloud service
providers.
For the end user, some of the benefits include the following:
■ The end user does not need to build and maintain a data center.
■ Scalability is not an issue. The resources scale based on the needs of the consumer.
■ Redundancy and reliability. Normally, the consumer needs to design and deploy
solutions to achieve the required service uptime and to prevent loss of data. With the
cloud, such solutions are integrated with the infrastructure at different levels and can
be used as needed.

■ Optimized expenses. Cloud services are offered and consumed based on the
pay-as-you-go model, which means that the end user pays only for the resources
used and only for the amount of time they are used.
The cloud offers a way for the user to plan for and to manage costs. The expenses shift
from capital expenses (CAPEX), which are needed if the infrastructure needs to be pur-
chased and deployed, to operational expenses (OPEX) due to the way in which the cloud
services are offered, measured, and paid for.
The CAPEX for building a data center includes not only the budget to buy the needed
equipment, but also the real estate (the premises can be rented or bought), expenses for
the redundant power supplies and cooling (which is a huge expense), and the budget for
the needed services to design, deploy, operate, and maintain the entire infrastructure.
In the process, it is common to have unexpected expenses, which are exceedingly
difficult to plan for. It is also hard to plan for future releases of the applications, additional
services needed, and additional resources that need to be added to the data center. In
addition, experts are needed to operate and maintain the data center, including all its
components, up to the level of the applications, and these teams need to go through
continuous education as the technologies evolve and change. In such a situation, a small-
to medium-sized company that needs to run its applications, but does not want to invest
in building the needed infrastructure, can benefit from cloud services, as it will be able to
plan in advance, to upsize or downsize resources, and to rely on the mechanisms deployed
by the cloud service providers for redundancy and reliability based on service level
agreements (SLAs). And it is not only small- to medium-sized companies that can benefit
from using cloud services. Take Netflix, for example—a huge content streaming company.
Netflix started out using its own resources, but the need to support massive scale and
optimize costs caused Netflix to move to the Amazon Web Services (AWS) cloud. This
allows Netflix to support more than 200 million members in more than 190 countries
worldwide, resulting in more than 125 million hours of TV shows and movies watched
every day!
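To make the CAPEX-versus-OPEX shift concrete, the following Python sketch compares an illustrative up-front data center investment against pay-as-you-go cloud charges. All of the figures (purchase cost, yearly operations, hourly rate, utilization) are hypothetical and exist only to show the arithmetic, not to represent real pricing.

# Hypothetical cost comparison: owning infrastructure (CAPEX) vs.
# renting cloud capacity (OPEX). All numbers are illustrative only.

CAPEX_UPFRONT = 500_000.00      # assumed purchase + deployment cost (USD)
CAPEX_YEARLY_OPS = 60_000.00    # assumed power, cooling, staff per year (USD)

CLOUD_HOURLY_RATE = 2.50        # assumed blended $/hour for the same capacity
HOURS_PER_YEAR = 24 * 365

def on_prem_cost(years: float) -> float:
    """Total cost of owning the infrastructure for a number of years."""
    return CAPEX_UPFRONT + CAPEX_YEARLY_OPS * years

def cloud_cost(years: float, utilization: float) -> float:
    """Pay-as-you-go cost: you pay only for the hours actually used."""
    return CLOUD_HOURLY_RATE * HOURS_PER_YEAR * utilization * years

if __name__ == "__main__":
    for years in (1, 3, 5):
        for utilization in (0.25, 1.0):  # 25% busy vs. 100% busy
            print(f"{years}y @ {utilization:>4.0%}: "
                  f"on-prem ${on_prem_cost(years):>10,.0f}  "
                  f"cloud ${cloud_cost(years, utilization):>10,.0f}")

Under these assumed numbers, pay-as-you-go is clearly cheaper at low utilization, while sustained high utilization over several years can favor owning the infrastructure, which is one of the reasons the hybrid cloud model discussed later in this chapter exists.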
The cloud service providers benefit from this approach in a variety of ways:
■ The cloud computing approach allows the owners of data centers to achieve high
and efficient utilization of their resources. When the resources can be managed in
such a way, the costs of maintaining the data center and expanding its resources are
optimized as well.
■ This approach allows for increased profits and at the same time an expansion in the
services offered.
■ Through the adoption of orchestration and automation, which are mandatory when
building a cloud, the processes of operating and maintaining the data centers are
highly optimized, the results are repeatable, and the outcomes are predictable. This
leads to better usage of the computing and human resources.
One example of a major cloud service provider is Microsoft Azure, which offers a huge
variety of cloud services covering all the levels of the data center, including infrastructure
services such as VMs and storage, platform services such as containers, microservices, and
databases, and application services.
The NIST definition lists five essential characteristics of a cloud environment:

■ On-demand self-service

■ Broad network access

■ Resource pooling

■ Rapid elasticity

■ Measured service
All five essential characteristics are important, and they separate the cloud services offer-
ing from a hosting or colocation service.
On-Demand Self-Service
The NIST defines on-demand self-service as follows (see Figure 21-2):
“A consumer can unilaterally provision computing capabilities, such as server time and
network storage, as needed automatically without requiring human interaction with each
service provider.”
This definition immediately sets a requirement for the data center to have the needed level
of automation and orchestration, which allows the user to access the cloud portal, select a
service, and provision it without waiting for administrators from the provider to set it up.
A simple example is how a user can open Outlook.com or Gmail.com and register a
personal email account. Without any interaction with an employee of the provider, the
account is created and available almost immediately.
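Behind such a portal, the user's click typically becomes an API call that the provider's automation fulfills with no human in the loop. The sketch below illustrates the idea in Python; the endpoint, token, and payload fields are invented for this example and do not correspond to any particular provider's API.

import requests

# Hypothetical self-service portal API; endpoint and fields are invented.
PORTAL_API = "https://portal.example-cloud.com/api/v1/instances"
API_TOKEN = "REPLACE_WITH_REAL_TOKEN"

def provision_vm(name: str, cpus: int, memory_gb: int) -> dict:
    """Request a new VM; the provider's orchestration fulfills it automatically."""
    response = requests.post(
        PORTAL_API,
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        json={"name": name, "cpus": cpus, "memory_gb": memory_gb},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()   # e.g., {"id": "...", "status": "provisioning"}

if __name__ == "__main__":
    vm = provision_vm("web-01", cpus=2, memory_gb=4)
    print("Requested:", vm)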
Measured Service
When the services are used, it is important for both the service provider and the user
to know what was consumed and how it was consumed, what resources were utilized,
and how the consumption changes over time (see Figure 21-3). This also applies to the
access bandwidth and the quantity of data transported. This is important because the
charges are calculated based on what was consumed and for how long. It is also impor-
tant for the end user, as they can downsize or upsize the resources they use, based on
the cost, the need for resources, and so on. Another aspect is that service providers
can analyze the data for consumption and trends in order to plan for expanding the
capacities of their data centers, as the ability to scale is important for the needs of the
customers.
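A minimal sketch of what measured service implies on the billing side is shown below. The usage records and rates are hypothetical, but the principle, metering consumption per resource and per unit of time and charging accordingly, is what the providers' metering systems do at much larger scale.

# Hypothetical metering records: what was used, for how long, at what rate.
usage_records = [
    {"resource": "vm-small",   "hours": 720, "rate_per_hour": 0.05},
    {"resource": "vm-large",   "hours": 120, "rate_per_hour": 0.40},
    {"resource": "storage-gb", "hours": 720, "rate_per_hour": 0.0002},
    {"resource": "egress-gb",  "hours": 1,   "rate_per_hour": 9.00},  # flat charge
]

def monthly_bill(records: list[dict]) -> float:
    """Charge = consumption x time x rate, summed over every metered resource."""
    return sum(r["hours"] * r["rate_per_hour"] for r in records)

for r in usage_records:
    print(f"{r['resource']:<12} {r['hours'] * r['rate_per_hour']:>8.2f} USD")
print(f"{'total':<12} {monthly_bill(usage_records):>8.2f} USD")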
Resource Pooling
Service providers rent their computing resources in an automated manner. They also have
to be capable of supporting multiple users simultaneously, and there must be enough
resources available for the customers.
At the same time, the resources used by the customers are the resources of the data
centers, which are not unlimited. To address this, the NIST defines the resource pooling
characteristic (see Figure 21-4). It describes the need for the resources to be utilized as
effectively as possible, which also means that their utilization becomes more flexible. The
virtualized resources provided by a physical server have to be available to multiple users,
as a single user might not utilize them all.
And this becomes even more effective when we scale it to the level of a data center and
beyond. The end user does not know on which exact physical resources their workloads
are running—and does not need to know this information, as long as the resources are
available and the services are up and running and accessible.
For the purposes of compliance and to stay close to the customers, the cloud service
providers use the approach of dividing their resources into regions, countries, continents,
and so on.
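The following sketch illustrates the pooling idea: a shared pool of host capacity serves many tenants, and no tenant is told which physical host backs its request. The host names, capacities, and tenants are hypothetical.

# Minimal illustration of resource pooling: shared host capacity,
# multi-tenant allocation, no tenant-to-host affinity exposed.

class HostPool:
    def __init__(self, hosts: dict[str, int]):
        # hosts maps host name -> free vCPUs (hypothetical capacities)
        self.free = dict(hosts)
        self.placements: list[tuple[str, str, int]] = []

    def allocate(self, tenant: str, vcpus: int) -> str:
        """Place the request on any host with enough free capacity."""
        for host, capacity in self.free.items():
            if capacity >= vcpus:
                self.free[host] -= vcpus
                self.placements.append((tenant, host, vcpus))
                return f"{tenant}: {vcpus} vCPU granted"   # host not revealed
        raise RuntimeError("pool exhausted - provider must add capacity")

pool = HostPool({"host-a": 64, "host-b": 64, "host-c": 64})
print(pool.allocate("tenant-1", 16))
print(pool.allocate("tenant-2", 48))
print(pool.allocate("tenant-1", 60))   # lands on another host transparently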
Rapid Elasticity
When resources are not being used by an end user, they need to go back into the pools
to be available for new provisioning. This is the rapid elasticity characteristic (see
Figure 21-5), and it creates the impression that the cloud resources are infinite. However,
the more important aspect is that when the resources are managed in such an elastic man-
ner, the cloud providers are capable of supporting horizontal and vertical scaling of the
capacities, to support the needs of the users. As an example, some businesses see spikes
in the services they offer during certain periods of the year. For these time slots, the
companies will need more resources to keep their services up and running, and during
the rest of the year, these resources will be idle. If these companies were to buy and build
their own infrastructure, the cost would be too high for something that’s used for a short
period of time. By using cloud resources, these companies can develop and test their
services using far fewer resources, and during the peak hours, days, or weeks, they can
scale up or down automatically or manually. In this way, they consume only the needed
resources, and thus expenses are optimized.
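A heavily simplified sketch of the scaling decision behind rapid elasticity is shown below. Real providers expose this as managed autoscaling policies; the thresholds and instance limits here are hypothetical.

# Simplified horizontal-scaling decision: grow during peaks, shrink (and
# return capacity to the pool) when demand drops. Thresholds are hypothetical.

MIN_INSTANCES = 2
MAX_INSTANCES = 20
SCALE_UP_CPU = 70.0     # % average CPU that triggers scale-out
SCALE_DOWN_CPU = 25.0   # % average CPU that triggers scale-in

def desired_instances(current: int, avg_cpu_percent: float) -> int:
    if avg_cpu_percent > SCALE_UP_CPU:
        current += 1          # add capacity for the peak
    elif avg_cpu_percent < SCALE_DOWN_CPU:
        current -= 1          # release idle capacity back to the pool
    return max(MIN_INSTANCES, min(MAX_INSTANCES, current))

# A day with a traffic spike: the fleet grows and then shrinks again.
fleet = 2
for cpu in (20, 35, 65, 85, 90, 75, 40, 15):
    fleet = desired_instances(fleet, cpu)
    print(f"avg CPU {cpu:>3}% -> run {fleet} instances")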
Broad Network Access
Because the cloud services are usually available remotely, and because the users are highly
mobile, the access to these services needs to be protected (see Figure 21-6). Once the
necessary secured access is provided, the services can be consumed over different private
and public networks.
The NIST defines three categories of cloud services: Infrastructure as a Service (IaaS),
which offers virtualized infrastructure resources such as compute, networks, and storage;
Platform as a Service (PaaS), which offers development and runtime tools, environments,
and microservices; and Software as a Service (SaaS), which offers application services such
as enterprise, collaboration, security, and B2B/B2C applications.
With the huge variety of services offered nowadays, the boundaries between them are
becoming blurred. That’s why the term Everything as a Service (XaaS) can be used, even
though it is not defined officially by the NIST.
The important point here is the “as a service” approach. When cloud services are offered
“as a service,” the customer can plan their costs, as the services are metered. Having mul-
tiple different cost models provides flexibility in the way services are offered.
The relationship between the cloud provider and the cloud customer has additional
parameters that can influence the cost. The most important parameter is the scope of
responsibility of the provider and the customer. Depending on the category of service,
the provider's responsibilities can be limited to only the physical and virtual layers, with
the customer responsible for everything else, or the provider can be responsible for the
whole application stack, as is the case with SaaS.
Infrastructure as a Service (IaaS)
With IaaS, as the customer uses VMs or storage, the provider is responsible for the
underlying infrastructure as well as the reliability, compliance, and availability of the
resources. The customer is responsible for any software or data used or stored on top
of the resources. For example, if the customer decides to use a virtual server, which is a
VM, the provider is responsible for providing and securing the resources up to the level
of the virtual machine; from there, the operating system and installed applications are the
responsibility of the customer, including the needed measures to be taken to safeguard
the data. Some IaaS offerings also include the operating system, but this is more for con-
venience, as the customer is still responsible for maintaining and operating it.
NIST defines IaaS as follows: “The capability provided to the consumer is to provision
processing, storage, networks, and other fundamental computing resources where
the consumer is able to deploy and run arbitrary software, which can include
operating systems and applications. The consumer does not manage or control the
underlying cloud infrastructure but has control over operating systems, storage, and
deployed applications; and possibly limited control of select networking components
(e.g., host firewalls).”
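As a concrete illustration of the IaaS consumption model, the following sketch uses the AWS SDK for Python (boto3) to launch a virtual machine. The provider supplies and secures everything up to the VM, while the image chosen, the software installed, and the data placed on it remain the customer's responsibility. The region, AMI ID, and tag values are placeholders, and valid AWS credentials are assumed to be configured.

import boto3

# Placeholder values; substitute a real region and AMI ID for your account.
REGION = "us-east-1"
AMI_ID = "ami-0123456789abcdef0"   # hypothetical image ID

ec2 = boto3.client("ec2", region_name=REGION)

# The provider runs the physical hosts, hypervisor, and network underneath;
# the customer chooses and operates what runs inside the instance.
response = ec2.run_instances(
    ImageId=AMI_ID,
    InstanceType="t3.micro",
    MinCount=1,
    MaxCount=1,
    TagSpecifications=[{
        "ResourceType": "instance",
        "Tags": [{"Key": "Name", "Value": "iaas-demo"}],
    }],
)

instance_id = response["Instances"][0]["InstanceId"]
print(f"Launched {instance_id}; OS patching and data protection are now on you.")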
Platform as a Service (PaaS)
Moving up to PaaS also shifts the responsibilities, as now the cloud provider is responsible
for the whole environment, the access, the communication, the needed middleware, and
the runtime. The customer has to deploy the application, which will provide services, or
the data if the PaaS is a cloud database or data warehouse. The customer has to patch and
update the software under their control, define the access control, and back up the data.
NIST defines PaaS as follows: “The capability provided to the consumer is to deploy onto
the cloud infrastructure consumer-created or acquired applications created using
programming languages, libraries, services, and tools supported by the provider. The
consumer does not manage or control the underlying cloud infrastructure including
network, servers, operating systems, or storage, but has control over the deployed
applications and possibly configuration settings for the application-hosting
environment.”
Some examples of PaaS offerings include the following:

■ AWS Lambda

■ Amazon Aurora

■ Azure Cosmos DB
One interesting Platform as a Service that's offered by all three major cloud service
providers (that is, AWS, Azure, and GCP) is microservices. With GCP App Engine, AWS
Lambda, and Azure App Service, the consumer can provision the whole environment in
which to directly publish their application. But this application is still more or less a tra-
ditional monolithic application. Microservices comprise a new architectural and design
approach for creating applications, as the applications are divided into separate processes,
and only the process needed to service a user request will run and then stop afterward.
This approach provides the benefit of extremely efficient utilization of the resources, as
there is no need to run all the processes of the applications and to keep them idle waiting
for the next request. The cloud providers have created and offer such environments where
the customers can utilize this approach.
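To illustrate the run-only-when-requested idea, here is a minimal AWS Lambda handler written in Python. The handler signature is the standard one Lambda expects; the event shape and the response logic are hypothetical.

import json

def lambda_handler(event, context):
    """Runs only when invoked (for example, by an API Gateway request),
    then stops; no server sits idle waiting between requests."""
    # Hypothetical event shape: {"queryStringParameters": {"name": "..."}}
    params = (event or {}).get("queryStringParameters") or {}
    name = params.get("name", "world")

    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }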
Software as a Service (SaaS)
When it comes to SaaS responsibilities, the provider is responsible for the underlying
infrastructure, connectivity, reliability, storage, security, runtime, middleware, and
applications. The provider also has to make sure the services are compliant and available.
NIST defines SaaS as follows: “The capability provided to the consumer is to use the
provider’s applications running on a cloud infrastructure. The applications are accessible
from various client devices through either a thin client interface, such as a web browser
(e.g., web-based email), or a program interface. The consumer does not manage or
control the underlying cloud infrastructure including network, servers, operating systems,
storage, or even individual application capabilities, with the possible exception of limited
user-specific application configuration settings.”
The list of SaaS offerings is exceedingly long, but here are just a few:
■ Salesforce
■ Google Apps
■ Dropbox
■ Box
■ Netflix
■ Cisco WebEx
■ Cisco Meraki
■ Cisco Umbrella
■ Cisco Duo
To understand what changes in the data center and why, it is important to understand the
relationship between the cloud and the consumer and how it develops.
The providers want to utilize the resources of their data centers better and increase their
profits. For that purpose, they need to make the resources available, and they have to be
able to manage these resources in a flexible way. They must adopt the needed levels of
virtualization and abstraction, organize the resources into pools, and allow for flexible
and quick provisioning of resources and their release. This can happen when orchestration
is used to dynamically manage resources and automate usage. For this to happen, differ-
ent tools and approaches are employed. Programs called orchestrators can coordinate the
different components of the data center, such as storage, compute, and network, by sup-
porting the required abstraction to bring up the virtualized environment. Additional auto-
mation through application programming interfaces (APIs) can be used for deploying con-
tainers, microservices, runtime environments, and the needed applications. Automation,
orchestration, and APIs in the data center were discussed in Chapter 19, “Using APIs,”
and Chapter 20, “Automating the Data Center.”
Once the data center has achieved the needed levels of automation and orchestration, the
resources need to be made available to the consumers. The providers create service offerings.
These offerings define what resources can be used, the cost model, the licensing, and
additional dependencies on other resources. These offerings have not only technical
aspects but also financial ones. This means that to create the offerings, providers need
tools that can model both the technical and the business logic. Special applications called
catalogs are used for this. The catalogs can communicate through APIs with the
orchestrators and the other management and automation frameworks of the data center
to gain access to the resources, which are exposed and grouped into different services.
The catalog contains information about the technical services supported by the
orchestrators; service offerings can then be created with the rest of the attributes—cost
model, budgets, time restrictions, regional restrictions, reporting, chain of approval, and
so on.
Once the service offerings are ready in the catalogs, they need to be presented to the
customers. For this, applications called portals are used. The portals are intended to pro-
vide the needed interface for the consumers to access and work with the cloud resources.
This whole relationship can be seen in Figure 21-11, which presents the cloud solution
framework. These are the separate components that build the cloud environment.
The framework consists of three layers: the user portal, the catalog, and resource
management.
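The sketch below shows, in a deliberately simplified and hypothetical form, how a portal request might flow through a catalog to an orchestrator over REST APIs. The endpoints, payload fields, and offering names are invented purely to illustrate the relationships in Figure 21-11.

import requests

# Hypothetical internal endpoints of the cloud stack (invented for illustration).
CATALOG_API = "https://catalog.cloud.example.com/api/offerings"
ORCHESTRATOR_API = "https://orchestrator.cloud.example.com/api/workflows"

def order_from_portal(offering_name: str, tenant: str) -> dict:
    """Portal looks up the offering in the catalog, then asks the
    orchestrator to execute the workflow that delivers it."""
    offering = requests.get(
        CATALOG_API, params={"name": offering_name}, timeout=30
    ).json()          # includes cost model, approvals, technical service

    workflow = requests.post(
        ORCHESTRATOR_API,
        json={"service": offering["technical_service"], "tenant": tenant},
        timeout=30,
    ).json()          # orchestrator drives compute, network, and storage

    return {"offering": offering["name"], "workflow_id": workflow["id"]}

# Example: a user orders "small-linux-vm" from the self-service portal.
# print(order_from_portal("small-linux-vm", tenant="engineering"))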
The data centers, being a major part of a cloud environment, have certain architecture and
components that form the “cloud operating system” or the “cloud stack.”
To map the cloud solution framework to the data center in a more technical way,
Figure 21-12 shows the cloud solution architecture.
At the bottom of Figure 21-12 is the physical infrastructure, which provides the resources
for the needed abstraction and virtualization. The virtualization is created on top of the
physical servers, the network communication, and the storage infrastructures and systems. The
virtualization extends even outside single data centers for the purposes of redundancy
and high availability.
The next step is the system abstraction. This is when, with the help of automation and
orchestration, the virtualized resources are abstracted to be separated from the depen-
dencies on the specific systems and then organized into pools.
This creates the foundation on which the catalogs will be created and the services, such
as IaaS, PaaS, and SaaS, will be offered and published to the portals.
The terms catalog, portal, and orchestrator are used in this discussion for clarity. They
describe major functions that are fundamental to cloud computing. These functions can
be implemented in separate applications, or they can be combined into a few applications
or even a single one. It all depends on the size of the cloud that needs to be created and
the services to be offered.
These principles are used by the big three cloud providers—AWS, Azure, and GCP—
with their own implementations and automation. On their portals you can find further
documentation on the specific approach used to build their cloud environments.
The NIST defines four cloud deployment models:

■ Private cloud

■ Public cloud

■ Hybrid cloud

■ Community cloud
When the NIST’s definition was created, only these four types of clouds were defined
based on the technologies and customer needs at the time. Nowadays, with technology
developments and the need for services, there are also the multicloud and government
cloud types.
Private Cloud
The NIST defines the private cloud as follows (see Figure 21-13):
“The cloud infrastructure is provisioned for exclusive use by a single organization com-
prising multiple consumers (e.g., business units). It may be owned, managed, and oper-
ated by the organization, a third party, or some combination of them, and it may exist on
or off premises.”
In the infrastructure, the needed automation, orchestration, catalog, and portal functions
will be implemented to allow for the management of the resources and the creation of
catalogs and service offerings, which will be published either internally or externally in a
protected manner.
The services made available to the employees, partners, and customers will be provided
only by the data centers of that company. It is the responsibility of the company’s IT
teams to deploy, operate, and maintain the cloud environment.
The private cloud has the following characteristics:

■ Ownership of all resources: The resources are provided by the infrastructure of the
company or organization, which allows for the full control and management of the
resources. The resources are not usually shared with other cloud users. This excludes
the access provided to the partners and the customers of the company.
||||||||||||||||||||
■ Compliance: The organization or the company, depending on its activities and the
region or country in which it resides and operates, might be subject to laws or regu-
lations. Owning the resources makes compliance easier.
■ Big capital expenses: When the company operates its own data center, it incurs the
capital expense for building and deploying the infrastructure and the cloud
environment.

Figure 21-13 summarizes the private cloud: suited for information that requires a high
level of security; intended for use by internal users, partners, and customers; consolidating,
virtualizing, and automating data center resources; with orchestrated provisioning and
cost-metering interfaces to enable self-service IT; and pay-as-you-go chargeback of
consumption.
Public Cloud
The NIST defines the public cloud as follows (see Figure 21-14):
“The cloud infrastructure is provisioned for open use by the general public. It may be
owned, managed, and operated by a business, academic, or government organization, or
some combination of them. It exists on the premises of the cloud provider.”
These are cloud environments created with the goal of providing resources to external
entities. We already mentioned the three big public clouds: AWS, Azure, and GCP. The
infrastructure is owned and managed by an entity, and usually the resources are rented
for profit. The public cloud providers build multiple data centers, in different geographi-
cal locations, with the goal of being close to the consumers, to achieve the needed reli-
ability and redundancy and to be compliant with any local regulations or laws.
Another aspect, which has been the subject of heated discussions since the beginning, is
the security of the information. Because the consumers are not in control of the underly-
ing infrastructure, which also includes the storage systems, security is the responsibility
of the cloud providers. They must take the needed measures to protect the data when
it resides on their storage infrastructure (data at rest) but also provide means for the
customers to protect their data on the wire (data in transit). The way consumer data is
handled is also a subject of different laws and regulations. Even when the public cloud
providers have taken the needed measures and have deployed solutions to take care of
the customers’ data, there’s still the need for external proof. That’s why public cloud
providers are subject to constant audit processes from external regulatory and standard-
ization bodies.
The public cloud has the following characteristics:

■ Self-service
■ Pay-as-you-go billing
■ No capital expenses
■ Scalability
■ Cost, which might be more than the cost of running a data center on premises
Hybrid Cloud
Companies and organizations differ in size, activities, and needs. Because of that, when
an application or workload needs to be deployed, the different requirements and possible
solutions need to be considered. For some companies, a specific workload might be more
cost effective to run on premises on its own equipment. In other situations, that same
company might be better off running certain workloads in a public cloud. For example,
consider a DevOps process. DevOps is a set of procedures and tools as well as a mental-
ity for how the work of the development teams and operations teams should be orga-
nized, where a product of the development team can be easily tested and then deployed
in a production environment. This means the operations teams need to be capable of cre-
ating and re-creating the same environment used first in the development, then in testing,
and finally for deployment in production. However, there’s one additional important dif-
ference: the production environment, unlike the development and testing environments,
has to be capable of scaling!
The process of building a specific environment in a private data center can be time- and
resource-consuming. It requires planning. In some situations, additional equipment needs
to be acquired, and resources are needed to set up that equipment and configure it. In
other words, this process usually takes time. And when it comes to scaling, the process is
the same, unless it was taken into account from the beginning, but this is rarely the case.
If the company is required to run these workloads in-house, the task becomes compli-
cated. The company will have to build these environments in its own data center and deal
with the scalability challenges.
The opposite can also be true: the company can build the development and testing envi-
ronments, but it isn’t cost-effective to run the production environment with the needed
scale.
Also, the company might develop a solution that’s idle, or at least working at a minimum
capacity, for most of the year, and then for a specific short period of time, the solution
requires massive computing resources. How does one justify a multimillion-dollar budget
for equipment that will be used for only one month (or less) of the year?
In such situations, companies can benefit from combining the resources of their private
clouds and a public cloud. This is known as a hybrid cloud (see Figure 21-15), which is
defined as follows:
“The cloud infrastructure is a composition of two or more distinct cloud infrastructures
(private, community, or public) that remain unique entities, but are bound together by
standardized or proprietary technology that enables data and application portability
(e.g., cloud bursting for load balancing between clouds).”
More recently, the industry has adopted the term multicloud, which extends the hybrid
cloud by requiring integration between a customer's private clouds and a minimum of
two public clouds. Currently, the hybrid cloud has the widest adoption, and companies
are starting to move in the direction of multicloud.
The major benefit of using hybrid and multicloud is that the consumer can dynamically
decide which environment offers the best combination of price, resources, and protection
for each workload. This allows the workloads to be moved between the clouds in a trans-
parent way for the end user.
Community Cloud
The community cloud is a specific type of collaboration, or resource sharing, between
the private clouds of organizations or companies in a specific industry or field. The need-
ed level of integration between these clouds allows only the participating organizations
to have access. As per the official definition, a community cloud is defined as follows (see
Figure 21-16):
“The cloud infrastructure is provisioned for exclusive use by a specific community of
consumers from organizations that have shared concerns (e.g., mission, security
requirements, policy, and compliance considerations). It may be owned, managed, and
operated by one or more of the organizations in the community, a third party, or some
combination of them, and it may exist on or off premises.”
As you can tell from the definition, maintaining and operating the integration and
resources among the clouds that share these resources is a shared responsibility. It must
be stressed that the resources are not available publicly.
Community clouds are created because of a specific need to share resources, but the data
shared is heavily regulated and needs to stay compliant. Examples of areas where commu-
nity clouds are applicable are science research, healthcare, and insurance.
Government Clouds
Various countries and their governments have certain requirements when it comes to
working with their information and their agencies. In these situations, a public cloud can
be an obstacle, as the public environment might not adhere to the governmental require-
ments. For these use cases, some of the public cloud providers, such as AWS and Azure,
have created totally isolated clouds, dedicated to the needs of some governments. The
government clouds operate using separate data centers that do not have any connectivity
to public clouds.
Examples of government clouds include the following:

■ AWS GovCloud (U.S.): Specifically for the U.S. government, agencies, and state
authorities
■ Azure Government: Specifically for the U.S. government, agencies, and state author-
ities
■ Azure China 21Vianet: A dedicated cloud, located in China, that’s compliant with
the Chinese government’s regulations and operated by 21Vianet
Cisco Intersight
Cisco Intersight is an SaaS cloud operations platform that provides automation and
orchestration to support the lifecycle management of a private cloud. It is a modular
platform, allowing only the needed services to be used. As Cisco Intersight is an SaaS
platform, it runs in the Cisco cloud, which makes it accessible from everywhere. The data
center infrastructure is connected to Cisco Intersight and is managed and maintained
from there. The IT departments can add multiple data centers, as this will simplify opera-
tions and monitoring. Cisco Intersight offers services to manage the whole cloud stack—
from the underlying physical infrastructure, all the way up to the application level (see
Figure 21-17).
Figure 21-17 shows Cisco Intersight, delivered as SaaS, with the Intersight Virtualization
Service and the Intersight Kubernetes Service managing compute, storage, networking,
third-party infrastructure, and AWS.
The Cisco Intersight Workload Optimizer helps IT departments solve the challenges of
underutilized on-premises infrastructure, public cloud overprovisioning, and cost overruns
in the multicloud. It also optimizes troubleshooting and monitoring, which saves time.
Support for more than 50 common platforms and public clouds provides real-time,
full-stack visibility across your applications and infrastructure.
This allows for optimal distribution of workloads across private and public clouds. Here
are some of the capabilities of the Intersight Workload Optimizer:
■ Manage resource allocation and workload placement in all your infrastructure envi-
ronments, in a single pane of glass, for supply and demand across the combined
private and public clouds
■ Optimize cloud costs with automated selection of instances, reserved instances (RIs),
relational databases, and storage tiers based on workload consumption and optimal
costs
■ Lower the risk for migrations to and from the cloud with a data-driven scenario mod-
eling engine
■ Dynamically scale, delete, and purchase the right cloud resources to ensure perfor-
mance at the lowest cost
■ Support modeling what-if scenarios based on the real-time environment for the
Kubernetes clusters
Last, but not least, is the Intersight Terraform Service, shown in Figure 21-19, which
integrates the Terraform Cloud with Cisco Intersight and provides support for
Infrastructure as Code, an important component for any DevOps organization.
Although HashiCorp Terraform is supported by all the major public cloud providers, and
DevOps teams can use it for automated provisioning of resources in the cloud, its
adoption in private clouds has been a challenge, as there was a need to deploy additional
Terraform agents on premises, change firewall rules and access, and so on. With the
integration of the Terraform Cloud with Cisco Intersight, the adoption of Terraform has
become seamless and natural. Cisco Intersight already manages the private cloud, so the
automation extends from the public clouds used by your organization to the data centers,
thus forming a successful, transparent multicloud.
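Terraform describes infrastructure in its own HCL configuration language, but the core Infrastructure as Code idea, declaring the desired state and letting a tool compute and apply the difference, can be sketched in a few lines of Python. This is only a conceptual illustration and is not how Terraform or Cisco Intersight implement it.

# Conceptual Infrastructure-as-Code loop: desired state is declared as data,
# the tool computes the difference from the current state and applies it.

desired_state = {            # what we want (declared, version-controlled)
    "web-01": {"cpus": 2, "memory_gb": 4},
    "web-02": {"cpus": 2, "memory_gb": 4},
    "db-01":  {"cpus": 8, "memory_gb": 32},
}

current_state = {            # what actually exists right now
    "web-01": {"cpus": 2, "memory_gb": 4},
    "old-01": {"cpus": 1, "memory_gb": 2},
}

def plan(desired: dict, current: dict) -> list[str]:
    """Compute the actions needed to make reality match the declaration."""
    actions = []
    for name, spec in desired.items():
        if name not in current:
            actions.append(f"create {name} {spec}")
        elif current[name] != spec:
            actions.append(f"update {name} -> {spec}")
    for name in current:
        if name not in desired:
            actions.append(f"destroy {name}")
    return actions

for action in plan(desired_state, current_state):
    print(action)   # create web-02 ..., create db-01 ..., destroy old-01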
Summary
This chapter covered cloud computing, based on the NIST definition, by looking at the
essential characteristics of a cloud, the different deployment models, and the types of
cloud services. Additionally, this chapter reviewed the Cisco Intersight cloud platform.
Here are some other important points covered in this chapter:
■ The NIST was the first standardization body to provide a definition for cloud com-
puting, or “the cloud.”
■ Cloud computing is an approach for better utilizing and consuming the computing
resources of a data center (or multiple data centers) through orchestration and auto-
mation.
■ The NIST has defined five essential characteristics, three types of services, and four
cloud deployment models.
■ The cloud environment must possess the following characteristics: on-demand self-
service, measured service, broad network access, resource pooling, and rapid elastic-
ity.
■ The difference between the cloud services comes down to what computing resources
the customer uses and where the demarcation is between the provider’s responsibili-
ties and the customer’s.
■ There are four cloud deployment models: private, public, hybrid, and community.
■ The private cloud is the organization’s own data center with the needed levels of
automation and orchestration.
■ The hybrid cloud is a combination of private and public clouds with the needed level
of integration to allow workloads to easily move between the clouds.
■ The ability to integrate multiple public and private clouds, as well as cloud services
from other companies, has created what’s called the “multicloud.”
■ Cisco Intersight is an SaaS cloud operations platform and a suite of services that
allows for managing, provisioning, orchestrating, automating, deploying, and
monitoring resources across multiple private and public clouds, thus providing
companies with the ability to create multiclouds.
■ The Intersight Infrastructure Service manages compute, storage, and network infra-
structures.
■ The Intersight Kubernetes Service centralizes and automates the management and
provisioning of containerized workloads.
■ The Intersight Cloud Orchestrator allows for creating and working with workflows
to coordinate resources in the multicloud.
■ The Intersight Terraform Service integrates the HashiCorp Terraform Cloud to sup-
port Infrastructure as Code automation for the purposes of DevOps processes.
References
NIST Special Publication 800-144: Guidelines on Security and Privacy in Public Cloud Computing, https://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-144.pdf

NIST Special Publication 800-145: The NIST Definition of Cloud Computing, https://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-145.pdf

NIST Special Publication 800-146: Cloud Computing Synopsis and Recommendations: Recommendations of the National Institute of Standards and Technology, https://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-146.pdf

Netflix on AWS: https://aws.amazon.com/solutions/case-studies/netflix/

Azure for the U.S. Government: https://azure.microsoft.com/en-us/global-infrastructure/government/

Azure Germany: https://docs.microsoft.com/en-us/azure/germany/germany-welcome

Azure China 21Vianet: https://docs.microsoft.com/en-us/azure/china/overview-operations#:~:text=Microsoft%20Azure%20operated%20by%2021Vianet%20(Azure%20China)%20is%20a%20physically,Center%20Co.%2C%20Ltd

AWS GovCloud (U.S.): https://aws.amazon.com/govcloud-us/?whats-new-ess.sort-by=item.additionalFields.postDateTime&whats-new-ess.sort-order=desc