© 2017 Cisco and/or its affiliates. All rights reserved. Cisco Confidential
Cisco ACI: Delivering Intent for Data Center
Minh Dang
Cisco Systems
CISCO CONNECT 2018 . IT’S ALL YOU
Intent Lifecycle Intent
Assurance
Configuration Analysis
“Very Large State-Space”
Analytics
Traffic Analysis
“Lots of Data”
Guarantees
Compliance
Consistency
Policy
ADM
Monitoring
Forensics
Tetration
Network Assurance Engine
Problem: DC Paradigms Are Fundamentally Reactive
Intent Frequently
Breaks …
Operational Troubleshoot
We Always React …
An Inability to
Assure Intent
Proactively
Leaving Us With …
Security Scramble to fix it
Compliance Fail audits
Change Undo changes
...Creating a Major Assurance Gap
VM
Controllers
How do I have confidence that I don’t have
errors due to my changes?
1
How do I rapidly analyze the network to
identify issues?
3
How do I easily understand the state of my
entire infrastructure?
2
Intent
Infrastructure
Intent Assurance
Intent Encompasses Data Center Operations
Configs, Changes, Routing, VMs, Security, … Compliance, Audits
The confidence that the
infrastructure is doing what you
intended it to do
Comprehensive, Intelligent, Continuous
Based on mathematical models of
the network
Continuously verifies and validates
the entire network
Proactively delivers the confidence
that the network is operating
correctly
Introducing Cisco Network Assurance Engine
Idea: Network devices are fundamentally deterministic
Leaf1
Spine
Leaf2
Header Data
0110101
Header Data
1000101
FW
We Can Build Comprehensive Mathematical Models of Network Behavior
Core Technology
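A minimal sketch of the "deterministic device" idea on this slide, assuming each device can be reduced to a pure function from a packet header to a next hop. The device names follow the slide's diagram (Leaf1, Spine, FW, Leaf2); the `Header` fields, the forwarding rules, and the `trace` helper are hypothetical illustrations, not NAE's actual model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Header:
    dst: str   # destination endpoint name (hypothetical field)
    port: int  # destination port (hypothetical field)


# A device returns (next_hop, header), or None when it drops the packet.
Hop = Optional[Tuple[str, Header]]


def leaf1(h: Header) -> Hop:
    return ("spine", h)


def spine(h: Header) -> Hop:
    # Assumed rule: traffic to "web" is steered through the firewall.
    return ("fw", h) if h.dst == "web" else ("leaf2", h)


def fw(h: Header) -> Hop:
    # Assumed rule: the firewall only permits port 443.
    return ("leaf2", h) if h.port == 443 else None


def leaf2(h: Header) -> Hop:
    return ("deliver", h)


NETWORK: Dict[str, Callable[[Header], Hop]] = {
    "leaf1": leaf1, "spine": spine, "fw": fw, "leaf2": leaf2,
}


def trace(start: str, h: Header, max_hops: int = 16) -> List[str]:
    """Compose the per-device functions to predict a packet's path."""
    path = [start]
    node = start
    for _ in range(max_hops):
        result = NETWORK[node](h)
        if result is None:
            path.append("drop")
            return path
        node, h = result
        path.append(node)
        if node == "deliver":
            return path
    path.append("loop?")  # hop budget exhausted: possible forwarding loop
    return path
```

Because every device function is deterministic, `trace("leaf1", Header("web", 80))` can be evaluated without sending any traffic, which is the property the slide relies on.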
Fortunately These Problems Have Been Solved
Chip Design
Functional and Physical
Design Verification, Lint,
Timing Analysis
Software Verification
Semantic Checks, Dynamic
Testing, Memory Profiling
Mars Rover
Mars Rover (B) Still
Operational After 14 yrs
with Formal Verification
Formal Methods Assure Intent
Cisco Network Assurance Engine: How It Works
Comprehensive
Network Modeling
Mathematically accurate models
spanning underlay, overlay and
virtualization layers
5000+ domain knowledge-based
error scenarios built-in, codified
remediation steps
Data
Collection
Captures all non-packet data:
intent, policy, state across data
center network
Intelligent
Analysis
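To illustrate what "running checks against a model" could look like, here is a small sketch that represents security contracts as data and then queries them. The `Rule` shape, the group names, and both helper functions are assumptions made for illustration; they are not the product's data model or API.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Rule:
    src: str               # source group (hypothetical name)
    dst: str               # destination group (hypothetical name)
    ports: FrozenSet[int]  # destination ports the rule permits


def can_talk(rules: List[Rule], src: str, dst: str, port: int) -> bool:
    """Answer 'can A talk to B on this port?' from the model alone."""
    return any(r.src == src and r.dst == dst and port in r.ports for r in rules)


def redundant_rules(rules: List[Rule]) -> List[Rule]:
    """Rules whose ports are all permitted by other rules for the same pair.

    Note: two identical rules cover each other, so both are reported;
    remove one and re-run before removing the other.
    """
    found = []
    for i, rule in enumerate(rules):
        covered = set()
        for j, other in enumerate(rules):
            if i != j and other.src == rule.src and other.dst == rule.dst:
                covered |= other.ports
        if rule.ports <= covered:
            found.append(rule)
    return found


RULES = [
    Rule("web", "app", frozenset({8080})),
    Rule("app", "db", frozenset({5432})),
    Rule("web", "app", frozenset({8080, 8443})),
]
```

With the sample `RULES`, the first rule is reported as redundant because the third rule already permits port 8080 between the same groups.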
PREDICT THE IMPACT OF CHANGES
Challenge
• Mainframe misconfiguration
in DR site
Potential Impact
• Mainframe cluster inaccessible
in case of fail-over event
Benefit
• Identify latent misconfigurations
before outages happen
• Avoid $$ in lost revenue
PROACTIVELY VERIFY NETWORK-WIDE BEHAVIOR
Challenge
• Overlapping subnets due to
routes leaked across VRFs
Potential Impact
• Connectivity loss for Skype VoIP
and Video users
Benefit
• Continuous & proactive network-wide dynamic state analysis
• Save days in downtime
ASSURE NETWORK SECURITY
POLICY AND COMPLIANCE
Challenge
• TCAM utilization hitting capacity,
inefficient security policy
definitions
Potential Impact
• Degraded security posture &
inability to deploy policies
Benefit
• Identified 17K+ redundant policies
• Surfaced opportunity for 20-70%
TCAM optimization
Stories from Customer Trials
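The second story (overlapping subnets caused by routes leaked across VRFs) can be sketched as a check over configuration data. This is an illustrative sketch using Python's standard `ipaddress` module; the input shapes, the VRF names, and the function name are assumptions, and leaks are not followed transitively.

```python
import ipaddress
from itertools import combinations
from typing import Dict, List, Tuple


def overlapping_after_leak(
    vrf_subnets: Dict[str, List[str]],
    leaks: List[Tuple[str, str]],
) -> List[Tuple[str, str, str]]:
    """Find overlapping subnets visible inside each VRF after route leaking.

    vrf_subnets maps a VRF name to its own subnets (CIDR strings).
    leaks lists (from_vrf, to_vrf) pairs; each pair makes from_vrf's own
    subnets visible in to_vrf.
    """
    visible = {
        vrf: [(vrf, ipaddress.ip_network(c)) for c in cidrs]
        for vrf, cidrs in vrf_subnets.items()
    }
    for src, dst in leaks:
        visible[dst] = visible[dst] + [
            (src, ipaddress.ip_network(c)) for c in vrf_subnets[src]
        ]

    findings = []
    for vrf, nets in visible.items():
        for (owner_a, net_a), (owner_b, net_b) in combinations(nets, 2):
            if net_a.overlaps(net_b):
                findings.append((vrf, f"{owner_a}:{net_a}", f"{owner_b}:{net_b}"))
    return findings


# Hypothetical sample data.
VRFS = {"prod": ["10.1.0.0/16", "10.2.0.0/24"], "voice": ["10.1.5.0/24"]}
LEAKS = [("voice", "prod")]
```

Running the check with and without `LEAKS` shows the overlap only appears once the route leak is in place, which is the kind of latent condition the story describes.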
User Interface: Centred Around “Smart Events”
Change Management
Compliance and Visualisation
Incident and Problem Management
Smart Events: What, Where, Why, and How
Cisco Network Assurance Engine – Dashboard
Transforming Change Management with NAE
(play video)
Tenant End-point Assurance
• Analyze
• Static configurations of VLANs, IPs, MACs, …
• Dynamic EP Learning, Mobility, …
• EP Connectivity, Communication …
• Common issues found
• Duplicate IPs: human error, NIC teaming, migrations, …
• DHCP errors
• EPs deployed against leafs without BD subnet
• EP table consistency across fabric …
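One of the common issues listed above, duplicate IPs, can be sketched as a simple scan over an endpoint table. The tuple layout `(vrf, ip, mac, leaf)` and the choice to scope duplicates per VRF are assumptions for illustration only.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Endpoint = Tuple[str, str, str, str]  # (vrf, ip, mac, leaf), hypothetical layout


def duplicate_ips(endpoints: Iterable[Endpoint]) -> Dict[Tuple[str, str], List[str]]:
    """Return {(vrf, ip): sorted MACs} for IPs claimed by more than one MAC."""
    macs_by_ip = defaultdict(set)
    for vrf, ip, mac, _leaf in endpoints:
        macs_by_ip[(vrf, ip)].add(mac)
    return {key: sorted(macs) for key, macs in macs_by_ip.items() if len(macs) > 1}


# Hypothetical sample data: 10.0.0.5 is claimed twice inside "prod".
EPS = [
    ("prod", "10.0.0.5", "aa:aa", "leaf1"),
    ("prod", "10.0.0.5", "bb:bb", "leaf45"),
    ("prod", "10.0.0.6", "cc:cc", "leaf1"),
    ("dev", "10.0.0.5", "dd:dd", "leaf2"),
]
```

The same address in a different VRF (`dev` here) is not flagged, since under the stated assumption address spaces are independent per VRF.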
Assure Network Security Policies & Compliance
(play video)
What Makes us Different?
Comprehensive
Capture, analyze and correlate
entire network state: across
security policies, forwarding,
end-points, TCAM utilization,
controller policies
Intelligent
5000+ built-in failure
scenarios, 30+ years of
Cisco Operational
knowledge
Continuous
Runs Continuously
Near real-time: collection,
modeling, analysis
Cisco Network
Assurance Engine
Deployment Model
No sensors
Read only credentials
Time to Value
30 mins to deploy
60 mins to value
Form Factors
Software only OVA
Lightweight: 3 VMs (v2.0)
Available Now
30 Day Free Trial
Subscription Licensing
Available
Now
ACI Data Center
Fabric
Available
2018
Cross-platform
Network Integration
Firewall
Virtual
Machine
Manager
Building a Rich Ecosystem Around Open API
Integration with
Operations Toolchains CWOM
Thank You

[Cisco Connect 2018 - Vietnam] Minh dang hcmc_cisco aci_delivering intent for data center networking


Editor's Notes

  1. Cisco ACI: Delivering Intent for Data Center 
  2. That said, the pervasive problem we all face is that our operational paradigms in data centers are fundamentally reactive. Narrative: We have operational issues and end up with long troubleshooting cycles and war rooms. We suffer a breach, then look for where we left a hole in our policies and do forensics after the fact. We are often not compliant with business intent, and failing audits initially is not uncommon. And how often do we make changes, only to roll back because we made mistakes? It is almost the norm, not the exception. We fundamentally lack the ability to ASSURE INTENT PROACTIVELY.
  3. This is creating a major assurance gap in the industry. As an operator, given the power, the questions you would much rather ask the network are far more fundamental.
     First, I am making a set of changes to the network: how do I know that I have not introduced some blatant misconfiguration or error that will bring down the application a couple of weeks from now? Maybe the security policy I programmed conflicts with an existing deny policy I don't know about, or I am programming a subnet that overlaps with an existing subnet. Or I am migrating 500 VLANs from the legacy network to my new fabric and fat-finger 5 subnets. Likely I will only find out weeks from now, when some apps are not accessible, and it will take days to debug the errors. Wouldn't it be nice if you had a system that PROACTIVELY analyzed all your policies and configurations for correctness and consistency and told you when you are making mistakes?
     Second, I have programmed the network, great. But these are dynamic, complex distributed systems. The fabric is learning prefixes from the outside world: what if a routing loop was created in the forwarding tables, or we learnt a more specific route from the branch so that traffic meant for an internal app gets diverted outside? Or when a VM moves from Leaf 1 to Leaf 45, what if one time in a million the default gateway does not get correctly programmed due to some connectivity issue, or the leaf ran out of TCAM space and not all the policies got correctly programmed? We are now sitting with transient vulnerabilities. Or what if the VMware admin programmed the port groups inconsistently with the APIC, creating a config mismatch? These issues are all extremely hard to find, and even harder to reason about. But it is critical to identify them, because they can expose us to potential outages or vulnerabilities we have no clue about. Wouldn't it be nice if you had a system that continuously analyzed your entire network's dynamic state (the forwarding state, end-point configs, etc.) to ensure it is always consistent with your intent?
     And finally, I am now programming the network in this abstracted, new policy language with tenants, app profiles and EPGs, but as networking folks we need to understand the bottom-up view of the network. Where are my BDs and VLANs sitting? Where are my EPGs deployed? How is connectivity being established between A and B? Reconstructing this bottom-up network state is 80% of the challenge when I need to troubleshoot actual issues. Wouldn't it be nice if you had a system that reconstructed the bottom-up state of the network and correlated it to the policy, enabling you to troubleshoot issues orders of magnitude faster?
  4. What we need is the ability to assure intent. It is a guarantee, the confidence that the infrastructure is doing exactly what you intended it to do: that your changes and config are correct and consistent; that the forwarding state has not drifted into a bad state; that VM deployment and movement have not broken your reachability intent; that your security policies are achieving the segmentation goals per intent; and that they are always compliant with business rules, so you can pass audits easily.
  5. That is what we are bringing to the market with Cisco Network Assurance Engine. It is a whole new way of solving this problem.
     It starts with building mathematically precise models of the network. For instance, we pick up all your security contracts and represent them in a software model. Now you can ask that model all sorts of questions: can A talk to B, is A isolated, do we have any conflicting policies out of thousands or millions of policies, and so on. We build models spanning security, forwarding, end-point configs, hardware resource utilization, policies, etc. This is the most comprehensive model of the network.
     We didn't stop there. We then codified thousands of failure scenarios, available right out of the box, that run against these models to continuously verify and validate the entire network. These checks are based on our experience of how networks should correctly operate, best design practices from our AS teams, and the collective knowledge we have across TAC cases from thousands of customers. These failure scenarios run against the real-time models, continuously checking the network for correctness. That is what gives operators the confidence that the network is indeed operating consistently with their intent.
     And here is the key point: we can do this without needing to look at any packets. We build our models from all the configurations and dynamic state! That makes it fundamentally proactive, before any data traffic even enters the network.
     The product is AVAILABLE NOW. It is delivered in an entirely software form factor.
  6. The core idea here is that networks are deterministic. Every switch, router, and firewall in the network essentially reads the header, makes a decision on whether to forward the packet, where, and at what priority, and changes the packet header. If you can infer this "network transfer function", you can predict and model the behavior of every device in response to any change or any incoming data packet. Tie these models together across the DC and you have a mathematical model of the entire DC network. We do this using a class of techniques called "formal techniques", which is just an academic term for techniques that are intelligent and can reason about the behavior of the network…
  7. Fortunately these ideas are NOT new; the concept has been around in academia. Researchers at Stanford and UIUC rekindled an idea that was initially talked about almost a decade ago… These methods have been used extensively in other domains, chip design for instance.
     1) Chips with billions of transistors are actually more complex than the networks we build. But amazingly, when you send a chip out for fabrication, it comes back and actually works most of the time, because designers have a set of tools built around formal methods. When a designer builds an adder, a multiplier, or a data pipeline in a high-level language like Verilog, they can check that, given any 64-bit inputs, the adder will always compute A + B correctly, without having to put every possible input stream into a simulation and check it, which is computationally prohibitive. You can check that this adder, this finite state machine, will not go into some funky state under any input, and that it will always complete its operation within 1 clock cycle. And when the system translates that adder into the physical design (the gates and metal lines), it can check that the two are exactly equivalent, giving you confidence that the chip will actually come back and work correctly.
     2) The same holds in the software world. Developers have a whole set of tools for dynamic testing, memory profiling, checking pointers, checking variables, etc. As a result, they are able to catch 99% of the issues before the code is put into production, and then use monitoring tools like AppDynamics to catch production-traffic-related issues.
     Unfortunately we don't have that luxury in networks. We have always been asked to make changes in production with no tools and to ensure that there is zero downtime. Kind of impossible, but that is what formal methods help you assure and what Candid is bringing to this market… With formal methods you can assure intent.
  8. Let's double-click to see how it works.
     1. Starting from the left: what data do we collect? Candid goes to every leaf and every spine in the network and collects all the configurations, control-plane state, data-plane state, and even hardware state like TCAM tables, VLAN tables, etc. From the controller we pick up the entire policy and configs and a representation of the intent. In addition, we have the implicit intent based on the expected network behavior.
     2. With all this we build the comprehensive network model: underlay, overlay, and tenancy layers.
     3. Against this model we run checks based on 30+ years of Cisco operational domain experience. These checks are based on 3 things: i) our expertise on how networks and our hardware should correctly operate: there should be no routing loops, no overlapping subnets in a VRF, no duplicate IPs, and so on; ii) best design practices that we learn from our AS teams: if you want a subnet to talk externally, what are all the BD and L3out configs required, or all the access policies required to correctly deploy an EPG; iii) finally, our TAC cases: the 10% of failure scenarios that cause 90% of failures in the field. This brings our collective knowledge to all our customers.
     Every 15 minutes or so, the engine builds the most real-time model of the network and runs these checks against that model, like an intelligent robot watching your back, always checking the network for correctness.
  9. The first story is about a lurking human error in the config space. This was a heavy equipment manufacturer in the US, with over 100 leafs across 2 production fabrics and a mainframe device in a DR data center. An innocent configuration error by an operator: there was no contract to the WAN interface, which means traffic arriving from or destined to the mainframe subnet would be dropped. In a DR event, this would prevent applications from failing over from the production to the DR data center. A single error in tens of thousands of lines of configuration. [This is a company that counts thousands of dollars per minute of downtime for some of their applications…] This was a potential $M outage in case of a fail-over event, which we were able to avoid proactively.
     The second example is related to analysis of the network-wide dynamic state. This was a government organisation in Europe. The users there were experiencing intermittent Skype traffic, with intermittency in their VoIP and video communication. They eventually created a major ticket and were troubleshooting for days, at which point they brought in Candid to look at their network. Literally in 15 minutes, we found that they had a contract between 2 VRFs leaking subnets that happened to overlap, leading to this issue. This is a classic example: had Candid been in their production network ahead of time, we would have caught the issue the moment the contract was created, avoiding days of downtime.
     The third one shows the true power of the formal modeling approach. This was a European service provider with a multi-tenant network. Over the last couple of years they had huge policy sprawl, with 100K+ security policies. They reached a point where 20% of their leafs were running at max TCAM capacity, and they were unable to push any configs or policies to the network reliably. We got pulled in at that point. Literally within a few hours of analysis, Candid was able to identify that 20% of their policies were redundant: duplicate intent, basically opening the same ports in multiple policies. Further, by looking at hit counters, we were able to get granular insight on up to another 50% of policies that had never been used, giving them the visibility to have the conversation with their security teams on tightening their security aperture and optimizing TCAM utilization.
  10. Narrative: Discuss smart events, and discuss drilling down into human-readable suggested next steps. The “Assurance Engine” talks to you…
  11. Assurance Everywhere. Cross Platform: F5, Sourcefire, Citrix, vCenter (??), Cisco logo (firewall), AVI. Workflow Optimization: Turbonomic. Operation Toolchain: Splunk Toolchain. Core Network Fabrics:
  12. Cisco ACI: Delivering Intent for Data Center