BRKCRS 3146
Cisco is bringing together the best of wired and wireless networking into
“One Network” with Converged Access on the Catalyst 3850 and 3650
Switches
In this session, learn about the capabilities of the 3850 and 3650 switches and
troubleshoot common issues seen on the 3850 and 3650 running the IOS-XE
Operating System. Learn about the switch architecture and troubleshooting
hardware, RTU Licensing, Boot-up Sequence, Memory and CPU utilization,
Stacking, High Availability, Forwarding features on the UADP ASIC, and QoS.
Your Instructor today …
Naoshad Mehta
Principal Engineer, Enterprise Campus Switching Group
I’m a Principal Engineer with the Enterprise Campus Switching Software team at Cisco.
My current focus is the adoption of Catalyst 3850/3650 and Converged Access Architecture
in the marketplace. I’ve been with Cisco for 13+ years. My primary responsibility since 2010
was the delivery of the Catalyst 3850/3650 and CT5760 Wireless Controller. I have been
intimately involved with the design and implementation of almost every software aspect of
the 3850/3650 and I’m here to help you learn more about the architecture and how to
troubleshoot the 3850/3650.
[Figure: predecessor platforms leading to the NGWC Switching Portfolio. Labels: C3560G, C3750G, C3750E, C3750X; Sasquatch ASIC, Strider ASIC; IOS packages A, B and C; “Limited Modularity and Flexibility, No Aggregation SKU”]
• IoT Protocols
• VXLAN
• Wireless CAPWAP
• Up to 1000 Clients Termination per Stack
• Fixed 1G/10G Uplinks
• SGT/SGACL
• 40 Gbps Uplink Bandwidth
• Granular QoS/Flexible NetFlow
[Figure: IOS-XE software architecture on the 3850/3650. Labels that survive extraction include: IOSd, Wireless Controller, Stack Manager (3K), Forwarding & Feature Mgr (FFM), Internal IPC, Licensing, Logging, Availability Framework, Interface, HA, Platform Manager, System Manager, Forwarding Engine Driver, Packet Delivery Service, UADP ASIC Drivers, Platform Drivers, Transports (TCP/SCTP/UDP), Low Level APIs, Libraries/Utilities, Services, Features PD, Comet, Kernel]
Recommended Release IOS-XE 3.3.5
• First Release IOS-XE 3.2.0(SE) (Jan 2013)
• No further rebuilds after 3.2.3(SE)
• IOS-XE 3.3.0 supports 3650
• Many critical fixes in recommended release 3.3.5(SE) (Sep 2014)
Agenda
• Product Review
• Hardware Troubleshooting
• Image Management and Licensing
• Memory and CPU Resources
• Stacking & High Availability
• Hardware Forwarding
• QoS
• Misc Tools & Tricks
• Summary
LEDs Overview: Front Panel, System LEDs
Front Panel System LEDs – LED Description Definitions
• SYST LED
Off = System off
Green = System operating normally
Blinking green = Running POST
Amber = System is malfunctioning
Blinking amber = Network module, power supply, or fan module is malfunctioning
• ACTV LED
Off = Switch is not the active switch
Green = Switch is the active switch or is in standalone mode
Blinking green = Switch is in standby mode
Amber = An error has occurred in the data stack, possibly related to active member selection
• XPS LED
Off = No XPS cable installed or switch is in StackPower mode
Green = XPS connected and ready to provide backup power
Blinking green = XPS is connected but cannot provide backup power
Amber = XPS is in standby or a fault condition
Blinking amber = Power supply in the switch has failed and is being backed up by XPS
• S-PWR LED
Off = StackPower cable not connected or switch is in standalone mode
Green = Switch is connected to an XPS or to 2 StackPower neighbors in a ring configuration
Blinking green = Switch is connected to only 1 StackPower neighbor in a ring configuration
Amber = Fault detected
Blinking amber = StackPower configuration is overbudget
Front Panel System LEDs – LED Description Definitions (cont.)
• STAT LED
Off = Rather than indicating link status, the port LEDs are indicating duplex, speed, stack, or PoE status
Green = Port LEDs are indicating link status
• STACK LED
Off = Rather than indicating stack status, the port LEDs are indicating link, duplex, speed, or PoE status
Green = Port LEDs are indicating stack status
• DUPLX LED
Off = Rather than indicating duplex status, the port LEDs are indicating link, speed, stack, or PoE status
Green = Port LEDs are indicating duplex status
• SPEED LED
Off = Rather than indicating speed status, the port LEDs are indicating link, duplex, stack, or PoE status
Green = Port LEDs are indicating speed status
• PoE LED
Off = Rather than indicating PoE status, the port LEDs are indicating link, duplex, speed, or stack status; none of the downlink ports have been denied power or are in a fault condition
Green = Port LEDs are indicating PoE status and none of the downlink ports have been denied power or are in a fault condition
Blinking amber = Port LEDs are indicating PoE status and at least one of the downlink ports has been denied power or is in a fault condition
• CONSOLE LED
Off = USB console is inactive
Green = USB console is active (RJ45 console is inactive)
Back Panel LED Description
• CONSOLE SERIAL LED
Off = RJ45 console is inactive (USB console is active)
Green = RJ45 console is active (USB console is inactive)
• MGMT LED
Off = Link down
Green = Link is up with no activity
Blinking green = Link is up with activity
Image Naming Convention
cat3k_caa-universalk9.SPA.03.03.05.SE.150-1.EZ5.bin
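To make the naming convention easier to read, the following minimal sketch splits the example file name into fields. The field labels (platform, feature set, image type, IOS-XE release, IOSd version) are assumptions made for illustration, since the slide shows only the file name, and `parse_image_name` is a hypothetical helper.

```python
import re

# Field labels are assumptions for illustration; the slide shows only the name.
IMAGE_NAME_RE = re.compile(
    r"^(?P<platform>[a-z0-9_]+)-(?P<feature_set>[a-z0-9]+)"
    r"\.(?P<image_type>[A-Z]+)"
    r"\.(?P<major>\d+)\.(?P<minor>\d+)\.(?P<rebuild>\d+)\.(?P<train>[A-Z]+)"
    r"\.(?P<iosd_version>.+)\.bin$"
)


def parse_image_name(name):
    """Split a cat3k image file name into labelled fields (hypothetical helper)."""
    match = IMAGE_NAME_RE.match(name)
    if match is None:
        raise ValueError("unrecognised image name: " + name)
    fields = match.groupdict()
    # Render 03.03.05 + SE as 3.3.5(SE), the release notation used in this deck.
    fields["xe_release"] = "{}.{}.{}({})".format(
        int(fields["major"]), int(fields["minor"]),
        int(fields["rebuild"]), fields["train"],
    )
    return fields


info = parse_image_name("cat3k_caa-universalk9.SPA.03.03.05.SE.150-1.EZ5.bin")
print(info["platform"], info["feature_set"], info["xe_release"], info["iosd_version"])
```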
Switch(config)# end
Switch# write (copy running-config startup-config)
Software Upgrade on 3x50
Software upgrade in Installed Mode is done with the “software install …” command.
To recover a switch, copy the image bundle to USB flash and use the recovery mechanism built into the switch from the Bootloader prompt:
emergency-install usbflash0:cat3k_caa-universalk9.SPA.03.03.05.SE.150-1.EZ5.bin
Bundle-boot the image from USB, run “software clean file flash”, copy the image .bin from USB to flash, then run “software expand file flash:<image.bin>”.
Right To Use (RTU) / Honor Based Licensing
IP Base
IP Services
Licensing Show commands
Switch# show license right-to-use slot 1
Slot# License name Type Count Period left
----------------------------------------------------------
1 ipbase permanent N/A Lifetime
1 lanbase permanent N/A Lifetime
1 apcount adder 4 Lifetime
[Figure: CPU complex block diagram. Labels: Cavium 6230 CPU (800 MHz, 4 cores, 2MB L2 cache); 4GB DDR3-1333 with ECC; PCIe links to UADP 1 and UADP 2; SGMII; UART; ACT II; I2C; RTC; Boot Bus]
E.g., baseline memory usage is 40%. Start troubleshooting when memory usage rises above 70% and keeps increasing even though no new configuration is being added.
Memory show commands
Switch1# show processes memory sorted
System memory : 3930916K total, 1118032K used, 2812884K free, 221968K kernel reserved
Lowest(b) : 2252987972
PID    Text   Data    Stack  Heap   RSS     Total   Process
10623  56892  36452   92     5400   196116  336728  iosd
5534   8716   311168  92     4620   136908  562460  fed
10619  21976  555372  88     13980  102320  723240  wcm
6032   4      97708   116    91996  99044   116676  idope.py
12259  4      193244  236    38244  73672   299464  wnweb_paster.py
5536   660    163524  88     4332   55968   336496  stack-mgr
6057   3532   137308  88     2200   54200   311676  ffm
6076   112    160908  88     6764   44728   233548  cli_agent
6058   1232   287972  88     8112   38352   438040  eicored
Slide callout (iosd row): IOS-XE Process Memory Total
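As a small worked example of the baseline rule of thumb, the sketch below reads the "System memory" line from the output above and compares the used percentage with the 70% trigger mentioned earlier. The function names are hypothetical, and the threshold is the slide's example figure, not a fixed platform limit.

```python
import re


def memory_used_percent(system_memory_line):
    """Return used memory as a percentage of total, from the 'System memory' line."""
    match = re.search(r"(\d+)K total,\s*(\d+)K used", system_memory_line)
    if match is None:
        raise ValueError("no 'total'/'used' figures found")
    total_kb, used_kb = int(match.group(1)), int(match.group(2))
    return 100.0 * used_kb / total_kb


def above_threshold(percent, threshold=70.0):
    """True when usage exceeds the example 70% trigger from the slide."""
    return percent > threshold


line = ("System memory : 3930916K total, 1118032K used, "
        "2812884K free, 221968K kernel reserved")
pct = memory_used_percent(line)
print("{:.1f}% used, investigate: {}".format(pct, above_threshold(pct)))
```

A single reading is not enough on its own: the rule also asks whether usage keeps climbing, so in practice the same calculation would be repeated over time.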
Memory show commands
Switch1# show processes memory detailed process iosd sorted
Processor Pool Total: 536870912 Used: 135242980 Free: 401627932
IOS Proce Pool Total: 16777216 Used: 9483360 Free: 7293856
IOS Tasks
PID  TTY  Allocated  Freed     Holding    Getbufs  Retbufs  Process
0    0    168268072  31876024  126376204  0        0        *Init*
164  0    1534944    0         1558112    907264   0        NGWC DOT1X Proce
0    0    0          0         984492     0        0        *MallocLite*
1    0    657344     1544      678968     0        0        Chunk Manager
276  0    925564     297800    563696     0        0        os_info_p provid
39   0    415892     1856      376480     0        0        IPC Seat RX Cont
250  0    298204     464       320908     0        0        IPC LC Message H
Slide callout: Increasing?
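The "Increasing?" question can be checked by capturing the Holding column twice and comparing the two captures. The sketch below is one way to do that. The first snapshot uses values from the output above; the second snapshot is hypothetical data invented for the example, and `growing_processes` is a hypothetical helper.

```python
def growing_processes(before, after):
    """Return (process, growth_in_bytes) for tasks whose Holding value grew,
    largest growth first."""
    growth = []
    for name, held_after in after.items():
        held_before = before.get(name)
        if held_before is not None and held_after > held_before:
            growth.append((name, held_after - held_before))
    return sorted(growth, key=lambda item: item[1], reverse=True)


# Holding values taken from the slide output.
snapshot_1 = {"*Init*": 126376204, "NGWC DOT1X Proce": 1558112, "Chunk Manager": 678968}
# Hypothetical later capture, made up for illustration only.
snapshot_2 = {"*Init*": 126376204, "NGWC DOT1X Proce": 1900000, "Chunk Manager": 679000}

for name, delta in growing_processes(snapshot_1, snapshot_2):
    print(name, "+{} bytes".format(delta))
```

Growth between two captures is only a hint. A task that keeps growing across many captures with no configuration change is the stronger signal.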
Common Causes for high memory utilization
E.g., baseline CPU usage is 25%. Start troubleshooting when CPU usage is consistently at or above 50%.
Troubleshooting High CPU
Identify the Culprit
4 Core CPU
Switch# show proc cpu sorted
Core 0: CPU utilization for five seconds: 96%; one minute: 7%; five minutes: 6%
Core 1: CPU utilization for five seconds: 5%; one minute: 1%; five minutes: 1%
Core 2: CPU utilization for five seconds: 0%; one minute: 0%; five minutes: 0%
Core 3: CPU utilization for five seconds: 41%; one minute: 1%; five minutes: 1%
PID    Runtime(ms)  Invoked  uSecs  5Sec   1Min  5Min  TTY    Process
5533   120300       1608989  74     0.29   0.40  0.42  1088   fed
5535   44890        1401868  32     0.24   0.11  0.10  0      stack-mgr
10582  416280       5787047  71     34.25  0.57  0.62  34816  iosd
6201   111520       119850   930    0.15   0.15  0.15  0      cpumemd
5534   38430        3608873  10     0.10   0.10  0.10  0      platform_mgr
10578  115030       4737397  24     0.10   0.12  0.11  0      wcm
5455   1500         40856    36     0.05   0.05  0.05  1088   slproc
6183   5270         211347   24     0.05   0.02  0.04  0      obfld
6185   4320         110250   39     0.05   0.01  0.03  0      console_relay
6198   20900        186795   111    0.05   0.02  0.00  0      ffm
1      1700         1112     1528   0.00   0.09  1.43  0      init
2      0            138      0      0.00   0.00  0.00  0      kthreadd
3      10           1634     6      0.00   0.00  0.00  0      migration/0
4      0            3        0      0.00   0.00  0.00  0      sirq-high/0
Slide callouts: Platform Processes; IOS-XE Processes; 137% across 4 cores
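The per-core lines separate a short spike from sustained load: here Core 0 shows 96% over five seconds but only 7% over one minute. The sketch below parses those lines and reports cores whose one-minute and five-minute averages both reach the 50% example threshold. The function names and the choice to require both averages are assumptions for illustration.

```python
import re

CORE_RE = re.compile(
    r"Core (\d+): CPU utilization for five seconds: (\d+)%; "
    r"one minute: (\d+)%; five minutes: (\d+)%"
)


def parse_core_utilization(text):
    """Map core number -> (five_sec, one_min, five_min) percentages."""
    return {
        int(m.group(1)): (int(m.group(2)), int(m.group(3)), int(m.group(4)))
        for m in CORE_RE.finditer(text)
    }


def sustained_high(cores, threshold=50):
    """Cores whose 1-minute and 5-minute averages are both at or above threshold."""
    return sorted(
        core for core, (_, one_min, five_min) in cores.items()
        if one_min >= threshold and five_min >= threshold
    )


SAMPLE = """\
Core 0: CPU utilization for five seconds: 96%; one minute: 7%; five minutes: 6%
Core 1: CPU utilization for five seconds: 5%; one minute: 1%; five minutes: 1%
Core 2: CPU utilization for five seconds: 0%; one minute: 0%; five minutes: 0%
Core 3: CPU utilization for five seconds: 41%; one minute: 1%; five minutes: 1%
"""

cores = parse_core_utilization(SAMPLE)
print("sustained high cores:", sustained_high(cores))
```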
Troubleshooting High CPU
Drill Down Deeper
Switch# show processes cpu detailed process iosd sorted
Core 0: CPU utilization for five seconds: 96%; one minute: 7%; five minutes: 6%
Core 1: CPU utilization for five seconds: 5%; one minute: 1%; five minutes: 1%
Core 2: CPU utilization for five seconds: 0%; one minute: 0%; five minutes: 0%
Core 3: CPU utilization for five seconds: 41%; one minute: 1%; five minutes: 1%
PID    T  C  TID    Runtime(ms)  Invoked  uSecs  5Sec(%)  1Min(%)  5Min(%)  TTY    Process
10582  L            451160       6379641  70     34.25    0.71     0.60     34816  iosd
10582  L  0  10582  414060       6194757  0      24.00    0.60     0.50     34816  iosd
10582  L  3  11543  36980        180107   0      10.25    0.11     0.10     0      iosd.fastpath
10582  L  2  11544  120          4777     0      0.00     0.00     0.00     34816  iosd.aux
6      I            57680        5216     0      3.00     0.33     0.22     0      Check heaps
304    I            2200         1790     0      12.17    0.00     0.00     0      HTTP CORE
218    I            2370         14495    0      8.33     0.00     0.00     0      IP Input
211    I            190          214      0      0.33     0.00     0.00     0      RSMP Server
306    I            10           23       0      0.11     0.00     0.00     0      SEP NODE PROC
5      I            0            2        0      0.00     0.00     0.00     0      IPC ISSU Dispatch P
7      I            220          336      0      0.00     0.00     0.00     0      Pool Manager
3      I            0            1        0      0.00     0.00     0.00     0      HA-IDB-SYNC
Slide callouts: Interrupt switched traffic (e.g. Wireless Control); High CPU caused by HTTP traffic
Command Summary - High CPU
[Figure: CPU queues on the UADP ASIC: 32 RX queues, 8 TX queues]
Common Cause for Punting Traffic to CPU
Common Cause                                          Recommended Solution
----------------------------------------------------------------------------------------------------------
Same interface forwarding                             Change design, use “no ip redirects”
ACL logging                                           Disable ACL logging
ACL deny causing switch to send ICMP unreachable      “no ip unreachables”
Forwarding/Feature exception (out of TCAM/adj space)  Reduce TCAM usage
SW-supported feature                                  Disable the feature or reduce the amount of traffic
IP packets with TTL<2 or options                      Disable the offending traffic
Broadcast storm                                       Fix STP loop, disable traffic
Core 0: CPU utilization for five seconds: 99%; one minute: 64%; five minutes: 69%
Core 1: CPU utilization for five seconds: 99%; one minute: 89%; five minutes: 80%
Core 2: CPU utilization for five seconds: 12%; one minute: 57%; five minutes: 69%
Core 3: CPU utilization for five seconds: 98%; one minute: 99%; five minutes: 91%
Slide callout: Several cores experiencing high CPU
• 6 rings in total
• 3 rings go East
• 3 rings go West
• Each ring is 40G
• Total Stack BW = 240G
• With Spatial Reuse = 480G
Is math really an opinion?
[Figure: stack interface of the UADP, assuming 4 x 24-port 3850 switches (numbered 1 to 4)]
Destination Stripping: the packet travels ½ the rings and is taken out of the stack by the destination.
[Figure: destination stripping, assuming 4 x 24-port 3850 switches (numbered 1 to 4)]
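The bandwidth figures follow directly from the ring counts. The sketch below reproduces the arithmetic. The spatial reuse factor of 2 is inferred from the slide's own numbers (240G becoming 480G) and is not stated as a parameter, and the function name is hypothetical.

```python
def stack_bandwidth_gbps(rings=6, ring_gbps=40, spatial_reuse_factor=1):
    """Aggregate stack bandwidth in Gbps for the given ring count and speed."""
    return rings * ring_gbps * spatial_reuse_factor


raw = stack_bandwidth_gbps()                                # 6 rings x 40G
with_reuse = stack_bandwidth_gbps(spatial_reuse_factor=2)   # factor inferred from the slide
print(raw, with_reuse)
```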
Show commands
Switch# show switch detail
Switch/Stack Mac Address : 6400.f124.df80 - Local Mac Address
Mac persistency wait time: Indefinite
                                             H/W   Current
Switch#   Role    Mac Address     Priority Version  State
------------------------------------------------------------
*1       Active   6400.f124.df80     10     0       Ready
 2       Standby  6400.f124.de80     1      0       Ready
Slide callout: Priority, followed by MAC Address, determines which switch gets elected as Active
Sw#/Port#  Port Status  Neighbor  Cable Length  Link OK  Link Active  Sync OK  #Changes to LinkOK  In Loopback
---------------------------------------------------------------------------------------------------------------
1/1        OK           2         50cm          Yes      Yes          Yes      0                   No
1/2        OK           2         Unknown       Yes      Yes          Yes      0                   No
2/1        OK           1         100cm         Yes      Yes          Yes      1                   No
2/2        OK           1         50cm          Yes      Yes          Yes      1                   No
Slide callout (apparently the 1/2 row, where Cable Length is Unknown): Cable with corrupted EEPROM
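The election rule in the callout can be sketched as a two-key comparison. This assumes the highest priority wins and the lowest MAC address breaks a tie; the slide gives only the order of the criteria, and a real election may weigh other factors as well. `elect_active` and `mac_to_int` are hypothetical helpers.

```python
def mac_to_int(mac):
    """Convert a Cisco dotted MAC such as 6400.f124.df80 to an integer."""
    return int(mac.replace(".", ""), 16)


def elect_active(members):
    """Pick the Active switch number from (switch_number, priority, mac) tuples.

    Assumption: highest priority wins, lowest MAC address breaks ties.
    """
    winner = min(members, key=lambda m: (-m[1], mac_to_int(m[2])))
    return winner[0]


stack = [(1, 10, "6400.f124.df80"), (2, 1, "6400.f124.de80")]
print("Active:", elect_active(stack))
```

With the values from the output above, switch 1 wins on priority alone, so the MAC comparison never comes into play.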
Image Version Mismatch
• If the switches are in version mismatch state, they will not stack
• Debugging:
• Check if the switch version matches the active using show version command
• If they do not match, upgrade the switch to the Active’s version
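The debugging steps above amount to comparing each member's version with the Active's. A minimal sketch follows; the version strings are hypothetical data standing in for what show version would report on each switch, and `mismatched_members` is a hypothetical helper.

```python
def mismatched_members(versions, active_switch):
    """Return switch numbers whose version differs from the Active's."""
    target = versions[active_switch]
    return sorted(sw for sw, version in versions.items() if version != target)


# Hypothetical per-switch versions, for illustration only.
versions = {1: "03.03.05SE", 2: "03.03.05SE", 3: "03.02.03SE"}
print("upgrade these members:", mismatched_members(versions, active_switch=1))
```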
Switch# show switch
* Indicates which member is providing the “stack identity” (aka “stack MAC”)
Show redundancy states
Related commands: show redundancy, show redundancy states, show redundancy history, show redundancy switchover history
Switch# show redundancy states
my state = 13 -ACTIVE
peer state = 8 -STANDBY HOT
Mode = Duplex
Unit ID = 2
Redundancy Mode (Operational) = SSO
Redundancy Mode (Configured) = SSO
Redundancy State = SSO
Manual Swact = enabled
Communications = Up
Slide callout (peer state = STANDBY HOT): Terminal state for SSO. If “peer state” is stuck in any other state for more than 10 minutes, open a service request with TAC.
Refer to configuration guide and command line reference for full details
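The peer-state check from the redundancy output can be automated with a simple key/value parse. The sketch below treats each "key = value" line as a dictionary entry and tests whether the peer has reached STANDBY HOT. The function names are hypothetical, and timing the 10-minute window is left to the caller.

```python
def parse_redundancy_states(text):
    """Turn 'key = value' lines from show redundancy states into a dict."""
    states = {}
    for line in text.splitlines():
        if "=" in line:
            key, value = line.split("=", 1)
            states[key.strip()] = value.strip()
    return states


def peer_is_standby_hot(states):
    """True when the peer has reached the terminal SSO state."""
    return states.get("peer state", "").endswith("STANDBY HOT")


SAMPLE = """\
my state = 13 -ACTIVE
peer state = 8 -STANDBY HOT
Mode = Duplex
Unit ID = 2
Redundancy Mode (Operational) = SSO
Redundancy Mode (Configured) = SSO
Redundancy State = SSO
Manual Swact = enabled
Communications = Up
"""

states = parse_redundancy_states(SAMPLE)
print("peer ready for SSO:", peer_is_standby_hot(states))
```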
TCAM Concept on 3x50
TCAM is used by several features.
A Hash Table Manager (HTM) provides configurable resources to features so they can select specific banks or hashing mechanisms.
HTM provides the required abstraction layer to its users, so the details of the TCAM hardware are irrelevant to them.
TCAM Utilization
Switch1# show platform tcam utilization asic all
CAM Utilization for ASIC# 0
Table Max Values Used Values
--------------------------------------------------------------------------
Unicast MAC addresses 32768/512 82/22
Directly or indirectly connected routes 32768/8192 7/89
IGMP and Multicast groups 8192/512 0/16
Security Access Control Entries 3072 173
QoS Access Control Entries 2816 52
Netflow ACEs 1024 15
Input Microflow policer ACEs 256 7
Output Microflow policer ACEs 256 7
Control Plane Entries 512 187
Policy Based Routing ACEs 1024 9
Tunnels 256 12
Input Security Associations 256 4
Output Security Associations and Policies 256 9
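To see how close each table is to its limit, the Max and Used columns can be turned into percentages. In the sketch below, slash-separated entries such as 32768/8192 are treated as two independent capacity/usage pairs, which is an assumption since the slide does not say what each half represents. `utilization` is a hypothetical helper.

```python
def utilization(max_values, used_values):
    """Return percentage used for each slash-separated max/used pair."""
    maxes = [int(x) for x in max_values.split("/")]
    used = [int(x) for x in used_values.split("/")]
    if len(maxes) != len(used):
        raise ValueError("max and used columns have different shapes")
    return [round(100.0 * u / m, 1) for u, m in zip(used, maxes)]


rows = {
    "Security Access Control Entries": ("3072", "173"),
    "Control Plane Entries": ("512", "187"),
    "Directly or indirectly connected routes": ("32768/8192", "7/89"),
}
for table, (max_values, used_values) in rows.items():
    print(table, utilization(max_values, used_values))
```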
TCAM - ACL client region
The total number of VMR (value/mask/result) entries for an ACL should not exceed 1375 in the client region. Once the VMR limit is reached, an ACL UNLOAD EVENT occurs and all of that client’s packets are dropped.
3750: When the ACL limit is reached, packets are punted to the CPU and software forwarded.
3850: When the ACL limit is reached, the ACL is not downloaded and packets are dropped. There is no software forwarding capability.
https://techzone.cisco.com/t5/NGWC-Switches-3650-3850/3750-MLS-to-3850-MQC-Conversion-of-QoS-Configuration/ta-p/697153
Additional Troubleshooting Commands
Topic                                               Command
---------------------------------------------------------------------------------------------------------------
Platform-specific feature information               show tech-support platform <feature> (e.g. wireless, acl, fnf, etc.)
Trace buffers for non-IOSd processes                show mgmt-infra trace messages <component> (e.g. fed-punject-detail, stack-mgr-events, etc.)
Generate live core of a process (internal command)  resource process dump <process id obtained from show process> [ switch <switch number> ]
Generate system report (internal command)           resource create_system_report