
BRKCRS 3146


Troubleshooting Cisco Catalyst 3650 and 3850 Series Switches
Naoshad Mehta
Principal Engineer, Enterprise Campus Switching Group
Twitter: @naoshad, #CLUS, #convergedaccess
BRKCRS-3146
Session Overview and Objectives

Cisco is bringing together the best of wired and wireless networking into
“One Network” with Converged Access on the Catalyst 3850 and 3650
Switches.

In this session, learn about the capabilities of the 3850 and 3650 switches and
troubleshoot common issues seen on the 3850 and 3650 running the IOS-XE
Operating System. Learn about the switch architecture and troubleshooting
hardware, RTU Licensing, Boot-up Sequence, Memory and CPU utilization,
Stacking, High Availability, Forwarding features on the UADP ASIC, and QoS.
Your Instructor today …
Naoshad Mehta
Principal Engineer, Enterprise Campus Switching Group
I’m a Principal Engineer with the Enterprise Campus Switching Software team at Cisco.
My current focus is the adoption of Catalyst 3850/3650 and Converged Access Architecture
in the marketplace. I’ve been with Cisco for 13+ years. My primary responsibility since 2010
was the delivery of the Catalyst 3850/3650 and CT5760 Wireless Controller. I have been
intimately involved with the design and implementation of almost every software aspect of
the 3850/3650 and I’m here to help you learn more about the architecture and how to
troubleshoot the 3850/3650.

Prior to working on the 3850/3650, I have worked on a wide spectrum of technologies
(MPLS, Traffic Engineering, L2VPN, EVCs, etc.), products (Nexus 7K, 7600, 7500, 7200)
and operating systems (Classic IOS, NXOS and IOS-XE).
Agenda
• Product Review
• Hardware Troubleshooting
• Image Management and Licensing
• Memory and CPU Resources
• Stacking & High Availability
• Hardware Forwarding
• QoS
• Misc Tools & Tricks
• Summary
Glossary
3850/3650 Switch Reference slide that may not
be presented in the session

A – Active Switch
S – Standby Switch
FED – Forwarding Engine Driver
WCM – Wireless Controller Module
PDS – Packet Delivery Service
UADP – Unified Access Data Plane ASIC
3x50 – 3650 or 3850 Switch

Suggested Sessions and Reference Material
• BRKCRS-2889 - Converged Access System Architecture - Diving into the 'One Network'
• BRKCRS-2888 - Advanced Enterprise Campus Design: Converged Access
• BRKARC-3438 - Cisco Catalyst 3850 and 3650 Series Switching Architecture
• Cisco Unified Access Technology Overview: Converged Access, http://www.cisco.com/en/US/prod/collateral/switches/ps5718/ps12686/white_paper_c11-726107.html
• Cisco Enterprise Campus Infrastructure Best Practices Guide, http://www.cisco.com/c/en/us/products/collateral/switches/catalyst-6800-series-switches/guide-c07-733457.html
Catalyst 3K Switching Portfolio – Before NGWC
[Diagram: C3560G, C3750G, C3750E and C3750X switches, built on the Sasquatch and Strider ASICs and running separate IOS packages A, B and C]
• Limited Modularity and Flexibility, No Aggregation SKU
NGWC Switching Portfolio
Catalyst 3650 and Catalyst 3850
• 1G Copper, 1G Fiber, mGig, 10G Fiber SKUs
• Same Hardware Architecture and UADP ASIC
• Same Software Bundle for all switches
• Modular uplinks, 10G Aggregation SKU
[Diagram callouts: IoT Protocols, VXLAN, SDN, Wireless]

Catalyst 3850 Switch
• Integrated Wireless Controller: Up to 50 APs and 40G per switch
• Wireless CAPWAP Termination
• Up to 2000 Clients per Stack
• 480 Gbps Stacking Bandwidth
• FRU Fans, Power Supplies
• Stackpower
• Full POE+
• Granular QoS/Flexible NetFlow
• SGT/SGACL
• 40 Gbps Uplink Bandwidth
• Line Rate on All Ports
Built on Cisco’s Innovative “UADP” ASIC

Catalyst 3650 Switch
• New Front-End Power Supplies
• Modular 160 Gbps 9 member Stack
• Up to 25 APs/1000 clients per stack, and 40G per switch
• FRU Fans
• Wireless CAPWAP Termination
• Up to 1000 Clients per Stack
• Fixed 1G/10G Uplinks
• SGT/SGACL
• 40 Gbps Uplink Bandwidth
• Granular QoS/Flexible NetFlow
• Line Rate on All Ports
• Full POE+
The foundation for full wired and wireless convergence on a single platform.
IOS XE Evolution
IOS-XE
• Modern IOS to enable multi-core CPU
• Easy customer migration
• While maintaining IOS functionality and look and feel
• Allow hosted applications like Wireshark
[Diagram: IOS 12.2(52)SE (IOS with Features and Components on a Kernel) compared with IOS XE 3.3.5(SE) (IOSd with Features and Components, WCM and Hosted Apps on Common Infrastructure / HA, Management Interface, Module Drivers and a Linux Kernel)]
3.3.x Features
• 9 member stack
• QoS Revamp
• Wireshark
• HSRP
• UPOE
IOS XE Software Internals Overview
[Diagram: IOS XE software component stack. Labels recovered from the slide: IOSd RP/LC; Wireless Controller; Service Location; Interface Manager; HA; Consolidated Logging; Availability Framework; Forwarding & Feature Mgr (FFM); Stack Manager (3K); Internal IPC; Licensing Services; Features PD; Comet; External Libraries/Utilities; Platform Drivers; UADP ASIC Drivers; Transports (TCP/SCTP/UDP); Services; Platform Manager; Low Level APIs; System Manager; Forwarding Engine Driver; Packet Delivery Service; Kernel]
Recommended Release IOS-XE 3.3.5
• First Release IOS-XE 3.2.0(SE) (Jan 2013)
• No further rebuilds after 3.2.3(SE)
• IOS-XE 3.3.0 supports 3650
• Many critical fixes in recommended release 3.3.5(SE) (Sep 2014)
Front Panel LEDs Overview
[Figure: front panel with the System LEDs highlighted]
System LEDs – Front Panel LED Description
• SYST LED
 - Off = System off
 - Green = System operating normally
 - Blinking green = Running POST
 - Amber = System is malfunctioning
 - Blinking amber = Network module, power supply, or fan module is malfunctioning
• ACTV LED
 - Off = Switch is not the active switch
 - Green = Switch is the active switch or is in standalone mode
 - Blinking green = Switch is in standby mode
 - Amber = An error has occurred in the data stack, possibly related to active member selection
• XPS LED
 - Off = No XPS cable installed or switch is in StackPower mode
 - Green = XPS connected and ready to provide backup power
 - Blinking green = XPS is connected but cannot provide backup power
 - Amber = XPS is in standby or a fault condition
 - Blinking amber = Power supply in the switch has failed and is being backed up by XPS
• S-PWR LED
 - Off = StackPower cable not connected or switch is in standalone mode
 - Green = Switch is connected to an XPS or to 2 StackPower neighbors in a ring configuration
 - Blinking green = Switch is connected to only 1 StackPower neighbor in a ring configuration
 - Amber = Fault detected
 - Blinking amber = StackPower configuration is overbudget
System LEDs – Front Panel LED Description (cont.)
• STAT LED
 - Off = Rather than indicating link status, the port LEDs are indicating duplex, speed, stack, or PoE status
 - Green = Port LEDs are indicating link status
• DUPLX LED
 - Off = Rather than indicating duplex status, the port LEDs are indicating link, speed, stack, or PoE status
 - Green = Port LEDs are indicating duplex status
• SPEED LED
 - Off = Rather than indicating speed status, the port LEDs are indicating link, duplex, stack, or PoE status
 - Green = Port LEDs are indicating speed status
• STACK LED
 - Off = Rather than indicating stack status, the port LEDs are indicating link, duplex, speed, or PoE status
 - Green = Port LEDs are indicating stack status
• PoE LED
 - Off = Rather than indicating PoE status, the port LEDs are indicating link, duplex, speed, or stack status; none of the downlink ports have been denied power or are in a fault condition
 - Green = Port LEDs are indicating PoE status and none of the downlink ports have been denied power or are in a fault condition
 - Blinking amber = Port LEDs are indicating PoE status and at least one of the downlink ports has been denied power or is in a fault condition
• CONSOLE LED
 - Off = USB console is inactive
 - Green = USB console is active (RJ45 console is inactive)
Back Panel LED Description
• CONSOLE SERIAL LED
 - Off = RJ45 console is inactive (USB console is active)
 - Green = RJ45 console is active (USB console is inactive)
• MGMT LED
 - Off = Link down
 - Green = Link is up with no activity
 - Blinking green = Link is up with activity
Image Naming Convention

cat3k_caa-universalk9.SPA.03.03.05.SE.150-1.EZ5.bin

• cat3k = Platform Family
• caa: C = Converged, A = Access, A = Access Switch
• universalk9 = Feature Set. Enabling/disabling of features is controlled by the installed license
• SPA: S = Digitally signed image, P = Production image, A = Key version
• 03.03.05.SE = IOS XE Version
• 150-1.EZ5 = IOSd Version
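The naming convention can be checked mechanically when sorting through image files. The following is a minimal Python sketch, assuming every bundle name follows the layout of the single example on this slide; the function name `parse_image_name` and the field names are illustrative and are not Cisco terms.

```python
import re

# Assumed layout, derived only from the example name on the slide:
# <platform>_<family>-<feature set>.<signing>.<IOS XE version>.<IOSd version>.bin
IMAGE_RE = re.compile(
    r"^(?P<platform>[a-z0-9]+)_(?P<family>[a-z]+)"      # cat3k_caa
    r"-(?P<feature_set>[a-z0-9]+)"                      # universalk9
    r"\.(?P<signing>[A-Z]{3})"                          # SPA
    r"\.(?P<xe_version>\d{2}\.\d{2}\.\d{2}\.[A-Z]+)"    # 03.03.05.SE
    r"\.(?P<iosd_version>[0-9A-Za-z.\-]+)"              # 150-1.EZ5
    r"\.bin$"
)


def parse_image_name(name):
    """Split a bundle file name into its naming-convention fields."""
    match = IMAGE_RE.match(name)
    if match is None:
        raise ValueError("unrecognized image name: %r" % name)
    return match.groupdict()


if __name__ == "__main__":
    fields = parse_image_name(
        "cat3k_caa-universalk9.SPA.03.03.05.SE.150-1.EZ5.bin")
    for key, value in fields.items():
        print(key, "=", value)
```

Names that do not match the assumed layout raise `ValueError` rather than returning partial fields, so an unexpected file is noticed instead of being silently misread.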
Booting IOS-XE Software
Install Boot (default mode)
• Packages are installed on flash
• Supports AP image pre-download
• No additional memory requirement
• Image must be installed in flash:
 - software expand
 - software install
• boot flash:packages.conf

Bundle Boot
• Packages are expanded in RAM
• No support for AP image pre-download
• Additional memory equal to the size of image bundle required
• Image can be booted from flash:, usbflash: or tftp:
• boot flash:cat3k_caa-universalk9.SPA.03.03.03.SE.150-1.EZ3.bin
3850/3650 Password recovery
• Password recovery on the 3850/3650 does NOT follow the 3750 family procedure
• 3850 password recovery is as follows:
1. Power cycle the switch and hold the Mode button (on the front top left) for a few seconds (officially
12) until the status LED turns amber. That will get you to the Boot Loader prompt (Switch:)
2. Initialize flash
Switch: flash_init
Switch:

3. Set the following variables

Switch: SWITCH_IGNORE_STARTUP_CFG=1
Switch: SWITCH_DISABLE_PASSWORD_RECOVERY=0

4. Boot the 3850
Switch: boot

Warning! Console settings: 9600 baud, 8 data bits, 1 stop bit, No parity, No flow control
5. Skip the initial configuration dialog and go to enable (no password required):

--- System Configuration Dialog ---


Would you like to enter the initial configuration dialog? [yes/no]: no
Press RETURN to get started!
Switch> enable
Switch#

6. Copy startup-config back to running-config:

Switch# copy startup-config running-config


7. Go to global configuration, and remove or change the password:
Switch# configure terminal
Switch(config)# no enable password
Switch(config)# no enable secret
Switch(config)# enable secret cisco

8. Enable reading of startup-config


Switch(config)# no system ignore startupconfig switch all

9. Disable password recovery if required


Switch(config)# system disable password recovery switch all

10. End the configuration and save the change

Switch(config)# end
Switch# write
(equivalent to: copy running-config startup-config)
Software Upgrade on 3x50
Software upgrade in Installed Mode is done via the “software install …” command

Prerequisites for software installation:
 - The switch’s free memory must be greater than the size of the bundle being installed
 - The free space in flash: must be greater than the size of the bundle being installed
 - All switches must be running in installed mode
 - When installing a bundle from a local storage device, the device must exist on all switches performing the installation operation
 - The packages in the bundle to be installed must have valid digital signatures

A failed installation might require a rollback using the “software rollback” command or
a manual clean using the “software clean” command.
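The first three prerequisites are simple comparisons and can be written down as a pre-flight check. This is a minimal Python sketch under stated assumptions: the function name `install_precheck` is hypothetical, the sizes are supplied by the operator (for example read from the switch beforehand), and the local-storage and digital-signature prerequisites are not covered.

```python
def install_precheck(bundle_size, free_memory, free_flash, all_installed_mode):
    """Return the list of unmet prerequisites; an empty list means ready.

    All sizes are in bytes and are supplied by the caller. This sketch
    does not query a switch and does not verify signatures.
    """
    problems = []
    if free_memory <= bundle_size:
        problems.append("free memory must be greater than the bundle size")
    if free_flash <= bundle_size:
        problems.append("free space in flash: must be greater than the bundle size")
    if not all_installed_mode:
        problems.append("all switches must be running in installed mode")
    return problems


if __name__ == "__main__":
    # Illustrative numbers only: a 300 MB bundle against 200 MB of free flash.
    mb = 1024 * 1024
    for problem in install_precheck(300 * mb, 1000 * mb, 200 * mb, True):
        print("NOT READY:", problem)
```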
Upgrade/Install a Bundle on flash
Switch# software install file flash:cat3k_caa-universalk9.SPA.03.03.05.SE.150-1.EZ5.bin

Preparing install operation ...                                   <-- Preparation stage
[2]: Copying software from active switch 2 to switch 1
[2]: Finished copying software to switch 1
[1 2]: Starting install operation                                 <-- Installing to Flash
[1 2]: Expanding bundle flash:cat3k_caa-…
[1 2]: Copying package files
[1 2]: Package files copied
[1 2]: Finished expanding bundle flash:cat3k_caa-…
[1 2]: Verifying and copying expanded package files to flash:
[1 2]: Verified and copied expanded package files to flash:
[1 2]: Starting compatibility checks                              <-- Post Install Checks
[1 2]: Finished compatibility checks
[1 2]: Starting application pre-installation processing
[1 2]: Finished application pre-installation processing
[1]: Old files list:                                              <-- Removing old files
Removed cat3k_caa-base.SPA.03.03.03.SE.pkg
Removed cat3k_caa-drivers.SPA.03.03.03.SE.pkg
Removed cat3k_caa-infra.SPA.03.03.03.SE.pkg
Removed cat3k_caa-iosd-universalk9.SPA.150-1.EZ3.pkg
Removed cat3k_caa-platform.SPA.03.03.03.SE.pkg
Removed cat3k_caa-wcm.SPA.03.03.03.SE.pkg
Software Rollback
Use the ‘software rollback’ command to revert to the previously installed package set (packages.conf.00-).

Switch# software rollback

Preparing rollback operation ...
[2]: Starting rollback operation
[2]: Starting compatibility checks
[2]: Finished compatibility checks
[2]: Starting application pre-installation processing
[2]: Finished application pre-installation processing
[2]: Old files list:                                              <-- Removed newly installed image
Removed cat3k_caa-base.SPA.03.03.05.SE.pkg
Removed cat3k_caa-drivers.SPA.03.03.05.SE.pkg
Removed cat3k_caa-infra.SPA.03.03.05.SE.pkg
Removed cat3k_caa-iosd-universalk9.SPA.150-1.EZ5.pkg
Removed cat3k_caa-platform.SPA.03.03.05.SE.pkg
Removed cat3k_caa-wcm.SPA.03.03.05.SE.pkg
[2]: New files list:                                              <-- Reverted to older image
Added cat3k_caa-base.SPA.03.03.03.SE.pkg
Added cat3k_caa-drivers.SPA.03.03.03.SE.pkg
Added cat3k_caa-infra.SPA.03.03.03.SE.pkg
Added cat3k_caa-iosd-universalk9.SSA.150-1.EZ3.pkg
Added cat3k_caa-platform.SPA.03.03.03.SE.pkg
Added cat3k_caa-wcm.SPA.03.03.03.SE.pkg
[2]: Creating pending provisioning file
[2]: Finished rolling back software changes. New software will load on reboot.
[2]: Do you want to proceed with reload? [yes/no]: n
Switch#
Recover a Corrupted Install
Copy the image bundle to USB flash and boot it using the following command from
the Bootloader prompt:
boot usbflash0:cat3k_caa-universalk9.SPA.03.03.05.SE.150-1.EZ5.bin

Copy the image bundle to USB flash and recover the switch by using the recovery
mechanism built into the switch from the Bootloader prompt:
emergency-install usbflash0:cat3k_caa-universalk9.SPA.03.03.05.SE.150-1.EZ5.bin

Manual repair: bundle boot the image from USB, run “software clean file flash”, copy the
image .bin from USB to flash, then run “software expand file flash:<image.bin>”
Right To Use (RTU) / Honor Based Licensing

Trust Based Licensing Model

Built in licenses, not tied to Unique Device Identifier

Three license levels – lanbase, ipbase and ipservices

Activated using CLI by accepting the End User License Agreement

Portable across devices

No Need to access cisco.com License Portal


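License Mismatch
[Diagram: the Active (A) runs IP Base while the Standby (S) runs IP Services, which is a license mismatch. After the commands below and a reload, both switches run IP Base.]

license right-to-use deactivate ipservices
license right-to-use activate ipbase acceptEULA
Reload switch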
Licensing Show commands
Switch# show license right-to-use slot 1
Slot# License name Type Count Period left
----------------------------------------------------------
1 ipbase permanent N/A Lifetime
1 lanbase permanent N/A Lifetime
1 apcount adder 4 Lifetime

License Level on Reboot: ipservices

Switch# show license right-to-use mismatch

Slot# License Name Adder AP Count Base AP Count


---------------------------------------------------------------
3 ipservices 0 0
CPU Complex
[Block diagram. Components shown: Cavium 6230 CPU (800 MHz, 4 cores, 2MB L2 Cache); 4GB DDR3-1333 memory w/ ECC; UADP 1 and UADP 2; 2GB Flash; 64MB Bootloader; FPGA for Stack Power; FPGA for PHY, LED, etc.; ACT II; RTC; USB/RJ-45 Console; 10/100/1000 RJ-45 Ethernet Mgmt. Interconnects shown: PCIe, I2C, SGMII, UART, Boot Bus]
Frequently Asked Questions

Why should I be concerned about high memory utilization?

It is very important to have enough free memory to support features and network convergence events that require
transient memory.

What are the usual symptoms of high memory usage?

 - Memory utilization of process(es) keeps increasing
 - System runs out of buffers and software packet forwarding stops
 - Memory allocation failures are reported
 - System crashes after reporting out of memory

At what percentage level should I start troubleshooting?

It depends on the nature and level of feature config on the switch. It is essential to find a baseline memory
usage during normal working conditions, and start troubleshooting when usage goes above a specific threshold.

E.g., Baseline memory usage 40%. Start troubleshooting when memory usage goes above 70% and keeps
increasing without any new configuration being added.
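The rule of thumb in the example (above a threshold and still climbing) can be expressed as a small check over periodic samples. This is a minimal Python sketch; the function name `needs_memory_troubleshooting`, the sampling approach and the strict "every sample higher than the last" test are assumptions made for illustration, and the 70% default simply mirrors the example figure.

```python
def needs_memory_troubleshooting(samples_pct, threshold_pct=70.0):
    """Return True when usage is above the threshold and has kept rising.

    samples_pct is a list of memory utilization percentages taken at
    regular intervals, oldest first, with no configuration change in
    between. At least two samples are needed to see a trend.
    """
    if len(samples_pct) < 2:
        return False
    above_threshold = samples_pct[-1] > threshold_pct
    # Strictly increasing from one sample to the next (an assumed,
    # deliberately simple definition of "keeps increasing").
    keeps_rising = all(b > a for a, b in zip(samples_pct, samples_pct[1:]))
    return above_threshold and keeps_rising


if __name__ == "__main__":
    print(needs_memory_troubleshooting([40, 55, 72, 75]))  # rising past 70%
    print(needs_memory_troubleshooting([40, 72, 71]))      # above 70% but not rising
```

In practice a noisy series would call for a smoothed trend rather than a strict comparison; the strict form is used here only to keep the sketch short.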
Memory show commands
Switch1# show processes memory sorted
System memory : 3930916K total, 1118032K used, 2812884K free, 221968K kernel reserved
Lowest(b) : 2252987972
PID    Text   Data    Stack  Heap   RSS     Total   Process
10623  56892  36452   92     5400   196116  336728  iosd
5534   8716   311168  92     4620   136908  562460  fed
10619  21976  555372  88     13980  102320  723240  wcm
6032   4      97708   116    91996  99044   116676  idope.py
12259  4      193244  236    38244  73672   299464  wnweb_paster.py
5536   660    163524  88     4332   55968   336496  stack-mgr
6057   3532   137308  88     2200   54200   311676  ffm
6076   112    160908  88     6764   44728   233548  cli_agent
6058   1232   287972  88     8112   38352   438040  eicored
Slide callout: IOS-XE Process Memory Total
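To compare this output against a baseline, the "System memory" line can be reduced to a single utilization percentage. This is a minimal Python sketch that assumes the line format shown in the sample output; the function name `memory_used_percent` is illustrative.

```python
import re

# Matches the "System memory" summary line as it appears in the sample.
SYSTEM_MEMORY_RE = re.compile(
    r"System memory\s*:\s*(\d+)K total,\s*(\d+)K used,\s*(\d+)K free")


def memory_used_percent(output):
    """Return used memory as a percentage of total system memory."""
    match = SYSTEM_MEMORY_RE.search(output)
    if match is None:
        raise ValueError("no 'System memory' line found")
    total_kb, used_kb, _free_kb = (int(value) for value in match.groups())
    return 100.0 * used_kb / total_kb


if __name__ == "__main__":
    line = ("System memory : 3930916K total, 1118032K used, "
            "2812884K free, 221968K kernel reserved")
    print("%.1f%% used" % memory_used_percent(line))
```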
Memory show commands
Switch1# show processes memory detailed process iosd sorted
Processor Pool Total: 536870912 Used: 135242980 Free: 401627932
IOS Proce Pool Total: 16777216 Used: 9483360 Free: 7293856
PID  TTY  Allocated  Freed     Holding    Getbufs  Retbufs  Process
0    0    168268072  31876024  126376204  0        0        *Init*
164  0    1534944    0         1558112    907264   0        NGWC DOT1X Proce
0    0    0          0         984492     0        0        *MallocLite*
1    0    657344     1544      678968     0        0        Chunk Manager
276  0    925564     297800    563696     0        0        os_info_p provid
39   0    415892     1856      376480     0        0        IPC Seat RX Cont
250  0    298204     464       320908     0        0        IPC LC Message H
Slide callouts: “IOS Tasks”; “Increasing?”
Common Causes for high memory utilization

Common Cause | Recommended Solution
Extensive Config | Reduce configuration to supported scale
Excessive memory allocated to trace buffers (1) | Reset trace buffers to default sizes
DoS Attack/Punted traffic causing buffer depletion | Identify packets and block them using an ACL
Protocol flaps/re-convergence causing high transient memory utilization | Identify reason for network instability
Memory Leak caused by software bug | Open a Service Request

(1) set trace control <> buffer default
Command Summary - Memory

• Check memory usage on system
  show processes memory sorted
• Check memory usage of a particular process
  show processes memory detailed process fed
• Check memory usage of IOSd
  show processes memory detailed process iosd
• Check allocators of memory within IOSd
  show memory detailed process iosd allocating-process totals
Frequently Asked Questions
Why should I be concerned about high CPU utilization?
It is very important to protect the control plane for network stability, as resources (CPU, memory and buffers) are
shared by control plane traffic and by data plane traffic that is sent to the CPU for further processing.

What are the usual symptoms of high CPU usage?

 - Control plane instability, e.g., OSPF flap
 - Reduced switching / forwarding performance
 - Slow response to Telnet / SSH
 - SNMP poll miss

At what percentage level should I start troubleshooting?

It depends on the nature and level of the traffic. It is essential to find a baseline CPU usage during normal
working conditions, and start troubleshooting when usage goes above a specific threshold.

E.g., Baseline CPU usage 25%. Start troubleshooting when the CPU usage is consistently at 50% or above.
Troubleshooting High CPU
Identify the Culprit
Switch# show proc cpu sorted
Core 0: CPU utilization for five seconds: 96%; one minute: 7%; five minutes: 6%
Core 1: CPU utilization for five seconds: 5%; one minute: 1%; five minutes: 1%
Core 2: CPU utilization for five seconds: 0%; one minute: 0%; five minutes: 0%
Core 3: CPU utilization for five seconds: 41%; one minute: 1%; five minutes: 1%
PID    Runtime(ms)  Invoked  uSecs  5Sec   1Min  5Min  TTY    Process
5533   120300       1608989  74     0.29   0.40  0.42  1088   fed
5535   44890        1401868  32     0.24   0.11  0.10  0      stack-mgr
10582  416280       5787047  71     34.25  0.57  0.62  34816  iosd
6201   111520       119850   930    0.15   0.15  0.15  0      cpumemd
5534   38430        3608873  10     0.10   0.10  0.10  0      platform_mgr
10578  115030       4737397  24     0.10   0.12  0.11  0      wcm
5455   1500         40856    36     0.05   0.05  0.05  1088   slproc
6183   5270         211347   24     0.05   0.02  0.04  0      obfld
6185   4320         110250   39     0.05   0.01  0.03  0      console_relay
6198   20900        186795   111    0.05   0.02  0.00  0      ffm
1      1700         1112     1528   0.00   0.09  1.43  0      init
2      0            138      0      0.00   0.00  0.00  0      kthreadd
3      10           1634     6      0.00   0.00  0.00  0      migration/0
4      0            3        0      0.00   0.00  0.00  0      sirq-high/0
Slide callouts: “4 Core CPU” (the per-core lines); “Platform Processes”; “IOS-XE Processes”; “137% across 4 cores”
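The per-core lines separate a short spike (high five-second value, low five-minute value) from sustained load. This is a minimal Python sketch that parses those lines and flags cores whose five-minute average is at or above a threshold; the function names and the 50% default (taken from the FAQ example) are assumptions for illustration.

```python
import re

CORE_RE = re.compile(
    r"Core (\d+): CPU utilization for five seconds: (\d+)%; "
    r"one minute: (\d+)%; five minutes: (\d+)%")


def parse_core_utilization(output):
    """Map core number -> (five_sec, one_min, five_min) percentages."""
    return {
        int(core): (int(five_sec), int(one_min), int(five_min))
        for core, five_sec, one_min, five_min in CORE_RE.findall(output)
    }


def sustained_busy_cores(cores, threshold_pct=50):
    """Return cores whose five-minute average is at or above the threshold."""
    return sorted(core for core, (_s, _m, five_min) in cores.items()
                  if five_min >= threshold_pct)


if __name__ == "__main__":
    sample = (
        "Core 0: CPU utilization for five seconds: 96%; one minute: 7%; five minutes: 6%\n"
        "Core 3: CPU utilization for five seconds: 41%; one minute: 1%; five minutes: 1%\n"
    )
    cores = parse_core_utilization(sample)
    print(cores)
    print(sustained_busy_cores(cores))
```

With the sample from this slide no core is flagged, because the 96% reading on Core 0 is a five-second spike while its five-minute average is 6%.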
Troubleshooting High CPU
Drill Down Deeper
Switch# show processes cpu detailed process iosd sorted
Core 0: CPU utilization for five seconds: 96%; one minute: 7%; five minutes: 6%
Core 1: CPU utilization for five seconds: 5%; one minute: 1%; five minutes: 1%
Core 2: CPU utilization for five seconds: 0%; one minute: 0%; five minutes: 0%
Core 3: CPU utilization for five seconds: 41%; one minute: 1%; five minutes: 1%
PID T C TID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process
(%) (%) (%)
10582 L 451160 6379641 70 34.25 0.71 0.60 34816 iosd
10582 L 0 10582 414060 6194757 0 24.00 0.60 0.50 34816 iosd
10582 L 3 11543 36980 180107 0 10.25 0.11 0.10 0 iosd.fastpath   <-- Interrupt switched traffic (e.g. Wireless Control)
10582 L 2 11544 120 4777 0 0.00 0.00 0.00 34816 iosd.aux
6 I 57680 5216 0 3.00 0.33 0.22 0 Check heaps
304 I 2200 1790 0 12.17 0.00 0.00 0 HTTP CORE   <-- High CPU caused by HTTP traffic
218 I 2370 14495 0 8.33 0.00 0.00 0 IP Input
211 I 190 214 0 0.33 0.00 0.00 0 RSMP Server
306 I 10 23 0 0.11 0.00 0.00 0 SEP NODE PROC
5 I 0 2 0 0.00 0.00 0.00 0 IPC ISSU Dispatch P
7 I 220 336 0 0.00 0.00 0.00 0 Pool Manager
3 I 0 1 0 0.00 0.00 0.00 0 HA-IDB-SYNC
Command Summary - High CPU

• Check CPU usage on IOS threads
  show process cpu detailed process iosd [sorted]
• Check CPU usage on platform dependent and Nova threads
  show process cpu detailed process {fed | platform_mgr | stack-mgr | ha_mgr | eicored…}
• Check traffic on the RX and TX CPU queues
  show platform punt client, show platform punt tx
• Check details of CPU queues
  show platform punt statistics port-asic 0 cpuq 0 direction {rx | tx}
CPU Punt Path Architecture
[Diagram, top to bottom:
 - IOSd (Processes, Control Packets) and WCM (Processes, Wireless Control Packets)
 - Punt Shim
 - 32 RX PDS Queues, 8 TX PDS Queues
 - Packet Delivery Service (PDS)
 - Forwarding Engine Driver with Packet Handler (interfaces with UADP ASIC and PDS)
 - 32 RX Queues, 8 TX Queues
 - UADP ASIC]
Common Cause for Punting Traffic to CPU

Common Cause | Recommended Solution
Same interface forwarding | Change design, use “no ip redirects”
ACL logging | Disable ACL logging
ACL deny causing switch to send ICMP unreachable | no ip unreachables (1)
Forwarding/Feature exception (out of TCAM/adj space) | Reduce TCAM usage
SW-supported feature | Disable the feature or reduce the amount of traffic
IP packets with TTL<2 or options | Disable the offending traffic
Broadcast Storm | Fix STP loop, disable traffic
Unexpected control/data traffic | Control Plane Policing (CoPP), Deny ACL
Software Bug | Open a Service Request

(1) Should be configured on all the L3 interfaces of the switch.
Decoding CPU Queues
Switch# show platform punt client
tag    buffer         jumbo  fallback  packets         received  failures
                                       alloc   free    bytes     conv  buf
65536  0/1024/1600    0/0    0/512     64845   64845   3371071   0     0
65544  0/ 96/1600     0/4    0/0       0       0       0         0     0
65545  0/ 96/1600     0/8    0/32      1947    1947    612588    0     0
65546  0/ 512/1600    0/32   0/512     13563   137795  24587306  0     0
65548  0/ 512/1600    0/32   0/256     10903   10903   650232    0     0
65551  0/ 512/1600    0/0    0/256     56      56      12088     0     0
65561  411/ 512/1600  0/0    0/128     557245  556834  39010862  0     0
65562  0/ 512/1600    0/16   0/256     0       0       0         0     0

Slide callouts:
 - tag 65561 = CPU Queue Number 25 (65561-65536)
 - buffer column, e.g. 411/512/1600 = Number of packets in queue awaiting processing / Size of Queue / Size of each buffer
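The slide callouts give two decoding rules: the CPU queue number is the tag minus 65536, and the buffer column packs the packets awaiting processing, the size of the queue and the size of each buffer. This is a minimal Python sketch of both rules; the function names and dictionary keys are illustrative, and the field order within the buffer column is an assumption read off the callouts.

```python
PUNT_TAG_BASE = 65536  # from the slide callout: queue 25 = 65561 - 65536


def cpu_queue_number(tag):
    """Convert a 'show platform punt client' tag into a CPU queue number."""
    return tag - PUNT_TAG_BASE


def parse_buffer_field(field):
    """Split a buffer column value such as '411/ 512/1600'.

    Assumed order: packets awaiting processing / queue size / buffer size.
    """
    queued, queue_size, buffer_size = (
        int(part) for part in field.replace(" ", "").split("/"))
    return {"queued": queued, "queue_size": queue_size,
            "buffer_size": buffer_size}


if __name__ == "__main__":
    info = parse_buffer_field("411/ 512/1600")
    print("queue", cpu_queue_number(65561), info)
    print("fill ratio: %.0f%%" % (100.0 * info["queued"] / info["queue_size"]))
```

A queue that stays close to full, like tag 65561 in the sample, is the one whose packets are worth inspecting next.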


Displaying packets in the queue
show buffers detailed process iosd assigned packet | beg ng3k_rx25

Buffer information for ng3k_rx25 buffer at 0x35E98E8C


data_area 0x35E9932C, refcount 1, next 0x0, flags 0x80
linktype 7 (IP), enctype 1 (ARPA), encsize 14, rxtype 1
if_input Vlan10, if_output 0x0 (None)

source: 10.32.111.83, destination: 10.33.21.219, id: 0x4BE0, ttl: 63,


TOS: 0 prot: 6, source port 51378, destination port 22

35E99382: 6400F124 F1C11410 9FE43A49 08004500 d.q$qA...d:I..E.


35E99392: 00984BE0 40003F06 56110A20 6F530A21 ..K`@.?.V.. oS.!
35E993A2: 15DBC8B2 0016588A DB9F6C34 421A5018 .[H2..X.[.l4B.P.
35E993B2: FFFF8666 000072A2 E1AB5431 78970F84 ...f..r"a+T1x...
Troubleshooting High CPU in stack-mgr – Known issue

Switch# show proc cpu sorted

Core 0: CPU utilization for five seconds: 99%; one minute: 64%; five minutes: 69%   <-- Several cores experiencing high CPU
Core 1: CPU utilization for five seconds: 99%; one minute: 89%; five minutes: 80%
Core 2: CPU utilization for five seconds: 12%; one minute: 57%; five minutes: 69%
Core 3: CPU utilization for five seconds: 98%; one minute: 99%; five minutes: 91%

PID    Runtime(ms)  Invoked   uSecs  5Sec   1Min   5Min   TTY   Process
5700   2311985      24103536  2114   49.58  49.70  49.72  0     stack-mgr   <-- Fed and Stack Mgr the top consumers
5698   1475012      42309915  522    25.80  25.74  25.76  1088  fed
12472  1779005      16386647  90     1.49   1.58   1.65   0     iosd
6239   3163525      50452155  150    0.30   0.31   0.31   0     ffm
43     3496392      43374714  17     0.10   0.10   0.10   0     sirq-net-rx/3
29     70700        12468288  0      0.05   0.01   0.03   0     sirq-timer/2
5699   1747090      31690173  20     0.05   0.10   0.11   0     platform_mgr
Continued
• High CPU in the stack-mgr process observed for a prolonged time with no functional
impact
• Stack Mgr - RCA
 - top/htop output in the kernel and show process cpu report different values.
The kernel counters roll over, and once they roll over their values do not
change. This is a cosmetic display issue.
• FED process - RCA
 - Frequent mac flaps, mac learning events
 - Frequent STP Topology Change Notifications
Related Defects

DDTS | Description | Fixed Release
CSCuo98789 | ARP broadcast for vlan which is not SVI punted to CPU in case of Layer 2 | 03.3(04)SE
CSCuh47950 | Routing Protocol packets cause unknown protocol drops in L2 only vlan | 03.6(00)E 03.3(04)SE
CSCup05630 | Changing Aging timer does not change timer on Active/Local switch | 03.6(01)E 03.3(04)SE
CSCup24497 | Serviceability and enhancement for OOB | 03.6(01)E 03.3(04)SE
CSCup15995 | SifExceptionInterruptA8 need to handle all conditions besides balloting | 03.3(04)SE
CSCup39058 | show process cpu different from top/htop in linux kernel | 03.3(04)SE
3850 StackWise-480 Overview
• 3850 StackWise-480 is a new generation of Catalyst 3850 stacking
 - 240Gbps of bandwidth (120Gbps TX & 120Gbps RX per connector)
 - Similar to previous stacking implementations, ring redundancy is achieved via ring-wrap capabilities provided in hardware
 - NOT backward compatible with currently fielded stacking technologies, most notably StackWise Plus
3850 StackWise-480 Cables
• StackWise-480 currently supports 3 cables
 - STACK-T1-50CM = 0.5m cable
 - STACK-T1-1M = 1m cable
 - STACK-T1-3M = 3m cable
• All StackWise-480 cables include ACT II chips for counterfeit protection
3650 StackWise-160 Overview & Cables
• 3650 StackWise-160 is a new generation of Catalyst 3650 stacking
 - 160Gbps stacking bandwidth
 - NOT backward compatible with currently fielded stacking technologies, most notably StackWise Plus
 - Stack cable can NOT be used on 3850
 - Stack cables are 50cm, 1m, and 3m in length
Understanding the Stack Ring
Stack Interface of UADP ASIC (assuming 4 x 24-port 3850 Switches)

• 6 rings in total
• 3 rings go East
• 3 rings go West
• Each ring is 40G
• Total Stack BW = 240G
• With Spatial Reuse = 480G

Is math really an opinion?

Packets are segmented/reassembled in HW (256 byte
segments)
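The bandwidth figures follow from the ring count: 6 rings at 40G each give 240G, and the slide quotes double that with spatial reuse. This is a minimal Python sketch of the arithmetic; the function name is illustrative, and the factor of two for spatial reuse is taken from the 240G and 480G figures on the slide rather than from a hardware specification.

```python
def stack_bandwidth_gbps(rings=6, ring_gbps=40, spatial_reuse=False):
    """Return the aggregate stack bandwidth in Gbps.

    Defaults mirror the slide: 6 rings (3 East, 3 West) at 40G each.
    The doubling under spatial reuse reproduces the slide's 480G figure.
    """
    raw = rings * ring_gbps
    return raw * 2 if spatial_reuse else raw


if __name__ == "__main__":
    print(stack_bandwidth_gbps())                    # total stack bandwidth
    print(stack_bandwidth_gbps(spatial_reuse=True))  # with spatial reuse
```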
Spatial Reuse
Assuming 4 x 24-port 3850 Switches
[Diagram: four-switch stack ring, switches numbered 1 to 4, shown twice]
Destination Stripping: Packet travels ½ the rings. Taken out of stack by destination.
Show commands
Switch# show switch detail
Switch/Stack Mac Address : 6400.f124.df80 - Local Mac Address
Mac persistency wait time: Indefinite
                                            H/W      Current
Switch#  Role     Mac Address     Priority  Version  State
------------------------------------------------------------
*1       Active   6400.f124.df80  10        0        Ready
 2       Standby  6400.f124.de80  1         0        Ready

Slide callout: Priority, followed by MAC Address, determines which switch gets elected as Active

         Stack Port Status     Neighbors
Switch#  Port 1   Port 2       Port 1   Port 2
--------------------------------------------------------
1        OK       OK           2        2
2        OK       OK           1        1
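The election callout (priority first, then MAC address) can be sketched as a sort key. This is a minimal Python sketch under stated assumptions: the highest priority value wins and the lowest MAC address breaks a tie, which the slide does not spell out; other factors in a real election, such as an already running Active switch, are ignored. The function name `elect_active` is hypothetical.

```python
def elect_active(members):
    """Pick the switch number expected to become Active.

    members is a list of (switch_number, priority, mac) tuples, with the
    MAC in dotted form such as '6400.f124.df80'. Assumed rule: highest
    priority first, then lowest MAC address as the tie-breaker.
    """
    def sort_key(member):
        _number, priority, mac = member
        # Negate the priority so that min() prefers the highest value.
        return (-priority, int(mac.replace(".", ""), 16))

    return min(members, key=sort_key)[0]


if __name__ == "__main__":
    stack = [(1, 10, "6400.f124.df80"), (2, 1, "6400.f124.de80")]
    print("expected Active: switch", elect_active(stack))
```

With the values from the sample output, switch 1 wins on priority (10 versus 1) even though switch 2 has the lower MAC address.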
Show commands
Switch# show switch stack-ports summary

Sw#/Port# Port Status Neighbor Cable Length Link OK Link Active Sync OK #Changes to LinkOK In Loopback
---------------------------------------------------------------------------------------------------------------
1/1 OK 2 50cm Yes Yes Yes 0 No
1/2 OK 2 Unknown Yes Yes Yes 0 No   <-- Cable with corrupted EEPROM
2/1 OK 1 100cm Yes Yes Yes 1 No
2/2 OK 1 50cm Yes Yes Yes 1 No
Image Version Mismatch
• If the switches are in version mismatch state, they will not stack
• Debugging:
• Check if the switch version matches the active using show version command
• If they do not match, upgrade the switch to the Active’s version
Switch# show switch

Switch# Role Mac Address Priority Version State


---------------------------------------------------------------------------
*1 Active 6400.f125.1480 1 V01 Ready
2 Member 6400.f125.2680 1 0 V-Mismatch
3 Member 6400.f125.2500 1 0 V-Mismatch
4 Member 6400.f125.2480 1 0 V-Mismatch
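When a stack has many members, the switches that need an upgrade can be picked out of the "show switch" output by their State column. This is a minimal Python sketch that assumes the output layout shown above, with the state as the last field on each row; the function name `mismatched_switches` is illustrative.

```python
def mismatched_switches(show_switch_output):
    """Return the switch numbers whose State column reads V-Mismatch."""
    numbers = []
    for line in show_switch_output.splitlines():
        fields = line.split()
        if fields and fields[-1] == "V-Mismatch":
            # The first field is the switch number; the active switch
            # carries a leading '*', which is stripped here.
            numbers.append(int(fields[0].lstrip("*")))
    return numbers


if __name__ == "__main__":
    sample = """\
*1 Active 6400.f125.1480 1 V01 Ready
2 Member 6400.f125.2680 1 0 V-Mismatch
3 Member 6400.f125.2500 1 0 V-Mismatch
4 Member 6400.f125.2480 1 0 V-Mismatch
"""
    print("upgrade needed on switches:", mismatched_switches(sample))
```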
Switch Stuck in Syncing State
• If a switch is stuck in syncing state:
• Debugging:
 - Run the command “sh switch” to see the states
 - Get the ng3k-ses-oir “trace buffer” using: Switch# show mgmt-infra trace messages
 - Run the Diag OnDemand test for the Stack cable: diagnostic start switch 1 test 7
 - Open a Service Request with Cisco TAC

Switch# show switch


Switch# Role Mac Address Priority Version State
---------------------------------------------------------------------------
*1 Active 6400.f125.1480 1 V01 Ready
2 Standby 6400.f125.2680 1 V01 Ready
3 Member 6400.f125.2500 1 V01 Ready
4 Member 6400.f125.2480 1 0 Syncing
HA Redundancy – Shift from 3750-X
Catalyst 3750-X – StackWise-Plus
- Hybrid control-plane processing
- N:1 stateless control-plane redundancy
- Distributed L2/L3 Forwarding Redundancy
- Stateless L3 protocol Redundancy

Catalyst 3850 – StackWise-480


- Centralized control-plane processing
- 1+1 Stateful redundancy (SSO)
- Distributed L2/L3 Forwarding Redundancy
- IOS HA Framework alignment for L3 protocol
HA SSO Architecture
[Figure: Active (A) and Standby (S) switches, each with Interfaces, L2, L3, QoS and Wireless feature blocks]
• Feature State is synced between Active and Standby Member in stack
• Feature States are inactive on Standby Member
HA – Roles and Definitions
• Route Processor Domain – a set of SW processes (e.g. IOSd, WCM) that
implement the centralized Active and Standby portions of the stack control plane
• Line Card Domain – a set of SW processes (e.g. FED, Platform Manager) that
implement the distributed Line Card portions of the stack control plane
• Infra Domain – Support SW for the RP and LC Domains
• Active Switch – supports the Active RP Domain, a LC Domain and Infra Domain
• Standby Switch – supports the Standby RP Domain, a LC Domain and Infra
Domain
• Member Switch – supports a LC Domain and Infra Domain.
• Election – assigning roles or functions within the stack
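The role definitions above amount to a small lookup table from switch role to the software domains that run on it. A minimal Python sketch of that mapping follows; the dictionary and function names are hypothetical and exist only for this illustration.

```python
# Sketch: which software domains each stack role supports, per the
# definitions on this slide. Names here are illustrative only.
DOMAINS_BY_ROLE = {
    "Active": ("RP (active)", "LC", "Infra"),
    "Standby": ("RP (standby)", "LC", "Infra"),
    "Member": ("LC", "Infra"),
}


def runs_route_processor(role):
    """True when the role hosts an RP Domain (Active or Standby)."""
    return any(domain.startswith("RP") for domain in DOMAINS_BY_ROLE[role])


def stack_domains(roles):
    """Map switch number to the domains it runs, given switch number to role."""
    return {switch: DOMAINS_BY_ROLE[role] for switch, role in roles.items()}


if __name__ == "__main__":
    stack = {1: "Active", 2: "Standby", 3: "Member", 4: "Member"}
    for switch, domains in sorted(stack_domains(stack).items()):
        print(switch, ", ".join(domains))
```

Every switch runs the LC and Infra domains; only the Active and Standby add an RP Domain, which is why only those two take part in the SSO state sync.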
Catalyst 3x50 – HA State Machine
• Active starts RP Domain (IOSd, WCM, etc) locally
• Programs hardware on all LC Domains
• Traffic resumes once hardware is programmed
• Starts 2min Timer to elect Standby in parallel
• Active elects Standby
• Standby starts RP Domain locally
• Starts Bulk Sync with Active RP
• Standby reaches “Standby Hot”
[Figure: four-switch stack with a 2min timer; the Active (A) runs the LC, RP and Infra domains, the Standby (S) runs RP, LC and Infra, and the two remaining members run LC and Infra]
Show switch with SSO
Switch# show switch
Switch/Stack Mac Address : 2037.06cf.0e80
                                            H/W      Current
Switch#  Role     Mac Address     Priority  Version  State
------------------------------------------------------------
*1       Active   2037.06cf.0e80  10        PP       Ready
 2       Standby  2037.06cf.3380  8         PP       Ready
 3       Member   2037.06cf.1400  6         PP       Ready
 4       Member   2037.06cf.3000  4         PP       Ready

* Indicates which member is providing the “stack Identity” (aka “stack MAC”)
Show redundancy states
Related commands: show redundancy states, show redundancy history, show redundancy
switchover history, show redundancy

Switch# show redundancy states
my state = 13 -ACTIVE
peer state = 8 -STANDBY HOT
Mode = Duplex
Unit ID = 2

Redundancy Mode (Operational) = SSO
Redundancy Mode (Configured) = SSO
Redundancy State = SSO
Manual Swact = enabled

Communications = Up

client count = 76
client_notification_TMR = 360000 milliseconds
keep_alive TMR = 9000 milliseconds
keep_alive count = 0
keep_alive threshold = 9
RF debug mask = 0

• peer state = 8 -STANDBY HOT: Terminal state for SSO. If “peer state” is stuck in any other
state for more than 10 minutes, open a service request with TAC
• Communications = Up: If Communication channel is not Up, there might be a problem with
stack connectivity. Check stack cable.
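The two checks called out on this slide (peer state should be STANDBY HOT, Communications should be Up) can be sketched as a small Python screen over the command output. The function names are hypothetical, and the sketch does not implement the 10-minute timing rule; it only inspects a single snapshot.

```python
# Sketch: screen one snapshot of "show redundancy states" for the two
# conditions highlighted on the slide. Helper names are illustrative.
SAMPLE = """\
my state = 13 -ACTIVE
peer state = 8 -STANDBY HOT
Mode = Duplex
Redundancy Mode (Operational) = SSO
Redundancy Mode (Configured) = SSO
Communications = Up
"""


def parse_key_values(text):
    """Split 'key = value' lines into a dict with lower-cased keys."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            fields[key.strip().lower()] = value.strip()
    return fields


def sso_health(fields):
    """Return a list of problems; an empty list means both checks passed."""
    problems = []
    peer = fields.get("peer state", "")
    if "STANDBY HOT" not in peer.upper():
        problems.append("peer state is %r, not STANDBY HOT" % peer)
    if fields.get("communications", "").lower() != "up":
        problems.append("communications not Up; check stack cabling")
    return problems


if __name__ == "__main__":
    issues = sso_health(parse_key_values(SAMPLE))
    print("SSO looks healthy" if not issues else "; ".join(issues))
```

To apply the 10-minute rule, a caller would need to take repeated snapshots and track how long the peer state stays away from STANDBY HOT.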
Configuration and Show commands

No new configuration added for Unicast Features on 3x50

Configuration, show commands compatible with 3750X

Additional Platform CLIs have been added

Refer to configuration guide and command line reference for full details
TCAM Concept on 3x50
• TCAM is used by several features
• A Hash Table Manager (HTM) provides configurable resources to features
so they can select specific banks or hashing mechanisms
• HTM provides the required abstraction layer to its users so that the details of
the TCAM HW are irrelevant to them
TCAM Utilization
Switch1# show platform tcam utilization asic all
CAM Utilization for ASIC# 0
Table                                       Max Values    Used Values
--------------------------------------------------------------------------
Unicast MAC addresses                       32768/512     82/22
Directly or indirectly connected routes     32768/8192    7/89
IGMP and Multicast groups                   8192/512      0/16
Security Access Control Entries             3072          173
QoS Access Control Entries                  2816          52
Netflow ACEs                                1024          15
Input Microflow policer ACEs                256           7
Output Microflow policer ACEs               256           7
Control Plane Entries                       512           187
Policy Based Routing ACEs                   1024          9
Tunnels                                     256           12
Input Security Associations                 256           4
Output Security Associations and Policies   256           9
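Raw used/max pairs are easier to compare as percentages. The Python sketch below computes utilization per table from a few of the rows above. It is only an illustration: the slide does not say what the two slash-separated values in a cell represent, so the code simply pairs them by position, and the function names are invented for this example.

```python
import re

# Sketch: percent utilization per table from "show platform tcam utilization".
SAMPLE = """\
Unicast MAC addresses 32768/512 82/22
Directly or indirectly connected routes 32768/8192 7/89
Security Access Control Entries 3072 173
QoS Access Control Entries 2816 52
"""

# Table name, then a max cell and a used cell, each "N" or "N/M".
ROW = re.compile(
    r"^(?P<name>.+?)\s+(?P<max>\d+(?:/\d+)?)\s+(?P<used>\d+(?:/\d+)?)\s*$"
)


def parse_tcam(text):
    """Return (name, [max values], [used values]) for each table row."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if not match:
            continue
        maxes = [int(v) for v in match.group("max").split("/")]
        used = [int(v) for v in match.group("used").split("/")]
        if len(maxes) == len(used):  # skip rows whose cells do not pair up
            rows.append((match.group("name"), maxes, used))
    return rows


def utilization(rows):
    """Map table name to a list of percent-used figures, one per value pair."""
    return {
        name: [round(100.0 * u / m, 2) for u, m in zip(used, maxes)]
        for name, maxes, used in rows
    }


if __name__ == "__main__":
    for name, percents in utilization(parse_tcam(SAMPLE)).items():
        print("%-42s %s" % (name, " / ".join("%.2f%%" % p for p in percents)))
```

A threshold check (for example, warn above some chosen percentage) could be layered on top; the slide itself does not state a threshold.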
TCAM - ACL client region
• The total number of VMR entries for an ACL should not exceed 1375 in the client
region. Once the VMR limit is reached there will be an ACL UNLOAD EVENT and all of
that client’s packets will be dropped.
• 3750: When the ACL limit is reached, packets are punted to the CPU and software
forwarded
• 3850: When the ACL limit is reached, the ACL is not downloaded and packets are dropped.
No software forwarding capability

000435: Sep 25 13:00:02: %ACL_ERRMSG-4-UNLOADED: 1 fed: Input IPv4 Group ACL on interface Client MAC 1822.34be.c1ca for label 10 on asic1 could not be programmed in hardware and traffic will be dropped.
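Because an unloaded ACL silently drops a client's traffic, it can help to pull the affected client out of the syslog automatically. The following Python sketch parses the message shown above. The pattern is derived from this single sample line, so other variants of the message may not match, and reading the number before "fed:" as the switch number is an assumption.

```python
import re

# Sketch: extract the affected client from an ACL UNLOADED syslog line.
LOG = (
    "000435: Sep 25 13:00:02: %ACL_ERRMSG-4-UNLOADED: 1 fed: Input IPv4 Group ACL on "
    "interface Client MAC 1822.34be.c1ca for label 10 on asic1 could not be programmed "
    "in hardware and traffic will be dropped."
)

PATTERN = re.compile(
    # "switch" is an assumed meaning for the number that precedes "fed:".
    r"%ACL_ERRMSG-4-UNLOADED:\s+(?P<switch>\d+)\s+fed:\s+"
    r"(?P<direction>Input|Output)\s+(?P<acl_type>\S+)\s+Group ACL on interface\s+"
    r"Client MAC\s+(?P<mac>[0-9a-fA-F]{4}\.[0-9a-fA-F]{4}\.[0-9a-fA-F]{4})\s+"
    r"for label\s+(?P<label>\d+)\s+on asic(?P<asic>\d+)"
)


def parse_unloaded(line):
    """Return the parsed fields as a dict, or None if the line does not match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    for key in ("switch", "label", "asic"):
        fields[key] = int(fields[key])
    return fields


if __name__ == "__main__":
    print(parse_unloaded(LOG))
```

The extracted client MAC can then be used to look up which ACL was being applied to that client.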
TCAM programming example – (interface Gi1/0/1)
We first obtain the iif-id of Gi1/0/1:
3850# sh platform port-asic ifm mappings local-port switch 1
Mappings Table
LPN  ASIC  Port  Interface  IIF-ID              Active
1    1     21    Gi1/0/1    0x010290000000007f  Y

Callouts: LPN is the Logical Port number; IIF-ID is the iif-id

3850# sh platform acl iifid 0x010290000000007f
########################################################
## LE INFO: (LETYPE: Group)
########################################################
LE: 17 (Client MAC 20bb.c021.a540) (ASIC255)
---------------
LE Type: Group
IIF ID: 0x107a840000003b3
Input IPv4 ACL: label 4 h/w 4 (read from h/w 4)
BO 0x164000000 [CGACL]: xACSACLx-IP-PERMIT_ALL_TRAFFIC-51ef7db1
Output IPv4 ACL: label 0 h/w 0 (Group LE and label are not linked)
Input IPv6 ACL: label 0 h/w 0 (Group LE and label are not linked)
Output IPv6 ACL: label 0 h/w 0 (Group LE and label are not linked)
Input MAC ACL: label 0 h/w 0 (Group LE and label are not linked)
Output MAC ACL: label 0 h/w 0 (Group LE and label are not linked)
---------------
IPv4 ACL: xACSACLx-IP-PERMIT_ALL_TRAFFIC-51ef7db1
aclinfo: 0x5fc9d0a0
ASIC255 Input Group labels: 4 5

Callouts:
• Client MAC address: 20bb.c021.a540
• Input group label = 4
• A sample dynamic ACL downloaded: xACSACLx-IP-PERMIT_ALL_TRAFFIC-51ef7db1
Commands to check TCAM Utilization
How to check IPV4 FIB/Route TCAM
3850-1# show platform ip route summary
IP Fib Summary
Total number of v4 fib entries = 36
Total number succeeded in hardware = 36

Mask-Len 0  :- Total-count 1  hw-installed count 1
Mask-Len 4  :- Total-count 1  hw-installed count 1
Mask-Len 8  :- Total-count 7  hw-installed count 7
Mask-Len 24 :- Total-count 3  hw-installed count 3
Mask-Len 32 :- Total-count 24 hw-installed count 24

Callout: Number of routes having mask length 32

3850-1# show platform ip route summary
IP Fib entries

vrf  dest               htm         flags
---  ----               ---         -----
0    0.0.0.0/32         0x131ceec0  0x3
0    43.255.255.255/32  0x1311b10c  0x3
0    43.43.43.1/32      0x1311b21c  0x3
0    43.43.43.2/32      0x13124ba4  0x3
0    43.0.0.0/8         0x1311b084  0x3

Callout: Check the HTM index for the corresponding ipv4 prefix. In this case for 43.43.43.2/32 prefix.

3850-1# show platform abstraction print-resource-handle 0x13124ba4 1
Handle:0x13124ba4 Res-Type:ASIC_RSC_HASH_TCAM Res-Switch-Num:0 Asic-Num:255 Feature-ID:AL_FID_L3_UNICAST_IPV4 Lkp-ftr-id:LKP_FEAT_IPV4_L3_UNICAST ref_count:1
Hardware Indices/Handles:priv_ri/priv_si Handle:(nil) handle0:0x5d46e77c handle1:0x5d46e3fc
Detailed Resource Information (ASIC# 0)
----------------------------------------
Number of HTM Entries: 1

Entry #0: (handle 0x5d46e77c)
KEY - vrf:0 mtr:0 prefix:43.43.43.2 rcp_redirect_index:0x0
MASK - vrf:4095 mtr:0 prefix:255.0.0.0 rcp_redirect_index:0x0
FWD-AD = afd_label_flag:0 icmp_redir_enable:1 priority:3 afdLabelOrDestClientId:0 SI:89 destined_to_us:0 hw_stats_idx:2 stats_id:0 redirectSetRouterMac:0

Callouts:
• Check to make sure it is the correct prefix 43.43.43.2
• Take a note of the station index which has the information for packet forwarding
• Destined to us = 0 means it is not destined to the switch

3850# sh platform port-asic rm 0 stationindex 89 switch 1
al_rsc_si
station_index = 0x75
rewriteIndex = 0x1
destIndex = 0x513c
stationTableGeneric Label = 0x0

Callouts:
• rewriteIndex: Tells you how Mac will be rewritten
• destIndex: Tells you where packet will be forwarded
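The lookup chain on this slide goes from prefix to HTM handle, from the handle to the FWD-AD line, and from its SI field to the station index command. The Python sketch below covers the last hop: it splits the FWD-AD line into fields and builds the follow-up command text. Function and parameter names are illustrative, and the "rm 0" argument is copied from the slide without its meaning being stated there.

```python
# Sketch: read the station index (SI) out of a FWD-AD line and build the
# follow-up command shown on the slide. Helper names are illustrative.
FWD_AD = (
    "FWD-AD = afd_label_flag:0 icmp_redir_enable:1 priority:3 "
    "afdLabelOrDestClientId:0 SI:89 destined_to_us:0 hw_stats_idx:2 "
    "stats_id:0 redirectSetRouterMac:0"
)


def parse_fwd_ad(line):
    """Turn 'FWD-AD = k:v k:v ...' into a dict of integer fields."""
    _, _, body = line.partition("=")
    fields = {}
    for token in body.split():
        key, sep, value = token.partition(":")
        if sep:
            # Base 0 accepts both decimal ("89") and hex ("0x59") values.
            fields[key] = int(value, 0)
    return fields


def next_lookup(fields, switch=1, rm_arg=0):
    """Build the station index command text; rm_arg is taken from the slide as-is."""
    return "sh platform port-asic rm %d stationindex %d switch %d" % (
        rm_arg, fields["SI"], switch)


if __name__ == "__main__":
    fields = parse_fwd_ad(FWD_AD)
    if fields["destined_to_us"] == 0:
        print("prefix is forwarded, not destined to the switch")
    print(next_lookup(fields))
```

Only the command string is produced; sending it to a switch is left to whatever CLI access method is already in place.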
QoS – What’s New with Converged Access

Wired
• Modular QoS based CLI (MQC)
• Alignment with 4500E series (Sup6, Sup7)
• Class-based Queuing, Policing, Shaping, Marking
• More Queues
• Up to 2P6Q3T queuing capabilities
• Standard 3750X provides 1P3Q3T
• Not limited to 2 queue-sets
• Flexible MQC Provisioning abstracts queuing hardware

Wireless
• Granular QoS control at the wireless edge
• Tunnel termination allows customers to provide QoS treatment per SSIDs, per-Clients
and common treatment of wired and wireless traffic throughout the network
• Enhanced Bandwidth Management
• Approximate Fair Drop (AFD) Bandwidth Management ensures fairness
at Client, SSID and Radio levels for NRT traffic
• Wireless Specific Interface Control
• Policing capabilities Per-SSID, Per-Client upstream and downstream
• AAA support for dynamic Client based QoS and Security policies
• Per SSID Bandwidth Management

Policy-map PER-PORT-POLICING
 Class VOIP
  set dscp ef
  police 128000 conform-action transmit exceed-action drop
 Class VIDEO
  set dscp CS4
  police 384000 conform-action transmit exceed-action drop
 Class SIGNALING
  set dscp cs3
  police 32000 conform-action transmit exceed-action drop
 Class TRANSACTIONAL-DATA
  set dscp af21
 Class class-default
  set dscp default
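The policy-map on this slide marks traffic with DSCP names (ef, cs4, cs3, af21, default). When comparing against packet captures it helps to have the numeric code points. The Python sketch below converts the names using the standard DiffServ definitions (CSx = 8x, AFxy = 8x + 2y, EF = 46); the function name and the POLICY dictionary are made up for this example, and the rates are the numbers written on the slide.

```python
# Sketch: numeric DSCP values for the markings used in PER-PORT-POLICING.
def dscp_value(name):
    """Convert a DSCP keyword (ef, csN, afXY, default) to its decimal value."""
    key = name.strip().lower()
    if key == "default":
        return 0
    if key == "ef":
        return 46
    if key.startswith("cs") and key[2:].isdigit() and 0 <= int(key[2:]) <= 7:
        return 8 * int(key[2:])
    if key.startswith("af") and len(key) == 4 and key[2] in "1234" and key[3] in "123":
        return 8 * int(key[2]) + 2 * int(key[3])
    raise ValueError("unknown DSCP keyword: %r" % name)


# Class name -> (DSCP keyword, police rate from the slide or None).
POLICY = {
    "VOIP": ("ef", 128000),
    "VIDEO": ("CS4", 384000),
    "SIGNALING": ("cs3", 32000),
    "TRANSACTIONAL-DATA": ("af21", None),
    "class-default": ("default", None),
}

if __name__ == "__main__":
    for class_name, (keyword, rate) in POLICY.items():
        print("%-20s dscp %-8s = %2d  police %s"
              % (class_name, keyword, dscp_value(keyword), rate))
```

The conversion is case-insensitive because the slide mixes "CS4" and "cs3".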
Platform QoS CLI
Switch# show platform qos policy target GigabitEthernet 1/0/48
Input policy :
--------------
Not attached
Output policy :
--------------
POLICY: defportangn Num Classes:1 PLC Map Targets:0 Queue LBL Targets uplink:0 downlink:0
PMAP:0x6345d778 NextPMAP:0x585d5518 PrevPMAP:0x57b02b98
UP Mask: 0, Lookup Type:0
COS Mask: 0, dscp mask:0
Filter flags: 0, Action Flags:0x14, num_classmaps 1 policy_type: MARKING/POLICING
nfl_req_pending_cnt:0 pmap_qsize:0
CLASS: class-default
CMAP:0x124b42a0 Next:(nil) Prev:(nil)
Masks:- UP:0, CoS: 0, Dscp:0
Filter flags 0
Not Supported
Negate: NO Next:(nil) . . . .

Switch# sh platform qos policy hw_state target GigabitEthernet 1/0/48


Input policy : Not attached
Output policy : defportangn
H/W programming State: INSTALLED IN HW
MLS QoS and MQC QoS Default behaviors
• 3750: with “mls qos” enabled at the global level, all ports are untrusted and the DSCP/precedence/CoS of
incoming packets is reset to 0
• 3750: “mls qos trust” is needed at the interface level to change the trust mode
• 3850: ports are trusted by default; DSCP/precedence/CoS values are retained
3750 MLS QoS vs. 3850 MQC QoS
                  3750                                      3850
Basic Structure   MLS                                       MQC
Global Config     Support mls qos                           No mls qos support
                  Support some of MQC at ingress            Support MQC [class-map, policy-map]
Interface Config  Support mls qos config and some of MQC    Attach the policy to the interface
                  cli at ingress
Port Ingress      Classification/Policing/Marking/Queuing   Classification/Policing/Marking
Port Egress       Queuing                                   Classification/Policing/Marking/Queuing
SVI Ingress       Classification/Policing/Marking           Classification/Marking
SVI Egress        None                                      Classification/Marking

https://techzone.cisco.com/t5/NGWC-Switches-3650-3850/3750-MLS-to-3850-MQC-Conversion-of-QoS-Configuration/ta-p/697153
Additional Troubleshooting Commands
Topic                                      Command
Platform specific Feature information      show tech-support platform <feature>
                                           (eg. wireless, acl, fnf, etc.)
Trace Buffers for non-IOSd processes       show mgmt-infra trace messages <component>
                                           (eg. fed-punject-detail, stack-mgr-events, etc.)
Generate Live Core of a Process            resource process dump <process id obtained from
(internal command)                         show process> [ switch <switch number> ]
Generate System Report (internal command)  resource create_system_report
Identify memory leaks                      show memory debug leaks detailed process
                                           <process name> summary
Core Dumps and System Reports
• System generates a fullcore, crashinfo and System Report when a process terminates
abnormally
• A System Report is generated each time the switch is rebooted
• System Report contains a dump of all the trace buffers in the system
• When filing a TAC case, please attach the fullcore, crashinfo and System Report files
(whatever is applicable) from the crashinfo: filesystem
Summary
• Provided a High Level Architectural overview of features on the 3x50
• Basic Troubleshooting functionality available on the 3x50

• Do you have a better understanding of:


• 3x50 as a platform
• Key differences between 3x50 and 3750X
• Basic troubleshooting on the 3x50

• Would you like to see:


• More/Less of any particular topic
• More topics
• Longer session
Thank you
