SBC - Troubleshooting Guide-21.8

Session Border Controller
Release 21.8
Troubleshooting Guide
3TB-30120-HSAA-TCZZA
Issue 4
©2021 Nokia. Nokia Confidential Information. Use subject to agreed restrictions on disclosure and use.
Nokia is committed to diversity and inclusion. We are continuously reviewing our customer documentation and consulting with standards bodies
to ensure that terminology is inclusive and aligned with the industry. Our future customer documentation will be updated accordingly.
This document includes Nokia proprietary and confidential information, which may not be distributed or disclosed to any third parties without the
prior written consent of Nokia.
This document is intended for use by Nokia's customers ("You"/"Your") in connection with a product purchased or licensed from any company
within Nokia Group of Companies. Use this document as agreed. You agree to notify Nokia of any errors you may find in this document; however,
should you elect to use this document for any purpose(s) for which it is not intended, You understand and warrant that any determinations You
may make or actions You may take will be based upon Your independent judgment and analysis of the content of this document.
Nokia reserves the right to make changes to this document without notice. At all times, the controlling version is the one available on Nokia’s site.
No part of this document may be modified.
NO WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY OF AVAILABILITY, ACCURACY,
RELIABILITY, TITLE, NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, IS MADE IN RELATION TO THE CONTENT
OF THIS DOCUMENT. IN NO EVENT WILL NOKIA BE LIABLE FOR ANY DAMAGES, INCLUDING BUT NOT LIMITED TO SPECIAL, DIRECT, INDIRECT,
INCIDENTAL OR CONSEQUENTIAL OR ANY LOSSES, SUCH AS BUT NOT LIMITED TO LOSS OF PROFIT, REVENUE, BUSINESS INTERRUPTION,
BUSINESS OPPORTUNITY OR DATA THAT MAY ARISE FROM THE USE OF THIS DOCUMENT OR THE INFORMATION IN IT, EVEN IN THE CASE OF
ERRORS IN OR OMISSIONS FROM THIS DOCUMENT OR ITS CONTENT.
Copyright and trademark: Nokia is a registered trademark of Nokia Corporation. Other product names mentioned in this document may be
trademarks of their respective owners.
©2021 Nokia.
Table of Contents
List of Figures
List of Tables
1 About this document
1.1 Reason for new issue
1.2 Intended audience
1.3 Used conventions
1.4 General information
2 Introduction to troubleshooting
2.1 Workflow to troubleshoot a problem on Nokia SBC
2.2 Troubleshooting guidelines
2.3 Things to do before escalating to level 4 support
2.3.1 Troubleshooting tools - logs, diagnostics and health checks
2.3.2 Collecting SBC logs using sbc_logs command
2.3.3 Collecting CBIS/Cloud platform audit logs
2.3.4 Collecting logs from CBAM VNF
2.3.5 Collecting VMM-HI logs on RMS
2.4 Performance check list on receiving outage calls for SBC media
3 Troubleshooting with Alarms
3.1 Determining alarm severity
3.2 Viewing active SBC alarms in Web UI
3.3 Viewing proposed repair actions for an active SBC alarm in Web UI
3.4 Viewing proposed repair actions for an active SBC alarm without using Web UI
3.5 Troubleshooting using FM process
3.6 Troubleshooting when unable to open Fault Management view in Web UI
3.7 Troubleshooting when no Alarms are present for Media/Signaling plane in Web UI
3.8 Troubleshooting using help provided for Media and Signaling Alarms in Web UI
3.9 Troubleshooting OAM internal alarms
3.10 Troubleshooting when no Alarms are sent to Northbound OSS
4 Troubleshooting connectivity/configuration issues
4.1 Troubleshooting SBC general connectivity or configuration issues
4.2 Troubleshooting connectivity issues (missing static routes)
4.3 Troubleshooting connectivity issues (other)
4.4 Troubleshooting connectivity issues of the RMS system
4.5 Troubleshooting RCC offline issue
4.6 Troubleshooting degraded configuration resource issue
4.7 Troubleshooting chk_db_repl FAILED and chk_database FAILED issue
4.8 Troubleshooting PFW EIPM error
4.9 Troubleshooting SBC signaling plane health check error due to rolling inits of the h248ds under bgc1-a and bgc1-b
4.10 Troubleshooting wrong media plane image issue in OpenStack
4.11 Troubleshooting SRIOV service issue on compute node
4.12 Troubleshooting SBC OAM service issues
4.13 Troubleshooting connectivity issues between SBC FW VM and remote S-GW/P-GW
4.14 Troubleshooting SBC CLI commands timeout issue
4.15 Troubleshooting partition table issue on RMS hosts
5 Troubleshooting SBC database problems
5.1 Troubleshooting 500 internal server error
6 Troubleshooting SBC software upgrade issues
6.1 Checking and understanding ansible.log
6.2 Checking and understanding InteractiveInstaller.sh.log
6.3 Troubleshooting VMM-HI upgrade
6.4 Troubleshooting SIM Based Software Upgrade (SBSU) issues
6.4.1 Troubleshooting CBAM GUI upgrade/backout failure when sim.log indicates stoppage at CP_WARNING VERIFY_SOFTWARE or COMMIT pause point
6.5 Troubleshooting Image Based Software Upgrade (IBSU)
6.5.1 Upgrade logs
6.5.2 Useful files
6.5.3 Stage debug and cleanup
6.5.4 Resume considerations (for advanced users only)
6.5.5 sbc_hc.py tool
6.5.6 Hardware slowdown
7 Troubleshooting SBC media plane issues
7.1 Logging in to media plane using CLI
7.2 Troubleshooting the media path problem
7.3 Troubleshooting the networking problem
7.3.1 Scenario-1
7.3.2 Scenario-2
7.4 Troubleshooting the application crash problem
7.5 Troubleshooting the H248 transaction failure problem
7.6 Troubleshooting backplane connection issues
7.6.1 Troubleshooting media backplane connection down issue
7.6.2 Checking backplane connection status
7.6.3 Checking TIPC connection status
7.6.4 Checking backplane connection switchover history
7.6.5 Checking MAC table used in datapath
7.6.6 Checking NPU MAC table
7.7 Troubleshooting media total network failure issue
7.8 Troubleshooting mute call issues
7.9 Troubleshooting absence of Service Change information in signaling log
7.10 Troubleshooting high CPU usage issue
7.11 Troubleshooting high CPU usage by 'rngd' service issue
8 Appendix
8.1 Example of Master.log analysis of a tablet to tablet video call
8.1.1 Example of Master.log analysis
8.2 SBC lab environment procedures
8.2.1 Identifying currently isolated cores and verifying whether they match with expected test configuration

List of Figures
Figure 1: Ansible.log field details
Figure 2: Traditional RMS with large PIM (13 core) configuration
Figure 3: Traditional RMS with small PIM (11 core) with SC2 VM with 'PF_mode_support: True' configuration
Figure 4: Stacked RMS signaling host configuration
Figure 5: Sample entries in sig_prep_config.yml file

List of Tables
Table 1: Conventions used
Table 2: Related Documentation

1 About this document

This guide provides procedures for troubleshooting problems you may encounter when operating the Nokia
Session Border Controller (Nokia SBC) and its related hardware and software. The Nokia SBC is also called
Converged SBC in its marketing collateral.
This guide provides procedures for the following:
• General troubleshooting workflow and guidelines
• Collecting logs and use of logs
• Using SBC alarm information for troubleshooting
• Troubleshooting connectivity or configuration errors
• Checking for database issues
• Detailed media plane troubleshooting tips and procedures
• Software upgrade troubleshooting
• Retrieval of alarms and conditions
• Troubleshooting the cause of an alarm or fault condition
• Clearing the alarm and fixing a known problem
1.1 Reason for new issue

This is the revision history of this document for Nokia SBC 21.8 release.
Issue No. | Updates
4 | Added Troubleshooting partition table issue on RMS hosts
3 | Added Troubleshooting CBAM GUI upgrade/backout failure when sim.log indicates stoppage at CP_WARNING VERIFY_SOFTWARE or COMMIT pause point under Troubleshooting SIM Based Software Upgrade (SBSU) issues
2 | Added Note (Attention) regarding usage of sbc_logs command with --level 2 in Collecting SBC logs using sbc_logs command

1.2 Intended audience

The SBC Troubleshooting Guide is intended for support engineers responsible for identifying and resolving SBC
issues.
Examples of types of issues are:
• Connectivity and interface issues.
• Configuration issues.
• Web UI startup and communication issues.
The personnel using this guide must have the following knowledge:
• Experience in configuring and managing telephony switching equipment
• A working knowledge of SBC
• Knowledge of TL1 command usage
• An understanding of protocols running on the system
• Experience in use of test equipment
1.3 Used conventions

The typographical conventions used in this document are described below.
Table 1: Conventions used
Appearance | Description
graphical user interface text | Text that is displayed in a graphical user interface or in a hardware label
Emphasis | Text that is emphasized
document titles | Titles of books or other documents
file or directory names | The names of files or directories
keyboard keys | The name of a key on the keyboard
command-syntax | Text used for commands
system output | Text that a system displays or prints
variable | A value or command-line parameter that the user provides
[] | Text or a value that is optional
{value1 | value2 } {variable1 | variable2 } | A choice of values or variables from which one value or variable is used
1.4 General information

The following table lists the documents belonging to the Nokia SBC documentation set:
Table 2: Related Documentation
Document Name | Document Ordering ID | Description
Nokia SBC 21.8 Guide to Documentation | 3TB-30118-HSAA-CEZZA | This document lists all the documents available for Nokia SBC and provides a brief description regarding each one of them, thus helping to guide the user to the appropriate document.
Nokia SBC 21.8 Product Description | 3TB-30101-HSAA-DEZZA | This document provides an overview of the Nokia SBC product, its functions, and the services offered by Nokia SBC.
Nokia SBC 21.8 Feature Handbook | 3TB-30102-HSAA-TCZZA | This document lists all the major features supported by Nokia SBC per release. It also provides the provisioning details of each of these features.
Nokia SBC 21.8 Operations for Integrated Configuration | 3TB-30103-HSAA-TCZZA | This document provides details of the OAM operations that can be performed using the Nokia SBC Web UI for integrated configuration. It also provides configuration details about the various SBC Web UI configuration tables.
Nokia SBC 21.8 Charging Interface Specification | 3TB-30104-HSAA-TCZZA | This document provides details of the charging and billing interface specifications for Nokia SBC.
Nokia SBC 21.8 Configuration Parameters Specification | 3TB-30105-HSAA-TCZZA | This document provides details on the use of the XML interface on the configuration database provided by the Nokia Control Platform, which can be used for performing Configuration Management either from XML interface or from the Provisioning GUI.
Nokia SBC 21.8 Software Licenses | 3TB-30106-HSAA-TCZZA | This document provides details of the software licenses for Nokia SBC.
Nokia SBC 21.8 Security Guide | 3TB-30107-HSAA-TCZZA | This document serves as a high-level description of security architecture of Nokia SBC in integrated configuration.
Nokia SBC 21.8 Hardware Description | 3TB-30108-HSAA-TCZZA | This document provides descriptive details of the RMS hardware required for Nokia SBC.
Nokia SBC 21.8 Hardware Configuration | 3TB-30109-HSAA-TCZZA | This document provides details of the RMS hardware configurations required for Nokia SBC.
Nokia SBC 21.8 System and Network Parameters Job Aid | 3TB-30110-HSAA-PRZZA | This document provides details of the system and network parameters for Nokia SBC in MS Excel format.
Nokia SBC 21.8 System Ports and Protocols Job Aid | 3TB-30111-HSAA-TCZZA | This document provides details of the system ports and protocols for Nokia SBC in MS Excel format.
Nokia SBC 21.8 A-SBC Rx Interface Specification | 3TB-30112-HSAA-PBZZA | This document provides the Rx diameter interface specification including the diameter messages/commands and Diameter Attribute Value Pairs applicable to Nokia A-SBC.
Nokia SBC 21.8 Key Performance Indicators | 3TB-30113-HSAA-TCZZA | This document lists the Key Performance Indicators (KPI) and Measurements available for Nokia SBC, which are critical indicators used for dimensioning purposes.
Nokia SBC 21.8 Privacy Considerations | 3TB-30114-HSAA-DWZZA | This document provides information on the Nokia SBC product features that impact privacy, and the measures taken to protect such data.
Nokia SBC 21.8 Commands Reference Guide | 3TB-30115-HSAA-TCZZA | This document provides the syntax and descriptions of all the CLI commands available for Nokia SBC.
Nokia SBC 21.8 Release Changes | 3TB-30116-HSAA-TCZZA | This document lists the changes to Ports, Protocols, System and N/W Parameters, PM Counts, Alarms, Charging Records and so on for SBC Release 21.8 w.r.t its previous main release.
Nokia SBC 21.8 Signaling Administration | 3TB-30117-HSAA-TCZZA | This document provides details of the various administrative operations that can be performed for the Nokia SBC by a user with administrative privileges.
Nokia SBC 21.8 SIP Interface Specification | 3TB-30121-HSAA-PBZZA | This document provides the SIP interface specification details for Nokia SBC.
Nokia SBC 21.8 Configuring SBC for MS Teams | 3TB-30122-HSAA-TCZZA | This document provides configuration steps that need to be performed at the Nokia SBC, to connect with the Microsoft Teams.
Nokia SBC 21.8 Connecting with Emergency Service Providers for MS Teams | 3TB-30123-HSAA-TCZZA | This document provides configuration procedures that need to be performed for the Nokia SBC, to connect with the Emergency Service Providers.
Nokia SBC 21.8 Data Flows | 3TB-30130-HSAA-EBZZA | This document provides data flow information for Nokia SBC, to identify interfaces with sensitive data and use of encryption.
Nokia SBC 21.8 Release Notes | | The release notes for this Nokia SBC release.

Legal notice
Nokia is a registered trademark of Nokia Corporation. Other products and company names mentioned herein
may be trademarks or tradenames of their respective owners.
Document support
For support in using this or any other Nokia (formerly Alcatel-Lucent) document, call one of the following
telephone numbers.
From United States
• If you are using a landline, a cellular phone or VoIP, dial this number: 1-888-582-3688
From other countries
• If you are using a cellular phone or VoIP, dial this number: +1-469-646-4025
• If you are using a landline (a phone without a plus [+] character), replace the plus sign with the exit code
for your country of origin and dial: <exit code> 1-469-646-4025. See the country-specific exit codes
listed at http://www.howtocallabroad.com/codes.html.
Technical support
For technical support, contact your local customer support team. See the Support web site
(https://networks.nokia.com/support/) for contact information.
How to order
To order Nokia documents, contact your local sales representative or use the Nokia Support portal.
How to comment
To comment on this document, go to the Online Comment Form or e-mail your comments to the Comments
Hotline (comments@nokia.com).

2 Introduction to troubleshooting
This chapter provides general information about the troubleshooting process, guidelines, and tools, along with
a workflow for troubleshooting a problem in Nokia SBC.
2.1 Workflow to troubleshoot a problem on Nokia SBC
General information
The troubleshooting process workflow identifies and resolves issues related to a service or component. The
issue can be an intermittent or a continuous degradation in service, or a complete network failure.
The first step in problem resolution is to identify the problem. Problem identification can include an alarm
received from a component, an analysis of performance data, or a customer problem report.
The personnel responsible for troubleshooting the problem must:
• understand the designed state and behavior of the network, and the services that use the network.
• recognize and identify symptoms that impact the intended function and performance of the product.
Overview of troubleshooting problem-solving model

An effective troubleshooting problem-solving model consists of the following tasks:
1. Establish a performance baseline
2. Categorize the problem
3. Identify the root cause of the problem
4. Plan corrective action and resolve the problem
5. Verify the solution to the problem
Establish a performance baseline

You must have a thorough knowledge of the network element and how it operates under normal conditions
to troubleshoot problems effectively. This knowledge facilitates the identification of fault conditions or
performance concerns. You must establish and maintain baseline information for your network and services.
The maintenance of the baseline information is critical because a network is not a static environment.
Categorize the problem

When you categorize a problem, you must differentiate between total failures and problems that result in a
degradation in performance. Performance degradations exhibit different symptoms from total failures and
may not generate alarms or significant network events. Multiple problems can simultaneously occur and create
related or unique symptoms.

Detailed information about the symptoms that are associated with the problem helps the Support engineer
diagnose and fix the problem. The following information can help you assess the scope of the problem:
• Alarms
• Error logs
• Performance statistics
• Accounting logs
• Customer problem reports
Use the following questions to help you categorize the problem:
• Is the problem intermittent or static?
• Is there a pattern associated with intermittent problems?
• Is there an alarm or network event that is associated with the problem?
• Is there congestion in the routers or network links?
• Has the configuration changed since the system last functioned properly?
Identify the root cause of the problem

A symptom for a problem can be the result of more than one issue. You can resolve multiple, related problems
by resolving the root cause of the problem. Use the following guidelines to help you implement a systematic
approach to resolve the root cause of the problem:
• Focus on the resolution of a specific problem.
• Divide the problem based on signaling or bearer planes and try to isolate the problem to one of these
planes.
• Determine the SBC state before the problem appeared.
• Infer the cause of the symptoms from the alarms and event logs. Try to reproduce the problem.
Plan corrective action and resolve the problem

The corrective action required to resolve a problem depends on the problem type. The problem severity and
associated QoS commitments affect the approach to resolving the problem. You must balance the risk of
creating further service interruptions against restoring service in the shortest possible time.
Carry out the corrective action as follows:
1. Document each step of the corrective action.
2. Test the corrective action.
3. Verify behavior changes in each step.
4. Apply the corrective action to the live network.

5. Test to verify that the corrective action resolved the problem.
Verify the solution to the problem

You must make sure that the corrective action associated with the resolution of the problem did not introduce
new symptoms. If new symptoms are detected, or if the problem has only recently been mitigated, you need to
repeat the troubleshooting process.
2.2 Troubleshooting guidelines
Checklist for identifying problems

When a problem is identified, track and store data to use for troubleshooting purposes:
• Determine the type of problem by reviewing the sequence of events before the problem occurred:
– Trace the actions that were performed to see where the problem occurred.
– Identify what changed before the problem occurred.
– Determine whether the problem happened before under similar conditions.
• Check the documentation or your procedural information to verify that the steps you performed followed
documented standards and procedures.
• Check the alarm log for any generated alarms that are related to the problem.
• Record any system-generated messages, such as error dialog boxes, for future troubleshooting.
• If you receive an error message, perform the actions recommended in the error dialog box, client GUI
dialog box, or event notification.
During troubleshooting:
• Keep both the Nokia documentation and procedures nearby.
• Check the appropriate release notice from the Nokia Support Documentation Service for any release-
specific problems, restrictions, or usage recommendations that relate to your problem.
• If you need help, confirmation, or advice, contact your technical support representative.
2.3 Things to do before escalating to level 4 support
Overview
View the SBC Support Checklist page and perform the checklist tasks provided there before contacting the SBC
support team.
Also, use the following resources to run health checks and collect logs as needed:

• Troubleshooting tools - logs, diagnostics and health checks
• Collecting SBC logs using sbc_logs command
2.3.1 Troubleshooting tools - logs, diagnostics and health checks
Diagnostics, audits, and logs

SBC supports a number of troubleshooting tools and event logs to help identify the root cause of a problem.
Diagnostics and health checks

The SBC system health can be checked by using the health command (which checks signaling health only) or
the sbc_health command (which checks both signaling and media health). These health checks are useful
and powerful tools for checking the overall health of the SBC. Use the -h option for a detailed help menu.
If no argument is provided, the health tool runs all the checks, including ssh_check, version check,
cpm check, db check, connectivity check, diskspace check, shmc check, hub check, REM state check, logical
volume check, VM state check, alarm check and so on.
Any issues or errors are printed to the screen. You can also check the errors in
/var/opt/log/health/health.log. If you are seeing a problem, running a health check is a good first step to determine
what the cause might be.
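As a quick way to review the health log after a run, the following is a minimal sketch that lists the lines flagged as failures or warnings. It assumes that problem lines contain the keyword FAILED or WARNING, as in the health check excerpts shown elsewhere in this guide. The function name summarize_health_log is hypothetical and is not part of the SBC software.

```shell
# Hypothetical helper, not part of the SBC software.
# Assumption: problem lines in the health log contain FAILED or WARNING.
summarize_health_log() {
    log_file="${1:-/var/opt/log/health/health.log}"
    if [ ! -r "$log_file" ]; then
        echo "cannot read $log_file" >&2
        return 2
    fi
    # -n prefixes each matching line with its line number in the log.
    grep -nE 'FAILED|WARNING' "$log_file"
}
```

Run summarize_health_log without arguments to scan the default log path, or pass another file as the first argument. An exit status of 1 means that no matching line was found.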
SBC log files

The following log files are available on SBC:
• /var/log/auth.log: All authentication-related log information. Each user login attempt is logged here.
• /var/log/bash.log: All signaling and media CLI command activities are logged here (media plane
logged based on log level, ERROR level by default).
• /var/log/syslog: All signaling and media syslog.
• /export/home/lss/logs/master.log: Signaling application log, call log.
• /export/home/lss/logs/media*.log: Media plane syslog, call log.
• /opt/sbc/CurrRel/logs: All SBC OAM logs.
For information on viewing log files, see the Procedure for viewing SBC log files section.
Procedure for viewing SBC log files

To view the list of SBC log files and examine their contents, SSH to the OAM VM and then
go to the log directory using the following procedure:

1. From your computer, SSH to the OAM server of the SBC and log in as the root
user using the root password.
$ ssh root@<OAM server IP address>
Password: <enter root password>
2. Navigate to the ~lss/logs directory to view the SBC log files.
User log files

The following user log files are available on SBC:
• /var/log/bash.log: All signaling and media CLI command activities are logged here (media plane
logged based on log level, ERROR level by default).
• /export/home/lss/logs/media*.log: Media plane syslog only, redirected by rsyslog. The log level
is set from the media plane; ERROR is the default level.
SBC log scrambling for ensuring data privacy

The scrambling and anonymizing of customer sensitive data (like equipment identifier, subscriber identity,
IP addresses, host names, and so on) is legally required to facilitate troubleshooting by teams from remote
locations. The Compliance Audit and Privacy System Vault (CAPS-V) is the official web-based tool for
privacy data scrambling, which is supported by CAPS. For more information, refer to the CAPS portal
(https://caps.americas.nsn-net.net/projects).
To support Call Detail Record (CDR) scrambling using CAPS-V, the CDR in ASN.1 binary format has to be
decoded into ASCII format first, by using asnccflSearch CLI command, as explained below:
1. Log in to standby CDR card and find the encoded ASN.1 CDR file (which is to be scrambled) from the
following directory:
/storage/ccfl_app/charging/stream1/primary
or
/storage/ccfl_app/charging/stream1/secondary
2. Execute the following command:
asnccflsearch <cdrfile.name>
where <cdrfile.name> is the name of ASN.1 CDR file.
3. The output file <cdrfile.name>.decoded, present in the /export/home/lss/ccfl_decode/ directory,
must be used as the input for CAPS-V.

2.3.2 Collecting SBC logs using sbc_logs command
Details
The following is a guideline for collecting various logs using the sbc_logs command before you contact Level
4 support. The sbc_logs command can be used to retrieve log files from both the signaling plane and the media
plane, and to collect status data as well as configuration data on the system.
The sbc_logs command (located at /opt/LSS/sbin/sbc_logs) can be executed from the standby MI
card.
Command Syntax
sbc_logs [ --help | --examples ] [ --level <n> ] [--run_health] [--only_sig | --only_media] [--only_pm] [--health_logs] [--callp_logs] [--option]
Parameters
--help, -h : Displays command usage.
--examples: Displays command usage, description, as well as examples.
--level, -l <levelnum>: Set log retrieval level where <levelnum> values can be:
• 1: Collects the current logs (default).
• 2: Collects current logs as well as historic log files.
• 3: Executes get_logs command to collect log files.
Attention:
The sbc_logs command with --level 2 should be only executed during a maintenance window.
However, if it is necessary to execute the sbc_logs command with --level 2 during normal
operating window, avoid executing this command during peak hours. Also continuously monitor CPU
usage on Media VMs during the course of execution of this command.
--run_health, -e: Execute health --test all
--only_sig, -s: Collects only the signaling plane logs.
--only_media, -m: Collects only the media plane logs.
--only_pm, -p: Collects only the PM data files from both the signaling plane as well as the media plane.
--health_logs, -d: Bundles and collects the lcp_status, health check and the alarm_cli outputs, tagged
with their version information.
--callp_logs, -c: Bundles and collects the master.logs, signalling.logs and the fslogs.
--option, -o: Choose specific logs which are to be collected.
--version, -v: Displays the sbc_logs version information.

Examples
To print the help information for sbc_logs command, execute:
sbc_logs --help
To collect the current logs from the active CNFG card, execute:
sbc_logs
Or
sbc_logs --level 1
To collect PM data of just the first 10 entries, in the last 11 files, execute:
sbc_logs --level 1 tail -n 11 | head -n 10
To collect the current as well as historic log files from the active CNFG card, execute:
sbc_logs --level 2
To execute get_logs command to retrieve information from both NODE_LOCAL and NODE_NETWORK hosts,
execute:
sbc_logs --level 3
To run health check by executing health --test field_install command initially and then subsequently
collect the signaling plane as well as media plane log files, execute following command.
sbc_logs --run_health
To collect the current as well as historic signaling plane log files, execute:
sbc_logs --level 2 --run_health --only_sig
To collect the current as well as historic media plane log files, execute:
sbc_logs --level 2 --run_health --only_media
To collect the PM count data files both on the signaling plane and the media plane, execute:
sbc_logs --level 2 --only_pm
To choose the specific log files (like master.log) to be collected in level 2, execute:
sbc_logs --level 2 --option
Note:
• If the system prompts "Are you sure you want to continue connecting (yes/no)?", enter yes and press ENTER.
• If a ^M text format issue occurs, retry after converting the script from Microsoft Windows style to
Linux style using the dos2unix Linux command.
• Press CTRL+C to terminate the execution of the sbc_logs command. Then use ps -ef | grep sbc_logs
to check if it is still running in the background. Use kill -9 <pid> to terminate the background program.

2.3.3 Collecting CBIS/Cloud platform audit logs

Whenever there are issues with SBC Virtual Network Function (VNF) on CBIS/Cloud platform, collect the
following standard list of command outputs from the platform to audit the SBC settings. This audit helps check,
for example, that no non-pinned VMs share a host with the SBC Virtual Machines (VMs) and that the NUMA settings are correct.
Verifying VM configuration
1. Log in to Undercloud server as a stack user with appropriate privileges.
2. Export Overcloud credentials by executing the command:
# source overcloudrc
3. Collect availability zones by executing the command:
# nova availability-zone-list > cmd_nova_zone_list
4. Collect aggregates list by executing the command:
# nova aggregate-list > cmd_nova_aggr_list
5. Create a file with openstack server show commands added in it for all SBC VMs, by executing the
single command:
# openstack server list --all | egrep -i '\| sbc|\| asbc|\| psbc' |
awk 'BEGIN{FS=" "; OFS=""} {print "openstack server show ",$2, " > cmd_op_server_show_",$4}' > cmd_gen_showAllSBCvms
Note: The above command assumes that SBC VMs prefixes are sbc, asbc, and/or psbc.
6. Display the generated file by executing the command:
# cat cmd_gen_showAllSBCvms
7. Verify that commands are generated for all the following SBC VMs:
• Signaling plane: OAM, SC, DFED, CFED, BGC, FW, and iCCF
• Media plane: SCM, PIM, and MCM
Note: If some SBC VMs are missing, the prefix assumption from the previous step is not correct. You
need to create another command with the proper prefixes.
8. Collect SBC VMs detailed information (like state, addresses, flavor and so on) by executing the
commands:
# chmod 755 cmd_gen_showAllSBCvms
# ./cmd_gen_showAllSBCvms

9. Execute the command:
# ls -lrth cmd_op_server_show*
Confirm that the displayed servers' data is written to the files, by confirming that the size of each
collected file is greater than zero (0) Kb.
10. Collect the details of all volumes (like size, what VM it is attached to, and so on) by executing the
command:
# openstack volume list --all > cmd_op_volume_list_all
11. Collect the details of all flavors (like ram, vcpu, swap, disk and so on) by executing the command:
# openstack flavor list --all --long > cmd_op_flavor_list_all
12. Collect the details about all the compute host information by executing the single command:
# openstack server list --all --long -c ID -c Name -c "Host" -c "Networks" -c "Availability Zone" > cmd_op_server_list_all
Verifying NUMA and CPU pinning configuration data

1. Log in to Undercloud server as a stack user with appropriate privileges.
2. Collect the instance CPU configuration and NUMA nodes information, by executing the command:
# salt '*ompute*' cmd.run 'lscpu' > cmd_lscpu
3. Collect details about the physical CPUs (pCPUs) that instance vCPUs can use by executing the command:
# salt '*ompute*' cmd.run 'cat /etc/nova/nova.conf | grep vcpu_pin_set | grep -v "#"' > cmd_grep_nova
4. Collect the list of all instances in the compute nodes by executing the command:
# salt '*ompute*' cmd.run "virsh list | awk '{print \$2}' | grep inst | xargs > /tmp/vlist"
5. Collect CPU pinning information for all instances in the compute nodes by executing the command:
# salt '*ompute*' cmd.run 'for i in `cat /tmp/vlist`; do echo $i; virsh vcpupin $i ; done; rm -rf /tmp/vlist' > cmd_vcpupin
6. Confirm that the size of each collected file is greater than zero (0) Kb. Execute the command:
# ls -lrth cmd_*
Note: The list of files which should be collected by above procedure is:

• For VM configuration:
– cmd_nova_zone_list
– cmd_nova_aggr_list
– cmd_gen_showAllSBCvms
– cmd_op_server_show_*
– cmd_op_volume_list_all
– cmd_op_flavor_list_all
– cmd_op_server_list_all
• For NUMA configuration:
– cmd_lscpu
– cmd_grep_nova
– cmd_vcpupin
• CONFIG.TXT file (from active SCM at /opt/v7510/data/ directory)
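The size check on the collected files can also be scripted. The following is a minimal sketch, assuming that all cmd_* files were written to a single directory. The function name check_collected_files is hypothetical, and the CONFIG.TXT file is not covered because it is collected separately from the active SCM.

```shell
# Hypothetical helper, not part of the SBC or CBIS tooling.
# Reports every expected output file that is missing or has zero size.
check_collected_files() {
    dir="${1:-.}"
    rc=0
    for name in cmd_nova_zone_list cmd_nova_aggr_list cmd_gen_showAllSBCvms \
                cmd_op_volume_list_all cmd_op_flavor_list_all \
                cmd_op_server_list_all cmd_lscpu cmd_grep_nova cmd_vcpupin; do
        if [ ! -s "$dir/$name" ]; then
            echo "missing or empty: $name"
            rc=1
        fi
    done
    # One cmd_op_server_show_<name> file is expected per SBC VM.
    found=0
    for f in "$dir"/cmd_op_server_show_*; do
        [ -e "$f" ] || continue
        found=1
        if [ ! -s "$f" ]; then
            echo "missing or empty: ${f##*/}"
            rc=1
        fi
    done
    if [ "$found" -eq 0 ]; then
        echo "missing or empty: cmd_op_server_show_*"
        rc=1
    fi
    return "$rc"
}
```

Call check_collected_files with the directory that holds the outputs. The function prints nothing and returns 0 when every expected file is present and non-empty.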
2.3.4 Collecting logs from CBAM VNF

Whenever there are issues with SBC Virtual Network Function (VNF) on CloudBand Application Manager (CBAM)
cloud platform, collect the following standard list of logs, before you contact Level 4 support.
1. Log in to the CBAM server.
2. Change directory by executing following command:
cd /home/cbam
3. Download getLogs tool by executing following command:
wget https://repo.lab.pl.alcatel-lucent.com/artifactory/sbc-generic-releases/3rd/mgw/cbam_tools/getLogs/1/getLogs
4. Add execute permission by executing following command:
chmod +x getLogs

5. Execute the getLogs command to collect logs from the specified VNF:
./getLogs <output_name> <VNF_id>
Parameters:
• -h:
Shows the help message.
• -v:
Shows the version.
• <output_name> <VNF_id>:
Collects logs from <VNF_id> (format: CBAM-xxxx...) and saves the zip file to ./<output_name>.zip
• -q <output_name> <VNF_id>:
Quiet mode. Shows less information.
2.3.5 Collecting VMM-HI logs on RMS

Whenever there are issues with VMM-HI platform or SBC on RMS server, collect the following standard list of
logs on RMS with VMM-HI platform, before you contact Level 4 support.
Execute getlogs command (/opt/vcp/sbin/getlogs) as a root user.
Parameters:
• -h, --help:
Shows the help message and exits.
• -t HOST, --target=HOST:
When this parameter is used, getlogs command creates logs from the specified host(s) only. Possible
values are:
[host01,host02,host03]|all
When the -t HOST, --target=HOST parameter is not used, the getlogs command is
executed for all hosts by default.
Note:
• The system takes a few minutes to create the logs. The logs are stored at /storage/logs/
<site_name>_hosts_<date>.zip.
• The main log zip file contains one or more zip files, with logs from specified hosts, and logs from
OAM VM (if it exists).

2.4 Performance check list on receiving outage calls for SBC media
Overview
Level 4 support personnel need to answer the following questions:
• What was the call about?
• What logs/traces were requested?
• What recovery actions were performed?
• What was the follow up performed after the call?
What was the call about?

• What is the exact issue?
• When did this issue happen?
• Has traffic already been switched to another SBC (if one is available)?
• Did the case ever work before?
• Was any operation, such as an upgrade or cut-over, executed on any element (not only the SBC) before this issue
happened?
• Has any trace been collected?
• Has a preliminary analysis been performed on the collected trace?
What logs/traces were requested?

On CBAM/CBIS:
• Is the status of the VNFs and instances normal?
On Media CLI: Were the following commands executed?
• view node (if all nodes are up)
• view date
• view uptime all (if any node was rebooted recently)
• view version all
• view redundancy
• view vmg status (if all vmgs are in registered status)
• view resource usage (if there is free MCP and PIP resource)
• view overload status (if any node is in overload status)
• view alarm active

• view alarm history
• view ip if
• view nh-ip-monitor session
• view h248 statistics current all-vmg
• view h248 statistics history all-vmg
• view traffic statistics current global
• view traffic statistics history global
• view admiss mcm statistics current
• view admiss mcm statistics history
• diag view rmgr cache filled mcp
• diag define admiss mcm rt en 200
• ping (ping peer side or next-hop)
On VM: Were the following commands executed?
• tipc-config -l
• ifconfig
• top
• pgrep mpu
• pidstat -p $(pgrep mpu) -t 1 (on MCM VM)
• mpstat -P ALL 1 (on MCM VM)
Bz2 files, CLI logs, message files, syslog files, MEGACO and RTP
What recovery actions were performed?

Were the following actions performed?
• Reboot module on media CLI
• Nova restart instances on CBIS
• Shut off/start instance on dashboard
• Re-deploy
What was the follow up performed after the call?

• Subsequent log
• Reproduction

• Emergency patch
• Official release
• Re-deployment

3 Troubleshooting with Alarms

This chapter provides procedures for obtaining alarm information on SBC and provides references to trouble
analysis practices for determining the cause and correction of the problems.
3.1 Determining alarm severity
Overview
SBC alarms are of the following five severities:
• Critical alarms - Critical alarms are used to indicate that a severe, service-affecting condition has
occurred and that immediate corrective action is imperative, regardless of the time of day or day of the
week.
• Major alarms - Major alarms are used for conditions that indicate a serious disruption of service or the
malfunctioning or failure of important functions. These troubles require the immediate attention and
response of a crafts person to restore or maintain system capability. The urgency is less than in critical
situations because of a lesser immediate effect or impending effect on service or system performance.
• Minor alarms - Minor alarms are used for troubles that do not have a serious effect on service to
customers or for troubles in functions that are not essential to NE operation.
• Warning alarms - Warning alarms indicate the detection of a potential or impending service affecting
fault, before any significant effects are felt. Action should be taken to further diagnose (if necessary) and
correct the problem in order to prevent it from becoming a more serious service affecting fault.
• Indeterminate alarms - Indeterminate alarms are alarms whose severity level could not be determined.
3.2 Viewing active SBC alarms in Web UI

1. Log in to the SBC Web UI.
2. Go to the Fault Management screen to get a listing of all active alarms. To navigate to Fault Management
screen, do one of the following:
• Either click on Fault Management after logging in to SBC Web UI.
• Or, click Dashboard, then click on alarms chart displayed for alarms of any Managed Elements in
Alarm Info Overview. Click Yes in View Alarm Confirmation pop-up screen.
• Or, click menu icon and select Fault Management.
Expected outcome
The application displays a tabular listing of all active alarms in Fault Management screen.

3. Double click on a listed alarm, to view its detailed information.
Expected outcome
The details of the alarm get displayed in View Alarm pop-up screen.
3.3 Viewing proposed repair actions for an active SBC alarm in Web UI
1. To view the details of a currently active alarm of a managed element, perform all the steps of the Viewing
active SBC alarms in Web UI procedure.
2. Navigate to Custom Fields tab.
Expected outcome
The proposed repair actions for this alarm are displayed under Proposed Repair Actions section.
3.4 Viewing proposed repair actions for an active SBC alarm without using Web UI
1. Determine the alarm name for the specific alarm which is being analyzed.
2. Search for (or look up) the alarm name in the SBC Alarms spreadsheet.
The SBC Alarms spreadsheet is available in Microsoft ® Excel format, on the Nokia Support portal. To
download the Excel file, log in to the Nokia Support portal and navigate to the Session Border Control
(SBC) page and then select Documentation: Doc Center. Search using keyword (Alarm) and locate the
Alarms spreadsheet for the appropriate release of the SBC application. This spreadsheet lists the SBC
alarms along with their associated attributes.
Note:
Specific alarms can also be located by searching the entire spreadsheet using the Additional text
field of the alarm or by using the Specific Problem field of the alarm.
3. The specific alarm's repair actions will be listed in the spreadsheet column labeled Action/trouble
resolution steps to be taken by maintenance personal.
Note:
You can also view other attributes of the alarm in the other columns of the spreadsheet.

3.5 Troubleshooting using FM process

1. Check FM process status by executing following commands:
cd /opt/sbc/CurrRel/apps/fault-mgt
./runFM.sh monitor
2. Stop FM process by executing following commands:
./runFM.sh stop
Note:
Once FM is stopped, it will be restarted after 30 secs automatically, irrespective of whether step
3 is performed or not.
3. Start FM process by executing following commands:
./runFM.sh start
4. The FM log location is:
/opt/sbc/CurrRel/logs/fault-mgt*
5. By default, the FM log level is INFO. The user can change it to DEBUG for issue debugging if necessary as
shown below.
cd /opt/sbc/CurrRel/apps/fault-mgt/config
vi logback.xml
change INFO to DEBUG
Save the modification. It will take effect after 30s or you can restart FM process to make it take effect
immediately.

3.6 Troubleshooting when unable to open Fault Management view in Web UI

1. Log in to the OAM server and check FM process status.
2. If FM process cannot be started, check whether:
• Port 8001 is available: This port is used for FM northbound interface.
• Port 162 is available: This port is used to receive alarms forwarded by MITrapreceiver.
3. If Port 8001 or Port 162 is unavailable, check which process is using it by executing the commands
below (the FM process requires both of these ports):
ps -ef | grep DProcess
lsof -i:<port number>
3.7 Troubleshooting when no Alarms are present for Media/Signaling plane in Web UI
1. The absence of alarms may be normal if no alarm has been sent from the media or signaling plane. So first check
whether alarms were actually sent from the media or signaling plane. You can check the alarms DB on the media or signaling
plane to confirm this.
2. Check that there are no manual configuration changes for:
• SNMP setting between MITrapreceiver and media or signaling plane
• SNMP setting between MITrapreceiver and FM process (/opt/LSS/share/xml/TrapForward2_sbc.xml)
If there are such changes, restore the default values.
3. If the issue still exists, use the command below to capture the PCAP files. Then check the PCAP file
with Wireshark and make sure:
• SNMP traps are received for real-time alarms,
• Both SNMP get/get-next request and response are successfully sent and received for alarm re-
synchronization.
tcpdump -ni any udp -w yourfilename.pcap
Pay attention to the IP address, port, and community string. It is better to run the command for
more than 9 minutes, because alarm re-synchronization runs every 9 minutes by default.
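To make sure the capture spans at least one re-synchronization cycle, the tcpdump command can be wrapped in a time limit. The following is a minimal sketch that uses the coreutils timeout command with a default of 600 seconds (10 minutes). The function name capture_snmp_traffic and the DRY_RUN variable are hypothetical, and running the actual capture normally requires root privileges.

```shell
# Hypothetical helper, not part of the SBC software.
# Runs the capture command from this procedure for a fixed duration.
capture_snmp_traffic() {
    out_file="${1:-yourfilename.pcap}"
    duration="${2:-600}"   # seconds; 600 s exceeds the 9-minute resync interval
    # Build the command once so that dry-run and real execution cannot drift apart.
    set -- timeout "$duration" tcpdump -ni any udp -w "$out_file"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"          # print the command instead of running it
    else
        "$@"
    fi
}
```

For example, capture_snmp_traffic alarms.pcap 600 captures for 10 minutes. When the time limit expires, timeout stops tcpdump and returns exit status 124, which is expected here.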

3.8 Troubleshooting using help provided for Media and Signaling Alarms in Web UI
Double-click on an alarm listed in the Fault Management screen and then navigate to Alarm Help tab.
Expected outcome
The alarm's help information is displayed.
3.9 Troubleshooting OAM internal alarms
Description
The troubleshooting for this is the same as that described in Troubleshooting using help provided for Media and
Signaling Alarms in Web UI.
Currently the following 3 types of internal alarms are supported:
• Alarm resynchronization failure: Refer to steps 2 and 3 in the Troubleshooting when no Alarms are
present for Media/Signaling plane in Web UI section.
• Trap burst: Too many incoming traps, as a result of which the trap mapping internal queue size exceeds
the threshold.
• Alarm Queue Overflow: Too many alarms need to be handled, and the internal alarm system event queue size
exceeds the threshold.
Help for HP iLO alarms

This capability is only necessary for customers not employing a separate EMS. By default, this capability is
disabled. No help information is provided beyond the SNMP trap content.
3.10 Troubleshooting when no Alarms are sent to Northbound OSS

1. Log in to SBC Web UI and navigate to Fault Management.
2. Select Settings icon.
Expected outcome
The Configurations screen is displayed.
3. Check the OAM Configuration and OSS Configuration in the Configurations screen. Ensure that the SNMP
settings are correct for both OAM and OSS.

4. If the issue still exists, use the command below to capture the PCAP files. Then analyze the PCAP file with
Wireshark and make sure:
• SNMP alarms are sent to OSS IP/Port successfully.
• SNMP alarms are received by OSS successfully.
tcpdump -ni any udp -w yourfilename.pcap

4 Troubleshooting connectivity/configuration issues

This chapter describes the troubleshooting procedures for resolving connectivity or configuration issues.
4.1 Troubleshooting SBC general connectivity or configuration issues
Commands to verify operational and administrative states of service components

To verify the operational and administrative states of service components use the following commands:
• health --test connectivity to check connectivity issues.
• health --test config to check configuration issues.
• Alternatively, run health -a to check for issues in all components.
4.2 Troubleshooting connectivity issues (missing static routes)
Steps for troubleshooting connectivity issues (missing static routes)

You typically need static routes for all IP addresses in subnets other than the published (non-trusted signaling)
subnet unless SBPR (source based policy routing) is turned on. Without the static routes, traffic to those IP
addresses may be routed through the FEPH and will fail. This can show up in several different ways including:
• Calls fail because the external DNS servers are unreachable and core element FQDNs can't be resolved.
• Calls fail because core elements are unreachable.
To determine if you have problems with things not being routed correctly, you can check the master.log. Do a
grep in the master.log searching for KFEPH. For example:
<ibc02-s00c03h0:lss>/export/home/lss/logs:
>grep KFEPH master.log
FP_KRNL:KFEPH: eth0 ipd_prt=16 vlan0=800, vlan1=6 protocol=6 saddr=10.223.8.131 daddr=10.10.120.168, sport=57907, dport=3868
In the above example the sending address (saddr) is 10.223.8.131 which is an IP address from the trusted
signaling subnet, and the destination address (daddr) is 10.10.120.168 which is a remote address used for
the E2 diameter connection to the CLF. The presence of the KFEPH report indicates that the route from our
trusted signaling subnet to the CLF is going through the FEPH untrusted signaling when it should not be. A
static route needs to be set up for this destination so that it goes through the trusted signaling subnet instead
of the default, untrusted signaling subnet.

The full report of the log shows the following. The last line starting with LOG_13 is saying that FEPH does not
have a trusted flow for this connection.
+++ 2011/08/02 16:40:33.399 FEPH HIGH ACTIVE feph:4191 E:3453368 S:271
(FPukMsgHdlr.cpp 66 M-0:1:4 24.15.01.00:1309213795 lssbld 169.254.168.0)
FP_KRNL:KFEPH: eth0 ipd_prt=16 vlan0=800, vlan1=6 protocol=6 saddr=10.223.8.131 daddr=10.10.120.168 sport=57907 dport=3868
LOG_13: PKT: New outbound flow BEPH class or instance lookup failure
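When master.log contains many KFEPH reports, it can help to summarize which source and destination pairs are affected, since each distinct destination is a candidate for a static route. The following is a minimal sketch, assuming that each KFEPH report is written on a single line with IPv4 saddr= and daddr= fields as in the example above. The function name list_kfeph_flows is hypothetical.

```shell
# Hypothetical helper, not part of the SBC software.
# Assumption: each KFEPH report is a single line with saddr= and daddr= fields.
list_kfeph_flows() {
    log_file="${1:-/export/home/lss/logs/master.log}"
    grep 'KFEPH' "$log_file" |
        # Keep only the source and destination addresses of each report.
        sed -n 's/.*saddr=\([0-9.]*\).*daddr=\([0-9.]*\).*/\1 \2/p' |
        sort | uniq -c |
        # Normalize the uniq -c padding: "<count> <saddr> <daddr>".
        awk '{print $1, $2, $3}'
}
```

Each output line shows how many times a given saddr/daddr pair was reported, for example 2 10.223.8.131 10.10.120.168 for two reports of the flow shown above.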
4.3 Troubleshooting connectivity issues (other)
UE not allowed to be in FEPH (access signaling subnet)

For the UE to reach the P-CSCF published IP address, it must be in a subnet that has a route to the access
signaling subnet, but it cannot reside directly in the FEPH subnet. This can show up in trial situations where
there is a desire to save IP addresses/subnets by placing the UE in the FEPH subnet.
Connectivity Issues (Ping the P-CSCF published IP address, not the FEPH published IP address)
After installing a system, it is a good health check to verify that IP addresses are externally pingable. However, the FEPH
published IP address is not pingable; only the P-CSCF published IP address is.
Check the /export/home/lss/logs/sig_firewall.log file for messages about dropped packets
If you are seeing connectivity problems on the access signaling subnet (for example, you cannot ping the P-CSCF published IP
address, or SIP messages cannot get to or from the P-CSCF), check the sig_firewall.log file for details about
dropped packets.
4.4 Troubleshooting connectivity issues of the RMS system
About this task

If the RMS system experiences network connectivity issues or unexplained timeouts when
accessing VMs or hosts, follow the troubleshooting instructions here to identify the causes and fix the issues.
The possible causes include but are not limited to the following:
• The cables connecting NIC cards are not working.
• NIC cards are broken.
• Network configuration is incorrect.

1. Log in to the RMS host as the root user.
2. Check the status of interfaces by entering:

ip address show
Expected outcome
In the output of the command, you can get the status of each interface (UP, DOWN, NO-CARRIER).
• UP: This is the correct status, which means that the interface is working.
• DOWN: This status indicates that the configuration on either host or switch side is incorrect.
• NO-CARRIER: This status indicates that the cable is disconnected.
The following is an example output:
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group

default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master br1 state
UP group default qlen 10000
link/ether 94:57:a5:6d:ce:dc brd ff:ff:ff:ff:ff:ff
inet6 fe80::9657:a5ff:fe6d:cedc/64 scope link
3: eno2: <BROADCAST,MULTICAST,DOWN> mtu 1500 qdisc mq master br2 state UP
group default qlen 10000
link/ether 94:57:a5:6d:ce:dd brd ff:ff:ff:ff:ff:ff
inet6 fe80::9657:a5ff:fe6d:cedd/64 scope link
4: eno3: <BROADCAST,MULTICAST,SLAVE,NO-CARRIER> mtu 1500 qdisc mq master
bond3 state UP group default qlen 10000
link/ether 94:57:a5:6d:ce:de brd ff:ff:ff:ff:ff:ff
…
2789: vnet23: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast
master br990 state UNKNOWN group default qlen 10000
link/ether fe:84:60:93:01:41 brd ff:ff:ff:ff:ff:ff
inet6 fe80::fc84:60ff:fe93:141/64 scope link

3. Check the detailed information of an interface for a NIC card by entering:

ethtool <NIC>
For example:
ethtool eno1
Expected outcome
Settings for eno1:

Supported ports: [ TP ]
Supported link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
Supported pause frame use: No
Supports auto-negotiation: Yes
Supported FEC modes: Not reported
Advertised link modes: 10baseT/Half 10baseT/Full
Advertised pause frame use: Symmetric
Advertised auto-negotiation: Yes
Advertised FEC modes: Not reported
Link partner advertised link modes: 10baseT/Half 10baseT/Full
1000baseT/Full
Link partner advertised pause frame use: No
Link partner advertised auto-negotiation: Yes
Link partner advertised FEC modes: Not reported
Speed: 1000Mb/s
Duplex: Full
Port: Twisted Pair
PHYAD: 1
Transceiver: internal
Auto-negotiation: on
MDI-X: on
Supports Wake-on: g
Wake-on: g
Current message level: 0x000000ff (255)
drv probe link timer ifdown ifup rx_err tx_err
Link detected: yes

4. Check the following parameters in the command output of the ethtool <NIC> command, and fix any
issue that can be identified.
• Supported link modes: This parameter shows the link modes supported by the interface.
• Advertised link modes: This parameter shows the link modes that are currently advertised to
the other end (switch or second host) of the NIC card.
• Link partner advertised link modes: This parameter shows the available modes advertised
by the other end of the NIC card.
• Speed: This parameter indicates the current link speed. In cases that cable is not working or
interface settings are incorrect, you may not get the expected speed or the speed may not match the
speed of the other end of the NIC card.
• Duplex: This parameter indicates the duplex of the current interface, and the duplex should be
Full. In cases that cable is not working or interface settings are incorrect, you may get incorrect
duplex.
• Auto-negotiation: This parameter indicates the current auto-negotiation status. Both ends of
the NIC card should have the same speed, duplex and auto-negotiation settings.
• Link detected: This parameter indicates whether the physical link of the NIC card has been
detected. If the status is no, it means that one of the interfaces is disconnected or the cable is not
working.
5. Ensure that your routing has correct gateways for each subnet. To check the routing settings, enter:
ip route show
Expected outcome
default via 1.2.3.1 dev br1

1.2.3.0/25 dev br1 proto kernel scope link src 1.2.3.19
192.168.64.0/20 dev bond3 proto kernel scope link src 192.168.72.8
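On hosts with many interfaces, the status check from step 2 can be condensed into one line per interface. The following is a minimal sketch that reads ip address show output on standard input and classifies each interface from the flags in angle brackets, using the same three statuses as this procedure (UP, DOWN, NO-CARRIER). The function name classify_interfaces is hypothetical, and the sketch assumes that the interface name and flags appear on the first line of each entry.

```shell
# Hypothetical helper, not part of the RMS tooling.
# Reads "ip address show" output on stdin and prints "<interface> <status>".
classify_interfaces() {
    awk '
    /^[0-9]+: / {
        name = $2; sub(/:$/, "", name)       # "eno1:" -> "eno1"
        flags = $3; gsub(/[<>]/, "", flags)  # "<A,B>" -> "A,B"
        n = split(flags, f, ",")
        status = "DOWN"                      # default when no UP flag is present
        for (i = 1; i <= n; i++) if (f[i] == "UP") status = "UP"
        # NO-CARRIER takes precedence: the cable is reported as disconnected.
        for (i = 1; i <= n; i++) if (f[i] == "NO-CARRIER") status = "NO-CARRIER"
        print name, status
    }'
}

# Usage on the RMS host:
#   ip address show | classify_interfaces
```

Only an exact UP flag counts as UP, so LOWER_UP alone does not mark an interface as working. Any interface reported as DOWN or NO-CARRIER is a candidate for the ethtool checks in steps 3 and 4.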
4.5 Troubleshooting RCC offline issue
Description
chk_RCC_VMstates RCC FAILED and chk_RCC_VMstates VM FAILED are observed in the health check report.
The output of RCCcstat shows that the RCC Clusters are offline. The following entry can be found in the
rccout.log under /opt/RCC/var directory:
2018 Jul 20 06:43:44:626: ** lcm: Recovery Mode 2 Init Threshold Exceeded, Going
Offline-L:3, C:2001000, LFLG:3

The root cause of this problem is that the RCC has a threshold (3 times in 20 minutes) for VM reboots. If the
SBC VMs are rebooted often enough to exceed this threshold, the RCC on those VMs goes offline.
Solution
Log in to the VMs (where the RCC is offline) and manually run the following command:
sudo RCCmachonline -u
If the problem persists, contact next level support team.
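To confirm on a given VM that the reboot threshold was the trigger, the RCC log can be searched for the entry shown in the description. The following is a minimal sketch, assuming that the log entry contains the text "Init Threshold Exceeded" exactly as in the excerpt above. The function name find_rcc_offline_events is hypothetical.

```shell
# Hypothetical helper, not part of the RCC software.
# Lists log entries showing that the RCC init threshold was exceeded.
find_rcc_offline_events() {
    log_file="${1:-/opt/RCC/var/rccout.log}"
    # -n prefixes each match with its line number in the log.
    grep -n 'Init Threshold Exceeded' "$log_file"
}
```

An exit status of 1 means that no such entry was found, in which case the offline state may have a different cause.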
4.6 Troubleshooting degraded configuration resource issue
Description
A warning is present in the health check report, as follows:
Running chk_REMstates to check REM states on hosts with REM-based services
Checking REM states of REM-controlled services
WARNING: chk_REMstates: Shelf Slot Host T-I-M (00 02 0 026-004-000) is
WARNING: degraded and may cause an SU/Patch to fail
WARNING: To see which resources are degraded,
WARNING: run "dumpcars -s -d 026 004 000"
The output of the dumpcars -s -d 026 004 000 command is:
The following log entry is present in the Host_Manager.log:
This issue is caused by a mismatch between the database configuration and the service.
Solution
Reboot the active IMS cards where the degraded configuration issue occurs. The standby cards become active.
Then reboot the IMS cards that are presently active.

4.7 Troubleshooting chk_db_repl FAILED and chk_database FAILED issue
Description
The following errors are reported in the health check:
chk_db_repl FAILED
chk_database FAILED
The output of the mysqlrepl_adm --action check_health --db webnms command is:
local host: qrms4-oam-a mate host: qrms4-oam-b
Start replication health check:
Slave_IO_Running: Yes Slave_SQL_Running: Standby Cold
This issue is caused by corruption of the master.info file in the /data0/db/webnms/ directory.
Solution
Manually re-synchronize the master.info file from the mate host by running the following command on the host
where the health check errors are reported:
mysqlrepl_adm --action resync_from_mate --db webnms
4.8 Troubleshooting PFW EIPM error
Description
The PFW EIPM error is reported while performing a health check.
This error is caused by an external network issue.
The root cause is that VLAN 2102 is missing from the 6125XLG switch on the chassis where the fwp-a
VM is located.
4.9 Troubleshooting SBC signaling plane health check error due to rolling inits
of the h248ds under bgc1-a and bgc1-b
Description
This issue causes the health check to report chk_rem_sv_bld and chk_connectivity failures.
Both failures are caused by rolling inits of the h248ds under bgc1-a and bgc1-b.

Run the following command:
REMcli su 1 1 1
Initially: bgc1-a is Active and bgc1-b is in Standby mode.
After 2 minutes: bgc1-b is Active and bgc1-a is in Standby mode.
To view the logs, run the command:
tail -f master.log
According to the log, bgc1-a goes SM_ACT and then, 2 minutes later, bgc1-b goes SM_ACT.
Solution
Perform the following to recover from this issue:
• Perform a hard reboot of bgc1-a VNF and bgc1-b VNF on the cloud GUI.
After about 5 minutes the problem is resolved. The BGC VMs are in a steady Active/Standby state and the
health check runs clean.
Note:
Even if the MI is active on oam-b (as verified with the MIcmd state vc command), running the MIcmd
switch vc command to force it to be active on oam-a does not resolve this issue.
4.10 Troubleshooting wrong media plane image issue in OpenStack
Description
This issue occurs in an OpenStack environment when the media plane (MCM id 1) has a wrong image and the
VM does not come up.
Solution
Perform the following steps to resolve this issue:
1. Delete the wrong media plane image from MCM id 1.
2. Run the following command to recreate the server from the appropriate media plane image:
openstack --insecure server rebuild --image <image name or ID> <server>
For example:
openstack --insecure server rebuild --image bf79fcd9-dcb8-4d24-8f7c-76a63a5a09cf d4848de19-a87d-42d9-8cd7-baaf860fc8cb
3. Verify the MCM status on the OpenStack dashboard.
The status changes from Rebuild to Active once the VM is up.
4.11 Troubleshooting SRIOV service issue on compute node
Description
This issue occurs when the SRIOV service has a problem on the compute node and fewer VFs are running
(for example, 189) than expected (for example, 252).
Solution
Perform the following steps to resolve this issue:
1. Run the following command to get the list of VMs on the compute node:
virsh list
2. Run the following command to find the VM name:
virsh dumpxml <instance id> |grep name
3. Run the following command to find the MAC addresses of all vNICs assigned to the VM:
virsh dumpxml <instance id> |grep mac
4. Run the following command to list all active SR-IOV VFs on the compute node, and look for the VLAN
values and MAC addresses of a given VM:
ip link ls |grep vlan
5. Correct the config_sriov.py SRIOV script to recover the compute node.
6. Run the following command to recover the VMs after fixing the SR-IOV state:
nova reboot --hard <vm uuid>
7. Run the following command to clear the ERROR state (and then perform a hard reboot again):
nova reset-state --active <vm uuid>
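The cross-check in steps 3 and 4 (matching the MAC addresses of a VM against the VF lines of `ip link`) can be sketched as follows. This is a minimal sketch under assumptions: the sample text is illustrative, the exact VF line layout varies between iproute2 versions (both the `MAC` and `link/ether` layouts are handled here), and the names `list_vfs` and `vfs_for_vm` are hypothetical.

```python
import re

# Illustrative `ip link ls` excerpt; the addresses and VLAN IDs are made up.
SAMPLE = """\
4: ens1f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP mode DEFAULT
    link/ether 3c:fd:fe:00:00:01 brd ff:ff:ff:ff:ff:ff
    vf 0 MAC fa:16:3e:00:00:0a, vlan 2102, spoof checking on, link-state auto
    vf 1 MAC fa:16:3e:00:00:0b, vlan 300, spoof checking on, link-state auto
    vf 2 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
"""

# One VF per line: index, MAC address, and an optional VLAN ID.
VF_RE = re.compile(
    r"^\s*vf\s+(\d+)\s+(?:MAC|link/ether)\s+([0-9a-fA-F:]{17})(?:.*?\bvlan\s+(\d+))?"
)


def list_vfs(ip_link_output):
    """Return (vf_index, mac, vlan_or_None) for every VF line found."""
    vfs = []
    for line in ip_link_output.splitlines():
        match = VF_RE.match(line)
        if match:
            vlan = int(match.group(3)) if match.group(3) else None
            vfs.append((int(match.group(1)), match.group(2).lower(), vlan))
    return vfs


def vfs_for_vm(ip_link_output, vm_macs):
    """Keep only the VFs whose MAC belongs to the VM (MACs taken from virsh dumpxml)."""
    wanted = {mac.lower() for mac in vm_macs}
    return [vf for vf in list_vfs(ip_link_output) if vf[1] in wanted]


print(len(list_vfs(SAMPLE)))
print(vfs_for_vm(SAMPLE, ["FA:16:3E:00:00:0A"]))
```

Comparing `len(list_vfs(...))` with the expected VF count gives a quick view of how many VFs are missing before the script is corrected in step 5.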
4.12 Troubleshooting SBC OAM service issues
To verify that all SBC Web UI processes are up and in a healthy state, do the following:
• Execute ps -efa | grep java| grep Process to make sure that all State-Management,
Performance-Management, Fault-Management, and Web UI java processes are up and running.
• Execute ps -efa | grep ncagent to make sure that the netconf agent process is up and running.
• Go to the log directory: cd /opt/sbc/CurrRel/logs/
All process log files are located here. You can grep for ERROR or Exception to find any issues.
To restart the SBC OAM processes, execute the following commands:
• systemctl stop LU3Psbc_oam to stop the OAM java processes (webui/pm/fm/sm)
• /opt/sbc/CurrRel/apps/netconf/script/stop_isbcnetconf to stop the netconf C process
• systemctl start LU3Psbc_oam to start the OAM java processes (webui/pm/fm/sm)
• /opt/sbc/CurrRel/apps/netconf/script/start_isbcnetconf to start the netconf C process
If you fail to log in with an existing or default user name and password:
• Check if all OAM processes are up and running.
• Check if CPM is running by executing health --test cpm
If CPM is disabled, execute the command cpm_adm --action setup_cpm;cpm_adm --action pre_enable_cpm;cpm_adm --action enable_cpm to enable CPM.
500 Internal Server Error:
• A 500 Internal Server Error includes a detailed error message and the Request URL that failed.
The Request URL indicates which process is down. For example, Request
URL:https://135.252.144.22:8443/oam/pm/dashboard/counter indicates that the Performance
Management process did not respond to the Web UI request.
• The easiest way to solve this problem is to restart the OAM processes, as explained above.
4.13 Troubleshooting connectivity issues between SBC FW VM and remote S-GW/P-GW
Details
The EIPM reports an error when external connectivity is lost.
1. Execute the ipm_cli -a dump -t status command to check the overall EIPM status.
2. Execute the ipm_cli -a dump -t shm command to dump the EIPM shared memory, and then check the
detailed interface/subnet status.
3. Run tcpdump on the FW VM and check whether the packets in question arrived at the FW.
4. Run traceroute and check whether the packets are lost by the routers.
4.14 Troubleshooting SBC CLI commands timeout issue
Timeout issue for ip_adm/dns_adm/ntp_adm commands

If the ip_adm/dns_adm/ntp_adm commands time out during execution, metadata errors result.
Workaround
Execute the ip_adm/dns_adm/ntp_adm commands with either nohup or &, so that the commands keep
running in the background and do not time out.
4.15 Troubleshooting partition table issue on RMS hosts
Description
After an FRU installation or a software upgrade, the partition tables on the RMS hosts may differ from each
other. However, both partition tables must be exactly the same.
Solution
Perform the following steps to diagnose and fix this issue:

1. Run the following commands to get the current partition table of each host:
ssh host01 parted -l | grep "Partition Table"
ssh host02 parted -l | grep "Partition Table"
Result:
The two hosts might report different partition tables. For example:
Partition Table: msdos
Partition Table: gpt
2. If the two hosts report different partition tables, execute the host FRU replacement procedure for the
host with the msdos partition table.
The issue is resolved successfully.
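The comparison in step 1 can be sketched in Python for cases where the two outputs have been saved to text. This is a minimal sketch; the helper names `partition_table_types` and `hosts_match` are hypothetical, and the input is assumed to be the text printed by the two `parted -l | grep "Partition Table"` commands.

```python
def partition_table_types(parted_output):
    """Return the value of every 'Partition Table:' line, in order of appearance."""
    types = []
    for line in parted_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Partition Table":
            types.append(value.strip())
    return types


def hosts_match(output_host01, output_host02):
    """True when both hosts report the same partition table types in the same order."""
    return partition_table_types(output_host01) == partition_table_types(output_host02)


# Example values taken from the result shown in step 1.
host01_output = "Partition Table: msdos\n"
host02_output = "Partition Table: gpt\n"
print(hosts_match(host01_output, host02_output))
```

A `False` result corresponds to the mismatch case in step 2.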

5 Troubleshooting SBC database problems
5.1 Troubleshooting 500 internal server error
Description
If restarting the OAM processes does not solve the 500 internal server errors, one possible reason is that
something is wrong with the DB content:
1. To confirm, check if there are DB Table Exceptions in /opt/sbc/CurrRel/logs/state-mgt*.
2. Check if the MI processes are up and running by executing MIcmd state vc. The MI service should be in
the A (active) state. If MIcmd state vc reports an RCC error, first execute sudo RCCmachonline -u on each
OAM host, then check the MI service again.
3. If the MI service is not in the active state, execute MIcmd start vc to start the MI service.
4. After the MI service has started successfully, restarting the OAM processes may solve the DB content issue.
5. The Health tool can also be used to check for DB issues: execute health --test db.
Backup/restore
Note the following about the backup/restore commands:
• Using the -v option with sbc_backup, sbc_restore_mp, and sbc_restore_all provides additional output
that is helpful for debugging.
• These commands must be executed from the root login.
• These commands require passwordless access to the media plane (7510 MGW) through the root login.
If the -v output indicates an ssh failure, confirm this by running ssh manually and observing
whether you can connect without being prompted for a password.

6 Troubleshooting SBC software upgrade issues

Starting from the Nokia SBC 21.8 release, Image Based Software Upgrade (IBSU) is the only type of software
upgrade supported by Nokia SBC.
• IBSU is much more reliable than SIM Based Software Upgrade (SBSU). However, in case of permanent
failure, disaster recovery is the only possible rollback when using IBSU.
• The resume process has been automated and extended. This makes the system more resistant to unexpected
events.
6.1 Checking and understanding ansible.log

The ansible.log file (present in /storage/guestdata/log/ansible.log) contains output from the
execution of SBC playbooks.
Figure 1: Ansible.log field details
Details of information captured in the ansible.log file are:
• ok=<number> (for example, ok=19): This indicates that 19 tasks from playbooks have been executed
successfully.
• changed=<number> (for example, changed=3): This indicates that 3 tasks have changed something in the
existing configuration.
• unreachable=<number> (for example, unreachable=0): This indicates that 0 tasks failed due to
unreachable host.
• failed=<number> (for example, failed=0): This indicates that 0 tasks have failed.

• skipped=<number> (for example, skipped=12): This indicates that no change was required for 12 tasks.
• rescued=<number> (for example, rescued=2): This indicates that 2 tasks have failed, but ansible was able
to recover from the failure.
• ignored=<number> (for example, ignored=1): This indicates that 1 task has failed, however this failure
was expected.
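The counters above can also be extracted programmatically. The following is a minimal sketch, assuming each PLAY RECAP host entry sits on a single line after the "|" separator, as in an unwrapped ansible.log; the function names `parse_recap` and `hosts_needing_attention` are hypothetical and not part of the SBC playbooks.

```python
import re

# "<host> : ok=.." after the "|" separator of an ansible.log line.
HOST_RE = re.compile(r"\|\s*(\S+)\s*:\s*\w+=\d+")
COUNTER_RE = re.compile(r"(\w+)=(\d+)")

# Two lines in the shape of an ansible.log recap (values are illustrative).
SAMPLE = """\
2021-04-08 09:40:05,169 p=5871 u=root | PLAY RECAP *********
2021-04-08 09:40:05,169 p=5871 u=root | localhost : ok=15 changed=8 unreachable=0 failed=1
"""


def parse_recap(log_text):
    """Map each host in a PLAY RECAP block to its counters, e.g. {'ok': 15, ...}."""
    recap = {}
    in_recap = False
    for line in log_text.splitlines():
        if "PLAY RECAP" in line:
            in_recap = True
            continue
        host = HOST_RE.search(line) if in_recap else None
        if host is None:
            in_recap = False  # any other line ends the recap block
            continue
        # Only read counters after the "|" so that p=<pid> is not picked up.
        counters = COUNTER_RE.findall(line.split("|", 1)[1])
        recap[host.group(1)] = {name: int(value) for name, value in counters}
    return recap


def hosts_needing_attention(recap):
    """Hosts with at least one failed task or one unreachable result."""
    return [
        host
        for host, counters in recap.items()
        if counters.get("failed", 0) or counters.get("unreachable", 0)
    ]


recap = parse_recap(SAMPLE)
print(recap)
print(hosts_needing_attention(recap))
```

For a host that is flagged, search upward in the log from the recap for the matching "fatal:" entry to find the task that failed.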
Example 1:
…
2021-04-08 09:40:05,090 p=5871 u=root | TASK [sig-prep-host01 : Get md5 checksum
from downloaded file in /storage/guestdata/qcow2] ***
2021-04-08 09:40:05,169 p=5871 u=root | fatal: [localhost]:
FAILED! => {"changed": true, "cmd": "cat /storage/guestdata/qcow2/
sbc_signaling_jenkins_nokia-SBC_sig-RHEL7-R37.38.00.x86_64.qcow2.md5", "delta":
"0:00:00.002983", "end": "2021-04-08 09:40:05.157592", "failed": true, "rc": 1,
"start": "2021-04-08 09:40:05.154609", "stderr": "cat: /storage/guestdata/qcow2/
sbc_signaling_jenkins_nokia-SBC_sig-RHEL7-R37.38.00.x86_64.qcow2.md5: No such
file or directory","stdout": "", "stdout_lines": [], "warnings": []}
2021-04-08 09:40:05,169 p=5871 u=root | PLAY RECAP
*********************************************************************
2021-04-08 09:40:05,169 p=5871 u=root | localhost : ok=15 changed=8
unreachable=0 failed=1
In this example, the failure is caused by a missing MD5 file, and the failure appears in the sig-prep-host01
task.
Example 2:
…
2021-05-25 05:30:05,194 p=34592 u=root | TASK [close_lcm : lcm_perm --clean]
********************************************
2021-05-25 05:30:05,253 p=34592 u=root | fatal: [MI_F]: FAILED!
=> {"changed": false, "failed": true, "module_stderr": "Sorry,
user lcmadm is not allowed to execute '/bin/sh -c echo BECOME-
SUCCESSiphbymkjuxzxteazfmmcciytqrikgttu; LANG=en_US.UTF-8 LC_ALL=en_US.UTF-8
LC_MESSAGES=en_US.UTF-8 /usr/bin/python' as root on rms09-oam-
a.\n", "module_stdout": "", "msg": "MODULE FAILURE", "parsed":
false}
2021-05-25 05:30:05,254 p=34592 u=root | PLAY RECAP
*********************************************************************
2021-05-25 05:30:05,254 p=34592 u=root | MI_F : ok=4 changed=1
2021-05-25 05:30:05,255 p=34592 u=root | localhost : ok=12 changed=0

In this example, the failure occurs because the lcmadm user tried to perform operations that only the
root user can perform, and the failure appears in the close_lcm task.
6.2 Checking and understanding InteractiveInstaller.sh.log

The InteractiveInstaller.sh.log file (present in
/storage/guestdata/log/sbc-su-log/InteractiveInstaller.sh.log) contains logs from the execution of the
nokia.vmmhi_upgrade_sig_XX.YY.ZZ_ABC.sfx tool.
• The InteractiveInstaller.sh.log contains data that is useful for both development and debugging.
• Detailed SBC system knowledge is needed to understand the information contained in
InteractiveInstaller.sh.log.
Checking sequence of executed steps

To check the sequence of steps that were executed, run the following command:
grep LAST_ InteractiveInstaller.sh.log
Sample output:
# LAST_ACTION_NR=6
# LAST_ACTION_NR=6
# LAST_ACTION_NR=1
# LAST_ACTION_NR=1
# LAST_ACTION_NR=2
# LAST_ACTION_NR=2
# LAST_ACTION_NR=3
# LAST_ACTION_NR=3
# LAST_ACTION_NR=3
Note:
Duplicate entries are expected in the output. This information is useful if steps are executed in an
incorrect sequence during the procedure (for example, 1,2,3,2,6).
Checking which step has failed to execute

To check which step has failed to execute, run the following command:
grep ACTION InteractiveInstaller.sh.log
Sample output:
# LAST_ACTION_NR=1
ACTION_BEGIN-Mon Aug 23 05:19:42 UTC 2021-Automatic upgrade to 37.42.06 - Executes steps from 1 to 5 automatically-
ACTION_BEGIN-Mon Aug 23 05:19:42 UTC 2021-Make system backup-
# LAST_ACTION_NR=1
# LAST_ACTION_NR=2
ACTION_END-Mon Aug 23 05:23:55 UTC 2021-253-Make system backup-
ACTION_BEGIN-Mon Aug 23 05:23:55 UTC 2021-Check system health-
# LAST_ACTION_NR=2
# LAST_ACTION_NR=3
ACTION_END-Mon Aug 23 05:36:49 UTC 2021-774-Check system health-
ACTION_BEGIN-Mon Aug 23 05:36:49 UTC 2021-Upgrade system or resume upgrade-
# LAST_ACTION_NR=3
# LAST_ACTION_NR=3
ACTION_EXIT-Mon Aug 23 08:40:50 UTC 2021-11041-Upgrade system or resume upgrade-
ACTION_EXIT: This entry indicates the name of the failed step and the exact date and time of the failure.
After finding the date and time of the failure, inspect other logs or check the
InteractiveInstaller.sh.log file for more details.
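Locating ACTION_EXIT entries can be scripted. The following is a minimal sketch that assumes the entries have the same layout as the sample output above; the function name `failed_steps` is hypothetical. The number after the date is returned as-is: in the sample it matches the seconds elapsed since the corresponding ACTION_BEGIN (for example, 05:19:42 to 05:23:55 is 253 seconds), but this is an inference from the sample and not a documented definition.

```python
import re

ACTION_RE = re.compile(
    r"^(ACTION_(?:BEGIN|END|EXIT))-"           # entry type
    r"(\w{3} \w{3} +\d+ [\d:]{8} \w+ \d{4})-"  # date, e.g. Mon Aug 23 05:19:42 UTC 2021
    r"(?:(\d+)-)?"                             # number present on END and EXIT entries
    r"(.*?)-?$"                                # step name without the trailing dash
)

# Lines copied from the sample output of `grep ACTION InteractiveInstaller.sh.log`.
SAMPLE = """\
ACTION_BEGIN-Mon Aug 23 05:19:42 UTC 2021-Make system backup-
# LAST_ACTION_NR=2
ACTION_END-Mon Aug 23 05:23:55 UTC 2021-253-Make system backup-
ACTION_BEGIN-Mon Aug 23 05:36:49 UTC 2021-Upgrade system or resume upgrade-
# LAST_ACTION_NR=3
ACTION_EXIT-Mon Aug 23 08:40:50 UTC 2021-11041-Upgrade system or resume upgrade-
"""


def failed_steps(log_text):
    """Return (step name, date string, number) for every ACTION_EXIT entry."""
    failures = []
    for line in log_text.splitlines():
        match = ACTION_RE.match(line.strip())
        if match and match.group(1) == "ACTION_EXIT":
            number = int(match.group(3)) if match.group(3) else None
            failures.append((match.group(4), match.group(2), number))
    return failures


for name, when, number in failed_steps(SAMPLE):
    print(f"failed step: {name} at {when} (number field: {number})")
```

The date string returned for each failure is the starting point for searching the other logs.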
6.3 Troubleshooting VMM-HI upgrade

If any problems are encountered while performing a VMM-HI upgrade, inspect the following logs to identify
the source of the problem:
• /storage/guestdata/log/sbc-su-log/InteractiveInstaller.sh.log
• /var/log/vcp.log
• /storage/pcs/log/update/pcs.log
• /var/opt/vcp/log/cm.log
• /var/opt/vcp/log/cm_healthchk.log
• /var/log/messages
Note: A detailed explanation of all problems that might occur during a VMM-HI upgrade is outside the
scope of this section.
6.4 Troubleshooting SIM Based Software Upgrade (SBSU) issues

6.4.1 Troubleshooting CBAM GUI upgrade/backout failure when sim.log indicates stoppage
at CP_WARNING VERIFY_SOFTWARE or COMMIT pause point
Details
Note: This is applicable only to SBC SU in a VMware environment.
1. If a CBAM GUI upgrade/backout fails and sim.log indicates that it stopped at an expected pause point
(CP_WARNING, VERIFY_SOFTWARE or COMMIT), re-run the upgrade/backout from the CBAM GUI. If the
re-attempt fails, contact the next level of support to resolve the underlying issue.
2. However, if sim.log indicates failures, contact the next level of support to resolve the underlying issue.
6.5 Troubleshooting Image Based Software Upgrade (IBSU)

Details for troubleshooting problems encountered while performing Image Based Software Upgrade (IBSU) are
explained in this section.
6.5.1 Upgrade logs

The most important upgrade logs are stored in the following locations:
• stage.log: This log is stored at /storage/downloads/stage_SBC_software/stage.log. This log
contains information from the stage process of the nokia.vmmhi_upgrade_sig_XX.YY.ZZ_ABC.sfx tool.
• InteractiveInstaller.sh.log: This log is stored at
/storage/guestdata/log/sbc-su-log/InteractiveInstaller.sh.log. This log contains information from all
tools executed during IBSU.
• ansible.log: This log is stored at /storage/guestdata/log/ansible.log. This log contains
information only from the ansible playbook execution.
6.5.2 Useful files

Files that are useful for troubleshooting problems encountered during IBSU are stored in the following
locations:
• /storage/guestdata/app_data/sbc-playbooks/sbc-installer-data.yml
• /storage/guestdata/tmp_data/sbc-playbooks/sbc-installer-su-data.yml

• In directory /storage/guestdata/log/sbc-su-log/
– sbc-su-status
– .hcStatus
– .media1Status
– .media1SuReqStatus
– .media2Status
– .media2SuReqStatus
– .mediaStatus
– .mediaSuReqStatus
– .sbcStatus
– .sigStatus
– .sigSuReqStatus
– .status_file
• /storage/guestdata/log/media-log/upgrade-ISSU-log/upgrade-ISSU_default_<date>.log
• /storage/guestdata/log/media-log/upgrade-ISSU-log/upgrade-ISSU_default_<date>.status
• /storage/guestdata/log/media-log/upgrade-ISSU-precheck-log/upgrade-ISSU-precheck_default_<date>.log
• /storage/guestdata/log/media-log/upgrade-ISSU-precheck-log/upgrade-ISSU-precheck_default_<date>.status
6.5.3 Stage debug and cleanup

The stage process is executed only once, when the nokia.vmmhi_upgrade_sig_XX.YY.ZZ_ABC.sfx tool is
executed for the first time. For successful completion of the stage process, all of the following requirements
must be met:
• Exactly two nokia*sfx files must be present in /storage/downloads/ directory:
– nokia.vmmhi_upgrade_sig_<version>.sfx
– nokia.vmmhi_upgrade_mgw_<version>.sfx
• There should be no VCP.VM<version>.zip file in /storage/downloads/
• All VMs should be reachable.
The most common problems encountered during the stage process are:

• ERROR: Expecting exactly two sfx files in /storage/download dir - one for media
and one for signaling, but found:
This error is caused by an incorrect number of nokia*sfx files in /storage/downloads/
To fix this problem, remove the unnecessary *sfx files and re-try.
• ERROR: Unexpected files found. This may indicate unfinished or broken VMM-HI
upgrade:
This error is caused by the presence of a VCP.VM<version>.zip file in /storage/downloads/
To fix this error:
1. Ensure that there is no VMM-HI upgrade in progress.
2. Check if the previous VMM-HI upgrade was successful.
3. If there is no VMM-HI upgrade in progress and the expected VMM-HI is installed, remove the
VCP.VM<version>.zip files from /storage/downloads/
4. Re-try the stage process.
• If there is a legitimate need to re-stage all software, execute the following command:
nokia.vmmhi_upgrade_sig_XX.YY.ZZ_ABC.sfx cleanup
Warning: The above command re-stages and replaces all software regardless of whether an
upgrade is in progress.
Sample use case: Use the above command to re-stage all software when one VM is not functioning
properly after the stage process (but before the upgrade has started). In this case, staged files might be
missing on that VM.
6.5.4 Resume considerations (for advanced users only)

In SBC 21.8, the resume process is fully automated. However, in an emergency, some steps can be skipped.
A sample of the correct sequence of SU steps from a successful SBC upgrade, as recorded in the
/storage/guestdata/log/sbc-su-log/sbc-su-status file, is shown below:
16:07:15 start.start-sbc-imgsu
16:07:15 start.sig-pre-su-update
16:07:30 completed.sig-pre-su-update
16:07:30 completed.start-sbc-imgsu
16:09:53 start.sbc-imgsu-require-check
16:10:20 completed.sbc-imgsu-require-check
16:10:20 start.media-imgsu-precheck
16:12:15 completed.media-imgsu-precheck
16:12:15 start.sig-imgsu-start
16:12:19 start.sig_prep_config_start
16:12:19 completed.start.sig_prep_config

16:12:34 start.sig-imgsu-deft-zip-check
16:12:44 completed.sig-imgsu-deft-zip-check
16:12:44 start.sbc-prep-su
16:12:48 completed.sbc-prep-su
16:12:48 start.sig-img-pre-su-play
16:30:05 completed.sig-img-pre-su-play
16:30:05 completed.sig-imgsu-start
16:30:05 start.sig-imgsu-create-vm-list
16:30:22 completed.sig-imgsu-create-vm-list
16:30:22 start.sig-imgsu-shutdown-sideB
16:30:55 completed.sig-imgsu-shutdown-sideB
16:30:55 start.sig-imgsu-updateSideB
16:51:08 completed.sig-imgsu-updateSideB
16:51:08 start.sig-imgsu-wait-update-B
17:10:48 completed.sig-imgsu-wait-update-B
17:10:48 start.sig-imgsu-active-sideB
17:15:20 completed.sig-imgsu-active-sideB
17:15:20 start.sig-imgsu-shutdown-sideA
17:17:32 completed.sig-imgsu-shutdown-sideA
17:17:32 start.sig-imgsu-updateSideA
17:36:46 completed.sig-imgsu-updateSideA
17:36:46 start.sig-imgsu-wait-update-A
17:59:47 completed.sig-imgsu-wait-update-A
17:59:47 start.sig-img-post-su-play
18:04:23 completed.sig-img-post-su-play
18:04:23 start.media-imgsu-upgrade
19:06:47 completed.media-imgsu-upgrade
19:06:47 completed.media-imgsu-upgrade
19:06:47 completed.media-upgrade-successfully
19:06:52 start.sig-imgsu-sideA-numa-alignment
19:21:08 completed.sig-imgsu-sideA-numa-alignment
19:21:08 start.sig-imgsu-health-check-wait-srv-ready-1
19:26:09 completed.sig-imgsu-health-check-wait-srv-ready-1
19:26:09 start.sig-imgsu-sideB-numa-alignment
19:40:21 completed.sig-imgsu-sideB-numa-alignment
19:40:21 start.sig-imgsu-health-check-wait-srv-ready-2
19:44:37 completed.sig-imgsu-health-check-wait-srv-ready-2
19:44:37 start.sig-img-post-su-update
19:45:58 completed.sig-img-post-su-update
19:45:58 start.sig-known-hosts-setup
19:46:29 completed.sig-known-hosts-setup
19:46:29 completed.sbc-su-successfully

However, consider another scenario where the SBC SU keeps failing at:
16:07:30 failed.sig-pre-su-update
16:07:30 failed.start-sbc-imgsu
Here, start-sbc-imgsu is the name of the main block and sig-pre-su-update is its sub-block. It is
important to keep this sequence.
To skip sig-pre-su-update, edit the sbc-su-status file as follows:
16:07:30 completed.start-sbc-imgsu
16:07:31 start.sbc-imgsu-require-check
16:07:31 failed.sbc-imgsu-require-check
Here, sbc-imgsu-require-check is the name of the main block, and there are no sub-blocks. Use this
method with caution.
Warning: The block execution sequence may vary depending on the SBC version. It is recommended to
seek the assistance of an SBC expert.
Once sbc-su-status is updated and all problems are resolved, re-try the SU. The nokia*sfx tool automatically
re-tries execution of the block(s) marked as failed.
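Before editing sbc-su-status, it can help to list the last recorded state of each block. The following is a minimal read-only sketch; it assumes every line has the form "HH:MM:SS state.block-name" as in the samples above, and the function names `last_states` and `blocks_in_state` are hypothetical. It does not modify the file.

```python
# Lines in the shape of the failing scenario described above.
SAMPLE = """\
16:07:15 start.start-sbc-imgsu
16:07:15 start.sig-pre-su-update
16:07:30 failed.sig-pre-su-update
16:07:30 failed.start-sbc-imgsu
"""


def last_states(status_text):
    """Map each block name to its most recent (state, time) pair."""
    states = {}
    for line in status_text.splitlines():
        parts = line.split()
        if len(parts) != 2 or "." not in parts[1]:
            continue  # skip blank or unexpected lines
        # The state is the text before the first dot; the rest is the block name.
        state, block = parts[1].split(".", 1)
        states[block] = (state, parts[0])
    return states


def blocks_in_state(status_text, wanted):
    """Blocks whose most recent entry has the given state, in order of first appearance."""
    return [
        block
        for block, (state, _time) in last_states(status_text).items()
        if state == wanted
    ]


print(blocks_in_state(SAMPLE, "failed"))
```

Only the "failed" state is a reliable signal in this sketch. Some start and completed entries in the successful sample do not use matching names (for example, start.sig_prep_config_start and completed.start.sig_prep_config), so a block whose last state is "start" is not necessarily unfinished.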
6.5.5 sbc_hc.py tool

Starting from the SBC 19.5 release, sbc-playbooks delivers an automatic health check tool (sbc_hc.py).
A health check is executed before each upgrade.
The sbc_hc.py file might be in one of the following locations:
• /storage/downloads/sbc_hc/
• /storage/guestdata/app_data/sbc-playbooks/sbc_hc/
• /storage/guestdata/tmp_data/sbc-playbooks/sbc_hc/
Start health check

To start a health check in normal mode, execute the following:
./sbc_hc.py
Warning: Health check executes multiple actions simultaneously and may impact performance. It is
highly advised to run it during a maintenance window.

Start health check

To start a health check in debug mode, execute the following:
./sbc_hc.py -d
Sample outputs
[root@rms17-host01 sbc_hc]# /storage/guestdata/app_data/sbc-playbooks/sbc_hc/sbc_hc.py -d
Collecting checks. Please wait...
Logs are stored in /var/tmp/sbc_hc/hc_log/sbc_hc.7.xml
Getting list of nodes...
Starting checks on nodes...
Checks started successfully...
Task RMS_Chk_Application_Backup [host02]: ok
Task RMS_Chk_No_Carrier [host02]: ok
Task RMS_Chk_cm_adm [host02]: ok
Task RMS_Chk_Archive [host02]: ok
Task RMS_Chk_Def_Gw [host02]: ok
Task RMS_Chk_CPU_isolation [host02]: ok
Task RMS_Chk_Media_DIAG_PUB [host02]: ok
Task RMS_Chk_MissingFiles [host02]: error
ERROR: rms17-host02: Following files are missing:
/storage/guestdata/app_data/sbc-playbooks/media-vars/media-config.yaml
Contact your next level of support for further assistance.
Task RMS_Chk_SW_Version [host02]: ok

Task RMS_Chk_Tmp_Data_Directory [host02]: ok
Task RMS_Chk_cm_adm [host01]: ok
Task RMS_Chk_Host_Arch [host02]: ok
Task RMS_Chk_pre_su [host02]: ok
Task RMS_Chk_Commit [host02]: ok
Task RMS_Chk_PreSu_Backup [host02]: ok
Task RMS_Chk_Sbc_Playbooks_Version [host02]: ok
Task RMS_Chk_Archive [host01]: ok
Task RMS_Chk_Def_Gw [host01]: ok
Task RMS_Chk_CPU_isolation [host01]: ok
Task RMS_Chk_FW_Version [host01]: ok
Task RMS_Chk_Media_DIAG_PUB [host01]: ok
Task RMS_Chk_MissingFiles [host01]: ok
Task RMS_Chk_DSPandMCM [host01]: ok
Task RMS_Chk_SW_Version [host01]: ok
Task RMS_Chk_Tmp_Data_Directory [host01]: ok
Task RMS_Chk_Storage [host01]: ok

Task RMS_Chk_Host_Arch [host01]: ok

Task RMS_Chk_pre_su [host01]: ok
Task RMS_Chk_Commit [host01]: ok
Task RMS_Chk_PreSu_Backup [host01]: ok
Task RMS_Chk_Sbc_Playbooks_Version [host01]: ok
Task RMS_Chk_Application_Backup [host01]: ok
Task RMS_Chk_pcs_system [host01]: ok
Task RMS_Chk_No_Carrier [host01]: ok
******************************************************************
SUMMARY FROM HEALTHCHECK
******************************************************************
NOTICE: sbc_hc finished at 08:45:05, 08/24/2021
NOTICE: debug file: /tmp/sbc_hc.dbg
******************************************************************
HEALTHCHECK COMPLETED WITH STATUS: ERROR!
See output of
/storage/guestdata/app_data/sbc-playbooks/sbc_hc/sbc_hc.py -A -f /var/tmp/sbc_hc/hc_log/sbc_hc.7.xml
for details.
******************************************************************
Once the health check completes execution, it indicates the command to run for decoding the logs. In the
above example it is:
******************************************************************
HEALTHCHECK COMPLETED WITH STATUS: ERROR!
See output of
/storage/guestdata/app_data/sbc-playbooks/sbc_hc/sbc_hc.py -A -f /var/tmp/sbc_hc/hc_log/sbc_hc.7.xml
for details.
******************************************************************
If more detailed information is needed about the detected errors, add the -V option at the end and redirect
the output to a file. For example:
/storage/guestdata/app_data/sbc-playbooks/sbc_hc/sbc_hc.py -A -f /var/tmp/sbc_hc/hc_log/sbc_hc.7.xml -V > /tmp/yourfilename.txt
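When the console output is long, the tasks that did not report ok can be filtered out with a short script. This is a minimal sketch that assumes the "Task <name> [<host>]: <status>" line layout shown in the sample output above; the function name `failing_tasks` is hypothetical and is not part of sbc_hc.py.

```python
import re

# One task result per line, e.g. "Task RMS_Chk_MissingFiles [host02]: error".
TASK_RE = re.compile(r"^Task (\S+) \[([^\]]+)\]: (\w+)\s*$")

# Lines copied from the sample output above.
SAMPLE = """\
Task RMS_Chk_Media_DIAG_PUB [host02]: ok
Task RMS_Chk_MissingFiles [host02]: error
ERROR: rms17-host02: Following files are missing:
/storage/guestdata/app_data/sbc-playbooks/media-vars/media-config.yaml
Task RMS_Chk_SW_Version [host02]: ok
"""


def failing_tasks(console_output):
    """Return (task, host, status) for every task whose status is not 'ok'."""
    failures = []
    for line in console_output.splitlines():
        match = TASK_RE.match(line.strip())
        if match and match.group(3) != "ok":
            failures.append((match.group(1), match.group(2), match.group(3)))
    return failures


print(failing_tasks(SAMPLE))
```

The script only narrows down which checks to look at; the decoded log produced with the -A -f options remains the source of the details.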

6.5.6 Hardware slowdown

If the following exception is seen during disaster recovery (or while performing any other procedure), the
hardware should be examined:
2021-08-30 12:37:33: lcm_base.py: Traceback (most recent call last):
File "/opt/LSS/share/basecfg/fi/bin/lcm_base.py", line 400, in execute_cmds
eval(cmd)
File "<string>", line 1, in <module>
File "/opt/LSS/share/basecfg/fi/bin/lcm_lib.py", line 2571, in checkDiscoverForCloudLikeSys
raise Exception(msg)
Exception: Expected 22 new discovery files, found 15
2021-08-30 12:37:33: lcm_base.py: failed to run checkDiscoverForCloudLikeSys()
In emergency scenarios, execute the following procedure to recover the system:
1. Execute the following commands:
mkdir -p /storage/mnt
cd /storage/guestdata/qcow2
file=<expected_qcow_version>
where expected_qcow_version is the qcow version that is desired after the procedure is
completed. For example:
file=nokia-SBC_sig-RHEL7-R37.38.06.0400.x86_64.qcow2
2. Continue the procedure by executing the following commands:
cp -p ${file} ${file}.orig
cp -p ${file}.md5 ${file}.md5.orig
guestmount -a /storage/guestdata/qcow2/${file} -i --rw /storage/mnt/
sed -i 's/retry = 1200/retry = 2400/' /storage/mnt/opt/LSS/share/basecfg/fi/bin/lcm_lib.py
umount /storage/mnt/
guestmount -a /storage/guestdata/qcow2/${file} -i --rw /storage/mnt/
3. Confirm that the changes are correctly applied by executing the following command:
grep 2400 /storage/mnt/opt/LSS/share/basecfg/fi/bin/lcm_lib.py
Expected result:
retry = 2400

4. After confirming that the fix is visible, unmount the qcow and retry the failed procedure by executing the
following commands:
umount /storage/mnt/
md5sum ${file}|cut -f1 -d" " > ${file}.md5
scp ${file}* host02:/storage/guestdata/qcow2
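Because step 4 regenerates the .md5 file after the image has been modified, it can be worth confirming that the recorded checksum matches the image before the failed procedure is retried. The following is a minimal sketch; the function names `md5_of` and `md5_file_matches` are hypothetical, the .md5 file is assumed to hold the hexadecimal digest as its first field (as produced by the md5sum command in step 4), and the demonstration runs on a temporary file instead of a real qcow2 image.

```python
import hashlib
import os
import tempfile


def md5_of(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def md5_file_matches(image_path):
    """Compare the digest stored in <image_path>.md5 with the image's actual digest."""
    with open(image_path + ".md5") as handle:
        recorded = handle.read().split()[0]
    return recorded == md5_of(image_path)


# Demonstration on a temporary file that stands in for the qcow2 image.
with tempfile.TemporaryDirectory() as workdir:
    image = os.path.join(workdir, "example.qcow2")
    with open(image, "wb") as handle:
        handle.write(b"modified image contents")
    with open(image + ".md5", "w") as handle:
        handle.write(md5_of(image) + "\n")
    fresh_ok = md5_file_matches(image)   # checksum written after the change
    with open(image, "ab") as handle:
        handle.write(b"!")
    stale_ok = md5_file_matches(image)   # image changed again, checksum is stale

print(fresh_ok, stale_ok)
```

On a real system the same check would be run against the image path set in the file variable in step 1, on both hosts after the scp command.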

7 Troubleshooting SBC media plane issues

This chapter describes the procedures for troubleshooting Nokia SBC media plane issues in the integrated
configuration.
For more information on media plane commands, see the 7510 Border Gateway (BGW) CE/SE Commands
Reference Guide for the appropriate release.
7.1 Logging in to media plane using CLI

1. Log in from your computer to the OAM server of the SBC using SSH, as the root user with the root
password.
Example
$ ssh root@xxx.yyy.zzz.6
Password: <enter root password>
2. Log in from the OAM server to the media plane using SSH, as diag. No password is needed to log in.
Example
# ssh diag@xxx.yyy.zzz.6
diag@xxx.yyy.zzz.6's password:<hit Enter>
7.2 Troubleshooting the media path problem

1. Log in to the media plane as diag.

2. Get general information on SCM.
view date
view node
view redundancy group
view uptime all
view version all
view overload status
view resource usage
view alarm active
view alarm history
3. Get H248 context information of the bad call:
view h248 context all
view h248 context <ctx-id>
diag view h248 datapath <ctx-id>
4. Check the nh-ip-monitor session status to see if any data path is broken.
view nh-ip-monitor configuration
view nh-ip-monitor session

5. Check the link selection for the voice ports:
view port state all
diag view pif port all
diag view lpi table all
6. Collect the current and history statistics.
view traffic statistics current global
view traffic statistics current interface <port>
view traffic statistics current ip-subnet <ip-subnet>
view traffic statistics current realm <realm-name>
view traffic statistics history global
view traffic statistics history interface <port>
view traffic statistics history ip-subnet <ip-subnet>
view traffic statistics history realm <realm-name>

7. Collect the NPU statistics. (Commands should be run on active PIM appl)
view npu ctrlpath statistics
view npu datapath packetlost
view npu datapath statistics egress detail
view npu datapath statistics ingress detail
view npu datapath statistics ring detail
view npu datapath statistics ether detail
view npu history all
8. Collect the NPU IO trace. (Commands should be run on active PIM appl)
NPU IO trace is a feature that captures the packets passing in and out through the NPU. It is used to
identify whether packets are received or sent out. Its use is limited to single calls or very light call loads.
clear current NPU io-trace buffer
define npu io-trace start {<gr-port>|*} {<local-port>|*} {<remote-address>|*} {<remote-port>|*}
define npu io-trace stop *
view npu io-trace file [<filename>]
Note: If you save the NPU IO trace file, it is generated on vSCM and stored in the
/export/home/lss/logs/ SBC directory.
9. For media-aware calls (where the vMCM application is involved), execute the following commands to
capture the NPU IO trace (a different NPU IO trace from the one in the previous step), the DSP IO trace, and
the DSP SDK messages.
Note: Execute during a light traffic period.
(1) Execute command on vPIM.
d run npu DspFwdCtrlPktIotraceOn

------------------------------------------------------------------------------------
DspFwdCtrlPktIotraceOn --trace ctrl packets in DSP PKT FWD.
DspFwdDataPktIotraceOn --trace data packets in DSP PKT FWD.
DspFwdAllPktIotraceOn --trace ctrl and data packets in DSP PKT FWD.
DspFwdAllPktIotraceOff --stop io-trace in DSP PKT FWD.
(2) Stop the trace using the following command on vPIM.
d run npu DspFwdAllPktIotraceOff
(3) Dump the trace to a file using the following command on vPIM.
view npu io-trace file abc.txt
The abc.txt file is then saved on the SCM VM in /opt/v7510/data/.
10. DSP IO trace
a. Set up IO trace configuration.
diag define dsp io-trace settings buffersize 1000
b. Start Capturing DSP IO trace.
diag define dsp io-trace enable mp.mcm1.1.1
(The media port mp.mcm1.1.1 is obtained from the output of view h248 context <ctx-id>.)
c. Stop Capturing DSP IO trace.
diag define dsp io-trace disable all
d. List trace bin files on vSCM and get these files out.
ls *.bin

11. DSP SDK Message
a. Log in to the virtual machine (vMCM).
[root@spark-host01 ~]# virsh console 32
b. Capture the packets to a file with tcpdump.
[root@sbc06-media-mcm1 ~]# tcpdump -i eth1 -w control_plane.pcap
control_plane.pcap is an example; you can use any filename.
c. List dump file.
[root@sbc06-media-mcm1 ~]# ll *.pcap
-rw-r--r--. 1 tcpdump tcpdump 524 Nov 1 05:12 control_plane.pcap
12. Related OBC UI reading ( on vSCM)
view performance statistics current busy mcp

view performance statistics history busy mcp
view media stats channel <port>
diag view rmgr lb mcp
diag view rmgr cache filled mcp
diag view rmgr cache config
diag view rmgr dyndrm
7.3 Troubleshooting the networking problem
7.3.1 Scenario-1
2. Get basic ip and route configuration on SCM application.
view ip if
view route table
3. Get the port state on PIM application.

view port state all

view port state <port>
Example: view port state en.pim1.8
4. Ping with source ip address on PIM application.
If you cannot reach a remote host using ping, always ping it again with the source IP address
specified.
On the PIM application, specify the source IP address, the vlan-id (it must be the VLAN of the
source IP address) and the vlan-priority (usually 0).
Example: ping 10.11.13.254 0 0 10.11.12.130
Note:
The checkpoints below all use this ping command as the example; adjust the destination
and source IP addresses as required.
5. Check IP filter configuration and statistics on SCM application.
view ip filter configuration

view ip filter all
view ip filter statistics
ping 10.11.13.254 0 0 10.11.12.130
view ip filter cache
view ip filter statistics
6. Check the IP Flooding Statistics on PIM.
view ip flooding protocol threshold all

view ip flooding protocol statistics all
ping 10.11.13.254 0 0 10.11.12.130
view ip flooding protocol statistics all
7. Check ARP table, ARP statistics and icmp statistics on PIM.
view arp table

view arp statistics
view icmp statistics
ping 10.11.13.254 0 0 10.11.12.130
view arp table
view arp statistics

view icmp statistics
8. Get NPU io trace on PIM.
define npu io start <ge-port> * * * *
Note:
<ge-port> is the zero-based index of the GE port (0 to 7). For example, specify 0
to run the NPU IO trace on the first GE port, en.pim1.1.
ping 10.11.13.254 0 0 10.11.12.130

define npu io stop *
view npu io file <filename>
Example to do npu io trace on en.pim1.8
My-Chassis:rem-cons:ACT-PIM:1.19(r0)>=2:diag:main# define npu io start 7 * * * *
My-Chassis:rem-cons:ACT-PIM:1.19(r0)>=4:diag:main# ping 10.11.13.254 0 0 10.11.12.130
Using IP Interface 10.11.12.130, vlan_id 0
Pinging 10.11.13.254... Hit Ctrl-c followed by Enter to abort.
Response from 10.11.13.254(vlan_id=0, vlan_prio=0): seq=0 time=2.943 ms.
My-Chassis:rem-cons:ACT-PIM:1.19(r0)>=5:diag:main# define npu io stop *
My-Chassis:rem-cons:ACT-PIM:1.19(r0)>=6:diag:main# view npu io file npu19.txt
My-Chassis:rem-cons:ACT-PIM:1.19(r0)>=7:diag:main#
start dumping trace to file...
trace dump to file finished.
9. If possible, ping again while capturing the traffic with Wireshark on the external switch using port mirroring.
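Steps 5 to 7 above follow the same pattern: take a statistics snapshot, run the ping, and take the snapshot again. The comparison can be sketched as below. This is a minimal illustration only: it assumes the snapshots have been saved as text and that counters appear as "name: value" lines, which may not match the actual CLI output, and the counter names in the sample are hypothetical.

```python
import re

def parse_counters(text):
    """Parse 'name: value' or 'name = value' lines into a dict of ints.
    The line format is an assumption, not the documented CLI output."""
    counters = {}
    for line in text.splitlines():
        m = re.match(r"^\s*(.+?)\s*[:=]\s*(\d+)\s*$", line)
        if m:
            counters[m.group(1)] = int(m.group(2))
    return counters

def changed_counters(before, after):
    """Return {name: delta} for every counter that changed between snapshots."""
    return {
        name: after[name] - before.get(name, 0)
        for name in after
        if after[name] != before.get(name, 0)
    }

# Hypothetical snapshots taken before and after the ping.
before = parse_counters("echo requests sent: 10\necho replies received: 10\n")
after = parse_counters("echo requests sent: 11\necho replies received: 10\n")
print(changed_counters(before, after))
```

A counter that moves for requests but not for replies (or a filter or flooding drop counter that increases) points at the stage where the ping is lost.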
7.3.2 Scenario-2
2. Get basic ip and route configuration on SCM application.

view ip if
view ipv6 route
3. Get the port state on PIM application.
view port state all

view port state <port>
Example: view port state en.pim1.8
4. Ping with source ip address on PIM application.
If you cannot reach a remote host using ping, always ping it again with the source IP address
specified.
On the PIM application, specify the source IP address, the vlan-id (it must be the VLAN of the
source IP address) and the vlan-priority (usually 0).
Example: # ping6 3004::1 3004::152
Note:
The checkpoints below all use this ping command as the example; adjust the destination
and source IP addresses as required.
5. Check IP filter configuration and statistics on SCM application.
view ipv6 filtering state

view ipv6 filter rule all
view ipv6 filter statistics all
ping6 3004::1 3004::152
view ipv6 filter statistics all
6. Check the IP Flooding Statistics on PIM application.
view flooding threshold all

view flooding statistics all
ping6 3004::1 3004::152
view flooding statistics all
7. Check NDP table, icmp statistics on PIM application.
view ndp table

view icmp statistics v6

ping6 3004::1 3004::152

view ndp table
view icmp statistics v6
8. Get NPU io trace on PIM.
define npu io start <ge-port> * * * *
Note:
<ge-port> is the zero-based index of the GE port (0 to 7). For example, specify 0
to run the NPU IO trace on the first GE port, en.pim1.1.
ping6 3004::1 3004::152

define npu io stop *
view npu io file <filename>
Example to do npu io trace on en.pim1.8
My-Chassis:rem-cons:ACT-PIM:1.19(r0)>=2:diag:main# define npu io start 7 * * * *
My-Chassis:rem-cons:ACT-PIM:1.19(r0)>=4:diag:main# ping6 3004::1 3004::152
Pinging 3004::1 with 1024 bytes of data:
... Hit Ctrl-c followed by Enter to abort.
Reply from 3004::1 bytes=1024 time=292ms hlim=64
My-Chassis:rem-cons:ACT-PIM:1.19(r0)>=5:diag:main# define npu io stop *
My-Chassis:rem-cons:ACT-PIM:1.19(r0)>=6:diag:main# view npu io file npu19.txt
My-Chassis:rem-cons:ACT-PIM:1.19(r0)>=7:diag:main#
start dumping trace to file...
trace dump to file finished.
9. Check Ethernet statistics on SCM.
view ethernet statistics current <port> detailed
10. If possible, ping again while capturing the traffic with Wireshark on the external switch using port mirroring.
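The NPU IO trace step in both scenarios takes a zero-based GE port index rather than the port name (en.pim1.1 maps to 0, en.pim1.8 maps to 7). The mapping can be sketched as below. The helper names are hypothetical, and the sketch assumes port names always follow the en.pim<N>.<port> pattern seen in the examples.

```python
import re

def ge_port_index(port_name):
    """Map a port name such as 'en.pim1.8' to the zero-based GE index (0 to 7)."""
    m = re.fullmatch(r"en\.pim\d+\.(\d+)", port_name)
    if not m:
        raise ValueError(f"unexpected port name: {port_name!r}")
    number = int(m.group(1))
    if not 1 <= number <= 8:
        raise ValueError(f"GE port number out of range: {number}")
    return number - 1

def npu_io_start_command(port_name):
    """Build the trace start command with all other filters left as wildcards."""
    return f"define npu io start {ge_port_index(port_name)} * * * *"

print(npu_io_start_command("en.pim1.8"))
```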

7.4 Troubleshooting the application crash problem

2. Download the coredump files of related application.
The coredump files are stored in: /export/home/lss/logs/*.BZ2
3. Collect the output of following commands:
view node
view redundancy group
view version
view date
view alarm active
view alarm history
view overload status
view resource usage
view performance statistics current mcp
view performance statistics history mcp
diag view rmgr statistics
view h248 errlog sum
view h248 errlog history
7.5 Troubleshooting the H248 transaction failure problem

1. Get the H248 connection status and configuration.
define vmg scope <vmg_id>

view mgw config
view h248 termination root long
diag view trillium status
diag view trillium mmgr
diag view h248 svc_flags
diag view h248 inac-timer info
2. Get the system performance status:
view overload status
diag view trillium status

view mmgr pool
view mmgr usage

3. Get the H248 transaction status and log:
view h248 errlog history

view h248 history e500
view h248 lib status
view h248 statistics sum all-vmg
view h248 statistics current all-vmg
view h248 statistics current coaps
view h248 statistics history coaps
view h248 statistics current command
view h248 statistics current error-code
view h248 statistics history error-code
view h248 statistics current transport
view h248 statistics history transport
diag view h248 swup status
Note:
Also collect the SYSLOG files: /export/home/lss/logs/media*.log
4. Get the H248 message trace:
diag view h248 activeMsg snapshot

define ip capture start udp <port>[{any|v4|v6} ][any|<addr>][any|<port>][{in|out|both}] [<count>]
define ip capture stop
Note:
The H248 message file captured is placed under the directory: /export/home/lss/logs/
Examples:
define ip capture start udp en.scm.1

define ip capture stop
Note:
The captured log file is named "EN_10_1.CAP" and placed under the directory:
/export/home/lss/logs/

7.6 Troubleshooting backplane connection issues
7.6.1 Troubleshooting media backplane connection down issue
Description
The Backplane connection down alarm is displayed in NetAct (or can be viewed as output of view alarm
active CLI command).
Solution
This problem occurs when the SR-IOV interface is not attached to the VM. This can be verified as follows:
1. Run the following command to get list of VMs on the compute:
virsh list
2. Run the following command to find VM name:
virsh dumpxml <instance id> |grep name
3. Run the following command to find MAC addresses for all vNICs assigned to the VM:
virsh dumpxml <instance id> |grep mac
4. Run the following command to get a list of all active SR-IOV VFs on the compute and to look for the VLAN
values and MAC addresses for a given VM:
ip link ls |grep vlan
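The comparison in steps 3 and 4 (VM MAC addresses against the active SR-IOV VF list) can be sketched as below. This is a rough illustration that assumes the two command outputs have been saved as text. The sample lines are illustrative only, and a MAC reported as missing is just a candidate, because not every vNIC of a VM is necessarily an SR-IOV interface.

```python
import re

MAC_RE = re.compile(r"(?:[0-9a-f]{2}:){5}[0-9a-f]{2}", re.IGNORECASE)

def macs_in(text):
    """Return the set of colon-separated MAC addresses found in text (lowercased)."""
    return {m.lower() for m in MAC_RE.findall(text)}

def macs_missing_from_vfs(dumpxml_mac_output, ip_link_vlan_output):
    """MACs assigned to the VM that do not appear in the VF listing."""
    return sorted(macs_in(dumpxml_mac_output) - macs_in(ip_link_vlan_output))

# Illustrative sample data, not taken from a real system.
dumpxml_mac_output = """<mac address='fa:16:3e:aa:bb:01'/>
<mac address='fa:16:3e:aa:bb:02'/>"""
ip_link_vlan_output = "    vf 3 MAC FA:16:3E:AA:BB:01, vlan 120, spoof checking on"

print(macs_missing_from_vfs(dumpxml_mac_output, ip_link_vlan_output))
```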
This problem cannot be resolved by rebooting the VMs from the SBC side. It can only be resolved by a hard reboot
from the CBIS side, using the following commands:
1. Run following command to recover VMs after fixing SR-IOV state:
2. Run following command to clear ERROR state (and then perform a hard reboot again):
nova reset-state --active <vm uuid>

7.6.2 Checking backplane connection status

Use the following command to check the backplane connection status, including working link selection on a
specific VM.
diag view backplane link
The command provides the link status of different destinations.
For example,
My-Chassis:ACT-PIM:1.17(r0)>=19:diag:main#view backplane link

own peer lbi: pim1; own slot:17
-----------------------------------------------
slot lbi port0 port1 link_selection
-----------------------------------------------
5 mcm1 down down invalid
10 scm up down port0
19 pim2 up down port0
For more information about the backplane connection, see the SBC Media Plane Cloud Alarm Report Guide. Also see
Syslog Commands in the SBC Media Plane Cloud Commands Reference Guide to obtain the required logs.
For more information about saving the DPC (Data Path Check) log entries, see the topic Defining the Data Path
Check Savemode in the SBC Media Plane Cloud Commands Reference Guide. The DPC file contains a readme
that further describes how to interpret the DPC files.
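When the output of diag view backplane link is collected from several VMs, the table can be scanned automatically for destinations that have no working link. The following is a minimal sketch that assumes the column layout of the example in this section (slot, lbi, port0, port1, link_selection); the function names are hypothetical.

```python
def parse_backplane_links(output):
    """Parse the rows of the 'view backplane link' table into dicts."""
    rows = []
    for line in output.splitlines():
        fields = line.split()
        # Data rows have exactly five fields and start with the slot number.
        if len(fields) == 5 and fields[0].isdigit():
            slot, lbi, port0, port1, selection = fields
            rows.append({"slot": int(slot), "lbi": lbi, "port0": port0,
                         "port1": port1, "link_selection": selection})
    return rows

def unreachable_destinations(rows):
    """Destinations for which neither backplane port is up."""
    return [r["lbi"] for r in rows if r["port0"] != "up" and r["port1"] != "up"]

sample = """own peer lbi: pim1; own slot:17
-----------------------------------------------
slot lbi port0 port1 link_selection
-----------------------------------------------
5 mcm1 down down invalid
10 scm up down port0
19 pim2 up down port0
"""

rows = parse_backplane_links(sample)
print(unreachable_destinations(rows))
```

For the sample above, only mcm1 has both ports down, which matches its invalid link selection.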
7.6.3 Checking TIPC connection status

Use the following command to check the TIPC connection status on a specific VM.
diag view tipc link
For example,
My-Chassis:ACT-SCM:1.10(r0)>=4:diag:diag:vMGx# view tipc link

own tipc id:<1.1.10>
------------------------------------------------------
slot tipcid eth0 eth1
------------------------------------------------------
5 <1.1.25> up up
17 <1.1.17> up up
The state must be consistent with the output of the diag view backplane link command. Use it to check the
backplane connection status while debugging, and to cross-verify whether the defect is in the software.

7.6.4 Checking backplane connection switchover history

Use the following command to check the history of all the backplane connection switchover events.
diag view backplane history
For example,
My-Chassis:ACT-PIM:1.17(r0)>=20:diag:main# view backplane history
Backplane Link switched - history

[ 0]: JUL 13 07:19:36:314778 backplane pim1 to scm switched to (port0) rc=0
[ 1]: JUL 13 07:19:36:314806 backplane pim1 to pim1 switched to (port0) rc=0
(port0) rc=0
[ 3]: JUL 13 07:20:44:783768 backplane pim1 to mcm2 switched to (port0) rc=0
(port0) rc=0
(port1 ) rc=0
[ 6]: JUL 13 07:22:43:920611 backplane pim1 to scm switched to (port1 ) rc=0
(port1 ) rc=0
(invalid) rc=0
(port0) rc=0
(invalid) rc=0
[ 11]: JUL 13 07:44:51:487406 backplane pim1 to scm switched to (port0) rc=0
This command along with the active alarm or alarm history helps to debug backplane issues.
7.6.5 Checking MAC table used in datapath

Check the MAC table on SCM and on all vMCM and vPIM applications using the UI command diag vi rmgr mac-table.
The tables should be consistent, because these MAC addresses are used in the datapath.
For example,
COTS:rem-cons:ACT-PIM:1.18(r0)>=38:diag:main# d vi rmgr mac

NPU MAC table : (*) active link port
IDX SLOT GE_ID LINK MAC

-------------------------------------------------
1 17 0 0 e2-68-02-04-00-ae
2 17 0 1 e2-68-02-04-00-af
3 17 1 0 e2-68-02-04-00-ae
4 17 1 1 e2-68-02-04-00-af
5 17 2 0 e2-68-02-04-00-ae
6 17 2 1 e2-68-02-04-00-af
7 17 3 0 e2-68-02-04-00-ae
8 17 3 1 e2-68-02-04-00-af
9 17 4 0 e2-68-02-04-00-ae
10 17 4 1 e2-68-02-04-00-af
11 17 6 0 e2-68-02-04-00-ae
12 17 6 1 e2-68-02-04-00-af
13 18 0 0 e2-68-02-04-00-ce
14 18 0 1 e2-68-02-04-00-cf
15 18 1 0 e2-68-02-04-00-ce
16 18 1 1 e2-68-02-04-00-cf
17 18 2 0 e2-68-02-04-00-ce
18 18 2 1 e2-68-02-04-00-cf
19 18 3 0 e2-68-02-04-00-ce
20 18 3 1 e2-68-02-04-00-cf
21 18 4 0 e2-68-02-04-00-ce
22 18 4 1 e2-68-02-04-00-cf
23 18 6 0 e2-68-02-04-00-ce
24 18 6 1 e2-68-02-04-00-cf
25 19 0 0 e2-68-02-04-00-ee
26 19 0 1 e2-68-02-04-00-ef
27 19 2 0 e2-68-02-04-00-ee
28 19 2 1 e2-68-02-04-00-ef
MCM MAC table :
IDX SLOT CORE LINK MAC

-------------------------------------------------
1 5 0 0 e2-68-02-04-01-4e
2 5 0 1 e2-68-02-04-01-4f
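The consistency check between the MAC tables of two VMs can be sketched as below. It assumes the output of the command has been saved as text from each VM and follows the layout shown in the example above (a table title line followed by IDX, SLOT, GE_ID or CORE, LINK, MAC rows). The function names and the shortened sample tables are illustrative only.

```python
import re

ROW_RE = re.compile(
    r"^\s*\d+\s+(\d+)\s+(\d+)\s+(\d+)\s+((?:[0-9a-f]{2}-){5}[0-9a-f]{2})\s*$",
    re.IGNORECASE,
)

def parse_mac_table(output):
    """Return a set of (table, slot, unit, link, mac) tuples; IDX is ignored."""
    entries = set()
    table = "?"
    for line in output.splitlines():
        if "MAC table" in line:
            table = line.split()[0]  # "NPU" or "MCM"
            continue
        m = ROW_RE.match(line)
        if m:
            slot, unit, link, mac = m.groups()
            entries.add((table, int(slot), int(unit), int(link), mac.lower()))
    return entries

def mac_table_diff(a, b):
    """Entries present only in a, and entries present only in b."""
    return sorted(a - b), sorted(b - a)

# Shortened, illustrative tables from two VMs; the second differs in one MAC.
scm_out = """NPU MAC table : (*) active link port
1 17 0 0 e2-68-02-04-00-ae
2 17 0 1 e2-68-02-04-00-af
MCM MAC table :
1 5 0 0 e2-68-02-04-01-4e
"""
pim_out = scm_out.replace("e2-68-02-04-00-af", "e2-68-02-04-00-aa")

only_scm, only_pim = mac_table_diff(parse_mac_table(scm_out), parse_mac_table(pim_out))
print(only_scm, only_pim)
```

Two empty lists mean the tables agree; any entry in either list identifies the slot and link whose MAC differs between the VMs.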
7.6.6 Checking NPU MAC table

Check MAC table in NPU using the command diag view npu mac udp on vPIM.

For example,
COTS:rem-cons:ACT-PIM:1.18(r0)>=37:diag:main# d vi npu mac u

dlnpuVudp_mac: L2 table for Ingress (1024 entries)
ptr_L2 Nexthop MAC used-counter

[1023] ff-ff-ff-ff-ff-ff 5
[1022] e2-68-02-04-00-ce 14
[1021] e2-68-02-04-01-4e 4
[1020] e2-68-02-04-00-ee 6
dlnpuVudp_mac: L2 table for Ingress (4 used
7.7 Troubleshooting media total network failure issue
About this task

This task applies to the RMS platform of SBC media.
Normally, when there is a Voice Port failure on both PIMs on RMS, media RMS is not able to process calls.
But when the links are restored, the RMS system gets back to normal. However, in some cases even when all
the other connections are restored, PIMs do not recover automatically and manual action has to be taken to
recover the PIMs. The SBCSST-468/SBCCGART-260 feature is implemented to address this issue.
With the implementation of this feature from Release 20.5, after the connections are restored, the RMS
system will recover automatically and start to process calls. Depending on the failure sequences (link up/down
versus STB/ACT PIM), during system recovery, there may be PIM reboots or longer synchronization time than
in normal case. Once the links are up for at least one PIM and PIM redundancy group is at A-Work state, traffic
should be back to normal without any manual intervention.
You can use the following commands to check current status of the RMS system:
• To check the current state of the applications, use the view nodes command.
• To check the current state of the redundancy groups, use the view redundancy groups command.
When you check the state of the redundancy groups, if the redundancy group for PIM is A-Bulk, wait for it
to change to the A-Work state. When the redundancy group for PIM is A-Work and the links are up, the RMS
system should have recovered. If traffic is still not processed, do the following:
For both IPv4 and IPv6 interfaces:
1. Check the port state by entering:

view port state all
2. Check the IP realm state by entering:

view ip realm system config

3. Check the interface state by entering:

view ip if
If the IPv6 interfaces are up but the tentative flag is set, you can clear the flag manually.
Use the diag define ipv6 netif reset all false command on PIM to clear the
tentative flag for all links that are up.
For IPv4 interfaces only:
4. Check the VLAN state on all the PIMs by entering:

rexec <gateway> <appl> linux enter vlan
For example:
rexec 1 19 linux enter vlan
Expected outcome
If VLANs are defined on the RMS system, they should be listed in the command output. If VLANs are not
listed in the command output, you can trigger the voice port initialization manually.
5. To trigger the voice port initialization manually, enter the following command on each PIM where GE port
is down:
diag linux ip init voice-port <GE>
7.8 Troubleshooting mute call issues
About this task

The mute call issue occurs when signaling sets up the call successfully, but the voice is
inaudible at one end or both ends of the call.
The root cause of the problem is that the RTP packets for the call are not arriving at the SBC media plane
interface, or packets are arriving but are dropped or discarded by the SBC media plane.
• Packet loss: The RTP packets for the call do not reach the SBC media plane interface. The packets can
get lost anywhere in the network before they reach the SBC media plane, so the problem is outside
the media plane.
• Packet drop: Invalid packets are dropped. Something is wrong with the packet itself, which can be
caused by any kind of packet processing error: transmission errors, where a packet is damaged on its
way to its destination; format errors, where the format of the packet is not what the receiving device
expects; or property errors, where the packet has incorrect properties (for example, wrong MAC
addresses). Packet drops can also be caused by problems with the SBC media plane interface
(for example, the DPDK interface is not working).
• Packet discard: The packet itself appears to be correct but is still discarded because of some
property, such as a violation of SDR/PDR limits.

To solve the problem, you need to identify where the fault occurs first. For SBC media plane, the mute call issue
comes down to two different situations:
• The packets get lost on their way to the SBC media plane.
In this case, the packets may have been damaged on their way through the network before they arrive at
SBC media plane.
• The packets are dropped or discarded by SBC media plane.
This mainly shows up as the SBC media plane sending fewer packets than it receives.
Make a test call, and then do the following to get more information to identify where the fault occurs:
1. Ensure that there is no connectivity issue and all basic health checks of the system are performed.
2. Get the call trace using one of the following methods:
• Use the tcpdump tool.
For example:
tcpdump -i eth0 -w test.pcap
• Use the NPU IO trace feature on the active PIM application.
For more information, see Troubleshooting the media path problem and
Troubleshooting the networking problem.
3. Log in to SBC media plane as the diag user.

4. Collect the current and history statistics on SCM application.
The statistics can be collected per interface, per realm or on global level. Use the command which is
appropriate for your case.
• view traffic statistics current global
• view traffic statistics current interface <port>
• view traffic statistics current realm <realm-name>
• view traffic statistics history global
• view traffic statistics history interface <port> [<start-interval>][<end-interval>]
• view traffic statistics history realm <realm-name> [<start-interval>][<end-interval>]
For example:
view traffic statistics current interface en.pim1.1
view traffic statistics current realm AccessRealm
view traffic statistics history interface en.pim1.1 1 2
view traffic statistics history realm AccessRealm 1 2
Expected outcome
The following is a sample fragment of the output of the view traffic statistics current global
command after making calls with netem simulation from outside. Check the Discarded
Packets, RTP packet loss number avg, and RTP packet loss number max parameters to
identify whether the issue is with the SBC media plane. If the discarded packets are counted under pdr
rate violations or sdr rate violations, the problem is in the configuration on the signaling side.
Current Global Traffic PV data :

VNF Name: cb0027_2-mgw1v
(Elapsed time: 0:9:45)

--------------------- Locally measured accumulated values ----------------
Received octets(nt/or): | 4316580(NPU)
RTP received packets: | 23981(NPU)
Sent octets(nt/os): | 4316580(NPU)
RTP sent packets: | 23981(NPU)
Discarded Packets: | 0(NPU)
- Traffic management: pdr rate violations | 0(NPU)
- Traffic management: sdr rate violations | 0(NPU)
- Traffic management: Packet size violations | 0(NPU)
- RTP Payload type filtering | 0(MPU)
- Source address filtering | 0(NPU)
MSRP TLS connections attempts: | 0(NPU)
MSRP TLS connections successes: | 0(NPU)
MSRP TLS connections handshake failures: | 0(NPU)
--- Locally measured minimum, maximum and average values per interval ----

Duration (nt/dur) (ms): avg | 23343(NPU)

Duration (nt/dur) (ms): max | 25018(NPU)
Duration (nt/dur) (ms): min | 20039(NPU)
RTCP round trip delay(rtp/delay)(ms): avg | 0.000(NPU)
RTCP round trip delay(rtp/delay (ms): max | 0.000(NPU)
RTP packet loss number avg: | 108(NPU)
RTP packet loss number max: | 129(NPU)
RTP packet loss rate(rtp/pl) (%) avg: | 9.826(NPU)
RTP packet loss rate(rtp/pl) (%) max: | 11.652(NPU)
RTP interarrival jitter(rtp/jit) (ms) avg: | 0.003(NPU)
RTP interarrival jitter(rtp/jit) (ms) max: | 0.375(NPU)
------------ Locally measured jitter values for each level (ms) ----------
Jitter Range j (ms) | Nbr Terminations | % of all Terminations
0 <= j < 1 | 24 | 100.000
1 <= j < 2 | 0 | 0.000
2 <= j < 3 | 0 | 0.000
3 <= j < 4 | 0 | 0.000
4 <= j < 5 | 0 | 0.000
5 <= j < 6 | 0 | 0.000
6 <= j < 7 | 0 | 0.000
7 <= j < 8 | 0 | 0.000
8 <= j < 9 | 0 | 0.000
9 <= j < 10 | 0 | 0.000
10 <= j | 0 | 0.000
Unknown | 0 | 0.000
Nbr Lost Pkts per Termination| Nbr Terminations | % of all Terminations
70-79: | 2 | 8.333
80-89: | 3 | 12.500
>=90: | 19 | 79.167
…
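The check described above can be sketched as a small parser over the saved command output. It assumes the "name | value(UNIT)" line layout of the sample fragment. The second hint in triage() is a heuristic interpretation of this section, not a documented rule, and the function names are hypothetical.

```python
import re

LINE_RE = re.compile(r"^(.*?):?\s*\|\s*([\d.]+)\((\w+)\)\s*$")

def parse_traffic_stats(output):
    """Return {parameter name: numeric value} for 'name | value(UNIT)' lines."""
    stats = {}
    for line in output.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            name = m.group(1).lstrip("- ").strip()
            stats[name] = float(m.group(2))
    return stats

def triage(stats):
    """Return hints based on the parameters named in this step."""
    hints = []
    if (stats.get("Traffic management: pdr rate violations", 0)
            or stats.get("Traffic management: sdr rate violations", 0)):
        hints.append("pdr/sdr rate violations: check the configuration on the signaling side")
    # Heuristic (assumption): RTP loss with no local discards suggests that
    # packets may be lost before they reach the SBC media plane.
    if stats.get("Discarded Packets", 0) == 0 and stats.get("RTP packet loss number avg", 0) > 0:
        hints.append("RTP loss with no local discards: packets may be lost outside the media plane")
    return hints

sample = """RTP received packets: | 23981(NPU)
Discarded Packets: | 0(NPU)
- Traffic management: pdr rate violations | 0(NPU)
- Traffic management: sdr rate violations | 0(NPU)
RTP packet loss number avg: | 108(NPU)
RTP packet loss number max: | 129(NPU)
0 <= j < 1 | 24 | 100.000
"""

stats = parse_traffic_stats(sample)
for hint in triage(stats):
    print(hint)
```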
5. Packets can be dropped for several reasons. A major one is malformed packets, which are dropped
in an application or in the NPU-DPDK. If most of the packets of one stream are dropped, they are
not delivered to the end user, which causes a one-sided mute call. Collect the
NPU datapath statistics on the PIM application using the following commands:
• diag run npu e
• diag run npu DpdkEthStat
• diag run npu ShowDpdkEthExtStat 0
• diag run npu ShowDpdkEthExtStat 1
• diag run npu dpdkringstat
• diag run npu NPUIngressCounterSum
• diag run npu NPUEgressCounterSum

• diag run npu DspFwdPortStatisticsShow
By checking the output of these commands, you can verify the cause of possible packet drop or discard.
The following example displays the output of the diag run npu DpdkEthStat command. The packets
are checked by trusted PIM and are dropped later. The illegal packet information is taken from the
statistics, which give the error descriptions and the number of packets that are dropped. In this
example, the dropped packets on port 1 are for core and those on port 2 are for access; however,
the port assignment is configurable.
-----------------------------------illegal pkts detected by appl.(Tx)--------------------------------------------------
Tx301:content of seg < real packet size.
Tx401:real packet size < content of seg.
Tx501:datalen = 0.
Tx601:pktlen != data_len_1 + data_len_2 + ... + data_len_x.
-----------------------------------illegal pkts detected by dpdk(Tx)---------------------------------------------------
Tx101:large packet len > 9728.
Tx102:large packet len > 262114.
Tx201:small packet len < 17.
Tx701:number of the segment > 8.
Tx702:number of the segment > UINT8_MAX.
Tx703:tso_segsz < 256.
Tx704:tso_segsz > 9674.
Tx801:tx offload not supported.
Tx802:invalid tx offload.
Tx901:invalid checksum.
** vCPU tx_prep_drop(appl) Tx301 Tx401 Tx501 Tx601

***vCPU tx_prep_drop(dpdk) Tx101 Tx102 Tx201 Tx701 Tx702 Tx703 Tx704 Tx801 Tx802 Tx901
8 11 11 0 0 0
***8 5 5 0 0 0 0 0 0 0 0 0
9 7 7 0 0 0
***9 2 2 0 0 0 0 0 0 0 0 0
10 3 3 0 0 0
Port Name tx_prep_drop(appl) tx_prep_drop(dpdk)
2 eth3 190 33
vCPU tx_prep_drop(appl) Tx301 Tx401 Tx501 Tx601
***vCPU tx_prep_drop(dpdk) Tx101 Tx102 Tx201 Tx701 Tx702 Tx703 Tx704 Tx801 Tx802 Tx901
8 19 19 0 0 0
***8 3 3 0 0 0 0 0 0 0 0 0
9 17 17 0 0 0

***9 3 3 0 0 0 0 0 0 0 0 0
10 13 13 0 0 0
14 15 15 0 0 0
6. Collect all of the packet captures and check the Subtract and summary output information.
Note: If you need to inspect the Subtract output from H.248, run a MEGACO capture on SCM.
The following example shows some H.248 MEGACO Subtract reply information:
C=23{S=ip/1/1/45{SA{nt/dur=20045,nt/os=164340,nt/or=150300,rtp/ps=913,rtp/
pr=835,
rtp/pl=8.461538,rtp/jit=1,rtp/delay=0,rtp/dur=20045,rtp/os=164340,rtp/
or=150300,tmanr/
dp=0}},S=ip/1/0/46{SA{nt/dur=20043,nt/os=150300,nt/or=164340,rtp/ps=835,rtp/
pr=913,
or=164340,tmanr/
dp=0}}}}
7. If needed, contact Nokia support team with the information collected to help solve the problem.
7.9 Troubleshooting absence of Service Change information in signaling log
Details
The Nokia SBC signaling log records service change details only for a specific termination and context.
It does not record details when the service change applies to all contexts or to the null context.
Workaround
Media plane service change event details are pegged in the corresponding PM counter (VS.mgcForcedServChg).
Refer to the PM counter data for information about service change events for all contexts or the null context.
7.10 Troubleshooting high CPU usage issue
Details
If high CPU usage is detected on MCM, SCM, or Packet Interface Module (PIM) (for example, the ‘MCM CPU
Overload’ media alarm is raised), use the cpu_usage.sh tool described below to collect the data required
for troubleshooting.
Run the cpu_usage.sh tool and forward the collected data to the appropriate Nokia
SBC support team to help investigate the high CPU usage issue.

• The cpu_usage.sh tool collects data that can help in finding the root cause of the high CPU usage.
• The cpu_usage.sh tool is available for use on MCM, SCM, or PIM.
• The cpu_usage.sh tool invokes the sar, top, ps, and perf record commands, and saves their output
in the files sar.txt, top.txt (no threads-mode), topH.txt (threads-mode; top run with the -H parameter),
ps.txt, and perf.data respectively, in a folder named with a timestamp (for example,
/opt/v7510/data/cpu_usage/cpu_usage-yyyy-Mon-dd_hh-mm-ss/).
Ways of running cpu_usage.sh tool

The cpu_usage.sh tool can be run in one of the two ways listed below:
• Run in foreground: The cpu_usage.sh tool immediately collects data for the component on which the
command is run (MCM, SCM, or PIM).
• Run in background: The cpu_usage.sh tool monitors the alarms for the component on which the
command is run (MCM, SCM, or PIM), and collects data when an alarm is active.
Running cpu_usage.sh tool in foreground

The cpu_usage.sh tool, when run in the foreground mode, immediately collects data for the component on
which the command is run (MCM, SCM, or PIM).
The cpu_usage.sh tool, can be run in the foreground mode as shown below:
/opt/v7510/bin/cpu_usage.sh
Additional parameters that can be used are:
• --run_perf_record
To collect additional data using perf record command. The perf record command is a performance
analysis tool for Linux.
• --perf_time <perf_time>
To specify for how long perf record data is collected.
Range: 10 to 600 seconds.
Default value: 300 seconds.
• --process <process_name>
To specify for what process perf record data is collected (can select one of the processes from: mcm,
scm, pim, nodemgr, mpu, or npu-dpdk).
Note: For more information about the cpu_usage.sh tool's parameter options, see the help by
running:
/opt/v7510/bin/cpu_usage.sh --help
Examples:

• /opt/v7510/bin/cpu_usage.sh
Runs sar, top, ps commands for 60 seconds, and saves their outputs in files.
• /opt/v7510/bin/cpu_usage.sh --run_perf_record
Runs sar, top, ps commands. Additionally runs perf record command for 5 minutes for the most CPU
consuming process from the list (mcm, mpu, scm, pim, nodemgr, or npu-dpdk).
• /opt/v7510/bin/cpu_usage.sh --run_perf_record --process mpu
Runs sar, top, ps commands. Additionally runs perf record command for 5 minutes for the mpu
process.
• /opt/v7510/bin/cpu_usage.sh --run_perf_record --perf_time 120
Runs sar, top, ps commands. Additionally runs perf record command for 2 minutes for the most CPU
consuming process from the list (mcm, mpu, scm, pim, nodemgr, or npu-dpdk).
• /opt/v7510/bin/cpu_usage.sh --run_perf_record --process mcm --perf_time 180
Runs sar, top, ps commands. Additionally runs perf record command for 3 minutes for the mcm
process.
Output example:
[root@cb0334i1-mgw1vm002 ~]# /opt/v7510/bin/cpu_usage.sh --run_perf_record --perf_time 120
Tool is collecting performance data.
Data will be available in /opt/v7510/data/cpu_usage/cpu_usage-2021-Mar-18_10-58-14
Tool will be running for 120 seconds.
Please wait . . .
Data collected successfully.
[root@cb0334i1-mgw1vm002 ~]# ll /opt/v7510/data/cpu_usage/cpu_usage-2021-Mar-18_10-58-14/
total 6548
-rw-------. 1 root root 2084152 Mar 18 11:00 perf.data
-rw-r-----. 1 root root 7391 Mar 18 10:58 ps.txt
-rw-r-----. 1 root root 29426 Mar 18 10:59 sar.txt
-rw-r-----. 1 root root 2805590 Mar 18 10:59 topH.txt
-rw-r-----. 1 root root 1771773 Mar 18 10:59 top.txt
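If the foreground invocation is scripted, the documented parameter limits can be checked before the tool is started. The following is a minimal sketch: the helper name is hypothetical, it only builds the argument list, and it relies solely on the options, range (10 to 600 seconds) and process names listed in this section.

```python
ALLOWED_PROCESSES = {"mcm", "scm", "pim", "nodemgr", "mpu", "npu-dpdk"}

def build_foreground_command(run_perf_record=False, process=None, perf_time=None):
    """Build the cpu_usage.sh argument list, validating the documented limits."""
    cmd = ["/opt/v7510/bin/cpu_usage.sh"]
    if run_perf_record:
        cmd.append("--run_perf_record")
    if process is not None:
        if process not in ALLOWED_PROCESSES:
            raise ValueError(f"unsupported process: {process!r}")
        cmd += ["--process", process]
    if perf_time is not None:
        if not 10 <= perf_time <= 600:
            raise ValueError("perf_time must be between 10 and 600 seconds")
        cmd += ["--perf_time", str(perf_time)]
    return cmd

print(" ".join(build_foreground_command(run_perf_record=True, process="mcm", perf_time=180)))
```

The returned list can be passed to subprocess.run() on the VM where the tool is installed.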
Running cpu_usage.sh tool in background

The cpu_usage.sh tool, when run in the background, monitors the alarms for the component on which the
command is run (MCM, SCM, or PIM), and collects data when an alarm is active.
To run the cpu_usage.sh tool in background mode, use --background option while running the tool, as
shown below:
/opt/v7510/bin/cpu_usage.sh --background --scm_vip a.b.c.d

Additional parameters that can be used are:
• --key_path <key_path>
To run the cpu_usage.sh tool on MCM or PIM, additionally use --key_path option, and provide full
path to SCM SSH key.
• --time <time>
To specify for how long to monitor alarms.
Range: 1 to 60 minutes.
Default value: 5 minutes.
• --interval <interval>
To specify how often to check alarms.
Range: 1 to 900 seconds.
Default value: 60 seconds.
• --alarm_name <alarm_name>
To specify which alarm should trigger data collection. Provide a string (which can be a substring of the
specific problem description) as <alarm_name>.
• --stop
To stop the cpu_usage.sh tool which is running in the background mode.
Note: For more information about the cpu_usage.sh tool's parameter options, see the help by
running:
/opt/v7510/bin/cpu_usage.sh --help
Examples:
• /opt/v7510/bin/cpu_usage.sh --background
To monitor the alarms for the component on which the cpu_usage.sh tool is run (for example, MCM).
Checks for alarms every 60 seconds, for 5 minutes, and collects data when an alarm is raised.
• /opt/v7510/bin/cpu_usage.sh --background --scm_vip a.b.c.d
To run the cpu_usage.sh tool on SCM.
• /opt/v7510/bin/cpu_usage.sh --background --scm_vip a.b.c.d --key_path /home/cloud-user/cbam_media.pem
To run the cpu_usage.sh tool on MCM or PIM.
• /opt/v7510/bin/cpu_usage.sh --background --time 10
To check for alarms every 60 seconds, for 10 minutes, and collect data when an alarm is raised.
• /opt/v7510/bin/cpu_usage.sh --background --interval 30

• /opt/v7510/bin/cpu_usage.sh --background --time 2 --interval 5
• /opt/v7510/bin/cpu_usage.sh --background --alarm_name "MCM CPU Overload"
To monitor alarms for the component on which the cpu_usage.sh tool is run (every 60 seconds, for 5
minutes) and collect data when the MCM CPU Overload alarm is raised.
• /opt/v7510/bin/cpu_usage.sh --background --alarm_name "CPU"
To monitor alarms for the component on which the cpu_usage.sh tool is run (every 60 seconds, for 5
minutes) and collect data when an alarm with the string CPU in its specific problem description is raised.
• /opt/v7510/bin/cpu_usage.sh --background --time 12 --interval 120 --alarm_name "MCM CPU Overload"
To monitor alarms for the component on which the cpu_usage.sh tool is run (every 120 seconds, for
12 minutes) and collect data when the MCM CPU Overload alarm is raised.
• /opt/v7510/bin/cpu_usage.sh --stop
To stop the background execution of cpu_usage.sh tool, which was started earlier to monitor alarms.
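The --time and --interval values together determine roughly how many alarm checks a background run performs. The sketch below validates the documented ranges (1 to 60 minutes, 1 to 900 seconds) and estimates that number. The helper names are hypothetical and the check count is an approximation, since the exact scheduling of the tool is not described here.

```python
def build_background_command(scm_vip=None, key_path=None, time_min=5,
                             interval_s=60, alarm_name=None):
    """Build the background-mode argument list, validating the documented ranges."""
    if not 1 <= time_min <= 60:
        raise ValueError("time must be between 1 and 60 minutes")
    if not 1 <= interval_s <= 900:
        raise ValueError("interval must be between 1 and 900 seconds")
    cmd = ["/opt/v7510/bin/cpu_usage.sh", "--background"]
    if scm_vip:
        cmd += ["--scm_vip", scm_vip]
    if key_path:
        cmd += ["--key_path", key_path]
    cmd += ["--time", str(time_min), "--interval", str(interval_s)]
    if alarm_name:
        cmd += ["--alarm_name", alarm_name]
    return cmd

def approx_alarm_checks(time_min, interval_s):
    """Approximate number of alarm checks during one monitoring run."""
    return (time_min * 60) // interval_s

print(build_background_command(time_min=12, interval_s=120, alarm_name="MCM CPU Overload"))
print(approx_alarm_checks(12, 120))
```

A short-lived alarm is more likely to be caught when the interval is small relative to the expected alarm duration.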
Additional information:
• The cpu_usage.sh tool running in the background, writes logs to:
/opt/v7510/data/cpu_usage/monitor_alarms.log
Check this file to see what alarms were detected, and where the collected data is stored.
• If the alarm history (on diag) shows that an alarm was active but the cpu_usage.sh tool failed to
detect it, the <interval> value is too large. Restart the tool with a smaller <interval> so that
it checks for alarms more frequently and can detect the active alarm.
• The tool will stop:
– after <time> minutes (for example, log message: “2 minutes elapsed. Exiting…”),
– after the alarm is cleared (log message: “the alarm has been cleared. Exiting…”),
– when the “--stop” parameter is used (log message: “Got signal 2”).
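As a rough illustration of why a large <interval> can miss a short-lived alarm, the sketch below models only the polling schedule. It is not the tool's implementation: the function names are hypothetical, and it assumes that the first check happens at start-up and that further checks follow every <interval> seconds until <time> minutes have elapsed. The defaults (60 seconds, 5 minutes) are the ones quoted in the examples above.

```python
def poll_times(time_minutes, interval_seconds):
    """Seconds after start at which an alarm check would take place."""
    return list(range(0, time_minutes * 60, interval_seconds))


def is_detected(alarm_start, alarm_end, time_minutes=5, interval_seconds=60):
    """True if at least one check falls inside the alarm's active window."""
    return any(alarm_start <= t < alarm_end
               for t in poll_times(time_minutes, interval_seconds))


# An alarm active from second 70 to second 110 sits between the checks
# at 60 s and 120 s, so a 60-second interval misses it.
print(is_detected(70, 110, interval_seconds=60))  # False
print(is_detected(70, 110, interval_seconds=15))  # True
```

With a 15-second interval the checks at 75, 90 and 105 seconds all fall inside the window, which is why reducing <interval> helps.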
Output example:
[root@cb0334i1-mgw1vm002 ~]# /opt/v7510/bin/cpu_usage.sh --background --scm_vip a.b.c.d --key_path /home/cloud-user/cbam_media.pem --interval 15 --alarm_name "CPU_load_factor Overload Major Alarm"
Starting monitor_alarms.py script
monitor_alarms.py script (process with PID 22334) is started in background
[root@cb0334i1-mgw1vm002 ~]#

Example of /opt/v7510/data/cpu_usage/monitor_alarms.log file:
2021-04-09 06:39:23,000 - INFO - STARTED

2021-04-09 06:39:23,000 - INFO - SBC architecture: cgSBC
2021-04-09 06:39:23,002 - INFO - I am mcm
2021-04-09 06:39:23,002 - INFO - My appl_id: 2
2021-04-09 06:39:23,004 - INFO - This script was run with options:
Namespace(alarm_name='CPU_load_factor Overload Major Alarm', interval=15,
key_path='/home/cloud-user/cbam_media.pem', scm_vip='a.b.c.d', time=5)
2021-04-09 06:39:23,004 - INFO - Program will stop after 5 minutes
2021-04-09 06:39:23,004 - INFO - Program will check for alarms every 15 seconds
2021-04-09 06:39:23,404 - INFO - No alarms found
2021-04-09 06:39:23,404 - INFO - Sleep 15 seconds...
2021-04-09 06:39:38,843 - INFO - Sleep 15 seconds...
2021-04-09 06:39:54,683 - INFO - Alarm "CPU_load_factor Overload Major Alarm" detected!
2021-04-09 06:39:54,684 - INFO - start cpu_usage.sh on mcm
2021-04-09 06:40:55,687 - INFO - command output:
Tool is collecting performance data.
Data will be available in /opt/v7510/data/cpu_usage/cpu_usage-2021-Apr-09_06-39-54
Tool will be running for 60 seconds.
Please wait . . .
Data collected successfully.
2021-04-09 06:40:55,688 - INFO - end of cpu_usage.sh script execution - data collected successfully
2021-04-09 06:40:56,122 - INFO - the alarm has been cleared. Exiting...
2021-04-09 06:40:56,122 - INFO - FINISHED
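To answer the two questions this log is meant for (which alarms were detected, and where the collected data is stored), the log text can be scanned with a few patterns. The sketch below is only an illustration: the function name is hypothetical, the message texts are taken from the example above, and it assumes that in the real file each message sits on a single line (the wrapping seen in print is taken to be a layout effect).

```python
import re

# Exit messages quoted in the "The tool will stop" list.
EXIT_MARKERS = {
    "minutes elapsed. Exiting": "time limit reached",
    "the alarm has been cleared. Exiting": "alarm cleared",
    "Got signal 2": "stopped with --stop",
}


def summarize_monitor_log(text):
    """Return detected alarm names, data directories and the exit reason."""
    alarms = re.findall(r'Alarm "([^"]+)"\s+detected!', text)
    data_dirs = re.findall(r"Data will be available in\s+(\S+)", text)
    reason = next((label for marker, label in EXIT_MARKERS.items()
                   if marker in text), None)
    return {"alarms": alarms, "data_dirs": data_dirs, "exit_reason": reason}


sample = """\
2021-04-09 06:39:54,683 - INFO - Alarm "CPU_load_factor Overload Major Alarm" detected!
2021-04-09 06:40:55,687 - INFO - command output:
Data will be available in /opt/v7510/data/cpu_usage/cpu_usage-2021-Apr-09_06-39-54
2021-04-09 06:40:56,122 - INFO - the alarm has been cleared. Exiting...
"""
print(summarize_monitor_log(sample)["exit_reason"])  # alarm cleared
```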
7.11 Troubleshooting high CPU usage by 'rngd' service issue
Details
If high CPU usage caused by rngd.service is detected on MCM, SCM, or Packet Interface Module (PIM),
change the start parameters on all active and standby VMs (of SCM, PIM, and MCM, as needed):
1. Make a copy of the /usr/lib/systemd/system/rngd.service file.
2. Edit the ExecStart parameter in the /usr/lib/systemd/system/rngd.service file as follows:
ExecStart=/sbin/rngd -x 5 -W 2048 -f

3. Reload the Systemd configuration:
# systemctl daemon-reload
4. Restart the rngd service:
# systemctl restart rngd
5. Check status of rngd service:
# systemctl status rngd
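If the change has to be repeated on several VMs, step 2 can be scripted. The sketch below is an assumption-laden illustration rather than a supported procedure: set_exec_start is a hypothetical helper that only transforms the text of a unit file, and the sample unit content is made up for the demonstration. Only the new ExecStart value comes from step 2.

```python
import re

NEW_EXEC_START = "ExecStart=/sbin/rngd -x 5 -W 2048 -f"  # value from step 2


def set_exec_start(unit_text, new_line=NEW_EXEC_START):
    """Return unit_text with its single ExecStart= line replaced."""
    updated, count = re.subn(r"(?m)^ExecStart=.*$", lambda _m: new_line, unit_text)
    if count != 1:
        raise ValueError("expected exactly one ExecStart= line, found %d" % count)
    return updated


# Illustrative unit file content (not copied from a real system).
sample_unit = """[Unit]
Description=example unit

[Service]
ExecStart=/sbin/rngd -f
"""
print(set_exec_start(sample_unit))
```

The copy in step 1 and the reload, restart, and status check in steps 3 to 5 still apply after the file has been rewritten.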

Troubleshooting Guide Appendix
8 Appendix
8.1 Example of Master.log analysis of a tablet to tablet video call

The example below references the t_t_video_0329_1938.log at
http://ihgpweb.ih.lucent.com/~rgs/logs/Verizon_Video_Launch_Training/t_t_video_0329_1938.log
Note: This log was obtained by having the IMS/NGSS service set to log level 3 and the H248DS service set to
log level 4. Test was done using Exfo endpoints in the VTC demo lab, wgw03.
8.1.1 Example of Master.log analysis
Originating INVITE
Tue Mar 29 19:37:46 2016 (1459280266.913586)

RECEIVED from [10.254.254.152]:1024 over TLS on local socket=135
[135.104.224.190]:5061
INVITE received from tablet 10.254.254.152 port 1024 to PCSCF over TLS
[INVITE sip:+19992504000@ims.net SIP/2.0^M
Max-Forwards: 70^M
Via: SIP/2.0/TLS 10.254.254.152:1024;branch=z9hG4bK1-0afefe98-0000-1901970257^M
Supported: 100rel,timer,100rel,timer^M
Call-ID: 2-160329193640-216580061@0afefe98^M
Can use Call-ID to search messages associated with this call
CSeq: 713540930 INVITE^M
From: <sip:+19992501000@ims.net>;tag=2-0afefe98-160329193640-1860858780^M
To: <sip:+19992504000@ims.net>^M
Contact: <sip:+19992501000@10.254.254.152:1024;transport=tcp>^M
Expires: 50^M
Content-Type: application/sdp^M
Accept: application/sdp,application/dtmf-relay^M
Content-Length: 476^M
P-Preferred-Identity: sip:+19992501000@ims.net^M
P-Asserted-Identity: sip:+19992501000@ims.net^M
^M
v=0^M
o=- 166913806 166913807 IN IP4 10.254.254.152^M
s=-Johnny^M
c=IN IP4 10.254.254.152^M tablet IP address
t=0 0^M
b=AS:64^M 64 kilobits per second of audio bandwidth

m=audio 40000 RTP/SAVP 110^M

port 40000 used on tablet for audio, SAVP identifies it as SRTP. 110 is the
codec
a=rtpmap:110 AMR-WB/16000^M codec 110 is AMR Wideband
a=fmtp:110 octet-align=0; mode-set=8^M some properties of AMR Wideband
a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:Ka5hgdls9GyycMbokKmWe/LywG8bf5qYrf5vcZtN^M
a=ptime:20^M
m=video 40002 RTP/SAVP 101^M
port 40002 used on tablet for video, SAVP identifies it as SRTP. 101 is the
codec
a=rtpmap:101 H264/90000^M 101 is the video codec – H264
a=fmtp:101 profile-level-id=42000a^M
a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:wsT7I0XUjzlAQaoGKTqwv7WisnURzfSrZaGq1RJF^M
b=AS:460^M 460 kilobits per second of video bandwidth
]
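When many INVITEs have to be compared, the media lines annotated above can be pulled out of the SDP mechanically. The following is a minimal sketch, assuming the SDP lines are available as a list of strings in the form shown in Master.log (with the literal ^M marker at the end of each line); parse_sdp_media is a hypothetical helper, not part of any SBC tool.

```python
import re


def parse_sdp_media(lines):
    """Return one dict per m= line, with codec names taken from a=rtpmap."""
    media = []
    for raw in lines:
        # Keep only the text before the ^M marker; anything after it is annotation.
        line = raw.split("^M")[0].strip()
        m = re.match(r"m=(\w+) (\d+) (\S+) (.+)", line)
        if m:
            media.append({"type": m.group(1), "port": int(m.group(2)),
                          "srtp": "SAVP" in m.group(3),
                          "payloads": m.group(4).split(), "codecs": {}})
            continue
        r = re.match(r"a=rtpmap:(\d+) (\S+)", line)
        if r and media:
            media[-1]["codecs"][r.group(1)] = r.group(2)
    return media


sample = [
    "m=audio 40000 RTP/SAVP 110^M",
    "a=rtpmap:110 AMR-WB/16000^M codec 110 is AMR Wideband",
    "m=video 40002 RTP/SAVP 101^M",
    "a=rtpmap:101 H264/90000^M",
]
for entry in parse_sdp_media(sample):
    print(entry["type"], entry["port"], entry["srtp"], entry["codecs"])
# audio 40000 True {'110': 'AMR-WB/16000'}
# video 40002 True {'101': 'H264/90000'}
```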
Setup H248 bearer context for Originator, Add Request, and Add Reply
The h248ds of the BGC VM sets up the bearer context using Megaco protocol and talking to the SCM on the
7510 gateway. The gateway chosen by the h248ds is based on which gateway is least busy.
H248DS sends an Add instruction containing dollar signs ($), which are to be filled in by the gateway. Here we
see it being sent to gateway 1024, which is the first vmg on the first gateway.
_MsgGwS
Sending to GW 1024 (2620:0:60:8ae::310a:2944) encoded message of size
1092:
!/2 [2620:0:60:8ae::3138]:2944
T=2287{C=${A=ip/1/$/${M{ST=1{O{MO=IN,tman/sdr=375,ds/dscp=11,ipdc/
realm="untrustedWGW1"},L{
H248ds is asking the gateway to provide a Context ID, C=$, and a termination ID, A=ip/1/$/$, from the
untrustedWGW1 realm. Mode is set to Inactive, MO=IN
v=0^M c=IN IP4 $^M
Since the call originated on IPV4, an IPV4 address on the gateway's untrustedWGW1 realm is needed for the audio stream
m=audio $ RTP/SAVP -^M Need port that gateway is using on untrustedWGW1 realm for audio stream
b=AS:3^M - Do not allocate the full audio bandwidth at this point
a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:AW0KR3MtBe4/yhWJSvDpHdns+kfGuAiH+UXaBn4p^M

},R{ - below is the information that came in on the INVITE for the audio stream
v=0^M
c=IN IP4 10.254.254.152^M
m=audio 40000 RTP/SAVP -^M
Below is asking for an IP and port on the gateway for video stream of call
}},ST=2{O{MO=IN,tman/sdr=2875,ds/dscp=11,ipdc/realm="untrustedWGW1"},L{
v=0^M
c=IN IP4 $^M
m=video $ RTP/SAVP -^M
b=AS:23^M
a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:hvFznJ/Zztu0+XN2VCgXEPEHcIVAOxbUZNX9Gvmo^M
},R{ And here is the info that came in on the INVITE for the video stream
v=0^M
c=IN IP4 10.254.254.152^M
m=video 40002 RTP/SAVP -^M
}}},E=3{hangterm/thb{timerx=435},adid/ipstop{dt=18,dir=IN}}},A=ip/1/$/${M{ST=1{
Below info is to secure resources for the core side of the call - core realm is trustedWGW1
O{MO=IN,tman/sdr=9000,ds/dscp=11,ipdc/realm="trustedWGW1"},L{
v=0^M
c=IN IP6 $^M - Core side is always IPV6, so an IPV6 address on the trustedWGW1 realm is requested for the audio
stream
m=audio $ RTP/AVP -^M Need port that gateway is using on trustedWGW1 realm for audio stream
b=AS:72^M - here the full bandwidth is requested on core side, includes padding for RTCP
}},ST=2{O{MO=IN,tman/sdr=62500,ds/dscp=11,ipdc/realm="trustedWGW1"},L{
Below is requesting core resources for the video stream
v=0^M
c=IN IP6 $^M
m=video $ RTP/AVP -^M
b=AS:500^M

}}},E=4{adid/ipstop{dt=18,dir=IN}}}}}
_MsgE
^M
Next you see the gateway respond with all the $ filled in
_MsgGwR
H.248 message from Gateway (1024),size 860
!/2 [2620:0:60:8ae::310A]:2944 P=2287{C=9379{A=ip/1/4/92762{M{ST=1{L{v=0^M
Context 9379 is used by the gateway for the originating side of the call, below are the details for Stream 1,
ST=1 - audio
c=IN IP4 135.104.226.231^M IPV4 address of the untrustedWGW1 realm - pim1.5
b=AS:3^M
m=audio 21090 RTP/SAVP -^M gateway is using port 21090 for audio

+UXaBn4p^M
},R{v=0^M
c=IN IP4 10.254.254.152^M
}},ST=2{L{v=0^M - Now the details for Stream 2, ST=2, video
c=IN IP4 135.104.226.231^M IPV4 address of the untrustedWGW1 realm - pim 1.5
b=AS:23^M
m=video 15894 RTP/SAVP -^M gateway is using port 15894 for video

},R{v=0^M
c=IN IP4 10.254.254.152^M
}}}},A=ip/1/0/100655{M{ST=1{L{v=0^M
Now the details for the core side connection for Stream 1, ST=1, audio
c=IN IP6 2620::60:8AF:0:0:0:320A^M IPV6 address of the trustedWGW1 realm – pim 1.1
b=AS:72^M

m=audio 17782 RTP/AVP -^M Using port 17782 for core side audio
}},ST=2{L{v=0^M
Now the details for the core side connection for Stream 2, ST=2, video
c=IN IP6 2620::60:8AF:0:0:0:320A^M

b=AS:500^M
m=video 17464 RTP/AVP -^M
}}}}}}
_MsgE
^M
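The values the gateway filled in for the $ placeholders (context ID and termination IDs) are the ones needed to follow the call through later Modify and Subtract messages. A small sketch of extracting them from a reply follows; parse_add_reply is a hypothetical helper, the sample is an abbreviated form of the reply above, and the patterns only cover the compact text encoding seen in these logs.

```python
import re


def parse_add_reply(msg):
    """Extract transaction ID, context ID and termination IDs from a reply."""
    compact = re.sub(r"\s+", "", msg)  # undo line wrapping
    head = re.search(r"P=(\d+)\{C=(\d+)\{", compact)
    if head is None:
        raise ValueError("no P=<transaction>{C=<context>{ header found")
    terminations = re.findall(r"[{,]A=(ip/[^{]+)\{", compact)
    return {"transaction": int(head.group(1)),
            "context": int(head.group(2)),
            "terminations": terminations}


# Abbreviated from the reply shown above (SDP bodies left out).
reply = ("!/2 [2620:0:60:8ae::310A]:2944 P=2287{C=9379{A=ip/1/4/92762{M{ST=1{L{v=0^M\n"
         "}}}},A=ip/1/0/100655{M{ST=1{L{v=0^M\n"
         "}}}}}}")
print(parse_add_reply(reply))
# {'transaction': 2287, 'context': 9379, 'terminations': ['ip/1/4/92762', 'ip/1/0/100655']}
```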
And the context diagram looks as follows at this point:
INVITE message out to core to determine where term party is located

Next in the logs you will see the INVITE message being sent to the CFED VM which will send it out to the NSN
core.
Key thing to notice here is that the Call-ID has been changed to an internal one - beginning with LU-.
First you will see this CFED type log
IMS:CFED set destination msg:

Max-Forwards: 69^M
Via: SIP/2.0/UDP 127.0.0.1;branch=z9hG4bK_002_1459280267-97786-1-LucentPCSF^M
Supported: 100rel^M
Call-ID: LU-145928026797740-1@imsgroup0-000.wgw03.vtc-sru-bg.ims.net^M
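Because the Call-ID changes at this point, a search through Master.log has to cover both the Call-ID sent by the tablet and the internal LU- one. The sketch below shows one simple way to do that over lines already read from the log; both function names are hypothetical, and the LU- prefix test relies only on the observation made above.

```python
def lines_with_call_id(log_lines, call_id):
    """Return (line_number, text) pairs for lines carrying this Call-ID."""
    needle = "Call-ID: " + call_id
    return [(number, line) for number, line in enumerate(log_lines, start=1)
            if needle in line]


def is_internal_call_id(call_id):
    """Internal Call-IDs in this example begin with LU-."""
    return call_id.startswith("LU-")


log_lines = [
    "Call-ID: 2-160329193640-216580061@0afefe98^M",
    "CSeq: 713540930 INVITE^M",
    "Call-ID: LU-145928026797740-1@imsgroup0-000.wgw03.vtc-sru-bg.ims.net^M",
]
print(lines_with_call_id(log_lines, "2-160329193640-216580061@0afefe98"))
# [(1, 'Call-ID: 2-160329193640-216580061@0afefe98^M')]
print(is_internal_call_id("LU-145928026797740-1@imsgroup0-000.wgw03.vtc-sru-bg.ims.net"))  # True
```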

And then the actual sending of the message from the pcscf core side, port 5060, to the CFED. It is internal
messaging at this point. 169.254.253.0 is the cfed.
SS:SIPtrans Message Log (0x4a77d11c:0x4a7f525c):

Tue Mar 29 19:37:47 2016 (1459280267.98101)
SENDING from socket=85 [169.254.218.0]:5060 over UDP to [169.254.253.0]:5060
(req=0x4a77d11c):
Max-Forwards: 69^M
Via: SIP/2.0/UDP
169.254.218.0:5060;branch=z9hG4bK933213abb24ae6f625e8053c65e0e4f056fab106-0-111-56fad98b
Supported: 100rel^M
INVITE message being delivered from core to terminating side

The NSN core determines the B party location and sends an INVITE to the webGW servicing that user. The CFED
receives it and then delivers it to the IMS/NGSS service on which that terminating subscriber had registered.
It is delivered to the core side of the PCSCF, port 5060.
SS:SIPtrans Message Log (0x487933fc:0x487935f8):

Tue Mar 29 19:37:47 2016 (1459280267.403394)
RECEIVED from [169.254.253.0]:34174 over UDP on local socket=85
[169.254.218.0]:5060
[INVITE sip:+19992504000@10.254.254.152:6024;transport=tcp SIP/2.0^M
Max-Forwards: 66^M
Via: SIP/2.0/UDP
169.254.253.0:5060;branch=z9hG4bK7cbfe5403b6082a458dc41096328fe9156fab0fd-6-21-56fad98b1
Via: SIP/2.0/UDP 127.0.0.1;branch=z9hG4bK_001_938-140249570268812^M
Via: SIP/2.0/TCP [2620:0:60:8ae::314f]:5080;received=2620:0:60:8ae::314f;
branch=z9hG4bK8215df2de9342ad658f71c574ac0b90f56f94374-6-5e86-56fad98b17!
db25e3^M
Via: SIP/2.0/UDP 169.254.218.5:5060;received=169.254.218.5;
branch=z9hG4bK8215df2de9342ad658f71c574ac0b90f56f94343-0-1614-56fad98b17dc20cc^M
Via: SIP/2.0/UDP 127.0.0.1;branch=z9hG4bK_001_31018-1159639964;dp;
lsstag=st-582-582.19251;lzh=555e85aaede6fe99b35f6054ee266997^M
branch=z9hG4bK418ac25ad2c5ee57db73df95e9dd4c5456f94352-0-174d-56fad98b17cfb5f2^M
Via: SIP/2.0/UDP
127.0.0.1;branch=z9hG4bK_001_32216-1162293724;lsstag=it-2.19812^M
Via: SIP/2.0/UDP 127.0.0.1;branch=z9hG4bK_002_32215-1162350204;dp;
lsstag=so-609-609.19811;lzh=6cc729e66ed4d70017e3cac585a230e3^M

branch=z9hG4bKa6585cde7e806133016714324cc7cc7f56f94374-6-5e85-56fad98b17afd290^M
Via: SIP/2.0/TCP [2620:0:60:8ae::310f]:5060;received=2620:0:60:8ae::310f;
branch=z9hG4bKce6e68b983f42566788088c11acec63656fab0fd-6-20-56fad98b5e43!
20e^M
branch=z9hG4bK933213abb24ae6f625e8053c65e0e4f056fab106-0-111-56fad98b5d814d1^M
Supported: 100rel^M
CSeq: 1 INVITE^M
From: <sip:+19992501000@ims.net>;tag=56fab106-56fad98b5d376c2-mw-po-
lucentPCSF-000002^M
To: <sip:+19992504000@ims.net>^M
Next you will see it delivered to the untrusted side of the PCSCF, pcsf-tls1.
SENDING from network to pcsf-tls1.imsgroup0-000.wgw03.vtc-sru-bg.ims.net(2
[INVITE sip:+19992504000@10.254.254.152:6024;transport=tcp SIP/2.0^M
And then on to the terminator who is at 10.254.254.152 port 6024
SS:SIPtrans Message Log (0x4abdfc7c:0x4a74f09c):

Tue Mar 29 19:37:47 2016 (1459280267.585011)
SENDING from socket=136 [135.104.224.190]:5061 over TLS to [10.254.254.152]:6024
(req=0x4abdfc7c):
[INVITE sip:+19992504000@10.254.254.152:6024 SIP/2.0^M
Setup H248 bearer context for Terminator, Add Request, and Add Reply
Here you see the Add request being done for the terminator. Notice that neither the IP address of the terminator nor
the audio or video ports are included. That's because the terminator has not answered yet. But what we do
have is the IP address of the core connection of the originator. That information will be used to connect the
core originating info with the core terminating info. Notice also that this went to gateway 1025, the 2nd vmg on
gateway 1.
_MsgGwS
939:
!/2 [2620:0:60:8ae::3139]:2945
T=2258{C=${A=ip/1/$/${M{ST=1{O{MO=IN,tman/sdr=7875,ds/dscp=11,ipdc/
v=0^M
c=IN IP4 $^M

m=audio $ RTP/SAVP -^M

b=AS:63^M
inline:pq8loNWjOv0MdclvLXz1C6Y4H3B7rUp0XE8CfGuK^M
}},ST=2{O{MO=IN,tman/sdr=60375,ds/dscp=11,ipdc/realm="untrustedWGW1"},L{
v=0^M
c=IN IP4 $^M
m=video $ RTP/SAVP -^M
b=AS:483^M
a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:UipQ0pMomW6lVNWMHSYta8IdGPp9gCmVv+kAYwsl^M
}}},E=3{hangterm/thb{timerx=435},adid/ipstop{dt=18,dir=IN}}},A=ip/1/$/
${M{ST=1{O{MO=IN,tman/sdr=500,ds/dscp=11,ipdc/realm="trustedWGW1"},L{
v=0^M
c=IN IP6 $^M - Request IPV6 core connection on terminating side for the audio stream
m=audio $ RTP/AVP -^M

b=AS:4^M
},R{
Below is the IPV6 information for the core side audio stream on the originator side
v=0^M
c=IN IP6 2620:0:60:8af::320a^M
m=audio 17782 RTP/AVP -^M
}},ST=2{O{MO=IN,tman/sdr=3000,ds/dscp=11,ipdc/realm="trustedWGW1"},L{
v=0^M
c=IN IP6 $^M Request IPV6 core connection on terminating side for the video stream
m=video $ RTP/AVP -^M

b=AS:24^M
},R{
Below is the IPV6 information for the core side video stream on the originator side
v=0^M
c=IN IP6 2620:0:60:8af::320a^M
}}},E=4{adid/ipstop{dt=18,dir=IN}}}}}K{2257}
_MsgE
^M

And here is the response
_MsgGwR
!/2 [2620::60:8ae:0:0:0:310a]:2945 P=2258{C=9380{A=ip/1/4/62834{M{ST=1{L{v=0^M
c=IN IP4 135.104.226.231^M
b=AS:63^M
}},ST=2{L{v=0^M
c=IN IP4 135.104.226.231^M
b=AS:483^M
+kAYwsl^M
}}}},A=ip/1/0/78241{M{ST=1{L{v=0^M
c=IN IP6 2620::60:8AF:0:0:0:320A^M
b=AS:4^M
},R{v=0^M
c=IN IP6 2620::60:8AF:0:0:0:320A^M
}},ST=2{L{v=0^M
c=IN IP6 2620::60:8AF:0:0:0:320A^M
b=AS:24^M
},R{v=0^M
c=IN IP6 2620::60:8AF:0:0:0:320A^M
}}}}}}
_MsgE
Context diagram looks as follows at this point

200-OK of INVITE comes back with SDP of Terminator

So now we have 2 contexts set up: one for the originator, 9379, and one for the terminator, 9380. But we do not
have the address info or SDP info of the terminator. This comes in the 200-OK of the INVITE.

Tue Mar 29 19:37:48 2016 (1459280268.346754)
[135.104.224.190]:5061
[SIP/2.0 200 OK ^M
CSeq: 1 INVITE^M
From: <sip:+19992501000@ims.net>;tag=56fab106-56fad98b22d966dc-gm-pt-
lucentPCSF-000004^M
To:<sip:+19992504000@ims.net>;tag=3-0afefe98-160329193640-940381981^M
Via: SIP/2.0/TLS
135.104.224.190:5061;branch=z9hG4bKbe3d2fa59ef0708598906cfe24857d3356fab106-0-112-56fad9
Contact:<sip:+19992504000@10.254.254.152:6024;transport=tcp>^M
Supported: 100rel^M
Content-Type: application/sdp^M
Content-Length: 470^M
^M
v=0^M
o=- 167888454 167888455 IN IP4 10.254.254.152^M
s=-^M
c=IN IP4 10.254.254.152^M Terminator IP address
t=0 0^M
b=AS:64^M Terminator bandwidth for audio

m=audio 60000 RTP/SAVP 110^M Terminator port 60000 for audio, SAVP is for SRTP, 110 is the codec
a=rtpmap:110 AMR-WB/16000^M 110 codec is AMR WB
a=fmtp:110 octet-align=0; mode-set=8^M

inline:hogyxzI58W3psCKLYpidL4xIlS7yakA3jBVBqQyQ^M
a=ptime:20^M
m=video 60002 RTP/SAVP 101^M Terminator port 60002 for video, SAVP is for SRTP, 101 is the codec
a=rtpmap:101 H264/90000^M 101 is H.264
a=fmtp:101 profile-level-id=42000a^M
a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:m5MYzlpLB0u48Pzae15yGI7/YSMtU41tixmCzMKP^M
b=AS:460^M Terminator bandwidth for video
======================================= From Ray Schmidt
Master.log analysis and the drawing of context details at various points of a call.
End of Call Statistics
For debugging of RTP issues, it is quite valuable to turn H248DS log level to MEDIUM and capture these "from
Gateway" messages at the completion of the call test:
+++ 2013/12/19 20:23:45.716 H248 MEDIUM ACTIVESYN h248ds:5310 E:980670 S:247404

(printCurrH248Msg 4109 A-0:4:0 27.11.01.95:1387336784 lssbld 169.254.139.3)
_MsgGwR
!/2 [2001:1890:1001:2217::8]:5020 P=35004{C=4368{S=ip/1/5/16201{SA{nt/
dur=57016,nt/os=0,nt/or=42735,rtp/ps=0,
rtp/pr=829,rtp/pl=0.000000,rtp/jit=237,rtp/delay=0,rtp/dur=57016,rtp/os=0,rtp/
or=42735}},S=ip/1/3/16202{SA{nt/dur=57016,nt/os=41495,nt/or=18160,rtp/ps=809,rtp/
pr=386,rtp/pl=0.000000,rtp/jit=127625,rtp/delay=0,rtp/dur=57016,rtp/os=41495,rtp/
or=18160}}}}
_MsgE
1st section:
S=ip/1/5/16201{SA{nt/dur=57016,nt/os=0,nt/or=42735,rtp/ps=0,rtp/pr=829,
rtp/pl=0.000000,rtp/jit=237,rtp/delay=0,rtp/dur=57016,rtp/os=0,rtp/or=42735}}
2nd section:
S=ip/1/3/16202{SA{nt/dur=57016,nt/os=41495,nt/or=18160,rtp/ps=809,rtp/pr=386,
rtp/pl=0.000000,rtp/jit=127625,rtp/delay=0,rtp/dur=57016,rtp/os=41495,rtp/or=18160}}
1st section is access and 2nd section is core.
rtp/ps - packets sent
rtp/pr - packets received
rtp/pl - packet loss percentage
rtp/os - octets sent
rtp/or - octets received
nt/dur - duration
With no transcoding, the packets-received from access should roughly equal packets-sent on core side, and
vice-versa. In this example there was some issue, and the access side packets-sent is zero even though the
core side packets-received was 386.
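The comparison described above can be automated once the statistics are parsed into numbers. The sketch below is illustrative only: parse_statistics and silent_directions are hypothetical helpers, the pattern covers just the S=...{SA{...}} form shown in this example, and the check flags only the extreme case where one side received packets and the other side sent none.

```python
import re


def parse_statistics(msg):
    """Map each termination ID to a dict of its statistics."""
    compact = re.sub(r"\s+", "", msg)  # undo line wrapping
    stats = {}
    for term, body in re.findall(r"S=(ip/[^{]+)\{SA\{([^}]*)\}\}", compact):
        pairs = (pair.split("=", 1) for pair in body.split(",") if pair)
        stats[term] = {name: float(value) for name, value in pairs}
    return stats


def silent_directions(access, core):
    """Directions where one side received packets but the other sent none."""
    problems = []
    if access["rtp/pr"] > 0 and core["rtp/ps"] == 0:
        problems.append("access to core")
    if core["rtp/pr"] > 0 and access["rtp/ps"] == 0:
        problems.append("core to access")
    return problems


reply = """
S=ip/1/5/16201{SA{nt/dur=57016,nt/os=0,nt/or=42735,rtp/ps=0,rtp/pr=829,
rtp/pl=0.000000,rtp/jit=237,rtp/delay=0,rtp/dur=57016,rtp/os=0,rtp/or=42735}},
S=ip/1/3/16202{SA{nt/dur=57016,nt/os=41495,nt/or=18160,rtp/ps=809,rtp/pr=386,
rtp/pl=0.000000,rtp/jit=127625,rtp/delay=0,rtp/dur=57016,rtp/os=41495,rtp/or=18160}}
"""
stats = parse_statistics(reply)
access, core = stats["ip/1/5/16201"], stats["ip/1/3/16202"]
print(access["rtp/pr"], core["rtp/ps"])  # 829.0 809.0
print(silent_directions(access, core))   # ['core to access']
```

Which termination is the access side and which is the core side has to be taken from the context (here the first section is access and the second is core, as stated above).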
7510 Error Response 430
The MEGACO message at the end of a failed call may carry an error code whose reason can be found in Appendix 1
of the "Access Border Troubleshooting Ideas.doc". In this case, error 430 is "Unknown TerminationID". A likely
cause is that the signaling plane does not have the realms or the IPDC GW variant option set correctly. It could
also be that the H248DSs were not re-initialized following the signaling plane provisioning:
+++ 2014/02/21 14:45:04.198 H248 MEDIUM ACTIVESYN h248ds:18165 E:46184745 S:19031

(printCurrH248Msg 4105 A-0:5:0 26.12.01.16:1373629116 lssbld 169.254.139.4)
_MsgGwR
!/2 [2001:1890:FC:112A::1:1]:5021 P=4774413{C=${A=ip/${ER=430{"ip/$"}}}}
_MsgE
7510 Error Response 500
Another common error seen in response to an Add request is ER=500 with Return Code -1551
(RC_VM_NH_MAC_NOT_YET_RESOLVED). This error usually indicates a routing issue either on the 7510 PIM or
on the external router. 7510 alarms for the specific PIM involved may also accompany these errors. Return
codes can be found in Appendix 3 of the "Access Border Troubleshooting Ideas.doc". Excerpt captured from
trace:
ERROR Description: ER=500{"Internal Gateway Error-H248_cmd_modify_proc.c:1707:

-1551"}
[Error code: Internal Software Failure in MG]
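Error descriptors such as the two above can be located in a captured reply with a single pattern. The sketch below is a minimal illustration: find_errors is a hypothetical helper, and the lookup table holds only the two codes described in this section; any other code is reported as not described here rather than guessed.

```python
import re

# Descriptions as given in this section only.
KNOWN_ERRORS = {
    430: "Unknown TerminationID",
    500: "Internal Software Failure in MG",
}


def find_errors(msg):
    """Return (code, quoted text, description) for each ER= descriptor."""
    compact = msg.replace("\n", "")
    return [(int(code), text,
             KNOWN_ERRORS.get(int(code), "not described in this section"))
            for code, text in re.findall(r'ER=(\d+)\{"([^"]*)"\}', compact)]


reply = '!/2 [2001:1890:FC:112A::1:1]:5021 P=4774413{C=${A=ip/${ER=430{"ip/$"}}}}'
print(find_errors(reply))  # [(430, 'ip/$', 'Unknown TerminationID')]
```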
============================
Modify the terminating context

So now that we have the terminating party's info we modify the context associated with the terminating side.
Below is the Modify, identified by MF=, for the terminating context 9380
_MsgGwS
Sending to GW 1025 (2620:0:60:8ae::310a:2945) encoded message of size 1239:
!/2 [2620:0:60:8ae::3139]:2945

T=2259{C=9380{MF=ip/1/4/62834{M{ST=1{O{MO=SR,tman/sdr=7875,ds/dscp=1d,ipdc/
Notice that the mode is set to send/receive, MO=SR. This is an important piece of information if you ever have
one-way media: make sure that, after the 200-OK of the INVITE, the mode is changed to SR in the Modify
v=0^M
c=IN IP4 135.104.226.231^M
b=AS:63^M
},R{
v=0^M
c=IN IP4 10.254.254.152^M
}},ST=2{O{MO=SR,tman/sdr=60375,ds/dscp=1d,ipdc/realm="untrustedWGW1"},L{
v=0^M
c=IN IP4 135.104.226.231^M
b=AS:483^M
+kAYwsl^M
},R{
v=0^M
c=IN IP4 10.254.254.152^M
YSMtU41tixmCzMKP^M
}}}},MF=ip/1/0/78241{M{ST=1{O{MO=SR,tman/sdr=9000,ds/dscp=1d,ipdc/
realm="trustedWGW1"},L{
v=0^M
c=IN IP6 2620:0:60:8af::320a^M
b=AS:72^M
},R{
v=0^M
c=IN IP6 2620:0:60:8af::320a^M
}},ST=2{O{MO=SR,tman/sdr=62500,ds/dscp=1d,ipdc/realm="trustedWGW1"},L{
v=0^M
c=IN IP6 2620:0:60:8af::320a^M

b=AS:500^M
},R{
v=0^M
c=IN IP6 2620:0:60:8af::320a^M
}}}}}}K{2258}
_MsgE
And the response from the gateway
_MsgGwR
!/2 [2620::60:8ae:0:0:0:310a]:2945 P=2259{C=9380{MF=ip/1/4/62834{M{ST=1{L{v=0^M
c=IN IP4 135.104.226.231^M
b=AS:63^M
},R{v=0^M
c=IN IP4 10.254.254.152^M
}},ST=2{L{v=0^M
c=IN IP4 135.104.226.231^M
b=AS:483^M
+kAYwsl^M
},R{v=0^M
c=IN IP4 10.254.254.152^M
YSMtU41tixmCzMKP^M
}}}},MF=ip/1/0/78241{M{ST=1{L{v=0^M
c=IN IP6 2620::60:8AF:0:0:0:320A^M
b=AS:72^M
},R{v=0^M
c=IN IP6 2620::60:8AF:0:0:0:320A^M
}},ST=2{L{v=0^M
c=IN IP6 2620::60:8AF:0:0:0:320A^M
b=AS:500^M

},R{v=0^M
c=IN IP6 2620::60:8AF:0:0:0:320A^M
}}}}}}
_MsgE
Context diagram now looks as follows. Notice terminating side has mode set to Send/Receive. Originating side
is still set to Inactive.
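When chasing one-way media, the mode of every stream in an Add or Modify request can be listed programmatically instead of being read by eye. The following sketch assumes the compact text encoding seen in these logs; stream_modes and not_send_receive are hypothetical helpers, and the two sample messages are abbreviated from the requests shown in this appendix.

```python
import re

# Either a termination ID (after A= or MF=) or a stream with its mode.
TOKEN = re.compile(r"(?:A|MF)=(ip/[^{,}]+)\{|ST=(\d+)\{O\{MO=([A-Z]+)")


def stream_modes(msg):
    """Return (termination, stream, mode) tuples in message order."""
    compact = re.sub(r"\s+", "", msg)  # undo line wrapping
    termination, result = None, []
    for match in TOKEN.finditer(compact):
        if match.group(1):
            termination = match.group(1)
        else:
            result.append((termination, int(match.group(2)), match.group(3)))
    return result


def not_send_receive(msg):
    """Streams whose mode is anything other than SR."""
    return [entry for entry in stream_modes(msg) if entry[2] != "SR"]


modify = ("T=2259{C=9380{MF=ip/1/4/62834{M{ST=1{O{MO=SR,tman/sdr=7875},L{}},"
          "ST=2{O{MO=SR,tman/sdr=60375},L{}}}}}}")
add = "T=2287{C=${A=ip/1/$/${M{ST=1{O{MO=IN,tman/sdr=375},L{}}}}}}"
print(not_send_receive(modify))  # []
print(not_send_receive(add))     # [('ip/1/$/$', 1, 'IN')]
```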
Modify the originating context

Now here comes the final thing that connects these 2 parties so that they can send and receive media - the
modification of the originator's context. And this happens immediately after the modify of the terminating
context. Remember the originating context does not have the core information of the terminator. Nor does it
have its mode set to Send/Receive. This happens with the Modify.
_MsgGwS
1233:
!/2 [2620:0:60:8ae::3138]:2944
T=2288{C=9379{MF=ip/1/4/92762{M{ST=1{O{MO=SR,tman/sdr=7875,ds/dscp=1d,ipdc/
Context 9379 is the originator associated with gateway 1024. Notice also the mode is send/receive. Need this
for 2 way media path.
v=0^M
c=IN IP4 135.104.226.231^M
b=AS:63^M
+UXaBn4p^M
},R{
v=0^M

c=IN IP4 10.254.254.152^M

}},ST=2{O{MO=SR,tman/sdr=60375,ds/dscp=1d,ipdc/realm="untrustedWGW1"},L{
v=0^M
c=IN IP4 135.104.226.231^M
b=AS:483^M
},R{
v=0^M
c=IN IP4 10.254.254.152^M
}}}},MF=ip/1/0/100655{M{ST=1{O{MO=SR,tman/sdr=9000,ds/dscp=1d,ipdc/
realm="trustedWGW1"},L{
Below is doing the modify of Stream 1, ST=1, audio, on the trustedWGW1 realm
v=0^M
c=IN IP6 2620:0:60:8af::320a^M IP address and port of the originating core audio stream connection

b=AS:72^M
},R{
And below is the IP address and port of the terminating core audio stream connection
v=0^M
c=IN IP6 2620:0:60:8af::320a^M
}},ST=2{O{MO=SR,tman/sdr=62500,ds/dscp=1d,ipdc/realm="trustedWGW1"},L{
Likewise the same being done for Stream 2, ST=2, video
v=0^M
c=IN IP6 2620:0:60:8af::320a^M
b=AS:500^M
},R{
v=0^M
c=IN IP6 2620:0:60:8af::320a^M


}}}}}}
_MsgE
The gateway makes the change, linking the 2 contexts together, and sends back a modify response
_MsgGwR
!/2 [2620:0:60:8ae::310A]:2944 P=2288{C=9379{MF=ip/1/4/92762{M{ST=1{L{v=0^M
c=IN IP4 135.104.226.231^M
b=AS:63^M
+UXaBn4p^M
},R{v=0^M
c=IN IP4 10.254.254.152^M
}},ST=2{L{v=0^M
c=IN IP4 135.104.226.231^M
b=AS:483^M
},R{v=0^M
c=IN IP4 10.254.254.152^M
}}}},MF=ip/1/0/100655{M{ST=1{L{v=0^M
c=IN IP6 2620::60:8AF:0:0:0:320A^M
b=AS:72^M
},R{v=0^M
c=IN IP6 2620::60:8AF:0:0:0:320A^M
}},ST=2{L{v=0^M
c=IN IP6 2620::60:8AF:0:0:0:320A^M
b=AS:500^M
},R{v=0^M
c=IN IP6 2620::60:8AF:0:0:0:320A^M
}}}}}}

_MsgE
At this point there should be a 2 way path with audio and video. Context diagram looks as follows:
BYE processing
When the call ends you will see a BYE from either the originator or the terminator, followed by the appropriate
BYE/200-OK processing. In our example the originator hung up first and sent the BYE.

Tue Mar 29 19:38:18 2016 (1459280298.988923)
[135.104.224.190]:5061
[BYE sip:+19992504000@135.104.224.190:5061;
encoded-parm=QbkRBthOEgsTXk5XXlFQWF5cWV5fQDEpQUJHRk1ISUpxe
HN0c3Z9eHErLnZ9fn9ga2JjZGVsZ2hpagsBAgRTaw0NC!
QYe SIP/2.0^M
Max-Forwards: 70^M
Via: SIP/2.0/TLS 10.254.254.152:1024;branch=z9hG4bK4-0afefe98-0000-1974605252^M
Call-ID: 2-160329193640-216580061@0afefe98^M
CSeq: 713540932 BYE^M
From: <sip:+19992501000@ims.net>;tag=2-0afefe98-160329193640-1860858780^M
To: <sip:+19992504000@ims.net>;tag=56fab106-56fad98a368a3141-gm-po-
lucentPCSF-000001^M
Contact: <sip:+19992501000@10.254.254.152:1024;transport=tcp>^M
Content-Length: 0^M
P-Preferred-Identity: sip:+19992501000@ims.net^M
^M
]
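The epoch value in parentheses on each message header makes it easy to measure the time between two messages, for example from the originating INVITE to this BYE. The sketch below is illustrative: epoch_microseconds is a hypothetical helper, and it treats the fractional part as a microsecond count that is not always zero-padded (the log shows values such as 1459280267.98101). That reading is an inference from the excerpts, not documented behaviour.

```python
import re
from datetime import datetime, timezone

STAMP = re.compile(r"\((\d+)\.(\d+)\)")


def epoch_microseconds(header_line):
    """Convert the '(seconds.microseconds)' stamp to integer microseconds."""
    match = STAMP.search(header_line)
    if match is None:
        raise ValueError("no (seconds.microseconds) stamp in line")
    return int(match.group(1)) * 1_000_000 + int(match.group(2))


invite = epoch_microseconds("Tue Mar 29 19:37:46 2016 (1459280266.913586)")
bye = epoch_microseconds("Tue Mar 29 19:38:18 2016 (1459280298.988923)")
print(bye - invite)  # 32075337 microseconds, about 32.08 seconds
print(datetime.fromtimestamp(invite // 1_000_000, tz=timezone.utc))
# 2016-03-29 19:37:46+00:00
```

The second line of output shows that the printed date and time agree with the epoch value when read as UTC. For comparison, the Subtract Reply for context 9379 reports nt/dur=32086; if that value is in milliseconds (an assumption), it is consistent with the interval computed here.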

H.248 Tear down call contexts with Subtract IP, Subtract Response
You will see Subtracts for both contexts, followed by Subtract Replies. Notice that the Subtract Replies from
the gateway have statistics for jitter, delay, etc. This is due to the VTC demo lab having a Quality of Service
statistics feature turned on. This feature is also turned on at Westlake. The production sites do not have this.
_MsgGwS
!/2 [2620:0:60:8ae::3138]:2944
T=2298{C=9379{S=*{AT{SA}}}}K{2297}
_MsgE
_MsgGwS
!/2 [2620:0:60:8ae::3139]:2945
T=2264{C=9380{S=*{AT{SA}}}}
_MsgGwR
!/2 [2620:0:60:8ae::310A]:2944 P=2298{C=9379{S=ip/1/4/92762{SA{nt/dur=32086,
nt/os=526970,nt/or=529590,rtp/ps=5539,rtp/pr=5570,rtp/pl=0.000000,rt!
p/jit=0,rtp/delay=0,rtp/dur=32086,rtp/os=526970,rtp/or=529590,tmanr/dp=0}},
S=ip/1/0/100655{SA{nt/dur=32090,nt/os=473050,nt/or=470998,rtp/ps=5570,
rtp/pr=5542,rtp/pl=0.000000,rtp/jit=0,rtp/delay=0,rtp/dur=32090,rtp/os=473050,
rtp/or=470998,tmanr/dp=0}}}}
_MsgE
_MsgGwR
!/2 [2620::60:8ae:0:0:0:310a]:2945 P=2264{C=9380{S=ip/1/4/62834{SA{nt/dur=31751,
nt/os=529590,nt/or=529854,rtp/ps=5570,rtp/pr=5573,rtp/pl=0.000000,rtp/jit=0,
rtp/delay=0,rtp/dur=31751,rtp/os=529590,rtp/or=529854,tmanr/dp=0}},
S=ip/1/0/78241{SA{nt/dur=31755,nt/os=473256,nt/or=473050,rtp/ps=5573,
rtp/pr=5570,rtp/pl=0.000000,rtp/jit=0,rtp/delay=0,rtp/dur=31755,rtp/os=473256,
rtp/or=473050,tmanr/dp=0}}}}
_MsgE
8.2 SBC lab environment procedures

Given PF mode support, stacked RMS support, and the limited availability of RMS labs, it is worthwhile to perform
the procedures listed below before attempting to deploy SBC in lab environments. Doing so avoids encountering
errors during deployment and spending time debugging them.

8.2.1 Identifying currently isolated cores and verifying whether they match with expected
test configuration
1. Execute the following command from host01 to view the current host configuration:
cm_adm -a host_config -display
Note:
This command need not be executed for a new host installation.
Expected outcome
The host configuration summary is displayed.
Sample host configuration summaries for various configurations are listed below.
Figure 2: Traditional RMS with large PIM (13 core) configuration
Figure 3: Traditional RMS with small PIM (11 core) with SC2 VM with 'PF_mode_support: True' configuration
Note:
The PIM cores, and now also the FW/PFW cores, are isolated (because PF_mode_support: True). However,
when PF_mode_support: No, only cores 14-23 are isolated.
Figure 4: Stacked RMS signaling host configuration
Note:

The FW/PFW cores are isolated.
The above configurations are driven by the contents of OLV.zip file (FI Worksheet) and by the number of
SC VM pairs. They are classified as:
• Traditional RMS (large PIM) - One SC VM pair.
• Small PIM configuration with SC2 VM - Two SC VM pairs.
• Stacked signaling RMS - Three or more SC VM pairs.
2. Verify the CPU isolation configuration that will be deployed. This can be done in one of the two ways
described below:
Note:
Any time a user changes a configuration from any of the above listed configurations, the user
should perform the following steps before trying to deploy the modified configuration.
• Execute the following command:
./sbc-cpu-isolate.sh
• Execute the following commands:
cd /storage/guestdata/app_data/sbc-playbooks/
./sig_prep_config.sh
Verify that the values of the cpu_isolate and stacked_RMS fields in the sig_prep_config.yml and
sig_config.yml files generated under the /storage/guestdata/app_data/sbc-playbooks/group_vars/all/
directory are appropriate.
Sample entries in sig_prep_config.yml are provided below.
Figure 5: Sample entries in sig_prep_config.yml file
3. In case of switching from PF mode to VF mode, make sure that the RMS hosts are correctly cabled.
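The classification by number of SC VM pairs given in step 1 can be written down as a small lookup, which is handy when checking several lab hosts against their expected configuration. This is a sketch only: classify_host_configuration is a hypothetical name, and the function encodes nothing beyond the three cases listed in step 1.

```python
def classify_host_configuration(sc_vm_pairs):
    """Map the number of SC VM pairs to the configuration class from step 1."""
    if sc_vm_pairs < 1:
        raise ValueError("at least one SC VM pair is expected")
    if sc_vm_pairs == 1:
        return "Traditional RMS (large PIM)"
    if sc_vm_pairs == 2:
        return "Small PIM configuration with SC2 VM"
    return "Stacked signaling RMS"


for pairs in (1, 2, 3):
    print(pairs, classify_host_configuration(pairs))
# 1 Traditional RMS (large PIM)
# 2 Small PIM configuration with SC2 VM
# 3 Stacked signaling RMS
```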




6.5.2 Useful files...................................................................................................................................................52

6.5.3 Stage debug and cleanup...........................................................................................................................53

6.5.4 Resume considerations (for advanced users only)................................................................................... 54

6.5.5 sbc_hc.py tool............................................................................................................................................. 56

6.5.6 Hardware slowdown.................................................................................................................................... 58

7 Troubleshooting SBC media plane issues................................................................................................................61

7.2 Troubleshooting the media path problem............................................................................................................61

7.3 Troubleshooting the networking problem............................................................................................................ 66

7.4 Troubleshooting the application crash problem.................................................................................................. 70

7.5 Troubleshooting the H248 transaction failure problem...................................................................................... 71

7.6 Troubleshooting backplane connection issues.................................................................................................... 72

7.6.1 Troubleshooting media backplane connection down issue......................................................................73

7.6.2 Checking backplane connection status..................................................................................................... 73