SBC - Troubleshooting Guide-21.8
Release 21.8
Troubleshooting Guide
3TB-30120-HSAA-TCZZA
Issue 4
©2021 Nokia. Nokia Confidential Information. Use subject to agreed restrictions on disclosure and use.
Nokia is committed to diversity and inclusion. We are continuously reviewing our customer documentation and consulting with standards bodies
to ensure that terminology is inclusive and aligned with the industry. Our future customer documentation will be updated accordingly.
This document includes Nokia proprietary and confidential information, which may not be distributed or disclosed to any third parties without the
prior written consent of Nokia.
This document is intended for use by Nokia's customers ("You"/"Your") in connection with a product purchased or licensed from any company
within Nokia Group of Companies. Use this document as agreed. You agree to notify Nokia of any errors you may find in this document; however,
should you elect to use this document for any purpose(s) for which it is not intended, You understand and warrant that any determinations You
may make or actions You may take will be based upon Your independent judgment and analysis of the content of this document.
Nokia reserves the right to make changes to this document without notice. At all times, the controlling version is the one available on Nokia’s site.
NO WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY OF AVAILABILITY, ACCURACY,
RELIABILITY, TITLE, NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, IS MADE IN RELATION TO THE CONTENT
OF THIS DOCUMENT. IN NO EVENT WILL NOKIA BE LIABLE FOR ANY DAMAGES, INCLUDING BUT NOT LIMITED TO SPECIAL, DIRECT, INDIRECT,
INCIDENTAL OR CONSEQUENTIAL OR ANY LOSSES, SUCH AS BUT NOT LIMITED TO LOSS OF PROFIT, REVENUE, BUSINESS INTERRUPTION,
BUSINESS OPPORTUNITY OR DATA THAT MAY ARISE FROM THE USE OF THIS DOCUMENT OR THE INFORMATION IN IT, EVEN IN THE CASE OF
ERRORS IN OR OMISSIONS FROM THIS DOCUMENT OR ITS CONTENT.
Copyright and trademark: Nokia is a registered trademark of Nokia Corporation. Other product names mentioned in this document may be
trademarks of their respective owners.
Table of Contents
List of Figures
List of Tables
2 Introduction to troubleshooting
2.4 Performance check list on receiving outage calls for SBC media
3.3 Viewing proposed repair actions for an active SBC alarm in Web UI
3.4 Viewing proposed repair actions for an active SBC alarm without using Web UI
3.6 Troubleshooting when unable to open Fault Management view in Web UI
3.7 Troubleshooting when no Alarms are present for Media/Signaling plane in Web UI
3.8 Troubleshooting using help provided for Media and Signaling Alarms in Web UI
4.9 Troubleshooting SBC signaling plane health check error due to rolling inits of the h248ds under bgc1-a
and bgc1-b
6.4.1 Troubleshooting CBAM GUI upgrade/backout failure when sim.log indicates stoppage at
CP_WARNING VERIFY_SOFTWARE or COMMIT pause point
7.3.1 Scenario-1
7.3.2 Scenario-2
8 Appendix
8.2.1 Identifying currently isolated cores and verifying whether they match with expected test
configuration
List of Figures
Figure 1: Ansible.log field details
Figure 2: Traditional RMS with large PIM (13 core) configuration
Figure 3: Traditional RMS with small PIM (11 core) with SC2 VM with 'PF_mode_support: True' configuration
List of Tables
Table 1: Conventions used
• Configuration issues.
The personnel using this guide must have the following knowledge:
Appearance                       Description
graphical user interface text    Text that is displayed in a graphical user interface or in a hardware label
Document title (document number): Description

Nokia SBC 21.8 Guide to Documentation (3TB-30118-HSAA-CEZZA)
    This document lists all the documents available for Nokia SBC and provides a brief description regarding
    each one of them, thus helping to guide the user to the appropriate document.
Nokia SBC 21.8 Feature Handbook (3TB-30102-HSAA-TCZZA)
    This document lists all the major features supported by Nokia SBC per release. It also provides the
    provisioning details of each of these features.
Nokia SBC 21.8 Operations for Integrated Configuration (3TB-30103-HSAA-TCZZA)
    This document provides details of the OAM operations that can be performed using the Nokia SBC Web UI
    for integrated configuration. It also provides configuration details about the various SBC Web UI
    configuration tables.
Nokia SBC 21.8 Charging Interface Specification (3TB-30104-HSAA-TCZZA)
    This document provides details of the charging and billing interface specifications for Nokia SBC.
Nokia SBC 21.8 Software Licenses (3TB-30106-HSAA-TCZZA)
    This document provides details of the software licenses for Nokia SBC.
Nokia SBC 21.8 Security Guide (3TB-30107-HSAA-TCZZA)
    This document serves as a high-level description of security architecture of Nokia SBC in integrated
    configuration.
Nokia SBC 21.8 System and Network Parameters Job Aid (3TB-30110-HSAA-PRZZA)
    This document provides details of the system and network parameters for Nokia SBC in MS Excel format.
Nokia SBC 21.8 System Ports and Protocols Job Aid (3TB-30111-HSAA-TCZZA)
    This document provides details of the system ports and protocols for Nokia SBC in MS Excel format.
Nokia SBC 21.8 Key Performance Indicators (3TB-30113-HSAA-TCZZA)
    This document lists the Key Performance Indicators (KPI) and Measurements available for Nokia SBC, which
    are critical indicators used for dimensioning purposes.
Nokia SBC 21.8 Release Changes (3TB-30116-HSAA-TCZZA)
    This document lists the changes to Ports, Protocols, System and N/W Parameters, PM Counts, Alarms,
    Charging Records and so on for SBC Release 21.8 w.r.t its previous main release.
Nokia SBC 21.8 SIP Interface Specification (3TB-30121-HSAA-PBZZA)
    This document provides the SIP interface specification details for Nokia SBC.
Nokia SBC 21.8 Data Flows (3TB-30130-HSAA-EBZZA)
    This document provides data flow information for Nokia SBC, to identify interfaces with sensitive data
    and use of encryption.
Nokia SBC 21.8 Release Notes
    The release notes for this Nokia SBC release.
Legal notice
Nokia is a registered trademark of Nokia Corporation. Other products and company names mentioned herein
may be trademarks or tradenames of their respective owners.
Document support
For support in using this or any other Nokia (formerly Alcatel-Lucent) document, please call one of the following
telephone numbers.
• If you are using a landline, a cellular phone or VoIP, dial this number: 1-888-582-3688
• If you are using a cellular phone or VoIP, dial this number: +1-469-646-4025
• If you are using a landline (a phone without a plus [+] character), replace the plus sign with the exit
code of your country of origin and then dial 1-469-646-4025. See the country-specific exit codes listed at
http://www.howtocallabroad.com/codes.html.
Technical support
For technical support, contact your local customer support team. See the Support web site
(https://networks.nokia.com/support/) for contact information.
How to order
To order Nokia documents, contact your local sales representative or use the Nokia Support portal.
How to comment
To comment on this document, go to the Online Comment Form or e-mail your comments to the Comments
Hotline (comments@nokia.com).
2 Introduction to troubleshooting
This chapter provides general information about the troubleshooting process, guidelines, and tools, along with
a workflow for troubleshooting a problem in Nokia SBC.
General information
The troubleshooting process workflow identifies and resolves issues related to a service or component. The
issue can be an intermittent or a continuous degradation in service, or a complete network failure.
The first step in problem resolution is to identify the problem. Problem identification can include an alarm
received from a component, an analysis of performance data, or a customer problem report.
• understand the designed state and behavior of the network, and the services that use the network.
• recognize and identify symptoms that impact the intended function and performance of the product.
Detailed information about the symptoms that are associated with the problem helps the Support engineer
diagnose and fix the problem. The following information can help you assess the scope of the problem:
• Alarms
• Error logs
• Performance statistics
• Accounting logs
• Divide the problem based on signaling or bearer planes and try to isolate the problem to one of these
planes.
• Extrapolate from alarms and logs of events the cause of the symptoms. Try to reproduce the problem.
• Determine the type of problem by reviewing the sequence of events before the problem occurred:
– Trace the actions that were performed to see where the problem occurred.
• Check the documentation or your procedural information to verify that the steps you performed followed
documented standards and procedures.
• Check the alarm log for any generated alarms that are related to the problem.
• Record any system-generated messages, such as error dialog boxes, for future troubleshooting.
• If you receive an error message, perform the actions recommended in the error dialog box, client GUI
dialog box, or event notification.
During troubleshooting:
• Check the appropriate release notice from the Nokia Support Documentation Service for any release-
specific problems, restrictions, or usage recommendations that relate to your problem.
• If you need help, confirmation, or advice, contact your technical support representative.
Overview
View the SBC Support Checklist page and perform the checklist tasks provided there before contacting the SBC
support team.
Also use the following resources to perform a health check and collect logs, as needed:
If no argument is provided, the health tool runs all the checks, including ssh_check, version check,
cpm check, db check, connectivity check, diskspace check, shmc check, hub check, REM state check, logical
volume check, VM state check, alarm check and so on.
Any issues or errors are printed to the screen. You can also review the errors in
/var/opt/log/health/health.log. If you are seeing a problem, running a health check is a good first step to
determine what the cause might be.
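As a rough illustration of reviewing that log, the sketch below lists the lines of health.log that look like problems. The function name health_issues and the keyword list (error, fail, warning) are assumptions made for this example; they are not part of the health tool itself.

```shell
# Sketch only: list suspicious lines from the health check log.
# health_issues and the keyword list are illustrative assumptions.
health_issues() {
    log_file=${1:-/var/opt/log/health/health.log}
    # -n prints line numbers, -i ignores case, -E enables alternation
    grep -niE 'error|fail|warning' "$log_file"
}
```

Calling health_issues with no argument reads the default log path named above; pass another path to inspect a copy of the log.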
• /var/log/auth.log: All authentication-related log information. Every user login attempt is logged here.
• /var/log/bash.log: All signaling and media CLI command activities are logged here (the media plane is
logged based on log level, ERROR level by default).
For information on viewing log files, see the Procedure for viewing SBC log files section.
1. From your computer, open an SSH session to the OAM server of the SBC and log in as the root user with the
root password.
$ ssh root@<OAM_server_IP>
To support Call Detail Record (CDR) scrambling using CAPS-V, the CDR in ASN.1 binary format must first be
decoded into ASCII format using the asnccflSearch CLI command, as explained below:
1. Log in to standby CDR card and find the encoded ASN.1 CDR file (which is to be scrambled) from the
following directory:
/storage/ccfl_app/charging/stream1/primary
or
/storage/ccfl_app/charging/stream1/secondary
asnccflsearch <cdrfile.name>
Details
The following guidelines describe how to collect various logs using the sbc_logs command before you contact
Level 4 support. The sbc_logs command can retrieve log files from both the signaling plane and the media
plane, and can collect status data as well as configuration data from the system.
The sbc_logs command (from /opt/LSS/sbin/sbc_logs directory) can be executed from the standby MI
card.
Command Syntax
Parameters
--level, -l <levelnum>: Set log retrieval level where <levelnum> values can be:
Attention:
The sbc_logs command with --level 2 should only be executed during a maintenance window.
If it is necessary to execute the sbc_logs command with --level 2 during the normal
operating window, avoid peak hours and continuously monitor CPU usage on the Media VMs
while the command is running.
--only_pm, -p: Collects only the PM data files from both the signaling plane as well as the media plane.
--health_logs, -d: Bundles and collects the lcp_status, health check and the alarm_cli outputs, tagged
with their version information.
--callp_logs, -c: Bundles and collects the master.logs, signalling.logs and the fslogs.
Examples
To print the help information for sbc_logs command, execute:
sbc_logs --help
To collect the current logs from the active CNFG card, execute:
sbc_logs
Or
sbc_logs --level 1
To collect PM data of just the first 10 entries, in the last 11 files, execute:
To collect the current as well as historic log files from the active CNFG card, execute:
sbc_logs --level 2
To run the get_logs command and retrieve information from both NODE_LOCAL and NODE_NETWORK hosts,
execute:
sbc_logs --level 3
To first run a health check (by executing the health --test field_install command) and then collect
the signaling plane as well as media plane log files, execute the following command:
sbc_logs --run_health
To collect the current as well as historic signaling plane log files, execute:
To collect the current as well as historic media plane log files, execute:
To collect the PM count data files both on the signaling plane and the media plane, execute:
To choose the specific log files (like master.log) to be collected in level 2, execute:
Note:
• If the system prompts "Are you sure you want to continue connecting (yes/no)?", type yes and press ENTER.
• If a ^M text format issue occurs, retry after converting the script from Microsoft Windows style to
Linux style line endings using the dos2unix Linux command.
• Press CTRL+C to terminate the execution of the sbc_logs command. Then use ps -ef | grep sbc_logs
to check whether it is still running in the background. If it is, use kill -9 <pid> to terminate the
background program.
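The background check in the last note can be sketched as a small filter over ps -ef output. The function name sbc_logs_pids is hypothetical, and the sketch assumes the usual ps -ef column layout in which the PID is the second field.

```shell
# Sketch only: print the PIDs of leftover sbc_logs processes.
# Reads `ps -ef` output on stdin; the PID is assumed to be column 2.
sbc_logs_pids() {
    # Skip lines containing "grep" so a plain grep helper is not reported.
    # This awk command line contains the word grep itself, so it is skipped too.
    awk '/sbc_logs/ && !/grep/ { print $2 }'
}
```

For example, ps -ef | sbc_logs_pids prints one PID per line; each PID can then be passed to kill -9 as described in the note.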
Verifying VM configuration
1. Log in to Undercloud server as a stack user with appropriate privileges.
# source overcloudrc
5. Create a file with openstack server show commands added in it for all SBC VMs, by executing the
single command:
Note: The above command assumes that SBC VMs prefixes are sbc, asbc, and/or psbc.
# cat cmd_gen_showAllSBCvms
7. Verify that commands are generated for all the following SBC VMs:
• Signaling plane: OAM, SC, DFED, CFED, BGC, FW, and iCCF
Note: If some SBC VMs are missing, the assumption from the previous step is not correct. You
need to create another command with the proper prefixes.
8. Collect SBC VMs detailed information (like state, addresses, flavor and so on) by executing the
commands:
# ./cmd_gen_showAllSBCvms
# ls -lrth cmd_op_server_show*
Confirm that the displayed servers' data is written to the files by checking that the size of every
collected file is greater than zero (0) KB.
10. Collect the details of all volumes (like size, what VM it is attached to, and so on) by executing the
command:
11. Collect the details of all flavors (like ram, vcpu, swap, disk and so on) by executing the command:
12. Collect the details about all the compute host information by executing the single command:
2. Collect the instance CPU configuration and NUMA nodes information, by executing the command:
3. Collect details about the pcpus which that instance of vcpus can use by executing the command:
4. Collect the list of all instances in the compute nodes by executing the command:
# salt '*ompute*' cmd.run "virsh list | awk '{print \$2}' | grep inst | xargs > /tmp/vlist"
5. Collect CPU pinning information for all instances in the compute nodes by executing the command:
# salt '*ompute*' cmd.run 'for i in `cat /tmp/vlist`; do echo $i; virsh vcpupin $i ; done; rm -rf /tmp/vlist' > cmd_vcpupin
6. Confirm that the size of every collected file is greater than zero (0) KB. Execute the command:
# ls -lrth cmd_*
Note: The list of files which should be collected by above procedure is:
• For VM configuration:
– cmd_nova_zone_list
– cmd_nova_aggr_list
– cmd_gen_showAllSBCvms
– cmd_op_server_show_*
– cmd_op_volume_list_all
– cmd_op_flavor_list_all
– cmd_op_server_list_all
– cmd_lscpu
– cmd_grep_nova
– cmd_vcpupin
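The size check on the collected files can also be sketched as a search for empty files, which is quicker than reading the ls -lrth listing by eye. The function name empty_outputs is hypothetical; the cmd_* pattern follows the file list above, and the -maxdepth and -empty options assume GNU or BSD find.

```shell
# Sketch only: print collected cmd_* files that are empty (0 bytes).
# No output means every collected file has content.
empty_outputs() {
    dir=${1:-.}
    find "$dir" -maxdepth 1 -type f -name 'cmd_*' -empty
}
```

Run it from the directory where the commands were executed, or pass that directory as the first argument. Any file it prints needs to be collected again.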
cd /home/cbam
wget https://repo.lab.pl.alcatel-lucent.com/artifactory/sbc-generic-releases/3rd/mgw/cbam_tools/getLogs/1/getLogs
chmod +x getLogs
Parameters:
• -h:
• -v:
• <output_name> <VNF_id>:
Collects logs from <VNF_id> (format: CBAM-xxxx...) and saves the zip file to ./<output_name>.zip
• -q <output_name> <VNF_id>:
Parameters:
• -h, --help:
• -t HOST, --target=HOST:
When this parameter is used, the getlogs command creates logs from the specified host(s) only. Possible
values are:
[host01,host02,host03]|all
When the -t HOST, --target=HOST parameter is not used, the getlogs command is executed
for all hosts by default.
Note:
• The system takes a few minutes to create the logs. The logs are stored at
/storage/logs/<site_name>_hosts_<date>.zip.
• The main log zip file contains one or more zip files, with logs from the specified hosts and logs from
the OAM VM (if it exists).
2.4 Performance check list on receiving outage calls for SBC media
Overview
L4 support personnel need to answer the following questions:
• Was any operation, such as an upgrade or cut-over, executed on any element (not only the SBC) before this
issue happened?
• view date
• view redundancy
• view resource usage (if there is free MCP and PIP resource)
• view ip if
• tipc-config -l
• ifconfig
• top
• pgrep mpu
Bz2 files, CLI logs, message files, syslog files, MEGACO and RTP
• Re-deploy
• Reproduction
• Emergency patch
• Official release
• Re-deployment
Overview
SBC alarms are of the following five severities:
• Critical alarms - Critical alarms are used to indicate that a severe, service-affecting condition has
occurred and that immediate corrective action is imperative, regardless of the time of day or day of the
week.
• Major alarms - Major alarms are used for conditions that indicate a serious disruption of service or the
malfunctioning or failure of important functions. These troubles require the immediate attention and
response of a craftsperson to restore or maintain system capability. The urgency is less than in critical
situations because of a lesser immediate effect or impending effect on service or system performance.
• Minor alarms - Minor alarms are used for troubles that do not have a serious effect on service to
customers or for troubles in functions that are not essential to NE operation.
• Warning alarms - Warning alarms indicate the detection of a potential or impending service-affecting
fault, before any significant effects are felt. Action should be taken to further diagnose (if necessary) and
correct the problem in order to prevent it from becoming a more serious service-affecting fault.
• Indeterminate alarms - Indeterminate alarms are alarms whose severity level could not be determined.
2. Go to the Fault Management screen to get a listing of all active alarms. To navigate to Fault Management
screen, do one of the following:
• Or, click Dashboard, then click on alarms chart displayed for alarms of any Managed Elements in
Alarm Info Overview. Click Yes in View Alarm Confirmation pop-up screen.
Expected outcome
The application displays a tabular listing of all active alarms in Fault Management screen.
Expected outcome
The details of the alarm get displayed in View Alarm pop-up screen.
3.3 Viewing proposed repair actions for an active SBC alarm in Web UI
1. To view the details of a currently active alarm of a managed element, perform all the steps of the Viewing
active SBC alarms in Web UI procedure.
Expected outcome
The proposed repair actions for this alarm are displayed under Proposed Repair Actions section.
3.4 Viewing proposed repair actions for an active SBC alarm without using Web UI
1. Determine the alarm name for the specific alarm which is being analyzed.
2. Search for (or look up) the alarm name in the SBC Alarms spreadsheet.
The SBC Alarms spreadsheet is available in Microsoft ® Excel format on the Nokia Support portal. To
download the Excel file, log in to the Nokia Support portal, navigate to the Session Border Control
(SBC) page, and then select Documentation: Doc Center. Search using the keyword Alarm and locate the
Alarms spreadsheet for the appropriate release of the SBC application. This spreadsheet lists the SBC
alarms along with their associated attributes.
Note:
Specific alarms can also be located by searching the entire spreadsheet using the Additional text
field of the alarm or by using the Specific Problem field of the alarm.
3. The specific alarm's repair actions will be listed in the spreadsheet column labeled Action/trouble
resolution steps to be taken by maintenance personal.
Note:
You can also view other attributes of the alarm in the other columns of the spreadsheet.
cd /opt/sbc/CurrRel/apps/fault-mgt
./runFM.sh monitor
cd /opt/sbc/CurrRel/apps/fault-mgt
./runFM.sh stop
Note:
Once FM is stopped, it is restarted automatically after 30 seconds, irrespective of whether step
3 is performed or not.
cd /opt/sbc/CurrRel/apps/fault-mgt
./runFM.sh start
/opt/sbc/CurrRel/logs/fault-mgt*
5. By default, the FM log level is INFO. The user can change it to DEBUG for issue debugging if necessary as
shown below.
cd /opt/sbc/CurrRel/apps/fault-mgt/config
vi logback.xml
Save the modification. It takes effect after 30 seconds, or you can restart the FM process to make it take
effect immediately.
• Port 162 is available: This port is used to receive alarms forwarded by MITrapreceiver.
3. When Port 8001 and Port 162 are unavailable, check which process is using them by executing the commands
below (the FM process requires both ports):
ps -ef | grep DProcess
lsof -i:<port number>
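The port check can be sketched as a loop over the two ports that FM needs. The function name check_ports is hypothetical. The sketch relies on lsof exiting with a non-zero status when nothing is using the port, and it should be run as root so that sockets owned by other users are visible.

```shell
# Sketch only: report whether each given port is currently in use.
# Relies on lsof returning a non-zero status when no socket matches.
check_ports() {
    for port in "$@"; do
        if lsof -i:"$port" >/dev/null 2>&1; then
            echo "port $port: in use"
        else
            echo "port $port: free"
        fi
    done
}
```

For example, check_ports 8001 162 prints one line per port. For a port reported as in use, run lsof -i:<port number> directly to see the owning process.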
3. If the issue still exists, use the command below to capture PCAP files. Then check the PCAP file
with Wireshark and make sure:
• Both SNMP get/get-next requests and responses are successfully sent and received for alarm
re-synchronization.
Pay attention to the IP address, port and community string. Run the capture for more than 9 minutes,
because alarm re-synchronization runs every 9 minutes by default.
3.8 Troubleshooting using help provided for Media and Signaling Alarms in Web UI
Double-click on an alarm listed in the Fault Management screen and then navigate to Alarm Help tab.
Expected outcome
Description
The troubleshooting for this is the same as described in the Troubleshooting using help provided for Media and
Signaling Alarms in Web UI section.
• Alarm resynchronization failure: Refer to steps 2 and 3 in the Troubleshooting when no Alarms are
present for Media/Signaling plane in Web UI section.
• Trap burst: Too many traps are incoming, so the internal trap mapping queue size exceeds the
threshold.
• Alarm Queue Overflow: Too many alarms need to be handled, so the internal alarm system event queue size
exceeds the threshold.
Expected outcome
3. Check the OAM Configuration and OSS Configuration in the Configurations screen. Ensure that the SNMP
settings are correct for both OAM and OSS.
4. If the issue still exists, use the command below to capture PCAP files. Then analyze the PCAP file with
Wireshark and make sure:
• Calls fail because the external DNS servers are unreachable and core element FQDNs can't be resolved.
To determine whether traffic is being routed incorrectly, check master.log by searching it for KFEPH with
grep. For example:
<ibc02-s00c03h0:lss>/export/home/lss/logs:
In the above example the sending address (saddr) is 10.223.8.131, which is an IP address from the trusted
signaling subnet, and the destination address (daddr) is 10.10.120.168, which is a remote address used for
the E2 diameter connection to the CLF. Because the KFEPH report is printed, the route from the
trusted signaling subnet to the CLF is going through the FEPH untrusted signaling subnet when it should not
be. A static route needs to be set up for this destination so that it goes through the trusted signaling
subnet instead of the default, untrusted signaling subnet.
The full report of the log shows the following. The last line starting with LOG_13 is saying that FEPH does not
have a trusted flow for this connection.
LOG_13: PKT: New outbound flow BEPH class or instance lookup failure
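The master.log search described above can be sketched as follows. The function name kfeph_reports is hypothetical, and the default path is an assumption based on the log directory shown in the prompt of the example; adjust it to wherever master.log is located on your system.

```shell
# Sketch only: print KFEPH report lines from master.log with line numbers.
# The default path is an assumption taken from the example prompt.
kfeph_reports() {
    log_file=${1:-/export/home/lss/logs/master.log}
    grep -n 'KFEPH' "$log_file"
}
```

Any line it prints is a candidate for the routing problem described above. Check the saddr and daddr values in the full report to decide which static route is missing.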
Connectivity Issues (Ping the P-CSCF published IP, not the FEPH published IP address)
After installing a system, it is a good health check to verify that IP addresses are externally pingable. The
FEPH published IP address is not pingable, though; only the P-CSCF published IP address is.
The possible causes include but are not limited to the following:
Expected outcome
In the output of the command, you can get the status of each interface (UP, DOWN, NO-CARRIER).
• UP: This is the correct status, which means that the interface is working.
• DOWN: This status indicates that the configuration on either host or switch side is incorrect.
For example:
ethtool eno1
Expected outcome
4. Check the following parameters in the command output of the ethtool <NIC> command, and fix any
issue that can be identified.
• Supported link modes: This parameter shows the link modes supported by the interface.
• Advertised link modes: This parameter shows the link modes that are currently advertised to
the other end (switch or second host) of the NIC card.
• Link partner advertised link modes: This parameter shows the available modes advertised
by the other end of the NIC card.
• Speed: This parameter indicates the current link speed. If the cable is not working or the
interface settings are incorrect, you may not get the expected speed, or the speed may not match the
speed of the other end of the NIC card.
• Duplex: This parameter indicates the duplex of the current interface, which should be
Full. If the cable is not working or the interface settings are incorrect, you may get an incorrect
duplex.
• Auto-negotiation: This parameter indicates the current auto-negotiation status. Both ends of
the NIC card should have the same speed, duplex and auto-negotiation settings.
• Link detected: This parameter indicates whether the physical link of the NIC card has been
detected. If the status is no, it means that one of the interfaces is disconnected or the cable is not
working.
5. Ensure that your routing has correct gateways for each subnet. To check the routing settings, enter:
ip route show
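The checks in steps 4 and 5 can be sketched as two small filters, one over ethtool <NIC> output and one over ip route show output. The function names link_problems and list_gateways are hypothetical. The sketch assumes the usual "Field: value" layout of ethtool output and the "destination via gateway" layout of ip route output.

```shell
# Sketch only: flag the two ethtool fields that have a single correct value.
# Reads `ethtool <NIC>` output on stdin.
link_problems() {
    awk -F': ' '
        /^[ \t]*Duplex:/        && $2 != "Full" { print "duplex is " $2 ", expected Full" }
        /^[ \t]*Link detected:/ && $2 != "yes"  { print "link not detected" }'
}

# Sketch only: print "destination gateway" for every routed entry.
# Reads `ip route show` output on stdin; directly connected subnets have no gateway.
list_gateways() {
    awk '$2 == "via" { print $1, $3 }'
}
```

For example, ethtool eno1 | link_problems prints nothing for a healthy link, and ip route show | list_gateways gives a compact list to compare against the expected gateway of each subnet.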
Expected outcome
Description
chk_RCC_VMstates RCC FAILED and chk_RCC_VMstates VM FAILED are observed in the health check report.
The output of RCCcstat shows that the RCC Clusters are offline. The following entry can be found in the
rccout.log under /opt/RCC/var directory:
2018 Jul 20 06:43:44:626: ** lcm: Recovery Mode 2 Init Threshold Exceeded, Going
Offline-L:3, C:2001000, LFLG:3
The root cause of this problem is that the RCC has a threshold for VM reboots (3 times in 20 minutes). When
the SBC VMs are rebooted often enough to exceed this threshold, the RCC on those VMs goes offline.
Solution
Log in to the VMs (where the RCC is offline) and manually run the following command:
sudo RCCmachonline -u
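Before bringing the RCC back online, it can help to confirm that the log entry quoted above is actually present on the VM. The sketch below does that; the function name rcc_threshold_exceeded is hypothetical, and the search string is taken from the sample rccout.log entry.

```shell
# Sketch only: succeed if rccout.log records that the reboot threshold was exceeded.
rcc_threshold_exceeded() {
    log_file=${1:-/opt/RCC/var/rccout.log}
    grep -q 'Init Threshold Exceeded' "$log_file"
}
```

For example, rcc_threshold_exceeded && echo "threshold entry found" confirms the root cause on a VM before you run the recovery command shown in the solution.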
Description
A warning is present in the health check report, as follows:
Running chk_REMstates to check REM states on hosts with REM-based services
Checking REM states of REM-controlled services
WARNING: chk_REMstates: Shelf Slot Host T-I-M (00 02 0 026-004-000) is
WARNING: degraded and may cause an SU/Patch to fail
WARNING: To see which resources are degraded,
WARNING: run "dumpcars -s -d 026 004 000"
This issue is caused by the mismatch in the database configuration and the service.
Solution
Reboot the active IMS cards where the degraded configuration issue occurs. The standby cards become active.
Then reboot the IMS cards that are presently active.
Description
The following errors are reported in the health check:
chk_db_repl FAILED
chk_database FAILED
The output of the mysqlrepl_adm --action check_health --db webnms command is:
This issue is caused by corruption of the master.info file in the /data0/db/webnms/ directory.
Solution
Manually re-synchronize the master.info file from the mate host by running the following command on the host
where the health check errors are reported:
Description
The PFW EIPM error is reported while performing health check.
The root cause of the issue is that the VLAN 2102 is missing from 6125XLG switch on the chassis, where fwp-a
VM is located.
4.9 Troubleshooting SBC signaling plane health check error due to rolling inits of the h248ds under bgc1-a and bgc1-b
Description
This issue causes the health check to report chk_rem_sv_bld and chk_connectivity failures.
Both failures are caused by rolling inits of the h248ds under bgc1-a and bgc1-b.
REMcli su 1 1 1
tail -f master.log
As per the log: bgc1-a goes SM_ACT and then 2 minutes later bgc1-b goes SM_ACT
Solution
Perform the following to recover from this issue:
• Perform a hard reboot of bgc1-a VNF and bgc1-b VNF on the cloud GUI.
After about 5 minutes the problem is resolved. The BGC VM is in a steady Active/Standby status, and the
health check runs clean.
Note:
If the MI is active on oam-b (as verified using the MIcmd state vc command), running the MIcmd
switch vc command to force it to be active on oam-a does not resolve this issue.
Description
This issue occurs when the media plane VM (MCM id 1) has a wrong image and does not come up in the OpenStack
environment.
Solution
Perform the following steps to resolve this issue:
2. Run the following command to recreate the server from the appropriate media plane image:
For example:
Description
This issue occurs when the SRIOV service has an issue in compute mode and fewer VFs (for example, 189) are
running than the expected number of VFs (for example, 252).
Solution
Perform the following steps to resolve this issue:
virsh list
3. Run the following command to find MAC addresses for all vNICs assigned to the VM:
4. Run the following command to get a list of all active SR-IOV VFs on the compute and to look for the VLAN
values and MAC addresses for a given VM:
7. Run the following command to clear the ERROR state (and then perform a hard reboot again):
• Execute ps -efa | grep java| grep Process to make sure all State-Management, Performance-
Management, Fault-Management, and web ui java processes are up and running.
• Execute ps -efa | grep ncagent to make sure netconf agent process is up and running.
The log files of all processes are located here. You can grep for ERROR or Exception to find any issue.
• A 500 Internal Server Error provides a detailed error message for the Request URL
given in the error message, which tells you which process is down. For example, Request
URL:https://135.252.144.22:8443/oam/pm/dashboard/counter indicates that the Performance
Management process did not respond to the Web UI request.
• The easiest way to solve this problem is to restart the OAM processes, as explained above.
Details
The EIPM reports an error when external connectivity is lost.
1. Execute the ipm_cli -a dump -t status command, to check overall EIPM status.
2. Execute the ipm_cli -a dump -t shm command, to dump EIPM shared memory and then check the
detailed interface/subnet status.
3. Perform tcpdump on FW VM and check if the packets in question did arrive at FW or not.
4. Perform traceroute and check if the packets are lost by the routers.
Workaround
Execute the ip_adm/dns_adm/ntp_adm commands with either nohup or &, so that the commands keep
running in background and will not get timed out, thus avoiding this issue.
Description
After FRU installation or a software upgrade is executed, the partition tables on the RMS hosts may differ from each
other. However, it is important that both partition tables be exactly the same.
Solution
Perform the following steps to diagnose and fix this issue:
Result:
The output might be different, for the commands executed on the 2 different hosts. For example:
2. If different outputs are seen for the commands executed on the 2 different hosts, then execute the host
FRU replacement procedure for the host with msdos partition table.
Description
If restarting the OAM processes does not solve the 500 internal server errors, one possible reason is that
something is wrong with the DB content:
2. Check whether the MI processes are up and running by executing MIcmd state vc. The MI service should be in
the A (active) state. (If MIcmd state vc reports an RCC error, first execute sudo RCCmachonline -u on each
OAM host, then check the MI service again.)
3. If the MI service is not in the active state, execute MIcmd start vc to start the MI service.
4. After the MI service has started successfully, restarting the OAM processes could solve the DB content issue.
5. The health tool can also be used to check for DB issues: run health --test db to check whether any DB
issues are present.
Backup/restore
Following are the backup/restore methods:
• Using the -v option on sbc_backup, sbc_restore_mp, and sbc_restore_all provides additional output that is
helpful for debugging.
• These commands require passwordless access to the media plane (7510 MGW) through the root login.
If the -v output indicates an ssh failure, you can confirm this by running ssh manually and observing
whether you can connect without being prompted for a password.
• The IBSU is much more reliable than Sim Based Software Upgrade (SBSU). However, in case of permanent
failure, disaster recovery is the only possible rollback when using IBSU.
• The resume process has been automated and extended. This makes system more resistant to unexpected
events.
• ok=<number> (for example, ok=19): This indicates that 19 tasks from playbooks have been executed
successfully.
• changed=<number> (for example, changed=3): This indicates that 3 tasks have changed something in the
existing configuration.
• unreachable=<number> (for example, unreachable=0): This indicates that 0 tasks failed due to
unreachable host.
• failed=<number> (for example, failed=0): This indicates that 0 tasks have failed.
• skipped=<number> (for example, skipped=12): This indicates that no change was required for 12 tasks.
• rescued=<number> (for example, rescued=2): This indicates that 2 tasks have failed, but Ansible was able
to recover from the failure.
• ignored=<number> (for example, ignored=1): This indicates that 1 task has failed, however this failure
was expected.
Example 1:
…
2021-04-08 09:40:05,090 p=5871 u=root | TASK [sig-prep-host01 : Get md5 checksum
from downloaded file in /storage/guestdata/qcow2] ***
2021-04-08 09:40:05,169 p=5871 u=root | fatal: [localhost]:
FAILED! => {"changed": true, "cmd": "cat /storage/guestdata/qcow2/
sbc_signaling_jenkins_nokia-SBC_sig-RHEL7-R37.38.00.x86_64.qcow2.md5", "delta":
"0:00:00.002983", "end": "2021-04-08 09:40:05.157592", "failed": true, "rc": 1,
"start": "2021-04-08 09:40:05.154609", "stderr": "cat: /storage/guestdata/qcow2/
sbc_signaling_jenkins_nokia-SBC_sig-RHEL7-R37.38.00.x86_64.qcow2.md5: No such
file or directory","stdout": "", "stdout_lines": [], "warnings": []}
2021-04-08 09:40:05,169 p=5871 u=root | PLAY RECAP
*********************************************************************
2021-04-08 09:40:05,169 p=5871 u=root | localhost : ok=15 changed=8
unreachable=0 failed=1
In this example, the failure is caused by a missing MD5 file, and the failure appears in the sig-prep-host01
task.
Example 2:
…
2021-05-25 05:30:05,194 p=34592 u=root | TASK [close_lcm : lcm_perm --clean]
********************************************
2021-05-25 05:30:05,253 p=34592 u=root | fatal: [MI_F]: FAILED!
=> {"changed": false, "failed": true, "module_stderr": "Sorry,
user lcmadm is not allowed to execute '/bin/sh -c echo BECOME-
SUCCESSiphbymkjuxzxteazfmmcciytqrikgttu; LANG=en_US.UTF-8 LC_ALL=en_US.UTF-8
LC_MESSAGES=en_US.UTF-8 /usr/bin/python' as root on rms09-oam-
a.\n", "module_stdout": "", "msg": "MODULE FAILURE", "parsed":
false}
2021-05-25 05:30:05,254 p=34592 u=root | PLAY RECAP
*********************************************************************
2021-05-25 05:30:05,254 p=34592 u=root | MI_F : ok=4 changed=1
unreachable=0 failed=1
2021-05-25 05:30:05,255 p=34592 u=root | localhost : ok=12 changed=0
unreachable=0 failed=0
In this example, the failure occurs because the lcmadm user tried to perform operations that can only be
done by the root user, and the failure appears in the close_lcm task.
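The recap counters described above can also be checked with a short script. The following is a minimal Python sketch, not part of the SBC tooling, that scans log text for the host lines following PLAY RECAP and lists the hosts that report failed or unreachable tasks. The function names and the sample text are illustrative, and the sketch assumes that counters may wrap onto the following line, as they do in the examples above.

```python
import re

# "<host> : ok=N changed=N ..." following the "|" that ends the log prefix.
HOST_RE = re.compile(r"\|\s*(\S+)\s*:\s*((?:\w+=\d+\s*)+)")
COUNTER_RE = re.compile(r"(\w+)=(\d+)")


def parse_recap(log_text):
    """Return {host: {counter: value}} for the host lines after PLAY RECAP."""
    # Re-join counters that wrapped onto a continuation line.
    joined = re.sub(r"\n(?=\s*\w+=\d+)", " ", log_text)
    recap = {}
    in_recap = False
    for line in joined.splitlines():
        if "PLAY RECAP" in line:
            in_recap = True
        elif in_recap:
            match = HOST_RE.search(line)
            if match:
                counters = COUNTER_RE.findall(match.group(2))
                recap[match.group(1)] = {k: int(v) for k, v in counters}
    return recap


def failed_hosts(recap):
    """Return the hosts whose failed or unreachable counter is non-zero."""
    return sorted(host for host, counters in recap.items()
                  if counters.get("failed", 0) or counters.get("unreachable", 0))


# Sample text copied from Example 2 (illustrative input only).
SAMPLE = """\
2021-05-25 05:30:05,254 p=34592 u=root | PLAY RECAP
*********************************************************************
2021-05-25 05:30:05,254 p=34592 u=root | MI_F : ok=4 changed=1
unreachable=0 failed=1
2021-05-25 05:30:05,255 p=34592 u=root | localhost : ok=12 changed=0
unreachable=0 failed=0
"""

recap = parse_recap(SAMPLE)
print(failed_hosts(recap))
```

Once the failing host is known, the fatal: line that precedes the recap in the log names the task and the error message.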
• The user needs to have detailed SBC system knowledge for understanding the information contained in
InteractiveInstaller.sh.log
Sample output:
# LAST_ACTION_NR=6
# LAST_ACTION_NR=6
# LAST_ACTION_NR=1
# LAST_ACTION_NR=1
# LAST_ACTION_NR=2
# LAST_ACTION_NR=2
# LAST_ACTION_NR=3
# LAST_ACTION_NR=3
# LAST_ACTION_NR=3
Note:
Duplicate entries are expected in the output. This information is useful in case steps get executed in
incorrect sequence in the procedure (for example, as 1,2,3,2,6).
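As an illustration of how the step sequence can be inspected, the following minimal Python sketch collapses the duplicate entries and reports transitions that are not a simple step to the next number. It assumes lines of the form LAST_ACTION_NR=<n>, as in the sample output above. The function names are hypothetical, and the sketch additionally assumes that a return to step 1 marks the start of a new run; whether a given transition is actually wrong depends on the procedure being executed.

```python
import re

ACTION_RE = re.compile(r"LAST_ACTION_NR=(\d+)")


def action_sequence(text):
    """Return the step numbers with consecutive duplicates collapsed."""
    steps = []
    for number in ACTION_RE.findall(text):
        step = int(number)
        if not steps or steps[-1] != step:
            steps.append(step)
    return steps


def irregular_transitions(steps):
    """Return (previous, current) pairs that are not a simple +1 step.

    Assumption: a return to step 1 starts a new run and is not reported.
    """
    return [(prev, cur) for prev, cur in zip(steps, steps[1:])
            if cur != prev + 1 and cur != 1]


# Sample text copied from the sample output above.
SAMPLE = """\
# LAST_ACTION_NR=6
# LAST_ACTION_NR=6
# LAST_ACTION_NR=1
# LAST_ACTION_NR=1
# LAST_ACTION_NR=2
# LAST_ACTION_NR=2
# LAST_ACTION_NR=3
# LAST_ACTION_NR=3
# LAST_ACTION_NR=3
"""

steps = action_sequence(SAMPLE)
print(steps, irregular_transitions(steps))
print(irregular_transitions([1, 2, 3, 2, 6]))
```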
Sample output:
# LAST_ACTION_NR=1
ACTION_EXIT: This indicates the name of the failed step and the exact date and time of the failure. After finding
the date and time of the failure, inspect other logs or check the InteractiveInstaller.sh.log file for more
details.
• /storage/guestdata/log/sbc-su-log/InteractiveInstaller.sh.log
• /var/log/vcp.log
• /storage/pcs/log/update/pcs.log
• /var/opt/vcp/log/cm.log
• /var/opt/vcp/log/cm_healthchk.log
• /var/log/messages
Note: A detailed explanation of all problems that might occur during a VMM-HI upgrade is outside the scope of
this section.
6.4.1 Troubleshooting CBAM GUI upgrade/backout failure when sim.log indicates stoppage
at CP_WARNING VERIFY_SOFTWARE or COMMIT pause point
Details
1. If the CBAM GUI upgrade/backout fails and sim.log indicates that it stopped at an expected pause point
(CP_WARNING, VERIFY_SOFTWARE, or COMMIT), re-run the upgrade/backout from the CBAM GUI. If the re-attempt
fails, contact the next level of support to resolve the underlying issue.
2. However, if sim.log indicates failures, contact the next level of support to resolve the underlying issue.
• /storage/guestdata/app_data/sbc-playbooks/sbc-installer-data.yml
• /storage/guestdata/tmp_data/sbc-playbooks/sbc-installer-su-data.yml
• In directory /storage/guestdata/log/sbc-su-log/
– sbc-su-status
– .hcStatus
– .media1Status
– .media1SuReqStatus
– .media2Status
– .media2SuReqStatus
– .mediaStatus
– .mediaSuReqStatus
– .sbcStatus
– .sigStatus
– .sigSuReqStatus
– .status_file
• /storage/guestdata/log/media-log/upgrade-ISSU-log/upgrade-ISSU_default_<date>.log
• /storage/guestdata/log/media-log/upgrade-ISSU-log/upgrade-ISSU_default_<date>.status
• /storage/guestdata/log/media-log/upgrade-ISSU-precheck-log/upgrade-ISSU-precheck_default_<date>.log
• /storage/guestdata/log/media-log/upgrade-ISSU-precheck-log/upgrade-ISSU-precheck_default_<date>.status
– nokia.vmmhi_upgrade_sig_<version>.sfx
– nokia.vmmhi_upgrade_mgw_<version>.sfx
• ERROR: Expecting exactly two sfx files in /storage/download dir - one for media
and one for signaling, but found:
• ERROR: Unexpected files found. This may indicate unfinished or broken VMM-HI
upgrade:
3. If there is no VMM-HI upgrade in progress and the expected VMM-HI is installed, remove the
VCP.VM<version>.zip files from /storage/downloads/
4. Retry the stage.
• If there is a legitimate need to re-stage all software, execute the following command:
nokia.vmmhi_upgrade_sig_XX.YY.ZZ_ABC.sfx cleanup
Warning: The above command re-stages and replaces all software regardless of whether the
upgrade is in progress or not.
Sample use case: Use the above command to re-stage all software when one VM is not functioning
properly after stage (but before the upgrade has started). In this case staged files might be missing on
that VM.
16:07:15 start.start-sbc-imgsu
16:07:15 start.sig-pre-su-update
16:07:30 completed.sig-pre-su-update
16:07:30 completed.start-sbc-imgsu
16:09:53 start.sbc-imgsu-require-check
16:10:20 completed.sbc-imgsu-require-check
16:10:20 start.media-imgsu-precheck
16:12:15 completed.media-imgsu-precheck
16:12:15 start.sig-imgsu-start
16:12:19 start.sig_prep_config_start
16:12:19 completed.start.sig_prep_config
16:12:19 start.sig-pre-su-update
16:12:34 completed.sig-pre-su-update
16:12:34 start.sig-imgsu-deft-zip-check
16:12:44 completed.sig-imgsu-deft-zip-check
16:12:44 start.sbc-prep-su
16:12:48 completed.sbc-prep-su
16:12:48 start.sig-img-pre-su-play
16:30:05 completed.sig-img-pre-su-play
16:30:05 completed.sig-imgsu-start
16:30:05 start.sig-imgsu-create-vm-list
16:30:22 completed.sig-imgsu-create-vm-list
16:30:22 start.sig-imgsu-shutdown-sideB
16:30:55 completed.sig-imgsu-shutdown-sideB
16:30:55 start.sig-imgsu-updateSideB
16:51:08 completed.sig-imgsu-updateSideB
16:51:08 start.sig-imgsu-wait-update-B
17:10:48 completed.sig-imgsu-wait-update-B
17:10:48 start.sig-imgsu-active-sideB
17:15:20 completed.sig-imgsu-active-sideB
17:15:20 start.sig-imgsu-shutdown-sideA
17:17:32 completed.sig-imgsu-shutdown-sideA
17:17:32 start.sig-imgsu-updateSideA
17:36:46 completed.sig-imgsu-updateSideA
17:36:46 start.sig-imgsu-wait-update-A
17:59:47 completed.sig-imgsu-wait-update-A
17:59:47 start.sig-img-post-su-play
18:04:23 completed.sig-img-post-su-play
18:04:23 start.media-imgsu-upgrade
19:06:47 completed.media-imgsu-upgrade
19:06:47 completed.media-imgsu-upgrade
19:06:47 completed.media-upgrade-successfully
19:06:52 start.sig-imgsu-sideA-numa-alignment
19:21:08 completed.sig-imgsu-sideA-numa-alignment
19:21:08 start.sig-imgsu-health-check-wait-srv-ready-1
19:26:09 completed.sig-imgsu-health-check-wait-srv-ready-1
19:26:09 start.sig-imgsu-sideB-numa-alignment
19:40:21 completed.sig-imgsu-sideB-numa-alignment
19:40:21 start.sig-imgsu-health-check-wait-srv-ready-2
19:44:37 completed.sig-imgsu-health-check-wait-srv-ready-2
19:44:37 start.sig-img-post-su-update
19:45:58 completed.sig-img-post-su-update
19:45:58 start.sig-known-hosts-setup
19:46:29 completed.sig-known-hosts-setup
19:46:29 completed.sbc-su-successfully
However, consider another scenario where the SBC SU keeps failing at:
16:07:15 start.start-sbc-imgsu
16:07:15 start.sig-pre-su-update
16:07:30 failed.sig-pre-su-update
16:07:30 failed.start-sbc-imgsu
The start-sbc-imgsu is the name of the main block, and sig-pre-su-update is the sub-block. In this case, it is
important to keep this sequence.
16:07:15 start.start-sbc-imgsu
16:07:15 start.sig-pre-su-update
16:07:30 completed.sig-pre-su-update
16:07:30 completed.start-sbc-imgsu
16:07:31 start.sbc-imgsu-require-check
16:07:31 failed.sbc-imgsu-require-check
The sbc-imgsu-require-check is the name of the main block and there are no sub-blocks in this case. Use with
caution.
Warning: The block execution sequence may vary depending on the SBC version. It is recommended to
seek an SBC expert's assistance.
Once sbc-su-status is updated, and all problems are removed, re-try SU. The nokia*sfx tool automatically
re-tries execution of block(s) marked as failed.
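To locate the failed blocks quickly, the status entries can be scanned with a short script. The following minimal Python sketch assumes that each entry has the form HH:MM:SS <state>.<block>, with the states start, completed, and failed, as in the sample sequences above. The function names are hypothetical. The sketch only reads the text and does not modify sbc-su-status.

```python
import re

STATUS_RE = re.compile(r"^(\d{2}:\d{2}:\d{2})\s+(start|completed|failed)\.(\S+)$")


def last_states(status_text):
    """Return {block: last reported state}, in order of first appearance."""
    states = {}
    for line in status_text.splitlines():
        match = STATUS_RE.match(line.strip())
        if match:
            _time, state, block = match.groups()
            states[block] = state
    return states


def blocks_in_state(states, wanted):
    """Return the blocks whose last reported state equals `wanted`."""
    return [block for block, state in states.items() if state == wanted]


# Sample text copied from the failing scenario above.
SAMPLE = """\
16:07:15 start.start-sbc-imgsu
16:07:15 start.sig-pre-su-update
16:07:30 failed.sig-pre-su-update
16:07:30 failed.start-sbc-imgsu
"""

states = last_states(SAMPLE)
print("failed:", blocks_in_state(states, "failed"))
print("started but not finished:", blocks_in_state(states, "start"))
```

Blocks that remain in the start state were started but never reported as completed or failed, which can indicate an interrupted run.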
• /storage/downloads/sbc_hc/
• /storage/guestdata/app_data/sbc-playbooks/sbc_hc/
• /storage/guestdata/tmp_data/sbc-playbooks/sbc_hc/
./sbc_hc.py
Warning: Health check executes multiple actions simultaneously and may impact performance. It is
highly advised to run it during a maintenance window.
./sbc_hc.py -d
Sample outputs
******************************************************************
SUMMARY FROM HEALTHCHECK
******************************************************************
NOTICE: sbc_hc finished at 08:45:05, 08/24/2021
NOTICE: debug file: /tmp/sbc_hc.dbg
******************************************************************
HEALTHCHECK COMPLETED WITH STATUS: ERROR!
See output of
/storage/guestdata/app_data/sbc-playbooks/sbc_hc/sbc_hc.py -A -f /var/tmp/sbc_hc/hc_log/sbc_hc.7.xml
for details.
******************************************************************
Once the health check completes execution, it provides details about the logs for decoding. In the above
example it is:
******************************************************************
HEALTHCHECK COMPLETED WITH STATUS: ERROR!
See output of
/storage/guestdata/app_data/sbc-playbooks/sbc_hc/sbc_hc.py -A -f /var/tmp/sbc_hc/hc_log/sbc_hc.7.xml
for details.
******************************************************************
If more detailed information is needed about the detected errors, add the -V option at the end and redirect
the output to a file. For example:
/storage/guestdata/app_data/sbc-playbooks/sbc_hc/sbc_hc.py -A -f /var/tmp/sbc_hc/hc_log/sbc_hc.7.xml -V > /tmp/yourfilename.txt
mkdir -p /storage/mnt
cd /storage/guestdata/qcow2
file=<expected_qcow_version>
where expected_qcow_version is the qcow version that is desired after the procedure is
completed. For example:
file=nokia-SBC_sig-RHEL7-R37.38.06.0400.x86_64.qcow2
cp -p ${file} ${file}.orig
cp -p ${file}.md5 ${file}.md5.orig
umount /storage/mnt/
3. Confirm that the changes are correctly applied by executing the following command:
Expected result:
retry = 2400
4. After confirming that the fix is visible, unmount the qcow and retry the failed procedure by executing the following
commands:
umount /storage/mnt/
For more information on media plane commands, see the 7510 Border Gateway (BGW) CE/SE Commands
Reference Guide for the appropriate release.
Example
$ ssh root@xxx.yyy.zzz.6
2. Log in from OAM server to the media plane by performing SSH to media plane and log in as diag. No
password is needed while you log in.
Example
# ssh diag@xxx.yyy.zzz.6
view date
view node
7. Collect the NPU statistics. (Commands should be run on active PIM appl)
8. Collect the NPU IO trace. (Commands should be run on active PIM appl)
NPU IO trace is a feature that captures the packets passing in and out through the NPU. It is used to identify whether
packets are received or sent out. Its use is limited to single calls or very low call loads.
Note: If you save the NPU IO trace file, it is generated on vSCM and stored in the
/export/home/lss/logs/ SBC directory.
9. For media-aware calls (where the vMCM application is involved), execute the following command to capture
the NPU IO trace (a different NPU IO trace from the one in the previous chapter), the DSP IO trace, and
DSP SDK messages.
------------------------------------------------------------------------------------
DspFwdCtrlPktIotraceOn --trace ctrl packets in DSP PKT FWD.
DspFwdDataPktIotraceOn --trace data packets in DSP PKT FWD.
DspFwdAllPktIotraceOn --trace ctrl and data packets in DSP PKT FWD.
DspFwdAllPktIotraceOff --stop io-trace in DSP PKT FWD.
diag define dsp io-trace enable mp.mcm1.1.1 (the media port mp.mcm1.1.1 is obtained
from the output of view h248 context <ctx-id>)
d. List trace bin files on vSCM and get these files out.
ls *.bin
7.3.1 Scenario-1
view ip if
view route table
If you cannot reach a remote host using ping, always ping it again with the source IP address
specified.
On the PIM application, you need to specify the source IP address, vlan-id (which should be the same VLAN as
the source IP address), and vlan-priority (usually 0).
Note:
The check points below use this ping CLI as the example. Adjust the destination
and source IP addresses as required.
Note:
<ge-port> indicates the GE port number, which ranges from 0 to 7 here. For example, specify 0
to run the NPU IO trace on the first GE port, en.pim1.1.
My-Chassis:rem-cons:ACT-PIM:1.19(r0)>=7:diag:main#
7.3.2 Scenario-2
view ip if
view ipv6 route
If you cannot reach a remote host using ping, always ping it again with the source IP address
specified.
On the PIM application, you need to specify the source IP address, vlan-id (which should be the same VLAN as
the source IP address), and vlan-priority (usually 0).
Note:
The check points below use this ping CLI as the example. Adjust the destination
and source IP addresses as required.
Note:
<ge-port> indicates the GE port number, which ranges from 0 to 7 here. For example, specify 0
to run the NPU IO trace on the first GE port, en.pim1.1.
My-Chassis:rem-cons:ACT-PIM:1.19(r0)>=7:diag:main#
10. If possible, ping and capture a Wireshark trace on the external switch using port mirroring.
view node
view redundancy group
view version
view date
view alarm active
view alarm history
view overload status
view resource usage
view performance statistics current mcp
view performance statistics history mcp
diag view rmgr statistics
view h248 errlog sum
view h248 errlog history
Note:
Note:
The H248 message file captured is placed under the directory: /export/home/lss/logs/
Examples:
Note:
The captured log file is named "EN_10_1.CAP" and is placed under the directory:
/export/home/lss/logs/
Description
The Backplane connection down alarm is displayed in NetAct (or can be viewed in the output of the view alarm
active CLI command).
Solution
This problem occurs because the SRIOV interface is not attached to the VM. This can be verified as follows:
virsh list
3. Run the following command to find MAC addresses for all vNICs assigned to the VM:
4. Run the following command to get a list of all active SR-IOV VFs on the compute and to look for the VLAN
values and MAC addresses for a given VM:
This problem cannot be resolved by rebooting VMs from the SBC side. It can only be resolved by a hard reboot
from the CBIS side, by executing the following commands:
2. Run the following command to clear the ERROR state (and then perform a hard reboot again):
For example,
For more information about backplane connection, see SBC Media Plane Cloud Alarm Report Guide. Also see
Syslog Commands in the SBC Media Plane Cloud Commands Reference Guide to obtain the required logs.
For more information about saving the DPC (Data Path Check) log entries, see the topic Defining the Data Path
Check Savemode in the SBC Media Plane Cloud Commands Reference Guide. The DPC file contains a readme
that further describes how to interpret the DPC files.
For example,
The state must be consistent with the command diag view backplane link. It can be used to check
backplane connection status while debugging, and cross verify whether the defect is in the software.
For example,
This command along with the active alarm or alarm history helps to debug backplane issues.
For example,
For example,
Normally, when there is a Voice Port failure on both PIMs on RMS, media RMS is not able to process calls.
But when the links are restored, the RMS system gets back to normal. However, in some cases even when all
the other connections are restored, PIMs do not recover automatically and manual action has to be taken to
recover the PIMs. The SBCSST-468/SBCCGART-260 feature is implemented to address this issue.
With the implementation of this feature from Release 20.5, after the connections are restored, the RMS
system will recover automatically and start to process calls. Depending on the failure sequences (link up/down
versus STB/ACT PIM), during system recovery, there may be PIM reboots or longer synchronization time than
in normal case. Once the links are up for at least one PIM and PIM redundancy group is at A-Work state, traffic
should be back to normal without any manual intervention.
You can use the following commands to check current status of the RMS system:
• To check the current state of the applications, use the view nodes command.
• To check the current state of the redundancy groups, use the view redundancy groups command.
When you check the state of the redundancy groups, if the redundancy group for PIM is A-Bulk, wait for it
to change to the A-Work state. When the redundancy group for PIM is A-Work and the links are up, the RMS
system should have recovered. If the traffic is still not processed, do the following:
If IPv6 interfaces are up but the tentative flag is on, the tentative flag can be cleared manually.
You can use the diag define ipv6 netif reset all false command on PIM to clear the
tentative flag for all links that are up.
For example:
Expected outcome
If VLANs are defined on the RMS system, they should be listed in the command output. If VLANs are not
listed in the command output, you can trigger the voice port initialization manually.
5. To trigger the voice port initialization manually, enter the following command on each PIM where GE port
is down:
diag linux ip init voice-port <GE>
The root cause of the problem is that the RTP packets for the call are not arriving at the SBC media plane
interface, or packets are arriving but are dropped or discarded by the SBC media plane.
• A packet loss: The RTP packets for the call do not reach the SBC media plane
interface. The packets can get lost anywhere in the network before reaching the SBC media plane, so
the problem is outside of the media plane.
• A packet drop: Invalid packets are dropped. Something is wrong with the packet itself,
which can be caused by any kind of packet processing error, including transmission errors where a packet
is damaged on its way to its destination, format errors where the format of the packet is not what the
receiving device expects, or property errors where the packet has incorrect properties (for example,
wrong MAC addresses). A packet drop can also be caused by problems with the SBC media plane interface
(for example, the DPDK interface is not working).
• A packet discard: The packet itself seems to be correct, but it is still discarded because of
some of its properties, such as a violation of SDR/PDR limits.
To solve the problem, you need to identify where the fault occurs first. For SBC media plane, the mute call issue
comes down to two different situations:
• The packets get lost on their way to the SBC media plane.
In this case, the packets may have been damaged on their way through the network before they arrive at
SBC media plane.
This mainly occurs when the SBC media plane sends fewer packets than it receives.
Make a test call, and then do the following to get more information to identify where the fault occurs:
1. Ensure that there is no connectivity issue and all basic health checks of the system are performed.
For example:
For more information, see Troubleshooting the media path problem on page 61 and
Troubleshooting the networking problem on page 66.
The statistics can be collected per interface, per realm or on global level. Use the command which is
appropriate for your case.
For example:
Expected outcome
The following is a sample fragment of the output of the view traffic statistics current global
command after making calls with netem simulation from outside. You can check the Discarded
Packets, RTP packet loss number avg, and RTP package loss number max parameters to
identify whether the issue is with the SBC media plane. If discarded packets are observed in statistics such as pdr
rate violations or sdr rate violations, the problem is in the configuration on the signaling side.
--- Locally measured minimum, maximum and average values per interval ----
------------ Locally measured jitter values for each level (ms) ----------
Jitter Range j (ms) | Nbr Terminations | % of all Terminations
0 <= j < 1 | 24 | 100.000
1 <= j < 2 | 0 | 0.000
2 <= j < 3 | 0 | 0.000
3 <= j < 4 | 0 | 0.000
4 <= j < 5 | 0 | 0.000
5 <= j < 6 | 0 | 0.000
6 <= j < 7 | 0 | 0.000
7 <= j < 8 | 0 | 0.000
8 <= j < 9 | 0 | 0.000
9 <= j < 10 | 0 | 0.000
10 <= j | 0 | 0.000
Unknown | 0 | 0.000
Nbr Lost Pkts per Termination| Nbr Terminations | % of all Terminations
70-79: | 2 | 8.333
80-89: | 3 | 12.500
>=90: | 19 | 79.167
…
5. Packets can be dropped for several reasons. A major reason is malformed packets, which
cause packet drops in an application or in the NPU-DPDK. If most of the packets from one stream are
dropped, the packets are not delivered to the end user, which causes a one-side mute call. Collect the
NPU datapath statistics on the PIM application using the following commands:
By checking the output of these commands, you can verify the cause of a possible packet drop or discard.
The following example displays the output of the diag run npu DpdkEthStat command. The packets
are checked by the trusted PIM and are dropped later. The illegal packet information is taken from the
statistics, which give the error descriptions and the number of packets that are
dropped. The dropped packets on port 1 are for core and those on port 2 are for access; however, the port details
are configurable.
8 11 11 0 0 0
***8 5 5 0 0 0 0 0 0 0 0 0
9 7 7 0 0 0
***9 2 2 0 0 0 0 0 0 0 0 0
10 3 3 0 0 0
Port Name tx_prep_drop(appl) tx_prep_drop(dpdk)
2 eth3 190 33
vCPU tx_prep_drop(appl) Tx301 Tx401 Tx501 Tx601
***vCPU tx_prep_drop(dpdk) Tx101 Tx102 Tx201 Tx701 Tx702 Tx703 Tx704 Tx801
Tx802 Tx901
8 19 19 0 0 0
***8 3 3 0 0 0 0 0 0 0 0 0
9 17 17 0 0 0
***9 3 3 0 0 0 0 0 0 0 0 0
10 13 13 0 0 0
14 15 15 0 0 0
6. Collect all of the packet captures and check the subtract and summary output information.
Note: If you need to inspect the subtract output from H.248, run a MEGACO capture on SCM.
The following example shows some H.248 MEGACO subtract replay information:
C=23{S=ip/1/1/45{SA{nt/dur=20045,nt/os=164340,nt/or=150300,rtp/ps=913,rtp/pr=835,
rtp/pl=8.461538,rtp/jit=1,rtp/delay=0,rtp/dur=20045,rtp/os=164340,rtp/or=150300,tmanr/dp=0}},
S=ip/1/0/46{SA{nt/dur=20043,nt/os=150300,nt/or=164340,rtp/ps=835,rtp/pr=913,
rtp/pl=8.708709,rtp/jit=1,rtp/delay=0,rtp/dur=20043,rtp/os=150300,rtp/or=164340,tmanr/dp=0}}}}
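To compare the two terminations side by side, the statistics in such a message can be split into per-termination values. The following minimal Python sketch assumes the S=<termination>{SA{name=value,...}} layout shown in the example above and ignores line wrapping. The function name is hypothetical. The statistic names (for example, rtp/ps, rtp/pr, and rtp/pl) are taken as-is from the sample, and their exact meaning should be checked against the H.248 package definitions.

```python
import re

# One termination: S=<id>{SA{name=value,name=value,...}
TERMINATION_RE = re.compile(r"S=([^{]+)\{SA\{([^}]*)\}")


def parse_subtract_stats(message):
    """Return {termination: {statistic: value}} from a subtract message."""
    compact = re.sub(r"\s+", "", message)  # undo line wrapping
    stats = {}
    for termination, body in TERMINATION_RE.findall(compact):
        values = {}
        for item in body.split(","):
            if "=" in item:
                name, value = item.split("=", 1)
                values[name] = float(value) if "." in value else int(value)
        stats[termination] = values
    return stats


# Sample text copied from the example above.
SAMPLE = """
C=23{S=ip/1/1/45{SA{nt/dur=20045,nt/os=164340,nt/or=150300,rtp/ps=913,rtp/
pr=835,
rtp/pl=8.461538,rtp/jit=1,rtp/delay=0,rtp/dur=20045,rtp/os=164340,rtp/
or=150300,tmanr/
dp=0}},S=ip/1/0/46{SA{nt/dur=20043,nt/os=150300,nt/or=164340,rtp/ps=835,rtp/
pr=913,
rtp/pl=8.708709,rtp/jit=1,rtp/delay=0,rtp/dur=20043,rtp/os=150300,rtp/
or=164340,tmanr/
dp=0}}}}
"""

stats = parse_subtract_stats(SAMPLE)
for termination, values in stats.items():
    print(termination, values["rtp/ps"], values["rtp/pr"], values["rtp/pl"])
```

In this sample, the rtp/ps value of each termination equals the rtp/pr value of the other termination (913 and 835).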
7. If needed, contact Nokia support team with the information collected to help solve the problem.
Details
The Nokia SBC signaling log records only the details of a service change for a specific termination and context;
it does not record details of a service change that applies to all contexts or to the null context.
Workaround
Media plane service change event details are pegged in the corresponding PM counter (VS.mgcForcedServChg)
data. Refer to the PM counter data for information about service change events for all contexts or the null context.
Details
If a high CPU usage issue is detected on MCM, SCM, or Packet Interface Module (PIM) (for example, the ‘MCM CPU
Overload’ media alarm is raised), use the cpu_usage.sh tool described below to collect the required
data for troubleshooting the issue.
Collect the required data by running the cpu_usage.sh tool, and forward it to the appropriate Nokia
SBC support team to help investigate the high CPU usage issue.
• The cpu_usage.sh tool collects data that can help in finding the root cause of the high CPU usage.
• The cpu_usage.sh tool invokes the sar, top, ps, and perf record commands, and saves their output
in respective files (sar.txt,top.txt {no threads-mode}, topH.txt {threads-mode; top command was run with
-H parameter}, ps.txt, and perf.data respectively) in a folder, named with timestamp (for example, /opt/
v7510/data/cpu_usage/cpu_usage-yyyy-Mon-dd_hh-mm-ss/)
• Run in foreground: The cpu_usage.sh tool immediately collects data for the component on which the
command is run (MCM, SCM, or PIM).
• Run in background: The cpu_usage.sh tool monitors the alarms for the component on which the
command is run (MCM, SCM, or PIM), and collects data when an alarm is active.
The cpu_usage.sh tool can be run in the foreground mode as shown below:
/opt/v7510/bin/cpu_usage.sh
• --run_perf_record
To collect additional data using perf record command. The perf record command is a performance
analysis tool for Linux.
• --perf_time <perf_time>
• --process <process_name>
To specify for what process perf record data is collected (can select one of the processes from: mcm,
scm, pim, nodemgr, mpu, or npu-dpdk).
Note: For more information about the cpu_usage.sh tool's parameter options, see the help by
running:
/opt/v7510/bin/cpu_usage.sh --help
Examples:
• /opt/v7510/bin/cpu_usage.sh
Runs sar, top, ps commands for 60 seconds, and saves their outputs in files.
• /opt/v7510/bin/cpu_usage.sh --run_perf_record
Runs sar, top, ps commands. Additionally runs perf record command for 5 minutes for the most CPU
consuming process from the list (mcm, mpu, scm, pim, nodemgr, or npu-dpdk).
Runs sar, top, ps commands. Additionally runs perf record command for 5 minutes for the mpu
process.
Runs sar, top, ps commands. Additionally runs perf record command for 2 minutes for the most CPU
consuming process from the list (mcm, mpu, scm, pim, nodemgr, or npu-dpdk).
Runs sar, top, ps commands. Additionally runs perf record command for 3 minutes for the mcm
process.
Output example:
To run the cpu_usage.sh tool in background mode, use --background option while running the tool, as
shown below:
• --key_path <key_path>
To run the cpu_usage.sh tool on MCM or PIM, additionally use --key_path option, and provide full
path to SCM SSH key.
• --time <time>
Range: 1 to 60 minutes.
• --interval <interval>
• --alarm_name <alarm_name>
To specify which alarm should trigger data collection. Provide a string (which can be a substring of the
specific problem description) as <alarm_name>.
• --stop
Note: For more information about the cpu_usage.sh tool's parameter options, see the help by
running:
/opt/v7510/bin/cpu_usage.sh --help
Examples:
• /opt/v7510/bin/cpu_usage.sh --background
To monitor the alarms for the component on which the cpu_usage.sh tool is run (for example, MCM).
Checks alarms every 60 seconds for 5 minutes, and collects data when an alarm is raised.
To monitor alarms every 60 seconds, for 10 minutes, and collect data when alarm(s) is fired.
To monitor alarms every 30 seconds, for 5 minutes, and collect data when alarm(s) is fired.
To monitor alarms every 5 seconds, for 2 minutes, and collect data when alarm(s) is fired.
To monitor alarms for the component on which the cpu_usage.sh tool is run (every 60 seconds, for 5
minutes) and collect data when the MCM CPU Overload alarm is raised.
To monitor alarms for the component on which the cpu_usage.sh tool is run (every 60 seconds, for 5
minutes) and collect data when an alarm with the CPU string in its specific problem description is raised.
To monitor alarms for the component on which the cpu_usage.sh tool is run (every 120 seconds, for
12 minutes) and collect data when the MCM CPU Overload alarm is raised.
• /opt/v7510/bin/cpu_usage.sh --stop
To stop the background execution of cpu_usage.sh tool, which was started earlier to monitor alarms.
Additional information:
/opt/v7510/data/cpu_usage/monitor_alarms.log
Check this file to see what alarms were detected, and where the collected data is stored.
• If the alarm history (on diag) shows that an alarm was active but the cpu_usage.sh tool failed to
detect it, the <interval> value is too large. Restart the tool with a smaller <interval> value so that
it checks for alarms more frequently and can detect the active alarm.
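To illustrate why a large <interval> can miss an alarm, the sketch below models polling as a series of instants and checks whether any of them falls inside the period in which the alarm was active. It is a simplified model, not the tool's actual logic: the function name and the assumption that polls happen at exact multiples of the interval are hypothetical.

```python
def alarm_detected(alarm_start: float, alarm_end: float,
                   interval: float, duration: float) -> bool:
    """Return True if any poll instant falls inside [alarm_start, alarm_end).

    Polls are assumed to happen at t = 0, interval, 2*interval, ...
    up to and including `duration`. All values are in seconds.
    """
    t = 0.0
    while t <= duration:
        if alarm_start <= t < alarm_end:
            return True
        t += interval
    return False


# An alarm active from 70 s to 100 s is missed when polling every 60 s
# (polls at 0, 60, 120, ...) but caught when polling every 20 s (poll at 80).
missed = alarm_detected(70, 100, interval=60, duration=300)
caught = alarm_detected(70, 100, interval=20, duration=300)
```

In this model an alarm is guaranteed to be seen only if it stays active for at least one full polling interval.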
– after <time> minutes (for example, log message: “2 minutes elapsed. Exiting…”),
– after the alarm is cleared (log message: “the alarm has been cleared. Exiting…”),
– when the “--stop” parameter is used (log message: “Got signal 2”).
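When reviewing monitor_alarms.log, the three exit messages quoted above can be told apart with a small classifier. This is a sketch only: the function name is hypothetical and the matching relies solely on the message fragments quoted in this section, since the full log line format is not described here.

```python
import re
from typing import Optional


def exit_reason(log_line: str) -> Optional[str]:
    """Map a log line to the reason the tool exited, or None if no match."""
    if re.search(r"\d+ minutes? elapsed", log_line):
        return "time limit reached"
    if "alarm has been cleared" in log_line:
        return "alarm cleared"
    if "Got signal 2" in log_line:
        return "stopped with --stop"
    return None
```

Applying the function to each line of the log and keeping the last non-None result gives the reason for the most recent exit.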
Output example:
Details
If high CPU usage caused by rngd.service is detected on MCM, SCM, or Packet Interface Module (PIM),
change the start parameters on all active/standby VMs (of SCM, PIM, and MCM, as needed):
ExecStart=/sbin/rngd -x 5 -W 2048 -f
# systemctl daemon-reload
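After editing the unit on several VMs, it can help to confirm that each one carries the expected start parameters. The sketch below checks unit-file text for the ExecStart line given above. The function name is hypothetical, and because this guide does not state where the unit file is located, the sketch takes the file content as a string rather than a path.

```python
EXPECTED_EXEC_START = "ExecStart=/sbin/rngd -x 5 -W 2048 -f"  # line from this guide


def has_expected_exec_start(unit_text: str) -> bool:
    """True if a non-comment line matches the expected ExecStart setting."""
    for raw_line in unit_text.splitlines():
        line = " ".join(raw_line.split())  # collapse runs of whitespace
        if line.startswith(("#", ";")):
            continue  # comment lines in unit files start with '#' or ';'
        if line == EXPECTED_EXEC_START:
            return True
    return False
```

A False result on any VM indicates that the change still has to be applied there, followed by systemctl daemon-reload.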
8 Appendix
http://ihgpweb.ih.lucent.com/~rgs/logs/Verizon_Video_Launch_Training/t_t_video_0329_1938.log
Note: This log was obtained by having the IMS/NGSS service set to log level 3 and the H248DS service set to
log level 4. Test was done using Exfo endpoints in the VTC demo lab, wgw03.
Originating INVITE
Setup H248 bearer context for Originator, Add Request, and Add Reply
The h248ds of the BGC VM sets up the bearer context using Megaco protocol and talking to the SCM on the
7510 gateway. The gateway chosen by the h248ds is based on which gateway is least busy.
H248ds sends an Add instruction with dollar signs, $, as placeholders to be filled in by the gateway. Here we
see it being sent to gateway 1024, which is the first vmg on the first gateway.
_MsgGwS
Sending to GW 1024 (2620:0:60:8ae::310a:2944) encoded message of size
1092:
!/2 [2620:0:60:8ae::3138]:2944
T=2287{C=${A=ip/1/$/${M{ST=1{O{MO=IN,tman/sdr=375,ds/dscp=11,ipdc/
realm="untrustedWGW1"},L{
H248ds is asking the gateway to provide a Context ID, C=$, and a termination ID, A=ip/1/$/$, from the
untrustedWGW1 realm. Mode is set to Inactive, MO=IN
Since the call originated on IPV4, need an IPV4 address on the gateway's untrustedWGW1 realm for audio stream
m=audio $ RTP/SAVP -^M Need port that gateway is using on untrustedWGW1 realm for audio stream
},R{ - below is the information that came in on the INVITE for the audio stream
v=0^M
c=IN IP4 10.254.254.152^M
m=audio 40000 RTP/SAVP -^M
a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:Ka5hgdls9GyycMbokKmWe/
LywG8bf5qYrf5vcZtN^M
Below is asking for an IP and port on the gateway for video stream of call
}},ST=2{O{MO=IN,tman/sdr=2875,ds/dscp=11,ipdc/realm="untrustedWGW1"},L{
v=0^M
c=IN IP4 $^M
m=video $ RTP/SAVP -^M
b=AS:23^M
a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:hvFznJ/
Zztu0+XN2VCgXEPEHcIVAOxbUZNX9Gvmo^M
},R{ And here is the info that came in on the INVITE for the video stream
v=0^M
c=IN IP4 10.254.254.152^M
m=video 40002 RTP/SAVP -^M
a=crypto:1 AES_CM_128_HMAC_SHA1_80
inline:wsT7I0XUjzlAQaoGKTqwv7WisnURzfSrZaGq1RJF^M
}}},E=3{hangterm/thb{timerx=435},adid/ipstop{dt=18,dir=IN}}},A=ip/1/$/${M{ST=1{
Below info is to secure resources for the core side of the call - core realm is trustedWGW1
O{MO=IN,tman/sdr=9000,ds/dscp=11,ipdc/realm="trustedWGW1"},L{
v=0^M
c=IN IP6 $^M - Core side is always IPV6 so asking for an IPV6 address on trustedWGW1 realm for audio
stream
m=audio $ RTP/AVP -^M Need port that gateway is using on trustedWGW1 realm for audio stream
b=AS:72^M - here the full bandwidth is requested on core side, includes padding for RTCP
}},ST=2{O{MO=IN,tman/sdr=62500,ds/dscp=11,ipdc/realm="trustedWGW1"},L{
v=0^M
c=IN IP6 $^M
m=video $ RTP/AVP -^M
b=AS:500^M
}}},E=4{adid/ipstop{dt=18,dir=IN}}}}}
_MsgE
^M
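The fields discussed in the Add request above (transaction ID, context ID, termination IDs, stream modes, and realms) can be pulled out of a logged message with a few regular expressions. This is a minimal sketch for reading logs, not a full H.248 parser: the function name is hypothetical, and it assumes the compact text encoding and the ip/... termination naming seen in this trace.

```python
import re
from typing import Dict


def summarize_add_request(message: str) -> Dict[str, object]:
    """Pull the key fields out of a compact-text H.248 Add request."""
    # Logged messages are wrapped; drop CR/LF so split tokens rejoin.
    text = message.replace("\r", "").replace("\n", "")
    transaction = re.search(r"\bT=(\d+)", text)
    context = re.search(r"\bC=(\$|\d+)", text)
    return {
        "transaction": transaction.group(1) if transaction else None,
        # "$" means the gateway is asked to choose the value.
        "context": context.group(1) if context else None,
        "terminations": re.findall(r"\bA=(ip/[^{]+)\{", text),
        "modes": re.findall(r"\bMO=([A-Za-z]+)", text),
        "realms": re.findall(r'ipdc/realm="([^"]+)"', text),
    }
```

For the request above, the summary shows context "$", two terminations of the form ip/1/$/$, mode IN on every stream, and the untrustedWGW1 and trustedWGW1 realms.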
Next you see the gateway respond with all the $ filled in
_MsgGwR
H.248 message from Gateway (1024),size 860
!/2 [2620:0:60:8ae::310A]:2944 P=2287{C=9379{A=ip/1/4/92762{M{ST=1{L{v=0^M
Context 9379 is used by the gateway for the originating side of the call, below are the details for Stream 1,
ST=1 - audio
b=AS:3^M
m=audio 21090 RTP/SAVP -^M gateway is using port 21090 for audio
c=IN IP4 135.104.226.231^M IPV4 address of the untrustedWGW1 realm - pim 1.5
b=AS:23^M
m=video 15894 RTP/SAVP -^M gateway is using port 15894 for video
Now the details for the core side connection for Stream 1, ST=1, audio
m=audio 17782 RTP/AVP -^M Using port 17782 for core side audio
}},ST=2{L{v=0^M
Now the details for the core side connection for Stream 2, ST=2, video
Key thing to notice here is that the Call-ID has been changed to an internal one - beginning with LU-.
And then the actual sending of the message from the pcscf core side, port 5060, to the CFED. It's internal
messaging at this point. 169.254.253.0 is the cfed.
branch=z9hG4bKa6585cde7e806133016714324cc7cc7f56f94374-6-5e85-56fad98b17afd290^M
Via: SIP/2.0/UDP 127.0.0.1;branch=z9hG4bK_003_871753-140260329231468^M
Via: SIP/2.0/TCP [2620:0:60:8ae::310f]:5060;received=2620:0:60:8ae::310f;
branch=z9hG4bKce6e68b983f42566788088c11acec63656fab0fd-6-20-56fad98b5e43!
20e^M
Via: SIP/2.0/UDP 127.0.0.1;branch=z9hG4bK_001_935-140249571855084^M
Via: SIP/2.0/UDP 169.254.218.0:5060;received=169.254.218.0;
branch=z9hG4bK933213abb24ae6f625e8053c65e0e4f056fab106-0-111-56fad98b5d814d1^M
Via: SIP/2.0/UDP 127.0.0.1;branch=z9hG4bK_002_1459280267-97786-1-LucentPCSF^M
Supported: 100rel^M
Call-ID: LU-145928026797740-1@imsgroup0-000.wgw03.vtc-sru-bg.ims.net^M
CSeq: 1 INVITE^M
From: <sip:+19992501000@ims.net>;tag=56fab106-56fad98b5d376c2-mw-po-
lucentPCSF-000002^M
To: <sip:+19992504000@ims.net>^M
Next you will see it delivered to the untrusted side of the PCSCF, pcsf-tls1.
Setup H248 bearer context for Terminator, Add Request, and Add Reply
Here you see the Add request being done for the terminator. Notice that neither the IP address of the terminator
nor the audio or video ports are included. That's because the terminator has not answered yet. But what we do
have is the IP address of the core connection of the originator. That information will be used to connect the
core originating info with the core terminating info. Notice also that this went to gateway 1025, the 2nd vmg on
gateway 1.
_MsgGwS
Sending to GW 1025 (2620:0:60:8ae::310a:2945) encoded message of size
939:
!/2 [2620:0:60:8ae::3139]:2945
T=2258{C=${A=ip/1/$/${M{ST=1{O{MO=IN,tman/sdr=7875,ds/dscp=11,ipdc/
realm="untrustedWGW1"},L{
v=0^M
c=IN IP4 $^M
c=IN IP6 $^M - Request IPV6 core connection on terminating side for the audio stream
Below is the IPV6 information for the core side audio stream on the originator side
v=0^M
c=IN IP6 2620:0:60:8af::320a^M
m=audio 17782 RTP/AVP -^M
}},ST=2{O{MO=IN,tman/sdr=3000,ds/dscp=11,ipdc/realm="trustedWGW1"},L{
v=0^M
c=IN IP6 $^M Request IPV6 core connection on terminating side for the video stream
Below is the IPV6 information for the core side video stream on the originator side
v=0^M
c=IN IP6 2620:0:60:8af::320a^M
m=video 17464 RTP/AVP -^M
}}},E=4{adid/ipstop{dt=18,dir=IN}}}}}K{2257}
_MsgE
^M
_MsgGwR
H.248 message from Gateway (1025),size 711
!/2 [2620::60:8ae:0:0:0:310a]:2945 P=2258{C=9380{A=ip/1/4/62834{M{ST=1{L{v=0^M
c=IN IP4 135.104.226.231^M
b=AS:63^M
m=audio 34062 RTP/SAVP -^M
a=crypto:1 AES_CM_128_HMAC_SHA1_80
inline:pq8loNWjOv0MdclvLXz1C6Y4H3B7rUp0XE8CfGuK^M
}},ST=2{L{v=0^M
c=IN IP4 135.104.226.231^M
b=AS:483^M
m=video 37330 RTP/SAVP -^M
a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:UipQ0pMomW6lVNWMHSYta8IdGPp9gCmVv
+kAYwsl^M
}}}},A=ip/1/0/78241{M{ST=1{L{v=0^M
c=IN IP6 2620::60:8AF:0:0:0:320A^M
b=AS:4^M
m=audio 45482 RTP/AVP -^M
},R{v=0^M
c=IN IP6 2620::60:8AF:0:0:0:320A^M
m=audio 17782 RTP/AVP -^M
}},ST=2{L{v=0^M
c=IN IP6 2620::60:8AF:0:0:0:320A^M
b=AS:24^M
m=video 38988 RTP/AVP -^M
},R{v=0^M
c=IN IP6 2620::60:8AF:0:0:0:320A^M
m=video 17464 RTP/AVP -^M
}}}}}}
_MsgE
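To see at a glance which ports the gateway allocated in an Add reply such as the one above, the local (L{...}) SDP of each termination can be scanned for m= lines. The sketch below is an illustration under assumptions: the function name is hypothetical, terminations are assumed to be named ip/..., and the SDP bodies are assumed to contain no braces, which holds for the messages in this trace.

```python
import re
from typing import Dict, List, Tuple


def local_ports(reply: str) -> Dict[str, List[Tuple[str, int]]]:
    """Map each termination ID to the (media, port) pairs in its local SDP."""
    result: Dict[str, List[Tuple[str, int]]] = {}
    # Each termination section starts at "A=<termination-id>{".
    # re.split with one capture group yields [prefix, id1, body1, id2, body2, ...].
    parts = re.split(r"\bA=(ip/[^{]+)\{", reply)
    for term_id, body in zip(parts[1::2], parts[2::2]):
        ports: List[Tuple[str, int]] = []
        # Only L{...} (local) descriptors are read; R{...} (remote) is skipped.
        for local_sdp in re.findall(r"\bL\{(.*?)\}", body, flags=re.DOTALL):
            for media, port in re.findall(r"m=(\w+) (\d+)", local_sdp):
                ports.append((media, int(port)))
        result[term_id] = ports
    return result
```

For the reply from gateway 1025 above, this yields audio 34062 and video 37330 for the access termination, and audio 45482 and video 38988 for the core termination.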
t=0 0^M
m=audio 60000 RTP/SAVP 110^M Terminator port 60000 for audio, SAVP is for SRTP, 110 is the codec
m=video 60002 RTP/SAVP 101^M Terminator port 60002 for video, SAVP is for SRTP, 101 is the codec
a=fmtp:101 profile-level-id=42000a^M
a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:m5MYzlpLB0u48Pzae15yGI7/
YSMtU41tixmCzMKP^M
Master.log analysis and the drawing of context details at various points of a call.
For debugging of RTP issues, it is quite valuable to turn H248DS log level to MEDIUM and capture these "from
Gateway" messages at the completion of the call test:
With no transcoding, the packets-received from access should roughly equal packets-sent on core side, and
vice-versa. In this example there was some issue, and the access side packets-sent is zero even though the
core side packets-received was 386.
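The comparison described above can be expressed as a simple tolerance check. This is a sketch only: the function name is hypothetical and the 5% tolerance is an assumed value, not a product threshold.

```python
def counts_roughly_equal(received: int, sent: int, tolerance: float = 0.05) -> bool:
    """True if the two counters differ by at most `tolerance` of the larger one."""
    larger = max(received, sent)
    if larger == 0:
        return True  # nothing flowed in either direction
    return abs(received - sent) / larger <= tolerance


# The failing case from the text: 386 packets received on the core side,
# 0 packets sent on the access side.
healthy = counts_roughly_equal(received=386, sent=0)
```

Here healthy is False, which flags the call for further investigation of the access-side media path.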
The MEGACO message at the end of a failed call may contain an error reason, which can be looked up in
Appendix 1 of "Access Border Troubleshooting Ideas.doc". In this case, error 430 is "Unknown TerminationID".
A likely cause is that the signaling plane does not have the realms or IPDC GW variant option set correctly.
Alternatively, the H248DSs may not have been re-initialized following the signaling plane provisioning:
Another common error seen in response for Add request is the ER=500 with Return Code -1551
(RC_VM_NH_MAC_NOT_YET_RESOLVED). This error usually indicates a routing issue either on the 7510 PIM or
on the external router. 7510 alarms may also accompany these errors, for the specific PIM involved. Return
codes can be found in Appendix 3 of the "Access Border Troubleshooting Ideas.doc". Excerpt captured from
trace:
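Error descriptors can be located in a captured trace by searching for the ER= token. The sketch below lists the error codes found and attaches the descriptions given in this section. The function name is hypothetical, and the lookup table covers only the two codes discussed here; all other codes need to be looked up in the referenced appendices.

```python
import re
from typing import List, Tuple

# Descriptions taken from this section only; not a complete list.
KNOWN_ERRORS = {
    430: "Unknown TerminationID",
    500: "check the return code, for example -1551 (RC_VM_NH_MAC_NOT_YET_RESOLVED)",
}


def find_errors(trace: str) -> List[Tuple[int, str]]:
    """Return (error code, description) for every ER=<code> in the trace."""
    found = []
    for code_text in re.findall(r"\bER=(\d+)", trace):
        code = int(code_text)
        found.append((code, KNOWN_ERRORS.get(code, "not covered by this sketch")))
    return found
```

Running this over a whole master.log excerpt gives a quick list of failed commands to inspect in detail.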
Below is the Modify, identified by MF=, for the terminating context 9380
_MsgGwS
Sending to GW 1025 (2620:0:60:8ae::310a:2945) encoded message of size 1239:
!/2 [2620:0:60:8ae::3139]:2945
T=2259{C=9380{MF=ip/1/4/62834{M{ST=1{O{MO=SR,tman/sdr=7875,ds/dscp=1d,ipdc/
realm="untrustedWGW1"},L{
Notice that the mode is set to send/receive, MO=SR. This is an important piece of info: if you ever have one-way
media, make sure that after the 200-OK of the INVITE your mode gets changed to SR in the Modify.
v=0^M
c=IN IP4 135.104.226.231^M
m=audio 34062 RTP/SAVP -^M
b=AS:63^M
a=crypto:1 AES_CM_128_HMAC_SHA1_80
inline:pq8loNWjOv0MdclvLXz1C6Y4H3B7rUp0XE8CfGuK^M
},R{
v=0^M
c=IN IP4 10.254.254.152^M
m=audio 60000 RTP/SAVP -^M
a=crypto:1 AES_CM_128_HMAC_SHA1_80
inline:hogyxzI58W3psCKLYpidL4xIlS7yakA3jBVBqQyQ^M
}},ST=2{O{MO=SR,tman/sdr=60375,ds/dscp=1d,ipdc/realm="untrustedWGW1"},L{
v=0^M
c=IN IP4 135.104.226.231^M
m=video 37330 RTP/SAVP -^M
b=AS:483^M
a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:UipQ0pMomW6lVNWMHSYta8IdGPp9gCmVv
+kAYwsl^M
},R{
v=0^M
c=IN IP4 10.254.254.152^M
m=video 60002 RTP/SAVP -^M
a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:m5MYzlpLB0u48Pzae15yGI7/
YSMtU41tixmCzMKP^M
}}}},MF=ip/1/0/78241{M{ST=1{O{MO=SR,tman/sdr=9000,ds/dscp=1d,ipdc/
realm="trustedWGW1"},L{
v=0^M
c=IN IP6 2620:0:60:8af::320a^M
m=audio 45482 RTP/AVP -^M
b=AS:72^M
},R{
v=0^M
c=IN IP6 2620:0:60:8af::320a^M
m=audio 17782 RTP/AVP -^M
}},ST=2{O{MO=SR,tman/sdr=62500,ds/dscp=1d,ipdc/realm="trustedWGW1"},L{
v=0^M
c=IN IP6 2620:0:60:8af::320a^M
m=video 38988 RTP/AVP -^M
b=AS:500^M
},R{
v=0^M
c=IN IP6 2620:0:60:8af::320a^M
m=video 17464 RTP/AVP -^M
}}}}}}K{2258}
_MsgE
_MsgGwR
H.248 message from Gateway (1025),size 1003
!/2 [2620::60:8ae:0:0:0:310a]:2945 P=2259{C=9380{MF=ip/1/4/62834{M{ST=1{L{v=0^M
c=IN IP4 135.104.226.231^M
b=AS:63^M
m=audio 34062 RTP/SAVP -^M
a=crypto:1 AES_CM_128_HMAC_SHA1_80
inline:pq8loNWjOv0MdclvLXz1C6Y4H3B7rUp0XE8CfGuK^M
},R{v=0^M
c=IN IP4 10.254.254.152^M
m=audio 60000 RTP/SAVP -^M
a=crypto:1 AES_CM_128_HMAC_SHA1_80
inline:hogyxzI58W3psCKLYpidL4xIlS7yakA3jBVBqQyQ^M
}},ST=2{L{v=0^M
c=IN IP4 135.104.226.231^M
b=AS:483^M
m=video 37330 RTP/SAVP -^M
a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:UipQ0pMomW6lVNWMHSYta8IdGPp9gCmVv
+kAYwsl^M
},R{v=0^M
c=IN IP4 10.254.254.152^M
m=video 60002 RTP/SAVP -^M
a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:m5MYzlpLB0u48Pzae15yGI7/
YSMtU41tixmCzMKP^M
}}}},MF=ip/1/0/78241{M{ST=1{L{v=0^M
c=IN IP6 2620::60:8AF:0:0:0:320A^M
b=AS:72^M
m=audio 45482 RTP/AVP -^M
},R{v=0^M
c=IN IP6 2620::60:8AF:0:0:0:320A^M
m=audio 17782 RTP/AVP -^M
}},ST=2{L{v=0^M
c=IN IP6 2620::60:8AF:0:0:0:320A^M
b=AS:500^M
m=video 38988 RTP/AVP -^M
},R{v=0^M
c=IN IP6 2620::60:8AF:0:0:0:320A^M
m=video 17464 RTP/AVP -^M
}}}}}}
_MsgE
Context diagram now looks as follows. Notice terminating side has mode set to Send/Receive. Originating side
is still set to Inactive.
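When chasing one-way media, a quick way to confirm the mode change is to list every stream mode in a Modify that is not SR. This is a minimal sketch with a hypothetical function name; it only scans for MO= tokens and does not associate them with specific terminations or streams.

```python
import re
from typing import List


def modes_not_sendrecv(message: str) -> List[str]:
    """Return every stream mode in the message that is not SR (send/receive)."""
    return [mode for mode in re.findall(r"\bMO=([A-Za-z]+)", message) if mode != "SR"]
```

An empty list for the Modify sent after the 200-OK means all streams were switched to send/receive. Any remaining IN entry points to a stream that is still inactive.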
_MsgGwS
Sending to GW 1024 (2620:0:60:8ae::310a:2944) encoded message of size
1233:
!/2 [2620:0:60:8ae::3138]:2944
T=2288{C=9379{MF=ip/1/4/92762{M{ST=1{O{MO=SR,tman/sdr=7875,ds/dscp=1d,ipdc/
realm="untrustedWGW1"},L{
Context 9379 is the originator associated with gateway 1024. Notice also the mode is send/receive. Need this
for 2 way media path.
v=0^M
c=IN IP4 135.104.226.231^M
m=audio 21090 RTP/SAVP -^M
b=AS:63^M
a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:AW0KR3MtBe4/yhWJSvDpHdns+kfGuAiH
+UXaBn4p^M
},R{
v=0^M
Below is doing the modify of Stream 1, ST=1, audio, on the trustedWGW1 realm
v=0^M
c=IN IP6 2620:0:60:8af::320a^M IP address and port of the originating core audio stream connection
And below is the IP address and port of the terminating core audio stream connection
v=0^M
c=IN IP6 2620:0:60:8af::320a^M
m=audio 45482 RTP/AVP -^M
}},ST=2{O{MO=SR,tman/sdr=62500,ds/dscp=1d,ipdc/realm="trustedWGW1"},L{
v=0^M
c=IN IP6 2620:0:60:8af::320a^M
m=video 17464 RTP/AVP -^M
b=AS:500^M
},R{
v=0^M
c=IN IP6 2620:0:60:8af::320a^M
The gateway makes the change, linking the 2 contexts together, and sends back a modify response
_MsgGwR
H.248 message from Gateway (1024),size 1000
!/2 [2620:0:60:8ae::310A]:2944 P=2288{C=9379{MF=ip/1/4/92762{M{ST=1{L{v=0^M
c=IN IP4 135.104.226.231^M
b=AS:63^M
m=audio 21090 RTP/SAVP -^M
a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:AW0KR3MtBe4/yhWJSvDpHdns+kfGuAiH
+UXaBn4p^M
},R{v=0^M
c=IN IP4 10.254.254.152^M
m=audio 40000 RTP/SAVP -^M
a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:Ka5hgdls9GyycMbokKmWe/
LywG8bf5qYrf5vcZtN^M
}},ST=2{L{v=0^M
c=IN IP4 135.104.226.231^M
b=AS:483^M
m=video 15894 RTP/SAVP -^M
a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:hvFznJ/
Zztu0+XN2VCgXEPEHcIVAOxbUZNX9Gvmo^M
},R{v=0^M
c=IN IP4 10.254.254.152^M
m=video 40002 RTP/SAVP -^M
a=crypto:1 AES_CM_128_HMAC_SHA1_80
inline:wsT7I0XUjzlAQaoGKTqwv7WisnURzfSrZaGq1RJF^M
}}}},MF=ip/1/0/100655{M{ST=1{L{v=0^M
c=IN IP6 2620::60:8AF:0:0:0:320A^M
b=AS:72^M
m=audio 17782 RTP/AVP -^M
},R{v=0^M
c=IN IP6 2620::60:8AF:0:0:0:320A^M
m=audio 45482 RTP/AVP -^M
}},ST=2{L{v=0^M
c=IN IP6 2620::60:8AF:0:0:0:320A^M
b=AS:500^M
m=video 17464 RTP/AVP -^M
},R{v=0^M
c=IN IP6 2620::60:8AF:0:0:0:320A^M
m=video 38988 RTP/AVP -^M
}}}}}}
_MsgE
At this point there should be a 2 way path with audio and video. Context diagram looks as follows:
BYE processing
When the call ends you will see a BYE from either the originator or the terminator, followed by the appropriate
BYE/200-OK processing. In our example the originator hung up first and sent the BYE.
H.248 Tear down call contexts with Subtract IP, Subtract Response
You will see Subtracts for both contexts, followed by Subtract Replies. Notice that the Subtract Replies from
the gateway have statistics for jitter, delay, etc. This is due to the VTC demo lab having a Quality of Service
statistics feature turned on. This feature is also turned on at Westlake. The production sites do not have this.
_MsgGwS
Sending to GW 1024 (2620:0:60:8ae::310a:2944) encoded message of size 65:
!/2 [2620:0:60:8ae::3138]:2944
T=2298{C=9379{S=*{AT{SA}}}}K{2297}
_MsgE
_MsgGwS
Sending to GW 1025 (2620:0:60:8ae::310a:2945) encoded message of size 58:
!/2 [2620:0:60:8ae::3139]:2945
T=2264{C=9380{S=*{AT{SA}}}}
_MsgGwR
H.248 message from Gateway (1024),size 396
!/2 [2620:0:60:8ae::310A]:2944 P=2298{C=9379{S=ip/1/4/92762{SA{nt/dur=32086,
nt/os=526970,nt/or=529590,rtp/ps=5539,rtp/pr=5570,rtp/pl=0.000000,
rtp/jit=0,rtp/delay=0,rtp/dur=32086,rtp/os=526970,rtp/or=529590,tmanr/dp=0}},
S=ip/1/0/100655{SA{nt/dur=32090,nt/os=473050,nt/or=470998,rtp/ps=5570,
rtp/pr=5542,rtp/pl=0.000000,rtp/jit=0,rtp/delay=0,rtp/dur=32090,rtp/os=473050,
rtp/or=470998,tmanr/dp=0}}}}
_MsgE
_MsgGwR
H.248 message from Gateway (1025),size 399
!/2 [2620::60:8ae:0:0:0:310a]:2945 P=2264{C=9380{S=ip/1/4/62834{SA{nt/dur=31751,
nt/os=529590,nt/or=529854,rtp/ps=5570,rtp/pr=5573,rtp/pl=0.000000,rtp/jit=0,
rtp/delay=0,rtp/dur=31751,rtp/os=529590,rtp/or=529854,tmanr/dp=0}},
S=ip/1/0/78241{SA{nt/dur=31755,nt/os=473256,nt/or=473050,rtp/ps=5573,
rtp/pr=5570,rtp/pl=0.000000,rtp/jit=0,rtp/delay=0,rtp/dur=31755,rtp/os=473256,
rtp/or=473050,tmanr/dp=0}}}}
_MsgE
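The statistics carried in the Subtract replies above can be turned into a per-termination table for easier comparison. The following sketch is illustrative only: the function name is hypothetical, and it assumes the S=<termination>{SA{name=value,...}} layout seen in this trace, where rtp/ps and rtp/pr are the packets sent and packets received counters.

```python
import re
from typing import Dict


def subtract_statistics(reply: str) -> Dict[str, Dict[str, float]]:
    """Map each subtracted termination ID to its statistics."""
    text = re.sub(r"\s+", "", reply)  # undo the line wrapping of the log
    stats: Dict[str, Dict[str, float]] = {}
    for term_id, body in re.findall(r"\bS=(ip/[^{]+)\{SA\{([^}]*)\}", text):
        fields: Dict[str, float] = {}
        for item in body.split(","):
            if not item:
                continue
            name, _, value = item.partition("=")
            fields[name] = float(value)
        stats[term_id] = fields
    return stats
```

With the reply from gateway 1025, the access termination ip/1/4/62834 reports rtp/ps=5570 and the core termination ip/1/0/78241 reports rtp/pr=5570, which is the matching pattern expected for a healthy call without transcoding.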
8.2.1 Identifying currently isolated cores and verifying whether they match with expected
test configuration
1. Execute the following command from host01 to view the current host configuration:
Note:
Expected outcome
Sample host configuration summaries for various configurations are listed below.
Figure 3: Traditional RMS with small PIM (11 core) with SC2 VM with 'PF_mode_support: True' configuration
Note:
The PIM cores and, now, the FW/PFW cores are isolated (because PF_mode_support: True). However,
when PF_mode_support: No, only cores 14-23 are isolated.
Note:
The above configurations are driven by the contents of OLV.zip file (FI Worksheet) and by the number of
SC VM pairs. They are classified as:
2. Verify the CPU isolation configuration that will be deployed. This can be done in one of the two ways
described below:
Note:
Any time a user changes a configuration from any of the above listed configurations, the user
should perform the following steps before trying to deploy the modified configuration.
./sbc-cpu-isolate.sh
cd /storage/guestdata/app_data/sbc-playbooks/
./sig_prep_config.sh
Verify that the values of the cpu_isolate and stacked_RMS fields in the sig_prep_config.yml and
sig_config.yml files generated under the /storage/guestdata/app_data/sbc-playbooks/
group_vars/all/ directory are appropriate.
3. In case of switching from PF mode to VF mode, make sure that the RMS hosts are correctly cabled.
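For the verification of the cpu_isolate and stacked_RMS fields in step 2, the two values can be read out of the generated files with a short script. The sketch below is an assumption-laden illustration: the function name is hypothetical, it expects the fields to appear as simple "key: value" lines, and it works on file content passed in as text. The sample values in the usage note are invented for illustration.

```python
import re
from typing import Dict, Optional

FIELDS = ("cpu_isolate", "stacked_RMS")  # field names from this guide


def read_isolation_fields(yaml_text: str) -> Dict[str, Optional[str]]:
    """Return the raw values of the fields of interest, or None if absent."""
    found: Dict[str, Optional[str]] = {name: None for name in FIELDS}
    for line in yaml_text.splitlines():
        # Matches "key: value" with an optional trailing "# comment".
        match = re.match(r"\s*(\w+)\s*:\s*(.*?)\s*(?:#.*)?$", line)
        if match and match.group(1) in found:
            found[match.group(1)] = match.group(2)
    return found
```

Reading both sig_prep_config.yml and sig_config.yml and comparing the two results shows whether the files agree. For example, the text "cpu_isolate: true" followed by "stacked_RMS: False" returns the strings "true" and "False" for the two fields.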