Oracle Communications Diameter Signaling Router
Alarms and KPIs
Release 8.5.0.1
F37854-02
May 2021
Oracle Communications Diameter Signaling Router Alarms and KPIs, Release 8.5.0.1
F37854-02
This software and related documentation are provided under a license agreement containing restrictions on
use and disclosure and are protected by intellectual property laws. Except as expressly permitted in your
license agreement or allowed by law, you may not use, copy, reproduce, translate, broadcast, modify, license,
transmit, distribute, exhibit, perform, publish, or display any part, in any form, or by any means. Reverse
engineering, disassembly, or decompilation of this software, unless required by law for interoperability, is
prohibited.
The information contained herein is subject to change without notice and is not warranted to be error-free. If
you find any errors, please report them to us in writing.
If this is software or related documentation that is delivered to the U.S. Government or anyone licensing it on
behalf of the U.S. Government, then the following notice is applicable:
U.S. GOVERNMENT END USERS: Oracle programs (including any operating system, integrated software,
any programs embedded, installed or activated on delivered hardware, and modifications of such programs)
and Oracle computer documentation or other Oracle data delivered to or accessed by U.S. Government
end users are "commercial computer software" or "commercial computer software documentation" pursuant
to the applicable Federal Acquisition Regulation and agency-specific supplemental regulations. As such,
the use, reproduction, duplication, release, display, disclosure, modification, preparation of derivative works,
and/or adaptation of i) Oracle programs (including any operating system, integrated software, any programs
embedded, installed or activated on delivered hardware, and modifications of such programs), ii) Oracle
computer documentation and/or iii) other Oracle data, is subject to the rights and limitations specified in the
license contained in the applicable contract. The terms governing the U.S. Government’s use of Oracle cloud
services are defined by the applicable contract for such services. No other rights are granted to the U.S.
Government.
This software or hardware is developed for general use in a variety of information management applications.
It is not developed or intended for use in any inherently dangerous applications, including applications that
may create a risk of personal injury. If you use this software or hardware in dangerous applications, then you
shall be responsible to take all appropriate fail-safe, backup, redundancy, and other measures to ensure its
safe use. Oracle Corporation and its affiliates disclaim any liability for any damages caused by use of this
software or hardware in dangerous applications.
Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of
their respective owners.
Intel and Intel Inside are trademarks or registered trademarks of Intel Corporation. All SPARC trademarks are
used under license and are trademarks or registered trademarks of SPARC International, Inc. AMD, Epyc,
and the AMD logo are trademarks or registered trademarks of Advanced Micro Devices. UNIX is a registered
trademark of The Open Group.
This software or hardware and documentation may provide access to or information about content, products,
and services from third parties. Oracle Corporation and its affiliates are not responsible for and expressly
disclaim all warranties of any kind with respect to third-party content, products, and services unless otherwise
set forth in an applicable agreement between you and Oracle. Oracle Corporation and its affiliates will not
be responsible for any loss, costs, or damages incurred due to your access to or use of third-party content,
products, or services, except as set forth in an applicable agreement between you and Oracle.
Contents
1 Introduction
Revision History 1-1
Overview 1-1
Scope and Audience 1-2
Manual Organization 1-2
My Oracle Support 1-2
Active Tasks elements 2-20
Deleting a task 2-20
Deleting all completed tasks 2-21
Cancelling a running or paused task 2-21
Pausing a task 2-21
Restarting a task 2-22
Active Tasks report elements 2-22
Generating an active task report 2-23
Scheduled Tasks 2-23
Scheduled Tasks Elements 2-24
Editing a Scheduled Task 2-24
Deleting a Scheduled Task 2-24
Generating a Scheduled Task Report 2-24
10008 - Database Provisioning Manually Disabled 3-27
10009 - Config and Prov DB Not Yet Synchronized 3-28
10010 - Stateful DB from Mate Not Yet Synchronized 3-29
10011 - Cannot Monitor Table 3-29
10012 - Table Change Responder Failed 3-30
10013 - Application Restart in Progress 3-30
10020 - Backup Failure 3-31
10050 - Resource Audit Failure 3-31
10051 - Route Deployment Failed 3-32
10052 - Route Discovery Failed 3-33
10053 - Route Deployment Failed - No Available Device 3-33
10054 - Device Deployment Failed 3-34
10055 - Device Discovery Failed 3-35
10073 - Server Group Max Allowed HA Role Warning 3-36
10074 - Standby Server Degraded While Mate Server Stabilizes 3-37
10075 - Application Processes Have Been Manually Stopped 3-37
10078 - Application Not Restarted on Standby Server Due to Disabled Failure Cleanup Mode 3-38
10100 - Log Export Started 3-38
10101 - Log Export Successful 3-39
10102 - Log Export Failed 3-39
10103 - Log Export Already in Progress 3-40
10104 - Log Export File Transfer Failed 3-40
10105 - Log Export Cancelled - User Request 3-41
10106 - Log Export Cancelled - Duplicate Request 3-42
10107 - Log Export Cancelled - Queue Full 3-42
10108 - Duplicate Scheduled Log Export Task 3-43
10109 - Log Export Queue is Full 3-44
10110 - Certificate About to Expire 3-44
10111 - Certificate Expired 3-46
10112 - Certificate Cannot be Used 3-47
10115 - Health Check Started 3-49
10116 - Health Check Successful 3-49
10117 - Health Check Failed 3-50
10118 - Health Check Not Run 3-50
10120 - Server Group Upgrade Started 3-51
10121 - Server Group Upgrade Cancelled - Validation Failed 3-51
10122 - Server Group Upgrade Successful 3-52
10123 - Server Group Upgrade Failed 3-52
10124 - Server Group Upgrade Cancelled - User Request 3-53
10125 - Server Group Upgrade Failed 3-53
10130 - Server Upgrade Started 3-54
10131 - Server Upgrade Cancelled 3-54
10132 - Server Upgrade Successful 3-55
10133 - Server Upgrade Failed 3-55
10134 - Server Upgrade Failed 3-56
10140 - Site Upgrade Started 3-58
10141 - Site Upgrade Cancelled 3-59
10142 - Site Upgrade Successful 3-59
10143 - Site Upgrade Failed 3-60
10144 - Site Upgrade Cancelled - User Request 3-60
10145 - Site Upgrade Failed 3-61
10151 - Login Successful 3-61
10152 - Login Failed 3-62
10153 - Logout Successful 3-62
10154 - User Account Disabled 3-63
10155 - SAML Login Successful 3-63
10156 - SAML Login Failed 3-64
10200 - Remote Database Reinitialization in Progress 3-64
10300 - SNMP Trapping Not Configured 3-65
IDIH (11500-11549) 3-65
11500 - Tracing Suspended 3-65
11501 - Trace Throttling Active 3-66
11502 - Troubleshooting Trace Started 3-66
11503 - Troubleshooting Trace Stopped 3-67
11506 - Invalid IDIH-Trace AVP 3-67
11507 - Unable to Run Network Trace at This Site 3-68
11508 - Network Trace Configuration Error 3-68
11509 - Site Trace Configuration Error 3-69
11510 - Network Trace Activation Error 3-69
11511 - Invalid DIH HostName 3-70
SDS (14000-14999) 3-70
14100 - Interface Disabled 3-71
14101 - No Remote Connections 3-71
14102 - Connection Failed 3-72
14103 - Both Port Identical 3-72
14120 - Connection Established 3-73
14121 - Connection Terminated 3-73
14122 - Connection Denied 3-74
14140 - Import Throttled 3-74
14150 - Import Initialization Failed 3-75
14151 - Import Generation Failed 3-75
14152 - Import Transfer Failed 3-76
14153 - Export Initialization Failed 3-76
14154 - Export Generation Failed 3-77
14155 - Export Transfer Failed 3-77
14160 - Import Operation Completed 3-78
14161 - Export Operation Completed 3-78
14170 - Remote Audit Started and In Progress 3-79
14171 - Remote Audit Aborted 3-79
14172 - Remote Audit Failed to Complete 3-80
14173 - Remote Audit Completed 3-80
14174 - NPA Split Pending Request Deleted 3-81
14175 - NPA Split Activation Failed 3-81
14176 - NPA Split Started and Is Active 3-82
14177 - NPA Split Completion Failed 3-82
14178 - NPA Split Completed 3-83
14179 - MSISDN Deleted From Blacklist 3-83
14180 - IMSI Deleted from Blacklist 3-84
14188 - PdbRelay Not Connected 3-84
14189 - PdbRelay Time Lag 3-85
14198 - ProvDbException 3-85
14200 - DP Stack Event Queue Utilization 3-86
14301 - ERA Responder Failed 3-87
SS7/Sigtran (19200-19299) 3-87
19200 - RSP/Destination Unavailable 3-87
19201 - RSP/Destination Route Unavailable 3-88
19202 - Linkset Unavailable 3-89
19203 - Link Unavailable 3-90
19204 - Preferred Route Unavailable 3-90
19205 - TFP Received 3-91
19206 - TFA Received 3-92
19207 - TFR Received 3-92
19208 - TFC Received 3-93
19209 - M3RL Routing Error 3-93
19210 - M3RL Routing Error - Invalid NI 3-94
19211 - M3RL Routing Error - Invalid SI 3-95
19217 - Node Isolated - All Links Down 3-96
19226 - Timed Out Waiting for ASP-UP-ACK 3-96
19227 - Received Unsolicited ASP-DOWN-ACK 3-97
19229 - Timed Out Waiting for ASP-ACTIVE-ACK 3-98
19230 - Received Unsolicited ASP-INACTIVE-ACK 3-98
19231 - Received Invalid M3UA Message 3-99
19233 - Failed to Send Non-DATA Message 3-100
19234 - Local Link Maintenance State Change 3-101
19235 - Received M3UA Error 3-101
19240 - Remote SCCP Subsystem Prohibited 3-103
19241 - SCCP Malformed or Unsupported Message 3-104
19242 - SCCP Hop Counter Violation 3-104
19243 - SCCP Routing Failure 3-105
19244 - SCCP Routing Failure Network Status 3-106
19245 - SCCP GTT Failure 3-107
19246 - Local SCCP Subsystem Prohibited 3-107
19248 - SCCP Segmentation Failure 3-108
19249 - SCCP Reassembly Failure 3-109
19250 - SS7 Process CPU Utilization 3-109
19251 - Ingress Message Rate 3-110
19252 - PDU Buffer Pool Utilization 3-111
19253 - SCCP Stack Event Queue Utilization 3-112
19254 - M3RL Stack Event Queue Utilization 3-113
19255 - M3RL Network Management Event Queue Utilization 3-114
19256 - M3UA Stack Event Queue Utilization 3-114
19258 - SCTP Aggregate Egress Queue Utilization 3-115
19259 - Operation Discarded Due to Local Resource Limitation 3-116
19260 - Transaction Could Not be Delivered to Remote TCAP Peer Due to Conditions in the Network 3-117
19262 - Operation Discarded Due to Malformed Component Received from Remote TCAP Peer 3-117
19263 - Transaction Discarded Due to Malformed Dialogue Message Received from Local TC User 3-118
19264 - Transaction Discarded Due to Malformed Dialogue Message from Remote TCAP Peer 3-119
19265 - Unexpected Event Received from Local TC User 3-119
19266 - Unexpected Event Received from Remote TCAP Peer 3-120
19267 - Dialogue Removed by Dialogue Cleanup Timer 3-121
19268 - Operation Removed by Invocation Timer Expiry 3-121
19269 - Dialogue Aborted by Remote TCAP Peer 3-122
19270 - Received Unsupported TCAP Message 3-123
19271 - Operation Rejected by Remote TCAP Peer 3-124
19272 - TCAP Active Dialogue Utilization 3-124
19273 - TCAP Active Operation Utilization 3-125
19274 - TCAP Stack Event Queue Utilization 3-126
19275 - Return Error from Remote TCAP Peer 3-126
19276 - SCCP Egress Message Rate 3-127
19281 - TCAP Routing Failure 3-128
Transport Manager (19400-19419) 3-128
19400 - Transport Down 3-128
19401 - Failed to Configure Transport 3-130
19402 - Failed to Connect Transport 3-130
19403 - Received Malformed SCTP Message (Invalid Length) 3-131
19404 - Far-End Closed the Transport 3-132
19405 - Transport Closed Due to Lack of Response 3-133
19406 - Local Transport Maintenance State Change 3-133
19407 - Failed to Send Transport DATA Message 3-134
19408 - Single Transport Egress-Queue Utilization 3-135
19409 - Message Rejected by ACL Filtering 3-136
19410 - Adjacent Node IP Address State Change 3-136
19411 - SCTP Transport Closed Due to Failure of Multi-Homing Validation 3-137
19412 - SCTP Transport Configuration Mismatched for Adjacent Node IP 3-138
19413 - SCTP Transport Closed Due to Unsupported Peer Address Event Received 3-138
Communication Agent, ComAgent (19420-19909) 3-139
19420 - BDFQFull - Broadcast Data Framework Work Queue Full 3-139
19421 - BDFThrotl - Broadcast Data Framework Throttle Traffic 3-140
19422 - BDFInvalidPkt - Broadcast Data Framework Invalid Corrupt StackEvent 3-140
19800 - Communication Agent Connection Down 3-141
19801 - Communication Agent Connection Locally Blocked 3-142
19802 - Communication Agent Connection Remotely Blocked 3-143
19803 - Communication Agent Stack Event Queue Utilization 3-145
19804 - Communication Agent configured connection waiting for remote client to establish connection 3-146
19805 - Communication Agent Failed To Align Connection 3-148
19806 - Communication Agent CommMessage Mempool Utilization 3-149
19807 - Communication Agent User Data FIFO Queue Utilization 3-150
19808 - Communication Agent Connection FIFO Queue utilization 3-151
19810 - Communication Agent Egress Message Discarded 3-152
19811 - Communication Agent Ingress Message Discarded 3-153
19814 - Communication Agent Peer has not responded to heartbeat 3-154
19816 - Communication Agent Connection State Changed 3-154
19817 - Communication Agent DB Responder detected a change in configurable control option parameter 3-155
19818 - Communication Agent DataEvent Mempool utilization 3-156
19820 - Communication Agent Routed Service Unavailable 3-156
19821 - Communication Agent Routed Service Degraded 3-157
19822 - Communication Agent Routed Service Congested 3-158
19823 - Communication Agent Routed Service Using Low-Priority Connection Group 3-159
19824 - Communication Agent Pending Transaction Utilization 3-160
19825 - Communication Agent Transaction Failure Rate 3-162
19826 - Communication Agent Connection Congested 3-163
19827 - SMS stack event queue utilization 3-164
19830 - Communication Agent Service Registration State Change 3-165
19831 - Communication Agent Service Operational State Changed 3-165
19832 - Communication Agent Reliable Transaction Failed 3-166
19833 - Communication Agent Service Egress Message Discarded 3-167
19842 - Communication Agent Resource-Provider Registered 3-167
19843 - Communication Agent Resource-Provider Resource State Changed 3-168
19844 - Communication Agent Resource-Provider Stale Status Received 3-168
19845 - Communication Agent Resource-Provider Deregistered 3-169
19846 - Communication Agent Resource Degraded 3-169
19847 - Communication Agent Resource Unavailable 3-170
19848 - Communication Agent Resource Error 3-171
19850 - Communication Agent Resource-User Registered 3-171
19851 - Communication Agent Resource-User Deregistered 3-172
19852 - Communication Agent Resource Routing State Changed 3-172
19853 - Communication Agent Resource Egress Message Discarded 3-173
19854 - Communication Agent Resource-Provider Tracking Table Audit Results 3-174
19855 - Communication Agent Resource Has Multiple Actives 3-174
19856 - Communication Agent Service Provider Registration State Changed 3-175
19857 - Communication Agent Service Provider Operational State Changed 3-175
19858 - Communication Agent Connection Rejected 3-176
19860 - Communication Agent Configuration Daemon Table Monitoring Failure 3-176
19861 - Communication Agent Configuration Daemon Script Failure 3-178
19862 - Communication Agent Ingress Stack Event Rate 3-179
19863 - Communication Agent Max Connections Limit In Connection Group Reached 3-179
19864 - ComAgent Successfully Set Host Server Hardware Profile 3-180
19865 - ComAgent Failed to Set Host Server Hardware Profile 3-180
19866 - Communication Agent Peer Group Status Changed 3-181
19867 - Communication Agent Peer Group Egress Message Discarded 3-181
19868 - Communication Agent Connection Rejected - Incompatible Network 3-182
19900 - Process CPU Utilization 3-183
19901 - CFG-DB Validation Error 3-184
19902 - CFG-DB Update Failure 3-184
19903 - CFG-DB post-update Error 3-185
19904 - CFG-DB Post-Update Failure 3-186
19905 - Measurement Initialization Failure 3-187
Diameter Signaling Router (DSR) Diagnostics (19910-19999) 3-188
19910 - Message Discarded at Test Connection 3-188
19911 - Test message discarded 3-188
Diameter Alarms and Events (8000-8299, 22000-22350, 22900-22999, 25600-25899) 3-189
8000 - MpEvFsmException 3-189
8000 - 001 - MpEvFsmException_SocketFailure 3-189
8000 - 002 - MpEvFsmException_BindFailure 3-190
8000 - 003 - MpEvFsmException_OptionFailure 3-190
8000 - 004 - MpEvFsmException_AcceptorCongested 3-191
8000 - 101 - MpEvFsmException_ListenFailure 3-191
8000 - 102 - MpEvFsmException_PeerDisconnected 3-192
8000 - 103 - MpEvFsmException_PeerUnreachable 3-192
8000 - 104 - MpEvFsmException_CexFailure 3-193
8000 - 105 - MpEvFsmException_CerTimeout 3-194
8000 - 106 - MpEvFsmException_AuthenticationFailure 3-194
8000 - 201 - MpEvFsmException_UdpSocketLimit 3-195
8001 - MpEvException 3-195
8001 - 001 - MpEvException_Oversubscribed 3-195
8002 - MpEvRxException 3-196
8002 - 001 - MpEvRxException_DiamMsgPoolCongested 3-196
8002 - 002 - MpEvRxException_MaxMpsExceeded 3-196
8002 - 003 - MpEvRxException_CpuCongested 3-197
8002 - 004 - MpEvRxException_SigEvPoolCongested 3-198
8002 - 005 - MpEvRxException_DstMpUnknown 3-198
8002 - 006 - MpEvRxException_DstMpCongested 3-199
8002 - 007 - MpEvRxException_DrlReqQueueCongested 3-199
8002 - 008 - MpEvRxException_DrlAnsQueueCongested 3-200
8002 - 009 - MpEvRxException_ComAgentCongested 3-200
8002 - 201 - MpEvRxException_MsgMalformed 3-201
8002 - 202 - MpEvRxException_PeerUnknown 3-201
8002 - 203 - MpEvRxException_RadiusMsgPoolCongested 3-202
8002 - 204 - MpEvRxException_ItrPoolCongested 3-203
8002 - 205 - MpEvRxException_RclRxTaskQueueCongested 3-203
8002 - 206 - MpEvRxException_RclSigEvPoolCongested 3-204
8002 - 207 - MpEvRxException_ReqDuplicate 3-205
8002 - 208 - MpEvRxException_SharedSecretUnavailable 3-206
8003 - MpEvTxException 3-206
8003 - 001 - MpEvTxException_ConnUnknown 3-206
8003 - 101 - MpEvTxException_DclTxTaskQueueCongested 3-207
8003 - 201 - MpEvTxException_RclTxTaskQueueCongested 3-207
8003 - 202 - MpEvTxException_EtrPoolCongested 3-208
8003 - 203 - MpEvTxException_RadiusMsgPoolCongested 3-209
8003 - 204 - MpEvTxException_RadiusIdPoolCongested 3-209
8003 - 205 - MpEvTxException_SharedSecretUnavailable 3-210
8004 - EvFsmAdState 3-211
8004 - 001 - EvFsmAdState_StateChange 3-211
8005 - EvFsmOpState 3-211
8005 - 001 - EvFsmOpState_StateChange 3-211
8006 - EvFsmException 3-212
8006 - 001 - EvFsmException_DnsFailure 3-212
8006 - 002 - EvFsmException_ConnReleased 3-213
8006 - 101 - EvFsmException_SocketFailure 3-213
8006 - 102 - EvFsmException_BindFailure 3-214
8006 - 103 - EvFsmException_OptionFailure 3-215
8006 - 104 - EvFsmException_ConnectFailure 3-215
8006 - 105 - EvFsmException_PeerDisconnected 3-216
8006 - 106 - EvFsmException_PeerUnreachable 3-216
8006 - 107 - EvFsmException_CexFailure 3-217
8006 - 108 - EvFsmException_CeaTimeout 3-217
8006 - 109 - EvFsmException_DwaTimeout 3-218
8006 - 110 - EvFsmException_DwaTimeout 3-218
8006 - 111 - EvFsmException_ProvingFailure 3-219
8006 - 112 - EvFsmException_WatchdogFailure 3-219
8006 - 113 - EvFsmException_AuthenticationFailure 3-220
8007 - EvException 3-221
8007 - 101 - EvException_MsgPriorityFailure 3-221
8008 - EvRxException 3-221
8008 - 001 - EvRxException_MaxMpsExceeded 3-221
8008 - 101 - EvRxException_MsgMalformed 3-222
8008 - 102 - EvRxException_MsgInvalid 3-222
8008 - 201 - EvRxException_SharedSecretUnavailable 3-223
8008 - 202 - EvRxException_MsgAttrLenUnsupported 3-223
8008 - 203 - EvRxException_MsgTypeUnsupported 3-224
8008 - 204 - EvRxException_AnsOrphaned 3-224
8008 - 205 - EvRxException_AccessAuthMissing 3-225
8008 - 206 - EvRxException_StatusAuthMissing 3-225
8008 - 207 - EvRxException_MsgAuthInvalid 3-226
8008 - 208 - EvRxException_ReqAuthInvalid 3-226
8008 - 209 - EvRxException_AnsAuthInvalid 3-227
8008 - 210 - EvRxException_MsgAttrAstUnsupported 3-227
8008 - 212 - EvRxException_MsgTypeMissingMccs 3-228
8008 - 213 - EvRxException_ConnUnavailable 3-228
8009 - EvTxException 3-229
8009 - 001 - EvTxException_ConnUnavailable 3-229
8009 - 101 - EvTxException_DclTxConnQueueCongested 3-229
8009 - 102 - EvTxException_DtlsMsgOversized 3-230
8009 - 201 - EvTxException_MsgAttrLenUnsupported 3-230
8009 - 202 - EvTxException_MsgTypeUnsupported 3-231
8009 - 203 - EvTxException_MsgLenInvalid 3-232
8009 - 204 - EvTxException_ReqOnServerConn 3-232
8009 - 205 - EvTxException_AnsOnClientConn 3-233
8009 - 206 - EvTxException_DiamMsgMisrouted 3-233
8009 - 207 - EvTxException_ReqDuplicate 3-234
8009 - 208 - EvTxException_WriteFailure 3-234
8010 - MpIngressDrop 3-235
8011 - EcRate 3-236
8012 - MpRxNgnPsOfferedRate 3-237
8013 - MpNgnPsStateMismatch 3-238
8014 - MpNgnPsDrop 3-238
8015 - NgnPsMsgMisrouted 3-239
8016 - MpP16StateMismatch 3-240
8017 - MpTaskCpuCongested 3-241
8018 - P16MsgMisrouted 3-241
8019 - MpAnswerPriorityModeMismatch 3-242
8020 - MpRoutingThreadPoolStateMismatch 3-243
8100 - NormMsgMisrouted 3-243
8101 - DiagMsgMisrouted 3-244
8200 - MpRadiusMsgPoolCongested 3-244
8201 - RclRxTaskQueueCongested 3-245
8202 - RclItrPoolCongested 3-245
8203 - RclTxTaskQueueCongested 3-246
8204 - RclEtrPoolCongested 3-247
8205 - RadiusXactionFail 3-248
8206 - MpRxRadiusAllLen 3-248
8207 - MpRadiusKeyError 3-249
22001 - Message Decoding Failure 3-249
22002 - Peer Routing Rules with Same Priority 3-250
22003 - Application ID Mismatch with Peer 3-251
22004 - Maximum pending transactions allowed exceeded 3-251
22005 - No peer routing rule found 3-252
22007 - Inconsistent Application ID Lists from a Peer 3-253
22008 - Orphan Answer Response Received 3-254
22009 - Application Routing Rules with Same Priority 3-255
22010 - Specified DAS Route List not provisioned 3-255
22012 - Specified MCCS not provisioned 3-256
22013 - DAS Peer Number of Retransmits Exceeded for Copy 3-256
22014 - No DAS Route List specified 3-257
22016 - Peer Node Alarm Aggregation Threshold 3-258
22017 - Route List Alarm Aggregation Threshold 3-259
22018 - Maintenance Leader HA Notification to go Active 3-260
22019 - Maintenance Leader HA Notification to go OOS 3-260
22020 - Copy Message size exceeded the system configured size limit 3-261
22021 - Debug Routing Info AVP Enabled 3-261
22022 - Forwarding Loop Detected 3-262
22051 - Peer Unavailable 3-263
22052 - Peer Degraded 3-264
22053 - Route List Unavailable 3-265
22054 - Route List Degraded 3-266
22055 - Non-Preferred Route Group in Use 3-267
22056 - Connection Admin State Inconsistency Exists 3-268
22057 - ETG Rate Limit Degraded 3-269
22058 - ETG Pending Transaction Limit Degraded 3-270
22059 - Egress Throttle Group Message Rate Congestion Level changed 3-271
22060 - Egress Throttle Group Pending Transaction Limit Congestion Level changed 3-272
22061 - Egress Throttle Group Monitoring stopped 3-272
22062 - Actual Host Name cannot be determined for Topology Hiding 3-273
22063 - Diameter Max Message Size Limit Exceeded 3-274
22064 - Upon receiving Redirect Host Notification the Request has not been submitted for re-routing 3-274
22065 - Upon receiving Redirect Realm Notification the Request has not been submitted for re-routing 3-275
22066 - ETG-ETL Scope Inconsistency 3-275
22067 - ETL-ETG Invalid Association 3-276
22068 - TtpEvDoicException 3-277
22068 - 001 - TtpEvDoicException: DOIC OC-Supported-Features AVP not received 3-277
22068 - 002 - TtpEvDoicException: DOIC OC-Feature-Vector AVP contains an invalid value 3-277
22068 - 003 - TtpEvDoicException: DOIC OC-Report-Type AVP contains an unsupported value 3-278
22068 - 004 - TtpEvDoicException: DOIC OC-Sequence-Number AVP contains an out of order sequence number 3-278
22068 - 005 - TtpEvDoicException: DOIC OC-Reduction-Percentage AVP contains an invalid value 3-279
22068 - 006 - TtpEvDoicException: DOIC OC-Validity-Duration AVP contains an invalid value 3-280
22069 - TtpEvDoicOlr 3-280
22069 - 001 - TtpEvDoicOlr: Valid DOIC OLR Applied to TTP 3-280
22070 - TtpEvDegraded 3-281
22070 - 001 - TtpEvDegraded: TTP Degraded, Peer Overload 3-281
22070 - 002 - TtpEvDegraded: TTP Degraded, Peer Overload Recovery 3-281
22070 - 003 - TtpEvDegraded: TTP Degraded, Static Rate Limit Exceeded 3-282
22071 - TtgEvLossChg 3-282
22071 - 001 - TtgEvLossChg: TTG Loss Percent Changed 3-282
22072 - TTP Degraded 3-283
22073 - TTP Throttling Stopped 3-283
22074 - TTP Maximum Loss Percentage Threshold Exceeded 3-284
22075 - Message is not routed to Application 3-284
22076 - TTG Maximum Loss Percentage Threshold Exceeded 3-285
22077 - Excessive Request Reroute Threshold Exceeded 3-286
22078 - Loop or Maximum Depth Exceeded in ART or PRT Search 3-287
22082 - RouteList is not Provisioned in System Options 3-287
22101 - Connection Unavailable 3-288
22102 - Connection Degraded 3-289
22103 - SCTP Connection Impaired 3-292
22104 - SCTP Peer is Operating with a Reduced IP Address Set 3-293
22105 - Connection Transmit Congestion 3-294
22106 - Ingress Message Discarded: DraWorker Ingress MessageRate Control 3-295
22200 - MP CPU Congested 3-296
22201 - MpRxAllRate 3-297
22202 - MpDiamMsgPoolCongested 3-298
22203 - PTR Buffer Pool Utilization 3-299
22204 - Request Message Queue Utilization 3-299
22205 - Answer Message Queue Utilization 3-300
22206 - Reroute Queue Utilization 3-301
22207 - DclTxTaskQueueCongested 3-302
22208 - DclTxConnQueueCongested 3-302
22209 - Message Copy Disabled 3-303
22214 - Message Copy Queue Utilization 3-303
22221 - Routing MPS Rate 3-304
22222 - Long Timeout PTR Buffer Pool Utilization 3-305
22223 - DraWorker Memory Utilization Threshold Crossed 3-305
22224 - Average Hold Time Limit Exceeded 3-307
22225 - Average Message Size Limit Exceeded 3-309
22328 - Connection is processing a higher than normal ingress messaging rate 3-311
22349 - IPFE Connection Alarm Aggregation Threshold 3-312
22350 - Fixed Connection Alarm Aggregation Threshold 3-314
22900 - DPI DB Table Monitoring Overrun 3-316
22901 - DPI DB Table Monitoring Error 3-316
22950 - Connection Status Inconsistency Exists 3-317
22960 - DA-MP Profile Not Assigned 3-318
22961 - Insufficient Memory for Feature Set 3-318
25607 - DSR Signaling Firewall is administratively Disabled 3-319
25608 - Abnormal DA-MP Firewall 3-320
25609 - Firewall Configuration Error encountered 3-320
25610 - DSR Signaling Firewall configuration inconsistency detected 3-321
25611 - ETG - Invalid DRMP Attributes 3-321
25612 - Peer CNDRA ping failed 3-322
25613 - Peer Node Alarm Group Threshold 3-323
25614 - Connection Alarm Group Threshold 3-323
25805 - Invalid Shared TTG Reference 3-324
25806 - Invalid Internal Overseer Server Group Designation 3-325
Range Based Address Resolution (RBAR) Alarms and Events (22400-22424) 3-325
22400 - Message Decoding Failure 3-325
22401 - Unknown Application ID 3-326
22402 - Unknown Command Code 3-326
22403 - No Routing Entity Address AVPs 3-327
22404 - No valid Routing Entity Addresses found 3-328
22405 - Valid address received didn’t match a provisioned address or address range 3-328
22406 - Routing attempt failed due to internal resource exhaustion 3-329
22407 - Routing attempt failed due to internal database inconsistency failure 3-329
22411 - Address Range Lookup for Local Identifier skipped 3-330
Generic Application Alarms and Events (22500-22599) 3-331
22500 - Peer CNDRA Application Unavailable 3-331
22501 - Peer CNDRA Application Degraded 3-332
22502 - Peer CNDRA Application Request Message Queue Utilization 3-334
22503 - Peer CNDRA Application Answer Message Queue Utilization 3-335
22504 - Peer CNDRA Application Ingress Message Rate 3-336
22520 - Peer CNDRA Application Enabled 3-337
22521 - Peer CNDRA Application Disabled 3-338
Full Address Based Resolution (FABR) Alarms and Events (22600-22640) 3-338
22600 - Message Decoding Failure 3-338
22601 - Unknown Application ID 3-339
22602 - Unknown Command Code 3-340
22603 - No Routing Entity Address AVPs 3-340
22604 - No Valid User Identity Addresses Found 3-341
22605 - No Destination address is found to match the valid User Identity address 3-342
22606 - Database or DB connection error 3-343
22607 - Routing attempt failed due to DRL queue exhaustion 3-343
22608 - Database query could not be sent due to DB congestion 3-344
22609 - Database connection exhausted 3-344
22610 - FABR DP Service congestion state change 3-345
22611 - FABR Blacklisted Subscriber 3-345
22631 - FABR DP Response Task Message Queue Utilization 3-346
22632 - ComAgent Registration Failure 3-346
Policy and Charging Application (PCA) Alarms and Events (22700-22799) 3-348
22700 - Protocol Error in Diameter Requests 3-348
22701 - Protocol Error in Diameter Answers 3-348
22702 - Database Hash Function Error 3-349
22703 - Diameter Message Routing Failure Due To Full DRL Queue 3-350
22704 - Communication Agent Error 3-350
22705 - SBR Error Response Received 3-351
22706 - Binding Key Not Found In Diameter Message 3-351
22707 - Diameter Message Processing Failure 3-352
22708 - PCA Function is Disabled 3-353
22709 - PCA Function is Unavailable 3-353
22710 - SBR Sessions Threshold Exceeded 3-354
22711 - SBR Database Error 3-355
22712 - SBR Communication Error 3-356
22713 - SBR Alternate Key Creation Error 3-356
22714 - SBR RAR Initiation Error 3-357
22715 - SBR Audit Suspended 3-357
22716 - SBR Audit Statistics Report 3-358
22717 - SBR Alternate Key Creation Failure Rate 3-359
22718 - Binding Not Found for Binding Dependent Session Initiate Request 3-359
22719 - Maximum Number of Sessions per Binding Exceeded 3-360
22720 - Policy SBR To PCA Response Queue Utilization Threshold Exceeded 3-360
22721 - Policy and Charging Server In Congestion 3-361
22722 - Policy Binding Sub-resource Unavailable 3-362
22723 - Policy and Charging Session Sub-resource Unavailable 3-363
22724 - Policy SBR Memory Utilization Threshold Exceeded 3-365
22725 - SBR Server In Congestion 3-365
22726 - SBR Queue Utilization Threshold Exceeded 3-366
22727 - SBR Initialization Failure 3-367
22728 - SBR Bindings Threshold Exceeded 3-368
22729 - PCRF Not Configured 3-369
22730 - Policy and Charging Configuration Error 3-370
22731 - Policy and Charging Database Inconsistency 3-371
22732 - SBR Process CPU Utilization Threshold Exceeded 3-372
22733 - SBR Failed to Free Binding Memory After PCRF Pooling Binding Migration 3-373
22734 - Policy and Charging Unexpected Stack Event Version 3-373
22735 - Policy DRA session initiation request received with no APN 3-374
22736 - SBR failed to free shared memory after a PCA function is disabled 3-375
22737 - Configuration Database Not Synced 3-376
22738 - SBR Database Reconfiguration State Transition 3-376
22740 - SBR Reconfiguration Plan Completion Failure 3-377
22741 - Failed to route PCA generated RAR 3-378
22742 - Enhanced Overload Control AdminState Mismatch 3-378
22743 - PCA Server Congested Due to Composite Resource Congestion 3-379
22750 - Enhanced Suspect Binding Removal Feature Enabled 3-380
22751 - Binding Audit Suppression by Suspect Binding Removal 3-380
22752 - SBR Process Not Running 3-381
SCEF (23000-23200, 102801-115001, 390000) 3-381
23150 - Diameter Application Not Supported 3-381
23152 - Universal SBR Sub-Resource Unavailable 3-382
23153 - Diameter Command Code not supported 3-383
23154 - HTTP Message Processing Error 3-383
23155 - SCEF Configuration Error 3-384
23156 - Protocol Error in Diameter Message 3-384
23157 - Protocol Error in HTTP Message 3-385
23158 - Universal SBR Error 3-385
23159 - Diameter Request Routing Failure 3-386
23160 - Access Control Not Enabled 3-386
23161 - USBR Response Queue Utilization Threshold Exceeded 3-387
23162 - Polling Event Queue Utilization Threshold Exceeded 3-387
102801 - 3-388
102826 - 3-388
102827 - 3-389
102828 - 3-389
102829 - 3-390
102830 - 3-390
102831 - 3-391
102832 - 3-391
102833 - 3-392
102834 - 3-392
102835 - 3-393
102836 - 3-393
102837 - 3-394
102838 - 3-394
102839 - 3-395
102840 - 3-395
102844 - 3-396
102845 - 3-396
102846 - 3-397
111007 - 3-397
115001 - 3-398
390000 - 3-398
Tekelec Virtual Operating Environment, TVOE (24400-24499) 3-399
24400 - TVOE libvirtd is down 3-399
24401 - TVOE libvirtd is hung 3-399
24402 - all TVOE libvirtd connections are in use 3-400
Computer Aided Policy Making, CAPM (25000-25499) 3-400
25000 - CAPM Update Failed 3-400
25001 - CAPM Action Failed 3-401
25002 - CAPM Exit Rule Template 3-402
25003 - CAPM Exit Trigger 3-402
25004 - Script failed to load 3-403
25005 - CAPM Generic Event 3-403
25006 - CAPM Generic Alarm - Minor 3-404
25007 - CAPM Generic Alarm - Major 3-404
25008 - CAPM Generic Alarm - Critical 3-405
OAM Alarm Management (25500-25899) 3-405
25500 - No DA-MP Leader Detected Alarm 3-405
25510 - Multiple DA-MP Leader Detected Alarm 3-407
25800 - Peer Discovery Failure 3-408
25801 - Peer Discovery Configuration Error Encountered 3-408
25802 - Realm Expiration Approaching 3-409
25803 - Peer Discovery - Inconsistent Remote Host Port Assignment 3-410
25804 - Peer Discovery State Change 3-410
Platform (31000-32800) 3-411
31000 - S/W fault 3-411
31001 - S/W status 3-411
31002 - Process watchdog failure 3-412
31003 - Thread watchdog failure 3-412
31100 - Database replication fault 3-413
31101 - Database replication to slave failure 3-414
31102 - Database replication from master failure 3-415
31103 - DB replication update fault 3-416
31104 - DB replication latency over threshold 3-417
31105 - Database merge fault 3-417
31106 - Database merge to parent failure 3-418
31107 - Database merge from child failure 3-419
31108 - Database merge latency over threshold 3-420
31109 - Topology config error 3-420
31110 - Database audit fault 3-421
31111 - Database merge audit in progress 3-421
31112 - DB replication update log transfer timed out 3-422
31113 - DB replication manually disabled 3-422
31114 - DB replication over SOAP has failed 3-423
31115 - Database service fault 3-424
31116 - Excessive shared memory 3-424
31117 - Low disk free 3-425
31118 - Database disk store fault 3-425
31119 - Database updatelog overrun 3-426
31120 - Database updatelog write fault 3-426
31121 - Low disk free early warning 3-427
31122 - Excessive shared memory early warning 3-427
31123 - Database replication audit command complete 3-428
31124 - ADIC error 3-429
31125 - Database durability degraded 3-429
31126 - Audit blocked 3-430
31127 - DB replication audit complete 3-430
31128 - ADIC found error 3-431
31129 - ADIC found minor issue 3-431
31130 - Network health warning 3-432
31131 - DB ousted throttle behind 3-432
31132 - DB replication precedence relaxed 3-433
31133 - DB replication switchover exceeds threshold 3-433
31134 - DB site replication to slave failure 3-434
31135 - DB site replication from master failure 3-434
31136 - DB site replication precedence relaxed 3-435
31137 - DB site replication latency over threshold 3-436
31140 - Database perl fault 3-436
31145 - Database SQL fault 3-437
31146 - DB mastership fault 3-437
31147 - DB upsynclog overrun 3-438
31148 - DB lock error detected 3-438
31149 - DB late write nonactive 3-439
31150 - DB Health Impacted 3-440
31151 - DB Storage Persistent Failure 3-440
31200 - Process management fault 3-441
31201 - Process not running 3-441
31202 - Unkillable zombie process 3-442
31206 - Process mgmt monitoring fault 3-443
31207 - Process resource monitoring fault 3-443
31208 - IP port server fault 3-444
31209 - Hostname lookup failed 3-444
31213 - Process scheduler fault 3-445
31214 - Scheduled process fault 3-445
31215 - Process resources exceeded 3-446
31216 - SysMetric configuration error 3-447
31217 - Network health warning 3-447
31220 - HA configuration monitor fault 3-448
31221 - HA alarm monitor fault 3-448
31222 - HA not configured 3-449
31223 - HA heartbeat transmit failure 3-449
31224 - HA configuration error 3-450
31225 - HA service start failure 3-450
31226 - HA availability status degraded 3-451
31227 - HA availability status failed 3-452
31228 - HA standby offline 3-452
31229 - HA score changed 3-453
31230 - Recent alarm processing fault 3-454
31231 - Platform alarm agent fault 3-455
31232 - Late heartbeat warning 3-455
31233 - HA path down 3-456
31234 - Untrusted time upon initialization 3-456
31235 - Untrusted time after initialization 3-457
31236 - HA link down 3-458
31240 - Measurements collection fault 3-459
31250 - RE port mapping fault 3-459
31260 - SNMP agent 3-460
31261 - SNMP configuration error 3-460
31270 - Logging output 3-461
31280 - HA active to standby transition 3-461
31281 - HA standby to active transition 3-462
31282 - HA management fault 3-462
31283 - Lost communication with server 3-463
31284 - HA remote subscriber heartbeat warning 3-464
31285 - HA node join recovery entry 3-464
31286 - HA node join recovery plan 3-465
31287 - HA node join recovery complete 3-465
31288 - HA site configuration error 3-466
31290 - HA process status 3-466
31291 - HA election status 3-467
31292 - HA policy status 3-467
31293 - HA resource link status 3-468
31294 - HA resource status 3-468
31295 - HA action status 3-469
31296 - HA monitor status 3-469
31297 - HA resource agent info 3-470
31298 - HA resource agent detail 3-470
31299 - HA notification status 3-471
31300 - HA control status 3-471
31301 - HA topology events 3-472
31322 - HA configuration error 3-472
32100 - Breaker panel feed unavailable 3-473
32101 - Breaker panel breaker failure 3-473
32102 - Breaker panel monitoring failure 3-474
32103 - Power feed unavailable 3-475
32104 - Power supply 1 failure 3-475
32105 - Power supply 2 failure 3-476
32106 - Power supply 3 failure 3-476
32107 - Raid feed unavailable 3-477
32108 - Raid power 1 failure 3-477
32109 - Raid power 2 failure 3-478
32110 - Raid power 3 failure 3-478
32111 - Device failure 3-479
32112 - Device interface failure 3-479
32113 - Uncorrectable ECC memory error 3-481
32114 - SNMP get failure 3-482
32115 - TPD NTP daemon not synchronized failure 3-483
32116 - TPD server's time has gone backwards 3-484
32117 - TPD NTP offset check failure 3-486
32300 - Server fan failure 3-487
32301 - Server internal disk error 3-488
32302 - Server RAID disk error 3-489
32303 - Server Platform error 3-489
32304 - Server file system error 3-490
32305 - Server Platform process error 3-491
32306 - Server RAM shortage error 3-491
32307 - Server swap space shortage failure 3-492
32308 - Server provisioning network error 3-493
32309 - EAGLE network A error 3-494
32310 - EAGLE network B error 3-494
32311 - Sync network error 3-495
32312 - Server disk space shortage error 3-495
32313 - Server default route network error 3-496
32314 - Server temperature error 3-497
32315 - Server mainboard voltage error 3-498
32316 - Server power feed error 3-499
32317 - Server disk health test error 3-500
32318 - Server disk unavailable error 3-501
32319 - Device error 3-501
32320 - Device interface error 3-502
32321 - Correctable ECC memory error 3-503
32322 - Power supply A error 3-503
32323 - Power supply B error 3-504
32324 - Breaker panel feed error 3-505
32325 - Breaker panel breaker error 3-505
32326 - Breaker panel monitoring error 3-508
32327 - Server HA Keepalive error 3-510
32328 - DRBD is unavailable 3-510
32329 - DRBD is not replicating 3-511
32330 - DRBD peer problem 3-512
32331 - HP disk problem 3-512
32332 - HP smart array controller problem 3-513
32333 - HP hpacucliStatus utility problem 3-514
32334 - Multipath device access link problem 3-515
32335 - Switch link down error 3-515
32336 - Half open socket limit 3-516
32337 - Flash program failure 3-517
32338 - Serial mezzanine unseated 3-517
32339 - TPD max number of running processes error 3-518
32340 - TPD NTP daemon not synchronized error 3-518
32341 - TPD NTP daemon not synchronized error 3-520
32342 - NTP offset check error 3-521
32343 - TPD RAID disk 3-522
32344 - TPD RAID controller problem 3-522
32345 - Server upgrade snapshot(s) invalid 3-523
32346 - OEM hardware management service reports an error 3-524
32347 - The hwmgmtcliStatus daemon needs intervention 3-524
32348 - FIPS subsystem problem 3-525
32349 - File tampering 3-526
32350 - Security process terminated 3-526
32500 - Server disk space shortage warning 3-527
32501 - Server application process error 3-528
32502 - Server hardware configuration error 3-528
32503 - Server RAM shortage warning 3-529
32504 - Software configuration error 3-530
32505 - Server swap space shortage warning 3-530
32506 - Server default router not defined 3-531
32507 - Server temperature warning 3-532
32508 - Server core file detected 3-533
32509 - Server NTP daemon not synchronized 3-534
32510 - CMOS battery voltage low 3-535
32511 - Server disk self test warning 3-536
32512 - Device warning 3-536
32513 - Device interface warning 3-537
32514 - Server reboot watchdog initiated 3-537
32515 - Server HA failover inhibited 3-538
32516 - Server HA active to standby transition 3-539
32517 - Server HA standby to active transition 3-539
32518 - Platform health check failure 3-540
32519 - NTP offset check failure 3-540
32520 - NTP stratum check failure 3-541
32521 - SAS presence sensor missing 3-543
32522 - SAS drive missing 3-543
32523 - DRBD failover busy 3-544
32524 - HP disk resync 3-544
32525 - Telco fan warning 3-545
32526 - Telco temperature warning 3-546
32527 - Telco power supply warning 3-546
32528 - Invalid BIOS value 3-547
32529 - Server kernel dump file detected 3-548
32530 - TPD upgrade failed 3-548
32531 - Half open socket warning limit 3-549
32532 - Server upgrade pending accept/reject 3-549
32533 - TPD max number of running processes warning 3-550
32534 - TPD NTP source is bad warning 3-551
32535 - TPD RAID disk resync 3-552
32536 - TPD server upgrade snapshot(s) warning 3-552
32537 - FIPS subsystem warning event 3-553
32538 - Platform data collection error 3-554
32539 - Server patch pending accept/reject 3-554
32540 - CPU power limit mismatch 3-555
32700 - Telco switch notification 3-555
32701 - HIDS initialized 3-556
32702 - HIDS baseline deleted 3-556
32703 - HIDS enabled 3-556
32704 - HIDS disabled 3-557
32705 - HIDS monitoring suspended 3-557
32706 - HIDS monitoring resumed 3-557
32707 - HIDS baseline updated 3-558
Diameter Custom Applications (DCA) Framework Alarms and Events (33300-33630) 3-558
33300 - Create Application Version Failure 3-558
33301 - Update Config Data Failure 3-559
33302 - Delete Application Version Failure 3-559
33303 - UDR Event Queue Utilization 3-560
33304 - DCA Runtime Errors 3-560
33305 - DCA Procedure Not Found 3-561
33307 - Diameter Message Routing Failure Due To Full DRL Queue 3-562
33308 - DCA to UDR ComAgent Error 3-563
33309 - DCA Script Compilation Error 3-563
33311 - DCA Application Reloaded 3-564
33312 - DCA Script Generation Error 3-564
33315 - DCA Asynchronous Task Stops Processing 3-565
33316 - DCA AsyncTask Queue Utilization 3-565
33317 - DCA Fetch Log Error 3-566
33318 - DCA CreateAndSend Request Message Send Failed 3-566
DCA Custom MEAL Event Templates 3-567
33330-33429 - DcaCustomMeal.name + "Alrm" 3-567
33430-33630 - DcaCustomMeal.name + "Alrm" 3-567
Independent SBR Alarms and Events (12003-12010, 33730-33830) 3-568
12003 - SBR congestion state 3-568
12007 - SBR active sess binding threshold 3-568
12010 - SBR proc term 3-569
33730 - U-SBR database audit statistics report 3-570
vSTP Alarms and Events (70000-70060, 70100-70999) 3-571
70000 - Association Down 3-571
70001 - Link Down 3-571
70002 - RSP/Destination Unavailable 3-572
70003 - RSP/Destination Route Unavailable 3-573
70004 - Linkset Unavailable 3-573
70005 - Link Unavailable 3-574
70006 - Preferred Route Unavailable 3-574
70007 - Node Isolated - All Links Down 3-575
70008 - Linkset Restricted 3-576
70009 - Link Congested 3-576
70050 - SCTP Connection Refused 3-577
70051 - Failed to Configure Transport 3-578
70052 - Far-end Closed the Connection 3-578
70053 - SCTP Connection Closed 3-579
70054 - Remote IP Address State Change 3-579
70055 - Association Admin State Change 3-580
70056 - Link Admin State Change 3-580
70057 - Received Invalid M3UA Message 3-581
70058 - Received M3UA ERROR 3-582
70059 - Failed to Send DATA Message 3-583
70060 - TFP Received 3-584
70061 - TFA Received 3-584
70062 - TFR Received 3-584
70063 - TFC Received 3-585
70064 - MTP3 Routing Error 3-585
70065 - MTP3 Routing Error - Invalid NI 3-586
70066 - MTP3 Routing Error - Invalid SI 3-586
70067 - Failed to Receive DATA Message 3-587
70068 - vSTP EIR Application Status Changed 3-588
70069 - TCAP Invalid Parameter or Decode Failure 3-588
70070 - Message Encode Failed 3-589
70071 - Missing IMEI 3-589
70072 - Invalid IMEI Length 3-590
70073 - Unsupported TCAP Message Type 3-590
70075 - vSTP LSS Stack Event Queue Utilization 3-591
70076 - vSTP Logging Stack Event Queue Utilization 3-591
70077 - vSTP EIR Log Fetch Error 3-592
70078 - vSTP EIR Logging Error in MP 3-592
70079 - M3UA Ingress Message Discarded 3-593
70081 - vSTP M3RL Linkset Buffer Utilization 3-594
70082 - vSTP M3RL RSP Buffer Utilization 3-594
70083 - vSTP M2PA Retransmission Buffer Utilization 3-595
70084 - vSTP MTP2 Transmission and Retransmission Buffer Utilization 3-595
70091 - Missing Mandatory Parameter 3-596
70092 - Malformed Subscriber ID 3-597
70093 - Unexpected Value for Subscriber ID 3-597
70094 - Invalid MSISDN Length 3-597
70095 - ATINP Invalid Requested Info 3-598
70096 - Digits Truncated in Encoded Parameter 3-598
70100 - ATINP Application Status Changed 3-599
70101 - Transmission Association Queue Congestion Crossed 3-599
70102 - MTP3 Ingress Link MSU TPS Crossed 3-600
70103 - MTP3 Egress Link MSU TPS Crossed 3-601
70104 - MTP3 Ingress Link Management TPS Crossed 3-601
70105 - Transmission Association Queue Discard Crossed 3-602
70107 - vSTP SCCP Stack Event Queue Utilization 3-603
70108 - vSTP M3RL Stack Event Queue Utilization 3-603
70109 - vSTP M3RL Network Management Event Queue Utilization 3-604
70110 - vSTP M3UA Stack Event Queue Utilization 3-604
70111 - vSTP M2PA Stack Event Queue Utilization 3-605
70112 - vSTP M3UA Tx Stack Event Queue Utilization 3-605
70201 - M2PA link operational state changed 3-606
70202 - M2PA Link Failed 3-606
70203 - M2PA Ingress Message Discarded 3-607
70204 - M2PA Egress Message Discarded 3-607
70205 - M2PA Message Encoding Failed 3-608
70206 - M2PA Message Decoding Failed 3-608
70207 - M2PA Proving Period Timer Expired 3-609
70208 - M2PA Remote Congestion Timer(T6) Expired 3-609
70209 - Received Remote Processor Outage 3-610
70210 - Received Remote Out of Service 3-610
70220 - MTP2 Link admin state change 3-611
70221 - Failed to send message to TDM driver 3-611
70222 - Failed to receive message from TDM driver 3-612
70223 - MTP2 link operational state changed 3-612
70224 - MTP2 link failed 3-613
70225 - MTP2 Ingress message discarded 3-613
70226 - MTP2 Egress message discarded 3-614
70227 - Received Remote Out Of Service on MTP2 link 3-614
70251 - Subsystem Congested 3-615
70252 - Subsystem Prohibited 3-615
70210 - Received Remote Out of Service 3-616
70271 - SCCP Received Invalid Message 3-616
70272 - SCCP Message Translation Failed 3-617
70273 - SCCP Message Routing Failed 3-617
70274 - SGMG Message Invalid 3-618
70275 - GTT SCCP Loop Detected 3-618
70276 - GTT Load Sharing Failed 3-619
70277 - GTT Action Discard MSU 3-619
70278 - GTT Action Failed 3-620
70279 - GTT MBR Duplicate Set Type Failed 3-620
70280 - GTT MBR Duplicate Set Type Warning 3-621
70281 - GTT FLOBR Duplicate Set Name Failed 3-621
70282 - GTT FLOBR Duplicate Set Name Warning 3-622
70283 - GTT FLOBR Max Search Depth Failed 3-622
70284 - GTT FLOBR Max Search Depth Warning 3-623
70285 - MBR Decoding Failed 3-623
70286 - GTT Duplicate Action Processing Stopped 3-624
70291 - XUDT UDT Conversion Failed. 3-624
70292 - SCCP Encode Failure 3-625
70293 - SFAPP Decode Error 3-625
70294 - SFAPP Validation Matching State not found 3-626
70295 - SFAPP Validation Encoding Error 3-626
70296 - SFAPP Validation Response Timeout Error 3-627
70297 - SFAPP Validation Velocity Chk Failed. 3-627
70298 - SFAPP Validation Failed 3-628
70299 - SFAPP Invalid CC/NDC received 3-628
70300 - Updation failed in UDR 3-628
70301 - VSTP SFAPP Stack Event Queue Utilization 3-629
70302 - Invalid Length of Conditioned Digits 3-629
70303 - Conv to Intl Num - Dflt NC Not Found 3-630
70304 - MNP Circular Route Detected 3-630
70305 - Translation PC Type is ANSI 3-631
70306 - Invalid Digits in MAP MSISDN Parameter 3-631
70307 - Invalid Prefix/Suffix Digit Length 3-631
70308 - Translation PC is Local Point Code 3-632
70309 - ANSI Translation Not Supported 3-632
70310 - Too many digits for DRA parameter 3-633
70311 - IDPR CGPN encoding failed 3-633
70312 - IDPR CDPN encoding failed 3-634
70313 - IDPRCDPN(X) NPP SERVICE is OFF 3-634
70314 - IDPRCGPN NPP SERVICE is OFF 3-634
70315 - DESTINATION ADDRESS DECODING is FAIL 3-635
70316 - TCAP ENCODING is FAIL 3-635
70317 - OUT OF BOUND DIGIT 3-636
70318 - SMS MANDATORY PARAMETER MISSING 3-636
70319 - ADDRESS DECODING is FAIL 3-637
70320 - MNPCDPA MATCHES HOME SMSC 3-637
70331 - SCCP XUDT Reassembly Failure 3-638
70332 - SCCP XUDT Segmentation Failure 3-638
70351 - vSTP Maintenance Leader HA Notification to Go Active 3-638
70352 - vSTP Maintenance Leader HA notification to GO OOS 3-639
70353 - Routing DB Inconsistency Exists 3-639
70354 - vSTP DB Table Monitoring Overrun 3-640
70355 - vSTP DB Table Monitoring Error 3-640
70356 - Failed to Process Ingress MSU: Peer MP Unavailable or Congested 3-641
70371 - No vSTP-MP Leader Detected 3-641
70372 - Multiple vSTP-MP Leader Detected 3-642
70373 - Connection Alarm Aggregation Threshold Reached 3-642
70374 - Link Alarm Aggregation Threshold Reached 3-643
70375 - Linkset Alarm Aggregation Threshold Reached 3-643
70376 - Route Alarm Aggregation Threshold Reached 3-644
70377 - RSP Alarm Aggregation Threshold Reached 3-644
70378 - SLTC Failure 3-645
70379 - Unexpected TFA Received 3-645
70380 - Unexpected TFR Received 3-646
70381 - Unexpected TFP Received 3-646
70382 - Unexpected TFC Received 3-647
70383 - Invalid H0 H1 Code 3-648
70384 - TFC Generated 3-648
70385 - Change Over Order Performed 3-649
70386 - Emergency Change Over Performed 3-649
70387 - Changeback Timer Expired 3-649
70388 - UPU Received 3-650
70389 - Remote Blocked 3-650
70290 - RSP/Destination Restricted 3-651
70391 - RSP/Destination Route Restricted 3-652
70392 - MSU Failed MTP Screening 3-652
70411 - ANSI to ITU CDPA GT Conversion Failure 3-653
70401 - ANSI to ITU CGPA GT Conversion Failure 3-653
70402 - ITU to ANSI CDPA GT Conversion Failure 3-654
70403 - ITU to ANSI CGPA GT Conversion Failure 3-654
70404 - Affected PC Conversion Failure 3-655
70404 - OPC Conversion Failed 3-655
70406 - Conversion Failed. CGPA PC Alias Undefined 3-656
70407 - Conversion MSU Discard. SCCP MSU Too Large 3-656
70408 - Conversion MSU Discard. Invalid Segmentation Parameters 3-657
70409 - Conversion Failed. Incorrect SCCP Parameter Length 3-657
70410 - MTP3 Circular Loop Detected 3-658
70411 - Conversion MSU Discard. Invalid SCMG Message Type 3-658
70416 - SCCP Application MSU Discarded 3-659
70418 - Sccp Egress Tps Threshold Crossed 3-659
70420 - Unsupported ACN Object ID Length 3-660
70421 - Failed to Decode TCAP Parameters 3-660
70422 - INAP Called Party Number is Missing 3-660
70423 - Unexpected SI in TIF Stop Action 3-661
70424 - Modified MSU too large to route 3-661
70425 - ISUP IAM Decode Failed 3-662
70425 - ISUP IAM Decode Failed 3-662
70427 - ISUP Encode Failed 3-663
70428 - TIF CgPN NS Failure: CC mismatch in DN 3-663
70429 - VLR Status changed 3-664
70430 - Velocity Threshold Crossed 3-664
70431 - Dynamic VLR Profile Aging 3-664
70432 - Dynamic VLR Roaming Aging 3-665
70433 - Vstp Dynamic learning is turned OFF 3-665
70434 - Vstp Dynamic learning LEARN Mode Timer Expired 3-666
70435 - Vstp Dynamic learning Profile Table Full 3-666
70436 - Vstp Dynamic learning Roaming Table Full 3-667
70437 - VSTP Security Logging Stack Event Queue Utilization 3-667
70438 - Vstp Security Logging Error in MP 3-668
70439 - Vstp Security Log Fetch Error 3-668
70440 - Vstp Security Log Fetch Error at Remote Server 3-668
70446 - VstpServiceStackEventQueueUtil 3-669
70451 - serviceMpUnavailable 3-669
70458 - Transaction Not Found for Ack. 3-670
70454 - SMS Proxy SCCP Validation Failed 3-670
70448 - SMS Proxy Message Validation Response Timeout Error 3-671
70447 - Service Validation Failed 3-671
70450 - SMS Proxy Message Validation Encoding Error 3-672
70450 - Service Validation Decoding Error 3-672
70453 - SMS Proxy GT address blocked 3-672
70452 - SMS Proxy GT address allowed. 3-673
70456 - Service DOS Timer Timeout 3-673
70455 - Service MTFSM Invoke Timer Timeout. 3-674
Diameter Equipment Identity Register (EIR) (71000-71999) 3-674
71000 - EIR Message Decoding Failure 3-674
71001 - ECA Routing Attempt Failed 3-675
71002 - EIR Message Encoding Failure 3-675
71003 - EIR Application Unavailable 3-676
71004 - UDR DB Connection Error 3-676
71005 - EIR TPS Exceeded 3-677
71006 - EIR Logging Suspended 3-677
71007 - EIR Request Queue Utilization 3-678
71008 - EIR UDR Response Queue Utilization 3-678
71009 - EIR Application Congested 3-679
71010 - ComAgent Registration Failure 3-679
71011 - Fetch Log Failed at SO 3-680
vSTP KPIs 4-17
List of Figures
2-1 Flow of Alarms 2-2
2-2 Alarm Indicators Legend 2-3
2-3 Trap Count Indicator Legend 2-3
3-1 Breaker Panel LEDs 3-506
3-2 Breaker Panel Setting 3-507
List of Tables
2-1 Alarm/Event ID Ranges 2-4
2-2 Alarm and Event Types 2-5
2-3 Active Alarms Elements 2-6
2-4 Schedule Active Alarm Data Export Elements 2-7
2-5 Graphical information components 2-11
2-6 Schedule Event Data Export Elements 2-13
2-7 Data Export Elements 2-17
2-8 Active Tasks Elements 2-20
2-9 Active Tasks Report Elements 2-23
2-10 Scheduled Tasks Elements 2-24
3-1 Parameter Table 3-160
3-2 Parameter Table 3-162
4-1 KPIs Server Elements 4-1
4-2 Schedule KPI Data Export Elements 4-3
4-3 CAPM KPIs 4-5
4-4 Communication Agent KPIs 4-5
4-5 DCA Custom MEAL KPIs 4-5
4-6 DCA Framework KPIs 4-6
4-7 DIAM KPIs 4-6
4-8 DP KPIs 4-6
4-9 Diameter EIR KPIs 4-7
4-10 SS7 EIR KPIs 4-8
4-11 IDIH KPIs 4-9
4-12 IPFE KPIs 4-9
4-13 MP KPIs 4-9
4-14 FABR KPIs 4-10
4-15 Platform KPIs 4-10
4-16 PCA KPIs 4-11
4-17 Process-based KPIs 4-11
4-18 Provisioning KPIs 4-12
4-19 RBAR KPIs 4-13
4-20 Non Arrayed KPIs 4-14
4-21 Arrayed KPIs 4-14
4-22 SS7/Sigtran KPIs 4-15
4-23 SBR KPIs 4-15
4-24 SBR-Binding KPIs 4-15
4-25 SBR-Session KPIs 4-16
4-26 U-SBR KPIs 4-16
4-27 vSTP KPIs 4-17
1
Introduction
This section contains an overview of the available information for DSR alarms
and events. The contents include sections on the scope and audience of the
documentation, as well as how to receive customer support assistance.
Revision History
Date            Description
July 2020       Added new vSTP alarms: 70437, 70438, 70439, and 70440.
August 2020     Added a new DCA alarm: 33315.
September 2020  Removed the following sections:
                • DM-IWF (33000-33024)
                • MD-IWF (33050-33099)
                • GLA (33100-33149)
                • MD-IWF KPIs
                • DM-IWF KPIs
                • GLA KPIs
                • 33306 - U-SBR Resolution Failure
                • 33310 - U-SBR Sub-resource Unavailable
                • 33313 - DCA U-SBR Logical Name Mismatch
                Updated the following sections:
                – 33303 - UDR Event Queue Utilization
                – 33305 - DCA Procedure Not Found
                – 33308 - DCA to UDR ComAgent Error
                – 33430-33630 - DcaCustomMeal.name + "Alrm"
November 2020   • Added a note about the KPI column name in the Exporting KPIs section.
                • Added descriptions of Disk and Shared memory to the following sections:
                  – KPIs Overview
                  – Table 4-1
May 2021        Added a new event in the 22082 - RouteList is not Provisioned in System
                Options section as part of the Mobile Private Network vDRA (MPN vDRA)
                feature.
Overview
The DSR Alarms and KPIs documentation provides information about DSR alarms,
events, and KPIs, along with corrective maintenance procedures and other
information used to maintain the system. This book contains the following:
Scope and Audience
Manual Organization
Information in this document is organized into the following sections:
• Introduction contains general information about this document and how to contact
My Oracle Support.
• Alarms, Events, and KPIs Overview provides general information about the
application's alarms, events, and KPIs.
• Alarms and Events provides information and recovery procedures for alarms and
events, organized first by alarm category, and then numerically by the number that
displays in the application.
• Key Performance Indicators (KPIs) provides detailed KPI information, organized
alphabetically by KPI name.
My Oracle Support
My Oracle Support (https://support.oracle.com) is your initial point of contact for all
product support and training needs. A representative at Customer Access Support can
assist you with My Oracle Support registration.
Call the Customer Access Support main number at 1-800-223-1711 (toll-free in the
US), or call the Oracle Support hotline for your local country from the list at
http://www.oracle.com/us/support/contact/index.html. When calling, make the
selections in the sequence shown below on the Support telephone menu:
1. Select 2 for New Service Request.
2. Select 3 for Hardware, Networking and Solaris Operating System Support.
3. Select one of the following options:
• For Technical issues such as creating a new Service Request (SR), select 1.
2
Alarms, Events, and KPIs Overview
This section provides general information about the application's alarms, events, and
KPIs.
Alarms Warning
Note:
For the most up-to-date information, refer to the MIB document posted with
each software release on the Oracle Software Delivery Cloud (OSDC) site.
Note:
Alarms in this manual are shared with other applications and may not display
in your specific application.
General alarms and events information
Note:
Some events may be throttled because frequently generated events can
overload the MP or OAM server's system or event history log (for example,
generating an event for every ingress message failure). When a throttle
interval (in seconds) is specified, the event is logged no more than once
during each interval (for example, if the throttle interval is 5 seconds,
the event is logged no more than once every 5 seconds).
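The throttling behavior described in this note can be illustrated with a minimal Python sketch. It is not the product's implementation: the class name, the method name, and the choice to measure the interval from the last logged instance are assumptions made for illustration only.

```python
class EventThrottle:
    """Hypothetical helper: log an event ID at most once per throttle interval."""

    def __init__(self, interval_seconds):
        self.interval_seconds = interval_seconds
        self._last_logged = {}  # event ID -> time (seconds) of the last logged instance

    def should_log(self, event_id, now):
        last = self._last_logged.get(event_id)
        if last is not None and now - last < self.interval_seconds:
            return False  # still inside the throttle interval: suppress
        self._last_logged[event_id] = now
        return True


# Six failures arrive within 12 seconds; the throttle interval is 5 seconds.
throttle = EventThrottle(interval_seconds=5)
arrival_times = [0, 1, 4, 5, 6, 11]
logged = [t for t in arrival_times if throttle.should_log("ingress-failure", t)]
print(logged)  # [0, 5, 11]
```

Only three of the six instances reach the log, which is the effect the throttle interval is intended to have on the event history.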
Figure 2-1 shows how Alarms and Events are organized in the application.
Alarms and events are recorded in a database log table. Application event logging
provides an efficient way to record event instance information in a manageable form,
and is used to:
• Record events representing alarmed conditions
• Record events for later browsing
• Implement an event interface for generating SNMP traps
Alarm indicators, located in the User Interface banner, indicate all critical, major, and
minor active alarms. A number and an alarm indicator combined represent the number
of active alarms at a specific level of severity. For example, if you see the number six
in the orange-colored alarm indicator, it means there are six major active alarms. This
is shown in Figure 2-2 and Figure 2-3.
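As a rough illustration of how the banner numbers relate to the active alarm list, the following Python sketch counts active alarms per severity. The record layout and the sample alarms are hypothetical; only the three severity levels come from this document.

```python
from collections import Counter

# Severity levels shown by the alarm indicators in the banner.
SEVERITIES = ("Critical", "Major", "Minor")


def indicator_counts(active_alarms):
    """Return the number shown in each indicator for a list of active alarms."""
    counts = Counter(alarm["severity"] for alarm in active_alarms)
    return {severity: counts.get(severity, 0) for severity in SEVERITIES}


# Hypothetical sample data: six major alarms and one critical alarm.
sample = [{"alarm_id": n, "severity": "Major"} for n in range(6)]
sample.append({"alarm_id": 6, "severity": "Critical"})

print(indicator_counts(sample))
```

With this sample, the Major indicator would show six, matching the example in the text.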
Note:
The value in the Instance field can vary, depending on the process
generating the alarm.
Note:
Some alarms and events have an Auto Clear Seconds value of 0 (zero),
which indicates that they do not auto-clear.
Note:
Not all applications use all of the alarm types listed.
Note:
The alarms and events that appear in View Active vary depending on
whether you are logged into an NOAM or SOAM. Alarm collection is handled
solely by NOAM servers in systems that do not support SOAMs.
If the selected export frequency is fifteen minutes or hourly, this is the minute of
each period when the transfer is set to begin. For an export frequency of fifteen
minutes, transfers occur four times per hour, and this field displays the minute of
the first transfer of the hour.
9. Select the Time of Day if Export Frequency is daily or weekly.
This field is not active if the selected export frequency is once, fifteen minutes, or
hourly.
10. Select the Day of Week if Export Frequency is weekly.
This field is not active if the selected export frequency is once, fifteen minutes,
hourly, or daily.
11. Click OK to initiate the active alarms export task or Cancel to discard the changes
and return to the View Active page.
The data export task is initiated or scheduled.
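The scheduling rules in the steps above can be summarized in a short Python sketch. The function names and the frequency strings are hypothetical, and wrapping the minute value into the hour is an assumption; the GUI may restrict the values it accepts.

```python
def transfer_minutes(frequency, minute):
    """Minutes of each hour at which a transfer begins (illustrative only)."""
    if not 0 <= minute <= 59:
        raise ValueError("minute must be between 0 and 59")
    if frequency == "hourly":
        return [minute]
    if frequency == "fifteen minutes":
        # Four transfers per hour, 15 minutes apart; the smallest value is
        # the first transfer of the hour.
        return sorted((minute + 15 * k) % 60 for k in range(4))
    raise ValueError("minute applies only to fifteen-minute or hourly exports")


def active_fields(frequency):
    """Which of Time of Day and Day of Week are active for a frequency."""
    fields = set()
    if frequency in ("daily", "weekly"):
        fields.add("Time of Day")
    if frequency == "weekly":
        fields.add("Day of Week")
    return fields


print(transfer_minutes("fifteen minutes", 5))  # [5, 20, 35, 50]
print(sorted(active_fields("weekly")))         # ['Day of Week', 'Time of Day']
```

For a frequency of once, fifteen minutes, or hourly, `active_fields` returns an empty set, which corresponds to both fields being inactive.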
From the Status & Manage, and then Files page, you can view a list of files
available for download, including the file you exported during this procedure. For more
information, see View the File List.
Scheduled tasks can be viewed, edited, and deleted, and reports of scheduled tasks
can be generated from Status & Manage, and then Tasks, and then Scheduled
Tasks. For more information see:
• Editing a Scheduled Task
• Deleting a Scheduled Task
• Generating a Scheduled Task Report
Note:
Only one export operation at a time is supported on a single server. If an
export is in progress from another GUI session when you click Export, a
message is displayed and the export does not start. You must wait until the
other export is complete before you can begin your export.
Note:
Server is both a topology component and a data field in the active alarm
data grid display.
The graphs for the selected components display above the tabbed area.
4. To adjust the graph viewing area, click and hold the slider above the graph while
adjusting the proportions with the mouse.
5. To remove one or more graphs, de-select the choices from the Graph list.
If only some choices are deselected, the deselected graphs disappear. If all
choices are deselected, the graph display disappears.
more focused, quick look at the alarms. The quick filter selection(s) are not persistent.
The quick filter settings are cleared once the user browses away from the View Active
Alarms page.
Quick filter selections from the graph are applied to the grid and all graphs displayed
within the current Server Group tab of the View Active Alarms page. For example, if
the portion of the stacked bar graph that displays the critical alarms is selected, the
grid filters to show critical platform alarms and the summary statistics are recalculated
to adjust the graphs. If additional portions of the graphs are selected, both the grid and
the graphs continue to be filtered according to the selections.
Note:
Although the quick filter is applied to the grid display, the quick filter criteria
are not applied to generated Reports and Exports of active alarm data. Use
the Filter list in the toolbar to filter the data.
Once active alarms have been graphed, use this procedure to apply a quick filter to
active alarms in a server group:
1. To add a quick filter, select a portion of the stacked bar graph to filter. The stacked
bar displays lists of active alarms by the alarm severity.
Note:
Alarm severity types are displayed using the following color distinctions:
• Critical - Red
• Major - Orange
• Minor - Yellow
Upon selection, the filtered graph portion displays green to indicate it is being used
as a filter.
2. Repeat the previous step as needed to filter additional portions of the remaining
bar graphs.
3. To remove all quick filtering selections from the active Server Group tab, click
Clear Selections.
The display grid and all graphs display with no quick filtering.
4. To remove individual quick filtering selections from the active Server Group tab,
select the portion of the stacked bar graph displayed in green.
The display grid and all graphs recalculate based on the remaining selections.
Note:
The alarms and events that appear in View History vary depending on
whether you are logged in to an NOAM or SOAM. Alarm collection is handled
solely by NOAM servers in systems that do not support SOAMs.
Note:
Some fields, such as Additional Info, truncate data to a limited number
of characters. When this happens, a More link displays. Click More to
display a report with all relevant data.
Historical alarms and events are displayed according to the specified criteria.
The historical alarms table updates automatically. When new historical data is
available, the table is automatically updated, and the view returns to the top row of
the table.
3. To suspend automatic updates, click any row in the table.
The (Alarm updates are suspended.) message displays.
If a new alarm is generated while automatic updates are suspended, the (Alarm
updates are suspended. Available updates pending.) message
displays.
To resume automatic updates, press and hold Ctrl as you click to deselect the
selected row.
Note:
Time of Day is not an option if Export Frequency equals Once.
Note:
Day of Week is not an option if Export Frequency equals Once.
11. Click the link in the green message box to go directly to the Status & Manage,
and then Files page.
From the Status & Manage, and then Files page, you can view a list of files
available for download, including the alarm history file you exported during this
procedure. For more information, see Opening a File.
Opening a File
Use this procedure to open a file stored in the file management storage area.
1. Select Status & Manage, and then Files.
2. Select an NE Name.
3. Click List Files.
The Status & Manage Files list page for the selected network element displays all
files stored in its file management storage area.
4. Click the Filename of the file to be opened.
5. Click Open to open the file.
Data Export
From the Data Export page, you can set an export target to receive exported
performance data. Several types of performance data can be filtered and exported
using this feature. For more information about how to create data export tasks, see:
• Export Active Alarms
• Exporting alarm and event history
• Exporting KPIs
From the Data Export page, you can manage file compression strategy and schedule
the frequency with which data files are exported.
Note:
Depending on the OS and implementation of the remote server, you may
need to define the path to the rsync binary on the export server,
although this is uncommon. If no path is specified, the username's home
directory on the export server is used.
6. Select whether to enable the transfer of the backup file. To leave the backup
disabled, do not check the box.
7. Select the File Compression type.
8. Select the Upload Frequency.
9. If you selected hourly for the upload frequency, select the Minute intervals.
10. If you selected daily or weekly for the upload frequency, select the Time of Day.
11. If you selected weekly for the upload frequency, select the Day of the Week.
12. If public keys were manually placed on the Export server, skip to step 14.
Otherwise, click Exchange SSH Key to transfer the SSH keys to the Export
server.
13. Enter the password.
The server attempts to exchange keys with the export server currently defined on
the page. After the SSH keys are successfully exchanged, continue with the next
step.
14. Click OK to apply the changes or Cancel to discard the changes.
The export server is now configured and available to receive performance and
configuration data.
15. You may optionally click Test Transfer to confirm the ability to export to the server
currently defined on the page.
The user can monitor the progress of the task by selecting the Tasks drop down
list in the page control area.
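The scheduling rules in steps 9 through 11 can be summarized as a small lookup. The following Python sketch is illustrative only: the function name and the lowercase frequency strings are assumptions made for this example and are not part of the product.

```python
def required_schedule_fields(upload_frequency):
    """Return the scheduling fields to fill in for an upload frequency.

    Hypothetical helper; it only mirrors steps 9 through 11 of the procedure.
    """
    frequency = upload_frequency.strip().lower()
    fields = []
    if frequency == "hourly":
        fields.append("Minute")            # step 9
    if frequency in ("daily", "weekly"):
        fields.append("Time of Day")       # step 10
    if frequency == "weekly":
        fields.append("Day of the Week")   # step 11
    return fields


if __name__ == "__main__":
    for freq in ("hourly", "daily", "weekly"):
        print(freq, "->", required_schedule_fields(freq))
```

A weekly upload therefore needs both a Time of Day and a Day of the Week, while an hourly upload needs only the Minute interval.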
Tasks
The Tasks pages display the active, long running tasks and scheduled tasks on a
selected server. The Active Tasks page provides information such as status, start time,
progress, and results for long running tasks, while the Scheduled Tasks page provides
a location to view, edit, and delete tasks scheduled to occur.
Active Tasks
The Active Tasks page displays the long running tasks on a selected server. The
Active Tasks page provides information such as status, start time, progress, and
results, all of which can be generated into a report. Additionally, you can pause,
restart, or delete tasks from this page.
Deleting a task
Use this procedure to delete one or more tasks.
1. Click Status & Manage, and then Tasks, and then Active Tasks.
2. Select a server.
Note:
Hovering the cursor over any tab displays the name of the server.
Note:
To delete a single task or multiple tasks, the status of each task selected
must be one of the following: completed, exception, or trapped.
Note:
You can select multiple rows to delete at one time. To select multiple
rows, press and hold Ctrl as you click to select specific rows.
4. Click Delete.
5. Click OK to delete the selected task(s).
Note:
Hovering the cursor over any tab displays the name of the server.
Pausing a task
Use this procedure to pause a task.
1. Click Status & Manage, and then Tasks, and then Active Tasks.
2. Select a server.
Note:
Hovering the mouse over any tab displays the name of the server.
Note:
A task may be paused only if the status of the task is running.
4. Click Pause.
A confirmation box appears.
5. Click OK to pause the selected task.
For information about restarting a paused task, see Restarting a task.
Restarting a task
Use this procedure to restart a task.
1. Click Status & Manage, and then Tasks, and then Active Tasks.
2. Select a server.
Note:
Hovering the mouse over any tab displays the name of the server.
Note:
A task may be restarted only if the status of the task is paused.
4. Click Restart.
A confirmation box appears.
5. Click OK to restart the selected task.
The selected task is restarted.
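The status rules stated in the notes of the deleting, pausing, and restarting procedures can be summarized in one table. The following Python sketch is a reading aid only: the names are hypothetical, and it covers only the statuses named in those notes, so other statuses may exist in the product.

```python
# Allowed actions per task status, taken from the procedure notes:
# delete requires completed, exception, or trapped; pause requires
# running; restart requires paused.
ALLOWED_ACTIONS = {
    "completed": {"delete"},
    "exception": {"delete"},
    "trapped": {"delete"},
    "running": {"pause"},
    "paused": {"restart"},
}


def can_apply(action, status):
    """Return True if the action is permitted for a task in this status."""
    return action in ALLOWED_ACTIONS.get(status.lower(), set())


def can_delete_all(statuses):
    """Return True only if every selected task is in a deletable status."""
    statuses = list(statuses)
    return bool(statuses) and all(can_apply("delete", s) for s in statuses)


if __name__ == "__main__":
    print(can_apply("pause", "running"))             # a running task can be paused
    print(can_delete_all(["completed", "running"]))  # one task is still running
```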
Note:
Hovering the mouse over any tab displays the name of the server.
Note:
If no tasks are selected, all tasks matching the current filter criteria are
included in the report.
4. Click Report.
5. Click Print to print the report.
6. Click Save to save the report.
Scheduled Tasks
The periodic export of certain data can be scheduled through the GUI. The Scheduled
Tasks page provides you with a location to view, edit, delete, and generate reports
of these scheduled tasks. For more information about the types of data that can be
exported, see:
• Export Active Alarms
• Exporting alarm and event history
• Exporting KPIs
Note:
If no tasks are selected, all tasks matching the current filter criteria are
included in the report.
3. Click Report.
4. Click Print to print the report.
5. Click Save to save the report.
3
Alarms and Events
This section provides general alarm/event information, and lists the types of alarms
and events that can occur on the system. Alarms and events are recorded in a
database log table. Currently active alarms can be viewed from Alarms & Events, and
then View Active. The alarms and events log can be viewed from the View History
option.
Note:
Some of the alarms in this document are shared with other applications and
may not appear in this particular product.
Description:
The IPFE has not received any heartbeats from an application server within the
heartbeat timeout interval.
Severity:
Minor
Instance:
IP address of the application server.
Note:
If a heartbeat is received from the application server, this alarm clears.
HA Score:
Normal
3-1
Chapter 3
IP Front End, IPFE (5000-5999)
OID:
ipfeIpfeBackendUnavailableNotify
Cause:
A DA-MP is not sending heartbeats to the IPFE.
Diagnostic Information:
Use Wireshark to check whether the DAMP is sending heartbeats to the IPFE.
Follow these steps to diagnose the issue:
1. From the SO GUI, navigate to IPFE, and then Configuration, and then Target
Sets, and then TSA#, and then +. At least one DAMP server XSI IP should
be present.
If yes, go to step 2.
2. Log into the IPFE server.
a. ping <the DAMP server XSI IP>
b. telnet <the DAMP server XSI IP> <monitoring port, default 9675>
If steps a or b fail, go to step 3.
3. ssh admusr@<DAMP server XMI>.
a. Run the sudo netstat -anop | grep <monitoring port, default
9675> command to see if there is a TCP listen socket on that DAMP XSI IP.
If yes, check the DAMP XSI network (hardware and software).
If no, check the configuration of the DAMP.
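Where telnet is not available, the reachability check in step 2 can be approximated with a short script. This is a minimal sketch that uses only the Python standard library; the function name is hypothetical, and the default port 9675 is the monitoring port default quoted in the steps above.

```python
import socket


def tcp_port_reachable(host, port=9675, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    import sys

    # Usage: python check_port.py <DAMP server XSI IP>
    if len(sys.argv) > 1:
        print(tcp_port_reachable(sys.argv[1]))
```

A False result corresponds to a failed telnet in step 2b and leads to step 3.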
Recovery:
1. Check the status of the application servers by navigating to the Status & Manage,
and then Server page.
2. Consult the application server's documentation for recovery steps.
3. If the application server is functioning, check for network connectivity issues
between the IPFE and the application server.
4. It is recommended to contact My Oracle Support if further assistance is needed.
Description:
This alarm indicates misconfiguration due to manual changes to the configuration
database, configuration data importing errors, or software installation errors. In
general, this error is caused by IPFE IP addresses being incorrectly configured.
Severity:
Critical
Instance:
Description of the field or fields that are incorrect.
Note:
If the IPFE is able to successfully synchronize data with its peer, this alarm
clears.
HA Score:
Normal
OID:
ipfeIpfeStateSyncConfigErrorNotify
Cause:
This alarm is raised if the IPFE IP addresses are configured incorrectly.
• IPFE-A/B: Only one sync address may be local - two addresses that both
correspond to an interface on the same blade have been entered
• IPFE-A/B: Peer software version incompatible - the peer IPFE is on a different
version
Diagnostic Information:
Collect the following data before contacting My Oracle Support for assistance:
• iqt -pE Network>Network_$(hostname)
• iqt -pE L3Interface>L3Interface_$(hostname)
• Screenshot of Configuration, and then Network, and then Devices, and then
<All IPFE Server Tab>.
• iqt -pE IpfeOption>IpfeOption_$(hostname)
• iqt -pE IpListTsa>IpListTsa_$(hostname)
• Screenshot of IPFE, and then Configuration, and then Options.
• tr.cat ipfe.STK>ipfeSTK_$(hostname)
• ifconfig>ifconfig_$(hostname)
Recovery:
1. To correct configuration errors:
• Read the alarm text. It should contain sufficient information to
diagnose the configuration error.
• Navigate to IPFE, and then Configuration, and then Options.
• Check the IPFE-A1 and IPFE-A2 IP addresses. If all four IPFE servers are
deployed, also check the IPFE-B1 and IPFE-B2 IP addresses. Select the
INTERNALIMI IP address here. All servers must use the same IP type.
• Check the State Sync TCP Port. Use the default, 19041, if possible.
2. Ping the local IMI IP address.
3. Reboot the IPFE servers, if you have permission to do so.
4. If the alarm is still there, it is recommended to contact My Oracle Support for
assistance. Collect this data first:
• Screenshots for Configuration, and then Network, and then Devices All
IPFE Server tab and IPFE, and then Configuration, and then Options.
• ifconfig>ifconfig_$(hostname)
Description:
The IPFE was unable to synchronize state data with its mate. This alarm is generated
when the IPFE server missed the heartbeat messages from its mate, or if the mate is
unavailable for any reason.
This alarm is normal when one IPFE of a pair is taken down for maintenance. In this
case, the alarm is expected.
If the alarm is generated at any other time, it indicates that the IPFE has detected
that its mate is out of service.
DSR currently supports at most four IPFE servers, which are named IPFE-A1, IPFE-A2,
IPFE-B1, and IPFE-B2 on the IPFE, and then Configuration, and then Options
tab. A small DSR system can be configured with only IPFE-A1 and IPFE-A2; IPFE-B1 and
IPFE-B2 can be added for a larger DSR system, as needed. IPFE-A1 and IPFE-A2 are
configured as mates (as are IPFE-B1 and IPFE-B2, if configured). The mated IPFE
servers exchange heartbeat messages once every 500 ms. If, for any reason, an IPFE
server misses its mate's heartbeat message, alarm 5003 is raised. A few typical
reasons are:
• Mate server is down
• Network connectivity issue
• Latency between the IPFEs
• High CPU load on the IPFE causing internal software latency in the transmission
or receipt of a heartbeat message
Severity:
Critical
Instance:
One of the following strings:
• connect error - cannot connect to peer IPFE
• data read error - error reading data from peer IPFE
• data write error - error writing data to peer IPFE
Note:
If the IPFE is able to synchronize state data with its mate, this alarm clears.
HA Score:
Normal
OID:
ipfeIpfeStateSyncRunErrorNotify
Diagnostic Information:
State synchronization data is exchanged over the connection between mated IPFE
servers (IPFE A1/A2 IP or B1/B2 IP, port 19041, TCP). Wireshark can be used to
check whether state sync heartbeat messages are sent and received.
Recovery:
1. Check IPFE server configurations by navigating to IPFE, and then Configuration,
and then Options and checking the IPFE server IP address. Select the IMI IP
address. The Default State Sync TCP port number is 19041. If this port number is
configurable in your version of the IPFE, then do not change it from the default.
2. Check the Mated IPFE connectivity.
• ssh to IPFE-A1. ssh admusr@<IPFE-A1 XMI IP address>
• ping <IPFE-A2 IMI Address>
• telnet <IPFE-A2 IMI Address> 19041
• ssh to IPFE-A2 to ping/telnet IPFE-A1
• ssh to IPFE-B1 to ping/telnet IPFE-B2
• ssh to IPFE-B2 to ping/telnet IPFE-B1
• If the mated IPFE servers are reachable to each other, go to step 3
3. Reboot the IPFE servers, one by one, if possible.
a. Navigate to Status & Manage, and then Server.
b. Select the IPFE server and click Restart.
The Are you sure you want to restart application software on the
following server(s)? <server name> warning message displays.
c. Click OK to continue.
d. If rebooting does not solve the issue or you are not allowed to reboot the IPFE
server, go to the next step.
4. Do CPU and userspace performance diagnostics using the commands: top and
mpstat -P ALL.
5. If further assistance is needed, it is recommended to contact My Oracle Support.
Collect this data first:
• Screenshots of Configuration, and then Network, and then Devices All IPFE
Server tab and IPFE, and then Configuration, and then Options.
• ifconfig>ifconfig_$(hostname)
• (iqt -E IpfeOption ; iqt -E IpListTsa ; ) > ipfeconfig_$(hostname)
• netstat -anop | grep 19041>netstat_$(hostname)
Description:
This alarm indicates misconfiguration of the Target Set due to manual changes to the
configuration database or configuration data importing errors. One or more of the IP
addresses configured for the application servers is not valid.
Severity:
Critical
Instance:
tsa N address misconfiguration where N is 1-16
HA Score:
Normal
OID:
ipfeIpfeIpTablesConfigErrorNotify
Recovery:
1. Navigate to IPFE, and then Configuration, and then Options.
Note:
When the target set address is configured correctly, this alarm clears.
4. Ensure there is at least one application server IP address configured in the Target
Set IP List for the TSA.
5. Repeat for each TSA on the Target Set screen.
Description:
The IPFE has received a heartbeat packet from the application server that
indicates the application server is unwilling to accept new connections. However, the
application server continues to process existing connections. The application server
sends a stasis heartbeat message for the following reasons:
• The application server has reached its maximum number of active Diameter
connections
• The application server is congested. The application server also raises 22200 -
MP CPU Congested.
Severity:
Minor
Instance:
IP address of the application server in stasis
HA Score:
Normal
OID:
ipfeIpfeBackendInStasisNotify
Cause:
The application server has reached its maximum resource capacity.
When one or more of the DAMPs in the cluster reach their capacity, those DAMP
servers send stasis messages to the IPFE servers.
When an IPFE server receives a stasis message, the IPFE:
• Raises alarm 5005.
• Keeps processing the existing connections to the DAMP server in stasis.
• Routes any new connection (TCP SYN, SCTP INIT) to other servers in
the cluster that are not in stasis.
The IPFE clears this alarm when the IPFE server no longer receives stasis messages
from the DAMP servers.
When this alarm displays, it usually means more back-end DAMP servers are required
to extend capacity. Contact the Oracle support team to help diagnose the issue.
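The routing behavior described above, where existing connections stay in place and only new connections avoid servers in stasis, can be illustrated with a small selection function. This is a sketch under stated assumptions: the function name is hypothetical, and the round-robin choice among eligible servers is an assumption made for this example, because the actual IPFE selection method is not described here.

```python
def select_server_for_new_connection(servers, in_stasis, next_index=0):
    """Pick a server for a NEW connection, skipping servers in stasis.

    servers    -- ordered list of application server addresses
    in_stasis  -- set of addresses currently reporting stasis
    next_index -- rotating counter used for the assumed round-robin choice
    Returns None when every server is in stasis.
    """
    eligible = [s for s in servers if s not in in_stasis]
    if not eligible:
        return None
    return eligible[next_index % len(eligible)]


if __name__ == "__main__":
    damps = ["10.240.10.1", "10.240.10.2", "10.240.10.3"]
    for i in range(3):
        print(select_server_for_new_connection(damps, {"10.240.10.2"}, i))
```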
Diagnostic Information:
Collect the following data before contacting Oracle Support:
1. Export the alarm history.
2. iqt -pE IpfeOption>IpfeOption_$(hostname)
3. iqt -pE IpListTsa>IpListTsa_$(hostname)
4. tr.cat ipfe.STK>ipfeStk_$(hostname)
5. Screenshot of Diameter, and then Maintenance, and then DA-MPs, and then
DA-MP Connectivity.
Recovery:
• When the IPFE receives heartbeats from the application server indicating it is
willing to accept new connections, this alarm clears.
Description:
IPFE was unable to read from an ethernet device.
Note:
If IPFE is able to read from the ethernet device, this alarm clears.
Severity:
Critical
Instance:
pcap <ethernet device name> or network interface devices added or removed
HA Score:
Degraded
OID:
ipfeIpfeEtherDeviceReadErrorNotify
Cause:
In older IPFE versions, the IPFE must be restarted to pick up DSR reconfiguration
changes, such as a newly added Ethernet card or a deleted bond.
Recovery:
1. Navigate to Status & Manage, and then Server.
2. Select the IPFE server and click Restart.
The Are you sure you want to restart application software on the following
server(s)? <server name> warning message displays.
3. Click OK to continue.
Description:
Traffic statistics reveal an application server is processing lower than average load.
For example, if a TSA has three application servers, but the IPFE has only two
connections open, then one of the application servers receives no traffic and thus is
considered "underloaded."
Severity:
Minor
Instance:
IP address of the application server
HA Score:
Normal
OID:
ipfeIpfeBackendUnderloadedNotify
Cause:
The IPFE uses an algorithm to calculate the average traffic load of the DA-MP
application servers. At times, the traffic on a DA-MP server may fall outside of the
average range. When an IPFE detects that DA-MP traffic is unbalanced and a server is
processing a lower than average load, the IPFE server raises alarm 5007.
Some of the causes for the IPFE to raise this alarm are:
• A new DA-MP server has just been added to a cluster.
• A DA-MP has just been stopped for maintenance or some other reason.
• The activated traffic rate is too low.
This alarm is not harmful to the system; it indicates that the IPFE traffic on a
DA-MP server is imbalanced for some reason. There is no impact to traffic or
connections, and this alarm does not cause disconnection or congestion. As new
connections are established and statistics indicate the server is no longer
underloaded, alarm 5007 clears.
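The idea of comparing each server against the average traffic load can be sketched as follows. This document does not describe the actual IPFE algorithm, so the tolerance band, the function name, and the load units below are all assumptions chosen for illustration.

```python
def classify_load(loads, tolerance=0.5):
    """Classify each server's load relative to the cluster average.

    loads     -- dict of server name to measured load (any consistent unit)
    tolerance -- assumed fraction of the average that counts as "in range"
    """
    if not loads:
        return {}
    average = sum(loads.values()) / len(loads)
    low = average * (1 - tolerance)
    high = average * (1 + tolerance)
    result = {}
    for server, load in loads.items():
        if load < low:
            result[server] = "underloaded"   # compare with alarm 5007
        elif load > high:
            result[server] = "overloaded"    # compare with alarm 5008
        else:
            result[server] = "balanced"
    return result


if __name__ == "__main__":
    print(classify_load({"damp1": 100, "damp2": 110, "damp3": 20}))
```

In this example, a newly added server (damp3) carries far less than the average and is reported as underloaded until new connections even out the load.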
Diagnostic Information:
Collect the following data before contacting My Oracle Support for assistance.
1. Export alarm history.
2. grep * /proc/net/xt_recent* > xt_recent1_$(hostname)
3. grep * /proc/net/xt_recent*/*> xt_recent2_$(hostname)
4. tr.cat ipfe.STK>ipfeSTK_$(hostname)
5. iqt -pE IpfeOption>IpfeOption_$(hostname)
6. iqt -pE IpListTsa>IpListTsa_$(hostname)
Recovery:
1. None required. Underloaded application servers do not impact traffic processing.
This alarm clears when traffic statistics reveal the application server is no longer
underloaded.
2. It is recommended to contact My Oracle Support if further assistance is needed.
Description:
Traffic statistics reveal an application server is processing higher than average load
and does not receive new connections.
Severity:
Minor
Instance:
IP address of the overloaded application server.
Note:
When traffic statistics indicate the application server is no longer
overloaded, this alarm clears.
HA Score:
Normal
OID:
ipfeIpfeBackendOverloadedNotify
Cause:
The IPFE uses an algorithm to calculate the average traffic load of the DA-MP
application servers. At times, the traffic on a DA-MP server falls outside of the
average range. When an IPFE detects that DA-MP traffic is unbalanced and a server is
processing a higher than average load, the IPFE server raises alarm 5008.
Some of the causes for the IPFE to raise this alarm are:
• A new DA-MP server has just been added to a cluster.
• A DA-MP has just been stopped for maintenance or some other reason.
• The activated traffic rate is too high.
This alarm is not harmful to the system; it indicates that the IPFE traffic on a DA-MP
server is unbalanced for some reason. There is no impact to traffic or connections,
and this alarm does not cause disconnection or congestion. As new connections are
established and statistics indicate the server is no longer overloaded, alarm 5008
clears.
Diagnostic Information:
Collect the following data before contacting My Oracle Support for assistance.
1. Export alarm history.
2. grep * /proc/net/xt_recent* > xt_recent1_$(hostname)
Recovery:
1. IPFE monitors traffic statistics and does not assign connections to the overloaded
application server until statistics indicate the server is no longer overloaded.
2. Check the status of the application servers by navigating to the Status & Manage,
and then Server page.
3. Consult the application server's documentation for recovery steps.
4. It is recommended to contact My Oracle Support if further assistance is needed.
Description:
Through monitoring of the application servers, the IPFE learns that no server in a target
set is available. The associated measurement, TxReject, also shows counts (refer to
the DSR Measurements Reference for details about this measurement). This alarm
can be triggered during configuration of the IPFE when the target set address has
been configured, but application servers have not yet been added to the target set.
Setting the Monitoring Connection Timeout to a value less than 2 seconds is the
primary cause of this alarm. It is recommended to leave this setting on the default of
3 seconds. Do not set to 1 second. Later releases prohibit this from being set to 1
second.
Each target set is configured with at least one backend application server (DAMP).
The IPFE raises alarm 5009 when it detects that no DAMP is live. The IPFE
determines DAMP liveness by whether the DAMP heartbeat is received on time.
Severity:
Critical
Instance:
tsa N has no available servers where N is 1-16
Note:
When at least one application server in a target set becomes available, this
alarm clears.
HA Score:
Normal
OID:
ipfeIpfeNoAvailableAppServersNotify
Cause:
Setting the Monitoring Connection Timeout to a value less than 2 seconds is the
primary cause of this alarm. It is recommended to leave this setting on the default of
3 seconds. Do not set to 1 second. Later releases prohibit this from being set to 1
second.
Each target set is configured with at least one backend application server (DAMP).
The IPFE raises alarm 5009 when it detects that no DAMP is live. The IPFE
determines DAMP liveness by whether the DAMP heartbeat is received on time. The following
screen shows the IPFE monitoring the DAMP XSI port 9675, where the heartbeat is
received every 3 seconds.
When the IPFE does not receive the heartbeat from a single backend DAMP, the IPFE
raises alarm 5001. When the IPFE does not receive the heartbeat from any of the backend
DAMPs in its TSA list, the IPFE raises alarm 5009.
When alarm 5009 is raised, the IPFE is not able to route connections to a backend
DAMP server. This alarm is critical.
For example:
TSA1 10.240.10.162 has three backend DAMPs (DAMP1-XSI2-10.240.10.1,
DAMP2-XSI2-10.240.10.2, and DAMP3-XSI2-10.240.10.3). When the IPFE does not receive
the heartbeat in time from DAMP1, its active IPFE server raises alarm 5001.
When the IPFE misses all three DAMP heartbeats, its active IPFE server raises
alarm 5009.
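The relationship between alarms 5001 and 5009 in this example can be expressed as a short function. This is an illustrative sketch, not product code: the function name is hypothetical, and reporting the per-server 5001 alarms alongside 5009 when every heartbeat is missing is an assumption.

```python
def heartbeat_alarms(heartbeat_ok, tsa_number=1):
    """Map per-DAMP heartbeat status to (alarm number, instance) pairs.

    heartbeat_ok -- dict of DAMP XSI IP address to True/False
                    (True means the heartbeat arrived on time)
    """
    missing = [ip for ip, ok in heartbeat_ok.items() if not ok]
    # One 5001 alarm per application server whose heartbeat is missing.
    alarms = [(5001, ip) for ip in missing]
    # 5009 only when no server in the target set is available.
    if heartbeat_ok and len(missing) == len(heartbeat_ok):
        alarms.append((5009, f"tsa {tsa_number} has no available servers"))
    return alarms


if __name__ == "__main__":
    tsa1 = {"10.240.10.1": False, "10.240.10.2": True, "10.240.10.3": True}
    print(heartbeat_alarms(tsa1))
```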
Diagnostic Information:
Wireshark is the usual tool for checking whether the DAMP is sending a heartbeat to
the IPFE. Follow these steps to diagnose the issue:
1. From the SO GUI, navigate to IPFE, and then Configuration, and then Target
Sets, and then TSA#, and then +. At least one DAMP server XSI IP should be
present. If yes, go to step 2.
2. Log into the IPFE server.
a. ping <the DAMP server XSI IP>
b. telnet <the DAMP server XSI IP> <monitoring port, default 9675>
If step a or b fails, go to step 3.
3. ssh admusr@<DAMP server XMI>.
a. Run the sudo netstat -anop | grep <monitoring port, default 9675>
command to see if there is a TCP listen socket on that DAMP XSI IP.
If no, check the configuration of the DAMP.
If yes, check the DAMP XSI network (hardware and software).
Recovery:
1. Make sure the Monitoring Connection Timeout setting is not less than 2 seconds.
Change to a higher value, if required.
2. From the SO GUI, navigate to IPFE, and then Configuration, and then Target
Sets. At least one DAMP server XSI IP address should display.
3. Log into the IPFE server.
• ssh admusr@<IPFE XMI IP>
• ping <the DAMP server XSI IP>
• telnet <the DAMP server XSI IP> <monitoring port, default 9675>
The telnet terminal prints gibberish at even intervals. These are the raw
heartbeat messages. If you see nothing, then the DSR is not sending
heartbeats.
• ssh admusr@<DAMP server XMI>
• sudo netstat -anop | grep <monitoring port, default 9675> to see if there is a
TCP listen socket on the DAMP XSI IP
If no, check the configuration of the DAMP.
If yes, check the DAMP XSI network (switch, firewall, and so on).
4. If application servers have been configured correctly for the target set and the
application server status is healthy, it is recommended to contact My Oracle
Support for assistance. Collect this data first:
• Screenshot of IPFE, and then Configuration, and then Target Sets edit
screen.
• ifconfig>ifconfig_$(hostname)
• cat /etc/sysconfig/network > network_$(hostname)
• cat /etc/modprobe.d/bnx2x.conf > bnx2x.conf_$(hostname)
• cat /etc/sysconfig/network-scripts/ifcfg-eth01
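The telnet check in step 3 relies on watching for raw heartbeat bytes on the monitoring port. The same observation can be scripted, as in the following minimal sketch. The function name and the listening window are assumptions, and the script does not interpret the heartbeat contents; it only reports whether any bytes arrive.

```python
import socket


def heartbeat_bytes_seen(host, port=9675, listen_seconds=10.0):
    """Connect to the monitoring port and report whether any bytes arrive."""
    try:
        with socket.create_connection((host, port), timeout=listen_seconds) as conn:
            # recv() returns b"" if the peer closes without sending data.
            return len(conn.recv(4096)) > 0
    except OSError:
        # Covers refused connections and the timeout when nothing is sent.
        return False


if __name__ == "__main__":
    import sys

    # Usage: python watch_heartbeat.py <DAMP server XSI IP>
    if len(sys.argv) > 1:
        print(heartbeat_bytes_seen(sys.argv[1]))
```

A False result matches the case where the telnet terminal shows nothing, which suggests that heartbeats are not being sent.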
Description:
The IPFE received an unknown error parsing Linux iptables output. This internal
software error is generated when the iptables kernel module is updated and provides
an error the IPFE wasn't coded to handle. It occurs during startup, if it occurs at all.
Severity:
Critical
Instance:
error parsing iptables output
HA Score:
Normal
OID:
ipfeIpfeErrorParsingIptablesOutputNotify
Recovery:
• The alarm clears when the kernel output from the iptables command is parsable.
If the problem persists, collect the following data. It is then recommended to contact
My Oracle Support for assistance.
• From the active NO/SO GUI, navigate to Status & Manage, and then Server.
• From the Server Status screen, select the IPFE to stop (as it occurs during
startup) and click Stop.
• Log into the IPFE blade as root.
• Make a directory for holding data: # mkdir /var/TKLC/db/filemgmt/<data_collection_directory>
• Change to that directory.
• Issue the following commands with root account:
# /sbin/iptables -vxZ -t filter -nL > iptables_filter.txt
# /sbin/iptables -vxZ -t mangle -nL > iptables_mangle.txt
# /sbin/ip6tables -vxZ -t filter -nL > ip6tables_filter.txt
# /sbin/ip6tables -vxZ -t mangle -nL > ip6tables_mangle.txt
• tar and compress the directory.
• From the active NO/SO GUI, navigate to Status & Manage, and then Server
and restart IPFE.
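The four collection commands in the procedure above follow one pattern: two binaries (iptables and ip6tables) crossed with two tables (filter and mangle). The following sketch only builds the command strings so they can be reviewed or pasted; it does not run them, and the function name is hypothetical. Note that the -Z option also resets the packet and byte counters after they are listed.

```python
from itertools import product


def iptables_collection_commands():
    """Build the four data-collection command lines listed in the procedure."""
    commands = []
    for binary, table in product(("iptables", "ip6tables"), ("filter", "mangle")):
        commands.append(
            f"/sbin/{binary} -vxZ -t {table} -nL > {binary}_{table}.txt"
        )
    return commands


if __name__ == "__main__":
    for line in iptables_collection_commands():
        print(line)
```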
Description:
An internal software error. An IPFE attempt to interact with the TPD operating system
has produced a fatally abnormal result (e.g., no network interfaces are provisioned on
the system). This alarm is raised during startup by the following conditions:
• The IPFE cannot write to its Ethernet devices (denoted by the instances, error
opening ethernet listeners or no network cards found).
• The IPFE receives an unknown error when accessing its Ethernet devices.
• The issuance of the service network restart command.
• The IPFE cannot assign Ethernet device queues to certain CPUs, which is
denoted by the instance, Cannot update /proc/irq/N/smp_affinity setting.
Severity:
Critical
Instance:
Description of the problem.
• Error opening ethernet listeners
• No network cards found
• Cannot update /proc/irq/N/smp_affinity setting
• System has less than 16 CPUs
Note:
If the IPFE detects that it has been installed on a virtual machine, it
does not raise this alarm.
HA Score:
Normal
OID:
ipfeIpfeSystemErrorNotify
Recovery:
1. If the IPFE is able to use its ethernet interfaces, this alarm will clear. If this alarm
was generated by issuing a service network restart command, it should clear
within 10 seconds. If it does not clear, restart the IPFE process:
a. Select Status & Manage, and then Server.
b. Select the IPFE server and click Restart.
The Are you sure you want to restart application software on the
following server(s)? <server name> warning message displays.
c. Click OK to continue.
2. If the alarm still does not clear, check the Ethernet devices and CPUs.
3. It is recommended to contact My Oracle Support if further assistance is needed.
Description:
Heartbeats to monitor the liveness of a signaling interface have timed out. The IPFE
always monitors the working condition of its mate's signaling interfaces (XSI) through a
monitoring mechanism that is entirely separate from the synchronization channel. To do
this, the IPFE server sends heartbeat messages to its mate through the signaling
interfaces (XSI) using the default UDP port 19041. If the heartbeat is not received within
3000 ms, the IPFE server assumes the signaling interface is out of service and
takes over traffic from its mate. At the same time, the IPFE raises alarm 5012.
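The timeout rule in this description can be modeled with a small monitor object. This is a sketch only: the class and method names are hypothetical, the 3-second default mirrors the 3000 ms value above, and the injectable clock exists only to make the example easy to exercise.

```python
import time


class MateHeartbeatMonitor:
    """Track the last heartbeat seen from the mate on one signaling interface."""

    def __init__(self, timeout_s=3.0, clock=time.monotonic):
        self._timeout_s = timeout_s
        self._clock = clock
        self._last_seen = clock()

    def heartbeat_received(self):
        """Record that a heartbeat arrived now."""
        self._last_seen = self._clock()

    def mate_interface_down(self):
        """Return True once no heartbeat has arrived within the timeout."""
        return self._clock() - self._last_seen > self._timeout_s


if __name__ == "__main__":
    monitor = MateHeartbeatMonitor()
    print(monitor.mate_interface_down())  # a heartbeat window has just started
```

When `mate_interface_down()` becomes true, the described behavior is that the IPFE takes over traffic from its mate and raises alarm 5012.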
Severity:
Critical
Instance:
The name of the Ethernet interface affected, for example, bond0.5.
HA Score:
Degraded
OID:
ipfeIpfeSignalingInterfaceNotify
Cause:
The following is an example of the heartbeat message exchange between the IPFE
mates.
Diagnostic Information:
This alarm is normal for the situation where one IPFE of a mated pair has been taken
down for maintenance. This alarm only needs to be acted upon if it is raised when
both IPFEs are expected to be available.
1. Use the alarm report to determine the affected interface (eth01, bond0.313, and
so on). For example, if the alarm instance shows IPFEA1:bond0.313, the
affected interface is bond0.313 on IPFEA2 (the mate).
2. Use Wireshark to check whether heartbeat messages are sent from
IPFEA2, bond0.313 (there is no need to inspect the message contents). If not, the
issue is on IPFEA2. If so, the issue is in the network.
Recovery:
1. Check if any manual configuration changes have been executed that removed or
reset interfaces.
2. Diagnose hardware failure of interfaces, switch failure, or network outage when
the issue is on the network.
3. Review currently active platform alarms.
4. It is recommended to contact My Oracle Support if further assistance is needed.
Description:
IPFE has seen traffic in excess of the Global Packet Rate Limit and is dropping packets
to throttle the traffic. To protect the DSR, IPFE defines a Global Packet Rate Limit
as an ingress signaling traffic rate throttle. The packet rate is accounted for on
a per-local-port basis, so each separate DSR listening port can receive up to the
default of 500,000 packets/second. When the IPFE is processing traffic in excess of
this rate, it throttles the traffic by smoothly dropping packets in the manner of
an overloaded border router.
When traffic is approaching or exceeding its overload capacity, alarm 5100 is
raised but no packets are dropped. When the traffic reaches this throttle, IPFE
drops packets.
Severity:
Critical
Instance:
The number of packets that have been throttled
HA Score:
Degraded
OID:
ipfeIpfeThrottlingTrafficNotify
Cause:
When traffic is approaching or exceeding its overload capacity, alarm 5100 is
raised but no packets are dropped. When the traffic reaches this throttle, IPFE
drops packets.
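A per-port rate limit of this kind can be sketched as follows. This is an illustration only: the class and its names are hypothetical, and it uses a simple fixed one-second window per local port rather than the smooth dropping that IPFE performs. Only the 500,000 packets/second default comes from this document.

```python
from collections import defaultdict

GLOBAL_PACKET_RATE_LIMIT = 500_000  # default packets/second per local port


class PerPortThrottle:
    """Fixed one-second-window packet counter kept per local port (illustrative)."""

    def __init__(self, limit=GLOBAL_PACKET_RATE_LIMIT):
        self.limit = limit
        self.window = {}                 # port -> second currently being counted
        self.count = defaultdict(int)    # port -> packets accepted in that second
        self.dropped = defaultdict(int)  # port -> packets dropped in total

    def accept(self, port, now_s):
        """Return True if the packet is accepted, False if it is dropped."""
        second = int(now_s)
        if self.window.get(port) != second:
            # A new one-second window starts for this port.
            self.window[port] = second
            self.count[port] = 0
        if self.count[port] >= self.limit:
            self.dropped[port] += 1
            return False
        self.count[port] += 1
        return True


# Tiny limit so the effect is visible; 3868 is just an example port number.
throttle = PerPortThrottle(limit=3)
results = [throttle.accept(3868, 0.1 * i) for i in range(5)]
print(results, throttle.dropped[3868])
```

Because the counters are keyed by port, traffic on one listening port does not consume the allowance of another, which mirrors the per-local-port accounting described above.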
Diagnostic Information:
Refer to the IPFE and connection performance data for further investigation.
1. Recovery:
1. If no packets have been dropped for five seconds, the alarm clears.
2. It is recommended to contact My Oracle Support if further assistance is needed.
Description:
Total IPFE signaling traffic rate is approaching or exceeding its engineered capacity.
IPFE defines an engineered capacity to monitor the ingress signaling traffic rate, and
this alarm is raised when the total IPFE signaling traffic rate approaches or
exceeds that capacity. Unlike alarm 5013, no packets are dropped at this point.
The severity thresholds are:
• Minor: set at 245 MB/second, clear at 220 MB/second
• Major: set at 327 MB/second, clear at 302 MB/second
• Critical: set at 409 MB/second, clear at 384 MB/second
Severity:
Minor, Major, Critical
Instance:
N/A
Note:
If the signaling traffic declines below the clear threshold, the alarm clears.
HA Score:
Normal
OID:
ipfeIpfeTrafficOverloadNotify
Cause:
The severity thresholds are:
• Minor: set at 245 MB/second, clear at 220 MB/second
• Major: set at 327 MB/second, clear at 302 MB/second
• Critical: set at 409 MB/second, clear at 384 MB/second
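The set and clear thresholds listed above form a hysteresis band for each severity. The following minimal sketch shows one way such a band can be evaluated; the function name and the step-down behavior between severities are assumptions for illustration, not a description of the IPFE implementation.

```python
# (severity, set threshold, clear threshold) in MB/second, from the list above.
THRESHOLDS = [
    ("Critical", 409, 384),
    ("Major", 327, 302),
    ("Minor", 245, 220),
]


def next_severity(current, rate_mb_s):
    """Return the severity after observing rate_mb_s, given the current severity.

    current is None (no alarm) or one of "Minor", "Major", "Critical".
    """
    for name, set_at, clear_at in THRESHOLDS:  # highest severity first
        if rate_mb_s >= set_at:
            return name
        if current == name and rate_mb_s >= clear_at:
            return name  # not yet below the clear threshold: hold the level
    return None


print(next_severity(None, 250))     # crosses the Minor set threshold
print(next_severity("Minor", 230))  # between clear and set: alarm is held
print(next_severity("Minor", 219))  # below the Minor clear threshold
```

The gap between each set and clear value keeps the alarm from toggling when the traffic rate hovers near a threshold.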
Diagnostic Information:
Refer to the KPIs to check the IPFE data rate.
1. Recovery:
1. The application is in excess of its design parameters, and may exhibit traffic
loss if an additional failure occurs. Consider expanding system to accommodate
additional capacity.
2. It is recommended to contact My Oracle Support if further assistance is needed.
Description:
CPU utilization is approaching maximum levels.
Severity:
Major
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
ipfeIpfeCpuOverloadNotify
1. Recovery:
Description:
Disk space utilization is approaching maximum levels.
Severity:
Major
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
ipfeIpfeDiskUsageNotify
1. Recovery:
3-23
Chapter 3
OAM (10000-10999)
Description:
IPFE memory utilization is approaching maximum levels.
Severity:
Major
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
ipfeIpfeMemoryOverloadNotify
1. Recovery:
OAM (10000-10999)
This section provides information and recovery procedures for OAM alarms, ranging
from 10000-10999.
Description:
The database backup has started.
Severity:
Info
Instance:
GUI
HA Score:
Normal
OID:
tekelecBackupStartNotify
1. Recovery:
Description:
Backup completed
Severity:
Info
Instance:
GUI
HA Score:
Normal
OID:
tekelecBackupCompleteNotify
1. Recovery:
• No action required.
Description:
The database backup has failed.
Severity:
Info
Instance:
N/A
HA Score:
Normal
OID:
tekelecBackupFailNotify
1. Recovery:
Description:
The database restoration has started.
Severity:
Info
Instance:
N/A
HA Score:
Normal
OID:
tekelecRestoreStartNotify
1. Recovery:
• No action required.
Description:
The database restoration is completed.
Severity:
Info
Instance:
N/A
HA Score:
Normal
OID:
tekelecRestoreCompleteNotify
1. Recovery:
• No action required.
Description:
The database restoration has failed.
Severity:
Info
Instance:
N/A
HA Score:
Normal
OID:
tekelecRestoreFailNotify
1. Recovery:
Description:
Database provisioning has been manually disabled.
Severity:
Minor
Instance:
N/A
HA Score:
Normal
OID:
tekelecProvisioningManuallyDisabledNotify
1. Recovery:
• No action required.
Description:
The configuration and provisioning databases are not yet synchronized. Alarm 10009
is raised when DB re-initialization is attempted but fails. Re-initialization
usually happens when transitioning to the A state (one of the procmgr states, which
can be obtained from the pl command). DB re-initialization fails when the remote
server is not in the correct state, for example, when it is not in the OOS state.
This alarm can also be observed during some DSR patch installations after
DB replication is inhibited. As long as the alarm clears (is not stuck) after DB
replication is allowed again, this is normal behavior; alarm 10009 is expected
when applying a patch.
Severity:
Critical
Instance:
N/A
HA Score:
Failed
OID:
oAGTCfgProvDbNoSync
Diagnostic Information:
Perform the following to diagnose the alarm:
• Examine the /var/TKLC/appw/logs/Process/apwSoapServer.log file on
primary NO and possibly the remote server to investigate the reasons for failure.
• Collect the software release information.
1. Recovery:
1. Monitor the replication status by navigating to Status & Manage, and then
Replication GUI.
2. If the alarm persists immediately after an upgrade, reboot the server once using the
sudo init 6 command on the affected server.
3. If the alarm persists for more than one hour, it is recommended to contact My Oracle
Support.
Description:
The stateful database is not synchronized with the mate database.
Severity:
Minor
Instance:
N/A
HA Score:
Degraded
OID:
oAGTStDbNoSyncNotify
1. Recovery:
Description:
Monitoring for table cannot be set up.
Severity:
Major
Instance:
N/A
HA Score:
Degraded
OID:
oAGTCantMonitorTable
1. Recovery:
Description:
The responder for a monitored table failed to respond to a table change.
Severity:
Major
Instance:
N/A
HA Score:
Degraded
OID:
oAGTResponderFailed
1. Recovery:
Description:
An application restart is in progress.
Severity:
Minor
Instance:
N/A
HA Score:
Normal
OID:
oAGTApplSWDisabledNotify
1. Recovery:
Description:
Database backup failed.
Severity:
Minor
Instance:
N/A
HA Score:
Normal
OID:
apwBackupFailureNotify
1. Recovery:
1. Alarm clears if a backup (Automated or Manual) of the same group data is
successful.
2. It is recommended to contact My Oracle Support if further assistance is needed.
Description:
Database backup failed.
Severity:
Minor
Instance:
HA Score:
Normal
OID:
awpss7TekelecResourceAuditFailureNotify
1. Recovery:
Description:
An error occurred in the deployment of a network. A SOAP request from the route audit
thread of the apwSoapServer process to the TpdProvD service failed to delete the old
record when inserting a new route or updating an existing network route. The audit
runs every minute. The alarm clears when the new route is inserted or the existing
network route record is updated successfully.
Severity:
Minor
Instance:
Route ID failed to deploy
HA Score:
Normal
OID:
awpss7TekelecRouteDeploymentFailedNotify
1. Recovery:
1. Check the following on the affected server:
• See if any network route is configured on the server (output of 'route'
command).
• Check the iqt -Ep NetworkRoute from active NOAM server to see if any
network route is configured.
• Check the iqt -Ep ResourceAudit.1 from active NOAM server to see if
any network route is in audit.
• Check if the apwSoapServer service is running (output of pl command).
• Check if the tpdProvd service is running (output of top or ps command).
• Check if there is any SOAP error in the following log files:
– /var/TKLC/appw/logs/Process/apwSoapServer.log
– /var/TKLC/log/tpdProvd/tpdProvd.log
• Try to identify if the problem occurred in tpdProvd or apwSoapServer.
2. Try restarting the apwSoapServer service on the affected server.
3. If the alarm persists, collect the trace list in Diagnostic Information. It is
recommended to contact My Oracle Support if further assistance is needed.
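The log check in step 1 can be partly automated. The sketch below is a hedged illustration: the two file paths are the ones named above, but the search pattern is an assumption, since the exact wording of SOAP error entries in these logs is not specified in this document. The function names are hypothetical.

```python
import re

# Log files named in the recovery steps above.
LOG_FILES = [
    "/var/TKLC/appw/logs/Process/apwSoapServer.log",
    "/var/TKLC/log/tpdProvd/tpdProvd.log",
]

# Assumed pattern: a line mentioning SOAP together with "error" or "fail".
SOAP_ERROR = re.compile(r"soap.*(error|fail)|(error|fail).*soap", re.IGNORECASE)


def soap_error_lines(lines):
    """Return the lines that look like SOAP errors (heuristic match)."""
    return [line.rstrip("\n") for line in lines if SOAP_ERROR.search(line)]


def scan(paths=LOG_FILES):
    """Map each log file to its suspicious lines, or to a read-error note."""
    hits = {}
    for path in paths:
        try:
            with open(path, errors="replace") as handle:
                hits[path] = soap_error_lines(handle)
        except OSError as exc:
            hits[path] = [f"could not read file: {exc}"]
    return hits
```

Calling scan() on the affected server returns a dictionary keyed by log file, which can help decide whether the problem lies in tpdProvd or apwSoapServer.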
Description:
An error occurred in the discovery of network routes. A SOAP request from the route
audit thread of the apwSoapServer process to the TpdProvD service failed to get the list
and details of the configured network routes. The audit runs every minute. The
alarm clears when the route information is received from the TpdProvD service.
Severity:
Minor
Instance:
N/A
HA Score:
Normal
OID:
awpss7TekelecRouteDiscoveryFailedNotify
1. Recovery:
1. Check the following on the affected server:
• See if any network route is configured on the server (output of 'route'
command)
• Check if the apwSoapServer service is running (output of 'pl' command)
• Check if the tpdProvd service is running (output of 'top' or 'ps' command)
• Check if there is any SOAP error in the following log files:
– /var/TKLC/appw/logs/Process/apwSoapServer.log
– /var/TKLC/log/tpdProvd/tpdProvd.log
• Try to identify if the problem occurred in tpdProvd or apwSoapServer
2. Try restarting the apwSoapServer service on the affected server.
3. It is recommended to contact My Oracle Support if further assistance is needed.
Description:
A suitable device could not be identified for the deployment of a network route.
Severity:
Minor
Instance:
Route ID that failed to deploy
HA Score:
Normal
OID:
awpss7TekelecNoRouteDeviceNotify
Cause:
AppWorks audit tries to insert, edit, or delete a route for a device which does not exist.
The audit happens every minute. The alarm clears when the AppWorks audit is able
to insert, edit, or delete the route.
Diagnostic Information:
Check the following on the affected server:
• Check the iqt -Ep ResourceAudit.1 from active NOAM server to see if any
network route is in audit.
• Find the device for the route.
• If the device specified is other than auto, check the user interface to see if the
specified device is present.
• Check apwSoapServer logs for more information.
1. Recovery:
1. If the device specified is AUTO:
a. Deploy the route on a specific device instead of using the “AUTO” device.
b. Ensure every server in the server group has a usable device for the selected
gateway.
2. If the device specified is deleted:
a. Recreate the missing device.
b. Wait for the audit to run again, which configures the route and clears the alarm.
Description:
An error occurred in the deployment of a network device.
Severity:
Minor
Instance:
Device name that failed to deploy
HA Score:
Normal
OID:
awpss7TekelecDeviceDeploymentFailedNotify
Cause:
• Device Audit attempted to update a configured network interface device in the
system configuration using the TpdProvD SOAP service, which returned a failure.
• Apart from platform-related issues such as the TpdProvD SOAP service not being
ready, invalid input is the main cause of this alarm.
Diagnostic Information:
If device is added through one of the configuration interfaces, verify the device
configuration file, /etc/sysconfig/network-scripts/ifcfg-<dev> is not
already present.
If the device is edited through one of the configuration interfaces, verify the device
configuration file, /etc/sysconfig/network-scripts/ifcfg-<dev> is present
and is not RCS locked.
To determine the cause, look for errors in following files:
• /var/TKLC/log/tpdProvd/tpdProvd.log
• /var/TKLC/appw/logs/Process/apwSoapServer.log
1. Recovery:
1. If the device is added using one of the configuration interfaces, delete any
/etc/sysconfig/network-scripts/ifcfg-<dev> file for the device if present.
2. If the device is edited using one of the configuration interfaces:
a. If /etc/sysconfig/network-scripts/ifcfg-<dev> is missing,
add the device using the netAdm command.
b. If /etc/sysconfig/network-scripts/ifcfg-<dev> is RCS locked,
use the rcstool command to unlock the file.
3. Delete the device, wait for the alarm to clear and then add it back.
4. It is recommended to contact My Oracle Support if further assistance is needed.
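The file checks in steps 1 and 2 can be sketched as a small pre-check. The helper names and the returned hint strings are hypothetical; the sketch only tests whether the ifcfg file exists and does not detect an RCS lock, which still has to be checked separately.

```python
from pathlib import Path

NETWORK_SCRIPTS = Path("/etc/sysconfig/network-scripts")


def ifcfg_path(dev, base=NETWORK_SCRIPTS):
    """Return the configuration file path for a device, for example ifcfg-bond0."""
    return Path(base) / f"ifcfg-{dev}"


def precheck(dev, operation, base=NETWORK_SCRIPTS):
    """Return a hint for a device that is being added ('add') or edited ('edit')."""
    exists = ifcfg_path(dev, base).is_file()
    if operation == "add":
        return "delete stale ifcfg file" if exists else "ok"
    if operation == "edit":
        if exists:
            return "ok (also confirm the file is not RCS locked)"
        return "add device with netAdm"
    raise ValueError(f"unknown operation: {operation}")


print(ifcfg_path("bond0"))
```

The base argument exists only so the check can be tried against a scratch directory before it is pointed at the real configuration directory.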
Description:
An error occurred in the discovery of network devices. Either no network devices could
be found, that is, the /etc/sysconfig/network-scripts directory could
not be read by the apwSoapServer audit; or a named network device could not be
discovered on the system, that is, the /sbin/ip addr show <dev>
command failed when run from the apwSoapServer audit.
Severity:
Minor
Instance:
N/A
HA Score:
Normal
OID:
awpss7TekelecDeviceDiscoveryFailedNotify
1. Recovery:
1. Correct any directory or file permissions in the /etc/sysconfig/network-scripts
directory. Permissions should be 0755 or more relaxed.
2. Check if the named device interface is configured, that is, the interface files
(ifcfg-<dev>) are present in the /etc/sysconfig/network-scripts directory.
3. If the physical device is present on the system, but it does not show up in the
output of ifconfig command, then use the netAdm command to add the device to
the platform configuration.
4. It is recommended to contact My Oracle Support if further assistance is needed.
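The permission check in step 1 can be expressed as a short sketch. It assumes that "0755 or more relaxed" means every permission bit of 0755 is set, possibly with additional bits; the function name is hypothetical.

```python
import os
import stat

REQUIRED_MODE = 0o755  # minimum permissions named in step 1


def permissions_ok(path, required=REQUIRED_MODE):
    """Return True if every permission bit in `required` is set on path."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & required == required
```

For example, permissions_ok("/etc/sysconfig/network-scripts") could be run on the affected server; a False result would indicate permissions tighter than 0755.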
Description:
The server group has received the maximum number of allowed HA role warnings.
Severity:
Minor
Instance:
Affected Server Group name
HA Score:
Normal
OID:
oAGTSgMaxAllowedHARoleWarnNotify
1. Recovery:
1. Log in to the SO GUI and navigate to Status & Manage, and then HA.
2. Click Edit and change the Max Allowed HA role of the current Standby SOAM to
Active.
3. If you cannot perform the HA switchover, log into the server (Status & Manage,
and then Server).
4. Select the active server and click Restart to restart the server.
HA switchover occurs.
5. Verify the switchover was successful from the active SOAM GUI, or log into the
active and standby SOAMs and execute this command:
# ha.mystate
Description:
The standby server has temporarily degraded while the new active server stabilizes
following a switch of activity.
Severity:
Minor
Instance:
N/A
HA Score:
Degraded
OID:
hASbyRecoveryInProgressNotify
1. Recovery:
• No action required. The alarm clears automatically when the standby server
is recovered. This is part of the normal recovery process for the server that
transitioned to standby as a result of a failover.
Description:
The server is no longer providing services because application processes have been
manually stopped.
Severity:
Minor
Instance:
N/A
HA Score:
Normal
OID:
hAMtceStopApplicationsNotify
1. Recovery:
Description:
The applications on the standby server have not been restarted after an active-to-
standby transition since h_FailureCleanupMode is set to 0.
Severity:
Info
Instance:
N/A
HA Score:
Normal
OID:
failureRecoveryWithoutAppRestartNotify
1. Recovery:
Description:
Log files export operation has started.
Severity:
Info
Instance:
N/A
HA Score:
Normal
OID:
tekelecLogExportStartNotify
1. Recovery:
• No action required.
Description:
The log files export operation completed successfully.
Severity:
Info
Instance:
N/A
HA Score:
Normal
OID:
tekelecLogExportSuccessNotify
1. Recovery:
• No action required.
Description:
The log files export operation failed.
Severity:
Info
Instance:
N/A
HA Score:
Normal
OID:
tekelecLogExportFailedNotify
1. Recovery:
1. Verify the export request and try the export again.
2. It is recommended to contact My Oracle Support if further assistance is needed.
Description:
Log files export operation did not run; an export can only run on an active network
OAMP server.
Severity:
Info
Instance:
N/A
HA Score:
Normal
OID:
tekelecLogExportNotRunNotify
1. Recovery:
Description:
The performance data export remote copy operation failed.
Severity:
Info
Instance:
<Task ID>
Note:
<Task ID> refers to the ID column found in Status & Manage, and then
Tasks, and then Active Tasks.
HA Score:
Normal
OID:
tekelecExportXferFailedNotify
1. Recovery:
Description:
The log files export operation was cancelled by the user.
Severity:
Info
Instance:
<Task ID>
Note:
<Task ID> refers to the ID column found in Status & Manage, and then
Tasks, and then Active Tasks.
HA Score:
Normal
OID:
tekelecLogExportCancelledUserNotify
1. Recovery:
Description:
The log files export operation was cancelled because a scheduled export is queued
already.
Severity:
Info
Instance:
<Task ID>
Note:
<Task ID> refers to the ID column found in Status & Manage, and then
Tasks, and then Active Tasks.
HA Score:
Normal
OID:
tekelecLogExportCancelledDuplicateNotify
1. Recovery:
1. Check the duration and/or frequency of scheduled exports as they are not
completing before the next scheduled export is requested.
2. It is recommended to contact My Oracle Support if further assistance is needed.
Description:
The log files export operation was cancelled because the export queue is full.
Severity:
Info
Instance:
<Task ID>
Note:
<Task ID> refers to the ID column found in Status & Manage, and then
Tasks, and then Active Tasks.
HA Score:
Normal
OID:
tekelecLogExportCancelledQueueNotify
1. Recovery:
1. Check the amount, duration and/or frequency of scheduled exports to ensure the
queue does not fill up.
2. It is recommended to contact My Oracle Support if further assistance is needed.
Description:
A duplicate scheduled log export task has been queued.
Severity:
Minor
Instance:
<Target ID>
Note:
<Target ID> refers to the scheduled task ID found by running a report from
Status & Manage, and then Tasks, and then Scheduled Tasks.
HA Score:
Normal
OID:
tekelecLogExportDupSchedTaskNotify
1. Recovery:
1. Check the duration and/or frequency of scheduled exports as they are not
completing before the next scheduled export is requested.
2. It is recommended to contact My Oracle Support if further assistance is needed.
Description:
The log export queue is full.
Severity:
Minor
Instance:
<Queue Name>
Note:
<Queue Name> refers to the name of the queue used for the export task ID
found by running a report from either Status & Manage, and then Tasks,
and then Active Tasks or Status & Manage, and then Tasks, and then
Scheduled Tasks.
HA Score:
Normal
OID:
tekelecLogExportQueueFullNotify
1. Recovery:
1. Check the amount, duration and/or frequency of scheduled exports to ensure the
queue does not fill up.
2. It is recommended to contact My Oracle Support if further assistance is needed.
Description:
The certificate expires within 30 days.
Severity:
Minor
Instance:
<CertificateName>
HA Score:
Normal
OID:
certificateAboutToExpire
Cause:
The certificate is about to expire.
Certificate Management
The Certificate Management feature allows users to configure certificates for:
• HTTPS/SSL - Allows secure login without encountering messages about
untrusted sites
• LDAP (TLS) - Allows the LDAP server's public key to encrypt credentials sent to
the LDAP server
• TLS/DTLS over TCP/SCTP Transport - Allows transport layer security protocols
and encryption on a per connection basis at the application layer. For example,
DSR local and peer node connections
• Single Sign-On (SSO) - Allows users to navigate among several applications
without having to re-enter login credentials
• Certificate Authority (CA) - A digital certificate provided by a trusted source
used to make secure connections between a client and server
Note:
When setting up Certificate Management, you must first assign a system
domain name for the DNS configuration before importing any certificates.
If you allow a certificate to expire, the certificate becomes invalid, and you are no
longer able to run secure transactions on your website. The Certification Authority
(CA) prompts you to renew your SSL certificate before the expiration date.
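The relationship between a certificate's validity period and its alarm condition can be sketched as follows. The function name and the status strings are hypothetical; the 30-day warning window comes from the description of this alarm, and the remaining states (expired, not yet valid) are included only to show the full range of outcomes.

```python
from datetime import datetime, timedelta, timezone

WARNING_WINDOW = timedelta(days=30)  # "expires within 30 days"


def certificate_status(not_before, not_after, now=None):
    """Classify a certificate by its validity period (timezone-aware datetimes)."""
    now = now or datetime.now(timezone.utc)
    if now < not_before:
        return "not yet valid"
    if now >= not_after:
        return "expired"
    if not_after - now <= WARNING_WINDOW:
        return "about to expire"
    return "valid"
```

The not_before and not_after values would be read from the certificate itself; how they are obtained is outside the scope of this sketch.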
Diagnostic Information:
Generating a Certificate Report
Note:
To select multiple server groups, press and hold Ctrl as you click to
select specific rows. Alternatively, if no servers are selected then all
server groups appear in the report.
3. Click Report.
4. Click Print to print the report, or click Save to save a text file of the report.
1. Recovery:
1. For details on DNS Configuration feature, see the DNS Configuration chapter in
Operation, Administration, and Maintenance (OAM) Guide.
2. For details on Certificate Management feature, see the Certificate Management
chapter in Operation, Administration, and Maintenance (OAM) Guide.
3. It is recommended to contact My Oracle Support if further assistance is needed.
Description:
The certificate is expired.
Severity:
Major
Instance:
<CertificateName>
HA Score:
Normal
OID:
certificateExpired
Cause:
The certificate is expired.
Certificate Management
The Certificate Management feature allows users to configure certificates for:
• HTTPS/SSL - Allows secure login without encountering messages about
untrusted sites
• LDAP (TLS) - Allows the LDAP server's public key to encrypt credentials sent to
the LDAP server
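• TLS/DTLS over TCP/SCTP Transport - Allows transport layer security protocols
and encryption on a per connection basis at the application layer. For example,
DSR local and peer node connections
• Single Sign-On (SSO) - Allows users to navigate among several applications
without having to re-enter login credentials
• Certificate Authority (CA) - A digital certificate provided by a trusted source
used to make secure connections between a client and server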
Note:
When setting up Certificate Management, you must first assign a system
domain name for the DNS configuration before importing any certificates.
If you allow a certificate to expire, the certificate becomes invalid, and you are no
longer able to run secure transactions on your website. The Certification Authority
(CA) prompts you to renew your SSL certificate before the expiration date.
Diagnostic Information:
Generating a Certificate Report
Note:
To select multiple server groups, press and hold Ctrl as you click to
select specific rows. Alternatively, if no servers are selected then all
server groups appear in the report.
3. Click Report.
4. Click Print to print the report, or click Save to save a text file of the report.
1. Recovery:
1. For details on DNS Configuration feature, see the DNS Configuration chapter in
Operation, Administration, and Maintenance (OAM) Guide.
2. For details on Certificate Management feature, see the Certificate Management
chapter in Operation, Administration, and Maintenance (OAM) Guide.
3. It is recommended to contact My Oracle Support if further assistance is needed.
Description:
The certificate cannot be used because the certificate is not available yet.
Severity:
Major
Instance:
<CertificateName>
HA Score:
Normal
OID:
certificateCannotBeUsed
Cause:
The certificate cannot be used because the certificate is not available yet.
Certificate Management
The Certificate Management feature allows users to configure certificates for:
• HTTPS/SSL - Allows secure login without encountering messages about
untrusted sites
• LDAP (TLS) - Allows the LDAP server's public key to encrypt credentials sent to
the LDAP server
• TLS/DTLS over TCP/SCTP Transport - Allows transport layer security protocols
and encryption on a per connection basis at the application layer. For example,
DSR local and peer node connections
• Single Sign-On (SSO) - Allows users to navigate among several applications
without having to re-enter login credentials
• Certificate Authority (CA) - A digital certificate provided by a trusted source
used to make secure connections between a client and server
Note:
When setting up Certificate Management, you must first assign a system
domain name for the DNS configuration before importing any certificates.
If you allow a certificate to expire, the certificate becomes invalid, and you are no
longer able to run secure transactions on your website. The Certification Authority
(CA) prompts you to renew your SSL certificate before the expiration date.
Diagnostic Information:
Generating a Certificate Report
Note:
To select multiple server groups, press and hold Ctrl as you click to
select specific rows. Alternatively, if no servers are selected then all
server groups appear in the report.
3. Click Report.
4. Click Print to print the report, or click Save to save a text file of the report.
Recovery:
1. For details on DNS Configuration feature, see the DNS Configuration chapter in
Operation, Administration, and Maintenance (OAM) Guide.
2. For details on Certificate Management feature, see the Certificate Management
chapter in Operation, Administration, and Maintenance (OAM) Guide.
3. It is recommended to contact My Oracle Support if further assistance is needed.
Description:
Upgrade health check operation started.
Severity:
Info
Instance:
N/A
HA Score:
Normal
OID:
tekelecLogHealthCheckStart
1. Recovery:
• No action required.
Description:
Upgrade health check operation completed successfully.
Severity:
Info
Instance:
N/A
HA Score:
Normal
OID:
tekelecLogHealthCheckSuccess
1. Recovery:
• No action required.
Description:
Upgrade health check operation failed.
Severity:
Info
Instance:
N/A
HA Score:
Normal
OID:
tekelecLogHealthCheckFailed
1. Recovery:
• No action required.
Description:
Upgrade health check not run.
Severity:
Info
Instance:
N/A
HA Score:
Normal
OID:
tekelecLogHealthCheckNotRun
1. Recovery:
Description:
The server group upgrade operation has started.
Severity:
Info
Instance:
<ServerGroupName>
HA Score:
Normal
OID:
tekelecLogSgUpgradeStart
1. Recovery:
• No action required.
Description:
The server group upgrade operation has been cancelled due to validation failure.
Severity:
Info
Instance:
<ServerGroupName>
HA Score:
Normal
OID:
tekelecLogSgUpgradeCancelled
1. Recovery:
• No action required.
Description:
The server group upgrade operation completed successfully.
Severity:
Info
Instance:
<ServerGroupName>
HA Score:
Normal
OID:
tekelecLogSgUpgradeSuccess
1. Recovery:
• No action required.
Description:
The server group upgrade operation failed.
Severity:
Info
Instance:
<ServerGroupName>
HA Score:
Normal
OID:
tekelecLogSgUpgradeFailed
1. Recovery:
• No action required. Alarm 10134 - Server Upgrade Failed is raised for each server
in the server group that failed to upgrade. The alarm clears when the server
upgrades successfully.
Description:
The user cancelled the server group upgrade operation.
Severity:
Info
Instance:
<ServerGroupName>
HA Score:
Normal
OID:
tekelecLogSgUpgradeCancelledUser
1. Recovery:
• No action required.
Description:
Server group upgrade operation failed.
Severity:
Major
Instance:
<ServerGroupName>
HA Score:
Normal
OID:
tekelecLogSgUpgradeFailAlm
Recovery:
Description:
The server upgrade operation has started.
Severity:
Info
Instance:
<HostName>
HA Score:
Normal
OID:
tekelecLogServerUpgradeStart
1. Recovery:
• No action required.
Description:
The server upgrade operation has been cancelled due to validation failure.
Severity:
Info
Instance:
<HostName>
HA Score:
Normal
OID:
tekelecLogServerUpgradeCancelled
1. Recovery:
• No action required.
Description:
The server upgrade operation completed successfully.
Severity:
Info
Instance:
<HostName>
HA Score:
Normal
OID:
tekelecLogServerUpgradeSuccess
1. Recovery:
• No action required.
Description:
The server upgrade operation failed.
Severity:
Info
Instance:
<HostName>
HA Score:
Normal
OID:
tekelecLogServerUpgradeFailed
1. Recovery:
• No action required. Alarm 10134 - Server Upgrade Failed is raised for each server
that failed to upgrade. The alarm clears when the server upgrades successfully.
Description:
The server upgrade operation failed.
Severity:
Major
Instance:
<HostName>
HA Score:
Normal
OID:
tekelecLogServerUpgradeFailAlm
1. Recovery:
1. If a server upgrade fails, this alarm clears when the server upgrades successfully.
Upgrade the server individually or as part of a server group or site upgrade.
If more than one server in the same server group or site fails to upgrade, the
server group and site upgrades may be useful because both methods will attempt
to upgrade all of the failed servers within the server group or site, respectively.
Upgrading all servers in a server group is useful if the server group has multiple
upgrade failures. Upgrading all servers in a site is useful if servers in multiple
server groups contained in a site have upgrade failures.
Note:
Servers cannot be selected across tabs. If there are servers in
multiple server groups, you must restart the server upgrade for
each additional Server Group tab, or perform a server group or site
upgrade.
Note:
The active server in the NO server group never upgrades automatically.
Note:
The Entire Site sub-tab only appears when the site contains more
than one server group.
c. Select the individual server group(s) then click the Upgrade Server Group
button to start the upgrade on the selected server group(s).
4. To upgrade entire sites:
a. Navigate to the Upgrade page (Administration, and then Software
Management, and then Upgrade).
b. Select the SOAM site tab associated with the server(s) that raised the alarm.
Remain on the Entire Site sub-tab.
Note:
The Entire Site sub-tab only appears when the site contains more
than one server group.
c. Click Site Upgrade to upgrade all server groups in the site. (Do not select any
server groups.)
Description:
Site upgrade operation started.
Severity:
Info
Instance:
<SiteName>
HA Score:
Normal
OID:
tekelecLogSiteUpgradeStart
1. Recovery:
• No action required.
Description:
Site upgrade cancelled - validation failed.
Severity:
Info
Instance:
<SiteName>
HA Score:
Normal
OID:
tekelecLogSiteUpgradeCancelled
1. Recovery:
• No action required.
Description:
Site upgrade operation completed successfully.
Severity:
Info
Instance:
<SiteName>
HA Score:
Normal
OID:
tekelecLogSiteUpgradeSuccess
1. Recovery:
• No action required.
Description:
Site upgrade operation failed.
Severity:
Info
Instance:
<SiteName>
HA Score:
Normal
OID:
tekelecLogSiteUpgradeFailed
1. Recovery:
• No action required. Alarm 10134 - Server Upgrade Failed is raised for each server
in the site that failed to upgrade. The alarm clears when the server upgrades
successfully.
Description:
Site upgrade cancelled by user.
Severity:
Info
Instance:
<SiteName>
HA Score:
Normal
OID:
tekelecLogSiteUpgradeCancelledUser
1. Recovery:
• No action required.
Description:
Site upgrade operation failed.
Severity:
Major
Instance:
<SiteName>
HA Score:
Normal
OID:
tekelecLogSiteUpgradeFailed
1. Recovery:
• No action required. Alarm 10134 - Server Upgrade Failed is raised for each server
in the site that failed to upgrade. The alarm clears when the server upgrades
successfully.
Description:
The login operation was successful.
Severity:
Info
Instance:
N/A
HA Score:
Normal
OID:
tekelecLoginSuccessNotify
1. Recovery:
• No action required.
Description:
The login operation failed.
Severity:
Info
Instance:
N/A
HA Score:
Normal
OID:
tekelecLoginFailedNotify
1. Recovery:
Description:
The logout operation was successful.
Severity:
Info
Instance:
N/A
HA Score:
Normal
OID:
tekelecLogoutSuccessNotify
1. Recovery:
• No action required.
Description:
User account has been disabled due to multiple login failures.
Severity:
Minor
Instance:
N/A
HA Score:
Normal
OID:
tekelecAccountDisabledNotify
1. Recovery:
Description:
SAML login successful.
Severity:
Info
Instance:
N/A
HA Score:
Normal
OID:
tekelecSamlLoginSuccessNotify
1. Recovery:
Description:
An attempt to log in to the GUI via conventional login or via SSO login failed.
Severity:
Info
Instance:
N/A
HA Score:
Normal
OID:
tekelecSamlLoginFailed
1. Recovery:
1. Use correct username and password to log in.
2. For a failed SSO login, verify that SSO is properly configured. If the problem
persists, collect logs; it is recommended to contact My Oracle Support.
Description:
The remote database reinitialization is in progress. This alarm is raised on the active
NOAM server for the server being added to the server group.
Severity:
Minor
Instance:
<hostname of remote server>
HA Score:
Normal
OID:
apwSgDbReinitNotify
1. Recovery:
1. Check to see that the remote server is configured.
2. Make sure the remote server is responding to network connections.
3. If this does not clear the alarm, delete this server from the server group.
4. It is recommended to contact My Oracle Support if further assistance is needed.
Description:
SNMP trapping not configured for site.
Severity:
Minor
Instance:
<Hostname>
HA Score:
Normal
OID:
apwSnmpTrappingNotConfiguredForSite
1. Recovery:
• The SNMP trap configuration is in SITE mode. Configure SNMP for the site to
which <Hostname> belongs.
IDIH (11500-11549)
This section provides information and recovery procedures for IDIH alarms, which
range from 11500 to 11549.
Description:
IDIH trace has been suspended due to DA-MP (danger of) CPU congestion.
Severity:
Minor
Instance:
N/A
HA Score:
Normal
OID:
eagleXgDiameterTracingSuspendedAlarmNotify
1. Recovery:
• No action required. Tracing will resume once the danger of CPU congestion
subsides.
Description:
Troubleshooting trace has been throttled on some DA-MPs because IDIH TTR
bandwidth usage exceeded the provisioned limit.
Severity:
Minor
Instance:
N/A
HA Score:
Normal
OID:
eagleXgDiameterTracingThrottledAlarmNotify
1. Recovery:
• No action required.
Description:
A troubleshooting trace instance was started.
Severity:
Info
Instance:
<TraceInstanceId>
HA Score:
Normal
OID:
eagleXgDiameterIDIHTraceStartedNotify
1. Recovery:
• No action required.
Description:
A troubleshooting trace instance was stopped.
Severity:
Info
Instance:
<TraceInstanceId>
HA Score:
Normal
OID:
eagleXgDiameterIDIHTraceStoppedNotify
1. Recovery:
• No action required.
Description:
An IDIH-Trace AVP has been received with an invalid format.
Severity:
Info
Instance:
<TransConnName>
HA Score:
Normal
OID:
eagleXgDiameterInvalidIDIHTraceAvpNotify
1. Recovery:
1. If the message came from a peer that is not a DA-MP, verify the peer is not
modifying the AVP value (peers may retain the IDIH-Trace AVP unchanged, or
remove it entirely, at their discretion).
2. If the message came from a peer that is a DA-MP, it is recommended to contact
My Oracle Support if further assistance is needed.
Description:
A network trace could not be run at this site because the connection or peer
referenced by the trace scope value is not configured at this site. The trace still runs
at sites that have this entity configured.
Severity:
Info
Instance:
<TraceName>
HA Score:
Normal
OID:
eagleXgDiameterUnableToRunNetworkTraceAtThisSiteNotify
1. Recovery:
• No action required; the trace still runs at all sites that have the indicated object
configured at their site.
Description:
An error occurred during configuration of the network trace. Please delete the trace
definition.
Severity:
Minor
Instance:
<TraceName>
HA Score:
Normal
OID:
eagleXgDiameterNetworkTraceConfigurationErrorNotify
1. Recovery:
Description:
An error occurred during configuration of the site trace. Please delete the trace
definition.
Severity:
Minor
Instance:
<TraceName>
HA Score:
Normal
OID:
eagleXgDiameterSiteTraceConfigurationErrorNotify
1. Recovery:
Description:
Network trace is not active on this site. A temporary error occurred during the
activation of the network trace.
Severity:
Minor
Instance:
<TraceName>
HA Score:
Normal
OID:
eagleXgDiameterNetworkTraceActivationErrorNotify
1. Recovery:
• No action required.
Description:
Unable to connect via ComAgent to remote DIH server with hostname.
Severity:
Minor
Instance:
String of Configured DIH HostName
HA Score:
Normal
OID:
eagleXgDiameterInvalidDihHostNameAlarmNotify
1. Recovery:
• No action required.
SDS (14000-14999)
This section provides information and recovery procedures for SDS alarms and
events, ranging from 14000-14999.
Description:
Provisioning interface is manually disabled.
Severity:
Critical
Instance:
N/A
HA Score:
Normal
OID:
sdsProvInterfaceDisabled
1. Recovery:
1. xxx
2. Enable the interface to clear the alarm.
Description:
No remote provisioning clients are connected.
Severity:
Major
Instance:
N/A
HA Score:
Normal
OID:
sdsProvNoRemoteConnections
1. Recovery:
• The alarm will clear when at least one remote provisioning client is connected.
Description:
Provisioning client connection initialization failed due to an error specified in additional
information. See trace log for details. (CID=<Connection ID>, IP=<IP Address>).
Severity:
Major
Instance:
N/A
HA Score:
Normal
OID:
sdsProvConnectionFailed
1. Recovery:
Description:
Both XML and SOAP provisioning client connections are disabled because the same
port is configured for both.
Severity:
Major
Instance:
N/A
HA Score:
Normal
OID:
sdsProvBothPortIdentical
1. Recovery:
Description:
Provisioning client connection established.
Severity:
Info
Instance:
N/A
HA Score:
Normal
OID:
sdsProvConnectionEstablished
1. Recovery:
Description:
Provisioning client connection terminated due to the error specified in additional
information.
Severity:
Info
Instance:
N/A
HA Score:
Normal
OID:
sdsProvConnectionTerminated
1. Recovery:
Description:
Provisioning client connection denied due to the error specified in additional
information.
Severity:
Info
Instance:
N/A
HA Score:
Normal
OID:
sdsProvConnectionDenied
1. Recovery:
Description:
Provisioning import throttled to prevent overrunning database service processes.
Severity:
Minor
Instance:
N/A
HA Score:
Normal
OID:
sdsProvImportThrottled
1. Recovery:
Description:
Provisioning import failed due to the initialization error specified in additional
information. See trace log for details.
Severity:
Major
Instance:
provimport
HA Score:
Normal
OID:
sdsProvImportInitializationFailed
1. Recovery:
Description:
Provisioning import failed due to the import file execution error specified in the
additional information. See the trace log for details.
Severity:
Major
Instance:
provimport
HA Score:
Normal
OID:
sdsProvImportGenerationFailed
1. Recovery:
Description:
Provisioning import operation failed due to the file transfer error specified in additional
information. See trace log for details.
Severity:
Major
Instance:
provimport
HA Score:
Normal
OID:
sdsProvImportTransferFailed
1. Recovery:
• Alarm clears automatically after 12 hours or when the file transfer completes
successfully.
Description:
Provisioning export failed due to the initialization error specified in the additional
information. See trace log for details.
Severity:
Major
Instance:
provexport
HA Score:
Normal
OID:
sdsProvExportInitializationFailed
1. Recovery:
Description:
Provisioning export operation failed due to the export file generation error specified in
the additional information. See trace log for details.
Severity:
Major
Instance:
provexport
HA Score:
Normal
OID:
sdsProvExportGenerationFailed
1. Recovery:
Description:
Provisioning export operation failed due to the file transfer error specified in the
additional information. See trace log for details.
Severity:
Major
Instance:
provexport
HA Score:
Normal
OID:
sdsProvExportTransferFailed
1. Recovery:
Description:
All files were imported successfully.
Severity:
Info
Instance:
N/A
HA Score:
Normal
OID:
sdsProvImportOperationCompleted
1. Recovery:
Description:
All scheduled exports completed successfully.
Severity:
Info
Instance:
N/A
HA Score:
Normal
OID:
sdsProvExportOperationCompleted
1. Recovery:
Description:
Remote Audit started and is in progress.
Severity:
Info
Instance:
N/A
HA Score:
Normal
OID:
sdsProvRemoteAuditStartedAndInProgressNotify
1. Recovery:
Description:
Remote audit aborted.
Severity:
Info
Instance:
N/A
HA Score:
Normal
OID:
sdsProvRemoteAuditAbortedNotify
1. Recovery:
Description:
Remote audit failed to complete.
Severity:
Info
Instance:
N/A
HA Score:
Normal
OID:
sdsProvRemoteAuditFailedToCompleteNotify
1. Recovery:
Description:
Remote Audit completed successfully.
Severity:
Info
Instance:
N/A
HA Score:
Normal
OID:
sdsProvRemoteAuditCompletedNotify
1. Recovery:
Description:
A pending NPA split has been deleted by the user before it could become active on its
start date.
Severity:
Info
Instance:
N/A
HA Score:
Normal
OID:
sdsProvNpaSplitPendingRequestDeleted
1. Recovery:
Description:
NPA Split activation failed. See trace log for details.
Severity:
Info
Instance:
N/A
HA Score:
Normal
OID:
sdsProvNpaSplitActivationFailed
1. Recovery:
Description:
NPA Split started and is active.
Severity:
Info
Instance:
N/A
HA Score:
Normal
OID:
sdsProvNpaSplitActivated
1. Recovery:
Description:
NPA split completion failed. See trace log for details.
Severity:
Info
Instance:
N/A
HA Score:
Normal
OID:
sdsProvNpaSplitCompletionFailed
1. Recovery:
Description:
NPA split completed.
Severity:
Info
Instance:
N/A
HA Score:
Normal
OID:
sdsProvNpaSplitCompleted
1. Recovery:
Description:
Previously blacklisted MSISDN is now a routing entity.
Severity:
Info
Instance:
N/A
HA Score:
Normal
OID:
sdsProvMsisdnDeletedFromBlacklist
1. Recovery:
• No action necessary.
Description:
Previously blacklisted IMSI is now a routing entity.
Severity:
Info
Instance:
N/A
HA Score:
Normal
OID:
sdsProvImsiDeletedFromBlacklist
1. Recovery:
• No action necessary.
Description:
PdbRelay not connected.
• The SDS Command Log does not go back far enough to resume relaying
commands. A bulk load of HLRR is required.
• Neither Primary nor Disaster Recovery Virtual IP address is configured for the
HLRR.
• The connection is failing with the error shown in Additional Info.
Severity:
Major
Instance:
pdbrelay
HA Score:
Normal
OID:
sdsProvRelayNotConnectedNotify
1. Recovery:
1. Perform Bulk Load Procedure at the HLRR.
2. Configure the HLRR address in the SDS GUI.
3. Verify network connectivity with the HLRR.
Description:
The PdbRelay feature is enabled but is falling behind. The time between the
timestamps of the last record processed and the latest entry in the Command Log has
exceeded the time limit threshold.
• Critical: 27 minutes
• Major: 12 minutes
• Minor: 3 minutes
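The thresholds above can be illustrated with a minimal Python sketch. It is not the product's implementation: the function name relay_lag_severity and the table LAG_THRESHOLDS are hypothetical, and the sketch assumes that "exceeded" means strictly greater than the listed limit.

```python
from datetime import datetime, timedelta
from typing import Optional

# Thresholds taken from the alarm description, highest severity first.
LAG_THRESHOLDS = (
    ("Critical", timedelta(minutes=27)),
    ("Major", timedelta(minutes=12)),
    ("Minor", timedelta(minutes=3)),
)


def relay_lag_severity(last_processed: datetime,
                       latest_entry: datetime) -> Optional[str]:
    """Return the severity for the relay lag, or None if no threshold is exceeded.

    Assumption: a threshold is exceeded only when the lag is strictly
    greater than the limit.
    """
    lag = latest_entry - last_processed
    for severity, limit in LAG_THRESHOLDS:
        if lag > limit:
            return severity
    return None
```

For example, a last processed record stamped 12:00 and a latest Command Log entry stamped 12:13 give a 13-minute lag, which this sketch classifies as Major.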
Severity:
Critical, Major, Minor
Instance:
pdbrelay
HA Score:
Normal
OID:
sdsProvRelayTimeLagNotify
1. Recovery:
14198 - ProvDbException
Alarm Group:
PROV
Description:
The rate of ProvDbException errors has exceeded the threshold.
• Critical: 1000 errors per second
• Major: 100 errors per second
• Minor: Any occurrence
Severity:
Critical, Major, Minor
Instance:
ProvDbException, SDS
HA Score:
Normal
OID:
sdsProvDbExceptionNotify
1. Recovery:
• No action required.
Description:
The percent utilization of the DP Stack Event Queue is approaching its maximum
capacity.
Severity:
• Minor when utilization exceeds 60%.
• Major when utilization exceeds 80%.
• Critical when utilization exceeds 95%.
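As a rough illustration of the severity levels listed above, the following Python sketch maps a utilization percentage to a severity. The function name queue_utilization_severity is hypothetical, and the sketch assumes "exceeds" means strictly greater than the listed percentage; it does not model any clearing thresholds the product may apply.

```python
from typing import Optional


def queue_utilization_severity(percent: float) -> Optional[str]:
    """Map DP Stack Event Queue utilization (0-100) to an alarm severity.

    Returns None when utilization does not exceed the lowest threshold.
    """
    if percent > 95:
        return "Critical"
    if percent > 80:
        return "Major"
    if percent > 60:
        return "Minor"
    return None
```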
Instance:
N/A
HA Score:
Normal
OID:
sdsDpsStackEventQueueUtilizationNotify
1. Recovery:
Description:
Event responder failed due to an internal error.
Severity:
Major
Instance:
N/A
HA Score:
Normal
OID:
sdsEraResponderFailed
1. Recovery:
SS7/Sigtran (19200-19299)
This section provides information and recovery procedures for SS7/Sigtran alarms
ranging from 19200 - 19299.
Description:
Unable to access the SS7 Destination Point Code because the Remote Signaling
Point status is unavailable.
Severity:
Critical
Instance:
RSP Name
HA Score:
Normal
OID:
awpss7M3rlRspUnavailableNotify
1. Recovery:
1. RSP/Destination status can be monitored from the SOAM GUI by navigating to
SS7/Sigtran, and then Maintenance, and then Remote Signaling Points.
• If the RSP/Destination becomes unavailable due to a link set failure, the MP
server automatically attempts to recover all links not manually disabled.
• If the RSP/Destination becomes unavailable due to the receipt of a TFP,
the route's status is periodically audited by sending RST messages to the
adjacent point code which sent the TFP.
2. Navigate to SS7/Sigtran, and then Maintenance, and then Link Sets to check the
status of linkset links to the adjacent server.
3. Navigate to Transport Manager, and then Maintenance, and then Transport to
check the SCTP status to the adjacent server.
4. Verify IP network connectivity exists between the MP server and the adjacent
servers.
5. If all the connections to adjacent server are OK, then check the connections
between adjacent server and Remote Signaling Point. The specific steps depend
on the adjacent server type.
6. Check the event history logs for additional SS7 events or alarms from this MP
server.
7. Verify the adjacent server is not under maintenance.
8. If the problem persists, it is recommended to contact My Oracle Support for further
assistance.
Description:
Unable to access the SS7 Destination point code via this route.
Severity:
Minor
Instance:
<Route Name>
HA Score:
Normal
OID:
awpss7M3rlRouteUnavailableNotify
1. Recovery:
1. Route status can be monitored from SS7/Sigtran, and then Maintenance, and
then Remote Signaling Points.
• If the route becomes Unavailable due to a link set failure, the MP server will
attempt to automatically recover all links not manually disabled.
• If the route becomes Unavailable due to the receipt of a TFP, the route's status
will be periodically audited by sending RST messages to the adjacent point
code which sent the TFP.
2. Verify IP network connectivity exists between the MP server and the adjacent
servers.
3. Check the event history logs for additional SS7 events or alarms from this MP
server.
4. Verify the adjacent server is not under maintenance.
5. If the problem persists, it is recommended to contact My Oracle Support for further
assistance.
Description:
The SS7 link set to an adjacent signaling point has failed.
Severity:
Major
Instance:
<LinkSetName>
HA Score:
Normal
OID:
awpss7M3rlLinksetUnavailableNotify
1. Recovery:
1. The MP server will attempt to automatically recover all links not manually disabled.
2. Link set status can be monitored from SS7/Sigtran, and then Maintenance, and
then Linksets.
3. Verify IP network connectivity exists between the MP server and the adjacent
servers.
4. Check the event history logs for additional SS7 events or alarms from this MP
server.
5. Verify the adjacent server is not under maintenance.
6. If the problem persists, it is recommended to contact My Oracle Support for further
assistance.
Description:
M3UA has reported to M3RL that a link is out of service.
Severity:
Minor
Instance:
<Link Name>
HA Score:
Normal
OID:
awpss7M3rlLinkUnavailableNotify
1. Recovery:
1. The MP server will attempt to automatically recover all links not manually disabled.
2. Link status can be monitored from SS7/Sigtran, and then Maintenance, and then
Links.
3. Verify IP network connectivity exists between the MP server and the adjacent
servers.
4. Check the event history logs for additional SS7 events or alarms from this MP
server.
5. Verify the adjacent server is not under maintenance.
6. If the problem persists, it is recommended to contact My Oracle Support for further
assistance.
Description:
M3RL has started to use a lower priority (higher cost) route to route traffic toward a
given destination address, because the higher priority (lower cost) route specified for
that RSP/Destination has become Unavailable.
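The fallback behavior described above can be sketched in Python as follows. This is only an illustration of cost-based route selection, not M3RL's actual logic; the Route type and both function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Route:
    name: str
    cost: int        # lower cost means higher priority
    available: bool


def select_route(routes: Iterable[Route]) -> Optional[Route]:
    """Pick the lowest-cost route that is currently available, if any."""
    usable = [r for r in routes if r.available]
    return min(usable, key=lambda r: r.cost, default=None)


def preferred_route_unavailable(routes: Iterable[Route]) -> bool:
    """True when traffic falls back to a route costlier than the best configured one."""
    routes = list(routes)
    chosen = select_route(routes)
    if chosen is None:
        # No route at all is available; that is a different condition.
        return False
    best = min(routes, key=lambda r: r.cost)
    return chosen.cost > best.cost
```

With a primary route of cost 10 that is unavailable and a backup route of cost 20 that is available, select_route returns the backup and preferred_route_unavailable returns True.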
Severity:
Major
Instance:
RSP Name
HA Score:
Normal
OID:
awpss7M3rlPreferredRouteUnavailableNotify
1. Recovery:
1. If the preferred route becomes Unavailable due to the receipt of a TFP, the route's
status will be periodically audited by sending RST messages to the adjacent point
code which sent the TFP.
2. Route status can be monitored from SS7/Sigtran, and then Maintenance, and
then Remote Signaling Points.
3. Verify IP network connectivity exists between the MP server and the adjacent
servers.
4. Check the event history logs for additional SS7 events or alarms from this MP
server.
5. Verify the adjacent server is not under maintenance.
6. If the problem persists, it is recommended to contact My Oracle Support for further
assistance.
Description:
A TFP message was received by the M3RL layer; an adjacent point code has reported
that it no longer has any available routes to the RSP/Destination.
Severity:
Info
Instance:
N/A
HA Score:
Normal
OID:
awpss7M3rlTfpReceivedNotify
1. Recovery:
1. Monitor the RSP/Destination status from SS7/Sigtran, and then Maintenance,
and then Remote Signaling Points.
2. Follow local procedures to determine the reason why the PC was prohibited.
Description:
TFA message received by M3RL layer; an adjacent point code has reported it has an
available route to the RSP/Destination.
Severity:
Info
Instance:
N/A
HA Score:
Normal
OID:
awpss7M3rlTfaReceivedNotify
1. Recovery:
Description:
TFR message received by M3RL layer; an adjacent point code has reported an
available route to the RSP/Destination has a restriction/limitation.
Severity:
Info
Instance:
N/A
HA Score:
Normal
OID:
awpss7M3rlTfrReceivedNotify
1. Recovery:
1. Monitor the RSP/Destination status from SS7/Sigtran, and then Maintenance,
and then Remote Signaling Points.
2. Follow local procedures to determine the reason why the PC was prohibited.
Description:
TFC message received by M3RL layer; an adjacent or non-adjacent point code is
reporting the congestion level of an RSP/Destination.
Severity:
Info
Instance:
N/A
HA Score:
Normal
OID:
awpss7M3rlTfcReceivedNotify
1. Recovery:
1. RSP/Destination status can be monitored from SS7/Sigtran, and then
Maintenance, and then Remote Signaling Points.
2. Follow local procedures to determine the reason why the PC was prohibited.
Description:
A message was discarded due to a routing error.
Severity:
Info
Instance:
N/A
HA Score:
Normal
OID:
awpss7M3rlRoutingFailureNotify
1. Recovery:
1. Each MP's assigned point code can be monitored from SS7/Sigtran, and then
Configuration, and then Local Signaling Points.
2. If the event was caused by:
• The DPC of an egress message is not configured as a remote signaling point,
then look at the routing label in the event additional information, determine the
DPC, and verify the DPC is configured as an RSP.
• The DPC of an egress message is configured but not available for routing,
then look at the routing label in the event additional information, determine
the DPC, verify a route exists for the DPC, and use the RSP status screen to
verify a route is available for the RSP.
• The DPC of an ingress message does not match the TPC or CPC of the MP
server group, then either signaling is being misdirected by the STP toward the
MP, or the MP server’s LSP is misconfigured. Look at the routing label in the
event additional information for the OPC and DPC of the ingress message.
3. If a high number of these errors occurs, then an internal routing table problem
might exist. It is recommended to contact My Oracle Support if further assistance
is needed.
Description:
The message was discarded due to a routing error. The NI (Network Indicator) value
received in a message from the network is not assigned to the MP. This event is
generated under the following circumstances:
• The NI in the MTP3 routing label of the ingress message is not supported for the
given network signaling domain for a provisioned Local Signaling Point.
• For an ingress ANSI SCCP message, bit-8 in the SCCP CDPA address indicator
octet indicates the CDPA is encoded as per international specifications:
– A "0" in bit-8 indicates the address is international and both the address
indicator and the address are coded according to international specifications.
– A "1" in bit-8 indicates the address is national and both the address indicator
and the address are coded according to national specifications.
The NI cannot be International for ANSI messages, because the subsystem number
indicator and point code indicator fields are in the reverse order in the ITU
specification.
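The bit-8 check described above can be sketched in Python. The sketch assumes the usual SCCP bit numbering, where bit 1 is the least significant bit and bit 8 the most significant bit of the octet; the function and constant names are hypothetical.

```python
# Assumption: bits are numbered 1 (least significant) to 8 (most significant),
# so bit 8 corresponds to mask 0x80.
NATIONAL_INDICATOR_MASK = 0x80


def is_national_address_indicator(address_indicator: int) -> bool:
    """Return True if bit 8 of the CDPA address indicator octet is 1 (national)."""
    if not 0 <= address_indicator <= 0xFF:
        raise ValueError("address indicator must be a single octet (0-255)")
    return bool(address_indicator & NATIONAL_INDICATOR_MASK)
```

Under this reading, an ingress ANSI SCCP message whose address indicator yields False here (bit 8 is 0, international coding) is the case that generates the event.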
Severity:
Info
Instance:
N/A
HA Score:
Normal
OID:
awpss7M3rlRoutingFailureInvalidNiNotify
1. Recovery:
1. The Signaling Transfer Point or Signaling Gateway routing tables may be
inconsistent with the NI assigned to the MP. You can monitor each MP's assigned
NI value from SS7/Sigtran, and then Configuration, and then Remote Signaling
Points.
2. If the problem persists, it is recommended to contact My Oracle Support for further
assistance.
Description:
The message was discarded due to a routing error. The SI value received in a
message from the network is associated with a user part that is not currently
supported.
Severity:
Info
Instance:
RSP Name
HA Score:
Normal
OID:
awpss7M3rlRoutingFailureInvalidSiNotify
1. Recovery:
1. If the SI received is not a 0 (SNM) or 3 (SCCP), verify the STP/SG and the point
code that created the message have correct routing information.
2. If the problem persists, it is recommended to contact My Oracle Support for further
assistance.
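The SI check in step 1 can be expressed as a small lookup. This Python sketch only encodes the two SI values named above (0 for SNM, 3 for SCCP); the names SUPPORTED_SERVICE_INDICATORS and user_part_for_si are hypothetical.

```python
from typing import Optional

# SI values named in the recovery step above.
SUPPORTED_SERVICE_INDICATORS = {0: "SNM", 3: "SCCP"}


def user_part_for_si(si: int) -> Optional[str]:
    """Return the user part for a supported SI value, or None if unsupported."""
    return SUPPORTED_SERVICE_INDICATORS.get(si)
```

A result of None corresponds to the case where the routing information at the STP/SG and the originating point code should be verified.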
Description:
All configured links are down; either failed or disabled. No M3UA signaling is possible.
The node is isolated from the network. All M3UA connectivity to the SS7/Sigtran
network has either failed or has been manually disabled.
Severity:
Critical
Instance:
N/A
HA Score:
Normal
OID:
awpss7M3rlNodeIsolatedAllLinkDownNotify
1. Recovery:
1. On the active SO, navigate to SS7/Sigtran, and then Maintenance, and then
Links to check whether any of the links are manually disabled that should not be.
If so, click Enable to enable the manually disabled links.
2. On the active SO, navigate to Transport Manager, and then Maintenance, and
then Transport to verify the transports are enabled.
3. Go to the specific SS7MP and verify the IP address and NIC status.
4. On the specific SS7MP, verify the adjacent server IP address is available.
5. View the active alarms and event history logs by navigating to Alarms & Events,
and then View Active and Alarms & Events, and then View History. Look for
significant events that may affect the IP network, associations, or links.
6. If the problem persists, it is recommended to contact My Oracle Support for further
assistance.
Description:
When an association is in the Enabled administrative state, part of the association
initialization involves sending an ASP-UP from the MP server and receiving an
ASP-UP-ACK from the adjacent server. If ASP-UP is sent, but no ASP-UP-ACK is
received within State Management ACK Timer milliseconds, this event is generated
and the ASP-UP is attempted again. ASP-UP attempts will continue indefinitely until
Severity:
Info
Instance:
<AssocName>
HA Score:
Normal
OID:
awpss7TimedOutWaitingForAspUpAckNotify
1. Recovery:
1. Verify the adjacent server on the Signaling Gateway is not under maintenance.
2. Verify the timer value for State Management ACK Timer is not set too short to
allow the adjacent server to respond with an ASP-UP-ACK. This should be rare if
the network is not congested.
3. If the problem persists, it is recommended to contact My Oracle Support for further
assistance.
Description:
The adjacent server at the specified IP address and port has sent an ASP-DOWN-
ACK, but not in response to an ASP-DOWN message from the MP server. Normally
this indicates the far-end of the association is being taken down for maintenance. If
the association administrative state is Enabled, the MP server automatically attempts
to bring the association back to ASP-UP. This is done by sending an ASP-UP.
The MP server continues to send ASP-UP until an ASP-UP-ACK is received, the
SCTP association comes down, or the association administrative state is changed to
Blocked or Disabled.
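The stop conditions for the ASP-UP retries described above can be summarized in a short Python sketch. It is a simplified illustration, not the MP server's state machine; the AdminState enum and the function name should_resend_asp_up are hypothetical.

```python
from enum import Enum


class AdminState(Enum):
    ENABLED = "Enabled"
    BLOCKED = "Blocked"
    DISABLED = "Disabled"


def should_resend_asp_up(ack_received: bool,
                         sctp_association_up: bool,
                         admin_state: AdminState) -> bool:
    """Return True while another ASP-UP should be sent.

    Retries stop when an ASP-UP-ACK is received, the SCTP association
    comes down, or the administrative state is Blocked or Disabled.
    """
    return (not ack_received
            and sctp_association_up
            and admin_state is AdminState.ENABLED)
```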
Severity:
Info
Instance:
<AssocName>
HA Score:
Normal
OID:
awpss7ReceivedUnsolicitedAspDownAckNotify
1. Recovery:
1. Verify the adjacent server on the Signaling Gateway is not under maintenance.
2. If the problem persists, it is recommended to contact My Oracle Support for further
assistance.
Description:
No ASP-ACTIVE-ACK is received in response to an ASP-ACTIVE message on the
link within State Management ACK Timer milliseconds.
Severity:
Info
Instance:
<LinkName>
HA Score:
Normal
OID:
awpss7TimedOutWaitingForAspActiveAckNotify
1. Recovery:
1. Verify the adjacent server on the Signaling Gateway is not under maintenance.
2. Verify the timer value for State Management ACK Timer is not set too short to
allow the adjacent server to respond with an ASP-ACTIVE-ACK. This should be
rare if the network is not congested.
3. If the problem persists, it is recommended to contact My Oracle Support for further
assistance.
Description:
An unsolicited ASP-INACTIVE-ACK is received on the link.
Severity:
Info
Instance:
<LinkName>
HA Score:
Normal
OID:
awpss7ReceivedUnsolicitedAspInactiveAckNotify
1. Recovery:
1. Verify the adjacent server on the Signaling Gateway is not under maintenance.
2. If the problem persists, it is recommended to contact My Oracle Support for further
assistance.
Description:
The far-end has sent an invalid M3UA message to which the MP server has
responded with an M3UA ERROR message.
Severity:
Info
Instance:
<LinkName> or <AssocName>
Note:
Information about the type of error and the accompanying diagnostic data is
included in the event additional information.
HA Score:
Normal
OID:
awpss7ReceivedInvalidM3uaMessageNotify
1. Recovery:
1. Examine the M3UA error code and the diagnostic information and attempt to
determine why the far-end of the link sent the malformed message.
• Error code 0x01 indicates an invalid M3UA protocol version. Only version 1 is
supported.
• Error code 0x03 indicates an unsupported M3UA message class.
• Error code 0x04 indicates an unsupported M3UA message type.
• Error code 0x07 indicates an M3UA protocol error. The message contains a
syntactically correct parameter that does not belong in the message or occurs
too many times in the message.
• Error code 0x11 indicates an invalid parameter value. Parameter type and
length are valid, but value is out of range.
• Error code 0x12 indicates a parameter field error. Parameter is malformed
(e.g., invalid length).
• Error code 0x13 indicates an unexpected parameter. Message contains an
undefined parameter. The differences between this error and "Protocol Error"
are subtle. Protocol Error is used when the parameter is recognized, but not
intended for the type of message that contains it. Unexpected Parameter is
used when the parameter identifier is not known.
• Error code 0x16 indicates a missing parameter. Missing mandatory parameter,
or missing required conditional parameter.
• Error code 0x19 indicates an invalid routing context. Received routing context
not configured for any linkset using the association on which the message was
received.
2. If the problem persists, it is recommended to contact My Oracle Support for further
assistance.
Description:
An attempt to send an M3UA non-DATA message has failed. Non-DATA messages
include SSNM, ASPSM, ASPTM, and MGMT messages. The message has been
discarded. Possible reasons for the failure include:
• The far-end is slow to acknowledge the SCTP packets sent by the MP server,
causing the MP server’s SCTP send buffer to fill up to the point where the
message cannot be queued for sending.
• The socket has closed just as the send was being processed.
Severity:
Info
Instance:
<LinkName> or <AssocName>
Note:
Information about the type of error and the accompanying diagnostic data is
included in the event additional information.
HA Score:
Normal
OID:
awpss7FailedToSendNonDataMessageNotify
1. Recovery:
1. Select Alarms & Events, and then View History and check the event history logs
for additional SS7 events or alarms from this MP server.
2. Verify the adjacent server on the Signaling Gateway is not under congestion. The
MP server will have alarms to indicate the congestion if this is the case.
3. If the problem persists, it is recommended to contact My Oracle Support for further
assistance.
Description:
The link administrative state is manually changed from one administrative state to
another.
Severity:
Info
Instance:
<LinkName>
HA Score:
Normal
OID:
awpss7LocalLinkMaintenanceStateChangeNotify
1. Recovery:
1. No action required if this was an expected change due to some maintenance
activity. Otherwise, security logs can be examined on the SOAM server to
determine which user changed the administrative state.
2. If the problem persists, it is recommended to contact My Oracle Support for further
assistance.
Description:
An M3UA ERROR message is received from the adjacent server.
Severity:
Info
Instance:
<LinkName> or <AssocName>
Note:
Information about the type of error and the accompanying diagnostic data is
included in the event additional information.
HA Score:
Normal
OID:
awpss7ReceivedM3uaErrorNotify
1. Recovery:
1. Examine the M3UA error code and the diagnostic information and attempt to
determine why the far-end of the link sent the ERROR message.
• Error code 0x01 indicates an invalid M3UA protocol version. Only version 1 is
supported.
• Error code 0x03 indicates an unsupported M3UA message class.
• Error code 0x04 indicates an unsupported M3UA message type.
• Error code 0x05 indicates an unsupported M3UA traffic mode.
• Error code 0x07 indicates an M3UA protocol error. The message contains a
syntactically correct parameter that does not belong in the message or occurs
too many times in the message.
• Error code 0x09 indicates an invalid SCTP stream identifier. A DATA message
was sent on stream 0.
• Error code 0x0D indicates the message was refused due to management
blocking. An ASP Up or ASP Active message was received, but refused for
management reasons.
• Error code 0x11 indicates an invalid parameter value. Parameter type and
length are valid, but value is out of range.
• Error code 0x12 indicates a parameter field error. Parameter is malformed
(e.g., invalid length).
• Error code 0x13 indicates an unexpected parameter. Message contains an
undefined parameter. The differences between this error and "Protocol Error"
are subtle. Protocol Error is used when the parameter is recognized, but not
intended for the type of message that contains it. Unexpected Parameter is
used when the parameter identifier is not known.
• Error code 0x14 indicates the destination status is unknown. This message
can be sent in response to a DAUD from the MP server if the SG cannot or
does not wish to provide the destination status or congestion information.
• Error code 0x16 indicates a missing parameter. A mandatory parameter or a
required conditional parameter is missing.
• Error code 0x19 indicates an invalid routing context. Received routing context
not configured for any linkset using the association on which the message was
received.
2. If the problem persists, it is recommended to contact My Oracle Support if further
assistance is needed.
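The error codes listed in step 1 can be kept in a small lookup table when reviewing event additional information. The following is a minimal sketch, not part of the product: the function name describe_m3ua_error is hypothetical, and the table contains only the codes listed in this section.

```python
# Meanings transcribed from the recovery list in this section.
M3UA_ERROR_CODES = {
    0x01: "Invalid version (only M3UA version 1 is supported)",
    0x03: "Unsupported message class",
    0x04: "Unsupported message type",
    0x05: "Unsupported traffic mode",
    0x07: "Protocol error",
    0x09: "Invalid SCTP stream identifier",
    0x0D: "Refused due to management blocking",
    0x11: "Invalid parameter value",
    0x12: "Parameter field error",
    0x13: "Unexpected parameter",
    0x14: "Destination status unknown",
    0x16: "Missing parameter",
    0x19: "Invalid routing context",
}


def describe_m3ua_error(code: int) -> str:
    """Return a readable description for an M3UA error code (hypothetical helper)."""
    meaning = M3UA_ERROR_CODES.get(code)
    if meaning is None:
        return f"0x{code:02X}: not listed in this section"
    return f"0x{code:02X}: {meaning}"
```

For example, describe_m3ua_error(0x19) returns the routing context description, which points to a routing context that is not configured for any linkset using the association.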
Description:
The status of the remote SCCP subsystem has changed to Prohibited.
Severity:
Minor
Instance:
<RMU>
HA Score:
Normal
OID:
awpss7RemoteSccpSubsystemProhibitedNotify
1. Recovery:
1. You can monitor destination status from SS7/Sigtran, and then Maintenance, and
then Remote Signaling Points. You can monitor RMU/subsystem status from
SS7/Sigtran, and then Maintenance, and then Remote MTP3 Users.
• If the subsystem's status changed to Prohibited because SCMG received an
SSP message, an audit of the status of the RMU via the SCCP subsystem
status test (SST) procedure is performed.
• If the subsystem's status changed to Prohibited because SCCP received
an MTP-PAUSE indication from M3RL, then recovery actions of restoring the
RSP/Destination status to Available will be invoked by M3RL.
• If the subsystem's status changed to Prohibited because SCCP received
an MTP STATUS cause=unequipped user indication from M3RL, then no
automatic recovery will be initiated. Only manual action at the remote node
can correct a remote point code that has not been configured with SCCP.
• If the subsystem's status changed to Prohibited because SCCP received
an MTP STATUS cause=unknown or inaccessible indication from M3RL,
then SCCP will automatically invoke subsystem status testing depending upon
the network type:
– ANSI: subsystem status testing of all RMUs associated with the point
code.
– ITU: subsystem status testing of SCMG (SSN=1) associated with the point
code.
2. Verify IP network connectivity exists between the MP server and the adjacent
servers.
3. Select Alarms & Events, and then View History and check the event history logs
for additional SS7 events or alarms from this MP server.
4. Verify the adjacent server is not under maintenance.
5. Follow local procedures to determine the reason why the far-end SSN is down. If it
is not down, but it continues to be reported as down, it is recommended to contact
My Oracle Support if further assistance is needed.
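The four causes described in step 1 each lead to a different automatic recovery action. The following sketch summarizes that mapping as a decision function. The names Cause and automatic_recovery are hypothetical and exist only for illustration; the returned strings paraphrase the text above.

```python
from enum import Enum


class Cause(Enum):
    """Reasons the remote subsystem became Prohibited, as listed in step 1."""
    SSP = "SCMG received an SSP message"
    MTP_PAUSE = "MTP-PAUSE indication from M3RL"
    UNEQUIPPED_USER = "MTP STATUS cause=unequipped user"
    UNKNOWN_OR_INACCESSIBLE = "MTP STATUS cause=unknown or inaccessible"


def automatic_recovery(cause: Cause, network_type: str = "ITU") -> str:
    """Return the automatic recovery action described for each cause."""
    if cause is Cause.SSP:
        return "SST audit of the RMU"
    if cause is Cause.MTP_PAUSE:
        return "M3RL restores RSP/Destination status"
    if cause is Cause.UNEQUIPPED_USER:
        return "none; manual action required at the remote node"
    # Remaining cause: unknown or inaccessible. The action depends on network type.
    network = network_type.upper()
    if network == "ANSI":
        return "SST of all RMUs associated with the point code"
    if network == "ITU":
        return "SST of SCMG (SSN=1) associated with the point code"
    raise ValueError(f"unsupported network type: {network_type}")
```

Only the unequipped user cause has no automatic recovery, so it is the case most likely to need the manual steps that follow.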
Description:
SCCP discarded an ingress message because the Message Type is not currently
supported. The following connectionless message types are supported: UDT, XUDT,
UDTS, and XUDTS. The following SCMG Message Types are supported: SSA, SSP,
and SST.
Severity:
Info
Instance:
N/A
HA Score:
Normal
OID:
awpss7SccpMsgTypeUnrecognizedNotify
1. Recovery:
1. Investigate:
• If the originator of the message is misconfigured.
• If the network is misconfigured, causing messages to be routed to the wrong
RSP/Destination.
• If the message type is currently unsupported.
2. If the problem persists, it is recommended to contact My Oracle Support if further
assistance is needed.
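When investigating whether a discarded message type is unsupported, the supported sets named in the description can be checked directly. This is a minimal sketch; the function name is_supported is hypothetical, and message types are matched by name only.

```python
# Supported types as named in the alarm description.
SUPPORTED_CONNECTIONLESS = frozenset({"UDT", "XUDT", "UDTS", "XUDTS"})
SUPPORTED_SCMG = frozenset({"SSA", "SSP", "SST"})


def is_supported(message_type: str, scmg: bool = False) -> bool:
    """Return True if the named message type is in the supported set."""
    supported = SUPPORTED_SCMG if scmg else SUPPORTED_CONNECTIONLESS
    return message_type.upper() in supported
```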
Description:
SCCP discarded an ingress message because a Hop Counter violation was detected.
Severity:
Info
Instance:
N/A
HA Score:
Normal
OID:
awpss7SccpHopCounterViolationNotify
1. Recovery:
1. One of the following conditions causes this error:
• The originator of the message is setting the initial value too low.
• The message is being rerouted too many times by the STPs, possibly because
of an STP routing misconfiguration that has caused message looping.
2. It is recommended to contact My Oracle Support if further assistance is needed.
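Both conditions in step 1 can be illustrated with a simplified model of the hop counter. The sketch below assumes each translating node decrements the counter by one and discards the message when the result is zero. The function name and the node counts are hypothetical, and the exact product behavior may differ.

```python
def message_survives(initial_hop_counter: int, translation_nodes: int) -> bool:
    """Return True if the message passes every translating node.

    Simplified model: each node decrements the counter and reports a
    hop counter violation when the decremented value reaches zero.
    """
    counter = initial_hop_counter
    for _ in range(translation_nodes):
        counter -= 1
        if counter == 0:
            return False  # hop counter violation: message discarded
    return True
```

An initial value of 2 fails on a path with two translating nodes (initial value too low), while a routing loop exhausts any initial value because the number of translations is unbounded.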
Description:
SCCP was unable to route or process a message during SCCP processing for
reasons (other than a global title translation failure or a detected SCCP loop)
possibly requiring operator intervention.
Severity:
Info
Instance:
N/A
HA Score:
Normal
OID:
awpss7SccpRoutingFailureNotify
1. Recovery:
1. These failures are typically associated with invalid information received in the
SCCP messages. Check for the following:
Description:
SCCP was unable to route or process a message during SCCP processing due to
transient conditions such as RSP/destination failures and remote or local subsystem
failures.
Severity:
Info
Instance:
N/A
HA Score:
Normal
OID:
awpss7SccpRoutingFailureNetworkStatusNotify
1. Recovery:
1. Monitor status on the GUI Main Menu as follows:
• Destination status from SS7/Sigtran, and then Maintenance, and then
Remote Signaling Points.
• RMU/subsystem status from SS7/Sigtran, and then Configuration, and then
Remote MTP3 Users.
• Local subsystem status from SS7/Sigtran, and then Maintenance, and then
Local SCCP Users.
2. Verify IP network connectivity exists between the MP server and the adjacent
servers.
3. Check the event history logs for additional SS7 events or alarms from this MP
server.
4. Verify the adjacent server is not under maintenance.
5. If the problem persists, it is recommended to contact My Oracle Support if further
assistance is needed.
Description:
SCCP Global Title Translation has failed to determine a destination for a PDU. SCCP
is invoking the message return procedure.
Severity:
Info
Instance:
N/A
HA Score:
Normal
OID:
awpss7SccpGttFailureNotify
1. Recovery:
1. Global title translation has failed. For the cause of the failure, look at the SCCP
return cause and the called party address information in the event additional
information field. Look for the following items:
• Missing global title translation data.
• Incorrect called party address information in the ingress message.
• Point code paused or congested.
• Subsystem prohibited or congested.
2. If the problem persists, it is recommended to contact My Oracle Support if further
assistance is needed.
Description:
The status of the local SCCP subsystem has changed to Prohibited. This alarm is
raised for one of the following conditions:
• When a new local SSN is configured and is in the disabled state.
• When a GUI maintenance operation is performed to disable the state of the local
SSN.
• On a system restart where the local SSN was in the disabled state prior to the
system restart.
Severity:
Major
Instance:
<LSP>, <SSN>
HA Score:
Normal
OID:
awpss7SCCPLocalSubsystemProhibitedNotify
1. Recovery:
Description:
SCCP Segmentation Procedure Failure
Severity:
Info
Instance:
N/A
HA Score:
Normal
OID:
awpss7SccpSegmentationFailureNotify
1. Recovery:
Description:
SCCP Reassembly Procedure Failure
Severity:
Info
Instance:
N/A
HA Score:
Normal
OID:
awpss7SccpReassemblyFailureNotify
1. Recovery:
1. This condition indicates reassembly procedure failure at the SCCP layer:
• Reassembly time expired
• Out of sequence segments
• Internal error
2. Determine if the problem is a result of routing decision errors or latency from the
SS7 network.
3. It is recommended to contact My Oracle Support if further assistance is needed.
Description:
The SS7 process, which is responsible for handling all SS7 traffic, is approaching or
exceeding its engineered traffic handling capacity.
Severity:
Minor, Major, or Critical as shown in the GUI under Alarms & Events, and then View
Active.
Instance:
N/A
HA Score:
Normal
OID:
awpss7Ss7ProcessCpuUtilizationNotify
1. Recovery:
1. If one or more MPs in a server site have failed, the traffic will be distributed among
the remaining MPs in the server site. You can monitor MP server status from
Status & Manage, and then Server.
2. The misconfiguration of STP routing may result in too much traffic being
distributed to the MP. You can monitor the ingress traffic rate of each MP from
Status & Manage, and then KPIs. Each MP in the server site should be receiving
approximately the same number of ingress transactions per second.
3. There may be an insufficient number of MPs configured to handle the network
traffic load. You can monitor the ingress traffic rate of each MP from Status &
Manage, and then KPIs. If all MPs are in a congestion state, then the offered load
to the server site is exceeding its capacity.
4. The SS7 process may be experiencing problems. You can monitor the alarm log
from Alarms & Events, and then View Active.
5. If the problem persists, it is recommended to contact My Oracle Support if further
assistance is needed.
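The balance check in step 2 can be sketched as a comparison of each MP's ingress rate against the typical rate for the site. The function name unbalanced_mps, the MP names, and the 20% tolerance are all assumptions for illustration. The rates themselves would be read manually from the KPI screen.

```python
from statistics import median
from typing import Dict, List


def unbalanced_mps(ingress_rates: Dict[str, float], tolerance: float = 0.20) -> List[str]:
    """Return the MPs whose ingress rate deviates from the site median.

    tolerance is a hypothetical threshold (fraction of the median), not a
    product setting. The median is used so one overloaded MP does not
    skew the baseline.
    """
    if not ingress_rates:
        return []
    typical = median(ingress_rates.values())
    if typical == 0:
        return []
    return sorted(
        name
        for name, rate in ingress_rates.items()
        if abs(rate - typical) / typical > tolerance
    )
```

With rates of 1000, 1050, and 2500 transactions per second, only the third MP is flagged, which suggests a routing misconfiguration rather than a site-wide capacity shortfall.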
Description:
The ingress message rate (messages per second) for the MP is approaching or
exceeding its engineered traffic handling capacity.
Severity:
Minor, Major, Critical as shown in the GUI under Alarms & Events, and then View
Active.
Instance:
N/A
HA Score:
Normal
OID:
awpss7IngressMsgRateNotify
1. Recovery:
1. If one or more MPs in a server site have failed, the traffic will be distributed among
the remaining MPs in the server site. You can monitor MP server status from
Status & Manage, and then Server.
2. The misconfiguration of STP routing may result in too much traffic being
distributed to the MP. You can monitor the ingress traffic rate of each MP from
Status & Manage, and then KPIs. Each MP in the server site should be receiving
approximately the same number of ingress transactions per second.
3. There may be an insufficient number of MPs configured to handle the network
traffic load. You can monitor the ingress traffic rate of each MP from Status &
Manage, and then KPIs. If all MPs are in a congestion state, then the offered load
to the server site is exceeding its capacity.
4. If the problem persists, it is recommended to contact My Oracle Support if further
assistance is needed.
Description:
The percent utilization of the MP's PDU buffer pool is approaching its maximum
capacity. If this problem persists and the pool reaches 100% utilization, all new
ingress messages will be discarded.
Severity:
Minor, Major, Critical as shown in the GUI under Alarms & Events, and then View
Active.
Instance:
<PoolName> Values: ANSI, ITUI, ITUN
HA Score:
Normal
OID:
awpss7PduBufferPoolUtilNotify
1. Recovery:
1. If one or more MPs in a server site have failed, the traffic will be distributed among
the remaining MPs in the server site. You can monitor MP server status from
Status & Manage, and then Server.
2. The misconfiguration of STP routing may result in too much traffic being
distributed to the MP. You can monitor the ingress traffic rate of each MP from
Status & Manage, and then KPIs. Each MP in the server site should be receiving
approximately the same number of ingress transactions per second.
3. There may be an insufficient number of MPs configured to handle the network
traffic load. You can monitor the ingress traffic rate of each MP from Status &
Manage, and then KPIs. If all MPs are in a congestion state, then the offered load
to the server site is exceeding its capacity.
4. A software defect may exist resulting in PDU buffers not being de-allocated to the
pool when a PDU is successfully transmitted into the network. This alarm should
not normally occur when no other congestion alarms are asserted. Examine the
alarm log from Alarms & Events, and then View Active.
5. If the problem persists, it is recommended to contact My Oracle Support if further
assistance is needed.
Description:
The percent utilization of the MP's SCCP stack event queue is approaching its
maximum capacity.
Severity:
Minor, Major, Critical as shown in the GUI under Alarms & Events, and then View
Active.
Instance:
N/A
HA Score:
Normal
OID:
awpss7SccpStackEventQueueUtilNotify
1. Recovery:
1. If one or more MPs in a server site have failed, the traffic will be distributed among
the remaining MPs in the server site. You can view MP server status from Status
& Manage, and then Server.
2. The misconfiguration of STP routing may result in too much traffic being
distributed to the MP. You can monitor the ingress traffic rate of each MP from
Status & Manage, and then KPIs. Each MP in the server site should be receiving
approximately the same number of ingress transactions per second.
3. There may be an insufficient number of MPs configured to handle the network
traffic load. You can monitor the ingress traffic rate of each MP from Status &
Manage, and then KPIs. If all MPs are in a congestion state, then the offered load
to the server site is exceeding its capacity.
4. If no additional congestion alarms are asserted, the SCCP Stack Event thread
may be experiencing a problem preventing it from processing events from its event
queue. Examine the alarm log under Alarms & Events, and then View Active.
5. If the problem persists, it is recommended to contact My Oracle Support if further
assistance is needed.
Description:
The percent utilization of the MP's M3RL Stack Event Queue is approaching its
maximum capacity.
Severity:
Minor, Major, Critical as shown in the GUI under Alarms & Events, and then View
Active.
Instance:
N/A
HA Score:
Normal
OID:
awpss7M3rlStackEventQueueUtilNotify
1. Recovery:
1. If one or more MPs in a server site have failed, the traffic will be distributed among
the remaining MPs in the server site. You can view MP server status from Status
& Manage, and then Server.
2. The misconfiguration of STP routing may result in too much traffic being
distributed to the MP. You can monitor the ingress traffic rate of each MP from
Status & Manage, and then KPIs. Each MP in the server site should be receiving
approximately the same number of ingress transactions per second.
3. There may be an insufficient number of MPs configured to handle the network
traffic load. You can monitor the ingress traffic rate of each MP from Status &
Manage, and then KPIs. If all MPs are in a congestion state, then the offered load
to the server site is exceeding its capacity.
4. If no additional congestion alarms are asserted, the M3RL Stack Event thread may
be experiencing a problem preventing it from processing events from its event
queue. Examine the alarm log from Alarms & Events, and then View Active.
5. If the problem persists, it is recommended to contact My Oracle Support if further
assistance is needed.
Description:
The percent utilization of the MP's M3RL Network Management Event Queue is
approaching its maximum capacity.
Severity:
Minor, Major, Critical as shown in the GUI under Alarms & Events, and then View
Active.
Instance:
N/A
HA Score:
Normal
OID:
awpss7M3rlNetMgmtEventQueueUtilNotify
1. Recovery:
1. If one or more MPs in a server site have failed, the traffic will be distributed among
the remaining MPs in the server site. You can view MP server status from Status
& Manage, and then Server.
2. The misconfiguration of STP routing may result in too much traffic being
distributed to the MP. You can monitor the ingress traffic rate of each MP under
Status & Manage, and then KPIs. Each MP in the server site should be receiving
approximately the same number of ingress transactions per second.
3. There may be an insufficient number of MPs configured to handle the network
traffic load. You can monitor the ingress traffic rate of each MP under Status &
Manage, and then KPIs. If all MPs are in a congestion state, then the offered load
to the server site is exceeding its capacity.
4. If no additional congestion alarms are asserted, the M3RL Network Management
Event thread may be experiencing a problem preventing it from processing events
from its event queue. Examine the alarm log from Alarms & Events, and then
View Active.
5. If the problem persists, it is recommended to contact My Oracle Support if further
assistance is needed.
Description:
The percent utilization of the MP's M3UA Stack Event Queue is approaching its
maximum capacity.
Severity:
Minor, Major, Critical as shown in the GUI under Alarms & Events, and then View
Active.
Instance:
N/A
HA Score:
Normal
OID:
awpss7M3uaStackEventQueueUtilNotify
1. Recovery:
1. If one or more MPs in a server site have failed, the traffic will be distributed among
the remaining MPs in the server site. You can view MP server status from Status
& Manage, and then Server.
2. The misconfiguration of STP routing may result in too much traffic being
distributed to the MP. You can monitor the ingress traffic rate of each MP from
Status & Manage, and then KPIs. Each MP in the server site should be receiving
approximately the same number of ingress transactions per second.
3. There may be an insufficient number of MPs configured to handle the network
traffic load. You can monitor the ingress traffic rate of each MP from Status &
Manage, and then KPIs. If all MPs are in a congestion state, then the offered load
to the server site is exceeding its capacity.
4. If no additional congestion alarms are asserted, the M3UA Stack Event thread
may be experiencing a problem preventing it from processing events from its event
queue. Examine the alarm log from Alarms & Events, and then View Active.
5. If the problem persists, it is recommended to contact My Oracle Support if further
assistance is needed.
Description:
The percent utilization of events queued to all SCTP associations on the MP server is
approaching maximum capacity.
Severity:
Minor, Major, Critical as shown in the GUI under Alarms & Events, and then View
Active.
Instance:
N/A
HA Score:
Normal
OID:
awpss7SctpAggregateAssocWriteQueueUtilNotify
1. Recovery:
1. An IP network or STP/SG problem may exist preventing SCTP from transmitting
messages into the network on multiple Associations at the same pace that
messages are being received from the network.
2. One or more SCTP Association Writer threads may be experiencing a problem
preventing it from processing events from its event queue. Examine the alarm log
from Alarms & Events, and then View Active.
3. If one or more MPs in a server site have failed, the traffic will be distributed among
the remaining MPs in the server site. You can view MP server status from Status
& Manage, and then Server.
4. The misconfiguration of STP routing may result in too much traffic being
distributed to the MP. You can monitor the ingress traffic rate of each MP from
Status & Manage, and then KPIs. Each MP in the server site should be receiving
approximately the same number of ingress transactions per second.
5. There may be an insufficient number of MPs configured to handle the network
traffic load. You can monitor the ingress traffic rate of each MP from Status &
Manage, and then KPIs. If all MPs are in a congestion state, then the offered load
to the server site is exceeding its capacity.
6. If the problem persists, it is recommended to contact My Oracle Support if further
assistance is needed.
Description:
Operation discarded due to local resource limitation.
Severity:
Info
Instance:
Application name
HA Score:
Normal
OID:
awpss7TcapOpDiscardedLocalResLimitNotify
1. Recovery:
1. Determine if this condition indicates a software problem or unexpected TC User
behavior.
2. It is recommended to contact My Oracle Support if further assistance is needed.
Description:
Transaction could not be delivered to remote TCAP peer due to conditions in the
network.
Severity:
Info
Instance:
Application name
HA Score:
Normal
OID:
awpss7TcapTransNotDeliveredToPeerNotify
1. Recovery:
1. This event indicates an SCCP service message (UDTS or XUDTS) was received
from the network, meaning that the TCAP message could not be delivered to the
remote TCAP peer. The event additional information field contains the first 80
octets of the SS7 message starting with the MTP3 routing label. This data can be
used to determine the routing instructions for the message.
2. Verify the routing is configured correctly for the destination. If the routing
configuration is correct, determine why the remote TCAP peer is not available.
3. It is recommended to contact My Oracle Support if further assistance is needed.
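The octets in the event additional information start with the MTP3 routing label, so the destination and originating point codes can be extracted from the first four octets. The sketch below assumes an ITU network with 14-bit point codes and assumes the event data preserves the on-the-wire octet order (least significant octet first). ANSI networks use 24-bit point codes and a different layout. The function name is hypothetical.

```python
def parse_itu_routing_label(octets: bytes) -> dict:
    """Decode DPC, OPC, and SLS from an ITU MTP3 routing label.

    Layout assumed (32 bits, least significant octet first):
    DPC in bits 0-13, OPC in bits 14-27, SLS in bits 28-31.
    """
    if len(octets) < 4:
        raise ValueError("an ITU routing label needs at least 4 octets")
    word = int.from_bytes(octets[:4], "little")
    return {
        "dpc": word & 0x3FFF,
        "opc": (word >> 14) & 0x3FFF,
        "sls": (word >> 28) & 0x0F,
    }
```

The decoded DPC identifies the destination whose routing should be verified in step 2.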
Description:
Operation discarded due to malformed component received from remote TCAP peer.
Severity:
Info
Instance:
Application name
HA Score:
Normal
OID:
awpss7TcapMalformedComponentFromRemoteNotify
1. Recovery:
1. This event indicates a TCAP component was received from the remote TCAP peer
that could not be successfully decoded.
2. The event additional information field includes the reason why the decoding failed,
plus the first 80 octets of the message starting with the MTP3 routing label. The
message data can be used to determine the source of the malformed message.
3. It is recommended to contact My Oracle Support if further assistance is needed.
Description:
Transaction discarded due to malformed dialogue message received from local TC
user.
Severity:
Info
Instance:
Application name
HA Score:
Normal
OID:
awpss7TcapMalformedDialogueFromLocalNotify
1. Recovery:
1. Determine if this condition indicates a software problem or unexpected TC user
behavior.
Description:
Transaction discarded due to malformed dialogue message received from remote
TCAP peer.
Severity:
Info
Instance:
Application name
HA Score:
Normal
OID:
awpss7TcapMalformedDialogueFromRemoteNotify
1. Recovery:
1. This event indicates a TCAP message was received from the remote TCAP peer
that could not be successfully decoded.
2. The event additional information field includes the reason why the decoding failed,
plus the first 80 octets of the message starting with the MTP3 routing label. The
message data can be used to determine the source of the malformed message.
3. It is recommended to contact My Oracle Support if further assistance is needed.
Description:
Unexpected event received from local TC user.
Severity:
Info
Instance:
Application name
HA Score:
Normal
OID:
awpss7TcapUnexpectedMsgFromLocalNotify
1. Recovery:
1. Determine if this condition indicates a software problem or unexpected TC user
behavior.
2. The event additional information field includes a description of what event was
received and why it was unexpected, as well as what was done with the operation
or dialogue as a result.
3. It is recommended to contact My Oracle Support if further assistance is needed.
Description:
Unexpected event received from remote TCAP peer.
Severity:
Info
Instance:
Application name
HA Score:
Normal
OID:
awpss7TcapUnexpectedMsgFromRemoteNotify
1. Recovery:
1. Determine if this condition indicates a software problem or unexpected TC peer
behavior.
2. The event additional information field includes:
• a description of what event was received and why it was unexpected
• what was done with the operation or dialogue as a result
• the first 80 octets of the message starting with the MTP3 routing label
3. The message data can be used to determine the source of the malformed
message.
4. It is recommended to contact My Oracle Support if further assistance is needed.
Description:
Dialogue removed by dialogue cleanup timer.
Severity:
Info
Instance:
Application name
HA Score:
Normal
OID:
awpss7TcapDialogueRemovedTimerExpiryNotify
1. Recovery:
1. This event indicates a TCAP transaction containing no components was sent, but
no response was received from the remote TCAP peer.
2. The event additional information field includes:
• the local dialogue-id
• the number of milliseconds that elapsed between the time the message was
sent and the time that the message was discarded
• the destination point code to which the message was destined
• the SCCP called party address to which the message was destined
3. Check for SCCP events just before this event indicating a message could not
be routed. If SCCP failed to route the message, verify a route exists for the
destination to which the TCAP message was being sent.
4. If no SCCP routing failure event exists, investigate why the remote TCAP peer
failed to respond. The DPC and called party address can be used to determine the
destination to which the message was being sent.
5. It is recommended to contact My Oracle Support if further assistance is needed.
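Steps 3 and 4 amount to a triage decision based on whether an SCCP routing event was logged shortly before the timer expiry. The following sketch assumes the event history has been exported as a list of records. The Event type, the function name, and the 5-second window are hypothetical. The event names are the SCCP OIDs listed earlier in this chapter.

```python
from dataclasses import dataclass
from typing import List

# SCCP events in this chapter that indicate a message could not be routed.
SCCP_ROUTING_EVENTS = frozenset({
    "awpss7SccpRoutingFailureNotify",
    "awpss7SccpRoutingFailureNetworkStatusNotify",
    "awpss7SccpGttFailureNotify",
})


@dataclass
class Event:
    name: str
    timestamp: float  # seconds since an arbitrary epoch


def triage_timer_expiry(expiry: Event, history: List[Event], window_s: float = 5.0) -> str:
    """Suggest the next check for a dialogue cleanup timer expiry."""
    for event in history:
        age = expiry.timestamp - event.timestamp
        if event.name in SCCP_ROUTING_EVENTS and 0 <= age <= window_s:
            return "verify that a route exists for the destination"
    return "investigate why the remote TCAP peer failed to respond"
```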
Description:
Operation removed by invocation timer expiry.
Severity:
Info
Instance:
Application name
HA Score:
Normal
OID:
awpss7TcapOperationRemovedTimerExpiryNotify
1. Recovery:
1. This event indicates a TCAP transaction containing no components was sent, but
no response was received from the remote TCAP peer.
2. The event additional information field includes:
• the local dialogue-id and invoke-id
• the number of milliseconds that elapsed between the time the message was
sent and the time that the operation was discarded
• the destination point code to which the message was destined if the
component was ever sent
• the SCCP called party address to which the message was destined if the
component was ever sent
3. Check for SCCP events just before this event indicating a message could not
be routed. If SCCP failed to route the message, verify a route exists for the
destination to which the TCAP message was being sent.
4. If no SCCP routing failure event exists, investigate why the remote TCAP peer
failed to respond. The DPC and called party address (if present) can be used to
determine the destination to which the message was being sent.
5. If the DPC and Called Party Address are not included in the additional information
field, it indicates the component was created, but never sent.
6. It is recommended to contact My Oracle Support if further assistance is needed.
Description:
Dialogue aborted by remote TCAP peer.
Severity:
Info
Instance:
Application name
HA Score:
Normal
OID:
awpss7TcapDialogueAbortByRemoteNotify
1. Recovery:
1. This event indicates a remote TCAP peer has aborted a dialogue.
2. The event additional information field includes:
• the abort reason
• the first 80 octets of the message starting with the MTP3 routing label
3. The message data can be used to determine the source of the U-Abort or P-Abort
message.
4. It is recommended to contact My Oracle Support if further assistance is needed.
Description:
Received unsupported TCAP message.
Severity:
Info
Instance:
Application name
HA Score:
Normal
OID:
awpss7TcapUnsupportedTCAPMsgRcvdNotify
1. Recovery:
1. This event indicates an unsupported TCAP message has been received.
2. The event additional information field includes:
• the abort reason
• the first 80 octets of the message starting with the MTP3 routing label
3. The message data can be used to determine the source of the unsupported
message.
4. It is recommended to contact My Oracle Support if further assistance is needed.
Description:
Operation rejected by remote TCAP peer.
Severity:
Info
Instance:
Application name
HA Score:
Normal
OID:
awpss7TcapReturnRejectByRemoteNotify
1. Recovery:
1. This event indicates a remote TCAP peer has rejected an operation.
2. The event additional information field includes:
• the reject reason
• the first 80 octets of the message starting with the MTP3 routing label
3. The message data can be used to determine the source of the message.
4. It is recommended to contact My Oracle Support if further assistance is needed.
Description:
TCAP active dialogue utilization
Severity:
Minor, Major, Critical
Instance:
Application name
HA Score:
Normal
OID:
awpss7TcapActiveDialogueUtilNotify
1. Recovery:
1. The percent utilization of the MP's dialogue table is approaching maximum
capacity. This alarm indicates the number of active dialogues on the MP server
is higher than expected.
2. If this problem persists and the dialogue table reaches 100% utilization, all new
messages will be discarded. This alarm should not normally occur when no other
congestion alarms are asserted. This condition may be caused by any of the
following:
• the incoming plus outgoing rate of new dialogues is higher than expected
(possibly due to poor load balancing across MP servers, or too few MP
servers to handle the load)
• the duration of the dialogues is longer than expected
• both the rate and duration are higher than expected
• a software problem is preventing removal of completed dialogues
3. It is recommended to contact My Oracle Support if further assistance is needed.
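The relationship between dialogue rate, dialogue duration, and table utilization in step 2 follows from Little's law: the average number of active dialogues is approximately the rate of new dialogues multiplied by their average duration. The sketch below applies that estimate. The function name and all numeric values, including the table capacity, are hypothetical and are not product limits.

```python
def dialogue_table_utilization(
    new_dialogues_per_s: float,
    mean_duration_s: float,
    table_capacity: int,
) -> float:
    """Estimate percent utilization of the dialogue table.

    Uses Little's law: active dialogues ~= arrival rate * mean duration.
    table_capacity is an assumed value supplied by the caller.
    """
    active = new_dialogues_per_s * mean_duration_s
    return 100.0 * active / table_capacity
```

With an assumed capacity of 20000, a rate of 2000 new dialogues per second lasting 5 seconds each gives an estimated 50% utilization. Doubling either the rate or the duration reaches 100%, which is why both causes appear in the list above.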
Description:
TCAP active operation utilization
Severity:
Minor, Major, Critical
Instance:
Application name
HA Score:
Normal
OID:
awpss7TcapActiveOperationUtilNotify
1. Recovery:
1. The percent utilization of the MP's component table is approaching maximum
capacity. This alarm indicates the number of active egress TCAP operations on
the MP server is higher than expected.
2. If this problem persists and the component table reaches 100% utilization, all new
egress operations will be discarded. This alarm should not normally occur when no
other congestion alarms are asserted. This may be caused by any of the following:
• the outgoing rate of new operations is higher than expected (possibly due to a
higher than expected average number of operations per message)
Description:
TCAP stack event queue utilization
Severity:
Minor, Major, Critical
Instance:
Application name
HA Score:
Normal
OID:
awpss7TcapStackEventQueueUtilNotify
1. Recovery:
1. The percent utilization of the MP's TCAP Stack Event Queue is approaching its
maximum capacity. This alarm indicates the number of ingress TCAP messages
on the MP server is higher than expected.
2. If this problem persists and the queue reaches 100% utilization, all new ingress
messages will be discarded. This alarm should not normally occur when no other
congestion alarms are asserted. This may be caused by any of the following:
• the incoming rate of new TCAP messages is higher than expected (possibly
due to poor load balancing across MP servers, or too few MP servers to
handle the load)
• a software problem is causing the messages to be processed more slowly
than expected
3. It is recommended to contact My Oracle Support if further assistance is needed.
Description:
Return error from remote TCAP peer
Severity:
Info
Instance:
Application name
HA Score:
Normal
OID:
awpss7TcapReturnErrorFromRemoteNotify
1. Recovery:
1. This event indicates a remote TCAP peer has responded to an operation using
Return Error.
2. The event additional information field includes:
• the error reason
• the first 80 octets of the message starting with the MTP3 routing label
3. The message data can be used to determine the source of the message.
4. It is recommended to contact My Oracle Support if further assistance is needed.
Description:
The SCCP Egress Message Rate (messages per second) for the MP is approaching or
exceeding its engineered traffic handling capacity.
Severity:
Major
Instance:
N/A
HA Score:
Normal
OID:
awpss7SccpEgressMsgRateNotify
1. Recovery:
1. This condition indicates the SS7 Stack is reaching its engineered traffic handling
capacity due to egress traffic received from the application.
2. It is recommended to contact My Oracle Support if further assistance is needed.
Chapter 3
Transport Manager (19400-19419)
Description:
TCAP was unable to route message due to transient conditions such as destination
failure or destination unavailability.
Severity:
Info
Instance:
Hostname
HA Score:
Normal
OID:
awpss7TcapRoutingFailureNotify
1. Recovery:
1. This condition indicates failure at the TCAP layer due to XG SS7 node removal or
congestion at Communication Agent.
2. It is recommended to contact My Oracle Support if further assistance is needed.
Description:
Transport Down
Severity:
Major
Instance:
<TransportName>
HA Score:
Normal
OID:
awptransmgrTransportDownNotify
1. Recovery:
1. The Active alarm instance data, which can be viewed from Alarms & Events,
and then View Active, contains the Transport Name as configured in Transport
Manager, and then Configuration, and then Transport.
Additional Information for the alarm can be found in Alarms & Events, and
then View Active or View History by locating the row with a sequence number
that matches the active alarm sequence number and viewing the Additional Info
column. This column will include the local and remote IP addresses and ports, the
administrative state, and the protocol state of the association.
This alarm is raised when:
• The association is configured and the admin state is enabled, but the SCTP
transport is not in the ASP-UP protocol state for the M3UA plugin, or
• The association is configured, but the SCTP transport is not in the APP-UP
state for other plugins
Note:
It is normal to have an association alarm if the association is in the
Blocked or Disabled administrative state.
• Verify the association's remote IP address and port correctly identify an SCTP
listening port on the adjacent server.
• Verify IP network connectivity exists between the MP server and the adjacent
server.
• Check the event history logs at Alarms & Events, and then View History for
additional SS7 events or alarms from this MP server.
• Verify the adjacent server on the Signaling Gateway is not under maintenance.
2. If the alarm persists, it is recommended to contact My Oracle Support if further
assistance is needed.
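The two raise conditions listed for this alarm can be expressed as a small predicate. This is a minimal sketch, not the product's implementation: the Association class, its field names, and the state strings other than ASP-UP and APP-UP are assumptions introduced for illustration. It encodes the two bullets literally, so the admin state is only checked for the M3UA plugin.

```python
from dataclasses import dataclass


@dataclass
class Association:
    # All field names are hypothetical; the source only names the states.
    configured: bool
    admin_state: str     # assumed values: "Enabled", "Blocked", "Disabled"
    plugin: str          # "M3UA" or any other plugin name
    protocol_state: str  # e.g. "ASP-UP", "APP-UP", "DOWN"


def transport_down_alarm(assoc: Association) -> bool:
    """Return True when the Transport Down conditions in the text are met."""
    if not assoc.configured:
        return False
    if assoc.plugin == "M3UA":
        # M3UA: admin state must be enabled and the transport not in ASP-UP.
        return assoc.admin_state == "Enabled" and assoc.protocol_state != "ASP-UP"
    # Other plugins: the transport only has to be outside the APP-UP state.
    return assoc.protocol_state != "APP-UP"
```

For example, transport_down_alarm(Association(True, "Enabled", "M3UA", "DOWN")) evaluates to True, while the same association in the ASP-UP state evaluates to False.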
Description:
Failed to configure transport.
Severity:
Info
Instance:
<TransportName>
HA Score:
Normal
OID:
awptransmgrFailedToConfigureTransportNotify
1. Recovery:
1. A Transport is configured each time the Transport attempts to connect or
reconnect.
2. If transport configuration fails or the alarm persists, it is recommended to contact
My Oracle Support if further assistance is needed.
Description:
Failed to connect Transport
Severity:
Info
Instance:
<TransportName>
HA Score:
Normal
OID:
awptransmgrFailedToConnectTransportNotify
1. Recovery:
1. The Transport named in the Instance field has failed in a connection attempt. If
configured as an SCTP Initiator, the system will automatically attempt to recover
the association/connection. Connection attempts occur every "Connection Retry
Interval" seconds, as defined in the Transport Configuration Set screen for the
configuration set used by the failed transport (default: 10 seconds). If configured
as an SCTP or UDP Listener, no further action is taken.
To troubleshoot:
• Verify the transport's local IP address and port number are configured on the
Adjacent Node (Some Nodes only accept connections from IP addresses and
ports they are configured to accept connections from).
• Verify the transport's remote IP address and port correctly identify an SCTP
listening port on the adjacent node.
• Verify IP network connectivity exists between the MP and the adjacent node.
• Verify the timers in the transport's configuration set are not set too short to
allow the connection to proceed. This should be rare if the IP network is
functioning correctly.
• Check the event history logs at Alarms & Events, and then View History for
additional SS7 events or alarms from this MP server.
• Verify the adjacent server on the Signaling Gateway is not under maintenance.
2. If the alarm persists, it is recommended to contact My Oracle Support if further
assistance is needed.
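The retry behavior described above (an Initiator retries every Connection Retry Interval seconds, a Listener takes no further action) can be illustrated with a short sketch. The function name, the role strings, and the idea of returning a list of timestamps are assumptions made for this example; only the 10-second default comes from the text.

```python
from typing import List

DEFAULT_RETRY_INTERVAL_S = 10  # default stated in the text


def retry_times(role: str, failed_at: float, attempts: int,
                retry_interval_s: float = DEFAULT_RETRY_INTERVAL_S) -> List[float]:
    """Return the times (in seconds) of the next reconnect attempts.

    role is a hypothetical label: "initiator" or "listener".
    """
    if role != "initiator":
        # A Listener does not reconnect; it waits for the far end.
        return []
    return [failed_at + retry_interval_s * n for n in range(1, attempts + 1)]


print(retry_times("initiator", 0.0, 3))  # [10.0, 20.0, 30.0]
print(retry_times("listener", 0.0, 3))   # []
```

If the timers in the configuration set are shorter than the time the network needs to complete the handshake, every one of these attempts can fail, which is why the timer check appears in the troubleshooting list.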
Description:
Received malformed SCTP message (invalid length).
Severity:
Info
Instance:
<TransportName>
HA Score:
Normal
OID:
awptransmgrReceivedMalformedTransSctpMessageNotify
1. Recovery:
1. An SCTP message with an invalid length was received.
2. If the alarm persists, it is recommended to contact My Oracle Support if further
assistance is needed.
Description:
Far-end closed the transport.
Severity:
Info
Instance:
<TransportName>
HA Score:
Normal
OID:
awptransmgrFarEndClosedTheTransportNotify
1. Recovery:
1. The far-end of the SCTP association sent a SHUTDOWN or ABORT message
to close the association. If an Initiator, the MP server automatically attempts to
reestablish the connection. Connection attempts occur every "Connection Retry
Interval" seconds, as defined in the Transport Configuration Set screen for the
configuration set used by the failed association (default: 10 seconds). If a Listener,
the MP server will only open the socket and await further messages from the
far-end.
To troubleshoot:
• Investigate the adjacent node at the specified IP address and port to
determine if it failed or if it is under maintenance.
• Check the adjacent node for alarms or logs that might indicate why it closed
the association.
2. If the alarm persists, it is recommended to contact My Oracle Support if further
assistance is needed.
Description:
Transport closed due to lack of response
Severity:
Info
Instance:
<TransportName>
HA Score:
Normal
OID:
awptransmgrTransportClosedDueToLackOfResponseNotify
1. Recovery:
1. The adjacent node at the specified IP address and port failed to respond to
attempts to deliver an SCTP DATA packet or SCTP heartbeat. If an SCTP Initiator,
the transport is closed and the MP server automatically attempts to reestablish
the connection. Connection attempts occur every Connection Retry Interval
seconds, as defined in the Transport Configuration Set screen for the configuration
set used by the failed transport (default: 10 seconds). If a Listener, the MP server
will only open the socket and await further messages from the far-end.
To troubleshoot:
• Verify IP network connectivity still exists between the MP server and the
adjacent server.
• Verify the timers in the transport's configuration set are not set too short
to allow the signaling to succeed. This should be rare if the IP network is
functioning correctly.
• Check the event history logs at Alarms & Events, and then View History for
additional SS7 events or alarms from this MP server.
• Verify the adjacent server on the Signaling Gateway is not under maintenance.
2. It is recommended to contact My Oracle Support if further assistance is needed.
Description:
Local transport maintenance state change.
Severity:
Info
Instance:
<TransportName>
HA Score:
Normal
OID:
awptransmgrLocalTransportMaintenanceStateChangeNotify
1. Recovery:
1. No customer action is necessary if this was an expected change due to some
maintenance activity. Otherwise, security logs can be examined on the NO/SO
server to determine which user changed the administrative state.
Transport status can be viewed using Transport Manager, and then
Maintenance, and then Transport.
2. If the alarm persists, it is recommended to contact My Oracle Support if further
assistance is needed.
Description:
Failed to send transport DATA message.
Severity:
Info
Instance:
<TransportName>, <TransportAdapter>, <TransportProtocol>
HA Score:
Normal
OID:
awptransmgrFailedToSendTransDataMessageNotify
1. Recovery:
1. An attempt to send an SS7 M3UA/ENUM DATA message has failed. The message
has been discarded.
For SCTP, possible reasons for the failure include:
• The far-end is slow to acknowledge the SCTP packets sent by the MP server,
causing the MP server's SCTP send buffer to fill up to the point where the
message cannot be queued for sending.
• The socket has closed just as the send was being processed.
To troubleshoot:
• Check the event history logs at Alarms & Events, and then View History for
additional SS7 events or alarms from this MP server.
• Verify the adjacent server on the Signaling Gateway is not under congestion.
The MP server will have alarms to indicate the congestion if this is the case.
2. If the alarm persists, it is recommended to contact My Oracle Support if further
assistance is needed.
Description:
The percent utilization of the MP's single transport egress-queue is approaching its
maximum capacity.
Severity:
Minor, Major, Critical (based on defined thresholds). Engineered Max Value = 1000
Instance:
<TransportName>
HA Score:
Normal
OID:
awptransmgrTransSingleWriteQueueUtilNotify
1. Recovery:
1. The percent utilization of the MP's Transport Writer Queue is approaching
its maximum capacity. If this problem persists and the queue reaches 100%
utilization, all new egress messages from the Transport will be discarded.
This alarm should not normally occur when no other congestion alarms are
asserted. This may occur for a variety of reasons:
• An IP network or Adjacent node problem may exist preventing SCTP from
transmitting messages into the network at the same pace that messages are
being received from the network.
• The SCTP Association Writer process may be experiencing a problem
preventing it from processing events from its event queue. The alarm log
should be examined from Main Menu, and then Alarms & Events.
• If one or more MPs in a server site have failed, the traffic will be distributed
among the remaining MPs in the server site. MP server status can be
monitored from Status & Manage, and then Server Status.
• Misconfiguration of Adjacent Node IP routing may result in too much
traffic being distributed to the MP. Each MP in the server site should be
receiving approximately the same number of ingress transactions per second.
• There may be an insufficient number of MPs configured to handle the network
traffic load. The ingress traffic rate of each MP can be monitored from Status
& Manage, and then KPI Display. If all MPs are in a congestion state then the
offered load to the server site is exceeding its capacity.
2. If the alarm persists, it is recommended to contact My Oracle Support if further
assistance is needed.
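The relationship between queue depth, percent utilization, and discards described for this alarm can be sketched with a bounded queue. This is an illustration only: the EgressQueue class and its method names are hypothetical, and treating the Engineered Max Value of 1000 as the queue depth is an assumption.

```python
from collections import deque


class EgressQueue:
    """Bounded FIFO that discards new messages once it is full."""

    def __init__(self, max_depth: int = 1000):  # 1000 assumed from Engineered Max Value
        self.max_depth = max_depth
        self._items = deque()
        self.discarded = 0

    def utilization_pct(self) -> float:
        return 100.0 * len(self._items) / self.max_depth

    def enqueue(self, msg) -> bool:
        # At 100% utilization every new egress message is discarded.
        if len(self._items) >= self.max_depth:
            self.discarded += 1
            return False
        self._items.append(msg)
        return True

    def dequeue(self):
        # Called by the writer; returns None when the queue is empty.
        return self._items.popleft() if self._items else None
```

If the writer stops calling dequeue (for example because the far end is not acknowledging data), utilization climbs toward 100% and enqueue starts returning False, which corresponds to the discard behavior the alarm warns about.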
Description:
The message is rejected based on configured Access Control List for transport.
Severity:
Info
Instance:
<TransportName>
HA Score:
Normal
OID:
awptransmgrMessageRejectedByAclFilteringNotify
1. Recovery:
1. Verify the ENUM server's IP address is in the ACL, or that the ACL is empty.
2. If the alarm persists, it is recommended to contact My Oracle Support if further
assistance is needed.
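The recovery step implies that a message is accepted when the sender's address is in the ACL or when the ACL is empty. A minimal sketch of that check follows; the function name and the representation of the ACL as a list of address strings are assumptions, and the real ACL may support features (such as ranges) not shown here.

```python
import ipaddress
from typing import List


def acl_permits(source_ip: str, acl: List[str]) -> bool:
    """Return True if a message from source_ip would pass the ACL."""
    if not acl:
        # An empty ACL is assumed to permit every sender.
        return True
    addr = ipaddress.ip_address(source_ip)
    # Compare parsed addresses so equivalent spellings match.
    return any(addr == ipaddress.ip_address(entry) for entry in acl)


print(acl_permits("192.0.2.10", []))              # True
print(acl_permits("192.0.2.11", ["192.0.2.10"]))  # False
```

Parsing both sides with the ipaddress module avoids false rejections caused by formatting differences, for example compressed versus expanded IPv6 notation.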
Description:
State change of an IP address of a multi-homed adjacent node in SCTP transport.
Severity:
Info
Instance:
<TransportName>
HA Score:
Normal
OID:
awptransmgrAdjIpAddrStateChangeNotify
1. Recovery:
1. Verify IP network connectivity still exists between the MP server and the adjacent
server.
2. If the alarm persists, it is recommended to contact My Oracle Support if further
assistance is needed.
Description:
SCTP Transport closed due to failure of multi-homing validation.
Severity:
Info
Instance:
<TransportName>, <TransportId>
HA Score:
Normal
OID:
awptransmgrSctpTransportRefusedNotify
1. Recovery:
1. Recheck the adjacent node's configured IP address and validation mode.
2. If the alarm persists, it is recommended to contact My Oracle Support if further
assistance is needed.
Description:
IP addresses advertised by an adjacent node in the INIT/INIT-ACK chunk are different from
the configured IP addresses.
Severity:
Info
Instance:
<TransportName>
HA Score:
Normal
OID:
awptransmgrSctpTransportCfgMismatchNotify
1. Recovery:
1. Recheck the configured IP addresses, the transport configuration, and the validation
mode.
2. If the alarm persists, it is recommended to contact My Oracle Support if further
assistance is needed.
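When rechecking the configuration, it can help to compare the configured address set with the addresses the adjacent node actually advertised. The sketch below shows one way to do that comparison; the function name and the result keys are hypothetical, and how the product applies its validation mode to such differences is not covered here.

```python
import ipaddress
from typing import Dict, Iterable, List


def address_mismatch(configured: Iterable[str],
                     advertised: Iterable[str]) -> Dict[str, List[str]]:
    """Compare configured peer addresses with those seen in INIT/INIT-ACK."""
    cfg = {ipaddress.ip_address(a) for a in configured}
    adv = {ipaddress.ip_address(a) for a in advertised}
    return {
        # Advertised by the peer but not configured locally.
        "unexpected": sorted(str(a) for a in adv - cfg),
        # Configured locally but not advertised by the peer.
        "missing": sorted(str(a) for a in cfg - adv),
    }


print(address_mismatch(["10.0.0.1", "10.0.1.1"], ["10.0.0.1", "10.0.2.1"]))
# {'unexpected': ['10.0.2.1'], 'missing': ['10.0.1.1']}
```

Two empty lists mean the advertised and configured sets match, in which case the mismatch event would not be expected.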
Description:
SCTP transport closed due to unsupported add/delete peer IP address event received
in peer address notification.
Severity:
Info
Instance:
<TransportName>
HA Score:
Normal
3-138
Chapter 3
Communication Agent, ComAgent (19420-19909)
OID:
awptransmgrTransportClosedDueToUnsupportedEventNotify
1. Recovery:
1. Disable SCTP dynamic address reconfiguration at the adjacent node.
2. If the alarm persists, it is recommended to contact My Oracle Support if further
assistance is needed.
Description
The BDF work queue depth size has reached full capacity.
Severity
Minor
Instance
N/A
HA Score
Normal
OID
cAFBDFQFullNotify
1. Recovery:
1. The system itself may be heavily loaded with work, causing this subsystem to also
become overloaded. Check other system resources for signs of overload.
2. It is recommended to contact My Oracle Support for assistance if needed.
Description
The BDF subsystem is throttling traffic at sender.
Severity
Minor
Instance
N/A
HA Score
Normal
OID
cAFBDFThrotlNotify
1. Recovery:
Description
The BDF subsystem received a StackEvent that was somehow invalid, corrupt, or
could not be delivered to the application.
Severity
Info
Instance
<Source IP>
HA Score
Normal
Throttle Seconds
0 (zero)
OID
cAFBroadcastDataFrameworkInvalidStackEventNotify
1. Recovery:
1. If more messages of the same type occur, then check the site(s) and network for
other possible corruption or overloaded conditions.
2. It is recommended to contact My Oracle Support for assistance if needed.
Description:
This alarm indicates that a Communication Agent is unable to establish transport
connections with one or more other servers, and this may indicate applications on the
local server are unable to communicate with all of their peers. Generally this alarm
is generated when a server or the IP network is undergoing maintenance or when a
connection has been manually disabled.
Severity:
Major
Instance:
N/A
HA Score:
Normal
OID:
cAFConnectionDownNotify
Cause:
• A connection goes down. If another connection was already down, the count of
down connections is updated and the alarm is re-asserted.
• A connection exits the down state while other connections are still down. The
connection count is updated and the alarm is re-asserted.
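The counting behavior in the Cause list can be sketched as a small state tracker. The class and method names are hypothetical, and clearing the alarm when the last connection leaves the down state is an assumption; the text only describes the assert and re-assert cases.

```python
class ConnectionDownAlarm:
    """Tracks down connections and records alarm actions as (action, count)."""

    def __init__(self):
        self.down = set()
        self.events = []

    def connection_down(self, name: str) -> None:
        if name in self.down:
            return
        self.down.add(name)
        # First down connection asserts; later ones update the count and re-assert.
        action = "assert" if len(self.down) == 1 else "re-assert"
        self.events.append((action, len(self.down)))

    def connection_left_down_state(self, name: str) -> None:
        if name not in self.down:
            return
        self.down.discard(name)
        if self.down:
            # Other connections are still down: update the count and re-assert.
            self.events.append(("re-assert", len(self.down)))
        else:
            # Assumption: the alarm clears when no connection remains down.
            self.events.append(("clear", 0))
```

Taking connections A and B down and then restoring them in the same order produces the sequence assert(1), re-assert(2), re-assert(1), clear(0).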
Diagnostic Information:
This alarm indicates a Communication Agent is unable to establish transport
connections with one or more other servers, and this may indicate applications on
the local server are unable to communicate with all of their peers. Generally this alarm
is asserted when a server or the IP network is undergoing maintenance or when a
connection has been manually disabled.
1. Recovery:
1. Navigate to Alarms & Events, and then View History to find additional
information about the alarm.
The information can be found by locating the row with a sequence number
that matches the active alarm sequence number and viewing the Additional Info
column.
2. Check the event history logs by navigating to Alarms & Events, and then View
History for additional Communication Agent events or alarms from this MP server.
3. Navigate to Communication Agent, and then Maintenance, and then
Connection Status to determine which connections on the server have abnormal
status.
4. If the connection is manually disabled, then no further action is necessary.
5. Verify the remote server is not under maintenance.
6. Verify IP network connectivity exists between the two connection end-points.
7. Verify the connection’s local IP address and port number are configured on the remote
node.
8. Verify the Application Process using Communication Agent plug-in is running on
both ends.
9. Verify the connection’s remote IP address and port correctly identify the remote’s
listening port.
10. It is recommended to contact My Oracle Support for assistance.
Description:
This alarm indicates that one or more Communication Agent connections have been
administratively blocked at the server asserting the alarm, and this is generally done
as part of a maintenance procedure. A connection that is blocked cannot be used by
applications to communicate with other servers, and so this alarm may indicate that
applications are unable to communicate with their expected set of peers.
Note:
It is normal to have this alarm if the connection is in the Blocked
administrative state on the near-side of the connection.
Severity:
Minor
Instance:
N/A
Note:
This alarm is cleared when:
• Locally UNBLOCKed: An Admin Action to locally UNBLOCK the service
connection and no other connection is locally blocked.
• Deleted: The MP Server/Connection is deleted.
• Failed: The Connection is terminated, due to Admin Disable action or
Heartbeat failure or remote end initiated disconnection or any other
reason.
HA Score:
Normal
OID:
cAFConnLocalBlockedNotify
1. Recovery:
1. Use Alarms & Events, and then View History to find additional information about
the alarm.
The information can be found by locating the row with a sequence number
that matches the active alarm sequence number and viewing the Additional Info
column.
2. Check the event history logs at Alarms & Events, and then View History for
additional Communication Agent events or alarms from this MP server.
3. Use Communication Agent, and then Maintenance, and then Connection
Status to determine which connections on the server have abnormal status.
4. If the expected set of connections is locally blocked, then no further action is
necessary.
5. To remove the local block condition for a connection, use the Communication
Agent, and then Maintenance, and then Connection Status screen and click
Enable for the desired connection.
6. It is recommended to contact My Oracle Support for assistance.
Description:
This alarm indicates that one or more Communication Agent connections have been
administratively blocked at a remote server connected to the server, and this is
generally done as part of a maintenance procedure. A connection that is blocked
cannot be used by applications to communicate with other servers, and so this alarm
may indicate that applications are unable to communicate with their expected set of
peers.
Note:
It is normal to have this alarm if the connection is in the Blocked
administrative state on the far-side of the connection.
Severity:
Minor
Instance:
N/A
Note:
This alarm is cleared when:
• Locally UNBLOCKed: An Admin Action to locally UNBLOCK the service
connection and no other connection is locally blocked.
• Deleted: The MP Server/Connection is deleted.
• Failed: The Connection is terminated, due to Admin Disable action or
Heartbeat failure or remote end initiated disconnection or any other
reason.
HA Score:
Normal
OID:
cAFConnRemoteBlockedNotify
1. Recovery:
1. Use Alarms & Events, and then View History to find additional information about
the alarm.
The information can be found by locating the row with a sequence number
that matches the active alarm sequence number and viewing the Additional Info
column.
2. Check the event history logs at Alarms & Events, and then View History for
additional Communication Agent events or alarms from this MP server.
3. Use Communication Agent, and then Maintenance, and then Connection
Status to determine which connections on the server have abnormal status.
4. If the expected set of connections is locally blocked, then no further action is
necessary.
5. To remove the local block condition for a connection, use the Communication
Agent, and then Maintenance, and then Connection Status screen and click
Enable for the desired connection.
6. It is recommended to contact My Oracle Support for assistance.
Description:
The percent utilization of the Communication Agent Task stack queue is
approaching defined threshold capacity. If this problem persists and the queue
reaches above the defined threshold utilization, the new StackEvents (Query/
Response/Relay) messages for the Task can be discarded based on the StackEvent
priority and Application's Global Congestion Threshold Enforcement Mode.
Severity:
Minor, Major, Critical
Instance:
<ComAgent StackTask Name>
HA Score:
Normal
OID:
cAFQueueUtilNotify
Cause:
This alarm is raised when KPI ComAgentQueueUtil exceeds the thresholds defined in
the SysMetricThreshold table.
• MINOR: ComAgentQueueUtil|CAF|-*|Current|19803|60|50|3000
• MAJOR: ComAgentQueueUtil|CAF|**|Current|19803|80|70|3000
• CRITICAL: ComAgentQueueUtil|CAF|*C|Current|19803|95|90|3000
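The three threshold rows can be read programmatically. The sketch below assumes, without confirmation from the source, that the sixth and seventh pipe-delimited fields are the percentages at which a severity is set and cleared (for example 60 and 50 for Minor). The meaning of the remaining fields is not stated, so they are kept as raw strings. The function names and the hysteresis rule (a severity is held until utilization falls to its clear value) are also assumptions.

```python
ROWS = {
    "MINOR": "ComAgentQueueUtil|CAF|-*|Current|19803|60|50|3000",
    "MAJOR": "ComAgentQueueUtil|CAF|**|Current|19803|80|70|3000",
    "CRITICAL": "ComAgentQueueUtil|CAF|*C|Current|19803|95|90|3000",
}
LEVELS = ["MINOR", "MAJOR", "CRITICAL"]  # lowest to highest


def parse_row(row: str) -> dict:
    fields = row.split("|")
    # Assumed layout: fields[5] = set percentage, fields[6] = clear percentage.
    return {"metric": fields[0], "set_pct": int(fields[5]),
            "clear_pct": int(fields[6]), "raw": fields}


def next_severity(current, util_pct: float):
    """Return the severity after observing util_pct; current is None or a level."""
    result = None
    for level in LEVELS:
        t = parse_row(ROWS[level])
        # A level already reached is held until utilization drops to its clear value.
        held = current is not None and LEVELS.index(current) >= LEVELS.index(level)
        if util_pct >= t["set_pct"] or (held and util_pct > t["clear_pct"]):
            result = level
    return result
```

Under these assumptions, a queue at Major that drops to 75% stays at Major (above the clear value of 70), while a drop to 65% lowers it to Minor.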
Diagnostic Information:
The percent utilization of the Communication Agent Task's Queue is approaching its
defined capacity. If this problem persists and the queue reaches above the defined
threshold utilization, the new StackEvents (Query/Response/Relay) messages for the
Task can be discarded, based on the StackEvent priority and Application's Global
Congestion Threshold Enforcement Mode.
This alarm should not normally occur when no other congestion alarms are asserted.
This may occur for a variety of reasons:
1. Recovery:
1. Navigate to Main Menu, and then Alarms & Events to examine the alarm log.
An IP network or Adjacent node problem may exist preventing transmission of
messages into the network at the same pace that messages are being received
from the network. The Task thread may be experiencing a problem preventing it
from processing events from its event queue. It is recommended to contact My
Oracle Support for assistance.
2. Navigate to Status & Manage, and then KPIs to monitor the ingress traffic rate of
each MP.
Each MP in the server site should be receiving approximately the same number of
ingress transactions per second.
It is recommended to contact My Oracle Support for assistance.
3. If the MP ingress rate is approximately the same, there may be an insufficient
number of MPs configured to handle the network traffic load.
If all MPs are in a congestion state, then the offered load to the server site is
exceeding its capacity.
It is recommended to contact My Oracle Support for assistance.
Description:
Communication Agent configured connection waiting for remote client to establish
connection. This alarm indicates that a Communication Agent is waiting for one or
more far-end client MPs to initiate transport connections. Generally this alarm is
asserted when a client MP or the IP network is undergoing maintenance or when a
connection has been manually disabled at a client MP.
Note:
It is normal to have this auto-clearing connection alarm for the remote
server connections that are configured manually in Client mode, but are not yet
available for processing traffic.
Severity:
Minor
Instance:
N/A
Note:
The alarm is cleared when a server connection exits the forming state and
no other connection having server connect mode is in the forming state or
the auto-clear time-out occurs.
• The MP Server/Connection is deleted
• When connection is moved to TotallyBlocked/RemotelyBlocked/
InService state from Aligning
• Auto Clear
• Connection is disabled
HA Score:
Normal
OID:
cAFClientConnWaitNotify
1. Recovery:
1. Find additional information for the alarm in Alarms & Events, and then View
History by locating the row with a sequence number that matches the active alarm
sequence number and viewing the Additional Info column.
The alarm is cleared only for remote server connections that are configured
manually in “Client” mode. This mode is used to listen for connection requests
from configured remote clients.
• The MP Server/Connection is deleted
• When connection is moved to TotallyBlocked/RemotelyBlocked/InService state
from Aligning
• Auto Clear
• Connection is disabled
2. Check the event history logs at Alarms & Events, and then View History for
additional Communication Agent events or alarms from this MP server.
3. Check Communication Agent, and then Maintenance, and then Connection
Status to determine which connections on the server have abnormal status.
4. Verify that the remote server is not under maintenance.
5. If the connection is manually disabled at the client MP, and it is expected to be
disabled, then no further action is necessary.
6. If the connection has been manually disabled at the client MP, but it is not
supposed to be disabled, then enable the connection by clicking on the 'Enable'
action button on the Connection Status screen.
7. Verify that IP network connectivity exists between the two connection end-points.
8. Verify that the connection's local IP address and port number are configured on
the remote client MP.
9. Verify that the Application Process using Communication Agent plug-in is running
on both ends.
10. Verify that the connection's remote IP address and port correctly identify the remote's
listening port.
11. It is recommended to contact My Oracle Support for assistance.
Description:
The Communication Agent failed to align connection. This alarm indicates that
Communication Agent has established one or more transport connections with
servers that are running incompatible versions of software, and so Communication
Agent is unable to complete the alignment of the connection. A connection that fails
alignment cannot be used by applications to communicate with other servers, and
so this alarm may indicate that applications are unable to communicate with their
expected set of peers.
Severity:
Major
Instance:
N/A
HA Score:
Normal
OID:
cAFConnAlignFailedNotify
1. Recovery:
1. If the connection administrative action is set to ‘disable’, the alarm is cleared. No
further action is necessary.
2. Check the event history logs at Alarms & Events, and then View History for
additional Communication Agent events or alarms from this MP server.
3. Find additional information for the alarm in Alarms & Events, and then View
History by locating the row with a sequence number that matches the active alarm
sequence number and viewing the Additional Info column.
4. Check the event history logs at Alarms & Events, and then View History for
additional Communication Agent events or alarms from this MP server.
5. Check Communication Agent, and then Maintenance, and then Connection
Status to determine which connections on the server have abnormal status.
For each connection reporting 'Aligning' connection status, determine the servers
that are endpoints, and verify that the correct software is installed on each server.
If incorrect software is present, then server maintenance may be required.
6. It is recommended to contact My Oracle Support for assistance.
Description:
The percent utilization of the Communication Agent internal resource pool
(CommMessage) is approaching its defined capacity. If this problem persists and
the usage reaches 100% utilization, ComAgent allocates the CommMessage objects
from the heap. This should not impact the functionality, but may impact performance
and/or latency.
Severity:
Critical, Major, Minor
Instance:
<ComAgent Process Name>
HA Score:
Normal
OID:
cAFPoolResUtilNotify
Cause:
This alarm is raised when ComAgent mempool utilization exceeds threshold limits.
Minor (>= 60%), Major (>= 80%), Critical (>= 95%), % level of Max = 65535.
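As a worked illustration of these limits, the sketch below maps a count of in-use CommMessage objects to a severity. It assumes the percentages are taken relative to the stated maximum of 65535; the function and constant names are hypothetical.

```python
POOL_MAX = 65535  # maximum stated in the text

# Checked from most to least severe; limits are percentages of POOL_MAX.
THRESHOLDS = [("Critical", 95.0), ("Major", 80.0), ("Minor", 60.0)]


def pool_severity(in_use: int, pool_max: int = POOL_MAX):
    """Return the alarm severity for a pool usage count, or None below Minor."""
    pct = 100.0 * in_use / pool_max
    for name, limit in THRESHOLDS:
        if pct >= limit:
            return name
    return None


print(pool_severity(30000))  # None (about 46%)
print(pool_severity(40000))  # Minor (about 61%)
print(pool_severity(53000))  # Major (about 81%)
print(pool_severity(63000))  # Critical (about 96%)
```

Under this reading, the Minor limit corresponds to roughly 39,300 objects in use and the Critical limit to roughly 62,300.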
Diagnostic Information:
The percent utilization of the Communication Agent internal resource pool,
CommMessage is approaching its defined capacity. If this problem persists and the
usage reaches 100% utilization, ComAgent will allocate the CommMessage objects
from the heap. This should not impact the functionality, but may impact performance
and/or latency.
This alarm usually occurs when other congestion alarms are asserted. This may occur
for one of the following reasons:
• An IP network or adjacent node problem may exist preventing transmission of
messages into the network at the same pace that messages are being received
from the network.
• The Task thread may be experiencing a problem preventing it from processing
events from its internal resource queue.
• The mis-configuration of adjacent node IP routing may result in too much traffic
being distributed to the MP.
1. Recovery:
1. Navigate to Alarms & Events to examine the alarm log.
An IP network or Adjacent node problem may exist preventing transmission of
messages into the network at the same pace that messages are being received
from the network. The Task thread may be experiencing a problem preventing it
from processing events from its internal resource queue. It is recommended to
contact My Oracle Support for assistance.
2. Navigate to Status & Manage, and then KPIs to monitor the ingress traffic rate of
each MP.
Each MP in the server site should be receiving approximately the same number of
ingress transactions per second.
It is recommended to contact My Oracle Support for assistance.
3. If the MP ingress rate is approximately the same, there may be an insufficient
number of MPs configured to handle the network traffic load.
If all MPs are in a congestion state then the ingress rate to the server site is
exceeding its capacity.
It is recommended to contact My Oracle Support for assistance.
Description:
The percent utilization of the Communication Agent User Data FIFO queue is
approaching defined threshold capacity. If this problem persists and the queue
reaches above the defined threshold utilization, the new StackEvents (Query/
Response/Relay) messages for the Task can be discarded, based on the StackEvent
priority and Application's Global Congestion Threshold Enforcement Mode.
Severity:
Minor, Major, Critical
Instance:
<ComAgent StackTask Name>
HA Score:
Normal
OID:
cAFUserDataFIFOUtilNotify
Cause:
Minor (>= 60%), Major (>= 80%), Critical (>= 95%), Percentage level of Max = 8000
Diagnostic Information:
The percent utilization of the Communication Agent User Data FIFO queue is
approaching its defined capacity. If this problem persists and the queue reaches
above the defined threshold utilization, the new StackEvents (Query/Response/Relay)
messages for the Task can be discarded, based on the StackEvent priority and
Application's Global Congestion Threshold Enforcement Mode. This alarm should not
normally occur when no other congestion alarms are asserted.
1. Recovery:
1. Navigate to Alarms & Events to examine the alarm log and determine if the
ComAgent worker thread may be experiencing a problem preventing it from
processing events from User Data FIFO queue.
2. Navigate to Status & Manage, and then KPIs to monitor the ingress traffic rate of
each MP.
• Mis-configuration of routing may result in unbalanced traffic directed to the MP.
Under balanced traffic distribution, each MP should be receiving approximately
the same number of ingress transactions per second.
• There may be an insufficient number of MPs configured to handle the network
traffic load. If all MPs are in a congestion state then the offered load to the
server site is exceeding its capacity.
3. There may be a network issue that causes a large number of ComAgent connection
setup and handshake messages. Check network latency and stability parameters.
4. If the problem persists, it is recommended to contact My Oracle Support for
assistance.
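The two checks in step 2 (unbalanced traffic versus insufficient capacity) can be sketched from per-MP KPI readings. This is an illustration under stated assumptions: the function name, the input dictionaries, the result labels, and the 20% tolerance for "approximately the same" are all hypothetical and not taken from the product.

```python
from statistics import mean
from typing import Dict, List, Tuple


def assess_mp_load(ingress_tps: Dict[str, float],
                   congested: Dict[str, bool],
                   tolerance: float = 0.2) -> Tuple[str, List[str]]:
    """Classify the site as balanced, unbalanced, or short of capacity."""
    if not ingress_tps:
        return "balanced", []
    avg = mean(ingress_tps.values())
    # MPs whose ingress rate deviates from the site average by more than tolerance.
    outliers = sorted(mp for mp, tps in ingress_tps.items()
                      if avg and abs(tps - avg) / avg > tolerance)
    if outliers:
        return "unbalanced", outliers
    # Traffic is even; if every MP is congested, the site is overloaded.
    if all(congested.get(mp, False) for mp in ingress_tps):
        return "insufficient-capacity", []
    return "balanced", []


rates = {"mp1": 1600, "mp2": 1000, "mp3": 1000, "mp4": 1000}
print(assess_mp_load(rates, {}))  # ('unbalanced', ['mp1'])
```

An "unbalanced" result points toward routing misconfiguration, while "insufficient-capacity" corresponds to the case where the offered load exceeds what the configured MPs can handle.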
Description:
The percent utilization of the Communication Agent Connection FIFO queue is
approaching defined threshold capacity. If this problem persists and the queue
reaches above the defined threshold utilization, the new ComAgent internal
Connection Management StackEvents messages can be discarded based on
Application's Global Congestion Threshold Enforcement Mode.
Severity:
Minor, Major, Critical
Instance:
<ComAgent StackTask Name>
HA Score:
Normal
OID:
cAFMxFIFOUtilNotify
Cause:
Minor (>= 60%), Major (>= 80%), Critical (>= 95%). Percentage level of Max = 1000.
Diagnostic Information:
The percent utilization of the Communication Agent Connection FIFO queue is
approaching its defined capacity. If this problem persists and the queue reaches
above the defined threshold utilization, the new ComAgent internal Connection
Management StackEvents messages can be discarded based on Application's Global
Congestion Threshold Enforcement Mode. This alarm should not normally occur when
no other congestion alarms are asserted.
1. Recovery:
1. Use Main Menu, and then Alarms & Events to determine if the ComAgent worker
thread may be experiencing a problem preventing it from processing events from
ComAgent Connection FIFO queue.
It is recommended to contact My Oracle Support for assistance.
2. An IP network or adjacent node problem may exist preventing transmission of
messages into the network at the same pace the messages are being received
from the network.
3. Navigate to Status & Manage, and then KPIs to monitor the ingress traffic rate of
each MP.
• The mis-configuration of adjacent node IP routing may result in too much
traffic being distributed to the MP. Each MP in the server site should be
receiving approximately the same ingress transaction per second.
• There may be an insufficient number of MPs configured to handle the network
traffic load. If all MPs are in a congestion state then the offered load to the
server site is exceeding its capacity.
4. If the problem persists, it is recommended to contact My Oracle Support for
assistance.
Description:
The Communication Agent egress message is being discarded due to one of the
following reasons:
• Unknown destination server
• Connection state is not InService
• Incompatible destination
• Serialization failed
• MxEndpoint send failed
• Internal error
Severity:
Info
Instance:
<RemoteIP>
Note:
If <RemoteIP> is not known at the time of message discard, then "Unknown"
will be used.
HA Score:
Normal
Throttle Seconds:
10
OID:
cAFEventEgressMessageDiscardedNotify
1. Recovery:
1. View the Event AddlInfo column.
Message is being discarded due to one of the reasons specified.
2. If it is a persistent condition with the status of one of the Communication
Agent Configuration Managed Objects, then resolve the underlying issue with the
Managed Object.
3. If the event is raised due to a software condition, it is an indication that the
Communication Agent Process may be experiencing problems.
4. Use Main Menu, and then Alarms & Events and examine the alarm log.
5. It is recommended to contact My Oracle Support for assistance.
Description:
Communication Agent Ingress Message Discarded.
Severity:
Info
Instance:
<RemoteIP>
HA Score:
Normal
Throttle Seconds:
10
OID:
cAFEventIngressMessageDiscardedNotify
1. Recovery:
1. View the Event AddlInfo column.
Message is being discarded due to one of the reasons specified.
2. If it is a persistent condition with the status of one of the Communication
Agent Configuration Managed Objects, then resolve the underlying issue with the
Managed Object.
3. If the event is raised due to a software condition, it is an indication that the
Communication Agent Process may be experiencing problems.
4. Use Main Menu, and then Alarms & Events and examine the alarm log.
5. It is recommended to contact My Oracle Support for assistance.
Description:
Communication Agent Peer has not responded to heartbeat.
Severity:
Info
Instance:
<RemoteIP>
HA Score:
Normal
OID:
cAFEventHeartbeatMissedNotify
1. Recovery:
1. Check the configuration of managed objects and resolve any configuration issues
with the Managed Object or hosting nodes.
This message may be due to network conditions, latency, or setup issues.
2. If the event is raised due to a software condition, it is an indication that the
Communication Agent Process may be experiencing problems.
3. Use Main Menu, and then Alarms & Events and examine the alarm log.
4. It is recommended to contact My Oracle Support for assistance.
Description:
Communication Agent Connection State Changed.
Severity:
Info
Instance:
<RemoteIP>
HA Score:
Normal
OID:
cAFEventConnectionStateChangeNotify
1. Recovery:
1. Use Main Menu, and then Alarms & Events and examine the alarm log.
This Event is a log of connection state change.
2. It is recommended to contact My Oracle Support for assistance.
Description:
Communication Agent DB Responder detected a change in configurable control
option parameter.
Note:
This event is an indication that Communication Agent detected a control
parameter change. The change will be applied to applicable software
component. If the change is applied on the GUI, the appropriate GUI action
is logged in security logs. If the action is not performed from GUI and the
control parameter is changed, this event indicates the executed change.
Severity:
Info
Instance:
N/A
HA Score:
Normal
OID:
cAFEventComAgtConfigParamChangeNotify
1. Recovery:
1. Use Main Menu, and then Alarms & Events and examine the alarm log.
2. Use Main Menu, and then Security Log and examine the security log.
3. If the event shows up in Main Menu, and then Alarms & Events, without the
corresponding GUI security log entry in Main Menu, and then Security Log, it is
recommended to contact My Oracle Support for assistance.
Description:
The percent utilization of the Communication Agent DataEvent Mempool is
approaching defined threshold capacity.
Severity:
Minor, Major, Critical
Instance:
<ComAgent Process>
HA Score:
Normal
OID:
cAFDataEvPoolResUtilNotify
1. Recovery:
Description:
This alarm indicates all connections of all connection groups associated with a
routed service are unavailable. This generally occurs when far-end servers have been
removed from service by maintenance actions. This can also occur if all of the routed
service’s connections have been either disabled or blocked.
Severity:
Major
Instance:
<RoutedServiceName>
HA Score:
Normal
OID:
cAFRSUnavailNotify
Cause:
When all member Connection Groups are Unavailable.
Diagnostic Information:
This alarm indicates all connections of all connection groups associated with a
routed service are unavailable. This generally occurs when far-end servers have
been removed from service by maintenance actions. This can also occur if all of the
routed service's connections have been either disabled or blocked. It can also occur if
any disruption leads to loss of connectivity between the user and provider MPs.
1. Recovery:
1. Navigate to Communication Agent, and then Maintenance, and then Routed
Service Status to view the connection groups and connections associated with
the Routed Service.
2. Navigate to Communication Agent, and then Maintenance, and then
Connection Status to view the reasons why connections are unavailable.
3. Navigate to Status & Manage, and then Server to confirm the far-end servers
have an application state of enabled, and their subsystems are operating normally.
This alarm can result from conditions at the far-end servers connected to the
server that asserted this alarm.
4. Check the network and the reachability of provider server(s) from user server(s). Loss
of network connectivity can lead to this alarm. In that case, the user also sees alarm
19800.
5. It is recommended to contact My Oracle Support for assistance.
Description:
This alarm indicates that some, but not all, connections are unavailable in the
connection group being used by a Communication Agent Routed Service to route
messages. The result is that the server that posted this alarm is not load-balancing
traffic across all of the connections configured in the connection group.
Severity:
Major
Instance:
<ServiceName>
HA Score:
Normal
OID:
cAFRSDegradedNotify
1. Recovery:
1. Use Communication Agent, and then Maintenance, and then Routed Service
Status to view the connection groups and connections associated with the Routed
Service.
2. Use Communication Agent, and then Maintenance, and then Connection
Status to view the reasons why connections are unavailable.
3. Use Status & Manage, and then Server to confirm that the far-end servers have
an application state of enabled, and that their subsystems are operating normally.
It is possible that this alarm results from conditions at the far-end servers
connected to the server that asserted this alarm.
4. It is recommended to contact My Oracle Support for assistance.
Description:
This alarm indicates a routed service is load-balancing traffic across all connections in
a connection group, but all of the connections are experiencing congestion. Messages
may be discarded due to congestion.
Severity:
Major
Instance:
<ServiceName>
HA Score:
Normal
OID:
cAFRSCongestedNotify
Cause:
When the active Connection Group is congested.
Diagnostic Information:
This alarm indicates a routed service is load-balancing traffic across all connections in
a connection group, but all of the connections are experiencing congestion. Messages
may be discarded due to congestion. Congestion generally occurs when the far-end
servers are overloaded.
Overload can be due to the following:
• The TCP connection has high latency or a high error rate, which puts the
connection into a congestion state.
• The far-end server is receiving traffic at a higher rate (possibly from other servers).
This triggers ComAgent congestion on the far-end side.
• Application process CPU utilization on the far end is above normal.
1. Recovery:
1. Navigate to Communication Agent, and then Maintenance, and then Routed
Service Status to view the connection groups and connections associated with
the Routed Service.
2. Navigate to Communication Agent, and then Maintenance, and then
Connection Status to view which connections are congested and the degree to which
they are congested.
3. Check the far-end of the congested connections to further isolate the cause of
congestion.
If the far-end servers are overloaded, then it is possible the system is being
presented a load that exceeds its engineered capacity. If this is the case, then
either the load must be reduced, or additional capacity must be added.
4. It is recommended to contact My Oracle Support for assistance.
Description:
Communication Agent routed service is routing traffic using a connection group that
has a lower priority than another connection group.
Severity:
Major
Instance:
<ServiceName>
HA Score:
Normal
OID:
cAFRSUsingLowPriConnGrpNotify
1. Recovery:
1. Use Communication Agent, and then Maintenance, and then Routed Service
Status to view the connection groups and connections associated with the Routed
Service.
2. Use Communication Agent, and then Maintenance, and then Connection
Status to view the reasons why connections are unavailable.
3. Use Status & Manage, and then Server to confirm that the far-end servers have
an application state of enabled, and that their subsystems are operating normally.
It is possible that this alarm results from conditions at the far-end servers
connected to the server that asserted this alarm.
4. It is recommended to contact My Oracle Support for assistance.
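The routed service alarms in this group (unavailable, degraded, congested, and using a lower-priority connection group) can be pictured as checks over the connection states of a routed service. The Python sketch below is one reading of those descriptions, not the product logic. The `ConnectionGroup` type, the state strings, the rule that a lower number means higher priority, and the rule that the active group is the highest-priority group with at least one usable connection are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class ConnectionGroup:
    name: str
    priority: int        # assumption: lower number means higher priority
    states: List[str]    # each entry is "ok", "congested", or "unavailable"


def routed_service_alarms(groups: List[ConnectionGroup]) -> Set[str]:
    """Return the set of illustrative alarm conditions for a routed service."""
    # A group is usable if at least one of its connections is not unavailable.
    usable = [g for g in groups if any(s != "unavailable" for s in g.states)]
    if not usable:
        # All connections of all connection groups are unavailable.
        return {"unavailable"}

    # Assumption: traffic is routed over the highest-priority usable group.
    active = min(usable, key=lambda g: g.priority)
    alarms: Set[str] = set()

    # Some, but not all, connections in the active group are unavailable.
    if any(s == "unavailable" for s in active.states):
        alarms.add("degraded")

    # Every connection still carrying traffic in the active group is congested.
    in_service = [s for s in active.states if s != "unavailable"]
    if all(s == "congested" for s in in_service):
        alarms.add("congested")

    # A higher-priority group exists but is not the one being used.
    if any(g.priority < active.priority for g in groups):
        alarms.add("using-low-priority-group")

    return alarms
```

In this model, a service whose primary group has lost every connection and whose secondary group has lost one of two connections reports both the degraded and the lower-priority conditions.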
Description:
The ComAgent Reliable Transfer Function is approaching or exceeding its
engineered reliable transaction handling capacity.
Severity:
Minor, Major, Critical
Instance:
N/A (ComAgent process)
HA Score:
Normal
OID:
cAFTransUtilNotify
Cause:
Default Values:
• Minor >= PTRCL1OnsetPrcnt and < PTRCL2OnsetPrcnt
• Major >= PTRCL2OnsetPrcnt and < PTRCL3OnsetPrcnt
• Critical >= PTRCL3OnsetPrcnt
Diagnostic Information:
N/A.
1. Recovery:
1. Navigate to Status & Manage, and then Server Status to view MP server status.
2. The remote server may be slow in responding to outstanding transactions that have
correlation resources in use. Misconfiguration of ComAgent server/client routing may
result in too much traffic being distributed to the affected connection for the MP.
Description:
The number of failed transactions during the sampling period has exceeded
configured thresholds.
Severity:
Minor, Major, Critical
Instance:
<ServiceName>
HA Score:
Normal
OID:
cAFTransFailRateNotify
Cause:
Default Values:
• Minor >= FailedTransOnset1Rate and < FailedTransOnset2Rate
• Major >= FailedTransOnset2Rate and < FailedTransOnset3Rate
• Critical >= FailedTransOnset3Rate
Diagnostic Information:
N/A.
1. Recovery:
1. Navigate to Status & Manage, and then Server Status to view MP server status.
2. The remote server may be slow in responding to outstanding transactions that have
correlation resources in use. Misconfiguration of ComAgent server/client routing may
result in too much traffic being distributed to the affected connection for the MP.
3. There may be an insufficient number of server application MPs configured to
handle the internal traffic load. If server application MPs are in a congestion state
then the offered load to the server site is exceeding its capacity.
4. Navigate to Alarms & Events to examine the alarm log.
The system may be experiencing network problems.
The Communication Agent process may be experiencing problems.
5. It is recommended to contact My Oracle Support for assistance.
Description:
This alarm indicates Communication Agent is experiencing congestion in
communication between two servers and this can be caused by a server becoming
overloaded or by network problems between two servers.
Severity:
Major
Instance:
N/A
HA Score:
Normal
OID:
cAFConnCongestedNotify
Cause:
• A connection becomes congested, that is, its congestion level (CL) increases
from ConnCL0 to ConnCL1, ConnCL2, or ConnCL3. If a connection becomes
congested while another connection is already congested, the connection count
is updated and the alarm is re-asserted.
• A connection becomes uncongested, that is, its congestion level (CL) decreases to
ConnCL0, while another connection is still congested. The connection count is
updated and the alarm is re-asserted.
Overload can be due to:
• The TCP connection has high latency or a high error rate, which puts the
connection into a congestion state.
• The far-end server is receiving traffic at a higher rate (possibly from other servers).
This triggers ComAgent congestion on the far-end side.
• Application process CPU utilization on the far end is above normal.
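The assert and re-assert behavior described in the Cause can be sketched as a small state tracker. This Python example is illustrative only. The class name `ConnCongestionAlarm` and its methods are hypothetical, congestion levels are modeled as the integers 0 to 3 for ConnCL0 to ConnCL3, and clearing the alarm when no congested connections remain is an assumption that is not stated in the Cause.

```python
from typing import Dict, Optional


class ConnCongestionAlarm:
    """Tracks per-connection congestion levels and the resulting alarm state."""

    def __init__(self) -> None:
        self._levels: Dict[str, int] = {}           # connection id -> CL (0..3)
        self.asserted_count: Optional[int] = None   # None means the alarm is clear

    def update(self, conn_id: str, level: int) -> Optional[int]:
        """Record a new congestion level and return the asserted connection count."""
        if level not in (0, 1, 2, 3):
            raise ValueError("congestion level must be in the range 0..3")
        self._levels[conn_id] = level
        congested = sum(1 for cl in self._levels.values() if cl > 0)
        # Assert or re-assert with the updated count while any connection is
        # congested. Assumption: the alarm clears once the count reaches zero.
        self.asserted_count = congested if congested > 0 else None
        return self.asserted_count
```

With this model, two connections entering congestion yield counts of 1 and then 2, and the count drops back to 1 when one of them returns to ConnCL0.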
Diagnostic Information:
N/A.
1. Recovery:
1. Navigate to Alarms & Events, and then View History to find additional
information for the alarm by locating the row with a sequence number that
matches the active alarm sequence number and viewing the Additional Info
column.
2. Navigate to Alarms & Events, and then View History to check the event history
logs for additional Communication Agent events or alarms from this MP server.
3. Navigate to Communication Agent, and then Maintenance, and then
Connection Status to determine which connections on the server have abnormal
status.
4. If the Remote MP Overload Level (OL) > 0 then determine why the remote server
is congested.
a. Verify the remote server is not under maintenance.
b. Examine the remote's CPU utilization.
5. If the problem persists, it is recommended to contact My Oracle Support for
assistance.
Description:
The percent utilization of the SMS Task stack queue is approaching defined threshold
capacity.
Severity:
Minor, Major, Critical
Instance:
<SMS Thread/Queue Index>
HA Score:
Normal
OID:
cAFSmsQueueUtilNotify
1. Recovery:
1. The system itself may be heavily loaded with work, causing this subsystem to also
become overloaded. Check other system resources (ComAgent Congestion, CPU
Utilization, and Server Congestion are some examples) for signs of overload.
2. If the problem persists, it is recommended to contact My Oracle Support for
assistance.
Description:
Communication Agent Service Registration State Change.
Severity:
Info
Instance:
<ServiceName>
HA Score:
Normal
OID:
cAFEventComAgtSvcRegChangedNotify
1. Recovery:
• This event is a log of normal application startup and shutdown activity. It may
provide aid during troubleshooting when compared to other events in the log.
Description:
Communication Agent Service Operational State Changed.
Severity:
Info
Instance:
<ServiceName>
HA Score:
Normal
OID:
cAFEventComAgtSvcOpStateChangedNotify
1. Recovery:
1. This event indicates that a Communication Agent service changed operational
state, and typically results from maintenance actions.
A service can also change state due to server overload.
2. If the state change is unexpected, it is recommended to contact My Oracle
Support for assistance.
Description:
Failed transactions between servers result from normal maintenance actions, overload
conditions, software failures, or equipment failures.
Severity:
Info
Instance:
<ServiceName>, <RemoteIP> |<null>
• If serviceID is InvalidServiceID, then <ServiceName> is “EventTransfer”.
• If <ServiceName> is “EventTransfer”, then include <RemoteIP>.
• If serviceID is unknown, then <ServiceName> is null.
HA Score:
Normal
Throttle Seconds:
10
OID:
cAFEventComAgtTransFailedNotify
1. Recovery:
1. Use Communication Agent, and then Maintenance, and then Connection
Status to determine if the local server is unable to communicate with another
server or if servers have become overloaded.
2. Check the server’s KPIs and the Communication Agent, and then Maintenance,
and then Connection Status to troubleshoot the cause of server overload.
3. Check the Communication Agent, and then Maintenance, and then HA Status
that corresponds to the ServiceID in the event instance to troubleshoot the
operation of the service.
4. If the event cannot be explained by maintenance actions, it is recommended to
contact My Oracle Support for assistance.
Description:
Communication Agent Service Egress Message Discarded.
Severity:
Info
Instance:
<ServiceName>
• If serviceID is unknown, then <ServiceName> is null.
HA Score:
Normal
Throttle Seconds:
10
OID:
cAFEventRoutingFailedNotify
1. Recovery:
1. View the Event AddlInfo column.
Message is being discarded due to one of the reasons specified.
2. If it is a persistent condition with the status of one of the Communication
Agent Configuration Managed Objects, then resolve the underlying issue with the
Managed Object.
3. If the event is raised due to a software condition, it is an indication that the
Communication Agent Process may be experiencing problems.
4. Use Main Menu, and then Alarms & Events and examine the alarm log.
5. It is recommended to contact My Oracle Support for assistance.
Description:
Communication Agent Resource-Provider Registered.
Severity:
Info
Instance:
<ResourceName>
HA Score:
Normal
OID:
cAFEventResourceProviderRegisteredNotify
1. Recovery:
• No action required.
Description:
Communication Agent Resource-Provider Resource State Changed.
Severity:
Info
Instance:
<ProviderServerName>: <ResourceName>
HA Score:
Normal
OID:
cAFEventResourceStateChangeNotify
1. Recovery:
• No action required.
Description:
Communication Agent Resource-Provider Stale Status Received.
Severity:
Info
Instance:
<ProviderServerName>: <ResourceName>
HA Score:
Normal
Throttle Seconds:
10
OID:
cAFEventStaleHBPacketNotify
1. Recovery:
Description:
Communication Agent Resource-Provider Deregistered.
Severity:
Info
Instance:
<ResourceName>
HA Score:
Normal
OID:
cAFEventResourceProviderDeRegisteredNotify
1. Recovery:
• No action required.
Description:
Communication Agent Resource Degraded. A local application is using the resource,
identified in the alarm, and the access to the resource is impaired. Some of the
resource providers are unavailable, congested, or both.
Severity:
Major
Instance:
<ResourceName>
HA Score:
Normal
OID:
cAFResourceCongestedNotify
1. Recovery:
1. Use Communication Agent, and then Maintenance, and then HA Services
Status to determine which sub-resources are unavailable or degraded for the
server that asserted the alarm.
2. Use Communication Agent, and then Maintenance, and then Connection
Status to determine if connections have failed or have become congested.
3. It is recommended to contact My Oracle Support for assistance.
Description:
Communication Agent Resource unavailable. A local application needs to use
a ComAgent resource, but the resource is unavailable. The resource can be
unavailable if the local server has no ComAgent connections to servers providing
the resource or no servers host active instances of the resource’s sub-resources.
Severity:
Major
Instance:
<ResourceName>
HA Score:
Normal
OID:
cAFResourceUnavailNotify
Cause:
Communication Agent Resource Unavailable. A local application needs to use a
ComAgent resource, but the resource is unavailable. The resource can be unavailable
if the local server has no ComAgent connections to servers providing the resource or
no servers host active instances of the resource's sub-resources.
Diagnostic Information:
N/A.
1. Recovery:
1. Navigate to Communication Agent, and then Maintenance, and then
Connection Status to verify the local server is connected to the expected servers.
If the local server reports unavailable connections, then take actions to
troubleshoot the cause of the connection failures.
2. If the ComAgent connections are InService, navigate to Communication Agent,
and then Maintenance, and then HA Services Status to determine which servers
are providing the resource.
If no servers are providing the resource, then the most likely reason is
maintenance actions have removed the application from service that provides the
concerned resource.
3. It is recommended to contact My Oracle Support for assistance.
Description:
Communication Agent Resource Error. Two sets of servers are using incompatible
configurations for a ComAgent resource.
Severity:
Minor
Instance:
<ResourceName>
HA Score:
Normal
OID:
cAFResourceErrorNotify
1. Recovery:
1. Use Communication Agent, and then Maintenance, and then HA Services
Status to determine which sets of servers are incompatible.
Check the incompatible servers to verify that they are operating normally and are
running the expected versions of software.
2. It is recommended to contact My Oracle Support for assistance.
Description:
Communication Agent Resource-User Registered.
Severity:
Info
Instance:
<ResourceName>
HA Score:
Normal
OID:
cAFEventResourceUserRegisteredNotify
1. Recovery:
• No action required.
Description:
Communication Agent Resource-User Deregistered.
Severity:
Info
Instance:
<ResourceName>
HA Score:
Normal
OID:
cAFEventResourceUserDeRegisteredNotify
1. Recovery:
• No action required.
Description:
Communication Agent Resource Routing State Changed.
Severity:
Info
Instance:
<ResourceName>
HA Score:
Normal
OID:
cAFEventResourceRoutingStateNotify
1. Recovery:
• No action required.
Description:
Communication Agent Resource Egress Message Discarded.
Severity:
Info
Instance:
<ResourceName>: <SubResourceID>
Note:
If the resource is unknown, then <ResourceName> is the ResourceID
converted to text. The <SubResourceID> is an integer converted to text,
regardless of whether it is known or unknown.
HA Score:
Normal
Throttle Seconds:
10
OID:
cAFEventHaEgressMessageDiscardedNotify
1. Recovery:
1. Message is being discarded due to one of the reasons specified in Event AddlInfo.
If the condition is persistent with the status of one of the ComAgent Configuration
Managed Objects, there is an underlying issue with the Managed Object.
2. Use Main Menu, and then Alarms & Events and examine the alarm log for
ComAgent Process problems.
3. It is recommended to contact My Oracle Support for assistance.
Description:
Communication Agent Resource-Provider Tracking Table Audit Results. This event
is generated when a Resource Provider Tracking Table (RPTT) entry with Status
equal to Auditing is replaced with a new status (null, Active, Standby, Spare, OOS,
etc) and there are no other RPTT entries, for this specific Resource/SR, with Status
equal to Auditing.
Severity:
Info
Instance:
None
HA Score:
Normal
OID:
cAFEventHaRPTTAuditResultNotify
1. Recovery:
• No action required.
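The condition that generates this audit event can be sketched as a check performed when an RPTT entry's status is replaced. The Python below is a minimal illustration, not the product implementation. The table layout (a dictionary keyed by resource, sub-resource, and provider), the function name `replace_status`, and the use of `None` for a null status are assumptions.

```python
from typing import Dict, Optional, Tuple

# Hypothetical key: (resource name, sub-resource ID, provider server name).
Key = Tuple[str, int, str]


def replace_status(table: Dict[Key, Optional[str]], key: Key,
                   new_status: Optional[str]) -> bool:
    """Replace an entry's status; return True if the audit event would be generated.

    The event is generated only when an entry leaves the "Auditing" status and
    no other entry for the same resource and sub-resource is still "Auditing".
    """
    was_auditing = table.get(key) == "Auditing"
    table[key] = new_status
    if not was_auditing or new_status == "Auditing":
        return False
    resource, sub_resource, _provider = key
    still_auditing = any(
        status == "Auditing"
        for (res, sub, _prov), status in table.items()
        if (res, sub) == (resource, sub_resource)
    )
    return not still_auditing
```

In this model, when two providers of the same resource and sub-resource are both in Auditing, only the replacement of the second entry's status triggers the event.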
Description:
This alarm indicates a possible IP network disruption that has caused more than one
Resource Provider to become Active. The server that asserted this alarm expects
there to be only one active Resource Provider server for the Resource, but instead it
is seeing more than one. During this condition the server may be sending commands
to the wrong Resource Provider. This may affect applications such as CPA, PDRA.
Severity:
Major
Instance:
<ResourceName>
HA Score:
Normal
OID:
cAFMultipleActivesNotify
1. Recovery:
1. Use Communication Agent, and then Maintenance, and then HA Services
Status to determine which Resource Provider servers are announcing ‘Active’
status for the Resource.
2. Investigate possible IP network isolation between these Resource Provider
servers.
3. It is recommended to contact My Oracle Support for assistance.
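The condition behind this alarm, more than one Resource Provider announcing Active status for the same Resource, can be sketched as a simple count. The Python example below is illustrative. The function names and the status dictionary (provider server name mapped to announced status) are hypothetical and stand in for what an operator would read from HA Services Status.

```python
from typing import Dict, List


def active_providers(statuses: Dict[str, str]) -> List[str]:
    """Return the provider servers announcing 'Active', sorted by name."""
    return sorted(server for server, status in statuses.items()
                  if status == "Active")


def multiple_actives(statuses: Dict[str, str]) -> bool:
    """True when more than one provider announces 'Active' for the resource."""
    return len(active_providers(statuses)) > 1
```

Listing the servers returned by `active_providers` gives the set of Resource Provider servers between which possible IP network isolation would be investigated.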
Description:
The Communication Agent Service Provider Registration State has changed.
Severity:
Info
Instance:
<ServiceName>
HA Score:
Normal
OID:
cAFEventSvcProvRegStateChangedNotify
1. Recovery:
1. This event is a log of normal application startup and shutdown activity. It may
provide aid during troubleshooting when compared to other events in the log.
2. It is recommended to contact My Oracle Support for further assistance.
Description:
The Communication Agent Service Provider Operational State has changed.
Severity:
Info
Instance:
<ServiceName>
HA Score:
Normal
OID:
cAFEventSvcProvOpStateChangedNotify
1. Recovery:
1. This event indicates that a ComAgent service provider changed operational state,
and typically results from maintenance actions. A service can also change state
due to overload.
2. If the state change is unexpected, it is recommended to contact My Oracle
Support.
Description:
The Communication Agent receives a connection request from an unknown server.
Severity:
Info
Instance:
<RemoteIP>
HA Score:
Normal
Throttle Seconds:
1800 (30 minutes)
OID:
cAFEventSvcProvOpStateChangedNotify
1. Recovery:
1. Verify network routes are correctly configured for ComAgent.
2. If assistance is required, it is recommended to contact My Oracle Support.
Description:
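This alarm indicates that a Communication Agent Configuration Daemon has
encountered an error that prevents it from properly using server topology
configuration data to configure automatic connections for the Communication Agents
on MPs, and this may prevent applications on MPs from communicating.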
Severity:
Critical
Instance:
None
HA Score:
Normal
OID:
CAFTableMonitorFailureNotify
Cause:
Alarm 19860 is asserted when Communication Agent Configuration Daemon is
unable to monitor one or more tables that it has been configured to monitor.
Diagnostic Information:
This alarm indicates that a Communication Agent Configuration Daemon has
encountered an error that prevents it from properly using server topology
configuration data to configure automatic connections for the Communication Agents
on MPs, and this may prevent applications on MPs from communicating.
To troubleshoot:
• Find additional information for the alarm in Alarms & Events, and then View
History by locating the row with a sequence number that matches the active
alarm sequence number and viewing the Additional Info column.
• Check the event history logs at Alarms & Events, and then View History for
additional Communication Agent events or alarms from this server.
1. Recovery:
1. Use Alarms & Events, and then View History to find additional information about
the alarm.
The information can be found by locating the row with a sequence number
that matches the active alarm sequence number and viewing the Additional Info
column.
2. Check the event history logs at Alarms & Events, and then View History for
additional Communication Agent events or alarms from this MP server.
3. If conditions do not permit a forced failover of the active NOAM, it is recommended
to contact My Oracle Support for assistance.
4. If conditions permit, then initiate a failover of active NOAM.
This causes the Communication Agent Configuration Daemon to exit on the
originally-active NOAM and to start on the newly-active NOAM.
5. After NOAM failover completes, verify the alarm has cleared.
6. If the alarm has not cleared, it is recommended to contact My Oracle Support for
assistance.
Description:
This alarm indicates a Communication Agent Configuration Daemon has encountered
an error that prevents it from properly using server topology configuration data to
configure automatic connections for the Communication Agents on MPs, and this may
prevent applications on MPs from communicating.
Severity:
Critical
Instance:
None
HA Score:
Normal
OID:
cAFScriptFailureNotify
Cause:
This alarm is raised when the Communication Agent Configuration Daemon
configuration script fails.
Diagnostic Information:
This alarm indicates a Communication Agent Configuration Daemon has encountered
an error that prevents it from properly using server topology configuration data to
configure automatic connections for the Communication Agents on MPs, and this may
prevent applications on MPs from communicating.
To troubleshoot:
• Find additional information for the alarm in Alarms & Events, and then View
History by locating the row with a sequence number that matches the active
alarm sequence number and viewing the Additional Info column.
• Check the event history logs at Alarms & Events, and then View History for
additional Communication Agent events or alarms from this server.
1. Recovery:
1. Use Alarms & Events, and then View History to find additional information about
the alarm.
The information can be found by locating the row with a sequence number
that matches the active alarm sequence number and viewing the Additional Info
column.
2. Check the event history logs at Alarms & Events, and then View History for
additional Communication Agent events or alarms from this server.
Description:
The Communication Agent Ingress Stack Event Rate is approaching its defined
threshold capacity.
Severity:
• Minor - if exceeding 100K on Gen8/Gen9 hardware, 75K on other hardware
• Major - if exceeding 110K on Gen8/Gen9 hardware, 80K on other hardware
• Critical - if exceeding 120K on Gen8/Gen9 hardware, 84K on other hardware
Instance:
<ServiceName>
HA Score:
Normal
OID:
cAFIngressRateNotify
1. Recovery:
1. This alarm indicates that a server is overrunning its defined processing capacity. If
any of the defined threshold onset levels are exceeded, Communication Agent will
discard comparatively low priority messages. Check the configuration, routing, and
deployment mode capacity.
2. It is recommended to contact My Oracle Support for further assistance.
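The severity thresholds listed for this alarm depend on the hardware type, which can be sketched as a lookup table. The Python example below is illustrative only. The names `INGRESS_THRESHOLDS` and `ingress_rate_severity` and the hardware keys are hypothetical, "K" is assumed to mean 1000, and the rate is assumed to be in the same unit as the listed thresholds.

```python
from typing import Optional

# Thresholds from the Severity field, assuming "K" means 1000.
INGRESS_THRESHOLDS = {
    "gen8_gen9": {"Minor": 100_000, "Major": 110_000, "Critical": 120_000},
    "other": {"Minor": 75_000, "Major": 80_000, "Critical": 84_000},
}


def ingress_rate_severity(rate: float, hardware: str = "other") -> Optional[str]:
    """Return the severity for an ingress stack event rate, or None if no
    threshold is exceeded. A threshold counts only when strictly exceeded."""
    limits = INGRESS_THRESHOLDS[hardware]
    for severity in ("Critical", "Major", "Minor"):
        if rate > limits[severity]:
            return severity
    return None
```

Because the text says "exceeding", the comparison is strict: a rate of exactly 100,000 on Gen8/Gen9 hardware does not raise the alarm in this model.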
Description:
The maximum number of connections per connection group limit has been reached.
Severity:
Info
Instance:
<Connection group name>
HA Score:
Normal
Throttle Seconds:
10
OID:
cAFComAgentMaxConnsInConnGrpNotify
1. Recovery:
1. This event indicates that a connection group has already reached its maximum
limit and no more connections can be added to the group. Determine what is
preventing potential connections from being added to the connection group.
2. It is recommended to contact My Oracle Support for further assistance.
Description:
ComAgent successfully set the host server hardware profile.
Severity:
Info
Instance:
None
HA Score:
Normal
OID:
cAFEventSuccessSetHostServerHWProfileNotify
1. Recovery:
1. This event indicates that all TPS controlling parameter values are successfully set
for the host server hardware profile.
2. If needed, it is recommended to contact My Oracle Support.
Description:
ComAgent failed to set the host server hardware profile.
Severity:
Info
Instance:
None
HA Score:
Normal
OID:
cAFEventFailToSetHostServerHWProfileNotify
1. Recovery:
1. This event indicates that there is a failure in applying default hardware settings
for ComAgent TPS controlling parameters. When default settings also fail to apply,
then the factory values will be used for the TPS controlling parameters.
2. If needed, it is recommended to contact My Oracle Support.
Description:
The Communication Agent Peer Group operational status has changed.
Severity:
Info
Instance:
<PeerGroupName>
HA Score:
Normal
OID:
cAFEventPeerGroupStatusChangeNotify
1. Recovery:
Description:
The Communication Agent Peer Group egress message is being discarded due to
one of the following reasons:
• Unknown Peer Group
• Peer Group Unavailable
• Peer Congested
• Reliability not supported
Severity:
Info
Instance:
<PeerGroupName>
HA Score:
Normal
Throttle Seconds:
10
OID:
cAFEventPSEgressMessageDiscardedNotify
1. Recovery:
Description:
Communication Agent connection rejected. Connection to the peer node is not
initiated due to network incompatibility. This event will be raised on the connection
initiator side when the connection initiator MP has only IPv6 IP addresses configured
and Remote MP has only IPv4 IP addresses configured or when connection initiator
MP has only IPv4 IP addresses configured and Remote MP has only IPv6 IP
addresses configured.
Severity:
Info
Instance:
<RemoteIP>
HA Score:
Normal
OID:
cAFEventConnectionRejectNotify
1. Recovery:
1. Disable both sides of the connection.
2. Configure the correct network modes on either server.
3. Restart the application on the reconfigured server.
4. Enable both sides of the connection.
5. It is recommended to contact My Oracle Support for assistance if needed.
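The incompatibility described for this event (one side IPv6-only, the other IPv4-only) amounts to the two MPs sharing no IP version. A minimal sketch of that check, using Python's standard `ipaddress` module, is shown below. The function names are hypothetical, the addresses are documentation-range examples, and this is not the product's own validation logic.

```python
import ipaddress


def address_families(addresses):
    """Return the set of IP versions (4 and/or 6) present in a list of addresses."""
    return {ipaddress.ip_address(addr).version for addr in addresses}


def connection_would_be_rejected(initiator_addrs, remote_addrs):
    """True when the initiator and remote MPs have no IP version in common."""
    return not (address_families(initiator_addrs) & address_families(remote_addrs))


# IPv6-only initiator, IPv4-only remote: incompatible.
print(connection_would_be_rejected(["2001:db8::1"], ["192.0.2.10"]))
# Dual-stack initiator, IPv4-only remote: compatible.
print(connection_would_be_rejected(["192.0.2.1", "2001:db8::1"], ["192.0.2.10"]))
```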
Description:
The process, which is responsible for handling all signaling traffic, is approaching or
exceeding its engineered traffic handling capacity.
Severity:
Critical, Major, Minor
Instance:
N/A
HA Score:
Normal
OID:
dbcProcessCpuUtilizationNotify
Cause:
This alarm is raised when the MP is handling too much traffic and is operating in
congestion.
Diagnostic Information:
N/A
1. Recovery:
1. Navigate to Status & Manage, and then KPIs to monitor the ingress traffic rate of
each MP.
• Misconfiguration of Server/Client routing may result in too much traffic
being distributed to the MP. Each MP in the server site should be receiving
approximately the same number of ingress transactions per second.
• There may be an insufficient number of MPs configured to handle the network
traffic load. If all MPs are in a congestion state, then the traffic load to the
server site is exceeding its capacity.
2. Navigate to Alarms & Events to examine the alarm log.
It is recommended to contact My Oracle Support for assistance.
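The balance check in step 1 (each MP should receive approximately the same ingress rate) can be sketched as a comparison against the site average. The 20% tolerance, the MP names, and the sample rates below are assumptions for illustration; the source does not define what "approximately the same" means numerically.

```python
from statistics import mean


def find_unbalanced_mps(ingress_tps, tolerance=0.20):
    """Return MPs whose ingress rate deviates from the site mean by more than tolerance."""
    if not ingress_tps:
        return {}
    avg = mean(ingress_tps.values())
    if avg == 0:
        return {}
    return {
        mp: tps
        for mp, tps in ingress_tps.items()
        if abs(tps - avg) / avg > tolerance
    }


# Hypothetical per-MP ingress rates read from the KPIs page.
site = {"mp-1": 9800, "mp-2": 10100, "mp-3": 15900}
print(find_unbalanced_mps(site))
```

An MP flagged by such a check points toward the routing misconfiguration case; if no MP stands out but all are congested, the insufficient-capacity case is the more likely explanation.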
Description:
A minor database validation error was detected on the MP server during an update.
MP internal database is now out of sync with the configuration database. Subsequent
database operations on the MP are ALLOWED.
Severity:
Major
Instance:
N/A
HA Score:
Normal
OID:
dbcCfgDbValidationErrorNotify
1. Recovery:
Description:
A critical database validation error was detected on the MP server during an update.
MP internal database is now out of sync with the configuration database. Subsequent
database operations on the MP are DISABLED.
Severity:
Critical
Instance:
N/A
HA Score:
Normal
OID:
dbcCfgDbUpdateFailureNotify
Cause:
After receiving configuration updates from the GUI, the DSR application is not able to
modify its Runtime Database completely and correctly. All configuration changes are
verified for syntactic and semantic errors by pre-update procedures.
Poor system health or a degraded application state might be one of the causes.
Diagnostic Information:
• Determine if this condition indicates a software problem or unexpected TC User
behavior.
• The Event Additional Information field includes a description of the event
received, its cause, and the actions taken on the operation or dialogue as a
result. Dialogue removed by dialogue cleanup timer.
• An internal error may have occurred. Perform the following:
– Click Alarm Instance.
– Collect the information from the Instance and Additional Information sections of
the raised alarm.
– Provide this information when contacting My Oracle Support.
1. Recovery:
Description:
A minor database validation error was detected on the MP server after a database
update. MP internal database is still in sync with the configuration database.
Subsequent database operations on the MP are ALLOWED.
Severity:
Major
Instance:
N/A
HA Score:
Normal
OID:
dbcCfgDbPostUpdateErrorNotify
Cause:
N/A
Diagnostic Information:
N/A
1. Recovery:
Description:
A critical database validation error was detected on the MP server after a database
update. MP internal database is still in sync with the configuration database.
Subsequent database operations on the MP are DISABLED.
Severity:
Critical
Instance:
N/A
HA Score:
Normal
OID:
dbcCfgDbPostFailureNotify
Cause:
After receiving configuration updates from the GUI, the DSR application is not able
to modify its Runtime Database and fails in a post-update procedure such as
verification. The error is critical, and subsequent configuration updates will not be
applied to the Runtime Database.
All configuration changes are verified for syntactic and semantic errors by pre-update
procedures. One possible cause of this alarm is poor system health.
Diagnostic Information:
The alarm may be raised due to an internal error. Click Alarm Instance. Collect the
information from the Instance and Additional Information sections of the raised alarm.
Provide this information when contacting My Oracle Support.
1. Recovery:
Description:
A measurement object failed to initialize.
Severity:
Critical
Instance:
<measTagName>
HA Score:
Normal
OID:
dbcMeasurementInitializationFailureNotify
Cause:
All Measurements are bound to a specific Measurement ID or Measurement Name
defined in the Internal Database. This alarm is raised when Measurement subsystem
initialization has failed, which occurs only when the system (or a process) is coming
up.
The alarm is raised when:
• An application tries to bind the measurement using an incorrect measurement
identifier that does not exist in the database. If you have performed an upgrade or a
new installation, contact My Oracle Support for assistance.
• An unauthorized configuration change resulted in inconsistent data.
Diagnostic Information:
Note any configuration change made to the system that requires (or caused)
a process (or system) restart. Additionally, note the alarm instance and any additional
information present in the alarm's Additional Info section.
1. Recovery:
3-187
Chapter 3
Diameter Signaling Router (DSR) Diagnostics (19910-19999)
Description:
Normal traffic is being discarded because it is routed to an egress Test Connection.
An egress Test Connection is given a normal message to be transmitted.
Severity:
Major
Instance:
<Connection name>
HA Score:
Normal
OID:
dbcNormalMessageDiscardedNotify
1. Recovery:
1. Update routing rules to exclude Test connections from being used for routing.
Normal traffic should be received and sent on non-test connections.
2. Change the hostname of the peer connected to the test connection.
The hostname of the peer connected to the test connection may be the destination
host for the incoming normal traffic.
Description:
Test message is given to a non-test connection to be transmitted.
3-188
Chapter 3
Diameter Alarms and Events (8000-8299, 22000-22350, 22900-22999, 25600-25899)
Severity:
Info
Instance:
<Connection name>
HA Score:
Normal
Throttle Seconds:
5
OID:
dbcDiagnosticMessageDiscardNotify
1. Recovery:
• Update routing rules to exclude Test messages from being routed to non-test
connections.
Test messages should be received and sent only on test connections.
Description:
DraWorker connection FSM exception.
Severity
Info
Instance
<DraWorker Name>:001
HA Score
Normal
Throttle Seconds
10
OID
eagleXgDiameterMpEvFsmException
1. Recovery
1. This event is potentially caused by the Peer CNDRA process reaching its
descriptor capacity.
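The Instance value of these events pairs a name with a three-digit suffix (for example `<DraWorker Name>:001`). When processing exported events, it can help to split the two parts. The sketch below assumes the suffix is always numeric and follows the last colon; the function name and the sample value are hypothetical.

```python
def parse_instance(instance: str):
    """Split 'Name:NNN' into (name, numeric suffix); raise ValueError if malformed."""
    name, sep, code = instance.rpartition(":")
    if not sep or not name or not code.isdigit():
        raise ValueError(f"unexpected instance format: {instance!r}")
    return name, int(code)


print(parse_instance("DraWorker1:003"))  # ('DraWorker1', 3)
```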
Description
DraWorker connection FSM exception.
Severity
Info
Instance
<DraWorker Name>:002
HA Score
Normal
Throttle Seconds
10
OID
eagleXgDiameterMpEvFsmException
1. Recovery
1. Potential causes of this event are:
• Network interface(s) are down.
• Port is already in use by another process.
• Configuration is invalid.
2. This event is unexpected. It is recommended to contact My Oracle Support for
assistance.
Description
DraWorker connection FSM exception.
Severity
Info
Instance
<DraWorker Name>:003
HA Score
Normal
Throttle Seconds
10
OID
eagleXgDiameterMpEvFsmException
1. Recovery
1. Potential causes of this event are:
• Peer CNDRA process is not running with root permission.
• Configuration is invalid.
2. This event is unexpected. It is recommended to contact My Oracle Support for
assistance.
Description
DraWorker connection FSM exception.
Severity
Info
Instance
<DraWorker Name>:004
HA Score
Normal
Throttle Seconds
10
OID
eagleXgDiameterMpEvFsmException
1. Recovery
Note:
The rate will ease over time as an increasing number of connections are
accepted.
Description
DraWorker connection FSM exception.
Severity
Info
Instance
<DraWorker Name>:101
HA Score
Normal
Throttle Seconds
10
OID
eagleXgDiameterMpEvFsmException
1. Recovery
Description
DraWorker connection FSM exception.
Severity
Info
Instance
<DraWorker Name>:102
HA Score
Normal
Throttle Seconds
10
OID
eagleXgDiameterMpEvFsmException
1. Recovery
• No action required.
Description
DraWorker connection FSM exception.
Severity
Info
Instance
<DraWorker Name>:103
HA Score
Normal
Throttle Seconds
10
OID
eagleXgDiameterMpEvFsmException
1. Recovery
Description
DraWorker connection FSM exception.
Severity
Info
Instance
<DraWorker Name>:104
HA Score
Normal
Throttle Seconds
10
OID
eagleXgDiameterMpEvFsmException
1. Recovery
Description
DraWorker connection FSM exception.
Severity
Info
Instance
<DraWorker Name>:105
HA Score
Normal
Throttle Seconds
10
OID
eagleXgDiameterMpEvFsmException
1. Recovery
• No action required.
Description
DraWorker connection FSM exception.
Severity
Info
Instance
<DraWorker Name>:106
HA Score
Normal
Throttle Seconds
10
OID
eagleXgDiameterMpEvFsmException
1. Recovery
Description
DraWorker connection FSM exception.
Severity
Info
Instance
<DraWorker Name>:201
HA Score
Normal
Throttle Seconds
10
OID
eagleXgDiameterMpEvFsmException
1. Recovery:
8001 - MpEvException
8001 - 001 - MpEvException_Oversubscribed
Event Type
DIAM
Description
DraWorker exception.
Severity
Info
Instance
<DraWorker Name>:001
HA Score
Normal
Throttle Seconds
None
OID
eagleXgDiameterMpEvException
1. Recovery
8002 - MpEvRxException
8002 - 001 - MpEvRxException_DiamMsgPoolCongested
Event Type
DIAM
Description
DraWorker ingress message processing exception.
Severity
Info
Instance
<DraWorker Name>:001
HA Score
Normal
Throttle Seconds
10
OID
eagleXgDiameterMpEvRxException
1. Recovery
Description
DraWorker ingress message processing exception.
Severity
Info
Instance
<DraWorker Name>:002
HA Score
Normal
Throttle Seconds
10
OID
eagleXgDiameterMpEvRxException
1. Recovery
• This event is potentially caused when a peer is generating more traffic than is
nominally expected.
Description
DraWorker ingress message processing exception.
Severity
Info
Instance
<DraWorker Name>:003
HA Score
Normal
Throttle Seconds
10
OID
eagleXgDiameterMpEvRxException
1. Recovery
Description
DraWorker ingress message processing exception.
Severity
Info
Instance
<DraWorker Name>:004
HA Score
Normal
Throttle Seconds
10
OID
eagleXgDiameterMpEvRxException
1. Recovery
Description
DraWorker ingress message processing exception.
Severity
Info
Instance
<DraWorker Name>:005
HA Score
Normal
Throttle Seconds
10
OID
eagleXgDiameterMpEvRxException
1. Recovery
Description
DraWorker ingress message processing exception.
Severity
Info
Instance
<DraWorker Name>:006
HA Score
Normal
Throttle Seconds
10
OID
eagleXgDiameterMpEvRxException
1. Recovery
Description
DraWorker ingress message processing exception.
Severity
Info
Instance
<DraWorker Name>:007
HA Score
Normal
Throttle Seconds
10
OID
eagleXgDiameterMpEvRxException
1. Recovery
Description
DraWorker ingress message processing exception.
Severity
Info
Instance
<DraWorker Name>:008
HA Score
Normal
Throttle Seconds
10
OID
eagleXgDiameterMpEvRxException
1. Recovery
Description
DraWorker ingress message processing exception.
Severity
Info
Instance
<DraWorker Name>:009
HA Score
Normal
Throttle Seconds
10
OID
eagleXgDiameterMpEvRxException
1. Recovery
Description
DraWorker ingress message processing exception.
Severity
Info
Instance
<DraWorker Name>:201
HA Score
Normal
Throttle Seconds
10
OID
eagleXgDiameterMpEvRxException
1. Recovery
Description
DraWorker ingress message processing exception.
Severity
Info
Instance
<DraWorker Name>:202
HA Score
Normal
Throttle Seconds
10
OID
eagleXgDiameterMpEvRxException
1. Recovery
• The host or peer may be misconfigured. Adjust the peer IP address(es) option of
the associated Peer Node if necessary.
Description
DA-MP ingress message processing exception.
Severity
Info
Instance
<DA-MP Name>:203
HA Score
Normal
Throttle Seconds
10
OID
eagleXgDiameterMpEvRxException
1. Recovery:
1. If one or more MPs in a server site have failed, the traffic will be distributed
between the remaining MPs in the server site. MP server status can be monitored
from the Status & Manage, and then Server page.
2. Misconfiguration of Diameter peers may result in too much traffic being
distributed to the MP. The ingress traffic rate of each MP can be monitored from
the Status & Manage, and then KPIs page. Each MP in the server site should be
receiving approximately the same number of ingress transactions per second.
3. There may be an insufficient number of MPs configured to handle the network
traffic load. The ingress traffic rate of each MP can be monitored from the Status
& Manage, and then KPIs page. If all MPs are in a congestion state then the
offered load to the server site is exceeding its capacity.
4. A software defect may exist resulting in PDU buffers not being deallocated to the
pool. This alarm should not normally occur when no other congestion alarms are
asserted. The alarm log should be examined using the Alarms & Events page.
5. This event is unexpected. It is recommended to contact My Oracle Support for
assistance.
Description
DraWorker ingress message processing exception.
Severity
Info
Instance
<DraWorker Name>:204
HA Score
Normal
Throttle Seconds
10
OID
eagleXgDiameterMpEvRxException
1. Recovery:
1. Adjust the RADIUS Cached Response Duration option of the associated
Connection configuration set(s) to reduce the lifetime of cached transactions, if
needed.
2. If one or more MPs in a server site have failed, the traffic will be distributed
between the remaining MPs in the server site.
3. Misconfiguration of Diameter peers may result in too much traffic being
distributed to the MP. Each MP in the server site should be receiving
approximately the same number of ingress transactions per second.
4. There may be an insufficient number of MPs configured to handle the network
traffic load. If all MPs are in a congestion state then the offered load to the server
site is exceeding its capacity.
5. A software defect may exist resulting in PTR buffers not being deallocated to the
pool. This alarm should not normally occur when no other congestion alarms are
asserted. The alarm log should be examined.
6. If the problem persists, it is recommended to contact My Oracle Support.
Description
DA-MP ingress message processing exception.
Severity
Info
Instance
<DA-MP Name>:205
HA Score
Normal
Throttle Seconds
10
OID
eagleXgDiameterMpEvRxException
1. Recovery:
1. The alarm will clear when the DCL egress task message queue utilization falls
below the clear threshold. The alarm may be caused by one or more peers being
routed more traffic than is nominally expected.
2. If the problem persists, it is recommended to contact My Oracle Support.
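Step 1 describes a condition that clears only once utilization falls below a separate clear threshold. A common way to model this is hysteresis between an onset level and a lower clear level, sketched below. The 80% and 60% values and the class name are hypothetical; the source does not state the actual thresholds.

```python
class ThresholdAlarm:
    """Raise at or above the onset level; clear only below the lower clear level."""

    def __init__(self, onset: float, clear: float):
        if clear >= onset:
            raise ValueError("clear threshold must be below onset threshold")
        self.onset = onset
        self.clear = clear
        self.active = False

    def update(self, utilization: float) -> bool:
        """Feed one utilization sample and return whether the alarm is active."""
        if not self.active and utilization >= self.onset:
            self.active = True
        elif self.active and utilization < self.clear:
            self.active = False
        return self.active


# Hypothetical queue utilization samples, in percent.
alarm = ThresholdAlarm(onset=80, clear=60)
print([alarm.update(u) for u in (50, 85, 70, 59)])  # [False, True, True, False]
```

The gap between the two levels keeps the condition from toggling when utilization hovers near a single threshold.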
Description
DA-MP ingress message processing exception.
Severity
Info
Instance
<DA-MP Name>:206
HA Score
Normal
Throttle Seconds
10
OID
eagleXgDiameterMpEvRxException
1. Recovery:
1. If one or more MPs in a server site have failed, the traffic will be distributed
between the remaining MPs in the server site. MP server status can be monitored
from the Status & Manage, and then Server page.
2. Misconfiguration of Diameter peers may result in too much traffic being
distributed to the MP. The ingress traffic rate of each MP can be monitored from
the Status & Manage, and then KPIs page. Each MP in the server site should be
receiving approximately the same number of ingress transactions per second.
3. There may be an insufficient number of MPs configured to handle the network
traffic load. The ingress traffic rate of each MP can be monitored from the Status
& Manage, and then KPIs page. If all MPs are in a congestion state then the
offered load to the server site is exceeding its capacity.
4. A software defect may exist resulting in PDU buffers not being deallocated to the
pool. This alarm should not normally occur when no other congestion alarms are
asserted. The alarm log should be examined using the Alarms & Events page.
5. If the problem persists, it is recommended to contact My Oracle Support.
Description
Connection ingress message processing exception.
Severity
Info
Instance
<Connection Name>:207
HA Score
Normal
Throttle Seconds
10
OID
eagleXgDiameterMpEvRxException
1. Recovery:
1. It is possible to observe this event occasionally, due to the unreliable nature of
the UDP transport protocol. However, if the occurrence of this event is frequent,
investigate the issue further.
This event is expected when a retransmission is received from the client before a
server has responded to the request, possibly because the client retransmits
before allowing sufficient time for a server to respond. Another possible cause
is that one or more servers configured to handle the request are
non-responsive.
2. Investigate the routing configuration to narrow down the list of servers (Peer
Nodes) which are expected to handle requests from the reported server
connection.
3. Evaluate whether an Egress Transaction Failure Rate alarm has been raised for
any of the corresponding client connections. If so, investigate the cause of the
server becoming non-responsive and address the condition.
Note:
Depending on the operator's choice, the client connection may need
to be Admin Disabled until the evaluation is complete, which will allow
requests to be routed to other servers, depending on the routing
configuration. If this is not the case, tune the client's retransmit timers
to be greater than the typical turnaround time for the request to be
processed by the server and for the response to be sent back to the
client.
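The timer guidance in the note above (the client's retransmit timer should exceed the typical request turnaround time) can be checked with a simple comparison. The sketch below treats "typical" as the median of observed turnaround samples, which is an assumption; the function name and the sample values are hypothetical.

```python
from statistics import median


def retransmit_timer_is_sufficient(timer_ms, turnaround_samples_ms):
    """True when the client retransmit timer exceeds the median observed turnaround."""
    if not turnaround_samples_ms:
        raise ValueError("at least one turnaround sample is required")
    return timer_ms > median(turnaround_samples_ms)


# Hypothetical request-to-response times observed for one server, in milliseconds.
samples_ms = [120, 150, 130, 400, 140]
print(retransmit_timer_is_sufficient(100, samples_ms))  # False
print(retransmit_timer_is_sufficient(200, samples_ms))  # True
```

A stricter deployment might compare against a high percentile rather than the median, so that occasional slow responses do not trigger retransmissions.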
Description
Failed to access shared secret.
Severity
Info
Instance
<Connection Name>:208
HA Score
Normal
Throttle Seconds
10
OID
eagleXgDiameterMpEvRxException
1. Recovery:
• Check to see if alarm 8207 is present. If so, follow the recovery steps for alarm
8207 - MpRadiusKeyError.
8003 - MpEvTxException
8003 - 001 - MpEvTxException_ConnUnknown
Event Type
DIAM
Description
DraWorker egress message processing exception.
Severity
Info
Instance
<DraWorker Name>:001
HA Score
Normal
Throttle Seconds
10
OID
eagleXgDiameterMpEvTxException
1. Recovery
• No action required.
Description
DraWorker egress message processing exception.
Severity
Info
Instance
<DraWorker Name>:101
HA Score
Normal
Throttle Seconds
10
OID
eagleXgDiameterMpEvTxException
1. Recovery
• This event is potentially caused by one or more peers being routed more traffic
than is nominally expected.
Description
DA-MP egress message processing exception.
Severity
Info
Instance
<DA-MP Name>:201
HA Score
Normal
Throttle Seconds
10
OID
eagleXgDiameterMpEvTxException
1. Recovery:
1. The alarm will clear when the DCL egress task message queue utilization falls
below the clear threshold. The alarm may be caused by one or more peers being
routed more traffic than is nominally expected.
2. If the problem persists, it is recommended to contact My Oracle Support.
Description
DraWorker egress message processing exception.
Severity
Info
Instance
<DraWorker Name>:202
HA Score
Normal
Throttle Seconds
10
OID
eagleXgDiameterMpEvTxException
1. Recovery:
1. Adjust the Diameter configuration set(s) to reduce the lifetime of pending
transactions, if needed.
2. If one or more MPs in a server site have failed, the traffic will be distributed
between the remaining MPs in the server site.
3. Misconfiguration of Diameter peers may result in too much traffic being
distributed to the MP. Each MP in the server site should be receiving
approximately the same number of ingress transactions per second.
4. There may be an insufficient number of MPs configured to handle the network
traffic load. If all MPs are in a congestion state then the offered load to the server
site is exceeding its capacity.
5. A software defect may exist resulting in PTR buffers not being deallocated to the
pool. This alarm should not normally occur when no other congestion alarms are
asserted.
6. If the problem persists, it is recommended to contact My Oracle Support.
Description
DA-MP egress message processing exception.
Severity
Info
Instance
<DA-MP Name>:203
HA Score
Normal
Throttle Seconds
10
OID
eagleXgDiameterMpEvTxException
1. Recovery:
1. If one or more MPs in a server site have failed, the traffic will be distributed
between the remaining MPs in the server site. MP server status can be monitored
from the Status & Manage, and then Server page.
2. Misconfiguration of Diameter peers may result in too much traffic being
distributed to the MP. The ingress traffic rate of each MP can be monitored from
the Status & Manage, and then KPIs page. Each MP in the server site should be
receiving approximately the same number of ingress transactions per second.
3. There may be an insufficient number of MPs configured to handle the network
traffic load. The ingress traffic rate of each MP can be monitored from the Status
& Manage, and then KPIs page. If all MPs are in a congestion state then the
offered load to the server site is exceeding its capacity.
4. A software defect may exist resulting in PDU buffers not being deallocated to the
pool. This alarm should not normally occur when no other congestion alarms are
asserted. The alarm log should be examined using the Alarms & Events page.
5. This event is unexpected. It is recommended to contact My Oracle Support for
assistance.
Description
DA-MP egress message processing exception.
Severity
Info
Instance
<DA-MP Name>:204
HA Score
Normal
Throttle Seconds
10
OID
eagleXgDiameterMpEvTxException
1. Recovery:
1. The peer is being routed more traffic than is nominally expected, or is responding
slowly. If the problem persists, the client port range configured in the Local Node
corresponding to the indicated transport connection may need to be increased.
2. Access the connection information via Diameter, and then Configuration, and
then Connections screen, which indicates the associated Local Node.
3. Access the Local Node screen via Diameter, and then Configuration, and then
Local Nodes.
4. Update the client port range by modifying the RADIUS Client UDP Port Range
Start and the RADIUS Client UDP Port Range End values in the Local Node edit
screen, if necessary.
Note:
To update the Local Node configuration, Admin Disable all associated
connections.
Description
Failed to access shared secret.
Severity
Info
Instance
<DA-MP Name>:205
HA Score
Normal
Throttle Seconds
10
OID
eagleXgDiameterMpEvTxException
1. Recovery:
1. Proceed to step 2 if alarm 8207 - MpRadiusKeyError is present.
2. Synchronize the RADIUS key file.
3. Restart the DSR process. If the required keys are now available, the alarm will not
be raised.
4. If the problem persists, it is recommended to contact My Oracle Support.
8004 - EvFsmAdState
8004 - 001 - EvFsmAdState_StateChange
Event Type
DIAM
Description
Connection FSM administrative state change.
Severity
Info
Instance
<Connection Name>:001
HA Score
Normal
Throttle Seconds
None
OID
eagleXgDiameterEvFsmAdState
1. Recovery
• No action required.
8005 - EvFsmOpState
8005 - 001 - EvFsmOpState_StateChange
Event Type
DIAM
Description
Connection FSM operational state change.
Severity
Info
Instance
<Connection Name>:001
HA Score
Normal
Throttle Seconds
None
OID
eagleXgDiameterFsmOpState
1. Recovery
1. No action required when operationally available.
2. Potential causes for this event when operationally unavailable are:
• Connection is administratively disabled.
• Diameter initiator connection is connecting.
• Diameter initiator connection is suppressed (peer is operationally available).
• Diameter initiator connection is suppressed (peer did not signal reboot during
graceful disconnect).
• Diameter responder connection is listening.
• RADIUS server connection is opening.
3. Potential causes for this event when operationally degraded are:
• Connection egress message rate threshold crossed.
• Diameter connection is in watchdog proving.
• Diameter connection is in graceful disconnect.
• Diameter peer signaled remote busy.
• Diameter connection is in transport congestion.
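For triage scripts, the potential causes listed above can be kept in a small lookup table keyed by operational state. The cause strings below are copied from the lists above; the state labels `Available`, `Unavailable`, and `Degraded` and the function name are assumed for illustration.

```python
# Potential causes copied from the recovery lists for this event.
OP_STATE_CAUSES = {
    "Unavailable": [
        "Connection is administratively disabled.",
        "Diameter initiator connection is connecting.",
        "Diameter initiator connection is suppressed (peer is operationally available).",
        "Diameter initiator connection is suppressed "
        "(peer did not signal reboot during graceful disconnect).",
        "Diameter responder connection is listening.",
        "RADIUS server connection is opening.",
    ],
    "Degraded": [
        "Connection egress message rate threshold crossed.",
        "Diameter connection is in watchdog proving.",
        "Diameter connection is in graceful disconnect.",
        "Diameter peer signaled remote busy.",
        "Diameter connection is in transport congestion.",
    ],
}


def potential_causes(op_state: str):
    """Return the listed potential causes; empty when no action is required."""
    return OP_STATE_CAUSES.get(op_state, [])


for cause in potential_causes("Degraded"):
    print("-", cause)
```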
8006 - EvFsmException
8006 - 001 - EvFsmException_DnsFailure
Event Type
DIAM
Description
Connection FSM exception.
Severity
Info
Instance
<Connection Name>:001
HA Score
Normal
Throttle Seconds
10
OID
eagleXgDiameterEvFsmException
1. Recovery
Description
Connection FSM exception.
Severity
Info
Instance
<Connection Name>:002
HA Score
Normal
Throttle Seconds
10
OID
eagleXgDiameterEvFsmException
1. Recovery
• No action required.
Description
Connection FSM exception.
Severity
Info
Instance
<Connection Name>:101
HA Score
Normal
Throttle Seconds
10
OID
eagleXgDiameterEvFsmException
1. Recovery
1. This event is potentially caused by the Peer CNDRA process reaching its
descriptor capacity.
2. This event is unexpected. It is recommended to contact My Oracle Support for
assistance.
Description
Connection FSM exception.
Severity
Info
Instance
<Connection Name>:102
HA Score
Normal
Throttle Seconds
10
OID
eagleXgDiameterEvFsmException
1. Recovery
1. Potential causes for this event are:
• Network interface(s) are down.
• Port is already in use by another process.
• Configuration is invalid.
2. This event is unexpected. It is recommended to contact My Oracle Support for
assistance.
Description
Connection FSM exception.
Severity
Info
Instance
<Connection Name>:103
HA Score
Normal
Throttle Seconds
10
OID
eagleXgDiameterEvFsmException
1. Recovery
1. Potential causes for this event are:
• Peer CNDRA process is not running with root permission.
• Configuration is invalid.
2. This event is unexpected. It is recommended to contact My Oracle Support for
assistance.
Description
Connection FSM exception.
Severity
Info
Instance
<Connection Name>:104
HA Score
Normal
Throttle Seconds
10
OID
eagleXgDiameterEvFsmException
1. Recovery
Description
Connection FSM exception.
Severity
Info
Instance
<Connection Name>:105
HA Score
Normal
Throttle Seconds
10
OID
eagleXgDiameterEvFsmException
1. Recovery
Description
Connection FSM exception.
Severity
Info
Instance
<Connection Name>:106
HA Score
Normal
Throttle Seconds
10
OID
eagleXgDiameterEvFsmException
1. Recovery
Description
Connection FSM exception.
Severity
Info
Instance
<Connection Name>:107
HA Score
Normal
Throttle Seconds
10
OID
eagleXgDiameterEvFsmException
1. Recovery
Description
Connection FSM exception.
Severity
Info
Instance
<Connection Name>:108
HA Score
Normal
Throttle Seconds
10
OID
eagleXgDiameterEvFsmException
1. Recovery
• No action required.
Description
Connection FSM exception.
Severity
Info
Instance
<Connection Name>:109
HA Score
Normal
Throttle Seconds
10
OID
eagleXgDiameterEvFsmException
1. Recovery
• No action required.
Description
Connection FSM exception.
Severity
Info
Instance
<Connection Name>:110
HA Score
Normal
Throttle Seconds
10
OID
eagleXgDiameterEvFsmException
1. Recovery
• No action required.
Description
Connection FSM exception.
Severity
Info
Instance
<Connection Name>:111
HA Score
Normal
Throttle Seconds
10
OID
eagleXgDiameterEvFsmException
1. Recovery
Description
Connection FSM exception.
Severity
Info
Instance
<Connection Name>:112
HA Score
Normal
Throttle Seconds
10
OID
eagleXgDiameterEvFsmException
1. Recovery
Description
Connection FSM exception.
Severity
Info
Instance
<Connection Name>:113
HA Score
Normal
Throttle Seconds
10
OID
eagleXgDiameterEvFsmException
1. Recovery
8007 - EvException
8007 - 101 - EvException_MsgPriorityFailure
Event Type
DIAM
Description
Connection exception.
Severity
Info
Instance
<Connection Name>:101
HA Score
Normal
Throttle Seconds
10
OID
eagleXgDiameterEvException
1. Recovery
8008 - EvRxException
8008 - 001 - EvRxException_MaxMpsExceeded
Event Type
DIAM
Description
Connection ingress message processing exception.
Severity
Info
Instance
<Connection Name>:001
HA Score
Normal
Throttle Seconds
10
OID
eagleXgDiameterEvRxException
1. Recovery
• This event is potentially caused when a peer is generating more traffic than is
nominally expected.
Description
Connection ingress message processing exception.
Severity
Info
Instance
<Connection Name>:101
HA Score
Normal
Throttle Seconds
10
OID
eagleXgDiameterEvRxException
1. Recovery
Description
Connection ingress message processing exception.
Severity
Info
Instance
<Connection Name>:102
HA Score
Normal
Throttle Seconds
10
OID
eagleXgDiameterEvRxException
1. Recovery
Description
Connection ingress message processing exception.
Severity
Info
Instance
<Connection Name>:201
HA Score
Normal
Throttle Seconds
10
OID
eagleXgDiameterEvRxException
1. Recovery:
Description
Connection ingress message processing exception.
Severity
Info
Instance
<Connection Name>:202
HA Score
Normal
Throttle Seconds
10
OID
eagleXgDiameterEvRxException
1. Recovery:
Description
Connection ingress message processing exception.
Severity
Info
Instance
<Connection Name>:203
HA Score
Normal
Throttle Seconds
10
OID
eagleXgDiameterEvRxException
1. Recovery:
Description
Connection ingress message processing exception.
Severity
Info
Instance
<Connection Name>:204
HA Score
Normal
Throttle Seconds
10
OID
eagleXgDiameterEvRxException
1. Recovery:
• The peer is responding slowly, network latency is high, or the ETR timer is
configured too small. Adjust the Diameter configuration set(s) to reduce the
lifetime of pending transactions, if needed.
Description
Connection ingress message processing exception.
Severity
Info
Instance
<Connection Name>:205
HA Score
Normal
Throttle Seconds
10
OID
eagleXgDiameterEvRxException
1. Recovery:
Description
Connection ingress message processing exception.
Severity
Info
Instance
<Connection Name>:206
HA Score
Normal
Throttle Seconds
10
OID
eagleXgDiameterEvRxException
1. Recovery:
Description
Connection ingress message processing exception.
Severity
Info
Instance
<Connection Name>:207
HA Score
Normal
Throttle Seconds
10
OID
eagleXgDiameterEvRxException
1. Recovery:
1. Evaluate the indicated message. If an invalid message authenticator value is
indicated, ensure that the same shared secret is configured for the connection on
the Peer CNDRA and on the RADIUS peer.
2. If an invalid message authenticator value is not indicated, then the peer may have
an implementation defect or may be misconfigured. It is recommended to contact
My Oracle Support for assistance. This event is unexpected.
Description
Connection ingress message processing exception.
Severity
Info
Instance
<Connection Name>:208
HA Score
Normal
Throttle Seconds
10
OID
eagleXgDiameterEvRxException
1. Recovery:
Description
Connection ingress message processing exception.
Severity
Info
Instance
<Connection Name>:209
HA Score
Normal
Throttle Seconds
10
OID
eagleXgDiameterEvRxException
1. Recovery:
Description
Connection ingress message processing exception.
Severity
Info
Instance
<Connection Name>:210
HA Score
Normal
Throttle Seconds
10
OID
eagleXgDiameterEvRxException
1. Recovery:
1. This event is unexpected. It is recommended to contact My Oracle Support
for assistance. The peer may have an implementation defect or may be
misconfigured.
2. Only certain Acct-Status-Type values are supported. Ensure that the
Acct-Status-Type value is one of these values:
• 1 (Start)
• 2 (Stop)
• 3 (Interim-Update)
• 7 (Accounting-On)
• 8 (Accounting-Off)
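The supported values listed above can be checked with a short lookup. This is a minimal sketch rather than product code: the function name and the mapping constant are hypothetical, and only the five value/name pairs come from this guide.

```python
# Supported Acct-Status-Type values, copied from the list above.
SUPPORTED_ACCT_STATUS_TYPES = {
    1: "Start",
    2: "Stop",
    3: "Interim-Update",
    7: "Accounting-On",
    8: "Accounting-Off",
}


def describe_acct_status_type(value: int) -> str:
    """Return the name of a supported value, or raise ValueError otherwise."""
    try:
        return SUPPORTED_ACCT_STATUS_TYPES[value]
    except KeyError:
        raise ValueError(f"unsupported Acct-Status-Type: {value}") from None
```

A value outside the list, such as 4, raises ValueError, which corresponds to the condition this recovery step asks you to rule out.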
Description
Connection ingress message processing exception.
Severity
Info
Instance
<Connection Name>:212
HA Score
Normal
Throttle Seconds
10
OID
eagleXgDiameterEvRxException
1. Recovery:
Description
Connection ingress message processing exception.
Severity
Info
Instance
<Connection Name>:213
HA Score
Normal
Throttle Seconds
10
OID
eagleXgDiameterEvRxException
1. Recovery:
8009 - EvTxException
8009 - 001 - EvTxException_ConnUnavailable
Event Type
DIAM
Description
Connection egress message processing exception.
Severity
Info
Instance
<Connection Name>:001
HA Score
Normal
Throttle Seconds
10
OID
eagleXgDiameterEvTxException
1. Recovery
• No action required.
Description
Connection egress message processing exception.
Severity
Info
Instance
<Connection Name>:101
HA Score
Normal
Throttle Seconds
10
OID
eagleXgDiameterEvTxException
1. Recovery
• This event is potentially caused by a peer being routed more traffic than is
nominally expected.
Description
Connection egress message processing exception.
Severity
Info
Instance
<Connection Name>:102
HA Score
Normal
Throttle Seconds
10
OID
eagleXgDiameterEvTxException
1. Recovery
• This event is potentially caused by a peer being routed more traffic than is
nominally expected.
Description
Connection egress message processing exception.
Severity
Info
Instance
<Connection Name>:201
HA Score
Normal
Throttle Seconds
10
OID
eagleXgDiameterEvTxException
1. Recovery:
Description
Connection egress message processing exception.
Severity
Info
Instance
<Connection Name>:202
HA Score
Normal
Throttle Seconds
10
OID
eagleXgDiameterEvTxException
1. Recovery:
Description
Connection egress message processing exception.
Severity
Info
Instance
<Connection Name>:203
HA Score
Normal
Throttle Seconds
10
OID
eagleXgDiameterEvTxException
1. Recovery:
1. This event is unexpected. It is recommended to contact My Oracle Support for
assistance.
2. This event is typically generated when the Peer CNDRA needs to add a
Message-Authenticator to the message, but doing so causes the message size
to exceed the maximum RADIUS message length. If this problem persists, evaluate
the source of this message and ensure that the message size allows adding a
Message-Authenticator attribute (16 octets). Evaluate the message authenticator
configuration for the egress connection and ensure that the addition of a
Message-Authenticator to specific message types is configured appropriately.
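The size check described in step 2 can be sketched as follows. The constants are assumptions taken from the RADIUS RFCs, not from this guide: a maximum packet length of 4096 octets and a minimum of 20 octets (RFC 2865), and an on-the-wire attribute size of 18 octets, which is the 16-octet value mentioned above plus one type octet and one length octet (RFC 3579). The function name is hypothetical.

```python
RADIUS_MIN_MESSAGE_LENGTH = 20    # assumption: minimum packet length, RFC 2865
RADIUS_MAX_MESSAGE_LENGTH = 4096  # assumption: maximum packet length, RFC 2865
MESSAGE_AUTHENTICATOR_SIZE = 18   # 1 type + 1 length + 16 value octets, RFC 3579


def has_room_for_message_authenticator(current_length: int) -> bool:
    """Return True if the attribute can be appended without exceeding the maximum."""
    if current_length < RADIUS_MIN_MESSAGE_LENGTH:
        raise ValueError("a RADIUS message is at least 20 octets long")
    return current_length + MESSAGE_AUTHENTICATOR_SIZE <= RADIUS_MAX_MESSAGE_LENGTH
```

Under these assumptions, a 4078-octet message is the largest that can still carry the added attribute.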
Description
Connection egress message processing exception.
Severity
Info
Instance
<Connection Name>:204
HA Score
Normal
Throttle Seconds
10
OID
eagleXgDiameterEvTxException
1. Recovery:
1. This event is unexpected. It is recommended to contact My Oracle Support for
assistance. The peer may be misconfigured.
2. Review the configuration of Route Groups and ensure that there are no RADIUS
server instances.
Description
Connection egress message processing exception.
Severity
Info
Instance
<Connection Name>:205
HA Score
Normal
Throttle Seconds
10
OID
eagleXgDiameterEvTxException
1. Recovery:
1. This event is unexpected. It is recommended to contact My Oracle Support for
assistance. The peer may be misconfigured.
2. Review the configuration of Connections and ensure that there are no RADIUS
client instances being used as a RADIUS server by one or more peers.
Description
Connection egress message processing exception.
Severity
Info
Instance
<Connection Name>:206
HA Score
Normal
Throttle Seconds
10
OID
eagleXgDiameterEvTxException
1. Recovery:
1. This event is unexpected. It is recommended to contact My Oracle Support for
assistance. The peer may be misconfigured.
2. Review the configuration of Route Groups and ensure that there are no RADIUS
server instances.
Description
Connection egress message processing exception.
Severity
Info
Instance
<Connection Name>:207
HA Score
Normal
Throttle Seconds
10
OID
eagleXgDiameterEvTxException
1. Recovery:
• No action required.
Description
Connection egress message processing exception.
Severity
Info
Instance
<Connection Name>:208
HA Score
Normal
Throttle Seconds
10
OID
eagleXgDiameterEvTxException
1. Recovery:
1. This event is unexpected. It is recommended to contact My Oracle Support for
assistance. The peer may be misconfigured.
2. Ensure that the RADIUS UDP Transmit Buffer Size is sufficient for the offered
traffic load.
8010 - MpIngressDrop
Alarm Group:
DIAM
Description:
An ingress message is discarded or rejected.
Severity:
Major
Instance:
<DraWorker Name>
HA Score:
Normal
OID:
eagleXgDiameterMpIngressDrop
Cause:
An ingress message is discarded or rejected in the following congestion scenarios:
• Connection maximum message rate exceeded (ingress control).
• DraWorker maximum message rate exceeded (ingress control).
• DraWorker CPU congestion (overload control).
• Diameter message pool congested (routing ingress).
• Signaling event pool congested (routing ingress).
• Destination DraWorker unknown (routing ingress).
• Destination DraWorker congested (routing ingress).
Diagnostic Information:
Collect the following information to diagnose the cause before contacting Oracle
Support:
• Event History on active SO server.
• Savelogs of all MPs.
• Peer CNDRA logs of all MPs.
1. Recovery:
8011 - EcRate
Alarm Group:
DIAM
Description:
Connection egress message rate threshold crossed.
Severity:
Minor, Major, Critical
Instance:
<Connection Name>
HA Score:
Normal
OID:
eagleXgDiameterEmr
Cause:
Connection egress message rate threshold crossed.
Diagnostic Information:
Collect the following information to diagnose the cause before contacting Oracle
Support:
• Event History on active SO server.
1. Recovery:
1. This alarm is potentially caused when a peer has routed more traffic than is
nominally expected.
2. The adjacent Diameter peer may be unable to handle the rate of egress message
traffic currently being offered on a connection.
3. The TCP/SCTP buffers may be filling up on the egress side.
8012 - MpRxNgnPsOfferedRate
Alarm Group:
DIAM
Description:
DraWorker ingress NGN-PS message rate threshold crossed.
Severity:
Major
Instance:
MpRxNgnPsOfferedRate, DIAM
HA Score:
Normal
OID:
eagleXgDiameterMpRxNgnPsOfferedRateNotify
Cause:
DraWorker ingress NGN-PS message rate threshold crossed. The alarm clears when
threshold crossing abates.
Diagnostic Information:
N/A
1. Recovery:
1. Check whether one or more DraWorkers are unavailable and traffic has been
distributed to the remaining DraWorkers.
2. Check whether one or more peers are generating more traffic than is nominally
expected.
3. Check whether an insufficient number of DraWorkers is provisioned.
4. This alarm clears when the threshold crossing abates.
8013 - MpNgnPsStateMismatch
Alarm Group:
DIAM
Description:
DraWorker NGN-PS administrative and operational state mismatch.
Severity:
Major
Instance:
<DraWorker Name>
HA Score:
Normal
OID:
eagleXgDiameterMpNgnPsStateMismatch
Cause:
The alarm is raised when the administrative state of NGN-PS is not aligned with the
operational state. The alarm clears when the administrative and operational states are
aligned.
Diagnostic Information:
Collect the following information to diagnose the cause before contacting Oracle
Support:
• The details of active SO server.
• Event History on active SO server.
1. Recovery:
1. This alarm is potentially caused when a DraWorker restart is required.
The alarm clears when the administrative and operational states are aligned.
2. If the NGN-PS feature is mistakenly activated, disable the feature to clear the
alarm and align the operational state with the administrative state.
3. If the NGN-PS feature is mistakenly de-activated, enable the feature to clear the
alarm and align the operational state with the administrative state.
8014 - MpNgnPsDrop
Alarm Group:
DIAM
Description:
DraWorker NGN-PS message discarded or rejected.
Severity:
Major
Instance:
<DraWorker Name>
HA Score:
Normal
OID:
eagleXgDiameterMpNgnPsDrop
Cause:
Each layer involved in processing an NGN-PS transaction may reject or discard a
request or answer. Such scenarios include:
• Routing or application controls.
• Peer or network congestion.
• Internal processing error.
• Task queue or resource congestion or ComAgent congestion or delivery failure.
• Processing error.
Diagnostic Information:
Collect the following information to diagnose the cause before contacting Oracle
Support:
• Event History on active SO server.
• Savelogs of all MPs.
• DSR logs of all MPs.
1. Recovery
8015 - NgnPsMsgMisrouted
Alarm Group:
DIAM
Description:
NGN-PS message routed to peer CNDRA lacking NGN-PS support.
Severity:
Major
Instance:
<Connection Name>
HA Score:
Normal
OID:
eagleXgDiameterNgnPsMsgMisrouted
Cause:
An NGN-PS message was routed to a peer CNDRA lacking NGN-PS support and will not
be processed as intended.
Diagnostic Information:
Collect the following before contacting Oracle Support:
• Event history on active SO server.
• Software release information of the DraWorkers on the DraWorker server.
1. Recovery
8016 - MpP16StateMismatch
Alarm Group:
DIAM
Description:
MP P16 Support administrative and operational state mismatch.
Severity:
Major
Instance:
<MP Name>
HA Score:
Normal
OID:
eagleXgDiameterMpP16StateMismatch
Cause:
The administrative state of P16 support is not aligned with the operational state.
Diagnostic Information:
Collect the following before contacting Oracle Support:
• Screenshot of active SO server.
• Event History on active SO server.
1. Recovery
1. Potential causes of this alarm are:
• An MP restart is required.
• If the 16 Priority Support is mistakenly activated, disable the feature to clear
the alarm and align the operational state with administrative state.
• If the 16 Priority Support is mistakenly de-activated, enable the feature to clear
the alarm and align the operational state with administrative state.
2. Alarm clears when the administrative and operational states are aligned.
8017 - MpTaskCpuCongested
Alarm Group
DIAM
Description
DraWorker Task CPU utilization threshold crossed
Severity
Minor, Major, Critical
Instance
Task Name
HA Score
Normal
OID
eagleXgDiameterMpTaskCpuCongested
1. Recovery
8018 - P16MsgMisrouted
Alarm Group
DIAM
Description
16 priority message routed to peer CNDRA lacking 16 priority support
Severity
Major
Instance
<Connection Name>
HA Score
Normal
OID
eagleXgDiameterP16MsgMisrouted
1. Recovery
8019 - MpAnswerPriorityModeMismatch
Alarm Group
DIAM
Description
DraWorker Answer Priority Mode administrative and operational state mismatch.
Severity
Major
Instance
<DraWorker Name>
HA Score
Normal
OID
eagleXgDiameterMpAnswerPriorityModeMismatch
1. Recovery
8020 - MpRoutingThreadPoolStateMismatch
Alarm Group
DIAM
Description
Routing Thread Pool administrative and operational state mismatch.
Severity
Minor
Instance
<DraWorker Name>
HA Score
Normal
OID
eagleXgDiameterMpRoutingThreadPoolStateMismatch
1. Recovery
8100 - NormMsgMisrouted
Alarm Group:
DIAG
Description:
Normal message routed onto diagnostic connection.
Severity:
Major
Instance:
<Connection Name>
HA Score:
Normal
OID:
eagleXgDiameterNormMsgMisrouted
1. Recovery:
1. The alarm is potentially caused by a Diameter routing misconfiguration.
2. If the problem persists, it is recommended to contact My Oracle Support.
8101 - DiagMsgMisrouted
Alarm Group:
DIAG
Description:
Diagnostic message routed onto normal connection.
Severity:
Minor
Instance:
<Connection Name>
HA Score:
Normal
OID:
eagleXgDiameterDiagMsgMisrouted
1. Recovery:
1. The alarm is potentially caused by a Diameter routing misconfiguration.
2. If the problem persists, it is recommended to contact My Oracle Support.
8200 - MpRadiusMsgPoolCongested
Alarm Group
DIAM
Description
DA-MP RADIUS message pool utilization threshold crossed.
Severity
Minor, Major, Critical
Instance
MpRadiusMsgPool, DIAM
HA Score
Normal
OID
eagleXgDiameterMpRadiusMsgPoolCongested
1. Recovery:
1. If one or more MPs in a server site have failed, the traffic will be distributed
between the remaining MPs in the server site. MP server status can be monitored
from the Status & Manage, and then Server page.
2. The mis-configuration of Diameter peers may result in too much traffic being
distributed to the MP. The ingress traffic rate of each MP can be monitored from
the Status & Manage, and then KPIs page. Each MP in the server site should be
receiving approximately the same number of ingress transactions per second.
3. There may be an insufficient number of MPs configured to handle the network
traffic load. The ingress traffic rate of each MP can be monitored from the Status
& Manage, and then KPIs page. If all MPs are in a congestion state then the
offered load to the server site is exceeding its capacity.
4. A software defect may exist resulting in PDU buffers not being deallocated to the
pool. This alarm should not normally occur when no other congestion alarms are
asserted. The alarm log should be examined using the Alarms & Events page.
5. If the problem persists, it is recommended to contact My Oracle Support.
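The expectation that each MP in a server site receives approximately the same ingress rate can be checked against values read from the KPIs page. The sketch below is illustrative only: the function name, the input dictionary, and the 20% tolerance are assumptions, and the site median is used as the reference so that a single overloaded MP does not skew the comparison.

```python
from statistics import median


def find_unbalanced_mps(ingress_tps: dict[str, float], tolerance: float = 0.2) -> list[str]:
    """Return the MPs whose ingress rate deviates from the site median by more than tolerance."""
    if not ingress_tps:
        return []
    reference = median(ingress_tps.values())
    if reference == 0:
        return []
    return sorted(
        name
        for name, tps in ingress_tps.items()
        if abs(tps - reference) / reference > tolerance
    )
```

For example, with rates of 1000, 1050, and 2500 transactions per second for three MPs, only the MP at 2500 is reported, which points to a peer distribution problem rather than a site-wide capacity shortfall.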
8201 - RclRxTaskQueueCongested
Alarm Group
DIAM
Description
RCL ingress task message queue utilization threshold crossed.
Severity
Minor, Major, Critical
Instance
RclRxTaskQueue, DIAM
HA Score
Normal
OID
eagleXgDiameterRclRxTaskQueueCongested
1. Recovery:
1. The alarm will clear when the RCL ingress task message queue utilization falls
below the clear threshold. The alarm may be caused by one or more peers being
routed more traffic than is nominally expected.
2. If the problem persists, it is recommended to contact My Oracle Support.
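The clearing behavior in step 1 implies separate onset and clear thresholds. The sketch below shows that pattern for a single severity level. The class name and the threshold values in the usage note are hypothetical and are not the thresholds configured in the product.

```python
class ThresholdAlarm:
    """Alarm with separate onset and clear thresholds (hysteresis)."""

    def __init__(self, onset_pct: float, clear_pct: float) -> None:
        if clear_pct >= onset_pct:
            raise ValueError("clear threshold must be below onset threshold")
        self.onset_pct = onset_pct
        self.clear_pct = clear_pct
        self.active = False

    def update(self, utilization_pct: float) -> bool:
        """Feed one utilization sample and return the resulting alarm state."""
        if not self.active and utilization_pct >= self.onset_pct:
            self.active = True
        elif self.active and utilization_pct < self.clear_pct:
            self.active = False
        return self.active
```

With an assumed onset of 60% and clear threshold of 50%, samples of 40, 65, 55, and 49 leave the alarm clear, raise it, keep it raised, and then clear it. The gap between the two thresholds prevents the alarm from toggling when utilization hovers near a single value.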
8202 - RclItrPoolCongested
Alarm Group
DIAM
Description
RCL ITR pool utilization threshold crossed.
Severity
Minor, Major, Critical
Instance
RclItrPool, DIAM
HA Score
Normal
OID
eagleXgDiameterRclItrPoolCongested
1. Recovery:
1. Adjust the RADIUS Cached Response Duration option of the associated
Connection configuration set(s) to reduce the lifetime of cached transactions, if
needed.
2. If one or more MPs in a server site have failed, the traffic will be distributed
between the remaining MPs in the server site. MP server status can be monitored
from the Status & Manage, and then Server page.
3. The mis-configuration of Diameter peers may result in too much traffic being
distributed to the MP. The ingress traffic rate of each MP can be monitored from
the Status & Manage, and then KPIs page. Each MP in the server site should be
receiving approximately the same number of ingress transactions per second.
4. There may be an insufficient number of MPs configured to handle the network
traffic load. The ingress traffic rate of each MP can be monitored from the Status
& Manage, and then KPIs page. If all MPs are in a congestion state then the
offered load to the server site is exceeding its capacity.
5. A software defect may exist resulting in PTR buffers not being deallocated to the
pool. This alarm should not normally occur when no other congestion alarms are
asserted. The alarm log should be examined from the Alarms & Events page.
6. If the problem persists, it is recommended to contact My Oracle Support.
8203 - RclTxTaskQueueCongested
Alarm Group
DIAM
Description
RCL egress task threshold crossed.
Severity
Minor, Major, Critical
Instance
RclTxTaskQueue, DIAM
HA Score
Normal
OID
eagleXgDiameterRclTxTaskQueueCongested
1. Recovery:
1. The alarm will clear when the RCL egress task message queue utilization falls
below the clear threshold. The alarm may be caused by one or more peers being
routed more traffic than is nominally expected.
2. If the problem persists, it is recommended to contact My Oracle Support.
8204 - RclEtrPoolCongested
Alarm Group
DIAM
Description
RCL ETR pool utilization threshold crossed.
Severity
Minor, Major, Critical
Instance
RclEtrPool, DIAM
HA Score
Normal
OID
eagleXgDiameterRclEtrPoolCongested
1. Recovery:
1. Adjust the RADIUS Cached Response Duration option of the associated
Connection configuration set(s) to reduce the lifetime of cached transactions, if
needed.
2. If one or more MPs in a server site have failed, the traffic will be distributed
between the remaining MPs in the server site. MP server status can be monitored
from the Status & Manage, and then Server page.
3. The mis-configuration of Diameter peers may result in too much traffic being
distributed to the MP. The ingress traffic rate of each MP can be monitored from
the Status & Manage, and then KPIs page. Each MP in the server site should be
receiving approximately the same number of ingress transactions per second.
4. There may be an insufficient number of MPs configured to handle the network
traffic load. The ingress traffic rate of each MP can be monitored from the Status
& Manage, and then KPIs page. If all MPs are in a congestion state then the
offered load to the server site is exceeding its capacity.
5. A software defect may exist resulting in PTR buffers not being deallocated to the
pool. This alarm should not normally occur when no other congestion alarms are
asserted. The alarm log should be examined from the Alarms & Events page.
6. If the problem persists, it is recommended to contact My Oracle Support.
8205 - RadiusXactionFail
Alarm Group
DIAM
Description
RADIUS connection transaction failure threshold crossed. The presence of this alarm
indicates that the server is not responding to requests in a timely manner. A response
that is not received in a timely manner constitutes a transaction failure.
Severity
Minor, Major
Instance
<Connection Name>
HA Score
Normal
OID
eagleXgDiameterRadiusXactionFail
1. Recovery:
1. Check whether there is an IP network problem, RADIUS server congestion
resulting in large response times, or whether a RADIUS server failure has
occurred.
2. The user may choose to Admin Disable the corresponding transport connection
which will prevent the DSR from selecting that connection for message routing,
until the cause of the alarm is determined.
8206 - MpRxRadiusAllLen
Alarm Group
DIAM
Description
RADIUS average ingress message length threshold crossed.
Severity
Minor, Major
Instance
MpRxRadiusAllLen, DIAM
HA Score
Normal
OID
eagleXgDiameterMpRxRadiusAllLen
1. Recovery:
1. Investigate traffic sources. One or more peers is sending larger messages than is
nominally expected.
2. Adjust the message length thresholds if necessary.
8207 - MpRadiusKeyError
Alarm Group
DIAM
Description
DA-MP RADIUS key error. This alarm is unexpected during normal processing. The
presence of this alarm indicates DSR encountered an error while accessing RADIUS
encryption keys used to decrypt RADIUS shared secrets.
Severity
Critical
Instance
<DA-MP Name>
HA Score
Normal
OID
eagleXgDiameterMpRadiusKeyError
1. Recovery:
1. Synchronize the RADIUS key file.
2. Restart the DSR process. If the required keys are now available, the alarm is not
raised.
3. If the problem persists, it is recommended to contact My Oracle Support.
Description:
A message received from a peer was rejected because of a decoding failure.
Decoding failures can include missing mandatory parameters.
Severity:
Info
Instance:
<TransConnName>
HA Score:
Normal
Throttle Seconds:
10
OID:
eagleXgDiameterIngressMsgRejectedDecodingFailureNotify
1. Recovery:
• During Diameter Request decoding, the message content was inconsistent with
the "Message Length" in the message header. This protocol violation can be
caused by the originator of the message (identified by the Origin-Host AVP in the
message) or the peer who forwarded the message to this node.
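The length inconsistency described above can be reproduced on a captured message. The sketch below assumes the Diameter header layout from RFC 6733 (a 20-octet header in which octets 1 to 3 carry the Message Length, and the length is always a multiple of 4). The function name is hypothetical, and the check covers only the length field, not the AVP decoding performed by the product.

```python
DIAMETER_HEADER_LENGTH = 20  # octets, per RFC 6733


def message_length_is_consistent(raw: bytes) -> bool:
    """Compare the header's 3-octet Message Length with the bytes actually received."""
    if len(raw) < DIAMETER_HEADER_LENGTH:
        return False
    # Octet 0 is the version; octets 1-3 hold the Message Length in network byte order.
    declared = int.from_bytes(raw[1:4], "big")
    return declared == len(raw) and declared % 4 == 0
```

A capture for which this returns False is a candidate for the protocol violation that this event reports.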
Description:
A peer routing table search with a received Request message found more than
one highest priority Peer Routing Rule match. The system selected the first rule
found but it is not guaranteed that the same rule will be selected in the future. It is
recommended that Peer Routing Rules be unique for the same type of messages to
avoid non-deterministic routing results.
Severity:
Info
Instance:
<MPName>
HA Score:
Normal
Throttle Seconds:
10
OID:
eagleXgDiameterPeerRoutingTableRulesSamePriorityNotify
1. Recovery:
Description:
While attempting to route a request message to a peer, a peer's transport connection
was bypassed because the peer did not support the Application ID for that transport
connection.
Severity:
Info
Instance:
<MPName>
HA Score:
Normal
Throttle Seconds:
10
OID:
eagleXgDiameterApplicationIdMismatchWithPeerNotify
1. Recovery:
1. The system's peer routing table may be using a Route List containing a peer which
does not support the Application ID or the list of Application IDs supported by the
peer on each connection may not be the same. View the list of Application IDs that
the peer supports on each connection and if the Application IDs are not the same
for each connection (but should be), the Application ID for any connection can be
refreshed by disabling or enabling the connection.
2. The Diameter Node which originated the message (identified by the Origin-Host
AVP) could be configured incorrectly and the application is trying to address a
node which doesn't support the Application ID. This cannot be fixed using this
application.
3. If the problem persists, contact My Oracle Support.
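The comparison in step 1 can be sketched as a set difference over the Application IDs advertised on each connection to the same peer. The function name, connection names, and input shape are assumptions, and the lists themselves would be read from the connection status in the GUI.

```python
def missing_app_ids(app_ids_by_connection: dict[str, set[int]]) -> dict[str, set[int]]:
    """Map each connection to the Application IDs it lacks relative to the other connections."""
    all_ids = set().union(*app_ids_by_connection.values())
    missing = {}
    for name, ids in app_ids_by_connection.items():
        gap = all_ids - ids
        if gap:
            missing[name] = gap
    return missing
```

An empty result means every connection to the peer advertises the same list. A non-empty result names the connections that are candidates for the disable and enable refresh described in step 1.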
Description:
Routing attempted to select an egress transport connection to forward a message but
the maximum number of allowed pending transactions queued on the connection has
been reached.
Severity:
Info
Instance:
<TransConnName>
HA Score:
Normal
Throttle Seconds:
10
OID:
eagleXgDiameterMaxPendingTxnsPerConnExceededNotify
1. Recovery:
Description:
A message not addressed to a peer (either Destination-Host AVP was absent or
Destination-Host AVP was present but was not a peer's FQDN) could not be routed
because no Peer Routing Rules matched the message.
Severity:
Info
Instance:
<MPName>
HA Score:
Normal
Throttle Seconds:
10
OID:
eagleXgDiameterNoPrtRuleNotify
Cause:
Ingress-request message from a downstream peer is rejected by a Local Node when
no peer-routing rules are found in the Peer Routing Table (PRT) and one of the
following is true:
• The ingress-request message did not contain a Destination-Host AVP or
• The ingress-request message contained a Destination-Host AVP but did not
match with any configured peer node's FQDN or
• Destination-Realm AVP value and the Application-ID in the request message
header did not match with configured Realm/Application-Id in Realm Route Table
The Realm Route Table (table RealmRoute) managed object is used to perform
message routing based upon the Destination-Realm and Application-ID in a request
message. The Realm Route Table is dynamically configured on the active Overseer.
Diagnostic Information:
Analyze the event history and event #22005, which contains the following information
about the failed Diameter message:
• <TransConnName> (Receiving connection)
• <PeerName> (Name of the receiving peer)
• <DestRealm> (Value found in Request message Destination-Realm AVP)
• <ApplicationID> (Application ID in the Request message)
• <DestHostFQDN> (FQDN found in request message Destination-Host AVP, if
present)
• <OriginHostFQDN> (FQDN found in request message Origin-Host AVP)
The Diameter Ingress Transaction Exception group measurement report contains the
RxNoRulesFailure (10034) measurement, which is also pegged in the same scenario.
1. Recovery:
1. Either the message was incorrectly routed to this node or additional Peer Routing
Rules need to be added. View and update the Peer Routing Rules.
2. If multiple peer routing tables are used, ensure the correct table is applied for the
message in question.
3. If the problem persists, it is recommended to contact My Oracle Support.
Description:
The list of Application IDs supported by a peer during the Diameter Capabilities
Exchange procedure on a particular transport connection is not identical to the
list of Application IDs received from the peer over a different available transport
connection to that peer.
Severity:
Info
Instance:
<PeerName>
HA Score:
Normal
Throttle Seconds:
10
OID:
eagleXgDiameterSupportedAppIdsInconsistentNotify
1. Recovery:
Description:
An answer response was received for which no pending request transaction existed,
resulting in the answer message being discarded. When a Request message is
forwarded the system saves a pending transaction, which contains the routing
information for the answer response. The pending transaction is abandoned if an
answer response is not received in a timely fashion.
Severity:
Info
Instance:
<TransConnName>
HA Score:
Normal
Throttle Seconds:
10
OID:
eagleXgDiameterOrphanAnswerResponseReceivedNotify
Cause:
An answer message is received without any corresponding pending transaction. The
message is discarded.
Diagnostic Information:
Reasons the pending transaction is not available include:
• Peer CNDRA's Tx sender buffer is filling up causing connection congestion.
• PAT expiry or total transaction life-time expiry is causing transaction timeout.
The associated measurement tag for this event is RxAnswerUnexpected (10008),
which is the number of times that the DRL receives an answer message event from
DCL/RCL with a valid Connection ID for which a pending transaction cannot be found.
1. Recovery:
• If this event is occurring frequently, the transaction timers may be set too low.
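The relationship between pending transactions, timers, and orphan answers can be sketched as follows. This is an illustrative model only: the class name, the single `timeout_s` timer, and the choice of (connection, Hop-by-Hop Identifier) as the lookup key are assumptions, not the DSR data structures.

```python
from typing import Dict, Tuple


class PendingTransactions:
    """Toy pending-transaction table keyed by (connection, hop-by-hop id)."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self._pending: Dict[Tuple[str, int], float] = {}

    def request_forwarded(self, conn: str, hop_by_hop: int, now: float) -> None:
        # Remember when the Request was forwarded.
        self._pending[(conn, hop_by_hop)] = now

    def expire(self, now: float) -> None:
        # Abandon transactions whose answer did not arrive in time.
        expired = [
            key for key, sent in self._pending.items()
            if now - sent >= self.timeout_s
        ]
        for key in expired:
            del self._pending[key]

    def answer_received(self, conn: str, hop_by_hop: int, now: float) -> bool:
        """Return True if a pending transaction matches, False for an orphan."""
        self.expire(now)
        return self._pending.pop((conn, hop_by_hop), None) is not None
```

In this model, a timer that is too short makes `answer_received` return False for answers that arrive slightly late, which corresponds to the frequent-event case described in the recovery step.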
Description:
An application routing table search with a received Request message found more
than one highest priority application routing rule match. At least two application
routing rules with the same priority matched an ingress Request message. The
system selected the first application routing rule found.
Severity:
Info
Instance:
<MPName>
HA Score:
Normal
Throttle Seconds:
10
OID:
eagleXgDiameterApplicationRoutingTableRulesSamePriorityNotify
1. Recovery:
1. It is recommended that application routing rules be unique for the same type of
messages to avoid unexpected routing results.
2. If the problem persists, it is recommended to contact My Oracle Support.
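The tie condition reported by this event can be sketched as below. The `Rule` type and the convention that a lower number means a higher priority are assumptions for illustration; the sketch only shows how a first-match selection can hide an ambiguous rule set.

```python
from typing import List, NamedTuple, Optional, Tuple


class Rule(NamedTuple):
    name: str
    priority: int  # assumption: lower number means higher priority


def select_rule(matches: List[Rule]) -> Tuple[Optional[Rule], bool]:
    """Return (selected rule, tie flag) for the rules that matched a Request."""
    if not matches:
        return None, False
    best = min(rule.priority for rule in matches)
    top = [rule for rule in matches if rule.priority == best]
    # The first rule found wins; the tie flag is the condition the event reports.
    return top[0], len(top) > 1
```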
Description:
The DAS Route List specified by the message copy trigger point is not provisioned.
Severity:
Info
Instance:
<RouteListId>
HA Score:
Normal
Throttle Seconds:
10
Note:
Because many route lists can be created on a DraWorker server, care must
be taken to prevent excessive event generation with these resources.
OID:
eagleXgDiameterSpecifiedDasRouteListNotProvisionedNotify
1. Recovery:
1. The provisioning is incorrect or misconfigured. Verify the provisioning and correct
it as needed.
2. If this problem persists, it is recommended to contact My Oracle Support for
assistance.
Description:
The Message Copy Config Set specified by the trigger point is not provisioned.
Severity:
Info
Instance:
<MCCS>
HA Score:
Normal
Throttle Seconds:
10
OID:
eagleXgDiameterSpecifiedMCCSNotProvisionedNotify
1. Recovery:
1. Verify the configured value of MCCS with the trigger point.
2. Verify the Message Copy CfgSet (MCCS) provisioning is properly configured.
3. If the problem persists, it is recommended to contact My Oracle Support.
Description:
The configured number of Message Copy retransmits has been exceeded for the
DAS Peer.
Severity:
Info
Instance:
<MCCS>
HA Score:
Normal
Throttle Seconds:
10
Note:
Because many route lists can be created on a DraWorker server, care must
be taken to prevent excessive event generation with these resources.
OID:
eagleXgDiameterNumberOfRetransmitsExceededToDasNotify
1. Recovery:
1. Verify the configured value of 'Max Retransmission Attempts'.
2. Verify that the locally provisioned connections to the intended DAS peer server(s)
are in service and that no network issues exist in the path(s) to them.
3. Verify the DAS peer provisioning to ensure proper configuration.
4. If the problem persists, it is recommended to contact My Oracle Support for
assistance.
Description:
No valid DAS Route List was specified in the Message Copy Config Set.
Severity:
Info
Instance:
<RouteListId>
HA Score:
Normal
Throttle Seconds:
10
OID:
eagleXgDiameterNoDasRouteListSpecifiedNotify
1. Recovery:
Description:
This alarm occurs when there are a critical number of peer node alarms for a single
network element and it exceeds the configurable alarm threshold.
Note:
The alarm thresholds are configurable using the Alarm Threshold Options
tab on Diameter, and then Configuration, and then System Options.
When this alarm is generated, the system clears all individual peer node alarms
(alarm 22051) for the peer node.
Severity:
Critical
Instance:
<NetworkElement>
HA Score:
Normal
OID:
eagleXgDiameterPeerNodeUnavailableThresholdReachedNotify
Cause:
The number of critical peer node alarms for a single network element exceeds the
configurable alarm threshold.
Diagnostic Information:
Refer to Alarm 22051 - Peer Unavailable. When this alarm is reported, the system
clears all the individual peer node alarms (alarm 22051) for the peer node.
1. Recovery:
1. Check the peer status.
2. Verify IP network connectivity exists between the MP server and the peer node.
3. Check the event history logs for additional DIAM events or alarms from this MP
server.
4. Verify the peer is not under maintenance.
5. It is recommended to contact My Oracle Support for assistance.
Description:
This alarm occurs when there are a ‘Critical’ number of Route List alarms for the
Network Element.
Severity:
Critical
Instance:
<NetworkElement>
HA Score:
Normal
OID:
eagleXgDiameterRouteListUnavailableThresholdReachedNotify
Cause:
Alarm 22017 is raised when the total number of Route List alarms for a single NE
has reached the configured Route List Failure Critical Aggregation Alarm Threshold.
The alarm is cleared when the total number of Route List alarms for a single
NE has dropped to at least 20% below the configured Route List Failure Critical
Aggregation Alarm Threshold.
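The raise and clear conditions above form a simple hysteresis. A minimal sketch, assuming that "at least 20% below" means the count is no more than 80% of the threshold; the function name and parameters are hypothetical:

```python
def update_aggregation_alarm(active: bool, alarm_count: int, threshold: int) -> bool:
    """Return the new alarm state given the current Route List alarm count."""
    if not active:
        # Raise once the count reaches the configured threshold.
        return alarm_count >= threshold
    # Clear only after the count is at least 20% below the threshold.
    # Integer form of alarm_count <= 0.8 * threshold, to avoid float rounding.
    return not (alarm_count * 5 <= threshold * 4)
```

With a threshold of 10, the alarm is raised at 10 Route List alarms and stays raised at 9; it clears only when the count falls to 8 or fewer.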
Diagnostic Information:
For further information on this alarm:
1. Examine the alarm log on Active Overseer Server.
2. Find all the route lists with a problem for the specific MP.
3. A Route List's operational status is always set to the operational status of the
Route Group within the Route List that is designated as the Active Route Group.
4. If all Route Groups within the route list are Unavailable, then the Route List is
Unavailable and there is no Active Route Group.
1. Recovery:
1. View the Route List to monitor Route List status.
2. Verify that IP network connectivity exists between the MP server and the peers.
3. Check the event history logs for additional DIAM events or alarms from this MP
server.
4. Verify that the peers in the Route List are not under maintenance.
5. It is recommended to contact My Oracle Support for assistance.
Description:
This alarm occurs when a DraWorker has received a notification from HA that the
Maintenance Leader resource should transition to the Active role.
Severity:
Info
Instance:
<MP Node ID>
HA Score:
Normal
Throttle Seconds:
1
OID:
eagleXgDiameterDaMpLeaderGoActiveNotificationNotify
1. Recovery:
• No action necessary.
Description:
This alarm occurs when a DraWorker has received a notification from HA that the
Maintenance Leader resource should transition to the OOS role.
Instance:
<MP Node ID>
Severity:
Info
HA Score:
Normal
Throttle Seconds:
1
OID:
eagleXgDiameterDaMpLeaderGoOOSNotificationNotify
1. Recovery:
• No action necessary.
22020 - Copy Message size exceeded the system configured size limit
Event Type:
DIAM
Description:
The generated Copy message size exceeded the max message size on the system.
Severity:
Info
Instance:
<DraWorker>
HA Score:
Normal
Throttle Seconds:
10
Note:
Because many copy messages can exceed the system configured size,
care must be taken to prevent excessive event generation with these resources.
OID:
eagleXgDiameterCopyMessageSizeExceededNotify
1. Recovery:
1. Verify the size of the Request and Answer messages and check whether it exceeds
the system-configured message size.
2. Review the provisioning, correct it if necessary, and check whether Answers also
need to be copied.
Requests and Answers may be copied to DAS.
3. If this problem persists, it is recommended to contact My Oracle Support for
assistance.
Description:
Debug Routing Info AVP is enabled.
Severity:
Minor
Instance:
None
HA Score:
Normal
OID:
eagleXgDiameterDebugRoutingInfoAvpEnabledNotify
1. Recovery:
1. Change the IncludeRoutingInfoAvp parameter to no in the DpiOption table
on the NO for a 2-tier system or on the SO for a 3-tier system.
2. If the problem persists, it is recommended to contact My Oracle Support.
Description:
Ingress Request message received was previously processed by the local node as
determined from the Route-Record AVPs received in the message.
Severity:
Major
Instance:
<Peer Name>
HA Score:
Normal
OID:
eagleXgDiameterForwardingLoopDetectedNotify
1. Recovery:
1. An ingress request message was rejected because message looping was
detected. In general, the forwarding node should not send a message to a peer
that has already processed the message (it should examine the Route-Record
AVPs before message forwarding). If this type of error is occurring frequently, then
the forwarding node is most likely mis-routing the message. This should not be
related to a configuration error because the identity of the local node is sent to the
peer during the Diameter Capabilities Exchange procedure when the Connection
comes into service.
2. If Path Topology Hiding is activated and Protected Network Node's Route-Records
are obscured with PseudoNodeFQDN, then inter-network ingress message loop
detection could reject the message if the same Request message is routed back to
DEA. If this type of error is occurring then the forwarding node is most likely
mis-routing the message back to DEA.
3. If the problem persists, it is recommended to contact My Oracle Support.
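The loop check based on Route-Record AVPs can be sketched as a membership test. This is an illustration under assumptions: the function name is hypothetical, and identities are compared case-insensitively because they are FQDNs.

```python
from typing import Iterable


def is_forwarding_loop(route_records: Iterable[str], local_identity: str) -> bool:
    """Return True if the local node already appears in the Route-Record AVPs."""
    # Identities are FQDNs, so this sketch compares them case-insensitively.
    local = local_identity.lower()
    return any(record.lower() == local for record in route_records)
```

A forwarding node could apply the same test against the identity of the next hop before sending, which is the behavior the first recovery step expects from it.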
Description:
Unable to access the Diameter Peer because all of the transport connections are
down. Peer node unavailability can happen in these cases:
• All connections toward a peer are no longer candidates for routing Request
messages.
• No available connections within the peer node support the Application ID. This is
functionally equivalent to the peer node being unavailable.
• The Connection Priority Level (CPL) value for a resource is changed to 99, which
means the operational status is Unavailable. The CPL value of a connection can
be found in the active SO.
• The number of established connections drops below the configured Minimum
Connection Capacity.
Severity:
Critical
Instance:
<PeerName> (of the Peer which failed).
HA Score:
Normal
OID:
eagleXgDiameterPeerUnavailableNotify
Cause:
Alarm 22051 is raised when the Diameter Peer is not accessible because all of its
transport connections are down.
Diagnostic Information:
Peer node is unavailable in the following cases:
• All connections towards a peer are no longer candidates for routing Request
messages.
• No available connections within the peer node support the Application ID. This is
functionally equivalent to the peer node being unavailable.
• The Connection Priority Level (CPL) value for a resource is changed to 99, which
means the operational status is Unavailable. The CPL value of a connection can
be found in the active SO.
1. Recovery:
1. Confirm a connection is provisioned for the peer node.
• Verify IP network connectivity exists between the MP server and the peer
nodes using ping, traceroute, or other means.
• Examine the event history logs for additional DIAM events or alarms from the
MP server.
• Verify the peer is not under maintenance.
• Verify there are connections provisioned for the peer node.
• Verify the status of all connections toward the peer node.
View the Transaction Configuration Set of the peer node.
If the peer node has a corresponding Transaction Configuration Set setting,
then confirm the Application ID is supported.
2. Confirm the peer node supports the Application ID in the request message.
3. Resolve any congestion issues on the peer node.
4. If the problem persists, it is recommended to contact My Oracle Support.
Description:
The peer has some available connections, but less than its minimum connection
capacity. Continued routing to this peer may cause congestion or other overload
conditions.
Severity:
Major
Instance:
<PeerName> (of the Peer which is degraded)
HA Score:
Normal
OID:
eagleXgDiameterPeerDegradedNotify
Cause:
• If the number of available connections to the peer node is less than the minimum
connection capacity (default 1 per Peer Node), the Peer Node status becomes
Degraded and alarm 22052 is raised.
• If all the connections for the peer node are degraded, the Peer Node status
becomes Degraded and alarm 22052 is raised.
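The conditions for alarms 22051 and 22052 can be summarized in a small status function. This is a sketch under stated assumptions, not the product logic: the status labels are illustrative, and a connection is counted as usable when it is either Available or Degraded.

```python
from typing import List


def peer_status(conn_statuses: List[str], min_capacity: int = 1) -> str:
    """Derive a peer status from the statuses of its connections.

    Each entry is "Available", "Degraded", or "Unavailable" (labels assumed).
    """
    usable = [s for s in conn_statuses if s != "Unavailable"]
    if not usable:
        return "Unavailable"  # all transport connections are down (alarm 22051)
    if len(usable) < min_capacity:
        return "Degraded"  # below the minimum connection capacity (alarm 22052)
    if all(s == "Degraded" for s in usable):
        return "Degraded"  # every remaining connection is degraded (alarm 22052)
    return "Available"
```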
Diagnostic Information:
• Verify that the number of available connections to the peer is not less than the
minimum connection capacity (default 1).
• Peer CNDRA configurations on active SO
• Savelogs on active SO
• Event History on active SO
1. Recovery:
1. Check the Peer status.
2. Verify IP network connectivity exists between the MP server and the adjacent
servers.
3. Check the event history logs for additional DIAM events or alarms from this MP
server.
4. Verify the peer is not under maintenance.
5. Make sure the number of available connections to that peer node is greater than
minimum connection capacity configured.
6. If the problem persists, it is recommended to contact My Oracle Support.
Description:
All route groups within the route list are unavailable. A Route List becomes unavailable
when all of its peers become unavailable and a peer becomes unavailable when all of
its transport connections become unavailable.
If a Transport Connection is configured for Initiate mode, the network element
periodically attempts to recover the connection automatically if its Admin State is
enabled. If the Transport Connection is configured for Responder-Only mode, the
peer is responsible for re-establishing the transport connection.
Examine the Event history and software release information for the route groups.
Severity:
Critical
Instance:
<RouteListName> (of the Route List which failed)
HA Score:
Normal
OID:
eagleXgDiameterRouteListUnavailableNotify
Cause:
All route groups within the route list are unavailable. Check the Route list status.
Diagnostic Information:
Examine the following for the route groups:
• Event history
• Software release information
1. Recovery:
1. Check the Route List status.
2. Verify IP network connectivity exists between the MP server and the peers.
3. Check the event history logs for additional DIAM events or alarms from this MP
server.
4. Verify the peers in the route list are not under maintenance.
5. If the problem persists, it is recommended to contact My Oracle Support.
Description:
The Route List's Operational Status has changed to degraded because the capacity
of the Route List's active route group has dropped below the Route List's configured
minimum capacity. There are two potential causes:
1. One or more of the Route List's peers become Unavailable. A peer becomes
unavailable when all of its transport connections become unavailable. If a
transport connection is configured for Initiate mode, the network element
periodically attempts to recover the connection if its admin state is enabled.
If the transport connection is configured for responder-only mode, the peer is
responsible for re-establishing the transport connection.
2. The Route Groups within the Route List may not have been configured with
sufficient capacity to meet the Route List's configured minimum capacity.
Severity:
Major
Instance:
<RouteListName> (of the Route List which is degraded)
HA Score:
Normal
OID:
eagleXgDiameterRouteListDegradedNotify
Cause:
There are no available Route Groups, and the Operational Status of one or more
Route Groups within the Route List is degraded.
Diagnostic Information:
A Route List's operational status is always set to the operational status of the Route
Group within the Route List that is designated as the Active Route Group.
DRL determines which Route Group within a Route List is designated the Active
Route Group for that Route List as follows:
• If the operational status of one or more Route Groups within the Route List is
Available, then the Active Route Group for the Route List is the Available Route
Group with the highest priority.
• If there are no Available Route Groups, and the operational status of one or
more Route Groups within the Route List is Degraded, the Active Route Group
is the Degraded Route Group with the highest Current Capacity. If two or more
degraded Route Groups exist with equal Current Capacity, then the Active Route
Group is the one with the highest Priority.
• If all Route Groups within the route list are Unavailable, then the Route List is
Unavailable and there is no Active Route Group.
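The selection rules above can be sketched as follows. The `RouteGroup` type is hypothetical, and the sketch assumes that a lower priority number means a higher priority; it is an illustration of the three rules, not DRL code.

```python
from typing import List, NamedTuple, Optional


class RouteGroup(NamedTuple):
    name: str
    priority: int          # assumption: lower number means higher priority
    status: str            # "Available", "Degraded", or "Unavailable"
    current_capacity: int


def select_active_route_group(groups: List[RouteGroup]) -> Optional[RouteGroup]:
    """Return the Active Route Group, or None if the Route List is Unavailable."""
    available = [g for g in groups if g.status == "Available"]
    if available:
        # Rule 1: the Available Route Group with the highest priority.
        return min(available, key=lambda g: g.priority)
    degraded = [g for g in groups if g.status == "Degraded"]
    if degraded:
        # Rule 2: highest Current Capacity; ties go to the highest priority.
        return min(degraded, key=lambda g: (-g.current_capacity, g.priority))
    # Rule 3: every Route Group is Unavailable, so there is no Active Route Group.
    return None
```

The Route List's operational status then follows the status of the returned group, or is Unavailable when the function returns `None`.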
1. Recovery:
1. Verify Route List status and configured minimum capacity.
2. Verify IP network connectivity exists between the MP server and the peers.
3. Check the event history logs for additional DIAM events or alarms from this MP
server.
4. Verify the peers in the Route List are not under maintenance.
5. If the problem persists, it is recommended to contact My Oracle Support.
Description:
The application has started to utilize a Route Group other than the highest priority
Route Group to route Request messages for a Route List because the highest
priority Route Group specified for that Route List has either become Unavailable or its
capacity has dropped below the minimum capacity configured for the Route List while
a lower priority Route Group has more capacity.
The preferred Route Group (i.e., with highest priority) is demoted from the Active
Route Group to a Standby Route Group when a peer failure occurs causing the Route
Group's Operational Status to change to Unavailable or Degraded. A Route Group
becomes Degraded when its capacity has dropped below the Route List's configured
minimum capacity. A Route Group becomes Unavailable when all of its peers have an
Operational Status of Unavailable or Degraded.
Severity:
Minor
Instance:
<RouteListName> (of the concerned Route List)
HA Score:
Normal
OID:
eagleXgDiameterNonPreferredRouteGroupInUseNotify
1. Recovery:
1. Check the Route List status and configured minimum capacity.
2. Verify that IP network connectivity exists between the MP server and the peers.
3. Check the event history logs for additional DIAM events or alarms from this MP
server.
4. Verify that the adjacent server is not under maintenance.
5. If the problem persists, it is recommended to contact My Oracle Support.
Description:
An operator request to change the Admin State of a transport connection was not
completely processed due to an internal error. The admin state is either disabled from
an egress routing perspective but the connection could not be taken out of service or
the admin state is enabled from an egress routing perspective but the connection is
not in service.
Severity:
Major
Instance:
<TransConnName>
HA Score:
Normal
OID:
eagleXgDiameterConnAdminStateInconsistencyNotify
1. Recovery:
1. If the transport connection's Admin State is Disabled but the transport connection
was not taken out of service due to an internal error, perform the following actions
to correct the failure:
a. Enable the connection.
b. Wait for this alarm to clear.
c. Disable the connection.
2. If the transport connection's Admin State is Enabled but the transport connection
is not in service due to an internal error, perform the following actions to
correct the failure:
a. Disable the connection.
b. Wait for this alarm to clear.
c. Enable the connection.
3. If the problem persists, it is recommended to contact My Oracle Support.
Description:
The ETG Rate Limit has exceeded the defined threshold.
Severity:
Major
Instance:
<ETGName>
HA Score:
Normal
OID:
eagleXgDiameterEtgRateLimitDegradedNotify
Cause:
This alarm is triggered when Rate Limiting is enabled through the active SO server
menu, Diameter > Maintenance > Egress Throttle Groups, and either of the following
occurs:
• Rate Limiting Operational Status transitions from Available to Degraded.
• Rate Limiting Operational Status transitions from Inactive to Degraded.
Diagnostic Information:
• Screen snapshot of the active SO server through the menu, Main Menu > Diameter >
Maintenance > Egress Throttle Groups.
• Savelogs of all MPs.
• DSR logs of all MPs.
1. Recovery:
1. Check the configuration in Diameter, and then Configuration, and then Egress
Throttle Groups to determine if the Maximum Configured rate is too low.
2. Check the Egress Message Rate at Diameter, and then Maintenance, and
then Egress Throttle Groups and Diameter, and then Maintenance, and then
Connections to determine if the sending Peers/Connections are offering too much
traffic.
3. If the problem persists, collect the logs listed in Diagnostic Information; it is
recommended to contact My Oracle Support.
Description:
The ETG Pending Transactions Limit has exceeded the defined threshold.
Severity:
Major
Instance:
<ETGName>
HA Score:
Normal
OID:
eagleXgDiameterEtgPendingTransLimitDegradedNotify
Cause:
When Pending Transaction Limiting is enabled through the active SO menu, Diameter >
Maintenance > Egress Throttle Groups, the alarm is triggered when either of the
following conditions is met:
• Pending Transaction Limiting Operational Status transitions from Available to
Degraded
• Pending Transaction Limiting Operational Status transitions from Inactive to
Degraded
Diagnostic Information:
• Screen snapshot of the active SO through the menu, Main Menu > Diameter > Maintenance
> Egress Throttle Groups.
• Savelogs of all MPs.
• DSR logs of all MPs.
1. Recovery:
1. Check the configuration in Diameter, and then Configuration, and then Egress
Throttle Groups to determine if the Maximum Configured rate is too low.
2. Check the Egress Message Rate at Diameter, and then Maintenance, and
then Egress Throttle Groups and Main Menu, and then Diameter, and
then Maintenance, and then Connections to determine if the sending Peers/
Connections are offering too much traffic.
3. Determine if the receiving Peers or Connections in the ETG are not responding
with Answers in a timely manner because they are either busy or overloaded.
4. If the problem persists, collect the logs listed in Diagnostic Information; it is
recommended to contact My Oracle Support.
Description:
The Egress Throttle Group Message rate Congestion Level has changed. This will
change the Request priority that can be routed on peers and connections in the ETG.
Severity:
Info
Instance:
<ETGName>
HA Score:
Normal
Throttle Seconds:
10
OID:
eagleXgDiameterEtgRateCongestionNotify
1. Recovery:
1. The Maximum Configured rate may be too low. Check the configuration in
Diameter, and then Configuration, and then Egress Throttle Groups
2. The sending Peers/Connections are offering too much traffic. Check the EMR rate
at Diameter, and then Maintenance, and then Egress Throttle Groups and/or
Diameter, and then Maintenance, and then Connections
3. Typically all routes to a server should be in an ETG. However, if that is not the
case, alternate routes may be out of service and could cause overloading of traffic
towards connections contained in this ETG. Evaluate traffic distribution to server
connections and see if any alternate routes to server are unavailable causing
overloading of traffic on an ETG.
4. It is recommended to contact My Oracle Support for assistance.
Description:
The Egress Throttle Group Pending Transaction Limit Congestion Level has changed.
This will change the Request priority that can be routed on peers and connections in
the ETG.
Severity:
Info
Instance:
<ETGName>
HA Score:
Normal
Throttle Seconds:
10
OID:
eagleXgDiameterEtgPendingTransCongestionNotify
1. Recovery:
1. The Maximum Configured rate may be too low. Check the configuration in
Diameter, and then Configuration, and then Egress Throttle Groups
2. The sending Peers/Connections are offering too much traffic. Check the EMR rate
at Diameter, and then Maintenance, and then Egress Throttle Groups and/or
Diameter, and then Maintenance, and then Connections
3. Typically all routes to a server should be in an ETG. However, if that is not the
case, alternate routes may be out of service and could cause overloading of traffic
towards connections contained in this ETG. Evaluate traffic distribution to server
connections and see if any alternate routes to server are unavailable causing
overloading of traffic on an ETG.
4. The receiving Peers or Connections in the ETG are not responding with Answers
in a timely manner. Check to see if they are busy or overloaded.
5. If the problem persists, it is recommended to contact My Oracle Support for
assistance.
Description:
ETG Rate and Pending Transaction Monitoring is stopped on all configured ETGs.
Severity:
Minor
Instance:
<DA-MP Hostname>
HA Score:
Normal
OID:
eagleXgDiameterEtgMonitoringStoppedNotify
1. Recovery:
1. Under Communication Agent, and then Maintenance, verify that the ComAgent links
set up between DA-MPs have not gone OOS, which would prevent the SMS Service from
receiving Responses from the DA-MP Leader.
2. Verify ComAgent links are established between DA-MPs under Communication
Agent, and then Maintenance.
3. Check for the No-MP Leader condition in Diameter, and then Maintenance, and then
DA-MPs, and then Peer DA-MP Status, and verify that at least one DA-MP is MP-Leader.
4. If the problem persists, it is recommended to contact My Oracle Support.
Description:
Topology Hiding could not be applied because the Actual Host Name could not be
determined.
Severity:
Info
Instance:
<CfgSetName>
HA Score:
Normal
Throttle Seconds:
10
OID:
eagleXgDiameterTopoHidingActualHostNameNotFoundNotify
1. Recovery:
1. Ensure that all MME/SGSN hostnames to be hidden are present in the MME/
SGSN Configuration Set.
2. If any Peer CNDRA Applications are activated on Peer CNDRA, ensure that
any specific Application Level Topology Hiding feature is not conflicting with the
contents of Actual Host Names specified in the MME Configuration Set.
3. Check if the first instance of a Session-ID AVP in the Request/Answer message
contains the mandatory delimiter ";".
4. If the problem persists, it is recommended to contact My Oracle Support.
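The Session-ID delimiter check in the recovery steps can be sketched as below. The sketch assumes the host name is the portion before the first ";", following the Session-Id layout recommended by RFC 6733; the function name is hypothetical and this is not the product's parser.

```python
from typing import Optional


def host_from_session_id(session_id: str) -> Optional[str]:
    """Return the text before the first ';' in a Session-ID, or None if absent."""
    host, delimiter, _rest = session_id.partition(";")
    if not delimiter or not host:
        return None  # no ';' delimiter, or nothing before it
    return host
```

A `None` result corresponds to the case where the Actual Host Name cannot be determined from the Session-ID.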
Description:
The size of the message encoded by Peer CNDRA has exceeded its max limits.
Severity:
Info
Instance:
<TransConnName>
HA Score:
Normal
Throttle Seconds:
10
OID:
eagleXgDiameterDiameterMaxMsgSizeLimitExceededNotify
1. Recovery:
22064 - Upon receiving Redirect Host Notification the Request has not
been submitted for re-routing
Event Type:
DIAM
Description:
This event indicates that the Peer CNDRA has encountered a Redirect Host
Notification that it can accept for processing but cannot continue processing for
some reason, such as internal resource exhaustion.
Severity:
Info
Instance:
<PeerName>
HA Score:
Normal
Throttle Seconds:
60
OID:
eagleXgDiameterRxRedirectHostNotRoutedNotify
1. Recovery:
1. Examine the DraWorker congestion status and related measurements and take
appropriate action.
2. If the problem persists, it is recommended to contact My Oracle Support.
Description:
The Redirect Realm Notification received is accepted but cannot be processed for
some reason, such as internal resource exhaustion.
Severity:
Info
Instance:
<PeerName>
HA Score:
Normal
Throttle Seconds:
60
OID:
eagleXgDiameterRxRedirectRealmNotRoutedNotify
1. Recovery:
1. Examine the DraWorker congestion status and related measurements and take
appropriate action.
2. If the problem persists, it is recommended to contact My Oracle Support.
Description:
An ETG's Control Scope is set to ETL, but the ETG is not configured against an ETL.
Severity:
Minor
Instance:
<ETG Name>
HA Score:
Normal
OID:
eagleXgDiameterEtgEtlScopeInconsistencyNotify
1. Recovery:
1. Correct the configuration inconsistency by changing the Control Scope of the ETG
from ETL to ETG, or by adding the ETG to an ETL.
2. If a backup image has been restored to the SOAM, but not the NOAM, restoring a
consistent backup image for the NOAM should resolve the problem.
3. If the problem persists, it is recommended to contact My Oracle Support.
Description:
An ETL is associated with an ETG that does not exist.
Severity:
Minor
Instance:
<ETL Name>
HA Score:
Normal
OID:
eagleXgDiameterEtgEtlInvalidAssocNotify
1. Recovery:
1. Correct the configuration inconsistency by updating the ETL to refer to a valid
ETG, or by installing consistent backups on the NOAM and SOAM.
2. If the problem persists, it is recommended to contact My Oracle Support.
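The two ETG/ETL inconsistencies described above (an ETG with Control Scope ETL that no ETL references, and an ETL that references a nonexistent ETG) can be checked offline with a sketch like the following. The input dictionaries and finding strings are hypothetical; they do not correspond to an actual export format.

```python
from typing import Dict, List, Set


def check_etg_etl_consistency(
    etg_scope: Dict[str, str],
    etl_members: Dict[str, Set[str]],
) -> List[str]:
    """Return findings for ETG/ETL configuration mismatches.

    etg_scope:   ETG name -> Control Scope ("ETG" or "ETL")
    etl_members: ETL name -> names of the ETGs it references
    """
    findings = []
    referenced = set().union(*etl_members.values())
    for etg, scope in sorted(etg_scope.items()):
        if scope == "ETL" and etg not in referenced:
            findings.append(f"ETG {etg}: scope is ETL but no ETL references it")
    for etl, members in sorted(etl_members.items()):
        for etg in sorted(members - set(etg_scope)):
            findings.append(f"ETL {etl}: references missing ETG {etg}")
    return findings
```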
22068 - TtpEvDoicException
22068 - 001 - TtpEvDoicException: DOIC OC-Supported-Features AVP not
received
Event Type:
DIAM
Description:
DOIC Protocol Error
Severity:
Info
Instance:
<TTP Name>:001
HA Score:
Normal
Throttle Seconds:
10
OID:
eagleXgDiameterTtpEvDoicExceptionNotify
1. Recovery:
• The Peer Node associated with the TTP is not responding to a DOIC Capability
Announcement (DCA). This can occur when the Peer Node either does not
support DOIC or DOIC has been disabled on the Peer Node. The operator
should either disable DOIC on the DSR associated with TTP by setting the TTP's
"Dynamic Throttling Admin State" to Disabled or enable DOIC on the Peer Node.
Description:
DOIC Protocol Error
Severity:
Info
Instance:
<TTP Name>:002
HA Score:
Normal
Throttle Seconds:
10
OID:
eagleXgDiameterTtpEvDoicExceptionNotify
1. Recovery:
• The Peer Node associated with the TTP has selected a DOIC Abatement
Algorithm not supported by the TTP. This should never happen and may be the
result of a mis-configuration or bug on the Peer Node. If this error persists, the
operator should disable DOIC for the TTP by setting the TTP's "Dynamic Throttling
Admin State" to Disabled or enable DOIC on the Peer Node.
Description:
DOIC Protocol Error
Severity:
Info
Instance:
<TTP Name>:003
HA Score:
Normal
Throttle Seconds:
10
OID:
eagleXgDiameterTtpEvDoicExceptionNotify
1. Recovery:
• The Peer Node associated with the TTP is sending a DOIC overload report which
is not supported by DSR at this time. The operator should disable Realm-based
DOIC overload reports on the Peer Node.
Description:
DOIC Protocol Error
Severity:
Info
Instance:
<TTP Name>:004
HA Score:
Normal
Throttle Seconds:
10
OID:
eagleXgDiameterTtpEvDoicExceptionNotify
1. Recovery:
• The Peer Node associated with the TTP has sent a DOIC overload report that is
out of sequence. If this error occurs infrequently, then it may have been caused
by a timing delay whereby Answer messages received from the Peer Node were
delivered out of order. If this error occurs frequently, then the Peer Node may be in
violation of the DOIC specification.
Description:
DOIC Protocol Error
Severity:
Info
Instance:
<TTP Name>:005
HA Score:
Normal
Throttle Seconds:
10
OID:
eagleXgDiameterTtpEvDoicExceptionNotify
1. Recovery:
• The Peer Node associated with the TTP has sent a DOIC overload report
containing an OC-Reduction-Percentage AVP value greater than 100. If this error
occurs infrequently, then there may be a DOIC software error in the Peer Node. If
this error occurs frequently, then the error may be caused by a Peer Node DOIC
misconfiguration problem.
Description:
DOIC Protocol Error
Severity:
Info
Instance:
<TTP Name>:006
HA Score:
Normal
Throttle Seconds:
10
OID:
eagleXgDiameterTtpEvDoicExceptionNotify
1. Recovery:
• The Peer Node associated with the TTP has sent a DOIC overload report
containing an OC-Validity-Duration AVP value greater than the maximum allowed.
The maximum value for the OC-Validity-Duration AVP is 86,400 seconds (24
hours). If this error occurs infrequently, then there may be a DOIC software error in
the Peer Node. If this error occurs frequently, then the error may be caused by a
Peer Node DOIC mis-configuration problem.
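The overload-report checks described for instances 004 through 006 can be sketched as follows. This is only an illustration, not the DSR implementation: the function name `check_olr` and its parameters are hypothetical, and treating a lower sequence number as "out of sequence" is an assumption. Only the two limits (100 and 86,400 seconds) come from the text above.

```python
from typing import List, Optional

MAX_REDUCTION_PERCENTAGE = 100    # limit stated for OC-Reduction-Percentage
MAX_VALIDITY_DURATION_S = 86_400  # limit stated for OC-Validity-Duration (24 hours)


def check_olr(seq: int, reduction_pct: int, validity_s: int,
              last_seq: Optional[int] = None) -> List[str]:
    """Return the problems found in one overload report (hypothetical helper)."""
    problems = []
    # Assumption: a report is out of sequence when its sequence number is
    # lower than the one most recently applied.
    if last_seq is not None and seq < last_seq:
        problems.append("out of sequence")
    if reduction_pct > MAX_REDUCTION_PERCENTAGE:
        problems.append("OC-Reduction-Percentage greater than 100")
    if validity_s > MAX_VALIDITY_DURATION_S:
        problems.append("OC-Validity-Duration greater than maximum")
    return problems
```

An empty list means the report passed all three checks; each string in a non-empty list corresponds to one of the exception instances described above.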
22069 - TtpEvDoicOlr
22069 - 001 - TtpEvDoicOlr: Valid DOIC OLR Applied to TTP
Event Type:
DIAM
Description:
A DOIC Overload Report (OLR) was received from a Peer Node and applied to a
configured TTP.
Severity:
Info
Instance:
<TTP Name>:001
HA Score:
Normal
Throttle Seconds:
0 (zero)
OID:
eagleXgDiameterTtpEvDoicExceptionNotify
1. Recovery:
• No action required.
22070 - TtpEvDegraded
22070 - 001 - TtpEvDegraded: TTP Degraded, Peer Overload
Event Type:
DIAM
Description:
TTP Degraded
Severity:
Info
Instance:
<TTP Name>:001
HA Score:
Normal
Throttle Seconds:
0 (zero)
OID:
eagleXgDiameterTtpEvDegradedNotify
1. Recovery:
• No action required.
Description:
TTP Degraded
Severity:
Info
Instance:
<TTP Name>:002
HA Score:
Normal
Throttle Seconds:
0 (zero)
OID:
eagleXgDiameterTtpEvDegradedNotify
1. Recovery:
• No action required.
Description:
TTP Degraded
Severity:
Info
Instance:
<TTP Name>:003
HA Score:
Normal
Throttle Seconds:
0 (zero)
OID:
eagleXgDiameterTtpEvDegradedNotify
1. Recovery:
• No action required.
22071 - TtgEvLossChg
22071 - 001 - TtgEvLossChg: TTG Loss Percent Changed
Event Type:
DIAM
Description:
TTG's Loss Percentage was modified.
Severity:
Info
Instance:
<TTG Name>:001
HA Score:
Normal
Throttle Seconds:
0 (zero)
OID:
eagleXgDiameterTtpEvDoicExceptionNotify
1. Recovery:
• No action required.
Description
The TTP's Operational Status has been changed to Degraded.
Severity
Major
Instance
<TTP Name>
HA Score
Normal
OID
eagleXgDiameterTtpDegradedNotify
1. Recovery
• No action required.
Description
TTP rate throttling has been suspended due to an internal failure.
Severity
Minor
Instance
<DA-MP Name>
HA Score
Normal
OID
eagleXgDiameterTtpThrottlingStoppedNotify
1. Recovery:
1. Under Communication Agent, and then Maintenance, verify that the ComAgent
links set up between DA-MPs have not gone OOS, which would prevent the SMS
Service from receiving Responses from the DA-MP Leader.
2. Under Communication Agent, and then Maintenance, verify that ComAgent links
are established between DA-MPs.
3. Under Diameter, and then Maintenance, and then DA-MPs, and then Peer DA-MP
Status, check for the No-MP Leader condition by verifying that at least one DA-MP
is MP-Leader.
4. If the problem persists, it is recommended to contact My Oracle Support.
Description
The Maximum Loss Percentage Threshold assigned to the TTP has been exceeded.
Severity
Major
Instance
<TTP Name>
HA Score
Normal
OID
eagleXgDiameterTtpMaxLossPercentageExceededNotify
1. Recovery
• No action required.
Description:
ART Rule-X was selected, but message was not routed because Peer CNDRA
Application is disabled or not available.
Severity:
Major
Instance:
<Peer CNDRA Application Name>
HA Score:
Normal
OID:
eagleXgDiameterArtMatchAppUnavailableNotify
1. Recovery:
1. Check the Application Status and Enable the application if the Admin State of the
Peer CNDRA application is Disabled for a particular DraWorker(s) which raised
the alarm.
2. If the Application is Enabled for a particular DraWorker, but the Operational Status
is Unavailable or Degraded, then refer to the Operational Reason and rectify it
accordingly.
3. If the problem persists, it is recommended to contact My Oracle Support.
Description
The "Maximum Loss Percentage Threshold" assigned to the Route Group within the
Route List has been exceeded.
Severity
Major
Instance
<Route List Name>:<Route Group Name>.<TTG Name>
HA Score
Normal
OID
eagleXgDiameterTtgMaxLossPercentageExceededNotify
1. Recovery
• No action required.
Description:
Request reroutes due to Answer response and/or Answer timeout having exceeded
the configured onset threshold percentage on the DraWorker server.
Severity:
Major
Instance:
MpReroutePercent
HA Score:
Normal
Note:
The alarm clears when the percentage of Request reroutes due to Answer
Result-code matching "Reroute on Answer" and Answer Timeout drops
below the configured abatement threshold and remains there for the
configured abatement time. The alarm also clears when the Peer CNDRA
process is stopped or restarted.
OID:
eagleXgDiameterMpExcessiveRequestRerouteNotify
1. Recovery:
1. This alarm is an indication of reroutes exceeding the configured threshold, due
to responses from the Peer Node exceeding the Pending Answer timer in Peer
CNDRA or due to configured "Reroute on Answer" Result codes.
2. If rerouting is triggered due to Answer Result-code:
a. Use measurement TxRerouteAnswerResponse to identify any peer (or set of
peers) being identified as triggering reroute.
b. If a peer (or set of peers) is identified, validate that Reroute-on-Answer is
properly configured for that peer.
c. Check for congestion being reported by the peer.
3. If rerouting is triggered due to Answer Timeout:
a. Use measurement TxRerouteAnswerTimeout to identify any peer (or set of
peers) being identified as timing out.
b. If a peer (or set of peers) is identified, verify that Pending Answer Timer and
Transaction Lifetime are properly configured.
c. Check for congestion being reported by the peer.
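The onset and clearing behavior described in the Note for this alarm (raise above the onset threshold, clear only after staying below the abatement threshold for the abatement time) can be sketched as a small state machine. This is an illustrative sketch, not the product's logic: the class name, field names, and the numeric values used in the usage note are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RerouteAlarm:
    onset_pct: float          # configured onset threshold (percent)
    abatement_pct: float      # configured abatement threshold (percent)
    abatement_time_s: float   # configured abatement time (seconds)
    raised: bool = False
    below_since: Optional[float] = None

    def update(self, reroute_pct: float, now_s: float) -> bool:
        """Feed one reroute-percentage sample; return True while the alarm is raised."""
        if not self.raised:
            if reroute_pct > self.onset_pct:
                self.raised = True
                self.below_since = None
            return self.raised
        if reroute_pct < self.abatement_pct:
            if self.below_since is None:
                # First sample below the abatement threshold: start the timer.
                self.below_since = now_s
            elif now_s - self.below_since >= self.abatement_time_s:
                self.raised = False
                self.below_since = None
        else:
            # Rose back above the abatement threshold: restart the timer.
            self.below_since = None
        return self.raised
```

For example, with `RerouteAlarm(onset_pct=20, abatement_pct=10, abatement_time_s=30)`, a sample of 25 percent raises the alarm, and it clears only once samples have stayed below 10 percent for 30 seconds.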
Description:
An ART/PRT search has resulted in either a loop between ART/PRT tables, or the
search depth has exceeded the maximum allowed depth.
Severity:
Info
Instance:
<MPName>
HA Score:
Normal
Throttle Seconds:
10
OID:
eagleXgDiameterNestedArtPrtSearchErrorNotify
1. Recovery:
1. If the error was a search loop, the customer should change at least one of the
rules in the search sequence to avoid a loop. If the error was that the maximum
depth was exceeded, the customer should remove one or more rules in the search
sequence.
2. If the problem persists, it is recommended to contact My Oracle Support.
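The two failure cases named in this event (a loop between tables, or a search that exceeds the maximum depth) can be illustrated with a short sketch. The table names, the `next_table` mapping, and the depth limit of 10 are hypothetical and chosen only for illustration; they are not DSR configuration values.

```python
from typing import Dict, Optional

MAX_SEARCH_DEPTH = 10  # hypothetical limit, for illustration only


def follow_tables(start: str, next_table: Dict[str, Optional[str]],
                  max_depth: int = MAX_SEARCH_DEPTH) -> str:
    """Walk a chain of rule tables and report 'ok', 'loop', or 'max depth exceeded'."""
    visited = set()
    current: Optional[str] = start
    depth = 0
    while current is not None:
        if current in visited:
            return "loop"                # the search returned to a table already searched
        if depth >= max_depth:
            return "max depth exceeded"  # too many tables chained together
        visited.add(current)
        depth += 1
        current = next_table.get(current)
    return "ok"
```

A chain such as ART1 to PRT1 and back to ART1 is reported as a loop, which corresponds to the first recovery case; a long but loop-free chain is reported as exceeding the depth, which corresponds to the second.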
Description:
Radius Route List is not provisioned in the system options.
Severity:
Info
Instance:
<MPName>
HA Score:
Normal
Throttle Seconds:
10
OID:
eagleXgDiameterInvalidDestRouteListNotify
1. Recovery
1. If the error was a search loop, the customer should change at least one of the
rules in the search sequence to avoid a loop. If the error was that the maximum
depth was exceeded, the customer should remove one or more rules in the search
sequence.
2. If the problem persists, it is recommended to contact My Oracle Support.
Description:
Connection is unavailable for Diameter Request/Answer exchange with peer.
Note:
This alarm is not raised when the Suppress Connection Unavailable alarm
for a Transport Connection is set to Yes.
Alarm 22101 is generated when the connection's administrative state is enabled and
the connection is not in a state where it can send or receive Diameter Requests or
Answers to/from the peer. The alarm is generated when one of the following occurs.
• Connection's Admin State transitions from disabled to enabled
• Connection's Operational Status transitions from available to unavailable
• Connection's Operational Status transitions from degraded to unavailable
Severity:
Major
Instance:
<Connection Name>
HA Score:
Normal
OID:
eagleXgDiameterConnectionUnavailableAlarmNotify
Cause:
Alarm #22101 raises when the connection's administrative state is enabled and the
connection is not in a state where it can send or receive Diameter Requests or
Answers to/from the peer. The alarm is generated when one of the following occurs:
• Connection's Admin State transitions from disabled to enabled
• Connection's Operational Status transitions from available to unavailable
• Connection's Operational Status transitions from degraded to unavailable
Diagnostic Information:
Confirm whether any of the following conditions is occurring:
1. A host IP interface is down
2. A host IP interface is unreachable from the peer
3. A peer IP interface is down
4. A peer IP interface is unreachable from the host
Verify the following are configured and available:
1. Remote IP availability
2. Remote server (port) availability
3. Network availability
4. Local IP route to remote
5. Local MP service availability
6. Configuration correctness, such as CEX parameter matching with remote
1. Recovery:
1. Confirm whether the host IP interface is down or unreachable from the peer.
2. Confirm whether the peer IP interface is down or unreachable from the host.
3. Verify the following are configured and available:
• Remote IP availability
• Remote server (port) availability
• Network availability
• Local IP route to remote
• Local MP service availability
• Configuration correctness, such as CEX parameter matching with remote
4. Identify the most recent Connection Unavailable event in the event log for the
connection and use the Event's recovery steps to resolve the issue.
5. If the problem persists, it is recommended to contact My Oracle Support.
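The raise conditions listed for Alarm 22101 can be summarized as a predicate over state transitions. This is a sketch under stated assumptions: the function name and the lowercase state strings are hypothetical, and the `suppress` flag stands in for the Suppress Connection Unavailable alarm setting mentioned in the Note.

```python
def connection_unavailable_alarm_raised(old_admin: str, new_admin: str,
                                        old_op: str, new_op: str,
                                        suppress: bool = False) -> bool:
    """Return True when a transition matches one of the listed raise conditions."""
    # The alarm is not raised when suppression is set to Yes, and it only
    # applies while the connection's administrative state is enabled.
    if suppress or new_admin != "enabled":
        return False
    # Admin State transitions from disabled to enabled.
    if old_admin == "disabled":
        return True
    # Operational Status transitions from available or degraded to unavailable.
    return old_op in ("available", "degraded") and new_op == "unavailable"
```

A transition from available to degraded returns False here; that case is covered by a separate alarm for degraded connections.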
Description:
Connection is only available for routing messages with a priority greater than or equal
to the connection's congestion level. This alarm is generated when:
• Connection congestion occurs when the Peer CNDRA Tx sender buffer is at
maximum capacity
• The connection's administrative state is enabled and the connection is in
congestion. Requests and Answers continue to be received and processed from
the peer over the connection, and attempts to send Answers to the peer still
occur. The alarm is raised when one of the following occurs:
– Connection's Operational Status transitions from available to degraded
(connection has become congested or watchdog algorithm has failed)
– Connection's Operational Status transitions from unavailable to degraded
(connection has successfully completed the capabilities exchange and is
performing connection proving)
• Connection egress message rate threshold has been crossed
• Diameter connection is in watchdog proving
• Diameter connection is in graceful disconnect
• Diameter peer signaled the remote is busy
• Diameter connection is in transport congestion
Severity:
Major
Instance:
<Connection Name>
HA Score:
Normal
OID:
eagleXgDiameterFsmOpStateDegraded
Cause:
This alarm is raised when:
• Connection congestion occurs when the Peer CNDRA Tx sender buffer is at
maximum capacity
• The connection's administrative state is enabled and the connection is in
congestion. Requests and Answers will continue to be received and processed
from the peer over the connection and attempts to send Answers to the peer will
still occur. The alarm is raised when one of the following occurs:
– Connection's Operational Status transitions from available to degraded
(connection has become congested or watchdog algorithm has failed)
Diagnostic Information:
1. View the Connection Performance measurement report for the period from 1 hour
before to 1 hour after the congestion event.
2. Examine the Log file by using these commands:
• # date >> tcp_stat_<hostname>
• # cat /proc/net/tcp >> tcp_stat_<hostname>
• # sleep 1
• # cat /proc/net/tcp >> tcp_stat_<hostname>
• # sleep 1
• # cat /proc/net/tcp >> tcp_stat_<hostname>
• # sleep 1
• # cat /proc/net/tcp >> tcp_stat_<hostname>
• # date >> tcp_stat_<hostname>
3. Examine the output of the command netstat -canp --tcp | grep
<remote IP:Port for conn> for a few minutes.
4. Examine the corresponding Rx buffer on the connection in question using
this command: netstat -canp --tcp | grep <remote IP:Port for
conn>. The RxBuffer value is configured using ConnectionCfget.
5. Examine the overall network statistics for other issues using the command,
netstat -i.
6. Examine the overall network delay using the command ping.
7. View the software release information.
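The /proc/net/tcp snapshots collected in step 2 are hexadecimal and hard to read by eye. The sketch below decodes one line to show the remote endpoint and the transmit and receive queue sizes, which are the values of interest when investigating congestion. It assumes IPv4 entries on a little-endian host (where the address bytes appear reversed); the function name is hypothetical and the addresses in the usage note are made up for illustration.

```python
from typing import Dict, Union


def parse_proc_net_tcp_line(line: str) -> Dict[str, Union[str, int]]:
    """Decode the remote endpoint, state, and queue sizes from one /proc/net/tcp row."""
    fields = line.split()
    # fields: sl, local_address, rem_address, st, tx_queue:rx_queue, ...
    remote_hex_ip, remote_hex_port = fields[2].split(":")
    tx_hex, rx_hex = fields[4].split(":")
    # Assumption: IPv4 on a little-endian host, so the four address bytes
    # are printed in reverse order and must be read back to front.
    octets = [str(int(remote_hex_ip[i:i + 2], 16)) for i in range(6, -1, -2)]
    return {
        "remote": ".".join(octets) + ":" + str(int(remote_hex_port, 16)),
        "state": fields[3],          # "01" is ESTABLISHED
        "tx_queue": int(tx_hex, 16),  # bytes waiting to be sent
        "rx_queue": int(rx_hex, 16),  # bytes waiting to be read
    }
```

For a row whose rem_address field is 1400000A:0F1C and whose queue field is 00001F40:00000000, the function reports remote endpoint 10.0.0.20:3868 with 8000 bytes in the transmit queue and none in the receive queue. A transmit queue that stays large across the successive snapshots suggests the peer is not draining the connection.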
1. Recovery:
1. View the Connection Performance measurement report for the period from 1 hour
before to 1 hour after the congestion event.
2. Examine the log file by using these commands:
• # date >> tcp_stat_<hostname>
• # cat /proc/net/tcp >> tcp_stat_<hostname>
• # sleep 1
• # cat /proc/net/tcp >> tcp_stat_<hostname>
• # sleep 1
Description:
One or more paths of the SCTP multi-homed connection is down.
Severity:
Minor
Instance:
<TransConnName>
HA Score:
Normal
OID:
eagleXgDiameterSCTPConnectionImpairedAlarmNotify
Cause:
A host IP interface for one of the paths in the connection is down. One of the
following cases can cause this alarm:
Diagnostic Information:
1. Export the Diameter and IPFE configuration information from the active SOAM.
2. Retrieve the software release information.
3. Test each path in the connection to determine which one is causing the
connection to be impaired.
4. Capture a pcap (tcpdump) trace of packets on the local host (on the specific
interface of the MP reporting the issue), on the remote peer, or on the IPFE (if it is
TSA addressed) to see whether data traffic or the heartbeat is running on the network.
1. Recovery:
1. The alarm clears when the connection is operationally unavailable or all paths are
operationally available.
Potential causes are:
• A host IP interface is down.
• A host IP interface is unreachable from the peer.
• A peer IP interface is down.
• A peer IP interface is unreachable from the host.
• Network path is down between one host IP and the other peer IP.
• Network congestion or large latency in the network (resulting in loss or late
arrival of packets).
2. Identify the most recent SCTP Connection Impaired event in the event log for the
connection and use the event's recovery steps to resolve the issue.
3. If the problem persists, it is recommended to contact My Oracle Support.
Description:
The SCTP peer advertised fewer IP addresses than configured for the connection.
If two IP addresses have been configured for the Local Node of a certain SCTP
connection, but following the SCTP connection establishment the peer node has
advertised only one IP address (fewer than the number of IP addresses configured for
the local node), then Alarm 22104 is generated.
Severity:
Minor
Instance:
<TransConnName>
HA Score:
Normal
OID:
eagleXgDiameterSCTPPeerReducedIPSetAlarmNotify
Cause:
When the operational status is Available and a connection is established over SCTP
transport, the number of IP addresses advertised by the peer in INIT/INIT_ACK is
less than the number of paths set by the connection configuration. For instance, the
established connection has two IP addresses configured for the Local Node, but the
peer node has advertised only one IP address.
Diagnostic Information:
View the networking configuration on the peer node.
1. Recovery:
1. When the operational status is Available and a connection is established over
SCTP transport, the number of IP addresses advertised by the peer in INIT/
INIT_ACK is less than the number of paths set by the connection configuration.
For instance, the established connection has two IP addresses configured for the
Local Node, but the peer node has advertised only one IP address.
2. The peer is not able to advertise more than one IP address either due to an error
in its configuration or due to being affected by a network interface failure.
3. If the problem persists, it is recommended to contact My Oracle Support.
Description:
Alarm is raised when the connection transmit buffer is congested; messages are
discarded until condition clears. This error indicates the socket write cannot complete
without blocking, which signals the socket buffer is currently full.
Severity:
Major
Instance:
<TransConnName>
HA Score:
Normal
OID:
eagleXgDiameterConnectionTxCongestionAlarmNotify
Cause:
The socket write cannot complete without blocking, signaling that the socket buffer is
currently full.
Diagnostic Information:
N/A.
1. Recovery:
1. The peer is not able to process the volume of traffic being offered on the
connection. Reduce the traffic volume or increase the processing capacity on the
peer.
2. If the problem persists, it is recommended to contact My Oracle Support.
Description:
An ingress message is discarded due to connection (or DraWorker) ingress message
rate exceeding connection (or DraWorker) maximum ingress MPS.
Severity:
Major
Instance:
<MPHostName>
HA Score:
Normal
OID:
eagleXgDiameterIngressMessageDiscardedAlarmNotify
Cause:
An ingress message is discarded or rejected in the following congestion scenarios:
• Connection maximum message rate exceeded.
• DraWorker maximum message rate exceeded.
Diagnostic Information:
1. From the event history, check the current message rate and the threshold rate for
the diameter connection/DAMP node.
2. Check the maximum reserved ingress MPS for the DAMP on the Active Overseer
server.
3. Ensure that the ingress MPS is less than the threshold for the diameter
connection/DAMP.
1. Recovery:
1. The ingress MPS on the DraWorker is exceeding the MP Maximum ingress MPS.
One or more DraWorkers may be unavailable, causing traffic to be distributed to
the remaining DraWorkers.
2. See if one or more peers are generating more traffic than is normally expected.
3. Make sure a sufficient number of DraWorkers is provisioned.
4. If the problem persists, it is recommended to contact My Oracle Support.
Description:
DraWorker CPU utilization threshold has been exceeded. Potential causes are:
• One or more peers are generating more traffic than is normally expected
• Configuration requires more CPUs for message processing than is normally
expected
• One or more peers are answering slowly, causing a backlog of pending
transactions
• A DraWorker has failed, causing the redistribution of traffic to the remaining
DraWorkers
Severity:
Minor, Major, Critical, Warning
Instance:
N/A
HA Score:
Normal
OID:
eagleXgDiameterMpCpuCongestedNotify
Cause:
Potential causes are:
• One or more peers are generating more traffic than is normally expected.
• Configuration requires more CPUs for message processing than is normally
expected.
• One or more peers are answering slowly, causing a backlog of pending
transactions.
• A DraWorker has failed, causing the redistribution of traffic to the remaining
DraWorkers.
Diagnostic Information:
1. Observe the ingress traffic rate of each MP.
a. The misconfiguration of server/client routing may result in too much traffic
being distributed to the MP. Each MP in the server site should be receiving
approximately the same ingress transactions per second.
b. There may be an insufficient number of MPs configured to handle the network
traffic load. If all MPs are in congestion, then the traffic load to the server site
is exceeding its capacity.
2. Examine the alarm log.
3. Examine the DraWorker status.
1. Recovery:
1. If one or more MPs in a server site has failed, the traffic is distributed between the
remaining MPs in the server site. Monitor the MP server status.
2. The misconfiguration of Diameter peers may result in too much traffic being
distributed to the MP. Monitor the ingress traffic rate of each MP. Each MP in the
server site should be receiving approximately the same ingress transactions per
second.
3. There may be an insufficient number of MPs configured to handle the network
traffic load. If all MPs are in a congestion state then the offered load to the server
site is exceeding its capacity.
4. The Diameter Process may be experiencing problems. Examine the alarm log.
5. If the problem persists, it is recommended to contact My Oracle Support.
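The guidance that each MP in a server site should receive approximately the same ingress transactions per second can be checked with a simple comparison against the site median. This is a rough sketch: the function name, the MP names, and the 20 percent tolerance are hypothetical choices, not values defined by the product.

```python
from statistics import median
from typing import Dict, List


def unbalanced_mps(ingress_tps: Dict[str, float], tolerance: float = 0.2) -> List[str]:
    """Return the MPs whose ingress rate deviates from the site median by more than tolerance."""
    if not ingress_tps:
        return []
    mid = median(ingress_tps.values())
    if mid == 0:
        return []
    # An MP is flagged when its relative deviation from the median exceeds
    # the tolerance (hypothetical default: 20 percent).
    return sorted(name for name, tps in ingress_tps.items()
                  if abs(tps - mid) / mid > tolerance)
```

For example, `unbalanced_mps({"mp1": 1000, "mp2": 1050, "mp3": 2000})` flags only `mp3`, which points to a routing or peer misconfiguration rather than a site-wide capacity shortfall. If every MP is congested and none is flagged, the offered load is more likely exceeding the capacity of the site.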
22201 - MpRxAllRate
Alarm Group:
DIAM
Description:
DraWorker ingress message rate threshold crossed.
Severity:
Minor, Major, Critical
Instance:
MpRxAllRate, DIAM
HA Score:
Normal
OID:
eagleXgDiameterMpRxAllRateNotify
1. Recovery:
1. If one or more MPs in a server site have failed, the traffic will be distributed
between the remaining MPs in the server site.
2. The misconfiguration of Diameter peers may result in too much traffic being
distributed to the MP. Each MP in the server site should be receiving
approximately the same ingress transactions per second.
3. There may be an insufficient number of MPs configured to handle the network
traffic load. If all MPs are in a congestion state then the offered load to the server
site is exceeding its capacity.
4. If the problem persists, it is recommended to contact My Oracle Support.
22202 - MpDiamMsgPoolCongested
Alarm Group:
DIAM
Description:
DraWorker Diameter message pool utilization threshold crossed.
Severity:
Minor, Major, Critical
Instance:
MpDiamMsgPool, DIAM
HA Score:
Normal
OID:
eagleXgDiameterMpDiamMsgPoolCongestedNotify
1. Recovery:
1. If one or more MPs in a server site have failed, the traffic will be distributed
between the remaining MPs in the server site.
2. The misconfiguration of Diameter peers may result in too much traffic being
distributed to the MP. Each MP in the server site should be receiving
approximately the same ingress transactions per second.
3. There may be an insufficient number of MPs configured to handle the network
traffic load. If all MPs are in a congestion state then the offered load to the server
site is exceeding its capacity.
4. A software defect may exist resulting in PDU buffers not being deallocated to the
pool. This alarm should not normally occur when no other congestion alarms are
asserted.
Description:
The MP's PTR buffer pool is approaching its maximum capacity. If this problem
persists and the pool reaches 100% utilization all new ingress messages will be
discarded. This alarm should not normally occur when no other congestion alarms are
asserted.
Severity:
Minor, Major, Critical
Instance:
N/A
HA Score:
Normal
OID:
eagleXgDiameterPtrBufferPoolUtilNotify
1. Recovery:
1. If one or more MPs in a server site have failed, the traffic will be distributed
between the remaining MPs in the server site.
2. The misconfiguration of Diameter peers may result in too much traffic being
distributed to the MP. Each MP in the server site should be receiving
approximately the same ingress transactions per second.
3. There may be an insufficient number of MPs configured to handle the network
traffic load. If all MPs are in a congestion state then the offered load to the server
site is exceeding its capacity.
4. A software defect may exist resulting in PTR buffers not being deallocated to the
pool. This alarm should not normally occur when no other congestion alarms are
asserted.
5. If the problem persists, it is recommended to contact My Oracle Support.
Description:
The MP's Request Message Queue Utilization is approaching its maximum capacity. If
this problem persists and the queue reaches 100% utilization all new ingress Request
messages will be discarded. This alarm should not normally occur when no other
congestion alarms are asserted.
Severity:
Minor, Major, Critical
Instance:
N/A
HA Score:
Normal
OID:
eagleXgDiameterRequestMessageQueueUtilNotify
1. Recovery:
1. If one or more MPs in a server site have failed, the traffic will be distributed
between the remaining MPs in the server site.
2. The misconfiguration of Diameter peers may result in too much traffic being
distributed to the MP. Each MP in the server site should be receiving
approximately the same ingress transactions per second.
3. There may be an insufficient number of MPs configured to handle the network
traffic load. If all MPs are in a congestion state then the offered load to the server
site is exceeding its capacity.
4. If no additional congestion alarms are asserted, the Request Task may be
experiencing a problem preventing it from processing messages from its Request
Message Queue.
5. If the problem persists, it is recommended to contact My Oracle Support.
Description:
The MP's Answer Message Queue Utilization is approaching its maximum capacity. If
this problem persists and the queue reaches 100% utilization all new ingress Answer
messages will be discarded. This alarm should not normally occur when no other
congestion alarms are asserted.
Severity:
Minor, Major, Critical
Instance:
N/A
HA Score:
Normal
OID:
eagleXgDiameterAnswerMessageQueueUtilNotify
1. Recovery:
1. If one or more MPs in a server site have failed, the traffic will be distributed
between the remaining MPs in the server site.
2. The misconfiguration of Diameter peers may result in too much traffic being
distributed to the MP. Each MP in the server site should be receiving
approximately the same ingress transactions per second.
3. There may be an insufficient number of MPs configured to handle the network
traffic load. If all MPs are in a congestion state then the offered load to the server
site is exceeding its capacity.
4. If no additional congestion alarms are asserted, the Answer Task may be
experiencing a problem preventing it from processing messages from its Answer
Message Queue.
5. If the problem persists, it is recommended to contact My Oracle Support.
Description:
The MP's Reroute Queue is approaching its maximum capacity. If this problem
persists and the queue reaches 100% utilization any transactions requiring rerouting
will be rejected. This alarm should not normally occur when no other congestion
alarms are asserted.
Severity:
Minor, Major, Critical
Instance:
N/A
HA Score:
Normal
OID:
eagleXgDiameterRerouteQueueUtilNotify
1. Recovery:
1. An excessive amount of Request message rerouting may have been triggered by
either connection failures or Answer time-outs.
2. If no additional congestion alarms are asserted, the Reroute Task may be
experiencing a problem preventing it from processing messages from its Reroute
Queue.
3. If the problem persists, it is recommended to contact My Oracle Support.
22207 - DclTxTaskQueueCongested
Alarm Group:
DIAM
Description:
DCL egress task message queue utilization threshold crossed.
Severity:
Minor, Major, Critical
Instance:
<DraWorker Name>
HA Score:
Normal
OID:
eagleXgDiameterDclTxTaskQueueCongested
1. Recovery:
1. The alarm will clear when the DCL egress task message queue utilization falls
below the clear threshold. The alarm may be caused by one or more peers being
routed more traffic than is nominally expected.
2. If the problem persists, it is recommended to contact My Oracle Support.
22208 - DclTxConnQueueCongested
Alarm Group:
DIAM
Description:
DCL egress connection message queue utilization threshold crossed.
Severity:
Minor, Major, Critical
Instance:
<ConnectionName>
HA Score:
Normal
OID:
eagleXgDiameterDclTxConnQueueCongested
1. Recovery:
1. The alarm will clear when the DCL egress connection message queue utilization
falls below the clear threshold. The alarm may be caused by peers being routed
more traffic than nominally expected.
2. It is recommended to contact My Oracle Support for further assistance.
Description:
Diameter Message Copy is disabled.
Severity:
Minor
Instance:
N/A
HA Score:
Normal
OID:
eagleXgDiameterMessageCopyDisabledNotify
1. Recovery:
1. If one or more MPs in a server site have failed, the traffic will be distributed
between the remaining MPs in the server site.
2. The misconfiguration of Diameter peers may result in too much traffic being
distributed to the MP. Each MP in the server site should be receiving
approximately the same ingress transactions per second.
3. There may be an insufficient number of MPs configured to handle the network
traffic load. If all MPs are in a congestion state then the offered load to the server
site is exceeding its capacity.
4. The Diameter Process may be experiencing problems.
5. If the problem persists, contact My Oracle Support.
Description:
The DraWorker's Message Copy queue utilization is approaching its maximum
capacity.
Severity:
Minor, Major, Critical
Instance:
N/A
HA Score:
Normal
OID:
eagleXgDiameterMsgCopyQueueUtilNotify
1. Recovery:
1. Reduce traffic to the MP.
2. Verify that no network issues exist between the DraWorker and the intended DAS
peer(s).
3. Verify that the intended DAS peer has sufficient capacity to process the traffic load
being routed to it.
4. If the problem persists, it is recommended to contact My Oracle Support.
Description:
The message processing rate for this MP is approaching or exceeding its engineered
traffic handling capacity. The rate is measured as routing MPS (messages per
second).
Severity:
Minor, Major, Critical
Instance:
N/A
HA Score:
Normal
OID:
eagleXgDiameterRoutingMpsRateNotify
1. Recovery:
1. If one or more MPs in a server site have failed, the traffic will be distributed
amongst the remaining MPs in the server site.
2. The misconfiguration of Diameter peers may result in too much traffic being
distributed to the MP. Each MP in the server site should be receiving
approximately the same ingress transactions per second.
Description:
The MP's Long Timeout PTR buffer pool is approaching its maximum capacity.
Severity:
Minor, Major, Critical
Instance:
N/A
HA Score:
Normal
OID:
eagleXgDiameterLongTimeoutPtrBufferPoolUtilNotify
1. Recovery:
1. If one or more MPs in a server site have failed, the traffic will be distributed
amongst the remaining MPs in the server site.
2. The misconfiguration of Pending Answer Timer assignment may result in
excessive traffic being assigned to the Long Timeout PTR buffer pool.
3. The misconfiguration of Diameter peers may result in too much traffic being
distributed to the MP. Each MP in the server site should be receiving
approximately the same number of ingress transactions per second.
4. There may be an insufficient number of MPs configured to handle the network
traffic load. If all MPs are in a congestion state then the offered load to the server
site is exceeding its capacity.
5. A software defect may exist resulting in Long Timeout PTR buffers not being
de-allocated to the pool. This alarm should not normally occur when no other
congestion alarms are asserted. Examine the alarm log.
6. If the problem persists, it is recommended to contact My Oracle Support.
Description:
DraWorker memory utilization threshold crossed.
Severity:
Minor, Major, Critical
Instance:
System.RAM_UtilPct, Peer CNDRA
HA Score:
Normal
OID:
eagleXgDiameterMpMemCongestedNotify
Cause:
The following are potential causes:
• One or more peers are generating more traffic than expected.
• The configuration requires more physical memory for message processing than
expected.
• One or more peers are answering slowly, causing a backlog of pending
transactions.
• A DraWorker failed, causing the redistribution of traffic to the remaining
DraWorkers.
Diagnostic Information:
To diagnose the cause:
1. Recovery:
1. Analyze and correct routing so the traffic load is balanced between MPs.
2. If all MPs are approaching or exceeding their engineered traffic handling capacity,
add more MPs to the system and configure connections and routes to distribute
traffic to new DraWorkers.
3. If the problem persists, it is recommended to contact My Oracle Support.
Description:
The average transaction hold time has exceeded its configured limits.
This alarm is generated when KPI #10098 (TmAvgRspTime) exceeds Peer CNDRA-
wide engineering attributes associated with average hold time, defined in the
DraWorker profile assigned to the DraWorker server. KPI #10098 is defined as
the average time (in milliseconds) from when the routing layer (DRL) receives a
request message from a downstream peer to the time that an answer response
is sent to that downstream peer. The source measurement of KPI #10098 is the
TmResponseTimeDownstreamMp (10093) measurement.
This alarm indicates the average response time (TmAvgRspTime) for messages
forwarded by the Relay Agent is larger than what is defined for a deployment as
per DraWorker profile assignment. One of these problems could exist:
• The IP network may be experiencing problems that are adding propagation delays
to the forwarded request message and the answer response.
– Verify the IP network connectivity exists between the MP server and the
adjacent nodes.
– View the event history logs for additional events or alarms from this MP
server.
• One or more upstream nodes may be experiencing traffic overload.
• One or more MPs are experiencing traffic overload.
– View the KPI Routing Recv Msgs/Sec.
– View the CPU utilization of MPs.
Severity:
Minor, Major, Critical
Instance:
N/A
HA Score:
Normal
OID:
eagleXgDiameterAvgHoldTimeLimitExceededNotify
Cause:
Alarm 22224 is generated when KPI #10098 (TmAvgRspTime) exceeds Peer
CNDRA-wide engineering attributes associated with average hold time, defined in
the DraWorker profile assigned to the DraWorker server. KPI #10098 is defined as
the average time (in milliseconds) from when the routing layer (DRL) receives a
request message from a downstream peer to the time that an answer response
is sent to that downstream peer. The source measurement of KPI #10098 is the
TmResponseTimeDownstreamMp (10093) measurement.
The alarm thresholds are configurable for:
• Average hold time minor alarm onset threshold
• Average hold time minor alarm abatement threshold
• Average hold time major alarm onset threshold
• Average hold time major alarm abatement threshold
• Average hold time critical alarm onset threshold
• Average hold time critical alarm abatement threshold
The severity of the alarm (Minor, Major, or Critical) is determined by the onset and
abatement thresholds of each severity level. When the average hold time first
exceeds an alarm onset threshold, a minor, major, or critical alarm is triggered.
When the average hold time subsequently exceeds a higher onset threshold, or drops
below an abatement threshold but is still above the minor alarm abatement threshold,
the alarm severity changes based on the highest onset threshold crossed by the
current average hold time.
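The onset and abatement behavior described above can be sketched as a small state
function. This is a minimal illustration and not the DSR implementation: the names
`Threshold` and `next_severity` and the millisecond values are hypothetical, and the
rule that an active alarm between the minor abatement and minor onset thresholds
stays at Minor is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, Optional

SEVERITIES = ("minor", "major", "critical")  # ascending order


@dataclass(frozen=True)
class Threshold:
    onset: float      # alarm is raised when the value exceeds this
    abatement: float  # alarm is lowered when the value drops below this


# Hypothetical threshold values in milliseconds, for illustration only.
EXAMPLE_THRESHOLDS: Dict[str, Threshold] = {
    "minor": Threshold(onset=100, abatement=90),
    "major": Threshold(onset=200, abatement=180),
    "critical": Threshold(onset=300, abatement=280),
}


def next_severity(current: Optional[str], avg_hold_ms: float,
                  thresholds: Dict[str, Threshold]) -> Optional[str]:
    """Return the new alarm severity (or None when the alarm is clear)."""
    # Highest severity whose onset threshold the current value exceeds.
    by_onset = None
    for name in SEVERITIES:
        if avg_hold_ms > thresholds[name].onset:
            by_onset = name

    if current is None:
        return by_onset  # initial onset, or still clear

    if by_onset is not None and SEVERITIES.index(by_onset) >= SEVERITIES.index(current):
        return by_onset  # same severity or escalation

    if avg_hold_ms >= thresholds[current].abatement:
        return current  # not yet below the abatement threshold: hold

    if avg_hold_ms < thresholds["minor"].abatement:
        return None  # below the minor abatement threshold: clear

    # Assumption: still above minor abatement, so fall back to the highest
    # onset threshold crossed, or Minor if none is currently exceeded.
    return by_onset if by_onset is not None else "minor"
```

With the example values, an alarm at Major stays at Major while the average hold time
is 190 ms (above the major abatement threshold of 180 ms) and drops to Minor at 170 ms.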
Diagnostic Information:
If Alarm #22224 is raised, the average response time (TmAvgRspTime) for messages
forwarded by the Relay Agent is larger than the value defined for the deployment by
the DraWorker profile assignment. One of the following problems could exist:
• The IP network may be experiencing problems that are adding propagation delays
to the forwarded request message and the answer response.
– Verify the IP network connectivity exists between the MP server and the
adjacent nodes.
– View the event history logs for additional events or alarms from this MP
server.
• One or more upstream nodes may be experiencing traffic overload.
• One or more MPs are experiencing traffic overload.
– View the KPI Routing Recv Msgs/Sec.
– View the CPU utilization of MPs.
1. Recovery:
1. The average transaction hold time is exceeding its configured limits, resulting in
an abnormally large number of outstanding transactions that may be leading to
excessive use of resources like memory.
• Reduce the average hold time by examining the configured Pending Answer
Timer values and reducing any values that are unnecessarily large or small.
• Identify the causes for the large average delay between the Peer CNDRA
sending requests to the upstream peers and receiving answers for the
requests.
• Confirm the peer node(s) or Peer CNDRA is in overload by viewing KPI/
Measurements/CPU usage and take corrective action.
Description:
The size of the average message processed by Peer CNDRA has exceeded its
configured limits.
The alarm is generated when the measurement RxAvgMsgSize reaches the
Peer CNDRA-wide engineering attributes, defined in the DaMpProfileParameters
corresponding to the MP profile being used. RxAvgMsgSize is defined as the size
of the average message processed by Peer CNDRA.
This alarm indicates that Peer CNDRA has encountered a message it can accept for
processing, but might not be able to continue processing if the message size grows
beyond the maximum supported message size. This growth can be due to standard
Diameter processing (for example, Route-Record additions to requests) or due to
custom processing (for example, Mediation modifying AVPs).
Severity:
Minor, Major, Critical
Instance:
N/A
HA Score:
Normal
OID:
eagleXgDiameterAvgMsgSizeLimitExceededNotify
Cause:
Alarm 22225 is raised when the measurement RxAvgMsgSize reaches the Peer
CNDRA-wide engineering attributes defined in the DaMpProfileParameters
corresponding to the MP profile being used.
RxAvgMsgSize is defined as the size of the average message processed by Peer
CNDRA. The alarm thresholds are configurable for:
• Average message size minor alarm onset threshold
• Average message size minor alarm abatement threshold
• Average message size major alarm onset threshold
• Average message size major alarm abatement threshold
• Average message size critical alarm onset threshold
• Average message size critical alarm abatement threshold
The severity of the alarm (Minor, Major, or Critical) is determined by the onset and
abatement thresholds of each severity level. When the average message size reaches
the respective alarm onset or abatement threshold, the alarm is raised within 3
seconds with severity Minor, Major, or Critical, based on the value reached by the
average message size.
Diagnostic Information:
This event indicates that Peer CNDRA has encountered a message that it can accept
for processing, but might not be able to continue processing if the message size grows
beyond the maximum supported message size. This growth can be due to standard
Diameter processing (for example, Route-Record additions to requests) or due to
custom processing (for example, Mediation modifying AVPs).
1. Recovery:
1. Examine the traffic coming from connected peers to see if any of them are
sending abnormally large messages, and look for any special processing rules
being applied by Peer CNDRA to that message.
2. The alarm thresholds are configurable for:
• Average message size minor alarm onset threshold
• Average message size minor alarm abatement threshold
• Average message size major alarm onset threshold
• Average message size major alarm abatement threshold
• Average message size critical alarm onset threshold
• Average message size critical alarm abatement threshold
The severity of the alarm (Minor, Major, or Critical) is determined by the onset
and abatement thresholds of each severity level. When the average message size
reaches the respective alarm onset or abatement threshold, the alarm is raised
within 3 seconds with severity Minor, Major, or Critical, based on the value
reached by the average message size.
3. If the problem persists, it is recommended to contact My Oracle Support.
Description:
The Diameter connection specified in the alarm instance is processing a higher than
normal ingress messaging rate.
Severity:
• Minor (if all of the following are true):
– The average ingress MPS rate the connection is processing has reached the
percentage of the connection's maximum ingress MPS rate configured for the
connection minor alarm threshold.
– The average ingress MPS rate the connection is processing has not yet
reached the percentage of the connection's maximum ingress MPS rate
configured for the connection major alarm threshold.
• Major (if the following is true):
– The average ingress MPS rate the connection is processing has reached the
percentage of the connection's maximum ingress MPS rate configured for the
connection major alarm threshold.
Instance:
The name of the Diameter connection as defined by the TransportConnection table.
HA Score:
Normal
OID:
eagleXgDiameterIngressMpsRateNotify
Cause:
Alarm #22328 is raised with Major severity when:
• The average ingress MPS rate that the connection is processing has reached
the percentage of the connection's maximum ingress MPS rate configured for the
connection major alarm threshold.
Diagnostic Information:
To get further information regarding this issue:
1. Examine the alarm log on the active Overseer server.
2. Get the Connection ID IcRate[Connection_Id] from Alarm Details and the
corresponding Connection Name from TransportConnectionTable on the active
Overseer server.
3. Investigate the connection's remote Diameter peer (the source of the ingress
messaging) to determine why it is sending traffic at an abnormally high rate.
1. Recovery:
1. The Diameter connection specified in the Alarm Instance field is processing
a higher than expected average ingress Diameter message rate. The alarm
thresholds for minor and major alarms are configured in the Capacity
Configuration Set used by the Diameter connection.
2. The message rate used for this alarm is an exponentially smoothed 30-second
average. This smoothing limits false alarms caused by short-duration spikes in the
ingress message rate.
3. If the alarm severity is minor, the alarm means the average ingress message rate
has exceeded the minor alarm threshold percentage of the maximum ingress MPS
configured for the connection.
4. If the alarm severity is major, the alarm means the average ingress message rate
has exceeded the major alarm threshold percentage of the maximum ingress MPS
configured for the connection.
5. This alarm is cleared when the average ingress message rate falls 5% below the
minor alarm threshold, or the connection becomes disabled or disconnected. This
alarm is downgraded from major to minor if the average ingress message rate falls
5% below the major alarm threshold.
6. If the average ingress message rate is determined to be unusually high,
investigate the connection's remote Diameter peer (the source of the ingress
messaging) to determine why it is sending traffic at an abnormally high rate;
otherwise, consider increasing either the connection's maximum ingress MPS rate
or the connection's alarm thresholds.
7. If the problem persists, it is recommended to contact My Oracle Support.
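The smoothing and threshold rules described in the recovery steps above can be
sketched as follows. This is an illustrative sketch under stated assumptions, not the
DSR algorithm: the function names are hypothetical, the smoothing factor is derived
from a 30-second time constant as one common way to build an exponentially smoothed
average, and "5% below the threshold" is interpreted as 95% of the threshold rate.

```python
import math
from typing import Optional


def smooth(previous_avg: float, sample: float, dt_seconds: float,
           window_seconds: float = 30.0) -> float:
    """One exponential-smoothing step (assumed form of the 30-second average)."""
    alpha = 1.0 - math.exp(-dt_seconds / window_seconds)
    return previous_avg + alpha * (sample - previous_avg)


def ingress_severity(current: Optional[str], avg_mps: float, max_mps: float,
                     minor_pct: float, major_pct: float) -> Optional[str]:
    """Return 'minor', 'major', or None for the smoothed ingress rate."""
    minor_rate = max_mps * minor_pct / 100.0
    major_rate = max_mps * major_pct / 100.0

    if avg_mps >= major_rate:
        return "major"
    # Assumption: a major alarm holds until the rate is 5% below its threshold.
    if current == "major" and avg_mps > 0.95 * major_rate:
        return "major"
    if avg_mps >= minor_rate:
        return "minor"
    # Assumption: the alarm clears only once the rate is 5% below the minor threshold.
    if current in ("minor", "major") and avg_mps > 0.95 * minor_rate:
        return "minor"
    return None
```

For example, with a hypothetical maximum of 1000 MPS and thresholds of 50% (minor) and
80% (major), a one-second spike to 1000 MPS on top of a 100 MPS average moves the
smoothed value only to about 130 MPS, so no alarm is raised.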
Description:
This alarm occurs when there is a ‘Critical’ number of IPFE connection alarms for
the network element.
The Alarm Thresholds are configurable using the Alarm Threshold Options tab on
Diameter, and then Configuration, and then System Options.
The IPFE connection may not be established for a variety of reasons. The operational
status of this connection is displayed on the GUI as unavailable and Alarm 22101
Connection Unavailable is raised.
When the number of unavailable IPFE connections exceeds the defined threshold,
IPFE Connection Failure Major/Critical Aggregation Alarm Threshold (default is
100/200), alarm 22349 is raised by the DSR.
Severity:
Major, Critical
Note:
The Critical threshold may be disabled by setting the Critical Threshold
to zero using the Alarm Threshold Options tab on Diameter, and then
Configuration, and then System Options.
Instance:
<NetworkElement>
HA Score:
Normal
OID:
eagleXgDiameterIPFEConnUnavailableThresholdReachedNotify
Cause:
The IPFE connection may not be established for a variety of reasons. The operational
status of this connection is displayed on the GUI as unavailable and Alarm 22101,
Connection Unavailable is raised.
When the number of unavailable IPFE connections exceeds the defined threshold,
IPFE Connection Failure Major/Critical Aggregation Alarm Threshold (default is
100/200), alarm 22349 is raised by the DSR.
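The aggregation rule can be illustrated with a short sketch. The function name is
hypothetical; the defaults of 100 and 200 and the zero-disables-Critical behavior are
taken from the text above, and "exceeds" is read as strictly greater than, which is
an assumption.

```python
from typing import Optional


def aggregation_severity(unavailable_connections: int,
                         major_threshold: int = 100,
                         critical_threshold: int = 200) -> Optional[str]:
    """Map a count of unavailable IPFE connections to an alarm severity."""
    # A Critical threshold of zero disables the Critical severity.
    if critical_threshold > 0 and unavailable_connections > critical_threshold:
        return "critical"
    if unavailable_connections > major_threshold:
        return "major"
    return None
```

With the default thresholds, 150 unavailable connections yield a Major alarm and 201
yield a Critical alarm; with the Critical threshold set to zero, 201 yields Major.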
Diagnostic Information:
Perform the following:
• Use Wireshark to capture the Diameter traffic on all MPs under the concerned
TSA list and the primary IPFE. Save the PCAP traffic capture generated by
Wireshark.
• Verify the connection configurations (IP addresses, ports, peer node, protocol) are
correct.
• Verify peer-connection configurations (protocol, remote/local IP address, remote/
local port) match local connection configurations.
• Verify the connection's transport protocol and/or port are not being blocked by a
network firewall or other ACL in the network path.
1. Recovery:
1. Navigate to Diameter, and then Maintenance, and then Connection to monitor
IPFE Connection status.
2. Confirm peer connection configuration (protocol, remote/local IP address, remote/
local port) matches the local connection configuration.
3. Confirm the connection’s transport protocol and/or port are not being blocked by a
network firewall or other ACL in the network path.
4. Verify the peers in the Route List are not under maintenance.
5. Use Wireshark to analyze all the captured PCAP data to find where the message
exchange is broken or failed. Wireshark should be the main tool used to diagnose
the unavailable connection.
6. Based on the PCAP file, correct the configuration if the issue is on the DSR side.
The alarm is cleared automatically when the number of unavailable IPFE
connections falls below the IPFE Connection Failure Critical/Major Aggregation
Alarm Threshold.
7. If the issue is on the DSR side or you are not sure, it is recommended to contact
My Oracle Support for assistance.
Description:
This alarm occurs when there is a critical number of fixed connection alarms for the
DraWorker.
Severity:
Major, Critical
Note:
The Critical threshold may be disabled by setting the Critical Threshold to
zero.
Instance:
<DraWorker-Hostname>
HA Score:
Normal
OID:
eagleXgDiameterConnUnavailableThresholdReachedNotify
Cause:
Alarm #22350 is raised when there is a critical number of fixed connection alarms
for the DraWorker.
Diagnostic Information:
To get further information regarding this issue:
1. Find all the connections with a problem for the specific MP.
2. For each connection with a problem, verify:
a. The remote host is reachable from the local MP: use ssh to log in to the MP
and ping the remote server IP address (if using an IP address) or the server
FQDN (if using an FQDN).
b. DNS is available: ping the DNS server IP address.
c. The FQDN resolves: use nslookup on the MP to check FQDN resolution.
3. If the above tests reveal the remote host is not reachable, then verify that there is
no network problem on the remote server.
4. If the remote server is reachable, then verify the processes are running correctly.
a. Verify the local Peer CNDRA process is running by checking the ps -ef
output.
b. Verify the local node is listening on the correct port by using netstat -na
and checking that the port is listening with the correct transport type (TCP
or SCTP).
c. Use Wireshark or tcpdump to capture traffic messages, and verify the
connection is established (confirm the handshake process is occurring for
SCTP or TCP).
5. If the port is not listening, or the handshake procedure is not occurring, then the
process or server may be experiencing problems.
6. If the connection/association is established, then ensure that the Diameter
handshake is occurring and is correct by checking the Diameter CEX message
exchange for information such as server FQDN, IP address, or applications
supported; mismatched information causes the connection to abort.
7. If the Diameter handshake is good, then observe the health of the Diameter
connection by verifying the DWR messages are answered correctly.
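Where a scripted check is more convenient than running the commands by hand, the name
resolution and TCP reachability parts of the steps above can be approximated with
Python's standard socket module. This is a rough sketch with hypothetical function
names. It covers TCP only (it does not test SCTP associations or the Diameter
CER/CEA exchange) and is not a replacement for the ping, nslookup, and netstat
checks described above.

```python
import socket
from typing import List


def resolve(host: str) -> List[str]:
    """Return the sorted unique addresses for host, or [] if resolution fails."""
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return []
    return sorted({info[4][0] for info in infos})


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port completes within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A call such as `tcp_reachable(peer_fqdn, 3868)` tests the standard Diameter port, where
`peer_fqdn` stands for the remote server name taken from the connection configuration.
An empty list from `resolve(peer_fqdn)` points to a DNS problem rather than a peer
problem.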
1. Recovery:
1. Check Fixed Connection status.
2. Confirm the peer connection configuration (protocol, remote/local IP address,
remote/local port) matches the local connection configuration.
3. Confirm the connection’s transport protocol and/or port are not being blocked by a
network firewall or other ACL in the network path.
4. Verify the peers in the Route List are not under maintenance.
5. Modify the value of Alarm Threshold Options if it is set too low.
6. It is recommended to contact My Oracle Support for assistance.
Description:
The COMCOL update sync log used by DB Table monitoring to synchronize Diameter
Connection Status among all DraWorker RT-DBs has overrun. The DraWorker's
Diameter Connection Status sharing table is automatically audited and re-synced to
correct any inconsistencies.
Severity:
Info
Instance:
<DbTblName>
Note:
<DbTblName> refers to the name of the Diameter Connection Status
Sharing Table where the Diameter Connection status inconsistency was
detected.
HA Score:
Normal
Throttle Seconds:
10
OID:
eagleXgDiameterDpiTblMonCbOnLogOverrunNotify
1. Recovery:
Description:
An unexpected error occurred during DB Table Monitoring.
Severity:
Info
Instance:
DpiTblMonThreadName
HA Score:
Normal
Throttle Seconds:
10
OID:
eagleXgDiameterDpiSldbMonAbnormalErrorNotify
1. Recovery:
Description:
Diameter Connection status inconsistencies exist among the DraWorkers in the Peer
CNDRA signaling NE.
Severity:
Critical
Instance:
<DbTblName> Name of the Diameter Connection Status Sharing Table where the
Diameter Connection status inconsistency was detected.
HA Score:
Normal
OID:
eagleXgDiameterConnStatusInconsistencyExistsNotify
Cause:
The data inconsistency might have been caused by one of the following:
• A network issue: the change log is not distributed to the destination MP.
• A process error: the update is disrupted while executing the change on the
destination MP.
Diagnostic Information:
No specific diagnostic information is required if the alarm clears in the next audit/sync.
Analyze the error log if the problem persists.
1. Recovery:
• No action necessary.
Note:
DraWorker's SLDB tables are automatically audited and re-synchronized
to correct inconsistencies after a log overrun has occurred. The
Automatic Data Integrity Check, which was introduced in cm6.2,
periodically scans almost the entire local IDB for integrity. The initial
default period is 30 minutes.
Description:
This alarm is generated when a DA-MP is brought into service and a DA-MP
configuration profile has not been assigned to the DA-MP during DSR installation/
upgrade procedures.
Severity:
Critical
Instance:
N/A
HA Score:
Normal
OID:
eagleXgDiameterDaMpProfileNotAssignedNotify
Cause:
Alarm #22960 is raised when a DA-MP is brought into service and a DA-MP
configuration profile has not been assigned to the DA-MP during DSR installation/
upgrade procedures.
Diagnostic Information:
Examine the error log in Main Menu > Alarms & Events.
1. Recovery:
1. From the DSR OAM GUI, navigate to Diameter Common, and then MPs, and
then Profile Assignments to assign a DA-MP profile to the DA-MP.
2. If the problem persists, it is recommended to contact My Oracle Support.
Description:
The available memory (in kilobytes) for the feature set is less than the required
memory (in kilobytes). This alarm is raised when a DraWorker is brought into service,
the DiameterMaxMessageSize value configured in the DpiOption table is greater than
16 KB, and the available memory on the DraWorker is less than 48 GB.
Severity:
Critical
Instance:
N/A
HA Score:
Normal
OID:
eagleXgDiameterInsufficientAvailMemNotify
Cause:
Alarm #22961 is raised when a DraWorker is brought into service, the
DiameterMaxMessageSize value configured in the DpiOption table is greater than
16 KB, and the available memory on the DraWorker is less than 48 GB.
Diagnostic Information:
N/A.
1. Recovery:
1. Make additional memory available on the DraWorker for the configured
DiameterMaxMessageSize.
2. If the problem persists, it is recommended to contact My Oracle Support.
Description
DSR Signaling Firewall is administratively Disabled
Severity
Minor
Instance
<System OAM name>
HA Score
Normal
OID
eagleXgDiameterFwDisabledNotify
1. Recovery
1. Navigate to the Signaling Firewall page (Diameter, and then Maintenance, and
then Signaling Firewall). Click the Enable button.
2. If the problem persists, it is recommended to contact My Oracle Support.
Description
DSR Signaling Firewall Operational status is degraded.
Severity
Minor
Instance
<DA-MP name>
HA Score
Normal
OID
eagleXgDiameterFwDegradedNotify
1. Recovery
1. Analyze event 25609 - Firewall Configuration Error encountered to identify the
error(s) and the DA-MP which reported the error(s).
2. Analyze any platform alarms on the identified DA-MP. Follow the procedures to
clear the platform alarms on the identified DA-MP.
3. Disable the Signaling Firewall from the Signaling Firewall page (Diameter, and
then Maintenance, and then Signaling Firewall).
4. If the alarm persists, restart the application on the identified DA-MP from the
Status & Manage screen on the active Network OAM GUI.
5. If the problem is still unresolved, it is recommended to contact My Oracle Support
for assistance.
Description
Firewall Configuration Error encountered.
Severity
Info
Instance
<DA-MP name>
HA Score
Normal
Throttle Seconds
N/A
OID
eagleXgDiameterFwDisabledNotify
1. Recovery
Description
DSR Signaling Firewall configuration inconsistency detected
Severity
Minor
Instance
<DA-MP name>
HA Score
Normal
OID
eagleXgDiameterFwDegradedNotify
1. Recovery
1. One possible cause could be manual changes in the "01dsr" domain of Linux
firewall configuration on the DA-MP server. If so, the manual configuration should
be rolled back.
2. If the problem persists, it is recommended to contact My Oracle Support for
assistance.
Description
DRMP attributes of the ETG are not in sync with the remote ETGs associated with the same ETL.
Severity
Minor
Instance
<ETG name>
HA Score
Normal
OID
eagleXgDiameterEtgInvalidDRMPAttrbsNotify
1. Recovery
Description
Connection was rejected due to the DraWorker exceeding its connection or ingress
MPS capacity
Severity
Major
Instance
pingAllLivePeers
HA Score
Normal
OID
eagleXgDiameterPingAllLivePeerErrorNotify
1. Recovery
1. Check /var/log/messages and /var/log/cron for more information.
2. Run pingAllLivePeers -v and pingAllLivePeers -h as root on the command
line.
3. If the problem persists, it is recommended to contact My Oracle Support for
assistance.
Description:
Peer Node Alarm Group Threshold Reached. This alarm occurs when there are a
number of minor, major, or critical Peer Node alarms for a single Peer Node Alarm
Group.
Severity:
Minor, Major, and Critical
Instance:
<PeerNodeAlarmGroupName>
HA Score:
Normal
Throttle Seconds:
0 (zero)
OID:
eagleXgDiameterPeerNodeAlarmGroupThresholdReachedNotify
Description:
Connection Alarm Group Threshold Reached. This alarm occurs when there are a
number of minor, major, or critical Connection alarms for a single Connection Alarm
Group.
Severity:
Minor, Major, and Critical
Instance:
<ConnectionAlarmGroupName>
HA Score:
Normal
Throttle Seconds:
0 (zero)
OID:
eagleXgDiameterConnectionAlarmGroupThresholdReachedNotify
Description
Invalid Shared TTG Reference
Severity
Minor
Instance
<Route List Name>&<Route Group Name>&<TTG SG Name>&<TTG Name>
HA Score
Normal
OID
eagleXgDiameterDoicInvalidSharedTtgRefNotify
1. Recovery
1. For the Route List named in the alarm instance, edit its configuration and delete
the association to the non-existent Shared TTG.
2. Then, if desired, re-create the Shared TTG at its host site and re-add the
association to the Route List/Route Group.
Note:
Because the association of a TTG to the RL/RG is based on an internal ID (not
the TTG name), it is not valid to leave the original association in the Route
List configuration and simply create a new Shared TTG with the original name.
This does not work, because the internal ID of the new TTG differs from that of
the original TTG even though the TTG name is the same.
Range Based Address Resolution (RBAR) Alarms and Events (22400-22424)
Description
Invalid Internal Overseer Server Group Designation
Severity
Minor
Instance
<Route List Name>&<Route Group Name>&<TTG SG Name>&<TTG Name>
HA Score
Normal
OID
eagleXgDiameterDoicInvalidInternalSoamSgDesignationNotify
1. Recovery
• For the Route List named in the alarm instance, edit its configuration and delete
the association to the Shared TTG. This will clear the alarm. The association can
simply be re-added to restore integrity to the configuration.
Description:
A message received was rejected because of a decoding failure.
Severity:
Info
Instance:
<MPName>
HA Score:
Normal
Throttle Seconds:
10
OID:
eagleXgDiameterRbarMsgRejectedDecodingFailureNotify
1. Recovery:
• While parsing the message, the message content was inconsistent with the
Message Length in the message header. These protocol violations can be caused
by the originator of the message (identified by the Origin-Host AVP in the
message) or the peer who forwarded the message to this node.
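The kind of inconsistency described above can be illustrated with a minimal length
check against the Diameter header layout defined in RFC 6733 (a 20-byte header whose
first byte is the version and whose next three bytes are the Message Length). The
function names are hypothetical, and this sketch is not the RBAR decoder; it only
shows what "content inconsistent with the Message Length" means for a raw message.

```python
DIAMETER_HEADER_LEN = 20  # bytes, per RFC 6733
DIAMETER_VERSION = 1


def declared_length(message: bytes) -> int:
    """Return the Message Length field from a raw Diameter message."""
    if len(message) < DIAMETER_HEADER_LEN:
        raise ValueError("message is shorter than a Diameter header")
    if message[0] != DIAMETER_VERSION:
        raise ValueError("unsupported Diameter version")
    # Bytes 1..3 hold the 24-bit Message Length in network byte order.
    return int.from_bytes(message[1:4], "big")


def length_is_consistent(message: bytes) -> bool:
    """True if the header's Message Length matches the bytes actually present."""
    try:
        length = declared_length(message)
    except ValueError:
        return False
    # The Message Length covers header plus padded AVPs, so it is a multiple of 4.
    return length == len(message) and length % 4 == 0
```

A message that fails this check would be worth tracing back to the node named in its
Origin-Host AVP or to the peer that forwarded it.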
Description:
A message could not be routed because the Diameter Application ID is not supported.
Severity:
Info
Instance:
<MPName>
HA Score:
Normal
Throttle Seconds:
10
OID:
eagleXgDiameterRbarUnknownApplIdNotify
1. Recovery:
1. The Peer CNDRA Relay Agent forwarded a Request message to the address
resolution application which contained an unrecognized Diameter Application ID
in the header. Either a Peer CNDRA Relay Agent application routing rule is
misprovisioned or the Application ID is not provisioned in the RBAR routing
configuration.
2. Check the currently provisioned Diameter Application IDs.
3. Check the currently provisioned Application Routing Rules.
Description:
A message could not be routed because the Diameter Command Code in the ingress
Request message is not supported and the Routing Exception was configured to send
an Answer response.
Severity:
Info
Instance:
<MPName>
HA Score:
Normal
Throttle Seconds:
10
OID:
eagleXgDiameterRbarUnknownCmdCodeNotify
1. Recovery:
1. The ordered pair (Application ID, Command Code) is not provisioned in the Address
Resolutions routing configuration.
2. Check the currently provisioned Application IDs and Command Codes.
Description:
A message could not be routed because no address AVPs were found in the
message and the Routing Exception was configured to send an Answer response.
Severity:
Info
Instance:
<AddressResolution>
HA Score:
Normal
Throttle Seconds:
10
OID:
eagleXgDiameterRbarNoRoutingEntityAddrAvpNotify
1. Recovery:
1. This may be a normal event or an event associated with misprovisioned address
resolution configuration. If this event is considered abnormal, validate which AVPs
are configured for routing with the Application ID and Command Code.
2. Check the currently provisioned Application IDs and Command Codes.
Description:
A message could not be routed because none of the address AVPs contained a valid
address and the Routing Exception was configured to send an Answer response.
Severity:
Info
Instance:
<AddressResolution>
HA Score:
Normal
Throttle Seconds:
10
OID:
eagleXgDiameterRbarNoValidRoutingEntityAddrFoundNotify
1. Recovery:
1. This may be a normal event or an event associated with misprovisioned address
resolution configuration. If this event is considered abnormal, validate which AVPs
are configured for routing with the Application ID and Command Code.
2. Check the currently provisioned Application IDs and Command Codes.
Description:
A message could not be routed because a valid address was found that did not match
an individual address or address range associated with the Application ID, Command
Code, and Routing Entity Type, and the Routing Exception was configured to send an
Answer response.
Severity:
Info
Instance:
<AddressResolution>
HA Score:
Normal
Throttle Seconds:
10
OID:
eagleXgDiameterRbarAddrMismatchWithProvisionedAddressNotify
1. Recovery:
1. An individual address or address range associated with the Application ID,
Command Code and Routing Entity Type may be missing from the RBAR
configuration. Validate which address and address range tables are associated
with the Application ID, Command Code and Routing Entity Type.
2. View the currently provisioned Application IDs, Command Codes, and Routing
Entity Types by selecting RBAR, and then Configuration, and then Address
Resolutions.
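The lookup behind this event can be pictured with a short sketch. It is not the RBAR implementation: the table layout, the function name `resolve_destination`, the sample addresses, and the destination names are all assumptions made for illustration, as is the choice to check individual addresses before address ranges. A return value of `None` corresponds to the condition this event reports, a valid address that matches nothing provisioned.

```python
from bisect import bisect_right

# Hypothetical provisioning data for one (Application ID, Command Code,
# Routing Entity Type) combination. Names and values are illustrative only.
INDIVIDUAL_ADDRESSES = {"19195551234": "hss-a"}
# Non-overlapping ranges sorted by start address: (start, end, destination).
ADDRESS_RANGES = [
    (19195550000, 19195559999, "hss-b"),
    (19196000000, 19196999999, "hss-c"),
]
RANGE_STARTS = [start for start, _end, _dest in ADDRESS_RANGES]


def resolve_destination(address: str):
    """Return the destination for a digit-string address, or None if no match."""
    # Assumption: an exact individual-address match wins over a range match.
    if address in INDIVIDUAL_ADDRESSES:
        return INDIVIDUAL_ADDRESSES[address]
    value = int(address)
    # Find the last range whose start is <= value, then check its end.
    index = bisect_right(RANGE_STARTS, value) - 1
    if index >= 0:
        _start, end, destination = ADDRESS_RANGES[index]
        if value <= end:
            return destination
    return None  # valid address, but nothing provisioned matches it
```

If a lookup like this returns no match for an address that should be routable, the missing individual address or range is the item to add under Address Resolutions.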
Description:
A message could not be routed because the internal "Request Message Queue"
to the Peer CNDRA Relay Agent was full. This should not occur unless the MP is
experiencing local congestion as indicated by Alarm-ID 22200 - MP CPU Congested.
Severity:
Info
Instance:
<MPName>
HA Score:
Normal
Throttle Seconds:
10
OID:
eagleXgDiameterRbarRoutingAttemptFailureInternalResExhNotify
1. Recovery:
Description:
A message could not be routed because an internal address resolution run-time
database inconsistency was encountered.
Severity:
Info
Instance:
<MPName>
HA Score:
Normal
Throttle Seconds:
10
OID:
eagleXgDiameterRbarRoutingFailureInternalDbInconsistencyNotify
1. Recovery:
Description:
Address Range Lookup could not be performed for the Local Identifier component of
the Routing Entity Type External Identifier. Address Resolution used the Destination
found using Domain Identifier.
Severity:
Info
Instance:
xxx
HA Score:
Normal
OID:
xxx
1. Recovery:
Generic Application Alarms and Events (22500-22599)
Note:
These alarms are generic across the various Peer CNDRA applications with
some details varying depending on the application generating the alarm.
Description:
Peer CNDRA application is unable to process any messages because it is
unavailable.
Severity:
Critical
Instance:
<Peer CNDRA Application Name>
Note:
The value for Peer CNDRA Application Name varies depending on the Peer
CNDRA application generating the alarm such as RBAR. Use the name that
corresponds to the specific Peer CNDRA application in use.
HA Score:
Normal
OID:
eagleXgDiameterCndraApplicationUnavailableNotify
Cause:
Alarm #22500 is raised:
• When the Peer CNDRA application completes initialization and determines its
operational status is unavailable after changing its admin state from disabled to
enabled.
• When the Peer CNDRA application is in enabled state and the following Peer
CNDRA application operational status changes occur:
– Available → Unavailable
– Degraded → Unavailable
This alarm is cleared:
• When Peer CNDRA application is in enabled state and the following Peer CNDRA
application operational status changes occur:
– Unavailable → Available
– Unavailable → Degraded
• If the Diameter process is stopped.
• If the Peer CNDRA application admin state changes from Enabled to Disabled.
Diagnostic Information:
• A Peer CNDRA application's operational status becomes unavailable when either the
Admin State is set to Disable with the Forced Shutdown option, or the Admin
State is set to Disable with the Graceful Shutdown option and the Graceful
Shutdown timer expires.
• A Peer CNDRA application can also become unavailable when it reaches
Congestion Level 3 if enabled.
Note:
This alarm is NOT raised when the Peer CNDRA application is shutting
down gracefully or application is in Disabled state. Only the Peer CNDRA
Application operational status is changed to unavailable.
1. Recovery:
1. Display and monitor the Peer CNDRA application status. Verify the Admin State is
set as expected.
2. A Peer CNDRA application's operational status becomes unavailable when either the
Admin State is set to disable with the Forced Shutdown option, or the Admin State
is set to disable with the Graceful Shutdown option and the Graceful Shutdown
timer expires.
3. If the problem persists, it is recommended to contact My Oracle Support.
Description:
Unable to forward requests to the Peer CNDRA application because it is degraded.
Severity:
Major
Instance:
<Peer CNDRA Application Name>
Note:
The value for Peer CNDRA Application Name varies depending on the Peer
CNDRA application generating the alarm such as RBAR. Use the name that
corresponds to the specific Peer CNDRA application in use.
HA Score:
Normal
OID:
eagleXgDiameterCndraApplicationDegradedNotify
Cause:
Alarm #22501 is raised when the Peer CNDRA application is in enabled state and
the following Peer CNDRA Application Operational Status changes occur:
• Available → Degraded
• Unavailable → Degraded
This alarm is cleared when the Peer CNDRA application is in enabled state and the
following Peer CNDRA Application Operational Status changes occur:
• Degraded → Available
• Degraded → Unavailable
Diagnostic Information:
• A Peer CNDRA application becomes degraded when the Peer CNDRA
application becomes congested if enabled. This alarm is NOT raised when the
Peer CNDRA application is shutting down gracefully or application is in the
disabled state.
• Verify the admin state is set as expected. Check the Event History logs for
additional DIAM events or alarms from this MP server.
1. Recovery:
1. Check the Peer CNDRA application status. Verify the Admin State is set as
expected.
2. A Peer CNDRA application becomes degraded when the Peer CNDRA application
becomes congested, if enabled.
Note:
This alarm is NOT raised when the Peer CNDRA application is shutting
down gracefully or application is in the disabled state. Only the Peer
CNDRA application operational status is changed to unavailable.
3. Check the Event History logs for additional DIAM events or alarms for this MP
server.
4. If the problem persists, it is recommended to contact My Oracle Support.
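The raise and clear conditions for alarms 22500 and 22501 can be summarized as a small state model. The sketch below is a simplification under stated assumptions: it derives the expected alarm from the current admin state and operational status rather than from individual transitions, and it ignores the graceful shutdown and Diameter process stop cases. The names `OpStatus`, `expected_alarms`, and `alarm_changes` are hypothetical.

```python
from enum import Enum


class OpStatus(Enum):
    AVAILABLE = "Available"
    DEGRADED = "Degraded"
    UNAVAILABLE = "Unavailable"


def expected_alarms(admin_enabled: bool, status: OpStatus) -> set:
    """Return the alarm IDs expected to be asserted for the given state."""
    # Both alarms apply only while the application admin state is enabled.
    if not admin_enabled:
        return set()
    if status is OpStatus.UNAVAILABLE:
        return {22500}
    if status is OpStatus.DEGRADED:
        return {22501}
    return set()


def alarm_changes(admin_enabled: bool, old: OpStatus, new: OpStatus):
    """Return (raised, cleared) alarm IDs for an operational status change."""
    before = expected_alarms(admin_enabled, old)
    after = expected_alarms(admin_enabled, new)
    return after - before, before - after
```

For example, a change from Degraded to Unavailable while enabled yields `({22500}, {22501})`: alarm 22501 clears and alarm 22500 is raised, which matches the transitions listed for the two alarms.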
Description:
The Peer CNDRA Application Request Message Queue Utilization is approaching its
maximum capacity.
Severity:
Minor, Major, Critical
Instance:
<Metric ID>, <Peer CNDRA Application Name>
Note:
The value for Metric ID for this alarm varies (such as
RxRbarRequestMsgQueue) depending on which Peer CNDRA application
generates the alarm (such as RBAR). Use the ID that corresponds to the
specific Peer CNDRA application in use.
Note:
The value for Peer CNDRA Application Name will vary depending on the
Peer CNDRA application generating the alarm (such as RBAR). Use the
name that corresponds to the specific Peer CNDRA application in use.
HA Score:
Normal
OID:
eagleXgDiameterCndraApplicationRequestQueueUtilNotify
Cause:
Alarm #22502 is raised when the Peer CNDRA Application Request Message Queue
Utilization is approaching its maximum capacity. If this problem persists and the
queue reaches 100% utilization, all new ingress Request messages are discarded.
Diagnostic Information:
To get further information regarding this issue:
1. Examine the alarm log on the active Overseer server.
2. This alarm should not normally occur when no other congestion alarms are
asserted.
1. Recovery:
1. Display and monitor the Peer CNDRA application status. Verify the Admin State is
set as expected.
The Peer CNDRA application's Request Message Queue Utilization is
approaching its maximum capacity. This alarm should not normally occur when
no other congestion alarms are asserted.
2. Application Routing might be mis-configured and is sending too much traffic to the
Peer CNDRA Application. Verify the configuration.
3. If no additional congestion alarms are asserted, the Peer CNDRA application task
might be experiencing a problem that is preventing it from processing messages
from its Request Message Queue. Examine the Alarm log on the active Overseer
server.
4. If the problem persists, it is recommended to contact My Oracle Support.
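To make the relationship between queue utilization and the Minor, Major, and Critical severities concrete, here is a minimal sketch. The threshold percentages are placeholders chosen for illustration; this section does not state the real onset values, and the sketch ignores any separate abatement thresholds the system may use. The names `THRESHOLDS`, `queue_severity`, and `discards_new_requests` are hypothetical.

```python
# Hypothetical onset thresholds (percent utilization). The real values are
# not stated in this section; these numbers are placeholders.
THRESHOLDS = [("Critical", 95.0), ("Major", 80.0), ("Minor", 60.0)]


def queue_severity(queued: int, capacity: int):
    """Map queue occupancy to a severity name, or None below all thresholds."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    utilization = 100.0 * queued / capacity
    for severity, onset in THRESHOLDS:  # checked from highest to lowest
        if utilization >= onset:
            return severity
    return None


def discards_new_requests(queued: int, capacity: int) -> bool:
    """At 100% utilization, new ingress Request messages are discarded."""
    return queued >= capacity
```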
Description:
The Peer CNDRA Application Answer Message Queue Utilization is approaching its
maximum capacity.
Severity:
Minor, Major, Critical
Instance:
<Metric ID>, <Peer CNDRA Application Name>
Note:
The value for Metric ID for this alarm varies (such as
RxRbarAnswerMsgQueue) depending on which Peer CNDRA application
generates the alarm (such as RBAR). Use the ID that corresponds to the
specific Peer CNDRA application in use.
Note:
The value for the Peer CNDRA Application Name varies depending on the
Peer CNDRA application generating the alarm (such as RBAR). Use the
name that corresponds to the specific Peer CNDRA application in use.
HA Score:
Normal
OID:
eagleXgDiameterCndraApplicationAnswerQueueUtilNotify
Cause:
Alarm #22503 is raised when the Peer CNDRA Application Answer Message Queue
Utilization is approaching its maximum capacity. If this problem persists and the
queue reaches 100% utilization, all new ingress Answer messages are discarded.
Diagnostic Information:
To get further information regarding this issue:
1. Examine the alarm log on the active Overseer server.
2. This alarm should not occur when no other congestion alarms are asserted.
1. Recovery:
1. Application Routing might be mis-configured and is sending too much traffic to the
Peer CNDRA application. Verify the configuration.
2. If no additional congestion alarms are asserted, the Peer CNDRA application task
might be experiencing a problem that is preventing it from processing messages
from its Answer Message Queue. Examine the Alarm log on the active Overseer
server.
3. If the problem persists, it is recommended to contact My Oracle Support.
Description:
The ingress message rate for the Peer CNDRA application is exceeding its
engineered traffic handling capacity.
Severity:
Minor, Major, Critical
Instance:
<Metric ID>, <Peer CNDRA Application Name>
Note:
The value for metric ID for this alarm varies (such as RxRbarMsgRate)
depending on which Peer CNDRA application generates the alarm (such as
RBAR). Use the ID that corresponds to the specific Peer CNDRA application
in use.
Note:
The value for Peer CNDRA Application Name varies depending on the Peer
CNDRA application generating the alarm (such as RBAR). Use the
name that corresponds to the specific Peer CNDRA application in use.
HA Score:
Normal
OID:
eagleXgDiameterCndraApplicationIngressMsgRateNotify
Cause:
Alarm #22504 is raised when the ingress message rate for the Peer CNDRA
Application is approaching or exceeding its engineered traffic handling capacity.
Diagnostic Information:
For further information regarding this alarm:
1. Examine the alarm log on the active Overseer server.
2. The average ingress message rate utilization on an MP server of the Peer CNDRA
application is approaching or exceeding its engineered traffic handling capacity.
1. Recovery:
1. Application routing may be mis-configured and is sending too much traffic to the
Peer CNDRA application. Verify the configuration.
2. There may be an insufficient number of MPs configured to handle the network
load. Monitor the ingress traffic rate of each MP.
3. If MPs are in a congestion state, then the offered load to the server site is
exceeding its capacity.
4. If the problem persists, it is recommended to contact My Oracle Support.
Description:
Peer CNDRA Application Admin state was changed to ‘enabled’.
Severity:
Info
Instance:
<Peer CNDRA Application Name>
HA Score:
Normal
Throttle Seconds:
0 (zero)
OID:
eagleXgDiameterCndraApplicationEnabledNotify
1. Recovery:
• No action required.
Description:
Peer CNDRA Application Admin state was changed to ‘disabled’.
Severity:
Info
Instance:
<Peer CNDRA Application Name>
HA Score:
Normal
Throttle Seconds:
0 (zero)
OID:
eagleXgDiameterCndrapplicationDisabledNotify
1. Recovery:
• No action required.
Full Address Based Resolution (FABR) Alarms and Events (22600-22640)
Description:
Message received was rejected because of a decoding failure. While parsing the
message, the message content was inconsistent with the "Message Length" in the
message header. These protocol violations can be caused by the originator of the
message (identified by the Origin-Host AVP in the message), the peer who forwarded
the message to this node, or any intermediate node that modifies the message.
Severity:
Info
Instance:
<MPName>
HA Score:
Normal
Throttle Seconds:
10
OID:
eagleXgDiameterFabrMsgRejectedDecodingFailureNotify
1. Recovery:
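As an illustration of the length inconsistency this event describes, the sketch below decodes the fixed 20-byte Diameter header defined in RFC 6733 and compares the Message Length field with the number of bytes actually received. It is a minimal sketch that assumes one complete message is held in a single buffer. The function name `parse_header` and the returned dictionary keys are hypothetical and do not reflect how the FABR application decodes messages.

```python
import struct

DIAMETER_HEADER_LEN = 20  # fixed header size in RFC 6733


def parse_header(data: bytes) -> dict:
    """Decode the fixed Diameter header and check the Message Length field."""
    if len(data) < DIAMETER_HEADER_LEN:
        raise ValueError("buffer is shorter than a Diameter header")
    # Word 0: version (8 bits) + Message Length (24 bits).
    # Word 1: command flags (8 bits) + Command Code (24 bits).
    # Words 2-4: Application-ID, Hop-by-Hop and End-to-End identifiers.
    word0, word1, app_id, hop_by_hop, end_to_end = struct.unpack(
        "!IIIII", data[:DIAMETER_HEADER_LEN]
    )
    length = word0 & 0xFFFFFF
    if length % 4 != 0:
        raise ValueError("Message Length is not a multiple of 4")
    if length != len(data):
        raise ValueError(
            f"Message Length says {length} bytes but {len(data)} were received"
        )
    return {
        "version": word0 >> 24,
        "length": length,
        "is_request": bool((word1 >> 24) & 0x80),  # 'R' bit of the flags
        "command_code": word1 & 0xFFFFFF,
        "application_id": app_id,
        "hop_by_hop": hop_by_hop,
        "end_to_end": end_to_end,
    }
```

The same header fields, Application-ID and Command Code, are the values that the unknown Application ID and unknown Command Code events refer to.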
Description:
Message could not be routed because the Diameter Application ID is not supported.
A Request message was forwarded to the FABR application which contained an
unrecognized Diameter Application ID in the header. Either an application routing rule
is mis-provisioned or the Application ID is not provisioned in the FABR configuration.
Severity:
Info
Instance:
<MPName>
HA Score:
Normal
Throttle Seconds:
10
OID:
eagleXgDiameterFabrUnknownApplIdNotify
1. Recovery:
1. The currently provisioned Application Routing Rules can be viewed using
Diameter, and then Configuration, and then Application Route Tables.
2. The currently provisioned Diameter Application IDs can be viewed using FABR,
and then Configuration, and then Applications Configuration.
3. It is recommended to contact My Oracle Support for assistance.
Description:
Message could not be routed because the Diameter Command Code in the ingress
Request message is not supported and the Routing Exception was configured to send
an Answer response.
Either an application routing rule is mis-provisioned or the Command Code is not
provisioned in the FABR configuration.
Severity:
Info
Instance:
<MPName>
HA Score:
Normal
Throttle Seconds:
10
OID:
eagleXgDiameterFabrUnknownCmdCodeNotify
1. Recovery:
1. The currently provisioned Application Routing Rules can be viewed using
Diameter, and then Configuration, and then Application Route Tables.
2. The currently provisioned Diameter Application IDs can be viewed using FABR,
and then Configuration, and then Address Resolutions.
3. It is recommended to contact My Oracle Support for assistance.
Description:
Message could not be routed because no address AVPs were found in the message
and the Routing Exception was configured to send an Answer response.
Severity:
Info
Instance:
<AddrResolution>
HA Score:
Normal
Throttle Seconds:
10
OID:
eagleXgDiameterFabrNoRoutingEntityAddrAvpNotify
1. Recovery:
1. If this event is considered abnormal, then validate which AVPs are configured
for routing with the Application ID and Command Code using FABR, and then
Configuration, and then Address Resolutions.
2. The currently provisioned Application Routing Rules can be viewed using
Diameter, and then Configuration, and then Application Route Tables.
3. It is recommended to contact My Oracle Support for assistance.
Description:
No valid User Identity Address is found in the configured AVPs contained in the
ingress message. FABR searches for a valid Routing Entity address in the ingress
Diameter message based on a Routing Entity Preference List assigned to the ordered
pair (Application ID, Command Code) via user-defined configuration. This event is
raised if a valid Routing Entity address cannot be found using any of the Routing
Entity types in the Routing Entity Preference List and if the Routing Exception Action
associated with this failure is set to Send Answer response.
Severity:
Info
Instance:
<AddrResolution>
HA Score:
Normal
Throttle Seconds:
10
OID:
eagleXgDiameterFabrNoValidUserIdentityAddrFoundNotify
Cause:
FABR searches for a valid Routing Entity address in the Ingress Diameter Message
based on a Routing Entity Preference List assigned to the ordered pair (Application
ID, Command Code) via user-defined configuration. This event is raised if a valid
Routing Entity address cannot be found using any of the Routing Entity types in
the Routing Entity Preference List and if the Routing Exception Action associated
with this failure is set to Send Answer Response.
Diagnostic Information:
Alarm #22604 is raised if FABR is unable to decode the user-configured AVPs from the
Ingress Diameter Message and yield a routing entity address. This may be a normal
event or an event associated with misprovisioned address resolution configuration.
1. Recovery:
1. If this event is considered abnormal, then navigate to FABR, and then
Configuration, and then Address Resolutions to validate which AVPs are
configured for routing with the Application ID and Command Code.
2. Navigate to Diameter, and then Configuration, and then Application Route
Tables to view the currently provisioned Application Routing rules.
3. It is recommended to contact My Oracle Support for assistance.
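The search order described for this event can be sketched as follows. Everything here is an assumption made for illustration: the configuration dictionaries, the mapping of Routing Entity types to AVP names, the digit-string validity rule, and the function names. The message is modeled as a plain dictionary of already-decoded AVP values, which is not how a real Diameter stack presents them. Application ID 16777251 (S6a) and Command Code 316 (Update-Location) are used only as sample values.

```python
# Hypothetical configuration: ordered pair -> Routing Entity Preference List.
PREFERENCE_LISTS = {(16777251, 316): ["IMSI", "MSISDN"]}

# Hypothetical mapping from Routing Entity type to the AVPs searched for it.
ENTITY_AVPS = {"IMSI": ["User-Name"], "MSISDN": ["MSISDN"]}


def is_valid_address(value) -> bool:
    """Assumed validity rule: a string of 5 to 15 decimal digits."""
    return isinstance(value, str) and value.isdigit() and 5 <= len(value) <= 15


def find_routing_entity(app_id: int, command_code: int, avps: dict):
    """Return (entity_type, address), or None if no valid address is found."""
    # Walk the preference list in order and stop at the first valid address.
    for entity_type in PREFERENCE_LISTS.get((app_id, command_code), []):
        for avp_name in ENTITY_AVPS[entity_type]:
            value = avps.get(avp_name)
            if is_valid_address(value):
                return entity_type, value
    # No Routing Entity type produced a valid address. The Routing Exception
    # Action then decides what happens; this event is generated when that
    # action is to send an Answer response.
    return None
```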
Description:
Message could not be routed because the valid user identity address extracted from
the message did not resolve to a destination address. The Routing Exception was
configured to send an Answer response. Verify the provisioning in the address
resolution table and the data provided in the SDS corresponding to this address
resolution entry.
The FABR address resolution table entry may be misconfigured or the destination
address associated with User Identity address from the message and the destination
type configured in the address resolution table may be missing from the address
mapping configuration. The destination address associated with User Identity address
derived may be missing from the address mapping configuration on DP/SDS.
Severity:
Info
Instance:
<AddrResolution>
HA Score:
Normal
Throttle Seconds:
10
OID:
eagleXgDiameterFabrNoAddrFoundAtDpNotify
1. Recovery:
1. Validate the address resolution table entry and verify that a valid destination
address is associated with the user identity address by using DP configuration.
Description:
FABR application receives service notification indicating Database (DP) or DB
connection (ComAgent) Errors (DP timeout, errors or ComAgent internal errors) for
the sent database query.
Severity:
Info
Instance:
<MPName>
HA Score:
Normal
Throttle Seconds:
10
OID:
eagleXgDiameterFabrDpErrorsNotify
1. Recovery:
Description:
Message could not be routed because the internal “Request Message Queue” to the
DSR Relay Agent was full.
Severity:
Info
Instance:
<MPName>
HA Score:
Normal
Throttle Seconds:
10
OID:
eagleXgDiameterFabrRoutingAttemptFailureDrlQueueExhNotify
1. Recovery:
Description:
FABR could not send a database query either because ComAgent reported a DP
congestion level of 2 or 3 (CL=2 or CL=3), or because an abatement period is in progress.
Severity:
Info
Instance:
<MPName>
HA Score:
Normal
Throttle Seconds:
10
OID:
eagleXgDiameterFabrDpCongestedNotify
1. Recovery:
Description:
Database queries could not be sent because the database connection (ComAgent)
queue was full.
Severity:
Info
Instance:
<MPName>
HA Score:
Normal
Throttle Seconds:
10
OID:
eagleXgDiameterFabrDbConnectionExhNotify
1. Recovery:
Description:
The FABR application received a status notification indicating that the DP congestion
state changed or that the DP congestion abatement time period completed.
Severity:
Info
Instance:
<MPName>
HA Score:
Normal
Throttle Seconds:
0 (zero)
OID:
eagleXgDiameterFabrDpCongestionStateChangeNotify
1. Recovery:
Description:
Message could not be routed because the valid User Identity Address extracted from
the Diameter request belongs to a blacklisted subscriber.
Severity:
Info
Instance:
<AddrResolution>
HA Score:
Normal
Throttle Seconds:
10
OID:
eagleXgDiameterFabrBlacklistedSubscriberNotify
1. Recovery:
1. Use the DP configuration to check whether the User Identity address is blacklisted.
The destination address associated with the derived User Identity address is
blacklisted in the address mapping configuration on DDR.
2. If the problem persists, it is recommended to contact My Oracle Support.
Description:
The FABR Application's DP Response Message Queue Utilization is approaching its
maximum capacity.
Severity:
Minor, Major, Critical
Instance:
RxFabrDpResponseMsgQueue, FABR
HA Score:
Normal
OID:
eagleXgDiameterFabrAppDpResponseMessageQueueUtilizationNotify
1. Recovery:
1. This alarm may occur due to persistent overload conditions with respect to
database response processing.
2. It is recommended to contact My Oracle Support for assistance.
Description:
FABR application is unavailable and DSR cannot successfully process FABR traffic.
Severity:
Critical
Instance:
Full Address Based Resolution
HA Score:
Normal
OID:
eagleXgDiameterComAgentRegistFailNotify
Cause:
This alarm is raised when ComAgent fails to register:
• Service with DPService.
– The DPService routed service entry is missing from the ComAgent table.
– FABR routing service has been enabled on the MP blade, but DP routed
service entry is not present in the ComAgtRoutedService table on MP blade.
• ServiceNotificationHandler after the successful ComAgent service registration.
Diagnostic Information:
1. Check the ComAgtRoutedService table entries by running the following command
on the MP1 command prompt:
iqt -p -s'|' ComAgtRoutedService
2. An entry corresponding to the DP routed service used by FABR must be present with
id=11 and name=DPService. For example:
11|DPService|No|Yes|0
1. Recovery:
1. Check the ComAgtRoutedService table entries, by running the below command on
the MP1 command prompt.
iqt -p -s'|' ComAgtRoutedService
2. Entry corresponding to the DP routed service used by FABR must be present with
id=11 and name=DPService. For example:
11|DPService|No|Yes|0
3. Disable the FABR application to clear the ComAgent Service Registration Failure
alarm.
4. Check the ComAgtRoutedService table on NOAM server blade to identify if there
is any mismatch with the MP blade.
5. Check the ComAgtRoutedService table on SOAM server blade to identify if there
is any mismatch with the MP blade (in case of 3-tier architecture).
6. If DP routed service entry is not present, then add it to the MP blade using the ivi
command (after turning off the inetrep using pm.set off inetrep), then restart the
inetrep process.
Afterwards, restart the DSR process by running pm.set off dsr; followed
by pm.set on dsr; on the MP blade command prompt.
7. It is recommended to contact My Oracle Support for assistance.
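The check in the first two Recovery steps can be scripted once the output of the iqt command has been captured as text. The sketch below is a minimal example under that assumption. The function name `has_dp_routed_service` is hypothetical, and only the first two columns (id and name) are interpreted because those are the only ones this section describes.

```python
def has_dp_routed_service(iqt_output: str) -> bool:
    """Return True if a row with id 11 and name DPService is present."""
    for line in iqt_output.splitlines():
        fields = [field.strip() for field in line.split("|")]
        # Only the id and name columns are described in the text above;
        # the remaining columns are left uninterpreted.
        if len(fields) >= 2 and fields[0] == "11" and fields[1] == "DPService":
            return True
    return False
```

Running the same check against output captured on the MP blade and on the NOAM or SOAM server gives a quick way to spot the mismatch mentioned in steps 4 and 5.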
Policy and Charging Application (PCA) Alarms and Events (22700-22799)
Description:
The Diameter Request message(s) received by PCA contain protocol error(s).
Severity:
Info
Instance:
PCA, <PcaFunctionName>
HA Score:
Normal
Throttle Seconds:
60
OID:
pdraPdraProtocolErrorsInDiameterReqNotify
1. Recovery:
Description:
The Diameter Answer message(s) received by PCA contain(s) protocol error(s). This
error message is based on error scenarios such as:
• Command-Code value is not supported
• Mandatory AVP used for processing decisions is missing
• Mandatory AVP used for processing contains an invalid value
• Mandatory Session-Id AVP has a zero-length value
Note:
This event is not generated when the 'E' (Error) bit of the received Diameter Answer
message is set and a mandatory Diameter command-specific AVP (an AVP
other than Session-Id, Origin-Host, Origin-Realm, and Result-Code)
is missing.
Severity:
Info
Instance:
PCA, <PcaFunctionName>
HA Score:
Normal
Throttle Seconds:
60
OID:
pdraPdraProtocolErrorsInDiameterAnsNotify
1. Recovery:
Description:
The hash function result does not map to a database resource or sub-resource.
Severity:
Info
Instance:
N/A
HA Score:
Normal
OID:
pdraPdraHashingResDoesNotMatchResOrSubResNotify
1. Recovery:
Description:
The Diameter Egress message could not be sent because the DRL Message Queue
is full.
Severity:
Info
Instance:
PCA, <PcaFunctionName>
HA Score:
Normal
Throttle Seconds:
60
OID:
pdraPdraEgressMsgRoutingFailureDueToDrlQueueExhaustedNotify
1. Recovery:
1. Refer to measurement RxGyRoAnsDiscardDrlQueueFullPerCmd (in the DSR
Measurements Reference) to determine the number of Gy/Ro Diameter Credit
Control Application Answer messages discarded by OC-DRA due to DRL's
Answer queue being full.
2. It is recommended to contact My Oracle Support for assistance.
Description:
Communication between the Policy and Charging server and the SBR server failed.
Severity:
Info
Instance:
<PcaFunctionName>
HA Score:
Normal
Throttle Seconds:
60
OID:
pdraPdraStackEventSendingFailureCAUnavailNotify
Cause:
Applicable Diameter Interface/Message Type
• Gx CCR-I, CCR-U and CCR-T
• Rx AAR, STR
• Gx-Prime CCR-I, CCR-U and CCR-T
Diagnostic Information:
Direct Exception Measurement & Measurement Group:
• 10834: TxPdraErrAnsGeneratedCaFailure in P-DRA Diameter Exception
Measurement Group
3-digit Error Code:
• Refer to EC-507 - Policy SBR Error. ComAgent timeout
1. Recovery:
Description:
The Policy and Charging server received a response from the SBR server indicating SBR
errors.
Severity:
Info
Instance:
<PcaFunctionName>
HA Score:
Normal
Throttle Seconds:
60
OID:
pdraPdraPsbrErrorIndicationNotify
1. Recovery:
Description:
A binding key is not found in the received CCR-I message.
Severity:
Info
Instance:
N/A
HA Score:
Normal
Throttle Seconds:
60
OID:
pdraPdraBindingKeyNotFoundNotify
1. Recovery:
1. Check the P-DRA GUI at Policy DRA, and then Configuration, and then Binding
Key Priority.
2. It is recommended to contact My Oracle Support for assistance.
Description:
PCA failed to process a Diameter message. The specific reason is provided by the
PCA signaling code.
Severity:
Info
Instance:
<PcaFunctionName>
HA Score:
Normal
Throttle Seconds:
60
OID:
pdraPdraDiameterMessageProcessingFailureNotify
1. Recovery:
1. If the event was generated for a Diameter message being discarded due
to congestion, refer to the Recovery steps for Alarm 22504 - Peer CNDRA
Application Ingress Message Rate.
2. It is recommended to contact My Oracle Support for further assistance.
Description:
The PCA Function is unable to process any messages because it is Disabled.
Severity:
Major
Instance:
<PcaFunctionName>
HA Score:
Normal
OID:
pdraPcaFunctionDisabledNotify
1. Recovery:
1. The PCA Function becomes Disabled when the Admin State is set to Disable. The
PCA Function Admin State can be determined from the SOAM GUI Policy and
Charging, and then General Options. Verify the admin state is set as expected.
2. If the Admin State of the PCA Function is to remain Disabled, consider changing
the ART configuration to stop sending traffic for that function to PCA.
3. It is recommended to contact My Oracle Support for assistance if needed.
Description:
The PCA Function is unable to process any messages because it is Unavailable.
Severity:
Major
Instance:
<PcaFunctionName>
HA Score:
Normal
OID:
pdraPcaFunctionUnavailableNotify
1. Recovery:
1. The availability of the Policy DRA function to receive and process ingress
messages is based on its administration state (Enabled or Disabled) and the
status of the SBR Binding and Session resources.
2. The availability of the Online Charging DRA function to receive and process
ingress messages is based on its administration state (Enabled or Disabled), OCS
configuration, and the status of the SBR Session resource.
3. The PCA function is unavailable to receive and process ingress messages for one
of the following reasons:
• "Insufficient Binding SBR Resources" - The number of Binding SBR sub-
resources available is less than the minimum number required. Refer to the
Recovery steps for Alarm 22722 - Policy Binding Sub-resource Unavailable,
which will also be asserted.
• "Insufficient Session SBR Resources" - The number of Session SBR sub-
resources available is less than the minimum number required. Refer to the
Recovery steps for Alarm 22723 - Policy and Charging Session Sub-resource
Unavailable, which will also be asserted.
• "No OCSs Configured at Site" - At least one OCS is required to be locally
configured. Use the SOAM GUI Main Menu Policy and Charging, and then
Configuration, and then Online Charging DRA, and then OCSs to configure
an OCS at the site.
• "Session DB has not been created" - A Session SBR Database must be
configured for each Policy and Charging Mated Sites Place Association.
Use the Network OAM GUI Main Menu Policy and Charging, and then
Configuration, and then SBR Databases to configure a Session SBR
Database.
• "Binding DB has not been created" - For P-DRA, a Binding SBR Database
must be configured. Use the Network OAM GUI Main Menu Policy and
Charging, and then Configuration, and then SBR Databases to configure
a Binding SBR Database.
• "Session DB's admin state is not Enabled" - A Session SBR Database must
be Enabled for each Policy and Charging Mated Sites Place Association
where signaling is to be processed. Use the Network OAM GUI Main Menu
Policy and Charging, and then Maintenance, and then SBR Database
Status to Enable a Session SBR Database.
• "Binding DB's admin state is not Enabled" - For P-DRA, a Binding SBR
Database must be Enabled. Use the Network OAM GUI Main Menu Policy
and Charging, and then Maintenance, and then SBR Database Status to
Enable a Binding SBR Database.
4. It is recommended to contact My Oracle Support for assistance if needed.
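The availability rules in steps 1 through 3 can be expressed as a small decision function. This is a sketch under stated assumptions, not the PCA logic itself: the `PcaSiteState` fields, the function name `unavailable_reasons`, and the order in which conditions are checked are all hypothetical. The reason strings are the ones quoted in step 3, and the admin state is left out because a disabled function is reported by a separate alarm.

```python
from dataclasses import dataclass


@dataclass
class PcaSiteState:
    """Hypothetical snapshot of the conditions listed in step 3."""
    binding_db_created: bool = True
    binding_db_enabled: bool = True
    binding_subresources_ok: bool = True
    session_db_created: bool = True
    session_db_enabled: bool = True
    session_subresources_ok: bool = True
    ocs_count: int = 1


def unavailable_reasons(state: PcaSiteState, function: str) -> list:
    """Return the reason strings that apply to "P-DRA" or "OC-DRA"."""
    if function not in ("P-DRA", "OC-DRA"):
        raise ValueError("function must be 'P-DRA' or 'OC-DRA'")
    reasons = []
    # Binding resources matter only for the Policy DRA function.
    if function == "P-DRA":
        if not state.binding_db_created:
            reasons.append("Binding DB has not been created")
        elif not state.binding_db_enabled:
            reasons.append("Binding DB's admin state is not Enabled")
        elif not state.binding_subresources_ok:
            reasons.append("Insufficient Binding SBR Resources")
    # Session resources matter for both functions.
    if not state.session_db_created:
        reasons.append("Session DB has not been created")
    elif not state.session_db_enabled:
        reasons.append("Session DB's admin state is not Enabled")
    elif not state.session_subresources_ok:
        reasons.append("Insufficient Session SBR Resources")
    # OCS configuration matters only for the Online Charging DRA function.
    if function == "OC-DRA" and state.ocs_count < 1:
        reasons.append("No OCSs Configured at Site")
    return reasons
```

An empty list means none of the listed conditions applies to that function.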
Description:
The SBR session count threshold for a Policy and Charging Mated Sites Place
Association has been exceeded.
Severity:
Minor, Major, Critical
Instance:
<SbrDatabaseName>
HA Score:
Normal
OID:
eagleXgDiameterPSbrActSessThreshNotify
Cause:
The number of session records stored in the policy session database has exceeded
the minor, major, or critical alarm threshold percentage of the calculated session
capacity for the topology.
Diagnostic Information:
Check the event or alarm information on the active SOAM and analyze the error trace
on this SBR server.
1. Recovery:
1. The session database specified in the Instance field is nearing the limit on the
number of session records. Alarm severity is determined by the number of session
records stored in the policy session database exceeding the alarm threshold
percentage of the calculated session capacity for the topology.
2. If the alarm assert thresholds are improperly configured, they can be configured
on a network-wide basis on the NOAM from the Policy DRA, and then
Configuration, and then Alarm Settings.
3. In general, the system should be sized to host the expected number of concurrent
sessions per policy subscriber.
4. If the system is nearing 100% capacity, it is recommended to contact My Oracle
Support for further assistance.
Description:
An error occurred during an SBR database operation.
Severity:
Info
Instance:
<SbrServerType>, <SbrSgNameDbType> (I-SBR)
HA Score:
Normal
Throttle Seconds:
60
OID:
eagleXgDiameterPSBRDbOpFailNotify
1. Recovery:
1. An unexpected, internal error was encountered while the SBR database was being
accessed. This error may occur for a variety of reasons:
a. The database is filled to capacity
b. Database inconsistency between NO and SO tables caused by a database
restore operation. This issue is corrected by the SBR audit.
2. It is recommended to contact My Oracle Support for further assistance.
Description:
The SBR received an error or timeout response from Communication Agent when
sending a stack event to another SBR server.
Severity:
Info
Instance:
<SbrServerType>, <SbrDbType> (I-SBR)
HA Score:
Normal
Throttle Seconds:
60
OID:
eagleXgDiameterPSBRStkEvFailComAgentNotify
1. Recovery:
Description:
Failed to create an Alternate Key record in the Binding database.
Severity:
Info
Instance:
Session SBR
HA Score:
Normal
Throttle Seconds:
60
OID:
eagleXgDiameterPSBRAltKeyCreateFailNotify
1. Recovery:
Description:
SBR encountered an error while processing PCA initiated RAR requests.
Severity:
Info
Instance:
Session SBR
HA Score:
Normal
Throttle Seconds:
60
OID:
eagleXgDiameterPSBRRARInitiationErrNotify
1. Recovery:
Description:
SBR DB (Binding, Session, or Universal) auditing has been suspended because the
Session Integrity send rate is more than the engineering configurable threshold, or
due to a congestion condition on either the local server reporting the alarm or on a
remote server being queried for auditing purposes.
Severity:
Minor
Instance:
N/A
HA Score:
Normal
OID:
eagleXgDiameterPSBRAuditSuspendedNotify
1. Recovery:
1. If the Binding DB server is not locally congested, this alarm indicates that auditing
is suspended only on the remote Session servers being queried by Binding for
auditing purposes that are congested. The audit cleans up stale records in the
database. Prolonged suspension of the audit could result in the exhaustion of
memory resources on a binding or session SBR server. Investigate the causes of
congestion on the SBR servers (see Alarm 22725 - SBR Server In Congestion).
2. If the problem persists, it is recommended to contact My Oracle Support.
Description:
This report provides statistics related to SBR session or binding table audits. Each
SBR server generates this event upon reaching the last record in a table. The
statistics reported are appropriate for the type of table being audited. This report also
provides hourly statistics related to the Pending RAR report.
Severity:
Info
Instance:
<PcaTableName>, <SbrSgName> (I-SBR)
HA Score:
Normal
Throttle Seconds:
0 (zero)
OID:
eagleXgDiameterPSBRAuditStatisticsReportNotify
1. Recovery:
Description:
SBR Alternate Key Creation Failure rate exceeds threshold.
Severity:
Minor, Major, Critical
Instance:
PsbrAltKeyCreationFailureRate, SBR
HA Score:
Normal
OID:
eagleXgDiameterPSBRAltKeyCreationFailureRateNotify
1. Recovery:
Description:
Binding record is not found for the configured binding keys in the binding dependent
session-initiation request message.
Severity:
Info
Instance:
N/A
HA Score:
Normal
Throttle Seconds:
60
OID:
pdraPdraBindingRecordNotFoundNotify
1. Recovery:
1. From the Main Menu, navigate to Policy and Charging, and then Configuration,
and then Binding Key Priority, and check the subscriber key priorities to ensure
the configuration is correct.
2. Using the Binding Key Query Tool, check whether a binding exists for the binding
keys configured at Policy DRA, and then Configuration, and then Binding Key Priority.
Description:
A Binding capable session initiation request failed because this subscriber already
has the maximum number of sessions per binding.
Severity:
Info
Instance:
N/A
HA Score:
Normal
Throttle Seconds:
60
OID:
pdraPdraMaxSessionsReachedNotify
1. Recovery:
1. Determine if the existing sessions are valid. The existing sessions may be
displayed using the Binding Key Query Tool to obtain all relevant information
including session IDs and PCEF FQDNs.
2. If the sessions exist in the P-DRA but not on the PCEF(s), it is recommended to
contact My Oracle Support.
Description:
The SBR to PCA response queue utilization threshold has been exceeded.
Severity:
Minor, Major, Critical
Instance:
RxPcaSbrEventMsgQueue, PCA
HA Score:
Normal
OID:
pdraPdraPsbrResponseQueueUtilizationNotify
1. Recovery:
1. If one or more MPs in a server site have failed, the traffic is distributed
among the remaining MPs in the server site. Monitor the MP server status from
Status & Manage, and then Server Status.
2. Misconfiguration of Diameter peers may result in too much traffic being
distributed to the MP. Monitor the ingress traffic rate of each MP from Status &
Manage, and then KPIs.
Each MP in the server site should be receiving approximately the same number of
ingress transactions per second.
3. There may be an insufficient number of MPs configured to handle the network
load. Monitor the ingress traffic rate of each MP by selecting Status & Manage,
and then KPIs.
If MPs are in a congestion state, then the offered load to the server site is
exceeding its capacity.
4. If the problem persists, it is recommended to contact My Oracle Support.
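As a rough illustration of the balance check in step 2, the sketch below flags MPs whose ingress rate differs markedly from the site median. It assumes the per-MP rates have already been read from the KPIs screen by hand. The function name, the MP names, the sample rates, and the 20% tolerance are all hypothetical and are not DSR settings.

```python
from statistics import median
from typing import Dict, List


def find_unbalanced_mps(ingress_tps: Dict[str, float],
                        tolerance: float = 0.20) -> List[str]:
    """Return MPs whose ingress rate deviates from the site median
    by more than `tolerance` (a fraction; 0.20 is an assumed value)."""
    if not ingress_tps:
        return []
    mid = median(ingress_tps.values())
    if mid == 0:
        # Relative deviation is undefined; report any MP carrying traffic.
        return sorted(name for name, tps in ingress_tps.items() if tps > 0)
    return sorted(name for name, tps in ingress_tps.items()
                  if abs(tps - mid) / mid > tolerance)


# Hypothetical readings copied from the KPIs screen (transactions per second).
rates = {"MP-1": 1000.0, "MP-2": 1050.0, "MP-3": 1900.0}
print(find_unbalanced_mps(rates))  # ['MP-3']
```

The median is used rather than the mean so that a single overloaded MP does not shift the reference point for the others.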
Description:
The Policy and Charging Server is operating in congestion. Average Policy and
Charging ingress messages rate exceeds the configured threshold. The thresholds
are based on the engineered system value for Ingress Message Capacity.
Severity:
Minor, Major, Critical
Instance:
PCA
HA Score:
Normal
OID:
pdraPdraCongestionStateNotify
Cause:
This alarm is raised when the average Policy and Charging ingress message rate
exceeds the configured threshold. The thresholds are based on the engineered
system value for Ingress Message Capacity.
Diagnostic Information:
• The alarm thresholds for DSR Application Ingress Message Rate are
configured network wide on Network OAM using the Policy DRA >
Configuration > Alarm Settings and Congestion Options screens.
• Monitor the ingress traffic rate of each MP by selecting Main Menu > Status &
Manage > KPIs. If MPs are in a congestion state, then the offered load to the
server site is exceeding its capacity.
1. Recovery:
1. Adjust the alarm threshold parameters. Verify the configuration by navigating to
the Congestion Options on Policy DRA, and then Configuration, and then Alarm
Settings.
2. There may be an insufficient number of MPs configured to handle the network
load. Monitor the ingress traffic rate of each MP by selecting Status & Manage,
and then KPIs.
If MPs are in a congestion state, then the offered load to the server site is
exceeding its capacity.
3. If the problem persists, it is recommended to contact My Oracle Support.
Description:
One or more Policy binding sub-resources are not available.
Severity:
• Major: When a Binding SBR Database is prepared or enabled and at least one
server group that has a range of binding sub-resources is not available.
• Critical: When a Binding SBR Database is prepared or enabled and all of the
binding sub-resources are not available, that is, all server groups hosting the
sub-resources are not available.
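The two severity conditions above can be sketched as a small decision function. This is only an illustration of the rule as stated, not the product's implementation. The function name, the input mapping, and the server group names are hypothetical.

```python
from typing import Dict, Optional


def subresource_alarm_severity(db_prepared_or_enabled: bool,
                               server_group_available: Dict[str, bool]) -> Optional[str]:
    """Map server group availability to the alarm severity described above.

    `server_group_available` maps each server group hosting binding
    sub-resources to True (available) or False (not available).
    """
    if not db_prepared_or_enabled or not server_group_available:
        return None
    unavailable = [sg for sg, ok in server_group_available.items() if not ok]
    if not unavailable:
        return None          # everything is available, no alarm
    if len(unavailable) == len(server_group_available):
        return "Critical"    # all hosting server groups are unavailable
    return "Major"           # at least one, but not all, unavailable


# Hypothetical server group names.
print(subresource_alarm_severity(True, {"BindSG1": True, "BindSG2": False}))  # Major
```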
Instance:
<ResourceDomainName>
HA Score:
Normal
OID:
pdraPdraBindingSubresourceUnavailableNotify
1. Recovery:
1. At the NOAM, navigate to the SBR Database Status screen at Policy and
Charging, and then Maintenance, and then SBR Database Status and locate
the SBR Database specified in the Alarm Additional Information. The database's
Operational Status and the Operational Reason values associated with resource
users and resource providers are displayed.
2. Click on the row for the Database Name. If the Resource User Operational
Reason has a colored cell, the lower-left pane on the status screen will display
information about which resource users are having problems accessing the
database. If the Resource Provider Operational Reason has a colored cell, the
lower-right pane on the status screen will display information about which resource
providers are unable to provide service.
3. If the Resource Provider pane on the lower right is empty, look for ComAgent
connection Alarms. If ComAgent connection alarms exist, follow the Recovery
steps for those alarms to troubleshoot further. If there are no ComAgent
connection alarms, review the configuration of Resource Domains, Places, and
Place Associations using the NOAM GUI and verify that they are provisioned as
expected:
• Configuration, and then Resource Domains
• Configuration, and then Places
• Configuration, and then Place Associations
4. Click the Database Name hyperlink to go to the SBR Database Configuration View
screen, filtered by the SBR Database Name. Make note of the Resource Domain
configured for the SBR Database.
5. Navigate to the ComAgent HA Services Status screen at Communication Agent,
and then Maintenance, and then HA Service Status and locate the Resource
with name equal to that configured as the Resource Domain for the SBR
Database.
6. Click the HA Services Status row for the Resource, which will have further detailed
information about the Communication Agent's problem.
7. It is recommended to contact My Oracle Support for assistance if needed.
Description:
One or more Policy and Charging session sub-resources are not available.
Severity:
• Major: When a Session SBR Database is prepared or enabled and at least one of
the server groups hosting session sub-resources is not available.
• Critical: When a Session SBR Database is prepared or enabled and all of the
server groups hosting session sub-resources are not available.
Instance:
<ResourceDomainName>
HA Score:
Normal
OID:
pdraPdraSessionSubresourceUnavailableNotify
1. Recovery:
1. At the NOAM, navigate to the SBR Database Status screen at Policy and
Charging, and then Maintenance, and then SBR Database Status and locate
the SBR Database specified in the Alarm Additional Information. The database's
Operational Status and the Operational Reason values associated with resource
users and resource providers are displayed.
2. Click on the row for the Database Name. If the Resource User Operational
Reason has a colored cell, the lower-left pane on the status screen will display
information about which resource users are having problems accessing the
database. If the Resource Provider Operational Reason has a colored cell, the
lower-right pane on the status screen will display information about which resource
providers are unable to provide service.
3. If the Resource Provider pane on the lower right is empty, look for ComAgent
connection Alarms. If ComAgent connection alarms exist, follow the Recovery
steps for those alarms to troubleshoot further. If there are no ComAgent
connection alarms, review the configuration of Resource Domains, Places, and
Place Associations using the NOAM GUI and verify that they are provisioned as
expected:
• Configuration, and then Resource Domains
• Configuration, and then Places
• Configuration, and then Place Associations
4. Click the Database Name hyperlink to go to the SBR Database Configuration View
screen, filtered by the SBR Database Name. Make note of the Resource Domain
configured for the SBR Database.
5. Navigate to the ComAgent HA Services Status screen at Communication Agent,
and then Maintenance, and then HA Service Status and locate the Resource
with name equal to that configured as the Resource Domain for the SBR
Database.
6. Click the HA Services Status row for the Resource, which will have further detailed
information about the Communication Agent's problem.
7. It is recommended to contact My Oracle Support for assistance if needed.
Description:
The SBR server memory utilization threshold has been exceeded.
Severity:
Minor, Major, Critical
Instance:
Policy and Charging mated Sites Place Association Name
HA Score:
Normal
OID:
eagleXgDiameterPSbrMemUtilNotify
Cause:
Policy pSBR server memory utilization threshold has been exceeded.
This alarm's assert conditions are defined by the following default parameters:
• Minor: pSBR memory utilization threshold > 70%
• Major: pSBR memory utilization threshold > 80%
• Critical: pSBR memory utilization threshold > 90%
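A minimal sketch of how the default assert thresholds above translate into a severity follows. The function name is hypothetical; the 70, 80, and 90 percent defaults come from the list above and may differ on a system where the thresholds have been changed.

```python
from typing import Optional


def psbr_memory_severity(utilization_pct: float,
                         minor: float = 70.0,
                         major: float = 80.0,
                         critical: float = 90.0) -> Optional[str]:
    """Return the highest severity whose threshold is strictly exceeded."""
    # Check from most to least severe so the highest exceeded level wins.
    for name, threshold in (("Critical", critical),
                            ("Major", major),
                            ("Minor", minor)):
        if utilization_pct > threshold:
            return name
    return None


print(psbr_memory_severity(85.0))  # Major
```

Note that the comparison is strict: a utilization of exactly 80% is still Minor under these defaults.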
Diagnostic Information:
• The pSBR exceeds the engineered memory utilization levels.
• Do not raise pSBR memory Alarm 22724 on non-pSBR servers.
• Check the server memory usage.
1. Recovery:
1. Change threshold parameters.
2. If this condition persists, it may be necessary to allocate more memory for pSBR.
3. It is recommended to contact My Oracle Support for further assistance.
Description:
The SBR server is operating in congestion.
Severity:
• Minor: CL_1
• Major: CL_2
• Critical: CL_3
Instance:
Policy and Charging mated Sites Place Association Name, <SbrSgName> (I-SBR)
HA Score:
Normal
OID:
eagleXgDiameterPSbrServerInCongestionNotify
1. Recovery:
1. Application Routing might be mis-configured and is sending too much traffic to
the DSR Application. Verify the configuration by selecting Diameter, and then
Configuration, and then Application Route Tables.
2. There may be an insufficient number of MPs configured to handle the network
load. Monitor the ingress traffic rate of each MP by selecting Status & Manage,
and then KPIs.
If MPs are in a congestion state, then the offered load to the server site is
exceeding its capacity.
3. If the problem persists, it is recommended to contact My Oracle Support.
Description:
The SBR stack event queue utilization threshold has been exceeded. The
alarm is asserted for three separate stack event queues (PsbrSisTaskQMetric,
PsbrSisSendRarTaskQMetric, and PsbrInvokeSisRspHandlerTaskQMetric) in Binding
and Session SBR servers.
Severity:
Minor, Major, Critical
Instance:
SBR
HA Score:
Normal
OID:
eagleXgDiameterPSbrStackEvQUtilNotify
Cause:
The alarm is asserted separately for each of the following stack event queues:
• PsbrBindingTaskQMetric
• PsbrSessionTaskQMetric
• PsbrAuditStackEventTaskQMetric
• PsbrTableWatcherTaskQMetric
• PsbrSisTaskQMetric
• PsbrSisSendRarTaskQMetric
• PsbrInvokeSisRspHandlerTaskQMetric
• PsbrSisRspHandlerTaskQMetric
Each stack event queue has its own configurable threshold parameters.
The default values are as follows:
• Assert conditions:
– Minor: pSBR stack event queue utilization threshold > 80%
– Major: pSBR stack event queue utilization threshold > 90%
– Critical: pSBR stack event queue utilization threshold > 100%
• Clear conditions:
– Minor: pSBR stack event queue utilization threshold <= 70%
– Major: pSBR stack event queue utilization threshold <= 85%
– Critical: pSBR stack event queue utilization threshold <= 95%
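Because the clear thresholds sit below the assert thresholds, a severity that has been raised is held until utilization drops to its clear value. The sketch below illustrates that hysteresis using the default values listed above. How the product steps down through severities is not described here, so the step-down loop is an assumption, and all names in the code are hypothetical.

```python
from typing import List, Optional

# Default thresholds from the text, keyed by level: 1=Minor, 2=Major, 3=Critical.
ASSERT_ABOVE = {1: 80.0, 2: 90.0, 3: 100.0}
CLEAR_AT_OR_BELOW = {1: 70.0, 2: 85.0, 3: 95.0}
LEVEL_NAMES = {0: None, 1: "Minor", 2: "Major", 3: "Critical"}


def next_level(current: int, utilization_pct: float) -> int:
    """Return the alarm level after observing one utilization sample."""
    # Escalate to the highest level whose assert threshold is exceeded.
    asserted = max((lvl for lvl, thr in ASSERT_ABOVE.items()
                    if utilization_pct > thr), default=0)
    if asserted > current:
        return asserted
    # Otherwise step down only through levels whose clear condition is met
    # (assumed behavior).
    level = current
    while level > 0 and utilization_pct <= CLEAR_AT_OR_BELOW[level]:
        level -= 1
    return level


def severity_trace(samples: List[float]) -> List[Optional[str]]:
    """Apply next_level to a series of samples, starting with no alarm."""
    level, trace = 0, []
    for pct in samples:
        level = next_level(level, pct)
        trace.append(LEVEL_NAMES[level])
    return trace


print(severity_trace([75, 85, 92, 88, 84, 72, 69]))
# [None, 'Minor', 'Major', 'Major', 'Minor', 'Minor', None]
```

In the trace, the sample of 88 keeps the Major severity because it is above the Major clear value of 85, even though it is below the Major assert value of 90.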
Diagnostic Information:
To further diagnose the issue:
• Check the event/alarm information on the active SOAM and analyze the error
trace on this SBR server.
• Collect Savelogs on this SBR server.
• Check the Event History on the active SOAM server.
1. Recovery:
Description:
The SBR server process failed to initialize.
Severity:
Critical
Instance:
Policy DRA Mated Sites Place Association Name
HA Score:
Normal
OID:
eagleXgDiameterPSbrInitializationFailureNotify
Cause:
• Any of the ComAgent registration calls for either session resource or binding
resource fails during the pSBR initialization.
• Unable to calculate the number of Session or Binding Sub-resources.
• Unable to initialize an SBR internal resource (for example, PsbrHaMgr).
Diagnostic Information:
• Check the event/alarm information on the active SOAM and analyze the error
trace on this SBR server.
• Collect Savelogs on this SBR server.
• Check the Event History on the active SOAM server.
1. Recovery:
Description:
The number of bindings threshold has been exceeded.
Severity:
Minor, Major, Critical
Instance:
<SbrDatabaseName>
HA Score:
Normal
OID:
eagleXgDiameterPSbrActBindThreshNotify
Cause:
The Binding Region specified in the Instance field is nearing the expected number of
binding records for this network.
Diagnostic Information:
The alarm thresholds for Binding Capacity alarms are configured network wide on
Network OAM using the "Policy DRA > Configuration > Alarm Settings" screen.
• If the alarm severity is minor, the alarm means that the number of binding records
stored in Binding Region has exceeded the minor alarm threshold percentage of
the calculated binding capacity for the topology.
• If the alarm severity is major, the alarm means that the number of binding records
stored in Binding Region has exceeded the major alarm threshold percentage of
the calculated binding capacity for the topology.
• If the alarm severity is critical, the alarm means that the number of binding records
stored in Binding Region has exceeded the critical alarm threshold percentage of
the calculated binding capacity for the topology.
1. Recovery:
1. The binding database specified in the Instance field is nearing the limit on the
number of binding records. The alarm threshold percentages can be modified as
desired by the network operator at the NOAM using Policy and Charging, and
then Configuration, and then Alarm Settings.
2. If a given alarm severity is unwanted, the alarm severity may be suppressed by
checking the Suppress checkbox for that alarm severity.
3. It is recommended to contact My Oracle Support to discuss plans for system
growth if this alarm continues to be asserted under normal operating conditions.
Note:
It is expected, but not guaranteed, that the system will continue
to function beyond the tested maximum number of subscribers with
bindings.
Description:
PCRF Not Configured
Severity:
Major
Instance:
Policy Binding Region Place Association Name
HA Score:
Normal
OID:
pdraPcrfNotConfiguredNotify
Cause:
This alarm is raised when the P-DRA completes initialization and determines that the
PCRFs are not configured.
Diagnostic Information:
• Check the NOAM GUI at Main Menu > Policy and Charging > Configuration >
Policy DRA for further PCRF configuration.
• Check for any missing configuration or capture this screen for further analysis.
1. Recovery:
1. Check the NOAM GUI at Policy and Charging, and then Configuration, and
then Policy DRA for further PCRF configuration.
2. Check the event history logs in Alarms & Events.
3. If the problem persists, it is recommended to contact My Oracle Support.
Description:
Policy and Charging message processing could not be successfully completed due to
a configuration error.
Severity:
Major
Instance:
<ConfigurationError>
HA Score:
Normal
OID:
pdraPdraConfigErrorNotify
Cause:
• The session initiation request message was received with a missing or
unconfigured APN.
Diagnostic Information:
• Check DSR configuration
• Check Diameter message PCAP.
1. Recovery:
1. If there is an unconfigured PCRF, it means the binding capable session initiation
request was routed to a PCRF that is not configured in Policy and Charging,
and then Configuration, and then Policy DRA, and then PCRFs at the site
where the request was received. This indicates a mismatch between the PCRF's
configuration and the routing configuration. If the PCRF is a valid choice for the
request, configure the PCRF in Policy and Charging, and then Configuration,
and then Policy DRA, and then PCRFs. If the PCRF is not valid for the request,
correct the routing table or tables that include the PCRF.
Also see measurement RxBindCapUnknownPcrf in the DSR Measurement
Reference.
2. If there is an unconfigured APN and if the APN string is valid, configure the APN
at the NOAM using the Policy and Charging, and then Configuration, and then
Access Point Names screen. If the APN string is not valid, investigate the policy
client to determine why it is sending policy session initiation requests using the
invalid APN.
Also see measurements RxBindCapUnknownApn and RxBindDepUnknownApn in
the DSR Measurement Reference.
3. If there is a missing APN, investigate the policy client to determine why it is
sending policy session initiation requests with no APN.
Also see measurements RxBindCapMissingApn and RxBindDepMissingApn in the
DSR Measurement Reference.
4. If there are no PCRFs configured, configure PCRFs at the SOAM GUI for the site
using Policy and Charging, and then Configuration, and then PCRFs.
5. If there is an unconfigured OCS, it means that the binding independent session
initiation request was routed to an OCS that is not configured in Policy and
Charging, and then Configuration, and then Online Charging DRA, and then
OCSs. This indicates a mismatch between the OCS configuration and the routing
configuration. If the OCS named in the alarm additional information is a valid
choice for the request, configure the OCS at the SOAMP using Policy and
Charging, and then Configuration, and then Online Charging DRA, and then
OCSs. If the OCS is not valid for the request, correct the routing table or tables
that include the OCS.
6. It is recommended to contact My Oracle Support.
Description:
A Policy and Charging database inconsistency exists due to an internal data error
or internal database table error.
Severity:
Major
Instance:
<PcaFunctionName>
HA Score:
Normal
OID:
pdraPdraDbInconsistencyExistsNotify
1. Recovery:
1. Check the error history logs for the details of the data inconsistency.
2. If the problem persists, it is recommended to contact My Oracle Support.
Description:
The SBR process on the indicated server is using higher than expected CPU
resources.
Severity:
Minor, Major, Critical
Instance:
psbr.cpu, SBR
HA Score:
Normal
OID:
eagleXgDiameterPSbrProcCpuThreshNotify
Cause:
Policy SBR Process CPU Utilization Threshold has been exceeded. The Policy SBR
process on the indicated server is using higher than expected CPU resources.
Diagnostic Information:
This alarm's assert conditions are defined by the following parameters:
1. Recovery:
1. If this condition persists, it may be necessary to deploy more policy signaling
capacity.
2. It is recommended to contact My Oracle Support for further assistance.
Description:
The SBR failed to free binding memory after PCRF Pooling binding migration.
Severity:
Minor
Instance:
N/A
HA Score:
Normal
OID:
eagleXgDiameterPSBRPostMigrationMemFreeNotify
1. Recovery:
1. On systems upgraded from a release where Policy DRA was running, but that did
not support PCRF Pooling, to a release that supports PCRF Pooling, binding data
is migrated from the tables used by the old release to tables used by the new
release. Once this migration process completes on a given binding policy SBR,
a script is automatically executed to free memory for the old tables. If this script
should fail for any reason to free the memory, this alarm is asserted.
2. If additional assistance is needed, it is recommended to contact My Oracle
Support.
Description:
A Policy and Charging server received a stack event with an unexpected
down-version.
Severity:
Major
Instance:
N/A
HA Score:
Normal
OID:
pdraPdraUnexpectedSEDownVersionNotify
Cause:
A Policy and Charging server received a stack event with an unexpected
down-version. One of the SBRs is running on an older version of DSR software.
Diagnostic Information:
From the event history, view the details of this alarm. Determine which server/server
group the alarm was raised for.
1. Recovery:
1. From the NOAM GUI at Policy and Charging, and then Maintenance, and then
SBR Status, find the Resource Domain Name to which the stack event was being
sent.
2. Expand all Server Groups having that Resource Domain name to see which
Server Group hosts the ComAgent Sub Resource.
3. The Server with Resource HA Role of "Active" is likely the server that has the
old software (unless a switch-over has occurred since the alarm was asserted). In
any case, one of the servers in the Server Group has old software. The software
version running on each server can be viewed from Administration, and then
Upgrade. The "Hostname" field is the same as the Server Name on the SBR
Status screen.
4. Find the server or servers running the old software and upgrade those servers to
the current release and accept the upgrade.
5. If additional assistance is needed, it is recommended to contact My Oracle
Support.
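Once the software version of each server in the Server Group has been noted from the Upgrade screen, the comparison in steps 3 and 4 can be sketched as below. This assumes version strings are plain dotted numbers; real DSR version strings may carry additional build fields that would need different parsing. The hostnames, versions, and function names are hypothetical.

```python
from typing import Dict, List, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted numeric string such as '8.5.0.1' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def servers_with_old_software(versions: Dict[str, str],
                              current_release: str) -> List[str]:
    """Return hostnames whose version is older than the current release."""
    target = parse_version(current_release)
    return sorted(host for host, version in versions.items()
                  if parse_version(version) < target)


# Hypothetical values noted from the Upgrade screen.
noted = {"sbr-a": "8.5.0.1", "sbr-b": "8.4.0.5", "sbr-c": "8.5.0.1"}
print(servers_with_old_software(noted, "8.5.0.1"))  # ['sbr-b']
```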
Description:
A Policy DRA session initiation request was received with no APN.
Severity:
Info
HA Score:
Normal
Instance:
None
Throttle Seconds:
30
OID:
pdraPdraSessInitReqWithNoApnNotify
1. Recovery:
1. Investigate why the policy client named by the Origin-Host FQDN in the additional
information field is not including the Called-Station-ID AVP and correct it to include
the APN.
2. Investigate why the policy client named by the Origin-Host FQDN in the additional
information field is not including the Called-Station-ID AVP and correct it to include
the APN. Alternatively, have that policy client include another binding correlation
key that can be used to find the binding.
3. Examine associated measurements RxBindCapMissingApn and
RxBindDepMissingApn (refer to the DSR Measurements Reference for details
about these measurements).
4. If the problem persists, it is recommended to contact My Oracle Support.
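When working from a packet capture, the condition behind this event can be checked with a sketch like the one below. It assumes the request has already been decoded into a list of (AVP code, value) pairs by some other tool, which is a hypothetical representation. The AVP codes are the standard Diameter ones for Called-Station-Id (30) and Origin-Host (264); the function name and sample values are illustrative only.

```python
from typing import List, Optional, Tuple

CALLED_STATION_ID = 30   # Called-Station-Id AVP code (carries the APN)
ORIGIN_HOST = 264        # Origin-Host AVP code


def origin_of_request_without_apn(avps: List[Tuple[int, str]]) -> Optional[str]:
    """Return the Origin-Host of a request that carries no APN.

    Returns None when a non-empty Called-Station-Id is present.
    """
    values = dict(avps)
    if values.get(CALLED_STATION_ID):
        return None
    return values.get(ORIGIN_HOST, "unknown-origin")


# Hypothetical decoded request with no Called-Station-Id AVP.
request = [(263, "pcef1.example.com;1;1"), (264, "pcef1.example.com")]
print(origin_of_request_without_apn(request))  # pcef1.example.com
```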
Description:
SBR failed to free shared memory after a PCA function is disabled.
Severity:
Minor
HA Score:
Normal
Instance:
<PcaFunctionName>
OID:
pdraPSBRPostPcaFunctionDisableMemFreeNotify
1. Recovery:
Description:
Configuration Database is not synced between the System OAM and Network OAMP.
Severity:
Minor
Instance:
Site name of SOAM server which asserted this alarm
HA Score:
Normal
OID:
pdraPcaConfDbNotSyncedNotify
1. Recovery:
1. Make note of all Status & Manage, and then Database Restore operations (if any)
performed at the NOAM or SOAM within a day of the occurrence of the alarm.
2. Gather all configuration changes (Insert, Edit, or Delete) for PCRFs, Policy Clients,
OCSs, CTFs via Security Log from the time the database restore was executed
until the present. If there was no database restore performed, then start from the
time the alarm was first asserted until the present.
3. If additional assistance is needed, it is recommended to contact My Oracle
Support.
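The selection described in step 2 can be sketched as a filter over exported log entries. The entry layout (a dict with "time", "object", and "action" keys) is purely hypothetical and does not reflect the actual Security Log export format. Only the object types and actions are taken from the step above.

```python
from datetime import datetime
from typing import Dict, List, Optional

TRACKED_OBJECTS = {"PCRF", "Policy Client", "OCS", "CTF"}
TRACKED_ACTIONS = {"Insert", "Edit", "Delete"}


def changes_to_review(entries: List[Dict],
                      alarm_time: datetime,
                      restore_time: Optional[datetime] = None) -> List[Dict]:
    """Return tracked configuration changes from the window start onward.

    The window starts at the database restore if one was performed,
    otherwise at the time the alarm was first asserted.
    """
    start = restore_time if restore_time is not None else alarm_time
    selected = [e for e in entries
                if e["time"] >= start
                and e["object"] in TRACKED_OBJECTS
                and e["action"] in TRACKED_ACTIONS]
    return sorted(selected, key=lambda e: e["time"])


# Hypothetical log entries.
log = [
    {"time": datetime(2021, 5, 1, 9, 0), "object": "PCRF", "action": "Insert"},
    {"time": datetime(2021, 5, 2, 10, 30), "object": "OCS", "action": "Edit"},
    {"time": datetime(2021, 5, 2, 11, 0), "object": "Peer Node", "action": "Edit"},
]
restore = datetime(2021, 5, 2, 8, 0)
alarm = datetime(2021, 5, 2, 12, 0)
for entry in changes_to_review(log, alarm, restore):
    print(entry["time"], entry["object"], entry["action"])
```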
Description:
This event is generated any time a state transition occurs in an SBR Database
Resizing or Data Migration Plan. This includes both state transitions due to a user
clicking a button on the SBR Database Reconfiguration Status screen and internal
state transitions.
Severity:
Info
Instance:
<SbrReconfigurationPlanName>, <SbrReconfigurationPlanName> (I-SBR)
HA Score:
Normal
Throttle Seconds:
0 (zero)
OID:
eagleXgDiameterPsbrReconfigStateTransitionNotify
1. Recovery:
• This event records the time and conditions under which an SBR Database
Reconfiguration Plan (identified in the event instance field) undergoes a state
transition. The event additional information includes details such as the previous
state, current state, and whether the "Force" option was chosen. This event can be
used to obtain a timeline of the entire history of a given reconfiguration plan.
Description:
Failed to successfully complete an SBR Reconfiguration Plan.
Note:
When an SBR Reconfiguration Plan is completed by the user clicking
Complete, or Force Complete on the SBR Reconfiguration Status GUI,
database updates are performed to finalize the reconfiguration plan as
follows. If any of these updates fail, this alarm shall be asserted.
• Condition 1: Failed to update the Resource Domain of the SBR
Database to point to the Target Resource Domain of the Resizing Plan
on completion of a Resizing Plan.
• Condition 2: Failed to mark the Initial SBR Database so that it is no
longer the default database for the Place Association on completion of a
Data Migration Plan.
• Condition 3: Failed to mark the Target SBR Database as the default
database for the Place Association on completion of a Data Migration
Plan.
• Condition 4: Failed to enable the Target SBR Database on completion of
a Data Migration Plan.
• Condition 5: Failed to disable the Initial SBR Database on completion of
a Data Migration Plan.
Severity:
• Minor: Condition 5
Instance:
<SbrReconfigPlanAndCondition>
HA Score:
Normal
OID:
eagleXgDiameterPSbrReconfigConditionsErrorNotify
1. Recovery:
• The SBR Reconfiguration plan specified in the Alarm Instance was not
successfully completed, possibly leaving the SBR Database in an abnormal state.
Make note of the specific reason for the alarm. It is recommended to contact
My Oracle Support for assistance.
Description:
Unable to Route RAR generated at PCA
Severity:
Info
Instance:
N/A
HA Score:
Normal
Throttle Seconds:
60
OID:
eagleXgDiameterPcaGeneratedRARRouteErrNotify
1. Recovery:
• Use Destination-Host to identify the locally generated RAR routing failures and
correct the respective configurations. If the DRL provides an error message, it will
be displayed with this event, which will have a 3-digit internal error code.
Description:
Enhanced Overload Control administrative and operational states are mismatched.
Severity:
Major
Instance:
None
HA Score:
Normal
OID:
eagleXgDiameterEnhancedOverloadCtrlAdminStateMismatch
1. Recovery:
Description:
PCA Server Congested Due to Composite Resource Congestion.
Severity:
Minor, Major, Critical
Instance:
None
HA Score:
Normal
OID:
eagleXgDiameterPcaCongrestionStateNotify
1. Recovery:
The PCA server is congested because at least one of the PCA resources is
congested.
1. The Application Routing Table may be configured incorrectly and too much traffic
was sent to PCA. Verify the configuration via Diameter, and then Configuration,
and then Application Routing Rules.
2. There may have been a burst of ingress traffic from the network, or there may be
an insufficient number of DA-MPs configured to handle the network load. The ingress
traffic rate of each DA-MP can be monitored from Status & Manage, and then KPIs.
If DA-MPs are in a congestion state, then the offered load to the server site is
exceeding its capacity.
3. It is recommended to contact My Oracle Support for assistance if needed.
Description:
The Enhanced Suspect Binding Feature is enabled.
Severity:
Info
Instance:
N/A
HA Score:
Normal
Throttle Seconds:
0 (zero)
OID:
eagleXgDiameterEnhSuspBindingFeatEnabledNotify
1. Recovery:
• No action required.
Description:
The binding SBR audit function is suppressed by the Enhanced Suspect Binding
Removal feature.
Severity:
Minor
Instance:
PCA
HA Score:
Normal
SCEF (23000-23200, 102801-115001, 390000)
OID:
Recovery:
1. If this condition persists, it may indicate a failure of a PCRF or the need to change
the configuration of the Suspect Binding Removal Rules.
2. It is recommended to contact My Oracle Support for further assistance.
Description:
A managed SBR process cannot be started or has unexpectedly terminated.
Severity:
Major
Instance:
xxx
HA Score:
Normal
OID:
xxx
1. Recovery:
Description:
Diameter message received was not processed as it contained an unsupported
Application Identifier.
Severity:
Info
Instance:
N/A
HA Score:
Normal
Throttle Seconds:
300
OID:
N/A
1. Recovery:
Description:
One or more Universal SBR sub-resources are unavailable
Severity:
Critical, Major
Instance:
<ResourceDomainName>
HA Score:
Normal
OID:
scefUsbrSubresourceUnavailableNotify
Cause:
This alarm is cleared if any of the following conditions are met:
• When a relevant Universal SBR Database administrative state is Disable and the
Operational Status is Providers Detaching or Disable
• When a relevant Universal SBR Reconfiguration Plan administrative state is
Cancel and the Operational Status is Providers Detaching From Target and the
resource user has received notification (from ComAgent) that all of the initial
sub-resources are available
• When a relevant Universal SBR Reconfiguration Plan administrative state is
Complete and the Operational Status is Providers Detaching From Initial and
the resource user has received notification (from ComAgent) that all of the target
sub-resources are available
• The application process (dsr) on the server that asserted the alarm is shut down
• The SCEF application on the server that asserted the alarm is manually Disabled
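The clear conditions listed above can be read as a single predicate. The following is a minimal sketch only: the AlarmContext class and its field names are hypothetical and are not part of any DSR interface, and the state strings are copied from the list above.

```python
from dataclasses import dataclass


@dataclass
class AlarmContext:
    """Hypothetical snapshot of the state referenced by the clear conditions."""
    db_admin_state: str = ""        # Universal SBR Database administrative state
    db_oper_status: str = ""        # Universal SBR Database Operational Status
    plan_admin_state: str = ""      # Reconfiguration Plan administrative state
    plan_oper_status: str = ""      # Reconfiguration Plan Operational Status
    initial_subresources_available: bool = False  # as notified by ComAgent
    target_subresources_available: bool = False   # as notified by ComAgent
    dsr_process_running: bool = True
    scef_app_enabled: bool = True


def alarm_should_clear(ctx: AlarmContext) -> bool:
    """Return True if any one of the five documented clear conditions holds."""
    if ctx.db_admin_state == "Disable" and ctx.db_oper_status in (
        "Providers Detaching",
        "Disable",
    ):
        return True
    if (
        ctx.plan_admin_state == "Cancel"
        and ctx.plan_oper_status == "Providers Detaching From Target"
        and ctx.initial_subresources_available
    ):
        return True
    if (
        ctx.plan_admin_state == "Complete"
        and ctx.plan_oper_status == "Providers Detaching From Initial"
        and ctx.target_subresources_available
    ):
        return True
    # The dsr process is shut down, or the SCEF application is manually disabled.
    return (not ctx.dsr_process_running) or (not ctx.scef_app_enabled)
```

Note that the two Reconfiguration Plan conditions also require the ComAgent notification; the administrative state and Operational Status alone are not sufficient.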
Diagnostic Information:
N/A
1. Recovery:
Description:
Diameter message received was not processed as it contained an unsupported
Command Code.
Severity:
Info
Instance:
N/A
HA Score:
Normal
Throttle Seconds:
300
OID:
N/A
1. Recovery:
Description:
HTTP message received could not be processed due to an error.
Severity:
Info
Instance:
N/A
HA Score:
Normal
Throttle Seconds:
300
OID:
N/A
1. Recovery:
Description:
Message processing failed because a required configuration was not found.
Severity:
Major
Instance:
N/A
HA Score:
Normal
OID:
scefConfigurationErrorNotify
Cause:
This alarm is triggered by a transient condition (for example, receipt of an ingress
message) and is cleared automatically <Auto Clear Secs> after the last time the
condition occurs.
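The auto-clear behavior described above can be modeled as a timer that restarts on every occurrence of the condition. This is an illustrative sketch, not the platform implementation: the function name is hypothetical, times are plain seconds, and clearing exactly at the Auto Clear Secs boundary is an assumption.

```python
def is_alarm_active(occurrence_times, now, auto_clear_secs):
    """Return True while the alarm is still asserted at time `now`.

    occurrence_times: times (in seconds) at which the transient condition occurred.
    auto_clear_secs:  stand-in for the <Auto Clear Secs> value of the alarm.
    """
    past = [t for t in occurrence_times if t <= now]
    if not past:
        return False  # the condition has never occurred, so nothing is asserted
    # The timer is measured from the most recent occurrence, not the first one.
    return now - max(past) < auto_clear_secs
```

For example, with an auto-clear value of 30 seconds and occurrences at 0 and 10 seconds, the alarm is still active at 20 seconds and has cleared by 40 seconds.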
Diagnostic Information:
N/A
1. Recovery:
• No action required.
Description:
Diameter message received was not processed due to protocol errors.
Severity:
Info
Instance:
N/A
HA Score:
Normal
Throttle Seconds:
300
OID:
N/A
1. Recovery:
Description:
HTTP message received was not processed due to protocol errors.
Severity:
Info
Instance:
N/A
HA Score:
Normal
Throttle Seconds:
300
OID:
N/A
1. Recovery:
Description:
SCEF-MP server received an error response from the Universal SBR server.
Severity:
Info
Instance:
N/A
HA Score:
Normal
Throttle Seconds:
300
OID:
N/A
1. Recovery:
Description:
Diameter request could not be routed by the local Diameter Stack.
Severity:
Info
Instance:
N/A
HA Score:
Normal
Throttle Seconds:
300
OID:
N/A
1. Recovery:
Description:
This event is raised when ACL is not configured for SCS.
Severity:
Info
Instance:
ScsAsId
1. Recovery:
• Configure an ACL for the ScsAs by adding an entry to the ScefACL table and
associating it with the ScsAs.
Description:
This event is raised each time queue utilization for USBR response task exceeds the
configured threshold value.
Severity:
Major
Instance:
None
HA Score:
Normal
1. Recovery:
Description:
This event is raised each time queue utilization for SCEF polling task exceeds the
configured threshold value.
Severity:
Major
Instance:
None
HA Score:
Normal
1. Recovery:
• If this event is observed consistently, there may be too many concurrent events
received for the same subscriber. Monitor the USBR alarms and measurements to
identify the issue.
102801 -
Event Type:
SCEF
Description:
An alarm was raised from the policy rule file.
Severity:
Major
Instance:
N/A
HA Score:
Normal
Throttle Seconds:
###
OID:
N/A
1. Recovery:
1. Check the log for a stack trace.
2. It is recommended to contact My Oracle Support if further assistance is needed.
102826 -
Event Type:
SCEF
Description:
The application does not exist or it is in an inactive state.
Severity:
Major
Instance:
N/A
HA Score:
Normal
Throttle Seconds:
###
OID:
N/A
1. Recovery:
1. Create an application instance if one does not exist.
2. Make the application active if the state is inactive.
102827 -
Event Type:
SCEF
Description:
The service provider or application cannot be resolved.
Severity:
Major
Instance:
N/A
HA Score:
Normal
Throttle Seconds:
###
OID:
N/A
102828 -
Event Type:
SCEF
Description:
The request rate is higher than the rate stated in the Service Level Agreement for the
service type.
Severity:
Major
Instance:
N/A
HA Score:
Normal
Throttle Seconds:
###
OID:
N/A
102829 -
Event Type:
SCEF
Description:
The quota for the service type stated in the Service Level Agreement is exceeded.
Severity:
Major
Instance:
N/A
HA Score:
Normal
Throttle Seconds:
###
OID:
N/A
102830 -
Event Type:
SCEF
Description:
Properties from application are not allowed.
Severity:
Major
Instance:
N/A
HA Score:
Normal
Throttle Seconds:
###
OID:
N/A
102831 -
Event Type:
SCEF
Description:
The value from a parameter in the application is not allowed.
Severity:
Major
Instance:
N/A
HA Score:
Normal
Throttle Seconds:
###
OID:
N/A
1. Recovery:
1. Notify the service provider of the application behavior or update the SLA to allow
the parameter value.
2. It is recommended to contact My Oracle Support if further assistance is needed.
102832 -
Event Type:
SCEF
Description:
The RequestInfo object is empty and cannot proceed with the request.
Severity:
Major
Instance:
N/A
HA Score:
Normal
Throttle Seconds:
###
OID:
N/A
102833 -
Event Type:
SCEF
Description:
An application tried to use a method that is not allowed according to the SLA.
Severity:
Minor
Instance:
N/A
HA Score:
Normal
Throttle Seconds:
###
OID:
N/A
102834 -
Event Type:
SCEF
Description:
An application tried to use a method that is not allowed according to the SLA.
Severity:
Major
Instance:
N/A
HA Score:
Normal
Throttle Seconds:
###
OID:
N/A
102835 -
Event Type:
SCEF
Description:
A service correlator threw an exception when it was invoked.
Severity:
Critical
Instance:
N/A
HA Score:
Normal
Throttle Seconds:
###
OID:
N/A
102836 -
Event Type:
SCEF
Description:
The RequestFactory threw an exception when it was invoked.
Severity:
Critical
Instance:
N/A
HA Score:
Normal
Throttle Seconds:
###
OID:
N/A
102837 -
Event Type:
SCEF
Description:
Could not find a global node or service provider node SLA.
Severity:
Major
Instance:
N/A
HA Score:
Normal
Throttle Seconds:
###
OID:
N/A
102838 -
Event Type:
SCEF
Description:
The service contract in the SLA for the service provider group or application group
has expired.
Severity:
Major
Instance:
N/A
HA Score:
Normal
Throttle Seconds:
###
OID:
N/A
102839 -
Event Type:
SCEF
Description:
The application or service provider group service type contract is out of date. The
service contract for the service type in the SLA for the service provider group or
application group has expired.
Severity:
Major
Instance:
N/A
HA Score:
Normal
Throttle Seconds:
###
OID:
N/A
102840 -
Event Type:
SCEF
Description:
The service contract for the service type in the SLA for the service provider group or
application group could not be found.
Severity:
Major
Instance:
N/A
HA Score:
Normal
Throttle Seconds:
###
OID:
N/A
102844 -
Event Type:
SCEF
Description:
The application or service provider group within the service contract has expired.
Severity:
Major
Instance:
N/A
HA Score:
Normal
Throttle Seconds:
###
OID:
N/A
102845 -
Event Type:
SCEF
Description:
The request rate is higher than the rate specified in the composed service contract.
Severity:
Major
Instance:
N/A
HA Score:
Normal
Throttle Seconds:
###
OID:
N/A
102846 -
Event Type:
SCEF
Description:
The quota for the composed service contract has been exceeded.
Severity:
Major
Instance:
N/A
HA Score:
Normal
Throttle Seconds:
###
OID:
N/A
111007 -
Event Type:
SCEF
Description:
The value of the budget is below 20% of the maximum value.
Severity:
Minor
Instance:
N/A
HA Score:
Normal
Throttle Seconds:
###
OID:
N/A
1. Recovery:
1. Inform the service provider that the request limit is about to be reached, or
update the SLA.
2. It is recommended to contact My Oracle Support if further assistance is needed.
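The 20% condition in the description of event 111007 can be expressed as a simple comparison. This is a sketch under stated assumptions: the function and parameter names are hypothetical, and a budget exactly at 20% is treated as not yet below the threshold.

```python
LOW_BUDGET_PERCENT = 20  # threshold taken from the event description


def budget_is_low(current_budget, max_budget):
    """Return True when the budget is below 20% of its maximum value."""
    if max_budget <= 0:
        raise ValueError("max_budget must be positive")
    # Compare using multiplication so that integer inputs avoid rounding.
    return current_budget * 100 < LOW_BUDGET_PERCENT * max_budget
```

With a maximum budget of 100 requests, a remaining budget of 19 satisfies the condition and a remaining budget of 20 does not.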
115001 -
Event Type:
SCEF
Description:
An SLA is about to expire.
Severity:
Warning
Instance:
N/A
HA Score:
Normal
Throttle Seconds:
###
OID:
N/A
390000 -
Event Type:
SCEF
Description:
An incoming request violated a firewall policy.
Severity:
Warning
Instance:
N/A
HA Score:
Normal
Throttle Seconds:
###
OID:
N/A
1. Recovery:
1. This is a security alert, rather than a Services Gatekeeper problem. The action you
take depends on your security policies.
2. It is recommended to contact My Oracle Support if further assistance is needed.
Tekelec Virtual Operating Environment, TVOE (24400-24499)
Description:
This alarm indicates that the libvirtd daemon is not running.
Severity:
Major
HA Score:
Normal
OID:
1.3.6.1.4.1.323.5.3.31.1.1.2.1
Alarm ID:
TKSTVOEMA1
1. Recovery:
Description:
This alarm indicates that an attempt was made to determine whether the libvirtd
daemon is responding, and the daemon did not respond.
Severity:
Major
HA Score:
Normal
OID:
1.3.6.1.4.1.323.5.3.31.1.1.2.2
Alarm ID:
TKSTVOEMA2
1. Recovery:
Description:
This alarm indicates that all twenty connections to libvirtd are in use and more could
be killed.
Severity:
Major
HA Score:
Normal
OID:
1.3.6.1.4.1.323.5.3.31.1.1.2.3
Alarm ID:
TKSTVOEMA3
1. Recovery:
Computer Aided Policy Making, CAPM (25000-25499)
Description:
The Rule Template failed to update because of syntax errors. The Additional Info of
the Historical alarm includes the name of the Rule Template that failed to be updated.
When the alarm is caused by a CAPM Rule Template that contains a syntax error, it
may not be raised immediately after the template is applied, but may occur when the
first Rule has been provisioned and committed.
Severity:
Minor
Instance:
<ruleset> or <ruleset:rule-id>
HA Score:
Normal
OID:
eagleXgDiameterCapmUpdateFailedNotify
1. Recovery:
1. Check the CAPM Rule Template and verify that the left-hand side term of each
condition contains a valid Linking-AVP or Select expression.
A typical problem is a nonexistent expression or a syntax error in a custom-defined
Select expression. If the CAPM Rule Template contains a syntax error,
create a new Rule Template by copying and modifying the existing one, and then
delete the old Rule Template.
2. Verify also that the recently provisioned data of the Rule Template does not
contain a syntax error; for example, the regular expressions are correct and the
fields expecting numbers contain only numbers.
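The data checks in step 2 can be rehearsed offline before provisioning. The sketch below is illustrative only: the function and field names are hypothetical, and it uses Python's regular expression syntax, which may differ from the dialect that CAPM accepts.

```python
import re


def validate_rule_fields(regex_fields, numeric_fields):
    """Return a list of (field_name, problem) pairs; an empty list means no problem found.

    regex_fields:   mapping of field name to a regular expression string.
    numeric_fields: mapping of field name to the string entered for a number.
    """
    problems = []
    for name, pattern in regex_fields.items():
        try:
            re.compile(pattern)  # raises re.error when the pattern is malformed
        except re.error as exc:
            problems.append((name, f"invalid regular expression: {exc}"))
    for name, value in numeric_fields.items():
        # Accept ASCII digits only; an empty string is also reported.
        if not (value.isascii() and value.isdigit()):
            problems.append((name, "expected digits only"))
    return problems
```

A pattern such as "([a-z" is reported because its character set is never closed, and a numeric field containing "1x" is reported because it is not digits only.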
Description:
When a new Rule Template is created, a failure occurs when performing the action.
Severity:
Info
Instance:
<ruleset> or <ruleset:rule-id>
HA Score:
Normal
Throttle Seconds:
30
OID:
eagleXgDiameterCapmActionFailedNotify
1. Recovery:
• Check the reasons the action failed. It may be a lack of system resources to
perform an action, or the action may refer to a part of the message that is not
available.
Description:
When Action Error Handling is set to ‘immediately exit from the rule template’ for the
given Rule Template and a failure occurs when performing the action, processing of
the Rule Template is stopped.
Severity:
Info
Instance:
<ruleset> or <ruleset:rule-id>
HA Score:
Normal
Throttle Seconds:
30
OID:
eagleXgDiameterCapmExitRuleFailedNotify
1. Recovery:
• No action required.
Description:
When Action Error Handling is set to ‘immediately exit from the trigger point’ for the
given Rule Template and a failure occurs when performing the action, processing of
the Rule Template is stopped (subsequent templates within the trigger point are also
skipped).
Severity:
Info
Instance:
<ruleset> or <ruleset:rule-id>
HA Score:
Normal
Throttle Seconds:
30
OID:
eagleXgDiameterCapmExitTriggerFailedNotify
1. Recovery:
• No action required.
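The difference between the two Action Error Handling settings described above can be sketched as follows. This is a simplified model, not the CAPM engine: templates are plain tuples, actions are (label, succeeds) pairs, and the "continue" mode is a hypothetical default added only so that the contrast is visible.

```python
EXIT_TEMPLATE = "immediately exit from the rule template"
EXIT_TRIGGER = "immediately exit from the trigger point"
CONTINUE = "continue"  # hypothetical mode: keep processing after a failed action


def run_trigger_point(templates):
    """Return the labels of the actions attempted, in order.

    templates: list of (template_name, error_handling_mode, actions), where
    actions is a list of (label, succeeds) pairs.
    """
    attempted = []
    for _name, mode, actions in templates:
        exit_trigger = False
        for label, succeeds in actions:
            attempted.append(label)
            if succeeds:
                continue
            if mode == EXIT_TEMPLATE:
                break  # stop this template; later templates still run
            if mode == EXIT_TRIGGER:
                exit_trigger = True  # stop this template and skip the rest
                break
        if exit_trigger:
            break
    return attempted
```

With a failing first action in template T1 followed by a second template T2, the first setting still reaches the actions of T2, while the second setting stops after the failed action.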
Description:
Script syntax error
Severity:
Minor
Instance:
<script name>
HA Score:
Normal
OID:
eagleXgDiameterCapmScriptLoadingFailedNotify
1. Recovery:
Description:
CAPM Generic Event
Severity:
Info
Instance:
<template-id:rule-id>
HA Score:
Normal
Throttle Seconds:
30
OID:
eagleXgDiameterCapmGenericInfoAlarmNotify
1. Recovery:
Description:
CAPM Generic Alarm - Minor
Severity:
Minor
Instance:
<template-id:rule-id>
HA Score:
Normal
OID:
eagleXgDiameterCapmGenericMinorAlarmNotify
1. Recovery:
Description:
CAPM Generic Alarm - Major
Severity:
Major
Instance:
<template-id:rule-id>
HA Score:
Normal
OID:
eagleXgDiameterCapmGenericMajorAlarmNotify
1. Recovery:
Description:
CAPM Generic Alarm - Critical
Severity:
Critical
Instance:
<template-id:rule-id>
HA Score:
Normal
OID:
eagleXgDiameterCapmGenericCriticalAlarmNotify
1. Recovery:
OAM Alarm Management (25500-25899)
Description:
This alarm occurs when no active DA-MP leaders have been detected.
Severity:
Critical
Instance:
<NetworkElement>
HA Score:
Normal
OID:
eagleXgDiameterNoDaMpLeaderDetectedNotify
Cause:
Alarm 25500 is raised:
• When no active DA-MP leaders are reported by the maintenance leader.
• When there is a single DA-MP and the DSR process is stopped.
• When there are multiple DA-MPs, the DSR process is stopped, and there is a
ComAgent connection failure between two or more DA-MPs.
The alarm clears when the maintenance leader reports a single active DA-MP leader.
Diagnostic Information:
1. Examine the alarm log from Main Menu > Alarms & Events on Active SOAM
Server.
2. This alarm is raised against the Network Element when no DA-MPs report
themselves as Leader.
1. Recovery:
1. Verify the MP operational status of the DA-MP from the Diameter, and then
Maintenance, and then DA-MP active SOAM screen.
a. Verify the # Peer MPs Unavailable column displays 0 for each DA-MP server.
b. Verify all DA-MP servers are available in individual DA-MP server tabs on the
Diameter, and then Maintenance, and then DA-MP active SOAM screen.
c. Verify ComAgent inter-MP connections (auto) are in the InService state on
the Communication Agent, and then Maintenance, and then Connection
Status screen.
Description:
This alarm occurs when multiple active DA-MP leaders have been detected.
Severity:
Critical
Instance:
<NetworkElement>
HA Score:
Normal
OID:
eagleXgDiameterMultipleDaMpLeadersDetectedNotify
Cause:
Alarm 25510 is raised:
• When more than one DA-MP reports itself as Leader.
• When the DSR process is running on all DA-MPs and the ComAgent connection is down
between two or more DA-MPs.
The alarm clears when the maintenance leader reports a single active DA-MP leader.
Diagnostic Information:
• This alarm is raised against the Network Element when multiple DA-MPs report
themselves as Leader.
• Examine the alarm log from Main Menu > Alarms & Events on Active SOAM
Server.
• When this alarm is raised, existing IPFE Connection, Route List, and Peer Node
alarms are cleared.
• New IPFE Connection, Route List, and Peer Node alarms are suppressed.
1. Recovery:
1. Verify the MP operational status of the DA-MP from the Diameter, and then
Maintenance, and then DA-MP active SOAM screen.
a. Verify the # Peer MPs Unavailable column displays 0 for each DA-MP server.
b. Verify all DA-MP servers are available in individual DA-MP server tabs on the
Diameter, and then Maintenance, and then DA-MP active SOAM screen.
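The relationship between the number of DA-MPs reporting themselves as Leader and alarms 25500 and 25510 can be summarized in a short sketch. The function name and the input mapping are hypothetical; only the alarm numbers and the single-leader clear condition come from the text above.

```python
def leader_alarm(leader_reports):
    """Return the alarm number to assert, or None when exactly one leader exists.

    leader_reports: mapping of DA-MP name to True if it reports itself as Leader.
    """
    leaders = sum(1 for is_leader in leader_reports.values() if is_leader)
    if leaders == 0:
        return 25500  # no active DA-MP leader detected
    if leaders > 1:
        return 25510  # multiple active DA-MP leaders detected
    return None       # a single active leader: both alarms clear
```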
Description:
Peer discovery failure.
Severity:
Minor
Instance:
Discover_Realm_{realm_name} where {realm_name} is the full configured name of
the Realm whose discovery has failed.
HA Score:
Normal
OID:
eagleXgDiameterDpdRealmDiscoveryFailedNotify
1. Recovery:
1. Analyze event 25801 - Peer Discovery Configuration Error Encountered that has
the same instance to identify the error(s).
2. Verify the DSR and DNS configurations and fix any configuration error(s).
3. Administratively refresh the Realm.
4. It is recommended to contact My Oracle Support for assistance.
Description:
Peer discovery configuration error encountered.
Severity:
Info
Instance:
Discover_Realm_{realm_name} where {realm_name} is the full configured name of
the Realm whose discovery has encountered a configuration error.
HA Score:
Normal
Throttle Seconds:
0 (zero)
OID:
eagleXgDiameterDpdConfigErrorNotify
1. Recovery:
1. Depending on the specific error code, follow the appropriate recovery steps.
Note:
One likely cause is that the number of instances of a managed object (MO) type
is at capacity, and no new instances can be created. The user can delete
unused instances of the MO type to free up capacity and try the Realm
discovery again.
Description:
Realm expiration approaching.
Severity:
Minor, Major
Instance:
Discover_Realm_{realm_name} where {realm_name} is the full configured name of
the Realm whose expiry is approaching.
HA Score:
Normal
OID:
eagleXgDiameterDpdConfigErrorNotify
1. Recovery:
1. Administratively disable the Realm.
2. Administratively extend the Realm.
3. Administratively refresh the Realm.
4. It is recommended to contact My Oracle Support for assistance.
Description:
Peer discovery - inconsistent remote host port assignment.
Severity:
Info
Instance:
Discover_Realm_{realm_name} where {realm_name} is the full configured name
of the Realm whose discovery has encountered inconsistent remote host port
assignment.
HA Score:
Normal
Throttle Seconds:
0 (zero)
OID:
eagleXgDiameterDpdInconsistentPortAssignmentNotify
1. Recovery:
• No action required. The DNS records for the Realm being discovered must be
corrected by the Realm's DNS administrator.
Description:
Peer discovery state change.
Severity:
Info
Instance:
Discover_Realm_{realm_name} where {realm_name} is the full configured name of
the Realm whose discovery state has changed.
HA Score:
Normal
Throttle Seconds:
0 (zero)
OID:
eagleXgDiameterDpdInconsistentPortAssignmentNotify
1. Recovery:
• No action required.
Platform (31000-32800)
This section provides information and recovery procedures for the Platform alarms,
ranging from 31000-32800.
Description:
Program impaired by s/w fault
Severity:
Minor
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
comcolSwFaultNotify
1. Recovery:
• No action is required. This event is used for command-line tool errors only.
Description:
Program status
Severity:
Info
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
comcolSwStatusNotify
1. Recovery:
• No action required.
Description:
Process watchdog timed out.
Severity:
Minor
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
comcolProcWatchdogFailureNotify
1. Recovery:
1. Alarm indicates a stuck process was automatically recovered, so no additional
steps are needed.
2. If this problem persists, collect savelogs, and it is recommended to contact My
Oracle Support.
Description:
Tab thread watchdog timed out
Severity:
Minor
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
comcolThreadWatchdogFailureNotify
Cause:
This alarm is caused by an application thread which fails to respond to the platform
process management subsystem heartbeat within the defined time period. The actual
cause may vary depending on the differing threads and defined time periods.
Diagnostic Information:
Collect the following data before contacting My Oracle Support for assistance.
• iqt -Ep PmControl on the issuing server.
• Savelogs_Plat on the issuing server.
• Alarm history from active SOAM server.
1. Recovery:
1. Alarm indicates an application failed to respond to the platform process
management subsystem heartbeat within the defined period. Export event history
for the given process to narrow the actual cause.
2. If this problem persists, collect Savelogs and it is recommended to contact My
Oracle Support.
Description:
The database replication process is impaired by a software fault.
Severity:
Minor
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
eagleXgDsrDbRepToSlaveFailureNotify
1. Recovery:
1. Export event history for the affected server and inetsync task.
Description:
Database replication to a slave database has failed. This alarm is generated when:
• The replication master finds the replication link is disconnected from the slave.
• The replication master's link to the replication slave is OOS, or the replication
master cannot get the slave's correct HA state because of a failure to
communicate.
• The replication mode is relayed in a cluster and either:
– No nodes are active in cluster, or
– None of the nodes in cluster are getting replication data.
Severity:
Critical
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
comcolDbRepToSlaveFailureNotify
Cause:
Alarm 31101 is raised when:
• The replication master finds the replication link is disconnected from the slave.
• The replication master's link to the replication slave is OOS, or the replication
master cannot get the slave's correct HA state because of a failure to communicate.
• The replication mode is relayed in a cluster and either:
– No nodes are active in cluster, or
– None of the nodes in cluster are getting replication data.
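The raise conditions listed above combine as a logical OR. The following sketch shows that combination; the function and parameter names are hypothetical and do not correspond to a documented interface.

```python
def replication_to_slave_failed(
    link_connected,
    link_oos,
    slave_ha_state_known,
    relayed_cluster=False,
    active_nodes=0,
    nodes_receiving_data=0,
):
    """Return True if any documented raise condition for alarm 31101 holds."""
    if not link_connected:
        return True  # the master finds the replication link disconnected
    if link_oos or not slave_ha_state_known:
        return True  # link is OOS, or the slave HA state cannot be obtained
    if relayed_cluster and (active_nodes == 0 or nodes_receiving_data == 0):
        return True  # relayed mode with no active or no receiving nodes
    return False
```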
Diagnostic Information:
1. Verify the path for all services on a node:
a. In a command interface, type path.test -a <toNode> to test the paths
for all services.
2. In a command interface, use the path test commands to test the communication
between nodes:
a. Run the command, iqt -pE NodeInfo to get the node ID
b. Then, run the command, path.test -a <nodeid> to test the paths for all
services
3. Examine the Platform savelogs on all MPs, SO, and NO:
a. Run the command, sudo /usr/TKLC/plat/sbin/savelogs_plat
b. The plat savelogs are in the /tmp directory.
1. Recovery:
1. Verify the path for all services on a node by typing path.test -a <toNode> in
a command interface to test the paths for all services.
2. Use the path test command to test the communication between nodes by typing
iqt -pE NodeInfo to get the node ID. Then type path.test -a <nodeid>
to test the paths for all services.
3. Examine the Platform savelogs on all MPs, SO, and NO by typing sudo /usr/
TKLC/plat/sbin/savelogs_plat in the command interface. The plat
savelogs are in the /tmp directory.
4. Check network connectivity between the affected servers.
5. If there are no issues with network connectivity, contact My Oracle Support.
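The command sequence in steps 1 through 3 can be assembled as a checklist for an operator to run on the affected server. This sketch only builds the command strings that appear in the steps above; it does not execute them. The function name and the example node ID are hypothetical.

```python
import shlex


def replication_diagnostic_commands(node_id):
    """Return, in order, the commands named in the recovery steps.

    node_id: the node ID read from the output of the first command.
    """
    return [
        "iqt -pE NodeInfo",                        # look up the node ID
        f"path.test -a {shlex.quote(node_id)}",    # test the paths for all services
        "sudo /usr/TKLC/plat/sbin/savelogs_plat",  # collect the Platform savelogs
    ]


if __name__ == "__main__":
    # "NODE_A" is a placeholder, not a real node ID format.
    for command in replication_diagnostic_commands("NODE_A"):
        print(command)
```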
Description:
Database replication from a master database has failed. This alarm is generated
when the replication slave finds the replication link is disconnected from the master.
Severity:
Minor
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
comcolDbRepFromMasterFailureNotify
Cause:
Alarm 31102 is raised when the replication slave finds the replication link is
disconnected from the master.
Diagnostic Information
1. Verify the path for all services on a node:
a. In a command interface, run the command, path.test -a <toNode> to
test the paths for all services.
2. In a command interface, use the path test command to test the communication:
a. Run the command, iqt -pE NodeInfo to get the node ID
b. Run the command, path.test -a <nodeid> to test the communication
path
3. Examine the Platform savelogs on all MPs, SO, and NO:
a. Run the command, sudo /usr/TKLC/plat/sbin/savelogs_plat
b. The plat savelogs are in the /tmp directory.
1. Recovery:
1. Verify the path for all services on a node by typing path.test -a <toNode> in a
command interface to test the paths for all services.
2. Use the path test command to test the communication between nodes by typing
iqt -pE NodeInfo to get the node ID. Then type path.test -a <nodeid> to test
the paths for all services.
3. Examine the Platform savelogs on all MPs, SO, and NO by typing sudo /usr/
TKLC/plat/sbin/savelogs_plat in the command interface. The plat savelogs are
in the /tmp directory.
4. This alarm indicates that the replication subsystem is unable to contact a server,
due to networking issues or because the server is not available. Investigate the
status of the server and verify network connectivity.
5. If no issues with network connectivity or the server are found and the problem
persists, it is recommended to contact My Oracle Support.
Description:
Database replication process cannot apply update to database.
Severity:
Minor
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
comcolDbRepUpdateFaultNotify
1. Recovery:
1. This alarm indicates a transient error occurred within the replication subsystem,
but the system has recovered, so no additional steps are needed.
2. If the problem persists, collect savelogs, and it is recommended to contact My
Oracle Support.
Description:
Database replication latency has exceeded thresholds.
Severity:
Major
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
eagleXgDsrDbRepLatencyNotify
1. Recovery:
1. If this alarm is raised occasionally for short time periods (a couple of minutes
or less), it may indicate network congestion or spikes of traffic pushing servers
beyond their capacity. Consider re-engineering network capacity or subscriber
provisioning.
2. If this alarm does not clear after a couple of minutes, it is recommended to contact
My Oracle Support.
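The two recovery steps above amount to a triage on how long the alarm has been active. The sketch below assumes that "a couple of minutes" means 120 seconds; that value, the function name, and the returned labels are illustrative assumptions and are not platform settings.

```python
TRANSIENT_LIMIT_SECS = 120  # assumption: "a couple of minutes"


def triage_latency_alarm(active_secs):
    """Map the time the alarm has been active to the suggested next step."""
    if active_secs <= TRANSIENT_LIMIT_SECS:
        # Short-lived: possibly network congestion or a traffic spike.
        return "monitor"
    # The alarm has not cleared after a couple of minutes.
    return "contact-support"
```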
Description:
The database merge process (inetmerge) is impaired by a s/w fault
Severity:
Minor
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
comcolDbMergeFaultNotify
1. Recovery:
1. This alarm indicates a transient error occurred within the merging subsystem, but
the system has recovered, so no additional steps are needed.
2. If the problem persists, collect savelogs, and it is recommended to contact My
Oracle Support.
Description:
Database merging to the parent Merge Node has failed.
Severity:
Minor
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
comcolDbMergeToParentFailureNotify
Cause:
DB merging to the Parent Merge Node has failed.
Diagnostic Information:
• Check if the states are either Active or Standby (for example, none are
DownConnecting or Auditing).
• Check if there are issues with merging, replication, or communication. Verify that
the primary active NO can communicate with the affected server and vice versa by
running the path.test command.
Note:
If checking information for an MP server, also check the SOAM server that it
would merge to or receive replicated data from:
• soapstat -w
• irepstat -w
• inetmstat -w
• path.test -a -r
Note:
In older releases, the '-r' option is not available.
• cat /var/tmp/dbreinitstate
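The first diagnostic check, that every state is either Active or Standby, can be applied to states that an operator has already collected. The sketch below assumes the states have been gathered into a mapping by hand; it does not parse the output of any of the commands listed above, and the server names are placeholders.

```python
HEALTHY_STATES = {"Active", "Standby"}


def unhealthy_servers(states):
    """Return the sorted names of servers whose state is not Active or Standby.

    states: mapping of server name to the state observed by the operator.
    """
    return sorted(name for name, state in states.items() if state not in HEALTHY_STATES)
```

For example, a server observed in the DownConnecting or Auditing state is returned for follow-up, while Active and Standby servers are not.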
1. Recovery:
1. This alarm indicates the merging subsystem is unable to contact a server, due to
networking issues or because the server is not available. Investigate the status of
the server and verify network connectivity.
2. If no issues with network connectivity or the server are found and the problem
persists, it is recommended to contact My Oracle Support.
Description:
Database merging from a child Source Node has failed.
Severity:
Major
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
comcolDbMergeFromChildFailureNotify
1. Recovery:
1. This alarm indicates the merging subsystem is unable to contact a server, due to
networking issues or because the server is not available. Investigate the status of
the server and verify network connectivity.
2. If no issues with network connectivity or the server are found and the problem
persists, it is recommended to contact My Oracle Support.
Description:
Database merge latency has exceeded thresholds.
Severity:
Minor
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
comcolDbMergeLatencyNotify
1. Recovery:
1. If this alarm is raised occasionally for short time periods (a couple of minutes
or less), it may indicate network congestion or spikes of traffic pushing servers
beyond their capacity. Consider re-engineering network capacity or subscriber
provisioning.
2. If this alarm does not clear after a couple of minutes, it is recommended to contact
My Oracle Support.
Description:
Topology is configured incorrectly.
Severity:
Minor
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
comcolTopErrorNotify
1. Recovery:
1. This alarm may occur during initial installation and configuration of a server. No
action is necessary at that time.
2. If this alarm occurs after successful initial installation and configuration of a server,
it is recommended to contact My Oracle Support.
Description:
The Database service process (idbsvc) is impaired by a s/w fault.
Severity:
Minor
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
comcolDbAuditFaultNotify
1. Recovery:
1. Alarm indicates an error occurred within the database audit system, but the
system has recovered, so no additional steps are needed.
2. If this problem persists, collect savelogs, and it is recommended to contact My
Oracle Support.
Description:
Database Merge Audit between mate nodes in progress
Severity:
Minor
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
comcolDbMergeAuditNotify
1. Recovery:
• No action required.
Description:
DB Replicated data may not have transferred in the time allotted.
Severity:
Minor
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
comcolDbRepUpLogTransTimeoutNotify
1. Recovery:
1. No action required.
2. It is recommended to contact My Oracle Support if this occurs frequently.
Description:
DB Replication Manually Disabled
Severity:
Minor
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
comcolDbReplicationManuallyDisabledNotify
1. Recovery:
• No action required.
Description:
Database replication of configuration data via SOAP has failed.
Severity:
Minor
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
comcolDbReplicationSoapFaultNotify
1. Recovery:
1. This alarm indicates a SOAP subsystem is unable to connect to a server, due to
networking issues or because the server is not available. Investigate the status of
the server and verify network connectivity.
2. If no issues with network connectivity or the server are found and the problem
persists, it is recommended to contact My Oracle Support.
Description:
The Database service process (idbsvc) is impaired by a s/w fault.
Severity:
Minor
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
comcolDbServiceFaultNotify
1. Recovery:
1. This alarm indicates an error occurred within the database disk service subsystem, but
the system has recovered, so no additional steps are needed.
2. If this problem persists, it is recommended to collect savelogs and contact My
Oracle Support.
Description:
The amount of shared memory consumed exceeds configured thresholds.
Severity:
Major
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
eagleXgDsrExcessiveSharedMemoryConsumptionNotify
1. Recovery:
• This alarm indicates a server has exceeded the engineered limit for shared
memory usage and there is a risk the application software will fail. Because there
is no automatic recovery for this condition, it is recommended to contact My Oracle
Support.
Description:
The amount of free disk is below configured thresholds.
Severity:
Major
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
eagleXgDsrLowDiskFreeNotify
1. Recovery:
1. Remove unnecessary or temporary files from partitions.
2. If there are no files known to be unneeded, it is recommended to contact My
Oracle Support.
Description:
Writing the database to disk failed
Severity:
Minor
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
comcolDbDiskStoreFaultNotify
1. Recovery:
1. Remove unnecessary or temporary files from partitions.
2. If there are no files known to be unneeded, it is recommended to contact My
Oracle Support.
Description:
The Database update log was overrun increasing risk of data loss
Severity:
Minor
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
comcolDbUpdateLogOverrunNotify
1. Recovery:
1. This alarm indicates a replication audit transfer took too long to complete and the
incoming update rate exceeded the engineered size of the update log. The system
will automatically retry the audit, and if successful, the alarm will clear and no
further recovery steps are needed.
2. If the alarm occurs repeatedly, it is recommended to contact My Oracle Support.
Description:
A Database change cannot be stored in the updatelog
Severity:
Minor
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
comcolDbUpdateLogWriteFaultNotify
1. Recovery:
1. This alarm indicates an error has occurred within the database update log
subsystem, but the system has recovered.
2. If the alarm occurs repeatedly, it is recommended to contact My Oracle Support.
Description:
The amount of free disk is below configured early warning thresholds
Severity:
Minor
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
comcolLowDiskFreeEarlyWarningNotify
1. Recovery:
1. Remove unnecessary or temporary files from partitions that are greater than 80%
full.
2. If there are no files known to be unneeded, it is recommended to contact My
Oracle Support.
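The 80% figure in step 1 can be checked with a short script. The following is a minimal Python sketch, not an Oracle-provided tool. The function names and the list of paths are illustrative assumptions, and the percentage is computed as used/total, which can differ slightly from the figure reported by df.

```python
import shutil

def percent_used(path):
    """Return the percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return 100.0 * usage.used / usage.total

def partitions_over(paths, threshold=80.0):
    """Return (path, percent) pairs for paths whose filesystem exceeds threshold."""
    results = []
    for path in paths:
        pct = percent_used(path)
        if pct > threshold:
            results.append((path, round(pct, 1)))
    return results

if __name__ == "__main__":
    # Hypothetical mount points; replace with the partitions on the server.
    for path, pct in partitions_over(["/", "/tmp", "/var"]):
        print(f"{path}: {pct}% used")
```

Any path printed by the script is a candidate for cleanup under step 1.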
Description:
The amount of shared memory consumed exceeds configured early warning
thresholds
Severity:
Minor
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
comcolExcessiveShMemConsumptionEarlyWarnNotify
1. Recovery:
1. This alarm indicates that a server is close to exceeding the engineered limit for
shared memory usage and the application software is at risk of failing. There is no
automatic recovery, and there are no recovery steps.
2. It is recommended to contact My Oracle Support.
Description:
ADIC found one or more errors that are not automatically fixable.
Severity:
Info
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
comcolDbRepAuditCmdCompleteNotify
1. Recovery:
• No action required.
Description:
An ADIC detected errors
Severity:
Minor
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
comcolDbRepAuditCmdErrNotify
1. Recovery:
Description:
Database durability has dropped below configured durability level.
Severity:
Major
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
eagleXgDsrDbDurabilityDegradedNotify
1. Recovery:
1. Check configuration of all servers, and check for connectivity problems between
server addresses.
2. If the problem persists, it is recommended to contact My Oracle Support.
Description:
Site audit controls blocked an inter-site replication audit due to the number in progress
per configuration.
Severity:
Major
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
eagleXgDsrAuditBlockedNotify
1. Recovery:
• This alarm indicates the WAN network usage has been limited following a site
recovery. No recovery action is needed.
Description:
DB replication audit completed.
Severity:
Info
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
comcolDbRepAuditCompleteNotify
1. Recovery:
• No action required.
Description:
ADIC found one or more errors that are not automatically fixable.
Severity:
Major
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
eagleXgDsrDbADICErrorNotify
1. Recovery:
1. This alarm indicates a data integrity error was found by the background database
audit mechanism, and there is no automatic recovery.
2. It is recommended to contact My Oracle Support.
Description:
ADIC found one or more minor issues that can most likely be ignored.
Severity:
Minor
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
comcolDbADICWarn
1. Recovery:
• No action required.
Description:
Network health issue detected.
Severity:
Minor
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
comcolNetworkHealthWarningNotify
1. Recovery:
1. Check configuration of all servers, and check for connectivity problems between
server addresses.
2. If the problem persists, it is recommended to contact My Oracle Support.
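The connectivity check in step 1 can be approximated with a TCP connection probe. This is a minimal Python sketch under stated assumptions: the host and port values are hypothetical (this document does not name the ports used between servers), and a successful TCP connect only shows that the address and port are reachable, not that the application path is healthy.

```python
import socket

def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def unreachable(targets, timeout=3.0):
    """Return the (host, port) pairs that could not be reached."""
    return [t for t in targets if not can_connect(t[0], t[1], timeout)]
```

For example, unreachable([("192.0.2.10", 22), ("192.0.2.11", 22)]) returns the subset of those illustrative address and port pairs that refused or timed out.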
Description:
DB ousted throttle may be affecting processes.
Severity:
Minor
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
comcolOustedThrottleWarnNotify
1. Recovery:
1. This alarm indicates a process has failed to release database memory segments,
which is preventing new replication audits from taking place. There is no automatic
recovery for this failure.
2. Run procshm -o to identify involved processes.
3. It is recommended to contact My Oracle Support.
Description
Standby database updates are falling behind. Relaxing the replication barrier to allow
non-standby databases to update as fast as possible.
Severity
Info
Instance
Remote Node Name + HA resource name (if Policy 0, no resource name)
HA Score
Normal
Throttle Seconds
150
OID
comcolDbRepPrecRelaxedNotify
1. Recovery
• No action required.
Description
DB replication active to standby switchover exceeded maximum switchover time.
Severity
Major
Instance
Remote Node Name + HA resource name (if Policy 0, no resource name)
HA Score
Normal
OID
eagleXgDsrDbRepSwitchoverNotify
1. Recovery
1. If this alarm is raised, it may indicate network congestion or spikes of traffic
pushing servers beyond their capacity. Consider re-engineering network capacity
or subscriber provisioning.
2. If the problem persists, it is recommended to contact My Oracle Support.
Description
DB site replication to a slave DB has failed.
Severity
Minor
Instance
Remote Node Name + HA resource name (if Policy 0, no resource name)
HA Score
Normal
OID
comcolDbSiteRepToSlaveFailureNotify
1. Recovery
1. Check configuration of all servers, and check for connectivity problems between
server addresses.
2. If the problem persists, it is recommended to contact My Oracle Support.
Description
DB site replication from a master DB has failed.
Severity
Minor
Instance
Remote Node Name + HA resource name (if Policy 0, no resource name)
HA Score
Normal
OID
comcolDbSiteRepFromMasterFailureNotify
1. Recovery
1. Check configuration of all servers, and check for connectivity problems between
server addresses.
2. If the problem persists, it is recommended to contact My Oracle Support.
Description
Standby site database updates are falling behind. Relaxing the replication barrier to
allow non-standby site databases to update as fast as possible.
Severity
Info
Instance
Remote Node Name + HA resource name (if Policy 0, no resource name)
HA Score
Normal
Throttle Seconds
150
OID
comcolDbSiteRepPrecRelaxedNotify
1. Recovery
• No action required.
Description
DB site replication latency has exceeded thresholds.
Severity
Major
Instance
Remote Node Name + HA resource name (if Policy 0, no resource name)
HA Score
Normal
OID
eagleXgDsrDbSiteRepLatencyNotify
1. Recovery
1. If this alarm is raised occasionally for short time periods (a couple of minutes
or less), it may indicate network congestion or spikes of traffic pushing servers
beyond their capacity. Consider re-engineering network capacity or subscriber
provisioning.
2. If this alarm does not clear after a couple of minutes, it is recommended to contact
My Oracle Support.
Description:
Perl interface to Database is impaired by a s/w fault
Severity:
Minor
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
comcolDbPerlFaultNotify
1. Recovery:
1. This alarm indicates an error has occurred within a Perl script, but the system has
recovered.
2. If the alarm occurs repeatedly, it is recommended to contact My Oracle Support.
Description:
SQL interface to Database is impaired by a s/w fault
Severity:
Minor
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
comcolDbSQLFaultNotify
1. Recovery:
1. This alarm indicates an error has occurred within the MySQL subsystem, but the
system has recovered.
2. If this alarm occurs frequently, it is recommended to collect savelogs and contact
My Oracle Support.
Description:
DB replication is impaired due to no mastering process (inetrep/inetrep).
Severity:
Major
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
eagleXgDsrDbMastershipFaultNotify
1. Recovery:
1. Export event history for the given server.
2. It is recommended to contact My Oracle Support.
Description:
UpSyncLog is not big enough for (WAN) replication.
Severity:
Minor
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
comcolDbUpSyncLogOverrunNotify
1. Recovery:
1. This alarm indicates that an error occurred within the database replication
subsystem. A replication audit transfer took too long to complete, and during the
audit the incoming update rate exceeded the engineered size of the update log.
The replication subsystem will automatically retry the audit, and if successful, the
alarm will clear.
2. If the alarm occurs repeatedly, it is recommended to contact My Oracle Support.
Description:
The DB service process (idbsvc) has detected an IDB lock-related error caused by
another process. The alarm likely indicates a DB lock-related programming error, or it
could be a side effect of a process crash.
Severity:
Minor
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
comcolDbLockErrorNotify
1. Recovery:
1. This alarm indicates an error occurred within the database disk service subsystem,
but the system has recovered.
2. If this alarm occurs repeatedly, it is recommended to contact My Oracle Support.
Description
Application wrote to database while HA role change from active was in progress.
Severity
Minor
Instance
HA resource name
HA Score
Normal
OID
comcolDbLateWriteNotify
1. Recovery
Description:
Database health impacted
Severity:
Critical
Instance:
xxx
HA Score:
xxx
OID:
xxx
1. Recovery:
Description
Persistent database failure
Severity
Critical
Instance:
xxx
HA Score:
xxx
OID:
xxx
1. Recovery:
Description:
The process manager (procmgr) is impaired by a s/w fault
Severity:
Minor
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
comcolProcMgmtFaultNotify
1. Recovery:
1. This alarm indicates an error occurred within the process management subsystem,
but the system has recovered.
2. If this alarm occurs repeatedly, it is recommended to contact My Oracle Support.
Description:
A managed process cannot be started or has unexpectedly terminated.
Severity:
Critical
Instance:
May include process name
HA Score:
Normal
OID:
eagleXgDsrProcNotRunningNotify
Cause:
An internal error occurred and the application shut down abruptly. A managed process
cannot be started or has terminated unexpectedly.
Diagnostic Information:
1. If this alarm is observed during installation of the DSR system, and the alarm
instance is EXGSTACK_Process, make sure the DAMP Profile Assignment procedure is
complete on the active SOAM for all DA-MPs.
2. During application start and shutdown, a temporary error may result while
restarting the application.
a. The alarm automatically clears in 300 seconds if it was caused by a
temporary error that no longer exists.
b. The alarm persists if the error has not been recovered.
3. If the alarm is raised after any unapproved configuration change, revert the
configuration and check whether the alarm clears.
Note:
In a few cases, the alarm may remain for more than 300 seconds even if the
error condition is corrected. In such cases, wait 300 seconds after taking
corrective action before reporting it.
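The 300-second wait described above can be scripted as a simple polling loop. This is a minimal Python sketch: is_alarm_active is a hypothetical caller-supplied function that reports whether the alarm is still raised (this document does not define such an interface), and the 10-second polling interval is an arbitrary choice.

```python
import time

def wait_for_clear(is_alarm_active, timeout_s=300, poll_s=10,
                   clock=time.monotonic, sleep=time.sleep):
    """Poll until the alarm clears or timeout_s elapses.

    is_alarm_active is a caller-supplied callable (hypothetical here) that
    returns True while the alarm is still raised.
    Returns True if the alarm cleared within the window, False otherwise.
    """
    deadline = clock() + timeout_s
    while True:
        if not is_alarm_active():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)
```

A False result means the alarm outlasted the 300-second window and is a candidate for reporting.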
1. Recovery:
1. This alarm indicates a managed process cannot be started or has unexpectedly
terminated.
2. It is recommended to contact My Oracle Support.
Description:
A zombie process exists that cannot be killed by procmgr. procmgr no longer
manages this process.
Severity:
Major
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
eagleXgDsrProcZombieProcessNotify
1. Recovery:
1. This alarm indicates a managed process exited unexpectedly and was unable to
be restarted automatically.
2. It is recommended to collect savelogs and contact My Oracle Support.
Description:
The process manager monitor (pm.watchdog) is impaired by a s/w fault
Severity:
Minor
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
comcolProcMgmtMonFaultNotify
1. Recovery:
1. This alarm indicates an error occurred within the process management subsystem,
but the system has recovered.
2. If this alarm occurs repeatedly, it is recommended to contact My Oracle Support.
Description:
The process resource monitor (ProcWatch) is impaired by a s/w fault
Severity:
Minor
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
comcolProcResourceMonFaultNotify
1. Recovery:
1. This alarm indicates an error occurred within the process monitoring subsystem,
but the system has recovered.
2. If this alarm occurs repeatedly, it is recommended to contact My Oracle Support.
Description:
The run environment port mapper (re.portmap) is impaired by a s/w fault
Severity:
Minor
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
comcolPortServerFaultNotify
1. Recovery:
1. This alarm indicates an error occurred within the port mapping subsystem, but the
system has recovered.
2. If this alarm occurs repeatedly, it is recommended to contact My Oracle Support.
Description:
Unable to resolve a hostname specified in the NodeInfo table.
Severity:
Minor
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
comcolHostLookupFailedNotify
1. Recovery:
1. This typically indicates a DNS Lookup failure. Verify all server hostnames are
correct in the GUI configuration on the server generating the alarm.
2. If the problem persists, it is recommended to contact My Oracle Support.
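Before reviewing the GUI configuration, it can help to confirm which hostnames actually fail to resolve. This is a minimal Python sketch using the standard resolver; the function name and the example hostnames in the usage note are illustrative assumptions, and the hostnames to check would come from the server configuration.

```python
import socket

def unresolvable(hostnames, resolve=socket.getaddrinfo):
    """Return the hostnames that fail name resolution.

    `resolve` defaults to socket.getaddrinfo and can be replaced for testing.
    """
    failed = []
    for name in hostnames:
        try:
            resolve(name, None)
        except socket.gaierror:
            failed.append(name)
    return failed
```

For example, unresolvable(["no-a", "so-b"]) returns whichever of those hypothetical names the resolver on the alarming server cannot look up.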
Description:
The process scheduler (ProcSched/runat) is impaired by a s/w fault
Severity:
Minor
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
comcolProcSchedulerFaultNotify
1. Recovery:
1. This alarm indicates an error occurred within the process management subsystem,
but the system has recovered.
2. If this alarm occurs repeatedly, it is recommended to contact My Oracle Support.
Description:
A scheduled process cannot be executed or terminated abnormally
Severity:
Minor
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
comcolScheduleProcessFaultNotify
1. Recovery:
1. This alarm indicates that a managed process exited unexpectedly due to a
memory fault, but the system has recovered.
2. It is recommended to contact My Oracle Support.
Description:
A process is consuming excessive system resources.
Severity:
Minor
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
comcolProcResourcesExceededFaultNotify
1. Recovery:
1. This alarm indicates a process has exceeded the engineered limit for heap usage
and there is a risk the application software will fail.
2. Because there is no automatic recovery for this condition, it is recommended to
contact My Oracle Support.
Description:
A SysMetric Configuration table contains invalid data
Severity:
Minor
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
comcolSysMetricConfigErrorNotify
1. Recovery:
1. This alarm indicates a system metric is configured incorrectly.
2. It is recommended to contact My Oracle Support.
Description
Missed heartbeats detected.
Severity
Minor
Instance
IP Address
HA Score
Normal
OID
comcolNetworkHealthWarningNotify
1. Recovery
1. Check configuration of all servers, and check for connectivity problems between
server addresses.
2. If the problem persists, it is recommended to contact My Oracle Support.
Description:
The HA configuration monitor is impaired by a s/w fault.
Severity:
Minor
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
comcolHaCfgMonitorFaultNotify
1. Recovery:
Description:
The high availability alarm monitor is impaired by a s/w fault.
Severity:
Minor
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
comcolHaAlarmMonitorFaultNotify
1. Recovery:
Description:
High availability is disabled due to system configuration.
Severity:
Minor
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
comcolHaNotConfiguredNotify
1. Recovery:
Description:
The high availability monitor failed to send heartbeat.
Severity:
Major
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
eagleXgDsrHaHbTransmitFailureNotify
1. Recovery:
1. This alarm clears automatically when the server successfully registers for HA
heartbeating.
2. If this alarm does not clear after a couple of minutes, it is recommended to contact
My Oracle Support.
Description:
High availability configuration error.
Severity:
Major
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
eagleXgDsrHaCfgErrorNotify
1. Recovery:
1. This alarm indicates a platform configuration error in the high availability or VIP
management subsystem.
2. Because there is no automatic recovery for this condition, it is recommended to
contact My Oracle Support.
Description:
The required high availability resource failed to start.
Severity:
Critical
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
eagleXgDsrHaSvcStartFailureNotify
Cause:
The COMCOL module reports the 31225 alarm when the required HA resource fails to
start.
Diagnostic Information:
On the active NO, get the content of the following tables by executing these
commands:
• iqt -E HaClusterPolicyCfg
• iqt -E HaClusterResourceCfg
• iqt -E HaNodeLocPref
• iqt -E HaResourceCfg
• ha.info on active NO, SO and all MPs
1. Recovery:
1. This alarm clears automatically when the HA daemon successfully starts.
2. If this alarm does not clear after a couple of minutes, collect the logs listed under
Diagnostic Information; it is recommended to contact My Oracle Support.
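The iqt commands listed under Diagnostic Information can be gathered in one pass. This is a minimal Python sketch, assuming iqt is on the PATH of the active NO and that capturing standard output is sufficient. The function names are illustrative, and ha.info is left out because its exact invocation is not given here.

```python
import subprocess

# Table names taken from the Diagnostic Information list above.
HA_TABLES = [
    "HaClusterPolicyCfg",
    "HaClusterResourceCfg",
    "HaNodeLocPref",
    "HaResourceCfg",
]

def diagnostic_commands(tables=HA_TABLES):
    """Build the argv list for each `iqt -E <table>` command."""
    return [["iqt", "-E", table] for table in tables]

def collect(run=subprocess.run):
    """Run each command and return {table: captured stdout}."""
    outputs = {}
    for argv in diagnostic_commands():
        result = run(argv, capture_output=True, text=True)
        outputs[argv[-1]] = result.stdout
    return outputs
```

The returned dictionary can be written to files and attached to a support request.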
Description:
The high availability status is degraded due to raised alarms.
Severity:
Major
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
eagleXgDsrHaAvailDegradedNotify
1. Recovery:
Description:
The high availability status is failed due to raised alarms.
Severity:
Critical
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
eagleXgDsrHaAvailFailedNotify
Cause:
This alarm is raised, and displayed in the GUI, when there are alarms with
haScore="FAILED".
Diagnostic Information:
• Get the iqt -E RecentAlarmEv.1 result on active SO server.
• Get Savelogs on active SO server.
• Get err.show output on active SO server.
1. Recovery:
1. View alarms dashboard for other active alarms on this server.
2. Follow corrective actions for each individual alarm on the server to clear them.
3. If the problem persists, collect the logs listed under Diagnostic Information; it is
recommended to contact My Oracle Support.
Description:
High availability standby server is offline.
Severity:
Critical
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
eagleXgDsrHaStandbyOfflineNotify
Cause:
The servers exchange HA heartbeat messages. If a server, such as an NO or SO,
does not receive the HA heartbeat from its mate after several retries, the alarm is
raised. The default heartbeat interval is 250 ms, and the alarm is raised after five
retries.
Diagnostic Information:
To diagnose the alarm further, collect the following:
• The platform savelogs from the active NO and SO servers.
• The output of iqt -E HaCfg from the active NO and SO servers.
1. Recovery:
1. If loss of communication between the active and standby servers is caused
intentionally by maintenance activity, the alarm can be ignored. It clears
automatically when communication is restored between the two servers.
2. If communication fails at any other time, look for network connectivity issues; it is
recommended to contact My Oracle Support, if needed.
3. A workaround for this problem is to increase the failCount value for all server
groups in the HaCfg table. Increasing it from 5 to 10 should resolve the problem.
Check with the application team before applying this workaround. On the active
NO, run the iset -ffailCount=10 HaCfg where "1=1" command.
Note:
This command is disruptive and causes active servers in the entire
topology to lose service for about one minute while HA is reconfigured. A
new server may be selected as active after the change is applied. If less
disruption is required, you can apply the change one server group at a
time as an alternative.
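The effect of the failCount workaround can be estimated with simple arithmetic. This is a minimal Python sketch, assuming one heartbeat is expected per 250 ms interval and the alarm is raised once failCount consecutive heartbeats have been missed; the actual HA timing may include additional delays not described here.

```python
def detection_window_ms(fail_count, interval_ms=250):
    """Approximate time before the alarm is raised, assuming one heartbeat
    is expected per interval and the alarm follows fail_count misses."""
    return fail_count * interval_ms

for fail_count in (5, 10):
    print(f"failCount={fail_count}: {detection_window_ms(fail_count)} ms")
```

Under these assumptions, raising failCount from 5 to 10 doubles the tolerated gap from 1250 ms to 2500 ms, which also delays detection of a genuine standby outage by the same amount.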
Description:
High availability health score changed.
Severity:
Info
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
comcolHaScoreChangeNotify
1. Recovery:
Description:
The recent alarm event manager (raclerk) is impaired by a s/w fault.
Severity:
Minor
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
comcolRecAlarmEvProcFaultNotify
1. Recovery:
1. This alarm indicates an error occurred within the alarm management subsystem,
but the system has recovered.
2. If this alarm occurs repeatedly, it is recommended to contact My Oracle Support.
Description:
The platform alarm agent is impaired by a s/w fault
Severity:
Minor
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
comcolPlatAlarmAgentNotify
1. Recovery:
1. This alarm indicates an error occurred within the alarm management subsystem,
but the system has recovered.
2. If this alarm occurs repeatedly, it is recommended to contact My Oracle Support.
Description:
High availability server has not received a message on specified path within the
configured interval.
Severity:
Minor
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
comcolHaLateHeartbeatWarningNotify
1. Recovery:
Description:
High availability path loss of connectivity.
Severity:
Major
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
eagleXgDsrHaPathDownNotify
1. Recovery:
1. If loss of communication between the active and standby servers over the
secondary path is caused intentionally by maintenance activity, the alarm can be
ignored; it clears automatically when communication is restored between the two
servers.
2. If communication fails at any other time, look for network connectivity issues on
the secondary network.
3. It is recommended to contact My Oracle Support.
Description:
Upon system initialization, the system time is not trusted probably because NTP is
misconfigured or the NTP servers are unreachable. There are often accompanying
Platform alarms to guide correction. Generally, applications are not started if time is
not believed to be correct on start-up. Recovery often requires rebooting the server.
Severity:
Critical
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
eagleXgDsrUtrustedTimeOnInitNotify
Cause:
• NTP is misconfigured
• NTP servers are unreachable
• NTP service not running
Diagnostic Information:
There are often accompanying Platform alarms to guide correction. Applications do
not start if time is not accurate on start-up. Recovery often requires rebooting the
server.
1. Recovery:
1. Correct NTP configuration.
2. If the problem persists, it is recommended to contact My Oracle Support.
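Before and after correcting the NTP configuration, it can help to confirm whether the NTP daemon has selected a time source. This is a minimal Python sketch, assuming the ntpq utility that ships with ntpd is installed on the server. It relies on the ntpq convention that the peer line beginning with '*' is the currently selected system peer; the function names are illustrative.

```python
import subprocess

def has_sync_peer(ntpq_output):
    """Return True if any peer line in `ntpq -p` output starts with '*',
    the tally code ntpq uses for the currently selected system peer."""
    return any(line.startswith("*") for line in ntpq_output.splitlines())

def ntp_is_synchronized(run=subprocess.run):
    """Run `ntpq -p` and report whether a system peer is selected."""
    result = run(["ntpq", "-p"], capture_output=True, text=True)
    return result.returncode == 0 and has_sync_peer(result.stdout)
```

If ntpq is not installed, subprocess.run raises FileNotFoundError, which is itself a sign that the NTP tooling needs attention.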
Description:
After system initialization, the system time has become untrusted, probably because
NTP has been reconfigured improperly, time has been manually changed, the NTP servers
are unreachable, or the NTP service (ntpd process) has stopped. There are often
accompanying Platform alarms to guide correction. Generally, applications remain
running, but time-stamped data are likely incorrect, reports may be negatively
affected, or some behavior may be improper.
Severity:
Critical
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
eagleXgDsrUtrustedTimePostInitNotify
Cause:
• NTP has been reconfigured improperly after system initialization
• System time has been manually changed
• The NTP servers have become unreachable
• NTP service (ntpd process) stopped
Diagnostic Information:
There are often accompanying Platform alarms to guide correction.
1. Recovery:
1. Correct NTP configuration.
2. If the problem persists, it is recommended to contact My Oracle Support.
Description:
High availability TCP link is down.
Severity:
Critical
Instance:
Remote node being connected to plus the path identifier.
HA Score:
Normal
OID:
eagleXgDsrHaLinkDownNotify
1. Recovery:
1. If loss of communication between the active and standby servers over the
specified path is caused intentionally by maintenance activity, the alarm can be
ignored; it clears automatically when communication is restored between the two
servers.
2. If communication fails at any other time, it is recommended to look for network
connectivity issues on the primary network and/or contact My Oracle Support.
Description:
The measurements collector (statclerk) is impaired by a s/w fault.
Severity:
Minor
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
comcolMeasCollectorFaultNotify
1. Recovery:
1. This alarm indicates that an error within the measurement subsystem has
occurred, but that the system has recovered.
2. If this alarm occurs repeatedly, it is recommended to collect savelogs and contact
My Oracle Support.
Description:
The IP service port mapper (re.portmap) is impaired by a software fault.
Severity:
Minor
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
comcolRePortMappingFaultNotify
1. Recovery:
• This typically indicates a DNS Lookup failure. Verify all server hostnames are
correct in the GUI configuration on the server generating the alarm.
Description:
The SNMP agent (cmsnmpa) is impaired by a software fault.
Severity:
Minor
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
eagleXgDsrDbcomcolSnmpAgentNotify
1. Recovery:
1. This alarm indicates an error occurred within the SNMP subsystem, but the
system has recovered.
2. If this alarm occurs repeatedly, it is recommended to collect savelogs and contact
My Oracle Support.
Description
An SNMP configuration error was detected.
Severity
Minor
Instance
comcolAlarmSrcNode, comcolAlarmNumber, comcolAlarmInstance,
comcolAlarmSeverity, comcolAlarmText, comcolAlarmInfo, comcolAlarmGroup,
comcolServerHostname, comcolAlarmSequence, comcolAlarmTimestamp,
comcolAlarmEventType, comcolAlarmProbableCause, comcolAlarmAdditionalInfo
HA Score
Normal
OID
comcolSnmpConfigNotify
1. Recovery
1. Export event history for the given server and all processes.
2. It is recommended to contact My Oracle Support for assistance.
Description:
Logging output set to Above Normal
Severity:
Minor
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
comcolLoggingOutputNotify
1. Recovery:
Description:
HA active to standby activity transition.
Severity:
Info
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
comcolActiveToStandbyTransNotify
1. Recovery:
1. If this alarm occurs during routine maintenance activity, it may be ignored.
2. Otherwise, it is recommended to contact My Oracle Support.
Description:
HA standby to active activity transition.
Severity:
Info
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
comcolStandbyToActiveTransNotify
1. Recovery:
1. If this alarm occurs during routine maintenance activity, it may be ignored.
2. Otherwise, it is recommended to contact My Oracle Support.
Description:
The HA manager (cmha) is impaired by a software fault.
Severity:
Minor
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
comcolHaMgmtFaultNotify
1. Recovery:
1. This alarm indicates an error occurred within the high availability subsystem, but
the system has automatically recovered.
2. If the alarm occurs frequently, it is recommended to contact My Oracle Support.
Description:
Highly available server failed to receive mate heartbeats.
Severity:
Major
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
eagleXgDsrHaServerOfflineNotify
Cause:
Alarm 31283 is raised for nodes in the topology that this server should be connected
to (for example, nodes that are not OOS) but to which it has no TCP links over any
configured path. The reason the links were not established (for example, network
connectivity problems or the node not running) does not matter.
Diagnostic Information:
Show the alarms that affect the node's HA score:
iqt -h -fpart,no -fsrcNode,no -fsrcTimeStamp,no -p
AppEventLog.0 where "eventNumber in (`iqt -S, -zhp -fnumber
AppEventDef where "haScore != 0" | sed -e's/,$//'`)"
1. Recovery:
Description:
High availability remote subscriber has not received a heartbeat within the configured
interval.
Severity:
Minor
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
comcolHaRemoteHeartbeatWarningNotify
1. Recovery:
1. No action required. This is a warning and can be due to transient conditions. The
remote subscriber will move to another server in the cluster.
2. If there continues to be no heartbeat from the server, it is recommended to contact
My Oracle Support.
Description:
High availability node join recovery entered.
Severity:
Info
Instance:
Cluster set key of the DC outputting the event
HA Score:
Normal
OID:
comcolHaSbrEntryNotify
1. Recovery:
Description:
High availability node join recovery plan.
Severity:
Info
Instance:
Names of HA Policies (as defined in HA policy configuration)
HA Score:
Normal
OID:
comcolHaSbrPlanNotify
1. Recovery:
Description:
High availability node join recovery complete.
Severity:
Info
Instance:
Names of HA Policies (as defined in HA policy configuration)
HA Score:
Normal
OID:
comcolHaSbrCompleteNotify
1. Recovery:
Description
High availability site configuration error.
Severity
Critical
Instance
GroupName, Policy ID, Site Name
HA Score
Normal
OID
eagleXgDsrHaBadSiteCfgNotify
1. Recovery
• If this alarm does not clear after correcting the configuration, it is recommended to
contact My Oracle Support for assistance.
Description:
HA manager (cmha) status.
Severity:
Info
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
comcolHaProcessStatusNotify
1. Recovery:
Description:
HA DC election status.
Severity:
Info
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
comcolHaElectionStatusNotify
1. Recovery:
Description:
HA policy plan status.
Severity:
Info
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
comcolHaPolicyStatusNotify
1. Recovery:
Description:
This alarm is raised for nodes in the topology that this server should be connected to
(for example, nodes that are not OOS) but to which it has no TCP links over any
configured path. The reason the links were not established (for example, network
connectivity problems or the node not running) does not matter.
Severity:
Info
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
comcolHaRaLinkStatusNotify
1. Recovery:
1. If loss of communication between the active and standby servers is caused
intentionally by maintenance activity, alarm can be ignored. It clears automatically
when communication is restored between the two servers.
2. If communication fails at any other time, look for network connectivity issues.
3. If the problem persists, it is recommended to contact My Oracle Support.
Description:
HA resource registration status.
Severity:
Info
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
comcolHaResourceStatusNotify
1. Recovery:
Description:
HA resource action status.
Severity:
Info
Instance:
N/A
HA Score:
Normal
OID:
comcolHaActionStatusNotify
1. Recovery:
Description:
HA monitor action status.
Severity:
Info
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
comcolHaMonitorStatusNotify
1. Recovery:
Description:
HA resource agent information.
Severity:
Info
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
comcolHaRaInfoNotify
1. Recovery:
Description:
Resource agent application detailed information.
Severity:
Info
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
comcolHaRaDetailNotify
1. Recovery:
Description:
HA notification status.
Severity:
Info
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
comcolHaNotificationNotify
1. Recovery:
• No action required.
Description:
HA control action status.
Severity:
Info
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
comcolHaControlNotify
1. Recovery:
• No action required.
Description:
HA topology events.
Severity:
Info
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
eagleXgDsrHaTopologyNotify
1. Recovery:
• No action required.
Description
High availability configuration error.
Severity
Minor
Instance
NodeID, or HA Tunnel ID
HA Score
Normal
OID
comcolHaBadCfgNotify
1. Recovery
Description:
Breaker panel breaker unavailable.
Severity:
Critical
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
tpdBrkPnlFeedUnavailable
1. Recovery:
Description:
Breaker panel breaker failure.
Severity:
Critical
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
tpdBrkPnlBreakerFailure
1. Recovery
Description:
Breaker panel monitoring failure.
Severity:
Critical
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
tpdBrkPnlMntFailure
1. Recovery
Description:
Power feed unavailable.
Severity:
Critical
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
tpdPowerFeedUnavail
1. Recovery
Description:
Power supply 1 failure.
Severity:
Critical
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
tpdPowerSupply1Failure
1. Recovery
Description:
Power supply 2 failure.
Severity:
Critical
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
tpdPowerSupply2Failure
1. Recovery
Description:
Power supply 3 failure.
Severity:
Critical
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
tpdPowerSupply3Failure
1. Recovery
Description:
Raid feed unavailable.
Severity:
Critical
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
eagleXgDsrTpdRaidFeedUnavailableNotify
1. Recovery
Description:
Raid power 1 failure.
Severity:
Critical
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
tpdRaidPower1Failure
1. Recovery
Description:
Raid power 2 failure.
Severity:
Critical
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
tpdRaidPower2Failure
1. Recovery
Description:
Raid power 3 failure.
Severity:
Critical
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
tpdRaidPower3Failure
1. Recovery
Description:
Device failure.
Severity:
Critical
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
eagleXgDsrTpdDeviceFailureNotify
1. Recovery:
Description:
This alarm indicates the IP bond is either not configured or down.
Severity:
Critical
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
eagleXgDsrTpdDeviceIfFailureNotify
Cause:
This alarm indicates the IP bond is either not configured or down.
Diagnostic Information:
• Syscheck can be manually executed in the following methods:
– Login as syscheck. When logging in, syscheck runs and then the login
connection is dropped. This account does not have shell access.
– From the root account, the Command Line Interface can be used directly.
* Execute syscheck -h for usage information.
– In DSR 6.0 and later, from the admusr account the Command Line Interface
can be used directly when called using sudo.
* Execute syscheck -h for usage information.
– Using the platcfg user interface.
Note:
In versions later than TPD 6.5, root access using SSH is disabled.
The admusr should be used instead. If the command is to be run as
admusr, sudo must be prepended to the command and the full path
to the command must be used.
1. Recovery:
1. Run syscheck in verbose mode. Execute syscheck -h for usage
information.
2. Investigate the failed bond and slave devices configuration using netAdm query:
• sudo /usr/TKLC/plat/bin/netAdm query --device=<bondX>
• sudo /usr/TKLC/plat/bin/netAdm query --device=<slave
device>
3. Determine if the failed bond and slave devices have been administratively shut
down or have operational issues:
• cat /proc/net/bonding/bondX, where X is bond designation
• ethtool <slave device>
4. If bond and slaves are healthy, attempt to administratively bring bond up:
• ifup bondX
5. If condition persists, contact My Oracle Support and provide the system health
check output and output of steps 1 through 4.
6. It is recommended to contact My Oracle Support to request hardware
replacement.
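The bond inspection in step 3 can be partly automated. The following is a minimal sketch, not part of the product, that assumes the usual Linux bonding driver layout of /proc/net/bonding/bondX (a bond-level "MII Status:" line followed by one "Slave Interface:" section per slave). The function name and the sample text are illustrative only.

```python
# Illustrative sample of /proc/net/bonding/bond0 content (assumed layout).
SAMPLE_BOND = "\n".join([
    "Ethernet Channel Bonding Driver: v3.7.1",
    "",
    "Bonding Mode: fault-tolerance (active-backup)",
    "Currently Active Slave: eth0",
    "MII Status: up",
    "",
    "Slave Interface: eth0",
    "MII Status: up",
    "",
    "Slave Interface: eth1",
    "MII Status: down",
])


def parse_bonding_status(text):
    """Return (bond_mii_status, {slave_name: mii_status}) from bonding proc text."""
    bond_status = None
    slaves = {}
    current_slave = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Slave Interface:"):
            # Every following "MII Status" line belongs to this slave.
            current_slave = line.split(":", 1)[1].strip()
        elif line.startswith("MII Status:"):
            status = line.split(":", 1)[1].strip()
            if current_slave is None:
                bond_status = status  # seen before any slave section: the bond itself
            else:
                slaves[current_slave] = status
    return bond_status, slaves


if __name__ == "__main__":
    bond, slaves = parse_bonding_status(SAMPLE_BOND)
    print("bond:", bond)
    for name, status in sorted(slaves.items()):
        print(name, status)
```

On a live server the text would come from reading /proc/net/bonding/bondX instead of the sample string. A slave reported as down still needs the ethtool check from step 3 to tell an administrative shutdown from an operational fault.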
Description:
This alarm indicates the chipset has detected an uncorrectable (multiple-bit) memory
error that the ECC (Error-Correcting Code) circuitry in the memory is unable to correct.
Severity:
Critical
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
eagleXgDsrTpdEccUncorrectableErrorNotify
Alarm ID:
TKSPLATCR14
Cause:
This alarm indicates the chipset has detected an uncorrectable (multiple-bit) memory
error that the ECC (Error-Correcting Code) circuitry in the memory is unable to correct.
Diagnostic Information:
Syscheck can be manually executed using the following methods:
• Login as syscheck. When logging in, syscheck runs and the login connection is
dropped. This account does not have shell access.
• From the root account the Command Line Interface can be used directly.
– Execute syscheck -h for usage information.
• In DSR 6.0 and later, from the admusr account the Command Line Interface can
be used directly when called using sudo.
– Execute syscheck -h for usage information.
• Through the platcfg user interface.
Note:
In versions later than TPD 6.5, root access using SSH is disabled. The
admusr should be used instead. If the command needs to be run as admusr,
sudo must be prepended to the command and the full path to the command
must be used.
1. Recovery:
Description:
The server failed to receive SNMP information from the switch.
Severity:
Critical
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
eagleXgDsrTpdSNMPGetFailureNotify
Alarm ID:
TKSPLATCR15
Cause:
This alarm indicates the server failed to get SNMP information from the device
configured in the SNMPGET syscheck test.
Diagnostic Information:
Syscheck can be manually executed using the following methods:
• Login as syscheck. When logging in, syscheck runs and the login connection is
dropped. This account does not have shell access.
• From the root account the Command Line Interface can be used directly.
– Execute syscheck -h for usage information.
• In DSR 6.0 and later, from the admusr account the Command Line Interface can
be used directly when called using sudo.
– Execute syscheck -h for usage information.
• Using the platcfg user interface.
Note:
In versions later than TPD 6.5, root access using SSH is disabled. The
admusr should be used instead. If the command needs to be run as admusr,
sudo must be prepended to the command and the full path to the command
must be used.
1. Recovery:
1. Verify the device is active and responds to the ping command.
2. If the problem persists, it is recommended to contact My Oracle Support.
Description:
This alarm indicates the server's current time precedes the timestamp of the last
known time the server's time was good.
Severity:
Critical
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
eagleXgDsrTpdNTPDaemonNotSynchronizedFailureNotify
Alarm ID:
TKSPLATCR16
Cause:
The server's current time precedes the timestamp of the last known time when the
server's time was good.
Diagnostic Information:
N/A.
1. Recovery:
1. Verify NTP settings and NTP sources are providing accurate time.
a. Ensure ntpd service is running with correct options: -x -g.
b. Verify the content of the /etc/ntp.conf file is correct for the server.
c. Type /usr/sbin/ntpdc -c sysinfo to check the current state of the ntpd
daemon.
d. Verify the ntp peer configuration; execute ntpq -np; and analyze the output.
Verify peer data, such as tally code (first column before remote), remote, refid,
stratum (st), and jitter, are valid for server.
e. Execute ntpstat to determine the ntp time synchronization status. If not
synchronized or the stratum is not correct for server, then ping the ntp peer to
determine if peer can be reached.
2. If ntp peer is reachable, then restart the ntpd service.
3. If problem persists, then a reset of the NTP date may resolve the issue.
Note:
Before resetting the ntp date, the applications may need to be stopped;
and subsequent to the ntp reset, the application restarted.
• Reset ntpd:
• sudo service ntpd stop
• sudo ntpdate <ntp server IP>
• sudo service ntpd start
4. Confirm recommended NTP topology and strategy.
• No fewer than three references are recommended.
• If selecting a different number, the number should be odd.
• No intermediate reference should be on a virtualized server.
• Additional recommendations and topology are available in the NTP strategy
section in the DSR Hardware and Software Installation 1/2 customer
document.
5. If the problem persists, it is recommended to contact My Oracle Support.
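The peer data named in step 1d (tally code, remote, refid, stratum, jitter) can be pulled out of the ntpq peer listing with a short script. This is a minimal sketch that assumes the standard ten-column ntpq -p layout. The function names and the sample addresses are illustrative and do not come from the product.

```python
# Illustrative ntpq -p style output; addresses and values are made up.
SAMPLE_NTPQ = "\n".join([
    "     remote           refid      st t when poll reach   delay   offset  jitter",
    "==============================================================================",
    "*10.0.0.1        .GPS.            1 u   33   64  377    0.512    0.034   0.021",
    "+10.0.0.2        10.0.0.1         2 u   12   64  377    0.701   -0.120   0.045",
    " 10.0.0.3        .INIT.          16 u    -   64    0    0.000    0.000   0.000",
])


def parse_ntpq_peers(output):
    """Parse ntpq -p style text into a list of per-peer dictionaries."""
    peers = []
    for line in output.splitlines():
        if not line.strip():
            continue
        if line.lstrip().startswith("remote") or line.startswith("="):
            continue  # header and separator rows
        tally, fields = line[0], line[1:].split()
        if len(fields) < 10:
            continue  # not a complete peer row
        peers.append({
            "tally": tally,            # first column, before the remote name
            "remote": fields[0],
            "refid": fields[1],
            "stratum": int(fields[2]),
            "offset": float(fields[8]),
            "jitter": float(fields[9]),
        })
    return peers


def synced_peer(peers):
    """Return the peer marked '*' (the current system peer), or None."""
    return next((p for p in peers if p["tally"] == "*"), None)


if __name__ == "__main__":
    for peer in parse_ntpq_peers(SAMPLE_NTPQ):
        print(peer)
```

If no row carries the "*" tally code, the daemon has not selected a system peer, which matches the "not synchronized" case described in step 1e.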
Description:
This alarm indicates the server's current time precedes the timestamp of the last
known time the server's time was good.
Severity:
Critical
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
eagleXgDsrTpdNTPTimeGoneBackwardsNotify
Alarm ID:
TKSPLATCR17
Cause:
The server's current time precedes the timestamp of the last known time when the
server's time was good.
Diagnostic Information:
N/A.
1. Recovery:
1. Verify NTP settings and NTP sources are providing accurate time.
a. Ensure ntpd service is running with correct options: -x -g
b. Verify the content of the /etc/ntp.conf file is correct for the server.
c. Type /usr/sbin/ntpdc -c sysinfo to check the current state of the ntpd
daemon.
d. Verify the ntp peer configuration; execute ntpq -p; and analyze the output.
Verify peer data, such as tally code (first column before remote), remote, refid,
stratum (st), and jitter, are valid for server.
e. Execute ntpstat to determine the ntp time synchronization status. If not
synchronized or the stratum is not correct for server, then ping the ntp peer to
determine if peer can be reached.
2. If ntp peer is reachable, then restart the ntpd service.
3. If problem persists, then a reset of the NTP date may resolve the issue.
Note:
Before resetting the ntp date, the applications may need to be stopped;
and subsequent to the ntp reset, the application restarted.
• Reset ntpd:
• sudo service ntpd stop
• sudo ntpdate <ntp server IP>
• sudo service ntpd start
Description:
This alarm indicates the NTP offset of the server currently being synced to is greater
than the critical threshold.
Severity:
Critical
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
eagleXgDsrNtpOffsetCheckFailureNotify
Alarm ID:
TKSPLATCR18
Cause:
The NTP offset of the server currently being synced to is greater than the critical
threshold.
Diagnostic Information:
Run ntpstat command to diagnose the alarm.
1. Recovery:
1. Verify NTP settings and NTP sources can be reached.
a. Ensure ntpd service is running using ps -ef | grep or service ntpd
status.
b. Verify the content of the /etc/ntp.conf file is correct for the server.
c. Type /usr/sbin/ntpdc -c sysinfo to check the current state of the ntpd
daemon.
d. Verify the ntp peer configuration; execute ntpq -p; and analyze the output.
Verify peer data, such as tally code (first column before remote), remote, refid,
stratum (st), and jitter, are valid for server.
e. Execute ntpstat to determine the ntp time synchronization status. If not
synchronized or the stratum is not correct for server, then ping the ntp peer to
determine if the peer can be reached.
2. If ntp peer is reachable, then restart the ntpd service.
3. If problem persists, then a reset of the NTP date may resolve the issue.
Note:
Before resetting the ntp date, the applications may need to be stopped;
and subsequent to the ntp reset, the application restarted.
• To reset date:
• sudo service ntpd stop
• sudo ntpdate <ntp server IP>
• sudo service ntpd start
4. Confirm recommended NTP topology and strategy.
• No fewer than three references are recommended.
• If selecting a different number, the number should be odd.
• No intermediate reference should be a virtualized server.
• Additional recommendations and topology are available in the NTP strategy
section in the DSR Hardware and Software Installation 1/2 customer
document.
5. If the problem persists, it is recommended to contact My Oracle Support.
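The reference-count recommendation in step 4 (at least three references, and an odd number if more are used) can be checked against the /etc/ntp.conf content. This is a minimal sketch that assumes references are configured with server directives and that addresses beginning with 127.127. are local reference clock drivers rather than network references. Both function names are illustrative.

```python
# Illustrative ntp.conf content; the addresses are made up.
SAMPLE_NTP_CONF = "\n".join([
    "# NTP references",
    "server 10.0.0.1 iburst",
    "server 10.0.0.2 iburst",
    "server 10.0.0.3 iburst   # third reference",
    "server 127.127.1.0       # local clock driver, not counted",
    "driftfile /var/lib/ntp/drift",
])


def count_ntp_servers(conf_text):
    """Count server directives, ignoring comments and local clock drivers."""
    count = 0
    for raw in conf_text.splitlines():
        tokens = raw.split("#", 1)[0].split()  # drop trailing comments
        if len(tokens) >= 2 and tokens[0] == "server":
            if not tokens[1].startswith("127.127."):
                count += 1
    return count


def meets_reference_recommendation(count):
    """True when there are at least three references and the count is odd."""
    return count >= 3 and count % 2 == 1


if __name__ == "__main__":
    n = count_ntp_servers(SAMPLE_NTP_CONF)
    print(n, meets_reference_recommendation(n))
```

The check covers only the count. Whether an intermediate reference runs on a virtualized server cannot be determined from the configuration file and still has to be confirmed separately.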
Description:
This alarm indicates a fan on the application server is either failing or has failed
completely. In either case, there is a danger of component failure due to overheating.
Severity:
Major
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
eagleXgDsrTpdFanErrorNotify
Alarm ID:
TKSPLATMA1
1. Recovery:
1. Run syscheck in verbose mode to determine which server fan assembly is
failing and replace the fan assembly.
2. If the problem persists, it is recommended to contact My Oracle Support.
Description:
This alarm indicates the server is experiencing issues replicating data to one or more
of its mirrored disk drives. This could indicate that one of the server’s disks has either
failed or is approaching failure.
Severity:
Major
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
eagleXgDsrTpdIntDiskErrorNotify
Alarm ID:
TKSPLATMA2
1. Recovery:
1. Run syscheck in verbose mode.
2. Determine the RAID state of the mirrored disks and collect data:
cat /proc/mdstat
cat /etc/raidtab
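When reviewing the collected /proc/mdstat data, a degraded mirror shows fewer active members than expected (for example [2/1]) and an underscore in the member map (for example [U_]). The following is a minimal sketch that flags such arrays, assuming the common Linux md status layout. The function names and the sample text are illustrative.

```python
import re

# Illustrative /proc/mdstat content; device names and sizes are made up.
SAMPLE_MDSTAT = "\n".join([
    "Personalities : [raid1]",
    "md1 : active raid1 sdb2[1] sda2[0]",
    "      262080 blocks [2/2] [UU]",
    "",
    "md2 : active raid1 sda1[0]",
    "      4192896 blocks [2/1] [U_]",
    "",
    "unused devices: <none>",
])

_NAME_RE = re.compile(r"^(md\d+)\s*:")
_STATE_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


def parse_mdstat(text):
    """Return {array: (expected_members, active_members, member_map)}."""
    arrays = {}
    current = None
    for line in text.splitlines():
        name = _NAME_RE.match(line)
        if name:
            current = name.group(1)
            continue
        state = _STATE_RE.search(line)
        if state and current is not None:
            arrays[current] = (int(state.group(1)), int(state.group(2)), state.group(3))
            current = None  # one status line per array
    return arrays


def degraded_arrays(text):
    """List arrays with missing members, sorted by name."""
    return sorted(
        name
        for name, (expected, active, member_map) in parse_mdstat(text).items()
        if active < expected or "_" in member_map
    )


if __name__ == "__main__":
    print(degraded_arrays(SAMPLE_MDSTAT))
```

On a server, the input would be the content of /proc/mdstat rather than the sample string.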
Description:
This alarm indicates the off-board storage server had a problem with its hardware
disks.
Severity:
Major
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
eagleXgDsrTpdRaidDiskErrorNotify
Alarm ID:
TKSPLATMA3
1. Recovery
1. Determine if the hardware platform is PP5160.
Note:
SDM on the PP5160 platform uses raid0 configuration.
Description:
This alarm indicates an error such as a corrupt system configuration or missing files.
Severity:
Major
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
eagleXgDsrTpdPlatformErrorNotify
Alarm ID:
TKSPLATMA4
1. Recovery:
1. Run syscheck in verbose mode.
2. Determine the RAID state of the mirrored disks and collect data:
cat /proc/mdstat
cat /etc/raidtab
Description:
This alarm indicates unsuccessful writing to at least one of the server’s file systems.
Severity:
Major
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
eagleXgDsrTpdFileSystemErrorNotify
Alarm ID:
TKSPLATMA5
1. Recovery:
1. Run syscheck in verbose mode.
2. Address full file systems identified in syscheck output, and run syscheck in
verbose mode.
3. It is recommended to contact My Oracle Support and provide the system health
check output.
Description:
This alarm indicates either the minimum number of instances for a required process
is not currently running or too many instances of a required process are running.
Severity:
Major
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
eagleXgDsrTpdPlatProcessErrorNotify
Alarm ID:
TKSPLATMA6
1. Recovery:
1. Rerun syscheck in verbose mode.
2. If the alarm has been cleared, then the problem is solved.
3. If the alarm has not been cleared then determine the run level of the system.
4. If system run level is not 4 then determine why the system is operating at that run
level.
5. If system run level is 4, determine why the required number of instances of the
process(es) is not running.
6. If the alarm persists, it is recommended to contact My Oracle Support and provide
the system health check output.
Description:
Not Implemented.
Severity:
Major
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
eagleXgDsrTpdRamShortageErrorNotify
1. Recovery
Description:
This alarm indicates the server’s swap space is in danger of being depleted. This is
usually caused by a process that has allocated a very large amount of memory over
time.
Severity:
Major
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
eagleXgDsrTpdSwapSpaceShortageErrorNotify
Alarm ID:
TKSPLATMA8
1. Recovery:
1. Run syscheck in verbose mode.
2. Determine processes using swap.
Note:
One method to determine the amount of swap being used by process is:
Description:
This alarm indicates the connection between the server’s ethernet interface and the
customer network is not functioning properly.
Severity:
Major
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
eagleXgDsrTpdProvNetworkErrorNotify
Alarm ID:
TKSPLATMA9
1. Recovery:
1. Verify that a customer-supplied cable labeled TO CUSTOMER NETWORK is
securely connected to the appropriate server. Follow the cable to its connection
point on the local network and verify this connection is also secure.
2. Test the customer-supplied cable labeled TO CUSTOMER NETWORK with an
Ethernet Line Tester. If the cable does not test positive, replace it.
3. Have your network administrator verify that the network is functioning properly.
4. If no other nodes on the local network are experiencing problems and the fault has
been isolated to the server or the network administrator is unable to determine the
exact origin of the problem, it is recommended to contact My Oracle Support.
Description:
Uncorrectable ECC Memory Error -- This alarm indicates the chipset has detected an
uncorrectable (multiple-bit) memory error that the ECC (Error-Correcting Code) circuitry
in the memory is unable to correct.
Severity:
Critical
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
eagleXgDsrTpdEagleNetworkAErrorNotify
1. Recovery
Description:
Uncorrectable ECC Memory Error -- This alarm indicates the chipset has detected an
uncorrectable (multiple-bit) memory error that the ECC (Error-Correcting Code) circuitry
in the memory is unable to correct.
Severity:
Critical
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
eagleXgDsrTpdEagleNetworkBErrorNotify
1. Recovery
Description:
Uncorrectable ECC Memory Error -- This alarm indicates the chipset has detected an
uncorrectable (multiple-bit) memory error that the ECC (Error-Correcting Code) circuitry
in the memory is unable to correct.
Severity:
Critical
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
eagleXgDsrTpdSyncNetworkErrorNotify
1. Recovery
Description:
This alarm indicates one of these conditions has occurred:
• A file system has exceeded a failure threshold, which means that more than 90%
of the available disk storage has been used on the file system.
• More than 90% of the total number of available files have been allocated on the
file system.
• A file system has a different number of blocks than it had when installed.
Severity:
Major
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
eagleXgDsrTpdDiskSpaceShortageErrorNotify
Alarm ID:
TKSPLATMA13
1. Recovery:
1. Run syscheck in verbose mode.
2. Examine contents of identified volume in syscheck output to determine if any large
files are in the file system. Delete unnecessary files, or move files off of server.
Capture output from du -sx <file system>.
3. Capture output from df -h and df -i commands.
4. Determine processes using the file system(s) that have exceeded the threshold.
5. It is recommended to contact My Oracle Support and provide the system health
check output and provide additional file system output.
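The 90% conditions in the alarm description (disk storage used and files allocated) can be approximated per file system from Python. This is a minimal sketch, not the syscheck implementation. It uses os.statvfs, so the block figure can differ slightly from df because df excludes reserved blocks from its percentage. The function names are illustrative.

```python
import os

THRESHOLD = 0.90  # failure threshold stated in the alarm description


def usage_fraction(used, total):
    """Fraction of a resource in use; 0.0 when the total is zero or unknown."""
    return used / total if total else 0.0


def filesystem_usage(path):
    """Return (block_usage, inode_usage) fractions for the file system holding path."""
    st = os.statvfs(path)
    block_used = st.f_blocks - st.f_bfree
    inode_used = st.f_files - st.f_ffree
    return (
        usage_fraction(block_used, st.f_blocks),
        usage_fraction(inode_used, st.f_files),
    )


def exceeds_threshold(path, threshold=THRESHOLD):
    """True when either block usage or inode usage is above the threshold."""
    blocks, inodes = filesystem_usage(path)
    return blocks > threshold or inodes > threshold


if __name__ == "__main__":
    blocks, inodes = filesystem_usage("/")
    print(f"/ blocks {blocks:.1%} inodes {inodes:.1%} over={exceeds_threshold('/')}")
```

The third condition in the description, a change in the number of blocks since installation, needs a recorded baseline and is not covered by this sketch.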
Description:
This alarm indicates the default network route of the server is experiencing a problem.
Caution:
When changing the network routing configuration of the server, verify the
modifications will not impact the method of connectivity for the current login
session. The route information must be entered correctly and set to the
correct values. Incorrectly modifying the routing configuration of the server
may result in total loss of remote network access.
Severity:
Major
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
eagleXgDsrTpdDefaultRouteNetworkErrorNotify
1. Recovery:
1. Run syscheck in verbose mode.
2. If the syscheck output is: The default router at <IP_address> cannot
be pinged, the router may be down or unreachable. Do the following:
a. Verify the network cables are firmly attached to the server and the network
switch, router, hub, etc.
b. Verify the configured router is functioning properly. Check with the network
administrator to verify the router is powered on and routing traffic as required.
c. Check with the router administrator to verify that the router is configured to
reply to pings on that interface.
d. Rerun syscheck.
e. If the alarm has not been cleared, it is recommended to collect the syscheck
output and contact My Oracle Support.
3. If the syscheck output is: The default route is not on the
provisioning network, it is recommended to collect the syscheck output and
contact My Oracle Support.
4. If the syscheck output is: An active route cannot be found for a
configured default route, it is recommended to collect the syscheck output
and contact My Oracle Support.
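The three syscheck messages above each lead to a different next action, which can be expressed as a small lookup. This is a minimal sketch that assumes the message wording appears in the syscheck output exactly as quoted in the steps above. The function name and the returned labels are illustrative.

```python
import re


def classify_default_route_output(output):
    """Map syscheck output text to the next action described in the recovery steps."""
    if re.search(r"The default router at \S+ cannot be pinged", output):
        # Step 2: check cabling and the router, then rerun syscheck.
        return "check_router"
    if "The default route is not on the provisioning network" in output:
        # Step 3: collect syscheck output and contact support.
        return "contact_support"
    if "An active route cannot be found for a configured default route" in output:
        # Step 4: collect syscheck output and contact support.
        return "contact_support"
    return "unknown"


if __name__ == "__main__":
    print(classify_default_route_output(
        "The default router at 192.0.2.1 cannot be pinged"))
```

Only the first message has local checks to perform before escalation. The other two go straight to collecting the syscheck output.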
Description:
The internal temperature within the server is unacceptably high.
Severity:
Major
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
tpdServerTemperatureError
Alarm ID:
TKSPLATMA15
1. Recovery:
1. Ensure nothing is blocking the fan intake. Remove any blockage.
2. Verify the temperature in the room is normal. If it is too hot, lower the temperature
in the room to an acceptable level.
Note:
Be prepared to wait the appropriate period of time before continuing with
the next step. Conditions need to be below alarm thresholds consistently
for the alarm to clear. It may take about ten minutes after the room
returns to an acceptable temperature before the alarm clears.
3. Run syscheck.
a. If the alarm has been cleared, the problem is resolved.
b. If the alarm has not been cleared, continue troubleshooting.
4. Replace the filter.
Note:
Be prepared to wait the appropriate period of time before continuing with
the next step. Conditions need to be below alarm thresholds consistently
for the alarm to clear. The alarm may take up to five minutes to clear
after conditions improve. It may take about ten minutes after the filter is
replaced before syscheck shows the alarm cleared.
5. Re-run syscheck.
a. If the alarm has been cleared, the problem is resolved.
b. If the alarm has not been cleared, continue troubleshooting.
6. If the problem has not been resolved, it is recommended to contact My Oracle
Support.
Description:
This alarm indicates one or more of the monitored voltages on the server main board
have been detected to be out of the normal expected operating range.
Severity:
Major
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
tpdServerMainboardVoltageError
Alarm ID:
TKSPLATMA16
1. Recovery:
1. Run syscheck in verbose mode.
2. If the alarm persists, it is recommended to contact My Oracle Support and provide
the system health check output.
Description:
This alarm indicates one of the power feeds to the server has failed. If this alarm
occurs in conjunction with any Breaker Panel alarm, there might be a problem with the
breaker panel.
Severity:
Major
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
eagleXgDsrTpdPowerFeedErrorNotify
Alarm ID:
TKSPLATMA17
1. Recovery:
1. Verify all the server power feed cables to the server that is reporting the error are
securely connected.
2. Check to see if the alarm has cleared:
• If the alarm has been cleared, the problem is resolved.
• If the alarm has not been cleared, continue with the next step.
3. Follow the power feed to its connection on the power source. Ensure that the
power source is ON and that the power feed is properly secured.
4. Check to see if the alarm has cleared:
• If the alarm has been cleared, the problem is resolved.
• If the alarm has not been cleared, continue with the next step.
5. If the power source is functioning properly and the wires are all secure, have an
electrician check the voltage on the power feed.
6. Check to see if the alarm has cleared:
• If the alarm has been cleared, the problem is resolved.
• If the alarm has not been cleared, continue with the next step.
7. If the problem has not been resolved, it is recommended to contact My Oracle
Support.
Description:
Either the hard drive has failed or failure is imminent.
Severity:
Major
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
eagleXgDsrTpdDiskHealthErrorNotify
Alarm ID:
TKSPLATMA18
1. Recovery:
1. Run syscheck in verbose mode.
2. Replace the hard drives that have failed or are failing.
Description:
The smartd service is not able to read the disk status because the disk has other
problems that are reported by other alarms. This alarm appears only while a server is
booting.
Severity:
Major
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
eagleXgDsrTpdDiskUnavailableErrorNotify
Alarm ID:
TKSPLATMA19
1. Recovery:
1. Run syscheck in verbose mode.
2. It is recommended to contact My Oracle Support and provide the system health
check output.
Description:
This alarm indicates the off-board storage server had a problem with its disk volume
filling up.
Severity:
Major
HA Score:
Normal
OID:
eagleXgDsrTpdDeviceErrorNotify
Alarm ID:
TKSPLATMA20
1. Recovery
Description:
This alarm indicates the IP bond is either not configured or down.
Severity:
Major
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
eagleXgDsrTpdDeviceIfErrorNotify
Alarm ID:
TKSPLATMA21
1. Recovery:
1. Run syscheck in verbose mode.
2. Investigate the configuration of the failed bond and its slave devices:
a. Navigate to /etc/sysconfig/network-scripts for the persistent
configuration of a device.
3. Determine whether the failed bond or its slave devices have been administratively
shut down or have operational issues:
a. cat /proc/net/bonding/bondX, where X is the bond designation
b. ethtool <slave device>
4. If the bond and its slaves are healthy, attempt to bring the bond up administratively:
a. ifup bondX
5. If the problem has not been resolved, it is recommended to contact My Oracle
Support and provide the system health check output and the output of the above
investigation.
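The bond inspection in step 3 can be sketched as a parser over the text of /proc/net/bonding/bondX. This is a minimal sketch under stated assumptions: it relies on the common layout in which the bond-level "MII Status" line appears before the first "Slave Interface" section, and the names `parse_bonding`, `unhealthy`, and the `SAMPLE` text are illustrative, not output captured from a DSR server.

```python
# Illustrative input only; real files contain additional fields.
SAMPLE = """\
Bonding Mode: fault-tolerance (active-backup)
Currently Active Slave: eth0
MII Status: up
MII Polling Interval (ms): 100

Slave Interface: eth0
MII Status: up

Slave Interface: eth1
MII Status: down
"""


def parse_bonding(text: str) -> dict:
    """Return {'bond': status, 'slaves': {name: status}} from bonding proc text."""
    result = {"bond": None, "slaves": {}}
    current_slave = None
    for line in text.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            current_slave = value
        elif key == "MII Status":
            # Before the first slave section, the status belongs to the bond.
            if current_slave is None:
                result["bond"] = value
            else:
                result["slaves"][current_slave] = value
    return result


def unhealthy(parsed: dict) -> list:
    """List the bond and/or slave names whose MII status is not 'up'."""
    names = []
    if parsed["bond"] != "up":
        names.append("bond")
    names += [name for name, status in parsed["slaves"].items() if status != "up"]
    return names
```

On a server, the text would come from reading /proc/net/bonding/bondX; any slave reported here is a candidate for the ethtool check in step 3b.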
Description:
This alarm indicates that the chipset has detected a correctable (single-bit) memory error
that has been corrected by the ECC (Error-Correcting Code) circuitry in the memory.
Severity:
Major
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
tpdEccCorrectableError
Alarm ID:
TKSPLATMA22
1. Recovery:
1. No recovery necessary.
2. If the condition persists, verify the server firmware. Update the firmware if
necessary, and re-run syscheck in verbose mode. Otherwise, if the condition
persists and the firmware is up to date, contact the hardware vendor to request
hardware replacement.
Description:
This alarm indicates the power supply 1 (feed A) has failed.
Severity:
Major
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
tpdPowerSupply1Error
Alarm ID:
TKSPLATMA23
1. Recovery:
1. Verify nothing is obstructing the airflow to the fans of the power supply.
2. Run syscheck in verbose mode. The output provides details about what is wrong
with the power supply.
3. If the problem persists, it is recommended to contact My Oracle Support and
provide the syscheck verbose output. Power supply 1 (feed A) probably needs to
be replaced.
Description:
This alarm indicates the power supply 2 (feed B) has failed.
Severity:
Major
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
tpdPowerSupply2Error
Alarm ID:
TKSPLATMA24
1. Recovery:
1. Verify nothing is obstructing the airflow to the fans of the power supply.
2. Run syscheck in verbose mode. The output provides details about what is wrong
with the power supply.
3. If the problem persists, it is recommended to contact My Oracle Support and
provide the syscheck verbose output. Power supply 2 (feed B) probably needs to
be replaced.
Description:
This alarm indicates the server is not receiving information from the breaker panel
relays.
Severity:
Major
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
eagleXgDsrTpdBrkPnlFeedErrorNotify
Alarm ID:
TKSPLATMA25
1. Recovery:
1. Verify the same alarm is displayed by multiple servers:
• If this alarm is displayed by only one server, the problem is most likely to be
with the cable or the server itself. Look for other alarms that indicate a problem
with the server and perform the recovery procedures for those alarms first.
• If this alarm is displayed by multiple servers, go to the next step.
2. Verify the cables that connect the servers to the breaker panel are not damaged
and are securely fastened to both the alarm interface ports on the breaker panel
and to the serial ports on both servers.
3. If the problem has not been resolved, it is recommended to contact My Oracle
Support to request that the breaker panel be replaced.
Description:
This alarm indicates a power fault has been identified by the breaker panel. The LEDs
on the center of the breaker panel (see Figure 3-1) identify whether the fault occurred
on the input power or the output power, as follows:
• A power fault on input power (power from site source to the breaker panel)
is indicated by one of the LEDs in the PWR BUS A or PWR BUS B group
illuminated red. In general, a fault in the input power means power has been lost
to the input power circuit.
Note:
LEDs in the PWR BUS A or PWR BUS B group that correspond to
unused feeds are not illuminated; LEDs in these groups that are not
illuminated do not indicate problems.
• A power fault on the output power (power from the breaker panel to other frame
equipment) is indicated by either BRK FAIL BUS A or BRK FAIL BUS B being
illuminated red. This type of fault can be caused by a surge or some sort of
power degradation or spike that causes one of the circuit breakers to trip.
Severity:
Major
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
eagleXgDsrTpdBrkPnlBreakerErrorNotify
Alarm ID:
TKSPLATMA26
1. Recovery:
1. Verify the same alarm is displayed by both servers. The single breaker panel
normally sends alarm information to both servers:
• If this alarm is displayed by only one server, the problem is most likely with the
cable or the server itself. Look for other alarms that indicate a problem with the
server and perform the recovery procedures for those alarms first.
• If this alarm is displayed by both servers, go to the next step.
2. For each breaker assignment, verify the corresponding LED in the PWR BUS A
group and the PWR BUS B group is illuminated green.
If one of the LEDs in the PWR BUS A group or the PWR BUS B group is
illuminated red, a problem has been detected with the corresponding input power
feed. Perform these steps to correct this problem:
• Verify the customer provided source for the affected power feed is operational.
If the power source is properly functioning, have an electrician remove the
plastic cover from the rear of the breaker panel and verify the power source is
indeed connected to the input power feed connector on the rear of the breaker
panel. Correct any issues found.
• Check the LEDs in the PWR BUS A group and the PWR BUS B group again.
a. If the LEDs are now illuminated green, the issue has been resolved.
Proceed to step 4 to verify the alarm has been cleared.
b. If the LEDs are still illuminated red, continue to the next sub-step.
• Have the electrician verify the integrity of the input power feed. The input
voltage should measure nominally -48VDC (that is, between -41VDC and
-60VDC). If the supplied voltage is not within the acceptable range, the input
power source must be repaired or replaced.
Note:
Make sure the voltmeter is connected properly. The locations of the
BAT and RTN connections are in mirror image on either side of the
breaker panel.
If the measured voltage is within the acceptable range, the breaker
panel may be malfunctioning. The breaker panel must be replaced.
• Check the LEDs in the PWR BUS A group and the PWR BUS B group again
after the necessary actions have been taken to correct any issues found.
a. If the LEDs are now illuminated green, the issue has been resolved;
proceed to step 4 to verify the alarm has been cleared.
b. If the LEDs are still illuminated red, skip to step 5.
3. Check the BRK FAIL LEDs for BUS A and for BUS B.
• If one of the BRK FAIL LEDs is illuminated red, then one or more of the
respective Input Breakers has tripped. (A tripped breaker is indicated by the
toggle located in the center position.) Perform the following steps to repair this
issue:
a. For all tripped breakers, move the breaker down to the open (OFF) position
and then back up to the closed (ON) position.
b. After all the tripped breakers have been reset, check the BRK FAIL LEDs
again. If one of the BRK FAIL LEDs is still illuminated red, run syscheck and
contact My Oracle Support.
4. If all of the BRK FAIL LEDs and all the LEDs in the PWR BUS A group and the
PWR BUS B group are illuminated green, there is most likely a problem with the
serial connection between the server and the breaker panel. This connection is
used by the system health check to monitor the breaker panel for failures. Verify
both ends of the labeled serial cables are properly secured. If any issues are
discovered with these cable connections, make the necessary corrections and
continue to the next step to verify the alarm has been cleared, otherwise it is
recommended to run syscheck and contact My Oracle Support.
5. Run syscheck.
• If the alarm has been cleared, the problem is resolved.
• If the problem has not been resolved, it is recommended to contact My Oracle
Support.
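The input feed voltage check in step 2 of this procedure reduces to a range test. The sketch below is illustrative only: the limits (-41 VDC and -60 VDC) come from the recovery text, treating both endpoints as acceptable is an assumption, and the function names are hypothetical.

```python
# Limits from the recovery text: nominally -48 VDC, acceptable between
# -41 VDC and -60 VDC. Inclusive endpoints are an assumption of this sketch.
V_UPPER = -41.0
V_LOWER = -60.0


def input_voltage_ok(volts: float) -> bool:
    """True when the measured input feed voltage is within the acceptable range."""
    return V_LOWER <= volts <= V_UPPER


def next_action(volts: float) -> str:
    """Action to take when a PWR BUS LED is still red after the source check."""
    if input_voltage_ok(volts):
        # Voltage is good, so the breaker panel itself is suspect.
        return "replace breaker panel"
    return "repair or replace input power source"
```

For example, a reading of -48.0 points at the breaker panel, while a reading of -62.5 points at the input power source.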
Description:
This alarm indicates a failure in the hardware and/or software that monitors the
breaker panel. This could mean there is a problem with the file I/O libraries, the serial
device drivers, or the serial hardware itself.
Note:
When this alarm occurs, the system is unable to monitor the breaker panel
for faults. Thus, if this alarm is detected, it is imperative the breaker panel
be carefully examined for the existence of faults. Until the breaker panel
monitoring error has been corrected, the LEDs on the breaker panel are the
only indication of the occurrence of either of the following alarms:
• 32324 – Breaker panel feed error
• 32325 – Breaker panel breaker error
Severity:
Major
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
eagleXgDsrTpdBrkPnlMntErrorNotify
Alarm ID:
TKSPLATMA27
1. Recovery:
1. Verify the same alarm is displayed by both servers (the single breaker panel
normally sends alarm information to both servers):
• If this alarm is displayed by only one server, the problem is most likely with the
cable or the server itself. Look for other alarms that indicate a problem with the
server and perform the recovery procedures for those alarms first.
• If this alarm is displayed by both servers, go to the next step.
2. Verify both ends of the labeled serial cables are secured properly (for locations of
serial cables, see the appropriate hardware manual).
3. Run syscheck.
• If the alarm has been cleared, the problem is resolved.
• If the alarm has not been cleared, it is recommended to contact My Oracle
Support.
Description:
This alarm indicates the heartbeat process has detected that it has failed to receive a
heartbeat packet within the timeout period.
Severity:
Major
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
eagleXgDsrTpdHaKeepaliveErrorNotify
Alarm ID:
TKSPLATMA28
1. Recovery:
1. Determine if the mate server is currently down and bring it up if possible.
2. Determine if the keepalive interface is down.
3. Determine if heartbeat is running (service TKLCha status).
Note:
This step may require command line ability.
Description:
This alarm indicates DRBD is not functioning properly on the local server. The DRBD
state (disk state, node state, and/or connection state) indicates a problem.
Severity:
Major
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
eagleXgDsrTpdDrbdUnavailableNotify
Alarm ID:
TKSPLATMA29
1. Recovery
Description:
This alarm indicates DRBD is not replicating to the peer server. Usually this indicates
DRBD is not connected to the peer server. It is possible that a DRBD Split Brain has
occurred.
Severity:
Major
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
eagleXgDsrTpdDrbdNotReplicatingNotify
Alarm ID:
TKSPLATMA30
1. Recovery
Description:
This alarm indicates DRBD is not functioning properly on the peer server. DRBD
is connected to the peer server, but the DRBD state on the peer server is either
unknown or indicates a problem.
Severity:
Major
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
eagleXgDsrTpdDrbdPeerProblemNotify
Alarm ID:
TKSPLATMA31
1. Recovery
Description:
This major alarm indicates there is an issue with either a physical or logical disk in the
HP disk subsystem. The message includes the drive type, location, slot and status of
the drive that has the error.
Severity:
Major
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
eagleXgDsrTpdHpDiskProblemNotify
Alarm ID:
TKSPLATMA32
1. Recovery:
1. Run syscheck in verbose mode.
2. If Cache Status is OK and Cache Status Details reports that a cache error was
detected and diagnostics should be run, there is probably no battery: data left
over in the write cache was not flushed to disk and will not be flushed because
there is no battery.
3. If Cache Status is Permanently Disabled, Cache Status Details indicates the
cache is disabled, and there is no battery, the firmware should be upgraded.
4. Re-run syscheck in verbose mode if firmware upgrade was necessary.
5. If the condition persists, it is recommended to contact My Oracle Support and
provide the system health check output. The disk may need to be replaced.
Description:
This major alarm indicates there is an issue with an HP disk controller. The message
includes the slot location, the component on the controller that has failed, and status
of the controller that has the error.
Severity:
Major
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
eagleXgDsrTpdHpDiskCtrlrProblemNotify
Alarm ID:
TKSPLATMA33
1. Recovery:
1. Run syscheck in verbose mode.
Description:
This major alarm indicates there is an issue with the process that caches the HP
disk subsystem status. This usually means the hpacucliStatus/hpDiskStatus daemon
is either not running, or hung.
Severity:
Major
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
eagleXgDsrTpdHPACUCLIProblemNotify
Alarm ID:
TKSPLATMA34
1. Recovery:
1. Run syscheck in verbose mode.
2. Verify the firmware is up to date for the server. If it is not, upgrade the
firmware and re-run syscheck in verbose mode.
3. Determine if the HP disk status daemon is running. If not running, verify it was not
administratively stopped.
Note:
The disk status daemon is named TKLChpacucli or, in more recent versions
of TPD, TPDhpDiskStatus.
Description:
One or more access paths of a multipath device are failing or are not healthy, or the
multipath device does not exist.
Severity:
Major
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
eagleXgDsrTpdMpathDeviceProblemNotify
1. Recovery:
Description:
The link is down.
Severity:
Major
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
eagleXgDsrTpdSwitchLinkDownErrorNotify
Alarm ID:
TKSPLATMA36
1. Recovery:
1. Verify the cabling between the port and the remote side.
2. Verify networking on the remote end.
3. If the problem persists, it is recommended to contact My Oracle Support to
determine who should verify port settings on both the server and the switch.
Description:
This alarm indicates the number of half open TCP sockets has reached the major
threshold. This problem is caused by a remote system failing to complete the TCP
3-way handshake.
Severity:
Major
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
eagleXgDsrTpdHalfOpenSockLimitNotify
Alarm ID:
TKSPLATMA37
1. Recovery:
1. Run syscheck in verbose mode.
2. Determine which process and address report a state of SYN_RECV and collect
data:
• netstat -nap
3. It is recommended to contact My Oracle Support and provide the system health
check output and collected data.
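The SYN_RECV collection in step 2 can be summarized per remote address before it is sent to support. This is a minimal sketch that assumes the usual Linux column layout of netstat -nap (protocol, queues, local address, foreign address, state). The function name and the `SAMPLE` text are illustrative, not captured from a DSR server.

```python
from collections import Counter

# Illustrative input only, using documentation-range addresses.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 10.0.0.5:3868           192.0.2.10:40112        SYN_RECV    -
tcp        0      0 10.0.0.5:3868           192.0.2.10:40113        SYN_RECV    -
tcp        0      0 10.0.0.5:22             192.0.2.20:51000        ESTABLISHED 1234/sshd
"""


def syn_recv_by_peer(netstat_output: str) -> Counter:
    """Count half-open (SYN_RECV) TCP sockets per remote address."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # TCP rows: proto, recv-q, send-q, local, foreign, state, [pid/program]
        if len(fields) >= 6 and fields[0].startswith("tcp") and fields[5] == "SYN_RECV":
            # Drop the port so all sockets from one host are grouped together.
            peer = fields[4].rsplit(":", 1)[0]
            counts[peer] += 1
    return counts
```

A remote address with a large count is the likely system that is failing to complete the TCP 3-way handshake.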
Description:
This alarm indicates there was an error while trying to update the firmware flash on
the E5-APP-B cards.
Severity:
Major
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
eagleXgDsrTpdFlashProgramFailureNotify
Alarm ID:
TKSPLATMA38
1. Recovery:
Description:
This alarm indicates a connection to the serial mezzanine board may not be properly
seated.
Severity:
Major
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
eagleXgDsrTpdSerialMezzUnseatedNotify
Alarm ID:
TKSPLATMA39
1. Recovery:
1. Ensure both ends of both cables connecting the serial mezzanine card to the main
board are properly seated into their connectors.
2. It is recommended to contact My Oracle Support if reseating the cables does not
clear the alarm.
Description:
This alarm indicates the maximum number of running processes has reached the
major threshold.
Severity:
Major
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
eagleXgDsrTpdMaxPidLimitNotify
Alarm ID:
TKSPLATMA40
1. Recovery:
1. Run syscheck in verbose mode.
2. Execute pstree to see what pids are on the system and what process created
them. Collect the output of the command and review it to determine the
process responsible for the alarm.
3. It is recommended to contact My Oracle Support and provide the system health
check output and pid output.
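As a complement to reading pstree output by eye, the parent responsible for most processes can be tallied from a flat process list. This sketch swaps in a different input than the step above names: it assumes the text comes from `ps -e -o ppid= -o comm=` (parent PID and command name, no header) rather than from pstree. The function name and `SAMPLE` text are illustrative.

```python
from collections import Counter

# Illustrative input only: "<parent pid> <command>" per line.
SAMPLE = """\
    1 sshd
    1 crond
  812 worker
  812 worker
  812 worker
"""


def children_per_parent(ps_output: str) -> Counter:
    """Count processes per parent PID."""
    counts = Counter()
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        # Skip blank or malformed lines that do not start with a numeric PPID.
        if parts and parts[0].isdigit():
            counts[int(parts[0])] += 1
    return counts
```

Calling `children_per_parent(text).most_common(1)` gives the parent PID with the most children, which can then be located in the pstree output.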
Description:
This alarm indicates the server is not synchronized to an NTP source, and the
number of hours without synchronization has reached the major
threshold.
Severity:
Major
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
eagleXgDsrTpdNTPDaemonNotSynchronizedErrorNotify
Alarm ID:
TKSPLATMA41
1. Recovery:
1. Verify NTP settings and that NTP sources can be reached.
a. Ensure the ntpd service is running.
b. Verify the content of the /etc/ntp.conf file is correct for the server.
c. Verify the ntp peer configuration; execute ntpq -p and analyze the output.
Verify peer data, such as tally code (first column before remote), remote, refid,
stratum (st), and jitter, are valid for the server.
d. Execute ntpstat to determine the ntp time synchronization status. If not
synchronized or the stratum is not correct for the server, then ping the ntp peer to
determine if the peer can be reached.
2. If the ntp peer is reachable, restart the ntpd service.
3. If the problem persists, then resetting the NTP date may resolve the issue.
Note:
Before resetting the ntp date, the applications may need to be stopped
and, subsequent to the ntp reset, the application restarted.
• To reset date:
• sudo service ntpd stop
• sudo ntpdate <ntp server IP>
• sudo service ntpd start
4. If the problem persists, it is recommended to contact My Oracle Support.
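The ntpstat check in step 1d can be reduced to two values: whether the server is synchronized and at which stratum. This is a minimal sketch that assumes the usual first line of ntpstat output ("synchronised to ... at stratum N" or "unsynchronised", with the British spelling ntpstat uses). The function name is hypothetical.

```python
import re


def parse_ntpstat(output: str) -> dict:
    """Extract synchronization state and stratum from ntpstat output text."""
    stripped = output.strip()
    first = stripped.splitlines()[0] if stripped else ""
    # "unsynchronised" does not start with "synchronised", so this is safe.
    synced = first.startswith("synchronised")
    match = re.search(r"stratum (\d+)", first)
    return {
        "synchronized": synced,
        "stratum": int(match.group(1)) if match else None,
    }
```

If the result shows the server is not synchronized, or the stratum is not what is expected for the server, the next action in the procedure is to ping the ntp peer.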
Description:
This alarm indicates the server is not synchronized to an NTP source and has never
been synchronized since the last configuration change.
Severity:
Major
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
eagleXgDsrTpdNTPDaemonNeverSynchronizedNotify
Alarm ID:
TKSPLATMA42
1. Recovery:
1. Verify NTP settings and that NTP sources can be reached.
a. Ensure the ntpd service is running.
b. Verify the content of the /etc/ntp.conf file is correct for the server.
c. Verify the ntp peer configuration; execute ntpq -p and analyze the output.
Verify peer data, such as tally code (first column before remote), remote, refid,
stratum (st), and jitter, are valid for the server.
d. Execute ntpstat to determine the ntp time synchronization status. If not
synchronized or the stratum is not correct for the server, then ping the ntp peer to
determine if the peer can be reached.
2. If the ntp peer is reachable, restart the ntpd service.
3. If the problem persists, then resetting the NTP date may resolve the issue.
Note:
Before resetting the ntp date, the applications may need to be stopped
and, subsequent to the ntp reset, the application restarted.
• To reset date:
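• sudo service ntpd stop
• sudo ntpdate <ntp server IP>
• sudo service ntpd start
4. If the problem persists, it is recommended to contact My Oracle Support.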
Description:
This alarm indicates the NTP offset of the server that is currently being synced to is
greater than the major threshold.
Severity:
Major
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
eagleXgDsrNtpOffsetCheckErrorNotify
Alarm ID:
TKSPLATMA43
1. Recovery:
1. Verify NTP settings and that NTP sources can be reached.
a. Ensure the ntpd service is running.
b. Verify the content of the /etc/ntp.conf file is correct for the server.
c. Verify the ntp peer configuration; execute ntpq -p and analyze the output.
Verify peer data, such as tally code (first column before remote), remote, refid,
stratum (st), and jitter, are valid for the server.
d. Execute ntpstat to determine the ntp time synchronization status. If not
synchronized or the stratum is not correct for the server, then ping the ntp peer to
determine if the peer can be reached.
2. If the ntp peer is reachable, restart the ntpd service.
3. If the problem persists, then resetting the NTP date may resolve the issue.
Note:
Before resetting the ntp date, the applications may need to be stopped
and, subsequent to the ntp reset, the application restarted.
• To reset date:
• sudo service ntpd stop
• sudo ntpdate <ntp server IP>
• sudo service ntpd start
4. If the problem persists, it is recommended to contact My Oracle Support.
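When analyzing ntpq -p output for this alarm, the offset of the currently selected peer is the value of interest. The sketch below assumes the standard ntpq peers layout, in which the selected peer is marked with a leading `*` and the offset column is in milliseconds. The threshold is a caller-supplied parameter because this document does not state the major threshold value; the function names and `SAMPLE` text are illustrative.

```python
# Illustrative input only, using documentation-range addresses.
SAMPLE = """\
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*192.0.2.1       .GPS.            1 u   33   64  377    0.512   -1.204   0.087
+192.0.2.2       192.0.2.1        2 u   12   64  377    0.733   12.500   0.150
"""


def sync_peer_offset_ms(ntpq_output: str):
    """Return (remote, offset_ms) for the peer tagged '*', or None if absent."""
    for line in ntpq_output.splitlines():
        if not line.startswith("*"):
            continue
        # Columns after the tally code: remote refid st t when poll reach
        # delay offset jitter
        fields = line[1:].split()
        if len(fields) >= 10:
            return fields[0], float(fields[8])
    return None


def offset_exceeds(ntpq_output: str, threshold_ms: float) -> bool:
    """True when the selected peer's absolute offset is above threshold_ms."""
    peer = sync_peer_offset_ms(ntpq_output)
    return peer is not None and abs(peer[1]) > threshold_ms
```

If no peer carries the `*` tally code, the server has not selected a synchronization source, which corresponds to the "not synchronized" checks in step 1.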
Description:
This alarm indicates the physical disk or logical volume on the RAID controller is not in
an optimal state as reported by syscheck.
Severity:
Major
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
eagleXgDsrTpdDiskProblemNotify
Alarm ID:
TKSPLATMA44
1. Recovery:
1. Run syscheck in verbose mode.
2. It is recommended to contact My Oracle Support and provide the system health
check output.
Description:
This alarm indicates the RAID controller needs intervention.
Severity:
Major
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
eagleXgDsrTpdDiskCtrlrProblemNotify
Alarm ID:
TKSPLATMA45
1. Recovery:
1. Run syscheck in verbose mode.
2. Verify the firmware is up to date for the server. If it is not, upgrade the firmware
and re-run syscheck in verbose mode.
3. It is recommended to contact My Oracle Support and provide the system health
check output.
Description:
This alarm indicates the upgrade snapshot(s) are invalid and backout is no longer
possible.
Severity:
Major
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
eagleXgDsrTpdUpgradeSnapshotInvalidNotify
Alarm ID:
TKSPLATMA46
1. Recovery:
1. Run accept to remove invalid snapshot(s) and clear alarms.
2. If the alarm persists, it is recommended to contact My Oracle Support.
Description:
This alarm indicates the OEM hardware management service reports an error.
Severity:
Major
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
tpdOEMHardware
Alarm ID:
TKSPLATMA47
1. Recovery:
1. Run syscheck in verbose mode.
2. It is recommended to contact My Oracle Support and provide the system health
check output.
Description:
This alarm indicates the hwmgmtcliStatus daemon is not running or is not
responding.
Severity:
Major
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
eagleXgDsrTpdHWMGMTCLIProblemNotify
Alarm ID:
TKSPLATMA47
1. Recovery:
1. Run syscheck in verbose mode.
2. Verify the firmware is up to date for the server. If it is not, upgrade the
firmware and re-run syscheck in verbose mode.
3. Determine if the hwmgmtd process is running. If not running, verify it was not
administratively stopped.
• Execute service hwmgmtd status to produce output indicating the
process is running.
• If not running, attempt to start the process with service hwmgmtd start.
4. Determine if the TKLChwmgmtcli process is running. If not running, verify it was
not administratively stopped.
• Execute status TKLChwmgmtcli to produce output indicating the process
is running.
• If not running, attempt to start the process with start TKLChwmgmtcli.
5. Verify there are no hwmgmt error messages in /var/log/messages. If there are, this
could indicate the Oracle utility is hung. If the hwmgmtd process is hung, proceed with
the next step.
6. It is recommended to contact My Oracle Support and provide the system health
check output.
Description:
This alarm indicates the FIPS subsystem is not running or has encountered errors.
Severity:
Major
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
eagleXgDsrTpdFipsSubsystemProblemNotify
1. Recovery:
Description:
This alarm indicates HIDS has detected file tampering.
Severity:
Major
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
eagleXgDsrTpdHidsFileTamperingNotify
1. Recovery:
Description:
This alarm indicates the security process monitor is not running.
Severity:
Major
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
eagleXgDsrTpdSecurityProcessDownNotify
1. Recovery:
Description:
This alarm indicates that one of the following conditions has occurred:
• A file system has exceeded a warning threshold, which means that more than
80% (but less than 90%) of the available disk storage has been used on the file
system.
• More than 80% (but less than 90%) of the total number of available files have
been allocated on the file system.
Severity:
Minor
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
tpdDiskSpaceShortageWarning
Alarm ID:
TKSPLATMI1
1. Recovery:
1. Run syscheck in verbose mode.
2. Examine the contents of the volume identified in the syscheck output to determine if any
large files are in the file system. Delete unnecessary files, or move files off the server.
Capture output from "du -sx <file system>".
3. Capture output from "df -h" and "df -i" commands.
4. Determine processes using the file system(s) that have exceeded the threshold.
5. It is recommended to contact My Oracle Support, provide the system health check
output, and provide additional file system output.
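The two conditions in the alarm description (disk storage used and files allocated) can be checked with a short script alongside the df -h and df -i captures. This is a minimal sketch: the 80% and 90% bounds come from the alarm description, the function names are hypothetical, and the block percentage only approximates what df reports because df rounds its figures.

```python
import os


def usage_percentages(path: str):
    """Return (block_pct, inode_pct) used on the file system that holds path."""
    st = os.statvfs(path)
    # Blocks: used / (used + available to unprivileged users), similar to df.
    used_blocks = st.f_blocks - st.f_bfree
    block_total = used_blocks + st.f_bavail
    block_pct = 100.0 * used_blocks / block_total if block_total else 0.0
    # Inodes: some file systems report zero total inodes; treat that as 0%.
    used_inodes = st.f_files - st.f_ffree
    inode_pct = 100.0 * used_inodes / st.f_files if st.f_files else 0.0
    return block_pct, inode_pct


def in_warning_band(pct: float) -> bool:
    """True for more than 80% but less than 90%, per the alarm description."""
    return 80.0 < pct < 90.0
```

For example, `any(in_warning_band(p) for p in usage_percentages("/var"))` reports whether either condition currently holds for the file system containing /var (the path is only an example).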
Description:
This alarm indicates that either the minimum number of instances for a required
process are not currently running or too many instances of a required process are
running.
Severity:
Minor
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
tpdApplicationProcessError
Alarm ID:
TKSPLATMI2
1. Recovery:
1. Run syscheck in verbose mode.
2. If the alarm has been cleared, then the problem is solved.
3. If the alarm has not been cleared, determine the run level of the system.
• If system run level is not 4, determine why the system is operating at that run
level.
• If system run level is 4, determine why the required number of process
instances is not running.
4. For additional assistance, it is recommended to contact My Oracle Support and
provide the syscheck output.
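The run level check in step 3 can be sketched as follows. This assumes the runlevel command is available on the server and prints the previous and current run levels on one line (for example "N 4"); the function names are hypothetical.

```shell
#!/bin/sh
# Print the current run level from `runlevel` output read on stdin.
# Assumed input format: "<previous> <current>", for example "N 4".
current_runlevel() {
    awk 'NR == 1 { print $2 }'
}

# Succeed only when the current run level matches the expected one.
# The default of 4 is the value named in the procedure above.
runlevel_is_expected() {
    [ "$(current_runlevel)" = "${1:-4}" ]
}
```

For example, runlevel | runlevel_is_expected returns a non-zero status when the system is not at run level 4.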
Description:
This alarm indicates one or more of the server’s hardware components are not in
compliance with specifications. Refer to the appropriate hardware manual.
Severity:
Minor
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
tpdHardwareConfigError
Alarm ID:
TKSPLATMI3
1. Recovery:
1. Run syscheck in verbose mode.
2. Contact the hardware vendor to request a hardware replacement.
Description:
This alarm is generated by the MPS syscheck software package and is not part of the
TPD distribution.
Severity:
Minor
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
tpdRamShortageWarning
Alarm ID:
TKSPLATMI4
1. Recovery
1. Refer to MPS-specific documentation for information regarding this alarm.
2. It is recommended to contact My Oracle Support.
Description:
This alarm is generated by the MPS syscheck software package and is not part of the
PLAT distribution.
Severity:
Minor
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
tpdSoftwareConfigError
1. Recovery
Description:
This alarm indicates the swap space available on the server is less than expected.
This is usually caused by a process that has allocated a very large amount of memory
over time.
Note:
For this alarm to clear, the underlying failure condition must be consistently
undetected for a number of polling intervals. Therefore, the alarm may
continue to be reported for several minutes after corrective actions are
completed.
Severity:
Minor
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
tpdSwapSpaceShortageWarning
Alarm ID:
TKSPLATMI6
1. Recovery:
1. Run syscheck in verbose mode.
2. Determine which processes are using swap.
a. List application processes and determine the process ID.
b. Determine how much swap each process is using. One method to determine
the amount of swap being used by a process is:
• grep VmSwap /proc/<process id>/status
3. It is recommended to contact My Oracle Support, provide the system health check
output, and process swap usage.
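To apply the grep VmSwap check in step 2 to every process at once, a sketch such as the following may help. It reads the same /proc/<process id>/status field; the function name list_swap_by_process and its output format are hypothetical, and the optional argument exists only so the function can be pointed at a different directory tree.

```shell
#!/bin/sh
# Print "<swap kB> <pid> <name>" for every process that reports VmSwap,
# largest first. list_swap_by_process is an illustrative name.
list_swap_by_process() {
    proc_root=${1:-/proc}
    for status in "$proc_root"/[0-9]*/status; do
        # Skip entries that vanished or are not readable.
        [ -r "$status" ] || continue
        awk -v pid="$(basename "$(dirname "$status")")" '
            /^Name:/   { name = $2 }
            /^VmSwap:/ { swap = $2 }
            END        { if (swap != "") print swap, pid, name }
        ' "$status" 2>/dev/null
    done | sort -rn
}
```

Running list_swap_by_process | head shows the largest swap consumers, which can be attached to the service request along with the syscheck output.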
Description:
This alarm indicates the default network route is either not configured or the current
configuration contains an invalid IP address or hostname.
Caution:
When changing the server’s network routing configuration, it is important
to verify the modifications do not impact the method of connectivity for
the current login session. It is also crucial this information not be entered
incorrectly or set to improper values. Incorrectly modifying the server’s
routing configuration may result in total loss of remote network access.
Severity:
Minor
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
tpdDefaultRouteNotDefined
Alarm ID:
TKSPLATMI7
1. Recovery:
1. Run syscheck in verbose mode.
2. If the syscheck output is: The default router at <IP_address> cannot be
pinged, the router may be down or unreachable. Do the following:
a. Verify the network cables are firmly attached to the server and the network
switch, router, hub, etc.
b. Verify the configured router is functioning properly. Check with the network
administrator to verify the router is powered on and routing traffic as required.
c. Check with the router administrator to verify the router is configured to reply to
pings on that interface.
d. Rerun syscheck.
3. If the alarm has not cleared, it is recommended to collect the syscheck output and
contact My Oracle Support.
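The reachability check that syscheck reports in step 2 can be reproduced by hand with a sketch like the one below. It assumes the iproute2 ip command and a ping that accepts -c; the function names are hypothetical. The sketch only reads the routing configuration and does not modify it.

```shell
#!/bin/sh
# Extract the default gateway from `ip route show` output read on stdin.
# Prints nothing when no default route is defined.
default_gateway() {
    awk '$1 == "default" && $2 == "via" { print $3; exit }'
}

# Ping the configured default gateway and report the result.
# Read-only: the routing table is not changed.
check_default_route() {
    gw=$(ip route show | default_gateway)
    if [ -z "$gw" ]; then
        echo "no default route configured"
        return 1
    fi
    if ping -c 3 "$gw" >/dev/null 2>&1; then
        echo "default router $gw answers ping"
    else
        echo "default router $gw cannot be pinged"
        return 1
    fi
}
```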
Description:
This alarm indicates the internal temperature within the server is outside of the
normal operating range. A server fan failure may also exist along with the Server
Temperature Warning.
Severity:
Minor
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
tpdServerTemperatureWarning
Alarm ID:
TKSPLATMI8
1. Recovery:
1. Ensure nothing is blocking the fan intake. Remove any blockage.
2. Verify the temperature in the room is normal. If it is too hot, lower the temperature
in the room to an acceptable level.
Note:
Be prepared to wait before continuing with the next step. Conditions
need to be below alarm thresholds consistently for the alarm to clear.
It may take about ten minutes after the room returns to an acceptable
temperature before the alarm clears.
3. Run syscheck.
4. Replace the filter (refer to the appropriate hardware manual).
Note:
Be prepared to wait before continuing with the next step. Conditions
need to be below alarm thresholds consistently for the alarm to clear. It
may take about ten minutes after the filter is replaced before the alarm
clears.
5. Run syscheck.
6. If the problem has not been resolved, it is recommended to contact My Oracle
Support.
Description:
This alarm indicates that an application process has failed and debug information is
available.
Severity:
Minor
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
tpdServerCoreFileDetected
Alarm ID:
TKSPLATMI9
1. Recovery:
1. It is recommended to contact My Oracle Support to create a service request.
2. On the affected server, execute this command:
ll /var/TKLC/core
Add the command output to the service request. Include the date of creation found
in the command output.
3. Attach core files to the My Oracle Support service request.
4. The user can remove the files to clear the alarm with this command:
rm -f /var/TKLC/core/<coreFileName>
Description:
This alarm indicates the NTP daemon (background process) has been unable to
locate a server to provide an acceptable time reference for synchronization.
Severity:
Minor
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
tpdNTPDeamonNotSynchronizedWarning
Alarm ID:
TKSPLATMI10
1. Recovery:
1. Verify NTP settings and that NTP sources can be reached.
a. Ensure ntpd service is running.
b. Verify the content of the /etc/ntp.conf file is correct for the server.
c. Verify the ntp peer configuration; execute ntpq -p and analyze the output.
Verify peer data, such as tally code (first column before remote), remote, refid,
stratum (st), and jitter, are valid for server.
d. Execute ntpstat to determine the ntp time synchronization status. If not
synchronized or the stratum is not correct for server, then ping the ntp peer to
determine if peer can be reached.
2. If ntp peer is reachable, restart the ntpd service.
3. If problem persists, then resetting the NTP date may resolve the issue.
Note:
Before resetting the ntp date, the applications may need to be stopped
and, subsequent to the ntp reset, the application restarted.
• To reset date:
• sudo service ntpd stop
• sudo ntpdate <ntp server IP>
• sudo service ntpd start
4. If the problem persists, it is recommended to contact My Oracle Support.
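When analyzing ntpq -p output in step 1c, the tally code "*" marks the peer that ntpd has selected for synchronization. The following sketch extracts that peer; the function name ntp_selected_peer is hypothetical, and the sketch assumes the usual two header lines at the top of the ntpq -p output.

```shell
#!/bin/sh
# Print the remote name of the peer selected for synchronization
# (tally code "*" in the first column of `ntpq -p` output on stdin).
# Returns non-zero when no peer is selected.
ntp_selected_peer() {
    awk 'NR > 2 && substr($0, 1, 1) == "*" { print substr($1, 2); found = 1 }
         END { exit (found ? 0 : 1) }'
}
```

For example, ntpq -p | ntp_selected_peer || echo "not synchronized to any peer".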
Description:
The presence of this alarm indicates the CMOS battery voltage has been detected
to be below the expected value. This alarm is an early warning indicator of CMOS
battery end-of-life failure, which causes problems if the server is powered off.
Severity:
Minor
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
tpdCMOSBatteryVoltageLow
Alarm ID:
TKSPLATMI11
1. Recovery:
Description:
A non-fatal disk issue (such as a sector cannot be read) exists.
Severity:
Minor
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
tpdSmartTestWarn
Alarm ID:
TKSPLATMI12
1. Recovery:
1. Run syscheck in verbose mode.
2. It is recommended to contact My Oracle Support.
Description:
This alarm indicates that either an snmpget command could not be performed on the
configured SNMP OID or the value returned failed the specified comparison
operation.
Severity:
Minor
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
tpdDeviceWarn
Alarm ID:
TKSPLATMI13
1. Recovery:
1. Run syscheck in verbose mode.
2. It is recommended to contact My Oracle Support.
Description:
This alarm can be generated by either an SNMP trap or an IP bond error.
Severity:
Minor
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
tpdDeviceIfWarn
Alarm ID:
TKSPLATMI14
1. Recovery:
1. Run syscheck in verbose mode.
2. It is recommended to contact My Oracle Support.
Description:
This alarm indicates the hardware watchdog was not strobed by the software and
so the watchdog rebooted the server. This applies to only the last reboot and is only
supported on a T1100 application server.
Severity:
Minor
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
tpdWatchdogReboot
Alarm ID:
TKSPLATMI15
1. Recovery:
Description:
This alarm indicates the server has been inhibited and therefore HA failover is
prevented from occurring.
Severity:
Minor
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
tpdHaInhibited
Alarm ID:
TKSPLATMI16
1. Recovery:
Description:
This alarm indicates the server is in the process of transitioning HA state from active
to standby.
Severity:
Minor
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
tpdHaActiveToStandbyTrans
Alarm ID:
TKSPLATMI17
1. Recovery:
Description:
This alarm indicates the server is in the process of transitioning HA state from standby
to active.
Severity:
Minor
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
tpdHaStandbyToActiveTrans
Alarm ID:
TKSPLATMI18
1. Recovery:
Description:
This alarm is used to indicate a configuration error.
Severity:
Minor
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
tpdHealthCheckFailed
Alarm ID:
TKSPLATMI19
1. Recovery:
Description:
This minor alarm indicates the time on the server is outside the acceptable range (or
offset) from the NTP server. The Alarm message will provide the offset value of the
server from the NTP server and the offset limit that the application has set for the
system.
Severity:
Minor
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
ntpOffsetCheckWarning
Alarm ID:
TKSPLATMI20
1. Recovery:
1. Verify NTP settings and that NTP sources can be reached.
a. Ensure ntpd service is running.
b. Verify the content of the /etc/ntp.conf file is correct for the server.
c. Verify the ntp peer configuration; execute ntpq -p and analyze the output.
Verify peer data, such as tally code (first column before remote), remote, refid,
stratum (st), and jitter, are valid for server.
d. Execute ntpstat to determine the ntp time synchronization status. If not
synchronized or the stratum is not correct for server, then ping the ntp peer to
determine if peer can be reached.
2. If ntp peer is reachable, restart the ntpd service.
3. If problem persists, then resetting the NTP date may resolve the issue.
Note:
Before resetting the ntp date, the applications may need to be stopped
and, subsequent to the ntp reset, the application restarted.
• To reset date:
• sudo service ntpd stop
• sudo ntpdate <ntp server IP>
• sudo service ntpd start
4. If the problem persists, it is recommended to contact My Oracle Support.
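Because this alarm compares the server's offset against a configured limit, it can help to list the per-peer offsets directly. The sketch below reads ntpq -p output, where the ninth column is the offset in milliseconds. The function name ntp_offset_exceeds is hypothetical, and the limit is supplied by the caller because the value the application configures is not stated here.

```shell
#!/bin/sh
# Print "<remote> <offset ms>" for every peer in `ntpq -p` output (stdin)
# whose absolute offset exceeds the limit given as the first argument.
# The limit is caller-supplied; no application default is assumed.
ntp_offset_exceeds() {
    limit_ms=$1
    awk -v limit="$limit_ms" 'NR > 2 && NF >= 10 {
        off = $9 + 0
        if (off < 0) off = -off          # compare the absolute value
        if (off > limit + 0) {
            r = $1
            sub(/^[^A-Za-z0-9]/, "", r)  # drop a leading tally code, if any
            print r, $9
        }
    }'
}
```

For example, ntpq -p | ntp_offset_exceeds 100 lists peers that are more than 100 ms away from the local clock.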
Description:
This alarm indicates NTP is synchronizing to a server, but the stratum level of the
NTP server is outside of the acceptable limit. The alarm message provides the
stratum value of the NTP server and the stratum limit the application has set for
the system.
Severity:
Minor
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
ntpStratumCheckFailed
Alarm ID:
TKSPLATMI21
1. Recovery:
1. Verify NTP settings and that NTP sources can be reached.
a. Ensure ntpd service is running.
b. Verify the content of the /etc/ntp.conf file is correct for the server.
c. Verify the ntp peer configuration; execute ntpq -p and analyze the output.
Verify peer data, such as tally code (first column before remote), remote, refid,
stratum (st), and jitter, are valid for server.
d. Execute ntpstat to determine the ntp time synchronization status. If not
synchronized or the stratum is not correct for server, then ping the ntp peer to
determine if peer can be reached.
2. If ntp peer is reachable, restart the ntpd service.
3. If problem persists, then resetting the NTP date may resolve the issue.
Note:
Before resetting the ntp date, the applications may need to be stopped
and, subsequent to the ntp reset, the application restarted.
• To reset date:
• sudo service ntpd stop
• sudo ntpdate <ntp server IP>
• sudo service ntpd start
4. If the problem persists, it is recommended to contact My Oracle Support.
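For the stratum check in step 1d, the stratum can be read from the first line of ntpstat output. The sketch below assumes that line has the form "synchronised to NTP server (<ip>) at stratum <n>"; the function names are hypothetical, and the maximum stratum is supplied by the caller because the limit the application sets is not stated here.

```shell
#!/bin/sh
# Extract the stratum from ntpstat output read on stdin.
# Prints nothing when the output has no "at stratum <n>" phrase,
# for example when the server is unsynchronised.
ntpstat_stratum() {
    sed -n 's/.* at stratum \([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Succeed when a stratum is reported and does not exceed the
# caller-supplied maximum (first argument).
stratum_within_limit() {
    max_stratum=$1
    st=$(ntpstat_stratum)
    [ -n "$st" ] && [ "$st" -le "$max_stratum" ]
}
```

For example, ntpstat | stratum_within_limit 4 || echo "stratum check failed".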
Description:
This alarm indicates the T1200 server drive sensor is not working.
Severity:
Minor
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
sasPresenceSensorMissing
Alarm ID:
TKSPLATMI22
1. Recovery:
Description:
This alarm indicates the number of drives configured for this server is not being
detected.
Severity:
Minor
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
sasDriveMissing
Alarm ID:
TKSPLATMI23
Description:
This alarm indicates a DRBD synchronization is in progress from the peer server to
the local server. The local server is not ready to act as the primary DRBD node, since
its data is not up to date.
Severity:
Minor
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
tpdDrbdFailoverBusy
Alarm ID:
TKSPLATMI24
1. Recovery
Description:
This minor alarm indicates that the HP disk subsystem is currently resynchronizing
after a failed or replaced drive, or some other change in the configuration of
the HP disk subsystem. The output of the message will include the disk that is
resynchronizing and the percentage complete. This alarm should eventually clear
once the resync of the disk is completed. The time it takes for this is dependent on the
size of the disk and the amount of activity on the system.
Severity:
Minor
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
tpdHpDiskResync
Alarm ID:
TKSPLATMI25
1. Recovery:
1. Run syscheck in verbose mode.
2. If the percent recovering is not updating, wait at least 5 minutes between
subsequent runs of syscheck.
3. If the alarm persists, it is recommended to contact My Oracle Support and provide
the syscheck output.
Description:
This alarm indicates the Telco switch has detected an issue with an internal fan.
Severity:
Minor
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
tpdTelcoFanWarning
Alarm ID:
TKSPLATMI26
1. Recovery:
• Contact the vendor to get a replacement switch. Verify the ambient air temperature
around the switch is as low as possible until the switch is replaced.
Note:
My Oracle Support personnel can perform an snmpget command or log
into the switch to get detailed fan status information.
Description:
This alarm indicates the Telco switch has detected the internal temperature has
exceeded the threshold.
Severity:
Minor
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
tpdTelcoTemperatureWarning
Alarm ID:
TKSPLATMI27
1. Recovery:
1. Lower the ambient air temperature around the switch as much as possible.
2. If the problem persists, it is recommended to contact My Oracle Support.
Description:
This alarm indicates the Telco switch has detected that one of the duplicate power
supplies has failed.
Severity:
Minor
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
tpdTelcoPowerSupplyWarning
Alarm ID:
TKSPLATMI28
1. Recovery:
1. Verify the breaker was not tripped.
2. If the breaker is still good and the problem persists, it is recommended to contact My
Oracle Support, who can perform an snmpget command or log into the switch to
determine which power supply is failing. If the power supply is bad, the switch
must be replaced.
Description:
This alarm indicates the HP server has detected that one of the settings for either the
embedded serial port or the virtual serial port is incorrect.
Severity:
Minor
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
tpdInvalidBiosValue
Alarm ID:
TKSPLATMI29
1. Recovery:
• Change the BIOS values to the expected values, which involves rebooting the
server. It is recommended to contact My Oracle Support for directions on changing
the BIOS.
Description:
This alarm indicates the kernel has crashed and debug information is available.
Severity:
Minor
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
tpdServerKernelDumpFileDetected
Alarm ID:
TKSPLATMI30
1. Recovery:
1. Run syscheck in verbose mode.
2. It is recommended to contact My Oracle Support.
Description:
This alarm indicates a TPD upgrade has failed.
Severity:
Minor
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
TpdServerUpgradeFailed
Alarm ID:
TKSPLATMI31
1. Recovery:
Description:
This alarm indicates the number of half open TCP sockets has reached the major
threshold. This problem is caused by a remote system failing to complete the TCP
3-way handshake.
Severity:
Minor
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
tpdHalfOpenSocketWarning
Alarm ID:
TKSPLATMI32
1. Recovery:
1. Run syscheck in verbose mode.
2. It is recommended to contact My Oracle Support.
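To see how many half-open sockets currently exist, the sockets in the SYN-RECV state (SYN received, three-way handshake not yet completed) can be counted. The sketch below accepts either ss -tan or netstat -tan output; the function name count_half_open is hypothetical, and the threshold that syscheck applies is not assumed.

```shell
#!/bin/sh
# Count sockets in the SYN-RECV state in `ss -tan` or `netstat -tan`
# output read on stdin. ss spells the state SYN-RECV and netstat spells
# it SYN_RECV, so both spellings are matched in any column.
count_half_open() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i == "SYN-RECV" || $i == "SYN_RECV") { n++; break }
    }
    END { print n + 0 }'
}
```

For example, ss -tan | count_half_open prints the current count, which can be included with the syscheck output when contacting My Oracle Support.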
Description:
This alarm indicates an upgrade occurred but has not been accepted or rejected yet.
Severity:
Minor
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
tpdServerUpgradePendingAccept
Alarm ID:
TKSPLATMI33
1. Recovery:
• Follow the steps in the application procedure to accept or reject the upgrade.
Description:
This alarm indicates the maximum number of running processes has reached the
minor threshold.
Severity:
Minor
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
tpdMaxPidWarning
Alarm ID:
TKSPLATMI34
1. Recovery:
1. Run syscheck in verbose mode.
Description:
This alarm indicates an NTP source has been rejected by the NTP daemon and is not
being considered as a time source.
Severity:
Minor
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
tpdNTPSourceIsBad
Alarm ID:
TKSPLATMI35
1. Recovery:
1. Verify NTP settings and that NTP sources can be reached.
a. Ensure ntpd service is running.
b. Verify the content of the /etc/ntp.conf file is correct for the server.
c. Verify the ntp peer configuration; execute ntpq -p and analyze the output.
Verify peer data, such as tally code (first column before remote), remote, refid,
stratum (st), and jitter, are valid for server.
d. Execute ntpstat to determine the ntp time synchronization status. If not
synchronized or the stratum is not correct for server, then ping the ntp peer to
determine if peer can be reached.
2. If ntp peer is reachable, restart the ntpd service.
3. If problem persists, then resetting the NTP date may resolve the issue.
Note:
Before resetting the ntp date, the applications may need to be stopped
and, subsequent to the ntp reset, the application restarted.
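• To reset date:
• sudo service ntpd stop
• sudo ntpdate <ntp server IP>
• sudo service ntpd start
4. If the problem persists, it is recommended to contact My Oracle Support.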
Description:
This alarm indicates the RAID logical volume is currently resyncing after a failed/
replaced drive, or some other change in the configuration. The output of the message
includes the disk that is resyncing. This alarm should eventually clear once the resync
of the disk is completed. The time it takes for this is dependent on the size of the disk
and the amount of activity on the system (rebuild of 600G disks without any load takes
about 75 minutes).
Severity:
Minor
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
tpdDiskResync
Alarm ID:
TKSPLATMI36
1. Recovery:
1. Run syscheck in verbose mode.
2. If this alarm persists for several hours (depending on the load on the server, rebuilding
an array can take multiple hours to finish), it is recommended to contact My Oracle
Support.
Description:
This alarm indicates the upgrade snapshot(s) are above the configured threshold.
The LVM upgrade must be accepted or rejected soon; otherwise, the snapshots
become full and invalid.
Severity:
Minor
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
tpdUpgradeSnapshotWarning
Alarm ID:
TKSPLATMI37
1. Recovery:
1. Accept or reject the current LVM upgrade before the snapshots become invalid.
2. It is recommended to contact My Oracle Support.
Description:
This alarm indicates the FIPS subsystem requires a reboot to complete configuration.
Severity:
Minor
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score:
Normal
OID:
tpdFipsSubsystemWarning
1. Recovery
Description
Platform data collection error.
Severity
Minor
Instance
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score
Normal
OID
tpdPdcError
1. Recovery
1. Run /usr/TKLC/plat/bin/pdcAdm. If run as admusr, use sudo to run the
command.
2. If this command fails, it is recommended to collect the output and contact My
Oracle Support.
Description
Server patch pending accept/reject.
Severity
Minor
Instance
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score
Normal
OID
tpdServerPatchPendingAccept
1. Recovery
Description:
The BIOS setting for CPU power limit is different than expected.
Severity:
Minor
Instance:
N/A
HA Score:
Normal
OID:
tpdCpuPowerLimitMismatch
Alarm ID:
TKSPLATMI41
1. Recovery:
Description
Telco switch notification.
Severity
Info
Instance
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and
bindVarNamesValueStr
HA Score
Normal
OID
tpdTelcoSwitchNotification
1. Recovery:
Description:
This alarm indicates HIDS was initialized.
Default Severity:
Info
OID:
tpdHidsBaselineCreated
1. Recovery:
Description:
HIDS baseline was deleted.
Default Severity:
Info
OID:
tpdHidsBaselineDeleted
1. Recovery:
Description:
HIDS was enabled.
Default Severity:
Info
OID:
tpdHidsEnabled
1. Recovery:
Description:
HIDS was disabled.
Default Severity:
Info
OID:
tpdHidsDisabled
1. Recovery:
Description:
HIDS monitoring suspended.
Default Severity:
Info
OID:
tpdHidsSuspended
1. Recovery:
Description:
HIDS monitoring resumed.
Default Severity:
Info
3-557
Chapter 3
Diameter Custom Applications (DCA) Framework Alarms and Events (33300-33630)
OID:
tpdHidsResumed
1. Recovery:
Description:
HIDS baseline updated.
Default Severity:
Info
OID:
tpdHidsBaselineUpdated
1. Recovery:
Description
Dsroam failed to create application version on DcaLifecycleSoam table.
Severity
Info
Instance
DcaLifecycleNoam.verId
HA Score
Normal
Throttle Seconds
60
OID
dcaDcaCreateAppVersionFailureNotify
1. Recovery
Description
Dsroam failed to synchronize configuration data on SO.
Severity
Info
Instance
ApplicationId.name
HA Score
Normal
Throttle Seconds
60
OID
dcaDcaUpdateConfigDataFailureNotify
1. Recovery
Description
Dsroam failed to delete application version from DcaLifecycleSoam table.
Severity
Info
Instance
DcaLifecycleSoam.verId
HA Score
Normal
Throttle Seconds
60
OID
dcaDcaDeleteAppVersionFailureNotify
1. Recovery
Description
The DSR Application UDR Event Queue Utilization is approaching its maximum
capacity.
Severity
Minor, Major, Critical
Instance
RxDcaUdrEventMsgQueue [<DcaDalId.dalId>], DCA
HA Score
Normal
OID
dcaDSRAppUdrEventMessageQueueUtilizationNotify
1. Recovery
1. The DSR Application’s UDR Result Message Queue is approaching its maximum
capacity. This alarm typically does not occur when no other congestion alarms are
asserted. The alarm may occur for either of the following reasons:
• The DCA application's processing of the UDR results is overly CPU intensive.
• The DCA application sends too many UDR queries per Diameter message, which
may be avoided by storing variables in the Diameter transaction context.
In both cases, review and optimize the business logic.
2. If no additional congestion alarms are asserted, the DSR application Task may
be experiencing a problem preventing it from processing messages from its UDR
Event Message Queue. Examine the alarm log from Alarms & Events.
3. It is recommended to contact My Oracle Support for assistance if needed.
Description
The script generated runtime errors.
Severity
Critical
Instance
The DCA App short name (DcaDalId.shortName) prefixed with "DCA:" and thread
pool (Request, Answer or SBR Event)
HA Score
Normal
OID
dcaDSRAppRuntimeErrorNotify
1. Recovery
• The error message generated by the Perl interpreter is included in the alarm's
additional info.
Fix the error accordingly and recompile the Perl script, or replace the Trial/
Production version (depending on whether the DA-MP is a Trial DA-MP or not)
with another script version.
Note:
Because the compilation occurs in parallel while the previously compiled
script is still running (and hence keeps raising the alarm), a successful
compilation will not immediately clear the alarm. There will be an auto
clear latency of 20 seconds that will clear the alarm.
Description
The Perl interpreter attempts to invoke a non-existent procedure.
Severity
Critical
Instance
The DCA App short name (DcaDalId.shortName) prefixed with "DCA:" and thread
pool (Request, Answer or UDR Event)
HA Score
Normal
OID
dcaDSRAppProcedureNotFoundNotify
1. Recovery
• The name of the missing procedure is included in the alarm's additional info.
The procedure names involved are either the configured Diameter request and
answer event handler names (Main Menu, and then DCA Framework, and then
<Application Name>, and then General Options on the NOAM) or the callback
names coded in the Perl script.
Possible resolutions are:
1. Fix the procedure names in the Perl script and re-compile the Perl script
2. Fix the procedure names in the configuration
3. Replace the Trial/Production version (depending on whether the DA-MP is a
Trial DA-MP or not) with another script version.
Note:
Because the compilation occurs in parallel while the previously compiled
script is still running (and hence keeps raising the alarm), a successful
compilation will not immediately clear the alarm. There will be an auto
clear latency of 20 seconds that will clear the alarm.
Description
Diameter message routing failure due to full DRL queue. Diameter egress message
could not be sent because the DRL queue is full.
Severity
Info
Instance
The DCA App short name (DcaDalId.shortName) prefixed with "DCA:"
HA Score
Normal
Throttle Seconds
60
OID
dcaEgressMsgRouteFailureDueToDrlQueueExhaustedNotify
1. Recovery
Description
DCA failed to send query to UDR due to ComAgent Error.
Severity
Info
Instance
The DCA App short name (DcaDalId.shortName) prefixed with "DCA:"
HA Score
Normal
Throttle Seconds
60
OID
dcaComAgentSendFailureNotify
1. Recovery
Description
The script generates compilation errors.
Severity
Critical
Instance
The DCA App short name (DcaDalId.shortName) prefixed with "DCA:"
HA Score
Normal
OID
dcaDSRAppCompileErrorNotify
1. Recovery
• The error message generated by the Perl interpreter is included in the alarm's
additional info.
Fix the error accordingly and recompile the Perl script, or replace the Trial/
Production version (depending on whether the DA-MP is a Trial DA-MP or not)
with another script version.
Description
The DCA application script has been successfully re-compiled and re-loaded.
Severity
Info
Instance
The DCA App short name (DcaDalId.shortName) prefixed with "DCA:"
HA Score
Normal
Throttle Seconds
0 (zero)
OID
dcaDcaAppReloadedNotify
1. Recovery
• No action required.
Description
The script could not be saved in the /tmp/appworks_temp directory.
Severity
Critical
Instance
The DCA App short name (DcaDalId.shortName) prefixed with "DCA:"
HA Score
Normal
OID
dcaDSRAppScriptGenerationErrorNotify
1. Recovery
Description:
DCA Asynchronous Task has stopped processing of Logging Events.
Severity:
Minor, Major
Instance:
The DCA App short name (DcaDalId.shortName) prefixed with "DCA:" and suffixed
with DcaAsyncTaskId.
HA Score:
Normal
OID:
DcaLoggingFailureNotify
Trigger Condition:
Low disk space, high event rate, or file I/O error.
Description:
The DSR application DCA AsyncTask queue utilization is approaching its maximum
capacity.
Severity:
Minor, Major, Critical
Instance:
The DCA App short name (DcaDalId.shortName) prefixed with "DCA:"
HA Score:
Normal
OID:
DSRAppDcaAsyncMessageQueueUtilizationNotify
Description:
DCA fetch log script has stopped working on the active SO.
Severity:
Minor
Instance:
The DCA App short name (DcaDalId.shortName) prefixed with "DCA:"
HA Score:
Normal
OID:
DcaFetchLogFailure
Description
DCA failed while sending a CreateAndSend Request message.
Severity
Major
Instance
The DCA App short name (DcaDalId.shortName) prefixed with "DCA:"
HA Score
Normal
OID
DCACreateAndSendRequestMessageSendFailed
1. Recovery:
Description
DcaCustomMeal.descr
Severity
Minor, Major, Critical
Instance
"DCA:" concatenated with the DcaDalId.shortName
HA Score
Normal
OID
"DcaCustomNotification" concatenated with the DcaCustomMeal.id
Description
DcaCustomMeal.descr
Severity
Minor, Major, Critical
Instance
"DCA:" concatenated with the DcaDalId.shortName
HA Score
Normal
OID
DcaCustomNotification concatenated with the DcaCustomMeal.id
Chapter 3
Independent SBR Alarms and Events (12003-12010, 33730-33830)
Description:
The SBR application is in a congested state and is shedding operations. The
Sbr.RxIngressMsgQueueAvg measurement shows the average percentage of queue
length utilization, which is used to determine congestion.
Severity:
Minor, Major, Critical
Instance:
Sbr.RxIngressMsgQueueMetric[subId], SBR
HA Score:
Normal
Throttle Seconds:
0 (zero)
OID:
sbrCongestionState
Cause:
The SBR application is in a congested state due to high traffic load.
Diagnostic Information:
The SBR queue congestion alarm can have default onset and abatement thresholds
based on average ingress queue percentage utilization. Check the event history for the
threshold percentage for queue utilization. Additional capacity may be required to
service the traffic load. Contact My Oracle Support for support.
1. Recovery:
• This alarm is raised when the SBR congestion status exceeds the alarm threshold,
and it clears when congestion falls below the clear threshold. Additional capacity
may be required to service the traffic load. It is recommended to contact My Oracle
Support for assistance.
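The onset and clear behavior of this alarm can be pictured as a simple hysteresis check on the average ingress queue utilization. The following is a minimal sketch only: the `CongestionAlarm` class, its field names, and the threshold percentages are hypothetical and are not taken from the product.

```python
from dataclasses import dataclass


@dataclass
class CongestionAlarm:
    """Hypothetical onset/abatement (hysteresis) check for one severity."""

    onset_pct: float      # assumed: raise the alarm at or above this value
    abatement_pct: float  # assumed: clear the alarm below this value
    active: bool = False

    def update(self, avg_queue_utilization_pct: float) -> bool:
        """Feed one average queue utilization sample; return the alarm state."""
        if not self.active and avg_queue_utilization_pct >= self.onset_pct:
            self.active = True
        elif self.active and avg_queue_utilization_pct < self.abatement_pct:
            self.active = False
        return self.active


# Placeholder thresholds for illustration, not product defaults.
alarm = CongestionAlarm(onset_pct=60.0, abatement_pct=50.0)
states = [alarm.update(sample) for sample in (40.0, 65.0, 55.0, 45.0)]
print(states)
```

Keeping the clear threshold below the onset threshold prevents the alarm from toggling when utilization hovers near a single boundary.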
Description:
The SBR application has exceeded its active Session Binding threshold. The
configuration, Maximum active session bindings, is used to calculate the percentage.
Severity:
Minor, Major, Critical
Instance:
Sbr.EvCurrentSessionMetric, SBR
HA Score:
Normal
Throttle Seconds:
0 (zero)
OID:
sbrActiveSessBindThreshold
Cause:
The SBR active session bindings count exceeds the alarm threshold, which means the
number of bindings and sessions is more than the configured limits.
Diagnostic Information:
Additional capacity may be required to service the traffic load. View additional
information in the event history. Contact My Oracle Support for support.
1. Recovery:
1. If total active session bindings fall below the clear threshold, this alarm clears.
2. Navigate to CPA, and then Configuration, and then SBR to increase the
maximum active session bindings configuration if it is too low.
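The percentage that this alarm compares against its thresholds is derived from the configured maximum active session bindings. As a rough sketch of that arithmetic (the function name and the sample numbers are assumptions for illustration):

```python
def session_binding_utilization_pct(active_bindings: int,
                                    max_active_bindings: int) -> float:
    """Return active session bindings as a percentage of the configured maximum."""
    if max_active_bindings <= 0:
        raise ValueError("max_active_bindings must be a positive number")
    return 100.0 * active_bindings / max_active_bindings


# Example with made-up numbers: 750,000 active bindings against a
# configured maximum of 1,000,000.
print(session_binding_utilization_pct(750_000, 1_000_000))
```

If the configured maximum is too low for the deployment, the same binding count yields a higher percentage, which is why step 2 suggests reviewing that setting.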
Description:
The SBR application has stopped.
Severity:
Minor, Major, Critical
Instance:
<Sbr>
HA Score:
Normal
Throttle Seconds:
0 (zero)
OID:
pfeSbrProcTermNotify
Cause:
The SBR process monitored by the process manager has terminated. This should
cause a switch over of the standby SBR server to active.
Diagnostic Information:
• Look for additional information in the event history.
• Contact My Oracle Support (MOS) for support.
1. Recovery:
• When an active SBR is terminated as indicated by this alarm, its standby becomes
active. The Process Manager automatically attempts to restart the terminated
process. If the Process Manager fails to start the terminated process, it raises the
alarm again. The standby that became active remains active until it is placed into
standby mode again.
1. Check the status of the terminated SBR by navigating to Status & Manage,
and then Server.
2. If the Process Manager cannot restart the process, it is recommended to
contact My Oracle Support for assistance.
Description
U-SBR database audit statistics report.
Severity
Info
Instance
<SbrSgName>
HA Score
Normal
Throttle Seconds
0 (zero)
OID
ipfeSbrProcTermNotify
1. Recovery
• This report provides statistics related to Universal SBR table audits. Each SBR
server generates this event upon reaching the last record in a table. The statistics
reported are appropriate for the type of table being audited.
Chapter 3
vSTP Alarms and Events (70000-70060, 70100-70999)
Description
Association down
Severity
Major
Instance
<AssocName>
HA Score
Normal
OID
vSTPVstpassociationDownNotify
1. Recovery
1. If the association is manually disabled, then no further action is needed.
2. Verify the association's local IP address and port number are configured on the
remote ASP.
3. Verify the association's remote IP address and port correctly identify a remote
ASP.
4. Verify that IP network connectivity exists between the MP server and the remote
ASP.
5. Check the event history logs at Alarms & Events, and then View History for
additional SS7 events or alarms from this MP server.
6. Verify the remote ASP is not under maintenance.
7. It is recommended to contact My Oracle Support for assistance if needed.
Description
Link down
Severity
Minor
Instance
<LinkName>
HA Score
Normal
OID
vSTPLinkDownNotify
1. Recovery
1. This alarm indicates that an MTP2 link is not in In-Service state. Generally this
alarm is asserted when a server or the network is undergoing maintenance or
when a link has been manually disabled.
2. If the E1/T1 trunk hosting the link or the link itself is manually disabled, then no
further action is necessary.
3. Verify that TimeSlot and LinkSpeed are configured properly.
4. Check the event history logs at Alarms & Events, and then View History for
additional SS7 events or alarms from this MP server.
5. Verify that the remote E1/T1 trunk is not under maintenance.
6. It is recommended to contact My Oracle Support for assistance if needed.
Description
HLRR is unable to access the SS7 Destination Point Code because the RSP status is
Unavailable.
Severity
Critical
Instance
<RSPName> (of the RSP/Destination which failed)
HA Score
Normal
OID
vSTPMtp3RouteUnavailableNotify
1. Recovery
1. If the RSP/Destination becomes Unavailable due to a Linkset failure, the M3UA
attempts to automatically recover all links not manually disabled or blocked.
Description
HLRR is unable to access the SS7 Destination Point Code using this route.
Severity
Minor
Instance
<RouteName>
HA Score
Normal
OID
vSTPMtp3RouteUnavailableNotify
1. Recovery
1. If the route becomes Unavailable due to a Linkset failure, the M3UA attempts to
automatically recover all links not manually disabled or blocked.
2. If the route becomes Unavailable due to the receipt of a TFP, MTP3 periodically
attempts to validate the route using the MTP3 signaling-route-set-test procedure.
3. It is recommended to contact My Oracle Support for assistance if needed.
Description
The SS7 linkset to an adjacent SP has failed.
Severity
Major
Instance
<LinkSetName>
HA Score
Normal
OID
vSTPMtp3LinksetUnavailableNotify
1. Recovery
1. M3UA attempts to automatically recover all links not manually disabled or blocked.
2. Check the event history logs at Alarms & Events, and then View History for
additional SS7 events or alarms from this MP server.
3. Verify the adjacent server is not under maintenance.
4. It is recommended to contact My Oracle Support for assistance if needed.
Description
M3UA has reported to MTP3 that a link is out of service.
Severity
Minor
Instance
<LinkName>
HA Score
Normal
OID
vSTPMtp3LinkUnavailableNotify
1. Recovery
1. M3UA attempts to automatically recover all links not manually disabled or blocked.
2. Check the event history logs at Alarms & Events, and then View History for
additional SS7 events or alarms from this MP server.
3. Verify that the adjacent server is not under maintenance.
4. It is recommended to contact My Oracle Support for assistance if needed.
Description
MTP3 has started to utilize a lower priority (higher cost) route to route traffic toward a
given destination address because the higher priority (lower cost) route specified for
that RSP/Destination has become unavailable.
Severity
Major
Instance
<RSPName>
HA Score
Normal
OID
vSTPMtp3PreferredRouteunavailableNotify
1. Recovery
1. Check the event history logs at Alarms & Events, and then View History for
additional SS7 events or alarms from this MP server.
2. Verify the adjacent server is not under maintenance.
3. It is recommended to contact My Oracle Support for assistance if needed.
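The condition behind this alarm, falling back from the lowest-cost route to a higher-cost one, can be sketched as follows. This is an illustrative model only: the `Route` type, the `select_route` function, and the cost values are hypothetical and do not describe the product's routing implementation.

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class Route(NamedTuple):
    name: str
    cost: int        # lower cost means higher priority
    available: bool


def select_route(routes: Sequence[Route]) -> Tuple[Optional[Route], bool]:
    """Pick the lowest-cost available route.

    Returns (chosen_route, preferred_unavailable). The flag is True when the
    chosen route costs more than the lowest-cost configured route, or when no
    route is available at all.
    """
    if not routes:
        return None, False
    preferred = min(routes, key=lambda r: r.cost)
    usable = [r for r in routes if r.available]
    if not usable:
        return None, True
    chosen = min(usable, key=lambda r: r.cost)
    return chosen, chosen.cost > preferred.cost


routes = [Route("primary", 10, False), Route("secondary", 20, True)]
chosen, preferred_unavailable = select_route(routes)
print(chosen.name, preferred_unavailable)
```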
Description
Node isolated - All links down.
Severity
Major
Instance
<None>
HA Score
Normal
OID
vSTPMtp3NodeIsolatedAllLinkDownNotify
1. Recovery
1. Check the event history logs at Alarms & Events, and then View History for
additional SS7 events or alarms from this MP server.
2. Verify the adjacent server is not under maintenance.
Description
The SS7 linkset to an adjacent SP has become restricted.
Severity
Major
Instance
<LinksetName>
HA Score
Normal
OID
vSTPMtp3LinksetRestrictedNotify
1. Recovery
1. Check the event history logs at Alarms & Events, and then View History for
additional SS7 events or alarms from this MP server.
2. Verify that the adjacent server is not under maintenance.
3. It is recommended to contact My Oracle Support for assistance if needed.
Description
Link congested
Severity
Minor, Major, Critical
Instance
<LinkName>
HA Score
Normal
OID
vSTPMtp3LinkCongestionNotify
1. Recovery
The percent utilization of the vSTP link is approaching its maximum
capacity. If this problem persists and the queue reaches 100% utilization based on
the level defined, an alarm is generated.
This alarm should not normally occur when no other congestion alarms are
asserted. This may occur for a variety of reasons:
• An IP network or Adjacent node problem may exist preventing SCTP from
transmitting messages into the network at the same pace that messages are
being received from the network.
• The SCTP Association may be experiencing a problem preventing it from
processing events from its event queue.
1. Examine the alarm logs from Main Menu > Alarms & Events.
2. If one or more MPs in a server site have failed, the traffic will be distributed
amongst the remaining MPs in the server site. MP server status can be monitored
from Main Menu > Status & Control > Server Status.
3. There may be an insufficient number of MPs configured to handle the network
traffic load. The egress traffic rate of each MP can be monitored from Main Menu
> Status & Control > KPI Display. If all MPs are in a congestion state then the
offered load to the server site is exceeding its capacity.
4. It is recommended to contact My Oracle Support for assistance if needed.
Description
SCTP connection refused.
Severity
Info
Instance
<Link>
HA Score
Normal
Throttle Seconds
0 (zero)
OID
vSTPSctpConnectionRefusedNotify
1. Recovery
Description
Failed to configure Transport.
Severity
Info
Instance
<AssociationName>
HA Score
Normal
Throttle Seconds
0 (zero)
OID
vSTPFailedtoconfigureConnectionNotify
1. Recovery
Description
Far-end closed the connection.
Severity
Info
Instance
<AssociationName>
HA Score
Normal
Throttle Seconds
10
OID
vSTPFarendclosedtheconnectionNotify
1. Recovery
1. Investigate whether the remote node has failed or is under maintenance.
2. Check the remote node for alarms or logs that might indicate why it closed
the association.
3. It is recommended to contact My Oracle Support for assistance if needed.
Description
SCTP connection closed.
Severity
Info
Instance
<AssociationName>
HA Score
Normal
Throttle Seconds
10
OID
vSTPSctpconnectionclosedNotify
1. Recovery
1. Verify IP network connectivity still exists between the MP server and the remote
server.
2. Verify the remote server is not configured to change IP addresses once connection
is established.
3. Check the event history logs at Alarms & Events, and then View History to
determine if the SCTP Association is experiencing a problem preventing it from
processing events from its event queue.
4. Verify the adjacent server is not under maintenance.
5. It is recommended to contact My Oracle Support for assistance if needed.
Description
Remote IP Address state change
Severity
Info
Instance
<AssociationName>
HA Score
Normal
Throttle Seconds
0 (zero)
OID
vSTPRemoteIPAddressstatechangeNotify
1. Recovery
1. Verify IP network connectivity still exists between the MP server and the remote
server.
2. It is recommended to contact My Oracle Support for assistance if needed.
Description
Association admin state change.
Severity
Info
Instance
<AssociationName>
HA Score
Normal
Throttle Seconds
0 (zero)
OID
vSTPAssociationadminstatechangeNotify
1. Recovery
Description
Link admin state change
Severity
Info
Instance
<AssociationName>
HA Score
Normal
Throttle Seconds
0 (zero)
OID
vSTPLinkadminStateChangeNotify
1. Recovery
Description
Received invalid M3UA message.
Severity
Info
Instance
<AssociationName>, <LinkName>, or <LinkId>
HA Score
Normal
Throttle Seconds
10
OID
vSTPVstpReceivedinvalidM3UAMessageNotify
1. Recovery
• Examine the M3UA error code and the diagnostic information and attempt to
determine why the far-end of the link sent the malformed message.
• Error code 0x01 indicates an invalid M3UA protocol version. Only version 1 is
supported.
• Error code 0x03 indicates an unsupported M3UA message class.
• Error code 0x04 indicates an unsupported M3UA message type.
• Error code 0x07 indicates an M3UA protocol error. The message contains a
syntactically correct parameter that does not belong in the message or occurs
too many times in the message.
• Error code 0x11 indicates an invalid parameter value. Parameter type and
length are valid, but value is out of range.
Description
Received M3UA ERROR.
Severity
Info
Instance
If message can be mapped to a link, then <LinkName>. Otherwise,
<AssociationName>
HA Score
Normal
Throttle Seconds
10
OID
vSTPVstpReceivedM3uaErrorNotify
1. Recovery
• Examine the M3UA error code and the diagnostic information and attempt to
determine why the far-end of the link sent the ERROR message.
• Error code 0x01 indicates an invalid M3UA protocol version. Only version 1 is
supported.
• Error code 0x03 indicates an unsupported M3UA message class.
• Error code 0x04 indicates an unsupported M3UA message type.
• Error code 0x05 indicates an unsupported M3UA traffic mode.
• Error code 0x07 indicates an M3UA protocol error. The message contains a
syntactically correct parameter that does not belong in the message or occurs
too many times in the message.
• Error code 0x09 indicates an invalid SCTP stream identifier. A DATA message
was sent on stream 0.
• Error code 0x0D indicates that the message was refused due to management
blocking. An ASP Up or ASP Active message was received, but refused for
management reasons.
• Error code 0x11 indicates an invalid parameter value. Parameter type and
length are valid, but value is out of range.
• Error code 0x12 indicates a parameter field error. Parameter is malformed
(such as invalid length).
• Error code 0x13 indicates an unexpected parameter. Message contains an
undefined parameter. The differences between this error and Protocol Error
are subtle. Protocol Error is used when the parameter is recognized, but not
intended for the type of message that contains it. Unexpected Parameter is
used when the parameter identifier is not known.
• Error code 0x14 indicates that the destination status is unknown. This
message can be sent in response to a DAUD from the MP server if the
SG cannot or does not wish to provide the destination status or congestion
information.
• Error code 0x16 indicates a missing parameter. Missing mandatory parameter,
or missing required conditional parameter.
• Error code 0x19 indicates an invalid routing context. Received routing context
not configured for any linkset using the association on which the message was
received.
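When reviewing event history, it can help to translate the reported M3UA error code into the meaning listed above. The lookup below is a small sketch: the code-to-meaning pairs are transcribed from this list, while the names `M3UA_ERROR_CODES` and `describe_m3ua_error` are made up for illustration.

```python
# Meanings transcribed from the error-code list in this section.
M3UA_ERROR_CODES = {
    0x01: "Invalid M3UA protocol version (only version 1 is supported)",
    0x03: "Unsupported M3UA message class",
    0x04: "Unsupported M3UA message type",
    0x05: "Unsupported M3UA traffic mode",
    0x07: "M3UA protocol error",
    0x09: "Invalid SCTP stream identifier",
    0x0D: "Message refused due to management blocking",
    0x11: "Invalid parameter value",
    0x12: "Parameter field error",
    0x13: "Unexpected parameter",
    0x14: "Destination status unknown",
    0x16: "Missing parameter",
    0x19: "Invalid routing context",
}


def describe_m3ua_error(code: int) -> str:
    """Return a readable description for an M3UA error code."""
    meaning = M3UA_ERROR_CODES.get(code)
    if meaning is None:
        # Codes not covered by this section are reported as such, not guessed.
        return f"0x{code:02X}: not listed in this section"
    return f"0x{code:02X}: {meaning}"


print(describe_m3ua_error(0x0D))
print(describe_m3ua_error(0x02))
```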
Description
Failed to send DATA message.
Severity
Info
Instance
<LinkName>
HA Score
Normal
Throttle Seconds
10
OID
vSTPMtp3TfpReceivedNotify
1. Recovery
1. Check the event history logs at Alarms & Events, and then View History for
additional events or alarms from this MP server.
2. Verify the remote server is not under congestion. The MP server has alarms to
indicate the congestion if this is the case.
3. It is recommended to contact My Oracle Support for assistance if needed.
Description:
This event is generated when a TFP message is received by the MTP3 layer.
Severity:
Info
Instance:
None
Throttle Seconds:
30
OID:
vSTPMtp3TfpReceivedNotify
1. Recovery:
Description:
This event is generated when a TFA message is received by the MTP3 layer.
Severity:
Info
Instance:
None
Throttle Seconds:
30
OID:
vSTPMtp3TfaReceivedNotify
1. Recovery:
Description:
This event is generated when a TFR message is received by the MTP3 layer.
Severity:
Info
Instance:
None
Throttle Seconds:
30
OID:
vSTPMtp3TfrReceivedNotify
1. Recovery:
Description:
This event is generated when a TFC message is received by the MTP3 layer.
Severity:
Info
Instance:
None
Throttle Seconds:
30
OID:
vSTPMtp3TfcReceivedNotify
1. Recovery:
Description:
This event is generated when a message was discarded due to a routing error.
Severity:
Info
Instance:
None
Throttle Seconds:
10
OID:
vSTPMtp3RoutingFailureNotify
1. Recovery:
Description:
This event is generated when a message was discarded due to a routing error - the
network indicator value received in a message from the network is not assigned to the
MP.
Severity:
Info
Instance:
None
Throttle Seconds:
10
OID:
vSTPMtp3RoutingFailureInvalidNiNotify
1. Recovery:
Description:
This event is generated when a message was discarded due to a routing error - the
SI value received in a message from the network is associated with a User Part that is
not currently supported.
Severity:
Info
Instance:
None
Throttle Seconds:
10
OID:
vSTPMtp3RoutingFailureInvalidSiNotify
1. Recovery:
Description:
This event is generated when M3UA discards a message for any of these
reasons:
• Invalid Header, Unsupported Message Type
• Invalid Header, Version Invalid
• Invalid Header, Unsupported Message Class
• Invalid Header, Invalid Stream Identifier
• Invalid Header, Length is Invalid
• Message Decode Failed
• Unexpected Message
• Invalid Routing Context
• Unsupported Traffic Mode
• No configured AS for ASP
• Link is Disabled
Severity:
Info
Instance:
None
Throttle Seconds:
10
OID:
vSTPFailedToReceiveDataMessageNotify
1. Recovery:
Description:
ComAgent service is unavailable or congested.
Severity:
Critical
Instance:
None
HA Score:
Normal
Throttle (Seconds):
86400
OID:
vSTPVstpEirApplDegradedNotify
1. Make sure the UDR connection is up and the ComAgent service is up and not
degraded.
2. It is recommended to contact My Oracle Support if further assistance is needed.
Description:
Failed to decode TCAP parameter.
Severity:
Info
Instance:
None
HA Score:
Normal
Throttle (Seconds):
10
OID:
vSTPVstpEirTcapDecodeErrNotify
Description:
Failed to encode message.
Severity:
Minor
Instance:
None
HA Score:
Normal
Throttle (Seconds):
10
OID:
vSTPVstpEirEncodeFailNotify
Description:
IMEI is missing in the received message
Severity:
Minor
Instance:
None
HA Score:
Normal
Throttle (Seconds):
3600
OID:
vSTPVstpMissingImeiNotify
1. Recovery:
Description:
Invalid length for map IMEI parameter.
Severity:
Minor
Instance:
None
HA Score:
Normal
Throttle (Seconds):
86400
OID:
vSTPVstpMissingImeiNotify
Description:
Unsupported TCAP message type.
Severity:
Minor
Instance:
None
HA Score:
Normal
Throttle (Seconds):
10
OID:
vSTPVstpInvalidImeiNotify
Description:
The percent utilization of the VSTP MP's LSS Stack Event Queue is approaching its
maximum capacity.
Severity:
Major
Instance:
None
HA Score:
Normal
Throttle (Seconds):
86400
OID:
vSTPVstpLssEventQueueNotify
Description:
The percent utilization of the VSTP MP's Logging Stack Event Queue is approaching
its maximum capacity.
Severity:
Minor
Instance:
N/A
HA Score:
Normal
Throttle (Seconds):
86400
OID:
vSTPVstpLssLoggingEventQueueNotify
Description:
EIR log copy from MP to SOAM has failed.
Severity:
Major
Instance:
None
HA Score:
Normal
Throttle (Seconds):
86400
OID:
vSTPVstpEirApplLogFetchErrorNotify
1. Make sure the SOAM is able to copy the EIR logs from the MP.
2. It is recommended to contact My Oracle Support if further assistance is needed.
Description:
Log write error in MP.
Severity:
Major
Instance:
Normal
HA Score:
Normal
Throttle (Seconds):
10
OID:
vSTPVstpEirLogErrorNotify
Description:
This event is generated when vSTP discards an M3UA ingress message for any of
these reasons:
• Invalid Header
• Message Decode Failed
• Unexpected Message, AspInactive received in Invalid State
• Invalid Routing Context
• Received message in Invalid state
• Unsupported Traffic Mode
• Unexpected Message, link state is not active
• No configured AS for ASP
• Unexpected Message, AspPayload received in Invalid State
• Unexpected Message, AspDaud received in Invalid State
• Unexpected Message, AspActive is received in Invalid state
• Link is Disabled
• Unexpected Message, AspUp is received in Invalid state
• Message length is greater than 272 bytes
Severity:
Info
Instance:
None
Throttle Seconds:
10
OID:
vSTPM3uaIngressMsgDiscardedNotify
1. Recovery:
Description:
The percent utilization of the VSTP MP's M3RL Linkset Buffer is approaching its
maximum capacity.
Severity:
Major
Instance:
None
HA Score:
Normal
Auto Clear
0 (zero)
Throttle Seconds:
86400
OID:
vSTPM3rlLinksetBufferUtilNotify
1. Recovery:
Description:
The percent utilization of the VSTP MP's M3RL Rsp Buffer is approaching its
maximum capacity.
Severity:
Major
Instance:
None
HA Score:
Normal
Auto Clear
0 (zero)
Throttle Seconds:
86400
OID:
vSTPM3rlRspBufferUtilNotify
1. Recovery:
Description:
The percent utilization of the VSTP MP's M2PA Retransmission Buffer is
approaching its maximum capacity.
Severity:
Major
Instance:
None
HA Score:
Normal
Auto Clear
0 (zero)
Throttle Seconds:
86400
OID:
vSTPM2paRetransmissionBufferUtilNotify
1. Recovery:
Description:
The percent utilization of the VSTP MP's MTP2 Transmission and Retransmission
Buffer is approaching its maximum capacity.
Severity:
Major
Instance:
None
HA Score:
Normal
Auto Clear
0 (zero)
Throttle Seconds:
86400
OID:
vSTPMtp2TransmissionBufferUtil
1. Recovery:
Description:
Mandatory parameter is missing in the received message.
Severity:
Minor
Instance:
None
HA Score:
Normal
Throttle (Seconds):
10
OID:
VstpMissingMandatoryParm
1. Recovery:
1. xxx
2. It is recommended to contact My Oracle Support if further assistance is needed.
Description:
This event is generated when the subscriber ID parameter length is less than or
greater than 2 plus the length of the MSISDN.
Severity:
Info
Instance:
None
OID:
VstpMalformedSubId
1. Recovery:
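The length rule in the description above amounts to a single comparison. A minimal sketch, assuming both lengths are expressed in the same units (the function name is hypothetical):

```python
def is_malformed_subscriber_id(param_length: int, msisdn_length: int) -> bool:
    """Return True when the subscriber ID parameter length is anything other
    than the MSISDN length plus 2, which is the condition for this event."""
    return param_length != msisdn_length + 2


# Made-up lengths for illustration.
print(is_malformed_subscriber_id(param_length=9, msisdn_length=7))
print(is_malformed_subscriber_id(param_length=8, msisdn_length=7))
```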
Description:
This event is generated when the choice for subscriber identity is not MSISDN.
Severity:
Info
Instance:
None
OID:
VstpUnexpectedSubId
1. Recovery:
Description:
This event is generated when there is an invalid length for the MSISDN value in the
subscriber identity parameter.
Severity:
Info
Instance:
None
OID:
VstpInvalidMsisdn
1. Recovery:
Description:
This event is generated when an invalid requested information parameter is in the ATI
query message.
Severity:
Info
Instance:
None
OID:
VstpInvalidRequestedInfo
1. Recovery:
Description:
This event is generated when digits are truncated in the encoded parameter of the
response message.
Severity:
Info
Instance:
None
OID:
VstpDigitsTruncated
1. Recovery:
Description:
ATINP application state has changed to one of these states:
• available
• unavailable
• degraded
This alarm is raised when the UDR connection or CA service is down or degraded.
Severity:
Critical
Instance:
N/A
HA Score:
Normal
Throttle Seconds:
300
OID:
N/A
1. Recovery:
• This alarm clears when the UDR connection is back up or the CA service is
available again.
Description:
vSTP egress connection message queue utilization threshold crossed.
Severity:
Minor, Major, Critical
Instance:
<AssocName>
HA Score:
Normal
OID:
vSTPVstpTxConnQueueCongestedNotify
1. Recovery:
1. Determine if an IP network or Adjacent node problem exists, preventing SCTP
from transmitting messages into the network at the same pace that messages are
being received from the network.
2. Check the event history logs at Alarms & Events, and then View History to
determine if the SCTP Association is experiencing a problem preventing it from
processing events from its event queue.
3. Monitor the MP server status at Status & Manage, and then Server to determine
if one or more MPs in a server site have failed, causing traffic to be distributed
amongst the remaining MPs in the server site.
4. Monitor the egress traffic rate of each MP at Status & Manage, and then KPIs
to determine if there is an insufficient number of MPs configured to handle the
network traffic load.
5. It is recommended to contact My Oracle Support for assistance if needed.
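Because this alarm can be raised at Minor, Major, or Critical severity, the queue utilization is presumably compared against one threshold per severity. The sketch below shows that kind of mapping. The threshold percentages and the names `SEVERITY_THRESHOLDS` and `classify_utilization` are placeholders for illustration, not product values.

```python
from typing import Optional

# Placeholder thresholds in percent, highest severity first (assumed values).
SEVERITY_THRESHOLDS = (("Critical", 95.0), ("Major", 80.0), ("Minor", 60.0))


def classify_utilization(utilization_pct: float) -> Optional[str]:
    """Return the highest severity whose threshold is met, or None."""
    for severity, threshold in SEVERITY_THRESHOLDS:
        if utilization_pct >= threshold:
            return severity
    return None


for sample in (50.0, 60.0, 85.0, 99.0):
    print(sample, classify_utilization(sample))
```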
Description:
vSTP ingress link MSU TPS threshold crossed.
Severity:
Minor, Major, Critical
Instance:
<Link>
HA Score:
Normal
OID:
vSTPVstpRxLinkTpsNotify
1. Recovery:
1. This alarm indicates the percent utilization of the vSTP's ingress message traffic
coming from the signaling link. Ingress control serves as the vSTP's defense and
offers protection against traffic floods or Denial of Service type attacks.
2. It is recommended to contact My Oracle Support for assistance if needed.
Description:
vSTP egress link MSU TPS threshold crossed.
Severity:
Minor, Major, Critical
Instance:
<Link>
HA Score:
Normal
OID:
vSTPVstpTxLinkTpsNotify
1. Recovery:
1. This alarm indicates the percent utilization of the vSTP's egress message traffic
on the signaling link. Egress control is meant to protect the network elements
connected to the STP.
2. It is recommended to contact My Oracle Support for assistance if needed.
Description
vSTP ingress link TPS threshold crossed for Network management messages
Severity
Critical
Instance
<Link>
HA Score
Normal
OID
vSTPVstpRxMgmtLinkTpsNotify
1. Recovery
1. This alarm indicates the percent utilization of the vSTP's ingress management
message traffic coming from the signaling link. Ingress control serves as the vSTP's
defense and offers protection against traffic floods or Denial of Service type attacks.
2. It is recommended to contact My Oracle Support for assistance if needed.
Description
vSTP egress connection message discard threshold crossed.
Severity
Minor, Major, Critical
Instance
<AssocName>
HA Score
Normal
OID
vSTPVstpTxDiscardLevelNotify
1. Recovery
1. Determine if an IP network or Adjacent node problem exists, preventing SCTP
from transmitting messages into the network at the same pace that messages are
being received from the network.
2. Check the event history logs at Alarms & Events, and then View History to
determine if the SCTP Association is experiencing a problem preventing it from
processing events from its event queue.
3. Monitor the MP server status at Status & Manage, and then Server to determine
if one or more MPs in a server site have failed, causing traffic to be distributed
amongst the remaining MPs in the server site.
4. Monitor the egress traffic rate of each MP at Status & Manage, and then KPIs
to determine if there is an insufficient number of MPs configured to handle the
network traffic load.
5. It is recommended to contact My Oracle Support for assistance if needed.
Description
The percent utilization of the vSTP MP's SCCP Stack Event Queue is approaching its
maximum capacity.
Severity
Major
Instance
None
HA Score
Normal
OID
vSTPVstpSccpStackEventQueueUtilNotify
1. Recovery
Description
The percent utilization of the vSTP MP's M3RL Stack Event Queue is approaching its
maximum capacity.
Severity
Major
Instance
None
HA Score
Normal
OID
vSTPVstpM3rlStackEventQueueUtilNotify
1. Recovery
Description
The percent utilization of the vSTP MP's M3RL Network Management Event Queue is
approaching its maximum capacity.
Severity
Major
Instance
None
HA Score
Normal
OID
vSTPVstpM3rlNetMgmtEventQueueUtilNotify
1. Recovery
Description
The percent utilization of the vSTP MP's M3UA Stack Event Queue is approaching its
maximum capacity.
Severity
Major
Instance
None
HA Score
Normal
OID
vSTPVstpM3uaStackEventQueueUtilNotify
1. Recovery
Description
The percent utilization of the vSTP MP's M2PA Stack Event Queue is approaching its
maximum capacity.
Severity
Major
Instance
None
HA Score
Normal
OID
vSTPVstpM2paStackEventQueueUtilNotify
1. Recovery
Description:
The percent utilization of the vSTP MP's M3UA Tx Stack Event Queue is approaching
its maximum capacity.
Severity:
Major
Instance:
None
HA Score:
Normal
OID:
vSTPVstpM3uaTxStackEventQueueUtilNotify
1. Recovery:
1. This alarm indicates that M3UA Tx Stack Event Queue utilization is exceeding
its configured capacity.
2. It is recommended to contact My Oracle Support for assistance if needed.
Description:
M2PA link operational state changed
Severity:
Info
Instance:
<LinkName>
HA Score:
Normal
OID:
vSTPLinkOpStateChangedNotify
1. Recovery:
• No action necessary.
Description:
M2PA link failed
Severity:
Info
Instance:
<LinkName>
HA Score:
Normal
OID:
vSTPLinkFailedNotify
1. Recovery:
• No action necessary.
Description:
M2PA Ingress message discarded
Severity:
Info
Instance:
<LinkName>
HA Score:
Normal
OID:
vSTPIngressMessageDiscardedNotify
1. Recovery:
• No action necessary.
Description:
M2PA Egress message discarded
Severity:
Info
Instance:
<LinkName>
HA Score:
Normal
OID:
vSTPEgressMessageDiscardedNotify
1. Recovery:
• No action necessary.
Description:
M2PA Message Encoding Failed
Severity:
Info
Instance:
<LinkName>
HA Score:
Normal
OID:
vSTPMessageEncodeFailedNotify
1. Recovery:
• No action necessary.
Description:
M2PA Message Decoding Failed
Severity:
Info
Instance:
<LinkName>
HA Score:
Normal
OID:
vSTPMessageDecodeFailedNotify
1. Recovery:
• No action necessary.
Description:
This event is generated when the M2PA proving or proving emergency period timer
(T4) expires.
Severity:
Info
Instance:
<Link Name>
OID:
vSTPProvingTimerExpiredNotify
1. Recovery:
Description:
This event is generated when the M2PA remote congestion timer (T6) expires.
Severity:
Info
Instance:
<Link Name>
OID:
vSTPRemoteCongTimerExpiredNotify
1. Recovery:
Description:
This event is generated when a remote processor outage is received on a M2PA link.
Severity:
Info
Instance:
<Link Name>
OID:
vSTPRpoReceivedNotify
1. Recovery:
Description:
This event is generated when a remote out of service is received on a M2PA link.
Severity:
Info
Instance:
<Link name>
OID:
vSTPRemoteOOSReceivedNotify
1. Recovery:
Description:
This event is generated if the MTP2 link administrative state is manually changed
from one administrative state to another (e.g. Disabled to Enabled and vice versa).
Severity:
Info
Instance:
<Link name>
OID:
vSTPMtp2LinkAdmStateChangeNotify
1. Recovery:
• This event shows that the link administrative state is changing from one state to
another. It is recommended to contact My Oracle Support for assistance, if needed.
Description:
This event is generated when sending a message to the TDM driver fails.
Severity:
Info
Instance:
<Link name>
OID:
vSTPMtp2FailedToSendMsgNotify
1. Recovery:
• None. This event shows that sending a message to the TDM driver failed. It is
recommended to contact My Oracle Support for assistance, if needed.
Description:
This event is generated when receiving a message from the TDM driver fails.
Severity:
Info
Instance:
<Link name>
OID:
vSTPMtp2FailedToRcvMsgNotify
1. Recovery:
• None. This event shows that receiving a message from the TDM driver failed. It is
recommended to contact My Oracle Support for assistance, if needed.
Description:
This event is generated when the MTP2 link operational state changes.
Severity:
Info
Instance:
<Link name>
OID:
vSTPMtp2LinkOpStateChangeNotify
1. Recovery:
• This event shows that the MTP2 link operational state has changed from one state
to another. It is recommended to contact My Oracle Support for assistance, if
needed.
Description:
This event is generated when an MTP2 link fails because a Link Out Of Service
message is received from the peer or an MTP2 Link Stop Request is received.
Severity:
Info
Instance:
<Link name>
OID:
vSTPMtp2LinkFailedNotify
1. Recovery:
• This event shows that MTP2 link has failed. It is recommended to contact My
Oracle Support for assistance, if needed.
Description:
This event is generated when an MTP2 Ingress message is discarded.
Severity:
Info
Instance:
<Link name>
OID:
vSTPMtp2IngressMsgDiscardedNotify
1. Recovery:
Description:
This event is generated when an MTP2 Egress message is discarded.
Severity:
Info
Instance:
<Link name>
OID:
vSTPMtp2EgressMsgDiscardedNotify
1. Recovery:
Description:
This event is generated when Remote Out Of Service is received from the peer on an
MTP2 link.
Severity:
Info
Instance:
<Link name>
OID:
vSTPMtp2RemoteOOSReceivedNotify
1. Recovery:
• This event shows that Remote Out Of Service was received from the peer. It is
recommended to contact My Oracle Support for assistance, if needed.
Description:
Subsystem congested.
Instance:
DPC = SSN
Severity:
Major
HA Score:
Normal
Throttle (Seconds)
86400
OID:
vSTPSubSystemCongestedNotify
1. Recovery:
Description:
Subsystem prohibited.
Severity:
Major
Instance:
None
HA Score:
Normal
Throttle (Seconds)
86400
OID:
vSTPSubSystemProhibitedNotify
1. Recovery:
Description:
This event is generated when a remote out of service is received on a M2PA link.
Severity:
Info
Instance:
<Link name>
OID:
vSTPRemoteOOSReceivedNotify
1. Recovery:
Description:
SCCP received invalid message.
Severity:
Info
Instance:
None
HA Score:
Normal
OID:
vSTPSccpInvalidMessageReceivedNotify
1. Recovery:
• No action necessary.
Description:
SCCP message translation failed.
Severity:
Info
Instance:
None
HA Score:
Normal
OID:
vSTPSccpTranslationFailedNotify
1. Recovery:
• No action necessary.
Description:
SCCP Message Routing Failed
Severity:
Info
Instance:
None
HA Score:
Normal
OID:
vSTPSccpMessageRoutingFailedNotify
1. Recovery:
• No action necessary.
Description:
SCMG message invalid.
Severity:
Info
Instance:
None
HA Score:
Normal
OID:
vSTPScmgMessageInvalidNotify
1. Recovery:
• No action necessary.
Description:
GTT SCCP loop detected.
Severity:
Info
Instance:
None
HA Score:
Normal
OID:
vSTPGttSccpLoopDetectedNotify
1. Recovery:
• No action necessary.
Description:
GTT load sharing failed.
Severity:
Info
Instance:
None
HA Score:
Normal
OID:
vSTPGttLoadSharingFailedNotify
1. Recovery:
• No action necessary.
Description:
The event is generated when the GTT action (for example, DISCARD, UDTS, or
TCAP ERROR) is performed and the UIM required flag is set to Yes for the GTT
Action managed object.
Severity:
Info
Instance:
Combination of Action Set Name:Action Name
OID:
vSTPVstpGTTActionDiscardedMSUNotify
1. Recovery:
Description:
The event is generated when the GTT action (for example, DUPLICATE, FORWARD,
or TCAP ERROR) has failed.
Severity:
Info
Instance:
Combination of Action Set Name:Action Name
OID:
vSTPVstpGTTActionFailedNotify
1. Recovery:
Description:
This event is generated when a duplicate set type is encountered during translation
and the fallback option is NO.
Severity:
Info
Instance:
None
OID:
vSTPVstpGTTFlobrDupSetTypeFailedNotify
1. Recovery:
Description:
This event is generated when a duplicate set type is encountered during translation
and the fallback option is YES.
Severity:
Info
Instance:
None
OID:
vSTPVstpGTTFlobrDupSetTypeWarningNotify
1. Recovery:
Description:
This event is generated when a duplicate set name is encountered during translation
and the fallback option is NO.
Severity:
Info
Instance:
None
OID:
vSTPVstpGTTFlobrDupSetNameFailedNotify
1. Recovery:
Description:
This event is generated when a duplicate set name is encountered during translation
and the fallback option is YES.
Severity:
Info
Instance:
None
OID:
vSTPVstpGTTFlobrDupSetNameWarningNotify
1. Recovery:
Description:
This event is generated after the maximum depth search if the translation is not
successful and fallback is NO.
Severity:
Info
Instance:
None
OID:
vSTPVstpGTTFlobrMaxSearchDepthFailedNotify
1. Recovery:
1. xxx
2. It is recommended to contact My Oracle Support if further assistance is needed.
Description:
This event is generated after the maximum depth search if the translation is not
successful and fallback is YES.
Severity:
Info
Instance:
None
OID:
vSTPVstpGTTFlobrMaxSearchDepthWarningNotify
1. Recovery:
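The FLOBR entries above come in pairs that differ only in the fallback option: NO maps to a "Failed" event and YES to a "Warning" event. A minimal sketch of that split for the maximum search depth case follows. The OID names are taken from the entries above; the function name and its parameters are hypothetical and do not describe the product's implementation.

```python
from typing import Optional


def flobr_max_depth_event(translated: bool, fallback: bool) -> Optional[str]:
    """Return the event OID to report after a maximum-depth search.

    Hypothetical helper: 'translated' says whether the translation
    succeeded, 'fallback' mirrors the fallback option (True = YES).
    """
    if translated:
        return None  # translation succeeded, nothing to report
    if fallback:
        return "vSTPVstpGTTFlobrMaxSearchDepthWarningNotify"
    return "vSTPVstpGTTFlobrMaxSearchDepthFailedNotify"
```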
Description:
This event is generated when any of these conditions occur:
• Unsupported SCCP Type
• ITU TCAP decoding fails
• Sequence Tag parameter is missing
• Unsupported Component Type
• Unsupported MAP Opcode received
• Unsupported MAP version received
• Unsupported TCAP Package Type
• Mandatory parameter is missing (Target MS)
• Mandatory parameter is missing (Sub Identity)
• Mandatory parameter is missing
• Invalid MAP digits
• IMSI decoding failed
• MSISDN decoding failed
Severity:
Info
Instance:
None
OID:
vSTPVstpMBRDecodeFailedNotify
1. Recovery:
Description:
GTT Duplicate Action processing stopped.
Severity:
Major
Instance:
None
HA Score:
Normal
OID:
vSTPDuplicateActionInhibitNotify
1. Recovery:
Description:
This event is generated when an XUDT UDT conversion fails.
Severity:
Info
Instance:
None
OID:
vSTPVstpXudtUdtConversionFailedNotify
1. Recovery:
Description:
This event is generated when SCCP encoding fails for any of these reasons:
• Invalid GTI
• Unsupported GTI
• Invalid Data Message length
• Invalid Optional Portion Length
Severity:
Info
Instance:
None
OID:
vSTPVstpSccpEncodeFailedNotify
1. Recovery:
Description
None
Severity
Major
Instance
None
HA Score
Normal
OID
vSTPSfappDcdErrorNotify
1. Recovery
Description
SFAPP Validation Matching State not found
Severity
Major
Instance
None
HA Score
Normal
OID
vSTPSfappTIDNotFoundNotify
1. Recovery
Description
SFAPP Validation Encoding Error
Severity
Major
Instance
None
HA Score
Normal
OID
vSTPSfappEcdErrorNotify
1. Recovery
Description
SFAPP Validation Response Timeout Error
Severity
Major
Instance
None
HA Score
Normal
OID
vSTPSfappRspTimeoutNotify
1. Recovery
Description
SFAPP Validation Velocity Chk Failed.
Severity
Major
Instance
None
HA Score
Normal
OID
vSTPSfappThreshExcdNotify
1. Recovery
Description
SFAPP Validation Failed
Severity
Major
Instance
None
HA Score
Normal
OID
vSTPSfappValidationFailedNotify
1. Recovery
Description
SFAPP Invalid CC/NDC received
Severity
Major
Instance
None
HA Score
Normal
OID
vSTPSfappInvalidCCNDCreceivedNotify
1. Recovery
Description
Updation failed in UDR
Severity
Major
Instance
None
HA Score
Normal
OID
vSTPVstpUpdationFailedinUDRNotify
1. Recovery
Description:
This event is generated when the percent utilization of the vSTP MP's SFAPP Event
Queue is approaching its maximum capacity.
Severity:
Major
Instance:
None
OID:
vSTPSfappEventQueueUtilNotify
• The event is cleared when the percent utilization of the VSTP MP's SFAPP Event
Queue comes back to normal. It is recommended to contact My Oracle Support if
further assistance is needed.
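The queue utilization events in this chapter are raised as a queue nears capacity and cleared when utilization returns to normal. The sketch below shows one way such raise and clear behavior can be modeled. The 80 and 60 percent thresholds and both function names are assumptions for illustration; the source does not state the configured values.

```python
def utilization_percent(depth: int, capacity: int) -> float:
    """Percent of the queue currently in use."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return 100.0 * depth / capacity


def next_alarm_state(raised: bool, percent: float,
                     onset: float = 80.0, clear: float = 60.0) -> bool:
    """Return the new alarm state; thresholds are hypothetical.

    Using a clear threshold below the onset threshold avoids the
    alarm toggling when utilization hovers near a single value.
    """
    if not raised and percent >= onset:
        return True
    if raised and percent <= clear:
        return False
    return raised
```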
Description:
This event is generated when the MNP length of the conditioned digit is invalid.
Severity:
Info
Instance:
None
OID:
vSTPVstpSrvcInvDgtLenNotify
1. xxx
2. It is recommended to contact My Oracle Support if further assistance is needed.
Description:
This event is generated when NC is not defined
Severity:
Info
Instance:
None
OID:
vSTPVstpSrvcDfltNcNotDfnNotify
Description:
This event is generated when a loop is detected
Severity:
Info
Instance:
None
OID:
vSTPVstpGportLoopDetectedNotify
Description:
This event is generated when the MNP translated PC type is ANSI.
Severity:
Info
Instance:
None
OID:
vSTPVstpPcTypeAnsiNotify
Description:
This event is generated when there is an invalid MSISDN for SRI or SRISM.
Severity:
Info
Instance:
None
OID:
vSTPVstpInvMsisdnDgtNotify
Description:
This event is generated when the prefix/suffix digit length is more than 21 digits.
Severity:
Info
Instance:
None
OID:
vSTPVstpSrvcInvPrefxLenNotify
Description:
This event is generated when the PC translated by MNP is an EAGLE TPC.
Severity:
Info
Instance:
None
OID:
vSTPVstpSrvcXlatedPcIsEagleTpcNotify
Description:
This event is generated when the MNP CGPA GTA translation crosses the domain.
Severity:
Info
Instance:
None
OID:
vSTPVstpSccpRtnXingDomainNotify
1. Recovery:
Description
DRA digits have exceeded INAP_MAX_CDPN_DIGITS (32)
Severity
Major
Instance
None
HA Score
Normal
OID
VstpTooManyDigitDRA
1. Recovery
Description
Failed to encode the CGPN for IDPR Feature
Severity
Major
Instance
None
HA Score
Normal
OID
VstpIdprCgpnEcdError
1. Recovery
Description
Failed to encode the CDPN for IDPR Feature
Severity
Major
Instance
None
HA Score
Normal
OID
VstpIdprCdpnEcdError
1. Recovery
Description
IDPRCDPN(X) NPP SERVICE is OFF
Severity
Major
Instance
None
HA Score
Normal
OID
VstpIdprCdpnNppServiceOff
1. Recovery
Description
IDPRCGPN NPP SERVICE is OFF
Severity
Major
Instance
None
HA Score
Normal
OID
VstpIdprCgpnNppServiceOff
1. Recovery
Description
DESTINATION ADDRESS DECODING is FAIL
Severity
Major
Instance
None
HA Score
Normal
OID
VstpDestAddrDcdFail
1. Recovery
Description
TCAP ENCODING is FAIL
Severity
Major
Instance
None
HA Score
Normal
OID
VstpTcapEncFail
1. Recovery
Description
OUT OF BOUND DIGIT
Severity
Major
Instance
None
HA Score
Normal
OID
VstpOutBoundDigit
1. Recovery
Description
SMS MANDATORY PARAMETER MISSING
Severity
Major
Instance
None
HA Score
Normal
OID
VstpSMSMandParamMiss
1. Recovery
Description
ADDRESS DECODING is FAIL
Severity
Major
Instance
None
HA Score
Normal
OID
VstpAddrDcdFail
1. Recovery
Description
MNPCDPA MATCHES HOME SMSC
Severity
Major
Instance
None
HA Score
Normal
OID
VstpMnpCdpaMatchHomeSmsc
1. Recovery
Description
SCCP XUDT Reassembly Failure
Severity
Major
Instance
None
HA Score
Normal
OID
1. Recovery
Description
SCCP XUDT Segmentation Failure
Severity
Major
Instance
None
HA Score
Normal
OID
1. Recovery
Description:
This event is generated when vSTP has received a notification from HA that the
Maintenance Leader resource should transition to the Active role.
Severity:
Info
Instance:
None
OID:
vSTPVstpMpLeaderGoActiveNotificationNotify
1. Recovery:
Description:
This event is generated when vSTP received a notification from HA that the
Maintenance Leader resource should transition to the OOS role.
Severity:
Info
Instance:
None
OID:
vSTPVstpMpLeaderGoOOSNotificationNotify
1. Recovery:
Description:
vSTP routing DB inconsistencies exist among the DA-MPs in the DSR signaling NE.
Severity:
Critical
Instance:
Table Name
HA Score:
Normal
Throttle (Seconds)
86400
OID:
vSTPVstpRoutingDbInconsistencyExistsNotify
1. Recovery:
Description:
This event is generated when a vSTP DB table monitoring overrun has occurred. The
COMCOL update synchronization log used by DB Table monitoring to synchronize
routing DB among all DA-MP RT-DBs has overrun. The vSTP-MPs routing DB sharing
table is automatically audited and re-synchronized to correct any inconsistencies.
Severity:
Info
Instance:
<Table Name>
OID:
vSTPVstpTblMonCbOnLogOverrunNotify
1. Recovery:
Description:
This event is generated when an unexpected error occurred during DB table
monitoring.
Severity:
Info
Instance:
<Thread Name>
OID:
vSTPVstpSldbMonAbnormalErrorNotify
1. Recovery:
Description:
This event is generated when the egress STP MP is unavailable or congested.
Severity:
Info
Instance:
<Ingress STP-MP hostname>
OID:
vSTPPeerMPUnavlblOrCngstedNotify
1. Recovery:
Description:
This event is generated when:
• no active vSTP-MP leaders are reported by the maintenance leader
• there is a single vSTP-MP and the DSR process is stopped
• there are multiple vSTP-MPs, the DSR process is stopped, and there is a
ComAgent connection failure between two or more vSTP-MPs.
Severity:
Info
Instance:
<Network Element>
OID:
vSTPNoVstpMpLeaderDetectedNotify
1. Recovery:
Description:
This event is generated when:
• more than one vSTP-MP reports itself as leader.
• the DSR process is running on all vSTP-MPs and the ComAgent connection is
down between two or more DA-MPs
The alarm clears when the maintenance leader reports a single active DA-MP leader.
Severity:
Info
Instance:
<Network Element>
OID:
vSTPMultipleVstpMpLeadersDetectedNotify
1. Recovery:
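The two leader events above can be read as a check on how many vSTP-MPs currently report themselves as leader. A minimal sketch, assuming the caller already has the list of hostnames reporting leadership; the function name and the "ok" return value are hypothetical, while the OID names come from the entries above.

```python
from typing import Iterable


def classify_leaders(reporting_leaders: Iterable[str]) -> str:
    """Map the set of MPs reporting leadership to an event OID."""
    leaders = set(reporting_leaders)  # ignore duplicate reports
    if not leaders:
        return "vSTPNoVstpMpLeaderDetectedNotify"
    if len(leaders) > 1:
        return "vSTPMultipleVstpMpLeadersDetectedNotify"
    return "ok"  # exactly one active leader
```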
Description:
This event is generated when there are a critical number of fixed connection alarms
for the vSTP-MP.
Severity:
Info
Instance:
<vSTP-MP-Hostname>
OID:
vSTPConnectionAlarmAggregationThresholdReachedNotify
1. Recovery:
Description:
This event is generated when the number of critical link alarms for a single network
element exceeds the configurable alarm threshold.
Severity:
Info
Instance:
<Network Element>
OID:
vSTPLinkAlarmAggregationThresholdReachedNotify
1. Recovery:
Description:
This event is generated when the number of critical linkset alarms for a single network
element exceeds the configurable alarm threshold.
Severity:
Info
Instance:
<Network Element>
OID:
vSTPLinksetAlarmAggregationThresholdReachedNotify
1. Recovery:
Description:
This event is generated when the number of critical route alarms for a single network
element exceeds the configurable alarm threshold.
Severity:
Info
Instance:
<Network Element>
OID:
vSTPRouteAlarmAggregationThresholdReachedNotify
1. Recovery:
Description:
This event is generated when the number of critical RSP alarms for a single network
element exceeds the configurable alarm threshold.
Severity:
Info
Instance:
<Network Element>
OID:
vSTPRspAlarmAggregationThresholdReachedNotify
1. Recovery:
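The link, linkset, route, and RSP aggregation events above share one rule: count the critical alarms per network element and compare the count with a configurable threshold. The sketch below illustrates that rule. The input shape (pairs of network element and object name) and the function name are assumptions, not the product's data model.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def elements_over_threshold(critical_alarms: Iterable[Tuple[str, str]],
                            threshold: int) -> List[str]:
    """Return network elements whose critical alarm count exceeds threshold.

    critical_alarms holds (network_element, object_name) pairs for the
    currently active critical alarms of one type (for example, links).
    """
    counts = Counter(ne for ne, _ in critical_alarms)
    # "exceeds" is read as strictly greater than the threshold
    return sorted(ne for ne, n in counts.items() if n > threshold)
```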
Description:
This event is generated when vSTP is unable to complete the signaling link test
message exchange due to any of these reasons:
• No Response
• Invalid Point Code (DPC)
• No route to APC on linkset
• Invalid Point Code (OPC)
• Invalid Linkset
• Bad data pattern
• Invalid SLC
Severity:
Minor
Instance:
<Link Name>
OID:
vSTPSltcFailureInvalidSlcNotify
1. Recovery:
Description:
This event is generated when vSTP receives an unexpected TFA message due to any
of these reasons:
• TFA received for Unknown Affected Point Code
• TFA is not generated from the adjacent node
• No Route Configured to Affected Point Code using linkset
Severity:
Info
Instance:
None
OID:
vSTPUnexpectedTfaReceivedNotify
1. Recovery:
Description:
This event is generated when vSTP receives an unexpected TFR message due to
any of these reasons:
• TFR is not supported for ITUI domain
• TFR received for Unknown Affected Point Code
• TFR is not generated from the adjacent node
• No Route configured for Affected Point Code using linkset
• Duplicate TFR Received
Severity:
Info
Instance:
None
OID:
vSTPUnexpectedTfrReceivedNotify
1. Recovery:
Description:
This event is generated when vSTP receives an unexpected TFP message due to any
of these reasons:
• TFP received for Unknown Affected Point Code
• TFP is not generated from the adjacent node
• No Route configured for Affected Point Code using linkset
• Duplicate TFP Received
Severity:
Info
Instance:
None
OID:
vSTPUnexpectedTfpReceivedNotify
1. Recovery:
Description:
This event is generated when vSTP receives an unexpected TFC message due to
any of these reasons:
• TFC received with congestion level 0
• TFC received for Unknown Affected Point Code
• TFC received for Unavailable Affected Point Code
Severity:
Info
Instance:
None
OID:
vSTPUnexpectedTfcReceivedNotify
1. Recovery:
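The three reasons listed for an unexpected TFC can be expressed as a small classification function. This is a sketch only: the status table, the status strings, and the order in which the checks run are assumptions; the reason texts are copied from the entry above.

```python
from typing import Dict, Optional


def unexpected_tfc_reason(affected_pc: str, congestion_level: int,
                          pc_status: Dict[str, str]) -> Optional[str]:
    """Return the reason a TFC is unexpected, or None if it is acceptable.

    pc_status is a hypothetical map from point code to "available"
    or "unavailable"; unknown point codes are simply absent.
    """
    if congestion_level == 0:
        return "TFC received with congestion level 0"
    status = pc_status.get(affected_pc)
    if status is None:
        return "TFC received for Unknown Affected Point Code"
    if status == "unavailable":
        return "TFC received for Unavailable Affected Point Code"
    return None
```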
Description:
This event is generated when vSTP finds an invalid H0 or H1 code in the message
due to any of these reasons:
• Invalid H0 code
• Invalid H1 code
Severity:
Info
Instance:
None
OID:
vSTPInvalidH0H1CodeNotify
1. Recovery:
Description:
This event is generated when vSTP generates a TFC message for congested point
code.
Severity:
Info
Instance:
None
Throttle Seconds:
10
OID:
vSTPTfcGeneratedNotify
1. Recovery:
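Several entries list a throttle value in seconds (10 for this event, 86400 for latched alarms). Assuming the throttle means that repeats of the same event instance are suppressed within that window, the behavior can be sketched as follows. The class and its method are hypothetical and the suppression semantics are an assumption, not a statement of how the product implements throttling.

```python
from typing import Dict, Hashable


class EventThrottle:
    """Allow at most one event per key within window_seconds."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self._last_emitted: Dict[Hashable, float] = {}

    def should_emit(self, key: Hashable, now: float) -> bool:
        """Return True and record the time if the event may be emitted."""
        last = self._last_emitted.get(key)
        if last is not None and now - last < self.window_seconds:
            return False  # still inside the throttle window
        self._last_emitted[key] = now
        return True
```

For example, with `EventThrottle(10)`, a second event for the same key 9.9 seconds after the first is suppressed, while one arriving at 10 seconds or later is emitted.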
Description:
This event is generated when vSTP performs a changeover.
Severity:
Info
Instance:
<Link Name>
OID:
vSTPReceivedCOONotify
1. Recovery:
Description:
This event is generated when vSTP performs an emergency changeover.
Severity:
Info
Instance:
<Link Name>
OID:
vSTPECOPerformedNotify
1. Recovery:
Description:
This event is generated when the changeback timer (for example, T5 timer) expires.
Severity:
Info
Instance:
None
OID:
vSTPCbTimerExpiredNotify
1. Recovery:
Description:
This event is generated when vSTP receives a user part unavailable (UPU) message
due to any of these reasons:
• SCCP user unavailable, cause unknown
• User part is not SCCP
Severity:
Info
Instance:
None
OID:
vSTPUpuReceivedNotify
1. Recovery:
Description:
Remote blocked.
Severity:
Minor
Instance:
None
HA Score:
Normal
Throttle (Seconds)
86400 (this is a latched alarm so 1-day throttling has the same effect as the old
LcEcon)
OID:
vSTPRemoteBlockedNotify
1. Recovery:
Description:
Limited access to the SS7 Destination Point Code because the RSP status is
restricted.
Severity:
Minor
Instance:
<RSP Name>
HA Score:
Normal
Throttle (Seconds)
86400 (this is a latched alarm so 1-day throttling has the same effect as the old
LcEcon)
OID:
vSTPMtp3RspRestrictedNotify
1. Recovery:
Description:
Limited access to the SS7 destination point code using this route because it is
restricted.
Severity:
Minor
Instance:
<Route Name>
HA Score:
Minor
Throttle (Seconds)
86400 (this is a latched alarm so 1-day throttling has the same effect as the old
LcEcon)
OID:
vSTPMtp3RouteRestrictedNotify
1. Recovery:
Description:
This event is generated when an MSU was discarded due to screening.
Severity:
Info
Instance:
None
OID:
vSTPVstpMsuDiscardDueToScrNotify
1. Recovery:
Description:
This event is generated when vSTP encounters an ANSI to ITU CDPA GT conversion
failure. This happens when an entry in the default GT Conversion Table could not be
found to match the incoming ANSI message's Translation Type in the Called Party
Address parameter when the GTCNVDFLT M3rl option is not enabled.
Severity:
Info
Instance:
None
OID:
vSTPVstpAICdTtMismatchNotify
1. Recovery:
Description:
This event is generated when vSTP encounters an ANSI to ITU CGPA GT conversion
failure. This happens when an entry in the default GT Conversion Table could not be
found to match the incoming ANSI message's Translation Type in the Calling Party
Address parameter when the GTCNVDFLT M3rl option is not enabled.
Severity:
Info
Instance:
None
OID:
vSTPVstpAICgTtMismatchNotify
1. Recovery:
Description:
An entry in the default GT Conversion Table could not be found to match the
incoming ITU message's NP/NAI/TT in the Called Party Address parameter when
the GTCNVDFLT M3rl Option is not enabled.
Severity:
Info
Instance:
None
OID:
vSTPVstpIACdTtMismatchNotify
1. Recovery:
Description:
This event is generated when an entry in the default GT Conversion Table could
not be found to match the incoming ITU message's NP/NAI/TT in the Calling Party
Address parameter when the GTCNVDFLT M3rl Option is not enabled.
Severity:
Info
Instance:
None
OID:
vSTPVstpIACgTtMismatchNotify
1. Recovery:
Description:
This event is generated when no alias PC of the destination type for the affected point
code is found.
Severity:
Info
Instance:
None
OID:
vSTPVstpAftPcCnvFailNotify
1. Recovery:
Description:
This event is generated when no alias PC of the destination network type for the OPC
is found.
Severity:
Info
Instance:
None
OID:
vSTPVstpM3rlOpcCnvFailNotify
1. Recovery:
Description:
This event is generated when no alias PC of the destination network type for the
CGPA PC is found, and the discard CGPA PC option for the destination network type
is off.
Severity:
Info
Instance:
None
OID:
vSTPVstpCgPcAlsUndefinedNotify
1. Recovery:
Description:
This event is generated when the SCCP MSU total length after conversion is greater
than supported message length.
Severity:
Info
Instance:
None
OID:
vSTPVstpInvMsgLengthNotify
1. Recovery:
Description:
This event is generated when the segmentation optional parameter length is incorrect
for the message undergoing ANSI/ITU SCCP conversion.
Severity:
Info
Instance:
None
OID:
vSTPVstpInvSegParLengthNotify
1. Recovery:
Description:
This event is generated when a message error is found during the encoding of SCCP
message due to incorrect CDPA, CGPA, or SCCP data message parameter length.
Severity:
Info
Instance:
None
OID:
vSTPVstpInvSccpEleLenErrorNotify
1. Recovery:
Description:
This event is generated when the incoming linkset and the outgoing linkset of a
message are the same, or when the OPC in the message is configured as a self PC for
the MTP routed message.
Severity:
Info
Instance:
None
OID:
vSTPVstpmtp3LoopDetectedNotify
1. Recovery:
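The MTP3 loop condition described above has two independent triggers, which the following sketch expresses as a single predicate. The function name, the string representation of linksets and point codes, and the `self_pcs` collection are assumptions for illustration.

```python
from typing import Collection


def is_mtp3_loop(incoming_linkset: str, outgoing_linkset: str,
                 opc: str, self_pcs: Collection[str]) -> bool:
    """True if an MTP routed message matches either loop condition.

    Condition 1: the message would leave on the linkset it arrived on.
    Condition 2: the OPC in the message is one of this node's own PCs.
    """
    return incoming_linkset == outgoing_linkset or opc in self_pcs
```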
Description:
This event is generated when the SCMG message type is invalid.
Severity:
Info
Instance:
None
OID:
vSTPVstpInvScmgMsgTypeNotify
1. Recovery:
Description:
This event is generated when the CPC type is not STP and the application is not
provisioned for that CPC type.
Severity:
Info
Instance:
None
OID:
vSTPVstpSCCPAppMSUDiscardedNotify
1. Recovery:
Description
Sccp Egress Tps Threshold Crossed.
Severity
Major
Instance
<AssocName>
HA Score
Normal
OID
vSTPVstpSccpEgressTpsThresholdNotify
1. Recovery
Description:
This event is generated when an ACN object identifier length is greater than 32.
Severity:
Info
Instance:
None
OID:
VstpInvAcnLenNotify
1. Recovery:
Description:
This event is generated when there is an invalid INAP Called Party Number and no
parameter sequence.
Severity:
Info
Instance:
None
OID:
VstpFailtoDecodeInapParamNotify
1. Recovery:
Description:
This event is generated when the INAP Called Party Number is missing.
Severity:
Info
Instance:
None
OID:
VstpFailtoDecodeInapParamNotify
1. Recovery:
Description
Unexpected SI in TIF Stop Action
Severity
Major
Instance
None
HA Score
Normal
OID
VstpTifUnexpectedSi
1. Recovery
Description
Modified MSU too large to route
Severity
Major
Instance
None
HA Score
Normal
OID
VstpTifRouteFailed
1. Recovery
Description
ISUP IAM Decode Failed
Severity
Major
Instance
None
HA Score
Normal
OID
VstpIsupDcdFailed
1. Recovery
Description
ISUP IAM Cld Pty decode failed
Severity
Major
Instance
None
HA Score
Normal
OID
VstpIsupDcdCdpaFailed
1. Recovery
Description
ISUP Encode Failed
Severity
Major
Instance
None
HA Score
Normal
OID
VstpIsupEcdFailed
1. Recovery
Description
TIF CgPN NS Failure: CC mismatch in DN
Severity
Major
Instance
None
HA Score
Normal
OID
VstpIsupEcdFailed
1. Recovery
Description
VLR Status changed
Severity
Major
Instance
None
HA Score
Normal
OID
VstpDynVlrStatusChanged
1. Recovery
Description
Velocity Threshold Crossed
Severity
Major
Instance
None
HA Score
Normal
OID
VstpDynVeloThreshCrossed
1. Recovery
Description
Dynamic VLR Profile Aging
Severity
Major
Instance
None
HA Score
Normal
OID
VstpDynVLRProfAging
1. Recovery
Description
Dynamic VLR Roaming Aging
Severity
Major
Instance
None
HA Score
Normal
OID
VstpDynVLRRoamAging
1. Recovery
Description
Vstp Dynamic learning is turned OFF
Severity
Major
Instance
None
HA Score
Normal
OID
VstpVlrDynLearningOFF
1. Recovery
Description
Vstp Dynamic learning LEARN Mode Timer Expired
Severity
Major
Instance
None
HA Score
Normal
OID
VstpVlrDynLearningLearntimer
1. Recovery
Description
Vstp Dynamic learning Profile Table Full
Severity
Major
Instance
None
HA Score
Normal
OID
VstpVlrDynProfileTableFull
1. Recovery
Description
Vstp Dynamic learning Roaming Table Full
Severity
Major
Instance
None
HA Score
Normal
OID
VstpVlrDynProfileTableFull
1. Recovery
Description
The percent utilization of the VSTP MP's Security Logging Stack Event Queue is
approaching its maximum capacity.
Severity
Major
Instance
None
HA Score
Normal
OID
VstpSecuLogEventQueue
1. Recovery
Description
vSTP error in logging security logs to csv file in MP.
Severity
Major
Instance
None
HA Score
Normal
OID
VstpSecuLogError
1. Recovery
Description
Vstp Security Log File fetching from MP failed.
Severity
Major
Instance
None
HA Score
Normal
OID
VstpSecuLogFetchError
1. Recovery
Description
Vstp Security Log File fetching from Active SO to Remote Server failed.
Severity
Major
Instance
None
HA Score
Normal
OID
VstpSecuLogRemoteServerError
1. Recovery
70446 - VstpServiceStackEventQueueUtil
Alarm Group
vSTP
Description
The percent utilization of the vSTP MP's Service Stack Event Queue is approaching
its maximum capacity.
Severity
Major
Instance
None
HA Score
Normal
OID
1. Recovery
70451 - serviceMpUnavailable
Alarm Group
vSTP
Description
Service MP is not available; messages cannot be sent to the Service MP.
Severity
Major
Instance
None
HA Score
Normal
OID
VstpSecuLogRemoteServerError
1. Recovery
Description
An Ack message was received for which no transaction was found on either the
Originator or the Termination side at the Service MP.
Severity
Major
Instance
None
HA Score
Normal
OID
smsProxyAckTransNotFnd
1. Recovery
Description
SCCP Validation failed in Service MP due to inconsistency between sccp cdpa and
tcap smrpda
Severity
Major
Instance
None
HA Score
Normal
OID
smsProxySccpValidFail
1. Recovery
Description
Service Validation Response Timeout Error
Severity
Major
Instance
None
HA Score
Normal
OID
smsProxyValRspTimeout
1. Recovery
Description
SMS Proxy Message Validation Failed
Severity
Major
Instance
None
HA Score
Normal
OID
smsProxyValidationFailed
1. Recovery
Description
Service Validation Encoding Error
Severity
Major
Instance
None
HA Score
Normal
OID
smsProxyEcdError
1. Recovery
Description
SMS Proxy Message Validation Decoding Error.
Severity
Major
Instance
None
HA Score
Normal
OID
smsProxyDcdError
1. Recovery
Description
Service SMSC Blocklist
Severity
Major
Instance
None
HA Score
Normal
OID
smsProxyBlocklist
1. Recovery
Description
Service SMSC Allowlist
Severity
Major
Instance
None
HA Score
Normal
OID
smsProxyAllowlist
1. Recovery
Description
DOS Timer waits for Delivery Report SM message and on timeout raises this event
Severity
Major
Chapter 3
Diameter Equipment Identity Register (EIR) (71000-71999)
Instance
None
HA Score
Normal
OID
Vstp smsProxyDosInvkTimeout
1. Recovery
Description
MTFSM Invoke Timer waits for MTFSM message and on timeout raises this event
Severity
Major
Instance
None
HA Score
Normal
OID
smsProxyMtfsmInvkTimeout
1. Recovery
Description
EIR application failed to decode the request.
Severity
N/A
Instance
MP hostname
HA Score
Normal
Throttle Seconds
10
OID
N/A
• Make sure the lengths of the IMEI and IMSI numbers are correct.
Description
ECA routing attempt failed due to DRL queue exhaustion.
Severity
N/A
Instance
MP hostname
HA Score
Normal
Throttle Seconds
10
OID
NA
Description
EIR application failed to encode the answer.
Severity
N/A
Instance
MP hostname
HA Score
Normal
Throttle Seconds
10
OID
NA
Description
EIR Application is Unavailable.
Severity
Critical
Instance
MP hostname
HA Score
Normal
Throttle Seconds
86400
OID
NA
Description
ComAgent connection between DSR EIR and UDR is down.
Severity
Critical
Instance
MP hostname
HA Score
Normal
Throttle Seconds
86400
OID
NA
Description
The message rate exceeds the supported TPS for the DSR EIR application.
Severity
Minor/Major/Critical
Instance
MP hostname
HA Score
Normal
Throttle Seconds
86400
OID
NA
Description
The DSR EIR Logging is suspended.
Severity
Major
Instance
MP hostname
HA Score
Normal
Throttle Seconds
86400
OID
NA
1. Make sure the log file and directory are still accessible.
2. Make sure there is enough disk space for the log file.
Description
EIR request queue utilization threshold exceeded.
Severity
Minor/Major/Critical
Instance
MP hostname
HA Score
Normal
Throttle Seconds
86400
OID
NA
Description
EIR UDR response queue utilization threshold exceeded.
Severity
Minor/Major/Critical
Instance
MP hostname
HA Score
Normal
Throttle Seconds
86400
OID
NA
Description
EIR Application is congested.
Severity
Major
Instance
MP hostname
HA Score
Normal
Throttle Seconds
86400
OID
NA
Description
ComAgent routing service registration or service notification registration failed, EIR
cannot use the ComAgent service for database queries.
Severity
Critical
Instance
MP hostname
HA Score
Normal
Throttle Seconds
86400
OID
NA
Description
Fetching of EIR logs failed at SO.
Severity
Major
Instance
MP hostname
HA Score
Normal
Throttle Seconds
86400
OID
NA
4
Key Performance Indicators (KPIs)
This section provides general information about KPIs and lists the KPIs that can
appear on the Status & Manage > KPIs GUI page.
KPIs Overview
Key Performance Indicators (KPIs) allow you to monitor system performance data,
including CPU, memory, swap space, disk space, shared memory, and uptime per
server. This performance data is collected from all servers within the defined topology.
The KPI display function resides on all OAM servers. Servers that provide a GUI
connection rely on KPI information merged to that server. The Network OAMP servers
maintain status information for all servers in the topology. System OAM servers have
reliable information only for servers within the same network element.
The Status & Manage > KPIs page displays performance data for the entire system.
KPI data for the entire system is updated every 60 seconds. If data is not currently
being collected for a particular server, the KPI for that server will be shown as N/A.
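As an illustration only, a script that post-processes KPI values taken from this page has to allow for the N/A marker described above. The sketch below assumes values arrive as text; the function name parse_kpi_value is hypothetical and is not part of the product.

```python
def parse_kpi_value(text):
    """Convert a displayed KPI value to a float, or None when it is shown as N/A."""
    text = text.strip()
    if text == "N/A":
        # No data is currently being collected for this server.
        return None
    return float(text)
```

Returning None rather than 0 keeps "no data collected" distinguishable from a genuine zero reading.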
KPIs
The Status & Manage > KPIs page displays KPIs for the entire system. KPIs
for the server and its applications are displayed on separate tabs. The application KPIs
displayed may vary according to whether you are logged in to an NOAM server or an
SOAM server.
Chapter 4
General KPIs information
Viewing KPIs
Use this procedure to view KPI data.
1. Navigate to Status & Manage, and then KPIs.
For details about the KPIs displayed on this page, see the application
documentation.
2. Click KPI Filter and specify filter options to see KPI data relevant to an
application.
Note:
The application KPIs displayed may vary according to whether you are
logged in to an NOAM server or an SOAM server. Collection of KPI
data is handled solely by NOAM servers in systems that do not support
SOAMs.
Exporting KPIs
You can schedule periodic exports of KPI data from the KPIs page. KPI data
can be exported immediately, or you can schedule exports to occur daily or weekly. If
filtering has been applied in the KPIs page, only filtered data is exported.
During data export, the system automatically creates a CSV file of the filtered data.
The file will be available in the file management area until you manually delete it, or
until the file is transferred to an alternate location using the Export Server feature. For
more information about using Export Server, see Data Export.
Note:
When a KPI is exported to a CSV file, each KPI column name is prefixed
with an appropriate Group name. For example, a KPI related to Diameter is
displayed as [Diameter]MsgCopy Queue Utilization.
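The Group prefix on exported column names can be split off again when the CSV file is processed by a script. The following is a minimal sketch, not the product's own tooling: the sample data, the unprefixed Server column, the [Platform]CPU column, and the function names are all assumptions made for illustration, and the real export layout may differ.

```python
import csv
import io
import re

# Hypothetical sample; only the "[Group]Name" column convention comes from the note above.
SAMPLE = "Server,[Diameter]MsgCopy Queue Utilization,[Platform]CPU\nmp-1,12,37\n"

# Matches a column name of the form "[Group]KPI name".
PREFIX = re.compile(r"^\[(?P<group>[^\]]+)\](?P<name>.+)$")


def split_column(column):
    """Return (group, kpi_name); group is None for columns without a prefix."""
    match = PREFIX.match(column)
    if match is None:
        return None, column
    return match.group("group"), match.group("name")


def kpis_by_group(csv_text):
    """For each CSV row, collect the prefixed KPI values under their Group name."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        grouped = {}
        for column, value in row.items():
            group, name = split_column(column)
            if group is not None:
                grouped.setdefault(group, {})[name] = value
        rows.append(grouped)
    return rows
```

Values are kept as strings because the sketch makes no assumption about how each KPI is formatted in the export.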
Note:
Time of Day is not an option if Export Frequency equals Once.
Note:
Day of Week is not an option if Export Frequency equals Once.
Computer Aided Policy Making (CAPM) KPIs
Variable Description
Processing time [µSEC] Average processing time (in microseconds) of
Rule Template on a per Rule Template basis.
Active Templates Number of Rule Templates that are in Active
state.
Test Templates Number of Rule Templates that are in Test
state.
Development Templates Number of Rule Templates that are in
Development state.
Match Rule References one element in the arrayed
measurement.
Variable Description
User Data Ingress message rate The number of User Data Stack Events
received by ComAgent.
Broadcast Data Rate The overall data broadcast rate on the server.
Variable Description
DcaCustomMeal.name DcaCustomMeal.kpiDescr
Variable Description
Ingress Message Rate Average Ingress Message Rate (messages
per second) of Diameter messages received
by the DCA Application
U-SBR Query Rate Average U-SBR Query Rate (Stack Events per
second successfully sent to the U-SBR)
Runtime Errors Rate Instant Runtime Error Rate (runtime errors per
second during the last sampling interval)
U-SBR Query Failure Rate Average rate of ComAgent errors encountered
when attempting to send a U-SBR query
Transactions Error Answer Diameter transactions that a DCA App relay
answers with error
Completed Transactions Diameter transactions that a DCA App
successfully relays
Transactions Discard Request Diameter transactions that a DCA App
terminates by discarding the request
Max Perl Main Opcodes Maximum number of opcodes executed by the
Perl script main part
Max Perl Handler Opcodes Maximum number of opcodes executed by the
Perl script event handlers
Opcode Quota Exceed Diameter transactions that a DCA App
terminates per second because the maximum
number of opcodes is exceeded
Diameter (DIAM) KPIs
Variable Description
MsgCopyTxQueueUtilization Percentage of utilization of the Message Copy
Tx Queue
Average Response Time The average time from when routing receives
a request message from a peer to when
routing sends an answer message to that peer.
Transaction Success Rate Percentage of Diameter and RADIUS
transactions successfully completed on a DA-
MP server with respect to the offered load.
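The Transaction Success Rate above is a percentage relative to the offered load. A minimal sketch of that arithmetic follows, assuming two hypothetical counters (completed and offered transactions over the same interval). How the product actually samples and averages these values is not specified here.

```python
def transaction_success_rate(completed, offered):
    """Percentage of offered transactions that completed successfully.

    Returns None when there was no offered load, since the ratio is undefined.
    """
    if offered <= 0:
        return None
    return 100.0 * completed / offered
```

For example, 950 completed transactions out of 1000 offered gives a rate of 95.0.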
DP KPIs
Table 4-8 DP KPIs
Variable Description
DpsQueryRate Total number of queries received per second
DpsMsisdnQueryRate Total number of MSISDN queries received per
second
DpsImsiQueryRate Total number of IMSI queries received per
second
DpsNaiQueryRate Total number of NAI queries received per
second
DpsExtIdQueryRate The total number of External Identifier Queries
Received per second
DpsFailedQueryRate Total number of queries failed per second
DpsNotFoundQueryRate Total number of queries with Not Found
responses per second
DpsMsisdnNotFoundQueryRate Total number of MSISDN queries with Not
Found responses per second
DpsImsiNotFoundQueryRate Total number of IMSI queries with Not Found
responses per second
DpsNaiNotFoundQueryRate Total number of NAI queries with Not Found
responses per second
DpsNExtIdNotFoundQueryRate The total number of External Identifier Queries
with NotFound Responses per second
DpsResponseSent Total number of responses sent per second
DpsIngressQueue DP Ingress Queue percentage full
DpsMsisdnBlacklistedRate Total number of MSISDN Queries with
Blacklisted Responses per second
DpsImsiBlacklistedRate Total number of IMSI Queries with Blacklisted
Responses per second
Equipment Identity Register (EIR) KPIs
IDIH KPIs
The KPI values associated with IDIH are visible using Status & Manage, and then
KPIs.
Variable Description
DSR-DIH TTR Bandwidth (KB/sec) Average bandwidth used by DSR in sending
TTRs (including trace start and stop
messages) to DIH in Kbytes per second
Variable Description
CPU % Total CPU used by the IPFE process
Memory Total Absolute memory used by the IPFE process
Memory % Percent memory used by the IPFE process
Mem. Heap Total heap allocated by the IPFE process
IPFE Packets/Sec The average number of packets per second
the IPFE receives
IPFE MBytes/Sec The average number of megabytes per second
the IPFE receives
Variable Description
Avg CPU Utilization Percentage of CPU utilization by the Diameter
process on a DA-MP server.
Offered Load (MPS) Offered load on a DA-MP server,
corresponding to the message rate before
policing by capacity and congestion controls.
Accepted Load (MPS) Accepted load on a DA-MP server,
corresponding to the message rate after
policing by capacity and congestion controls.
Message Processing Load (MPS) Average message processing load (messages
per second) on an MP server. The message
processing load is the number of Diameter
messages that are routed, including Reroute
and MsgCopy.
Full Address Based Resolution (FABR) KPIs
Variable Description
Ingress Message Rate Ingress Message Rate (messages per second)
utilization on an MP server for the FABR
application. The Ingress Message Rate is
the number of ingress Diameter messages
that were successfully received by the FABR
application.
Resolved Message Rate Resolved Message Rate (messages per
second) utilization on an MP server. The
Resolved Message Rate is the number
of ingress Diameter messages that are
successfully resolved to a Destination by the
FABR application.
DP Response Time Average Average DP response time is the average
time (in milliseconds) it takes to receive a DP
response after sending the corresponding DP
query.
Platform KPIs
The KPI values associated with Platform are available using Status & Manage, and
then KPIs.
Variable Description
CPU Percentage utilization of all processors on the
server by all software as measured by the
operating system.
RAM Percentage utilization of physical memory on
the server by all software as measured by
TPD.
Swap Percentage utilization of swap space on the
server by all software as measured by TPD.
Uptime The total amount of time (days HH:MM:SS) the
server has been running.
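To make the days HH:MM:SS layout concrete, the sketch below formats a number of seconds in that shape. It is an assumption about presentation only: the function name format_uptime is hypothetical, and the exact separator the GUI places between the day count and the clock time is not stated in this table.

```python
def format_uptime(total_seconds):
    """Format elapsed seconds as 'days HH:MM:SS' (day count, then zero-padded clock time)."""
    days, remainder = divmod(int(total_seconds), 86400)  # 86400 seconds per day
    hours, remainder = divmod(remainder, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{days} {hours:02d}:{minutes:02d}:{seconds:02d}"
```

For example, 93784 seconds is 1 day, 2 hours, 3 minutes, and 4 seconds, so the function returns "1 02:03:04".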
Policy and Charging Application (PCA) KPIs
Variable Description
PCA Ingress Message Rate Number of Diameter messages including both
requests and answers received by PCA from
the Diameter Routing Layer per second.
P-DRA Ingress Message Rate Number of Diameter messages including both
requests and answers received by P-DRA from
the Diameter Routing Layer per second.
OC-DRA Ingress Message Rate Number of Diameter messages including both
requests and answers received by OC-DRA
from the Diameter Routing Layer per second.
Process-based KPIs
Table 4-17 Process-based KPIs
Variable Description
provimport.Cpu CPU usage of provimport process
provimport.MemHeap Heap memory usage of provimport process
provimport.MemBasTotal Memory usage of provimport process
provimport.MemPerTotal Percent memory usage of provimport process
provexport.Cpu CPU usage of provexport process
provexport.MemHeap Heap memory usage of provexport process
provexport.MemBasTotal Memory usage of provexport process
provexport.MemPerTotal Percent memory usage of provexport process
pdbrelay.Cpu CPU usage of pdbrelay process
pdbrelay.MemHeap Heap memory usage of pdbrelay process
pdbrelay.MemBasTotal Memory usage of the pdbrelay process
pdbrelay.MemPerTotal Percent memory usage of pdbrelay process
pdbaudit.Cpu CPU usage of pdbaudit process
pdbaudit.MemHeap Heap memory usage of pdbaudit process
pdbaudit.MemBasTotal Memory usage of the pdbaudit process
pdbaudit.MemPerTotal Percent memory usage of pdbaudit process
pdba.Cpu CPU usage of pdba process
pdba.MemHeap Heap memory usage of pdba process
pdba.MemBasTotal Memory usage of pdba process
pdba.MemPerTotal Percent memory usage of pdba process
xds.Cpu CPU usage of xds process
xds.MemHeap Heap memory usage of xds process
xds.MemBasTotal Memory usage of xds process
xds.MemPerTotal Percent memory usage of xds process
dpserver.Cpu CPU usage of dpserver process on DP
dpserver.MemHeap Heap memory usage of dpserver process on
DP
dpserver.MemBaseTotal Memory usage of the dpserver process on DP
dpserver.MemPerTotal Percent memory usage of dpserver on DP
era.Cpu CPU usage of era process
era.MemHeap Heap memory usage of era process
era.MemBasTotal Memory usage of era process
era.MemPerTotal Percent memory usage of era process
Provisioning KPIs
Table 4-18 Provisioning KPIs
Variable Description
ProvConnections The number of provisioning client connections
currently established. A single connection
includes a client having successfully
established a TCP/IP connection, sent a
provisioning connect message, and having
received a successful response.
ProvMsgsReceived The number of provisioning messages per
second that have been received from all
sources except import files.
ProvMsgsImported The number of provisioning messages per
second imported from files.
ProvMsgsSuccessful The number of provisioning messages per
second that have been successfully processed
and a success response sent to the requestor.
ProvMsgsFailed The number of provisioning messages per
second that have failed to be processed due
to errors and a failure response sent to the
requestor.
ProvMsgsSent The number of provisioning message
responses sent per second to the requestor.
ProvMsgsDiscarded The number of provisioning messages
discarded per second. Provisioning messages
are discarded due to connection shutdown,
server shutdown, server's role switching from
active to standby, or transaction not becoming
durable within the allowed amount of time.
ProvTxnCommitted The number of provisioning transactions per
second that have been successfully committed
to the database (memory and on disk) on the
active server of the primary SDS cluster.
ProvTxnFailed The number of provisioning transactions
per second that have failed to be started,
committed, or aborted due to errors.
ProvTxnAborted The number of provisioning transactions
aborted per second.
ProvTxnActive The number of provisioning transactions that
are currently active (normal transaction mode
only).
ProvTxnNonDurable The number of transactions that have been
committed, but are not yet durable. Responses
for the associated requests are not sent until
the transaction has become durable.
ProvRelayMsgsSent The number of relayed provisioning messages
sent per second.
ProvRelayMsgsSuccessful The number of relayed provisioning messages
per second that were successful at the HLRR.
ProvRelayMsgsFailed The number of relayed provisioning messages
per second that failed at the HLRR.
ProvRemoteAuditMsgsSent The number of IMSI and MSISDN records
audited per second.
ProvRelayTimeLag Time in seconds between timestamps of last
record PdbRelay processed and latest entry in
the Command Log.
ProvDbException The number of DB Exception errors per
second.
Range Based Address Resolution (RBAR) KPIs
Variable Description
Avg Resolved Message Rate Average Resolved Message Rate (messages
per second) utilization on an MP server.
The Resolved Message Rate is the number
of ingress Diameter messages that are
successfully resolved to a Destination by the
RBAR application.
Ingress Message Rate Average Ingress Message Rate (messages
per second) utilization on an MP server for this
DSR application. The Ingress Message Rate
is the number of ingress Diameter messages
that were successfully received by the DSR
application.
SCEF KPIs
The KPI values associated with SCEF are visible using Status & Manage, and then
KPIs.
Variable Description
NIDD Message Processing Rate The number of messages processed every second by the
NIDD feature of SCEF application
NIDD CMR Message The total number of NIDD CMR messages processed by
the NIDD feature of SCEF application
NIDD NIR Message The total number of NIDD NIR messages processed by the
NIDD feature of SCEF application
NIDD TDR Message The total number of NIDD TDR messages processed by
the NIDD feature of SCEF application
NIDD ODR Message The total number of NIDD ODR messages processed by
the NIDD feature of SCEF application
Monitoring Message Rate The number of messages processed every second by the
Monitoring feature of SCEF application
Enhanced Coverage Message Rate The number of messages processed every second by the
Enhanced Coverage feature of SCEF application
DT Message Processing Rate The number of messages processed every second by the
Device Trigger feature of SCEF application
Variable Description
Monitoring CFG Requests Rate Rate at which SCS/AS is submitting T8 Monitoring
Configuration Requests to SCEF application.
Monitoring RPT Received Rate Rate at which SCEF application is receiving Monitoring
Reports from HSS/MME/SGSN.
SCEF Monitoring NOTIFY Sent Rate Rate at which SCEF application is sending T8 Monitoring
Notifications to SCS/AS.
Successful NIDD Config The average number of successful NIDD configuration
messages by SCEF application
Failed NIDD Config The average number of failed NIDD configuration
messages by SCEF application
Successful NIDD Downlink Transfer The average number of successfully transferred NIDD
Downlink Data messages by SCEF application
Successfully buffered NIDD Downlink The average number of successfully buffered NIDD
Downlink Data messages by SCEF application
Failed NIDD downlink buffering The average number of buffering failures for NIDD Downlink
Data messages by SCEF application
Successful NIDD MO The average number of successful NIDD Uplink Data
messages by SCEF application
Failed NIDD MO The average number of failed NIDD uplink Data messages
by SCEF application
Current NIDD Buffered The number of buffered NIDD downlink Data messages by
SCEF application
SS7/Sigtran KPIs
Table 4-22 SS7/Sigtran KPIs
Variable Description
SCCP Recv Msgs/Sec SCCP messages received per second.
SCCP Xmit Msgs/Sec SCCP messages transmitted per second.
SS7 Process CPU Utilization The average percent of SS7 Process CPU
utilization on an MP server.
Ingress Message Rate The Ingress Message Rate is the number of
non-SNM messages that M3UA attempts to
queue in the M3RL Stack Event Queue.
M3RL Xmit Msgs/Sec M3RL DATA MSUs/Sec sent.
M3RL Recv Msgs/Sec M3RL DATA MSUs/Sec received.
Variable Description
SBR Memory Utilization SBR memory utilization (0-100%)
SBR Process CPU Utilization SBR Process CPU Percent Utilization
(0-100%)
Variable Description
SBR Policy Bindings (IMSI) Total number of subscribers with at least one
binding (IMSI)
SBR Binding DB Read Rate Number of SBR Binding DB reads per second
SBR Binding DB Write Rate Number of SBR Binding DB writes per second
SBR Alt Key Bindings (MSISDN) Total number of subscribers with at least one
Alternate Key binding (MSISDN)
SBR Alt Key Bindings (IPv4) Total number of subscribers with an Alternate
Key binding (IPv4)
SBR Alt Key Bindings (IPv6) Total number of subscribers with an Alternate
Key binding (IPv6)
Variable Description
SBR Policy Sessions Number of Active SBR Policy Sessions
SBR Policy Session DB Read Rate Number of SBR Policy Session DB reads per
second
SBR Policy Session DB Write Rate Number of SBR Policy Session DB writes per
second
SBR Online Charging Sessions Number of Active SBR Online Charging
Sessions
SBR OC Session DB Read Rate Number of SBR Online Charging Session DB
reads per second
SBR OC Session DB Write Rate Number of SBR Online Charging Session DB
writes per second
U-SBR KPIs
The KPI values associated with Universal SBR are visible using Status & Manage,
and then KPIs.
Variable Description
GenericCreateStateRate Rate of ingress GenericCreateState stack
event messages received by the U-SBR
server.
GenericCreateOrReadStateRate Rate of ingress of GenericCreateOrReadState
events processed by the U-SBR Server
GenericReadStateRate Rate of ingress of GenericReadState events
processed by the U-SBR Server
GenericUpdateStateRate Rate of ingress of GenericUpdateState events
processed by the U-SBR Server
GenericConcurrentUpdateStateRate Rate of ingress of
GenericConcurrentUpdateState events
processed by the U-SBR Server
GenericDeleteStateRate Rate of ingress of GenericDeleteState events
processed by the U-SBR Server
GenericErrRecObsoletedRate Rate of received
GenericConcurrentUpdateState events by the
U-SBR Server that lead to a result event with
the error code set to GenericErrRecObsoleted
GenericTotalRequestsRate Rate of received GenericState events by the
U-SBR Server
GenericErrMalformedRequestRate Rate of GenericState events that could not be
decoded by the U-SBR Server
GenericErrRate Rate of GenericState events that could not
be processed by the U-SBR Server and were
replied with a GenericErr code
vSTP KPIs
The KPI values associated with vSTP are visible using Status & Manage,
and then KPIs.
Variable Description
VSTP Process CPU Utilization Average percent VSTP Process CPU
utilization (0-100%) on an MP server
SCCP Xmit Msgs/Sec SCCP messages transmitted per second
SCCP Recv Msgs/Sec SCCP messages received per second
M3RL Xmit Msgs/Sec MTP3 DATA MSUs transmitted per second
M3RL Recv Msgs/Sec MTP3 DATA MSUs received per second
M3UA Xmit Msgs/Sec M3UA DATA MSUs transmitted per second
M3UA Recv Msgs/Sec M3UA DATA MSUs received per second
M2PA Xmit Msgs/Sec M2PA DATA MSUs transmitted per second
M2PA Recv Msgs/Sec M2PA DATA MSUs received per second
SS7 EIR Recv Msgs/Sec EIR Check IMEI received per second
SS7 EIR Xmit Msgs/Sec EIR Check IMEI response transmitted per
second
EIR DB Response Msgs/Sec EIR DB response received per second
EIR DB Request Msgs/Sec EIR DB request transmitted per second