Common Faults and Alarms On The RTN

2011-5-5 Security Level: A
Handling Common Faults and Alarms on the RTN Network
www.huawei.com
HUAWEI TECHNOLOGIES CO., LTD. Huawei Confidential

Contents
1 Process of Locating Common Faults
2 Locating Link Faults
3 Locating Faults of TDM Services
4 Locating Faults of Packet Services
5 Locating Faults of Protection Schemes
6 Locating Clock Faults
7 Locating DCN Faults
8 Locating Other Faults
9 Handling Common Alarms
10 Typical Cases of Fault Locating
11 Reference Documents

Process of Locating Common Faults
Start
Check alarms.
Check service flows and locate faults.
Check key configurations.
Check black box, errlog, debugbuf, and dopra records.
Record network configurations, operation procedures, fault symptoms,
and time points of key events.
Collect data.
End

Process of Locating Common Faults
1. Check the flow of services, including service add/drop nodes, network
topologies, and configuration of convergence NEs.
2. Check the time when service interruption occurs and the triggered events, the
time when services recover and the triggered events.
3. Check current and historical alarms and current and historical 15-minute and
24-hour events reported by the NMS and NEs.
4. Check black box records, errlog records, debugbuf logs, and dopra logs.
5. Check records of manual operations, operation records on the NMS, and oplog
records on NEs. Check version information, NE configuration, and board
configuration.
6. Collect fault information by using specific tools.

Locating Faults of Microwave Links
- Common Locating Process
Start
1. Are there any wrong operations? If yes, perform rollbacks.
2. Are there any ODU or IF board faults? If yes, handle alarms.
3. Is Tx power normal? If no, handle the fault.
4. Is Rx power lower than normal? If yes, handle the fault.
5. Does fading cause abnormal Rx power? If yes, handle the fault.
6. Are links faulty unidirectionally? If yes, handle the fault.
7. Locate faults by performing loopbacks.
After each handling step, check whether the faults are rectified. If yes, end. If no,
go to the next step.
End
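The locating process for microwave links can be sketched as an ordered list of checks. The following Python sketch is only an illustration: the class `LinkObservation`, its field names, and the function `triage` are hypothetical, and the action strings are taken from the handling steps in this section.

```python
from dataclasses import dataclass


@dataclass
class LinkObservation:
    """Hypothetical summary of what was observed on a microwave link."""
    wrong_operation: bool = False
    odu_or_if_board_fault: bool = False
    tx_power_normal: bool = True
    rx_power_low: bool = False
    fading_suspected: bool = False
    unidirectional_fault: bool = False


def triage(obs):
    """Return the (step, action) pairs that apply, in the order of the flow."""
    checks = [
        (1, obs.wrong_operation, "Perform rollbacks."),
        (2, obs.odu_or_if_board_fault, "Handle alarms."),
        (3, not obs.tx_power_normal, "Handle abnormal Tx power."),
        (4, obs.rx_power_low, "Handle lower-than-normal Rx power."),
        (5, obs.fading_suspected, "Handle fading."),
        (6, obs.unidirectional_fault, "Handle interference."),
    ]
    actions = [(step, action) for step, hit, action in checks if hit]
    # Step 7 is the fallback when the earlier steps do not rectify the fault.
    actions.append((7, "Locate faults by performing loopbacks."))
    return actions
```

Calling `triage(LinkObservation(rx_power_low=True))` lists step 4 first and the loopback step last, which mirrors the "go to the next step" branch of the flow.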

- Common Symptoms and Causes
Fault Type: Tx power is abnormal.
Common Cause: The ODU is faulty.

Fault Type: Rx power is always lower than the normal value.
Common Causes:
1. Antennas are not aligned.
2. Antennas have different polarization directions.
3. Transmission is blocked by mountains or buildings.
4. Antennas malfunction, or the connection between antennas and ODUs is faulty, for
example, a wet waveguide interface or a loosely installed flexible waveguide.
5. The ODU is faulty.

Fault Type: Slow up-fading causes abnormal Rx power.
Common Cause: There is external interference.

Fault Type: Slow down-fading causes abnormal Rx power.
Common Cause: Fading margin is insufficient.

Fault Type: Fast fading causes abnormal Rx power.
Common Cause: Multipath fading is severe.

Fault Type: Rx power is normal, but the microwave link is faulty unidirectionally.
Common Cause: There is external interference.
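The symptom-to-cause pairs above can be kept as a simple lookup during troubleshooting. This is a minimal Python sketch; the dictionary keys and the function name `common_causes` are hypothetical labels, and the cause strings are the ones listed in this section.

```python
# Keys are hypothetical short labels for the fault types listed above.
COMMON_CAUSES = {
    "tx_power_abnormal": ["The ODU is faulty."],
    "rx_power_always_low": [
        "Antennas are not aligned.",
        "Antennas have different polarization directions.",
        "Transmission is blocked by mountains or buildings.",
        "Antennas malfunction or the antenna-to-ODU connection is faulty.",
        "The ODU is faulty.",
    ],
    "slow_up_fading": ["There is external interference."],
    "slow_down_fading": ["Fading margin is insufficient."],
    "fast_fading": ["Multipath fading is severe."],
    "rx_normal_unidirectional_fault": ["There is external interference."],
}


def common_causes(fault_type):
    """Return the common causes for a fault type, or an empty list if unknown."""
    return COMMON_CAUSES.get(fault_type, [])
```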

- Handling Method
Handling Procedure: 1. Check for incorrect operations.
Handling Method: Focus on:
1. Whether the ODU is powered off
2. Whether the ODU is muted
3. Whether a loopback is performed on the IF board
4. Whether the configuration is consistent at the two ends
5. Whether the configuration matches the models of ODUs and combiners
6. Whether the E1 capacity is consistent at the two ends for the Hybrid microwave

- Handling Method
Handling Procedure: 2. Handle equipment faults.
Handling Method: Focus on:
VOLT_LOS
CONFIG_NOSUPPORT
HARD_BAD
TEMP_ALARM
IF_INPWR_ABN
RADIO_MUTE
RADIO_TSL_HIGH
RADIO_TSL_LOW
RADIO_RSL_HIGH
IF_CABLE_OPEN

Handling Procedure: 3. Handle abnormal Tx power.
Handling Method: Replace the ODU.

- Handling Method
Handling Procedure: 4. Handle lower-than-normal Rx power.
Handling Method:
1. If Rx power declines rapidly and remains lower than normal, check the
installation of antennas and ensure that the azimuth of antennas meets
the planning requirements.
2. Check the antenna direction. Especially, check whether the received
signal is from the main lobe.
3. If antennas are not aligned, align antennas.
4. On a 1+1 HSB microwave link, if the Rx power difference between the
active and standby ODUs at one end is higher than 9 dB (for non-
balanced combiners) or 5 dB (for balanced combiners), perform 1+1
switching or replace ODUs/combiners to determine the faulty component.
5. If the RSL difference between the two ends is higher than 10 dB, replace
ODUs to determine the faulty component.
6. Check the polarization directions of antennas and adjust the incorrect
polarization direction.
7. Replace ODUs/combiners to determine the faulty component.
8. Check whether transmission is blocked by any mountains or buildings.
9. Check the antenna gain at the two ends and replace the antennas that
do not provide required antenna gain.
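The power-difference thresholds in steps 4 and 5 can be checked with a few lines of code. The sketch below uses only the values stated above (9 dB, 5 dB, and 10 dB); the function names and the dBm inputs are hypothetical, and how the readings are obtained from the NMS is not covered.

```python
def hsb_rx_difference_exceeded(active_rx_dbm, standby_rx_dbm, balanced_combiner):
    """Step 4: compare active and standby ODU Rx power at one end of a 1+1 HSB link."""
    # 5 dB for balanced combiners, 9 dB for non-balanced combiners (from the text above).
    limit_db = 5.0 if balanced_combiner else 9.0
    return abs(active_rx_dbm - standby_rx_dbm) > limit_db


def rsl_difference_exceeded(local_rsl_dbm, remote_rsl_dbm):
    """Step 5: compare the RSL at the two ends of the link against 10 dB."""
    return abs(local_rsl_dbm - remote_rsl_dbm) > 10.0
```

If either function returns `True`, the steps above suggest switching or replacing ODUs/combiners to determine the faulty component.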

- Handling Method
Handling Procedure: 5. Handle fading.
Handling Method:
To handle down-fading:
•Increase the installation heights of antennas.
•Reduce the transmission distance.
•Increase the antenna gain.
•Increase Tx power.
To handle fast fading:
Contact the network planning department for appropriate plan changes, such as:
•Adjust the position of the antenna to block the reflected wave or make the
reflection point fall on the ground that has a small reflection coefficient,
reducing multipath fading.
•Configure 1+1 SD for microwave links.
•For microwave links with 1+1 SD, adjust the height difference between two
antennas to make one's Rx power higher than the other's Rx power.
•Increase fading margins by using larger-diameter antennas or raising the
Tx power.
To handle up-fading:
•Check for co-channel interference.
•Use a spectrum analyzer to analyze interference sources.
•Contact the spectrum management department to clear the interference
spectrum, or change plans to minimize the interference.

- Handling Method
Handling Procedure: 6. Handle interference.
Handling Method:
1. Check for co-channel interference.
2. Check for adjacent channel interference.
3. Use a spectrum analyzer to analyze interference sources.
4. Contact the spectrum management department to clear the interference
spectrum, or change plans to minimize the interference.

Handling Procedure: 7. Locate faults by performing loopbacks.
Handling Method:
1. Perform an inloop on the IF port.
2. Replace the IF board if the fault persists.
3. Check cable connectors and redo the substandard ones.
4. Check IF cables and replace those that are wet, broken, or pressed.
5. Replace the ODU.
6. If the fault is rectified after replacement, you can infer that the ODU is faulty.

- Common Alarms
• CONFIG_NOSUPPORT is an alarm indicating that the configuration is not supported.
Possible Causes
• Cause 1: The model and configuration parameters of the ODU do not meet the
requirements.
• Cause 2: On Hybrid microwave links, the configured Tx power of the ODU is beyond the
allowed range. (On Hybrid microwave links, which are composed of IFH2 boards, the maximum
Tx power of ODUs is determined by the IF modulation mode and AM enabling status.)
Handling Procedure
Cause 1: The model and configuration parameters of the ODU do not meet the requirements.
• Check the alarm parameters to determine the configuration parameters that do not meet the
requirements.
• If the alarm parameter is 0x01-0x03, check whether the configuration parameters of the
ODU port meet the requirements of network planning.
• If the alarm parameter is 0x04-0x06, check whether the configuration parameters of the IF
port meet the requirements of network planning. If not, change the parameter settings.
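The alarm parameter ranges for CONFIG_NOSUPPORT can be expressed as a small lookup. This is a minimal Python sketch; the function name `config_nosupport_scope` is hypothetical, and only the two ranges stated above are interpreted.

```python
def config_nosupport_scope(alarm_parameter):
    """Map a CONFIG_NOSUPPORT alarm parameter to the port whose settings to check."""
    if 0x01 <= alarm_parameter <= 0x03:
        return "ODU port"
    if 0x04 <= alarm_parameter <= 0x06:
        return "IF port"
    # Values outside the ranges listed in this document are not interpreted here.
    return "unknown"
```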

- Common Alarms
• RADIO_RSL_LOW is an alarm indicating that the received signal level (RSL) is too low.
Possible Causes
• Cause 1: Certain other alarms occur at the opposite site.
• Cause 2: The opposite Tx power is too low.
• Cause 3: Signal attenuation on the microwave link is heavy.
- Common Alarms
Handling Procedure
• Cause 1: Certain other alarms occur at the opposite site. Check whether any of
the following alarms is reported at the opposite site. If yes, clear the alarm
immediately.
• RADIO_MUTE
• CONFIG_NOSUPPORT
• RADIO_TSL_LOW
• BD_STATUS
• Cause 2: The opposite Tx power is too low.
• Check whether the opposite Tx power is normal. If not, replace the opposite ODU.
• Check whether the opposite NE is powered off.

- Common Alarms
• Cause 3: Signal attenuation on the microwave link is heavy.
• Check whether the alarm is repeatedly reported among historical alarms. If the
alarm is reported occasionally, contact the network planning department to
improve anti-fading performance.
• Check whether the antennas at both ends are aligned. If not, realign the antennas.
• Check whether transmission is blocked by any mountains or buildings. If yes,
contact the network planning department to avoid the obstruction.
• Check whether the polarization direction is set correctly for the antennas, ODUs,
and combiners at both ends. If not, correct the polarization direction.
• Check whether the outdoor units such as antennas, combiners, ODUs, and flexible
waveguides are wet, damp, or damaged. If yes, replace the faulty component.
• Check the antenna gain at the two ends and replace the antennas that do not
provide the required antenna gain.

- Common Alarms
• MW_LOF is an alarm indicating the loss of microwave frames.
Possible Causes
• Cause 1: The microwave link performance degrades.
• Cause 2: The IF working mode of the local site is different from
that of the opposite site.
• Cause 3: The operating frequency of the local ODU is different
from that of the opposite ODU.
• Cause 4: The transmit unit of the opposite site is faulty.
• Cause 5: The receive unit of the local site is faulty.
Handling Procedure
See "Handling Faults of Microwave Links."

- Common Alarms
• MW_FEC_UNCOR is an alarm indicating that uncorrectable
errors exist in the forward error correction (FEC) coding of
microwave frames.
Possible Causes
• Cause 1: The receive power of the ODU is abnormal.
• Cause 2: The transmit unit of the opposite site is faulty.
• Cause 3: The receive unit of the local site is faulty.
• Cause 4: Interference exists.
Handling Procedure
See "Handling Faults of Microwave Links."

- Common Alarms
• HARD_BAD is an alarm indicating hardware errors.
Possible Causes
• Cause 1: Clock tracing is looped.
• Cause 2: The alarmed board has hardware errors.
Handling Procedure
Cause 1: Clock tracing is looped.
• Check the alarm parameter. The value 0x06 indicates that clock
signals are interlocked and therefore the timing loop needs to be
cleared.
Cause 2: The alarmed board has hardware errors.
• Replace the alarmed board.

- Common Alarms
• BD_STATUS is an alarm indicating that the board cannot be detected.
Possible Causes
Cause 1: If the IDU reports the alarm, the possible causes are as follows:
• The board is installed in an incorrect slot.
• The board and the backplane are connected incorrectly.
• The slot that houses the board is faulty.
• The board is faulty.
Cause 2: If the ODU reports the alarm, the possible causes are as follows:
• The IF board reports the HARD_BAD, BD_STATUS, IF_CABLE_OPEN, or VOLT_LOS alarm.
• The ODU is faulty.

- Common Alarms
Handling Procedure
Cause 1: If the IDU reports the alarm, handle the alarm as follows:
• Check whether the physical slot and logical slot of the alarmed board are the same.
• Re-install the alarmed board.
• Install the board in another slot.
• Replace the alarmed board.
Cause 2: If the ODU reports the alarm, handle the alarm as follows:
• Check whether the IF board reports the HARD_BAD, BD_STATUS, IF_CABLE_OPEN, or
VOLT_LOS alarm. If yes, clear the alarm immediately.
• Replace the faulty ODU.

Locating Faults of Ethernet Links - Common Locating Process
Start
1. Determine the boards and physical links that services traverse.
2. Any alarms on the boards? If yes, reset or replace the alarmed boards.
3. If no, check for alarms according to the type of NNI port:
ETH: Any ETH laser alarms? Any ETH physical-layer alarms?
C-STM: Any laser alarms? Any alarms on SDH ports? Any VC-12 alarms?
E1: Any alarms on E1 ports?
MLPPP: Any MLPPP physical-layer alarms? Any MLPPP alarms?
MW: Any alarms on IF ports?
4. If any of these alarms is reported, handle the alarms.

Checking Alarms on Ethernet Links
[Figure: example network in which BTS 1, BTS 2, and BTS 3 connect through RTN NEs
(GE/FE, 10G/GE, STM-1, and ETH links, carrying CES services) and the MPLS core network
to the BSCs.]

ETH_LOS: Loss of optical signals
Possible Causes
1. Fiber cuts 2. Faulty optical modules 3. Excessive optical attenuation

ETH_LINK_DOWN: Connection fault on the network port
Possible Causes
1. Negotiation fails due to different working modes at the two ends. 2. Electrical cables,
fiber connections, or opposite units are faulty.

MAC_FCS_EXC: Excessive bit errors
Possible Causes
1. Excessive bit errors are detected at the MAC layer. 2. Line signals degrade. 3. Fiber
performance deteriorates. 4. Optical ports are dirty.

Common Alarms on Ethernet Ports (1)
• ETH_LOS is an alarm indicating loss of connection on Ethernet ports.
Possible Causes
• Cause 1: The electrical cable or fiber on the Ethernet port is incorrectly connected.
• Cause 2: The electrical cable or fiber on the Ethernet port is faulty.
• Cause 3: The local Rx power is too low.
• Cause 4: The alarmed board is faulty.
Handling Procedure
Cause 1: The electrical cable or fiber on the Ethernet port is incorrectly connected.
Verify that the electrical cable or fiber on the Ethernet port is correctly connected.
Cause 2: The electrical cable or fiber on the Ethernet port is faulty.
Replace the faulty electrical cable or fiber.
Cause 3: The local Rx power is too low.
1. Check for the OUT_PWR_ABN alarm on the opposite NE and clear the alarm immediately if
it is reported.
2. If the alarm persists, clean the receive optical port and fiber connector.
3. If the alarm persists, verify that the flange and optical attenuator are used correctly.
4. If the alarm persists, add or remove optical attenuators to achieve normal Rx power.
Cause 4: The alarmed board is faulty.
Replace the alarmed board. If the alarm persists, replace the mapping board at the
opposite end.

Common Alarms on Ethernet Ports (2)
• ETH_LINK_DOWN is an alarm indicating that the connection on the network port is faulty.
Possible Causes
• Cause 1: Negotiation fails due to different working modes at the two ends.
• Cause 2: An inloop is performed on the port.
• Cause 3: The fiber is connected to an incorrect port.
• Cause 4: A certain board is faulty.
Handling Procedure
Cause 1: Negotiation fails due to different working modes at the two ends.
Verify that the working modes are the same at the two ends.
Cause 2: An inloop is performed on the port.
Check for the LOOP_ALM alarm at the two ends and clear the alarm immediately if it is reported.
Cause 3: The fiber is connected to an incorrect port.
Check whether the fiber on the alarmed port is connected to an incorrect port. If yes, connect the
fiber to the correct port.
Cause 4: A certain board is faulty.
Check for hardware-related alarms (such as HARD_BAD) at the two ends and replace the board
that reports any of these alarms.

Common Alarms on Ethernet Ports (3)
• MAC_FCS_EXC is an alarm indicating that excessive bit errors are detected at the
MAC layer.
Possible Causes
• Cause 1: The line signals deteriorate.
• Cause 2: The input optical power is abnormal.
• Cause 3: The fiber connector is dirty.
Handling Procedure
Cause 1: The line signals deteriorate.
1. Check for the LOOP_ALM alarm on the NMS and clear the alarm immediately if it is
reported.
2. If the alarm persists, check for DoS attacks and eliminate any sources that transmit
a large amount of invalid data.
3. If the alarm persists, verify that the fiber and electrical cable are normal.
Cause 2: The input optical power is abnormal.
Check whether the alarmed port also reports IN_PWR_ABN. If yes, clear the IN_PWR_ABN
alarm immediately.
Cause 3: The fiber connector is dirty.
Clean the fiber connector and the receive optical port.

Checking Alarms on SDH Links
[Figure: example network in which BTS 1, BTS 2, and BTS 3 connect through RTN NEs
(GE/FE, GE/10GE, STM-1, and ETH links, carrying CES services) and the MPLS core network
to the BSCs.]

R_LOS: Loss of optical signals
Possible Causes
1. Fiber cuts 2. Excessive loss on the line 3. Malfunction of opposite transmit units

R_LOC: Loss of clock
Possible Causes
1. Failure in received signals 2. Malfunction of clock extraction modules

R_LOF: Loss of frame
Possible Causes
1. Excessive attenuation of received signals 2. Unframed structure of signals from
the opposite site 3. Malfunction of local receive units

Common Alarms on SDH Ports (1)
• R_LOS is an alarm indicating loss of signals on the receive side of the line.
Possible Causes
• Cause 1 of lasers: The local optical port is not used but the local laser is open.
• Cause 2 of lasers: The local laser is open but the opposite laser is closed, so there is no output
of optical signals.
• Cause 1 of fibers: No pigtail is connected to the local optical port or the pigtail on the local
optical port is connected incorrectly.
• Cause 2 of fibers: Fiber cuts occur.
• Cause 3 of fibers: Rx power is too low.
• Cause 1 of boards: The local receive board is faulty.
• Cause 2 of boards: The opposite transmit board is faulty.
Handling Procedure
Cause 1 of lasers: The local optical port is not used but the local laser is open.
Check the enabling status of the local laser on the NMS and close the laser if it is
open.
Cause 2 of lasers: The local laser is open but the opposite laser is closed, so there is no output of
optical signals.
Check the enabling status of the opposite laser on the NMS and open the laser if
it is closed.
Cause 1 of fibers: No pigtail is connected to the local optical port or the pigtail on the local
optical port is connected incorrectly.
Verify that the pigtail on the local optical port is correctly connected.
Cause 2 of fibers: Fiber cuts occur.
Replace broken fibers.
Cause 3 of fibers: Rx power is too low.
1. Check for the OUT_PWR_ABN alarm on the opposite transmit port and clear the alarm
immediately if it is reported.
2. If the alarm persists, clean the receive optical port and fiber connector.
3. If the alarm persists, verify that the flange and optical attenuator are used correctly.
4. If the alarm persists, add or remove optical attenuators to achieve normal Rx power.
Cause 1 of boards: The local receive board is faulty.
If the local Rx power is normal, set an inloop for the local receive port. If the alarm
persists, the local board is faulty and needs to be replaced.
Cause 2 of boards: The opposite transmit board is faulty.
Replace the opposite transmit board. If the alarm persists, replace the
opposite cross-connect board.

• R_LOF is an alarm indicating loss of frames on the receive side of the line.
Possible Causes
• Cause 1: Different types of optical modules are used at the two ends.
• Cause 2: The receive power of the ODU is abnormal.
• Cause 3: Fibers are misconnected.
• Cause 4: The signals transmitted from the opposite site do not have the frame
structure.
• Cause 5: The local receive board is faulty.
Handling Procedure
Cause 1: Different types of optical modules are used at the two ends.
Verify that optical modules of one type are used at the two ends.
Cause 2: The receive power of the ODU is abnormal.
Check whether the alarmed port also reports IN_PWR_ABN. If yes, clear the IN_PWR_ABN
alarm immediately.
Cause 3: Fibers are misconnected.
Verify that fibers are connected correctly.
Cause 4: The signals transmitted from the opposite site do not have the frame structure.
Check for the HARD_BAD alarm on the opposite transmit board and clear this alarm
immediately if it is reported.
Cause 5: The local receive board is faulty.
Check for the HARD_BAD alarm on the local receive board and clear this alarm
immediately if it is reported.

Checking Alarms on E1 Links
[Figure: example network in which BTS 1, BTS 2, and BTS 3 connect through RTN NEs
(GE/FE, GE/10GE, STM-1, and ETH links, carrying CES services) and the MPLS core network
to the BSCs.]

T_ALOS: Loss of signals
Possible Causes
1. E1/T1 services are not received. 2. Fibers on the DDF-side E1/T1 output ports
are disconnected or loosely connected. 3. Fibers on local E1/T1 output ports
are disconnected or loosely connected. 4. A certain board is faulty. 5. The
electrical cable is faulty.

ALM_E1RAI: Far-end alarm indication
Possible Causes
Some alarms are reported on the opposite site.

Common Alarms on E1 Ports (1)
• T_ALOS is an alarm indicating loss of signals on E1 ports.
Possible Causes
• Cause 1: The opposite site does not transmit any E1 services.
• Cause 2: E1 cables are disconnected or loosely connected.
• Cause 3: The opposite equipment is faulty.
• Cause 4: The electrical cable is faulty.
Handling Procedure
Cause 1: The opposite site does not transmit any E1 services.
Verify that the opposite site transmits E1 services properly.
Cause 2: E1 cables are disconnected or loosely connected.
Verify that E1 cables are correctly connected.
Cause 3: The opposite equipment is faulty.
Perform a self-loop for the alarmed channel on the DDF side. If the alarm clears, the
opposite equipment is faulty and the fault needs to be rectified.
Cause 4: The electrical cable is faulty.
1. Perform a self-loop for the alarmed channel on the DDF side. If the alarm persists,
perform a self-loop for the alarmed channel on the interface board side. If the alarm
clears, the E1 cable is faulty and needs to be replaced.
2. If the alarm persists after the self-loop on the interface board side, set an inloop
for the alarmed channel on the NMS. If the alarm clears, the interface board is faulty
and needs to be replaced.
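The loopback sequence for T_ALOS narrows the fault down from the far end toward the local board. The following Python sketch encodes that order; the function name and the three boolean arguments are hypothetical, and each argument records whether the alarm cleared after the corresponding loopback described above.

```python
def locate_t_alos_fault(clears_on_ddf_loop, clears_on_board_side_loop, clears_on_nms_inloop):
    """Infer the faulty part from the loopback results, in the order they are performed."""
    if clears_on_ddf_loop:
        # Self-loop on the DDF side clears the alarm.
        return "opposite equipment"
    if clears_on_board_side_loop:
        # Self-loop on the interface board side clears the alarm.
        return "E1 cable"
    if clears_on_nms_inloop:
        # Inloop set on the NMS clears the alarm.
        return "interface board"
    # This document gives no conclusion for the remaining case.
    return "undetermined"
```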

• UP_E1_AIS is an alarm indicating an alarm indication signal (AIS) in the upstream E1
signal. This alarm is reported when the upstream E1 signal is all 1s.
Possible Causes
• Cause 1: The opposite site reports the T_ALOS alarm.
• Cause 2: An inloop is set for the E1 port.
• Cause 3: Some boards are faulty.
Handling Procedure
Cause 1: The opposite site reports the T_ALOS alarm.
Check for the T_ALOS alarm on the opposite site and clear this alarm immediately if it is
reported.
Cause 2: An inloop is set for the E1 port.
Check whether the E1 port reports the LOOP_ALM alarm on the NMS. If yes, release the
inloop on the E1 port.
Cause 3: Some boards are faulty.
On the NMS, check whether the local NE and the opposite NE report any hardware-related
alarms such as HARD_BAD. If yes, perform a cold reset for the boards that report
hardware-related alarms. If the alarm persists, replace the boards that may be
faulty.

• DOWN_E1_AIS is an alarm indicating an alarm indication signal (AIS) in the downstream
2 Mbit/s signal. This alarm is reported when the downstream E1 signal is all 1s.
Possible Causes
• Cause 1: The alarmed board also reports the UP_E1_AIS or T_ALOS alarm.
• Cause 2: Some boards are faulty.
Handling Procedure
Cause 1: The alarmed board also reports the UP_E1_AIS or T_ALOS alarm.
Check whether the alarmed board reports the UP_E1_AIS or T_ALOS alarm on the NMS. If
yes, clear the UP_E1_AIS or T_ALOS alarm immediately.
Cause 2: Some boards are faulty.
On the NMS, check whether the alarmed board and local cross-connect board report any
hardware-related alarms such as HARD_BAD. If yes, perform a cold reset for the boards
that report hardware-related alarms. If the alarm persists, replace the boards that may
be faulty.

Common Alarms on Other Links (1)
• IN_PWR_ABN is an alarm indicating that the input optical power is abnormal.
Possible Causes
• Cause 1: The opposite transmit power is abnormal.
• Cause 2: The local receive power is higher than the upper threshold.
• Cause 3: The local receive power is lower than the lower threshold.
• Cause 4: The receive board is faulty.
Handling Procedure
Cause 1: The opposite transmit power is abnormal.
On the NMS, check whether the opposite site reports the OUT_PWR_ABN alarm. If yes, clear
this alarm immediately and check whether the IN_PWR_ABN alarm is cleared. If the alarm
persists, query the local receive power and handle the alarm according to other causes.
Cause 2: The local receive power is higher than the upper threshold.
Add proper optical attenuators to the receive optical port and adjust the input optical power
to a normal value.
Cause 3: The local receive power is lower than the lower threshold.
Verify that the bending radius of the pigtail on the local site is no smaller than 6 cm.
If the alarm persists, use proper optical attenuators and correctly connect the
local optical module. If the alarm persists, replace the optical module and clean
the fiber connectors at the two ends.
Cause 4: The receive board is faulty.
Check whether the processing board and cross-connect board on the local site report
any hardware-related alarms such as HARD_BAD and TEMP_OVER. If yes, replace
the boards that report hardware-related alarms.
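Causes 2 and 3 of IN_PWR_ABN amount to comparing the queried receive power against two thresholds. The sketch below illustrates that comparison; the function names are hypothetical, and the thresholds are passed in by the caller because this document does not state their values. Only the 6 cm bending radius comes from the text above.

```python
def classify_rx_power(rx_dbm, lower_threshold_dbm, upper_threshold_dbm):
    """Classify receive optical power against caller-supplied thresholds."""
    if rx_dbm > upper_threshold_dbm:
        return "too high"  # Cause 2: add proper optical attenuators.
    if rx_dbm < lower_threshold_dbm:
        return "too low"   # Cause 3: check the pigtail, attenuators, module, connectors.
    return "normal"


def pigtail_bend_radius_ok(radius_cm):
    """The bending radius of the pigtail should be no smaller than 6 cm."""
    return radius_cm >= 6.0
```

For example, with hypothetical thresholds of -28 dBm and -8 dBm, `classify_rx_power(-30.0, -28.0, -8.0)` returns `"too low"`.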

• OUT_PWR_ABN is an alarm indicating that the output optical
power is abnormal.
Possible Causes
• Cause 1: The output optical power is too high or too low.
Handling Procedure
Cause 1: The output optical power is too high or too low.
Replace the optical module of the alarmed port.
Replace the alarmed board.

• LOOP_ALM is an alarm indicating that a loopback is set.
Possible Causes
• Cause 1: The port is looped back.
• Cause 2: The service is looped back.
Handling Procedure
Cause 1: The port is looped back.
On the NMS, check whether the alarmed port is looped back. If yes, release the loopback.
Cause 2: The service is looped back.
On the NMS, check whether the service is looped back. If yes, release the loopback. For
Ethernet services, enable the automatic shutdown function for looped-back ports.

• FLOW_OVER is an alarm indicating that the traffic received by the port is
higher than the threshold.
Possible Causes
• Cause 1: The traffic received by the port is higher than the preset threshold
of the port.
Handling Procedure
Cause 1: The traffic received by the port is higher than the preset threshold of
the port.
• Check whether the actual received traffic indicated by the alarm
parameter is higher than the port bandwidth. If yes, reduce the data
transmitted by the opposite site.
• Configure the service on an unused port.

Locating Faults of TDM Services
Start
1. Any equipment alarms? If yes, handle alarms.
2. Any pointer justifications? If yes, handle pointer justifications.
Any alarms or events related to RS errors? If yes, process RS errors on different boards:
3. SDH optical interface boards: Handle RS errors on SDH optical interface boards.
4. IF boards: Handle RS errors on IF boards.
5. STM-1 electrical boards: Handle RS errors on STM-1 electrical interface boards.
6. Any alarms or events related to MS errors or HOP errors? If yes, handle MS errors
and HOP errors.
7. Any alarms related to LOP errors? If yes, handle LOP errors.
If no, locate faults by performing sectional loopbacks.
After each handling step, check whether the faults are rectified. If yes, end. If no,
go to the next step.
End

Common Symptoms and Causes
Fault Type: Equipment faults
Common Causes:
1. Over high board temperature causes bit errors.
2. Clock tracing fails and the upstream link clocks need to be checked.
3. The board reports the HARD_BAD alarm, and clock tracing needs to be checked or
some boards need to be replaced.

Fault Type: Regenerator section (RS) errors
Common Causes:
1. The line is faulty.
• On optical lines, optical power is abnormal, fiber performance deteriorates, or
fiber splice and fiber connectors are dirty.
• On STM-1 electrical lines, electrical cables deteriorate, grounding is incorrect,
or connectors are incorrectly connected.
• On microwave links, the MW_FEC_UNCOR or RPS_INDI alarm is reported.
2. The line board fails.
3. The clock unit fails.
4. Clock quality deteriorates on the network.
5. Clock quality deterioration on the network causes pointer justifications.

Fault Type: There are multiplex section (MS) errors and higher order path (HOP) errors,
but not RS errors.
Common Causes:
1. The line board is faulty.
2. Clock quality deteriorates on the network.
3. Clock quality deterioration on the network causes pointer justifications.
4. Operating temperature on the line board is over high.

Fault Type: There are only lower order path (LOP) errors.
Common Causes:
1. The PDH service processing board or Ethernet service processing board is faulty.
2. The cross-connect board is faulty.
3. The PDH service processing board or Ethernet service processing board has over high
working temperature.
4. The working temperature on the cross-connect board is over high.
5. Unstable power supply, incorrect grounding, or external interference exists.

Handling Method
Procedure: 1. Handle alarms.
Handling Method: Focus on:
TEMP_ALARM
SYN_BAD
HARD_BAD
MW_CFG_MISMATCH

Procedure: 2. Handle pointer justifications.
Handling Method:
1. Analyze and process clock alarms.
2. Ensure that the configuration is correct and fibers are correctly connected.
3. Locate the sites with clock asynchronization by changing clock configuration.
4. Replace the components with poor performance.

Handling Method
Procedure: 3. Handle the RS errors on the SDH optical interface board.
Handling Method:
1. Exchange the fiber cores in the transmit and receive directions on a section of
optical channel. If errors change after the fiber cores are exchanged, the fibers are
faulty or the equipment malfunctions at the two ends.
2. If fibers are faulty, check whether the fiber from the equipment to the optical
distribution frame (ODF) and the fiber that is led out from the telecommunications room
are pressed, and whether any fiber connector is dirty or damaged.
3. If the equipment at the two ends is faulty, locate the fault by performing loopbacks
on optical ports. If the fault persists after the loopback on a site, the line board on
the site is faulty.
4. If the equipment at the two ends is faulty, replace the alarmed board or exchange the
slots of the alarmed board and another working SDH optical interface board. If the alarm
is still reported by the alarmed board, the alarmed board is faulty.

Procedure: 4. Handle the RS errors on the IF board.
Handling Method:
1. Check for the MW_FEC_UNCOR and RPS_INDI alarms.
2. If any of these alarms is reported, clear the alarm immediately.
3. If none of these alarms is reported, replace the IF board.

Handling Method
Procedure: 5. Handle the RS errors on the STM-1 electrical interface board.
Handling Method:
1. Exchange the electrical cables in the receive and transmit directions. If errors
change after the exchange, the electrical cables are faulty or the equipment at the two
ends is faulty.
2. Check whether the electrical cables are grounded properly and whether the cable
connectors and cables are damaged.
3. If the equipment at the two ends is faulty, locate the fault by performing loopbacks
on electrical ports. If the fault persists after a loopback is performed on a site, the
line board on the site is faulty.
4. If the equipment at the two ends is faulty, replace the alarmed board or exchange the
slots of the alarmed board and another working SDH electrical interface board. If the
alarm is still reported by the alarmed board, the alarmed board is faulty.

Procedure: 6. Handle the MS errors and HOP errors.
Handling Method:
1. Perform a loopback on the alarmed board.
2. If the alarm persists, replace the alarmed board.
3. If the alarm clears, replace the transmit line board, which corresponds to the
alarmed board.
4. If the alarm persists after board replacement, check for unstable power supply,
improper grounding, and external interference on the SDH electrical interface board.

Handling Method
Procedure: 7. Handle LOP errors.
Handling Method:
1. Replace PDH service processing boards, Ethernet service processing boards, or
cross-connect boards along the overlapped route of errored services.
2. If the alarm persists after board replacement, check for unstable power supply,
improper grounding, and external interference.

Common Alarms
 The MW_CFG_MISMATCH is an alarm indicating a configuration mismatch on microwave links.
 Possible Causes
 Cause 1: The number of E1 signals is different on both ends of a microwave link (including the number of E1
signals
on the active page and the number of E1 signals on the standby page).
 Cause 2: The AM enabling is different on both ends of a microwave link.
 Cause 3: The IEEE 1588 overhead enabling is different on both ends of a microwave link.
 Cause 4: The modulation mode is different on both ends of a microwave link.
 Cause 5: The channel spacing is different on both ends of a microwave link.
 Handling Procedure
 Cause 1: The number of E1 signals is different on both ends of a microwave link.
Determine the possible cause of the alarm according to the alarm parameters. Then, check the configuration on
both ends of the microwave link. Ensure that the configuration is the same on both ends of the microwave link.
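
The comparison of both link ends can be sketched as a small script. This is only an illustration, assuming the parameters of each end have already been read from the NMS by some other means; the class name, field names, and sample values are hypothetical and do not come from any Huawei interface.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class LinkEndConfig:
    """Hypothetical snapshot of the parameters named in Causes 1 to 5."""
    e1_count_active: int             # Cause 1: E1 signals on the active page
    e1_count_standby: int            # Cause 1: E1 signals on the standby page
    am_enabled: bool                 # Cause 2: AM enabling
    ieee1588_overhead_enabled: bool  # Cause 3: IEEE 1588 overhead enabling
    modulation_mode: str             # Cause 4: modulation mode
    channel_spacing_mhz: float       # Cause 5: channel spacing


def find_mismatches(local, remote):
    """Return {field: (local_value, remote_value)} for every differing field."""
    return {
        f.name: (getattr(local, f.name), getattr(remote, f.name))
        for f in fields(LinkEndConfig)
        if getattr(local, f.name) != getattr(remote, f.name)
    }


# Sample values are invented for the example.
near_end = LinkEndConfig(16, 16, True, False, "16QAM", 28.0)
far_end = LinkEndConfig(16, 16, False, False, "QPSK", 28.0)

mismatches = find_mismatches(near_end, far_end)
for name, (local_value, remote_value) in mismatches.items():
    print(f"{name}: local={local_value!r}, remote={remote_value!r}")
```

Each reported field points to one of the five causes, so the configuration can be corrected on the end that deviates from the network plan.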

Locating Faults of CES Services - Common Locating Process
Start
1. Does HARD_BAD, TEMP_OVER, BUS_ERR, or COMMUN_FAIL occur?
   Yes: Board hardware errors or inter-board communication failure. Reset, reseat, or replace boards.
   No: Go to step 2.
2. Does T_ALOS, R_LOS, or LASER_MOD_ERR occur?
   Yes: Signal loss or degrade. Troubleshoot fibers, optical modules, or network cables. If the fault is not
   rectified, reset, reseat, or replace boards.
   No: Go to step 3.
3. Does MPLS_TUNNEL_LOCV occur?
   Yes: Tunnel faults. Troubleshoot physical links. If the fault is not rectified, troubleshoot the opposite
   equipment.
   No: Go to step 4.
4. Does SYNC_C_LOS or LTI occur?
   Yes: Loss of synchronization clock. Troubleshoot clock faults. If the fault is not rectified, troubleshoot the
   opposite equipment.
   No: Go to step 5.
5. Does CES_LOSPKT_EXC or CES_JTRUDR_EXC occur?
   Yes: Excessive errored packets, lost packets, or jitters. Troubleshoot fibers, optical modules, and
   connections. If the fault is not rectified, change network configurations.
   No: Go to step 6.
6. Are the faults rectified?
   Yes: End.
   No: Contact Huawei engineers.


Common Symptoms
Symptom: CES services are interrupted.
Alarm Reported:
  HARD_BAD, TEMP_OVER, or BUS_ERR
  COMMUN_FAIL
  T_ALOS
  UP_E1_AIS or DOWN_E1_AIS
  R_LOS, LASER_MOD_ERR, or IN_PWR_ABN
  MPLS_TUNNEL_LOCV

Symptom: CES services have errors and the signal quality degrades.
Alarm Reported:
  HARD_BAD, TEMP_OVER, or BUS_ERR
  SYNC_C_LOS or LTI
  CES_LOSPKT_EXC, CES_MISORDERPKT_EXC, CES_STRAYPKT_EXC, CES_MALPKT_EXC, CES_JTRUDR_EXC, or CES_JTROVR_EXC
  LSR_WILL_DIE, IN_PWR_ABN, TEM_HA, or LSR_BCM_ALM

Common Causes
Cause 1: The board carrying CES services cannot work properly due to hardware
errors, over-high temperature, or inter-board communication failure.
Cause 2: The signal transmitted to the processing board or interface board is lost or
degrades.
Cause 3: The tunnel or PW carrying CES services is interrupted.
Cause 4: On the NE, the priority of synchronization clock source is lost, or the
synchronization clock source is lost.
Cause 5: On the PW carrying CES services, the number of lost packets, errored
packets, or jitters within a time unit crosses the threshold.

Handling Method
Cause 1: The board carrying CES services cannot work properly due to hardware errors, over-high temperature, or
inter-board communication failure.
Handle the HARD_BAD, TEMP_OVER, COMMUN_FAIL, or BUS_ERR alarm if any of them is reported.
Cause 2: The signal transmitted to the processing board or interface board is lost or degrades.
Handle the T_ALOS, UP_E1_AIS, DOWN_E1_AIS, R_LOS, LASER_MOD_ERR, LSR_WILL_DIE, IN_PWR_ABN,
TEM_HA, or LSR_BCM_ALM alarm if any of them is reported.
Cause 3: The tunnel or PW carrying CES services is interrupted.
Enable MPLS OAM. Handle the MPLS_TUNNEL_LOCV alarm if it is reported.
Cause 4: On the NE, the priority of synchronization clock source is lost, or the synchronization clock source is lost.
Handle the SYNC_C_LOS or LTI alarm if any of them is reported.
Cause 5: On the PW carrying CES services, the number of lost packets, errored packets, or jitters within a time unit
crosses the threshold.
Handle the CES_LOSPKT_EXC, CES_MISORDERPKT_EXC, CES_STRAYPKT_EXC, CES_JTRUDR_EXC, or
CES_JTROVR_EXC alarm if any of them is reported.
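
The cause-to-alarm mapping above can be expressed as a lookup table. The sketch below is illustrative only: the alarm names are copied from the handling method, while the dictionary layout and the function name are assumptions, and the list of active alarms is presumed to be exported from the NMS beforehand.

```python
# Cause number -> (summary, alarms to handle), transcribed from the handling method.
CES_CAUSES = {
    1: ("Board hardware error, over-high temperature, or inter-board communication failure",
        {"HARD_BAD", "TEMP_OVER", "COMMUN_FAIL", "BUS_ERR"}),
    2: ("Signal to the processing board or interface board is lost or degrades",
        {"T_ALOS", "UP_E1_AIS", "DOWN_E1_AIS", "R_LOS", "LASER_MOD_ERR",
         "LSR_WILL_DIE", "IN_PWR_ABN", "TEM_HA", "LSR_BCM_ALM"}),
    3: ("Tunnel or PW carrying CES services is interrupted",
        {"MPLS_TUNNEL_LOCV"}),
    4: ("Synchronization clock source or its priority is lost",
        {"SYNC_C_LOS", "LTI"}),
    5: ("Lost packets, errored packets, or jitters cross the threshold",
        {"CES_LOSPKT_EXC", "CES_MISORDERPKT_EXC", "CES_STRAYPKT_EXC",
         "CES_JTRUDR_EXC", "CES_JTROVR_EXC"}),
}


def suspected_causes(active_alarms):
    """Map each matching cause number to the active alarms that point to it."""
    active = set(active_alarms)
    return {
        number: sorted(active & alarms)
        for number, (_summary, alarms) in CES_CAUSES.items()
        if active & alarms
    }


# Hypothetical alarm list for one NE.
result = suspected_causes(["LTI", "CES_JTRUDR_EXC", "SOME_OTHER_ALARM"])
for number, alarms in result.items():
    print(f"Cause {number}: {CES_CAUSES[number][0]} -> handle {', '.join(alarms)}")
```

Alarms that are not in the table are ignored, so the result lists only the causes that the handling method covers.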

Common Alarms of CES Services
(1)
 The CES_JTROVR_EXC/CES_JTRUDR_EXC is an alarm indicating the overflow/underflow of CES jitters.
 Possible Causes
 Cause 1: Clock synchronization cannot be performed.
 Cause 2: Link quality deteriorates, causing more jitters.
 Cause 3: The size of buffer area is set to a low value.
 Cause 4: There are too many hops of microwave link on the network side, which generates a large number of jitters.
 Handling Procedure
 Cause 1: Clock synchronization cannot be performed.
 On the NMS, check whether the LTI or other clock alarms are reported. If yes, clear these alarms.
 Cause 2: Link quality deteriorates, causing more jitters.
 Check whether the alarmed port also reports IN_PWR_ABN or TEM_HA. If yes, clear the IN_PWR_ABN or TEM_HA alarm
immediately.
 Cause 3: The size of buffer area is set to a low value.
 On the NMS, increase the size of buffer area if possible.
 Cause 4: There are too many hops of microwave link on the network side, which generates a large number of jitters.
 Reduce the number of hops on the network side.


(2)
The CES_LOSPKT_EXC is an alarm indicating packet loss of CES services.
 Possible Causes
 Cause 2: Parameter settings are different at the two ends of CES services.
 Cause 3: The tunnel or PW carrying CES services is congested.
 Cause 4: The link signal deteriorates or is interrupted due to a fault of cables, optical fibers, or optical modules.
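 Handling Procedure
 Cause 2: Parameter settings are different at the two ends of CES services.
 Set the parameters to the same values at the two ends.
 Cause 3: The tunnel or PW carrying CES services is congested.
 On the NMS, check whether the bandwidth configured for the tunnel or PW is too low and whether the QoS parameters
are set properly. If the bandwidth and QoS settings cannot meet the requirements of CES services, increase the
bandwidth, replan the service trail, or change the QoS settings.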
 Cause 4: The link signal deteriorates or is interrupted due to a fault of cables, optical fibers, or optical modules.
 Verify that electrical cables and fibers are correctly connected to the ports. Clean the fiber connectors and optical
modules. If the alarm persists, replace the cables, fibers, or optical modules that may be faulty.

(3)
 The CES_MALPKT_EXC is an alarm indicating deformed packets of CES services.
 Possible Causes
 Cause 1: Parameters of CES services are set incorrectly.
 Cause 2: The tunnel or PW carrying CES services is congested.
 Cause 3: The link signal deteriorates or is interrupted due to a fault of cables, optical fibers, or optical
modules.
 Handling Procedure
 Cause 1: Parameters of CES services are set incorrectly.
 Modify the incorrect parameter settings on the NMS.
 Cause 2: The tunnel or PW carrying CES services is congested.
 On the NMS, check whether the bandwidth configured for the tunnel or PW is too low and whether the QoS
parameters are set properly. If the bandwidth and QoS settings cannot meet the requirements of CES
services, increase the bandwidth, replan the service trail, or change the QoS settings.
 Cause 3: The link signal deteriorates or is interrupted due to a fault of cables, optical fibers, or optical
modules.
 Verify that electrical cables and fibers are correctly connected to the ports. Clean the fiber connectors and
optical modules. If the alarm persists, replace the cables, fibers, or optical modules that may be faulty.

(4)
 The CES_MISORDERPKT_EXC is an alarm indicating disordered packets of CES services.
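 Possible Causes
 Cause: The tunnel or PW carrying CES services is congested.
 Cause: The link signal deteriorates or is interrupted due to a fault of cables, optical fibers, or optical
modules.
 Handling Procedure
 Cause: The tunnel or PW carrying CES services is congested.
 On the NMS, check whether the bandwidth configured for the tunnel or PW is too low and whether the QoS
parameters are set properly. If the bandwidth and QoS settings cannot meet the requirements of CES
services, increase the bandwidth, replan the service trail, or change the QoS settings.
 Cause: The link signal deteriorates or is interrupted due to a fault of cables, optical fibers, or optical
modules.
 Verify that electrical cables and fibers are correctly connected to the ports. Clean the fiber connectors and
optical modules. If the alarm persists, replace the cables, fibers, or optical modules that may be faulty.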

(5)
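 The CES_STRAYPKT_EXC is an alarm indicating errored packets of CES services.
 Possible Causes
 Cause 1: Parameter settings are different at the two ends of CES services.
 Cause 2: Fibers or cables are connected incorrectly.
 Handling Procedure
 Cause 1: Parameter settings are different at the two ends of CES services.
 Set the parameters to the same values at the two ends.
 Cause 2: Fibers or cables are connected incorrectly.
 Reconnect the fibers or cables correctly.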

Locating Faults of ETH Services - Common Locating Process
Start
1. Does HARD_BAD, TEMP_OVER, BUS_ERR, or COMMUN_FAIL occur?
   Yes: Board hardware errors or inter-board communication failure. Reset, reseat, or replace boards.
   No: Go to step 2.
2. Does ETH_LOS occur?
   Yes: Signal loss or degrade. Troubleshoot fibers, optical modules, or network cables.
   No: Go to step 3.
3. Does ETH_LINK_DOWN occur?
   Yes: Incorrect connections on network ports or port negotiation failure. Change parameter settings on
   ports. If the fault is not rectified, reset, reseat, or replace boards.
   No: Go to step 4.
4. Does LOOP_ALM occur?
   Yes: Loopbacks on ports. Release loopbacks.
   No: Go to step 5.
5. Does FLOW_OVER occur?
   Yes: Service configuration faults. Rectify service configuration faults.
   No: Go to step 6.
6. Are the faults rectified?
   Yes: End.
   No: Contact Huawei engineers.

Common Symptoms
Symptom: Ethernet services are interrupted.
Alarm Reported:
  HARD_BAD, TEMP_OVER, or BUS_ERR
  COMMUN_FAIL
  ETH_LOS, ETH_LINK_DOWN, ETH_AUTO_LINK_DOWN, or LOOP_ALM
  LASER_SHUT or LSR_WILL_DIE

Symptom: Ethernet services have packet loss or errored packets.
Alarm Reported:
  HARD_BAD, TEMP_OVER, or BUS_ERR
  LSR_WILL_DIE
  FLOW_OVER

Common Causes
 Cause 1: The board carrying ETH services cannot work properly due to hardware
errors, over-high temperature, or inter-board communication failure.
 Cause 2: The signal is lost in the receive direction.
 Cause 3: Negotiation between Ethernet ports fails due to incorrect connections
on Ethernet ports.
 Cause 4: Loopbacks are performed for Ethernet ports.
 Cause 5: Traffic limit on Ethernet ports is set to a low value or parameter settings
are different on source and sink ports.

Handling Method
Cause 1: The board carrying ETH services cannot work properly due to hardware errors, over-high
temperature, or inter-board communication failure.
Handle the HARD_BAD, TEMP_OVER, COMMUN_FAIL, or BUS_ERR alarm if any of them is
reported.
Cause 2: The signal is lost in the receive direction.
Handle the ETH_LOS, R_LOS, LASER_SHUT, or LSR_WILL_DIE alarm if any of them is
reported.
Cause 3: Negotiation between Ethernet ports fails due to incorrect connections on Ethernet ports.
Handle the ETH_LINK_DOWN alarm if it is reported.
Cause 4: Loopbacks are performed for Ethernet ports.
Handle the LOOP_ALM or ETH_EFM_LOOPBACK alarm if any of them is reported.
Cause 5: Traffic limit on Ethernet ports is set to a low value or parameter settings are different on
source and sink ports.
1. Handle the FLOW_OVER or ETH_CFM_UNEXPERI alarm if any of them is reported.
2. Check whether the working modes of interconnected Ethernet ports are the same.

Locating Tunnel Faults - Common
Locating Process
1. Perform tunnel ping tests. If the ping tests are successful, the tunnel layer is normal.
2. Perform TraceRoute tests to locate faulty NEs and links.
3. The chip checks tunnel labels along the service flow. If there are incorrect NE labels, inform users of the
incorrect NE labels and suggest modifications.
4. Start link-layer detection.

Locating Tunnel Faults - Common
Symptoms and Causes
Common Symptoms
 MPLS tunnels cannot be created, and therefore services cannot be provisioned.
 MPLS tunnels are faulty, causing service interruption.
 Protection switching fails, causing service interruption, packet loss, or bit errors.
Common Causes
 Cause 1: Cross-connections cannot be created.
 Cause 2: The physical links carrying the tunnels are faulty.
 Cause 3: Protection switching fails.

Locating Tunnel Faults - Handling
Method
Cause 1: Cross-connections cannot be created.
1. Check the IP address of each NE on the LSP. If the IP addresses of two NEs are on the
same network segment, change the IP addresses to values on different network segments.
2. Check whether incompatible features are configured for the tunnel.
3. Check whether the number of created tunnels reaches the maximum value. If yes,
replan tunnels or delete redundant tunnels.
Cause 2: The physical links carrying the tunnels are faulty.
1. Handle the HARD_BAD, R_LOS, ETH_LOS, MPLS_TUNNEL_BDI, MPLS_TUNNEL_Excess,
MPLS_TUNNEL_FDI, or MPLS_TUNNEL_LOCV alarm if any of them is reported.
2. Check whether any exceptions (such as board failure or NE reset) occur on the opposite
equipment. If yes, handle the exceptions.
Cause 3: Protection switching fails.
1. MPLS APS protection switching fails. Handle the failure.
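
The network-segment check in Cause 1 can be illustrated with the Python standard library. This is a rough sketch: the NE names, addresses, and prefix lengths are invented, and the function name is hypothetical. It only reports which NE pairs fall on the same segment so that their addresses can be replanned.

```python
import ipaddress
from itertools import combinations


def same_segment_pairs(ne_addresses):
    """Return pairs of NE names whose addresses are on the same network segment.

    ne_addresses maps an NE name to an "address/prefix" string.
    """
    networks = {
        name: ipaddress.ip_interface(addr).network
        for name, addr in ne_addresses.items()
    }
    return [
        (a, b)
        for a, b in combinations(sorted(networks), 2)
        if networks[a].overlaps(networks[b])
    ]


# Hypothetical NEs along one LSP.
lsp_nes = {
    "NE1": "10.0.1.5/24",
    "NE2": "10.0.1.9/24",
    "NE3": "10.0.2.1/24",
}
print(same_segment_pairs(lsp_nes))
```

For the sample data, NE1 and NE2 share the 10.0.1.0/24 segment and would be reported as a conflicting pair.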

Locating Tunnel Faults – Common
Alarms (1)
 The MPLS_TUNNEL_LOCV is an alarm indicating the loss of tunnel connectivity.
 Possible Causes
 Cause 1: The ingress node on the tunnel stops transmitting CV/FFD packets.
 Cause 2: The physical link carrying the tunnel is faulty.
 Cause 3: Some boards on the ingress node are being reset.
 Cause 4: The service interface is configured incorrectly.
 Cause 5: Severe congestion occurs on the network.
 Cause 6: The CPU is highly occupied and cannot process ARP protocol packets.
 Handling Procedure
 Cause 1: The ingress node on the tunnel stops transmitting CV/FFD packets.
 1. Check whether the settings of detection mode and detection packet type are consistent on the two ends.
If not, make consistent settings.
 2. Check the parameter of CV/FFD status on the ingress node. If the CV/FFD status is disabled, change it to
enabled.
 Cause 2: The physical link carrying the tunnel is faulty.
 On the NMS, check whether the egress node reports the HARD_BAD, ETH_LOS, or ETH_LINK_DOWN alarm. If
yes, clear this alarm.

Cause 3: Some boards on the ingress node are being reset.
On the NMS, check whether the ingress node reports the COMMUN_FAIL alarm. If yes,
clear this alarm.
Cause 4: The service interface is configured incorrectly.
Check whether the tunnel is configured on a proper port according to the NE planning
table.
Cause 5: Severe congestion occurs on the network.
Check the bandwidth utilization of each port on the LSP. If the bandwidth of some ports
is exhausted, allocate some traffic to other links or increase the bandwidth of
congested ports.
Cause 6: The CPU is highly occupied and cannot process ARP protocol packets.
Check for the CPU_BUSY alarm on the NMS and clear this alarm immediately if it is
reported.
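
The bandwidth check for Cause 5 can be sketched as follows. The port names, traffic figures, and the 95% threshold are assumptions chosen for illustration; the source only says to look for ports whose bandwidth is exhausted.

```python
def congested_ports(ports, threshold=0.95):
    """Return the names of ports whose utilization reaches the threshold.

    ports maps a port name to (used_mbps, capacity_mbps).
    threshold is a hypothetical cut-off, not a value from the source.
    """
    return [
        name
        for name, (used_mbps, capacity_mbps) in ports.items()
        if capacity_mbps > 0 and used_mbps / capacity_mbps >= threshold
    ]


# Hypothetical measurements for the ports along one LSP.
lsp_ports = {
    "NE1-port-1": (98.0, 100.0),
    "NE2-port-3": (40.0, 100.0),
    "NE3-port-2": (950.0, 1000.0),
}
print(congested_ports(lsp_ports))
```

Ports returned by the function are candidates for moving traffic to other links or for a bandwidth increase.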

Locating Tunnel Faults – Common
Alarms (2)
 The MPLS_TUNNEL_BDI/MPLS_TUNNEL_FDI is an alarm indicating defects
in the backward/forward direction of a tunnel.
 Possible Causes
 Cause: The upstream NE detects that the tunnel at the physical layer is faulty.
 Handling Procedure
 Cause: The upstream NE detects that the tunnel at the physical layer is faulty.
On the physical link between the local NE and its upstream NE, check for
faults such as fiber cuts, optical module failures, and board failures. Rectify
the fault if any.

Locating PW Faults - Common
Symptoms and Causes
Common Symptoms
1. PWs cannot be created, and therefore services cannot be provisioned.
2. PWs are faulty, causing service interruption, packet loss, or bit errors.
Common Causes
Cause 1: The physical link carrying the PW is faulty.
Cause 2: Cross-connections of PWs cannot be created.
Cause 3: The tunnels carrying PWs are faulty.

Locating PW Faults - Handling
Method
Cause 1: The physical link carrying the PW is faulty.
1. Check whether the physical link between the ingress and egress nodes is
normal. Handle the HARD_BAD, LASER_MOD_ERR, R_LOS, or ETH_LOS alarm if
any of them is reported.
2. Check whether any exceptions (such as board failure or NE reset) occur on the
opposite equipment. If yes, handle the exceptions.
Cause 2: Cross-connections of PWs cannot be created.
1. Check whether the number of created PWs reaches the maximum value. If
yes, replan PWs or delete redundant PWs.
Cause 3: The tunnels carrying PWs are faulty.
1. Handle the faults on tunnels.

Locating PW Faults – Common
Alarms
 The PW_DROPPKT_EXC is an alarm indicating that the
number of lost packets on the PW crosses the threshold.
 Possible Causes
 Cause 1: A small number of packets are lost on the PW.
 Handling Procedure
 Cause 1: A small number of packets are lost on the PW.
 Check whether any service ports on the PW are congested. If yes, replan
the trail of services or increase the bandwidth of congested ports.

Locating Faults of 1+1 Protection -
Common Symptoms
Fault Symptoms
1+1 protection switching cannot be triggered.
After the working channel of a 1+1 protection group is restored, services cannot
be switched from the protection channel to the working channel.
The following hardware- and service-related alarms occur:
POWER_FAIL, VOLT_LOS, RADIO_TSL_HIGH, RADIO_TSL_LOW,
RADIO_RSL_HIGH, IF_INPWR_ABN, CONFIG_NOSUPPORT, R_LOC, R_LOF,
R_LOS, MW_LOF, HARD_BAD, MW_RDI
The packet services transmitted on the Hybrid microwave link are unavailable.
1+1 protection switching is delayed.

Common Causes
Possible Causes
Cause 1: The 1+1 protection group is in forced switching state.
Cause 2: The 1+1 protection group works in non-revertive mode or works in RDI state.
Cause 3: Hardware-related alarms occur.
Cause 4: Connections between the IF board and the EMS6 board are incorrect, or the cable
connectors are in poor contact.
Cause 5: Switching is triggered again upon the RDI alarm; anti-jitter function is performed
upon service alarms and the RDI alarm; the NE is being reset; the switching
between active and standby SCC boards is being performed.
Cause 6: IF cables are connected incorrectly.

Handling Method
Handling Procedure
Cause 1: The 1+1 protection group is in forced switching state.
Clear the forced switching state.
Cause 2: The 1+1 protection group works in non-revertive mode or works in RDI state.
Set the revertive mode of the protection group to revertive.
Cause 3: Hardware-related alarms occur.
Handle these alarms.
Cause 4: Connections between the IF board and the EMS6 board are incorrect, or the cable connectors
are in poor contact.
Re-connect the network cables between the IF board and the EMS6 board or use new cable
connectors.
Cause 5: Switching is triggered again upon the RDI alarm; anti-jitter function is performed upon
service alarms and the RDI alarm; the NE is being reset; the switching between active and
standby SCC boards is being performed.
Perform the 1+1 switching 30 minutes later.
Cause 6: IF cables are connected incorrectly.
Connect IF cables correctly.

Common Alarms
 The RPS_INDI is a microwave protection switching alarm
indication.
Possible Causes
 Services are transmitted on the standby channel.
Handling Procedure
 Troubleshoot the working channel.
 Set the revertive mode of the 1+1 protection group to revertive.

Locating Faults of SNCP Protection -
Fault Symptoms
 SNCP switching fails.
 The following hardware- and service-related events occur: performance
events of SDH SNCP protection switching
Possible Causes
Cause 1: SNCP switching fails because the NE software version mismatches the
board software version.
Cause 2: The working and protection channels of an SNCP protection group fail.
Cause 3: TU_AIS insertion upon E1_AIS is not provided (for OptiX RTN 600 V100R005
and OptiX RTN 900 V100R002C01 and later versions).

Handling Method
Handling Procedure
Cause 1: SNCP switching fails because the NE software version mismatches the board
software version.
Upgrade the NE software or board software.
Cause 2: The working and protection channels of an SNCP protection group fail.
Troubleshoot the channels.
Cause 3: TU_AIS insertion upon E1_AIS is not provided (for OptiX RTN 600 V100R005 and
OptiX RTN 900 V100R002C01 and later versions).
Set the TU_AIS insertion upon E1_AIS on the NMS.

Common Alarms
 The PS is an alarm indicating protection switching.
Possible Causes
 Services are transmitted on the standby channel.
Handling Procedure
1. Troubleshoot the active channel.
2. Set the revertive mode of the SNCP protection group to
revertive.

Locating APS Faults - Common
Locating Process
Start
1. Does ETH_APS_PATH_MISMATCH occur?
   Yes: The working and protection channels of an APS group differ on the two ends.
   If the configurations differ on the two ends, change the configurations to the same.
   If the configurations are the same, check whether fibers or cables are connected incorrectly.
   If yes, reconnect the fibers or cables.
   No: Go to step 2.
2. Does ETH_APS_LOST occur?
   Yes: APS frames are lost on the protection channel.
   If the configurations are not the same on the two ends, change the configurations to the same.
   If the APS protocol is not enabled on both ends, enable the APS protocol on both ends.
   No: Go to step 3.
3. Do hardware alarms occur?
   Yes: Rectify board hardware faults.
   No: Go to step 4.
4. Do clock alarms occur?
   Yes: Troubleshoot clocks.
   No: Go to step 5.
5. Do tunnel-level alarms occur on the protection channel?
   Yes: Troubleshoot the protection channel.
   No: Go to step 6.
6. Are the faults rectified?
   Yes: End.
   No: Contact Huawei engineers.

Symptoms
Symptom: The APS protection group is configured incorrectly or APS frames cannot be received.
Alarm Reported:
  ETH_APS_PATH_MISMATCH
  ETH_APS_LOST
  ETH_APS_SWITCH_FAIL
  ETH_APS_TYPE_MISMATCH

Symptom: The working tunnel or protection tunnel is faulty.
Alarm Reported:
  MPLS_TUNNEL_LOCV
  MPLS_TUNNEL_MISMERGE
  MPLS_TUNNEL_MISMATCH
  MPLS_TUNNEL_Excess
  MPLS_TUNNEL_SD
  MPLS_TUNNEL_SF
  MPLS_TUNNEL_UNKNOWN

Causes
 Cause 1: The settings of the APS protection group differ between the two ends.
 Cause 2: The APS protection group is deactivated.
 Cause 3: Fibers or cables are connected incorrectly.
 Cause 4: APS frames cannot be transmitted because hardware-related alarms occur
on the board that carries the protection channel.
 Cause 5: The system reports clock alarms.
 Cause 6: The working tunnel or protection tunnel is faulty.

Locating APS Faults - Handling
Method
Cause 1: The settings of the APS protection group differ between the two ends.
Check for the ETH_APS_PATH_MISMATCH and ETH_APS_TYPE_MISMATCH alarms.
If any of them is reported, handle the alarm.
Cause 2: The APS protection group is deactivated.
Check for the ETH_APS_LOST and ETH_APS_SWITCH_FAIL alarms. If any of them
is reported, handle the alarm.
Cause 3: Fibers or cables are connected incorrectly.
Reconnect the fibers or cables.

Cause 4: APS frames cannot be transmitted because hardware-related alarms occur on
the board that carries the protection channel.
Check whether any hardware-related alarms (such as HARD_BAD,
COMMUN_FAIL, and BUS_ERR) occur on the board that carries the protection
channel. If yes, clear these alarms.
Cause 5: The system reports clock alarms.
Check whether the system reports clock alarms such as TR_LOC, SYNC_C_LOS,
and LTI. If yes, clear these alarms.
Cause 6: The working tunnel or protection tunnel is faulty.
Check for tunnel-level alarms. If a tunnel reports a tunnel-level alarm, the
tunnel is faulty. Troubleshoot the tunnel.

Common APS Alarms (1)
The ETH_APS_LOST is an alarm indicating that APS frames are lost.
Possible Causes
Cause 1: The opposite NE is not configured with APS protection.
Cause 2: The settings of the APS protection group differ between the two ends.
Cause 3: The APS protection group is deactivated.
Cause 4: The service on the protection channel is interrupted.
Handling Procedure
Cause 1: The opposite NE is not configured with APS protection.
On the NMS, check whether the opposite NE is configured with APS protection. If the opposite NE is
not configured with APS protection, create a matching APS protection group on the opposite NE and
activate the APS protocol.
Cause 2: The settings of the APS protection group differ between the two ends.
On the NMS, check whether the settings of the APS protection group are the same at the two ends.
If the settings differ between the two ends, make them the same.
Cause 3: The APS protection group is deactivated.
Check whether the APS protocol is activated at both ends. If the APS protocol is deactivated at one
end, deactivate the APS protocol at the other end and then activate the APS protocol at both ends.
Cause 4: The service on the protection channel is interrupted.
Check whether the protection channel reports an alarm related to signal loss or signal degrade,
such as ETH_LOS. If yes, clear the alarm immediately.

Common APS Alarms (2)
The ETH_APS_SWITCH_FAIL is an alarm indicating a protection switching failure.

Possible Causes

Handling Procedure
On the NMS, check whether the settings of the APS protection group are the same at the two ends. If the
settings differ between the two ends, change them to the same. Then, deactivate and activate the APS
protection group at the two ends.
The ETH_APS_TYPE_MISMATCH is an alarm indicating a protection scheme mismatch.


Possible Causes
Cause 1: The switching type is different.

Cause 2: The switching mode is different.


Cause 3: The revertive mode is different.


Handling Procedure
Cause: The switching type, switching mode, or revertive mode of the protection group differs between the
two ends.
On the NMS, check whether the settings of the APS protection group are the same at the two ends. If the
settings differ between the two ends, change them to the same. Then, deactivate and activate the APS
protection group at the two ends.

Locating ETH LAG Faults - Common
Locating Process

Symptoms
Symptom: The LAG is invalid, all the member ports cannot be used, and the services are interrupted.
Alarm Reported:
  LAG_DOWN

Symptom: The member ports in the LAG cannot be used, and the service has packet loss.
Alarm Reported:
  LAG_MEMBER_DOWN
  LOOP_ALM
  ETH_LOS
  ETH_LINK_DOWN

Causes
 Cause 1: The NEs at the two ends of the LAG are incorrectly
configured.
 Cause 2: The working mode of the member ports in the LAG is set to
half-duplex.
 Cause 3: The loopback is configured on the member ports in the
LAG.
 Cause 4: The connections of the member ports in the LAG are faulty
or lost.

Locating ETH LAG Faults -
Handling Method
Cause 1: The NEs at the two ends of the LAG are incorrectly configured.
(1) Query current alarms and check whether the LAG_DOWN or LAG_MEMBER_DOWN alarm
exists.
(2) Check whether the configurations of the NEs at the two ends of the LAG are consistent. If
the configurations are inconsistent, make them the same, and then check
whether the alarm is cleared.
Cause 2: The working mode of the member ports in the LAG is set to half-duplex.
Check whether the working mode of each member port in the LAG is set to half-duplex. If the
working mode is set to half-duplex, modify the working mode of each port to full-duplex.
Cause 3: The loopback is configured on the member ports in the LAG.
(1) Check whether the LOOP_ALM alarm exists on each member port in the LAG. If yes, release
the loopback on each port to clear the LOOP_ALM alarm.
(2) Check whether the ETH_EFM_LOOPBACK alarm exists on each member port in the LAG. If
yes, release the remote loopback to clear the ETH_EFM_LOOPBACK alarm.
Cause 4: The connections of the member ports in the LAG are faulty or lost.
Check whether the ETH_LOS or ETH_LINK_DOWN alarm exists on each member port in the LAG.
If yes, clear the ETH_LOS or ETH_LINK_DOWN alarm.
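
The per-port checks for Causes 2 to 4 can be sketched as one routine. The data structure and function name are hypothetical; the duplex values and alarm sets would have to be collected from the NMS by other means.

```python
from dataclasses import dataclass, field


@dataclass
class MemberPort:
    """Hypothetical view of one LAG member port."""
    name: str
    duplex: str                       # "full", "half", or "auto"
    alarms: set = field(default_factory=set)


def check_member(port):
    """Return the handling hints from Causes 2 to 4 that apply to the port."""
    findings = []
    if port.duplex == "half":
        findings.append("Cause 2: change the working mode to full-duplex")
    if "LOOP_ALM" in port.alarms:
        findings.append("Cause 3: release the loopback on the port")
    if "ETH_EFM_LOOPBACK" in port.alarms:
        findings.append("Cause 3: release the remote loopback")
    if port.alarms & {"ETH_LOS", "ETH_LINK_DOWN"}:
        findings.append("Cause 4: clear the ETH_LOS or ETH_LINK_DOWN alarm")
    return findings


# Invented sample ports.
members = [
    MemberPort("port-1", "half", {"LOOP_ALM", "ETH_LOS"}),
    MemberPort("port-2", "full"),
]
for member in members:
    print(member.name, check_member(member) or "no findings")
```

Cause 1 (inconsistent configuration at the two ends) is not covered here because it requires data from both NEs.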
Common ETH LAG Alarms (1)
 The LAG_DOWN is an alarm indicating that the LAG is unavailable.
Possible Causes
Cause 1: The opposite NE is not configured with any LAGs.

Cause 2: All member ports in the LAG are unavailable.


Handling Procedure
Cause 1: The opposite NE is not configured with any LAGs.
On the NMS, check whether the opposite NE is configured with a LAG. If the
opposite NE is not configured with a LAG, configure one on the opposite NE
and check whether the alarm clears.
Cause 2: All member ports in the LAG are unavailable.
When a member port in the LAG is unavailable, the system generates an
ETH_LOS, ETH_LINK_DOWN, or LAG_MEMBER_DOWN alarm. Handle and
clear the alarm and activate the member port.

Common ETH LAG Alarms (2)
The LAG_MEMBER_DOWN is an alarm indicating that a member port of a LAG is unavailable.

Possible Causes
Cause 1: The port link is unavailable.

Cause 2: The port receives no LACP packet.


Cause 3: The port works in half-duplex mode.




Handling Procedure
Cause 1: The port link is unavailable.
On the NMS, check whether the port in the LAG is enabled. If the port is not enabled, enable the port in
the LAG and check whether the alarm clears. If the alarm persists, check whether an
ETH_AUTO_LINK_DOWN alarm occurs on the port that reports the LAG_MEMBER_DOWN alarm. If yes,
clear the ETH_AUTO_LINK_DOWN alarm first.
Cause 2: The port receives no LACP packet.
On the NMS, check whether the opposite port is added to the LAG. If the opposite port is not added to
the LAG, add the opposite port to the LAG and check whether the alarm clears. If the alarm persists,
check whether an ETH_LOS or FLOW_OVER alarm occurs on the port that reports the
LAG_MEMBER_DOWN alarm. If yes, clear the ETH_LOS or FLOW_OVER alarm first.
Cause 3: The port works in half-duplex mode.
Change the working mode of the port to auto-negotiation or full-duplex.
Release the loopback on the port.
Locating Clock Faults - Common
Symptoms and Causes
Fault Symptoms
 The service has bit errors or is interrupted.
 The system control, cross-connect, and timing board reports an
EXT_SYNC_LOS/LTI/S1_SYN_CHANGE/SYNC_C_LOS/SYNC_DISABLE alarm.
Possible Causes
 Cause 1: The priority of the synchronous clock source on the service board is
absent from the priority list.
 Cause 2: The synchronous clock source is lost and the clock of the NE works
improperly.
 Cause 3: The clock source is switched in SSM mode and the clock source
traced by the NE is also switched.
 Cause 4: The signals of the synchronous clock source are degraded.
 Cause 5: The external clock source is lost.
 Cause 6: The settings of clock tracing are incorrect.
Locating Clock Faults - Handling
Method
Handling Procedure
Cause 1: The priority of the synchronous clock source on the service board is absent from the
priority list.
Check for the SYNC_C_LOS alarm. If the SYNC_C_LOS alarm occurs, clear the SYNC_C_LOS
alarm.
Cause 2: The synchronous clock source is lost and the clock of the NE works improperly.
Check for the LTI alarm. If the LTI alarm occurs, clear the LTI alarm.
Cause 3: The clock source is switched in SSM mode and the clock source traced by the NE is
also switched.
Check for the S1_SYN_CHANGE alarm. If the S1_SYN_CHANGE alarm occurs, clear the
S1_SYN_CHANGE alarm.
Cause 4: The signals of the synchronous clock source are degraded.
Select a different clock source (by performing a clock source switchover or reconfiguring the
clock source priority list) and find the causes of signal degrade along the clock tracing path.
Cause 5: The external clock source is lost.
Check for the EXT_SYNC_LOS alarm. If the EXT_SYNC_LOS alarm occurs, clear the
EXT_SYNC_LOS alarm.
Cause 6: The settings of clock tracing are incorrect.
Set clock tracing again according to network planning information.

Common Clock Alarms (1)
 The EXT_SYNC_LOS is an alarm indicating the loss of the external
clock source.
Possible Causes
Cause 1: The external clock source is configured in the clock source
priority list, but the external clock source cannot be detected or becomes
invalid.
Handling Procedure
Cause 1: The external clock source is configured in the clock source priority
list, but the external clock source cannot be detected or becomes invalid.
Check whether the equipment that provides the external clock source is
faulty, and check whether the cable that connects the external clock
source is normal.

The LTI is an alarm indicating that the synchronous clock source is lost.
Possible Causes
Cause 1: The clock configuration is incorrect.
Cause 2: All the clock sources in the clock source priority list fail.
Handling Procedure
Cause 1: The clock configuration is incorrect.
Query the clock synchronization status and check whether the data in the clock
source priority list meets the network planning requirement.
Cause 2: All the clock sources in the clock source priority list fail.
Troubleshoot the synchronization sources based on the clock source priority list.
If the synchronization source is an external clock, handle the EXT_SYNC_LOS alarm.
If the synchronization source is a line clock, handle the alarm that occurs on the line board.
If the synchronization source is an IF clock, handle the alarm that occurs on the IF board.
If the synchronization source is a tributary clock, handle the alarm that occurs on the tributary board.
If the synchronization source is an Ethernet clock, handle the alarm that occurs on the Ethernet board.
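
The mapping from clock source type to follow-up action can be written as a lookup table. This is a minimal sketch: the keys and the function name are hypothetical labels, and the priority list is assumed to have been read from the NE beforehand.

```python
# Clock source type -> follow-up action, transcribed from the LTI handling procedure.
CLOCK_SOURCE_FOLLOW_UP = {
    "external": "Handle the EXT_SYNC_LOS alarm.",
    "line": "Handle the alarm that occurs on the line board.",
    "if": "Handle the alarm that occurs on the IF board.",
    "tributary": "Handle the alarm that occurs on the tributary board.",
    "ethernet": "Handle the alarm that occurs on the Ethernet board.",
}


def lti_follow_up(priority_list):
    """Return (source_type, action) pairs in the order of the priority list."""
    steps = []
    for source_type in priority_list:
        action = CLOCK_SOURCE_FOLLOW_UP.get(source_type.lower())
        if action is None:
            raise ValueError(f"unknown clock source type: {source_type!r}")
        steps.append((source_type, action))
    return steps


# Hypothetical priority list, highest priority first.
for source_type, action in lti_follow_up(["IF", "external"]):
    print(f"{source_type}: {action}")
```

Working through the sources in priority order matches the instruction to troubleshoot based on the clock source priority list.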
 The S1_SYN_CHANGE is an alarm indicating that the clock source is
switched in SSM or extended SSM mode.
Possible Causes
Cause 1: The original clock source is lost when the SSM protocol or extended

SSM protocol is enabled.

Handling Procedure
Cause 1: The original clock source is lost when the SSM protocol or extended
SSM protocol is enabled.
Handle the SYNC_C_LOS alarm that is related to the original clock source.

 The SYNC_C_LOS is an alarm indicating that the synchronization
source is lost.
Possible Causes
Cause 1: The clock source is lost.

Handling Procedure
Cause 1: The clock source is lost.
Based on the clock source priority list, determine the synchronization
source corresponding to the lost clock source.

 The SYNC_DISABLE is an alarm indicating that the automatic
synchronization of SCC boards is disabled.
Possible Causes
Cause 1: The status of the automatic synchronization of SCC boards

changes from enabled to disabled.

Handling Procedure
Cause 1: The status of the automatic synchronization of SCC boards
changes from enabled to disabled.
Change the status of the automatic synchronization of SCC boards from
disabled to enabled. Then, check whether the alarm clears. If the alarm
persists, replace the board that reports the alarm.

Locating Inband DCN Faults - Common
Locating Process

Locating Inband DCN Faults - Common
Symptoms and Causes
Common Symptoms
 The communication between the NMS and the NE is interrupted. The NE icon on the
NMS is gray, and the NE is unreachable to the NMS.
 The NMS does not respond to operations. If the NMS fails to respond for more than
two minutes, the communication between the NMS and the NE is interrupted.
 When you query certain information on the NMS, the query result contains
incomplete information.
Common Causes
 Cause 1: On a network, the NE IDs, NE IP addresses or subnet masks conflict.
 Cause 2: The inband DCN port of the faulty NE is not enabled, or parameter settings
for the interconnected ports are inconsistent.
 Cause 3: The physical connection between the faulty NE and the NMS is interrupted.
 Cause 4: The received signals of the faulty NE are lost, or the received optical power
is excessively low, and therefore the DCN packets cannot be extracted.
 Cause 6: A DCN storm or DCN interruption occurs as the third-party network that the
DCN packets traverse is faulty.
 Cause 7: The bandwidth configured for the inband DCN channel is excessively small.
 Cause 8: The SCC board on the faulty NE is being reset or switched, and therefore
the NE cannot respond to inband DCN packets.

Locating Inband DCN Faults - Handling
Method
Cause 1: On a network, the NE IDs, NE IP addresses or subnet masks conflict.
This conflict is usually introduced by an NE newly added to the network. According to the NE plan table, check whether the
NE ID, NE IP address and subnet mask of the new NE are correctly configured. If any parameters are
incorrect or conflict with the configuration of another NE, re-configure these parameters.
Cause 2: The inband DCN port of the faulty NE is not enabled, or parameter settings for the
interconnected ports are inconsistent.
(1) Check whether the ports, which support the DCN function by default, are connected to fibers or
cables. If the fibers or cables are not connected to the ports whose DCN function is enabled by default,
change the present port to a port whose DCN function is enabled by default.
(2) Check whether the ports at the two ends of the link are enabled. If not, enable the inband DCN for
the ports.
(3) Check whether the configurations of the ports at the two ends are consistent, such as the working
mode of the Ethernet port. If inconsistent, modify the configurations to match each other.
Cause 3: The physical connection between the faulty NE and the NMS is interrupted.
Check whether the network cables or fibers of the faulty NE are disconnected from the ports. If the
network cables or fibers are disconnected from the ports, insert the network cables or fibers again.
Cause 4: The received signals of the faulty NE are lost, or the received optical power is excessively low,
and therefore the DCN packets cannot be extracted.
Check whether the R_LOS, ETH_LOS, or IN_PWR_ABN alarm exists on the board configured with the
inband DCN channel. If the alarm exists, clear it.
Check whether the HARD_BAD or TEMP_OVER alarm exists on the board configured with the inband
DCN channel. If the alarm exists, replace the board that reports the alarm.
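The check against the NE plan table described under Cause 1 can be sketched as a duplicate search. This is a minimal illustration, assuming the plan table has been exported to a list of records; the field names (`name`, `ne_id`, `ip`) and the sample values are hypothetical, and subnet mask consistency is not checked here.

```python
from collections import defaultdict


def find_conflicts(plan):
    """Return {(field, value): [NE names]} for values used by more than one NE.

    `plan` is a list of dicts with the hypothetical keys 'name', 'ne_id', and 'ip'.
    """
    seen = defaultdict(list)
    for ne in plan:
        for field in ("ne_id", "ip"):
            seen[(field, ne[field])].append(ne["name"])
    return {key: names for key, names in seen.items() if len(names) > 1}


# Illustrative plan table; the names, IDs, and addresses are made up.
plan = [
    {"name": "NE-A", "ne_id": 101, "ip": "129.9.0.101"},
    {"name": "NE-B", "ne_id": 102, "ip": "129.9.0.102"},
    {"name": "NE-C", "ne_id": 102, "ip": "129.9.0.103"},  # duplicate NE ID
]
conflicts = find_conflicts(plan)
```

Any entry in `conflicts` points at the NEs whose parameters need to be re-configured.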
Locating Inband DCN Faults - Handling
Method (Continued)
Cause 6: A DCN storm or DCN interruption occurs as the third-party network that the DCN
packets traverse is faulty.
If the DCN packets traverse a third-party network, check whether a port loop or physical link
interruption occurs in the third-party network. If yes, rectify the faults in the third-party network
first.
Cause 7: The bandwidth configured for the inband DCN channel is excessively small.
(1) When the number of services configured on the port exceeds a certain number, part of the
query information may be lost. In this case, you should properly increase the bandwidth
configured for the inband DCN channel.
(2) If a DCN gateway manages a large number of NEs, network congestion may occur,
especially during package loading. If network congestion occurs, change the position of the
DCN gateway and reduce the number of NEs that the DCN gateway manages. Generally, a DCN
gateway manages a maximum of 64 NEs.
Cause 8: The SCC board on the faulty NE is being reset or switched, and therefore the NE
cannot respond to inband DCN packets.
(1) Observe whether the PROG indicator on the SCC board is blinking green. If the indicator is
blinking green, it indicates that the SCC board is in the reset state. After the PROG indicator is
steady on (green), the reset of the SCC board is complete and the DCN connection is
automatically recovered.
(2) If the DCN connection is not recovered, check whether a protection switchover occurs on a
board. A protection switchover on a board will reroute DCN packets.
(3) If a protection switchover occurs on a board, the DCN connection is automatically
recovered.
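The general limit of 64 NEs per DCN gateway mentioned under Cause 7 can be checked with a few lines of code. This is a minimal sketch, assuming the gateway-to-NE assignment is available as a dictionary; the function name and data layout are hypothetical.

```python
MAX_NES_PER_GATEWAY = 64  # general limit stated under Cause 7


def overloaded_gateways(gateway_to_nes):
    """Return {gateway: NE count} for gateways that exceed the general limit."""
    return {
        gateway: len(nes)
        for gateway, nes in gateway_to_nes.items()
        if len(nes) > MAX_NES_PER_GATEWAY
    }
```

Gateways returned by this function are candidates for repositioning or for moving some NEs to another gateway.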

Locating NE Resets - Fault Symptoms and
Possible Causes
Fault Symptoms
 A cold reset or warm reset occurs on an NE.
Possible Causes
 Cause 1: A manual operation causes the reset.
 Cause 2: The power supply of the NE is abnormal.
 Cause 4: Certain tasks have high CPU usage.
 Cause 5: Other reasons cause the reset.

Locating NE Resets - Handling
Method
Handling Procedure
Cause 1: A manual operation causes the reset.
Check operation records on the NMS and oplog/errlog records on the NE.
Cause 2: The power supply of the NE is abnormal.
Check for a low-voltage reset record or an exception record among errlog records
on the NE and records in the black box; check whether the voltage of the power
supply is stable; check whether the environment causes abnormal power supply.
Replace the faulty board.
Cause 4: Certain tasks have high CPU usage.
Check whether the current network has a large scale and whether the number of
routes is far greater than the recommended value.
Cause 5: Other reasons cause the reset.
Collect the current and historical alarms on the NMS and the NE, records in the
black box, oplog records, errlog records, dopra records, debugbuf records, and
other information required for fault locating, and send all the information to
Huawei engineers.
Locating Package Loading Failures
- Fault Symptoms and Possible
Causes
Fault Symptoms
 Package loading fails.
Possible Causes
 Cause 1: The NE is abnormal in the process of software loading.
 Cause 2: Backing up databases fails.
 Cause 3: A rollback occurs due to a failure in package downloading.
 Cause 4: A rollback occurs due to an error in the software activation process.
 Cause 5: The upgrade task is not rolled back when an error occurs in the
software activation process.
 Cause 6: The SWDL_INPROCESS alarm persists after the upgrade is
complete.
 Cause 7: User interfaces stop responding in the upgrade process.
 Cause 8: A board is reseated in the upgrade process.
 Cause 9: The PCBs of the active and standby SCC boards are of different
versions.
Locating Package Loading Failures -
Possible Causes
Possible Causes
 Cause 10: The NE is in the Undispensed state when an upgrade task is being
created.
 Cause 11: The NE is in the Unactivated state when an upgrade task is being
created.
 Cause 12: The NE is in the Uncommitted state when an upgrade task is being
created.
 Cause 13: No CF card is installed on the SCC board or the memory in the CF
card is insufficient.
 Cause 14: Other reasons cause the failure.

Handling Method
Cause 1: The NE is abnormal in the process of software loading.
The NE is in an unstable state. Wait 10 minutes, and then load the software again.
Cause 2: Backing up databases fails.
Upload the NE databases to the U2000 again; create another upgrade task and
run the task. If the backing-up fails again, perform a warm reset on the NE.
Cause 3: A rollback occurs due to a failure in package downloading.
Check whether DCN communication is normal and whether bandwidth is
sufficient. If no fault is found, check whether a correct software package is
downloaded. If the downloaded software package is correct, check whether the
remaining space on the flash memory is greater than the space required by the
software package. If the remaining space on the flash memory is sufficient,
change the gateway NE.
Cause 4: A rollback occurs due to an error in the software activation process.
Check whether any board is removed or whether the NE is manually reset during
the upgrade. Then, activate the software again.
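The order of checks for Cause 3 can be sketched as a small decision function. Only the last action (changing the gateway NE) is stated explicitly in the text; the other three return values are assumptions about the implied remedy and are marked as such in the code. The function and parameter names are hypothetical.

```python
def next_step(dcn_ok, package_correct, free_flash, required_flash):
    """Return the next action after a rollback caused by a package download failure.

    The checks run in the order given in the handling method for Cause 3.
    """
    if not dcn_ok:
        # Assumption: the text only says to check DCN communication and bandwidth.
        return "Rectify the DCN communication or bandwidth fault."
    if not package_correct:
        # Assumption: implied remedy when a wrong package was downloaded.
        return "Download the correct software package."
    if free_flash <= required_flash:
        # Assumption: implied remedy when flash space is not greater than required.
        return "Free up space on the flash memory."
    return "Change the gateway NE."
```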

Handling Method
Cause 5: The upgrade task is not rolled back when an error occurs in the
software activation process.
Select the task and click Ignore to commit the task. Then, check the version
of each board. For boards whose version information is not updated, perform
a cold reset on them. If a resetting command cannot be issued, perform a
warm reset on the SCC board if the NE has only one SCC board, or perform
active/standby switching between SCC boards if the NE has two SCC boards.
Cause 6: The SWDL_INPROCESS alarm persists after the upgrade is complete.
Check whether the NE is in a normal state. If yes, perform a warm reset on
the NE.
Cause 7: User interfaces stop responding in the upgrade process.
Restart the tool and create a new task that runs directly from the NE state,
which is displayed when the task is originally created.

Handling Method
Cause 8: A board is reseated in the upgrade process.
Remove the board, and then insert the board after the NE enters the normal
state. If automatic matching still fails, check whether an SWDL_INPROCESS
alarm occurs. If yes, clear the SWDL_INPROCESS alarm first.
Cause 9: The PCBs of the active and standby SCC boards are of different
versions.
Run the :mon-get-dump:bid,"SWDL.ISWDL.CSWDL","" command on the
Navigator and check whether the returned values of m_byPCB for the active
and standby SCC boards are the same. If the returned values of m_byPCB for
the two SCC boards are different, replace one of the SCC boards so that both
SCC boards use the same PCB version.
Cause 10: The NE is in the Undispensed state when an upgrade task is being
created.
Skip Load Package and create a task from the Dispense state; or enter the
:swdl-dnld-swmem command on the Navigator.
Handling Method
Cause 11: The NE is in the Unactivated state when an upgrade task is being
created.
Skip Load Package and Dispense and create a task from the Active state; or
enter the :mon-init-sys:0,swdl command on the Navigator. (An activation
operation will interrupt services. Therefore, check whether an activation
operation is allowed.)
Cause 12: The NE is in the Uncommitted state when an upgrade task is being
created.
Skip Load Package, Dispense, and Active, and create a task from the Commit
state; or enter the :swdl-commit-swmem command on the Navigator.
Cause 13: No CF card is installed on the SCC board or the memory in the CF card is
insufficient.
If no CF card is installed on the SCC board, install a CF card; if the memory in the
CF card is insufficient, delete unnecessary files in the CF card.
Cause 14: Other reasons cause the failure.
Collect data and send the data to Huawei engineers.
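Causes 10 to 12 follow one pattern: the NE state determines which upgrade steps to skip, which step to start from, and which Navigator command is the alternative. The table below restates that pattern. The step names and commands are quoted from the handling method above; the dictionary and function names are hypothetical.

```python
# NE state at task creation -> (steps to skip, step to start from, Navigator command).
# Step names and commands are quoted from the handling method; nothing else is implied.
RESUME_PLAN = {
    "Undispensed": (["Load Package"], "Dispense", ":swdl-dnld-swmem"),
    "Unactivated": (["Load Package", "Dispense"], "Active", ":mon-init-sys:0,swdl"),
    "Uncommitted": (["Load Package", "Dispense", "Active"], "Commit", ":swdl-commit-swmem"),
}


def resume_plan(ne_state):
    """Return how to resume an upgrade for an NE found in the given state."""
    skip, start_from, command = RESUME_PLAN[ne_state]
    warning = ""
    if ne_state == "Unactivated":
        # Activation interrupts services, as noted for Cause 11.
        warning = "Activation interrupts services; check whether it is allowed."
    return {
        "skip": skip,
        "start_from": start_from,
        "navigator_command": command,
        "warning": warning,
    }
```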
Common Alarms
 The CFCARD_FAILED is an alarm indicating that the operation on the
CF card fails.
Possible Causes
 Cause 1: The CF card is faulty, resulting in an initialization failure.
 Cause 2: The SCC board is faulty, resulting in a failure to create a CF file.
Handling Procedure
Cause 1: The CF card is faulty, resulting in an initialization failure.
Replace the CF card and check whether the alarm is cleared.
Cause 2: The SCC board is faulty, resulting in a failure to create a CF file.
Check whether the HARD_BAD alarm exists on the SCC board. If yes,
perform a cold reset on the SCC board. Then, check whether the alarm is
cleared. If the alarm persists, replace the SCC board.

Common Alarms
 The CFCARD_OFFLINE is an alarm indicating that the CF card is offline.
Possible Causes
 Cause 1: The CF card is not installed.
 Cause 2: The CF card is in poor contact with the SCC board.
 Cause 3: The SCC board is faulty.
Handling Procedure
Cause 1: The CF card is not installed.
Check whether the CF card is installed on the SCC board. If not, install a CF
card.
Cause 2: The CF card is in poor contact with the SCC board.
Check whether the CF card is loosened. If yes, re-install the CF card. Then,
check whether the alarm is cleared. If the alarm persists, replace the CF card.
Cause 3: The SCC board is faulty.
Check whether the HARD_BAD alarm exists on the SCC board. If yes, perform
a cold reset on the SCC board. Then, check whether the alarm is cleared. If
the alarm persists, replace the SCC board.

Handling Common Alarms (1)
 The AM_DOWNSHIFT is an alarm indicating the downshift of the AM scheme.
Possible Causes
 Cause 1: The external factors (for example, the climate) cause the degradation of the working
channels.
 Cause 2: There are interferences around the working channels.
 Cause 3: The ODU at the transmit end has abnormal transmit power.
 Cause 4: The ODU at the receive end has abnormal receive power.
 Cause 5: Multi-path fading occurs due to atmospheric and ground effects.
Handling Procedure
Cause 1: The external factors (for example, the climate) cause the degradation of the working
channels.
When the external factors (for example, the climate) cause the degradation of the working
channels, the downshift of the AM scheme is normal. Therefore, no measure should be taken
to handle the alarm.
Cause 2: There are interferences around the working channels.
Eliminate the interferences around the working channels.
Cause 3: The ODU at the transmit end has abnormal transmit power.
Use the NMS to check whether the transmit power of the ODU at the transmit end is normal.
For details on troubleshooting at the transmit end, see Locating Link Faults.
Cause 4: The ODU at the receive end has abnormal receive power.
Use the NMS to check whether the receive power of the ODU at the receive end is normal. For
details on troubleshooting at the receive end, see Locating Link Faults.
Cause 5: Multi-path fading occurs due to atmospheric and ground effects.
Adjust the elevation angles of the antennas; use large-diameter antennas; re-plan
transmission links to avoid areas with severe multi-path fading.
 The BD_STATUS is an alarm indicating that the board cannot be detected.
Possible Causes
 Cause 1 of the alarm reported by a board of the IDU: The board is installed in an incorrect slot.
 Cause 2 of the alarm reported by a board of the IDU: The board and the backplane are not connected
properly.
 Cause 3 of the alarm reported by a board of the IDU: The slot is faulty.
 Cause 4 of the alarm reported by a board of the IDU: The alarmed board is faulty.
 Cause 5: The ODU is faulty; the power that the IF board supplies to the ODU is abnormal; the IF cable is
damaged or is not properly connected.
Handling Procedure
Cause 1: The board is installed in an incorrect slot.
Check whether the physical slot and logical slot of the alarmed board are the same.
Cause 2: The board and the backplane are not connected properly.
Re-install the alarmed board.
Cause 3: The slot is faulty.
Check whether the slot has broken or bent pins. If yes, insert the board in a vacant slot.
Cause 4: The alarmed board is faulty.
Replace the board.
Cause 5: The ODU is faulty; the power that the IF board supplies to the ODU is abnormal; the IF cable is
damaged or is not properly connected.
Replace the ODU that reports the alarm; check the voltage at the RF port on the IF board; check
whether the IF cable is wet or abnormal; re-connect the IF cable.
 The BUS_ERR is an alarm of bus errors.
Possible Causes
 Cause 1: The board and the backplane are not connected properly.
 Cause 3: The inter-board bus is faulty.
Handling Procedure
Cause 1: The board and the backplane are not connected properly.
Re-install the alarmed board; check whether the backplane has broken or bent
pins. If the backplane has broken or bent pins, insert the board in a vacant slot or
replace the backplane.
Perform a cold reset on the alarmed board. If the alarm persists, perform a cold
reset on the SCC board. If the alarm still persists, replace the alarmed board.
Cause 3: The inter-board bus is faulty.
On the NMS, check whether an alarm indicating loss/deterioration of a clock
source is reported. If yes, clear clock alarms and then check whether the
BUS_ERR alarm clears.
 The COMMUN_FAIL is an alarm indicating an inter-board communication failure.
Possible Causes
 Cause 1: The alarmed board is reset.
 Cause 2: The board and the backplane are not connected properly.
 Cause 4: The slot is faulty.
Handling Procedure
Cause 1: The alarmed board is reset.
Perform a reset on the alarmed board. Then, the alarm disappears automatically.
Cause 2: The board and the backplane are not connected properly.
Re-install the alarmed board; check whether the backplane has broken or bent pins. If the backplane has
broken or bent pins, insert the board in a vacant slot or replace the backplane.
Cause 4: The slot is faulty.
Check whether the slot has broken or bent pins. If yes, insert the board in a vacant slot.
Perform a cold reset on the SCC board. Then, check whether the alarm clears. If the alarm persists, replace
the SCC board.

 The FAN_FAIL is an alarm indicating that a fan is faulty.
Possible Causes
 Cause 1: The alarmed board and the backplane are not connected properly.
 Cause 2: A fan is faulty.
Handling Procedure
Cause 1: The alarmed board and the backplane are not connected properly.
Re-install the alarmed board; check whether the backplane has broken or bent
pins. If the backplane has broken or bent pins, insert the board in a vacant slot or
replace the backplane.
Cause 2: A fan is faulty.
Remove the fan board and clean the fans. Then, install the fan board and check
whether the alarm clears. If the alarm persists, replace the fan board.

 The HARD_BAD is an alarm indicating that the hardware is faulty.
Possible Causes
 Cause 1: The external power supply fails.
 Cause 2: The alarmed board and the backplane are not connected properly.
 Cause 3: The alarmed board has hardware errors.
Handling Procedure
Cause 1: The external power supply fails.
Check the external power supply.
Cause 2: The alarmed board and the backplane are not connected properly.
Re-install the alarmed board; check whether the backplane has broken or bent pins. If the backplane
has broken or bent pins, insert the board in a vacant slot or replace the backplane.
Cause 3: The alarmed board has hardware errors.
Perform a cold reset on the alarmed board and check whether the alarm clears. If the alarm persists,
replace the alarmed board.
Perform a cold reset on the SCC board. Then, check whether the alarm clears. If the alarm persists,
replace the SCC board.

 The IF_CABLE_OPEN is an alarm indicating that the IF cable is open.
Possible Causes
 Cause 1: The IF cable is loose or faulty.
 Cause 2: The IF port on the IF board is damaged.
 Cause 3: The power module of the ODU is faulty.
Handling Procedure
Cause 1: The IF cable is loose or faulty.
Check whether the connector of the IF cable is damaged/wet/corroded/loose or
whether the connector is made properly. (The connectors to be checked include
the connector between the IF pigtail and the IF board, the connector between the
IF pigtail and the IF cable, and the connector between the IF cable and the ODU.)
Cause 2: The IF port on the IF board is damaged.
Replace the alarmed IF board.
Cause 3: The power module of the ODU is faulty.
Replace the ODU connected to the alarmed IF board.

 The IF_INPWR_ABN is an alarm indicating that the power supplied by an IF board to
an ODU is abnormal.
Possible Causes
 Cause 1: The IF board is faulty.
 Cause 2: The IF cable is faulty.
 Cause 3: The ODU is faulty.
Handling Procedure
Cause 1: The IF board is faulty.
Replace the alarmed IF board.
Cause 2: The IF cable is faulty.
Check whether the connector of the IF cable is damaged/wet/corroded/loose or whether the
connector is made properly. (The connectors to be checked include the connector between the IF
pigtail and the IF board, the connector between the IF pigtail and the IF cable, and the connector
between the IF cable and the ODU.)
Cause 3: The ODU is faulty.
Perform a cold reset on the ODU and check whether the alarm clears. If the alarm persists,
replace the ODU.

 The MW_CFG_MISMATCH is an alarm indicating a configuration mismatch on
microwave links.
Possible Causes
 Cause 1: The number of E1 signals is different on both ends of a microwave link (including
the number of E1 signals on the active page and the number of E1 signals on the standby
page).
Handling Procedure
Cause 1: The number of E1 signals is different on both ends of a microwave link.
Cause 2: The AM enabling is different on both ends of a microwave link.
Cause 3: The IEEE 1588 overhead enabling is different on both ends of a microwave link.
Cause 4: The modulation mode is different on both ends of a microwave link.
Cause 5: The channel spacing is different on both ends of a microwave link.
Determine the possible cause of the alarm according to the alarm parameters. Then, check
the configuration on both ends of the microwave link. Ensure that the configuration is the
same on both ends of the microwave link.
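The comparison of the two link ends can be sketched as a field-by-field diff over the five parameters named in Causes 1 to 5. This is a minimal sketch, assuming each end's configuration has been collected into a dictionary; the key names are hypothetical and do not correspond to NMS parameter names.

```python
# Hypothetical key names for the five parameters listed in Causes 1 to 5.
CHECKED_FIELDS = (
    "e1_count",
    "am_enabled",
    "ieee1588_overhead_enabled",
    "modulation_mode",
    "channel_spacing",
)


def config_mismatches(local, remote):
    """Return {field: (local value, remote value)} for every field that differs."""
    return {
        field: (local[field], remote[field])
        for field in CHECKED_FIELDS
        if local[field] != remote[field]
    }
```

An empty result means the five parameters agree on both ends of the link.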
 The MW_LOF is an alarm indicating that microwave frames are lost.
Possible Causes
 Cause 1: Certain other alarms occur.
 Cause 2: The IF working mode or channel spacing at both ends of a microwave link does not match the preset
modulation mode.
 Cause 3: The operating frequency of the ODU at the local site is inconsistent with the operating frequency of the ODU
at the opposite site, resulting in abnormal receive power.
 Cause 4: An IF/RF transmit/receive channel is faulty.
 Cause 5: An interference event occurs.
Handling Procedure
Cause 1: Certain other alarms occur.
Check for HARD_BAD, VOLT_LOS, IF_CABLE_OPEN, BD_STATUS, RADIO_RSL_LOW, CONFIG_NOSUPPORT, and
TEMP_ALARM alarms. If any of these alarms are reported, clear them immediately.
Cause 2: The IF working mode or channel spacing at both ends of a microwave link does not match the preset modulation
mode.
Modify the settings of IF parameters according to network planning requirements to ensure a match with the preset
modulation mode.
Cause 3: The operating frequency of the ODU at the local site is inconsistent with the operating frequency of the ODU at
the opposite site, resulting in abnormal receive power.
Set the transmit frequency of the local site to the same value as the receive frequency of the opposite site. Then, set the
receive frequency of the local site to the same value as the transmit frequency of the opposite site. In addition, ensure that
the receive power of the ODU at both ends of the microwave link meets the planned value.
Cause 4: An IF/RF transmit/receive channel is faulty.
Perform loopbacks section by section to check whether the ODU/IF transmit/receive channel is faulty. If a fault is found,
replace the ODU/IF board.
Cause 5: An interference event occurs.
Eliminate the interference source.
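The frequency rule under Cause 3 (local transmit equals opposite receive, and local receive equals opposite transmit) can be expressed as a short check. This is a minimal sketch; the class name, field names, and the use of kHz integers are assumptions made for the illustration.

```python
from dataclasses import dataclass


@dataclass
class OduFrequencies:
    """Configured ODU frequencies in kHz (integers avoid rounding issues)."""
    tx_khz: int
    rx_khz: int


def frequency_mismatches(local, remote):
    """Return a list of pairing problems between the two ends of a link."""
    problems = []
    if local.tx_khz != remote.rx_khz:
        problems.append("local Tx frequency differs from opposite Rx frequency")
    if local.rx_khz != remote.tx_khz:
        problems.append("local Rx frequency differs from opposite Tx frequency")
    return problems
```

An empty list means the frequencies are paired correctly; the receive power still has to be compared with the planned value separately.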
 The MW_LIM is an alarm indicating that a mismatched microwave link identifier is detected.
Possible Causes
 Cause 1: The link ID of the local site does not match the link ID of the opposite site.
 Cause 2: The services on other microwave links are received due to the incorrect configuration of the
microwave link receive frequency at the local or opposite site.
 Cause 3: The antenna receives the signals from the other sites, because the direction of the antenna
is set incorrectly.
 Cause 4: The polarization direction of the XPIC is incorrect.
Handling Procedure
Cause 1: The link ID of the local site does not match the link ID of the opposite site.
Check whether the link ID of the local site matches the link ID of the opposite site. If not, set the link
IDs of the two sites to the same value according to the network planning requirements.
Cause 2: The services on other microwave links are received due to the incorrect configuration of the
microwave link receive frequency at the local or opposite site.
Check whether the receive and transmit frequencies of the local site are consistent with the receive
and transmit frequencies of the opposite site. If not, set the receive and transmit frequencies of the
two sites again.
Cause 3: The antenna receives the signals from the other sites, because the direction of the antenna is
set incorrectly.
Align the antennas at the two ends.
Cause 4: The polarization direction of the XPIC is incorrect.
Check whether the configuration of XPIC work groups is correct. In addition, check and adjust the IFX2
board and ODU, and the mapping between the ODU and the feed. Ensure that the V-polarized XPIC IF
boards at the two ends are interconnected through the V-polarized microwave link, and the H-polarized
XPIC IF boards at the two ends are interconnected through the H-polarized microwave link.
 The POWER_ALM is an alarm indicating that the power module is abnormal.
Possible Causes
 Cause 1: If the alarm is reported on the board on the IDU, the input power or the
PIU is abnormal.
 Cause 2: If the alarm is reported on the board on the IDU, the power module is
abnormal.
 Cause 3: If the alarm is reported on the ODU, the power module of the ODU is
faulty.
Handling Procedure
Cause 1: If the alarm is reported on the board on the IDU, the input power or the PIU
is abnormal.
Check whether any alarms are reported on the PIU. If yes, clear the alarms
immediately.
Cause 2: If the alarm is reported on the board on the IDU, the power module is
abnormal.
Cause 3: If the alarm is reported on the ODU, the power module of the ODU is faulty.
 The POWER_ABNORMAL is an alarm indicating that the input power supply is
abnormal.
Possible Causes
 Cause 1: The power cable is cut, damaged, or not connected.
 Cause 2: The input power is abnormal.
 Cause 3: The PIU board is faulty.
Handling Procedure
Cause 1: The power cable is cut, damaged, or not connected.
Check whether the power cable is cut, damaged, or not connected. If the power
cable is cut or damaged, replace it with a proper power cable. If the power cable
is not connected, connect it.
Cause 2: The input power is abnormal.
Contact power supply engineers to rectify the fault.
Cause 3: The PIU board is faulty.
Replace the PIU board.

 The RADIO_TSL_HIGH is an alarm indicating that the microwave transmit power is too high.
Possible Causes
Handling Procedure
replace the ODU.
 The RADIO_TSL_LOW is an alarm indicating that the microwave transmit power is too low.
Possible Causes
 Cause 2: The signals from the IF board to the ODU are abnormal.
Handling Procedure
replace the ODU.
Cause 2: The signals from the IF board to the ODU are abnormal.
Perform a cold reset on the IF board and check whether the alarm clears. If the alarm persists,
replace the IF board.
 The TEMP_OVER is an alarm indicating that the operating temperature of the board crosses the
threshold.
Possible Causes
 Cause 1: The ambient temperature is very high or very low due to a fault in the cooler or heater
equipment.
 Cause 2: The configuration of the upper and lower thresholds of the temperature alarm is not proper.
 Cause 3: The fan stops working or the air filter is too dusty.
Handling Procedure
Cause 1: The ambient temperature is very high or very low due to a fault in the cooler or heater
equipment.
Check whether the ambient temperature is higher than 45°C or lower than 0°C. If the temperature is
abnormal, check whether the cooler or heater equipment is faulty. Troubleshoot the equipment fault
first.
Cause 2: The configuration of the upper and lower thresholds of the temperature alarm is not proper.
Check the current operating temperature of the board and the configuration of the upper and lower
temperature thresholds. In addition, check whether the configuration is proper according to actual
requirements. If the configuration is not proper, modify the configuration.
Cause 3: The fan stops working or the air filter is too dusty.
Check whether the FAN_FAIL alarm occurs. If yes, clear the alarm first. Check whether the air filter is
covered with dust, which impedes heat dissipation. To judge this, feel the airflow and its temperature
at the air outlet. If heat dissipation is impeded due to the dusty air filter, remove the air filter
and clean it.
Services on an IF Board Were
Interrupted Because the IF Board Was
Reset Due to Low Voltage
Fault Symptoms
 On an NE that was not powered off or reset, an IF board reported
BD_STATUS alarms but it was not reseated or reset.
 The services that the IF board carried were automatically restored after
an interruption.
Cause Analysis
 The software black box contained a record indicating a board reset due
to low voltage. It was found that a transient voltage dip occurred on the
IF board.
Solutions
Check the power supply records of the NE.

Software Watchdogs of RTN NEs Were
Frequently Reset Due to a Large
Network Scale
Fault Symptoms
 The software watchdogs of RTN NEs on a live network were frequently
reset.
Cause Analysis
 (1) Tasks SOCK, tNetTask, and tL2TSvR1b58 accounted for more than
60% of the CPU usage. As a result, task VIDL could not be carried out.
 (2) Task SOCK is a communication task of the TCP/IP protocol stack; task
tNetTask is a communication task of the VxWorks operating system;
task tL2TSvR1b58 is an internal communication task of NEs. When these
three tasks had high CPU usage simultaneously, the communication
traffic was very heavy.
 (3) Route query results showed that some NEs had 600 routes.
Generally, it is recommended that an NE has a maximum of 64 routes
(or 100 routes in particular cases).
Solutions
Divide the network into more subnets.
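The route-count recommendation from the cause analysis (a maximum of 64 routes per NE, or 100 in particular cases) can be turned into a quick classification. This is a minimal sketch; the function name and the wording of the returned labels are hypothetical.

```python
RECOMMENDED_MAX_ROUTES = 64  # general recommendation from the cause analysis
EXTENDED_MAX_ROUTES = 100    # upper bound "in particular cases"


def classify_route_count(route_count):
    """Classify the number of routes on an NE against the recommended values."""
    if route_count <= RECOMMENDED_MAX_ROUTES:
        return "ok"
    if route_count <= EXTENDED_MAX_ROUTES:
        return "acceptable only in particular cases"
    return "too many routes: divide the network into more subnets"
```

With the 600 routes observed in this case, the function returns the last label, which matches the solution applied.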
An NE Failed to Trace an External
Clock Due to Inconsistent Settings for
the NEs at the Two Ends
Fault Symptoms
 On a live network, an NE failed to trace an external clock and entered
the free-run state, but its opposite NE properly output clock signals. The
NMS displayed an LTI alarm but not an EXT_SYNC_LOS alarm.
Cause Analysis
 (1) Check the network topology; check whether the external clock was
available; check the connection of the clock line; check which type of
equipment output clock signals at the opposite end.
 (2) Check whether the clock output mode was set to the same value at
the two ends. If the clock output mode was set to 2 Mbit/s, check
whether the settings for the S1 byte and SSM protocol were consistent
between the two ends. In addition, check whether the local external
clock port was configured with DCC overheads.
Solutions
Due to inconsistent settings for the SSM protocol at the two ends, the
local NE could not correctly obtain the S1 byte and, as a result, reported
an LTI alarm. The NE returned to normal after its SSM protocol
was disabled.
An IF Board Reported an LPUAS Performance
Event But Did Not Report a Lower Order Alarm
Because LP_UNEQ Alarms Were Suppressed
Fault Symptoms
 An IF board reported an LPUAS performance event but did not report a
lower order alarm.
Cause Analysis
 The IF board and line board were configured with cross-connections but
the services that the line board carried were not completely cut over. As
a result, the value of the V5 byte carried in the lower order channel was
0 and accordingly the IF board reported an LPUAS event.
 It was found that the reporting status of LP_UNEQ alarms was set to
DISABLE. Therefore, LP_UNEQ alarms could not be reported. In addition,
LP_REI, LP_RDI, LP_TIM, LP_RFI, BIP_EXC, and BIP_SD alarms also could not
be reported because they were suppressed by LP_UNEQ alarms.
Solutions
Set the reporting status of LP_UNEQ alarms to ENABLE.
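The interaction between alarm suppression and the reporting status can be illustrated with a minimal Python sketch. The function name, the data layout, and the rule that a missing reporting entry means "enabled" are assumptions made for illustration and do not represent the NE's actual alarm logic. The list of suppressed alarms is taken from the cause analysis above.

```python
# Alarms that are suppressed while an LP_UNEQ condition is present
# (list taken from the case description).
SUPPRESSED_BY_LP_UNEQ = {"LP_REI", "LP_RDI", "LP_TIM", "LP_RFI", "BIP_EXC", "BIP_SD"}


def reported_alarms(detected, reporting_enabled):
    """Return the set of alarms that would be visible to the user.

    detected: set of alarm conditions present on the lower order channel.
    reporting_enabled: dict of alarm name -> bool; a missing entry is
    treated as enabled (an assumption of this sketch).
    """
    visible = set()
    for alarm in detected:
        # A present LP_UNEQ condition hides the lower-priority alarms.
        if "LP_UNEQ" in detected and alarm in SUPPRESSED_BY_LP_UNEQ:
            continue
        # An alarm whose reporting status is DISABLE is never shown.
        if not reporting_enabled.get(alarm, True):
            continue
        visible.add(alarm)
    return visible


detected = {"LP_UNEQ", "LP_RDI"}
print(reported_alarms(detected, {"LP_UNEQ": False}))  # nothing is reported
print(reported_alarms(detected, {"LP_UNEQ": True}))   # LP_UNEQ becomes visible
```

With reporting disabled, the sketch returns no alarm at all, which matches the symptom of a performance event without any lower order alarm.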

Service Interruption Due to
Incorrect IF Cable Connections
Fault Symptoms
 The network diagram is provided in the following figure. After NE2108
was powered off, the services between NE2108, NE2199, NE2299, and
NE2120 were interrupted. The services, however, were not restored
even after NE2108 restarted.
Note: NE2199 and NE2299 were at the same site; NE2108 and NE2120 were at
different sites.

Cause Analysis
 Normally, after NE2108 is powered off, the active and standby ODUs of
NE2199 will report MW_LOF and RADIO_RSL_LOW alarms simultaneously
and NE2299's services will not be affected. ODU 15 of NE2199 and ODU
17 of NE2299, however, simultaneously reported MW_LOF and
RADIO_RSL_LOW alarms. In addition, IF board 7 of NE2199 and IF board
5 of NE2299 simultaneously reported MW_RDI alarms.
 It is suspected that the IF cables for the standby links of NE2199 and
NE2299 were incorrectly connected. Based on the reported alarms, it is
confirmed that IF board 7 of NE2199 was connected to ODU 17 of
NE2299 and IF board 7 of NE2299 was connected to ODU 17 of NE2199.
For the connections, see the figure provided in the next slide.

Solutions
 Exchange the IF cables between IF boards 7 of NE2199 and NE2299.
Figure: Incorrect IF cable connections

A Login to an NE Failed Due to
Conflicting IP Addresses
Fault Symptoms
 A computer had two network adapters, one connected to a public network
and the other connected to an NE. The IP addresses of the two network
adapters and that of the NE were in the same network segment. The subnet
mask of the network adapter connected to the public network and that of
the NE were set to 255.255.255.0, and that of the private network was set
to 255.255.0.0. A user could not find the NE using the Web LCT but could
find the NE using the Navigator. The user, however, could not log in to the
NE or successfully ping the NE.
Cause Analysis
 (1) Ran the arp -a command to query the IP addresses of the devices
connected to the computer, and found that the public network and private
network had the same IP addresses.
 (2) Disconnected the network cable that connected one network adapter to
the public network. Then, added the NE again. After the addition of the NE,
the Web LCT properly communicated with the NE.
Solutions
Check IP addresses on a network and ensure that every IP address on the
network is unique.
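A quick way to spot this kind of conflict on a maintenance computer is to check whether the subnets configured on its adapters overlap. The following is a minimal Python sketch that uses the standard ipaddress module. The adapter names and IP addresses are hypothetical; only the two subnet masks come from the case above.

```python
import ipaddress


def overlapping_interfaces(interfaces):
    """Return pairs of adapter names whose configured subnets overlap.

    interfaces: dict of adapter name -> "address/netmask" string.
    """
    networks = {
        name: ipaddress.ip_interface(cidr).network
        for name, cidr in interfaces.items()
    }
    names = sorted(networks)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if networks[a].overlaps(networks[b])
    ]


# Hypothetical addresses for the two adapters of the computer.
pc_adapters = {
    "public_adapter": "192.168.0.10/255.255.255.0",
    "ne_adapter": "192.168.0.20/255.255.0.0",
}
print(overlapping_interfaces(pc_adapters))
```

If the function returns any pair, traffic intended for the NE may leave through the wrong adapter, so one of the adapters should be readdressed or disconnected before the NE is added.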
Service Interruptions Because IF
Parameter Settings Were Not in
Compliance with the Network Planning
Document
Fault Symptoms
 After being activated, a newly deployed BTS frequently encountered
transient service interruptions. It took 20 ms or even 1,000 ms for a
user to successfully ping a BSC from the BTS. It, however, always took
less than 30 ms for a user to successfully ping the BSC from the
microwave equipment that was connected to the BTS.
Cause Analysis
 (1) The ping delay was normal when small packets were transmitted
but abnormal when large packets were transmitted.
 (2) A check of the IF configurations showed that the IF parameters were
not set according to the network planning document. The bandwidth
allocated to data services was very low.
Solutions
Modify the IF parameter settings according to the network planning
document.
Note: Bandwidth available to data services = Service bandwidth - E1-used bandwidth (for Hybrid microwave)
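The formula in the note can be worked through with a minimal Python sketch. It assumes that each E1 occupies its nominal line rate of 2048 kbit/s on the link. The 40,000 kbit/s service bandwidth and the E1 counts are hypothetical values chosen for illustration, not figures from this case.

```python
E1_KBIT = 2048  # nominal E1 line rate in kbit/s (assumed to be the per-E1 cost)


def data_bandwidth_kbit(service_bandwidth_kbit, e1_count):
    """Bandwidth left for data services on a Hybrid microwave link."""
    e1_used = e1_count * E1_KBIT
    if e1_used > service_bandwidth_kbit:
        raise ValueError("E1 services exceed the service bandwidth")
    return service_bandwidth_kbit - e1_used


# Hypothetical example: the plan calls for 4 E1s, but 16 E1s were configured.
planned = data_bandwidth_kbit(40_000, 4)
configured = data_bandwidth_kbit(40_000, 16)
print(planned, configured)
```

Comparing the planned and configured values in this way shows how an oversized E1 allocation can leave too little bandwidth for large data packets.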

A Remote Login to an NE Failed
Due to Repeated NE IDs
Fault Symptoms
On a new OptiX RTN network, NE01, NE02, and NE03 formed a chain. A user could log in to
NE03 from NE02 but could not log in to NE03 from NE01.
Cause Analysis
Possible cause 1: NE03 has a hardware fault, causing a DCN communication failure.
Possible cause 2: The network configuration is incorrect.
Handling Procedure
(1) Queried NE03's adjacent routes and found that the NE IDs of NE01 and NE02 were
displayed.
(2) Performed a reset on NE03 and found that the fault persisted.
(3) Checked NE03 on site, and found that one optical port of the EG2 board was connected
to NE02 and another optical port of the EG2 board was connected to NE04.
(4) Logged in to NE04 and found that the NE ID of NE04 was the same as that of NE01.
(5) Changed the NE ID of NE04 to a unique value on the network. Then, logged in to NE03
from NE01. The login was successful.
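Duplicate NE IDs can be caught before deployment by checking the planning data. The following is a minimal Python sketch. The inventory layout and the numeric NE IDs are hypothetical; only the NE names and the fact that NE04 shared an ID with NE01 come from the case above.

```python
from collections import defaultdict


def duplicate_ne_ids(inventory):
    """Return {ne_id: [ne_names]} for every NE ID used by more than one NE.

    inventory: iterable of (ne_name, ne_id) pairs.
    """
    names_by_id = defaultdict(list)
    for ne_name, ne_id in inventory:
        names_by_id[ne_id].append(ne_name)
    return {ne_id: names for ne_id, names in names_by_id.items() if len(names) > 1}


# Hypothetical NE IDs; NE04 reuses the ID of NE01 as in the case above.
inventory = [("NE01", 1), ("NE02", 2), ("NE03", 3), ("NE04", 1)]
print(duplicate_ne_ids(inventory))  # {1: ['NE01', 'NE04']}
```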

A Service Created on a Static Tunnel
Could Not Be Set Up Due to Incorrect
Fiber Connections
Fault Symptoms
The DCN communication between two NEs was normal but the service created on the
static tunnel between the two NEs could not be set up.
Cause Analysis
Possible cause 1: The physical link is faulty.
Possible cause 2: The IP addresses of the ports are incorrect.
Possible cause 3: The ARP protocol works improperly.
Handling Procedure
(1) Checked the current alarms of the system and found that no ETH_LOS,
ETH_LINK_DOWN, or HARD_BAD alarm was reported. In addition, as the DCN
communication was normal, link/port/board hardware malfunctions were ruled out.
(2) Checked the IP addresses of the concerned ports and found that the IP addresses
were correct and were in the same network segment.
(3) Checked the entries of the ARP table and found that the IP address of the opposite
port could not be learned.
(4) The DCN communication could be normal only after the two interconnected ports
successfully learned their opposite ports' MAC addresses. Based on query results, it was
found that the MAC address of the port on the sink NE was not the planned one.
(5) Checked the fiber connections and found that the fibers were incorrectly connected.
Due to the incorrect fiber connection, the ARP protocol worked improperly and the
service created on the static tunnel could not be set up.
(6) Re-connected the fibers according to the NE planning table.
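The comparison in step (4) between the learned and the planned MAC addresses can be automated once both tables are exported. The following is a minimal Python sketch. The port names, the MAC addresses, and the dictionary layout are hypothetical and do not reflect any NMS export format.

```python
def normalize_mac(mac):
    """Reduce a MAC address string to its lowercase hexadecimal digits."""
    return "".join(ch for ch in mac.lower() if ch in "0123456789abcdef")


def mac_mismatches(planned, learned):
    """Return {port: (planned_mac, learned_mac)} for ports that deviate from the plan.

    planned, learned: dicts of port name -> MAC address of the opposite port.
    A port with no learned entry is reported with None as the learned MAC.
    """
    mismatches = {}
    for port, planned_mac in planned.items():
        learned_mac = learned.get(port)
        if learned_mac is None or normalize_mac(learned_mac) != normalize_mac(planned_mac):
            mismatches[port] = (planned_mac, learned_mac)
    return mismatches


# Hypothetical planning data and query results.
planned = {"source_ne:port1": "02-00-00-00-00-01"}
learned = {"source_ne:port1": "02:00:00:00:00:02"}
print(mac_mismatches(planned, learned))
```

A non-empty result suggests that the port is cabled to a different peer than planned, which is the situation found in step (5).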

Services Carried by a LAG Were
Interrupted Due to Abnormal Laser
States
Fault Symptoms
The Ethernet services carried by a LAG, which consisted of one main port
and three slave ports, were interrupted. The four ports simultaneously
reported LASER_SHUT alarms although they were in the Enabled state, and
the lasers did not emit light.
Cause Analysis
The LAG detected that the board entered an abnormal state and then shut
down the lasers at all ports, but could not change the port state to
disabled. As a result, the lasers entered an abnormal state and the services
were interrupted.
Handling Procedure
(1) Checked current alarms and the status of the alarmed ports, and found
that LASER_SHUT alarms were reported and that the alarmed ports were in
the Enabled state.
(2) Queried historical alarms and found a HARD_BAD alarm. The HARD_BAD
alarm indicated that the board was faulty. Due to the board fault, the LAG
shut down the lasers at all ports. When the HARD_BAD alarm cleared,
however, the board was not accordingly restored to normal. Consequently,
LASER_SHUT alarms persisted.
(3) Performed a cold reset on the board. Then, the board returned to
normal, LASER_SHUT alarms cleared, and the services were restored.
(4) Replaced the board to prevent this fault from recurring on the network.
Locating a CES Service Fault by
Performing Loopbacks
 Fault Symptoms
A BER tester detected that a large number of bit errors occurred in the CES service between
the BSC and the BTS.
 Handling Procedure
 (1) Connected a BER tester to NE01 and set an inloop at one 2 Mbit/s port of NE04. The BER tester
detected a large number of bit errors.
 (2) Configured a static ARP entry at NE03, with the MAC address set to that of the egress port of NE03
and the IP address set to that of NE04, and created a tunnel between NE03 and NE04 whose egress label
was the same as its ingress label.
 (3) Set an outloop at the network-side port of NE04. Then, on NE03, set an inloop at the network-
side port that was connected to NE04. In both cases, the BER tester detected bit errors.
 (4) On NE03, set an outloop at the network-side port that was connected to NE02 and found that no
bit error occurred. Therefore, it was inferred that NE03 malfunctioned.
 (5) On NE03, replaced the 10GE line board that was connected to NE02.
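The loopback procedure above narrows the fault to the section between the nearest loop point that still shows bit errors and the next loop point that is clean. The following is a minimal Python sketch of that reasoning. The function name and the textual labels of the loop points are illustrative assumptions; the sequence of results follows steps (1) to (4).

```python
def locate_fault(loopback_results):
    """Return (clean_point, erroring_point) bracketing the fault, or None.

    loopback_results: list of (loop_point, errors_seen) tuples ordered from
    the loop farthest from the BER tester to the loop nearest to it.
    """
    pairs = zip(loopback_results, loopback_results[1:])
    for (far_point, far_errors), (near_point, near_errors) in pairs:
        # The fault lies between the last erroring loop and the first clean one.
        if far_errors and not near_errors:
            return (near_point, far_point)
    return None


# Results from the case above, farthest loop first (tester connected to NE01).
results = [
    ("NE04 2 Mbit/s port inloop", True),
    ("NE04 network-side outloop", True),
    ("NE03 inloop toward NE04", True),
    ("NE03 outloop toward NE02", False),
]
print(locate_fault(results))
```

Both bracketing loop points are on NE03, which is consistent with the conclusion that NE03 malfunctioned.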

Cases of Troubleshooting CES
Services
Case 1: Bit Errors Occurred in CES Services Due to Insufficient Tunnel Bandwidth
[Fault Symptoms]: Configured a 15-timeslot CES service that traversed two NEs into an MLPPP
group. After the setting, the service was set up. Bit errors, however, occurred in a 31-timeslot service
that was created after the 15-timeslot service was deleted.
[Cause Analysis]: The MLPPP group had only one PPP member, whose bandwidth was insufficient for
the CES services encapsulated on one PW. As a result, a large number of packets were discarded.
[Solutions]: Add a new member to the MLPPP group.
Case 2: Bit Errors Occurred When NEs Traced Different Clock Sources
[Fault Symptoms]: On a network shown in the following figure, an LSS alarm persisting for one
second was reported and a PW performance event indicating a jitter buffer overflow was also
reported.
[Cause Analysis]: The OptiX RTN 910 NE and the ANT20 traced different clock sources. As a result, slip
frames occurred due to clock wander and delay variation.
[Solutions]: Let the ANT20 trace the equipment clock.
Case 3: CES Services Could Not Be Set Up Due to a Mismatch of E1 Framing Mode
[Fault Symptoms]: An inloop was set at one end of a CES service traversing two NEs and a tester
was connected to the other end of the CES service. After the setting, the tester displayed an LSS alarm
and the used E1 port reported an LMFA alarm.
[Cause Analysis]: The alarmed E1 port was set to CRC4-multiframe, whereas the tester was set to
Unframe. As a result, the chip could not correctly align frames.
[Solutions]: Set the tester to PCM31C.
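The bandwidth shortfall in Case 1 can be estimated with a minimal Python sketch. Each E1 timeslot carries 64 kbit/s. The member capacity of 1984 kbit/s (one framed E1 with 31 usable timeslots) and the 10% encapsulation overhead are assumptions made for illustration; the case does not state either value.

```python
TIMESLOT_KBIT = 64  # payload rate of one E1 timeslot in kbit/s


def ces_fits(timeslots, member_count, member_kbit=1984, overhead_ratio=0.10):
    """Return True if the CES service fits into the MLPPP group.

    member_kbit and overhead_ratio are hypothetical defaults, not values
    taken from the equipment documentation.
    """
    required_kbit = timeslots * TIMESLOT_KBIT * (1 + overhead_ratio)
    return required_kbit <= member_count * member_kbit


print(ces_fits(15, 1))  # 15-timeslot service on one PPP member
print(ces_fits(31, 1))  # 31-timeslot service on one PPP member
print(ces_fits(31, 2))  # 31-timeslot service after a second member is added
```

Under these assumptions, the 31-timeslot service fills the single member before any overhead is added, so any encapsulation overhead causes packet loss until another member joins the group.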

Large Clock Frequency Deviations
Occurred on NodeBs Due to a
Timing Loop
 Fault Symptoms
 All NodeBs connected to NE01 (an OptiX RTN 950 NE) reported an alarm indicating a
large clock frequency deviation.
 Handling Procedure
 (1) Suspected that the clock configuration of NE01 was incorrect because NE01 did not
report an alarm.
 (2) Queried the clock source priority lists of NE01 and NE02, and found that NE01 traced
the line clock from optical port 1 on the EG2 board in slot 1 (of NE01) and NE02 traced
the line clock from optical port 1 on the EG2 board in slot 2 (of NE02). The two optical
ports were directly interconnected. As a result, the clock signals traced by NE01 and
NE02 formed a loop, resulting in clock quality deterioration and large clock frequency
deviations on the NodeBs connected to NE01.
 (3) Changed the clock source of NE01 according to the NE planning table.
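A timing loop of this kind can be detected from the clock tracing relationships alone. The following is a minimal Python sketch that follows each NE's traced source until it either ends at a reference or returns to an NE already visited. The dictionary layout is an assumption made for illustration; the NE01/NE02 relationship is taken from step (2).

```python
def find_timing_loop(traces):
    """Return the list of NEs that form a clock tracing loop, or None.

    traces: dict of NE name -> name of the NE whose clock it traces.
    None (or a missing entry) means the NE traces an external or internal
    reference and therefore ends the chain.
    """
    for start in traces:
        path = []
        seen = set()
        node = start
        while node is not None and node not in seen:
            seen.add(node)
            path.append(node)
            node = traces.get(node)
        if node is not None:
            # The walk returned to an NE on the path: that segment is the loop.
            return path[path.index(node):]
    return None


# Tracing relationship found in step (2): each NE traces the other.
print(find_timing_loop({"NE01": "NE02", "NE02": "NE01"}))  # ['NE01', 'NE02']
```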

Service Interruptions Due to
Incorrect Setting of MPLS Next-
Hop IP Address
Fault Symptoms
 All link services converged to an OptiX RTN 900 NE of V100R001C00/C02 were
interrupted and several tunnels passing the NE frequently reported
MPLS_TUNNEL_LOCV alarms. The alarms, however, transiently cleared. In addition,
the DCN communication between all NEs was normal.
Handling Procedure
 (1) Ruled out a physical link fault because the DCN communication was normal.
 (2) Analyzed the distribution of the affected NEs and found that all interrupted
services were first converged to an NE and then backhauled to the BSC.
 (3) Found that an ARP entry was frequently and automatically added and deleted on
the convergence NE. Changed the ARP entry to a static entry. Then, the tunnel
alarms cleared and some services were restored.
 (4) Checked the configurations of the convergence NE, and found that the NE was
configured with multiple tunnels and that the next-hop IP address of the port was set
to the same value as the next-hop IP address of the convergence port. The incorrect
settings caused abnormal ARP learning and further interrupted tunnel services.
 (5) Deleted incorrectly configured services and tunnels, and re-configured services
and tunnels according to the network planning document. The services were normal
even after the static ARP entry was deleted.

Service Interruptions Due to Inconsistent
Bandwidth Planning for TDM Services
and Ethernet Services
Fault Symptoms
 Two OptiX RTN 950 NEs of a version earlier than V100R002C02SPC100 were
interconnected. They carried TDM services and Ethernet services. The microwave link
was correctly configured, but the Ethernet services could not be set up and no alarm
related to microwave links was reported.
Handling Procedure
 (1) Checked the configurations of the two interconnected NEs and found that the number
of E1s was set to different values at the two ends. The data discrepancy caused
inconsistent bandwidths at the two ends and resulted in service interruptions.
 (2) Changed the number of E1s at the two ends to the same value.
Notes
 Hybrid microwave bandwidth is equal to the sum of the TDM service bandwidth and the
Ethernet service bandwidth. For TDM services carried on a microwave link, the number of
E1s must be the same at the two ends. Otherwise, the TDM services cannot be set up.
Besides, if the set E1 bandwidth uses up all microwave bandwidth, Ethernet services will
be interrupted due to absence of bandwidth.
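The two rules in the notes can be checked against the planning data for both ends of a link. The following is a minimal Python sketch. It assumes that each E1 occupies 2048 kbit/s of the Hybrid microwave bandwidth; the function name, the end labels, and the example values are hypothetical.

```python
E1_KBIT = 2048  # nominal E1 line rate in kbit/s (assumed per-E1 cost)


def check_hybrid_link(link_kbit, e1_count_a, e1_count_b):
    """Return a list of configuration problems for one Hybrid microwave link."""
    problems = []
    # Rule 1: the number of E1s must be the same at the two ends.
    if e1_count_a != e1_count_b:
        problems.append("E1 count differs between the two ends")
    # Rule 2: the E1 bandwidth must leave some bandwidth for Ethernet services.
    for end, count in (("A", e1_count_a), ("B", e1_count_b)):
        if link_kbit - count * E1_KBIT <= 0:
            problems.append(f"end {end}: no bandwidth left for Ethernet services")
    return problems


# Hypothetical 40,000 kbit/s link with mismatched E1 counts.
print(check_hybrid_link(40_000, 16, 8))
```

An empty list means that both rules are satisfied for the given values.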

Reference Documents
http://support.huawei.com/support/pages/navigation/gotoKBNavi.do?actionFlag=intoKBNavigation&autoFlag=autoThink&colID=ROOTENWEB|CO0000000173&itemId0=29-2&itemId1=3-400

Reference Documents (Continued)
Maintenance Guide for the OptiX RTN Equipment
http://support.huawei.com/support/pages/kbcenter/view/product.do?actionFlag=detailProductSimple&web_doc_id=SE0000498683&doc_type=123-2
For the preceding documents, please download the latest versions from
support.huawei.com. For any comments or suggestions on the
documents, please send your feedback to Chen Shaoying (employee ID:
59800).
