
LAG

Based on the IEEE 802.1ax standard (formerly 802.3ad), Link Aggregation Groups (LAGs) can
be configured to increase the bandwidth available between two network devices, depending on
the number of links installed. LAG also provides redundancy in the event that one or more links
participating in the LAG fail. All physical links in a given LAG combine to form one logical
interface.
Packet sequencing must be maintained for any given session. The hashing algorithm deployed
by the Alcatel-Lucent routers is based on the type of traffic transported to ensure that all traffic
in a flow remains in sequence while providing effective load sharing across the links in the LAG.
LAGs can be statically configured, or formed dynamically with Link Aggregation Control
Protocol (LACP). The optional marker protocol described in IEEE 802.1ax is not implemented.
LAGs can be configured on network and access ports.
The LAG load sharing is executed in hardware, which provides line rate forwarding for all port
types.
The LAG implementation supports LAGs with all member ports of the same speed, as well as
LAGs with mixed port-speed members (see Mixed Port-Speed LAG Support for details).
The LAG implementation is supported on access and network interfaces.

LACP
Under normal operation, all non-failing links in a given LAG become active and traffic is load
balanced across all active links. In some circumstances, however, this is not desirable. Instead,
it is desired that only some of the links are active (for example, all links on the same IOM) and
the other links be kept in standby condition.
LACP enhancements allow active LAG-member selection based on particular constraints. The
mechanism is based on the IEEE 802.1ax standard, so interoperability is ensured.
To use LACP on a given LAG, an operator must enable LACP on the LAG including, if desired,
selecting a non-default LACP mode (active or passive) and configuring the administrative key to
be used (configure lag lacp). In addition, an operator can configure the desired LACP transmit
interval (configure lag lacp-xmit-interval).
When LACP is enabled, an operator can see LACP changes through traps/log messages
logged against the LAG. See the TIMETRA-LAG-MIB.mib for more details.
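As an illustration, the following minimal sketch enables LACP in active mode on LAG 1 and sets
a fast transmit interval; the administrative-key value is an arbitrary example and exact syntax
may vary by release:

configure lag 1
    lacp active administrative-key 32768
    lacp-xmit-interval fast
    no shutdown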

LACP Multiplexing
The router supports two modes of multiplexing RX/TX control for LACP: coupled and
independent.
In coupled mode (default), both RX and TX are enabled or disabled at the same time whenever
a port is added or removed from a LAG group.
In independent mode, RX is first enabled when a link state is UP. LACP sends an indication to
the far-end that it is ready to receive traffic. Upon the reception of this indication, the far-end
system can enable TX. Therefore, in independent RX/TX control, LACP adds a link into a LAG
only when it detects that the other end is ready to receive traffic. This minimizes traffic loss that
might occur in coupled mode if a port is added into a LAG before notifying the far-end system or
before the far-end system is ready to receive traffic. Similarly, on link removals from LAG, LACP
turns off the distributing and collecting bit and informs the far-end about the state change. This
allows the far-end side to stop sending traffic as soon as possible.
Independent control provides for lossless operation for unicast traffic in most scenarios when
adding new members to a LAG or when removing members from a LAG. It also reduces loss for
multicast and broadcast traffic.
Note that independent and coupled mode are interoperable (connected systems can have either
mode set).

Active-Standby LAG Operation


Active/standby LAG is used to provide redundancy by logically dividing LAG into subgroups.
The LAG is divided into subgroups by either assigning each LAG’s ports to an explicit subgroup
(1 by default), or by automatically grouping all LAG’s ports residing on the same line card into a
unique sub-group (auto-iom) or by automatically grouping all LAG’s ports residing on the same
MDA into a unique sub-group (auto-mda). When a LAG is divided into sub-groups, only a single
sub-group is elected as active. Which sub-group is selected depends on the chosen selection
criteria.
The active/standby decision for LAG member links is a local decision driven by the
preconfigured selection-criteria. When LACP is configured, this decision is communicated to the
remote system using LACP signaling.
To allow non-LACP operation, an operator must disable LACP on a given LAG and select
transmitter-driven standby signaling (configure lag standby-signaling power-off). As a
consequence, the transmit laser is switched off for all LAG members in standby mode. On
switchover (active links failed), the laser is switched on for all standby LAG members so they
can become active.
When power-off is selected as the standby-signaling, the best-port selection criteria can be
used.
It is not possible to have LACP active in power-off mode before the correct selection criteria are
configured.
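The following hedged sketch shows an active/standby LAG using power-off signaling and
explicit sub-groups; the port IDs are placeholders and the exact sub-group syntax may vary by
release:

configure lag 1
    mode access
    standby-signaling power-off
    selection-criteria best-port
    port 1/1/1 sub-group 1
    port 2/1/1 sub-group 2
    no shutdown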
Figure 27 shows how LAG in Active/Standby mode can be deployed towards a DSLAM access
using sub-groups with auto-iom sub-group selection. LAG links are divided into two sub-groups
(one per line card).

Figure 27: Active-Standby LAG Operation Deployment Examples

In case of a link failure, as shown in Figure 28 and Figure 29, the switchover behavior ensures
that all LAG members connected to the same IOM as the failing link become standby, while the
LAG members connected to the other IOM become active. This way, QoS enforcement
constraints are respected, while the maximum number of available links is utilized.

Figure 28: LAG on Access Interconnection

Figure 29: LAG on Access Failure Switchover


LAG on Access QoS Consideration
The following sections describe various QoS-related features applicable to LAG on access.

Adapt QoS Modes


Link Aggregation is supported on the access side with access or hybrid ports. Similar to LAG on
the network side, LAG on access is used to aggregate Ethernet ports into an all-active or
active/standby LAG. The difference from LAG on the network side lies in how the QoS/H-QoS is
handled. Based on the configured hashing, a given SAP's traffic can be sprayed on egress over
multiple LAG ports or can always use a single port of a LAG. There are four user-selectable
modes that allow an operator to best adapt the QoS configured to a LAG the SAPs are using:
1. adapt-qos distribute (default)
In distribute mode, the SLA is divided among all line cards proportionally to the number of ports
that exist on each line card for a given LAG. For example, a 100 Mbps PIR with 2 LAG links
on IOM A and 3 LAG links on IOM B would result in IOM A getting 40 Mbps PIR and IOM B
getting 60 Mbps PIR. Because of this distribution, the SLA can be enforced. The disadvantage is
that a single flow is limited to the IOM's share of the SLA. This mode of operation may also result in
underrun due to a "hash error" (traffic not sprayed equally over each link). This mode is best
suited for services that spray traffic over all links of a LAG.
2. adapt-qos link
In link mode, the SLA is given to each and every port of a LAG. With the example above, each
port would get 100 Mbps PIR. The advantage of this method is that a single flow can now
achieve the full SLA. The disadvantage is that the overall SLA can be exceeded if the flows
span multiple ports. This mode is best suited for services that are guaranteed to hash to a
single egress port.
3. adapt-qos port-fair
Port-fair distributes the SLA across multiple line cards relative to the number of active LAG ports
per card (in a similar way to distribute mode) with all LAG QoS objects parented to scheduler
instances at the physical port level (in a similar way to link mode). This provides a fair
distribution of bandwidth between cards and ports whilst ensuring that the port bandwidth is not
exceeded. Optimal LAG utilization relies on an even hash spraying of traffic to maximize the use
of the schedulers' and ports' bandwidth. With the example above, enabling port-fair would result
in all five ports getting 20 Mbps.
When port-fair mode is enabled, per-Vport hashing is automatically disabled for subscriber
traffic such that traffic sent to the Vport no longer uses the Vport as part of the hashing
algorithm. Any QoS object for subscribers, and any QoS object for SAPs with explicitly
configured hashing to a single egress LAG port, will be given the full bandwidth configured for
each object (in a similar way to link mode). A Vport used together with an egress port scheduler
is supported with a LAG in port-fair mode, whereas it is not supported with a distribute mode
LAG.
4. adapt-qos distribute include-egr-hash-cfg
This mode can be considered a mix of link and distribute modes. The mode uses the configured
hashing for the LAG/SAP/service to choose either link or distribute adapt-qos behavior. The mode
allows:
 SLA enforcement for SAPs that, through configuration, are guaranteed to hash to a single
egress link, using full QoS per port (as per link mode)
 SLA enforcement for SAPs that hash to all LAG links, using proportional distribution of the QoS
SLA amongst the line cards (as per distribute mode)
 SLA enforcement for multi-service sites (MSS) that contain any SAPs regardless of their hash
configuration, using proportional distribution of the QoS SLA amongst the line cards (as per
distribute mode)
The following caveats apply to adapt-qos distribute include-egr-hash-cfg:
 The feature requires chassis mode D.
 LAG mode must be access or hybrid.
 The operator cannot change from adapt-qos distribute include-egr-hash-cfg to adapt-qos
distribute when link-map-profiles or per-link-hash is configured.
 The operator cannot change from adapt-qos link to adapt-qos distribute include-egr-hash-cfg
on a LAG with any configuration.
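As a sketch, the adapt-qos mode is selected in the config>lag>access context; assuming LAG 1
in access mode, the following shows two of the modes (exact syntax may vary by release):

configure lag 1 access adapt-qos port-fair
configure lag 1 access adapt-qos distribute include-egr-hash-cfg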
Table 26 shows examples of rate/BW distributions based on the adapt-qos mode used.

Table 26: Adapt QoS Bandwidth/Rate Distribution

SAP Queues
 distribute: % # local links (Note 1)
 link: 100% rate
 port-fair: 100% rate (SAP hash to one link), or % # all links (Note 2) (SAP hash to all links)
 distribute include-egr-hash-cfg: 100% rate (SAP hash to one link), or % # local links (Note 1)
(SAP hash to all links)

SAP Scheduler
 distribute: % # local links (Note 1)
 link: 100% bandwidth
 port-fair: 100% rate (SAP hash to one link), or % # all links (Note 2) (SAP hash to all links)
 distribute include-egr-hash-cfg: 100% bandwidth (SAP hash to one link), or % # local links
(Note 1) (SAP hash to all links)

SAP MSS Scheduler
 distribute: % # local links (Note 1)
 link: 100% bandwidth
 port-fair: % # local links (Note 1)
 distribute include-egr-hash-cfg: % # local links (Note 1)

Notes:
1. % # local links = X * (number of local LAG members on a given line card / total number of LAG
members)
2. % # all links = X * (link speed) / (total LAG speed)

Per-fp-ing-queuing
Per-fp-ing-queuing optimization for LAG ports provides the ability to reduce the number of
hardware queues assigned on each LAG SAP on ingress when the flag at LAG level is set for
per-fp-ing-queuing.
When the feature is enabled in the config>lag>access context, the queue allocation for SAPs
on a LAG will be optimized and only one queuing set per ingress forwarding path (FP) is
allocated instead of one per port.
The following rules apply when configuring per-fp-ing-queuing at the LAG level:
 To enable per-fp-ing-queuing, the LAG must be in access mode
 The LAG mode cannot be set to network mode when the feature is enabled
 Per-fp-ing-queuing can only be set if no port members exist in the LAG
 Per-fp-ing-queuing cannot be set if the LAG's port-type is hsmda
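A minimal sketch, assuming LAG 1 in access mode with no member ports yet added; exact
syntax may vary by release:

configure lag 1 mode access
configure lag 1 access per-fp-ing-queuing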

Per-fp-egr-queuing
Per-fp-egr-queuing optimization for LAG ports provides the ability to reduce the number of
egress resources consumed by each SAP on a LAG, and by any encap groups that exist on
those SAPs.
When the feature is enabled in the config>lag>access context, the queue and virtual scheduler
allocation will be optimized. Only one queuing set and one H-QoS virtual scheduler tree per
SAP/encap group will be allocated per egress forwarding path (FP) instead of one set per each
port of the LAG. In case of a link failure/recovery, egress traffic uses failover queues while the
queues are moved over to a newly active link.
Per-fp-egr-queuing can be enabled on an existing LAG with services as long as the following
conditions are met:
 The LAG's mode must be access or hybrid.
 The LAG's port-type must be standard.
 The LAG must have either per-link-hash enabled, or all SAPs on the LAG must use per-
service-hashing only and be of a supported type: VPLS SAP, i-VPLS SAP, ePipe VLL, or PBB SAP.
 The system must be, at minimum, in chassis mode d (configure>system>chassis-mode).
To disable per-fp-egr-queuing, all ports must first be removed from a given LAG.

Per-fp-sap-instance
Per-fp-sap-instance optimization for LAG ports provides the ability to reduce the number of SAP
instance resources consumed by each SAP on a LAG.
When the feature is enabled in the config>lag>access context, a single SAP instance is
allocated on ingress and on egress per forwarding path, instead of one per port. Thanks to
this optimized resource allocation, the SAP scale on a line card increases if a LAG has more
than one port on that line card. Because SAP instances are only allocated per forwarding path
complex, hardware reprogramming must take place when, as a result of LAG links going down
or up, a SAP is moved from one LAG port on a given line card to another port on that line card
within the same forwarding complex. This results in an increased data outage compared to the
per-fp-sap-instance feature being disabled. During the reprogramming, failover queues are
used while SAP queues are reprogrammed to a new port. Any traffic using the failover queues is
not accounted for in SAP statistics and is processed at best-effort priority.
The following rules apply when configuring a per-fp-sap-instance on a given LAG:
 Minimum chassis mode D is required.
 Per-fp-ing-queuing and per-fp-egr-queuing must be enabled.
 The functionality can be enabled/disabled only on a LAG with no member ports. Services can
remain configured.
Other caveats:
 SAP instance optimization applies at the LAG level. Whether or not a LAG is sub-divided into
sub-groups, the resources are allocated per forwarding path for all complexes the LAG's links are
configured on (that is, irrespective of whether a given sub-group a SAP is configured on uses that
complex or not).
 Egress statistics continue to be returned per port when SAP instance optimization is enabled. If
all LAG links are on a single forwarding complex, all ports but one will show no change in
statistics for the last interval, unless a SAP moved between ports during the interval.
 Rollback that changes per-fp-sap-instance configuration is service impacting.
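As a hedged sketch, the three per-fp optimizations described above are enabled together in the
config>lag>access context on a LAG that has no member ports yet (per-fp-sap-instance
requires both queuing optimizations); exact keywords may vary by release:

configure lag 1 access per-fp-ing-queuing
configure lag 1 access per-fp-egr-queuing
configure lag 1 access per-fp-sap-instance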

LAG and ECMP Hashing


When a requirement exists to increase the available bandwidth for a logical link that exceeds
the physical bandwidth or add redundancy for a physical link, typically one of two methods is
applied: equal cost multi-path (ECMP) or Link Aggregation (LAG). A system can deploy both at
the same time using ECMP of two or more Link Aggregation Groups (LAG) and/or single links.
Different types of hashing algorithms can be employed to achieve one of the following
objectives:
 ECMP and LAG load balancing should be influenced solely by the offered flow packet. This is
referred to as per-flow hashing.
 ECMP and LAG load balancing should maintain consistent forwarding within a given service.
This is achieved using consistent per-service hashing.
 LAG load balancing should maintain consistent forwarding on egress over a single LAG port for
a specific network interface, SAP, and so on. This is referred to as per-link hashing (including
explicit per-link hashing with LAG link map profiles). Note that if multiple ECMP paths use a LAG
with per-link hashing, the ECMP load balancing is done using either per-flow or consistent
per-service hashing.
These hashing methods are described in the following subsections. Although multiple hashing
options may be configured for a given flow at the same time, only one method will be selected
to hash the traffic based on the following decreasing priority order:
For ECMP load balancing:
1. Consistent per service hashing
2. Per flow hashing
For LAG load balancing:
1. LAG link map profile
2. Per link hash
3. Consistent per service hashing
4. Per flow hashing

Per Flow Hashing


Per flow hashing uses information in a packet as an input to the hash function ensuring that any
given flow maps to the same egress LAG port/ECMP path. Note that because the hash uses
information in the packet, traffic for the same SAP/interface may be sprayed across different
ports of a LAG or different ECMP paths. If this is not desired, other hashing methods outlined in
this section can be used to change that behavior. Depending on the type of traffic that needs to
be distributed into an ECMP and/or LAG, different variables are used as input to the hashing
algorithm that determines the next hop selection. The following outlines default per flow hashing
behavior for those different types of traffic:
 VPLS known unicast traffic is hashed based on the IP source and destination addresses for IP
traffic, or the MAC source and destination addresses for non-IP traffic. The MAC SA/DA are
hashed and then, if the Ethertype is IPv4 or IPv6, the hash is replaced with one based on the IP
source address/destination address.
 VPLS multicast, broadcast and unknown unicast traffic.
 Traffic transmitted on SAPs is not sprayed on a per-frame basis, but instead the service ID is
used to pick ECMP and LAG paths statically.
 Traffic transmitted on SDPs is hashed on a per packet basis in the same way as VPLS unicast
traffic. However, per packet hashing is applicable only to the distribution of traffic over LAG
ports, as the ECMP path is still chosen statically based on the service ID.
Data is hashed twice to get the ECMP path. If LAG and ECMP are performed on the same
frame, the data will be hashed again to get the LAG port (three hashes for LAG). However, if
only LAG is performed, then hashing will only be performed twice to get the LAG port.
 Multicast traffic transmitted on SAPs with IGMP snooping enabled is load-balanced based on
the internal multicast ID, which is unique for every (s,g) record. This way, multicast traffic
pertaining to different streams is distributed across different LAG member ports.
 The hashing procedure that used to be applied for all VPLS BUM traffic would result in PBB
BUM traffic sent out on a BVPLS SAP following only a single link when MMRP was not
used. Therefore, in chassis mode D, traffic flooded out on egress BVPLS SAPs is now load
spread using the algorithm described above for VPLS known unicast.
 Unicast IP traffic routed by a router is hashed using the IP SA/DA in the packet.
 MPLS packet hashing at an LSR is based on the whole label stack, along with the incoming port
and system IP address. Note that the EXP/TTL information in each label is not included in the
hash algorithm. This method is referred to as the Label-Only Hash option; it is enabled by default
and can be re-instated in CLI by entering the lbl-only option. A couple of options to further hash on
the header of an IP packet in the payload of the MPLS packet are also provided.
 VLL traffic from a service access point is not sprayed on a per-packet basis, but as for VPLS
flooded traffic, the service ID is used to pick one of the ECMP/LAG paths. The exception to this
is when shared-queuing is configured on an e-pipe SAP, i-pipe SAP, or f-pipe SAP, or when H-
POL is configured on an e-pipe SAP. In those cases, traffic spraying is the same as for VPLS
known unicast traffic. Packets of the above VLL services received on a spoke-SDP are sprayed
the same as for VPLS known unicast traffic.
 Note that a-pipe and c-pipe VLL packets are always sprayed based on the service-id in both
directions.
 Multicast IP traffic is hashed based on an internal multicast ID, which is unique for every record
similar to VPLS multicast traffic with IGMP snooping enabled.
In addition to the per-flow hashing inputs outlined above, the system supports multiple options to
modify the default hash inputs.
For all cases that involve per-packet hashing, the NPA produces a 20-bit result based on
hashing the relevant packet data. This result is input to a modulo-like calculation (divide by the
number of routes in the ECMP set and use the remainder) to determine the ECMP index. For
example, a 20-bit hash result of 618545 with 4 ECMP routes gives 618545 mod 4 = 1, selecting
the second next-hop.
If the ECMP index results in the selection of a LAG as the next hop, then the hash result is
hashed again and the result of the second hash is input to the modulo-like operation (divide by
the number of ports in the LAG and use the remainder) to determine the LAG port selection.
Note that when the ECMP set includes an IP interface configured on a spoke-SDP (IES/VPRN
spoke interface), or a Routed VPLS interface, the unicast IP packets—which will be sprayed
over this interface—will not be further sprayed over multiple RSVP LSPs (part of the same
SDP), or multiple LDP FEC next-hops when available. In this case, a single RSVP LSP or LDP
FEC next-hop will be selected based on a modulo operation of the service ID. The second
round of the hash is exclusively used for LAG link selection. IP unicast packets from different
IES/VPRN services or Routed VPLS services will be distributed across RSVP LSPs or LDP
FEC next-hops based on the modulo operation of their respective service ID.

Changing Default Per Flow Hashing Inputs


For some traffic patterns or specific deployments, per-flow hashing is desired, but the hashing
result using the default hash inputs as outlined above may not produce the desired distribution.
To alleviate this issue, the system allows operators to modify the default hash inputs as outlined
in the following subsections.

LSR Hashing
The LSR hash routine operates on the label stack only. However, there is also the ability to
hash on the IP header if a packet is IP. An LSR will consider a packet to be IP if the first nibble
following the bottom of the label stack is either 4 (IPv4) or 6 (IPv6). This allows the user to
include an IP header in the hashing routine at an LSR for the purpose of spraying labeled IP
packets over multiple equal cost paths in ECMP in an LDP LSP and/or over multiple links of a
LAG group in all types of LSPs.
The user enables the LSR hashing on label stack and/or IP header by entering the following
system-wide command: config>system>load-balancing>lsr-load-balancing [lbl-only | lbl-
ip | ip-only]
By default, the LSR falls back to the hashing on label stack only. This option is referred to as lbl-
only and the user can revert to this behavior by entering one of the two commands:
config>system>load-balancing>lsr-load-balancing lbl-only
config>system>load-balancing>no lsr-load-balancing
The user can also selectively enable or disable the inclusion of label stack and IP header in the
LSR hash routine on a specific network interface by entering the following command:
config>router>interface>load-balancing>lsr-load-balancing [lbl-only | lbl-ip | ip-only]
This provides some control to the user such that this feature is disabled if labeled packets
received on a specific interface include non IP packets that can be confused by the hash routine
for IP packets. These could be VLL and VPLS packets without a PW control word.
When the user performs the no form of this command on an interface, the interface inherits the
system level configuration.
The default lbl-only hash option, and the lbl-ip option with IPv4 payload, are supported on all
platforms and chassis modes. The ip-only option with both IPv4 and IPv6 payloads, as well as
the lbl-ip option with IPv6 payload, are only supported on IP interfaces on IOM3/IMM ports.
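For example, the commands above can be combined to hash on the label stack plus IPv4
header system-wide while forcing IP-only hashing on one interface; the interface name "to-p1"
is a hypothetical placeholder:

configure system load-balancing lsr-load-balancing lbl-ip
configure router interface "to-p1" load-balancing lsr-load-balancing ip-only

Entering no lsr-load-balancing under the interface load-balancing context would make the
interface inherit the system-level lbl-ip setting again.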

LSR Default Hash Routine—Label-Only Hash Option


The following is the behavior of ECMP and LAG hashing at an LSR in the existing
implementation. These are performed in two rounds.
First the ECMP hash. It consists of an initial hash based on the source port/system IP address.
Each label in the stack is then hashed separately with the result of the previous hash, up to a
maximum of five labels. The net result will be used to select which LDP FEC next-hop to send
the packet to using a modulo operation of the net result with the number of next-hops. If there is
a single next-hop for the LDP FEC, or if the packet is received on an RSVP LSP ILM, then a
single next-hop exists.
This same net result will feed to a second round of hashing if there is LAG on the egress port
where the selected LDP or RSVP LSP has its NHLFE programmed.

LSR Label-IP Hash Option Enabled


In the first hash round for ECMP, the algorithm will parse down the label stack and once it hits
the bottom it checks the next nibble. If the nibble value is 4 then it will assume it is an IPv4
packet. If the nibble value is 6 then it will assume it is an IPv6 packet. In both cases, the result
of the label hash is fed into another hash along with source and destination address fields in the
IP packet header. Otherwise, it will just use the label stack hash already calculated for the
ECMP path selection.
If there are more than five labels in the stack, then the algorithm will also use the result of the
label hash for the ECMP path selection.
The second round of hashing for LAG re-uses the net result of the first round of hashing. This
means IPv6 packets will continue to be hashed on label stack only.

LSR IP-Only Hash Option Enabled


This option behaves like the label-IP hash option except that when the algorithm reaches the
bottom of the label stack in the ECMP round and finds an IP packet, it discards the outcome of
the label hash and uses only the source and destination address fields in the IP packet’s
header.

LSR Ethernet Encapsulated IP Hash Only Option Enabled

This option behaves like LSR IP only hash except for how the IP SA/DA information is found.
The following conditions are verified to find the IP SA/DA for the hash:
 The label stack must not exceed 3 labels deep.
 After the bottom of the stack is reached, the hash algorithm verifies that what follows is an
untagged Ethernet II frame, by checking that the Ethertype at the expected packet location
contains the Ethernet-encapsulated IPv4 (0x0800) or IPv6 (0x86DD) value.
When the Ethertype verification passes, the first nibble of the expected IP packet location is
then verified to be 4 (IPv4) or 6 (IPv6).
L4 Load Balancing
An operator may enable L4 load balancing to include TCP/UDP source/destination port numbers,
in addition to source/destination IP addresses, in per-flow hashing of IP packets. By including the
L4 information, an SA/DA default hash flow can be sub-divided into multiple finer-granularity
flows if the ports used between a given SA/DA vary.
L4 load balancing can be enabled/disabled on system and interface levels. When enabled, the
extra L4 port inputs apply to per-flow hashing for unicast IP traffic and multicast traffic (if mc-
enh-load-balancing is enabled).
System IP Load Balancing
This enhancement adds an option to include the system IP address in the hash algorithm. This
adds a per-system variable so that traffic being forwarded through multiple routers with similar
ECMP paths has a lower chance of always using the same path to a given destination.
Without this option, if multiple routers have the same set of ECMP next hops, traffic uses the same
next hop at every router hop. This can contribute to unbalanced utilization of links. The new
hash option avoids this issue.
This feature, when enabled, enhances the default per-flow hashing algorithm described earlier. It
does not, however, apply to services whose packets are hashed based on service ID or when
consistent per-service hashing is enabled. This hash algorithm is only supported on IOM3-
XPs/IMMs or later generations of hardware. System IP load balancing can be enabled per
system only.

TEID Hash for GTP-Encapsulated Traffic


This option enables TEID hashing on L3 interfaces. The hash algorithm identifies GTP-C or
GTP-U by looking at the UDP destination port (2123 or 2152) of an IP packet to be hashed. If
the value of the port matches, the packet is assumed to be GTP-U/C. For GTPv1 packets, the
TEID value from the expected header location is then included in the hash. For GTPv2 packets,
the TEID flag value in the expected header is additionally checked to verify whether a TEID is
present. If the TEID is present, it is included in the hash algorithm inputs. The TEID is used in
addition to the GTP tunnel IP hash inputs: SA/DA and SPort/DPort (if L4 load balancing is
enabled). If a non-GTP packet is received on the above GTP UDP ports, the packet is still
hashed as GTP.
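A hedged sketch of enabling these extra per-flow hash inputs follows; l4-load-balancing is
named in this guide, while the system-ip-load-balancing and teid-load-balancing keywords are
assumptions derived from the feature names, and the interface name is a placeholder:

configure system load-balancing l4-load-balancing
configure system load-balancing system-ip-load-balancing
configure router interface "to-sgw" load-balancing teid-load-balancing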

Source-Only/Destination-Only Hash Inputs


This option allows an operator to include only source parameters or only destination parameters
in the hash for inputs that have a source/destination context (such as IP address and L4 port).
Parameters that do not have a source/destination context (such as TEID or system IP) are still
included in the hash as per the applicable hash configuration. Among other things, the
functionality makes it possible to ensure that both upstream and downstream traffic hash to the
same ECMP path/LAG port on system egress when traffic is sent to a hair-pinned appliance (by
configuring source-only hash for incoming traffic on upstream interfaces and destination-only
hash for incoming traffic on downstream interfaces).

Enhanced Multicast Load Balancing


Enhanced multicast load balancing allows operators to replace the default multicast per-flow
hash input (internal multicast ID) with information from the packet. When enabled, multicast
traffic for Layer 3 services (such as IES, VPRN, r-VPLS) and ng-MVPN (multicast inside RSVP-
TE, LDP LSPs) is hashed using information from the packet. Which inputs are chosen
depends on which per-flow hash input options are enabled, based on the following:
 IP replication—The hash algorithm for multicast mimics the unicast hash algorithm using SA/DA
by default and optionally TCP/UDP ports (Layer 4 load balancing enabled) and/or system IP
(System IP load balancing enabled) and/or source/destination parameters only (Source-
only/Destination-only hash inputs).
 MPLS replication—The hash algorithm for multicast mimics the unicast hash algorithm
described in the LSR Hashing section.
Note:
Enhanced multicast load balancing requires minimum chassis mode D. It is not supported
with Layer 2 and ESM services. It is supported on all platforms except for the 7750 SR-c4
and SR-c12 and the 7450 ESS in standard mode.

Security Parameter Index (SPI) Load Balancing


IPSec tunneled traffic transported over LAG typically falls back to IP header hashing only. For
example, in LTE deployments, TEID hashing cannot be performed because of encryption, and
the system performs IP-only tunnel-level hashing. Because each SPI in the IPSec header
identifies a unique SA, and thus flow, these flows can be hashed individually without impacting
packet ordering. In this way, SPI load balancing provides a mechanism to improve the hashing
performance of IPSec encrypted traffic.
The system allows enabling SPI hashing per L3 interface (this is the incoming interface for the
hash on system egress) or per L2 VPLS service. When enabled, the SPI value from the ESP/AH
header is used in addition to any other IP hash inputs based on the per-flow hash configuration:
source/destination IP addresses, and L4 source/destination ports in case NAT traversal is
required (l4-load-balancing is enabled). If the ESP/AH header is not present in a packet received
on a given interface, the SPI is not part of the hash inputs, and the packet is hashed as per the
other hashing configurations. SPI hashing is not used for fragmented traffic, to ensure first and
subsequent fragments use the same hash inputs.
SPI hashing is supported for IPv4 and IPv6 tunnel unicast traffic and for multicast traffic (mc-
enh-load-balancing must be enabled) on all platforms and requires L3 interfaces or VPLS
service interfaces with SPI hashing enabled to reside on IOM3-XP or newer line-cards.
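A hedged sketch follows; the spi-load-balancing keyword is an assumption derived from the
feature name, and the interface name and service ID are placeholders:

configure router interface "to-sec-gw" load-balancing spi-load-balancing
configure service vpls 100 load-balancing spi-load-balancing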

Per Link Hashing


The hashing feature described in this section applies to traffic going over LAG and MC-LAG.
Per-link hashing ensures all data traffic on a given SAP or network interface uses a single LAG
port on egress. Because all traffic for a given SAP/network interface egresses over a single
port, QoS SLA enforcement for that SAP or network interface is no longer impacted by the LAG
property of distributing traffic over multiple links. Internally generated, unique IDs are
used to distribute SAPs/network interfaces over all active LAG ports. As ports go UP and DOWN,
each SAP and network interface is automatically rehashed so all active LAG ports are always
used.
The feature is best suited for deployments when SAPs/network interfaces on a given LAG have
statistically similar BW requirements (since per SAP/network interface hash is used). If more
control is required over which LAG ports SAPs/network interfaces egress on, a LAG link map
profile feature described later in this guide may be used.
Per-link hashing can be enabled on a LAG (see the sketch after this list) as long as the following
conditions are met:
 LAG port-type must be standard.
 LAG access adapt-qos must be link or port-fair (for LAGs in mode access or hybrid).
 The system must be at minimum in chassis mode d (configure system chassis-mode).
 Alternatively to the adapt-qos condition above, the LAG mode is access/hybrid and the access
adapt-qos mode is distribute include-egr-hash-cfg.
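A minimal sketch, assuming LAG 1 in access mode; exact syntax may vary by release:

configure lag 1 access adapt-qos link
configure lag 1 per-link-hash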

Weighted per-link-hash
Weighted per-link-hash allows finer control in the distribution of SAPs/interfaces/subscribers
across LAG links when significant differences in SAP/interface/subscriber bandwidth
requirements could lead to an unbalanced distribution of bandwidth utilization over the LAG
egress. The feature allows operators to configure, for each SAP/interface/subscriber on a LAG,
one of three unique classes and a weight value to be used when hashing this service/subscriber
across the LAG links. SAPs/interfaces/subscribers are hashed to LAG links such that, within
each class, the total weight of all SAPs/interfaces/subscribers on each LAG link is as close as
possible to the others.
Multiple classes allow grouping of SAPs/interfaces/subscribers by similar bandwidth class/type.
For example, classes can represent: voice (negligible bandwidth), Broadband (10 to 100 Mbps),
and Extreme Broadband (300 Mbps and above) types of service. If a class and weight are not
specified for a given service or subscriber, default values of 1 and 1 are used, respectively.
The following algorithm is used to hash SAPs/interfaces/subscribers to LAG egress links:
 TPSDA subscribers are hashed to a LAG link when subscribers become active; MSE
SAPs/interfaces are hashed to a LAG link when configured
 For a new SAP/interface/subscriber to be hashed to an egress LAG link:
 Select active link with the smallest current weight for the SAP/network/subscriber class
 On a LAG link failure:
 Only SAPs/interfaces/subscribers on a failed link are rehashed over the remaining active links
 Processing order: Per class from lowest numerical, within each class per weight from highest
numerical value
 LAG link recovery/new link added to a LAG:
 auto-rebalance disabled: Existing SAPs/interfaces/subscribers remain on the currently active
links; new SAPs/interfaces/subscribers naturally prefer the new link until balance is reached.
 auto-rebalance enabled: When a new port is added to a LAG, a non-configurable 5-second
rebalance timer is started. Upon timer expiry, all existing SAPs/interfaces/subscribers are
rebalanced across all active LAG links, minimizing the number of SAPs/interfaces/subscribers
moved to achieve rebalance. The rebalance timer is restarted if a new link is added while the
timer is running. If a port bounces 5 times within a 5-second interval, the port is quarantined
for 10 seconds. This behavior is not configurable.
 On a LAG start-up, the rebalance timer is always started irrespective of auto-rebalance
configuration to avoid hashing SAPs/interfaces/subscribers to a LAG before ports have a
chance to come UP.
 Weights for network interfaces are separated from weights for access
SAPs/interfaces/subscribers.
 On a mixed-speed LAG, link selection is made with link speeds factoring into the overall weight
for the same class of traffic. This means that higher-speed links will be preferred over lower-
speed links.
Optionally, an operator can use the tools perform lag load-balance command to manually
rebalance ALL weighted per-link-hashed SAPs/interfaces/subscribers on a LAG. The rebalance
follows the same algorithm as used on a link failure, moving SAPs/interfaces/subscribers to
different LAG links so as to minimize the number of SAPs/interfaces/subscribers impacted.
Along with the caveats for standard per-link hashing, the following caveats exist:
 When weighted per-link-hash is deployed on a given LAG, no other methods of hash for
subscribers/SAPs/interfaces on that LAG (like service hash or LAG link map profile) should be
deployed, since the weighted hash is not able to account for loads placed on LAG links by
subscriber/SAPs/interfaces using the other hash methods.
 For the TPSDA model only the 1:1 (subscriber to SAP) model is supported.
This feature will not operate properly if the above conditions are not met.
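A hedged sketch follows; the LAG-level weighted option with auto-rebalance reflects the
behavior described above, while the SAP-level class/weight command is an assumed form and
the service and SAP IDs are placeholders:

configure lag 1 per-link-hash weighted auto-rebalance
configure service vpls 100 sap lag-1:50 lag-per-link-hash class 2 weight 30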

Explicit Per Link Hash Using LAG Link Mapping Profiles

The hashing feature described in this section applies to traffic going over LAG and MC-LAG.
The LAG link mapping profile feature gives operators full control of which links SAPs/network
interfaces use on LAG egress and how the traffic is rehashed on a LAG link failure. Some
benefits of this functionality include:
 The ability to perform management-level admission control onto LAG ports, thus increasing
overall LAG BW utilization and controlling LAG behavior on a port failure.
 The ability to strictly enforce a QoS contract on egress for a SAP/network interface, or a group
of SAPs/network interfaces, by forcing it/them to egress over a single port and using access
adapt-qos link or port-fair mode.
To enable the LAG link mapping profile feature on a given LAG, operators configure one or more
of the available LAG link mapping profiles on the LAG and then assign the profile(s) to all or a
subset of SAPs and network interfaces as needed. Enabling a LAG link mapping profile is
allowed on a LAG with services configured; a small outage may take place as a result of re-
hashing the SAP/network interface when a LAG profile is assigned to it.
Each LAG link mapping profile allows operators to configure:
 Primary link—defines a port of the LAG to be used by a SAP/network interface when the port is
UP. Note that a port cannot be removed from a LAG if it is part of any LAG link profile.
 Secondary link—defines a port of the LAG to be used by a SAP/network interface as a backup
when the primary link is not available (not configured or down) and the secondary link is UP.
 Mode of operation when neither primary, nor secondary links are available (not configured or
down):
 discard – traffic for a given SAP/network interface will be dropped to protect other
SAPs/network interfaces from being impacted by re-hashing these SAPs/network interfaces
over remaining active LAG ports.
Note:
SAP/network interface status will not be affected when primary and secondary links are
unavailable, unless an OAM mechanism that follows the data path hashing on egress is
used and will cause a SAP/network interface to go down.

 per-link-hash – traffic for a given SAP/network interface is re-hashed over the remaining
active ports of the LAG using the per-link-hashing algorithm. This behavior ensures
SAPs/network interfaces using this profile are given the available resources of other active LAG
ports, even if that means impacting other SAPs/network interfaces on the LAG. The system uses
the QoS configuration to provide fairness and priority if congestion is caused by the default-
hash recovery.
LAG link mapping profiles can be enabled on a LAG as long as the following conditions are
met:
 LAG port-type must be standard.
 LAG access adapt-qos must be link or port-fair (for LAGs in mode access or hybrid).
 All ports of a LAG on a given router must belong to a single sub-group.
 The system must be at minimum in chassis mode d (configure system chassis-mode).
 Alternatively to the adapt-qos condition above, the access adapt-qos mode is distribute
include-egr-hash-cfg.
A LAG link mapping profile can co-exist with any other hashing used over a given LAG (for
example, per-flow hashing or per-link hashing). SAPs/network interfaces that have no link
mapping profile configured are subject to regular LAG hashing, while SAPs/network interfaces
that have a LAG profile assigned are subject to the LAG link mapping behavior described
above.
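A hedged sketch of a LAG link mapping profile and its assignment to a SAP follows; the port,
profile, and service IDs are placeholders and exact syntax may vary by release:

configure lag 1
    link-map-profile 1
        link 1/1/1 primary
        link 1/1/2 secondary
        failure-mode discard
configure service epipe 10 sap lag-1:100 lag-link-map-profile 1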

Consistent Per Service Hashing


The hashing feature described in this section applies to traffic going over LAG, Ethernet tunnels
(eth-tunnel) in loadsharing mode, or CCAG load balancing for VSM redundancy. The feature
does not apply to ECMP.
Per-service-hashing was introduced to ensure consistent forwarding of packets belonging to
one service. The feature can be enabled using the [no] per-service-hashing configuration
option under config>service>epipe and config>service>vpls, valid for Epipe, VPLS, PBB
Epipe, I-VPLS and B-VPLS. Chassis mode D is required for the 7450 ESS and 7750 SR.
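As a sketch, per-service-hashing is enabled directly under the service (in some releases the
option resides in a load-balancing sub-context); the service IDs are placeholders:

configure service vpls 100 per-service-hashing
configure service epipe 10 per-service-hashing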
The following behavior applies to the usage of the [no] per-service-hashing option.
 The setting of the PBB Epipe/I-VPLS children dictates the hashing behavior of the traffic
destined to or sourced from an Epipe/I-VPLS endpoint (PW/SAP).
 The setting of the B-VPLS parent dictates the hashing behavior only for transit traffic through
the B-VPLS instance (not destined to or sourced from a local I-VPLS/Epipe children).
The following algorithm describes the hash-key used for hashing when the new option is
enabled:
 If the packet is PBB encapsulated (contains an I-TAG ethertype) at the ingress side and enters
a B-VPLS service, use the ISID value from the I-TAG. For PBB encapsulated traffic entering
other service types, use the related service ID.
 If the packet is not PBB encapsulated at the ingress side
 For regular (non-PBB) VPLS and EPIPE services, use the related service ID
 If the packet is originated from an ingress IVPLS or PBB Epipe SAP
 If there is an ISID configured use the related ISID value
 If there is no ISID configured use the related service ID
 For BVPLS transit traffic use the related flood list id
 Transit traffic is the traffic going between BVPLS endpoints
 An example of non-PBB transit traffic in BVPLS is the OAM traffic
 The above rules apply regardless of traffic type
 Unicast, BUM flooded without MMRP or with MMRP, IGMP snooped
Operators may sometimes require the capability to query the system for the link in a LAG or
Ethernet tunnel that is currently assigned to a given service-id or ISID. This capability is
provided using the tools>dump>map-to-phy-port {ccag ccag-id | lag lag-id | eth-
tunnel tunnel-index} {isid isid [end-isid isid] | service service-id | svc-name [end-
service service-id | svc-name]} [summary] command.
A sample usage is as follows:
A:Dut-B# tools dump map-to-phy-port lag 11 service 1
ServiceId  ServiceName  ServiceType  Hashing                  Physical Link
---------- ------------ ------------ ------------------------ -------------
1                       i-vpls       per-service(if enabled)  3/2/8

A:Dut-B# tools dump map-to-phy-port lag 11 isid 1
ISID     Hashing                  Physical Link
-------- ------------------------ -------------
1        per-service(if enabled)  3/2/8

A:Dut-B# tools dump map-to-phy-port lag 11 isid 1 end-isid 4
ISID     Hashing                  Physical Link
-------- ------------------------ -------------
1        per-service(if enabled)  3/2/8
2        per-service(if enabled)  3/2/7
3        per-service(if enabled)  1/2/2
4        per-service(if enabled)  1/2/3

ESM – LAG Hashing per Vport


Background
A Vport is a router BNG representation of a remote traffic aggregation point in the access network.
It is a level in the hierarchical QoS model implemented within the BNG that requires QoS
treatment.
When the BNG is connected to the access network via LAG, a Vport construct within the BNG is
instantiated per member link of that LAG. Each instance of the Vport in such a configuration
receives the entire amount of configured bandwidth. When traffic is sprayed in a per-subscriber
fashion over member links in a LAG without awareness of the Vport, it can lead to packet
drops on one member link irrespective of the relative traffic priority on another LAG member link
in the same Vport. The reason is that the multiple instances of the same Vport on different
LAG member links are not aware of each other.
With a small number of subscribers per Vport and a great variation in bandwidth service offerings
per subscriber (from Mbps to Gbps), there is a great chance that the load distribution between
the member links will be heavily unbalanced. For example, if the LAG consists of two member
links on the same IOM, three 1 Gbps high-priority subscribers can saturate the 2 Gbps Vport
bandwidth on one member link of the LAG, while twenty low-priority 10 Mbps subscribers using
the other link significantly under-utilize the available bandwidth on the corresponding Vport.
To remedy this situation, all traffic flowing through the same Vport must be hashed to a single
LAG member link. This way, the traffic treatment is controlled by a single Vport instance,
achieving the desired behavior where traffic from the low-priority 10 Mbps subscribers is affected
before any traffic from the high-priority subscribers.

Hashing per Vport


Hashing traffic per Vport ensures that traffic on the same PON (or DSLAM) traverses the
same Vport, and therefore the same member link with which this Vport is associated. The
instances of the same Vport on other member links are irrelevant for QoS treatment.
The Vport on Alcatel-Lucent routers is referenced via the inter-dest-string, which can be returned
via RADIUS. For this reason, the terms hashing per inter-dest-string and hashing per Vport can
be used interchangeably.
If the subscriber is associated with a Vport, hashing is automatically performed per inter-
dest-string. In case no such association exists, hashing defaults to per-subscriber
hashing.
In certain cases, the S-VLAN tag can represent the Vport. In such a case, per-S-VLAN hashing is
desired. This can be implicitly achieved by the following configuration:
configure
    subscr-mgmt
        msap-policy <name>
            sub-sla-mgmt
                def-inter-dest-id use-top-queue

configure
    port <port-id>
        ethernet
            access
                egress
                    vport <name>
                        host-match dest <s-tag>

Through this CLI hierarchy, the S-tag is implicitly associated with the inter-dest-string and
consequently with the Vport.

Link Placement
This feature requires that all active member ports in a LAG reside on the same forwarding
complex (IOM/IMM).

Multicast Consideration
Multicast traffic that is directly replicated per subscriber follows the same hashing algorithm as
the rest of the subscriber traffic (per inter-dest-string hashing).
Multicast traffic that is redirected to a regular Layer 3 interface outside of ESM is hashed
per destination group (or IP address).

VPLS and Capture SAP Considerations


A VPLS environment in conjunction with ESM allows hashing based on the destination MAC
address. This is achieved through the following CLI hierarchy:
configure
    service vpls <vpls-id>
        sap lag-<id>
            sub-sla-mgmt
                mac-da-hashing

Note that this is only applicable to L2 ESM. In the case where this is configured and Vport
hashing is required, the following order of evaluation is applied:
1. Hashing based on subscriber-id or inter-dest-string
2. If configured, mac-da-hashing
Hashing per inter-dest-string wins if a <Vport, subscriber> association is available at the
same time as mac-da-hashing is configured.
Mac-da-hashing cannot transition from a capture SAP to a derived MSAP.


LAG Hold Down Timers


Operators can configure multiple hold down timers that control how quickly a LAG responds
to operational port state changes. The following timers are supported:
1. Port-level hold-time up/down timer. This optional timer allows an operator to control the delay
for adding/removing a port to/from the LAG when the port comes UP/goes DOWN. Each LAG port
runs the same value of the timer, configured on the primary LAG link. See the Port Link Dampening
description in the Port Features section of this guide for more details on this timer.
2. Sub-group-level hold-time timer. This optional timer allows an operator to control the delay for
a switch to a new candidate sub-group selected by the LAG sub-group selection algorithm from
the current, operationally UP sub-group. The timer can also be configured to never expire, which
prevents a switch from the operationally UP sub-group to a new candidate sub-group (manual
switchover is possible using the tools perform force lag command). Note that, if port link
dampening is deployed, the port-level timer must expire before the sub-group selection takes
place and this timer is started. The sub-group-level hold-down timer is supported only with LAGs
running LACP.
3. LAG-level hold-time down timer. This optional timer allows an operator to control the delay for
declaring a LAG operationally down when the available links fall below the required port/BW
minimum. The timer is recommended for LAGs connecting to MC-LAG systems. The timer
prevents the LAG from going down when an MC-LAG switchover executes a break-before-make
switch. Note that, if port link dampening is deployed, the port-level timer must expire before the
LAG operational status is processed and this timer is started.
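A hedged sketch of the port-level and LAG-level timers follows; the values and units are
illustrative assumptions (refer to the command reference for exact ranges):

configure port 1/1/1 ethernet hold-time up 10 down 5
configure lag 1 hold-time down 10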

BFD over LAG Links


The router supports the application of BFD to monitor individual LAG link members to speed up
the detection of link failures. When BFD is associated with an Ethernet LAG, BFD sessions are
set up over each link member; these are referred to as micro-BFD sessions. A link is not
operational in the associated LAG until the associated micro-BFD session is fully established. In
addition, the link member is removed from the operational state in the LAG if the BFD session
fails.
When configuring the local and remote IP address for the BFD over LAG link sessions,
the local-ip parameter should always match an IP address associated with the IP interface to
which this LAG is bound. In addition, the remote-ip parameter should match an IP address on
the remote system and should also be in the same subnet as the local-ip address. If the LAG
bundle is re-associated with a different IP interface, the local-ip and remote-ip parameters
should be modified to match the new IP subnet. The local-ip and remote-ip values do not have
to match in the case of hybrid mode, q-tag or QInQ tagging.
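A hedged sketch of micro-BFD configuration on a LAG follows; the addresses are placeholders
and must match the IP interface to which the LAG is bound, as described above:

configure lag 1
    bfd
        family ipv4
            local-ip-address 10.10.10.1
            remote-ip-address 10.10.10.2
            no shutdown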

Mixed Port-Speed LAG Support


Alcatel-Lucent routers support mixing member ports of different speeds in a single LAG. The LAG
must be configured explicitly to allow mixed port-speed operation through the port-weight-speed
command. The port-weight-speed defines both the lowest port speed for a member port in that
LAG and the type of higher speed ports allowed to be mixed in the same LAG. For example,
port-weight-speed 10 defines a minimum member port speed of 10GE and allows the addition of
any port whose speed is a multiple of 10GE, as long as the mix is supported by a
given release (refer to the specific Release Notes). Any LAG can be configured to support mixed
port-speed operation.
For mixed port-speed LAGs:
 Both LACP and non-LACP configurations are supported. With LACP enabled, LACP is unaware
of physical port differences.
 QoS is distributed proportionally to port-speed, unless explicitly configured not to do so (see
internal-scheduler-weight-mode)
 User data traffic is hashed proportionally to port speed when any per-flow hash is deployed.
 CPM-originated OAM control traffic that requires per LAG hashing is hashed per physical port.
 Alcatel-Lucent recommends that operators use weight-threshold instead of port-threshold to
control LAG operational status. For example, when 10GE and 100GE ports are mixed in a LAG,
each 10GE port will have a weight of 1, while each 100GE port will have a weight of 10.
Note that the weight-threshold can also be used for LAGs not in mixed port-speed mode to
allow common operational model (each port has a weight of 1 to mimic port-threshold and
related configuration).
 Alcatel-Lucent recommends that operators use weight-based thresholds for other system
configurations that react to operational change of LAG member ports, like MCAC (see use-lag-
port-weight) and VRRP (see weight-down).
 When sub-groups are used, the following behavior should be noted for selection criteria:
 highest-count – continues to operate on physical link counts. Therefore, a sub-group with lower
speed links will be selected even if its total bandwidth is lower. For example: a 4 * 10GE
subgroup will be selected over a 100GE + 1 GE sub-group).
 highest-weight – continues to operate on operator-configured priorities. Therefore, it is expected
that configured weights take into account the proportional bandwidth difference between
member ports to achieve the desired behavior. For example, to favor sub-groups with higher
bandwidth capacity but lower link count in a 10GE/100GE LAG, 100GE ports need to have their
priority set to a value that is at least 10 times that of the 10GE ports priority value.
 best-port – continues to operate on operator-configured priorities. Therefore, it is expected that
the configured weights will take into account proportional bandwidth difference between
member ports to achieve the desired behavior.
Operators can add higher speed member ports to an existing LAG in service when all ports of
the LAG have the speed selected by port-weight-speed, or when port-weight-speed is
disabled (non-mixed port-speed operation). To do so, first the port-based thresholds related to that
LAG should be switched to weight-based thresholds, and then port-weight-speed should be set
to the port speed of the existing member ports. After that, operators can add higher speed ports,
adjusting the weight-based thresholds as required.
Similarly, operators can disable mixed port-speed operation in service if all ports have the same
port speed and port-weight-speed equals the member ports’ speed. Note that the weight-based
thresholds may remain in use for the LAG.
Feature limitations:
 requires chassis mode D
 supported on network, access, and hybrid mode LAGs, including MC-LAG
 supported for standard-port LAGs and on 10GE WAN/100GE LAN port combinations
 PIM lag-usage-optimization is not supported and must not be configured
 LAG member links must have the default configuration for config port ethernet egress-
rate/ingress-rate
 not supported on the 7450 ESS-6V
 not supported for ESM
 not supported with weighted per-link-hash
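A hedged sketch of enabling mixed port-speed operation on a 10GE/100GE LAG follows; the
weight-threshold value is an illustrative assumption (10GE ports weigh 1, 100GE ports
weigh 10):

configure lag 1 port-weight-speed 10
configure lag 1 weight-threshold 12 action down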

LAG Upgrade
Migrating a LAG to higher speed links involves using mixed-speed LAGs initially and later
removing the lower speed links. A consequence, however, is that the lower speed links in the
mixed-speed LAG set the member link limit: even after all lower speed links are removed, the
higher speed links maintain a higher weight, and this limits how many physical links a mixed
port-speed LAG can include.
LAG upgrade support allows migration from 1GE to 10GE to 40/100GE without removing all the
ports from the LAG.
LAG upgrade support requires turning on mixed-speed LAG and adding higher speed links to an existing LAG. Once the lower speed links are removed, the no port-weight-speed command is used to turn off mixed-speed LAG and to re-calibrate the number of logical links. Figure 30 illustrates the steps in this scenario.

Figure 30: LAG Upgrade (Mixed Speed LAGs)

For example, if each 10GE or 100GE port was counted as 10 links while in mixed-speed mode, it is converted to one link per port once all the ports in the LAG are the same speed.
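A minimal sketch of this final step, assuming LAG 1 from the earlier example now contains only 100GE members:

configure lag 1
    no port-weight-speed    # leave mixed-speed mode; each remaining port counts as one link again
exit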
Multi-Chassis LAG
This section describes the Multi-Chassis LAG (MC-LAG) concept. MC-LAG is an extension of a
LAG concept that provides node-level redundancy in addition to link-level redundancy provided
by “regular LAG”.
Typically, MC-LAG is deployed in a network-wide scenario providing a redundant connection between different end points. The whole scenario is then built by a combination of different mechanisms (for example, MC-LAG and redundant pseudowire to provide an e2e redundant p2p connection, or dual homing of DSLAMs in a Layer 2/3 TPSDA).
Overview
Multi-chassis LAG is a method of providing redundant Layer 2/3 access connectivity that
extends beyond link level protection by allowing two systems to share a common LAG end
point.
The multi-service access node (MSAN) is connected with multiple links towards a redundant pair of Layer 2/3 aggregation nodes such that both link and node level redundancy are provided. By using a multi-chassis LAG protocol, the paired Layer 2/3 aggregation nodes (referred to as a redundant-pair) appear to be a single node utilizing LACP towards the access node. The multi-chassis LAG protocol between the redundant-pair ensures a synchronized forwarding plane to/from the access node and is used to synchronize the link state information between the redundant-pair nodes such that proper LACP messaging is provided to the access node from both redundant-pair nodes.
In order to ensure SLAs and deterministic forwarding characteristics between the access and the redundant-pair nodes, the multi-chassis LAG function provides an active/standby operation towards/from the access node. LACP is used to manage the available LAG links into active and standby states such that only links from one aggregation node are active at a time to/from the access node.
Alternatively, when the access node does not support LACP, the power-off option can be used to enforce active/standby operation. In this case, the standby ports are trx_disabled (transmitter powered off) to prevent usage of the LAG members by the access node.
Characteristics related to MC-LAG are listed below; a configuration sketch follows the list:
 Selection of a common system ID, system-priority, and administrative-key to be used in LACP messages, so partner systems consider all links as part of the same LAG.
 Extension of the selection algorithm to allow selection of the active sub-group.
 The sub-group definition in the LAG context is still local to the single box, meaning that even if sub-groups configured on two different systems have the same sub-group-id, they are still considered two separate sub-groups within a given LAG.
 Multiple sub-groups per PE in an MC-LAG are supported.
 In case there is a tie in the selection algorithm (for example, two sub-groups with identical aggregate weight or number of active links), the sub-group local to the system with the lower system LACP priority and LAG system ID is taken.
 An inter-chassis communication channel allows the two systems to coordinate LACP. This communication channel enables the following:
 Supports connections at the IP level which do not require a direct link between two nodes. The
IP address configured at the neighbor system is one of the addresses of the system (interface
or loop-back IP address).
 The communication protocol provides a heartbeat mechanism to enhance the robustness of MC-LAG operation and to detect node failures.
 Support for operator actions on any node that force an operational change.
 The LAG group-ids do not have to match between neighbor systems. At the same time, there
can be multiple LAG groups between the same pair of neighbors.
 Verification that the physical characteristics, such as speed and auto-negotiation, are configured consistently, with operator notifications (traps) raised if errors exist. Consistency of the MC-LAG configuration (system-id, administrative-key, and system-priority) is verified. Similarly, the load-balancing mode of operation must be consistently configured on both nodes.
 Traffic over the signaling link is encrypted using a user configurable message digest key.
 The MC-LAG function provides active/standby status to other software applications in order to build reliable solutions.
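The following sketch shows how the inter-chassis channel and a multi-chassis LAG could be configured on one node of a redundant-pair. The peer address, key, and identifiers are placeholders; the same lacp-key, system-id, and system-priority values must be configured on both nodes.

configure redundancy multi-chassis
    peer 10.10.10.2 create
        authentication-key "mc-lag-secret"    # protects the inter-chassis signaling link
        mc-lag
            lag 1 lacp-key 32666 system-id 00:00:00:33:33:33 system-priority 32888
            no shutdown
        exit
        no shutdown
    exit
exit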
Figure 31 and Figure 32 show the different combinations of MC-LAG attachments that are supported. The supported configurations can be sub-divided into the following sub-groups:
 Dual-homing to remote PE pairs
 both end-points attached with MC-LAG
 one end-point attached
 Dual-homing to local PE pair
 both end-points attached with MC-LAG
 one end-point attached with MC-LAG
 both end-points attached with MC-LAG to two overlapping pairs
Figure 31: MC-LAG L2 Dual Homing to Remote PE Pairs


Figure 32: MC-LAG L2 Dual Homing to Local PE-Pairs

The forwarding behavior of the nodes abides by the following principles. Note that the logical destination (actual forwarding decision) is primarily determined by the service (VPLS or VLL), and the principles below apply only if the destination or source is based on MC-LAG:
 Packets received from the network are forwarded to all local active links of the given destination-sap based on conversation hashing. In case there are no local active links, the packets are cross-connected to the inter-chassis pseudowire.
 Packets received from the MC-LAG SAP are forwarded to the active destination pseudowire or the active local links of the destination-sap. In case there are no such objects available at the local node, the packets are cross-connected to the inter-chassis pseudowire.
MC-LAG and Subscriber Routed Redundancy Protocol (SRRP)
MC-LAG and SRRP enable dual-homed links from any IEEE 802.1ax (formerly 802.3ad) standards-based access device (for example, an IP DSLAM, Ethernet switch, or Video on Demand server) to multiple Layer 2/3 or Layer 3 aggregation nodes. In contrast with slow recovery mechanisms such as Spanning Tree, multi-chassis LAG provides synchronized and stateful redundancy for VPN services or triple play subscribers in the event of the access link or aggregation node failing, with zero impact to end users and their services.
Refer to the Triple Play Services Delivery Guide for information about SRRP.
Point-to-Point (p2p) Redundant Connection Across Layer 2/3 VPN Network
Figure 33 shows the connection between two multi-service access nodes (MSANs) across a network based on Layer 2/3 VPN pseudowires. The connection between an MSAN and a pair of PE routers is realized by MC-LAG. From the MSAN perspective, the redundant pair of PE routers acts as a single partner in the LACP negotiation. At any point in time, only one of the routers has active links in a given LAG. The status of the LAG links is reflected in the status signaling of the pseudowires set up between all participating PEs. The combination of active and standby states across the LAG links as well as the pseudowires yields only one unique path between the pair of MSANs.
Figure 33: P2P Redundant Connection Through a Layer 2 VPN Network


Note that the configuration in Figure 33 shows one particular configuration of VLL connections based on MC-LAG, specifically a VLL connection where the two ends (SAPs) are on two different redundant-pairs. In addition to this, other configurations are possible, such as:
 Both ends of the same VLL connection are local to the same redundant-pair.
 One VLL endpoint is on a redundant-pair and the other is on a single (local or remote) node.
DSLAM Dual Homing in Layer 2/3 TPSDA Model


Figure 34 shows a network configuration where a DSLAM is dual homed to a pair of redundant PEs by using MC-LAG. Inside the aggregation network, the redundant pair of PEs connects to a VPLS service, which provides a reliable connection to a single Broadband Service Router (BSR) or a pair of them.
Figure 34: DSLAM Dual-Homing Using MC-LAG


In addition to MC-LAG and pseudowire connectivity, PE-A and PE-B implement enhanced subscriber management features based on DHCP snooping and the creation of dynamic state for every subscriber-host. As only one PE is active at any point in time, it is necessary to provide a mechanism for synchronizing subscriber-host state information between the active PE (where the state is learned) and the standby PE. In addition, the VPLS core must be aware of the active PE in order to forward all subscriber traffic to the PE with an active LAG link. The mechanism for this synchronization is outside the scope of this document.
G.8031 Protected Ethernet Tunnels


The Alcatel-Lucent PBB implementation offers the capability to use core Ethernet tunnels compliant with the ITU-T G.8031 specification to achieve 50 ms resiliency for failures in a native Ethernet backbone. For further information regarding Ethernet tunnels, see G.8031 Protected Ethernet Tunnels in the Services Overview Guide.
G.8032 Protected Ethernet Rings


Ethernet ring protection switching offers ITU-T G.8032 specification compliance to achieve resiliency for Ethernet Layer 2 networks. Similar to G.8031 linear protection (also called Automatic Protection Switching (APS)), G.8032 (Eth-ring) is built on Ethernet OAM and is often referred to as Ring Automatic Protection Switching (R-APS).
For further information regarding Ethernet rings, see G.8032 Protected Ethernet Rings section
in the Services Overview Guide.
Ethernet Port Monitoring


Ethernet ports can record and recognize various medium statistics and errors. There are two main types of errors:
 Frame Based — Frame based errors are counted when the arriving frame has an error, meaning the frame is invalid. These types of errors are detectable only when frames are present on the wire.
 Symbol Based — Symbol errors are invalidly encoded symbols on the physical medium. Symbols are always present on an active Ethernet port, regardless of the presence of frames.
CRC-Monitor and Symbol-Monitor allow the operator to monitor ingress error conditions on the Ethernet medium and compare these error counts to configured thresholds. CRC-Monitor monitors CRC errors. Symbol-Monitor monitors symbol errors. Symbol error monitoring is not supported on all Ethernet ports. Crossing a signal degrade (SD) threshold causes a log event to be raised. Crossing the configured signal failure (SF) threshold causes the port to enter an operational state of down. The operator may consider configuring other protocols to convey the failure, through timeout conditions.
The error rates are expressed in the form M*10E-N. The operator can configure both the threshold (N) and a multiplier (M). By default, if the multiplier is not configured, the multiplier is 1. As an example, sd-threshold 3 would result in a signal degrade error rate of 1*10E-3 (one error per 1000). Changing the configuration to sd-threshold 3 multiplier 5 would result in a signal degrade rate of 5*10E-3 (5 errors per 1000). The signal degrade value must be a lower error rate than the signal failure threshold. This threshold can be used to provide notification that the port is operating in a degraded but not failed condition. These rates do not equate to a bit error rate (BER). CRC-Monitor provides a CRC error rate. Symbol-Monitor provides a symbol error rate.
The configured error thresholds are compared over the operator-specified sliding window to determine if one or both of the thresholds have been crossed. Statistics are gathered every second, meaning that every second the oldest statistics are dropped from the calculation. With the default 10 second sliding window, at the 11th second the oldest 1 second of statistical data is dropped and the 11th second is included.
Symbol error threshold crossing differs slightly from CRC-based error crossing. The error threshold crossing is calculated based on the window size and the fixed number of symbols that arrive (ingress) on that port during that window. The following configuration is used to demonstrate this concept.
config>port>ethernet# info detail
----------------------------------------------
        symbol-monitor
            sd-threshold 5 multiplier 5
            sf-threshold 3 multiplier 5
            no shutdown
        exit
show port 2/1/2 ethernet
===============================================================================
Ethernet Interface
===============================================================================
Description        : 2/1/2
Interface          : 2/1/2                    Oper Speed       : N/A
Link-level         : Ethernet                 Config Speed     : 1 Gbps
Admin State        : down                     Oper Duplex      : N/A
Oper State         : down                     Config Duplex    : full
Physical Link      : No                       MTU              : 9212
Single Fiber Mode  : No                       Min Frame Length : 64 Bytes
IfIndex            : 69271552                 Hold time up     : 0 seconds
Last State Change  : 06/29/2014 05:04:12      Hold time down   : 0 seconds
Last Cleared Time  : N/A                      DDM Events       : Enabled
Phys State Chng Cnt: 0
Configured Mode    : network                  Encap Type       : null
Dot1Q Ethertype    : 0x8100                   QinQ Ethertype   : 0x8100
PBB Ethertype      : 0x88e7
Ing. Pool % Rate   : 100                      Egr. Pool % Rate : 100
Ing. Pool Policy   : n/a
Egr. Pool Policy   : n/a
Net. Egr. Queue Pol: default
Egr. Sched. Pol    : n/a
Auto-negotiate     : true                     MDI/MDX          : unknown
Oper Phy-tx-clock  : not-applicable
Accounting Policy  : None                     Collect-stats    : Disabled
Acct Plcy Eth Phys : None                     Collect Eth Phys : Disabled
Egress Rate        : Default                  Ingress Rate     : Default
Load-balance-algo  : Default                  LACP Tunnel      : Disabled
Down-when-looped   : Disabled                 Keep-alive       : 10
Loop Detected      : False                    Retry            : 120
Use Broadcast Addr : False
Sync. Status Msg.  : Disabled                 Rx Quality Level : N/A
Tx DUS/DNU         : Disabled                 Tx Quality Level : N/A
SSM Code Type      : sdh
Down On Int. Error : Disabled
CRC Mon SD Thresh  : Disabled                 CRC Mon Window   : 10 seconds
CRC Mon SF Thresh  : Disabled
Sym Mon SD Thresh  : 5*10E-5                  Sym Mon Window   : 10 seconds
Sym Mon SF Thresh  : 5*10E-3                  Tot Sym Mon Errs : 0
EFM OAM            : Disabled                 EFM OAM Link Mon : Disabled
Configured Address : 8c:90:d3:a0:c7:42
Hardware Address   : 8c:90:d3:a0:c7:42
Transceiver Data
Transceiver Status : not-equipped
===============================================================================
Traffic Statistics
===============================================================================
                                                   Input                 Output
-------------------------------------------------------------------------------
Octets                                                 0                      0
Packets                                                0                      0
Errors                                                 0                      0
===============================================================================
===============================================================================
Port Statistics
===============================================================================
                                                   Input                 Output
-------------------------------------------------------------------------------
Unicast Packets                                        0                      0
Multicast Packets                                      0                      0
Broadcast Packets                                      0                      0
Discards                                               0                      0
Unknown Proto Discards                                 0
===============================================================================
===============================================================================
Ethernet-like Medium Statistics
===============================================================================
Alignment Errors :                   0  Sngl Collisions  :                   0
FCS Errors       :                   0  Mult Collisions  :                   0
SQE Test Errors  :                   0  Late Collisions  :                   0
CSE              :                   0  Excess Collisns  :                   0
Too long Frames  :                   0  Int MAC Tx Errs  :                   0
Symbol Errors    :                   0  Int MAC Rx Errs  :                   0
In Pause Frames  :                   0  Out Pause Frames :                   0
===============================================================================
The above configuration results in an SD threshold of 5*10E-5 (0.00005) and an SF threshold of 5*10E-3 (0.005) over the default 10 second window. If this port is a 1GbE port supporting symbol monitoring, then the error rate is compared against 1,250,000,000 symbols (10 seconds' worth of symbols on a 1GbE port at 125,000,000 symbols per second). If the error count in the current 10 second sliding window is less than 62,500, then the error rate is below the signal degrade threshold and no action is taken. If the error count is between 62,501 and 6,250,000, then the error rate is above the signal degrade threshold but has not breached the signal failure threshold, and a log event will be raised. If the error count is above 6,250,000, the signal failure threshold is crossed and the port will enter an operational state of down. Note that this is a very simple example meant to demonstrate the function and not meant to be used as a guide for configuring the various thresholds and window times.
A port is not returned to service automatically when it enters the failed condition as a result of crossing a signal failure threshold, for both CRC-Monitor and Symbol-Monitor. Since the port is operationally down without a physical link error, monitoring stops. The operator may re-enable the port using the shutdown and no shutdown port commands. Other port transition functions, such as clearing the MDA or slot, removing the cable, and other physical link transitions, also clear the failed condition.
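For example, a port held down by a crossed SF threshold could be returned to service as follows (the port ID is hypothetical):

configure port 2/1/2 shutdown
configure port 2/1/2 no shutdown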
802.3ah OAM
802.3ah Clause 57 (efm-oam) defines the Operations, Administration, and Maintenance (OAM)
sub-layer, which provides mechanisms useful for monitoring link operation such as remote fault
indication and remote loopback control. In general, OAM provides network operators the ability
to monitor the health of the network and quickly determine the location of failing links or fault
conditions. efm-oam described in this clause provides data link layer mechanisms that
complement applications that may reside in higher layers.
OAM information is conveyed in slow protocol frames called OAM protocol data units
(OAMPDUs). OAMPDUs contain the appropriate control and status information used to monitor,
test and troubleshoot OAM-enabled links. OAMPDUs traverse a single link, being passed
between peer OAM entities, and as such, are not forwarded by MAC clients (like bridges or
switches).
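As a minimal sketch, efm-oam could be enabled on an Ethernet port as follows. The port ID, mode, and interval values are illustrative; the transmit-interval value is assumed to be expressed in 100 ms units, so it should be verified against the release documentation.

configure port 1/1/3 ethernet efm-oam
    mode active                         # this end initiates discovery
    transmit-interval 10 multiplier 5   # 10 x 100 ms = 1000 ms PDU interval
    no shutdown
exit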
The following efm-oam functions are supported:
 efm-oam capability discovery
 Active and passive modes
 Remote failure indication — Handling of critical link events (link fault, dying gasp, etc.)
 Loopback — A mechanism is provided to support a data link layer frame-level loopback mode.
Both remote and local loopback modes are supported
 efm-oam PDU tunneling
 High-resolution timer for efm-oam with a 100 ms interval (minimum)
 efm-oam link monitoring
 Non-zero Vendor Specific Information Field — The 32-bit field is encoded using the format
00:PP:CC:CC and references TIMETRA-CHASSIS-MIB.
 00 — Must be zeroes
 PP — Platform type based on the installed IOM from tmnxHwEquippedPlatform. Mixed mode
deployments may yield different platform values in the same chassis. Since this is IOM-specific,
the IOM’s unique hardware ID (tmnxCardHwIndex) must be included to retrieve the proper
value.
 CC:CC — Chassis type index value from tmnxChassisType which is indexed in
tmnxChassisTypeTable. The table identifies the specific chassis backplane.
The value 00:00:00:00 is sent for all releases that do not support the non-zero value or are
unable to identify the required elements. There is no decoding of the peer or local vendor
information fields on the network element. The hexadecimal value is included in the show
port port-id ethernet efm-oam output.
When the efm-oam protocol fails to negotiate a peer session, or encounters a protocol failure following an established session, the Port State enters the Link Up condition. This port state is used by many protocols to indicate that the port is administratively UP and there is physical connectivity, but a protocol, such as efm-oam, has caused the port's operational state to enter a DOWN state. A reason code has been added to help discern whether the efm-oam protocol is the underlying reason for the Link Up condition.
show port
===============================================================================
Ports on Slot 1
===============================================================================
Port        Admin Link Port    Cfg  Oper LAG/ Port Port Port   C/QS/S/XFP/
Id          State      State   MTU  MTU  Bndl Mode Encp Type   MDIMDX
-------------------------------------------------------------------------------
1/1/1       Down  No   Down    1578 1578 -    netw null xcme
1/1/2       Down  No   Down    1578 1578 -    netw null xcme
1/1/3       Up    Yes  Link Up 1522 1522 -    accs qinq xcme
1/1/4       Down  No   Down    1578 1578 -    netw null xcme
1/1/5       Down  No   Down    1578 1578 -    netw null xcme
1/1/6       Down  No   Down    1578 1578 -    netw null xcme
# show port 1/1/3
===============================================================================
Ethernet Interface
===============================================================================
Description        : 10/100/Gig Ethernet SFP
Interface          : 1/1/3                    Oper Speed       : N/A
Link-level         : Ethernet                 Config Speed     : 1 Gbps
Admin State        : up                       Oper Duplex      : N/A
Oper State         : down                     Config Duplex    : full
Reason Down        : efmOamDown
Physical Link      : Yes                      MTU              : 1522
Single Fiber Mode  : No                       Min Frame Length : 64 Bytes
IfIndex            : 35749888                 Hold time up     : 0 seconds
Last State Change  : 12/18/2012 15:58:29      Hold time down   : 0 seconds
Last Cleared Time  : N/A                      DDM Events       : Enabled
Phys State Chng Cnt: 1
Configured Mode    : access                   Encap Type       : QinQ
Dot1Q Ethertype    : 0x8100                   QinQ Ethertype   : 0x8100
PBB Ethertype      : 0x88e7
Ing. Pool % Rate   : 100                      Egr. Pool % Rate : 100
Ing. Pool Policy   : n/a
Egr. Pool Policy   : n/a
Net. Egr. Queue Pol: default
Egr. Sched. Pol    : n/a
Auto-negotiate     : true                     MDI/MDX          : unknown
Oper Phy-tx-clock  : not-applicable
Accounting Policy  : None                     Collect-stats    : Disabled
Acct Plcy Eth Phys : None                     Collect Eth Phys : Disabled
Egress Rate        : Default                  Ingress Rate     : Default
Load-balance-algo  : Default                  LACP Tunnel      : Disabled
Down-when-looped   : Disabled                 Keep-alive       : 10
Loop Detected      : False                    Retry            : 120
Use Broadcast Addr : False
Sync. Status Msg.  : Disabled                 Rx Quality Level : N/A
Tx DUS/DNU         : Disabled                 Tx Quality Level : N/A
SSM Code Type      : sdh
Down On Int. Error : Disabled
CRC Mon SD Thresh  : Disabled                 CRC Mon Window   : 10 seconds
CRC Mon SF Thresh  : Disabled
Configured Address : d8:ef:01:01:00:03
Hardware Address   : d8:ef:01:01:00:03
The operator also has the opportunity to decouple the efm-oam protocol from the port state and operational state. In cases where an operator wants to remove the protocol, monitor the protocol only, migrate, or make changes, the ignore-efm-state command can be configured in the port>ethernet>efm-oam context. When the ignore-efm-state command is configured on a port, the protocol continues as normal. However, any failure in the protocol state machine (discovery, configuration, time-out, loops, etc.) will not impact the port on which the protocol is active and the optional ignore command is configured. There will only be a protocol warning message if there are issues with the protocol. When this optional command is not configured, the default behavior is that the port state will be affected by any efm-oam protocol fault or clear condition. Adding or removing this optional ignore command immediately updates the Port State and Oper State based on the active configuration. For example, if ignore-efm-state is configured on a port that is exhibiting a protocol error, that protocol error does not affect the port state or operational state, and there is no Reason Down code. If ignore-efm-state is removed from a port with an existing efm-oam protocol error, the port will transition to Link UP, Oper Down with the reason code efmOamDown.
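A minimal sketch of this decoupling (port ID hypothetical):

configure port 1/1/3 ethernet efm-oam
    ignore-efm-state    # protocol faults are logged but no longer affect the port state
exit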
OAM Events
The Information OAMPDU is transmitted by each peer at the configured intervals. This OAMPDU performs keepalive and critical notification functions. Various local conditions are conveyed through the setting of the Flags field. The following Critical Link Events defined in IEEE 802.3 Section 57.2.10.1 are supported:
 Link Fault: The PHY has determined a fault has occurred in the receive direction of the local DTE
 Dying Gasp: An unrecoverable local failure condition has occurred
 Critical Event: An unspecified critical event has occurred
The local node can set and unset the various Flag fields based on the operational state of the port, shutdown or activation of the efm-oam protocol, or locally raised events. These Flag fields maintain their setting for the duration of a particular event. Changing port conditions, protocol state, or operator intervention may affect the setting of these fields in the Information OAMPDU. A peer processing the Information OAMPDU can take a configured action when one or more of these Flag fields are set. By default, receiving a set value for any of the Flag fields causes the local port to enter the previously mentioned Link Up port state, and an event is logged. If this default behavior is not desired, the operator may choose to log the event without affecting the local port. This is configurable per Flag field using the options under config>port>ethernet>efm-oam>peer-rdi-rx.
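For example, assuming the local-port-action form of these per-flag options, the receive behavior for two of the Flag fields could be softened to log-only as follows:

configure port 1/1/3 ethernet efm-oam
    peer-rdi-rx
        dying-gasp local-port-action log-only        # log the event; do not take the port down
        critical-event local-port-action log-only
    exit
exit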
Link Monitoring
The efm-oam protocol provides the ability to monitor the link for error conditions that may indicate the link is starting to degrade or has reached an error rate that exceeds an acceptable threshold.
Link monitoring can be enabled for three types of frame errors: errored-frame, errored-frame-period, and errored-frame-seconds. The errored-frame monitor compares the number of frame errors to the threshold over a window of time. The errored-frame-period monitor compares the number of frame errors to the threshold over a window defined as a number of received packets; this window is checked once per second to see if the window parameter has been reached. The errored-frame-seconds monitor compares the number of errored seconds to the threshold over a window of time. An errored second is any second in which at least one frame error occurred.
An errored frame is counted when any frame is in error as determined by the Ethernet physical layer, including jabbers, fragments, FCS or CRC errors, and runts. This excludes jumbo frames with a byte count higher than 9212, or any frame that is dropped by the PHY layer prior to reaching the monitoring function.
Each frame error monitor functions independently of the other monitors. Each monitor's configuration includes an optional signal degrade threshold sd-threshold, a signal failure threshold sf-threshold, a window, and the ability to communicate failure events to the peer by setting a Flag field in the Information OAMPDU or generating the Event Notification OAMPDU, event-notification. The parameters are uniquely configurable for each monitor; a sketch of one monitor follows.
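The following sketch enables the errored-frame monitor with illustrative thresholds; the exact window units and defaults vary by monitor and release.

configure port 1/1/3 ethernet efm-oam
    link-monitoring
        errored-frame
            sd-threshold 1    # first-level, log-only warning
            sf-threshold 5    # port-affecting failure threshold
            window 10         # measurement window (shown in deciseconds for this monitor)
            no shutdown
        exit
        no shutdown           # enabling link monitoring also advertises the EV capability
    exit
exit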
A degraded condition is raised when the configured signal degrade threshold sd-threshold is reached. This provides a first-level, log-only action indicating that a link could become unstable. This event does not affect the port state. The critical failure condition is raised when the configured sf-threshold is reached. By default, reaching the signal failure threshold causes the port to enter the Link Up condition, unless the local signal failure action local-sf-action has been modified to a log-only action. Signal degrade conditions for a monitor in the signal failed state are suppressed until the signal failure has been cleared.
The initial configuration or the modification of either threshold value takes effect in the current window. When a threshold value for a monitor is modified, all active local events for that specific monitor are cleared. The modification of the threshold acts the same as the clear command described later in this section.
Notification to the peer is required to ensure that the action taken by the local port detecting the error and its peer are synchronized. If peers do not take the same action, then one port may remain fully operational while the other enters a non-operational state. These threshold crossing events do not shut down the physical link or cause the protocol to enter a non-operational state. The protocol and network element configuration are required to ensure these asymmetrical states do not occur. There are two options for exchanging link and event information between peers: the Information OAMPDU and the Event Notification OAMPDU.
As discussed earlier, the Information OAMPDU conveys link information using the Flags field: dying gasp, critical event, and link fault. This method of communication has a number of significant advantages over the Event Notification OAMPDU. The Information OAMPDU is sent at every configured transmit-interval. This allows the most recent information to be sent between peers, a critical requirement to avoid asymmetrical forwarding conditions. A second major advantage is interoperability with devices that do not support Link Monitoring, and vendor interoperability in general. This is the lowest common denominator that offers a robust communication channel to convey link event information. Since the Information OAMPDU is already being sent to maintain the peering relationship, this method of communication adds no additional overhead. The local-sf-action options allow the dying gasp and critical event flags to be set in the Information OAMPDU when a signal failure threshold is reached. It is suggested that this be used in place of, or in conjunction with, the Event Notification OAMPDU.
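A sketch of setting one of these flags on local signal failure; the option names under local-sf-action are assumed from the commands referenced in this section and should be verified for the release in use.

configure port 1/1/3 ethernet efm-oam
    link-monitoring
        local-sf-action
            info-notifications
                critical-event    # set the critical event flag in the Information OAMPDU on SF
            exit
        exit
    exit
exit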
The Event Notification OAMPDU provides a method to convey very specific information to a peer about various Link Events using Link Event TLVs. A unique Event Notification OAMPDU is generated for each unique frame error event. The intention is to provide the peer with the Sequence Number, Event Type, Timestamp, and the local information that caused the generation of the OAMPDU: window, threshold, errors, error running total, and event running total specific to the port.
 Sequence Number: The unique identification indicating a new event.
 Window: The size of the unique measurement period for the error type. The window is only checked at the end; there is no mid-window checking.
 Threshold: The value of the configured sf-threshold
 Errors: The errors counted in that specific window
 Error Running Total: The number of errors accumulated for that event type since monitoring
started and the protocol and port have been operational or a reset function has occurred
 Event Running Total: The number of events accumulated for that event type since the
monitoring started and the protocol and port have been operational
By default, the Event Notification OAMPDU is generated by the network element detecting the signal failure event. The Event Notification OAMPDU is sent only when the initial frame event occurs. No Event Notification OAMPDU is sent when the condition clears. A port that has been operationally affected as a result of a Link Monitoring frame error event must be recovered manually. The typical recovery method is to shutdown and then no shutdown the port. This clears all events on the port. Any function that affects the port state (physical fiber pull, soft or hard reset functions, protocol restarts, etc.) will also clear all local and remote events on the affected node experiencing the operation. None of these frame error recovery actions will cause the generation of the Event Notification OAMPDU. If the chosen recovery action is not otherwise recognized by the peer, and the Information OAMPDU Flag fields have not been configured to maintain the current event state, there is a high probability that the ports will have different forwarding states, notwithstanding any higher level protocol verification that may be in place.
A burst of between one and five Event Notification OAMPDU packets may be sent. By default, only a single Event Notification OAMPDU is generated, but this value can be changed under the local-sf-action context. An Event Notification OAMPDU will only be processed if the peer had previously advertised the EV capability. The EV capability is an indication that the remote peer supports link monitoring and may send the Event Notification OAMPDU.
The network element receiving the Event Notification OAMPDU will use the values contained in
the Link event TLVs to determine if the remote node has exceeded the failure threshold. The
locally configured action will determine how and if the local port is affected. By default,
processing of the Event Notification OAMPDU is log only and does not affect the port state. By
default, processing of the Information OAMPDU Flag fields is port affecting. When Event
Notification OAMPDU has been configured as port affecting on the receiving node, action is
only taken when errors are equal to or above the threshold and the threshold value is not zero.
No action is taken when the errors value is less than the threshold or the threshold is zero.
Symbol error (errored-symbols) monitoring is also supported, but requires specific hardware revisions and the appropriate code release. The symbol monitor differs from the frame error monitors. Symbols represent a constant load on the Ethernet wire whether service frames are present or not. This means the optional signal degrade threshold sd-threshold has an additional purpose when configured as part of the symbol error monitor. When the signal degrade threshold is not configured, the symbol monitor acts similarly to the frame error monitors, requiring manual intervention to clear a port that has been operationally affected by the monitor. When the optional signal degrade threshold is configured, it again represents the first-level warning. However, it has an additional function as part of the symbol monitor: if a signal failure event has been raised, the configured signal degrade threshold becomes the equivalent of a lowering threshold. If a subsequent window does not reach the configured signal degrade threshold, then the previous event is cleared and the previously affected port is returned to service without operator intervention. This return to service automatically clears any previously set Information OAMPDU Flag fields that were set as a result of the signal failure threshold. The Event Notification OAMPDU is generated with the symbol error Link Event TLV containing an error count less than the threshold. This indicates to the peer that the initial problem has been resolved and the port should be returned to service.
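A sketch of a symbol monitor that uses this auto-clearing behavior; the threshold values are illustrative only.

configure port 1/1/3 ethernet efm-oam
    link-monitoring
        errored-symbols
            sd-threshold 1     # also acts as the clearing threshold after a signal failure
            sf-threshold 100
            window 10          # time-based; converted internally to a symbol count
            no shutdown
        exit
    exit
exit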
The errored-symbol window is a measure of time that is automatically converted into the number of symbols for that specific medium for that period of time. The standard MIB entries “dot3OamErrSymPeriodWindowHi” and “dot3OamErrSymPeriodWindowLo” are marked as read-only instead of read-write; there is no way to directly configure these values. The configuration of the window converts the time and programs those two MIB values appropriately. Both the configured window and the number of symbols are displayed under the show port port-id ethernet efm-oam command.
show port 1/1/1 ethernet efm-oam
===============================================================================
Ethernet Oam (802.3ah)
===============================================================================
Admin State        : up
Oper State         : operational
Mode               : active
Pdu Size           : 1518
Config Revision    : 0
Function Support   : LB
Transmit Interval  : 1000 ms
Multiplier         : 5
Hold Time          : 0
Tunneling          : false
Loop Detected      : false
Grace Tx Enable    : true (inactive)
Grace Vendor OUI   : 00:16:4d
Dying Gasp on Reset: true (inactive)
Soft Reset Tx Act  : none
Trigger Fault      : none
Vendor OUI         : 00:16:4d (alu)
Vendor Info        : 00:01:00:02
Peer Mac Address   : d8:1c:01:02:00:01
Peer Vendor OUI    : 00:16:4d (alu)
Peer Vendor Info   : 00:01:00:02
Peer Mode          : active
Peer Pdu Size      : 1518
Peer Cfg Revision  : 0
Peer Support       : LB
Peer Grace Rx      : false
Loopback State     : None
Loopback Ignore Rx : Ignore
Ignore Efm State   : false
Link Monitoring    : disabled
Peer RDI Rx
  Critical Event   : out-of-service
  Dying Gasp       : out-of-service
  Link Fault       : out-of-service
  Event Notify     : log-only
Local SF Action                          Discovery
  Event Burst      : 1                     Ad Link Mon Cap : yes
  Port Action      : out-of-service
  Dying Gasp       : disabled
  Critical Event   : disabled
Errored Frame                            Errored Frame Period
  Enabled          : no                    Enabled         : no
  Event Notify     : enabled               Event Notify    : enabled
  SF Threshold     : 1                     SF Threshold    : 1
  SD Threshold     : disabled (0)          SD Threshold    : disabled (0)
  Window           : 10 ds                 Window          : 1488095 frames
Errored Symbol Period                    Errored Frame Seconds Summary
  Enabled          : no                    Enabled         : no
  Event Notify     : enabled               Event Notify    : enabled
  SF Threshold     : 1                     SF Threshold    : 1
  SD Threshold     : disabled (0)          SD Threshold    : disabled (0)
  Window (time)    : 10 ds                 Window          : 600 ds
  Window (symbols) : 125000000
===============================================================================
Active Failure Ethernet OAM Event Logs
===============================================================================
Number of Logs     : 0
===============================================================================
===============================================================================
Ethernet Oam Statistics
===============================================================================
                                                   Input                 Output
-------------------------------------------------------------------------------
Information                                       238522                 238522
Loopback Control                                       0                      0
Unique Event Notify                                    0                      0
Duplicate Event Notify                                 0                      0
Unsupported Codes                                      0                      0
Frames Lost                                                                   0
===============================================================================
A clear command, clear port port-id ethernet efm-oam events [local | remote], has been added to clear port-affecting events on the local node on which the command is issued. When the optional [local | remote] option is omitted, both local and remote events are cleared for the specified port. This command is not specific to the link monitors, as it clears all active events. When local events are cleared, all previously set Information OAMPDU Flag fields are cleared, regardless of the cause of the event that set the Flag field.
In the case of symbol errors only, if the Event Notification OAMPDU is enabled for symbol errors and a local symbol error signal failure event exists at the time of the clear, the Event Notification OAMPDU is generated with an error count of zero and a threshold value reflecting the local signal failure threshold. The fact that the error value is lower than the threshold value indicates that the local node is not in a signal failed state. The Event Notification OAMPDU is not generated when the clear command is used to clear local frame error events. This is because frame error event monitors only act on an Event Notification OAMPDU when the error value is equal to or higher than the threshold value; a lower value is ignored. As stated previously, there is no automatic return to service for frame errors.
If the clear command is used to clear remote events (events conveyed to the local node by the peer), no notification is generated to the peer to indicate that a clear function has been performed. Since the Event Notification OAMPDU is only sent when the initial event is raised, there is no further Event Notification and blackholes can result. If the Information OAMPDU Flag fields are used to ensure a constant refresh of information, the remote error is reinstated as soon as the next Information OAMPDU arrives with the appropriate Flag field set.
Local and remote efm-oam port events are stored in the efm-oam event logs. These logs maintain and display active and cleared signal failure and signal degrade events. These events interact with the efm-oam protocol. This logging is different from the time-stamped events included with the system log for information logging purposes. To view these events, the event-log option has been added to the show port port-id ethernet efm-oam command. This includes the location, the event type, the counter information or the decoded Network Event TLV information, and whether the port has been affected by the active event. A maximum of 12 port events is retained. The first three indexes are reserved for the three Information Flag fields: dying gasp, critical event, and link fault. The other nine indexes maintain the current state for the various error monitors in a most-recent fashion; events can wrap the indexes, dropping the oldest event.
In mixed environments where Link Monitoring is supported on one peer but not the other, the following behavior is normal, assuming the Information OAMPDU has been enabled to convey the monitor fault event. The arriving Flag field fault triggers the efm-oam protocol on the receiving, unsupportive node to move from operational to “send local and remote”. The protocol on the supportive node that set the Flag field to convey the fault enters the “send local and remote ok” state. The supportive node maintains the Flag field setting until the condition has cleared. The protocol recovers to the operational state once the original event has cleared, assuming no other fault on the port is preventing the negotiation from progressing. If both nodes supported the Link Monitoring process, the protocol would remain operational.
In summary, link monitors can be configured for frame and symbol errors (symbol monitoring on specific hardware only). By default, Link Monitoring and all monitors are shut down. When the Link Monitoring function is enabled, the capability (EV) is advertised. When a monitor is enabled, a default window size and a default signal failure threshold are activated. The local action for a signal failure threshold event is to shut down the local port. Notification is sent to the peer using the Event Notification OAMPDU. By default, the remote peer does not take any port action for the Event Notification OAMPDU; the reception is only logged. It is suggested that the operator evaluate the various defaults and configure the local-sf-action to set one of the Flag fields in the Information OAMPDU, using the info-notifications command options, when fault notification to a peer is required. Vendor-specific TLVs and vendor-specific OAMPDUs are just that: specific to that vendor. Non-ALU vendor-specific information will not be processed.
Capability Advertising
A supported capability, sometimes requiring activation, is advertised to the peer. The EV capability is advertised when Link Monitoring is active on the port. This advertisement can be disabled using the optional command no link-monitoring under the config>port>ethernet>efm-oam>discovery>advertise-capabilities context.
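A minimal sketch (port ID hypothetical):

configure port 1/1/3 ethernet efm-oam
    discovery
        advertise-capabilities
            no link-monitoring    # suppress EV capability advertisement to the peer
        exit
    exit
exit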
Remote Loopback
EFM OAM provides a link-layer frame loopback mode that can be remotely controlled.
To initiate remote loopback, the local EFM OAM client sends a loopback control OAM PDU by
enabling the OAM remote-loopback command. After receiving the loopback control OAM PDU,
the remote OAM client puts the remote port into local loopback mode.
To exit remote loopback, the local EFM OAM client sends a loopback control OAM PDU by
disabling the OAM remote-loopback command. After receiving the loopback control OAM PDU,
the remote OAM client puts the port back into normal forwarding mode.
Note that during remote loopback test operation, all frames except EFM OAM PDUs are dropped at the local port in the receive direction where remote loopback is enabled. If local loopback is enabled, then all frames except EFM OAM PDUs are dropped at the local port in both the receive and transmit directions. This behavior may result in many protocols (such as STP or LAG) resetting their state machines.
When a port is in loopback mode, service mirroring will not work if the port is a mirror-source or
a mirror-destination.
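A hypothetical invocation, assuming an oam efm form of the command; verify the exact syntax for the release in use.

oam efm 1/1/3 remote-loopback start    # ask the peer to place its port in loopback
oam efm 1/1/3 remote-loopback stop     # return the peer port to normal forwarding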
802.3ah OAM PDU Tunneling for Epipe Service
Alcatel-Lucent routers support 802.3ah. Customers who subscribe to Epipe service treat the Epipe as a wire, so they demand the ability to run 802.3ah between their devices, which are located at each end of the Epipe.
This feature only applies to port-based Epipe SAPs, because 802.3ah runs at the port level, not the VLAN level. Hence, such ports must be configured as null encapsulated SAPs.
When OAM PDU tunneling is enabled, 802.3ah OAM PDUs received at one end of an Epipe are
forwarded through the Epipe. 802.3ah can run between devices that are located at each end of
the Epipe. When OAM PDU tunneling is disabled (by default), OAM PDUs are dropped or
processed locally according to the efm-oam configuration (shutdown or no shutdown).
Note that enabling 802.3ah for a specific port and enabling OAM PDU tunneling for the same port are mutually exclusive. Enforcement is performed at the CLI level.
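A minimal sketch for one end of the Epipe (service, customer, and port IDs are hypothetical, and the second endpoint is omitted); the SAP is port-based with null encapsulation, and local efm-oam processing is shut down in favor of tunneling:

configure port 1/1/1 ethernet efm-oam shutdown
configure port 1/1/1 ethernet efm-oam tunneling
configure service epipe 100 customer 1 create
    sap 1/1/1 create
        no shutdown
    exit
    no shutdown
exit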
802.3ah Grace Announcement


Support for vendor-specific soft reset graceful recovery has been added to the SR OS implementation of the EFM-OAM protocol. This is configured using the grace-tx-enable command under the config>system>ethernet>efm-oam and config>port>ethernet>efm-oam contexts. This feature is not enabled by default. When this functionality is enabled, the EFM-OAM protocol does not enter a non-operational state when both nodes acknowledge the grace function. The ports associated with the hardware that has successfully executed the soft reset clear all local and remote events. The peer that acknowledges the graceful restart procedure for EFM-OAM clears all remote events that it received from the peer that performed the soft reset. The local events are not cleared on the peer that has not undergone the soft reset. The Information OAMPDU Flag fields are critical in propagating the local event to the peer. The Event Notification OAMPDU is not sent, because it is only sent when the event is initially raised.
A vendor-specific Grace TLV is included in the Information PDU generated as part of the 802.3ah OAM protocol when a network element undergoes an ISSU function. Nodes that support the soft reset messaging functions allow the local node to generate the Grace TLV. The Grace TLV is used to inform a remote peer that the negotiated interval and multiplier should be ignored and that the new 900 s timeout interval should be used to time out the session. The peer receiving the Grace TLV must be able to parse and process the vendor-specific messaging.
The grace-tx-enable command enables this functionality. This command exists at two levels of the hierarchy: system level and port level. By default, this functionality is enabled at the port level; at the system level it defaults to disabled. In order to enable this functionality, both the port and system commands must be enabled. If either is not enabled, the combination will not allow those ports to generate the vendor-specific Grace TLV. This functionality must be enabled at both the system and port level prior to the ISSU or soft reset function. If it is enabled during a soft reset, or after the ISSU function is already in progress, it has no effect during that window. Both passive and active 802.3ah OAM peers can generate the Grace TLV as part of the Information PDU.
There is no command to enable this on the receiving node. As long as the receiver understands and can parse the Grace TLV, it will enter the grace mode of operation.
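Because the port-level command defaults to enabled and the system-level command defaults to disabled, enabling the function typically requires only the system-level command; both are shown here for completeness (port ID hypothetical):

configure system ethernet efm-oam grace-tx-enable
configure port 1/1/3 ethernet efm-oam grace-tx-enable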
The basic protocol flows in Figure 35 and Figure 36 help demonstrate the interaction between passive-active and active-active peer combinations supporting the Grace TLV. In Figure 35, the passive node is entering an ISSU on a node that supports soft reset capabilities. In Figure 36, the active node is experiencing the ISSU function on a node that supports soft reset capabilities.
Figure 35: Grace TLV Passive Node with Soft Reset

Figure 36: Grace TLV Active Node with Soft Reset
The difference between the two is subtle but important. When an active node performs this function, it generates an Information PDU with the Local TLV following the successful soft reset. When it receives the Information PDU with the Grace Ack, it sends its own Information PDU with both the Local and Remote TLVs completed, which completes the protocol restart. When a passive node is reset, the passive port waits to receive the 802.3ah OAM protocol before sending its own Information PDU with both the Local and Remote TLVs, thus completing the protocol restart.
The renegotiation process allows the node which experienced the soft reset to rebuild the
session without having to restart the session from the discovery phase. This significantly
reduces the impact of the native protocol on data forwarding.
Any situation that could cause the renegotiation to fail will force the protocol to revert to the
discovery phase and fail the graceful restart. During a Major ISSU when the EFM-OAM session
is held operational by the Grace function, if the peer MAC address of the session changes,
there will be no log event raised for the MAC address change.
The vendor-specific grace function benefits are realized when both peers support the
transmitting, receiving and processing of the vendor-specific Grace TLV. In the case of mixed
code versions, products, or vendor environments, a standard EFM-OAM message to the peer
can be used to instruct the peer to treat the session as failed. When the command dying-gasp-
tx-on-reset is active on a port, the soft reset function triggers ETH-OAM to set the dying gasp
flag or critical event flag in the Information OAMPDU. An initial burst of three Informational OAM
PDUs will be sent using a one second spacing, regardless of the protocol interval. The peer
may process these flags to affect its port state and take the appropriate action. The control of
the local port state where the soft reset is occurring is left to the soft reset function. This EFM-
OAM function does not affect local port state. If the peer has acted on the exception flags and
affected its port state, then the local node must take an action to inform the upstream nodes that
a condition has occurred and forwarding is no longer possible. Routing protocols like ISIS and
OSPF overload bits are typically used in routed environments to accomplish this notification.
This feature is similar to grace-tx-enable in that it intercepts system messaging when the feature is active on a port (enabled both at the port and at the system level) and the messaging occurs. However, because the dying-gasp-tx-on-reset command is not a graceful function, it is interruptive and service affecting. Using dying-gasp-tx-on-reset requires peers to reestablish the peering session from an initial state, not rebuild the state from previous protocol information. The transmission of the dying gasp or critical event commences when the soft reset occurs and continues for the duration of the soft reset.
If both functions are active on the same port, the grace-tx-enable function is preferred if the peer is setting and sending the Vendor OUI 00:16:4d (ALU) in the Information OAMPDU. In this situation, the dying gasp function is not invoked. A secondary Vendor OUI can be configured using the grace-vendor-oui oui command, should an additional Vendor OUI prefer to support the reception, parsing, and processing of the vendor-specific grace message instead of the dying gasp. If only one of those functions is active on the port, then that specific function is used. The grace function should not be enabled if the peer Vendor OUI is equal to 00:16:4d (ALU) but the peer does not support the grace function.
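A sketch combining both functions on a port; the secondary OUI is a placeholder value, and the context for dying-gasp-tx-on-reset is assumed to mirror grace-tx-enable at both levels:

configure system ethernet efm-oam dying-gasp-tx-on-reset
configure port 1/1/3 ethernet efm-oam
    dying-gasp-tx-on-reset
    grace-vendor-oui 00:11:22    # placeholder secondary Vendor OUI for grace parsing
exit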
ETH-OAM allows the generation of a fault condition using the trigger-fault {dying-gasp | critical-event} command. This sets the appropriate flag fields in the Information OAMPDU and transitions a previously operational local port to Link Up. Removing this command from the configuration stops the flags from being set and allows the port to return to service, assuming no other faults would prevent this resumption of service. In cases where a port must be administratively shut down, this command can be used to signal a peer using the EFM-OAM protocol that the session should be considered failed.
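For example, before an administrative shutdown (port ID hypothetical):

configure port 1/1/3 ethernet efm-oam trigger-fault dying-gasp
configure port 1/1/3 shutdown

Removing the trigger (no trigger-fault) allows the port to return to service once it is re-enabled.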
These features do not apply to the clearing of an IOM, which does not trigger a soft reset. IOM clearing is a forceful event that does not trigger graceful protocol renegotiation.
A number of show commands have been enhanced to help operators determine the state of the 802.3ah OAM Grace function and whether the peer is generating or receiving the Grace TLV.
System level information can be viewed using the show system info command.
show system information
===============================================================================
System Information
===============================================================================
System Name            : system-name
System Type            : 7750 SR-12
System Version         : 11.0r4
System Contact         :
System Location        :
System Coordinates     :
System Active Slot     : A
System Up Time         : 62 days, 20:29:48.96 (hr:min:sec)
...snip...
EFM OAM Grace Tx Enable: False
===============================================================================
EFM OAM Grace Tx Enable:


 False — The system level functionality is not enabled. Grace is not generated on any port, regardless of the state of the option on the individual ports.
 True — The system level functionality is enabled, and the determination of whether to send grace is based on the state of the option configured at the port level.
Individual ports also contain information about the current port configuration and whether or not
the Grace TLV is being sent or received.
Grace Tx Enable has two enable states with the current state in brackets to the right.
 False — The port level functionality is not enabled. Grace will not be generated on the port
regardless of the state of the option at the system level.
 True — The port level functionality is enabled and the determination of whether to send grace is
based on the state of the option configured at the system level
 (inactive) Not currently sending Grace TLV
 (active) Currently sending the Grace TLV as part of the Information PDU
Peer Grace Rx
 False — Not receiving Grace TLV from the peer
 True — Receiving Grace TLV from the peer
MTU Configuration Guidelines


Observe the following general rules when planning your service and physical MTU
configurations:
 The router must contend with MTU limitations at many service points. The physical (access and
network) port, service, and SDP MTU values must be individually defined.
 Identify the ports that will be designated as network ports intended to carry service traffic.
 MTU values should not be modified frequently.
 MTU values must conform to both of the following conditions:
 The service MTU must be less than or equal to the SDP path MTU.
 The service MTU must be less than or equal to the access port (SAP) MTU.
Default MTU Values


Table 27 shows the default MTU values, which are dependent upon the (sub-)port type, mode, and encapsulation.
Table 27: MTU Default Values

Port Type                     Mode      Encap Type    Default (bytes)
Ethernet                      access    null          1514
Ethernet                      access    dot1q         1518
Fast Ethernet                 network   —             1514
Other Ethernet                network   —             9212 (1)
SONET path or TDM channel     access    BCP-null      1518
SONET path or TDM channel     access    BCP-Dot1q     1522
SONET path or TDM channel     access    IPCP          1502
SONET path or TDM channel     network   —             9208
SONET path or TDM channel     access    frame-relay   1578
SONET path or TDM channel     access    atm           1524

Note:
1. The default MTU for Ethernet ports other than Fast Ethernet is the lesser of 9212 bytes and any MTU limitation imposed by hardware, which is typically 16K.
Modifying MTU Defaults


MTU parameters must be modified on the service level as well as the port level.
 The service-level MTU parameters configure the service payload (Maximum Transmission Unit – MTU) in bytes for the service ID, overriding the service-type default MTU.
 The port-level MTU parameters configure the maximum payload MTU size for an Ethernet port, a SONET/SDH path (sub-port), a TDM port/channel, or a channel that is part of a multilink bundle or LAG.
The default MTU values must be modified to ensure that packets are not dropped due to frame
size limitations. The service MTU must be less than or equal to both the SAP port MTU and the
SDP path MTU values. When an SDP is configured on a network port using default port MTU
values, the operational path MTU can be less than the service MTU. In this case, enter
the show service sdp command to check the operational state. If the operational state is
down, then modify the MTU value accordingly.
Configuration Example
In order for the maximum length service frame to successfully travel from a local ingress SAP to
a remote egress SAP, the MTU values configured on the local ingress SAP, the SDP (GRE or
MPLS), and the egress SAP must be coordinated to accept the maximum frame size the service
can forward. For example, the targeted MTU values to configure for a distributed Epipe service
(ALA-A and ALA-B) are shown in Figure 37.

Figure 37: MTU Configuration Example

Since ALA-A uses Dot1q encapsulation, the SAP MTU must be set to 1518 to be able to accept
a 1514 byte service frame (see Table 27 for MTU default values). Each SDP MTU must be set
to at least 1514 as well. If ALA-A’s network port (2/1/1) is configured as an Ethernet port with a
GRE SDP encapsulation type, then the MTU value of network ports 2/1/1 and 3/1/1
must each be at least 1556 bytes (1514 MTU + 28 GRE/Martini + 14 Ethernet). Finally, the MTU
of ALA-B’s SAP (access port 4/1/1) must be at least 1514, as it uses null encapsulation.
Table 28 shows sample MTU configuration values.

Table 28: MTU Configuration Example Values

                      ALA-A                    ALA-B
                      Access (SAP)  Network    Network  Access (SAP)

Port (slot/MDA/port)  1/1/1         2/1/1      3/1/1    4/1/1
Mode type             dot1q         network    network  null
MTU                   1518          1556       1556     1514
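
The values in Table 28 could be applied with a configuration along the following lines. This is a
sketch showing only the MTU-relevant statements (classic CLI):

On ALA-A:
    configure port 1/1/1 ethernet mode access
    configure port 1/1/1 ethernet encap-type dot1q
    configure port 1/1/1 ethernet mtu 1518
    configure port 2/1/1 ethernet mtu 1556

On ALA-B:
    configure port 3/1/1 ethernet mtu 1556
    configure port 4/1/1 ethernet mode access
    configure port 4/1/1 ethernet mtu 1514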

Deploying Preprovisioned Components
When a card, CMA, MDA, XCM, or XMA is installed in a preprovisioned slot, the device detects
discrepancies between the preprovisioned card type configuration and the type actually
installed. Error messages are displayed if there are inconsistencies, and the card will not
initialize. When the properly provisioned cards are installed into the appropriate chassis slots,
alarm, status, and performance details are displayed.
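
A minimal preprovisioning sketch follows; the card and MDA type names are illustrative
assumptions, so substitute the types actually installed:

    configure card 1 card-type iom3-xp-b
    configure card 1 mda 1 mda-type m10-1gb-xp-sfp
    show card state

The show card state command can then be used to confirm that the equipped types match the
provisioned types.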

Configuring SFM5-12e Fabric Speed


With the introduction of the SFM5-12e and the mini-SFM5-12e, a new tools command (set-fabric-
speed) was added to set the fabric operating speed. (The command does not apply to the
SFM4-12e, which operates only at fabric-speed-a.) The 7750 SR-7 and 7750 SR-12 support
fabric-speed-b.

fabric-speed-a
The 7750 SR-12e chassis defaults to the fabric-speed-a parameter when initially deployed with
the SFM5-12e. The fabric-speed-a parameter operates at 200 Gb/s per slot, which permits a mix
of FP2- and FP3-based cards to co-exist.

fabric-speed-b
The fabric-speed-b parameter enables the 7750 SR-12e to operate at up to 400 Gb/s, for
which all cards in the 7750 SR-12e must be FP3 based (FP3 IMM and/or IOM3-XP-C).
The system does not support any FP2-based cards when the chassis is set to fabric-speed-b.
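
The operating speed is changed with the tools command noted above. A minimal sketch,
assuming the set-fabric-speed form of the command (confirm that no FP2-based cards remain
equipped before switching):

    tools perform system set-fabric-speed fabric-speed-b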

Configuration Process Overview


Figure 38 displays the process to provision chassis slots, cards, MDAs, and ports.

Figure 38: Slot, Card, MDA, and Port Configuration and Implementation Flow
Configuration Notes
The following information describes provisioning caveats:
 If a card or MDA type is installed in a slot provisioned for a different type, the card will not
initialize.
 A card or MDA installed in an unprovisioned slot remains administratively and operationally down
until the card type and MDA type are specified.
 Ports cannot be provisioned until the slot, card and MDA type are specified.
 cHDLC does not support HDLC windowing features, nor other HDLC frame types such as S-
frames.
 cHDLC operates in the HDLC Asynchronous Balanced Mode (ABM) of operation.
 APS configuration rules (a configuration sketch follows this list):
 A physical port (either working or protection) must be shut down before it can be removed from
an APS group.
 For a single-chassis APS group, a working port must be added first. Then a protection port can
be added or removed at any time.
 A protection port must be shut down before being removed from an APS group.
 A path cannot be configured on a port before the port is added to an APS group.
 A working port cannot be removed from an APS group until the APS port path is removed.
 When ports are added to an APS group, all path-level configurations are available only on the
APS port level, and configuration on the physical member ports is blocked.
 For APS-protected bundles, all members of a working bundle must reside on the working port of
an APS group. Similarly, all members of a protecting bundle must reside on the protecting circuit
of that APS group.
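
The following single-chassis APS sketch is consistent with the ordering rules above; the APS
group number and port IDs are illustrative assumptions:

    configure port aps-1 aps working-circuit 1/1/1
    configure port aps-1 aps protect-circuit 1/1/2
    configure port aps-1 no shutdown

The working circuit is added before the protect circuit, and path-level configuration is
subsequently applied to the aps-1 port rather than to the physical member ports.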
